summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2018-04-15Merge tag 'kbuild-v4.17-2' of ↵Linus Torvalds5-10/+4
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - pass HOSTLDFLAGS when compiling single .c host programs - build genksyms lexer and parser files instead of using shipped versions - rename *-asn1.[ch] to *.asn1.[ch] for suffix consistency - let the top .gitignore globally ignore artifacts generated by flex, bison, and asn1_compiler - let the top Makefile globally clean artifacts generated by flex, bison, and asn1_compiler - use safer .SECONDARY marker instead of .PRECIOUS to prevent intermediate files from being removed - support -fmacro-prefix-map option to make __FILE__ a relative path - fix # escaping to prepare for the future GNU Make release - clean up deb-pkg by using debian tools instead of handrolled source/changes generation - improve rpm-pkg portability by supporting kernel-install as a fallback of new-kernel-pkg - extend Kconfig listnewconfig target to provide more information * tag 'kbuild-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: extend output of 'listnewconfig' kbuild: rpm-pkg: use kernel-install as a fallback for new-kernel-pkg Kbuild: fix # escaping in .cmd files for future Make kbuild: deb-pkg: split generating packaging and build kbuild: use -fmacro-prefix-map to make __FILE__ a relative path kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers kbuild: rename *-asn1.[ch] to *.asn1.[ch] kbuild: clean up *-asn1.[ch] patterns from top-level Makefile .gitignore: move *-asn1.[ch] patterns to the top-level .gitignore kbuild: add %.dtb.S and %.dtb to 'targets' automatically kbuild: add %.lex.c and %.tab.[ch] to 'targets' automatically genksyms: generate lexer and parser during build instead of shipping kbuild: clean up *.lex.c and *.tab.[ch] patterns from top-level Makefile .gitignore: move *.lex.c *.tab.[ch] patterns to the top-level .gitignore kbuild: use HOSTLDFLAGS for single .c executables
2018-04-15Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds24-777/+1040
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes and updates for x86: - Address a swiotlb regression which was caused by the recent DMA rework and made driver fail because dma_direct_supported() returned false - Fix a signedness bug in the APIC ID validation which caused invalid APIC IDs to be detected as valid thereby bloating the CPU possible space. - Fix inconsisten config dependcy/select magic for the MFD_CS5535 driver. - Fix a corruption of the physical address space bits when encryption has reduced the address space and late cpuinfo updates overwrite the reduced bit information with the original value. - Dominiks syscall rework which consolidates the architecture specific syscall functions so all syscalls can be wrapped with the same macros. This allows to switch x86/64 to struct pt_regs based syscalls. Extend the clearing of user space controlled registers in the entry patch to the lower registers" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Fix signedness bug in APIC ID validity checks x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption x86/olpc: Fix inconsistent MFD_CS5535 configuration swiotlb: Use dma_direct_supported() for swiotlb_ops syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*() syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention syscalls/core, syscalls/x86: Clean up syscall stub naming convention syscalls/x86: Extend register clearing on syscall entry to lower registers syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64 syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32 syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y x86/syscalls: Don't pointlessly reload the system call number x86/mm: Fix documentation of module mapping range with 4-level paging x86/cpuid: Switch to 'static const' specifier
2018-04-15Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds22-105/+329
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti updates from Thomas Gleixner: "Another series of PTI related changes: - Remove the manual stack switch for user entries from the idtentry code. This debloats entry by 5k+ bytes of text. - Use the proper types for the asm/bootparam.h defines to prevent user space compile errors. - Use PAGE_GLOBAL for !PCID systems to gain back performance - Prevent setting of huge PUD/PMD entries when the entries are not leaf entries otherwise the entries to which the PUD/PMD points to and are populated get lost" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pgtable: Don't set huge PUD/PMD on non-leaf entries x86/pti: Leave kernel text global for !PCID x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image x86/pti: Enable global pages for shared areas x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init x86/mm: Comment _PAGE_GLOBAL mystery x86/mm: Remove extra filtering in pageattr code x86/mm: Do not auto-massage page protections x86/espfix: Document use of _PAGE_GLOBAL x86/mm: Introduce "default" kernel PTE mask x86/mm: Undo double _PAGE_PSE clearing x86/mm: Factor out pageattr _PAGE_GLOBAL setting x86/entry/64: Drop idtentry's manual stack switch for user entries x86/uapi: Fix asm/bootparam.h userspace compilation errors
2018-04-15Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-10/+24
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Thomas Gleixner: "A rather large set of perf updates: Kernel: - Fix various initialization issues - Prevent creating [ku]probes for not CAP_SYS_ADMIN users Tooling: - Show only failing syscalls with 'perf trace --failure' (Arnaldo Carvalho de Melo) e.g: See what 'openat' syscalls are failing: # perf trace --failure -e openat 762.323 ( 0.007 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video2) = -1 ENOENT No such file or directory <SNIP N /dev/videoN open attempts... sigh, where is that improvised camera lid?!? > 790.228 ( 0.008 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video63) = -1 ENOENT No such file or directory ^C# - Show information about the event (freq, nr_samples, total period/nr_events) in the annotate --tui and --stdio2 'perf annotate' output, similar to the first line in the 'perf report --tui', but just for the samples for a the annotated symbol (Arnaldo Carvalho de Melo) - Introduce 'perf version --build-options' to show what features were linked, aliased as well as a shorter 'perf -vv' (Jin Yao) - Add a "dso_size" sort order (Kim Phillips) - Remove redundant ')' in the tracepoint output in 'perf trace' (Changbin Du) - Synchronize x86's cpufeatures.h, no effect on toolss (Arnaldo Carvalho de Melo) - Show group details on the title line in the annotate browser and 'perf annotate --stdio2' output, so that the per-event columns can have headers (Arnaldo Carvalho de Melo) - Fixup vertical line separating metrics from instructions and cleaning unused lines at the bottom, both in the annotate TUI browser (Arnaldo Carvalho de Melo) - Remove duplicated 'samples' in lost samples warning in 'perf report' (Arnaldo Carvalho de Melo) - Synchronize i915_drm.h, silencing the perf build process, automagically adding support for the new DRM_I915_QUERY ioctl (Arnaldo Carvalho de Melo) - Make auxtrace_queues__add_buffer() allocate struct buffer, from a patchkit already applied (Adrian Hunter) - Fix the --stdio2/TUI annotate output to include group details, be it for a recorded '{a,b,f}' explicit event group or when forcing group display using 'perf report --group' for a set of events not recorded as a group (Arnaldo Carvalho de Melo) - Fix display artifacts in the ui browser (base class for the annotate and main report/top TUI browser) related to the extra title lines work (Arnaldo Carvalho de Melo) - perf auxtrace refactorings, leftovers from a previously partially processed patchset (Adrian Hunter) - Fix the builtin clang build (Sandipan Das, Arnaldo Carvalho de Melo) - Synchronize i915_drm.h, silencing a perf build warning and in the process automagically adding support for a new ioctl command (Arnaldo Carvalho de Melo) - Fix a strncpy issue in uprobe tracing" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) perf/core: Need CAP_SYS_ADMIN to create k/uprobe with perf_event_open() tracing/uprobe_event: Fix strncpy corner case perf/core: Fix perf_uprobe_init() perf/core: Fix perf_kprobe_init() perf/core: Fix use-after-free in uprobe_perf_close() perf tests clang: Fix function name for clang IR test perf clang: Add support for recent clang versions perf tools: Fix perf builds with clang support perf tools: No need to include namespaces.h in util.h perf hists browser: Remove leftover from row returned from refresh perf hists browser: Show extra_title_lines in the 'D' debug hotkey perf auxtrace: Make auxtrace_queues__add_buffer() do CPU filtering tools headers uapi: Synchronize i915_drm.h perf report: Remove duplicated 'samples' in lost samples warning perf ui browser: Fixup cleaning unused lines at the bottom perf annotate browser: Fixup vertical line separating metrics from instructions perf annotate: Show group details on the title line perf auxtrace: Make auxtrace_queues__add_buffer() allocate struct buffer perf/x86/intel: Move regs->flags EXACT bit init perf trace: Remove redundant ')' ...
2018-04-15Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI bootup fixlet from Thomas Gleixner: "A single fix for an early boot warning caused by invoking this_cpu_has() before SMP initialization" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
2018-04-15Merge tag 'for-linus' of git://github.com/openrisc/linuxLinus Torvalds1-2/+0
Pull OpenRISC fixlet from Stafford Horne: "Just one small thing here, it came in a while back but I didnt have anything in my 4.16 queue, still its the only thing for 4.17 so sending it alone. Small cleanup: remove unused __ARCH_HAVE_MMU define" * tag 'for-linus' of git://github.com/openrisc/linux: openrisc: remove unused __ARCH_HAVE_MMU define
2018-04-15Merge tag 'powerpc-4.17-2' of ↵Linus Torvalds10-40/+63
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix crashes when loading modules built with a different CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic. - Fix busy loops in the OPAL NVRAM driver if we get certain error conditions from firmware. - Remove tlbie trace points from KVM code that's called in real mode, because it causes crashes. - Fix checkstops caused by invalid tlbiel on Power9 Radix. - Ensure the set of CPU features we "know" are always enabled is actually the minimal set when we build with support for firmware supplied CPU features. Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin. * tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features powerpc/mm/radix: Fix checkstops caused by invalid tlbiel KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode powerpc/8xx: Fix build with hugetlbfs enabled powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops powerpc/fscr: Enable interrupts earlier before calling get_user() powerpc/64s: Fix section mismatch warnings from setup_rfi_flush() powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
2018-04-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds18-708/+138
Merge yet more updates from Andrew Morton: - various hotfixes - kexec_file updates and feature work * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits) kernel/kexec_file.c: move purgatories sha256 to common code kernel/kexec_file.c: allow archs to set purgatory load address kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs kernel/kexec_file.c: split up __kexec_load_puragory kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations* kernel/kexec_file.c: search symbols in read-only kexec_purgatory kernel/kexec_file.c: make purgatory_info->ehdr const kernel/kexec_file.c: remove checks in kexec_purgatory_load include/linux/kexec.h: silence compile warnings kexec_file, x86: move re-factored code to generic side x86: kexec_file: clean up prepare_elf64_headers() x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers() x86: kexec_file: purge system-ram walking from prepare_elf64_headers() kexec_file,x86,powerpc: factor out kexec_file_ops functions kexec_file: make use of purgatory optional proc: revalidate misc dentries mm, slab: reschedule cache_reap() on the same CPU ...
2018-04-13kernel/kexec_file.c: move purgatories sha256 to common codePhilipp Rudo5-305/+16
The code to verify the new kernels sha digest is applicable for all architectures. Move it to common code. One problem is the string.c implementation on x86. Currently sha256 includes x86/boot/string.h which defines memcpy and memset to be gcc builtins. By moving the sha256 implementation to common code and changing the include to linux/string.h both functions are no longer defined. Thus definitions have to be provided in x86/purgatory/string.c Link: http://lkml.kernel.org/r/20180321112751.22196-12-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kernel/kexec_file.c: allow archs to set purgatory load addressPhilipp Rudo2-8/+9
For s390 new kernels are loaded to fixed addresses in memory before they are booted. With the current code this is a problem as it assumes the kernel will be loaded to an 'arbitrary' address. In particular, kexec_locate_mem_hole searches for a large enough memory region and sets the load address (kexec_bufer->mem) to it. Luckily there is a simple workaround for this problem. By returning 1 in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off. This allows the architecture to set kbuf->mem by hand. While the trick works fine for the kernel it does not for the purgatory as here the architectures don't have access to its kexec_buffer. Give architectures access to the purgatories kexec_buffer by changing kexec_load_purgatory to take a pointer to it. With this change architectures have access to the buffer and can edit it as they need. A nice side effect of this change is that we can get rid of the purgatory_info->purgatory_load_address field. As now the information stored there can directly be accessed from kbuf->mem. Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory loadPhilipp Rudo1-4/+6
The current code uses the sh_offset field in purgatory_info->sechdrs to store a pointer to the current load address of the section. Depending whether the section will be loaded or not this is either a pointer into purgatory_info->purgatory_buf or kexec_purgatory. This is not only a violation of the ELF standard but also makes the code very hard to understand as you cannot tell if the memory you are using is read-only or not. Remove this misuse and store the offset of the section in pugaroty_info->purgatory_buf in sh_offset. Link: http://lkml.kernel.org/r/20180321112751.22196-10-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*Philipp Rudo1-36/+20
When the relocations are applied to the purgatory only the section the relocations are applied to is writable. The other sections, i.e. the symtab and .rel/.rela, are in read-only kexec_purgatory. Highlight this by marking the corresponding variables as 'const'. While at it also change the signatures of arch_kexec_apply_relocations* to take section pointers instead of just the index of the relocation section. This removes the second lookup and sanity check of the sections in arch code. Link: http://lkml.kernel.org/r/20180321112751.22196-6-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kexec_file, x86: move re-factored code to generic sideAKASHI Takahiro1-188/+7
In the previous patches, commonly-used routines, exclude_mem_range() and prepare_elf64_headers(), were carved out. Now place them in kexec common code. A prefix "crash_" is given to each of their names to avoid possible name collisions. Link: http://lkml.kernel.org/r/20180306102303.9063-8-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13x86: kexec_file: clean up prepare_elf64_headers()AKASHI Takahiro1-11/+7
Removing bufp variable in prepare_elf64_headers() makes the code simpler and more understandable. Link: http://lkml.kernel.org/r/20180306102303.9063-7-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem bufferAKASHI Takahiro1-51/+31
While CRASH_MAX_RANGES (== 16) seems to be good enough, fixed-number array is not a good idea in general. In this patch, size of crash_mem buffer is calculated as before and the buffer is now dynamically allocated. This change also allows removing crash_elf_data structure. Link: http://lkml.kernel.org/r/20180306102303.9063-6-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()AKASHI Takahiro1-12/+12
The code guarded by CONFIG_X86_64 is necessary on some architectures which have a dedicated kernel mapping outside of linear memory mapping. (arm64 is among those.) In this patch, an additional argument, kernel_map, is added to enable/ disable the code removing #ifdef. Link: http://lkml.kernel.org/r/20180306102303.9063-5-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13x86: kexec_file: purge system-ram walking from prepare_elf64_headers()AKASHI Takahiro1-63/+58
While prepare_elf64_headers() in x86 looks pretty generic for other architectures' use, it contains some code which tries to list crash memory regions by walking through system resources, which is not always architecture agnostic. To make this function more generic, the related code should be purged. In this patch, prepare_elf64_headers() simply scans crash_mem buffer passed and add all the listed regions to elf header as a PT_LOAD segment. So walk_system_ram_res(prepare_elf64_headers_callback) have been moved forward before prepare_elf64_headers() where the callback, prepare_elf64_headers_callback(), is now responsible for filling up crash_mem buffer. Meanwhile exclude_elf_header_ranges() used to be called every time in this callback it is rather redundant and now called only once in prepare_elf_headers() as well. Link: http://lkml.kernel.org/r/20180306102303.9063-4-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kexec_file,x86,powerpc: factor out kexec_file_ops functionsAKASHI Takahiro6-83/+9
As arch_kexec_kernel_image_{probe,load}(), arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig() are almost duplicated among architectures, they can be commonalized with an architecture-defined kexec_file_ops array. So let's factor them out. Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13kexec_file: make use of purgatory optionalAKASHI Takahiro2-0/+6
Patch series "kexec_file, x86, powerpc: refactoring for other architecutres", v2. This is a preparatory patchset for adding kexec_file support on arm64. It was originally included in a arm64 patch set[1], but Philipp is also working on their kexec_file support on s390[2] and some changes are now conflicting. So these common parts were extracted and put into a separate patch set for better integration. What's more, my original patch#4 was split into a few small chunks for easier review after Dave's comment. As such, the resulting code is basically identical with my original, and the only *visible* differences are: - renaming of _kexec_kernel_image_probe() and _kimage_file_post_load_cleanup() - change one of types of arguments at prepare_elf64_headers() Those, unfortunately, require a couple of trivial changes on the rest (#1, #6 to #13) of my arm64 kexec_file patch set[1]. Patch #1 allows making a use of purgatory optional, particularly useful for arm64. Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load, verify_sig}() and arch_kimage_file_post_load_cleanup() across architectures. Patches #3-#7 are also intended to generalize parse_elf64_headers(), along with exclude_mem_range(), to be made best re-use of. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html [2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html This patch (of 7): On arm64, crash dump kernel's usable memory is protected by *unmapping* it from kernel virtual space unlike other architectures where the region is just made read-only. It is highly unlikely that the region is accidentally corrupted and this observation rationalizes that digest check code can also be dropped from purgatory. The resulting code is so simple as it doesn't require a bit ugly re-linking/relocation stuff, i.e. arch_kexec_apply_relocations_add(). Please see: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html All that the purgatory does is to shuffle arguments and jump into a new kernel, while we still need to have some space for a hash value (purgatory_sha256_digest) which is never checked against. As such, it doesn't make sense to have trampline code between old kernel and new kernel on arm64. This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and allows related code to be compiled in only if necessary. [takahiro.akashi@linaro.org: fix trivial screwup] Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13mm/gup.c: document return valueMichael S. Tsirkin4-0/+10
__get_user_pages_fast handles errors differently from get_user_pages_fast: the former always returns the number of pages pinned, the later might return a negative error code. Link: http://lkml.kernel.org/r/1522962072-182137-6-git-send-email-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13Merge tag 'sh-for-4.17' of git://git.libc.org/linux-shLinus Torvalds9-38/+84
Pull arch/sh updates from Rich Felker: "Fixes for bugs in futex, device tree, and userspace breakpoint traps, and for PCI issues on SH7786" * tag 'sh-for-4.17' of git://git.libc.org/linux-sh: arch/sh: pcie-sh7786: handle non-zero DMA offset arch/sh: pcie-sh7786: adjust the memory mapping arch/sh: pcie-sh7786: adjust PCI MEM and IO regions arch/sh: pcie-sh7786: exclude unusable PCI MEM areas arch/sh: pcie-sh7786: mark unavailable PCI resource as disabled arch/sh: pci: don't use disabled resources arch/sh: make the DMA mapping operations observe dev->dma_pfn_offset arch/sh: add sh7786_mm_sel() function sh: fix debug trap failure to process signals before return to user sh: fix memory corruption of unflattened device tree sh: fix futex FUTEX_OP_SET op on userspace addresses
2018-04-13Merge tag 'arm64-upstream' of ↵Linus Torvalds10-199/+242
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull more arm64 updates from Will Deacon: "A few late updates to address some issues arising from conflicts with other trees: - Removal of Qualcomm-specific Spectre-v2 mitigation in favour of the generic SMCCC-based firmware call - Fix EL2 hardening capability checking, which was bodged to reduce conflicts with the KVM tree - Add some currently unused assembler macros for managing SIMD registers which will be used by some crypto code in the next merge window" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: assembler: add macros to conditionally yield the NEON under PREEMPT arm64: assembler: add utility macros to push/pop stack frames arm64: Move the content of bpi.S to hyp-entry.S arm64: Get rid of __smccc_workaround_1_hvc_* arm64: capabilities: Rework EL2 vector hardening entry arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening
2018-04-13Merge branch 'for-linus' of ↵Linus Torvalds17-636/+174
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Martin Schwidefsky: "Three notable larger changes next to the usual bug fixing: - update the email addresses in MAINTAINERS for the s390 folks to use the simpler linux.ibm.com domain instead of the old linux.vnet.ibm.com - an update for the zcrypt device driver that removes some old and obsolete interfaces and add support for up to 256 crypto adapters - a rework of the IPL aka boot code" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (23 commits) s390: correct nospec auto detection init order s390/zcrypt: Support up to 256 crypto adapters. s390/zcrypt: Remove deprecated zcrypt proc interface. s390/zcrypt: Remove deprecated ioctls. s390/zcrypt: Make ap init functions static. MAINTAINERS: update s390 maintainers email addresses s390/ipl: remove reipl_method and dump_method s390/ipl: correct kdump reipl block checksum calculation s390/ipl: remove non-existing functions declaration s390: assume diag308 set always works s390/ipl: avoid adding scpdata to cmdline during ftp/dvd boot s390/ipl: correct ipl parmblock valid checks s390/ipl: rely on diag308 store to get ipl info s390/ipl: move ipl_flags to ipl.c s390/ipl: get rid of ipl_ssid and ipl_devno s390/ipl: unite diag308 and scsi boot ipl blocks s390/ipl: ensure loadparm valid flag is set s390/qdio: lock device while installing IRQ handler s390/qdio: clear intparm during shutdown s390/ccwgroup: require at least one ccw device ...
2018-04-13powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU featuresMichael Ellerman2-15/+22
The cpu_has_feature() mechanism has an optimisation where at build time we construct a mask of the CPU feature bits that will always be true for the given .config, based on the platform/bitness/etc. that we are building for. That is incompatible with DT CPU features, where the set of CPU features is dependent on feature flags that are given to us by firmware. The result is that some feature bits can not be *disabled* by DT CPU features. Or more accurately, they can be disabled but they will still appear in the ALWAYS mask, meaning cpu_has_feature() will always return true for them. In the past this hasn't really been a problem because on Book3S 64 (where we support DT CPU features), the set of ALWAYS bits has been very small. That was because we always built for POWER4 and later, meaning the set of common bits was small. The only bit that could be cleared by DT CPU features that was also in the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in the alignment handler to create a fake DSISR. That code was itself deleted in 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults") (Sep 2017). However the set of ALWAYS features changed with the recent commit db5ae1c155af ("powerpc/64s: Refine feature sets for little endian builds") which restricted the set of feature flags when building little endian to Power7 or later. That caused the ALWAYS mask to become much larger for little endian builds. The result is that the following feature bits can currently not be *disabled* by DT CPU features: CPU_FTR_REAL_LE, CPU_FTR_MMCRA, CPU_FTR_CTRL, CPU_FTR_SMT, CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR, CPU_FTR_PKEY, CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR. To fix it we need to mask the set of ALWAYS features with the base set of DT CPU features, ie. the features that are always enabled by DT CPU features. That way there are no bits in the ALWAYS mask that are not also always set by DT CPU features. Fixes: db5ae1c155af ("powerpc/64s: Refine feature sets for little endian builds") Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-12Merge branch 'parisc-4.17-2' of ↵Linus Torvalds10-156/+58
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: - fix panic when halting system via "shutdown -h now" - drop own coding in favour of generic CONFIG_COMPAT_BINFMT_ELF implementation - add FPE_CONDTRAP constant: last outstanding parisc-specific cleanup for Eric Biedermans siginfo patches - move some functions to .init and some to .text.hot linker sections * 'parisc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Prevent panic at system halt parisc: Switch to generic COMPAT_BINFMT_ELF parisc: Move cache flush functions into .text.hot section parisc/signal: Add FPE_CONDTRAP for conditional trap handling
2018-04-12arch/sh: pcie-sh7786: handle non-zero DMA offsetThomas Petazzoni1-0/+8
On SuperH, the base of the physical memory might be different from zero. In this case, PCI address zero will map to a non-zero physical address. In order to make sure that the DMA mapping API takes care of this DMA offset, we must fill in the dev->dma_pfn_offset field for PCI devices. This gets done in the pcibios_bus_add_device() hook, called for each new PCI device detected. The dma_pfn_offset global variable is re-calculated for every PCI controller available on the platform, but that's not an issue because its value will each time be exactly the same, as it only depends on the memory start address and memory size. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12arch/sh: pcie-sh7786: adjust the memory mappingThomas Petazzoni1-5/+14
The code setting up the PCI -> SuperHighway mapping doesn't take into account the fact that the address stored in PCIELARx must be aligned with the size stored in PCIELAMRx. For example, when your physical memory starts at 0x0800_0000 (128 MB), a size of 64 MB or 128 MB is fine. However, if you have 256 MB of memory, it doesn't work because the base address is not aligned on the size. In such situation, we have to round down the base address to make sure it is aligned on the size of the area. For for a 0x0800_0000 base address with 256 MB of memory, we will round down to 0x0, and extend the size of the mapping to 512 MB. This allows the mapping to work on platforms that have 256 MB of RAM. The current setup would only work with 128 MB of RAM or less. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12arch/sh: pcie-sh7786: adjust PCI MEM and IO regionsThomas Petazzoni1-18/+18
The current definition of the PCIe IO and MEM resources for SH7786 doesn't match what the datasheet says. For example, for PCIe0 0xfe100000 is advertised by the datasheet as a PCI IO region, while 0xfd000000 is advertised as a PCI MEM region. The code currently inverts the two. The SH4A_PCIEPARL and SH4A_PCIEPTCTLR registers allow to define the base address and role of the different regions (including whether it's a MEM or IO region). However, practical experience on a SH7786 shows that if 0xfe100000 is used for LEL and 0xfd000000 for IO, a PCIe device using two MEM BARs cannot be accessed at all. Simply using 0xfe100000 for IO and 0xfd000000 for MEM makes the PCIe device accessible. It is very likely that this was never seen because there are two other PCI MEM region listed in the resources. However, for different reasons, none of the two other MEM regions are usable on the specific SH7786 platform the problem was encountered. Therefore, the last MEM region at 0xfe100000 was used to place the BARs, making the device non-functional. This commit therefore adjusts those PCI MEM and IO resources definitions so that they match what the datasheet says. They have only been tested with PCIe 0. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12arch/sh: pcie-sh7786: exclude unusable PCI MEM areasThomas Petazzoni1-0/+12
Depending on the physical memory layout, some PCI MEM areas are not usable. According to the SH7786 datasheet, the PCI MEM area from 1000_0000 to 13FF_FFFF is only usable if the physical memory layout (in MMSELR) is 1, 2, 5 or 6. In all other configurations, this PCI MEM area is not usable (because it overlaps with DRAM). Therefore, this commit adjusts the PCI SH7786 initialization to mark the relevant PCI resource as IORESOURCE_DISABLED if we can't use it. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12arch/sh: pcie-sh7786: mark unavailable PCI resource as disabledThomas Petazzoni1-0/+3
Some PCI MEM resources are marked as IORESOURCE_MEM_32BIT, which means they are only usable when the SH core runs in 32-bit mode. In 29-bit mode, such memory regions are not usable. The existing code for SH7786 properly skips such regions when configuring the PCIe controller registers. However, because such regions are still described in the resource array, the pcibios_scanbus() function in the SuperH pci.c will register them to the PCI core. Due to this, the PCI core will allocate MEM areas from this resource, and assign BARs pointing to this area, even though it's unusable. In order to prevent this from happening, we mark such regions as IORESOURCE_DISABLED, which tells the SuperH pci.c pcibios_scanbus() function to skip them. Note that we separate marking the region as disabled from skipping it, because other regions will be marked as disabled in follow-up patches. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12arch/sh: pci: don't use disabled resourcesThomas Petazzoni1-0/+5
In pcibios_scanbus(), we provide to the PCI core the usable MEM and IO regions using pci_add_resource_offset(). We travel through all resources available in the "struct pci_channel". Also, in register_pci_controller(), we travel through all resources to request them, making sure they don't conflict with already requested resources. However, some resources may be disabled, in which case they should not be requested nor provided to the PCI core. In the current situation, none of the resources are disabled. However, follow-up patches in this series will make some resources disabled, making this preliminary change necessary. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12arch/sh: make the DMA mapping operations observe dev->dma_pfn_offsetThomas Petazzoni2-4/+7
Some devices may have a non-zero DMA offset, i.e an offset between the DMA address and the physical address. Such an offset can be encoded into the dma_pfn_offset field of "struct device", but the SuperH implementation of the DMA mapping API does not observe this information. This commit fixes that by ensuring the DMA address is properly calculated depending on this DMA offset. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12arch/sh: add sh7786_mm_sel() functionThomas Petazzoni1-0/+7
The SH7786 has different physical memory layout configurations, configurable through the MMSELR register. The configuration is typically defined by the bootloader, so Linux generally doesn't care. Except that depending on the configuration, some PCI MEM areas may or may not be available. This commit adds a helper function that allows to retrieve the current physical memory layout configuration. It will be used in a following patch to exclude unusable PCI MEM areas during the PCI initialization. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12sh: fix debug trap failure to process signals before return to userRich Felker1-1/+1
When responding to a debug trap (breakpoint) in userspace, the kernel's trap handler raised SIGTRAP but returned from the trap via a code path that ignored pending signals, resulting in an infinite loop re-executing the trapping instruction. Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12sh: fix memory corruption of unflattened device treeRich Felker2-6/+8
unflatten_device_tree() makes use of memblock allocation, and therefore must be called before paging_init() migrates the memblock allocation data to the bootmem framework. Otherwise the record of the allocation for the expanded device tree will be lost, and will eventually be clobbered when allocated for another use. Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12sh: fix futex FUTEX_OP_SET op on userspace addressesAurelien Jarno1-4/+1
Commit 00b73d8d1b71 ("sh: add working futex atomic ops on userspace addresses for smp") changed the futex_atomic_op_inuser function to use a loop. In case of the FUTEX_OP_SET op with a userspace address containing a value different of 0, this loop is an endless loop. Fix that by loading the value of oldval from the userspace before doing the cmpxchg op, also for the FUTEX_OP_SET case. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12Merge tag 'for-linus-4.17-rc1-tag' of ↵Linus Torvalds2-5/+7
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "A few fixes of Xen related core code and drivers" * tag 'for-linus-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvh: Indicate XENFEAT_linux_rsdp_unrestricted to Xen xen/acpi: off by one in read_acpi_id() xen/acpi: upload _PSD info for non Dom0 CPUs too x86/xen: Delay get_cpu_cap until stack canary is established xen: xenbus_dev_frontend: Verify body of XS_TRANSACTION_END xen: xenbus: Catch closing of non existent transactions xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling
2018-04-12Merge tag 'microblaze-4.17-rc1' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds3-93/+15
Pull microblaze updates from Michal Simek: "Use generic pci_mmap_resource_range()" * tag 'microblaze-4.17-rc1' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Use generic pci_mmap_resource_range() microblaze: Provide pgprot_device/writecombine macros for nommu
2018-04-12powerpc/mm/radix: Fix checkstops caused by invalid tlbielMichael Ellerman1-3/+2
In tlbiel_radix_set_isa300() we use the PPC_TLBIEL() macro to construct tlbiel instructions. The instruction takes 5 fields, two of which are registers, and the others are constants. But because it's constructed with inline asm the compiler doesn't know that. We got the constraint wrong on the 'r' field, using "r" tells the compiler to put the value in a register. The value we then get in the macro is the *register number*, not the value of the field. That means when we mask the register number with 0x1 we get 0 or 1 depending on which register the compiler happens to put the constant in, eg: li r10,1 tlbiel r8,r9,2,0,0 li r7,1 tlbiel r10,r6,0,0,1 If we're unlucky we might generate an invalid instruction form, for example RIC=0, PRS=1 and R=0, tlbiel r8,r7,0,1,0, this has been observed to cause machine checks: Oops: Machine check, sig: 7 [#1] CPU: 24 PID: 0 Comm: swapper NIP: 00000000000385f4 LR: 000000000100ed00 CTR: 000000000000007f REGS: c00000000110bb40 TRAP: 0200 MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48002222 XER: 20040000 CFAR: 00000000000385d0 DAR: 0000000000001c00 DSISR: 00000200 SOFTE: 1 If the machine check happens early in boot while we have MSR_ME=0 it will escalate into a checkstop and kill the box entirely. To fix it we could change the inline asm constraint to "i" which tells the compiler the value is a constant. But a better fix is to just pass a literal 1 into the macro, which bypasses any problems with inline asm constraints. Fixes: d4748276ae14 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-04-12Merge branch 'WIP.x86/asm' into x86/urgent, because the topic is readyIngo Molnar2589-417469/+16194
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12x86/pgtable: Don't set huge PUD/PMD on non-leaf entriesJoerg Roedel1-0/+9
The pmd_set_huge() and pud_set_huge() functions are used from the generic ioremap() code to establish large mappings where this is possible. But the generic ioremap() code does not check whether the PMD/PUD entries are already populated with a non-leaf entry, so that any page-table pages these entries point to will be lost. Further, on x86-32 with SHARED_KERNEL_PMD=0, this causes a BUG_ON() in vmalloc_sync_one() when PMD entries are synced from swapper_pg_dir to the current page-table. This happens because the PMD entry from swapper_pg_dir was promoted to a huge-page entry while the current PGD still contains the non-leaf entry. Because both entries are present and point to a different page, the BUG_ON() triggers. This was actually triggered with pti-x32 enabled in a KVM virtual machine by the graphics driver. A real and better fix for that would be to improve the page-table handling in the generic ioremap() code. But that is out-of-scope for this patch-set and left for later work. Reported-by: David H. Gutteridge <dhgutteridge@sympatico.ca> Signed-off-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Waiman Long <llong@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180411152437.GC15462@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12x86/pti: Leave kernel text global for !PCIDDave Hansen3-4/+82
Global pages are bad for hardening because they potentially let an exploit read the kernel image via a Meltdown-style attack which makes it easier to find gadgets. But, global pages are good for performance because they reduce TLB misses when making user/kernel transitions, especially when PCIDs are not available, such as on older hardware, or where a hypervisor has disabled them for some reason. This patch implements a basic, sane policy: If you have PCIDs, you only map a minimal amount of kernel text global. If you do not have PCIDs, you map all kernel text global. This policy effectively makes PCIDs something that not only adds performance but a little bit of hardening as well. I ran a simple "lseek" microbenchmark[1] to test the benefit on a modern Atom microserver. Most of the benefit comes from applying the series before this patch ("entry only"), but there is still a signifiant benefit from this patch. No Global Lines (baseline ): 6077741 lseeks/sec 88 Global Lines (entry only): 7528609 lseeks/sec (+23.9%) 94 Global Lines (this patch): 8433111 lseeks/sec (+38.8%) [1.] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205518.E3D989EB@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel imageDave Hansen3-10/+35
Summary: In current kernels, with PTI enabled, no pages are marked Global. This potentially increases TLB misses. But, the mechanism by which the Global bit is set and cleared is rather haphazard. This patch makes the process more explicit. In the end, it leaves us with Global entries in the page tables for the areas truly shared by userspace and kernel and increases TLB hit rates. The place this patch really shines in on systems without PCIDs. In this case, we are using an lseek microbenchmark[1] to see how a reasonably non-trivial syscall behaves. Higher is better: No Global pages (baseline): 6077741 lseeks/sec 88 Global Pages (this set): 7528609 lseeks/sec (+23.9%) On a modern Skylake desktop with PCIDs, the benefits are tangible, but not huge for a kernel compile (lower is better): No Global pages (baseline): 186.951 seconds time elapsed ( +- 0.35% ) 28 Global pages (this set): 185.756 seconds time elapsed ( +- 0.09% ) -1.195 seconds (-0.64%) I also re-checked everything using the lseek1 test[1]: No Global pages (baseline): 15783951 lseeks/sec 28 Global pages (this set): 16054688 lseeks/sec +270737 lseeks/sec (+1.71%) The effect is more visible, but still modest. Details: The kernel page tables are inherited from head_64.S which rudely marks them as _PAGE_GLOBAL. For PTI, we have been relying on the grace of $DEITY and some insane behavior in pageattr.c to clear _PAGE_GLOBAL. This patch tries to do better. First, stop filtering out "unsupported" bits from being cleared in the pageattr code. It's fine to filter out *setting* these bits but it is insane to keep us from clearing them. Then, *explicitly* go clear _PAGE_GLOBAL from the kernel identity map. Do not rely on pageattr to do it magically. After this patch, we can see that "GLB" shows up in each copy of the page tables, that we have the same number of global entries in each and that they are the *same* entries. /sys/kernel/debug/page_tables/current_kernel:11 /sys/kernel/debug/page_tables/current_user:11 /sys/kernel/debug/page_tables/kernel:11 9caae8ad6a1fb53aca2407ec037f612d current_kernel.GLB 9caae8ad6a1fb53aca2407ec037f612d current_user.GLB 9caae8ad6a1fb53aca2407ec037f612d kernel.GLB A quick visual audit also shows that all the entries make sense. 0xfffffe0000000000 is the cpu_entry_area and 0xffffffff81c00000 is the entry/exit text: 0xfffffe0000000000-0xfffffe0000002000 8K ro GLB NX pte 0xfffffe0000002000-0xfffffe0000003000 4K RW GLB NX pte 0xfffffe0000003000-0xfffffe0000006000 12K ro GLB NX pte 0xfffffe0000006000-0xfffffe0000007000 4K ro GLB x pte 0xfffffe0000007000-0xfffffe000000d000 24K RW GLB NX pte 0xfffffe000002d000-0xfffffe000002e000 4K ro GLB NX pte 0xfffffe000002e000-0xfffffe000002f000 4K RW GLB NX pte 0xfffffe000002f000-0xfffffe0000032000 12K ro GLB NX pte 0xfffffe0000032000-0xfffffe0000033000 4K ro GLB x pte 0xfffffe0000033000-0xfffffe0000039000 24K RW GLB NX pte 0xffffffff81c00000-0xffffffff81e00000 2M ro PSE GLB x pmd [1.] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205517.C80FBE05@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12x86/pti: Enable global pages for shared areasDave Hansen2-2/+35
The entry/exit text and cpu_entry_area are mapped into userspace and the kernel. But, they are not _PAGE_GLOBAL. This creates unnecessary TLB misses. Add the _PAGE_GLOBAL flag for these areas. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205515.2977EE7D@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12x86/mm: Do not forbid _PAGE_RW before init for __ro_after_initDave Hansen1-2/+4
__ro_after_init data gets stuck in the .rodata section. That's normally fine because the kernel itself manages the R/W properties. But, if we run __change_page_attr() on an area which is __ro_after_init, the .rodata checks will trigger and force the area to be immediately read-only, even if it is early-ish in boot. This caused problems when trying to clear the _PAGE_GLOBAL bit for these area in the PTI code: it cleared _PAGE_GLOBAL like I asked, but also took it up on itself to clear _PAGE_RW. The kernel then oopses the next time it wrote to a __ro_after_init data structure. To fix this, add the kernel_set_to_readonly check, just like we have for kernel text, just a few lines below in this function. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205514.8D898241@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12x86/mm: Comment _PAGE_GLOBAL mysteryDave Hansen1-1/+10
I was mystified as to where the _PAGE_GLOBAL in the kernel page tables for kernel text came from. I audited all the places I could find, but I missed one: head_64.S. The page tables that we create in here live for a long time, and they also have _PAGE_GLOBAL set, despite whether the processor supports it or not. It's harmless, and we got *lucky* that the pageattr code accidentally clears it when we wipe it out of __supported_pte_mask and then later try to mark kernel text read-only. Comment some of these properties to make it easier to find and understand in the future. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205513.079BB265@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12x86/mm: Remove extra filtering in pageattr codeDave Hansen1-4/+2
The pageattr code has a mode where it can set or clear PTE bits in existing PTEs, so the page protections of the *new* PTEs come from one of two places: 1. The set/clear masks: cpa->mask_clr / cpa->mask_set 2. The existing PTE We filter ->mask_set/clr for supported PTE bits at entry to __change_page_attr() so we never need to filter them again. The only other place permissions can come from is an existing PTE and those already presumably have good bits. We do not need to filter them again. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205511.BC072352@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12x86/mm: Do not auto-massage page protectionsDave Hansen10-12/+75
A PTE is constructed from a physical address and a pgprotval_t. __PAGE_KERNEL, for instance, is a pgprot_t and must be converted into a pgprotval_t before it can be used to create a PTE. This is done implicitly within functions like pfn_pte() by massage_pgprot(). However, this makes it very challenging to set bits (and keep them set) if your bit is being filtered out by massage_pgprot(). This moves the bit filtering out of pfn_pte() and friends. For users of PAGE_KERNEL*, filtering will be done automatically inside those macros but for users of __PAGE_KERNEL*, they need to do their own filtering now. Note that we also just move pfn_pte/pmd/pud() over to check_pgprot() instead of massage_pgprot(). This way, we still *look* for unsupported bits and properly warn about them if we find them. This might happen if an unfiltered __PAGE_KERNEL* value was passed in, for instance. - printk format warning fix from: Arnd Bergmann <arnd@arndb.de> - boot crash fix from: Tom Lendacky <thomas.lendacky@amd.com> - crash bisected by: Mike Galbraith <efault@gmx.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de> Fixed-by: Tom Lendacky <thomas.lendacky@amd.com> Bisected-by: Mike Galbraith <efault@gmx.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-11Merge tag 'pm-4.17-rc1-2' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These include one big-ticket item which is the rework of the idle loop in order to prevent CPUs from spending too much time in shallow idle states. It reduces idle power on some systems by 10% or more and may improve performance of workloads in which the idle loop overhead matters. This has been in the works for several weeks and it has been tested and reviewed quite thoroughly. Also included are changes that finalize the cpufreq cleanup moving frequency table validation from drivers to the core, a few fixes and cleanups of cpufreq drivers, a cpuidle documentation update and a PM QoS core update to mark the expected switch fall-throughs in it. Specifics: - Rework the idle loop in order to prevent CPUs from spending too much time in shallow idle states by making it stop the scheduler tick before putting the CPU into an idle state only if the idle duration predicted by the idle governor is long enough. That required the code to be reordered to invoke the idle governor before stopping the tick, among other things (Rafael Wysocki, Frederic Weisbecker, Arnd Bergmann). - Add the missing description of the residency sysfs attribute to the cpuidle documentation (Prashanth Prakash). - Finalize the cpufreq cleanup moving frequency table validation from drivers to the core (Viresh Kumar). - Fix a clock leak regression in the armada-37xx cpufreq driver (Gregory Clement). - Fix the initialization of the CPU performance data structures for shared policies in the CPPC cpufreq driver (Shunyong Yang). - Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers a bit (Viresh Kumar, Rafael Wysocki). - Mark the expected switch fall-throughs in the PM QoS core (Gustavo Silva)" * tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits) tick-sched: avoid a maybe-uninitialized warning cpufreq: Drop cpufreq_table_validate_and_show() cpufreq: SCMI: Don't validate the frequency table twice cpufreq: CPPC: Initialize shared perf capabilities of CPUs cpufreq: armada-37xx: Fix clock leak cpufreq: CPPC: Don't set transition_latency cpufreq: ti-cpufreq: Use builtin_platform_driver() cpufreq: intel_pstate: Do not include debugfs.h PM / QoS: mark expected switch fall-throughs cpuidle: Add definition of residency to sysfs documentation time: hrtimer: Use timerqueue_iterate_next() to get to the next timer nohz: Avoid duplication of code related to got_idle_tick nohz: Gather tick_sched booleans under a common flag field cpuidle: menu: Avoid selecting shallow states with stopped tick cpuidle: menu: Refine idle state selection for running tick sched: idle: Select idle state before stopping the tick time: hrtimer: Introduce hrtimer_next_event_without() time: tick-sched: Split tick_nohz_stop_sched_tick() cpuidle: Return nohz hint from cpuidle_select() jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC ...
2018-04-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/rw/umlLinus Torvalds23-325/+3395
Pull UML updates from Richard Weinberger: - a new and faster epoll based IRQ controller and NIC driver - misc fixes and janitorial updates * git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: Fix vector raw inintialization logic Migrate vector timers to new timer API um: Compile with modern headers um: vector: Fix an error handling path in 'vector_parse()' um: vector: Fix a memory allocation check um: vector: fix missing unlock on error in vector_net_open() um: Add missing EXPORT for free_irq_by_fd() High Performance UML Vector Network Driver Epoll based IRQ controller um: Use POSIX ucontext_t instead of struct ucontext um: time: Use timespec64 for persistent clock um: Restore symbol versions for __memcpy and memcpy