diff options
Diffstat (limited to 'Documentation/x86')
-rw-r--r-- | Documentation/x86/intel_mpx.rst | 252 |
1 files changed, 0 insertions, 252 deletions
diff --git a/Documentation/x86/intel_mpx.rst b/Documentation/x86/intel_mpx.rst deleted file mode 100644 index 387a640941a6..000000000000 --- a/Documentation/x86/intel_mpx.rst +++ /dev/null @@ -1,252 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -=========================================== -Intel(R) Memory Protection Extensions (MPX) -=========================================== - -Intel(R) MPX Overview -===================== - -Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability -introduced into Intel Architecture. Intel MPX provides hardware features -that can be used in conjunction with compiler changes to check memory -references, for those references whose compile-time normal intentions are -usurped at runtime due to buffer overflow or underflow. - -You can tell if your CPU supports MPX by looking in /proc/cpuinfo:: - - cat /proc/cpuinfo | grep ' mpx ' - -For more information, please refer to Intel(R) Architecture Instruction -Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection -Extensions. - -Note: As of December 2014, no hardware with MPX is available but it is -possible to use SDE (Intel(R) Software Development Emulator) instead, which -can be downloaded from -http://software.intel.com/en-us/articles/intel-software-development-emulator - - -How to get the advantage of MPX -=============================== - -For MPX to work, changes are required in the kernel, binutils and compiler. -No source changes are required for applications, just a recompile. - -There are a lot of moving parts of this to all work right. The following -is how we expect the compiler, application and kernel to work together. - -1) Application developer compiles with -fmpx. The compiler will add the - instrumentation as well as some setup code called early after the app - starts. New instruction prefixes are noops for old CPUs. -2) That setup code allocates (virtual) space for the "bounds directory", - points the "bndcfgu" register to the directory (must also set the valid - bit) and notifies the kernel (via the new prctl(PR_MPX_ENABLE_MANAGEMENT)) - that the app will be using MPX. The app must be careful not to access - the bounds tables between the time when it populates "bndcfgu" and - when it calls the prctl(). This might be hard to guarantee if the app - is compiled with MPX. You can add "__attribute__((bnd_legacy))" to - the function to disable MPX instrumentation to help guarantee this. - Also be careful not to call out to any other code which might be - MPX-instrumented. -3) The kernel detects that the CPU has MPX, allows the new prctl() to - succeed, and notes the location of the bounds directory. Userspace is - expected to keep the bounds directory at that location. We note it - instead of reading it each time because the 'xsave' operation needed - to access the bounds directory register is an expensive operation. -4) If the application needs to spill bounds out of the 4 registers, it - issues a bndstx instruction. Since the bounds directory is empty at - this point, a bounds fault (#BR) is raised, the kernel allocates a - bounds table (in the user address space) and makes the relevant entry - in the bounds directory point to the new table. -5) If the application violates the bounds specified in the bounds registers, - a separate kind of #BR is raised which will deliver a signal with - information about the violation in the 'struct siginfo'. -6) Whenever memory is freed, we know that it can no longer contain valid - pointers, and we attempt to free the associated space in the bounds - tables. If an entire table becomes unused, we will attempt to free - the table and remove the entry in the directory. - -To summarize, there are essentially three things interacting here: - -GCC with -fmpx: - * enables annotation of code with MPX instructions and prefixes - * inserts code early in the application to call in to the "gcc runtime" -GCC MPX Runtime: - * Checks for hardware MPX support in cpuid leaf - * allocates virtual space for the bounds directory (malloc() essentially) - * points the hardware BNDCFGU register at the directory - * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to - start managing the bounds directories -Kernel MPX Code: - * Checks for hardware MPX support in cpuid leaf - * Handles #BR exceptions and sends SIGSEGV to the app when it violates - bounds, like during a buffer overflow. - * When bounds are spilled in to an unallocated bounds table, the kernel - notices in the #BR exception, allocates the virtual space, then - updates the bounds directory to point to the new table. It keeps - special track of the memory with a VM_MPX flag. - * Frees unused bounds tables at the time that the memory they described - is unmapped. - - -How does MPX kernel code work -============================= - -Handling #BR faults caused by MPX ---------------------------------- - -When MPX is enabled, there are 2 new situations that can generate -#BR faults. - - * new bounds tables (BT) need to be allocated to save bounds. - * bounds violation caused by MPX instructions. - -We hook #BR handler to handle these two new situations. - -On-demand kernel allocation of bounds tables --------------------------------------------- - -MPX only has 4 hardware registers for storing bounds information. If -MPX-enabled code needs more than these 4 registers, it needs to spill -them somewhere. It has two special instructions for this which allow -the bounds to be moved between the bounds registers and some new "bounds -tables". - -#BR exceptions are a new class of exceptions just for MPX. They are -similar conceptually to a page fault and will be raised by the MPX -hardware during both bounds violations or when the tables are not -present. The kernel handles those #BR exceptions for not-present tables -by carving the space out of the normal processes address space and then -pointing the bounds-directory over to it. - -The tables need to be accessed and controlled by userspace because -the instructions for moving bounds in and out of them are extremely -frequent. They potentially happen every time a register points to -memory. Any direct kernel involvement (like a syscall) to access the -tables would obviously destroy performance. - -Why not do this in userspace? MPX does not strictly require anything in -the kernel. It can theoretically be done completely from userspace. Here -are a few ways this could be done. We don't think any of them are practical -in the real-world, but here they are. - -:Q: Can virtual space simply be reserved for the bounds tables so that we - never have to allocate them? -:A: MPX-enabled application will possibly create a lot of bounds tables in - process address space to save bounds information. These tables can take - up huge swaths of memory (as much as 80% of the memory on the system) - even if we clean them up aggressively. In the worst-case scenario, the - tables can be 4x the size of the data structure being tracked. IOW, a - 1-page structure can require 4 bounds-table pages. An X-GB virtual - area needs 4*X GB of virtual space, plus 2GB for the bounds directory. - If we were to preallocate them for the 128TB of user virtual address - space, we would need to reserve 512TB+2GB, which is larger than the - entire virtual address space today. This means they can not be reserved - ahead of time. Also, a single process's pre-populated bounds directory - consumes 2GB of virtual *AND* physical memory. IOW, it's completely - infeasible to prepopulate bounds directories. - -:Q: Can we preallocate bounds table space at the same time memory is - allocated which might contain pointers that might eventually need - bounds tables? -:A: This would work if we could hook the site of each and every memory - allocation syscall. This can be done for small, constrained applications. - But, it isn't practical at a larger scale since a given app has no - way of controlling how all the parts of the app might allocate memory - (think libraries). The kernel is really the only place to intercept - these calls. - -:Q: Could a bounds fault be handed to userspace and the tables allocated - there in a signal handler instead of in the kernel? -:A: mmap() is not on the list of safe async handler functions and even - if mmap() would work it still requires locking or nasty tricks to - keep track of the allocation state there. - -Having ruled out all of the userspace-only approaches for managing -bounds tables that we could think of, we create them on demand in -the kernel. - -Decoding MPX instructions -------------------------- - -If a #BR is generated due to a bounds violation caused by MPX. -We need to decode MPX instructions to get violation address and -set this address into extended struct siginfo. - -The _sigfault field of struct siginfo is extended as follow:: - - 87 /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */ - 88 struct { - 89 void __user *_addr; /* faulting insn/memory ref. */ - 90 #ifdef __ARCH_SI_TRAPNO - 91 int _trapno; /* TRAP # which caused the signal */ - 92 #endif - 93 short _addr_lsb; /* LSB of the reported address */ - 94 struct { - 95 void __user *_lower; - 96 void __user *_upper; - 97 } _addr_bnd; - 98 } _sigfault; - -The '_addr' field refers to violation address, and new '_addr_and' -field refers to the upper/lower bounds when a #BR is caused. - -Glibc will be also updated to support this new siginfo. So user -can get violation address and bounds when bounds violations occur. - -Cleanup unused bounds tables ----------------------------- - -When a BNDSTX instruction attempts to save bounds to a bounds directory -entry marked as invalid, a #BR is generated. This is an indication that -no bounds table exists for this entry. In this case the fault handler -will allocate a new bounds table on demand. - -Since the kernel allocated those tables on-demand without userspace -knowledge, it is also responsible for freeing them when the associated -mappings go away. - -Here, the solution for this issue is to hook do_munmap() to check -whether one process is MPX enabled. If yes, those bounds tables covered -in the virtual address region which is being unmapped will be freed also. - -Adding new prctl commands -------------------------- - -Two new prctl commands are added to enable and disable MPX bounds tables -management in kernel. -:: - - 155 #define PR_MPX_ENABLE_MANAGEMENT 43 - 156 #define PR_MPX_DISABLE_MANAGEMENT 44 - -Runtime library in userspace is responsible for allocation of bounds -directory. So kernel have to use XSAVE instruction to get the base -of bounds directory from BNDCFG register. - -But XSAVE is expected to be very expensive. In order to do performance -optimization, we have to get the base of bounds directory and save it -into struct mm_struct to be used in future during PR_MPX_ENABLE_MANAGEMENT -command execution. - - -Special rules -============= - -1) If userspace is requesting help from the kernel to do the management -of bounds tables, it may not create or modify entries in the bounds directory. - -Certainly users can allocate bounds tables and forcibly point the bounds -directory at them through XSAVE instruction, and then set valid bit -of bounds entry to have this entry valid. But, the kernel will decline -to assist in managing these tables. - -2) Userspace may not take multiple bounds directory entries and point -them at the same bounds table. - -This is allowed architecturally. See more information "Intel(R) Architecture -Instruction Set Extensions Programming Reference" (9.3.4). - -However, if users did this, the kernel might be fooled in to unmapping an -in-use bounds table since it does not recognize sharing. |