author Philip Reames <listmail@philipreames.com> 2015-09-10 17:03:10 +0000
committer Philip Reames <listmail@philipreames.com> 2015-09-10 17:03:10 +0000
commit 8b25542f3d0abc9f25039b55aed1cdc71fd368fb
tree 0cdd700b984ff271e0d859e4e1f6ffa3ef4cf119
parent a8d8dba0a6851f0998dce78e61b1dbb2974fd901
[docs][PerformanceTips] Add text on allocas and alignment
This summarizes two recent llvm-dev discussions. Most of the text was provided by David Chisnall and Benoit Belley, with minor editing by me.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247301 91177308-0d34-0410-b5e6-96231b3b80d8
-rw-r--r-- docs/Frontend/PerformanceTips.rst 41
1 files changed, 41 insertions, 0 deletions
diff --git a/docs/Frontend/PerformanceTips.rst b/docs/Frontend/PerformanceTips.rst
index a3f977f0e03..142d262eb65 100644
--- a/docs/Frontend/PerformanceTips.rst
+++ b/docs/Frontend/PerformanceTips.rst
@@ -46,6 +46,22 @@ The Basics
perform badly when confronted with such structures. The only exception to
this guidance is that a unified return block with high in-degree is fine.
+Use of allocas
+^^^^^^^^^^^^^^
+
+An alloca instruction can be used to represent a function-scoped stack slot,
+but can also represent dynamic frame expansion. When representing
+function-scoped variables or locations, prefer placing alloca instructions at
+the beginning of the entry block. In particular, place them before any
+call instructions. Call instructions might get inlined and replaced with
+multiple basic blocks. The end result is that a following alloca instruction
+would no longer be in the entry basic block afterward.
+
+The SROA (Scalar Replacement Of Aggregates) and Mem2Reg passes only attempt
+to eliminate alloca instructions that are in the entry basic block. Because
+SSA is the canonical form expected by much of the optimizer, allocas that
+cannot be eliminated by Mem2Reg or SROA are likely to leave the optimizer
+less effective than it could be.
Avoid loads and stores of large aggregate type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -79,6 +95,31 @@ operations for safety. If your source language provides information about
the range of the index, you may wish to manually extend indices to machine
register width using a zext instruction.
+When to specify alignment
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+LLVM will always generate correct code if you don’t specify alignment, but may
+generate inefficient code. For example, if you are targeting MIPS (or older
+ARM ISAs) then the hardware does not handle unaligned loads and stores, and
+so you will enter a trap-and-emulate path if you do a load or store with
+lower-than-natural alignment. To avoid this, LLVM will emit a slower
+sequence of loads, shifts and masks (or load-right + load-left on MIPS) for
+all cases where the load / store does not have a sufficiently high alignment
+in the IR.
+
+For allocas and globals, the specified alignment guarantees how the object
+is aligned, though in most cases this is unnecessary (most targets have a
+sufficiently high default alignment that they’ll be fine). For loads and
+stores, it provides a contract to the back end saying ‘either this load/store
+has this alignment, or it is undefined behavior’. This means that the back
+end is free to emit instructions that rely on that alignment (and mid-level
+optimizers are free to perform transforms that require that alignment). For
+x86, it doesn’t make much difference, as almost all instructions are
+alignment-independent. For MIPS, it can make a big difference.
+
+Note that if your loads and stores are atomic, the backend will be unable to
+lower an under-aligned access into a sequence of natively aligned accesses.
+As a result, alignment is mandatory for atomic loads and stores.
+
Other Things to Consider
^^^^^^^^^^^^^^^^^^^^^^^^