
diff --git a/docs/Vectorizers.rst b/docs/Vectorizers.rst
index d5c286abd928f1a743730391afc027c2989938e5..221fb2949f8124f4a8cd883c35e788296d348d84 100644
--- a/docs/Vectorizers.rst
+++ b/docs/Vectorizers.rst
@@ -5,13 +5,13 @@ Auto-Vectorization in LLVM
  .. contents::
     :local:
  
-LLVM has two kind vectorizers: The :ref:`Loop Vectorizer <loop-vectorizer>`,
+LLVM has two vectorizers: The :ref:`Loop Vectorizer <loop-vectorizer>`,
  which operates on Loops, and the :ref:`SLP Vectorizer
-<slp-vectorizer>`, which optimizes straight-line code. These vectorizers
+<slp-vectorizer>`. These vectorizers
  focus on different optimization opportunities and use different techniques.
-The BB vectorizer merges multiple scalars that are found in the code into
-vectors while the Loop Vectorizer widens instructions in the original loop
-to operate on multiple consecutive loop iterations.
+The SLP vectorizer merges multiple scalars that are found in the code into
+vectors while the Loop Vectorizer widens instructions in loops
+to operate on multiple consecutive iterations.
  
  .. _loop-vectorizer:
  
@@ -302,10 +302,9 @@ Details
  -------
  
  The goal of SLP vectorization (a.k.a. superword-level parallelism) is
-to combine similar independent instructions within simple control-flow regions
-into vector instructions. Memory accesses, arithemetic operations, comparison
-operations and some math functions can all be vectorized using this technique
-(subject to the capabilities of the target architecture).
+to combine similar independent instructions
+into vector instructions. Memory accesses, arithmetic operations, comparison
+operations, PHI-nodes, can all be vectorized using this technique.
  
  For example, the following function performs very similar operations on its
  inputs (a1, b1) and (a2, b2). The basic-block vectorizer may combine these
@@ -318,6 +317,7 @@ into vector operations.
      A[1] = a2*(a2 + b2)/b2 + 50*b2/a2;
    }
  
+The SLP-vectorizer processes the code bottom-up, across basic blocks, in search of scalars to combine.
  
  Usage
  ------
@@ -329,7 +329,7 @@ through clang using the command line flag:
  
     $ clang -fslp-vectorize file.c
  
-LLVM has a second phase basic block vectorization phase
+LLVM has a second basic block vectorization phase
  which is more compile-time intensive (The BB vectorizer). This optimization
  can be enabled through clang using the command line flag: