lib/Target/X86/README.txt

//===- README.txt - Information about the X86 backend and related files ---===//
//
// This file contains random notes and points of interest about the X86 backend.
//
// Snippets of this document will probably become the final report for CS497
//
//===----------------------------------------------------------------------===//

===========
I. Overview
===========

This directory contains a machine description for the X86 processor.  Currently
this machine description is used by a high-performance code generator for an
LLVM JIT.  One of the main objectives of this project is to build a clean code
generator that can be extended in a variety of ways in the future: new targets,
new optimizations, new transformations, etc.

This document describes the current state of the LLVM JIT, along with
implementation notes, design decisions, and other observations.


===================================
II. Architecture / Design Decisions
===================================

We designed the infrastructure for the machine-specific representation to be as
lightweight as possible, while still supporting as many targets as possible
within our framework.  This framework should allow us to share many common
machine-specific transformations (register allocation, instruction scheduling,
etc.) among all of the backends that may eventually be supported by the JIT,
and to unify the JIT and static compiler backends.

At a high level, LLVM code is translated to a machine-specific representation
formed out of MFunction, MBasicBlock, and MInstruction instances (defined in
include/llvm/CodeGen).  This representation is completely target-agnostic,
representing instructions in their most abstract form: an opcode, a destination,
and a series of operands.  It is designed to support both an SSA form for
machine code and a register-allocated, non-SSA form.
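As a rough illustration of the shape described above, the following C++ sketch
models an instruction as an opcode, a destination, and a list of operands,
grouped into basic blocks and functions.  The field names, the MOperand type,
and the appendInst and isSSA helpers are assumptions made for this example; the
real classes are defined in include/llvm/CodeGen and may look quite different.
The isSSA check shows how the same structures can hold either SSA or
register-allocated code: only the number of definitions per register differs.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the M* classes.  The real definitions live in
// include/llvm/CodeGen and may differ in every detail.
struct MOperand {
  enum Kind { Register, Immediate };
  Kind kind;
  long value;  // register number or immediate constant
};

struct MInstruction {
  unsigned opcode;                 // meaning is defined by the target
  unsigned dest;                   // destination register number
  std::vector<MOperand> operands;  // source operands, in order
};

struct MBasicBlock {
  std::vector<MInstruction> insts;
};

struct MFunction {
  std::string name;
  std::vector<MBasicBlock> blocks;
};

// Appends an instruction to the last basic block of F (illustrative helper).
void appendInst(MFunction &F, unsigned opcode, unsigned dest,
                std::vector<MOperand> operands = {}) {
  assert(!F.blocks.empty() && "function needs at least one basic block");
  F.blocks.back().insts.push_back(
      MInstruction{opcode, dest, std::move(operands)});
}

// Returns true if no destination register is defined more than once, which
// is the property that separates the SSA form from the register-allocated
// form.  Simplification: every instruction is assumed to define a register.
bool isSSA(const MFunction &F) {
  std::set<unsigned> defined;
  for (const MBasicBlock &BB : F.blocks)
    for (const MInstruction &I : BB.insts)
      if (!defined.insert(I.dest).second)
        return false;  // second definition of the same register
  return true;
}
```

For example, appending instructions that define registers 100 and 101 keeps
isSSA true.  Appending a second definition of register 100 makes it return
false, which is what the code would look like once a register allocator has
reused a register.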
  41
  42 Because the M* representation must work regardless of the target machine, it
  43 contains very little semantic information about the program.  To get semantic
  44 information about the program, a layer of Target description datastructures are
  45 used, defined in include/llvm/Target.
  46
  47 Currently the Sparc backend and the X86 backend do not share a common
  48 representation.  This is an intentional decision, and will be rectified in the
  49 future (after the project is done).
  50
  51
  52 =======================
  53 III. Source Code Layout
  54 =======================
  55
  56 The LLVM-JIT is composed of source files primarily in the following locations:
  57
  58 include/llvm/CodeGen
  59 --------------------
  60
  61 This directory contains header files that are used to represent the program in a
  62 machine specific representation.  It currently also contains a bunch of stuff
  63 used by the Sparc backend that we don't want to get mixed up in.
  64
  65 include/llvm/Target
  66 -------------------
  67
  68 This directory contains header files that are used to interpret the machine
  69 specific representation of the program.  This allows us to write generic
  70 transformations that will work on any target that implements the interfaces
  71 defined in this directory.  Again, this also contains a bunch of stuff from the
  72 Sparc Backend that we don't want to deal with.
  73
  74 lib/CodeGen
  75 -----------
  76 This directory will contain all of the target independant transformations (for
  77 example, register allocation) that we write.  These transformations should only
  78 use information exposed through the Target interface, it should not include any
  79 target specific header files.
  80
  81 lib/Target/X86
  82 --------------
This directory contains the machine description for X86 that is required to get
the rest of the compiler working.  It contains any code that is truly specific
to the X86 backend, for example the instruction selector and machine code
emitter.

tools/jello
-----------
This directory contains the top-level code for the JIT compiler.

test/Regression/Jello
---------------------
This directory contains regression tests for the JIT.  Initially it contains a
set of trivial test cases that we should build up to supporting.


==========================
IV. TODO / Future Projects
==========================

There are a large number of things remaining to do.  Here is a partial list:

Critical path:
--------------

0. Finish providing SSA form.  This involves keeping track of some information
   when instructions are added to the function, but should not affect the API
   for creating new MInstructions or adding them to the program.  There are
   also various FIXMEs in the M* files that need to be taken care of in the
   near term.
1. Finish dumb instruction selector
2. Write dumb register allocator
3. Write assembly language emitter
4. Write machine code emitter

Next Phase:
-----------
1. Implement linear-time optimal instruction selector
2. Implement smarter (linear scan?) register allocator

After this project:
-------------------
1. Implement lots of nifty runtime optimizations
2. Implement a static compiler backend for x86
3. Migrate Sparc backend to new representation
4. Implement new spiffy targets: IA64? X86-64? M68k?  Who knows...

Infrastructure Improvements:
----------------------------

1. The bytecode format is designed so that particular functions can be read
   without having to read the whole program.  The bytecode reader should be
   extended to allow on-demand loading of functions.

2. PassManager needs to be able to run just a single function through a pipeline
   of FunctionPasses.  When this happens, all of our code will become
   FunctionPasses for real.

3. llvmgcc needs to be modified to output 32-bit little-endian LLVM files.
   Preferably it will be parameterizable so that multiple binaries need not
   exist.  Until this happens, we will be restricted to using type-safe
   programs (most of the Olden suite and many smaller tests), which should be
   sufficient for our CS497 project.
 142    programs (most of the Olden suite and many smaller tests), which should be
 143    sufficient for our 497 project.