X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FSourceLevelDebugging.html;h=166ce0790c1113333101ad9ba7c79f196c6b9e1e;hb=a75ce9f5d2236d93c117e861e60e6f3f748c9555;hp=a7037f237190049dbce7a5b2232d033d8953838b;hpb=8ff759072f5dd1fd4158944c90d1f1c93cc2f047;p=oota-llvm.git diff --git a/docs/SourceLevelDebugging.html b/docs/SourceLevelDebugging.html index a7037f23719..166ce0790c1 100644 --- a/docs/SourceLevelDebugging.html +++ b/docs/SourceLevelDebugging.html @@ -2,6 +2,7 @@ "http://www.w3.org/TR/html4/strict.dtd"> + Source Level Debugging with LLVM @@ -9,74 +10,75 @@
Source Level Debugging with LLVM
+ + + + +
+
  • Debugger intrinsic functions +
  • + +
  • Object lifetimes and scoping
  • +
  • C/C++ front-end specific debug information
      -
    1. Program Scope Entries
    2. - -
    3. Data objects (program variables)
    4. -
    +
  • C/C++ source file information
  • +
  • C/C++ global variable information
  • +
  • C/C++ function information
  • +
  • C/C++ basic types
  • +
  • C/C++ derived types
  • +
  • C/C++ struct/union types
  • +
  • C/C++ enumeration types
  • + +
    +A leafy and green bug eater +
    + +
    +

    Written by Chris Lattner + and Jim Laskey

    +
    + -
    Introduction
    +
    Introduction

    This document is the central repository for all information pertaining to -debug information in LLVM. It describes the user -interface for the llvm-db -tool, which provides a powerful source-level debugger -to users of LLVM-based compilers. It then describes the various components that make up the debugger and the -libraries which future clients may use. Finally, it describes the actual format that the LLVM debug information takes, -which is useful for those interested in creating front-ends or dealing directly -with the information.

    + debug information in LLVM. It describes the actual format + that the LLVM debug information takes, which is useful for those + interested in creating front-ends or dealing directly with the information. + Further, this document provides specific examples of what debug information + for C/C++ looks like.

    @@ -87,54 +89,72 @@ with the information.

    -

    -The idea of the LLVM debugging information is to capture how the important -pieces of the source-language's Abstract Syntax Tree map onto LLVM code. -Several design aspects have shaped the solution that appears here. The -important ones are:

    +

    The idea of the LLVM debugging information is to capture how the important + pieces of the source-language's Abstract Syntax Tree map onto LLVM code. + Several design aspects have shaped the solution that appears here. The + important ones are:

    -

    +

    The approach used by the LLVM implementation is to use a small set + of intrinsic functions to define a + mapping between LLVM program objects and the source-level objects. The + description of the source-level program is maintained in LLVM metadata + in an implementation-defined format + (the C/C++ front-end currently uses working draft 7 of + the DWARF 3 + standard).

    -

    -The approach used by the LLVM implementation is to use a small set of intrinsic functions to define a mapping -between LLVM program objects and the source-level objects. The description of -the source-level program is maintained in LLVM global variables in an implementation-defined format (the C/C++ front-end -currently uses working draft 7 of the Dwarf 3 standard).

    +

    When a program is being debugged, a debugger interacts with the user and + turns the stored debug information into source-language specific information. + As such, a debugger must be aware of the source-language, and is thus tied to + a specific language or family of languages.

    -

    -When a program is debugged, the debugger interacts with the user and turns the -stored debug information into source-language specific information. As such, -the debugger must be aware of the source-language, and is thus tied to a -specific language of family of languages. The LLVM -debugger is designed to be modular in its support for source-languages. -

    +
    + +
    + Debug information consumers
    +
    + +

    The role of debug information is to provide meta information normally + stripped away during the compilation process. This meta information provides + an LLVM user a relationship between generated code and the original program + source code.

    + +

    Currently, debug information is consumed by DwarfDebug to produce DWARF + information used by the gdb debugger. Other targets could use the same + information to produce stabs or other debug forms.

    + +

    It would also be reasonable to use debug information to feed profiling tools + for analysis of generated code, or tools for reconstructing the original + source from generated code.

    + +

    TODO - expound a bit more.

    + +
    @@ -142,1028 +162,1628 @@ debugger is designed to be modular in its support for source-languages.
    -

    -An extremely high priority of LLVM debugging information is to make it interact -well with optimizations and analysis. In particular, the LLVM debug information -provides the following guarantees:

    -

    -
  • As desired, LLVM optimizations can be upgraded to be aware of the LLVM -debugging information, allowing them to update the debugging information as they -perform aggressive optimizations. This means that, with effort, the LLVM -optimizers could optimize debug code just as well as non-debug code.
  • + +
    + Debugging information format +
    + -
  • LLVM debug information does not prevent many important optimizations from -happening (for example inlining, basic block reordering/merging/cleanup, tail -duplication, etc), further reducing the amount of the compiler that eventually -is "aware" of debugging information.
  • +
    -
  • LLVM debug information is automatically optimized along with the rest of the -program, using existing facilities. For example, duplicate information is -automatically merged by the linker, and unused information is automatically -removed.
  • +

    LLVM debugging information has been carefully designed to make it possible + for the optimizer to optimize the program and debugging information without + necessarily having to know anything about debugging information. In + particular, the use of metadata avoids duplicated debugging information from + the beginning, and the global dead code elimination pass automatically + deletes debugging information for a function if it decides to delete the + function.

    -

    +

    To do this, most of the debugging information (descriptors for types, + variables, functions, source files, etc) is inserted by the language + front-end in the form of LLVM metadata.

    + +

    Debug information is designed to be agnostic about the target debugger and + debugging information representation (e.g. DWARF/Stabs/etc). It uses a + generic pass to decode the information that represents variables, types, + functions, namespaces, etc: this allows for arbitrary source-language + semantics and type-systems to be used, as long as there is a module + written for the target debugger to interpret the information.

    + +

    To provide basic functionality, the LLVM debugger does have to make some + assumptions about the source-level language being debugged, though it keeps + these to a minimum. The only common features that the LLVM debugger assumes + exist are source files, + and program objects. These abstract + objects are used by a debugger to form stack traces, show information about + local variables, etc.

    -

    -Basically, the debug information allows you to compile a program with "-O0 --g" and get full debug information, allowing you to arbitrarily modify the -program as it executes from the debugger. Compiling a program with "-O3 --g" gives you full debug information that is always available and accurate -for reading (e.g., you get accurate stack traces despite tail call elimination -and inlining), but you might lose the ability to modify the program and call -functions where were optimized out of the program, or inlined away completely. -

    +

    This section of the documentation first describes the representation aspects + common to any source-language. The next section + describes the data layout conventions used by the C and C++ front-ends.

    -
    - Future work + Debug information descriptors
    -

    -There are several important extensions that could be eventually added to the -LLVM debugger. The most important extension would be to upgrade the LLVM code -generators to support debugging information. This would also allow, for -example, the X86 code generator to emit native objects that contain debugging -information consumable by traditional source-level debuggers like GDB or -DBX.

    -

    -Additionally, LLVM optimizations can be upgraded to incrementally update the -debugging information, new commands can be added to the -debugger, and thread support could be added to the debugger.

    +

    In consideration of the complexity and volume of debug information, LLVM + provides a specification for well formed debug descriptors.

    + +

    Consumers of LLVM debug information expect the descriptors for program + objects to start in a canonical format, but the descriptors can include + additional information appended at the end that is source-language + specific. All LLVM debugging information is versioned, allowing backwards + compatibility in the case that the core structures need to change in some + way. Also, all debugging information objects start with a tag to indicate + what type of object it is. The source-language is allowed to define its own + objects, by using unreserved tag numbers. We recommend using tags in + the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base = + 0x1000.)

    + +

    The fields of debug descriptors used internally by LLVM + are restricted to only the simple data types i32, i1, + float, double, mdstring and mdnode.

    + +
    +
    +!1 = metadata !{
    +  i32,   ;; A tag
    +  ...
    +}
    +
    +
    + +

    The first field of a descriptor is always an + i32 containing a tag value identifying the content of the + descriptor. The remaining fields are specific to the descriptor. The values + of tags are loosely bound to the tag values of DWARF information entries. + However, that does not restrict the use of the information supplied to DWARF + targets. To facilitate versioning of debug information, the tag is augmented + with the current debug version (LLVMDebugVersion = 8 << 16 or 0x80000 or + 524288.)
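    The tag arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the addition, not LLVM code: the helper names encode_tag and decode_tag are hypothetical, and splitting the field with a 16-bit mask is an assumption based on the version being shifted left by 16.

```python
# Version constant as given in the text: 8 << 16 == 0x80000 == 524288.
LLVM_DEBUG_VERSION = 8 << 16

# Assumption: the DWARF tag occupies the low 16 bits, the version the rest.
TAG_MASK = 0x0000FFFF
VERSION_MASK = 0xFFFF0000

# Tag values taken from the descriptor listings in this document.
DW_TAG_compile_unit = 17
DW_TAG_file_type = 41
DW_TAG_subprogram = 46


def encode_tag(dw_tag):
    """Return the first descriptor field: DWARF tag plus debug version."""
    return dw_tag + LLVM_DEBUG_VERSION


def decode_tag(field):
    """Split a first descriptor field into (dwarf_tag, version)."""
    return field & TAG_MASK, field & VERSION_MASK


if __name__ == "__main__":
    field = encode_tag(DW_TAG_compile_unit)
    print(field, decode_tag(field))
```

    Under these assumptions a compile unit descriptor starts with 17 + 524288, and masking recovers both parts.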

    -

    -The "SourceLanguage" modules provided by llvm-db could be substantially -improved to provide good support for C++ language features like namespaces and -scoping rules.

    +

    The details of the various descriptors follow.

    -

    -After working with the debugger for a while, perhaps the nicest improvement -would be to add some sort of line editor, such as GNU readline (but one that is -compatible with the LLVM license).

    +
    + + +
    + Compile unit descriptors +
    -

    -For someone so inclined, it should be straight-forward to write different -front-ends for the LLVM debugger, as the LLVM debugging engine is cleanly -separated from the llvm-db front-end. A new LLVM GUI debugger or IDE -would be nice. :) -

    +
    +
    +
    +!0 = metadata !{
    +  i32,       ;; Tag = 17 + LLVMDebugVersion 
    +             ;; (DW_TAG_compile_unit)
    +  i32,       ;; Unused field. 
    +  i32,       ;; DWARF language identifier (ex. DW_LANG_C89) 
    +  metadata,  ;; Source file name
    +  metadata,  ;; Source file directory (includes trailing slash)
    +  metadata,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
    +  i1,        ;; True if this is a main compile unit. 
    +  i1,        ;; True if this is optimized.
    +  metadata,  ;; Flags
    +  i32        ;; Runtime version
    +}
    +
    +

    These descriptors contain a source language ID for the file (we use the DWARF + 3.0 ID numbers, such as DW_LANG_C89, DW_LANG_C_plus_plus, + DW_LANG_Cobol74, etc), three strings describing the filename, + working directory of the compiler, and an identifier string for the compiler + that produced it.

    + +

    Compile unit descriptors provide the root context for objects declared in a + specific compilation unit. File descriptors are defined using this context.

    - -
    - Using the llvm-db tool
    - + + +
    + File descriptors +
    -

    -The llvm-db tool provides a GDB-like interface for source-level -debugging of programs. This tool provides many standard commands for inspecting -and modifying the program as it executes, loading new programs, single stepping, -placing breakpoints, etc. This section describes how to use the debugger. -

    +
    +
    +!0 = metadata !{
    +  i32,       ;; Tag = 41 + LLVMDebugVersion 
    +             ;; (DW_TAG_file_type)
    +  metadata,  ;; Source file name
    +  metadata,  ;; Source file directory (includes trailing slash)
    +  metadata   ;; Reference to compile unit where defined
    +}
    +
    +
    + +

    These descriptors contain information for a file. Global variables and top + level functions would be defined using this context. File descriptors also + provide context for source line correspondence.

    -

    llvm-db has been designed to be as similar to GDB in its user -interface as possible. This should make it extremely easy to learn -llvm-db if you already know GDB. In general, llvm-db -provides the subset of GDB commands that are applicable to LLVM debugging users. -If there is a command missing that make a reasonable amount of sense within the -limitations of llvm-db, please report it as -a bug or, better yet, submit a patch to add it. :)

    +

    Each input file is encoded as a separate file descriptor in LLVM debugging + information output. Each file descriptor would be defined using a + compile unit.

    -
    - Limitations of llvm-db +
    -

    llvm-db is designed to be modular and easy to extend. This -extensibility was key to getting the debugger up-and-running quickly, because we -can start with simple-but-unsophisicated implementations of various components. -Because of this, it is currently missing many features, though they should be -easy to add over time (patches welcomed!). The biggest inherent limitations of -llvm-db are currently due to extremely simple debugger backend (implemented in -"lib/Debugger/UnixLocalInferiorProcess.cpp") which is designed to work without -any cooperation from the code generators. Because it is so simple, it suffers -from the following inherent limitations:

    +
    +
    +!1 = metadata !{
    +  i32,      ;; Tag = 52 + LLVMDebugVersion 
    +            ;; (DW_TAG_variable)
    +  i32,      ;; Unused field.
    +  metadata, ;; Reference to context descriptor
    +  metadata, ;; Name
    +  metadata, ;; Display name (fully qualified C++ name)
    +  metadata, ;; MIPS linkage name (for C++)
    +  metadata, ;; Reference to file where defined
    +  i32,      ;; Line number where defined
    +  metadata, ;; Reference to type descriptor
    +  i1,       ;; True if the global is local to compile unit (static)
    +  i1,       ;; True if the global is defined in the compile unit (not extern)
    +  {}*       ;; Reference to the global variable
    +}
    +
    +
    + +

    These descriptors provide debug information about global variables. They +provide details such as name, type and where the variable is defined.

    -

      +
    -
  • Running a program in llvm-db is a bit slower than running it with -lli (i.e., in the JIT).
  • + + -
  • Inspection of the target hardware is not supported. This means that you -cannot, for example, print the contents of X86 registers.
  • +
    -
  • Inspection of LLVM code is not supported. This means that you cannot print -the contents of arbitrary LLVM values, or use commands such as stepi. -This also means that you cannot debug code without debug information.
  • +
    +
    +!2 = metadata !{
    +  i32,      ;; Tag = 46 + LLVMDebugVersion
    +            ;; (DW_TAG_subprogram)
    +  i32,      ;; Unused field.
    +  metadata, ;; Reference to context descriptor
    +  metadata, ;; Name
    +  metadata, ;; Display name (fully qualified C++ name)
    +  metadata, ;; MIPS linkage name (for C++)
    +  metadata, ;; Reference to file where defined
    +  i32,      ;; Line number where defined
    +  metadata, ;; Reference to type descriptor
    +  i1,       ;; True if the global is local to compile unit (static)
    +  i1,       ;; True if the global is defined in the compile unit (not extern)
    +  i32,      ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual
    +  i32,      ;; Index into a virtual function
    +  metadata, ;; Indicates which base type contains the vtable pointer for the
    +            ;; derived class
    +  i1,       ;; isArtificial
    +  i1,       ;; isOptimized
    +  Function * ;; Pointer to LLVM function
    +}
    +
    +
    -
  • Portions of the debugger run in the same address space as the program being -debugged. This means that memory corruption by the program could trample on -portions of the debugger.
  • +

    These descriptors provide debug information about functions, methods and + subprograms. They provide details such as name, return types and the source + location where the subprogram is defined.

    -
  • Attaching to existing processes and core files is not currently -supported.
  • +
    -

    + + -

    That said, the debugger is still quite useful, and all of these limitations -can be eliminated by integrating support for the debugger into the code -generators, and writing a new InferiorProcess -subclass to use it. See the future work section for ideas -of how to extend the LLVM debugger despite these limitations.

    +
    +
    +
    +!3 = metadata !{
    +  i32,     ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
    +  metadata,;; Reference to context descriptor
    +  i32,     ;; Line number
    +  i32      ;; Column number
    +}
    +
    +

    These descriptors provide debug information about nested blocks within a + subprogram. The line number and column numbers are used to distinguish + two lexical blocks at the same depth.

    + +
    -
    - A sample llvm-db session +
    -

    TODO: this is obviously lame, when more is implemented, this can be much -better.

    - -

    -$ llvm-db funccall
    -llvm-db: The LLVM source-level debugger
    -Loading program... successfully loaded 'funccall.bc'!
    -(llvm-db) create
    -Starting program: funccall.bc
    -main at funccall.c:9:2
    -9 ->            q = 0;
    -(llvm-db) list main
    -4       void foo() {
    -5               int t = q;
    -6               q = t + 1;
    -7       }
    -8       int main() {
    -9 ->            q = 0;
    -10              foo();
    -11              q = q - 1;
    -12
    -13              return q;
    -(llvm-db) list
    -14      }
    -(llvm-db) step
    -10 ->           foo();
    -(llvm-db) s
    -foo at funccall.c:5:2
    -5 ->            int t = q;
    -(llvm-db) bt
    -#0 ->   0x85ffba0 in foo at funccall.c:5:2
    -#1      0x85ffd98 in main at funccall.c:10:2
    -(llvm-db) finish
    -main at funccall.c:11:2
    -11 ->           q = q - 1;
    -(llvm-db) s
    -13 ->           return q;
    -(llvm-db) s
    -The program stopped with exit code 0
    -(llvm-db) quit
    -$
    -

    - +
    +
    +!4 = metadata !{
    +  i32,      ;; Tag = 36 + LLVMDebugVersion 
    +            ;; (DW_TAG_base_type)
    +  metadata, ;; Reference to context (typically a compile unit)
    +  metadata, ;; Name (may be "" for anonymous types)
    +  metadata, ;; Reference to file where defined (may be NULL)
    +  i32,      ;; Line number where defined (may be 0)
    +  i64,      ;; Size in bits
    +  i64,      ;; Alignment in bits
    +  i64,      ;; Offset in bits
    +  i32,      ;; Flags
    +  i32       ;; DWARF type encoding
    +}
    +
    +

    These descriptors define primitive types used in the code. Examples are int, bool + and float. The context provides the scope of the type, which is usually the + top level. Since basic types are not usually user defined the compile unit + and line number can be left as NULL and 0. The size, alignment and offset + are expressed in bits and can be 64 bit values. The alignment is used to + round the offset when embedded in a + composite type (for example, to keep float + doubles on 64 bit boundaries.) The offset is the bit offset if embedded in + a composite type.
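    The rounding of offsets by alignment mentioned above can be sketched as plain arithmetic. The helpers align_to and layout are hypothetical names used only for illustration, and the member list is made up. Real layouts are computed by the front-end and merely recorded in the descriptors.

```python
def align_to(offset_bits, align_bits):
    """Round offset_bits up to the next multiple of align_bits."""
    if align_bits <= 0:
        return offset_bits  # no alignment requirement recorded
    return (offset_bits + align_bits - 1) // align_bits * align_bits


def layout(members):
    """Assign bit offsets to (name, size_bits, align_bits) members in order."""
    offset = 0
    placed = []
    for name, size_bits, align_bits in members:
        offset = align_to(offset, align_bits)
        placed.append((name, offset))
        offset += size_bits
    return placed


if __name__ == "__main__":
    # Hypothetical struct: an 8-bit char, a 64-bit double, a 32-bit int.
    for name, off in layout([("c", 8, 8), ("d", 64, 64), ("i", 32, 32)]):
        print(name, off)
```

    In this sketch the double is pushed from bit 8 to bit 64, which is the kind of rounding the alignment field exists to describe.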

    + +

    The type encoding provides the details of the type. The values are typically + one of the following:

    + +
    +
    +DW_ATE_address       = 1
    +DW_ATE_boolean       = 2
    +DW_ATE_float         = 4
    +DW_ATE_signed        = 5
    +DW_ATE_signed_char   = 6
    +DW_ATE_unsigned      = 7
    +DW_ATE_unsigned_char = 8
    +
    +
    +
    -
    - Starting the debugger +
    -

    There are three ways to start up the llvm-db debugger:

    +
    +
    +!5 = metadata !{
    +  i32,      ;; Tag (see below)
    +  metadata, ;; Reference to context
    +  metadata, ;; Name (may be "" for anonymous types)
    +  metadata, ;; Reference to file where defined (may be NULL)
    +  i32,      ;; Line number where defined (may be 0)
    +  i64,      ;; Size in bits
    +  i64,      ;; Alignment in bits
    +  i64,      ;; Offset in bits
    +  metadata  ;; Reference to type derived from
    +}
    +
    +
    + +

    These descriptors are used to define types derived from other types. The +value of the tag varies depending on the meaning. The following are possible +tag values:

    + +
    +
    +DW_TAG_formal_parameter = 5
    +DW_TAG_member           = 13
    +DW_TAG_pointer_type     = 15
    +DW_TAG_reference_type   = 16
    +DW_TAG_typedef          = 22
    +DW_TAG_const_type       = 38
    +DW_TAG_volatile_type    = 53
    +DW_TAG_restrict_type    = 55
    +
    +
    + +

    DW_TAG_member is used to define a member of + a composite type + or subprogram. The type of the member is + the derived + type. DW_TAG_formal_parameter is used to define a member which + is a formal argument of a subprogram.

    -

    When run with no options, just llvm-db, the debugger starts up -without a program loaded at all. You must use the file command to load a program, and the set args or run -commands to specify the arguments for the program.

    +

    DW_TAG_typedef is used to provide a name for the derived type.

    -

    If you start the debugger with one argument, as llvm-db -<program>, the debugger will start up and load in the specified -program. You can then optionally specify arguments to the program with the set args or run -commands.

    +

    DW_TAG_pointer_type, DW_TAG_reference_type, + DW_TAG_const_type, DW_TAG_volatile_type + and DW_TAG_restrict_type are used to qualify + the derived type.

    -

    The third way to start the program is with the --args option. This -option allows you to specify the program to load and the arguments to start out -with. Example use: llvm-db --args ls /home

    +

    Derived type location can be determined + from the compile unit and line number. The size, alignment and offset are + expressed in bits and can be 64 bit values. The alignment is used to round + the offset when embedded in a composite + type (for example, to keep float doubles on 64 bit boundaries.) The offset is + the bit offset if embedded in a composite + type.

    + +

    Note that the void * type is expressed as a + llvm.dbg.derivedtype.type with tag of DW_TAG_pointer_type + and NULL derived type.
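    To illustrate how a chain of derived type descriptors resolves to a source-level type, here is a small Python sketch. The dictionary representation and the describe function are hypothetical and are not how LLVM stores or reads metadata. Only the tag values and the rule that a pointer with a NULL derived type means void * come from the text above.

```python
# Tag values from the list of derived type tags above.
DW_TAG_pointer_type = 15
DW_TAG_reference_type = 16
DW_TAG_typedef = 22
DW_TAG_const_type = 38
DW_TAG_volatile_type = 53
DW_TAG_restrict_type = 55

# How each qualifying tag is spelled after the type it derives from.
SUFFIX = {
    DW_TAG_pointer_type: "*",
    DW_TAG_reference_type: "&",
    DW_TAG_const_type: "const",
    DW_TAG_volatile_type: "volatile",
    DW_TAG_restrict_type: "restrict",
}


def describe(desc):
    """Render a C-like spelling for a hypothetical descriptor chain."""
    if desc is None:
        return "void"            # NULL derived type stands for void
    if "derived_from" not in desc:
        return desc["name"]      # basic type: just its name
    if desc["tag"] == DW_TAG_typedef:
        return desc["name"]      # a typedef supplies its own name
    return describe(desc["derived_from"]) + " " + SUFFIX[desc["tag"]]


if __name__ == "__main__":
    int_type = {"name": "int"}
    const_int = {"tag": DW_TAG_const_type, "derived_from": int_type}
    print(describe({"tag": DW_TAG_pointer_type, "derived_from": const_int}))
    print(describe({"tag": DW_TAG_pointer_type, "derived_from": None}))
```

    The second call shows the void * case: a pointer descriptor whose derived type is NULL.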

    -
    - Commands recognized by the debugger +
    -

    FIXME: this needs work obviously. See the GDB documentation for -information about what these do, or try 'help [command]' within -llvm-db to get information.

    +
    +
    +!6 = metadata !{
    +  i32,      ;; Tag (see below)
    +  metadata, ;; Reference to context
    +  metadata, ;; Name (may be "" for anonymous types)
    +  metadata, ;; Reference to file where defined (may be NULL)
    +  i32,      ;; Line number where defined (may be 0)
    +  i64,      ;; Size in bits
    +  i64,      ;; Alignment in bits
    +  i64,      ;; Offset in bits
    +  i32,      ;; Flags
    +  metadata, ;; Reference to type derived from
    +  metadata, ;; Reference to array of member descriptors
    +  i32       ;; Runtime languages
    +}
    +
    +
    -

    -

    General usage:

    -
      -
    • help [command]
    • -
    • quit
    • -
    • file [program]
    • -
    +

    These descriptors are used to define types that are composed of 0 or more +elements. The value of the tag varies depending on the meaning. The following +are possible tag values:

    + +
    +
    +DW_TAG_array_type       = 1
    +DW_TAG_enumeration_type = 4
    +DW_TAG_structure_type   = 19
    +DW_TAG_union_type       = 23
    +DW_TAG_vector_type      = 259
    +DW_TAG_subroutine_type  = 21
    +DW_TAG_inheritance      = 28
    +
    +
    -

    Program inspection and interaction:

    -
      -
    • create (start the program, stopping it ASAP in main)
    • -
    • kill
    • -
    • run [args]
    • -
    • step [num]
    • -
    • next [num]
    • -
    • cont
    • -
    • finish
    • - -
    • list [start[, end]]
    • -
    • info source
    • -
    • info sources
    • -
    • info functions
    • -
    +

    The vector flag indicates that an array type is a native packed vector.

    + +

    The members of array types (tag = DW_TAG_array_type) or vector types + (tag = DW_TAG_vector_type) are subrange + descriptors, each representing the range of subscripts at that level of + indexing.

    + +

    The members of enumeration types (tag = DW_TAG_enumeration_type) are + enumerator descriptors, each representing + the definition of an enumeration value in the set.

    + +

    The members of structure (tag = DW_TAG_structure_type) or union (tag + = DW_TAG_union_type) types are any one of + the basic, + derived + or composite type descriptors, each + representing a field member of the structure or union.

    + +

    For C++ classes (tag = DW_TAG_structure_type), member descriptors + provide information about base classes, static members and member + functions. If a member is a derived type + descriptor and has a tag of DW_TAG_inheritance, then the type + represents a base class. If the member is + a global variable descriptor then it + represents a static member. And, if the member is + a subprogram descriptor then it represents + a member function. For static members and member + functions, getName() returns the member's linkage or C++ mangled + name. getDisplayName() returns the simplified version of the name.

    + +

    The first member of subroutine (tag = DW_TAG_subroutine_type) type + elements is the return type for the subroutine. The remaining elements are + the formal arguments to the subroutine.

    + +

    Composite type location can be + determined from the compile unit and line number. The size, alignment and + offset are expressed in bits and can be 64 bit values. The alignment is used + to round the offset when embedded in + a composite type (as an example, to keep + float doubles on 64 bit boundaries.) The offset is the bit offset if embedded + in a composite type.

    -

    Call stack inspection:

    -
      -
    • backtrace
    • -
    • up [n]
    • -
    • down [n]
    • -
    • frame [n]
    • -
    +
    + + -

    Debugger inspection and interaction:

    -
      -
    • info target
    • -
    • show prompt
    • -
    • set prompt
    • -
    • show listsize
    • -
    • set listsize
    • -
    • show language
    • -
    • set language
    • -
    • show args
    • -
    • set args [args]
    • -
    +
    -

    TODO:

    -
      -
    • info frame
    • -
    • break
    • -
    • print
    • -
    • ptype
    • - -
    • info types
    • -
    • info variables
    • -
    • info program
    • - -
    • info args
    • -
    • info locals
    • -
    • info catch
    • -
    • ... many others
    • -
    -

    +
    +
    +%llvm.dbg.subrange.type = type {
    +  i32,    ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type)
    +  i64,    ;; Low value
    +  i64     ;; High value
    +}
    +
    - -
    - Architecture of the LLVM debugger +

    These descriptors are used to define ranges of array subscripts for an array + composite type. The low value defines + the lower bound, typically zero for C/C++. The high value is the upper + bound. Values are 64 bit. High - low + 1 is the size of the array. If low + == high the array will be unbounded.
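    The size rule above can be sketched as follows. The function name subrange_count is hypothetical, and returning None for the unbounded case is a choice made for this illustration only.

```python
def subrange_count(low, high):
    """Number of elements described by a subrange, or None if unbounded."""
    if low == high:
        return None          # per the text, low == high means unbounded
    return high - low + 1


if __name__ == "__main__":
    # A C array declared as "int a[10]" would use low = 0, high = 9.
    print(subrange_count(0, 9))
    print(subrange_count(0, 0))
```

    For a multi-dimensional array there is one subrange descriptor per level of indexing, so each dimension is evaluated separately.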

    + +
    + + + -
    -

    -The LLVM debugger is built out of three distinct layers of software. These -layers provide clients with different interface options depending on what pieces -of they want to implement themselves, and it also promotes code modularity and -good design. The three layers are the Debugger -interface, the "info" interfaces, and the -llvm-db tool itself. -

    +
    +
    +!6 = metadata !{
    +  i32,      ;; Tag = 40 + LLVMDebugVersion 
    +            ;; (DW_TAG_enumerator)
    +  metadata, ;; Name
    +  i64       ;; Value
    +}
    +
    +
    + +

    These descriptors are used to define members of an + enumeration composite type. Each + associates a name with a value.

    +
    -
    - The Debugger and InferiorProcess classes +
    -

    -The Debugger class (defined in the include/llvm/Debugger/ directory) is -a low-level class which is used to maintain information about the loaded -program, as well as start and stop the program running as necessary. This class -does not provide any high-level analysis or control over the program, only -exposing simple interfaces like load/unloadProgram, -create/killProgram, step/next/finish/contProgram, and -low-level methods for installing breakpoints. -

    - -

    -The Debugger class is itself a wrapper around the lowest-level InferiorProcess -class. This class is used to represent an instance of the program running under -debugger control. The InferiorProcess class can be implemented in different -ways for different targets and execution scenarios (e.g., remote debugging). -The InferiorProcess class exposes a small and simple collection of interfaces -which are useful for inspecting the current state of the program (such as -collecting stack trace information, reading the memory image of the process, -etc). The interfaces in this class are designed to be as low-level and simple -as possible, to make it easy to create new instances of the class. -

    - -

    -The Debugger class exposes the currently active instance of InferiorProcess -through the Debugger::getRunningProcess method, which returns a -const reference to the class. This means that clients of the Debugger -class can only inspect the running instance of the program directly. To -change the executing process in some way, they must use the interces exposed by -the Debugger class. -

    + +
    +
    +!7 = metadata !{
    +  i32,      ;; Tag (see below)
    +  metadata, ;; Context
    +  metadata, ;; Name
    +  metadata, ;; Reference to file where defined
    +  i32,      ;; Line number where defined
    +  metadata  ;; Type descriptor
    +}
    +
    +
    + +

    These descriptors are used to define variables local to a subprogram. The + value of the tag depends on the usage of the variable:

    + +
    +
    +DW_TAG_auto_variable   = 256
    +DW_TAG_arg_variable    = 257
    +DW_TAG_return_variable = 258
    +
    +
    + +

    An auto variable is any variable declared in the body of the function. An + argument variable is any variable that appears as a formal argument to the + function. A return variable is used to track the result of a function and + has no source correspondent.

    + +

    The context is either the subprogram or block where the variable is defined. + Name is the source variable name. Compile unit and line indicate where the + variable was defined. Type descriptor defines the declared type of the + variable.

    +
    -

    -The next-highest level of debugger abstraction is provided through the -ProgramInfo, RuntimeInfo, SourceLanguage and related classes (also defined in -the include/llvm/Debugger/ directory). These classes efficiently -decode the debugging information and low-level interfaces exposed by -InferiorProcess into a higher-level representation, suitable for analysis by the -debugger. -

    - -

    -The ProgramInfo class exposes a variety of different kinds of information about -the program objects in the source-level-language. The SourceFileInfo class -represents a source-file in the program (e.g. a .cpp or .h file). The -SourceFileInfo class captures information such as which SourceLanguage was used -to compile the file, where the debugger can get access to the actual file text -(which is lazily loaded on demand), etc. The SourceFunctionInfo class -represents a... FIXME: finish. The ProgramInfo class provides interfaces -to lazily find and decode the information needed to create the Source*Info -classes requested by the debugger. -

    - -

    -The RuntimeInfo class exposes information about the currently executed program, -by decoding information from the InferiorProcess and ProgramInfo classes. It -provides a StackFrame class which provides an easy-to-use interface for -inspecting the current and suspended stack frames in the program. -

    - -

    -The SourceLanguage class is an abstract interface used by the debugger to -perform all source-language-specific tasks. For example, this interface is used -by the ProgramInfo class to decode language-specific types and functions and by -the debugger front-end (such as llvm-db to -evaluate source-langauge expressions typed into the debugger. This class uses -the RuntimeInfo & ProgramInfo classes to get information about the current -execution context and the loaded program, respectively. -

    + +

    LLVM uses several intrinsic functions (names prefixed with "llvm.dbg") to + provide debug information at various points in generated code.

    -
    - The llvm-db tool +
    -

    -The llvm-db is designed to be a debugger providing an interface as similar to GDB as reasonable, but no more so than that. -Because the Debugger and info classes implement all of the heavy lifting and -analysis, llvm-db (which lives in llvm/tools/llvm-db) consists -mainly of of code to interact with the user and parse commands. The CLIDebugger -constructor registers all of the builtin commands for the debugger, and each -command is implemented as a CLIDebugger::[name]Command method. -

    +
    +  void %llvm.dbg.declare(metadata, metadata)
    +
    + +

    This intrinsic provides information about a local element (e.g., a variable). The + first argument is metadata holding the alloca for the variable. The + second argument is + the %llvm.dbg.variable containing + the description of the variable.

    + +
    + + + +
    +
    +  void %llvm.dbg.value(metadata, i64, metadata)
    +
    + +

    This intrinsic provides information when a user source variable is set to a + new value. The first argument is the new value (wrapped as metadata). The + second argument is the offset in the user source variable where the new value + is written. The third argument is + the %llvm.dbg.variable containing + the description of the user source variable.

    + +
    +

    In many languages, the local variables in functions can have their lifetimes + or scopes limited to a subset of a function. In the C family of languages, + for example, variables are only live (readable and writable) within the + source block that they are defined in. In functional languages, values are + only readable after they have been defined. Though this is a very obvious + concept, it is non-trivial to model in LLVM, because it has no notion of + scoping in this sense, and does not want to be tied to a language's scoping + rules.

    + +

    In order to handle this, the LLVM debug format uses the metadata attached to + LLVM instructions to encode line number and scoping information. Consider + the following C fragment, for example:

    + +
    +
    +1.  void foo() {
    +2.    int X = 21;
    +3.    int Y = 22;
    +4.    {
    +5.      int Z = 23;
    +6.      Z = X + Y;
    +7.    }
    +8.    X = Y;
    +9.  }
    +
    +
    -

    -FIXME: this section will eventually go away. These are notes to myself of -things that should be implemented, but haven't yet. -

    - -

    -Breakpoints: Support is already implemented in the 'InferiorProcess' -class, though it hasn't been tested yet. To finish breakpoint support, we need -to implement breakCommand (which should reuse the linespec parser from the list -command), and handle the fact that 'break foo' or 'break file.c:53' may insert -multiple breakpoints. Also, if you say 'break file.c:53' and there is no -stoppoint on line 53, the breakpoint should go on the next available line. My -idea was to have the Debugger class provide a "Breakpoint" class which -encapsulated this messiness, giving the debugger front-end a simple interface. -The debugger front-end would have to map the really complex semantics of -temporary breakpoints and 'conditional' breakpoints onto this intermediate -level. Also, breakpoints should survive as much as possible across program -reloads. -

    - -

    -UnixLocalInferiorProcess.cpp speedup: There is no reason for the debugged -process to code gen the globals corresponding to debug information. The -IntrinsicLowering object could instead change descriptors into constant expr -casts of the constant address of the LLVM objects for the descriptors. This -would also allow us to eliminate the mapping back and forth between physical -addresses that must be done.

    - -

    -Process deaths: The InferiorProcessDead exception should be extended to -know "how" a process died, i.e., it was killed by a signal. This is easy to -collect in the UnixLocalInferiorProcess, we just need to represent it.

    +

    Compiled to LLVM, this function would be represented like this:

    + +
    +
    +define void @foo() nounwind ssp {
    +entry:
    +  %X = alloca i32, align 4                        ; <i32*> [#uses=4]
    +  %Y = alloca i32, align 4                        ; <i32*> [#uses=4]
    +  %Z = alloca i32, align 4                        ; <i32*> [#uses=3]
    +  %0 = bitcast i32* %X to {}*                     ; <{}*> [#uses=1]
    +  call void @llvm.dbg.declare({}* %0, metadata !0), !dbg !7
    +  store i32 21, i32* %X, !dbg !8
    +  %1 = bitcast i32* %Y to {}*                     ; <{}*> [#uses=1]
    +  call void @llvm.dbg.declare({}* %1, metadata !9), !dbg !10
    +  store i32 22, i32* %Y, !dbg !11
    +  %2 = bitcast i32* %Z to {}*                     ; <{}*> [#uses=1]
    +  call void @llvm.dbg.declare({}* %2, metadata !12), !dbg !14
    +  store i32 23, i32* %Z, !dbg !15
    +  %tmp = load i32* %X, !dbg !16                   ; <i32> [#uses=1]
    +  %tmp1 = load i32* %Y, !dbg !16                  ; <i32> [#uses=1]
    +  %add = add nsw i32 %tmp, %tmp1, !dbg !16        ; <i32> [#uses=1]
    +  store i32 %add, i32* %Z, !dbg !16
    +  %tmp2 = load i32* %Y, !dbg !17                  ; <i32> [#uses=1]
    +  store i32 %tmp2, i32* %X, !dbg !17
    +  ret void, !dbg !18
    +}
    +
    +declare void @llvm.dbg.declare({}*, metadata) nounwind readnone
    +
    +!0 = metadata !{i32 459008, metadata !1, metadata !"X", 
    +                metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ]
    +!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
    +!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo", 
    +               metadata !"foo", metadata !3, i32 1, metadata !4, 
    +               i1 false, i1 true}; [DW_TAG_subprogram ]
    +!3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c", 
    +                metadata !"/private/tmp", metadata !"clang 1.1", i1 true, 
    +                i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ]
    +!4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0, 
    +                i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ]
    +!5 = metadata !{null}
    +!6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0, 
    +                i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ]
    +!7 = metadata !{i32 2, i32 7, metadata !1, null}
    +!8 = metadata !{i32 2, i32 3, metadata !1, null}
    +!9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3, 
    +                metadata !6}; [ DW_TAG_auto_variable ]
    +!10 = metadata !{i32 3, i32 7, metadata !1, null}
    +!11 = metadata !{i32 3, i32 3, metadata !1, null}
    +!12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5, 
    +                 metadata !6}; [ DW_TAG_auto_variable ]
    +!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
    +!14 = metadata !{i32 5, i32 9, metadata !13, null}
    +!15 = metadata !{i32 5, i32 5, metadata !13, null}
    +!16 = metadata !{i32 6, i32 5, metadata !13, null}
    +!17 = metadata !{i32 8, i32 3, metadata !1, null}
    +!18 = metadata !{i32 9, i32 1, metadata !2, null}
    +
    +
    + +

    This example illustrates a few important details about LLVM debugging + information. In particular, it shows how the llvm.dbg.declare + intrinsic and location information, which are attached to an instruction, + are applied together to allow a debugger to analyze the relationship between + statements, variable definitions, and the code used to implement the + function.

    + +
    +
    +call void @llvm.dbg.declare({}* %0, metadata !0), !dbg !7   
    +
    +
    + +

    The first intrinsic + %llvm.dbg.declare + encodes debugging information for the variable X. The metadata + !dbg !7 attached to the intrinsic provides scope information for the + variable X.

    + +
    +
    +!7 = metadata !{i32 2, i32 7, metadata !1, null}
    +!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
    +!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", 
    +                metadata !"foo", metadata !"foo", metadata !3, i32 1, 
    +                metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]   
    +
    +
    + +

    Here !7 is metadata providing location information. It has four + fields: line number, column number, scope, and original scope. The original + scope represents inline location if this instruction is inlined inside a + caller, and is null otherwise. In this example, scope is encoded by + !1. !1 represents a lexical block inside the scope + !2, where !2 is a + subprogram descriptor. This way the + location information attached to the intrinsics indicates that the + variable X is declared at line number 2 at a function level scope in + function foo.

    + +

    Now let's take another example.

    + +
    +
    +call void @llvm.dbg.declare({}* %2, metadata !12), !dbg !14
    +
    +
    + +

    The second intrinsic + %llvm.dbg.declare + encodes debugging information for variable Z. The metadata + !dbg !14 attached to the intrinsic provides scope information for + the variable Z.

    + +
    +
    +!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
    +!14 = metadata !{i32 5, i32 9, metadata !13, null}
    +
    +
    + +

    Here !14 indicates that Z is declared at line number 5 and + column number 9 inside of lexical scope !13. The lexical scope + itself resides inside of lexical scope !1 described above.

    + +

    The scope information attached to each instruction provides a + straightforward way to find instructions covered by a scope.
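
As a rough illustration of that lookup, the sketch below mirrors the scope and location metadata of foo() in plain C++ structures and collects the source lines covered by a given scope by walking each location's scope chain outward. The Scope and Location types and the helper functions are hypothetical stand-ins written for this example; they are not part of the LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical mirror of a scope descriptor (not an LLVM type).
struct Scope {
  std::uint32_t dwarfTag;  // 11 = DW_TAG_lexical_block, 46 = DW_TAG_subprogram
  int parent;              // index of the enclosing scope, -1 if none
  std::string name;        // only filled in for subprograms
};

// Hypothetical mirror of a location: line, column, and scope index.
struct Location {
  unsigned line;
  unsigned column;
  int scope;
};

// True if `scope` is `ancestor` itself or is nested inside it.
bool isCoveredBy(const std::vector<Scope> &scopes, int scope, int ancestor) {
  for (int s = scope; s != -1; s = scopes[s].parent)
    if (s == ancestor)
      return true;
  return false;
}

// Line numbers of every location covered by `ancestor`, in input order.
std::vector<unsigned> linesInScope(const std::vector<Scope> &scopes,
                                   const std::vector<Location> &locs,
                                   int ancestor) {
  std::vector<unsigned> lines;
  for (const Location &loc : locs)
    if (isCoveredBy(scopes, loc.scope, ancestor))
      lines.push_back(loc.line);
  return lines;
}

// Scope table for foo(): index 0 stands for !2 (the subprogram),
// index 1 for !1 (outer block), index 2 for !13 (inner block).
std::vector<Scope> fooScopes() {
  return {{46, -1, "foo"}, {11, 0, ""}, {11, 1, ""}};
}

// Locations !7, !8, !10, !11, !14, !15, !16, !17, !18 from the example.
std::vector<Location> fooLocations() {
  return {{2, 7, 1}, {2, 3, 1}, {3, 7, 1}, {3, 3, 1}, {5, 9, 2},
          {5, 5, 2}, {6, 5, 2}, {8, 3, 1}, {9, 1, 0}};
}
```

Calling linesInScope(fooScopes(), fooLocations(), 2) selects only the locations on lines 5 and 6, which are the ones attached to the inner block where Z lives. Passing index 1 also picks up the locations on lines 2, 3, and 8.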

    -

    LLVM debugging information has been carefully designed to make it possible -for the optimizer to optimize the program and debugging information without -necessarily having to know anything about debugging information. In particular, -the global constant merging pass automatically eliminates duplicated debugging -information (often caused by header files), the global dead code elimination -pass automatically deletes debugging information for a function if it decides to -delete the function, and the linker eliminates debug information when it merges -linkonce functions.

    - -

    To do this, most of the debugging information (descriptors for types, -variables, functions, source files, etc) is inserted by the language front-end -in the form of LLVM global variables. These LLVM global variables are no -different from any other global variables, except that they have a web of LLVM -intrinsic functions that point to them. If the last references to a particular -piece of debugging information are deleted (for example, by the --globaldce pass), the extraneous debug information will automatically -become dead and be removed by the optimizer.

    - -

    The debugger is designed to be agnostic about the contents of most of the -debugging information. It uses a source-language-specific -module to decode the information that represents variables, types, -functions, namespaces, etc: this allows for arbitrary source-language semantics -and type-systems to be used, as long as there is a module written for the -debugger to interpret the information. -

    - -

    -To provide basic functionality, the LLVM debugger does have to make some -assumptions about the source-level language being debugged, though it keeps -these to a minimum. The only common features that the LLVM debugger assumes -exist are source files, and program objects. These abstract objects are -used by the debugger to form stack traces, show information about local -variables, etc. - -

    This section of the documentation first describes the representation aspects -common to any source-language. The next section -describes the data layout conventions used by the C and C++ front-ends.

    +

    The C and C++ front-ends represent information about the program in a format + that is effectively identical + to DWARF 3.0 in + terms of information content. This allows code generators to trivially + support native debuggers by generating standard DWARF information, and + contains enough information for non-DWARF targets to translate it as + needed.

    + +

    This section describes the forms used to represent C and C++ programs. Other + languages could pattern themselves after this (which itself is tuned to + representing programs in the same way that DWARF 3 does), or they could + choose to provide completely different forms if they don't fit into the DWARF + model. As support for debugging information gets added to the various LLVM + source-language front-ends, the information used should be documented + here.

    + +

    The following sections provide examples of various C/C++ constructs and the + debug information that would best describe those constructs.

    -

    -One important aspect of the LLVM debug representation is that it allows the LLVM -debugger to efficiently index all of the global objects without having the scan -the program. To do this, all of the global objects use "anchor" globals of type -"{}", with designated names. These anchor objects obviously do not -contain any content or meaning by themselves, but all of the global objects of a -particular type (e.g., source file descriptors) contain a pointer to the anchor. -This pointer allows the debugger to use def-use chains to find all global -objects of that type. -

    - -

    -So far, the following names are recognized as anchors by the LLVM debugger: -

    - -

    -  %llvm.dbg.translation_units = linkonce global {} {}
    -  %llvm.dbg.globals         = linkonce global {} {}
    -

    - -

    -Using anchors in this way (where the source file descriptor points to the -anchors, as opposed to having a list of source file descriptors) allows for the -standard dead global elimination and merging passes to automatically remove -unused debugging information. If the globals were kept track of through lists, -there would always be an object pointing to the descriptors, thus would never be -deleted. -

    +

    Given the source files MySource.cpp and MyHeader.h located + in the directory /Users/mine/sources, the following code:

    + +
    +
    +#include "MyHeader.h"
    +
    +int main(int argc, char *argv[]) {
    +  return 0;
    +}
    +
    +
    + +

    a C/C++ front-end would generate the following descriptors:

    + +
    +
    +...
    +;;
    +;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
    +;;
    +!2 = metadata !{
    +  i32 524305,    ;; Tag
    +  i32 0,         ;; Unused
    +  i32 4,         ;; Language Id
    +  metadata !"MySource.cpp", 
    +  metadata !"/Users/mine/sources", 
    +  metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)", 
    +  i1 true,       ;; Main Compile Unit
    +  i1 false,      ;; Optimized compile unit
    +  metadata !"",  ;; Compiler flags
    +  i32 0}         ;; Runtime version
    +
    +;;
    +;; Define the file for the file "/Users/mine/sources/MySource.cpp".
    +;;
    +!1 = metadata !{
    +  i32 524329,    ;; Tag
    +  metadata !"MySource.cpp", 
    +  metadata !"/Users/mine/sources", 
    +  metadata !2    ;; Compile unit
    +}
    +
    +;;
    +;; Define the file for the file "/Users/mine/sources/MyHeader.h"
    +;;
    +!3 = metadata !{
    +  i32 524329,    ;; Tag
    +  metadata !"MyHeader.h", 
    +  metadata !"/Users/mine/sources", 
    +  metadata !2    ;; Compile unit
    +}
    +
    +...
    +
    +

    llvm::Instruction provides easy access to metadata attached to an +instruction. One can extract line number information encoded in LLVM IR +using Instruction::getMetadata() and +DILocation::getLineNumber(). +

    + if (MDNode *N = I->getMetadata("dbg")) {  // Here I is an LLVM instruction
    +   DILocation Loc(N);                      // DILocation is in DebugInfo.h
    +   unsigned Line = Loc.getLineNumber();
    +   StringRef File = Loc.getFilename();
    +   StringRef Dir = Loc.getDirectory();
    + }
    +
    +
    -

    LLVM debugger "stop points" are a key part of the debugging representation -that allows the LLVM to maintain simple semantics for debugging optimized code. The basic idea is that the -front-end inserts calls to the %llvm.dbg.stoppoint intrinsic function -at every point in the program where the debugger should be able to inspect the -program (these correspond to places the debugger stops when you "step" -through it). The front-end can choose to place these as fine-grained as it -would like (for example, before every subexpression evaluated), but it is -recommended to only put them after every source statement that includes -executable code.

    - -

    -Using calls to this intrinsic function to demark legal points for the debugger -to inspect the program automatically disables any optimizations that could -potentially confuse debugging information. To non-debug-information-aware -transformations, these calls simply look like calls to an external function, -which they must assume to do anything (including reading or writing to any part -of reachable memory). On the other hand, it does not impact many optimizations, -such as code motion of non-trapping instructions, nor does it impact -optimization of subexpressions, code duplication transformations, or basic-block -reordering transformations.

    - -

    -An important aspect of the calls to the %llvm.dbg.stoppoint intrinsic -is that the function-local debugging information is woven together with use-def -chains. This makes it easy for the debugger to, for example, locate the 'next' -stop point. For a concrete example of stop points, see the example in the next section.

    +

    Given an integer global variable declared as follows:

    + +
    +
    +int MyGlobal = 100;
    +
    +
    + +

    a C/C++ front-end would generate the following descriptors:

    + +
    +
    +;;
    +;; Define the global itself.
    +;;
    +@MyGlobal = global i32 100
    +...
    +;;
    +;; List of debug info of globals
    +;;
    +!llvm.dbg.gv = !{!0}
    +
    +;;
    +;; Define the global variable descriptor.  Note the reference to the global
    +;; variable itself.
    +;;
    +!0 = metadata !{
    +  i32 524340,              ;; Tag
    +  i32 0,                   ;; Unused
    +  metadata !1,             ;; Context
    +  metadata !"MyGlobal",    ;; Name
    +  metadata !"MyGlobal",    ;; Display Name
    +  metadata !"MyGlobal",    ;; Linkage Name
    +  metadata !3,             ;; Compile Unit
    +  i32 1,                   ;; Line Number
    +  metadata !4,             ;; Type
    +  i1 false,                ;; Is a local variable
    +  i1 true,                 ;; Is this a definition
    +  i32* @MyGlobal           ;; The global variable
    +}
     
    +;;
    +;; Define the basic type of 32 bit signed integer.  Note that since int is an
    +;; intrinsic type the source file is NULL and line 0.
    +;;    
    +!4 = metadata !{
    +  i32 524324,              ;; Tag
    +  metadata !1,             ;; Context
    +  metadata !"int",         ;; Name
    +  metadata !1,             ;; File
    +  i32 0,                   ;; Line number
    +  i64 32,                  ;; Size in Bits
    +  i64 32,                  ;; Align in Bits
    +  i64 0,                   ;; Offset in Bits
    +  i32 0,                   ;; Flags
    +  i32 5                    ;; Encoding
    +}
    +
    +
    +
    -

    -In many languages, the local variables in functions can have their lifetime or -scope limited to a subset of a function. In the C family of languages, for -example, variables are only live (readable and writable) within the source block -that they are defined in. In functional languages, values are only readable -after they have been defined. Though this is a very obvious concept, it is also -non-trivial to model in LLVM, because it has no notion of scoping in this sense, -and does not want to be tied to a language's scoping rules. -

    - -

    -In order to handle this, the LLVM debug format uses the notion of "regions" of a -function, delineated by calls to intrinsic functions. These intrinsic functions -define new regions of the program and indicate when the region lifetime expires. -Consider the following C fragment, for example: -

    - -

    -1.  void foo() {
    -2.    int X = ...;
    -3.    int Y = ...;
    -4.    {
    -5.      int Z = ...;
    -6.      ...
    -7.    }
    -8.    ...
    -9.  }
    -

    - -

    -Compiled to LLVM, this function would be represented like this (FIXME: CHECK AND -UPDATE THIS): -

    - -

    -void %foo() {
    -    %X = alloca int
    -    %Y = alloca int
    -    %Z = alloca int
    -    %D1 = call {}* %llvm.dbg.func.start(%lldb.global* %d.foo)
    -    %D2 = call {}* %llvm.dbg.stoppoint({}* %D1, uint 2, uint 2, %lldb.compile_unit* %file)
    -
    -    %D3 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D2, ...)
    -    ;; Evaluate expression on line 2, assigning to X.
    -    %D4 = call {}* %llvm.dbg.stoppoint({}* %D3, uint 3, uint 2, %lldb.compile_unit* %file)
    -
    -    %D5 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D4, ...)
    -    ;; Evaluate expression on line 3, assigning to Y.
    -    %D6 = call {}* %llvm.dbg.stoppoint({}* %D5, uint 5, uint 4, %lldb.compile_unit* %file)
    -
    -    %D7 = call {}* %llvm.region.start({}* %D6)
    -    %D8 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D7, ...)
    -    ;; Evaluate expression on line 5, assigning to Z.
    -    %D9 = call {}* %llvm.dbg.stoppoint({}* %D8, uint 6, uint 4, %lldb.compile_unit* %file)
    -
    -    ;; Code for line 6.
    -    %D10 = call {}* %llvm.region.end({}* %D9)
    -    %D11 = call {}* %llvm.dbg.stoppoint({}* %D10, uint 8, uint 2, %lldb.compile_unit* %file)
    -
    -    ;; Code for line 8.
    -    %D12 = call {}* %llvm.region.end({}* %D11)
    -    ret void
    +
    +

    Given a function declared as follows:

    + +
    +
    +int main(int argc, char *argv[]) {
    +  return 0;
     }
    -

    - -

    -This example illustrates a few important details about the LLVM debugging -information. In particular, it shows how the various intrinsics used are woven -together with def-use and use-def chains, similar to how anchors are used with globals. This allows the -debugger to analyze the relationship between statements, variable definitions, -and the code used to implement the function.

    - -

    -In this example, two explicit regions are defined, one with the definition of the %D1 variable and one with the -definition of %D7. In the case of -%D1, the debug information indicates that the function whose descriptor is specified as an argument to the -intrinsic. This defines a new stack frame whose lifetime ends when the region -is ended by the %D12 call.

    - -

    -Using regions to represent the boundaries of source-level functions allow LLVM -interprocedural optimizations to arbitrarily modify LLVM functions without -having to worry about breaking mapping information between the LLVM code and the -and source-level program. In particular, the inliner requires no modification -to support inlining with debugging information: there is no explicit correlation -drawn between LLVM functions and their source-level counterparts (note however, -that if the inliner inlines all instances of a non-strong-linkage function into -its caller that it will not be possible for the user to manually invoke the -inlined function from the debugger).

    - -

    -Once the function has been defined, the stopping point corresponding to line #2 of the -function is encountered. At this point in the function, no local -variables are live. As lines 2 and 3 of the example are executed, their -variable definitions are automatically introduced into the program, without the -need to specify a new region. These variables do not require new regions to be -introduced because they go out of scope at the same point in the program: line -9. -

    - -

    -In contrast, the Z variable goes out of scope at a different time, on -line 7. For this reason, it is defined within the -%D7 region, which kills the availability of Z before the -code for line 8 is executed. In this way, regions can support arbitrary -source-language scoping rules, as long as they can only be nested (ie, one scope -cannot partially overlap with a part of another scope). -

    - -

    -It is worth noting that this scoping mechanism is used to control scoping of all -declarations, not just variable declarations. For example, the scope of a C++ -using declaration is controlled with this, and the llvm-db C++ support -routines could use this to change how name lookup is performed (though this is -not implemented yet). -

    +
    +
    +

    a C/C++ front-end would generate the following descriptors:

    + +
    +
    +;;
    +;; Define the subprogram descriptor.  Note that the low 16 bits of the
    +;; tag (524334) are 46, which is the tag for subprograms
    +;; (46 = DW_TAG_subprogram).
    +;;
    +!6 = metadata !{
    +  i32 524334,        ;; Tag
    +  i32 0,             ;; Unused
    +  metadata !1,       ;; Context
    +  metadata !"main",  ;; Name
    +  metadata !"main",  ;; Display name
    +  metadata !"main",  ;; Linkage name
    +  metadata !1,       ;; File
    +  i32 1,             ;; Line number
    +  metadata !4,       ;; Type
    +  i1 false,          ;; Is local 
    +  i1 true            ;; Is definition
    +}
    +;;
    +;; Define the subprogram itself.
    +;;
    +define i32 @main(i32 %argc, i8** %argv) {
    +...
    +}
    +
    +
    -

    -The LLVM debugger expects the descriptors for program objects to start in a -canonical format, but the descriptors can include additional information -appended at the end that is source-language specific. All LLVM debugging -information is versioned, allowing backwards compatibility in the case that the -core structures need to change in some way. Also, all debugging information -objects start with a tag to indicate what type -of object it is. The source-language is allows to define its own objects, by -using unreserved tag numbers.

    -

    The lowest-level descriptor are those describing the files containing the program source -code, as most other descriptors (sometimes indirectly) refer to them. -

    -
    +

    The following are the basic type descriptors for C/C++ core types:

    +
    - +
    -

    -Source file descriptors are patterned after the Dwarf "compile_unit" object. -The descriptor currently is defined to have at least the following LLVM -type entries:

    - -

    -%lldb.compile_unit = type {
    -       uint,                 ;; Tag: LLVM_COMPILE_UNIT
    -       ushort,               ;; LLVM debug version number
    -       ushort,               ;; Dwarf language identifier
    -       sbyte*,               ;; Filename
    -       sbyte*,               ;; Working directory when compiled
    -       sbyte*                ;; Producer of the debug information
    +
    +
    +
    +!2 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"bool",  ;; Name
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 8,             ;; Size in Bits
    +  i64 8,             ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 2              ;; Encoding
     }
    -

    - -

    -These descriptors contain the version number for the debug info, a source -language ID for the file (we use the Dwarf 3.0 ID numbers, such as -DW_LANG_C89, DW_LANG_C_plus_plus, DW_LANG_Cobol74, -etc), three strings describing the filename, working directory of the compiler, -and an identifier string for the compiler that produced it. Note that actual -compile_unit declarations must also include an anchor to llvm.dbg.translation_units, -but it is not specified where the anchor is to be located. Here is an example -descriptor: -

    - -

    -%arraytest_source_file = internal constant %lldb.compile_unit {
    -    uint 17,                                                      ; Tag value
    -    ushort 0,                                                     ; Version #0
    -    ushort 1,                                                     ; DW_LANG_C89
    -    sbyte* getelementptr ([12 x sbyte]* %.str_1, long 0, long 0), ; filename
    -    sbyte* getelementptr ([12 x sbyte]* %.str_2, long 0, long 0), ; working dir
    -    sbyte* getelementptr ([12 x sbyte]* %.str_3, long 0, long 0), ; producer
    -    {}* %llvm.dbg.translation_units                               ; Anchor
    +
    +
    + +
    + + +
    + char +
    + +
    + +
    +
    +!2 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"char",  ;; Name
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 8,             ;; Size in Bits
    +  i64 8,             ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 6              ;; Encoding
     }
    -%.str_1 = internal constant [12 x sbyte] c"arraytest.c\00"
    -%.str_2 = internal constant [12 x sbyte] c"/home/sabre\00"
    -%.str_3 = internal constant [12 x sbyte] c"llvmgcc 3.4\00"
    -

    + +
    + +
    + + + -

    -Note that the LLVM constant merging pass should eliminate duplicate copies of -the strings that get emitted to each translation unit, such as the producer. -

    +
    +
    +
    +!2 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"unsigned char", 
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 8,             ;; Size in Bits
    +  i64 8,             ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 8              ;; Encoding
    +}
    +
    +
    - +
    -

    -The LLVM debugger needs to know about some source-language program objects, in -order to build stack traces, print information about local variables, and other -related activities. The LLVM debugger differentiates between three different -types of program objects: subprograms (functions, messages, methods, etc), -variables (locals and globals), and others. Because source-languages have -widely varying forms of these objects, the LLVM debugger expects only a few -fields in the descriptor for each object: -

    - -

    -%lldb.object = type {
    -       uint,                  ;; A tag
    -       any*,                  ;; The context for the object
    -       sbyte*                 ;; The object 'name'
    +
    +
    +
    +!2 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"short int", ;; Name
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 16,            ;; Size in Bits
    +  i64 16,            ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 5              ;; Encoding
     }
    -

    - -

    -The first field contains a tag for the descriptor. The second field contains -either a pointer to the descriptor for the containing source file, or it contains a pointer to -another program object whose context pointer eventually reaches a source file. -Through this context pointer, the -LLVM debugger can establish the debug version number of the object.

    - -

    -The third field contains a string that the debugger can use to identify the -object if it does not contain explicit support for the source-language in use -(ie, the 'unknown' source language handler uses this string). This should be -some sort of unmangled string that corresponds to the object, but it is a -quality of implementation issue what exactly it contains (it is legal, though -not useful, for all of these strings to be null). -

    - -

    -Note again that descriptors can be extended to include source-language-specific -information in addition to the fields required by the LLVM debugger. See the section on the C/C++ front-end for more -information. Also remember that global objects (functions, selectors, global -variables, etc) must contain an anchor to -the llvm.dbg.globals variable. -

    +
    +
    -
    - Program object contexts +
    -

    -Allow source-language specific contexts, use to identify namespaces etc
    -Must end up in a source file descriptor.
    -Debugger core ignores all unknown context objects.
    -

    + +
    +
    +!2 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"short unsigned int", ;; Name
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 16,            ;; Size in Bits
    +  i64 16,            ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 7              ;; Encoding
    +}
    +
    +
    + +
    + int +
    + +
    + +
    +
    +!2 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"int",   ;; Name
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 32,            ;; Size in Bits
    +  i64 32,            ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 5              ;; Encoding
    +}
    +
    + +
    -
    - Debugger intrinsic functions +
    -

    -Define each intrinsics, as an extension of the language reference manual.
     
    -llvm.dbg.stoppoint
    -llvm.dbg.region.start
    -llvm.dbg.region.end
    -llvm.dbg.function.start
    -llvm.dbg.declare
    -

    +
    +
    +!2 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"unsigned int", ;; Name
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 32,            ;; Size in Bits
    +  i64 32,            ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 7              ;; Encoding
    +}
    +
    +
    + +
    + + + +
    + +
    +
    +!2 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"long long int", ;; Name
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 64,            ;; Size in Bits
    +  i64 64,            ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 5              ;; Encoding
    +}
    +
    +
    +
    -
    - Values for debugger tags +
    -

    -Happen to be the same value as the similarly named Dwarf-3 tags, this may change -in the future. -

    +
    +
    +!2 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"long long unsigned int", ;; Name
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 64,            ;; Size in Bits
    +  i64 64,            ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 7              ;; Encoding
    +}
    +
    +
    + +
    -

    -

    -  LLVM_COMPILE_UNIT     : 17
    -  LLVM_SUBPROGRAM       : 46
    -  LLVM_VARIABLE         : 52
    -
    -

    + +
    + float
    +
    +
    +
    +!2 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"float", ;; Name
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 32,            ;; Size in Bits
    +  i64 32,            ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 4              ;; Encoding
    +}
    +
    +
    - - + + +
    + double
    -

    -The C and C++ front-ends represent information about the program in a format -that is effectively identical to Dwarf 3.0 in terms of -information content. This allows code generators to trivially support native -debuggers by generating standard dwarf information, and contains enough -information for non-dwarf targets to translate it as needed.

    - -

    -The basic debug information required by the debugger is (intentionally) designed -to be as minimal as possible. This basic information is so minimal that it is -unlikely that any source-language could be adequately described by it. -Because of this, the debugger format was designed for extension to support -source-language-specific information. The extended descriptors are read and -interpreted by the language-specific modules in the -debugger if there is support available, otherwise it is ignored. -

    - -

    -This section describes the extensions used to represent C and C++ programs. -Other languages could pattern themselves after this (which itself is tuned to -representing programs in the same way that Dwarf 3 does), or they could choose -to provide completely different extensions if they don't fit into the Dwarf -model. As support for debugging information gets added to the various LLVM -source-language front-ends, the information used should be documented here. -

    +
    +
    +!2 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"double", ;; Name
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 64,            ;; Size in Bits
    +  i64 64,            ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 4              ;; Encoding
    +}
    +
    +
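The Tag and Encoding values in the basic type descriptors above can be read as DWARF constants. The following C sketch is illustrative only. It assumes that the Tag field packs a debug-info version number into its upper 16 bits and a DWARF tag into its lower 16 bits (so 524324 is 0x80024: version 8 plus DW_TAG_base_type, 0x24), and that the Encoding field holds a DWARF DW_ATE_* value. The helper names are hypothetical and are not part of LLVM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: upper 16 bits = debug-info version, lower 16 bits = DWARF tag. */
#define DEBUG_VERSION_MASK 0xffff0000u

/* Hypothetical helper: extract the version part of a descriptor tag. */
static uint32_t tag_version(uint32_t tag) {
  return (tag & DEBUG_VERSION_MASK) >> 16;
}

/* Hypothetical helper: extract the DWARF tag part of a descriptor tag. */
static uint32_t tag_dwarf(uint32_t tag) {
  return tag & ~DEBUG_VERSION_MASK;
}

/* Map the Encoding values used in the examples to DWARF DW_ATE_* names. */
static const char *encoding_name(uint32_t encoding) {
  switch (encoding) {
  case 4: return "DW_ATE_float";
  case 5: return "DW_ATE_signed";
  case 6: return "DW_ATE_signed_char";
  case 7: return "DW_ATE_unsigned";
  case 8: return "DW_ATE_unsigned_char";
  default: return "unknown";
  }
}

/* Check the helpers against the "char" descriptor shown above. */
static void check_char_descriptor(void) {
  assert(tag_version(524324u) == 8);
  assert(tag_dwarf(524324u) == 0x24); /* DW_TAG_base_type */
  assert(strcmp(encoding_name(6), "DW_ATE_signed_char") == 0);
}
```

Under the same assumption, every basic type in this section shares the tag 524324 and differs only in its name, size, alignment and encoding.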
    -

    -

    +

    Given the following as an example of a C/C++ derived type:

    + +
    +
    +typedef const int *IntPtr;
    +
    - -
    - Compilation unit entries +

    a C/C++ front-end would generate the following descriptors:

    + +
    +
    +;;
    +;; Define the typedef "IntPtr".
    +;;
    +!2 = metadata !{
    +  i32 524310,          ;; Tag
    +  metadata !1,         ;; Context
    +  metadata !"IntPtr",  ;; Name
    +  metadata !3,         ;; File
    +  i32 0,               ;; Line number
    +  i64 0,               ;; Size in bits
    +  i64 0,               ;; Align in bits
    +  i64 0,               ;; Offset in bits
    +  i32 0,               ;; Flags
    +  metadata !4          ;; Derived From type
    +}
    +
    +;;
    +;; Define the pointer type.
    +;;
    +!4 = metadata !{
    +  i32 524303,          ;; Tag
    +  metadata !1,         ;; Context
    +  metadata !"",        ;; Name
    +  metadata !1,         ;; File
    +  i32 0,               ;; Line number
    +  i64 64,              ;; Size in bits
    +  i64 64,              ;; Align in bits
    +  i64 0,               ;; Offset in bits
    +  i32 0,               ;; Flags
    +  metadata !5          ;; Derived From type
    +}
    +;;
    +;; Define the const type.
    +;;
    +!5 = metadata !{
    +  i32 524326,          ;; Tag
    +  metadata !1,         ;; Context
    +  metadata !"",        ;; Name
    +  metadata !1,         ;; File
    +  i32 0,               ;; Line number
    +  i64 32,              ;; Size in bits
    +  i64 32,              ;; Align in bits
    +  i64 0,               ;; Offset in bits
    +  i32 0,               ;; Flags
    +  metadata !6          ;; Derived From type
    +}
    +;;
    +;; Define the int type.
    +;;
    +!6 = metadata !{
    +  i32 524324,          ;; Tag
    +  metadata !1,         ;; Context
    +  metadata !"int",     ;; Name
    +  metadata !1,         ;; File
    +  i32 0,               ;; Line number
    +  i64 32,              ;; Size in bits
    +  i64 32,              ;; Align in bits
    +  i64 0,               ;; Offset in bits
    +  i32 0,               ;; Flags
    +  i32 5                ;; Encoding
    +}
    +
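The descriptors above form a chain through their "Derived From type" fields: !2 (typedef) refers to !4 (pointer), which refers to !5 (const), which refers to !6 (int). The following C sketch models that chain and walks it. It is a simplified illustration, not LLVM code: the struct, the function names and the output format are hypothetical, and only the low 16 bits of each Tag value (the DWARF tag) are kept.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* DWARF tags (low 16 bits of the Tag fields in the example). */
enum { TAG_POINTER = 0x0f, TAG_TYPEDEF = 0x16, TAG_BASE = 0x24, TAG_CONST = 0x26 };

/* Hypothetical, minimal model of a type descriptor. */
struct type_desc {
  unsigned tag;                         /* DWARF tag */
  const char *name;                     /* Name field, may be empty */
  const struct type_desc *derived_from; /* NULL for a basic type */
};

/* Append s to out without writing past cap bytes. */
static void append(char *out, size_t cap, const char *s) {
  size_t used = strlen(out);
  if (used + 1 < cap)
    strncat(out, s, cap - used - 1);
}

/* Follow the derived_from links and describe the type in words. */
static void describe_type(const struct type_desc *t, char *out, size_t cap) {
  assert(out != NULL && cap > 0);
  out[0] = '\0';
  for (; t != NULL; t = t->derived_from) {
    switch (t->tag) {
    case TAG_TYPEDEF:
      append(out, cap, "typedef ");
      append(out, cap, t->name);
      append(out, cap, " = ");
      break;
    case TAG_POINTER: append(out, cap, "pointer to "); break;
    case TAG_CONST:   append(out, cap, "const "); break;
    case TAG_BASE:    append(out, cap, t->name); break;
    default:          append(out, cap, "? "); break;
    }
  }
}

/* The four descriptors from the example: !2 -> !4 -> !5 -> !6. */
static const struct type_desc int_type     = { TAG_BASE,    "int",    NULL };
static const struct type_desc const_type   = { TAG_CONST,   "",       &int_type };
static const struct type_desc pointer_type = { TAG_POINTER, "",       &const_type };
static const struct type_desc intptr_type  = { TAG_TYPEDEF, "IntPtr", &pointer_type };
```

Calling describe_type(&intptr_type, buf, sizeof buf) on a sufficiently large buffer fills it with "typedef IntPtr = pointer to const int", which mirrors the C declaration read from the outermost type inward.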
    -
    -

    -Translation units do not add any information over the standard source file representation already -expected by the debugger. As such, it uses descriptors of the type specified, -with a trailing anchor. -

    - -
    - Module, namespace, and importing entries + +
    -

    -

    +

    Given the following as an example of a C/C++ struct type:

    + +
    +
    +struct Color {
    +  unsigned Red;
    +  unsigned Green;
    +  unsigned Blue;
    +};
    +
    +
    + +

    a C/C++ front-end would generate the following descriptors:

    + +
    +
    +;;
    +;; Define basic type for unsigned int.
    +;;
    +!5 = metadata !{
    +  i32 524324,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"unsigned int",
    +  metadata !1,       ;; File
    +  i32 0,             ;; Line number
    +  i64 32,            ;; Size in Bits
    +  i64 32,            ;; Align in Bits
    +  i64 0,             ;; Offset in Bits
    +  i32 0,             ;; Flags
    +  i32 7              ;; Encoding
    +}
    +;;
    +;; Define composite type for struct Color.
    +;;
    +!2 = metadata !{
    +  i32 524307,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"Color", ;; Name
    +  metadata !1,       ;; Compile unit
    +  i32 1,             ;; Line number
    +  i64 96,            ;; Size in bits
    +  i64 32,            ;; Align in bits
    +  i64 0,             ;; Offset in bits
    +  i32 0,             ;; Flags
    +  null,              ;; Derived From
    +  metadata !3,       ;; Elements
    +  i32 0              ;; Runtime Language
    +}
    +
    +;;
    +;; Define the Red field.
    +;;
    +!4 = metadata !{
    +  i32 524301,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"Red",   ;; Name
    +  metadata !1,       ;; File
    +  i32 2,             ;; Line number
    +  i64 32,            ;; Size in bits
    +  i64 32,            ;; Align in bits
    +  i64 0,             ;; Offset in bits
    +  i32 0,             ;; Flags
    +  metadata !5        ;; Derived From type
    +}
    +
    +;;
    +;; Define the Green field.
    +;;
    +!6 = metadata !{
    +  i32 524301,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"Green", ;; Name
    +  metadata !1,       ;; File
    +  i32 3,             ;; Line number
    +  i64 32,            ;; Size in bits
    +  i64 32,            ;; Align in bits
    +  i64 32,            ;; Offset in bits
    +  i32 0,             ;; Flags
    +  metadata !5        ;; Derived From type
    +}
    +
    +;;
    +;; Define the Blue field.
    +;;
    +!7 = metadata !{
    +  i32 524301,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"Blue",  ;; Name
    +  metadata !1,       ;; File
    +  i32 4,             ;; Line number
    +  i64 32,            ;; Size in bits
    +  i64 32,            ;; Align in bits
    +  i64 64,            ;; Offset in bits
    +  i32 0,             ;; Flags
    +  metadata !5        ;; Derived From type
    +}
    +
    +;;
    +;; Define the array of fields used by the composite type Color.
    +;;
    +!3 = metadata !{metadata !4, metadata !6, metadata !7}
    +
    +
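The size and offset fields in the member descriptors above are expressed in bits. The following C sketch shows one way to derive the same numbers from the compiler's own layout of struct Color. It assumes a target where unsigned is 32 bits wide with no padding between the members, as in the example; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct Color {
  unsigned Red;
  unsigned Green;
  unsigned Blue;
};

/* Convert the byte-based offsetof/sizeof results to bits. */
#define BIT_OFFSET(type, member) (offsetof(type, member) * CHAR_BIT)
#define BIT_SIZE(type) (sizeof(type) * CHAR_BIT)

/* Compare the layout with the values in the descriptors above.
   Assumption: unsigned is 32 bits; other targets will differ. */
static void check_color_layout(void) {
  assert(BIT_SIZE(struct Color) == 96);           /* Size in bits of !2 */
  assert(BIT_OFFSET(struct Color, Red) == 0);     /* Offset in bits of !4 */
  assert(BIT_OFFSET(struct Color, Green) == 32);  /* Offset in bits of !6 */
  assert(BIT_OFFSET(struct Color, Blue) == 64);   /* Offset in bits of !7 */
}
```

Each member descriptor's size and alignment of 32 bits likewise correspond to sizeof(unsigned) * CHAR_BIT on such a target.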
    +
    -

    -

    +

    Given the following as an example of a C/C++ enumeration type:

    + +
    +
    +enum Trees {
    +  Spruce = 100,
    +  Oak = 200,
    +  Maple = 300
    +};
    +
    +

    a C/C++ front-end would generate the following descriptors:

    + +
    +
    +;;
    +;; Define composite type for enum Trees
    +;;
    +!2 = metadata !{
    +  i32 524292,        ;; Tag
    +  metadata !1,       ;; Context
    +  metadata !"Trees", ;; Name
    +  metadata !1,       ;; File
    +  i32 1,             ;; Line number
    +  i64 32,            ;; Size in bits
    +  i64 32,            ;; Align in bits
    +  i64 0,             ;; Offset in bits
    +  i32 0,             ;; Flags
    +  null,              ;; Derived From type
    +  metadata !3,       ;; Elements
    +  i32 0              ;; Runtime language
    +}
    +
    +;;
    +;; Define the array of enumerators used by composite type Trees.
    +;;
    +!3 = metadata !{metadata !4, metadata !5, metadata !6}
    +
    +;;
    +;; Define Spruce enumerator.
    +;;
    +!4 = metadata !{i32 524328, metadata !"Spruce", i64 100}
    +
    +;;
    +;; Define Oak enumerator.
    +;;
    +!5 = metadata !{i32 524328, metadata !"Oak", i64 200}
    +
    +;;
    +;; Define Maple enumerator.
    +;;
    +!6 = metadata !{i32 524328, metadata !"Maple", i64 300}
    +
    +
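Each enumerator descriptor above is a (tag, name, value) triple, and the composite type's Elements field lists them in declaration order. The following C sketch mirrors that structure as a lookup table. It is illustrative only: the struct, table and function names are hypothetical and do not correspond to any LLVM API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum Trees {
  Spruce = 100,
  Oak = 200,
  Maple = 300
};

/* Hypothetical model of one enumerator descriptor (name and value only). */
struct enumerator {
  const char *name;
  long long value; /* the descriptors store the value as i64 */
};

/* Mirrors the element array !3 = {!4, !5, !6}. */
static const struct enumerator trees_elements[] = {
  { "Spruce", Spruce },
  { "Oak", Oak },
  { "Maple", Maple },
};

/* Return the enumerator name for a value, or NULL if there is none. */
static const char *tree_name(long long value) {
  size_t count = sizeof trees_elements / sizeof trees_elements[0];
  for (size_t i = 0; i < count; ++i)
    if (trees_elements[i].value == value)
      return trees_elements[i].name;
  return NULL;
}

/* Check the table against the values in the descriptors above. */
static void check_trees(void) {
  assert(strcmp(tree_name(100), "Spruce") == 0);
  assert(strcmp(tree_name(200), "Oak") == 0);
  assert(strcmp(tree_name(300), "Maple") == 0);
  assert(tree_name(400) == NULL);
}
```

A debugger could use a table of this shape to print a symbolic name instead of a raw integer when displaying a value of type enum Trees.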
    +
    + +
    +
    - +