X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FSourceLevelDebugging.html;h=918383bc21302f4aa807014eaab1bbe9ecb9fd9a;hb=ea2c50c0416555a91cf963618f07c90a4c791708;hp=db2c65339c957cce54f12c6e9f00d0a368b6e116;hpb=3d11beeaddc26700dbcdf5bac464e4a2dacf6e60;p=oota-llvm.git diff --git a/docs/SourceLevelDebugging.html b/docs/SourceLevelDebugging.html index db2c65339c9..918383bc213 100644 --- a/docs/SourceLevelDebugging.html +++ b/docs/SourceLevelDebugging.html @@ -2,12 +2,13 @@ "http://www.w3.org/TR/html4/strict.dtd">
+
-![]() |
This document is the central repository for all information pertaining to -debug information in LLVM. It describes the actual format -that the LLVM debug information takes, which is useful for those interested -in creating front-ends or dealing directly with the information. Further, this -document provides specifc examples of what debug information for C/C++.
- -The idea of the LLVM debugging information is to capture how the important -pieces of the source-language's Abstract Syntax Tree map onto LLVM code. -Several design aspects have shaped the solution that appears here. The -important ones are:
+ pieces of the source-language's Abstract Syntax Tree map onto LLVM code. + Several design aspects have shaped the solution that appears here. The + important ones are:The approach used by the LLVM implementation is to use a small set of intrinsic functions to define a mapping -between LLVM program objects and the source-level objects. The description of -the source-level program is maintained in LLVM global variables in an implementation-defined format (the C/C++ front-end -currently uses working draft 7 of the Dwarf 3 standard).
+The approach used by the LLVM implementation is to use a small set + of intrinsic functions to define a + mapping between LLVM program objects and the source-level objects. The + description of the source-level program is maintained in LLVM metadata + in an implementation-defined format + (the C/C++ front-end currently uses working draft 7 of + the DWARF 3 + standard).
When a program is being debugged, a debugger interacts with the user and -turns the stored debug information into source-language specific information. -As such, a debugger must be aware of the source-language, and is thus tied to -a specific language of family of languages.
+ turns the stored debug information into source-language specific information. + As such, a debugger must be aware of the source-language, and is thus tied to + a specific language or family of languages.The role of debug information is to provide meta information normally + stripped away during the compilation process. This meta information provides + an LLVM user a relationship between generated code and the original program + source code.
+ +Currently, debug information is consumed by DwarfDebug to produce dwarf + information used by the gdb debugger. Other targets could use the same + information to produce stabs or other debug forms.
+ +It would also be reasonable to use debug information to feed profiling tools + for analysis of generated code, or, tools for reconstructing the original + source from generated code.
+ +TODO - expound a bit more.
+An extremely high priority of LLVM debugging information is to make it -interact well with optimizations and analysis. In particular, the LLVM debug -information provides the following guarantees:
+ interact well with optimizations and analysis. In particular, the LLVM debug + information provides the following guarantees:Basically, the debug information allows you to compile a program with -"-O0 -g" and get full debug information, allowing you to arbitrarily -modify the program as it executes from a debugger. Compiling a program with -"-O3 -g" gives you full debug information that is always available and -accurate for reading (e.g., you get accurate stack traces despite tail call -elimination and inlining), but you might lose the ability to modify the program -and call functions where were optimized out of the program, or inlined away -completely.
+ "-O0 -g" and get full debug information, allowing you to arbitrarily + modify the program as it executes from a debugger. Compiling a program with + "-O3 -g" gives you full debug information that is always available + and accurate for reading (e.g., you get accurate stack traces despite tail + call elimination and inlining), but you might lose the ability to modify the + program and call functions where were optimized out of the program, or + inlined away completely. + +LLVM test suite provides a + framework to test optimizer's handling of debugging information. It can be + run like this:
+ ++% cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level +% make TEST=dbgopt ++
This will test impact of debugging information on optimization passes. If + debugging information influences optimization passes then it will be reported + as a failure. See TestingGuide for more + information on LLVM test infrastructure and how to run various tests.
+ +LLVM debugging information has been carefully designed to make it possible -for the optimizer to optimize the program and debugging information without -necessarily having to know anything about debugging information. In particular, -the global constant merging pass automatically eliminates duplicated debugging -information (often caused by header files), the global dead code elimination -pass automatically deletes debugging information for a function if it decides to -delete the function, and the linker eliminates debug information when it merges -linkonce functions.
+ for the optimizer to optimize the program and debugging information without + necessarily having to know anything about debugging information. In + particular, the use of metadata avoids duplicated debugging information from + the beginning, and the global dead code elimination pass automatically + deletes debugging information for a function if it decides to delete the + function.To do this, most of the debugging information (descriptors for types, -variables, functions, source files, etc) is inserted by the language front-end -in the form of LLVM global variables. These LLVM global variables are no -different from any other global variables, except that they have a web of LLVM -intrinsic functions that point to them. If the last references to a particular -piece of debugging information are deleted (for example, by the --globaldce pass), the extraneous debug information will automatically -become dead and be removed by the optimizer.
+ variables, functions, source files, etc) is inserted by the language + front-end in the form of LLVM metadata.Debug information is designed to be agnostic about the target debugger and -debugging information representation (e.g. DWARF/Stabs/etc). It uses a generic -machine debug information pass to decode the information that represents -variables, types, functions, namespaces, etc: this allows for arbitrary -source-language semantics and type-systems to be used, as long as there is a -module written for the target debugger to interpret the information. In -addition, debug global variables are declared in the "llvm.metadata" -section. All values declared in this section are stripped away after target -debug information is constructed and before the program object is emitted.
+ debugging information representation (e.g. DWARF/Stabs/etc). It uses a + generic pass to decode the information that represents variables, types, + functions, namespaces, etc: this allows for arbitrary source-language + semantics and type-systems to be used, as long as there is a module + written for the target debugger to interpret the information.To provide basic functionality, the LLVM debugger does have to make some -assumptions about the source-level language being debugged, though it keeps -these to a minimum. The only common features that the LLVM debugger assumes -exist are source files, and program objects. These abstract objects are -used by a debugger to form stack traces, show information about local -variables, etc.
+ assumptions about the source-level language being debugged, though it keeps + these to a minimum. The only common features that the LLVM debugger assumes + exist are source files, + and program objects. These abstract + objects are used by a debugger to form stack traces, show information about + local variables, etc.This section of the documentation first describes the representation aspects -common to any source-language. The next section -describes the data layout conventions used by the C and C++ front-ends.
- -In consideration of the complexity and volume of debug information, LLVM -provides a specification for well formed debug global variables. The constant -value of each of these globals is one of a limited set of structures, known as -debug descriptors.
+ provides a specification for well formed debug descriptors.Consumers of LLVM debug information expect the descriptors for program -objects to start in a canonical format, but the descriptors can include -additional information appended at the end that is source-language specific. -All LLVM debugging information is versioned, allowing backwards compatibility in -the case that the core structures need to change in some way. Also, all -debugging information objects start with a tag to indicate what type of object -it is. The source-language is allowed to define its own objects, by using -unreserved tag numbers.
- -The fields of debug descriptors used internally by LLVM (MachineDebugInfo) -are restricted to only the simple data types int, uint, -bool, float, double, sbyte* and { }* -. References to arbitrary values are handled using a { }* and a -cast to { }* expression; typically references to other field -descriptors, arrays of descriptors or global variables.
- -- %llvm.dbg.object.type = type { - uint, ;; A tag - ... - } -- -
The first field of a descriptor is always an uint containing a tag -value identifying the content of the descriptor. The remaining fields are -specific to the descriptor. The values of tags are loosely bound to the tag -values of Dwarf information entries. However, that does not restrict the use of -the information supplied to Dwarf targets.
+ objects to start in a canonical format, but the descriptors can include + additional information appended at the end that is source-language + specific. All LLVM debugging information is versioned, allowing backwards + compatibility in the case that the core structures need to change in some + way. Also, all debugging information objects start with a tag to indicate + what type of object it is. The source-language is allowed to define its own + objects, by using unreserved tag numbers. We recommend using with tags in + the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base = + 0x1000.) + +The fields of debug descriptors used internally by LLVM + are restricted to only the simple data types i32, i1, + float, double, mdstring and mdnode.
+ ++!1 = metadata !{ + i32, ;; A tag + ... +} ++
The details of the various descriptors follow.
+ -The details of the various descriptors follow.
- +- %llvm.dbg.anchor.type = type { - uint, ;; Tag = 0 - uint ;; Tag of descriptors grouped by the anchor - } +!0 = metadata !{ + i32, ;; Tag = 17 + LLVMDebugVersion + ;; (DW_TAG_compile_unit) + i32, ;; Unused field. + i32, ;; DWARF language identifier (ex. DW_LANG_C89) + metadata, ;; Source file name + metadata, ;; Source file directory (includes trailing slash) + metadata ;; Producer (ex. "4.0.1 LLVM (LLVM research group)") + i1, ;; True if this is a main compile unit. + i1, ;; True if this is optimized. + metadata, ;; Flags + i32 ;; Runtime version + metadata ;; List of enums types + metadata ;; List of retained types + metadata ;; List of subprograms + metadata ;; List of global variables +}+
One important aspect of the LLVM debug representation is that it allows the -LLVM debugger to efficiently index all of the global objects without having the -scan the program. To do this, all of the global objects use "anchor" -descriptors with designated names. All of the global objects of a particular -type (e.g., compile units) contain a pointer to the anchor. This pointer allows -a debugger to use def-use chains to find all global objects of that type.
- -The following names are recognized as anchors by LLVM:
+These descriptors contain a source language ID for the file (we use the DWARF + 3.0 ID numbers, such as DW_LANG_C89, DW_LANG_C_plus_plus, + DW_LANG_Cobol74, etc), three strings describing the filename, + working directory of the compiler, and an identifier string for the compiler + that produced it.
-- %llvm.dbg.compile_units = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 17 } ;; DW_TAG_compile_unit - %llvm.dbg.global_variables = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 52 } ;; DW_TAG_variable - %llvm.dbg.subprograms = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 46 } ;; DW_TAG_subprogram -- -
Using anchors in this way (where the compile unit descriptor points to the -anchors, as opposed to having a list of compile unit descriptors) allows for the -standard dead global elimination and merging passes to automatically remove -unused debugging information. If the globals were kept track of through lists, -there would always be an object pointing to the descriptors, thus would never be -deleted.
+Compile unit descriptors provide the root context for objects declared in a + specific compilation unit. File descriptors are defined using this context. + These descriptors are collected by a named metadata + !llvm.dbg.cu. Compile unit descriptor keeps track of subprograms, + global variables and type information.
- %llvm.dbg.compile_unit.type = type { - uint, ;; Tag = 17 (DW_TAG_compile_unit) - { }*, ;; Compile unit anchor = cast = (%llvm.dbg.anchor.type* %llvm.dbg.compile_units to { }*) - uint, ;; LLVM debug version number = 2 - uint, ;; Dwarf language identifier (ex. DW_LANG_C89) - sbyte*, ;; Source file name - sbyte*, ;; Source file directory (includes trailing slash) - sbyte* ;; Producer (ex. "4.0.1 LLVM (LLVM research group)") - } +!0 = metadata !{ + i32, ;; Tag = 41 + LLVMDebugVersion + ;; (DW_TAG_file_type) + metadata, ;; Source file name + metadata, ;; Source file directory (includes trailing slash) + metadata ;; Unused +}+
These descriptors contain the version number for the debug info (currently -2), a source language ID for the file (we use the Dwarf 3.0 ID numbers, such as -DW_LANG_C89, DW_LANG_C_plus_plus, DW_LANG_Cobol74, -etc), three strings describing the filename, working directory of the compiler, -and an identifier string for the compiler that produced it.
+These descriptors contain information for a file. Global variables and top + level functions would be defined using this context.k File descriptors also + provide context for source line correspondence.
-Compile unit descriptors provide the root context for objects declared in a -specific source file. Global variables and top level functions would be defined -using this context. Compile unit descriptors also provide context for source -line correspondence.
+Each input file is encoded as a separate file descriptor in LLVM debugging + information output.
- %llvm.dbg.global_variable.type = type { - uint, ;; Tag = 52 (DW_TAG_variable) - { }*, ;; Global variable anchor = cast (%llvm.dbg.anchor.type* %llvm.dbg.global_variables to { }*), - { }*, ;; Reference to context descriptor - sbyte*, ;; Name - { }*, ;; Reference to compile unit where defined - int, ;; Line number where defined - { }*, ;; Reference to type descriptor - bool, ;; True if the global is local to compile unit (static) - bool, ;; True if the global is defined in the compile unit (not extern) - { }* ;; Reference to the global variable - } +!1 = metadata !{ + i32, ;; Tag = 52 + LLVMDebugVersion + ;; (DW_TAG_variable) + i32, ;; Unused field. + metadata, ;; Reference to context descriptor + metadata, ;; Name + metadata, ;; Display name (fully qualified C++ name) + metadata, ;; MIPS linkage name (for C++) + metadata, ;; Reference to file where defined + i32, ;; Line number where defined + metadata, ;; Reference to type descriptor + i1, ;; True if the global is local to compile unit (static) + i1, ;; True if the global is defined in the compile unit (not extern) + {}* ;; Reference to the global variable +}+
These descriptors provide debug information about globals variables. The -provide details such as name, type and where the variable is defined.
+provide details such as name, type and where the variable is defined. All +global variables are collected inside the named metadata +!llvm.dbg.cu.- %llvm.dbg.subprogram.type = type { - uint, ;; Tag = 46 (DW_TAG_subprogram) - { }*, ;; Subprogram anchor = cast (%llvm.dbg.anchor.type* %llvm.dbg.subprograms to { }*), - { }*, ;; Reference to context descriptor - sbyte*, ;; Name - { }*, ;; Reference to compile unit where defined - int, ;; Line number where defined - { }*, ;; Reference to type descriptor - bool, ;; True if the global is local to compile unit (static) - bool, ;; True if the global is defined in the compile unit (not extern) - { }* ;; Reference to array of member descriptors - } +!2 = metadata !{ + i32, ;; Tag = 46 + LLVMDebugVersion + ;; (DW_TAG_subprogram) + i32, ;; Unused field. + metadata, ;; Reference to context descriptor + metadata, ;; Name + metadata, ;; Display name (fully qualified C++ name) + metadata, ;; MIPS linkage name (for C++) + metadata, ;; Reference to file where defined + i32, ;; Line number where defined + metadata, ;; Reference to type descriptor + i1, ;; True if the global is local to compile unit (static) + i1, ;; True if the global is defined in the compile unit (not extern) + i32, ;; Line number where the scope of the subprogram begins + i32, ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual + i32, ;; Index into a virtual function + metadata, ;; indicates which base type contains the vtable pointer for the + ;; derived class + i32, ;; Flags - Artifical, Private, Protected, Explicit, Prototyped. + i1, ;; isOptimized + Function *,;; Pointer to LLVM function + metadata, ;; Lists function template parameters + metadata ;; Function declaration descriptor + metadata ;; List of function variables +}+
These descriptors provide debug information about functions, methods and -subprograms. The provide details such as name, return and argument types and -where the subprogram is defined.
- -The array of member descriptors is used to define arguments local variables -and nested blocks.
+ subprograms. They provide details such as name, return types and the source + location where the subprogram is defined. ++!3 = metadata !{ + i32, ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block) + metadata,;; Reference to context descriptor + i32, ;; Line number + i32, ;; Column number + metadata,;; Reference to source file + i32 ;; Unique ID to identify blocks from a template function +} +
This descriptor provides debug information about nested blocks within a + subprogram. The line number and column numbers are used to dinstinguish + two lexical blocks at same depth.
+- %llvm.dbg.block = type { - uint, ;; Tag = 13 (DW_TAG_lexical_block) - { }* ;; Reference to array of member descriptors - } +!3 = metadata !{ + i32, ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block) + metadata ;; Reference to the scope we're annotating with a file change + metadata,;; Reference to the file the scope is enclosed in. +}+
These descriptors provide debug information about nested blocks within a -subprogram. The array of member descriptors is used to define local variables -and deeper nested blocks.
+This descriptor provides a wrapper around a lexical scope to handle file + changes in the middle of a lexical block.
- %llvm.dbg.basictype.type = type { - uint, ;; Tag = 36 (DW_TAG_base_type) - { }*, ;; Reference to context (typically a compile unit) - sbyte*, ;; Name (may be "" for anonymous types) - { }*, ;; Reference to compile unit where defined (may be NULL) - int, ;; Line number where defined (may be 0) - uint, ;; Size in bits - uint, ;; Alignment in bits - uint, ;; Offset in bits - uint ;; Dwarf type encoding - } +!4 = metadata !{ + i32, ;; Tag = 36 + LLVMDebugVersion + ;; (DW_TAG_base_type) + metadata, ;; Reference to context + metadata, ;; Name (may be "" for anonymous types) + metadata, ;; Reference to file where defined (may be NULL) + i32, ;; Line number where defined (may be 0) + i64, ;; Size in bits + i64, ;; Alignment in bits + i64, ;; Offset in bits + i32, ;; Flags + i32 ;; DWARF type encoding +}+
These descriptors define primitive types used in the code. Example int, bool -and float. The context provides the scope of the type, which is usually the top -level. Since basic types are not usually user defined the compile unit and line -number can be left as NULL and 0. The size, alignment and offset are expressed -in bits and can be 64 bit values. The alignment is used to round the offset -when embedded in a composite type -(example to keep float doubles on 64 bit boundaries.) The offset is the bit -offset if embedded in a composite -type.
+ and float. The context provides the scope of the type, which is usually the + top level. Since basic types are not usually user defined the context + and line number can be left as NULL and 0. The size, alignment and offset + are expressed in bits and can be 64 bit values. The alignment is used to + round the offset when embedded in a + composite type (example to keep float + doubles on 64 bit boundaries.) The offset is the bit offset if embedded in + a composite type.The type encoding provides the details of the type. The values are typically -one of the following;
+ one of the following: +- DW_ATE_address = 1 - DW_ATE_boolean = 2 - DW_ATE_float = 4 - DW_ATE_signed = 5 - DW_ATE_signed_char = 6 - DW_ATE_unsigned = 7 - DW_ATE_unsigned_char = 8 +DW_ATE_address = 1 +DW_ATE_boolean = 2 +DW_ATE_float = 4 +DW_ATE_signed = 5 +DW_ATE_signed_char = 6 +DW_ATE_unsigned = 7 +DW_ATE_unsigned_char = 8+
- %llvm.dbg.derivedtype.type = type { - uint, ;; Tag (see below) - { }*, ;; Reference to context - sbyte*, ;; Name (may be "" for anonymous types) - { }*, ;; Reference to compile unit where defined (may be NULL) - int, ;; Line number where defined (may be 0) - uint, ;; Size in bits - uint, ;; Alignment in bits - uint, ;; Offset in bits - { }* ;; Reference to type derived from - } +!5 = metadata !{ + i32, ;; Tag (see below) + metadata, ;; Reference to context + metadata, ;; Name (may be "" for anonymous types) + metadata, ;; Reference to file where defined (may be NULL) + i32, ;; Line number where defined (may be 0) + i64, ;; Size in bits + i64, ;; Alignment in bits + i64, ;; Offset in bits + i32, ;; Flags to encode attributes, e.g. private + metadata, ;; Reference to type derived from + metadata, ;; (optional) Name of the Objective C property associated with + ;; Objective-C an ivar + metadata, ;; (optional) Name of the Objective C property getter selector. + metadata, ;; (optional) Name of the Objective C property setter selector. + i32 ;; (optional) Objective C property attributes. +}+
These descriptors are used to define types derived from other types. The value of the tag varies depending on the meaning. The following are possible -tag values;
+tag values: +- DW_TAG_formal_parameter = 5 - DW_TAG_member = 13 - DW_TAG_pointer_type = 15 - DW_TAG_reference_type = 16 - DW_TAG_typedef = 22 - DW_TAG_const_type = 38 - DW_TAG_volatile_type = 53 - DW_TAG_restrict_type = 55 +DW_TAG_formal_parameter = 5 +DW_TAG_member = 13 +DW_TAG_pointer_type = 15 +DW_TAG_reference_type = 16 +DW_TAG_typedef = 22 +DW_TAG_const_type = 38 +DW_TAG_volatile_type = 53 +DW_TAG_restrict_type = 55+
DW_TAG_member is used to define a member of a composite type or subprogram. The type of the member is the derived type. DW_TAG_formal_parameter -is used to define a member which is a formal argument of a subprogram.
+DW_TAG_member is used to define a member of + a composite type + or subprogram. The type of the member is + the derived + type. DW_TAG_formal_parameter is used to define a member which + is a formal argument of a subprogram.
-DW_TAG_typedef is used to -provide a name for the derived type.
+DW_TAG_typedef is used to provide a name for the derived type.
-DW_TAG_pointer_type, -DW_TAG_reference_type, DW_TAG_const_type, -DW_TAG_volatile_type and DW_TAG_restrict_type are used to -qualify the derived type.
+DW_TAG_pointer_type, DW_TAG_reference_type, + DW_TAG_const_type, DW_TAG_volatile_type and + DW_TAG_restrict_type are used to qualify + the derived type.
Derived type location can be determined -from the compile unit and line number. The size, alignment and offset are -expressed in bits and can be 64 bit values. The alignment is used to round the -offset when embedded in a composite type -(example to keep float doubles on 64 bit boundaries.) The offset is the bit -offset if embedded in a composite -type.
+ from the context and line number. The size, alignment and offset are + expressed in bits and can be 64 bit values. The alignment is used to round + the offset when embedded in a composite + type (example to keep float doubles on 64 bit boundaries.) The offset is + the bit offset if embedded in a composite + type. -Note that the void * type is expressed as a -llvm.dbg.derivedtype.type with tag of DW_TAG_pointer_type and -NULL derived type.
+Note that the void * type is expressed as a type derived from NULL. +
- %llvm.dbg.compositetype.type = type { - uint, ;; Tag (see below) - { }*, ;; Reference to context - sbyte*, ;; Name (may be "" for anonymous types) - { }*, ;; Reference to compile unit where defined (may be NULL) - int, ;; Line number where defined (may be 0) - uint, ;; Size in bits - uint, ;; Alignment in bits - uint, ;; Offset in bits - { }* ;; Reference to array of member descriptors - } +!6 = metadata !{ + i32, ;; Tag (see below) + metadata, ;; Reference to context + metadata, ;; Name (may be "" for anonymous types) + metadata, ;; Reference to file where defined (may be NULL) + i32, ;; Line number where defined (may be 0) + i64, ;; Size in bits + i64, ;; Alignment in bits + i64, ;; Offset in bits + i32, ;; Flags + metadata, ;; Reference to type derived from + metadata, ;; Reference to array of member descriptors + i32 ;; Runtime languages +}+
These descriptors are used to define types that are composed of 0 or more elements. The value of the tag varies depending on the meaning. The following -are possible tag values;
+are possible tag values: +- DW_TAG_array_type = 1 - DW_TAG_enumeration_type = 4 - DW_TAG_structure_type = 19 - DW_TAG_union_type = 23 +DW_TAG_array_type = 1 +DW_TAG_enumeration_type = 4 +DW_TAG_structure_type = 19 +DW_TAG_union_type = 23 +DW_TAG_vector_type = 259 +DW_TAG_subroutine_type = 21 +DW_TAG_inheritance = 28+
The vector flag indicates that an array type is a native packed vector.
-The members of array types (tag = DW_TAG_array_type) are subrange descriptors, each representing the range of -subscripts at that level of indexing.
+The members of array types (tag = DW_TAG_array_type) or vector types + (tag = DW_TAG_vector_type) are subrange + descriptors, each representing the range of subscripts at that level of + indexing.
The members of enumeration types (tag = DW_TAG_enumeration_type) are -enumerator descriptors, each representing the -definition of enumeration value -for the set.
+ enumerator descriptors, each representing + the definition of enumeration value for the set. All enumeration type + descriptors are collected inside the named metadata + !llvm.dbg.cu.The members of structure (tag = DW_TAG_structure_type) or union (tag -= DW_TAG_union_type) types are any one of the basic, derived -or composite type descriptors, each -representing a field member of the structure or union.
+ = DW_TAG_union_type) types are any one of + the basic, + derived + or composite type descriptors, each + representing a field member of the structure or union. + +For C++ classes (tag = DW_TAG_structure_type), member descriptors + provide information about base classes, static members and member + functions. If a member is a derived type + descriptor and has a tag of DW_TAG_inheritance, then the type + represents a base class. If the member of is + a global variable descriptor then it + represents a static member. And, if the member is + a subprogram descriptor then it represents + a member function. For static members and member + functions, getName() returns the members link or the C++ mangled + name. getDisplayName() the simplied version of the name.
+ +The first member of subroutine (tag = DW_TAG_subroutine_type) type + elements is the return type for the subroutine. The remaining elements are + the formal arguments to the subroutine.
Composite type location can be -determined from the compile unit and line number. The size, alignment and -offset are expressed in bits and can be 64 bit values. The alignment is used to -round the offset when embedded in a composite -type (as an example, to keep float doubles on 64 bit boundaries.) The offset -is the bit offset if embedded in a composite -type.
+ determined from the context and line number. The size, alignment and + offset are expressed in bits and can be 64 bit values. The alignment is used + to round the offset when embedded in + a composite type (as an example, to keep + float doubles on 64 bit boundaries.) The offset is the bit offset if embedded + in a composite type.- %llvm.dbg.subrange.type = type { - uint, ;; Tag = 33 (DW_TAG_subrange_type) - uint, ;; Low value - uint ;; High value - } +!42 = metadata !{ + i32, ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type) + i64, ;; Low value + i64 ;; High value +}+
These descriptors are used to define ranges of array subscripts for an array -composite type. The low value defines the -lower bounds typically zero for C/C++. The high value is the upper bounds. -Values are 64 bit. High - low + 1 is the size of the array. If -low == high the array will be unbounded.
+ composite type. The low value defines + the lower bounds typically zero for C/C++. The high value is the upper + bounds. Values are 64 bit. High - low + 1 is the size of the array. If low + > high the array bounds are not included in generated debugging information. +- %llvm.dbg.enumerator.type = type { - uint, ;; Tag = 40 (DW_TAG_enumerator) - sbyte*, ;; Name - uint ;; Value - } +!6 = metadata !{ + i32, ;; Tag = 40 + LLVMDebugVersion + ;; (DW_TAG_enumerator) + metadata, ;; Name + i64 ;; Value +}+
These descriptors are used to define members of an enumeration composite type, it associates the name to the -value.
+These descriptors are used to define members of an + enumeration composite type, it + associates the name to the value.
LLVM uses several intrinsic functions (name prefixed with "llvm.dbg") to -provide debug information at various points in generated code.
++!7 = metadata !{ + i32, ;; Tag (see below) + metadata, ;; Context + metadata, ;; Name + metadata, ;; Reference to file where defined + i32, ;; 24 bit - Line number where defined + ;; 8 bit - Argument number. 1 indicates 1st argument. + metadata, ;; Type descriptor + i32, ;; flags + metadata ;; (optional) Reference to inline location +} +
These descriptors are used to define variables local to a sub program. The + value of the tag depends on the usage of the variable:
-- void %llvm.dbg.stoppoint( uint, uint, %llvm.dbg.compile_unit* ) +DW_TAG_auto_variable = 256 +DW_TAG_arg_variable = 257 +DW_TAG_return_variable = 258+
An auto variable is any variable declared in the body of the function. An + argument variable is any variable that appears as a formal argument to the + function. A return variable is used to track the result of a function and + has no source correspondent.
-This intrinsic is used to provide correspondence between the source file and -the generated code. The first argument is the line number (base 1), second -argument si the column number (0 if unknown) and the third argument the source -compile unit. Code following a call to this intrinsic will have been defined in -close proximity of the line, column and file. This information holds until the -next call to lvm.dbg.stoppoint.
+The context is either the subprogram or block where the variable is defined. + Name the source variable name. Context and line indicate where the + variable was defined. Type descriptor defines the declared type of the + variable.
- void %llvm.dbg.func.start( %llvm.dbg.subprogram.type* ) -+ +
This intrinsic is used to link the debug information in %llvm.dbg.subprogram to the function. It also -defines the beginning of the function's declarative region (scope.) The -intrinsic should be called early in the function after the all the alloca -instructions.
+LLVM uses several intrinsic functions (name prefixed with "llvm.dbg") to + provide debug information at various points in generated code.
- +- void %llvm.dbg.region.start() + void %llvm.dbg.declare(metadata, metadata)-
This intrinsic is used to define the beginning of a declarative scope (ex. -block) for local language elements. It should be paired off with a closing -%llvm.dbg.region.end.
- +This intrinsic provides information about a local element (e.g., variable). The + first argument is metadata holding the alloca for the variable. The + second argument is metadata containing a description of the variable.
- void %llvm.dbg.region.end() + void %llvm.dbg.value(metadata, i64, metadata)-
This intrinsic is used to define the end of a declarative scope (ex. block) -for local language elements. It should be paired off with an opening %llvm.dbg.region.start or %llvm.dbg.func.start.
- +This intrinsic provides information when a user source variable is set to a + new value. The first argument is the new value (wrapped as metadata). The + second argument is the offset in the user source variable where the new value + is written. The third argument is metadata containing a description of the + user source variable.
In many languages, the local variables in functions can have their lifetimes + or scopes limited to a subset of a function. In the C family of languages, + for example, variables are only live (readable and writable) within the + source block that they are defined in. In functional languages, values are + only readable after they have been defined. Though this is a very obvious + concept, it is non-trivial to model in LLVM, because it has no notion of + scoping in this sense, and does not want to be tied to a language's scoping + rules.
+ +In order to handle this, the LLVM debug format uses the metadata attached to + llvm instructions to encode line number and scoping information. Consider + the following C fragment, for example:
+ +- void %llvm.dbg.declare( {} *, ... ) +1. void foo() { +2. int X = 21; +3. int Y = 22; +4. { +5. int Z = 23; +6. Z = X; +7. } +8. X = Y; +9. }- -
This intrinsic provides information about a local element (ex. variable.) -TODO - details.
-Compiled to LLVM, this function would be represented like this:
-+define void @foo() nounwind ssp { +entry: + %X = alloca i32, align 4 ; <i32*> [#uses=4] + %Y = alloca i32, align 4 ; <i32*> [#uses=4] + %Z = alloca i32, align 4 ; <i32*> [#uses=3] + %0 = bitcast i32* %X to {}* ; <{}*> [#uses=1] + call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7 + store i32 21, i32* %X, !dbg !8 + %1 = bitcast i32* %Y to {}* ; <{}*> [#uses=1] + call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10 + store i32 22, i32* %Y, !dbg !11 + %2 = bitcast i32* %Z to {}* ; <{}*> [#uses=1] + call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14 + store i32 23, i32* %Z, !dbg !15 + %tmp = load i32* %X, !dbg !16 ; <i32> [#uses=1] + %tmp1 = load i32* %Y, !dbg !16 ; <i32> [#uses=1] + %add = add nsw i32 %tmp, %tmp1, !dbg !16 ; <i32> [#uses=1] + store i32 %add, i32* %Z, !dbg !16 + %tmp2 = load i32* %Y, !dbg !17 ; <i32> [#uses=1] + store i32 %tmp2, i32* %X, !dbg !17 + ret void, !dbg !18 +} -+LLVM debugger "stop points" are a key part of the debugging representation -that allows the LLVM to maintain simple semantics for debugging optimized code. The basic idea is that the -front-end inserts calls to the %llvm.dbg.stoppoint intrinsic -function at every point in the program where a debugger should be able to -inspect the program (these correspond to places a debugger stops when you -"step" through it). The front-end can choose to place these as -fine-grained as it would like (for example, before every subexpression -evaluated), but it is recommended to only put them after every source statement -that includes executable code.
+declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone + +!0 = metadata !{i32 459008, metadata !1, metadata !"X", + metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ] +!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ] +!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo", + metadata !"foo", metadata !3, i32 1, metadata !4, + i1 false, i1 true}; [DW_TAG_subprogram ] +!3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c", + metadata !"/private/tmp", metadata !"clang 1.1", i1 true, + i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ] +!4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0, + i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ] +!5 = metadata !{null} +!6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0, + i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ] +!7 = metadata !{i32 2, i32 7, metadata !1, null} +!8 = metadata !{i32 2, i32 3, metadata !1, null} +!9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3, + metadata !6}; [ DW_TAG_auto_variable ] +!10 = metadata !{i32 3, i32 7, metadata !1, null} +!11 = metadata !{i32 3, i32 3, metadata !1, null} +!12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5, + metadata !6}; [ DW_TAG_auto_variable ] +!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ] +!14 = metadata !{i32 5, i32 9, metadata !13, null} +!15 = metadata !{i32 5, i32 5, metadata !13, null} +!16 = metadata !{i32 6, i32 5, metadata !13, null} +!17 = metadata !{i32 8, i32 3, metadata !1, null} +!18 = metadata !{i32 9, i32 1, metadata !2, null} +
Using calls to this intrinsic function to demark legal points for the -debugger to inspect the program automatically disables any optimizations that -could potentially confuse debugging information. To non-debug-information-aware -transformations, these calls simply look like calls to an external function, -which they must assume to do anything (including reading or writing to any part -of reachable memory). On the other hand, it does not impact many optimizations, -such as code motion of non-trapping instructions, nor does it impact -optimization of subexpressions, code duplication transformations, or basic-block -reordering transformations.
+This example illustrates a few important details about LLVM debugging + information. In particular, it shows how the llvm.dbg.declare + intrinsic and location information, which are attached to an instruction, + are applied together to allow a debugger to analyze the relationship between + statements, variable definitions, and the code used to implement the + function.
++call void @llvm.dbg.declare(metadata, metadata !0), !dbg !7 +
The first intrinsic + %llvm.dbg.declare + encodes debugging information for the variable X. The metadata + !dbg !7 attached to the intrinsic provides scope information for the + variable X.
- -+!7 = metadata !{i32 2, i32 7, metadata !1, null} +!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ] +!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", + metadata !"foo", metadata !"foo", metadata !3, i32 1, + metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ] +
In many languages, the local variables in functions can have their lifetime -or scope limited to a subset of a function. In the C family of languages, for -example, variables are only live (readable and writable) within the source block -that they are defined in. In functional languages, values are only readable -after they have been defined. Though this is a very obvious concept, it is also -non-trivial to model in LLVM, because it has no notion of scoping in this sense, -and does not want to be tied to a language's scoping rules.
+Here !7 is metadata providing location information. It has four + fields: line number, column number, scope, and original scope. The original + scope represents inline location if this instruction is inlined inside a + caller, and is null otherwise. In this example, scope is encoded by + !1. !1 represents a lexical block inside the scope + !2, where !2 is a + subprogram descriptor. This way the + location information attached to the intrinsics indicates that the + variable X is declared at line number 2 at a function level scope in + function foo.
-In order to handle this, the LLVM debug format uses the notion of "regions" -of a function, delineated by calls to intrinsic functions. These intrinsic -functions define new regions of the program and indicate when the region -lifetime expires. Consider the following C fragment, for example:
+Now lets take another example.
+-1. void foo() { -2. int X = ...; -3. int Y = ...; -4. { -5. int Z = ...; -6. ... -7. } -8. ... -9. } +call void @llvm.dbg.declare(metadata, metadata !12), !dbg !14+
Compiled to LLVM, this function would be represented like this:
+The second intrinsic + %llvm.dbg.declare + encodes debugging information for variable Z. The metadata + !dbg !14 attached to the intrinsic provides scope information for + the variable Z.
+-void %foo() { -entry: - %X = alloca int - %Y = alloca int - %Z = alloca int - - ... - - call void %llvm.dbg.func.start( %llvm.dbg.subprogram.type* %llvm.dbg.subprogram ) - - call void %llvm.dbg.stoppoint( uint 2, uint 2, %llvm.dbg.compile_unit* %llvm.dbg.compile_unit ) - - call void %llvm.dbg.declare({}* %X, ...) - call void %llvm.dbg.declare({}* %Y, ...) - - ;; Evaluate expression on line 2, assigning to X. - - call void %llvm.dbg.stoppoint( uint 3, uint 2, %llvm.dbg.compile_unit* %llvm.dbg.compile_unit ) - - ;; Evaluate expression on line 3, assigning to Y. - - call void %llvm.region.start() - call void %llvm.dbg.stoppoint( uint 5, uint 4, %llvm.dbg.compile_unit* %llvm.dbg.compile_unit ) - call void %llvm.dbg.declare({}* %X, ...) - - ;; Evaluate expression on line 5, assigning to Z. - - call void %llvm.dbg.stoppoint( uint 7, uint 2, %llvm.dbg.compile_unit* %llvm.dbg.compile_unit ) - call void %llvm.region.end() - - call void %llvm.dbg.stoppoint( uint 9, uint 2, %llvm.dbg.compile_unit* %llvm.dbg.compile_unit ) - - call void %llvm.region.end() - - ret void -} +!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ] +!14 = metadata !{i32 5, i32 9, metadata !13, null}+
This example illustrates a few important details about the LLVM debugging -information. In particular, it shows how the various intrinsics are applied -together to allow a debugger to analyze the relationship between statements, -variable definitions, and the code used to implement the function.
- -The first intrinsic %llvm.dbg.func.start provides -a link with the subprogram descriptor -containing the details of this function. This call also defines the beginning -of the function region, bounded by the %llvm.region.end at the end of -the function. This region is used to bracket the lifetime of variables declared -within. For a function, this outer region defines a new stack frame whose -lifetime ends when the region is ended.
- -It is possible to define inner regions for short term variables by using the -%llvm.region.start and %llvm.region.end to bound a -region. The inner region in this example would be for the block containing the -declaration of Z.
- -Using regions to represent the boundaries of source-level functions allow -LLVM interprocedural optimizations to arbitrarily modify LLVM functions without -having to worry about breaking mapping information between the LLVM code and the -and source-level program. In particular, the inliner requires no modification -to support inlining with debugging information: there is no explicit correlation -drawn between LLVM functions and their source-level counterparts (note however, -that if the inliner inlines all instances of a non-strong-linkage function into -its caller that it will not be possible for the user to manually invoke the -inlined function from a debugger).
- -Once the function has been defined, the stopping point corresponding to -line #2 (column #2) of the function is encountered. At this point in the -function, no local variables are live. As lines 2 and 3 of the example -are executed, their variable definitions are introduced into the program using -%llvm.dbg.declare, without the -need to specify a new region. These variables do not require new regions to be -introduced because they go out of scope at the same point in the program: line -9.
- -In contrast, the Z variable goes out of scope at a different time, -on line 7. For this reason, it is defined within the inner region, which kills -the availability of Z before the code for line 8 is executed. In this -way, regions can support arbitrary source-language scoping rules, as long as -they can only be nested (ie, one scope cannot partially overlap with a part of -another scope).
- -It is worth noting that this scoping mechanism is used to control scoping of -all declarations, not just variable declarations. For example, the scope of a -C++ using declaration is controlled with this couldchange how name lookup is -performed.
+Here !14 indicates that Z is declared at line number 5 and + column number 9 inside of lexical scope !13. The lexical scope + itself resides inside of lexical scope !1 described above.
-The scope information attached with each instruction provides a + straightforward way to find instructions covered by a scope.
+The C and C++ front-ends represent information about the program in a format -that is effectively identical to Dwarf 3.0 in terms of -information content. This allows code generators to trivially support native -debuggers by generating standard dwarf information, and contains enough -information for non-dwarf targets to translate it as needed.
+ that is effectively identical + to DWARF 3.0 in + terms of information content. This allows code generators to trivially + support native debuggers by generating standard dwarf information, and + contains enough information for non-dwarf targets to translate it as + needed.This section describes the forms used to represent C and C++ programs. Other -languages could pattern themselves after this (which itself is tuned to -representing programs in the same way that Dwarf 3 does), or they could choose -to provide completely different forms if they don't fit into the Dwarf model. -As support for debugging information gets added to the various LLVM -source-language front-ends, the information used should be documented here.
+ languages could pattern themselves after this (which itself is tuned to + representing programs in the same way that DWARF 3 does), or they could + choose to provide completely different forms if they don't fit into the DWARF + model. As support for debugging information gets added to the various LLVM + source-language front-ends, the information used should be documented + here.The following sections provide examples of various C/C++ constructs and the -debug information that would best describe those constructs.
- -Given the source files "MySource.cpp" and "MyHeader.h" located in the -directory "/Users/mine/sources", the following code;
+Given the source files MySource.cpp and MyHeader.h located + in the directory /Users/mine/sources, the following code:
+#include "MyHeader.h" @@ -941,553 +1074,619 @@ int main(int argc, char *argv[]) { return 0; }+
a C/C++ front-end would generate the following descriptors;
+a C/C++ front-end would generate the following descriptors:
+... ;; -;; Define types used. In this case we need one for compile unit anchors and one -;; for compile units. +;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp". ;; -%llvm.dbg.anchor.type = type { uint, uint } -%llvm.dbg.compile_unit.type = type { uint, { }*, uint, uint, sbyte*, sbyte*, sbyte* } -... -;; -;; Define the anchor for compile units. Note that the second field of the -;; anchor is 17, which is the same as the tag for compile units -;; (17 = DW_TAG_compile_unit.) -;; -%llvm.dbg.compile_units = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 17 }, section "llvm.metadata" +!2 = metadata !{ + i32 524305, ;; Tag + i32 0, ;; Unused + i32 4, ;; Language Id + metadata !"MySource.cpp", + metadata !"/Users/mine/sources", + metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)", + i1 true, ;; Main Compile Unit + i1 false, ;; Optimized compile unit + metadata !"", ;; Compiler flags + i32 0} ;; Runtime version ;; -;; Define the compile unit for the source file "/Users/mine/sources/MySource.cpp". -;; -%llvm.dbg.compile_unit1 = internal constant %llvm.dbg.compile_unit.type { - uint 17, - { }* cast (%llvm.dbg.anchor.type* %llvm.dbg.compile_units to { }*), - uint 1, - uint 1, - sbyte* getelementptr ([13 x sbyte]* %str1, int 0, int 0), - sbyte* getelementptr ([21 x sbyte]* %str2, int 0, int 0), - sbyte* getelementptr ([33 x sbyte]* %str3, int 0, int 0) }, section "llvm.metadata" - +;; Define the file for the file "/Users/mine/sources/MySource.cpp". ;; -;; Define the compile unit for the header file "/Users/mine/sources/MyHeader.h". -;; -%llvm.dbg.compile_unit2 = internal constant %llvm.dbg.compile_unit.type { - uint 17, - { }* cast (%llvm.dbg.anchor.type* %llvm.dbg.compile_units to { }*), - uint 1, - uint 1, - sbyte* getelementptr ([11 x sbyte]* %str4, int 0, int 0), - sbyte* getelementptr ([21 x sbyte]* %str2, int 0, int 0), - sbyte* getelementptr ([33 x sbyte]* %str3, int 0, int 0) }, section "llvm.metadata" +!1 = metadata !{ + i32 524329, ;; Tag + metadata !"MySource.cpp", + metadata !"/Users/mine/sources", + metadata !2 ;; Compile unit +} ;; -;; Define each of the strings used in the compile units. +;; Define the file for the file "/Users/mine/sources/Myheader.h" ;; -%str1 = internal constant [13 x sbyte] c"MySource.cpp\00", section "llvm.metadata"; -%str2 = internal constant [21 x sbyte] c"/Users/mine/sources/\00", section "llvm.metadata"; -%str3 = internal constant [33 x sbyte] c"4.0.1 LLVM (LLVM research group)\00", section "llvm.metadata"; -%str4 = internal constant [11 x sbyte] c"MyHeader.h\00", section "llvm.metadata"; +!3 = metadata !{ + i32 524329, ;; Tag + metadata !"Myheader.h" + metadata !"/Users/mine/sources", + metadata !2 ;; Compile unit +} + ...+
llvm::Instruction provides easy access to metadata attached with an +instruction. One can extract line number information encoded in LLVM IR +using Instruction::getMetadata() and +DILocation::getLineNumber(). +
+ if (MDNode *N = I->getMetadata("dbg")) { // Here I is an LLVM instruction + DILocation Loc(N); // DILocation is in DebugInfo.h + unsigned Line = Loc.getLineNumber(); + StringRef File = Loc.getFilename(); + StringRef Dir = Loc.getDirectory(); + } +
Given an integer global variable declared as follows;
+Given an integer global variable declared as follows:
+int MyGlobal = 100;+
a C/C++ front-end would generate the following descriptors;
+a C/C++ front-end would generate the following descriptors:
+;; -;; Define types used. One for global variable anchors, one for the global -;; variable descriptor, one for the global's basic type and one for the global's -;; compile unit. -;; -%llvm.dbg.anchor.type = type { uint, uint } -%llvm.dbg.global_variable.type = type { uint, { }*, { }*, sbyte*, { }*, uint, { }*, bool, bool, { }*, uint } -%llvm.dbg.basictype.type = type { uint, { }*, sbyte*, { }*, int, uint, uint, uint, uint } -%llvm.dbg.compile_unit.type = ... -... -;; ;; Define the global itself. ;; %MyGlobal = global int 100 ... ;; -;; Define the anchor for global variables. Note that the second field of the -;; anchor is 52, which is the same as the tag for global variables -;; (52 = DW_TAG_variable.) +;; List of debug info of globals ;; -%llvm.dbg.global_variables = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 52 }, section "llvm.metadata" +!llvm.dbg.cu = !{!0} + +;; Define the compile unit. +!0 = metadata !{ + i32 786449, ;; Tag + i32 0, ;; Context + i32 4, ;; Language + metadata !"foo.cpp", ;; File + metadata !"/Volumes/Data/tmp", ;; Directory + metadata !"clang version 3.1 ", ;; Producer + i1 true, ;; Deprecated field + i1 false, ;; "isOptimized"? + metadata !"", ;; Flags + i32 0, ;; Runtime Version + metadata !1, ;; Enum Types + metadata !1, ;; Retained Types + metadata !1, ;; Subprograms + metadata !3 ;; Global Variables +} ; [ DW_TAG_compile_unit ] + +;; The Array of Global Variables +!3 = metadata !{ + metadata !4 +} +!4 = metadata !{ + metadata !5 +} + +;; +;; Define the global variable itself. ;; -;; Define the global variable descriptor. Note the reference to the global -;; variable anchor and the global variable itself. +!5 = metadata !{ + i32 786484, ;; Tag + i32 0, ;; Unused + null, ;; Unused + metadata !"MyGlobal", ;; Name + metadata !"MyGlobal", ;; Display Name + metadata !"", ;; Linkage Name + metadata !6, ;; File + i32 1, ;; Line + metadata !7, ;; Type + i32 0, ;; IsLocalToUnit + i32 1, ;; IsDefinition + i32* @MyGlobal ;; LLVM-IR Value +} ; [ DW_TAG_variable ] + ;; -%llvm.dbg.global_variable = internal constant %llvm.dbg.global_variable.type { - uint 52, - { }* cast (%llvm.dbg.anchor.type* %llvm.dbg.global_variables to { }*), - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([9 x sbyte]* %str1, int 0, int 0), - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - uint 1, - { }* cast (%llvm.dbg.basictype.type* %llvm.dbg.basictype to { }*), - bool false, - bool true, - { }* cast (int* %MyGlobal to { }*) }, section "llvm.metadata" - +;; Define the file ;; -;; Define the basic type of 32 bit signed integer. Note that since int is an -;; intrinsic type the source file is NULL and line 0. -;; -%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([4 x sbyte]* %str2, int 0, int 0), - { }* null, - int 0, - uint 32, - uint 32, - uint 0, - uint 5 }, section "llvm.metadata" +!6 = metadata !{ + i32 786473, ;; Tag + metadata !"foo.cpp", ;; File + metadata !"/Volumes/Data/tmp", ;; Directory + null ;; Unused +} ; [ DW_TAG_file_type ] ;; -;; Define the names of the global variable and basic type. +;; Define the type ;; -%str1 = internal constant [9 x sbyte] c"MyGlobal\00", section "llvm.metadata" -%str2 = internal constant [4 x sbyte] c"int\00", section "llvm.metadata" +!7 = metadata !{ + i32 786468, ;; Tag + null, ;; Unused + metadata !"int", ;; Name + null, ;; Unused + i32 0, ;; Line + i64 32, ;; Size in Bits + i64 32, ;; Align in Bits + i64 0, ;; Offset + i32 0, ;; Flags + i32 5 ;; Encoding +} ; [ DW_TAG_base_type ] ++
Given a function declared as follows;
+Given a function declared as follows:
+int main(int argc, char *argv[]) { return 0; }+
a C/C++ front-end would generate the following descriptors;
+a C/C++ front-end would generate the following descriptors:
+-;; -;; Define types used. One for subprogram anchors, one for the subprogram -;; descriptor, one for the global's basic type and one for the subprogram's -;; compile unit. -;; -%llvm.dbg.subprogram.type = type { uint, { }*, { }*, sbyte*, { }*, bool, bool, { }* } -%llvm.dbg.anchor.type = type { uint, uint } -%llvm.dbg.compile_unit.type = ... - ;; ;; Define the anchor for subprograms. Note that the second field of the ;; anchor is 46, which is the same as the tag for subprograms ;; (46 = DW_TAG_subprogram.) ;; -%llvm.dbg.subprograms = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 46 }, section "llvm.metadata" - -;; -;; Define the descriptor for the subprogram. TODO - more details. -;; -%llvm.dbg.subprogram = internal constant %llvm.dbg.subprogram.type { - uint 46, - { }* cast (%llvm.dbg.anchor.type* %llvm.dbg.subprograms to { }*), - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([5 x sbyte]* %str1, int 0, int 0), - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - uint 1, - { }* null, - bool false, - bool true, - null }, section "llvm.metadata" - -;; -;; Define the name of the subprogram. -;; -%str1 = internal constant [5 x sbyte] c"main\00", section "llvm.metadata" - +!6 = metadata !{ + i32 524334, ;; Tag + i32 0, ;; Unused + metadata !1, ;; Context + metadata !"main", ;; Name + metadata !"main", ;; Display name + metadata !"main", ;; Linkage name + metadata !1, ;; File + i32 1, ;; Line number + metadata !4, ;; Type + i1 false, ;; Is local + i1 true, ;; Is definition + i32 0, ;; Virtuality attribute, e.g. pure virtual function + i32 0, ;; Index into virtual table for C++ methods + i32 0, ;; Type that holds virtual table. + i32 0, ;; Flags + i1 false, ;; True if this function is optimized + Function *, ;; Pointer to llvm::Function + null ;; Function template parameters +} ;; ;; Define the subprogram itself. ;; -int %main(int %argc, sbyte** %argv) { +define i32 @main(i32 %argc, i8** %argv) { ... }+
The following are the basic type descriptors for C/C++ core types;
- -The following are the basic type descriptors for C/C++ core types:
--%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([5 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 32, - uint 32, - uint 0, - uint 2 }, section "llvm.metadata" -%str1 = internal constant [5 x sbyte] c"bool\00", section "llvm.metadata" +!2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"bool", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 8, ;; Size in Bits + i64 8, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 2 ;; Encoding +}+
-%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([5 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 8, - uint 8, - uint 0, - uint 6 }, section "llvm.metadata" -%str1 = internal constant [5 x sbyte] c"char\00", section "llvm.metadata" +!2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"char", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 8, ;; Size in Bits + i64 8, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 6 ;; Encoding +}+
-%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([14 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 8, - uint 8, - uint 0, - uint 8 }, section "llvm.metadata" -%str1 = internal constant [14 x sbyte] c"unsigned char\00", section "llvm.metadata" +!2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"unsigned char", + metadata !1, ;; File + i32 0, ;; Line number + i64 8, ;; Size in Bits + i64 8, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 8 ;; Encoding +}+
-%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([10 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 16, - uint 16, - uint 0, - uint 5 }, section "llvm.metadata" -%str1 = internal constant [10 x sbyte] c"short int\00", section "llvm.metadata" +!2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"short int", + metadata !1, ;; File + i32 0, ;; Line number + i64 16, ;; Size in Bits + i64 16, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 5 ;; Encoding +}+
-%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([19 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 16, - uint 16, - uint 0, - uint 7 }, section "llvm.metadata" -%str1 = internal constant [19 x sbyte] c"short unsigned int\00", section "llvm.metadata" +!2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"short unsigned int", + metadata !1, ;; File + i32 0, ;; Line number + i64 16, ;; Size in Bits + i64 16, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 7 ;; Encoding +}+
-%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([4 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 32, - uint 32, - uint 0, - uint 5 }, section "llvm.metadata" -%str1 = internal constant [4 x sbyte] c"int\00", section "llvm.metadata" -+!2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"int", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in Bits + i64 32, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 5 ;; Encoding +} +
-%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([13 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 32, - uint 32, - uint 0, - uint 7 }, section "llvm.metadata" -%str1 = internal constant [13 x sbyte] c"unsigned int\00", section "llvm.metadata" +!2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"unsigned int", + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in Bits + i64 32, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 7 ;; Encoding +}+
-%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([14 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 64, - uint 64, - uint 0, - uint 5 }, section "llvm.metadata" -%str1 = internal constant [14 x sbyte] c"long long int\00", section "llvm.metadata" +!2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"long long int", + metadata !1, ;; File + i32 0, ;; Line number + i64 64, ;; Size in Bits + i64 64, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 5 ;; Encoding +}+
-%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([23 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 64, - uint 64, - uint 0, - uint 7 }, section "llvm.metadata" -%str1 = internal constant [23 x sbyte] c"long long unsigned int\00", section "llvm.metadata" +!2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"long long unsigned int", + metadata !1, ;; File + i32 0, ;; Line number + i64 64, ;; Size in Bits + i64 64, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 7 ;; Encoding +}+
-%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([6 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 32, - uint 32, - uint 0, - uint 4 }, section "llvm.metadata" -%str1 = internal constant [6 x sbyte] c"float\00", section "llvm.metadata" +!2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"float", + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in Bits + i64 32, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 4 ;; Encoding +}+
-%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([7 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 64, - uint 64, - uint 0, - uint 4 }, section "llvm.metadata" -%str1 = internal constant [7 x sbyte] c"double\00", section "llvm.metadata" +!2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"double",;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 64, ;; Size in Bits + i64 64, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 4 ;; Encoding +}+
Given the following as an example of C/C++ derived type;
+Given the following as an example of C/C++ derived type:
+typedef const int *IntPtr;+
a C/C++ front-end would generate the following descriptors;
+a C/C++ front-end would generate the following descriptors:
+;; ;; Define the typedef "IntPtr". ;; -%llvm.dbg.derivedtype1 = internal constant %llvm.dbg.derivedtype.type { - uint 22, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([7 x sbyte]* %str1, int 0, int 0), - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - int 1, - uint 0, - uint 0, - uint 0, - { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype2 to { }*) }, section "llvm.metadata" -%str1 = internal constant [7 x sbyte] c"IntPtr\00", section "llvm.metadata" +!2 = metadata !{ + i32 524310, ;; Tag + metadata !1, ;; Context + metadata !"IntPtr", ;; Name + metadata !3, ;; File + i32 0, ;; Line number + i64 0, ;; Size in bits + i64 0, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + metadata !4 ;; Derived From type +} ;; ;; Define the pointer type. ;; -%llvm.dbg.derivedtype2 = internal constant %llvm.dbg.derivedtype.type { - uint 15, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* null, - { }* null, - int 0, - uint 32, - uint 32, - uint 0, - { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype3 to { }*) }, section "llvm.metadata" - +!4 = metadata !{ + i32 524303, ;; Tag + metadata !1, ;; Context + metadata !"", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 64, ;; Size in bits + i64 64, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + metadata !5 ;; Derived From type +} ;; ;; Define the const type. ;; -%llvm.dbg.derivedtype3 = internal constant %llvm.dbg.derivedtype.type { - uint 38, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* null, - { }* null, - int 0, - uint 0, - uint 0, - uint 0, - { }* cast (%llvm.dbg.basictype.type* %llvm.dbg.basictype1 to { }*) }, section "llvm.metadata" - +!5 = metadata !{ + i32 524326, ;; Tag + metadata !1, ;; Context + metadata !"", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + metadata !6 ;; Derived From type +} ;; ;; Define the int type. ;; -%llvm.dbg.basictype1 = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([4 x sbyte]* %str2, int 0, int 0), - { }* null, - int 0, - uint 32, - uint 32, - uint 0, - uint 5 }, section "llvm.metadata" -%str2 = internal constant [4 x sbyte] c"int\00", section "llvm.metadata" +!6 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"int", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + 5 ;; Encoding +}+
Given the following as an example of C/C++ struct type;
+Given the following as an example of C/C++ struct type:
+struct Color { unsigned Red; @@ -1495,106 +1694,112 @@ struct Color { unsigned Blue; };+
a C/C++ front-end would generate the following descriptors;
+a C/C++ front-end would generate the following descriptors:
+;; ;; Define basic type for unsigned int. ;; -%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { - uint 36, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([13 x sbyte]* %str1, int 0, int 0), - { }* null, - int 0, - uint 32, - uint 32, - uint 0, - uint 7 }, section "llvm.metadata" -%str1 = internal constant [13 x sbyte] c"unsigned int\00", section "llvm.metadata" - +!5 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"unsigned int", + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in Bits + i64 32, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 7 ;; Encoding +} ;; ;; Define composite type for struct Color. ;; -%llvm.dbg.compositetype = internal constant %llvm.dbg.compositetype.type { - uint 19, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([6 x sbyte]* %str2, int 0, int 0), - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - int 1, - uint 96, - uint 32, - uint 0, - { }* null, - { }* cast ([3 x { }*]* %llvm.dbg.array to { }*) }, section "llvm.metadata" -%str2 = internal constant [6 x sbyte] c"Color\00", section "llvm.metadata" +!2 = metadata !{ + i32 524307, ;; Tag + metadata !1, ;; Context + metadata !"Color", ;; Name + metadata !1, ;; Compile unit + i32 1, ;; Line number + i64 96, ;; Size in bits + i64 32, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + null, ;; Derived From + metadata !3, ;; Elements + i32 0 ;; Runtime Language +} ;; ;; Define the Red field. ;; -%llvm.dbg.derivedtype1 = internal constant %llvm.dbg.derivedtype.type { - uint 13, - { }* null, - sbyte* getelementptr ([4 x sbyte]* %str3, int 0, int 0), - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - int 2, - uint 32, - uint 32, - uint 0, - { }* cast (%llvm.dbg.basictype.type* %llvm.dbg.basictype to { }*) }, section "llvm.metadata" -%str3 = internal constant [4 x sbyte] c"Red\00", section "llvm.metadata" +!4 = metadata !{ + i32 524301, ;; Tag + metadata !1, ;; Context + metadata !"Red", ;; Name + metadata !1, ;; File + i32 2, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + metadata !5 ;; Derived From type +} ;; ;; Define the Green field. ;; -%llvm.dbg.derivedtype2 = internal constant %llvm.dbg.derivedtype.type { - uint 13, - { }* null, - sbyte* getelementptr ([6 x sbyte]* %str4, int 0, int 0), - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - int 3, - uint 32, - uint 32, - uint 32, - { }* cast (%llvm.dbg.basictype.type* %llvm.dbg.basictype to { }*) }, section "llvm.metadata" -%str4 = internal constant [6 x sbyte] c"Green\00", section "llvm.metadata" +!6 = metadata !{ + i32 524301, ;; Tag + metadata !1, ;; Context + metadata !"Green", ;; Name + metadata !1, ;; File + i32 3, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 32, ;; Offset in bits + i32 0, ;; Flags + metadata !5 ;; Derived From type +} ;; ;; Define the Blue field. ;; -%llvm.dbg.derivedtype3 = internal constant %llvm.dbg.derivedtype.type { - uint 13, - { }* null, - sbyte* getelementptr ([5 x sbyte]* %str5, int 0, int 0), - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - int 4, - uint 32, - uint 32, - uint 64, - { }* cast (%llvm.dbg.basictype.type* %llvm.dbg.basictype to { }*) }, section "llvm.metadata" -%str5 = internal constant [5 x sbyte] c"Blue\00", section "llvm.metadata" +!7 = metadata !{ + i32 524301, ;; Tag + metadata !1, ;; Context + metadata !"Blue", ;; Name + metadata !1, ;; File + i32 4, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 64, ;; Offset in bits + i32 0, ;; Flags + metadata !5 ;; Derived From type +} ;; ;; Define the array of fields used by the composite type Color. ;; -%llvm.dbg.array = internal constant [3 x { }*] [ - { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype1 to { }*), - { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype2 to { }*), - { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype3 to { }*) ], section "llvm.metadata" +!3 = metadata !{metadata !4, metadata !6, metadata !7}+
Given the following as an example of C/C++ enumeration type;
+Given the following as an example of C/C++ enumeration type:
+enum Trees { Spruce = 100, @@ -1602,62 +1807,1037 @@ enum Trees { Maple = 300 };+
a C/C++ front-end would generate the following descriptors;
+a C/C++ front-end would generate the following descriptors:
+;; ;; Define composite type for enum Trees ;; -%llvm.dbg.compositetype = internal constant %llvm.dbg.compositetype.type { - uint 4, - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - sbyte* getelementptr ([6 x sbyte]* %str1, int 0, int 0), - { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), - int 1, - uint 32, - uint 32, - uint 0, - { }* null, - { }* cast ([3 x { }*]* %llvm.dbg.array to { }*) }, section "llvm.metadata" -%str1 = internal constant [6 x sbyte] c"Trees\00", section "llvm.metadata" +!2 = metadata !{ + i32 524292, ;; Tag + metadata !1, ;; Context + metadata !"Trees", ;; Name + metadata !1, ;; File + i32 1, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + null, ;; Derived From type + metadata !3, ;; Elements + i32 0 ;; Runtime language +} + +;; +;; Define the array of enumerators used by composite type Trees. +;; +!3 = metadata !{metadata !4, metadata !5, metadata !6} ;; ;; Define Spruce enumerator. ;; -%llvm.dbg.enumerator1 = internal constant %llvm.dbg.enumerator.type { - uint 40, - sbyte* getelementptr ([7 x sbyte]* %str2, int 0, int 0), - int 100 }, section "llvm.metadata" -%str2 = internal constant [7 x sbyte] c"Spruce\00", section "llvm.metadata" +!4 = metadata !{i32 524328, metadata !"Spruce", i64 100} ;; ;; Define Oak enumerator. ;; -%llvm.dbg.enumerator2 = internal constant %llvm.dbg.enumerator.type { - uint 40, - sbyte* getelementptr ([4 x sbyte]* %str3, int 0, int 0), - int 200 }, section "llvm.metadata" -%str3 = internal constant [4 x sbyte] c"Oak\00", section "llvm.metadata" +!5 = metadata !{i32 524328, metadata !"Oak", i64 200} ;; ;; Define Maple enumerator. ;; -%llvm.dbg.enumerator3 = internal constant %llvm.dbg.enumerator.type { - uint 40, - sbyte* getelementptr ([6 x sbyte]* %str4, int 0, int 0), - int 300 }, section "llvm.metadata" -%str4 = internal constant [6 x sbyte] c"Maple\00", section "llvm.metadata" +!6 = metadata !{i32 524328, metadata !"Maple", i64 300} -;; -;; Define the array of enumerators used by composite type Trees. -;; -%llvm.dbg.array = internal constant [3 x { }*] [ - { }* cast (%llvm.dbg.enumerator.type* %llvm.dbg.enumerator1 to { }*), - { }* cast (%llvm.dbg.enumerator.type* %llvm.dbg.enumerator2 to { }*), - { }* cast (%llvm.dbg.enumerator.type* %llvm.dbg.enumerator3 to { }*) ], section "llvm.metadata"+
Objective C provides a simpler way to declare and define accessor methods +using declared properties. The language provides features to declare a +property and to let compiler synthesize accessor methods. +
+ +The debugger lets developer inspect Objective C interfaces and their +instance variables and class variables. However, the debugger does not know +anything about the properties defined in Objective C interfaces. The debugger +consumes information generated by compiler in DWARF format. The format does +not support encoding of Objective C properties. This proposal describes DWARF +extensions to encode Objective C properties, which the debugger can use to let +developers inspect Objective C properties. +
+ +Objective C properties exist separately from class members. A property +can be defined only by "setter" and "getter" selectors, and +be calculated anew on each access. Or a property can just be a direct access +to some declared ivar. Finally it can have an ivar "automatically +synthesized" for it by the compiler, in which case the property can be +referred to in user code directly using the standard C dereference syntax as +well as through the property "dot" syntax, but there is no entry in +the @interface declaration corresponding to this ivar. +
++To facilitate debugging, these properties we will add a new DWARF TAG into the +DW_TAG_structure_type definition for the class to hold the description of a +given property, and a set of DWARF attributes that provide said description. +The property tag will also contain the name and declared type of the property. +
++If there is a related ivar, there will also be a DWARF property attribute placed +in the DW_TAG_member DIE for that ivar referring back to the property TAG for +that property. And in the case where the compiler synthesizes the ivar directly, +the compiler is expected to generate a DW_TAG_member for that ivar (with the +DW_AT_artificial set to 1), whose name will be the name used to access this +ivar directly in code, and with the property attribute pointing back to the +property it is backing. +
++The following examples will serve as illustration for our discussion: +
+ ++@interface I1 { + int n2; +} + +@property int p1; +@property int p2; +@end + +@implementation I1 +@synthesize p1; +@synthesize p2 = n2; +@end ++
+This produces the following DWARF (this is a "pseudo dwarfdump" output): +
++0x00000100: TAG_structure_type [7] * + AT_APPLE_runtime_class( 0x10 ) + AT_name( "I1" ) + AT_decl_file( "Objc_Property.m" ) + AT_decl_line( 3 ) + +0x00000110 TAG_APPLE_property + AT_name ( "p1" ) + AT_type ( {0x00000150} ( int ) ) + +0x00000120: TAG_APPLE_property + AT_name ( "p2" ) + AT_type ( {0x00000150} ( int ) ) + +0x00000130: TAG_member [8] + AT_name( "_p1" ) + AT_APPLE_property ( {0x00000110} "p1" ) + AT_type( {0x00000150} ( int ) ) + AT_artificial ( 0x1 ) + +0x00000140: TAG_member [8] + AT_name( "n2" ) + AT_APPLE_property ( {0x00000120} "p2" ) + AT_type( {0x00000150} ( int ) ) + +0x00000150: AT_type( ( int ) ) ++
Note, the current convention is that the name of the ivar for an +auto-synthesized property is the name of the property from which it derives with +an underscore prepended, as is shown in the example. +But we actually don't need to know this convention, since we are given the name +of the ivar directly. +
+ ++Also, it is common practice in ObjC to have different property declarations in +the @interface and @implementation - e.g. to provide a read-only property in +the interface,and a read-write interface in the implementation. In that case, +the compiler should emit whichever property declaration will be in force in the +current translation unit. +
+ +Developers can decorate a property with attributes which are encoded using +DW_AT_APPLE_property_attribute. +
+ ++@property (readonly, nonatomic) int pr; ++
+Which produces a property tag: +
+
+TAG_APPLE_property [8] + AT_name( "pr" ) + AT_type ( {0x00000147} (int) ) + AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic) ++
The setter and getter method names are attached to the property using +DW_AT_APPLE_property_setter and DW_AT_APPLE_property_getter attributes. +
++@interface I1 +@property (setter=myOwnP3Setter:) int p3; +-(void)myOwnP3Setter:(int)a; +@end + +@implementation I1 +@synthesize p3; +-(void)myOwnP3Setter:(int)a{ } +@end ++
+The DWARF for this would be: +
++0x000003bd: TAG_structure_type [7] * + AT_APPLE_runtime_class( 0x10 ) + AT_name( "I1" ) + AT_decl_file( "Objc_Property.m" ) + AT_decl_line( 3 ) + +0x000003cd TAG_APPLE_property + AT_name ( "p3" ) + AT_APPLE_property_setter ( "myOwnP3Setter:" ) + AT_type( {0x00000147} ( int ) ) + +0x000003f3: TAG_member [8] + AT_name( "_p3" ) + AT_type ( {0x00000147} ( int ) ) + AT_APPLE_property ( {0x000003cd} ) + AT_artificial ( 0x1 ) ++
TAG | +Value | +
---|---|
DW_TAG_APPLE_property | +0x4200 | +
Attribute | +Value | +Classes | +
---|---|---|
DW_AT_APPLE_property | +0x3fed | +Reference | +
DW_AT_APPLE_property_getter | +0x3fe9 | +String | +
DW_AT_APPLE_property_setter | +0x3fea | +String | +
DW_AT_APPLE_property_attribute | +0x3feb | +Constant | +
Name | +Value | +
---|---|
DW_AT_APPLE_PROPERTY_readonly | +0x1 | +
DW_AT_APPLE_PROPERTY_readwrite | +0x2 | +
DW_AT_APPLE_PROPERTY_assign | +0x4 | +
DW_AT_APPLE_PROPERTY_retain | +0x8 | +
DW_AT_APPLE_PROPERTY_copy | +0x10 | +
DW_AT_APPLE_PROPERTY_nonatomic | +0x20 | +
The .debug_pubnames and .debug_pubtypes formats are not what a debugger + needs. The "pub" in the section name indicates that the entries in the + table are publicly visible names only. This means no static or hidden + functions show up in the .debug_pubnames. No static variables or private class + variables are in the .debug_pubtypes. Many compilers add different things to + these tables, so we can't rely upon the contents between gcc, icc, or clang.
+ +The typical query given by users tends not to match up with the contents of + these tables. For example, the DWARF spec states that "In the case of the + name of a function member or static data member of a C++ structure, class or + union, the name presented in the .debug_pubnames section is not the simple + name given by the DW_AT_name attribute of the referenced debugging information + entry, but rather the fully qualified name of the data or function member." + So the only names in these tables for complex C++ entries is a fully + qualified name. Debugger users tend not to enter their search strings as + "a::b::c(int,const Foo&) const", but rather as "c", "b::c" , or "a::b::c". So + the name entered in the name table must be demangled in order to chop it up + appropriately and additional names must be manually entered into the table + to make it effective as a name lookup table for debuggers to use.
+ +All debuggers currently ignore the .debug_pubnames table as a result of + its inconsistent and useless public-only name content making it a waste of + space in the object file. These tables, when they are written to disk, are + not sorted in any way, leaving every debugger to do its own parsing + and sorting. These tables also include an inlined copy of the string values + in the table itself making the tables much larger than they need to be on + disk, especially for large C++ programs.
+ +Can't we just fix the sections by adding all of the names we need to this + table? No, because that is not what the tables are defined to contain and we + won't know the difference between the old bad tables and the new good tables. + At best we could make our own renamed sections that contain all of the data + we need.
+ +These tables are also insufficient for what a debugger like LLDB needs. + LLDB uses clang for its expression parsing where LLDB acts as a PCH. LLDB is + then often asked to look for type "foo" or namespace "bar", or list items in + namespace "baz". Namespaces are not included in the pubnames or pubtypes + tables. Since clang asks a lot of questions when it is parsing an expression, + we need to be very fast when looking up names, as it happens a lot. Having new + accelerator tables that are optimized for very quick lookups will benefit + this type of debugging experience greatly.
+ +We would like to generate name lookup tables that can be mapped into + memory from disk, and used as is, with little or no up-front parsing. We would + also be able to control the exact content of these different tables so they + contain exactly what we need. The Name Accelerator Tables were designed + to fix these issues. In order to solve these issues we need to:
+ +Table size is important and the accelerator table format should allow the + reuse of strings from common string tables so the strings for the names are + not duplicated. We also want to make sure the table is ready to be used as-is + by simply mapping the table into memory with minimal header parsing.
+ +The name lookups need to be fast and optimized for the kinds of lookups + that debuggers tend to do. Optimally we would like to touch as few parts of + the mapped table as possible when doing a name lookup and be able to quickly + find the name entry we are looking for, or discover there are no matches. In + the case of debuggers we optimized for lookups that fail most of the time.
+ +Each table that is defined should have strict rules on exactly what is in + the accelerator tables and documented so clients can rely on the content.
+ +Typical hash tables have a header, buckets, and each bucket points to the +bucket contents: +
+ ++.------------. +| HEADER | +|------------| +| BUCKETS | +|------------| +| DATA | +`------------' ++
The BUCKETS are an array of offsets to DATA for each hash:
+ ++.------------. +| 0x00001000 | BUCKETS[0] +| 0x00002000 | BUCKETS[1] +| 0x00002200 | BUCKETS[2] +| 0x000034f0 | BUCKETS[3] +| | ... +| 0xXXXXXXXX | BUCKETS[n_buckets] +'------------' ++
So for bucket[3] in the example above, we have an offset into the table + 0x000034f0 which points to a chain of entries for the bucket. Each bucket + must contain a next pointer, full 32 bit hash value, the string itself, + and the data for the current string value.
+ ++ .------------. +0x000034f0: | 0x00003500 | next pointer + | 0x12345678 | 32 bit hash + | "erase" | string value + | data[n] | HashData for this bucket + |------------| +0x00003500: | 0x00003550 | next pointer + | 0x29273623 | 32 bit hash + | "dump" | string value + | data[n] | HashData for this bucket + |------------| +0x00003550: | 0x00000000 | next pointer + | 0x82638293 | 32 bit hash + | "main" | string value + | data[n] | HashData for this bucket + `------------' ++
The problem with this layout for debuggers is that we need to optimize for + the negative lookup case where the symbol we're searching for is not present. + So if we were to lookup "printf" in the table above, we would make a 32 hash + for "printf", it might match bucket[3]. We would need to go to the offset + 0x000034f0 and start looking to see if our 32 bit hash matches. To do so, we + need to read the next pointer, then read the hash, compare it, and skip to + the next bucket. Each time we are skipping many bytes in memory and touching + new cache pages just to do the compare on the full 32 bit hash. All of these + accesses then tell us that we didn't have a match.
+ +To solve the issues mentioned above we have structured the hash tables + a bit differently: a header, buckets, an array of all unique 32 bit hash + values, followed by an array of hash value data offsets, one for each hash + value, then the data for all hash values:
+ ++.-------------. +| HEADER | +|-------------| +| BUCKETS | +|-------------| +| HASHES | +|-------------| +| OFFSETS | +|-------------| +| DATA | +`-------------' ++
The BUCKETS in the name tables are an index into the HASHES array. By + making all of the full 32 bit hash values contiguous in memory, we allow + ourselves to efficiently check for a match while touching as little + memory as possible. Most often checking the 32 bit hash values is as far as + the lookup goes. If it does match, it usually is a match with no collisions. + So for a table with "n_buckets" buckets, and "n_hashes" unique 32 bit hash + values, we can clarify the contents of the BUCKETS, HASHES and OFFSETS as:
+ ++.-------------------------. +| HEADER.magic | uint32_t +| HEADER.version | uint16_t +| HEADER.hash_function | uint16_t +| HEADER.bucket_count | uint32_t +| HEADER.hashes_count | uint32_t +| HEADER.header_data_len | uint32_t +| HEADER_DATA | HeaderData +|-------------------------| +| BUCKETS | uint32_t[n_buckets] // 32 bit hash indexes +|-------------------------| +| HASHES | uint32_t[n_buckets] // 32 bit hash values +|-------------------------| +| OFFSETS | uint32_t[n_buckets] // 32 bit offsets to hash value data +|-------------------------| +| ALL HASH DATA | +`-------------------------' ++
So taking the exact same data from the standard hash example above we end up + with:
+ ++ .------------. + | HEADER | + |------------| + | 0 | BUCKETS[0] + | 2 | BUCKETS[1] + | 5 | BUCKETS[2] + | 6 | BUCKETS[3] + | | ... + | ... | BUCKETS[n_buckets] + |------------| + | 0x........ | HASHES[0] + | 0x........ | HASHES[1] + | 0x........ | HASHES[2] + | 0x........ | HASHES[3] + | 0x........ | HASHES[4] + | 0x........ | HASHES[5] + | 0x12345678 | HASHES[6] hash for BUCKETS[3] + | 0x29273623 | HASHES[7] hash for BUCKETS[3] + | 0x82638293 | HASHES[8] hash for BUCKETS[3] + | 0x........ | HASHES[9] + | 0x........ | HASHES[10] + | 0x........ | HASHES[11] + | 0x........ | HASHES[12] + | 0x........ | HASHES[13] + | 0x........ | HASHES[n_hashes] + |------------| + | 0x........ | OFFSETS[0] + | 0x........ | OFFSETS[1] + | 0x........ | OFFSETS[2] + | 0x........ | OFFSETS[3] + | 0x........ | OFFSETS[4] + | 0x........ | OFFSETS[5] + | 0x000034f0 | OFFSETS[6] offset for BUCKETS[3] + | 0x00003500 | OFFSETS[7] offset for BUCKETS[3] + | 0x00003550 | OFFSETS[8] offset for BUCKETS[3] + | 0x........ | OFFSETS[9] + | 0x........ | OFFSETS[10] + | 0x........ | OFFSETS[11] + | 0x........ | OFFSETS[12] + | 0x........ | OFFSETS[13] + | 0x........ | OFFSETS[n_hashes] + |------------| + | | + | | + | | + | | + | | + |------------| +0x000034f0: | 0x00001203 | .debug_str ("erase") + | 0x00000004 | A 32 bit array count - number of HashData with name "erase" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x........ | HashData[3] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + |------------| +0x00003500: | 0x00001203 | String offset into .debug_str ("collision") + | 0x00000002 | A 32 bit array count - number of HashData with name "collision" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x00001203 | String offset into .debug_str ("dump") + | 0x00000003 | A 32 bit array count - number of HashData with name "dump" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + |------------| +0x00003550: | 0x00001203 | String offset into .debug_str ("main") + | 0x00000009 | A 32 bit array count - number of HashData with name "main" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x........ | HashData[3] + | 0x........ | HashData[4] + | 0x........ | HashData[5] + | 0x........ | HashData[6] + | 0x........ | HashData[7] + | 0x........ | HashData[8] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + `------------' ++
So we still have all of the same data, we just organize it more efficiently + for debugger lookup. If we repeat the same "printf" lookup from above, we + would hash "printf" and find it matches BUCKETS[3] by taking the 32 bit hash + value and modulo it by n_buckets. BUCKETS[3] contains "6" which is the index + into the HASHES table. We would then compare any consecutive 32 bit hashes + values in the HASHES array as long as the hashes would be in BUCKETS[3]. We + do this by verifying that each subsequent hash value modulo n_buckets is still + 3. In the case of a failed lookup we would access the memory for BUCKETS[3], and + then compare a few consecutive 32 bit hashes before we know that we have no match. + We don't end up marching through multiple words of memory and we really keep the + number of processor data cache lines being accessed as small as possible.
+ +The string hash that is used for these lookup tables is the Daniel J. + Bernstein hash which is also used in the ELF GNU_HASH sections. It is a very + good hash for all kinds of names in programs with very few hash collisions.
+ +Empty buckets are designated by using an invalid hash index of UINT32_MAX.
+These name hash tables are designed to be generic where specializations of + the table get to define additional data that goes into the header + ("HeaderData"), how the string value is stored ("KeyType") and the content + of the data for each hash value.
+ +The header has a fixed part, and the specialized part. The exact format of + the header is:
++struct Header +{ + uint32_t magic; // 'HASH' magic value to allow endian detection + uint16_t version; // Version number + uint16_t hash_function; // The hash function enumeration that was used + uint32_t bucket_count; // The number of buckets in this hash table + uint32_t hashes_count; // The total number of unique hash values and hash data offsets in this table + uint32_t header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment + // Specifically the length of the following HeaderData field - this does not + // include the size of the preceding fields + HeaderData header_data; // Implementation specific header data +}; ++
The header starts with a 32 bit "magic" value which must be 'HASH' encoded as + an ASCII integer. This allows the detection of the start of the hash table and + also allows the table's byte order to be determined so the table can be + correctly extracted. The "magic" value is followed by a 16 bit version number + which allows the table to be revised and modified in the future. The current + version number is 1. "hash_function" is a uint16_t enumeration that specifies + which hash function was used to produce this table. The current values for the + hash function enumerations include:
++enum HashFunctionType +{ + eHashFunctionDJB = 0u, // Daniel J Bernstein hash function +}; ++
"bucket_count" is a 32 bit unsigned integer that represents how many buckets + are in the BUCKETS array. "hashes_count" is the number of unique 32 bit hash + values that are in the HASHES array, and is the same number of offsets are + contained in the OFFSETS array. "header_data_len" specifies the size in + bytes of the HeaderData that is filled in by specialized versions of this + table.
+ +The header is followed by the buckets, hashes, offsets, and hash value + data. +
+struct FixedTable +{ + uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below + uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table + uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above +}; ++
"buckets" is an array of 32 bit indexes into the "hashes" array. The + "hashes" array contains all of the 32 bit hash values for all names in the + hash table. Each hash in the "hashes" table has an offset in the "offsets" + array that points to the data for the hash value.
+ +This table setup makes it very easy to repurpose these tables to contain + different data, while keeping the lookup mechanism the same for all tables. + This layout also makes it possible to save the table to disk and map it in + later and do very efficient name lookups with little or no parsing.
+ +DWARF lookup tables can be implemented in a variety of ways and can store + a lot of information for each name. We want to make the DWARF tables + extensible and able to store the data efficiently so we have used some of the + DWARF features that enable efficient data storage to define exactly what kind + of data we store for each name.
+ +The "HeaderData" contains a definition of the contents of each HashData + chunk. We might want to store an offset to all of the debug information + entries (DIEs) for each name. To keep things extensible, we create a list of + items, or Atoms, that are contained in the data for each name. First comes the + type of the data in each atom:
++enum AtomType +{ + eAtomTypeNULL = 0u, + eAtomTypeDIEOffset = 1u, // DIE offset, check form for encoding + eAtomTypeCUOffset = 2u, // DIE offset of the compiler unit header that contains the item in question + eAtomTypeTag = 3u, // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2 + eAtomTypeNameFlags = 4u, // Flags from enum NameFlags + eAtomTypeTypeFlags = 5u, // Flags from enum TypeFlags +}; ++
The enumeration values and their meanings are:
++ eAtomTypeNULL - a termination atom that specifies the end of the atom list + eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name + eAtomTypeCUOffset - an offset into the .debug_info section for the CU that contains the DIE + eAtomTypeDIETag - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is + eAtomTypeNameFlags - Flags for functions and global variables (isFunction, isInlined, isExternal...) + eAtomTypeTypeFlags - Flags for types (isCXXClass, isObjCClass, ...) ++
Then we allow each atom type to define the atom type and how the data for + each atom type data is encoded:
++struct Atom +{ + uint16_t type; // AtomType enum value + uint16_t form; // DWARF DW_FORM_XXX defines +}; ++
The "form" type above is from the DWARF specification and defines the + exact encoding of the data for the Atom type. See the DWARF specification for + the DW_FORM_ definitions.
++struct HeaderData +{ + uint32_t die_offset_base; + uint32_t atom_count; + Atoms atoms[atom_count0]; +}; ++
"HeaderData" defines the base DIE offset that should be added to any atoms + that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4, + DW_FORM_ref8 or DW_FORM_ref_udata. It also defines what is contained in + each "HashData" object -- Atom.form tells us how large each field will be in + the HashData and the Atom.type tells us how this data should be interpreted.
+ +For the current implementations of the ".apple_names" (all functions + globals), + the ".apple_types" (names of all types that are defined), and the + ".apple_namespaces" (all namespaces), we currently set the Atom array to be:
++HeaderData.atom_count = 1; +HeaderData.atoms[0].type = eAtomTypeDIEOffset; +HeaderData.atoms[0].form = DW_FORM_data4; ++
This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is + encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have + multiple matching DIEs in a single file, which could come up with an inlined + function for instance. Future tables could include more information about the + DIE such as flags indicating if the DIE is a function, method, block, + or inlined.
+ +The KeyType for the DWARF table is a 32 bit string table offset into the + ".debug_str" table. The ".debug_str" is the string table for the DWARF which + may already contain copies of all of the strings. This helps make sure, with + help from the compiler, that we reuse the strings between all of the DWARF + sections and keeps the hash table size down. Another benefit to having the + compiler generate all strings as DW_FORM_strp in the debug info, is that + DWARF parsing can be made much faster.
+ +After a lookup is made, we get an offset into the hash data. The hash data + needs to be able to deal with 32 bit hash collisions, so the chunk of data + at the offset in the hash data consists of a triple:
++uint32_t str_offset +uint32_t hash_data_count +HashData[hash_data_count] ++
If "str_offset" is zero, then the bucket contents are done. 99.9% of the + hash data chunks contain a single item (no 32 bit hash collision):
++.------------. +| 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main") +| 0x00000004 | uint32_t HashData count +| 0x........ | uint32_t HashData[0] DIE offset +| 0x........ | uint32_t HashData[1] DIE offset +| 0x........ | uint32_t HashData[2] DIE offset +| 0x........ | uint32_t HashData[3] DIE offset +| 0x00000000 | uint32_t KeyType (end of hash chain) +`------------' ++
If there are collisions, you will have multiple valid string offsets:
++.------------. +| 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main") +| 0x00000004 | uint32_t HashData count +| 0x........ | uint32_t HashData[0] DIE offset +| 0x........ | uint32_t HashData[1] DIE offset +| 0x........ | uint32_t HashData[2] DIE offset +| 0x........ | uint32_t HashData[3] DIE offset +| 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] => "print") +| 0x00000002 | uint32_t HashData count +| 0x........ | uint32_t HashData[0] DIE offset +| 0x........ | uint32_t HashData[1] DIE offset +| 0x00000000 | uint32_t KeyType (end of hash chain) +`------------' ++
Current testing with real world C++ binaries has shown that there is around 1 + 32 bit hash collision per 100,000 name entries.
+As we said, we want to strictly define exactly what is included in the + different tables. For DWARF, we have 3 tables: ".apple_names", ".apple_types", + and ".apple_namespaces".
+ +".apple_names" sections should contain an entry for each DWARF DIE whose + DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram that + has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or + DW_AT_entry_pc. It also contains DW_TAG_variable DIEs that have a DW_OP_addr + in the location (global and static variables). All global and static variables + should be included, including those scoped within functions and classes. For + example using the following code:
++static int var = 0; + +void f () +{ + static int var = 0; +} ++
Both of the static "var" variables would be included in the table. All + functions should emit both their full names and their basenames. For C or C++, + the full name is the mangled name (if available) which is usually in the + DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains the function + basename. If global or static variables have a mangled name in a + DW_AT_MIPS_linkage_name attribute, this should be emitted along with the + simple name found in the DW_AT_name attribute.
+ +".apple_types" sections should contain an entry for each DWARF DIE whose + tag is one of:
+Only entries with a DW_AT_name attribute are included, and the entry must + not be a forward declaration (DW_AT_declaration attribute with a non-zero value). + For example, using the following code:
++int main () +{ + int *b = 0; + return *b; +} ++
We get a few type DIEs:
++0x00000067: TAG_base_type [5] + AT_encoding( DW_ATE_signed ) + AT_name( "int" ) + AT_byte_size( 0x04 ) + +0x0000006e: TAG_pointer_type [6] + AT_type( {0x00000067} ( int ) ) + AT_byte_size( 0x08 ) ++
The DW_TAG_pointer_type is not included because it does not have a DW_AT_name.
+ +".apple_namespaces" section should contain all DW_TAG_namespace DIEs. If + we run into a namespace that has no name this is an anonymous namespace, + and the name should be output as "(anonymous namespace)" (without the quotes). + Why? This matches the output of the abi::cxa_demangle() that is in the standard + C++ library that demangles mangled names.
+".apple_objc" section should contain all DW_TAG_subprogram DIEs for an + Objective-C class. The name used in the hash table is the name of the + Objective-C class itself. If the Objective-C class has a category, then an + entry is made for both the class name without the category, and for the class + name with the category. So if we have a DIE at offset 0x1234 with a name + of method "-[NSString(my_additions) stringWithSpecialString:]", we would add + an entry for "NSString" that points to DIE 0x1234, and an entry for + "NSString(my_additions)" that points to 0x1234. This allows us to quickly + track down all Objective-C methods for an Objective-C class when doing + expressions. It is needed because of the dynamic nature of Objective-C where + anyone can add methods to a class. The DWARF for Objective-C methods is also + emitted differently from C++ classes where the methods are not usually + contained in the class definition, they are scattered about across one or more + compile units. Categories can also be defined in different shared libraries. + So we need to be able to quickly find all of the methods and class functions + given the Objective-C class name, or quickly find all methods and class + functions for a class + category name. This table does not contain any selector + names, it just maps Objective-C class names (or class names + category) to all + of the methods and class functions. The selectors are added as function + basenames in the .debug_names section.
+ +In the ".apple_names" section for Objective-C functions, the full name is the + entire function name with the brackets ("-[NSString stringWithCString:]") and the + basename is the selector only ("stringWithCString:").
+ +The sections names for the apple hash tables are for non mach-o files. For + mach-o files, the sections should be contained in the "__DWARF" segment with + names as follows:
+