docs/SourceLevelDebugging.rst

   1 ================================
   2 Source Level Debugging with LLVM
   3 ================================
   4
   5 .. contents::
   6    :local:
   7
   8 Introduction
   9 ============
  10
  11 This document is the central repository for all information pertaining to debug
  12 information in LLVM.  It describes the :ref:`actual format that the LLVM debug
  13 information takes <format>`, which is useful for those interested in creating
  14 front-ends or dealing directly with the information.  Further, this document
  15 provides specific examples of what debug information for C/C++ looks like.
  16
  17 Philosophy behind LLVM debugging information
  18 --------------------------------------------
  19
  20 The idea of the LLVM debugging information is to capture how the important
  21 pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
  22 Several design aspects have shaped the solution that appears here.  The
  23 important ones are:
  24
  25 * Debugging information should have very little impact on the rest of the
  26   compiler.  No transformations, analyses, or code generators should need to
  27   be modified because of debugging information.
  28
  29 * LLVM optimizations should interact in :ref:`well-defined and easily described
  30   ways <intro_debugopt>` with the debugging information.
  31
  32 * Because LLVM is designed to support arbitrary programming languages,
  33   LLVM-to-LLVM tools should not need to know anything about the semantics of
  34   the source-level language.
  35
  36 * Source-level languages are often **widely** different from one another.
  37   LLVM should not put any restrictions on the flavor of the source-language,
  38   and the debugging information should work with any language.
  39
  40 * With code generator support, it should be possible to use an LLVM compiler
  41   to compile a program to native machine code and standard debugging
  42   formats.  This allows compatibility with traditional machine-code level
  43   debuggers, like GDB or DBX.
  44
  45 The approach used by the LLVM implementation is to use a small set of
  46 :ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
  47 between LLVM program objects and the source-level objects.  The description of
  48 the source-level program is maintained in LLVM metadata in an
  49 :ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
  50 currently uses working draft 7 of the `DWARF 3 standard
  51 <http://www.eagercon.com/dwarf/dwarf3std.htm>`_).
  52
  53 When a program is being debugged, a debugger interacts with the user and turns
  54 the stored debug information into source-language specific information.  As
  55 such, a debugger must be aware of the source-language, and is thus tied to a
  56 specific language or family of languages.
  57
  58 Debug information consumers
  59 ---------------------------
  60
  61 The role of debug information is to provide meta information normally stripped
  62 away during the compilation process.  This meta information provides an LLVM
  63 user a relationship between generated code and the original program source
  64 code.
  65
  66 Currently, debug information is consumed by DwarfDebug to produce DWARF
  67 information used by the GDB debugger.  Other targets could use the same
  68 information to produce stabs or other debug forms.
  69
  70 It would also be reasonable to use debug information to feed profiling tools
  71 for analysis of generated code, or tools for reconstructing the original
  72 source from generated code.
  73
  74 TODO - expound a bit more.
  75
  76 .. _intro_debugopt:
  77
  78 Debugging optimized code
  79 ------------------------
  80
  81 An extremely high priority of LLVM debugging information is to make it interact
  82 well with optimizations and analysis.  In particular, the LLVM debug
  83 information provides the following guarantees:
  84
  85 * LLVM debug information **always provides information to accurately read
  86   the source-level state of the program**, regardless of which LLVM
  87   optimizations have been run, and without any modification to the
  88   optimizations themselves.  However, some optimizations may impact the
  89   ability to modify the current state of the program with a debugger, such
  90   as setting program variables, or calling functions that have been
  91   deleted.
  92
  93 * As desired, LLVM optimizations can be upgraded to be aware of the LLVM
  94   debugging information, allowing them to update the debugging information
  95   as they perform aggressive optimizations.  This means that, with effort,
  96   the LLVM optimizers could optimize debug code just as well as non-debug
  97   code.
  98
  99 * LLVM debug information does not prevent optimizations from
 100   happening (for example inlining, basic block reordering/merging/cleanup,
 101   tail duplication, etc).
 102
 103 * LLVM debug information is automatically optimized along with the rest of
 104   the program, using existing facilities.  For example, duplicate
 105   information is automatically merged by the linker, and unused information
 106   is automatically removed.
 107
 108 Basically, the debug information allows you to compile a program with
 109 "``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
 110 the program as it executes from a debugger.  Compiling a program with
 111 "``-O3 -g``" gives you full debug information that is always available and
 112 accurate for reading (e.g., you get accurate stack traces despite tail call
 113 elimination and inlining), but you might lose the ability to modify the program
 114 and call functions that were optimized out of the program, or inlined away
 115 completely.
 116
 117 The :ref:`LLVM test suite <test-suite-quickstart>` provides a framework to test
 118 the optimizer's handling of debugging information.  It can be run like this:
 119
 120 .. code-block:: bash
 121
 122   % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
 123   % make TEST=dbgopt
 124
 125 This will test the impact of debugging information on optimization passes.  If
 126 debugging information influences optimization passes then it will be reported
 127 as a failure.  See :doc:`TestingGuide` for more information on the LLVM test
 128 infrastructure and how to run various tests.
 129
 130 .. _format:
 131
 132 Debugging information format
 133 ============================
 134
 135 LLVM debugging information has been carefully designed to make it possible for
 136 the optimizer to optimize the program and debugging information without
 137 necessarily having to know anything about debugging information.  In
 138 particular, the use of metadata avoids duplicated debugging information from
 139 the beginning, and the global dead code elimination pass automatically deletes
 140 debugging information for a function if it decides to delete the function.
 141
 142 To do this, most of the debugging information (descriptors for types,
 143 variables, functions, source files, etc) is inserted by the language front-end
 144 in the form of LLVM metadata.
 145
 146 Debug information is designed to be agnostic about the target debugger and
 147 debugging information representation (e.g. DWARF/Stabs/etc).  It uses a generic
 148 pass to decode the information that represents variables, types, functions,
 149 namespaces, etc: this allows for arbitrary source-language semantics and
 150 type-systems to be used, as long as there is a module written for the target
 151 debugger to interpret the information.
 152
 153 To provide basic functionality, the LLVM debugger does have to make some
 154 assumptions about the source-level language being debugged, though it keeps
 155 these to a minimum.  The only common features that the LLVM debugger assumes
 156 exist are :ref:`source files <format_files>`, and :ref:`program objects
 157 <format_global_variables>`.  These abstract objects are used by a debugger to
 158 form stack traces, show information about local variables, etc.
 159
 160 This section of the documentation first describes the representation aspects
 161 common to any source-language.  :ref:`ccxx_frontend` describes the data layout
 162 conventions used by the C and C++ front-ends.
 163
 164 Debug information descriptors
 165 -----------------------------
 166
 167 In consideration of the complexity and volume of debug information, LLVM
 168 provides a specification for well-formed debug descriptors.
 169
 170 Consumers of LLVM debug information expect the descriptors for program objects
 171 to start in a canonical format, but the descriptors can include additional
 172 information appended at the end that is source-language specific.  All debugging
 173 information objects start with a tag to indicate what type of object it is.
 174 The source-language is allowed to define its own objects, by using unreserved
 175 tag numbers.  We recommend using tags in the range 0x1000 through 0x2000
 176 (there is a defined ``enum DW_TAG_user_base = 0x1000``).
 177
 178 The fields of debug descriptors used internally by LLVM are restricted to only
 179 the simple data types ``i32``, ``i1``, ``float``, ``double``, ``mdstring`` and
 180 ``mdnode``.
 181
 182 .. code-block:: llvm
 183
 184   !1 = metadata !{
 185     i32,   ;; A tag
 186     ...
 187   }
 188
 189 The first field of a descriptor is always an
 190 ``i32`` containing a tag value identifying the content of the descriptor.
 191 The remaining fields are specific to the descriptor.  The values of tags are
 192 loosely bound to the tag values of DWARF information entries.  However, that
 193 does not restrict the use of the information supplied to DWARF targets.
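
The IR listing later in this document stores first fields such as ``786449``
rather than a bare DWARF tag such as ``17``.  The following is a minimal sketch
of one way to read such a value, assuming that the low 16 bits hold the tag and
the upper bits hold a debug-information version number.  That layout is inferred
from the arithmetic, and the ``split_tag_field`` helper and ``TAG_MASK`` name
are illustrative, not part of any LLVM API.

.. code-block:: python

  TAG_MASK = 0xFFFF  # assumption: the DWARF-style tag lives in the low 16 bits


  def split_tag_field(field):
      """Split a descriptor's first i32 into (version, tag)."""
      return field >> 16, field & TAG_MASK


  # First fields copied from the example listing in this document.
  for field in (786449, 786478, 786473, 786443):
      version, tag = split_tag_field(field)
      print(field, "-> version", version, "tag", tag)

Under that assumption the four values decode to tags 17, 46, 41 and 11, which
match the compile unit, subprogram, file and lexical block tags given in this
document.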
 194
 195 The details of the various descriptors follow.
 196
 197 Compile unit descriptors
 198 ^^^^^^^^^^^^^^^^^^^^^^^^
 199
 200 .. code-block:: llvm
 201
 202   !0 = metadata !{
 203     i32,       ;; Tag = 17 (DW_TAG_compile_unit)
 204     metadata,  ;; Source directory (including trailing slash) & file pair
 205     i32,       ;; DWARF language identifier (ex. DW_LANG_C89)
 206     metadata,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
 207     i1,        ;; True if this is optimized.
 208     metadata,  ;; Flags
 209     i32,       ;; Runtime version
 210     metadata,  ;; List of enum types
 211     metadata,  ;; List of retained types
 212     metadata,  ;; List of subprograms
 213     metadata,  ;; List of global variables
 214     metadata,  ;; List of imported entities
 215     metadata   ;; Split debug filename
 216   }
 217
 218 These descriptors contain a source language ID for the file (we use the DWARF
 219 3.0 ID numbers, such as ``DW_LANG_C89``, ``DW_LANG_C_plus_plus``,
 220 ``DW_LANG_Cobol74``, etc), a reference to a metadata node containing a pair of
 221 strings for the source file name and the working directory, as well as an
 222 identifier string for the compiler that produced it.
 223
 224 Compile unit descriptors provide the root context for objects declared in a
 225 specific compilation unit.  File descriptors are defined using this context.
 226 These descriptors are collected by a named metadata ``!llvm.dbg.cu``.  They
 227 keep track of subprograms, global variables, type information, and imported
 228 entities (declarations and namespaces).
 229
 230 .. _format_files:
 231
 232 File descriptors
 233 ^^^^^^^^^^^^^^^^
 234
 235 .. code-block:: llvm
 236
 237   !0 = metadata !{
 238     i32,       ;; Tag = 41 (DW_TAG_file_type)
 239     metadata,  ;; Source directory (including trailing slash) & file pair
 240   }
 241
 242 These descriptors contain information for a file.  Global variables and top
 243 level functions would be defined using this context.  File descriptors also
 244 provide context for source line correspondence.
 245
 246 Each input file is encoded as a separate file descriptor in LLVM debugging
 247 information output.
 248
 249 .. _format_global_variables:
 250
 251 Global variable descriptors
 252 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 253
 254 .. code-block:: llvm
 255
 256   !1 = metadata !{
 257     i32,      ;; Tag = 52 (DW_TAG_variable)
 258     i32,      ;; Unused field.
 259     metadata, ;; Reference to context descriptor
 260     metadata, ;; Name
 261     metadata, ;; Display name (fully qualified C++ name)
 262     metadata, ;; MIPS linkage name (for C++)
 263     metadata, ;; Reference to file where defined
 264     i32,      ;; Line number where defined
 265     metadata, ;; Reference to type descriptor
 266     i1,       ;; True if the global is local to compile unit (static)
 267     i1,       ;; True if the global is defined in the compile unit (not extern)
 268     {}*,      ;; Reference to the global variable
 269     metadata, ;; The static member declaration, if any
 270   }
 271
 272 These descriptors provide debug information about global variables.  They
 273 provide details such as name, type and where the variable is defined.  All
 274 global variables are collected inside the named metadata ``!llvm.dbg.cu``.
 275
 276 .. _format_subprograms:
 277
 278 Subprogram descriptors
 279 ^^^^^^^^^^^^^^^^^^^^^^
 280
 281 .. code-block:: llvm
 282
 283   !2 = metadata !{
 284     i32,      ;; Tag = 46 (DW_TAG_subprogram)
 285     metadata, ;; Source directory (including trailing slash) & file pair
 286     metadata, ;; Reference to context descriptor
 287     metadata, ;; Name
 288     metadata, ;; Display name (fully qualified C++ name)
 289     metadata, ;; MIPS linkage name (for C++)
 290     i32,      ;; Line number where defined
 291     metadata, ;; Reference to type descriptor
 292     i1,       ;; True if the global is local to compile unit (static)
 293     i1,       ;; True if the global is defined in the compile unit (not extern)
 294     i32,      ;; Virtuality, e.g. dwarf::DW_VIRTUALITY_virtual
 295     i32,      ;; Index into a virtual function
 296     metadata, ;; indicates which base type contains the vtable pointer for the
 297               ;; derived class
 298     i32,      ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
 299     i1,       ;; isOptimized
 300     Function * , ;; Pointer to LLVM function
 301     metadata, ;; Lists function template parameters
 302     metadata, ;; Function declaration descriptor
 303     metadata, ;; List of function variables
 304     i32       ;; Line number where the scope of the subprogram begins
 305   }
 306
 307 These descriptors provide debug information about functions, methods and
 308 subprograms.  They provide details such as name, return types and the source
 309 location where the subprogram is defined.
 310
 311 Block descriptors
 312 ^^^^^^^^^^^^^^^^^
 313
 314 .. code-block:: llvm
 315
 316   !3 = metadata !{
 317     i32,     ;; Tag = 11 (DW_TAG_lexical_block)
 318     metadata,;; Source directory (including trailing slash) & file pair
 319     metadata,;; Reference to context descriptor
 320     i32,     ;; Line number
 321     i32,     ;; Column number
 322     i32,     ;; DWARF path discriminator value
 323     i32      ;; Unique ID to identify blocks from a template function
 324   }
 325
 326 This descriptor provides debug information about nested blocks within a
 327 subprogram.  The line and column numbers are used to distinguish two
 328 lexical blocks at the same depth.
 329
 330 .. code-block:: llvm
 331
 332   !3 = metadata !{
 333     i32,     ;; Tag = 11 (DW_TAG_lexical_block)
 334     metadata,;; Source directory (including trailing slash) & file pair
 335     metadata ;; Reference to the scope we're annotating with a file change
 336   }
 337
 338 This descriptor provides a wrapper around a lexical scope to handle file
 339 changes in the middle of a lexical block.
 340
 341 .. _format_basic_type:
 342
 343 Basic type descriptors
 344 ^^^^^^^^^^^^^^^^^^^^^^
 345
 346 .. code-block:: llvm
 347
 348   !4 = metadata !{
 349     i32,      ;; Tag = 36 (DW_TAG_base_type)
 350     metadata, ;; Source directory (including trailing slash) & file pair (may be null)
 351     metadata, ;; Reference to context
 352     metadata, ;; Name (may be "" for anonymous types)
 353     i32,      ;; Line number where defined (may be 0)
 354     i64,      ;; Size in bits
 355     i64,      ;; Alignment in bits
 356     i64,      ;; Offset in bits
 357     i32,      ;; Flags
 358     i32       ;; DWARF type encoding
 359   }
 360
 361 These descriptors define primitive types used in the code, for example ``int``,
 362 ``bool`` and ``float``.  The context provides the scope of the type, which is
 363 usually the top level.  Since basic types are not usually user defined the
 364 context and line number can be left as NULL and 0.  The size, alignment and
 365 offset are expressed in bits and can be 64 bit values.  The alignment is used
 366 to round the offset when embedded in a :ref:`composite type
 367 <format_composite_type>` (for example, to keep doubles on 64 bit boundaries).
 368 The offset is the bit offset if embedded in a :ref:`composite type
 369 <format_composite_type>`.
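
The rounding described above can be sketched as follows.  This is an
illustration only: the ``round_up_to_alignment`` helper is a hypothetical name,
and treating an alignment of zero as "no constraint" is an assumption of the
sketch rather than something this document specifies.

.. code-block:: python

  def round_up_to_alignment(offset_bits, align_bits):
      """Round offset_bits up to the next multiple of align_bits (both in bits)."""
      if align_bits <= 0:
          return offset_bits  # assumption: zero means no alignment constraint
      return (offset_bits + align_bits - 1) // align_bits * align_bits


  # A 32 bit int at offset 0 followed by a double that needs 64 bit alignment:
  # the first free bit is 32, and the double is placed at the rounded offset.
  double_offset = round_up_to_alignment(32, 64)

An offset that is already a multiple of the alignment is returned unchanged, so
the helper can be applied to every member in turn.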
 370
 371 The type encoding provides the details of the type.  The values are typically
 372 one of the following:
 373
 374 .. code-block:: llvm
 375
 376   DW_ATE_address       = 1
 377   DW_ATE_boolean       = 2
 378   DW_ATE_float         = 4
 379   DW_ATE_signed        = 5
 380   DW_ATE_signed_char   = 6
 381   DW_ATE_unsigned      = 7
 382   DW_ATE_unsigned_char = 8
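
A consumer that wants readable names for these encodings could keep them in an
enumeration.  The sketch below copies only the values listed above; the
``TypeEncoding`` class and ``encoding_name`` helper are hypothetical names, and
any value outside the list is reported as unlisted rather than decoded.

.. code-block:: python

  from enum import IntEnum


  class TypeEncoding(IntEnum):
      """DWARF type encodings as listed in this document."""
      DW_ATE_address = 1
      DW_ATE_boolean = 2
      DW_ATE_float = 4
      DW_ATE_signed = 5
      DW_ATE_signed_char = 6
      DW_ATE_unsigned = 7
      DW_ATE_unsigned_char = 8


  def encoding_name(value):
      """Return the symbolic name for an encoding value from the list above."""
      try:
          return TypeEncoding(value).name
      except ValueError:
          return "unlisted encoding %d" % value

For instance, the ``int`` base type in the example listing later in this
document ends with ``i32 5``, which this helper reports as ``DW_ATE_signed``.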
 383
 384 .. _format_derived_type:
 385
 386 Derived type descriptors
 387 ^^^^^^^^^^^^^^^^^^^^^^^^
 388
 389 .. code-block:: llvm
 390
 391   !5 = metadata !{
 392     i32,      ;; Tag (see below)
 393     metadata, ;; Source directory (including trailing slash) & file pair (may be null)
 394     metadata, ;; Reference to context
 395     metadata, ;; Name (may be "" for anonymous types)
 396     i32,      ;; Line number where defined (may be 0)
 397     i64,      ;; Size in bits
 398     i64,      ;; Alignment in bits
 399     i64,      ;; Offset in bits
 400     i32,      ;; Flags to encode attributes, e.g. private
 401     metadata, ;; Reference to type derived from
 402     metadata, ;; (optional) Name of the Objective C property associated with
 403               ;; an Objective-C ivar, or the type of which this
 404               ;; pointer-to-member is pointing to members of.
 405     metadata, ;; (optional) Name of the Objective C property getter selector.
 406     metadata, ;; (optional) Name of the Objective C property setter selector.
 407     i32       ;; (optional) Objective C property attributes.
 408   }
 409
 410 These descriptors are used to define types derived from other types.  The value
 411 of the tag varies depending on the meaning.  The following are possible tag
 412 values:
 413
 414 .. code-block:: llvm
 415
 416   DW_TAG_formal_parameter   = 5
 417   DW_TAG_member             = 13
 418   DW_TAG_pointer_type       = 15
 419   DW_TAG_reference_type     = 16
 420   DW_TAG_typedef            = 22
 421   DW_TAG_ptr_to_member_type = 31
 422   DW_TAG_const_type         = 38
 423   DW_TAG_volatile_type      = 53
 424   DW_TAG_restrict_type      = 55
 425
 426 ``DW_TAG_member`` is used to define a member of a :ref:`composite type
 427 <format_composite_type>` or :ref:`subprogram <format_subprograms>`.  The type
 428 of the member is the :ref:`derived type <format_derived_type>`.
 429 ``DW_TAG_formal_parameter`` is used to define a member which is a formal
 430 argument of a subprogram.
 431
 432 ``DW_TAG_typedef`` is used to provide a name for the derived type.
 433
 434 ``DW_TAG_pointer_type``, ``DW_TAG_reference_type``, ``DW_TAG_const_type``,
 435 ``DW_TAG_volatile_type`` and ``DW_TAG_restrict_type`` are used to qualify the
 436 :ref:`derived type <format_derived_type>`.
 437
 438 :ref:`Derived type <format_derived_type>` location can be determined from the
 439 context and line number.  The size, alignment and offset are expressed in bits
 440 and can be 64 bit values.  The alignment is used to round the offset when
 441 embedded in a :ref:`composite type <format_composite_type>` (for example, to
 442 keep doubles on 64 bit boundaries).  The offset is the bit offset if embedded
 443 in a :ref:`composite type <format_composite_type>`.
 444
 445 Note that the ``void *`` type is expressed as a type derived from NULL.
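
To illustrate how a chain of derived type descriptors can be read, here is a
small sketch that follows the "type derived from" references and produces an
English description.  The ``TypeNode`` tuple is a hypothetical, heavily
simplified stand-in for a descriptor (only tag, name and base type are kept);
the tag numbers are the ones listed in this document.

.. code-block:: python

  from collections import namedtuple

  # Hypothetical, simplified model: only the tag, name and "derived from" fields.
  TypeNode = namedtuple("TypeNode", "tag name base")

  DW_TAG_pointer_type = 15
  DW_TAG_reference_type = 16
  DW_TAG_base_type = 36
  DW_TAG_const_type = 38
  DW_TAG_volatile_type = 53
  DW_TAG_restrict_type = 55

  PREFIX = {
      DW_TAG_pointer_type: "pointer to ",
      DW_TAG_reference_type: "reference to ",
      DW_TAG_const_type: "const ",
      DW_TAG_volatile_type: "volatile ",
      DW_TAG_restrict_type: "restrict ",
  }


  def describe(node):
      """Describe a type by following its 'derived from' links."""
      if node is None:
          return "void"  # a null base type stands for void
      if node.tag in PREFIX:
          return PREFIX[node.tag] + describe(node.base)
      return node.name  # base types and typedefs are described by name


  int_type = TypeNode(DW_TAG_base_type, "int", None)
  const_int = TypeNode(DW_TAG_const_type, "", int_type)
  const_int_ptr = TypeNode(DW_TAG_pointer_type, "", const_int)
  void_ptr = TypeNode(DW_TAG_pointer_type, "", None)

With these definitions ``describe(const_int_ptr)`` yields ``pointer to const
int`` and ``describe(void_ptr)`` yields ``pointer to void``, the latter being
the NULL-derived case noted above.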
 446
 447 .. _format_composite_type:
 448
 449 Composite type descriptors
 450 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 451
 452 .. code-block:: llvm
 453
 454   !6 = metadata !{
 455     i32,      ;; Tag (see below)
 456     metadata, ;; Source directory (including trailing slash) & file pair (may be null)
 457     metadata, ;; Reference to context
 458     metadata, ;; Name (may be "" for anonymous types)
 459     i32,      ;; Line number where defined (may be 0)
 460     i64,      ;; Size in bits
 461     i64,      ;; Alignment in bits
 462     i64,      ;; Offset in bits
 463     i32,      ;; Flags
 464     metadata, ;; Reference to type derived from
 465     metadata, ;; Reference to array of member descriptors
 466     i32,      ;; Runtime languages
 467     metadata, ;; Base type containing the vtable pointer for this type
 468     metadata, ;; Template parameters
 469     metadata  ;; A unique identifier for type uniquing purpose (may be null)
 470   }
 471
 472 These descriptors are used to define types that are composed of 0 or more
 473 elements.  The value of the tag varies depending on the meaning.  The following
 474 are possible tag values:
 475
 476 .. code-block:: llvm
 477
 478   DW_TAG_array_type       = 1
 479   DW_TAG_enumeration_type = 4
 480   DW_TAG_structure_type   = 19
 481   DW_TAG_union_type       = 23
 482   DW_TAG_subroutine_type  = 21
 483   DW_TAG_inheritance      = 28
 484
 485 The vector flag indicates that an array type is a native packed vector.
 486
 487 The members of array types (tag = ``DW_TAG_array_type``) are
 488 :ref:`subrange descriptors <format_subrange>`, each
 489 representing the range of subscripts at that level of indexing.
 490
 491 The members of enumeration types (tag = ``DW_TAG_enumeration_type``) are
 492 :ref:`enumerator descriptors <format_enumerator>`, each representing the
 493 definition of an enumeration value for the set.  All enumeration type descriptors
 494 are collected inside the named metadata ``!llvm.dbg.cu``.
 495
 496 The members of structure (tag = ``DW_TAG_structure_type``) or union (tag =
 497 ``DW_TAG_union_type``) types are any one of the :ref:`basic
 498 <format_basic_type>`, :ref:`derived <format_derived_type>` or :ref:`composite
 499 <format_composite_type>` type descriptors, each representing a field member of
 500 the structure or union.
 501
 502 For C++ classes (tag = ``DW_TAG_structure_type``), member descriptors provide
 503 information about base classes, static members and member functions.  If a
 504 member is a :ref:`derived type descriptor <format_derived_type>` and has a tag
 505 of ``DW_TAG_inheritance``, then the type represents a base class.  If the member
 506 is a :ref:`global variable descriptor <format_global_variables>` then it
 507 represents a static member.  And, if the member is a :ref:`subprogram
 508 descriptor <format_subprograms>` then it represents a member function.  For
 509 static members and member functions, ``getName()`` returns the member's linkage
 510 name or the C++ mangled name.  ``getDisplayName()`` returns the simplified name.
 511
 512 The first member of subroutine (tag = ``DW_TAG_subroutine_type``) type elements
 513 is the return type for the subroutine.  The remaining elements are the formal
 514 arguments to the subroutine.
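
As a small illustration of that layout, the sketch below splits a member list
into its return type and formal arguments.  The ``split_subroutine_elements``
helper is a hypothetical name, the elements are modeled as plain Python values,
and rejecting an empty list is a choice made for this sketch.

.. code-block:: python

  def split_subroutine_elements(elements):
      """Return (return_type, formal_arguments) from a subroutine type's members."""
      if not elements:
          # Assumption of this sketch: the return type slot is always present.
          raise ValueError("a subroutine type needs at least a return type slot")
      return elements[0], list(elements[1:])


  # The member list ``!{null}`` used for ``void foo()`` in the example listing
  # later in this document, modeled here as [None].
  return_type, arguments = split_subroutine_elements([None])

Here ``return_type`` is ``None`` and ``arguments`` is an empty list, which
corresponds to a function that returns nothing and takes no arguments.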
 515
 516 :ref:`Composite type <format_composite_type>` location can be determined from
 517 the context and line number.  The size, alignment and offset are expressed in
 518 bits and can be 64 bit values.  The alignment is used to round the offset when
 519 embedded in a :ref:`composite type <format_composite_type>` (as an example, to
 520 keep doubles on 64 bit boundaries).  The offset is the bit offset if
 521 embedded in a :ref:`composite type <format_composite_type>`.
 522
 523 .. _format_subrange:
 524
 525 Subrange descriptors
 526 ^^^^^^^^^^^^^^^^^^^^
 527
 528 .. code-block:: llvm
 529
 530   !42 = metadata !{
 531     i32,    ;; Tag = 33 (DW_TAG_subrange_type)
 532     i64,    ;; Low value
 533     i64     ;; High value
 534   }
 535
 536 These descriptors are used to define ranges of array subscripts for an array
 537 :ref:`composite type <format_composite_type>`.  The low value defines the lower
 538 bound, typically zero for C/C++.  The high value is the upper bound.  Values
 539 are 64 bit.  ``High - Low + 1`` is the size of the array.  If ``Low > High``
 540 the array bounds are not included in generated debugging information.
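
The size rule can be written down directly.  In this sketch the
``subrange_element_count`` helper is a hypothetical name, and returning
``None`` is simply how the sketch models bounds that are left out of the
generated debugging information.

.. code-block:: python

  def subrange_element_count(low, high):
      """Number of elements described by a subrange, or None when Low > High."""
      if low > high:
          return None  # bounds are omitted from the generated debug information
      return high - low + 1


  # A C array declared as ``int a[10]`` has subscripts 0 through 9.
  count = subrange_element_count(0, 9)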
 541
 542 .. _format_enumerator:
 543
 544 Enumerator descriptors
 545 ^^^^^^^^^^^^^^^^^^^^^^
 546
 547 .. code-block:: llvm
 548
 549   !6 = metadata !{
 550     i32,      ;; Tag = 40 (DW_TAG_enumerator)
 551     metadata, ;; Name
 552     i64       ;; Value
 553   }
 554
 555 These descriptors are used to define members of an enumeration :ref:`composite
 556 type <format_composite_type>`; each associates a name with its value.
 557
 558 Local variables
 559 ^^^^^^^^^^^^^^^
 560
 561 .. code-block:: llvm
 562
 563   !7 = metadata !{
 564     i32,      ;; Tag (see below)
 565     metadata, ;; Context
 566     metadata, ;; Name
 567     metadata, ;; Reference to file where defined
 568     i32,      ;; 24 bit - Line number where defined
 569               ;; 8 bit - Argument number. 1 indicates 1st argument.
 570     metadata, ;; Reference to the type descriptor
 571     i32,      ;; flags
 572     metadata  ;; (optional) Reference to inline location
 573   }
 574
 575 These descriptors are used to define variables local to a subprogram.  The
 576 value of the tag depends on the usage of the variable:
 577
 578 .. code-block:: llvm
 579
 580   DW_TAG_auto_variable   = 256
 581   DW_TAG_arg_variable    = 257
 582
 583 An auto variable is any variable declared in the body of the function.  An
 584 argument variable is any variable that appears as a formal argument to the
 585 function.
 586
 587 The context is either the subprogram or block where the variable is defined.
 588 Name is the source variable name.  Context and line indicate where the variable
 589 was defined.  Type descriptor defines the declared type of the variable.
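
The descriptor above packs the line number and the argument number into a
single ``i32``.  The sketch below shows one way to pack and unpack such a
field.  It assumes the line number occupies the low 24 bits and the argument
number the high 8 bits; this document only states the widths, so the bit order
and the helper names are assumptions.

.. code-block:: python

  LINE_BITS = 24
  LINE_MASK = (1 << LINE_BITS) - 1  # 0x00FFFFFF
  ARG_MASK = 0xFF


  def pack_line_and_arg(line, arg_number):
      """Combine a line number and an argument number into one 32 bit value."""
      if not 0 <= line <= LINE_MASK:
          raise ValueError("line number does not fit in 24 bits")
      if not 0 <= arg_number <= ARG_MASK:
          raise ValueError("argument number does not fit in 8 bits")
      return (arg_number << LINE_BITS) | line


  def unpack_line_and_arg(field):
      """Return (line, arg_number) from a packed 32 bit value."""
      return field & LINE_MASK, (field >> LINE_BITS) & ARG_MASK

With this layout an auto variable (argument number 0) stores just its line
number, which is consistent with the ``i32 2``, ``i32 3`` and ``i32 5`` fields
of the variables ``X``, ``Y`` and ``Z`` in the example listing later in this
document.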
 590
 591 .. _format_common_intrinsics:
 592
 593 Debugger intrinsic functions
 594 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 595
 596 LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
 597 provide debug information at various points in generated code.
 598
 599 ``llvm.dbg.declare``
 600 ^^^^^^^^^^^^^^^^^^^^
 601
 602 .. code-block:: llvm
 603
 604   void @llvm.dbg.declare(metadata, metadata)
 605
 606 This intrinsic provides information about a local element (e.g., variable).
 607 The first argument is metadata holding the alloca for the variable.  The second
 608 argument is metadata containing a description of the variable.
 609
 610 ``llvm.dbg.value``
 611 ^^^^^^^^^^^^^^^^^^
 612
 613 .. code-block:: llvm
 614
 615   void @llvm.dbg.value(metadata, i64, metadata)
 616
 617 This intrinsic provides information when a user source variable is set to a new
 618 value.  The first argument is the new value (wrapped as metadata).  The second
 619 argument is the offset in the user source variable where the new value is
 620 written.  The third argument is metadata containing a description of the user
 621 source variable.
 622
 623 Object lifetimes and scoping
 624 ============================
 625
 626 In many languages, the local variables in functions can have their lifetimes or
 627 scopes limited to a subset of a function.  In the C family of languages, for
 628 example, variables are only live (readable and writable) within the source
 629 block that they are defined in.  In functional languages, values are only
 630 readable after they have been defined.  Though this is a very obvious concept,
 631 it is non-trivial to model in LLVM, because it has no notion of scoping in this
 632 sense, and does not want to be tied to a language's scoping rules.
 633
 634 In order to handle this, the LLVM debug format uses the metadata attached to
 635 LLVM instructions to encode line number and scoping information.  Consider the
 636 following C fragment, for example:
 637
 638 .. code-block:: c
 639
 640   1.  void foo() {
 641   2.    int X = 21;
 642   3.    int Y = 22;
 643   4.    {
 644   5.      int Z = 23;
 645   6.      Z = X;
 646   7.    }
 647   8.    X = Y;
 648   9.  }
 649
 650 Compiled to LLVM, this function would be represented like this:
 651
 652 .. code-block:: llvm
 653
 654   define void @foo() #0 {
 655   entry:
 656     %X = alloca i32, align 4
 657     %Y = alloca i32, align 4
 658     %Z = alloca i32, align 4
 659     call void @llvm.dbg.declare(metadata !{i32* %X}, metadata !10), !dbg !12
 660       ; [debug line = 2:7] [debug variable = X]
 661     store i32 21, i32* %X, align 4, !dbg !12
 662     call void @llvm.dbg.declare(metadata !{i32* %Y}, metadata !13), !dbg !14
 663       ; [debug line = 3:7] [debug variable = Y]
 664     store i32 22, i32* %Y, align 4, !dbg !14
 665     call void @llvm.dbg.declare(metadata !{i32* %Z}, metadata !15), !dbg !17
 666       ; [debug line = 5:9] [debug variable = Z]
 667     store i32 23, i32* %Z, align 4, !dbg !17
 668     %0 = load i32* %X, align 4, !dbg !18
 669       ; [debug line = 6:5]
 670     store i32 %0, i32* %Z, align 4, !dbg !18
 671     %1 = load i32* %Y, align 4, !dbg !19
 672       ; [debug line = 8:3]
 673     store i32 %1, i32* %X, align 4, !dbg !19
 674     ret void, !dbg !20
 675   }
 676
 677   ; Function Attrs: nounwind readnone
 678   declare void @llvm.dbg.declare(metadata, metadata) #1
 679
 680   attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false"
 681     "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"
 682     "no-infs-fp-math"="false" "no-nans-fp-math"="false"
 683     "stack-protector-buffer-size"="8" "unsafe-fp-math"="false"
 684     "use-soft-float"="false" }
 685   attributes #1 = { nounwind readnone }
 686
 687   !llvm.dbg.cu = !{!0}
 688   !llvm.module.flags = !{!8}
 689   !llvm.ident = !{!9}
 690
 691   !0 = metadata !{i32 786449, metadata !1, i32 12,
 692                   metadata !"clang version 3.4 (trunk 193128) (llvm/trunk 193139)",
 693                   i1 false, metadata !"", i32 0, metadata !2, metadata !2, metadata !3,
 694                   metadata !2, metadata !2, metadata !""} ; [ DW_TAG_compile_unit ] \
 695                     [/private/tmp/t.c] \
 696                     [DW_LANG_C99]
 697   !1 = metadata !{metadata !"t.c", metadata !"/private/tmp"}
 698   !2 = metadata !{i32 0}
 699   !3 = metadata !{metadata !4}
 700   !4 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"foo",
 701                   metadata !"foo", metadata !"", i32 1, metadata !6,
 702                   i1 false, i1 true, i32 0, i32 0, null, i32 0, i1 false,
 703                   void ()* @foo, null, null, metadata !2, i32 1}
 704                   ; [ DW_TAG_subprogram ] [line 1] [def] [foo]
 705   !5 = metadata !{i32 786473, metadata !1}  ; [ DW_TAG_file_type ] \
 706                     [/private/tmp/t.c]
 707   !6 = metadata !{i32 786453, i32 0, null, metadata !"", i32 0, i64 0, i64 0,
 708                   i64 0, i32 0, null, metadata !7, i32 0, null, null, null}
 709                   ; [ DW_TAG_subroutine_type ] \
 710                     [line 0, size 0, align 0, offset 0] [from ]
 711   !7 = metadata !{null}
 712   !8 = metadata !{i32 2, metadata !"Dwarf Version", i32 2}
 713   !9 = metadata !{metadata !"clang version 3.4 (trunk 193128) (llvm/trunk 193139)"}
 714   !10 = metadata !{i32 786688, metadata !4, metadata !"X", metadata !5, i32 2,
 715                    metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [X] \
 716                      [line 2]
 717   !11 = metadata !{i32 786468, null, null, metadata !"int", i32 0, i64 32,
 718                    i64 32, i64 0, i32 0, i32 5} ; [ DW_TAG_base_type ] [int] \
 719                      [line 0, size 32, align 32, offset 0, enc DW_ATE_signed]
 720   !12 = metadata !{i32 2, i32 0, metadata !4, null}
 721   !13 = metadata !{i32 786688, metadata !4, metadata !"Y", metadata !5, i32 3,
 722                    metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [Y] \
 723                      [line 3]
 724   !14 = metadata !{i32 3, i32 0, metadata !4, null}
 725   !15 = metadata !{i32 786688, metadata !16, metadata !"Z", metadata !5, i32 5,
 726                    metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [Z] \
 727                      [line 5]
 728   !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0,
 729                    i32 0} \
 730                    ; [ DW_TAG_lexical_block ] [/private/tmp/t.c]
 731   !17 = metadata !{i32 5, i32 0, metadata !16, null}
 732   !18 = metadata !{i32 6, i32 0, metadata !16, null}
 733   !19 = metadata !{i32 8, i32 0, metadata !4, null}
 734   !20 = metadata !{i32 9, i32 0, metadata !4, null}
 735
 736 This example illustrates a few important details about LLVM debugging
 737 information.  In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
 738 location information, which are attached to an instruction, are applied
 739 together to allow a debugger to analyze the relationship between statements,
 740 variable definitions, and the code used to implement the function.
 741
 742 .. code-block:: llvm
 743
 744   call void @llvm.dbg.declare(metadata !{i32* %X}, metadata !10), !dbg !12
 745     ; [debug line = 2:0] [debug variable = X]
 746
 747 The first ``llvm.dbg.declare`` intrinsic call encodes debugging information for
 748 the variable ``X``.  The metadata ``!dbg !12`` attached to the call provides
 749 scope information for the variable ``X``.
 750
 751 .. code-block:: llvm
 752
 753   !12 = metadata !{i32 2, i32 0, metadata !4, null}
 754   !4 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"foo",
 755                   metadata !"foo", metadata !"", i32 1, metadata !6,
 756                   i1 false, i1 true, i32 0, i32 0, null, i32 0, i1 false,
 757                   void ()* @foo, null, null, metadata !2, i32 1}
 758                     ; [ DW_TAG_subprogram ] [line 1] [def] [foo]
 759
 760 Here ``!12`` is metadata providing location information.  It has four fields:
 761 line number, column number, scope, and original scope.  The original scope
 762 represents inline location if this instruction is inlined inside a caller, and
 763 is null otherwise.  In this example, scope is encoded by ``!4``, a
 764 :ref:`subprogram descriptor <format_subprograms>`.  This way the location
 765 information attached to the intrinsics indicates that the variable ``X`` is
 766 declared at line number 2 at a function level scope in function ``foo``.
 767
 768 Now let's take another example.
 769
 770 .. code-block:: llvm
 771
 772   call void @llvm.dbg.declare(metadata !{i32* %Z}, metadata !15), !dbg !17
 773     ; [debug line = 5:0] [debug variable = Z]
 774
 775 The third ``llvm.dbg.declare`` intrinsic call encodes debugging information for
 776 the variable ``Z``.  The metadata ``!dbg !17`` attached to the call provides
 777 scope information for the variable ``Z``.
 778
 779 .. code-block:: llvm
 780
 781   !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0,
 782                    i32 0}
 783                    ; [ DW_TAG_lexical_block ] [/private/tmp/t.c]
 784   !17 = metadata !{i32 5, i32 0, metadata !16, null}
 785
 786 Here ``!17`` indicates that ``Z`` is declared at line number 5 and
 787 column number 0 inside of lexical scope ``!16``.  The lexical scope itself
 788 resides inside of subprogram ``!4`` described above.
 789
 790 The scope information attached to each instruction provides a straightforward
 791 way to find instructions covered by a scope.
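
     The "covered by a scope" relation can be sketched outside of LLVM as a walk
     up the chain of enclosing scopes.  The ``Scope`` and ``Location`` structs and
     both helper functions below are hypothetical stand-ins for the descriptors
     shown above, not LLVM classes; the sample data mirrors locations ``!12``,
     ``!14``, ``!17``, ``!18``, ``!19`` and ``!20`` from the listing.

     .. code-block:: c++

       #include <iostream>
       #include <vector>

       // Hypothetical model of a scope descriptor (subprogram or lexical block).
       struct Scope {
         const char *Name;
         const Scope *Parent; // Enclosing scope; nullptr for a subprogram.
       };

       // Hypothetical model of the location tuple: line, column, scope.
       struct Location {
         unsigned Line;
         unsigned Column;
         const Scope *S;
       };

       // True if Inner is Outer itself or is nested at any depth inside Outer.
       bool isCoveredBy(const Scope *Inner, const Scope *Outer) {
         for (const Scope *Cur = Inner; Cur; Cur = Cur->Parent)
           if (Cur == Outer)
             return true;
         return false;
       }

       // Count the locations whose scope chain reaches Outer.
       unsigned countCovered(const std::vector<Location> &Locs, const Scope *Outer) {
         unsigned N = 0;
         for (const Location &L : Locs)
           if (isCoveredBy(L.S, Outer))
             ++N;
         return N;
       }

       int main() {
         Scope Foo = {"!4 (subprogram foo)", nullptr};
         Scope Block = {"!16 (lexical block)", &Foo};
         std::vector<Location> Locs = {{2, 0, &Foo},   {3, 0, &Foo}, {5, 0, &Block},
                                       {6, 0, &Block}, {8, 0, &Foo}, {9, 0, &Foo}};
         std::cout << "covered by block: " << countCovered(Locs, &Block) << "\n";
         std::cout << "covered by foo:   " << countCovered(Locs, &Foo) << "\n";
         return 0;
       }

     Because the lexical block is nested in ``foo``, every location in the block
     is also counted for ``foo``.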
 792
 793 .. _ccxx_frontend:
 794
 795 C/C++ front-end specific debug information
 796 ==========================================
 797
 798 The C and C++ front-ends represent information about the program in a format
 799 that is effectively identical to `DWARF 3.0
 800 <http://www.eagercon.com/dwarf/dwarf3std.htm>`_ in terms of information
 801 content.  This allows code generators to trivially support native debuggers by
 802 generating standard DWARF information, and contains enough information for
 803 non-DWARF targets to translate it as needed.
 804
 805 This section describes the forms used to represent C and C++ programs.  Other
 806 languages could pattern themselves after this (which itself is tuned to
 807 representing programs in the same way that DWARF 3 does), or they could choose
 808 to provide completely different forms if they don't fit into the DWARF model.
 809 As support for debugging information gets added to the various LLVM
 810 source-language front-ends, the information used should be documented here.
 811
 812 The following sections provide examples of various C/C++ constructs and the
 813 debug information that would best describe those constructs.
 814
 815 C/C++ source file information
 816 -----------------------------
 817
 818 Given the source files ``MySource.cpp`` and ``MyHeader.h`` located in the
 819 directory ``/Users/mine/sources``, and the following code in ``MySource.cpp``:
 820
 821 .. code-block:: c
 822
 823   #include "MyHeader.h"
 824
 825   int main(int argc, char *argv[]) {
 826     return 0;
 827   }
 828
 829 a C/C++ front-end would generate the following descriptors:
 830
 831 .. code-block:: llvm
 832
 833   ...
 834   ;;
 835   ;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
 836   ;;
 837   !0 = metadata !{
 838     i32 786449,   ;; Tag
 839     metadata !1,  ;; File/directory name
 840     i32 4,        ;; Language Id
 841     metadata !"clang version 3.4 ", ;; Producer
 842     i1 false,     ;; Optimized compile unit
 843     metadata !"", ;; Compiler flags
 844     i32 0,        ;; Runtime version
 845     metadata !2,  ;; Enumeration types
 846     metadata !2,  ;; Retained types
 847     metadata !3,  ;; Subprograms
 848     metadata !2,  ;; Global variables
 849     metadata !2,  ;; Imported entities (declarations and namespaces)
 850     metadata !""  ;; Split debug filename
 851   }
 852
 853   ;;
 854   ;; Define the file for the file "/Users/mine/sources/MySource.cpp".
 855   ;;
 856   !1 = metadata !{
 857     metadata !"MySource.cpp",
 858     metadata !"/Users/mine/sources"
 859   }
 860   !5 = metadata !{
 861     i32 786473, ;; Tag
 862     metadata !1
 863   }
 864
 865   ;;
 866   ;; Define the file for the file "/Users/mine/sources/MyHeader.h"
 867   ;;
 868   !14 = metadata !{
 869     i32 786473, ;; Tag
 870     metadata !15
 871   }
 872   !15 = metadata !{
 873     metadata !"./MyHeader.h",
 874     metadata !"/Users/mine/sources"
 875   }
 876
 877   ...
 878
 879 ``llvm::Instruction`` provides easy access to metadata attached to an
 880 instruction.  One can extract line number information encoded in LLVM IR using
 881 ``Instruction::getMetadata()`` and ``DILocation::getLineNumber()``.
 882
 883 .. code-block:: c++
 884
 885   if (MDNode *N = I->getMetadata("dbg")) {  // Here I is an LLVM instruction
 886     DILocation Loc(N);                      // DILocation is in DebugInfo.h
 887     unsigned Line = Loc.getLineNumber();
 888     StringRef File = Loc.getFilename();
 889     StringRef Dir = Loc.getDirectory();
 890   }
 891
 892 C/C++ global variable information
 893 ---------------------------------
 894
 895 Given an integer global variable declared as follows:
 896
 897 .. code-block:: c
 898
 899   int MyGlobal = 100;
 900
 901 a C/C++ front-end would generate the following descriptors:
 902
 903 .. code-block:: llvm
 904
 905   ;;
 906   ;; Define the global itself.
 907   ;;
 908   @MyGlobal = global i32 100
 909   ...
 910   ;;
 911   ;; List of compile units carrying debug info
 912   ;;
 913   !llvm.dbg.cu = !{!0}
 914
 915   ;; Define the compile unit.
 916   !0 = metadata !{
 917     i32 786449,                       ;; Tag
 918     i32 0,                            ;; Context
 919     i32 4,                            ;; Language
 920     metadata !"foo.cpp",              ;; File
 921     metadata !"/Volumes/Data/tmp",    ;; Directory
 922     metadata !"clang version 3.1 ",   ;; Producer
 923     i1 true,                          ;; Deprecated field
 924     i1 false,                         ;; "isOptimized"?
 925     metadata !"",                     ;; Flags
 926     i32 0,                            ;; Runtime Version
 927     metadata !1,                      ;; Enum Types
 928     metadata !1,                      ;; Retained Types
 929     metadata !1,                      ;; Subprograms
 930     metadata !3,                      ;; Global Variables
 931     metadata !1,                      ;; Imported entities
 932     metadata !""                      ;; Split debug filename
 933   } ; [ DW_TAG_compile_unit ]
 934
 935   ;; The Array of Global Variables
 936   !3 = metadata !{
 937     metadata !4
 938   }
 939
 940   ;;
 941   ;; Define the global variable itself.
 942   ;;
 943   !4 = metadata !{
 944     i32 786484,                        ;; Tag
 945     i32 0,                             ;; Unused
 946     null,                              ;; Unused
 947     metadata !"MyGlobal",              ;; Name
 948     metadata !"MyGlobal",              ;; Display Name
 949     metadata !"",                      ;; Linkage Name
 950     metadata !6,                       ;; File
 951     i32 1,                             ;; Line
 952     metadata !7,                       ;; Type
 953     i32 0,                             ;; IsLocalToUnit
 954     i32 1,                             ;; IsDefinition
 955     i32* @MyGlobal,                    ;; LLVM-IR Value
 956     null                               ;; Static member declaration
 957   } ; [ DW_TAG_variable ]
 958
 959   ;;
 960   ;; Define the file
 961   ;;
 962   !5 = metadata !{
 963     metadata !"foo.cpp",               ;; File
 964     metadata !"/Volumes/Data/tmp"      ;; Directory
 965   }
 966   !6 = metadata !{
 967     i32 786473,                        ;; Tag
 968     metadata !5                        ;; File/directory pair
 969   } ; [ DW_TAG_file_type ]
 970
 971   ;;
 972   ;; Define the type
 973   ;;
 974   !7 = metadata !{
 975     i32 786468,                         ;; Tag
 976     null,                               ;; Unused
 977     null,                               ;; Unused
 978     metadata !"int",                    ;; Name
 979     i32 0,                              ;; Line
 980     i64 32,                             ;; Size in Bits
 981     i64 32,                             ;; Align in Bits
 982     i64 0,                              ;; Offset
 983     i32 0,                              ;; Flags
 984     i32 5                               ;; Encoding
 985   } ; [ DW_TAG_base_type ]
 986
 987 C/C++ function information
 988 --------------------------
 989
 990 Given a function declared as follows:
 991
 992 .. code-block:: c
 993
 994   int main(int argc, char *argv[]) {
 995     return 0;
 996   }
 997
 998 a C/C++ front-end would generate the following descriptors:
 999
1000 .. code-block:: llvm
1001
1002   ;;
1003   ;; Define the subprogram descriptor.
1004   ;;
1005   !6 = metadata !{
1006     i32 786478,        ;; Tag
1007     metadata !1,       ;; File
1008     metadata !1,       ;; Context
1009     metadata !"main",  ;; Name
1010     metadata !"main",  ;; Display name
1011     metadata !"main",  ;; Linkage name
1012     i32 1,             ;; Line number
1013     metadata !4,       ;; Type
1014     i1 false,          ;; Is local
1015     i1 true,           ;; Is definition
1016     i32 0,             ;; Virtuality attribute, e.g. pure virtual function
1017     i32 0,             ;; Index into virtual table for C++ methods
1018     null,              ;; Type that holds virtual table.
1019     i32 0,             ;; Flags
1020     i1 false,          ;; True if this function is optimized
1021     i32 (i32, i8**)* @main, ;; Pointer to llvm::Function
1022     null,              ;; Function template parameters
1023     null,              ;; List of function variables (emitted when optimizing)
1024     i32 1              ;; Line number of the opening '{' of the function
1025   }
1026   ;;
1027   ;; Define the subprogram itself.
1028   ;;
1029   define i32 @main(i32 %argc, i8** %argv) {
1030   ...
1031   }
1032
1033 C/C++ basic types
1034 -----------------
1035
1036 The following are the basic type descriptors for C/C++ core types:
1037
1038 bool
1039 ^^^^
1040
1041 .. code-block:: llvm
1042
1043   !2 = metadata !{
1044     i32 786468,        ;; Tag
1045     null,              ;; File
1046     null,              ;; Context
1047     metadata !"bool",  ;; Name
1048     i32 0,             ;; Line number
1049     i64 8,             ;; Size in Bits
1050     i64 8,             ;; Align in Bits
1051     i64 0,             ;; Offset in Bits
1052     i32 0,             ;; Flags
1053     i32 2              ;; Encoding
1054   }
1055
1056 char
1057 ^^^^
1058
1059 .. code-block:: llvm
1060
1061   !2 = metadata !{
1062     i32 786468,        ;; Tag
1063     null,              ;; File
1064     null,              ;; Context
1065     metadata !"char",  ;; Name
1066     i32 0,             ;; Line number
1067     i64 8,             ;; Size in Bits
1068     i64 8,             ;; Align in Bits
1069     i64 0,             ;; Offset in Bits
1070     i32 0,             ;; Flags
1071     i32 6              ;; Encoding
1072   }
1073
1074 unsigned char
1075 ^^^^^^^^^^^^^
1076
1077 .. code-block:: llvm
1078
1079   !2 = metadata !{
1080     i32 786468,        ;; Tag
1081     null,              ;; File
1082     null,              ;; Context
1083     metadata !"unsigned char",
1084     i32 0,             ;; Line number
1085     i64 8,             ;; Size in Bits
1086     i64 8,             ;; Align in Bits
1087     i64 0,             ;; Offset in Bits
1088     i32 0,             ;; Flags
1089     i32 8              ;; Encoding
1090   }
1091
1092 short
1093 ^^^^^
1094
1095 .. code-block:: llvm
1096
1097   !2 = metadata !{
1098     i32 786468,        ;; Tag
1099     null,              ;; File
1100     null,              ;; Context
1101     metadata !"short int",
1102     i32 0,             ;; Line number
1103     i64 16,            ;; Size in Bits
1104     i64 16,            ;; Align in Bits
1105     i64 0,             ;; Offset in Bits
1106     i32 0,             ;; Flags
1107     i32 5              ;; Encoding
1108   }
1109
1110 unsigned short
1111 ^^^^^^^^^^^^^^
1112
1113 .. code-block:: llvm
1114
1115   !2 = metadata !{
1116     i32 786468,        ;; Tag
1117     null,              ;; File
1118     null,              ;; Context
1119     metadata !"short unsigned int",
1120     i32 0,             ;; Line number
1121     i64 16,            ;; Size in Bits
1122     i64 16,            ;; Align in Bits
1123     i64 0,             ;; Offset in Bits
1124     i32 0,             ;; Flags
1125     i32 7              ;; Encoding
1126   }
1127
1128 int
1129 ^^^
1130
1131 .. code-block:: llvm
1132
1133   !2 = metadata !{
1134     i32 786468,        ;; Tag
1135     null,              ;; File
1136     null,              ;; Context
1137     metadata !"int",   ;; Name
1138     i32 0,             ;; Line number
1139     i64 32,            ;; Size in Bits
1140     i64 32,            ;; Align in Bits
1141     i64 0,             ;; Offset in Bits
1142     i32 0,             ;; Flags
1143     i32 5              ;; Encoding
1144   }
1145
1146 unsigned int
1147 ^^^^^^^^^^^^
1148
1149 .. code-block:: llvm
1150
1151   !2 = metadata !{
1152     i32 786468,        ;; Tag
1153     null,              ;; File
1154     null,              ;; Context
1155     metadata !"unsigned int",
1156     i32 0,             ;; Line number
1157     i64 32,            ;; Size in Bits
1158     i64 32,            ;; Align in Bits
1159     i64 0,             ;; Offset in Bits
1160     i32 0,             ;; Flags
1161     i32 7              ;; Encoding
1162   }
1163
1164 long long
1165 ^^^^^^^^^
1166
1167 .. code-block:: llvm
1168
1169   !2 = metadata !{
1170     i32 786468,        ;; Tag
1171     null,              ;; File
1172     null,              ;; Context
1173     metadata !"long long int",
1174     i32 0,             ;; Line number
1175     i64 64,            ;; Size in Bits
1176     i64 64,            ;; Align in Bits
1177     i64 0,             ;; Offset in Bits
1178     i32 0,             ;; Flags
1179     i32 5              ;; Encoding
1180   }
1181
1182 unsigned long long
1183 ^^^^^^^^^^^^^^^^^^
1184
1185 .. code-block:: llvm
1186
1187   !2 = metadata !{
1188     i32 786468,        ;; Tag
1189     null,              ;; File
1190     null,              ;; Context
1191     metadata !"long long unsigned int",
1192     i32 0,             ;; Line number
1193     i64 64,            ;; Size in Bits
1194     i64 64,            ;; Align in Bits
1195     i64 0,             ;; Offset in Bits
1196     i32 0,             ;; Flags
1197     i32 7              ;; Encoding
1198   }
1199
1200 float
1201 ^^^^^
1202
1203 .. code-block:: llvm
1204
1205   !2 = metadata !{
1206     i32 786468,        ;; Tag
1207     null,              ;; File
1208     null,              ;; Context
1209     metadata !"float",
1210     i32 0,             ;; Line number
1211     i64 32,            ;; Size in Bits
1212     i64 32,            ;; Align in Bits
1213     i64 0,             ;; Offset in Bits
1214     i32 0,             ;; Flags
1215     i32 4              ;; Encoding
1216   }
1217
1218 double
1219 ^^^^^^
1220
1221 .. code-block:: llvm
1222
1223   !2 = metadata !{
1224     i32 786468,        ;; Tag
1225     null,              ;; File
1226     null,              ;; Context
1227     metadata !"double", ;; Name
1228     i32 0,             ;; Line number
1229     i64 64,            ;; Size in Bits
1230     i64 64,            ;; Align in Bits
1231     i64 0,             ;; Offset in Bits
1232     i32 0,             ;; Flags
1233     i32 4              ;; Encoding
1234   }
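
     The ``Tag`` and ``Encoding`` fields in the descriptors above are plain
     integers.  The sketch below decodes them, assuming that this version of the
     format stores the DWARF tag in the low 16 bits and a debug-info version of
     ``12 << 16`` in the high bits, and that ``Encoding`` holds a DWARF
     ``DW_ATE_*`` code.  The function names are hypothetical, and only the
     encodings used in this section are listed.

     .. code-block:: c++

       #include <cstdint>
       #include <iostream>

       // Assumption: the debug-info version of this format, OR'ed into each tag.
       constexpr uint32_t LLVMDebugVersion = 12u << 16; // 786432
       constexpr uint32_t LLVMDebugVersionMask = 0xffff0000u;

       // Strip the version bits and return the DWARF tag value.
       uint32_t dwarfTag(uint32_t Field) { return Field & ~LLVMDebugVersionMask; }

       // Map the Encoding field of a basic type to its DWARF name.
       const char *encodingName(uint32_t Enc) {
         switch (Enc) {
         case 0x02: return "DW_ATE_boolean";
         case 0x04: return "DW_ATE_float";
         case 0x05: return "DW_ATE_signed";
         case 0x06: return "DW_ATE_signed_char";
         case 0x07: return "DW_ATE_unsigned";
         case 0x08: return "DW_ATE_unsigned_char";
         default:   return "(not covered by this sketch)";
         }
       }

       int main() {
         // 786468 is the tag of every basic type descriptor above.
         std::cout << "tag 0x" << std::hex << dwarfTag(786468) << "\n"; // 0x24
         std::cout << "int:   " << encodingName(5) << "\n";
         std::cout << "float: " << encodingName(4) << "\n";
         return 0;
       }

     The value ``0x24`` is ``DW_TAG_base_type`` in DWARF, which matches the role
     of these descriptors.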
1235
1236 C/C++ derived types
1237 -------------------
1238
1239 Given the following as an example of a C/C++ derived type:
1240
1241 .. code-block:: c
1242
1243   typedef const int *IntPtr;
1244
1245 a C/C++ front-end would generate the following descriptors:
1246
1247 .. code-block:: llvm
1248
1249   ;;
1250   ;; Define the typedef "IntPtr".
1251   ;;
1252   !2 = metadata !{
1253     i32 786454,          ;; Tag
1254     metadata !3,         ;; File
1255     metadata !1,         ;; Context
1256     metadata !"IntPtr",  ;; Name
1257     i32 0,               ;; Line number
1258     i64 0,               ;; Size in bits
1259     i64 0,               ;; Align in bits
1260     i64 0,               ;; Offset in bits
1261     i32 0,               ;; Flags
1262     metadata !4          ;; Derived From type
1263   }
1264   ;;
1265   ;; Define the pointer type.
1266   ;;
1267   !4 = metadata !{
1268     i32 786447,          ;; Tag
1269     null,                ;; File
1270     null,                ;; Context
1271     metadata !"",        ;; Name
1272     i32 0,               ;; Line number
1273     i64 64,              ;; Size in bits
1274     i64 64,              ;; Align in bits
1275     i64 0,               ;; Offset in bits
1276     i32 0,               ;; Flags
1277     metadata !5          ;; Derived From type
1278   }
1279   ;;
1280   ;; Define the const type.
1281   ;;
1282   !5 = metadata !{
1283     i32 786470,          ;; Tag
1284     null,                ;; File
1285     null,                ;; Context
1286     metadata !"",        ;; Name
1287     i32 0,               ;; Line number
1288     i64 0,               ;; Size in bits
1289     i64 0,               ;; Align in bits
1290     i64 0,               ;; Offset in bits
1291     i32 0,               ;; Flags
1292     metadata !6          ;; Derived From type
1293   }
1294   ;;
1295   ;; Define the int type.
1296   ;;
1297   !6 = metadata !{
1298     i32 786468,          ;; Tag
1299     null,                ;; File
1300     null,                ;; Context
1301     metadata !"int",     ;; Name
1302     i32 0,               ;; Line number
1303     i64 32,              ;; Size in bits
1304     i64 32,              ;; Align in bits
1305     i64 0,               ;; Offset in bits
1306     i32 0,               ;; Flags
1307     i32 5                ;; Encoding
1308   }
1309
1310 C/C++ struct/union types
1311 ------------------------
1312
1313 Given the following as an example of a C/C++ struct type:
1314
1315 .. code-block:: c
1316
1317   struct Color {
1318     unsigned Red;
1319     unsigned Green;
1320     unsigned Blue;
1321   };
1322
1323 a C/C++ front-end would generate the following descriptors:
1324
1325 .. code-block:: llvm
1326
1327   ;;
1328   ;; Define basic type for unsigned int.
1329   ;;
1330   !5 = metadata !{
1331     i32 786468,        ;; Tag
1332     null,              ;; File
1333     null,              ;; Context
1334     metadata !"unsigned int",
1335     i32 0,             ;; Line number
1336     i64 32,            ;; Size in Bits
1337     i64 32,            ;; Align in Bits
1338     i64 0,             ;; Offset in Bits
1339     i32 0,             ;; Flags
1340     i32 7              ;; Encoding
1341   }
1342   ;;
1343   ;; Define composite type for struct Color.
1344   ;;
1345   !2 = metadata !{
1346     i32 786451,        ;; Tag
1347     metadata !1,       ;; File
1348     null,              ;; Context
1349     metadata !"Color", ;; Name
1350     i32 1,             ;; Line number
1351     i64 96,            ;; Size in bits
1352     i64 32,            ;; Align in bits
1353     i64 0,             ;; Offset in bits
1354     i32 0,             ;; Flags
1355     null,              ;; Derived From
1356     metadata !3,       ;; Elements
1357     i32 0,             ;; Runtime Language
1358     null,              ;; Base type containing the vtable pointer for this type
1359     null               ;; Template parameters
1360   }
1361
1362   ;;
1363   ;; Define the Red field.
1364   ;;
1365   !4 = metadata !{
1366     i32 786445,        ;; Tag
1367     metadata !1,       ;; File
1368     metadata !1,       ;; Context
1369     metadata !"Red",   ;; Name
1370     i32 2,             ;; Line number
1371     i64 32,            ;; Size in bits
1372     i64 32,            ;; Align in bits
1373     i64 0,             ;; Offset in bits
1374     i32 0,             ;; Flags
1375     metadata !5        ;; Derived From type
1376   }
1377
1378   ;;
1379   ;; Define the Green field.
1380   ;;
1381   !6 = metadata !{
1382     i32 786445,        ;; Tag
1383     metadata !1,       ;; File
1384     metadata !1,       ;; Context
1385     metadata !"Green", ;; Name
1386     i32 3,             ;; Line number
1387     i64 32,            ;; Size in bits
1388     i64 32,            ;; Align in bits
1389     i64 32,            ;; Offset in bits
1390     i32 0,             ;; Flags
1391     metadata !5        ;; Derived From type
1392   }
1393
1394   ;;
1395   ;; Define the Blue field.
1396   ;;
1397   !7 = metadata !{
1398     i32 786445,        ;; Tag
1399     metadata !1,       ;; File
1400     metadata !1,       ;; Context
1401     metadata !"Blue",  ;; Name
1402     i32 4,             ;; Line number
1403     i64 32,            ;; Size in bits
1404     i64 32,            ;; Align in bits
1405     i64 64,            ;; Offset in bits
1406     i32 0,             ;; Flags
1407     metadata !5        ;; Derived From type
1408   }
1409
1410   ;;
1411   ;; Define the array of fields used by the composite type Color.
1412   ;;
1413   !3 = metadata !{metadata !4, metadata !6, metadata !7}
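
     The size and offset fields of the member descriptors can be cross-checked
     against what a C++ compiler reports for the same struct.  This is only a
     sketch: it assumes 8-bit bytes, the ``bits`` helper is hypothetical, and the
     numbers match the descriptors above only on targets where ``unsigned`` is 32
     bits wide.

     .. code-block:: c++

       #include <cstddef>
       #include <iostream>

       struct Color {
         unsigned Red;
         unsigned Green;
         unsigned Blue;
       };

       // Convert a byte count to a bit count (assumes 8-bit bytes).
       constexpr std::size_t bits(std::size_t Bytes) { return Bytes * 8; }

       int main() {
         std::cout << "Color size:   " << bits(sizeof(Color)) << "\n";
         std::cout << "Color align:  " << bits(alignof(Color)) << "\n";
         std::cout << "Red offset:   " << bits(offsetof(Color, Red)) << "\n";
         std::cout << "Green offset: " << bits(offsetof(Color, Green)) << "\n";
         std::cout << "Blue offset:  " << bits(offsetof(Color, Blue)) << "\n";
         return 0;
       }

     On a target with a 32-bit ``unsigned``, the values correspond to the
     ``Size in bits``, ``Align in bits`` and ``Offset in bits`` fields of ``!2``,
     ``!4``, ``!6`` and ``!7``.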
1414
1415 C/C++ enumeration types
1416 -----------------------
1417
1418 Given the following as an example of a C/C++ enumeration type:
1419
1420 .. code-block:: c
1421
1422   enum Trees {
1423     Spruce = 100,
1424     Oak = 200,
1425     Maple = 300
1426   };
1427
1428 a C/C++ front-end would generate the following descriptors:
1429
1430 .. code-block:: llvm
1431
1432   ;;
1433   ;; Define composite type for enum Trees
1434   ;;
1435   !2 = metadata !{
1436     i32 786436,        ;; Tag
1437     metadata !1,       ;; File
1438     metadata !1,       ;; Context
1439     metadata !"Trees", ;; Name
1440     i32 1,             ;; Line number
1441     i64 32,            ;; Size in bits
1442     i64 32,            ;; Align in bits
1443     i64 0,             ;; Offset in bits
1444     i32 0,             ;; Flags
1445     null,              ;; Derived From type
1446     metadata !3,       ;; Elements
1447     i32 0              ;; Runtime language
1448   }
1449
1450   ;;
1451   ;; Define the array of enumerators used by composite type Trees.
1452   ;;
1453   !3 = metadata !{metadata !4, metadata !5, metadata !6}
1454
1455   ;;
1456   ;; Define Spruce enumerator.
1457   ;;
1458   !4 = metadata !{i32 786472, metadata !"Spruce", i64 100}
1459
1460   ;;
1461   ;; Define Oak enumerator.
1462   ;;
1463   !5 = metadata !{i32 786472, metadata !"Oak", i64 200}
1464
1465   ;;
1466   ;; Define Maple enumerator.
1467   ;;
1468   !6 = metadata !{i32 786472, metadata !"Maple", i64 300}
1469
1470 Debugging information format
1471 ============================
1472
1473 Debugging Information Extension for Objective C Properties
1474 ----------------------------------------------------------
1475
1476 Introduction
1477 ^^^^^^^^^^^^
1478
1479 Objective C provides a simpler way to declare and define accessor methods using
1480 declared properties.  The language provides features to declare a property and
1481 to let the compiler synthesize accessor methods.
1482
1483 The debugger lets developers inspect Objective C interfaces and their instance
1484 variables and class variables.  However, the debugger does not know anything
1485 about the properties defined in Objective C interfaces.  The debugger consumes
1486 information generated by the compiler in DWARF format.  The format does not support
1487 encoding of Objective C properties.  This proposal describes DWARF extensions to
1488 encode Objective C properties, which the debugger can use to let developers
1489 inspect Objective C properties.
1490
1491 Proposal
1492 ^^^^^^^^
1493
1494 Objective C properties exist separately from class members.  A property can be
1495 defined only by "setter" and "getter" selectors, and be calculated anew on each
1496 access.  Or a property can just be a direct access to some declared ivar.
1497 Finally it can have an ivar "automatically synthesized" for it by the compiler,
1498 in which case the property can be referred to in user code directly using the
1499 standard C dereference syntax as well as through the property "dot" syntax, but
1500 there is no entry in the ``@interface`` declaration corresponding to this ivar.
1501
1502 To facilitate debugging these properties, we will add a new DWARF TAG into the
1503 ``DW_TAG_structure_type`` definition for the class to hold the description of a
1504 given property, and a set of DWARF attributes that provide said description.
1505 The property tag will also contain the name and declared type of the property.
1506
1507 If there is a related ivar, there will also be a DWARF property attribute placed
1508 in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG
1509 for that property.  And in the case where the compiler synthesizes the ivar
1510 directly, the compiler is expected to generate a ``DW_TAG_member`` for that
1511 ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used
1512 to access this ivar directly in code, and with the property attribute pointing
1513 back to the property it is backing.
1514
1515 The following examples will serve as illustration for our discussion:
1516
1517 .. code-block:: objc
1518
1519   @interface I1 {
1520     int n2;
1521   }
1522
1523   @property int p1;
1524   @property int p2;
1525   @end
1526
1527   @implementation I1
1528   @synthesize p1;
1529   @synthesize p2 = n2;
1530   @end
1531
1532 This produces the following DWARF (this is a "pseudo dwarfdump" output):
1533
1534 .. code-block:: none
1535
1536   0x00000100:  TAG_structure_type [7] *
1537                  AT_APPLE_runtime_class( 0x10 )
1538                  AT_name( "I1" )
1539                  AT_decl_file( "Objc_Property.m" )
1540                  AT_decl_line( 3 )
1541
1542   0x00000110:   TAG_APPLE_property
1543                   AT_name ( "p1" )
1544                   AT_type ( {0x00000150} ( int ) )
1545
1546   0x00000120:   TAG_APPLE_property
1547                   AT_name ( "p2" )
1548                   AT_type ( {0x00000150} ( int ) )
1549
1550   0x00000130:   TAG_member [8]
1551                   AT_name( "_p1" )
1552                   AT_APPLE_property ( {0x00000110} "p1" )
1553                   AT_type( {0x00000150} ( int ) )
1554                   AT_artificial ( 0x1 )
1555
1556   0x00000140:   TAG_member [8]
1557                   AT_name( "n2" )
1558                   AT_APPLE_property ( {0x00000120} "p2" )
1559                   AT_type( {0x00000150} ( int ) )
1560
1561   0x00000150:  AT_type( ( int ) )
1562
1563 Note, the current convention is that the name of the ivar for an
1564 auto-synthesized property is the name of the property from which it derives
1565 with an underscore prepended, as is shown in the example.  But we actually
1566 don't need to know this convention, since we are given the name of the ivar
1567 directly.
1568
1569 Also, it is common practice in ObjC to have different property declarations in
1570 the ``@interface`` and ``@implementation``, e.g. to provide a read-only property
1571 in the interface, and a read-write property in the implementation.  In that case,
1572 the compiler should emit whichever property declaration will be in force in the
1573 current translation unit.
1574
1575 Developers can decorate a property with attributes which are encoded using
1576 ``DW_AT_APPLE_property_attribute``.
1577
1578 .. code-block:: objc
1579
1580   @property (readonly, nonatomic) int pr;
1581
1582 .. code-block:: none
1583
1584   TAG_APPLE_property [8]
1585     AT_name( "pr" )
1586     AT_type ( {0x00000147} (int) )
1587     AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
1588
1589 The setter and getter method names are attached to the property using
1590 ``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes.
1591
1592 .. code-block:: objc
1593
1594   @interface I1
1595   @property (setter=myOwnP3Setter:) int p3;
1596   -(void)myOwnP3Setter:(int)a;
1597   @end
1598
1599   @implementation I1
1600   @synthesize p3;
1601   -(void)myOwnP3Setter:(int)a{ }
1602   @end
1603
1604 The DWARF for this would be:
1605
1606 .. code-block:: none
1607
1608   0x000003bd: TAG_structure_type [7] *
1609                 AT_APPLE_runtime_class( 0x10 )
1610                 AT_name( "I1" )
1611                 AT_decl_file( "Objc_Property.m" )
1612                 AT_decl_line( 3 )
1613
1614   0x000003cd:     TAG_APPLE_property
1615                     AT_name ( "p3" )
1616                     AT_APPLE_property_setter ( "myOwnP3Setter:" )
1617                     AT_type( {0x00000147} ( int ) )
1618
1619   0x000003f3:     TAG_member [8]
1620                     AT_name( "_p3" )
1621                     AT_type ( {0x00000147} ( int ) )
1622                     AT_APPLE_property ( {0x000003cd} )
1623                     AT_artificial ( 0x1 )
1624
1625 New DWARF Tags
1626 ^^^^^^^^^^^^^^
1627
1628 +-----------------------+--------+
1629 | TAG                   | Value  |
1630 +=======================+========+
1631 | DW_TAG_APPLE_property | 0x4200 |
1632 +-----------------------+--------+
1633
1634 New DWARF Attributes
1635 ^^^^^^^^^^^^^^^^^^^^
1636
1637 +--------------------------------+--------+-----------+
1638 | Attribute                      | Value  | Classes   |
1639 +================================+========+===========+
1640 | DW_AT_APPLE_property           | 0x3fed | Reference |
1641 +--------------------------------+--------+-----------+
1642 | DW_AT_APPLE_property_getter    | 0x3fe9 | String    |
1643 +--------------------------------+--------+-----------+
1644 | DW_AT_APPLE_property_setter    | 0x3fea | String    |
1645 +--------------------------------+--------+-----------+
1646 | DW_AT_APPLE_property_attribute | 0x3feb | Constant  |
1647 +--------------------------------+--------+-----------+
1648
1649 New DWARF Constants
1650 ^^^^^^^^^^^^^^^^^^^
1651
+--------------------------------+-------+
| Name                           | Value |
+================================+=======+
| DW_APPLE_PROPERTY_readonly     | 0x1   |
+--------------------------------+-------+
| DW_APPLE_PROPERTY_readwrite    | 0x2   |
+--------------------------------+-------+
| DW_APPLE_PROPERTY_assign       | 0x4   |
+--------------------------------+-------+
| DW_APPLE_PROPERTY_retain       | 0x8   |
+--------------------------------+-------+
| DW_APPLE_PROPERTY_copy         | 0x10  |
+--------------------------------+-------+
| DW_APPLE_PROPERTY_nonatomic    | 0x20  |
+--------------------------------+-------+
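
The constants above are single-bit values, so several of them can be combined
into one ``DW_AT_APPLE_property_attribute`` value.  The following is a minimal
sketch of that, assuming the attribute is treated as a bit mask; the helper
names ``readonly_nonatomic_attr`` and ``property_has_flag`` are hypothetical
and not part of LLVM:

.. code-block:: c

  #include <stdint.h>

  /* Values taken from the constants table above. */
  enum {
    DW_APPLE_PROPERTY_readonly  = 0x01,
    DW_APPLE_PROPERTY_readwrite = 0x02,
    DW_APPLE_PROPERTY_assign    = 0x04,
    DW_APPLE_PROPERTY_retain    = 0x08,
    DW_APPLE_PROPERTY_copy      = 0x10,
    DW_APPLE_PROPERTY_nonatomic = 0x20
  };

  /* Hypothetical helper: the value for "@property (readonly, nonatomic)". */
  static uint32_t readonly_nonatomic_attr(void) {
    return DW_APPLE_PROPERTY_readonly | DW_APPLE_PROPERTY_nonatomic;
  }

  /* Hypothetical helper: non-zero if `flag` is set in `attr`. */
  static int property_has_flag(uint32_t attr, uint32_t flag) {
    return (attr & flag) != 0;
  }

For the ``pr`` property shown earlier, this combination of flags yields 0x21.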
1667
1668 Name Accelerator Tables
1669 -----------------------
1670
1671 Introduction
1672 ^^^^^^^^^^^^
1673
The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a
debugger needs.  The "``pub``" in the section name indicates that the entries
in the table are publicly visible names only.  This means no static or hidden
functions show up in the "``.debug_pubnames``".  No static variables or private
class variables are in the "``.debug_pubtypes``".  Compilers also differ in
what they add to these tables, so we can't rely on the contents being
consistent between gcc, icc, and clang.
1681
The typical query given by users tends not to match up with the contents of
these tables.  For example, the DWARF spec states that "In the case of the name
of a function member or static data member of a C++ structure, class or union,
the name presented in the "``.debug_pubnames``" section is not the simple name
given by the ``DW_AT_name`` attribute of the referenced debugging information
entry, but rather the fully qualified name of the data or function member."
So the only names in these tables for complex C++ entries are fully
qualified names.  Debugger users tend not to enter their search strings as
"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``", or
"``a::b::c``".  So the name entered in the name table must be demangled in
order to chop it up appropriately and additional names must be manually entered
into the table to make it effective as a name lookup table for debuggers to
use.
1695
1696 All debuggers currently ignore the "``.debug_pubnames``" table as a result of
1697 its inconsistent and useless public-only name content making it a waste of
1698 space in the object file.  These tables, when they are written to disk, are not
1699 sorted in any way, leaving every debugger to do its own parsing and sorting.
1700 These tables also include an inlined copy of the string values in the table
1701 itself making the tables much larger than they need to be on disk, especially
1702 for large C++ programs.
1703
1704 Can't we just fix the sections by adding all of the names we need to this
1705 table? No, because that is not what the tables are defined to contain and we
1706 won't know the difference between the old bad tables and the new good tables.
1707 At best we could make our own renamed sections that contain all of the data we
1708 need.
1709
1710 These tables are also insufficient for what a debugger like LLDB needs.  LLDB
1711 uses clang for its expression parsing where LLDB acts as a PCH.  LLDB is then
1712 often asked to look for type "``foo``" or namespace "``bar``", or list items in
1713 namespace "``baz``".  Namespaces are not included in the pubnames or pubtypes
1714 tables.  Since clang asks a lot of questions when it is parsing an expression,
1715 we need to be very fast when looking up names, as it happens a lot.  Having new
1716 accelerator tables that are optimized for very quick lookups will benefit this
1717 type of debugging experience greatly.
1718
We would like to generate name lookup tables that can be mapped into memory
from disk, and used as is, with little or no up-front parsing.  We would also
like to be able to control the exact content of these different tables so they
contain exactly what we need.  The Name Accelerator Tables were designed to fix
these issues.  In order to solve these issues we need to:

* Have a format that can be mapped into memory from disk and used as is
* Make lookups very fast
* Have an extensible table format so these tables can be made by many producers
* Contain all of the names needed for typical lookups out of the box
* Have strict rules for the contents of tables
1730
1731 Table size is important and the accelerator table format should allow the reuse
1732 of strings from common string tables so the strings for the names are not
1733 duplicated.  We also want to make sure the table is ready to be used as-is by
1734 simply mapping the table into memory with minimal header parsing.
1735
1736 The name lookups need to be fast and optimized for the kinds of lookups that
1737 debuggers tend to do.  Optimally we would like to touch as few parts of the
1738 mapped table as possible when doing a name lookup and be able to quickly find
1739 the name entry we are looking for, or discover there are no matches.  In the
1740 case of debuggers we optimized for lookups that fail most of the time.
1741
Each table that is defined should have strict, documented rules on exactly what
it contains, so that clients can rely on the content.
1744
1745 Hash Tables
1746 ^^^^^^^^^^^
1747
1748 Standard Hash Tables
1749 """"""""""""""""""""
1750
1751 Typical hash tables have a header, buckets, and each bucket points to the
1752 bucket contents:
1753
1754 .. code-block:: none
1755
1756   .------------.
1757   |  HEADER    |
1758   |------------|
1759   |  BUCKETS   |
1760   |------------|
1761   |  DATA      |
1762   `------------'
1763
1764 The BUCKETS are an array of offsets to DATA for each hash:
1765
1766 .. code-block:: none
1767
1768   .------------.
1769   | 0x00001000 | BUCKETS[0]
1770   | 0x00002000 | BUCKETS[1]
1771   | 0x00002200 | BUCKETS[2]
1772   | 0x000034f0 | BUCKETS[3]
1773   |            | ...
1774   | 0xXXXXXXXX | BUCKETS[n_buckets]
1775   '------------'
1776
So for ``bucket[3]`` in the example above, we have an offset into the table,
0x000034f0, which points to a chain of entries for the bucket.  Each entry in
the chain must contain a next pointer, the full 32 bit hash value, the string
itself, and the data for the current string value.
1781
1782 .. code-block:: none
1783
1784               .------------.
1785   0x000034f0: | 0x00003500 | next pointer
1786               | 0x12345678 | 32 bit hash
1787               | "erase"    | string value
1788               | data[n]    | HashData for this bucket
1789               |------------|
1790   0x00003500: | 0x00003550 | next pointer
1791               | 0x29273623 | 32 bit hash
1792               | "dump"     | string value
1793               | data[n]    | HashData for this bucket
1794               |------------|
1795   0x00003550: | 0x00000000 | next pointer
1796               | 0x82638293 | 32 bit hash
1797               | "main"     | string value
1798               | data[n]    | HashData for this bucket
1799               `------------'
1800
The problem with this layout for debuggers is that we need to optimize for the
negative lookup case where the symbol we're searching for is not present.  So
if we were to look up "``printf``" in the table above, we would make a 32 bit
hash for "``printf``", and it might match ``bucket[3]``.  We would need to go to
the offset 0x000034f0 and start looking to see if our 32 bit hash matches.  To
do so, we need to read the next pointer, then read the hash, compare it, and
skip to the next entry in the chain.  Each time we are skipping many bytes in
memory and touching new cache lines just to do the compare on the full 32 bit
hash.  All of these accesses then tell us that we didn't have a match.
1810
1811 Name Hash Tables
1812 """"""""""""""""
1813
1814 To solve the issues mentioned above we have structured the hash tables a bit
1815 differently: a header, buckets, an array of all unique 32 bit hash values,
1816 followed by an array of hash value data offsets, one for each hash value, then
1817 the data for all hash values:
1818
1819 .. code-block:: none
1820
1821   .-------------.
1822   |  HEADER     |
1823   |-------------|
1824   |  BUCKETS    |
1825   |-------------|
1826   |  HASHES     |
1827   |-------------|
1828   |  OFFSETS    |
1829   |-------------|
1830   |  DATA       |
1831   `-------------'
1832
1833 The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array.  By
1834 making all of the full 32 bit hash values contiguous in memory, we allow
1835 ourselves to efficiently check for a match while touching as little memory as
1836 possible.  Most often checking the 32 bit hash values is as far as the lookup
1837 goes.  If it does match, it usually is a match with no collisions.  So for a
1838 table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash
1839 values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
1840 ``OFFSETS`` as:
1841
1842 .. code-block:: none
1843
1844   .-------------------------.
1845   |  HEADER.magic           | uint32_t
1846   |  HEADER.version         | uint16_t
1847   |  HEADER.hash_function   | uint16_t
1848   |  HEADER.bucket_count    | uint32_t
1849   |  HEADER.hashes_count    | uint32_t
1850   |  HEADER.header_data_len | uint32_t
1851   |  HEADER_DATA            | HeaderData
1852   |-------------------------|
1853   |  BUCKETS                | uint32_t[n_buckets] // 32 bit hash indexes
1854   |-------------------------|
1855   |  HASHES                 | uint32_t[n_hashes] // 32 bit hash values
1856   |-------------------------|
1857   |  OFFSETS                | uint32_t[n_hashes] // 32 bit offsets to hash value data
1858   |-------------------------|
1859   |  ALL HASH DATA          |
1860   `-------------------------'
1861
1862 So taking the exact same data from the standard hash example above we end up
1863 with:
1864
1865 .. code-block:: none
1866
1867               .------------.
1868               | HEADER     |
1869               |------------|
1870               |          0 | BUCKETS[0]
1871               |          2 | BUCKETS[1]
1872               |          5 | BUCKETS[2]
1873               |          6 | BUCKETS[3]
1874               |            | ...
1875               |        ... | BUCKETS[n_buckets]
1876               |------------|
1877               | 0x........ | HASHES[0]
1878               | 0x........ | HASHES[1]
1879               | 0x........ | HASHES[2]
1880               | 0x........ | HASHES[3]
1881               | 0x........ | HASHES[4]
1882               | 0x........ | HASHES[5]
1883               | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
1884               | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
1885               | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
1886               | 0x........ | HASHES[9]
1887               | 0x........ | HASHES[10]
1888               | 0x........ | HASHES[11]
1889               | 0x........ | HASHES[12]
1890               | 0x........ | HASHES[13]
1891               | 0x........ | HASHES[n_hashes]
1892               |------------|
1893               | 0x........ | OFFSETS[0]
1894               | 0x........ | OFFSETS[1]
1895               | 0x........ | OFFSETS[2]
1896               | 0x........ | OFFSETS[3]
1897               | 0x........ | OFFSETS[4]
1898               | 0x........ | OFFSETS[5]
1899               | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
1900               | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
1901               | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
1902               | 0x........ | OFFSETS[9]
1903               | 0x........ | OFFSETS[10]
1904               | 0x........ | OFFSETS[11]
1905               | 0x........ | OFFSETS[12]
1906               | 0x........ | OFFSETS[13]
1907               | 0x........ | OFFSETS[n_hashes]
1908               |------------|
1909               |            |
1910               |            |
1911               |            |
1912               |            |
1913               |            |
1914               |------------|
1915   0x000034f0: | 0x00001203 | .debug_str ("erase")
1916               | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
1917               | 0x........ | HashData[0]
1918               | 0x........ | HashData[1]
1919               | 0x........ | HashData[2]
1920               | 0x........ | HashData[3]
1921               | 0x00000000 | String offset into .debug_str (terminate data for hash)
1922               |------------|
1923   0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
1924               | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
1925               | 0x........ | HashData[0]
1926               | 0x........ | HashData[1]
1927               | 0x00001203 | String offset into .debug_str ("dump")
1928               | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
1929               | 0x........ | HashData[0]
1930               | 0x........ | HashData[1]
1931               | 0x........ | HashData[2]
1932               | 0x00000000 | String offset into .debug_str (terminate data for hash)
1933               |------------|
1934   0x00003550: | 0x00001203 | String offset into .debug_str ("main")
1935               | 0x00000009 | A 32 bit array count - number of HashData with name "main"
1936               | 0x........ | HashData[0]
1937               | 0x........ | HashData[1]
1938               | 0x........ | HashData[2]
1939               | 0x........ | HashData[3]
1940               | 0x........ | HashData[4]
1941               | 0x........ | HashData[5]
1942               | 0x........ | HashData[6]
1943               | 0x........ | HashData[7]
1944               | 0x........ | HashData[8]
1945               | 0x00000000 | String offset into .debug_str (terminate data for hash)
1946               `------------'
1947
So we still have all of the same data, we just organize it more efficiently for
debugger lookup.  If we repeat the same "``printf``" lookup from above, we
would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit
hash value modulo ``n_buckets``.  ``BUCKETS[3]`` contains "6" which
is the index into the ``HASHES`` table.  We would then compare any consecutive
32 bit hash values in the ``HASHES`` array as long as the hashes would be in
``BUCKETS[3]``.  We do this by verifying that each subsequent hash value modulo
``n_buckets`` is still 3.  In the case of a failed lookup we would access the
memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
before we know that we have no match.  We don't end up marching through
multiple words of memory and we really keep the number of processor data cache
lines being accessed as small as possible.
1960
1961 The string hash that is used for these lookup tables is the Daniel J.
1962 Bernstein hash which is also used in the ELF ``GNU_HASH`` sections.  It is a
1963 very good hash for all kinds of names in programs with very few hash
1964 collisions.
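
As a minimal sketch, the Bernstein hash can be written as below.  The function
name ``djb_hash`` is hypothetical, and treating each byte as unsigned is an
assumption of this sketch; a producer and its consumers need to agree on that
detail for names that contain non-ASCII bytes.

.. code-block:: c

  #include <stdint.h>

  /* DJB hash: start at 5381 and fold in each byte as h = h * 33 + c. */
  static uint32_t djb_hash(const char *s) {
    uint32_t h = 5381;
    for (; *s != '\0'; ++s)
      h = (h << 5) + h + (unsigned char)*s;  /* (h << 5) + h == h * 33 */
    return h;
  }

The arithmetic is done in ``uint32_t``, so it wraps modulo 2^32 and always
produces the 32 bit value stored in the ``HASHES`` array.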
1965
1966 Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``.
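
The lookup described in this section, including the empty bucket marker, could
be sketched as follows.  This is an illustration only: ``find_hash_index`` is
a hypothetical helper, and it assumes the ``BUCKETS`` and ``HASHES`` arrays
have already been located and are in host byte order.

.. code-block:: c

  #include <stdint.h>

  #define EMPTY_BUCKET UINT32_MAX  /* marks a bucket that has no hashes */

  /* Returns the index into HASHES (and OFFSETS) whose value equals `hash`,
   * or UINT32_MAX if the table does not contain that hash. */
  static uint32_t find_hash_index(const uint32_t *buckets, uint32_t n_buckets,
                                  const uint32_t *hashes, uint32_t n_hashes,
                                  uint32_t hash) {
    if (n_buckets == 0)
      return UINT32_MAX;
    uint32_t bucket = hash % n_buckets;
    uint32_t i = buckets[bucket];
    if (i == EMPTY_BUCKET)
      return UINT32_MAX;
    /* Scan the contiguous run of hashes that belong to this bucket. */
    for (; i < n_hashes && hashes[i] % n_buckets == bucket; ++i) {
      if (hashes[i] == hash)
        return i;
    }
    return UINT32_MAX;
  }

A failed lookup reads one bucket entry and a short run of adjacent hash values,
which is the access pattern the layout is designed for.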
1967
1968 Details
1969 ^^^^^^^
1970
1971 These name hash tables are designed to be generic where specializations of the
1972 table get to define additional data that goes into the header ("``HeaderData``"),
1973 how the string value is stored ("``KeyType``") and the content of the data for each
1974 hash value.
1975
1976 Header Layout
1977 """""""""""""
1978
1979 The header has a fixed part, and the specialized part.  The exact format of the
1980 header is:
1981
1982 .. code-block:: c
1983
1984   struct Header
1985   {
1986     uint32_t   magic;           // 'HASH' magic value to allow endian detection
1987     uint16_t   version;         // Version number
1988     uint16_t   hash_function;   // The hash function enumeration that was used
1989     uint32_t   bucket_count;    // The number of buckets in this hash table
1990     uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
1991     uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
1992                                 // Specifically the length of the following HeaderData field - this does not
1993                                 // include the size of the preceding fields
1994     HeaderData header_data;     // Implementation specific header data
1995   };
1996
1997 The header starts with a 32 bit "``magic``" value which must be ``'HASH'``
1998 encoded as an ASCII integer.  This allows the detection of the start of the
1999 hash table and also allows the table's byte order to be determined so the table
2000 can be correctly extracted.  The "``magic``" value is followed by a 16 bit
2001 ``version`` number which allows the table to be revised and modified in the
2002 future.  The current version number is 1. ``hash_function`` is a ``uint16_t``
2003 enumeration that specifies which hash function was used to produce this table.
2004 The current values for the hash function enumerations include:
2005
2006 .. code-block:: c
2007
2008   enum HashFunctionType
2009   {
2010     eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
2011   };
2012
``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
are in the ``BUCKETS`` array.  ``hashes_count`` is the number of unique 32 bit
hash values that are in the ``HASHES`` array, and is also the number of offsets
contained in the ``OFFSETS`` array.  ``header_data_len`` specifies the size
in bytes of the ``HeaderData`` that is filled in by specialized versions of
this table.
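
A reader might validate the fixed part of the header along these lines.  This
is a sketch under stated assumptions: the ``FixedHeader`` struct and both
helper names are hypothetical, and the magic constants simply spell out the
ASCII bytes of ``'HASH'`` in the two possible byte orders.

.. code-block:: c

  #include <stdint.h>

  #define HASH_MAGIC         0x48415348u  /* 'H' 'A' 'S' 'H' */
  #define HASH_MAGIC_SWAPPED 0x48534148u  /* same bytes, opposite byte order */

  /* The fixed fields of the header, without the trailing HeaderData. */
  struct FixedHeader {
    uint32_t magic;
    uint16_t version;
    uint16_t hash_function;
    uint32_t bucket_count;
    uint32_t hashes_count;
    uint32_t header_data_len;
  };

  /* Returns 0 if the table is in host byte order, 1 if every field needs to
   * be byte swapped, and -1 if the magic value is not recognized. */
  static int table_byte_order(uint32_t magic) {
    if (magic == HASH_MAGIC)
      return 0;
    if (magic == HASH_MAGIC_SWAPPED)
      return 1;
    return -1;
  }

  /* Accepts only version 1 tables that use the DJB hash (enumeration 0). */
  static int header_is_supported(const struct FixedHeader *h) {
    return h->magic == HASH_MAGIC && h->version == 1 && h->hash_function == 0;
  }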
2019
2020 Fixed Lookup
2021 """"""""""""
2022
2023 The header is followed by the buckets, hashes, offsets, and hash value data.
2024
2025 .. code-block:: c
2026
2027   struct FixedTable
2028   {
2029     uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
2030     uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
2031     uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
2032   };
2033
2034 ``buckets`` is an array of 32 bit indexes into the ``hashes`` array.  The
2035 ``hashes`` array contains all of the 32 bit hash values for all names in the
2036 hash table.  Each hash in the ``hashes`` table has an offset in the ``offsets``
2037 array that points to the data for the hash value.
2038
2039 This table setup makes it very easy to repurpose these tables to contain
2040 different data, while keeping the lookup mechanism the same for all tables.
2041 This layout also makes it possible to save the table to disk and map it in
2042 later and do very efficient name lookups with little or no parsing.
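
Because every array has a size that follows from the header, the position of
each part of a mapped table can be computed without parsing anything else.
The sketch below illustrates this; ``TableLayout`` and ``compute_layout`` are
hypothetical names, and the offsets are relative to the start of the table.

.. code-block:: c

  #include <stdint.h>

  /* magic(4) + version(2) + hash_function(2) + bucket_count(4) +
   * hashes_count(4) + header_data_len(4) */
  enum { FIXED_HEADER_SIZE = 20 };

  struct TableLayout {
    uint64_t buckets_offset;  /* start of buckets[] */
    uint64_t hashes_offset;   /* start of hashes[]  */
    uint64_t offsets_offset;  /* start of offsets[] */
    uint64_t data_offset;     /* start of the hash value data */
  };

  static struct TableLayout compute_layout(uint32_t bucket_count,
                                           uint32_t hashes_count,
                                           uint32_t header_data_len) {
    struct TableLayout l;
    l.buckets_offset = (uint64_t)FIXED_HEADER_SIZE + header_data_len;
    l.hashes_offset  = l.buckets_offset + 4ull * bucket_count;
    l.offsets_offset = l.hashes_offset + 4ull * hashes_count;
    l.data_offset    = l.offsets_offset + 4ull * hashes_count;
    return l;
  }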
2043
2044 DWARF lookup tables can be implemented in a variety of ways and can store a lot
2045 of information for each name.  We want to make the DWARF tables extensible and
2046 able to store the data efficiently so we have used some of the DWARF features
2047 that enable efficient data storage to define exactly what kind of data we store
2048 for each name.
2049
2050 The ``HeaderData`` contains a definition of the contents of each HashData chunk.
2051 We might want to store an offset to all of the debug information entries (DIEs)
2052 for each name.  To keep things extensible, we create a list of items, or
2053 Atoms, that are contained in the data for each name.  First comes the type of
2054 the data in each atom:
2055
2056 .. code-block:: c
2057
  enum AtomType
  {
    eAtomTypeNULL       = 0u,
    eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
    eAtomTypeCUOffset   = 2u,   // DIE offset of the compile unit header that contains the item in question
    eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
    eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
    eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
  };
2067
2068 The enumeration values and their meanings are:
2069
2070 .. code-block:: none
2071
  eAtomTypeNULL       - a termination atom that specifies the end of the atom list
  eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
  eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
  eAtomTypeTag        - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
  eAtomTypeNameFlags  - Flags for functions and global variables (isFunction, isInlined, isExternal...)
  eAtomTypeTypeFlags  - Flags for types (isCXXClass, isObjCClass, ...)
2078
Then we allow each atom to specify its type and how the data for that atom is
encoded:
2081
2082 .. code-block:: c
2083
2084   struct Atom
2085   {
2086     uint16_t type;  // AtomType enum value
2087     uint16_t form;  // DWARF DW_FORM_XXX defines
2088   };
2089
2090 The ``form`` type above is from the DWARF specification and defines the exact
2091 encoding of the data for the Atom type.  See the DWARF specification for the
2092 ``DW_FORM_`` definitions.
2093
2094 .. code-block:: c
2095
  struct HeaderData
  {
    uint32_t die_offset_base;
    uint32_t atom_count;
    Atom     atoms[atom_count];
  };
2102
2103 ``HeaderData`` defines the base DIE offset that should be added to any atoms
2104 that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``,
2105 ``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata``.  It also defines
2106 what is contained in each ``HashData`` object -- ``Atom.form`` tells us how large
2107 each field will be in the ``HashData`` and the ``Atom.type`` tells us how this data
2108 should be interpreted.
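
To illustrate how ``Atom.form`` determines the size of a ``HashData`` record,
the sketch below sums the sizes of the fixed-size data forms.  The helpers
``form_size`` and ``hash_data_size`` are hypothetical, and only the
``DW_FORM_data1/2/4/8`` forms are handled; a real reader would have to cover
every form it expects to see.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  /* DW_FORM values as defined by the DWARF specification. */
  enum {
    DW_FORM_data2 = 0x05,
    DW_FORM_data4 = 0x06,
    DW_FORM_data8 = 0x07,
    DW_FORM_data1 = 0x0b
  };

  struct Atom {
    uint16_t type;  /* AtomType enum value */
    uint16_t form;  /* DW_FORM_xxx value */
  };

  /* Size in bytes of one value of `form`, or 0 if this sketch does not
   * handle the form. */
  static size_t form_size(uint16_t form) {
    switch (form) {
    case DW_FORM_data1: return 1;
    case DW_FORM_data2: return 2;
    case DW_FORM_data4: return 4;
    case DW_FORM_data8: return 8;
    default:            return 0;
    }
  }

  /* Size in bytes of one HashData record described by `atoms`, or 0 if any
   * atom uses a form that form_size() does not handle. */
  static size_t hash_data_size(const struct Atom *atoms, uint32_t atom_count) {
    size_t total = 0;
    for (uint32_t i = 0; i < atom_count; ++i) {
      size_t n = form_size(atoms[i].form);
      if (n == 0)
        return 0;
      total += n;
    }
    return total;
  }

A table whose only atom is a DIE offset encoded as ``DW_FORM_data4`` therefore
has 4 byte ``HashData`` records.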
2109
2110 For the current implementations of the "``.apple_names``" (all functions +
2111 globals), the "``.apple_types``" (names of all types that are defined), and
2112 the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom``
2113 array to be:
2114
2115 .. code-block:: c
2116
2117   HeaderData.atom_count = 1;
2118   HeaderData.atoms[0].type = eAtomTypeDIEOffset;
2119   HeaderData.atoms[0].form = DW_FORM_data4;
2120
This defines the contents to be the DIE offset (``eAtomTypeDIEOffset``) that is
encoded as a 32 bit value (``DW_FORM_data4``).  This allows a single name to
have multiple matching DIEs in a single file, which could come up with an
inlined function for instance.  Future tables could include more information
about the DIE such as flags indicating if the DIE is a function, method, block,
or inlined.

The ``KeyType`` for the DWARF table is a 32 bit string table offset into the
"``.debug_str``" table.  The "``.debug_str``" is the string table for the DWARF
which may already contain copies of all of the strings.  This helps make sure,
with help from the compiler, that we reuse the strings between all of the DWARF
sections and keeps the hash table size down.  Another benefit to having the
compiler generate all strings as ``DW_FORM_strp`` in the debug info is that
DWARF parsing can be made much faster.
2135
2136 After a lookup is made, we get an offset into the hash data.  The hash data
2137 needs to be able to deal with 32 bit hash collisions, so the chunk of data
2138 at the offset in the hash data consists of a triple:
2139
2140 .. code-block:: c
2141
2142   uint32_t str_offset
2143   uint32_t hash_data_count
2144   HashData[hash_data_count]
2145
If ``str_offset`` is zero, then the bucket contents are done. 99.9% of the
hash data chunks contain a single item (no 32 bit hash collision):
2148
2149 .. code-block:: none
2150
2151   .------------.
2152   | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
2153   | 0x00000004 | uint32_t HashData count
2154   | 0x........ | uint32_t HashData[0] DIE offset
2155   | 0x........ | uint32_t HashData[1] DIE offset
2156   | 0x........ | uint32_t HashData[2] DIE offset
2157   | 0x........ | uint32_t HashData[3] DIE offset
2158   | 0x00000000 | uint32_t KeyType (end of hash chain)
2159   `------------'
2160
2161 If there are collisions, you will have multiple valid string offsets:
2162
2163 .. code-block:: none
2164
2165   .------------.
2166   | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
2167   | 0x00000004 | uint32_t HashData count
2168   | 0x........ | uint32_t HashData[0] DIE offset
2169   | 0x........ | uint32_t HashData[1] DIE offset
2170   | 0x........ | uint32_t HashData[2] DIE offset
2171   | 0x........ | uint32_t HashData[3] DIE offset
2172   | 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] => "print")
2173   | 0x00000002 | uint32_t HashData count
2174   | 0x........ | uint32_t HashData[0] DIE offset
2175   | 0x........ | uint32_t HashData[1] DIE offset
2176   | 0x00000000 | uint32_t KeyType (end of hash chain)
2177   `------------'
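
Once a hash has matched, the chunk at the corresponding offset has to be
walked to find the entry for the exact name.  The following sketch shows one
way to do that.  It assumes each ``HashData`` is a single 32 bit DIE offset,
that the data is in host byte order, and that the inputs are well formed (it
does no bounds checking); ``read_u32`` and ``find_name`` are hypothetical
helpers.

.. code-block:: c

  #include <stdint.h>
  #include <string.h>

  /* Reads a 32 bit value in host byte order from a possibly unaligned
   * position. */
  static uint32_t read_u32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
  }

  /* Walks the hash data chunk at `chunk` looking for the entry whose key
   * string in `debug_str` equals `name`.  On success, stores the number of
   * HashData items in *count and returns a pointer to the first one.
   * Returns NULL if the end of the chain is reached without a match. */
  static const uint8_t *find_name(const uint8_t *chunk, const char *debug_str,
                                  const char *name, uint32_t *count) {
    for (;;) {
      uint32_t str_offset = read_u32(chunk);
      if (str_offset == 0)
        return NULL;                    /* end of hash chain */
      uint32_t n = read_u32(chunk + 4);
      if (strcmp(debug_str + str_offset, name) == 0) {
        *count = n;
        return chunk + 8;
      }
      chunk += 8 + 4 * (size_t)n;       /* skip this entry's HashData */
    }
  }

The string comparison is only reached after the 32 bit hash has matched, so in
the common case it runs once per successful lookup.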
2178
Current testing with real world C++ binaries has shown that there is around one
32 bit hash collision per 100,000 name entries.
2181
2182 Contents
2183 ^^^^^^^^
2184
2185 As we said, we want to strictly define exactly what is included in the
2186 different tables.  For DWARF, we have 3 tables: "``.apple_names``",
2187 "``.apple_types``", and "``.apple_namespaces``".
2188
2189 "``.apple_names``" sections should contain an entry for each DWARF DIE whose
2190 ``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or
2191 ``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``,
2192 ``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``.  It also contains
2193 ``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and
2194 static variables).  All global and static variables should be included,
2195 including those scoped within functions and classes.  For example using the
2196 following code:
2197
2198 .. code-block:: c
2199
2200   static int var = 0;
2201
2202   void f ()
2203   {
2204     static int var = 0;
2205   }
2206
2207 Both of the static ``var`` variables would be included in the table.  All
2208 functions should emit both their full names and their basenames.  For C or C++,
2209 the full name is the mangled name (if available) which is usually in the
2210 ``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the
2211 function basename.  If global or static variables have a mangled name in a
2212 ``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the
2213 simple name found in the ``DW_AT_name`` attribute.
2214
2215 "``.apple_types``" sections should contain an entry for each DWARF DIE whose
2216 tag is one of:
2217
2218 * DW_TAG_array_type
2219 * DW_TAG_class_type
2220 * DW_TAG_enumeration_type
2221 * DW_TAG_pointer_type
2222 * DW_TAG_reference_type
2223 * DW_TAG_string_type
2224 * DW_TAG_structure_type
2225 * DW_TAG_subroutine_type
2226 * DW_TAG_typedef
2227 * DW_TAG_union_type
2228 * DW_TAG_ptr_to_member_type
2229 * DW_TAG_set_type
2230 * DW_TAG_subrange_type
2231 * DW_TAG_base_type
2232 * DW_TAG_const_type
2233 * DW_TAG_constant
2234 * DW_TAG_file_type
2235 * DW_TAG_namelist
2236 * DW_TAG_packed_type
2237 * DW_TAG_volatile_type
2238 * DW_TAG_restrict_type
2239 * DW_TAG_interface_type
2240 * DW_TAG_unspecified_type
2241 * DW_TAG_shared_type
2242
2243 Only entries with a ``DW_AT_name`` attribute are included, and the entry must
2244 not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero
2245 value).  For example, using the following code:
2246
2247 .. code-block:: c
2248
2249   int main ()
2250   {
2251     int *b = 0;
2252     return *b;
2253   }
2254
2255 We get a few type DIEs:
2256
2257 .. code-block:: none
2258
2259   0x00000067:     TAG_base_type [5]
2260                   AT_encoding( DW_ATE_signed )
2261                   AT_name( "int" )
2262                   AT_byte_size( 0x04 )
2263
2264   0x0000006e:     TAG_pointer_type [6]
2265                   AT_type( {0x00000067} ( int ) )
2266                   AT_byte_size( 0x08 )
2267
The ``DW_TAG_pointer_type`` is not included because it does not have a ``DW_AT_name``.
2269
The "``.apple_namespaces``" section should contain all ``DW_TAG_namespace``
DIEs.  If we run into a namespace that has no name, this is an anonymous
namespace, and the name should be output as "``(anonymous namespace)``"
(without the quotes).  Why?  This matches the output of the
``abi::__cxa_demangle()`` function in the standard C++ library that demangles
mangled names.
2275
2276
2277 Language Extensions and File Format Changes
2278 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2279
2280 Objective-C Extensions
2281 """"""""""""""""""""""
2282
The "``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for
an Objective-C class.  The name used in the hash table is the name of the
Objective-C class itself.  If the Objective-C class has a category, then an
entry is made for both the class name without the category, and for the class
name with the category.  So if we have a DIE at offset 0x1234 with a name of
method "``-[NSString(my_additions) stringWithSpecialString:]``", we would add
an entry for "``NSString``" that points to DIE 0x1234, and an entry for
"``NSString(my_additions)``" that points to 0x1234.  This allows us to quickly
track down all Objective-C methods for an Objective-C class when evaluating
expressions.  It is needed because of the dynamic nature of Objective-C where
anyone can add methods to a class.  The DWARF for Objective-C methods is also
emitted differently from that for C++ classes: the methods are not usually
contained in the class definition, but are scattered across one or more
compile units.  Categories can also be defined in different shared libraries.
So we need to be able to quickly find all of the methods and class functions
given the Objective-C class name, or quickly find all methods and class
functions for a class + category name.  This table does not contain any
selector names, it just maps Objective-C class names (or class names +
category) to all of the methods and class functions.  The selectors are added
as function basenames in the "``.apple_names``" section.
2303
2304 In the "``.apple_names``" section for Objective-C functions, the full name is
2305 the entire function name with the brackets ("``-[NSString
2306 stringWithCString:]``") and the basename is the selector only
2307 ("``stringWithCString:``").
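
The names described above can all be derived from the full method name.  The
sketch below splits a name of the form ``-[Class(Category) selector]`` into
the class name, the class name with its category, and the selector.  It is an
illustration only: ``ObjCNameParts`` and ``split_objc_name`` are hypothetical,
and the function handles only this simple bracketed form.

.. code-block:: c

  #include <stddef.h>
  #include <string.h>

  struct ObjCNameParts {
    const char *class_name;          /* start of "NSString(my_additions)" */
    size_t class_len;                /* length of "NSString" */
    size_t class_with_category_len;  /* length of "NSString(my_additions)" */
    const char *selector;            /* start of the selector */
    size_t selector_len;             /* length of the selector */
  };

  /* Splits "-[Class(Category) selector]" or "+[Class selector]".  Returns 1
   * on success and 0 if `name` does not have that shape.  The pointers in
   * `out` refer into `name`; nothing is copied. */
  static int split_objc_name(const char *name, struct ObjCNameParts *out) {
    size_t len = strlen(name);
    if (len < 6 || (name[0] != '-' && name[0] != '+') || name[1] != '[' ||
        name[len - 1] != ']')
      return 0;
    const char *cls = name + 2;
    const char *space = strchr(cls, ' ');
    if (space == NULL || space == cls)
      return 0;
    /* A category, if present, starts at the first '(' before the space. */
    const char *paren = memchr(cls, '(', (size_t)(space - cls));
    out->class_name = cls;
    out->class_with_category_len = (size_t)(space - cls);
    out->class_len = paren ? (size_t)(paren - cls)
                           : out->class_with_category_len;
    out->selector = space + 1;
    out->selector_len = (size_t)((name + len - 1) - (space + 1));
    return 1;
  }

When the name has no category, ``class_len`` and ``class_with_category_len``
are equal and only one class entry is needed.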
2308
2309 Mach-O Changes
2310 """"""""""""""
2311
The section names given above for the Apple hash tables are for non-Mach-O
files.  For Mach-O files, the sections should be contained in the ``__DWARF``
segment with names as follows:
2315
2316 * "``.apple_names``" -> "``__apple_names``"
2317 * "``.apple_types``" -> "``__apple_types``"
2318 * "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit)
2319 * "``.apple_objc``" -> "``__apple_objc``"
2320