-.. _bitcode_format:
-
.. role:: raw-html(raw)
:format: html
provides a mechanism for the file to self-describe "abbreviations", which are
effectively size optimizations for the content.
-LLVM IR files may be optionally embedded into a `wrapper`_ structure that makes
-it easy to embed extra data along with LLVM IR files.
+LLVM IR files may be optionally embedded into a `wrapper`_ structure, or in a
+`native object file`_. Both of these mechanisms make it easy to embed extra
+data along with LLVM IR files.
This document first describes the LLVM bitstream format, describes the wrapper
format, then describes the record structure used by LLVM IR files.
* Abbreviations, which specify compression optimizations for the file.
-Note that the `llvm-bcanalyzer <CommandGuide/html/llvm-bcanalyzer.html>`_ tool
-can be used to dump and inspect arbitrary bitstreams, which is very useful for
+Note that the :doc:`llvm-bcanalyzer <CommandGuide/llvm-bcanalyzer>` tool can be
+used to dump and inspect arbitrary bitstreams, which is very useful for
understanding the encoding.
.. _magic number:
in bytes of the stream. CPUType is a target-specific value that can be used to
encode the CPU of the target.
+.. _native object file:
+
+Native Object File Wrapper Format
+=================================
+
+Bitcode files for LLVM IR may also be wrapped in a native object file
+(i.e. ELF, COFF, Mach-O). The bitcode must be stored in a section of the
+object file named ``.llvmbc``. This wrapper format is useful for accommodating
+LTO in compilation pipelines where intermediate objects must be native object
+files which contain metadata in other sections.
+
+Not all tools support this format.
+
.. _encoding of LLVM IR:
LLVM IR Encoding
When combined with the bitcode magic number and viewed as bytes, this is
``"BC 0xC0DE"``.
+.. _Signed VBRs:
+
Signed VBRs
^^^^^^^^^^^
With this encoding, small positive and small negative values can both be emitted
efficiently. Signed VBR encoding is used in ``CST_CODE_INTEGER`` and
``CST_CODE_WIDE_INTEGER`` records within ``CONSTANTS_BLOCK`` blocks.
+It is also used for phi instruction operands in `MODULE_CODE_VERSION`_ 1.
LLVM IR Blocks
^^^^^^^^^^^^^^
* `FUNCTION_BLOCK`_
* `METADATA_BLOCK`_
+.. _MODULE_CODE_VERSION:
+
MODULE_CODE_VERSION Record
^^^^^^^^^^^^^^^^^^^^^^^^^^
``[VERSION, version#]``
The ``VERSION`` record (code 1) contains a single value indicating the format
-version. Only version 0 is supported at this time.
+version. Versions 0 and 1 are supported at this time. The difference between
+version 0 and 1 is in the encoding of instruction operands in
+each `FUNCTION_BLOCK`_.
+
+In version 0, each value defined by an instruction is assigned an ID
+unique to the function. Function-level value IDs are assigned starting from
+``NumModuleValues`` since they share the same namespace as module-level
+values. The value enumerator resets after each function. When a value is
+an operand of an instruction, the value ID is used to represent the operand.
+For large functions or large modules, these operand values can be large.
+
+The encoding in version 1 attempts to avoid large operand values
+in common cases. Instead of using the value ID directly, operands are
+encoded as relative to the current instruction. Thus, if an operand
+is the value defined by the previous instruction, the operand
+will be encoded as 1.
+
+For example, instead of
+
+.. code-block:: llvm
+
+ #n = load #n-1
+ #n+1 = icmp eq #n, #const0
+ br #n+1, label #(bb1), label #(bb2)
+
+version 1 will encode the instructions as
+
+.. code-block:: llvm
+
+ #n = load #1
+ #n+1 = icmp eq #1, (#n+1)-#const0
+ br #1, label #(bb1), label #(bb2)
+
+Note in the example that operands which are constants also use
+the relative encoding, while operands like basic block labels
+do not use the relative encoding.
+
+Forward references will result in a negative value.
+This can be inefficient, as operands are normally encoded
+as unsigned VBRs. However, forward references are rare, except in the
+case of phi instructions. For phi instructions, operands are encoded as
+`Signed VBRs`_ to deal with forward references.
+
MODULE_CODE_TRIPLE Record
^^^^^^^^^^^^^^^^^^^^^^^^^
MODULE_CODE_GLOBALVAR Record
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-``[GLOBALVAR, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr]``
+``[GLOBALVAR, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr, externally_initialized, dllstorageclass, comdat]``
The ``GLOBALVAR`` record (code 7) marks the declaration or definition of a
global variable. The operand fields are:
* ``weak_odr``: code 10
* ``linkonce_odr``: code 11
* ``available_externally``: code 12
- * ``linker_private``: code 13
+ * deprecated : code 13
+ * deprecated : code 14
* alignment*: The logarithm base 2 of the variable's requested alignment, plus 1
* *unnamed_addr*: If present and non-zero, indicates that the variable has
``unnamed_addr``
+.. _bcdllstorageclass:
+
+* *dllstorageclass*: If present, an encoding of the DLL storage class of this variable:
+
+ * ``default``: code 0
+ * ``dllimport``: code 1
+ * ``dllexport``: code 2
+
.. _FUNCTION:
MODULE_CODE_FUNCTION Record
^^^^^^^^^^^^^^^^^^^^^^^^^^^
-``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc]``
+``[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, prologuedata, dllstorageclass, comdat, prefixdata, personalityfn]``
The ``FUNCTION`` record (code 8) marks the declaration or definition of a
function. The operand fields are:
* ``ccc``: code 0
* ``fastcc``: code 8
* ``coldcc``: code 9
+ * ``webkit_jscc``: code 12
+ * ``anyregcc``: code 13
+ * ``preserve_mostcc``: code 14
+ * ``preserve_allcc``: code 15
+ * ``cxx_fast_tlscc``: code 17
* ``x86_stdcallcc``: code 64
* ``x86_fastcallcc``: code 65
* ``arm_apcscc``: code 66
* *unnamed_addr*: If present and non-zero, indicates that the function has
``unnamed_addr``
+* *prologuedata*: If non-zero, the value index of the prologue data for this function,
+ plus 1.
+
+* *dllstorageclass*: An encoding of the
+ :ref:`dllstorageclass<bcdllstorageclass>` of this function
+
+* *comdat*: An encoding of the COMDAT of this function
+
+* *prefixdata*: If non-zero, the value index of the prefix data for this function,
+ plus 1.
+
+* *personalityfn*: If non-zero, the value index of the personality function for this function,
+ plus 1.
+
MODULE_CODE_ALIAS Record
^^^^^^^^^^^^^^^^^^^^^^^^
-``[ALIAS, alias type, aliasee val#, linkage, visibility]``
+``[ALIAS, alias type, aliasee val#, linkage, visibility, dllstorageclass]``
The ``ALIAS`` record (code 9) marks the definition of an alias. The operand
fields are
* *visibility*: If present, an encoding of the `visibility`_ of the alias
+* *dllstorageclass*: If present, an encoding of the
+ :ref:`dllstorageclass<bcdllstorageclass>` of the alias
+
MODULE_CODE_PURGEVALS Record
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
*attr* field of function block ``INST_INVOKE`` and ``INST_CALL`` records.
Entries within ``PARAMATTR_BLOCK`` are constructed to ensure that each is unique
-(i.e., no two indicies represent equivalent attribute lists).
+(i.e., no two indices represent equivalent attribute lists).
.. _PARAMATTR_CODE_ENTRY:
constants, metadata, type symbol table entries, or other type operator records.
Entries within ``TYPE_BLOCK`` are constructed to ensure that each entry is
-unique (i.e., no two indicies represent structurally equivalent types).
+unique (i.e., no two indices represent structurally equivalent types).
.. _TYPE_CODE_NUMENTRY:
.. _NUMENTRY:
VALUE_SYMTAB_BLOCK Contents
---------------------------
-The ``VALUE_SYMTAB_BLOCK`` block (id 14) ...
+The ``VALUE_SYMTAB_BLOCK`` block (id 14) ...
.. _METADATA_BLOCK: