From daeb63c22064a4f25f6df2b04c34a5d3aa6af873 Mon Sep 17 00:00:00 2001
From: Chris Lattner
Date: Sat, 12 May 2007 07:49:15 +0000
Subject: [PATCH] continued description

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@37003 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/BitCodeFormat.html | 113 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 108 insertions(+), 5 deletions(-)

diff --git a/docs/BitCodeFormat.html b/docs/BitCodeFormat.html
index b84cd0e75bc..16171d3663f 100644
--- a/docs/BitCodeFormat.html
+++ b/docs/BitCodeFormat.html
@@ -18,6 +18,7 @@
  • Abbreviation IDs
  • Blocks
  • Data Records
  • +
  • Abbreviations
  • LLVM IR Encoding
  • @@ -213,12 +214,14 @@ The set of builtin abbrev IDs is: current block.
  • 1 - ENTER_SUBBLOCK - This abbrev ID marks the beginning of a new block.
  • -
  • 2 - DEFINE_ABBREV - This defines a new abbreviation.
  • -
  • 3 - UNABBREV_RECORD - This ID specifies the definition of an unabbreviated - record.
  • +
  • 2 - DEFINE_ABBREV - This defines a new + abbreviation.
  • +
  • 3 - UNABBREV_RECORD - This ID specifies the + definition of an unabbreviated record.
  • -

    Abbreviation IDs 4 and above are defined by the stream itself.

    +

    Abbreviation IDs 4 and above are defined by the stream itself, and specify +an abbreviated record encoding.

    @@ -303,11 +306,111 @@ multiple of 32-bits.

    +

    +Data records consist of a record code and a number of (up to) 64-bit integer +values. The interpretation of the code and values is application-specific, and +a record can be encoded in two ways: as an unabbreviated record +or with an abbreviation. In the LLVM IR format, for example, there is a record +that encodes the target triple of a module. The code is MODULE_CODE_TRIPLE, +and the values of the record are the ASCII codes for the characters in the +string.

    + +
    + + +
    UNABBREV_RECORD +Encoding
    + +
    + +

    [UNABBREV_RECORD, code (vbr6), numops (vbr6), + op0 (vbr6), op1 (vbr6), ...]

    + +

    An UNABBREV_RECORD provides a default fallback encoding, which is both +completely general and extremely inefficient. It can describe an arbitrary +record by emitting the code and operands as VBRs.

    + +

    For example, emitting an LLVM IR target triple as an unabbreviated record +requires emitting the UNABBREV_RECORD abbrevid, a vbr6 for the +MODULE_CODE_TRIPLE code, a vbr6 for the length of the string (which is equal to +the number of operands), and a vbr6 for each character. A single vbr6 chunk +carries only 5 bits of data, so it can hold values 0 through 31. Since no +letter has a value less than 32, each letter must be emitted as at +least a two-part VBR, which means that each letter requires at least 12 +bits. This is not an efficient encoding, but it is fully general.
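    The cost described above can be checked with a small sketch. It assumes the usual VBR layout (each chunk holds width-1 data bits, low-order bits first, with the top bit of a chunk set when another chunk follows). The triple string, the numeric value given to MODULE_CODE_TRIPLE, and the 2-bit abbrev ID width are placeholders chosen for illustration, not values taken from this patch. Fields are kept as (value, width) pairs rather than packed into bytes, so bit ordering within the stream is not modeled.

```python
UNABBREV_RECORD = 3  # builtin abbrev ID, from the list of builtin IDs


def vbr_chunks(value, width):
    """Split a non-negative integer into VBR chunks of `width` bits each."""
    if value < 0 or width < 2:
        raise ValueError("need value >= 0 and width >= 2")
    data_bits = width - 1
    mask = (1 << data_bits) - 1
    chunks = []
    while True:
        low = value & mask
        value >>= data_bits
        if value:
            # More data follows: set the continuation (top) bit.
            chunks.append(low | (1 << data_bits))
        else:
            chunks.append(low)
            return chunks


def encode_unabbrev_record(code, operands, abbrev_id_width):
    """Return the record as a list of (value, bit_width) fields."""
    fields = [(UNABBREV_RECORD, abbrev_id_width)]
    # code, numops, then every operand, each as a vbr6.
    for value in [code, len(operands), *operands]:
        fields.extend((chunk, 6) for chunk in vbr_chunks(value, 6))
    return fields


def total_bits(fields):
    return sum(width for _, width in fields)


# Assumptions for illustration only (not from the patch):
MODULE_CODE_TRIPLE = 2          # placeholder numeric code
triple = "i686-pc-linux-gnu"    # hypothetical target triple
fields = encode_unabbrev_record(
    MODULE_CODE_TRIPLE, [ord(c) for c in triple], abbrev_id_width=2
)
print(total_bits(fields), "bits for", len(triple), "characters")
```

    Under these assumptions every character needs two vbr6 chunks, so the operands alone cost 12 bits per character, which matches the estimate in the text.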

    +
    + + +
    Abbreviated Record +Encoding
    + +
    + +

    [<abbrevid>, fields...]

    + +

    An abbreviated record is an abbreviation ID followed by a set of fields that +are encoded according to the abbreviation +definition. This allows records to be encoded significantly more densely +than records encoded with the UNABBREV_RECORD +type, and allows the abbreviation types to be specified in the stream itself, +which allows the files to be completely self-describing. The actual encoding +of abbreviations is defined below. +

    + +
    + + +
    Abbreviations +
    + +

    -blah +Abbreviations are an important form of compression for bitstreams. The idea is +to specify a dense encoding for a class of records once, then use that encoding +to emit many records. It takes space to emit the encoding into the file, but +the space is recouped (hopefully plus some) when the records that use it are +emitted.
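    The trade-off described above is simple arithmetic, sketched below. The function name and the bit counts in the usage lines are hypothetical. The sketch only assumes that the definition is emitted once and that every record using it is smaller than its unabbreviated form by a fixed number of bits.

```python
def records_to_break_even(definition_bits, unabbrev_bits, abbrev_bits):
    """Smallest record count at which the abbreviation yields a net saving.

    Returns None when the abbreviated form is no smaller per record,
    in which case the definition never pays for itself.
    """
    saved_per_record = unabbrev_bits - abbrev_bits
    if saved_per_record <= 0:
        return None
    # Need n * saved_per_record > definition_bits.
    return definition_bits // saved_per_record + 1


# Hypothetical sizes, for illustration only.
print(records_to_break_even(40, 50, 42))
print(records_to_break_even(40, 50, 50))
```

    A writer could use a check like this to decide whether an abbreviation is worth emitting for a given stream.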

    +

    +Abbreviations can be determined dynamically per client, per file. Since the +abbreviations are stored in the bitstream itself, different streams of the same +format can contain different sets of abbreviations, and a stream can omit any +abbreviation it does not need. As a concrete example, LLVM IR files usually emit an abbreviation +for binary operators. If a specific LLVM module contains few or no binary +operators, the abbreviation does not need to be emitted. +

    +
    + + +
    DEFINE_ABBREV + Encoding
    + +
    + +

    [DEFINE_ABBREV, numabbrevops (vbr5), abbrevop0, abbrevop1, + ...]

    + +

    An abbreviation definition consists of the DEFINE_ABBREV abbrevid followed +by a VBR that specifies the number of abbrev operands, then the abbrev +operands themselves. Abbreviation operands come in three forms. They all start +with a single bit that indicates whether the abbrev operand is a literal operand +(when the bit is 1) or an encoding operand (when the bit is 0).

    + +
      +
    1. Literal operands - [1 (1 bit), litvalue (vbr8)] - +Literal operands specify that the value in the result +is always a single specific value. This specific value is emitted as a vbr8 +after the bit indicating that it is a literal operand.
    2. Encoding info without data - [0 (1 bit), encoding (3 bits)] + - blah
    3. Encoding info with data - [0 (1 bit), encoding (3 bits), +value (vbr5)] -
    +
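    As a rough sketch of the layout above, the following emits a DEFINE_ABBREV whose operands are all literals. Encoding operands are left out because this patch does not yet describe them. The VBR chunk layout (width-1 data bits per chunk, low-order first, top bit set when another chunk follows), the 2-bit abbrev ID width, and the literal values are assumptions for illustration. Fields are kept as (value, width) pairs rather than packed bits.

```python
DEFINE_ABBREV = 2  # builtin abbrev ID, from the list of builtin IDs


def vbr_chunks(value, width):
    """Split a non-negative integer into VBR chunks of `width` bits each."""
    if value < 0 or width < 2:
        raise ValueError("need value >= 0 and width >= 2")
    data_bits = width - 1
    mask = (1 << data_bits) - 1
    chunks = []
    while True:
        low = value & mask
        value >>= data_bits
        if value:
            chunks.append(low | (1 << data_bits))  # continuation bit set
        else:
            chunks.append(low)
            return chunks


def encode_define_abbrev_literals(literals, abbrev_id_width):
    """Return [DEFINE_ABBREV, numabbrevops, literal ops...] as (value, width) fields."""
    fields = [(DEFINE_ABBREV, abbrev_id_width)]
    fields.extend((c, 5) for c in vbr_chunks(len(literals), 5))
    for lit in literals:
        fields.append((1, 1))  # leading bit 1 marks a literal operand
        fields.extend((c, 8) for c in vbr_chunks(lit, 8))
    return fields


# Hypothetical literal values and abbrev ID width, for illustration only.
fields = encode_define_abbrev_literals([2, 0], abbrev_id_width=2)
for value, width in fields:
    print(f"{value:>3} in {width} bits")
```

    A real abbreviation would normally mix literals with encoding operands. This sketch covers only the part of the format that the text above fully specifies.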
    -- 2.34.1