From: Chris Lattner Abbreviation IDs 4 and above are defined by the stream itself. Abbreviation IDs 4 and above are defined by the stream itself, and specify
+an abbreviated record encoding.
+Data records consist of a record code and a number of (up to) 64-bit integer +values. The interpretation of the code and values is application specific and +there are multiple different ways to encode a record (with an unabbrev record +or with an abbreviation). In the LLVM IR format, for example, there is a record +which encodes the target triple of a module. The code is MODULE_CODE_TRIPLE, +and the values of the record are the ascii codes for the characters in the +string.
+ +[UNABBREV_RECORD, codevbr6, numopsvbr6, + op0vbr6, op1vbr6, ...]
+ +An UNABBREV_RECORD provides a default fallback encoding, which is both +completely general and also extremely inefficient. It can describe an arbitrary +record, by emitting the code and operands as vbrs.
+ +For example, emitting an LLVM IR target triple as an unabbreviated record +requires emitting the UNABBREV_RECORD abbrevid, a vbr6 for the +MODULE_CODE_TRIPLE code, a vbr6 for the length of the string (which is equal to +the number of operands), and a vbr6 for each character. Since there are no +letters with value less than 32, each letter would need to be emitted as at +least a two-part VBR, which means that each letter would require at least 12 +bits. This is not an efficient encoding, but it is fully general.
+[<abbrevid>, fields...]
+ +An abbreviated record is a abbreviation id followed by a set of fields that +are encoded according to the abbreviation +definition. This allows records to be encoded significantly more densely +than records encoded with the UNABBREV_RECORD +type, and allows the abbreviation types to be specified in the stream itself, +which allows the files to be completely self describing. The actual encoding +of abbreviations is defined below. +
+ +-blah +Abbreviations are an important form of compression for bitstreams. The idea is +to specify a dense encoding for a class of records once, then use that encoding +to emit many records. It takes space to emit the encoding into the file, but +the space is recouped (hopefully plus some) when the records that use it are +emitted.
++Abbreviations can be determined dynamically per client, per file. Since the +abbreviations are stored in the bitstream itself, different streams of the same +format can contain different sets of abbreviations if the specific stream does +not need it. As a concrete example, LLVM IR files usually emit an abbreviation +for binary operators. If a specific LLVM module contained no or few binary +operators, the abbreviation does not need to be emitted. +
+[DEFINE_ABBREV, numabbrevopsvbr5, abbrevop0, abbrevop1, + ...]
+ +An abbreviation definition consists of the DEFINE_ABBREV abbrevid followed +by a VBR that specifies the number of abbrev operands, then the abbrev +operands themselves. Abbreviation operands come in three forms. They all start +with a single bit that indicates whether the abbrev operand is a literal operand +(when the bit is 1) or an encoding operand (when the bit is 0).
+ +