Improve fast path of Cursor
author Stepan Palamarchuk <stepan@fb.com>
Tue, 16 Jan 2018 00:36:41 +0000 (16:36 -0800)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Tue, 16 Jan 2018 00:54:23 +0000 (16:54 -0800)
commit 2c730d6fe79b5642133c55545c4ed7570ce2abb9
tree 2d52fe0560e4650114c61ec54917a5b141155a7d
parent d7b6ad4972b288f90bf57d7597103c44c244decd

Summary:
This change simplifies the fast path by reducing it to the bare minimum (i.e. check length, load data) and removes the indirection through IOBuf.
Additionally, it adds a `skipNoAdvance` method that provides a 1-instruction skip.
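
For illustration only, here is a minimal sketch of the idea, not the code from this diff. It assumes a hypothetical `MiniCursor` over a single contiguous buffer that caches the current position and end pointers itself, so the fast path never has to go through the buffer object. The member names `crtPos_` and `crtEnd_`, the `readSlow` body, and the stated precondition of `skipNoAdvance` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Hypothetical, simplified cursor over one contiguous buffer.
// A real cursor walks a chain of buffers; this sketch only shows the fast path.
class MiniCursor {
 public:
  MiniCursor(const uint8_t* data, size_t len)
      : crtEnd_(data + len), crtPos_(data) {}

  template <class T>
  T read() {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    // Fast path: one length check against cached pointers, then load and advance.
    if (static_cast<size_t>(crtEnd_ - crtPos_) >= sizeof(T)) {
      T val;
      std::memcpy(&val, crtPos_, sizeof(T));
      crtPos_ += sizeof(T);
      return val;
    }
    // Everything else is kept out of line so the hot path stays small.
    return readSlow<T>();
  }

  // Assumed semantics: skip inside the current buffer only, with no bounds
  // handling in release builds. The caller must guarantee len <= length().
  void skipNoAdvance(size_t len) {
    assert(len <= length());
    crtPos_ += len;  // a single pointer add when the assert is compiled out
  }

  // Bytes remaining in the current buffer.
  size_t length() const { return static_cast<size_t>(crtEnd_ - crtPos_); }

 private:
  template <class T>
  T readSlow() {
    // A chained cursor would move to the next buffer here; this sketch has
    // only one buffer, so running out of data is reported as an error.
    throw std::out_of_range("MiniCursor underflow");
  }

  const uint8_t* crtEnd_;
  const uint8_t* crtPos_;
};
```

Keeping both pointers inside the cursor is what lets the common case compile down to a compare, a load, and a store, with the slow path reached by a single branch.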

Disassembly of `read<signed char>` was over 35 instructions (hot path only). With this change it is down to 8.
Disassembly after:
  Dump of assembler code for function folly::io::detail::CursorBase<folly::io::Cursor, folly::IOBuf const>::read<unsigned char>():
     0x000000000041f0f0 <+0>:     mov    0x18(%rdi),%rax
     0x000000000041f0f4 <+4>:     lea    0x1(%rax),%rcx
     0x000000000041f0f8 <+8>:     cmp    0x10(%rdi),%rcx
     0x000000000041f0fc <+12>:    ja     0x41f105 <folly::io::detail::CursorBase<folly::io::Cursor, folly::IOBuf const>::read<unsigned char>()+21>
     0x000000000041f0fe <+14>:    mov    (%rax),%al
     0x000000000041f100 <+16>:    mov    %rcx,0x18(%rdi)
     0x000000000041f104 <+20>:    retq
     0x000000000041f105 <+21>:    jmpq   0x41f110 <folly::io::detail::CursorBase<folly::io::Cursor, folly::IOBuf const>::readSlow<unsigned char>()>

With this diff Thrift deserialization becomes ~20% faster (with prod workloads).

Thrift benchmark:
Before:
  ============================================================================
  thrift/lib/cpp2/test/ProtocolBench.cpp          relative  time/iter  iters/s
  ============================================================================
  BinaryProtocol_read_Empty                                   12.98ns   77.03M
  BinaryProtocol_read_SmallInt                                20.94ns   47.76M
  BinaryProtocol_read_BigInt                                  20.86ns   47.93M
  BinaryProtocol_read_SmallString                             34.64ns   28.86M
  BinaryProtocol_read_BigString                              185.53ns    5.39M
  BinaryProtocol_read_BigBinary                               67.34ns   14.85M
  BinaryProtocol_read_LargeBinary                             62.23ns   16.07M
  BinaryProtocol_read_Mixed                                   58.74ns   17.03M
  BinaryProtocol_read_SmallListInt                            89.99ns   11.11M
  BinaryProtocol_read_BigListInt                              39.92us   25.05K
  BinaryProtocol_read_BigListMixed                           616.20us    1.62K
  BinaryProtocol_read_LargeListMixed                          83.49ms    11.98
  CompactProtocol_read_Empty                                  11.28ns   88.67M
  CompactProtocol_read_SmallInt                               19.15ns   52.22M
  CompactProtocol_read_BigInt                                 26.14ns   38.25M
  CompactProtocol_read_SmallString                            31.04ns   32.22M
  CompactProtocol_read_BigString                             184.55ns    5.42M
  CompactProtocol_read_BigBinary                              69.73ns   14.34M
  CompactProtocol_read_LargeBinary                            64.39ns   15.53M
  CompactProtocol_read_Mixed                                  58.73ns   17.03M
  CompactProtocol_read_SmallListInt                           76.50ns   13.07M
  CompactProtocol_read_BigListInt                             25.93us   38.56K
  CompactProtocol_read_BigListMixed                          623.15us    1.60K
  CompactProtocol_read_LargeListMixed                         80.57ms    12.41
  ============================================================================

After:
  ============================================================================
  thrift/lib/cpp2/test/ProtocolBench.cpp          relative  time/iter  iters/s
  ============================================================================
  BinaryProtocol_read_Empty                                   10.40ns   96.17M
  BinaryProtocol_read_SmallInt                                15.14ns   66.03M
  BinaryProtocol_read_BigInt                                  15.19ns   65.84M
  BinaryProtocol_read_SmallString                             25.19ns   39.70M
  BinaryProtocol_read_BigString                              172.85ns    5.79M
  BinaryProtocol_read_BigBinary                               56.88ns   17.58M
  BinaryProtocol_read_LargeBinary                             56.77ns   17.61M
  BinaryProtocol_read_Mixed                                   43.98ns   22.74M
  BinaryProtocol_read_SmallListInt                            58.19ns   17.19M
  BinaryProtocol_read_BigListInt                              19.75us   50.63K
  BinaryProtocol_read_BigListMixed                           440.20us    2.27K
  BinaryProtocol_read_LargeListMixed                          56.94ms    17.56
  CompactProtocol_read_Empty                                   9.35ns  106.93M
  CompactProtocol_read_SmallInt                               13.07ns   76.49M
  CompactProtocol_read_BigInt                                 18.23ns   54.87M
  CompactProtocol_read_SmallString                            25.61ns   39.05M
  CompactProtocol_read_BigString                             174.46ns    5.73M
  CompactProtocol_read_BigBinary                              59.77ns   16.73M
  CompactProtocol_read_LargeBinary                            60.81ns   16.44M
  CompactProtocol_read_Mixed                                  42.70ns   23.42M
  CompactProtocol_read_SmallListInt                           66.89ns   14.95M
  CompactProtocol_read_BigListInt                             25.08us   39.87K
  CompactProtocol_read_BigListMixed                          427.93us    2.34K
  CompactProtocol_read_LargeListMixed                         56.11ms    17.82
  ============================================================================

Reviewed By: yfeldblum

Differential Revision: D6635325

fbshipit-source-id: 393fc1005689042977c03f37f5a898ebe7814d44
folly/io/Cursor.h