improve io::Cursor read() performance for small sizeof(T)
author Philip Pronin <philipp@fb.com>
Sun, 7 Dec 2014 00:49:11 +0000 (16:49 -0800)
committer Dave Watson <davejwatson@fb.com>
Thu, 11 Dec 2014 16:01:06 +0000 (08:01 -0800)
commit 173356a35a2894817b7b45134eb062211b54dfc7
tree 8bcce78484a0f9a044a78158048f2b5053fd506f
parent 2bda6641d6a803545b43af65228cb94fdbf32d78

Summary:
I just found that gcc (4.8.2) fails to unroll the loop in
`pullAtMost()`, so it doesn't replace `memcpy` with a simple load
when `len` is small, as it is in `read()` for small `sizeof(T)`.
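
A minimal sketch of the idea, under stated assumptions: `Chunk` and `SketchCursor` are hypothetical names made up for illustration, not folly's `IOBuf` or `io::Cursor`, and the fast path shown is one plausible way to give the compiler a constant-size copy, not necessarily what this diff does. When the current buffer already holds `sizeof(T)` bytes, the copy length is a compile-time constant, which compilers can usually lower to a single load. The looping path takes a runtime `len`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for one buffer in a chain (not folly::IOBuf).
struct Chunk {
  const std::uint8_t* data;
  std::size_t length;
  const Chunk* next;  // nullptr marks the end of the chain
};

// Hypothetical cursor over a chain of Chunks (not folly::io::Cursor).
class SketchCursor {
 public:
  explicit SketchCursor(const Chunk* c) : chunk_(c), offset_(0) {}

  // Generic path: `len` is a runtime value and the copy may span several
  // chunks, so each memcpy has a length the compiler cannot see.
  std::size_t pullAtMost(void* out, std::size_t len) {
    auto* dst = static_cast<std::uint8_t*>(out);
    std::size_t copied = 0;
    while (chunk_ != nullptr && len > 0) {
      const std::size_t avail = chunk_->length - offset_;
      if (avail == 0) {  // current chunk exhausted, move to the next one
        chunk_ = chunk_->next;
        offset_ = 0;
        continue;
      }
      const std::size_t n = avail < len ? avail : len;
      std::memcpy(dst + copied, chunk_->data + offset_, n);
      copied += n;
      len -= n;
      offset_ += n;
    }
    return copied;
  }

  template <class T>
  T read() {
    static_assert(std::is_trivially_copyable<T>::value,
                  "read<T>() copies raw bytes");
    T val{};
    if (chunk_ != nullptr && chunk_->length - offset_ >= sizeof(T)) {
      // Fast path: constant-size copy from a single chunk.
      std::memcpy(&val, chunk_->data + offset_, sizeof(T));
      offset_ += sizeof(T);
    } else {
      // Slow path: the value straddles a chunk boundary.
      const std::size_t got = pullAtMost(&val, sizeof(T));
      assert(got == sizeof(T));
      (void)got;
    }
    return val;
  }

 private:
  const Chunk* chunk_;
  std::size_t offset_;
};
```

In this sketch, a `read<std::uint32_t>()` that fits in the current chunk never enters the loop. Only reads that cross a chunk boundary fall back to `pullAtMost()`.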

Test Plan:
fbconfig -r folly/io/test thrift/lib/cpp2/test && fbmake runtests_opt -j32

Ran the unicorn-specific Thrift deserialization benchmark from
D1724070 and verified a 50% improvement in `SearchRequest`
deserialization performance.

`thrift/lib/cpp2/test/ProtocolBench` results:

```
                                                         |---- before -----| |---- after  -----|
================================================================================================
thrift/lib/cpp2/test/ProtocolBench.cpp          relative  time/iter  iters/s  time/iter  iters/s
================================================================================================
BinaryProtocol_read_Empty                                   21.72ns   46.04M    17.58ns   56.89M
BinaryProtocol_read_SmallInt                                43.03ns   23.24M    23.64ns   42.30M
BinaryProtocol_read_BigInt                                  43.72ns   22.87M    22.03ns   45.38M
BinaryProtocol_read_SmallString                             88.57ns   11.29M    47.01ns   21.27M
BinaryProtocol_read_BigString                              365.76ns    2.73M   323.58ns    3.09M
BinaryProtocol_read_BigBinary                              207.78ns    4.81M   169.09ns    5.91M
BinaryProtocol_read_LargeBinary                            187.81ns    5.32M   172.09ns    5.81M
BinaryProtocol_read_Mixed                                  161.18ns    6.20M    68.41ns   14.62M
BinaryProtocol_read_SmallListInt                           177.32ns    5.64M    96.91ns   10.32M
BinaryProtocol_read_BigListInt                              77.03us   12.98K    15.88us   62.97K
BinaryProtocol_read_BigListMixed                             1.79ms   557.79   923.99us    1.08K
BinaryProtocol_read_LargeListMixed                         195.01ms     5.13   103.78ms     9.64
================================================================================================
```
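
The table gives no relative column, so the per-row gain has to be derived from the two `time/iter` values. A small sketch of that arithmetic follows; `speedup` and `timeReductionPct` are hypothetical helper names, not part of the benchmark harness. For example, `BinaryProtocol_read_SmallInt` goes from 43.03ns to 23.64ns, roughly a 1.8x speedup, and `BinaryProtocol_read_BigListInt` goes from 77.03us to 15.88us, roughly 4.9x.

```cpp
#include <cassert>

// How many times faster "after" is than "before".
// Both arguments must use the same time unit.
inline double speedup(double before, double after) {
  assert(before > 0.0 && after > 0.0);
  return before / after;
}

// Percentage reduction in time per iteration.
inline double timeReductionPct(double before, double after) {
  assert(before > 0.0 && after >= 0.0);
  return 100.0 * (before - after) / before;
}
```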

Reviewed By: soren@fb.com

Subscribers: alandau, bmatheny, mshneer, trunkagent, njormrod, folly-diffs@

FB internal diff: D1724111

Tasks: 5770136

Signature: t1:1724111:1417977810:b7d643d0c819a0bbac77fa0048206153929e50a8
folly/io/Compression.cpp
folly/io/Cursor.h