Elias-Fano micro-optimizations
Summary:
Short skips have been optimized by adding special cases that use simple iteration when it is convenient. Large skips have been optimized by using the broadword selection algorithm by Vigna (improved with ideas by Gog&Petri) instead of iterating on the zeros/ones of the upper bits.
The benchmarks had to be made more granular to measure the differences, in particular they used to test skipping with a fixed skip length for each test, while now we average over a range of skips to better simulate a random distribution.
The improvements are very significant for `skipTo()` on short skips, about 2-3x for skips at distance 1 or 2, which can occur when intersecting dense lists. On large skips the gain is about 17%.
`skip()` exhibits slightly smaller improvements.
before after
============================================================================ ==================
folly/experimental/test/EliasFanoCodingTest.cpp relative time/iter iters/s time/iter iters/s
============================================================================ ==================
Next 2.52ns 396.26M 2.52ns 397.28M
Skip_ForwardQ128(1) 8.66ns 115.52M 3.92ns 255.28M
Skip_ForwardQ128(2) 8.37ns 119.42M 5.08ns 197.04M
Skip_ForwardQ128(4_pm_1) 9.67ns 103.41M 7.04ns 142.02M
Skip_ForwardQ128(16_pm_4) 21.44ns 46.65M 19.68ns 50.82M
Skip_ForwardQ128(64_pm_16) 30.86ns 32.40M 27.58ns 36.26M
Skip_ForwardQ128(256_pm_64) 37.80ns 26.45M 32.49ns 30.78M
Skip_ForwardQ128(1024_pm_256) 38.99ns 25.65M 33.39ns 29.95M
Jump_ForwardQ128 37.91ns 26.37M 34.05ns 29.37M
---------------------------------------------------------------------------- ------------------
SkipTo_SkipQ128(1) 13.87ns 72.10M 4.42ns 226.49M
SkipTo_SkipQ128(2) 18.80ns 53.20M 8.58ns 116.55M
SkipTo_SkipQ128(4_pm_1) 23.59ns 42.38M 11.43ns 87.50M
SkipTo_SkipQ128(16_pm_4) 36.04ns 27.74M 31.19ns 32.06M
SkipTo_SkipQ128(64_pm_16) 53.34ns 18.75M 43.88ns 22.79M
SkipTo_SkipQ128(256_pm_64) 62.27ns 16.06M 49.08ns 20.37M
SkipTo_SkipQ128(1024_pm_256) 65.63ns 15.24M 52.24ns 19.14M
JumpTo_SkipQ128 65.89ns 15.18M 54.61ns 18.31M
---------------------------------------------------------------------------- ------------------
Encode_10 111.94ns 8.93M 117.24ns 8.53M
Encode 5.35ms 187.02 5.64ms 177.15
---------------------------------------------------------------------------- ------------------
Select64 8.07ns 123.96M 8.04ns 124.35M
============================================================================ ==================
Test Plan:
fbconfig folly/experimental/test:eliasfano_test && fbmake runtests_opt
Reviewed By: philipp@fb.com
Subscribers: yfeldblum, fbcode-common-diffs@, chaoyc, search-fbcode-diffs@, unicorn-diffs@, trunkagent, folly-diffs@
FB internal diff:
D1793554
Signature: t1:
1793554:
1423619344:
1b078c0789639f317342ebcc77b11fe91859cd7b