From: Hans Fugal Date: Tue, 7 Oct 2014 23:36:06 +0000 (-0700) Subject: serial/parallel benchmark X-Git-Tag: v0.22.0~295 X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=d6488e3c10266657e12b8a837ced500022f4ad8e;p=folly.git serial/parallel benchmark Summary: This is an attempt to see the effect of lock contention in the less-common case where a promise is fulfilled at the same time the future is still being set up. Although this is expected to be off the beaten path, it will still be a good idea to keep an eye on how we improve or worsen perf, when we play with different locking strategies (which I intend to do a lot of in the near term). Test Plan: ============================================================================ folly/wangle/test/Benchmark.cpp relative time/iter iters/s ============================================================================ constantFuture 249.31ns 4.01M promiseAndFuture 100.88% 247.13ns 4.05M withThen 43.87% 568.30ns 1.76M ---------------------------------------------------------------------------- oneThen 569.62ns 1.76M twoThens 63.46% 897.60ns 1.11M fourThens 39.64% 1.44us 695.90K hundredThens 1.91% 29.75us 33.61K ---------------------------------------------------------------------------- serial 4.97ms 201.21 parallel 60.22% 8.25ms 121.18 ============================================================================ Reviewed By: jsedgwick@fb.com Subscribers: meisner, net-systems@, fugalh, exa, njormrod FB internal diff: D1593010 --- diff --git a/folly/wangle/test/Benchmark.cpp b/folly/wangle/test/Benchmark.cpp index 7d18db6b..d88d9b9d 100644 --- a/folly/wangle/test/Benchmark.cpp +++ b/folly/wangle/test/Benchmark.cpp @@ -14,11 +14,15 @@ * limitations under the License. */ +#include #include #include #include +#include +#include using namespace folly::wangle; +using namespace std; template T incr(Try&& t) { @@ -74,6 +78,88 @@ BENCHMARK_RELATIVE(hundredThens) { someThens(100); } +// Lock contention. Although in practice fulfil()s tend to be temporally +// separate from then()s, still sometimes they will be concurrent. So the +// higher this number is, the better. +BENCHMARK_DRAW_LINE() + +BENCHMARK(no_contention) { + vector> promises(10000); + vector> futures; + std::thread producer, consumer; + + BENCHMARK_SUSPEND { + folly::Baton<> b1, b2; + for (auto& p : promises) + futures.push_back(p.getFuture()); + + consumer = std::thread([&]{ + b1.post(); + for (auto& f : futures) f.then(incr); + }); + consumer.join(); + + producer = std::thread([&]{ + b2.post(); + for (auto& p : promises) p.setValue(42); + }); + + b1.wait(); + b2.wait(); + } + + // The only thing we are measuring is how long fulfil + callbacks take + producer.join(); +} + +BENCHMARK_RELATIVE(contention) { + vector> promises(10000); + vector> futures; + std::thread producer, consumer; + sem_t sem; + sem_init(&sem, 0, 0); + + BENCHMARK_SUSPEND { + folly::Baton<> b1, b2; + for (auto& p : promises) + futures.push_back(p.getFuture()); + + consumer = std::thread([&]{ + b1.post(); + for (auto& f : futures) { + sem_wait(&sem); + f.then(incr); + } + }); + + producer = std::thread([&]{ + b2.post(); + for (auto& p : promises) { + sem_post(&sem); + p.setValue(42); + } + }); + + b1.wait(); + b2.wait(); + } + + // The astute reader will notice that we're not *precisely* comparing apples + // to apples here. Well, maybe it's like comparing Granny Smith to + // Braeburn or something. In the serial version, we waited for the futures + // to be all set up, but here we are probably still doing that work + // (although in parallel). But even though there is more work (on the order + // of 2x), it is being done by two threads. Hopefully most of the difference + // we see is due to lock contention and not false parallelism. + // + // Be warned that if the box is under heavy load, this will greatly skew + // these results (scheduling overhead will begin to dwarf lock contention). + // I'm not sure but I'd guess in Windtunnel this will mean large variance, + // because I expect they load the boxes as much as they can? + consumer.join(); + producer.join(); +} + int main() { folly::runBenchmarks(); }