[fuzzer] document the -tokens flag. Also change the diagnostic output

author Kostya Serebryany <kcc@google.com>

Wed, 1 Apr 2015 21:33:20 +0000 (21:33 +0000)

committer Kostya Serebryany <kcc@google.com>

Wed, 1 Apr 2015 21:33:20 +0000 (21:33 +0000)
diff --git a/docs/LibFuzzer.rst b/docs/LibFuzzer.rst

index 354e8719035af00be266a7eed0b8a178bd57615e..684d9def787867e39445cb2290daa9ae40f71a4e 100644 (file)
--- a/docs/LibFuzzer.rst
+++ b/docs/LibFuzzer.rst
@@ -163,6 +163,27 @@ which will cause the fuzzer to exit on the first new synthesised input::
  
    N=100; M=4; ./pcre_fuzzer ./CORPUS -jobs=$N -workers=$M -exit_on_first=1
  
+Advanced features
+=================
+
+Tokens
+------
+
+By default, the fuzzer is not aware of the complexities of the input language
+and when fuzzing e.g. a C++ parser it will mostly stress the lexer.
+It is very hard for the fuzzer to come up with something like ``reinterpret_cast<int>``
+from a test corpus that doesn't have it.
+See a detailed discussion of this topic at
+http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html.
+
+lib/Fuzzer implements a simple technique for fuzzing input languages with
+long tokens. Prepare a text file containing up to 253 tokens, one token per line,
+and pass it to the fuzzer as ``-tokens=TOKENS_FILE.txt``.
+Three implicit tokens are added: ``" "``, ``"\t"``, and ``"\n"``.
+The fuzzer itself will still mutate a string of bytes,
+but before passing this input to the target library it will replace every byte ``b`` with the ``b``-th token.
+If there are fewer than ``b`` tokens, a space is substituted instead.
+
  
  Fuzzing components of LLVM
  ==========================
@@ -188,6 +209,7 @@ clang-fuzzer
  ------------
  
  The default behavior is very similar to ``clang-format-fuzzer``.
+Clang can also be fuzzed with Tokens_ using the ``-tokens=$LLVM/lib/Fuzzer/cxx_fuzzer_tokens.txt`` option.
  
  Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057
  
diff --git a/lib/Fuzzer/FuzzerUtil.cpp b/lib/Fuzzer/FuzzerUtil.cpp

index 3f62a1f1d1e2fc5dd978887672c009a2b58d7b58..3635f39a10def348bec49580a346884229c1afc7 100644 (file)
--- a/lib/Fuzzer/FuzzerUtil.cpp
+++ b/lib/Fuzzer/FuzzerUtil.cpp
@@ -19,15 +19,18 @@
  namespace fuzzer {
  
  void Print(const Unit &v, const char *PrintAfter) {
-  std::cerr << v.size() << ": ";
    for (auto x : v)
-    std::cerr << (unsigned) x << " ";
+    std::cerr << "0x" << std::hex << (unsigned) x << std::dec << ",";
    std::cerr << PrintAfter;
  }
  
  void PrintASCII(const Unit &U, const char *PrintAfter) {
-  for (auto X : U)
-    std::cerr << (char)((isascii(X) && X >= ' ') ? X : '?');
+  for (auto X : U) {
+    if (isprint(X))
+      std::cerr << X;
+    else
+      std::cerr << "\\x" << std::hex << (int)(unsigned)X << std::dec;
+  }
    std::cerr << PrintAfter;
  }