author: Kostya Serebryany <kcc@google.com> 2015-04-01 21:33:20 +0000
committer: Kostya Serebryany <kcc@google.com> 2015-04-01 21:33:20 +0000
commit: 01055ec7e316e4b6e1b37e9e165b66d07716830c
tree: 3578c38ff426bcb03896514b917230811c4fe996 /docs
parent: a8d688454d2a7cf1e38574b836183579b01476ff
[fuzzer] document the -tokens flag. Also change the diagnostic output
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233842 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r-- docs/LibFuzzer.rst 22
1 files changed, 22 insertions, 0 deletions
diff --git a/docs/LibFuzzer.rst b/docs/LibFuzzer.rst
index 354e871903..684d9def78 100644
--- a/docs/LibFuzzer.rst
+++ b/docs/LibFuzzer.rst
@@ -163,6 +163,27 @@ which will cause the fuzzer to exit on the first new synthesised input::
N=100; M=4; ./pcre_fuzzer ./CORPUS -jobs=$N -workers=$M -exit_on_first=1
+Advanced features
+=================
+
+Tokens
+------
+
+By default, the fuzzer is not aware of the complexities of the input language,
+so when fuzzing e.g. a C++ parser it will mostly stress the lexer.
+It is very hard for the fuzzer to come up with something like ``reinterpret_cast<int>``
+from a test corpus that doesn't have it.
+See a detailed discussion of this topic at
+http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html.
+
+lib/Fuzzer implements a simple technique that makes it possible to fuzz input languages with
+long tokens. All you need to do is prepare a text file containing up to 253 tokens, one token per line,
+and pass it to the fuzzer as ``-tokens=TOKENS_FILE.txt``.
+Three implicit tokens are added: ``" "``, ``"\t"``, and ``"\n"``.
+The fuzzer itself will still be mutating a string of bytes,
+but before passing this input to the target library it will replace every byte ``b`` with the ``b``-th token.
+If there are fewer than ``b`` tokens, a space will be added instead.
+
Fuzzing components of LLVM
==========================
@@ -188,6 +209,7 @@ clang-fuzzer
------------
The default behavior is very similar to ``clang-format-fuzzer``.
+Clang can also be fuzzed with Tokens_ using the ``-tokens=$LLVM/lib/Fuzzer/cxx_fuzzer_tokens.txt`` option.
Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057
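
Note on the Tokens section added by this patch: the byte-to-token substitution it describes can be illustrated with a minimal Python sketch. This is not lib/Fuzzer's code. The function names ``load_tokens`` and ``substitute`` are hypothetical, and the sketch assumes details the documentation leaves open: that indexing is zero-based, that the three implicit tokens are appended after the tokens from the file, and that tokens are concatenated with no separator.

```python
# Hypothetical sketch of the substitution described in the Tokens section.
# Assumptions (not stated in the patch): zero-based indexing, implicit tokens
# appended after the file tokens, tokens concatenated without a separator.

IMPLICIT_TOKENS = [" ", "\t", "\n"]  # the three implicit tokens from the docs
MAX_FILE_TOKENS = 253                # limit stated in the docs


def load_tokens(lines):
    """Build the token table from the lines of a tokens file (one per line)."""
    tokens = [line.rstrip("\n") for line in lines]
    tokens = [tok for tok in tokens if tok]  # skip empty lines
    if len(tokens) > MAX_FILE_TOKENS:
        raise ValueError("a tokens file may contain at most 253 tokens")
    return tokens + IMPLICIT_TOKENS


def substitute(data, tokens):
    """Replace every byte b with the b-th token, or a space if out of range."""
    return "".join(tokens[b] if b < len(tokens) else " " for b in data)
```

With a four-line tokens file containing ``reinterpret_cast``, ``<``, ``int`` and ``>``, the mutated bytes ``0 1 2 3`` would expand under these assumptions to ``reinterpret_cast<int>``, so a four-byte mutation is enough to reach an input that byte-level mutation alone is unlikely to produce.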