Hacking on Tartan
======================

A very basic and incomplete guide.

Development principles
----------------------

The intention is for Tartan to always be developed according to these
principles, which are in no particular order:

 * Stability: Tartan needs to run reliably, and needs to have a consistent
   command line and build system interface, so that projects only ever need to
   integrate support for it once.
 * Limiting false positives: Nobody will use Tartan if it produces too many
   false positives. Keeping the false positive rate low is more important than
   increasing the true positive rate or decreasing the false negative rate.
 * Allowing choice over warnings/errors: Where it’s not possible to keep the
   false positive rate low, Tartan needs to offer users the choice to disable
   specific warnings/errors or classes of warnings/errors, so that they can
   manually keep their own false positive rate low.
 * Depth of checks before breadth of checks: It’s more important to catch all
   the problems with a certain data type (for example, `GError`) than to catch
   some problems with all data types, as this means users of Tartan only have to
   refactor their (for example) `GError` usage once, rather than multiple times
   as Tartan adds more checks. Checkers which are in progress can be marked as
   experimental and disabled by default.
 * Focus on big gain checks first: Prioritise spending implementation time on
   checks which are going to be useful to as many people as possible, before
   working on ones which are less likely to be needed.
 * Focus on ease of use: If Tartan is not easy to use, or its output not easy
   to understand, nobody is going to use it.
 * Do not require user code modifications: Users should not have to modify their
   code in order to run Tartan on it. Requiring such modifications would make
   Tartan harder to use.
 * Unit tests for everything: Clang keeps changing, and once users have found a
   bug in their code using Tartan, they are going to fix it — so we can’t rely
   on Tartan’s behaviour not changing over time, and we can’t rely on users
   consistently and reproducibly testing it. So we must do that ourselves.

Plugins
-------

Tartan currently provides a single plugin to be loaded by the Clang static
analyser. In the future, it may provide several plugins, but the number of
plugins should be kept small to limit the length of the command lines needed for
compilation.
For example, it would be reasonable to have one plugin specific to GLib, one
to libsoup, one to libgdata, etc.


Concepts
--------

The code in Tartan can be split up into three types of module.

Annotaters:
    Annotaters consume metadata (such as GIR annotation data or precondition
    assertions in C code) and modify Clang’s AST by adding qualifiers and
    attributes. These help Clang’s normal static analysis checkers avoid false
    negatives and find new true positives.
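
    As a rough illustration of the idea (not Tartan’s actual implementation),
    the C sketch below shows a declaration as a project might write it,
    documented with GIR annotations, followed by the kind of attribute an
    annotater could attach so that Clang’s existing nonnull checking applies.
    The function name `widget_name_length` is hypothetical, and the exact
    attribute added is an assumption made for this example.

    .. code-block:: c

        #include <stddef.h>
        #include <string.h>

        /**
         * widget_name_length:
         * @name: name to measure; there is no (nullable) annotation, so NULL
         *   is not an allowed value
         *
         * Returns: length of @name in bytes
         */
        /* Declaration as written by the (hypothetical) project. */
        size_t widget_name_length (const char *name);

        /* Assumed effect of annotation: the same declaration, now carrying a
         * nonnull attribute derived from the metadata above. Written out by
         * hand here; an annotater would add it to the AST instead. */
        size_t widget_name_length (const char *name) __attribute__((nonnull (1)));

        /* Definition, unchanged by the annotation. */
        size_t
        widget_name_length (const char *name)
        {
          return strlen (name);
        }

    With the attribute in place, a call such as `widget_name_length (NULL)`
    becomes visible to the compiler’s and analyser’s standard nonnull
    diagnostics, without the project having to edit its headers.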

Checkers:
    Checkers examine (and do not modify) the Clang AST, looking for specific
    constructs which they warn or error about. For example, one checker compares
    nonnull attributes with precondition assertions and warns if they disagree.
    Each checker should be self-contained and only check one type of construct;
    this allows the user to disable checkers they don’t want.

    There is a conflict between many of these checkers and annotations added by
    the annotaters above. Ideally, any AST changes made by the annotaters will
    be tagged as such, and the checkers will warn about them. Otherwise false
    negatives will result, where the annotaters have fixed up bad code rather
    than getting the user to fix it. (Having the annotaters fix this code is
    necessary to allow for further static analysis; e.g. nonnull checks.)
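
    The nonnull example above can be sketched in C as follows. This is only an
    illustration of the kind of disagreement such a checker might report, not
    a description of Tartan’s exact diagnostics. `RETURN_VAL_IF_FAIL` is a
    hypothetical stand-in for GLib’s `g_return_val_if_fail()` so that the
    sketch builds without GLib, and both function names are made up.

    .. code-block:: c

        #include <stddef.h>
        #include <string.h>

        /* Hypothetical stand-in for g_return_val_if_fail(): return early with
         * @val if the precondition @cond does not hold. */
        #define RETURN_VAL_IF_FAIL(cond, val) \
          do { if (!(cond)) return (val); } while (0)

        /* Disagreement 1: the attribute says @name is never NULL, but there
         * is no precondition assertion guarding against it at runtime. */
        __attribute__((nonnull (1)))
        size_t
        name_length (const char *name)
        {
          return strlen (name);
        }

        /* Disagreement 2: the precondition assertion rejects NULL, but no
         * nonnull attribute tells the compiler or analyser about it. */
        size_t
        label_length (const char *label)
        {
          RETURN_VAL_IF_FAIL (label != NULL, 0);
          return strlen (label);
        }

    In both cases the fix is on the user’s side: add the missing assertion or
    the missing annotation so that the two sources of information agree.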

Analysers:
    Analysers run only at analysis time, modifying the symbolic program state
    (rather than the AST) during analysis to help reduce the number of false
    positives. Analysers do not emit warnings or errors.


Measurement
-----------

Any changes made to the checking or reporting in Tartan should be carefully
measured by running the modified plugin against a large number of GNOME modules,
and analysing how the error counts of those modules change. Avoiding false
positives is highly preferred over avoiding false negatives, on the principle
that nobody will use the plugin if it produces more than a couple of false
positives. As long as the plugin finds some true positives, the number of false
negatives is of low importance — we’re not losing anything by them.
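
One possible way to summarise such a measurement run is sketched below in C.
It assumes the warning counts for each module have already been collected
before and after the change; the `module_counts` structure and both helper
functions are hypothetical and are not part of Tartan.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical record: warning counts for one module, measured with the
     * unmodified plugin (before) and the modified plugin (after). */
    struct module_counts {
      const char *module;
      int before;
      int after;
    };

    /* Net change in the number of reports across all modules. */
    int
    total_delta (const struct module_counts *counts, size_t n)
    {
      int total = 0;

      for (size_t i = 0; i < n; i++)
        total += counts[i].after - counts[i].before;

      return total;
    }

    /* Number of modules whose report count went up. Each of these needs its
     * new reports inspected by hand, since they may be false positives. */
    size_t
    modules_with_new_reports (const struct module_counts *counts, size_t n)
    {
      size_t increased = 0;

      for (size_t i = 0; i < n; i++)
        if (counts[i].after > counts[i].before)
          increased++;

      return increased;
    }

The counts alone do not say whether a new report is a true or false positive;
they only indicate which modules need to be inspected manually.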


Background reading
------------------

 * http://www.clic.cs.columbia.edu/~junfeng/reliable-software/papers/coverity.pdf
 * http://lists.llvm.org/pipermail/cfe-dev/2015-August/044825.html