Hacking on Tartan ====================== A very basic and incomplete guide. Development principles --- The intention is for Tartan to always be developed according to these principles, which are in no particular order: * Stability: Tartan needs to reliably run, and needs to have a consistent command line and build system interface, so that projects only ever need to integrate support for it once. * Limiting false positives: Nobody will use Tartan if it produces too many false positives. Keeping the false positive rate low is more important than increasing the true positive rate or decreasing the false negative rate. * Allowing choice over warnings/errors: Where it’s not possible to keep the false positive rate low, Tartan needs to offer users the choice to disable specific warnings/errors or classes of warnings/errors, so that they can manually keep their own false positive rate low. * Depth of checks before breadth of checks: It’s more important to catch all the problems with a certain data type (for example, `GError`) than to catch some problems with all data types, as this means users of Tartan only have to refactor their (for example) `GError` usage once, rather than multiple times as Tartan adds more checks. Checkers which are in progress can be marked as experimental and disabled by default. * Focus on big gain checks first: Prioritise spending implementation time on checks which are going to be useful to as many people as possible, before working on ones which are less likely to be needed. * Focus on ease of use: If Tartan is not easy to use, or its output not easy to understand, nobody is going to use it. * Do not require user code modifications: Users should not have to modify their code in order to run Tartan on it. That is not easy to use. * Unit tests for everything: Clang keeps changing, and once users have found a bug in their code using Tartan, they are going to fix it — so we can’t rely on Tartan’s behaviour not changing over time, and we can’t rely on users consistently and reproducibly testing it. So we must do that ourselves. Plugins ------- Tartan currently provides a single plugin to be loaded by the Clang static analyser. In the future, it may provide several plugins, but the number of such should be limited to reduce the length of command lines needed for compilation. For example, it would be reasonable to have one plugin specific to GLib, one to libsoup, one to libgdata, etc. Concepts -------- The code in Tartan can be split up into three types of module. Annotaters: Annotaters consume metadata (such as GIR annotation data or precondition assertions in C code) and modify Clang’s AST by adding qualifiers and attributes to aid its normal static analysis checkers avoid false negatives and find new true positives. Checkers: Checkers examine (and do not modify) the Clang AST, looking for specific constructs which they warn or error about. For example, one checker compares nonnull attributes with precondition assertions and warns if they disagree. Each checker should be self-contained and only check one type of construct; this allows the user to disable checkers they don’t want. There is a conflict between many of these checkers and annotations added by the annotaters above. Ideally, any AST changes made by the annotaters will be tagged as such, and the checkers will warn about them. Otherwise false negatives will result, where the annotaters have fixed up bad code rather than getting the user to fix it. (Having the annotaters fix this code is necessary to allow for further static analysis; e.g. nonnull checks.) Analysers: Analysers run only at analysis time, modifying the symbolic program state (rather than the AST) during analysis to help reduce the number of false positives. Analysers do not emit warnings or errors. Measurement ----------- Any changes made to the checking or reporting in Tartan should be carefully measured by running the modified plugin against a large number of GNOME modules, and analysing how the error counts of those modules change. Avoiding false positives is highly preferred over avoiding false negatives, on the principle that nobody will use the plugin if it produces more than a couple of false positives. As long as the plugin finds some true positives, the number of false negatives is of low importance — we’re not losing anything by them. Background reading ------------------ http://www.clic.cs.columbia.edu/~junfeng/reliable-software/papers/coverity.pdf http://lists.llvm.org/pipermail/cfe-dev/2015-August/044825.html