I just finished a round of security-oriented static analysis at work. It went pretty well. We found a handful of issues that required a second look - nothing immediately exploitable, but things that should be handled with a healthy dose of paranoia. We made them safer. I had my doubts about commercial-grade static analysis. Now I think I’m convinced.
I can’t help but wonder what this means, though. I agree that software isn’t complete unless it’s secure, but does that mean it’s incomplete until you run it through a static analyzer that costs more than a software engineer? If we all agree a practice is bad, why is it still allowed? Shouldn’t our tools do more to correct problems early? Am I using the wrong tools?
Static Program Analysis
The “static” in static program analysis means analysis occurs when your program isn’t running. You may have seen it called static code analysis, but many tools go beyond code. They analyze all artifacts available to them: code, configuration, bytecode, documentation, and compiled binaries. An analyzer scans for issues ranging from the simple (consistent bracket placement in bracketed languages) to the complex (data that crosses a network boundary is untrusted, so if it finds its way to a web browser, it’s a candidate for an XSS attack). It then presents the issues, describes their severity, and recommends fixes.
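The XSS case is worth making concrete. Here's a minimal sketch of the pattern a taint-tracking analyzer looks for — Java is used for a runnable illustration, and the method names are mine, not from any particular tool:

```java
public class XssSketch {
    // What a taint-tracking analyzer flags: untrusted input flows
    // straight into an HTML response, so a payload like
    // <script>...</script> executes in the victim's browser.
    static String unsafeGreeting(String userInput) {
        return "<p>Hello, " + userInput + "</p>";
    }

    // The usual recommended fix: encode at the output boundary so
    // markup in the input renders as text instead of as HTML.
    static String htmlEncode(String s) {
        return s.replace("&", "&amp;")   // must run first
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String safeGreeting(String userInput) {
        return "<p>Hello, " + htmlEncode(userInput) + "</p>";
    }

    public static void main(String[] args) {
        String payload = "<script>alert(1)</script>";
        System.out.println(unsafeGreeting(payload)); // script tag survives
        System.out.println(safeGreeting(payload));   // script tag neutralized
    }
}
```

The interesting part is that the bug and the fix differ by one call at the output boundary — exactly the kind of thing data flow analysis can spot and a human reviewer can miss.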
Static analyzers are not created equal. Their focus and costs vary wildly. A sample in the Microsoft .NET ecosystem might include:
- FxCop – Analyzes .NET object code. Provides a fairly fast scan for many issues including security, maintainability, and overall design. Early versions were free. The latest versions are bundled with Visual Studio Premium and Ultimate or in the MS Windows SDK.
- ReSharper – Runs in Visual Studio. Provides real-time feedback on over 1000 code quality issues. A personal license sets you back 150 bucks.
- HP Fortify – Dives deep with scans that include data flow analysis. Scanning a large project might take more than an hour. Priced in the tens of thousands of dollars rather than the hundreds.
Where does the compiler stop and the static analyzer begin?
If you remember back to your compiler class, a compiler’s primary responsibility is to validate that source code is well-formed (proper syntax) and to transform source code into a new representation that maintains its meaning (proper semantics). Modern compilers don’t stop there. Take the following C# snippet:
bool condition = false;
if (condition = true)
    Console.Write("Something bad may just have happened.");
The snippet is well-formed C# code, and its meaning is unambiguous. Still, the Visual C# compiler generates two warnings:
- “Assignment in conditional expression is always constant; did you mean to use == instead of = ?”
- “The variable 'condition' is assigned but its value is never used”
These warnings came from a bit of static analysis. The compiler is smart enough to know that assignment while checking a condition is a common mistake. It’s also smart enough to know you don’t need variables that are never used.
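The same pitfall exists in Java, where javac by default accepts the assignment without any warning at all — which makes the C# compiler's nudge look generous. A ported sketch (class and method names are mine) showing the bug next to the fix the warning suggests:

```java
public class ConditionPitfall {
    // The bug: `=` assigns, so the condition is always true and the
    // original value of `condition` is clobbered. javac compiles this
    // silently by default; the C# compiler at least warns.
    static boolean buggy() {
        boolean condition = false;
        if (condition = true) {
            return true;  // always taken, regardless of the original value
        }
        return false;
    }

    // The fix: compare with `==` (or simply write `if (condition)`).
    static boolean fixed() {
        boolean condition = false;
        if (condition == true) {
            return true;  // never taken: condition really is false
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("buggy() = " + buggy());
        System.out.println("fixed() = " + fixed());
    }
}
```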
Since we’ve confirmed a compiler can perform static analysis, we have to wonder why it doesn’t do more. It seems sensible to warn a programmer about issues as they happen rather than scanning a project offline after large chunks of code are committed. Are the resource requirements to detect issues like SQL injection and XSS vulnerabilities too high for real-time analysis? I doubt it. In fact, I have a counter-example in a small project named Ur/Web.
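SQL injection detection is mostly the same taint tracking in a different sink. A sketch of what an analyzer looks for — no real database here, just the query strings, with illustrative names of my own choosing:

```java
public class SqlSketch {
    // What an analyzer flags: untrusted input concatenated into SQL.
    // A crafted name terminates the string literal and smuggles in
    // its own clause, changing the query's meaning.
    static String concatenated(String name) {
        return "SELECT * FROM users WHERE name = '" + name + "'";
    }

    public static void main(String[] args) {
        String attack = "x' OR '1'='1";
        System.out.println(concatenated(attack));
        // The built query now matches every row, not just user x.

        // The recommended shape: a parameterized query. With JDBC this
        // would be a PreparedStatement; the input is bound as data and
        // can never change the statement's structure.
        String parameterized = "SELECT * FROM users WHERE name = ?";
        System.out.println(parameterized);
    }
}
```

Spotting the concatenation pattern is cheap; a compiler could flag it as readily as it flags an assignment in a condition.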
Ur is a functional programming language with an ML-like syntax. Ur/Web is Ur plus a standard library that supports writing safe web applications. With respect to our topic, its most interesting feature is:
“Ur/Web provides guaranteed security; certain kinds of security vulnerability are impossible. For example, modulo any compiler bugs, any Ur/Web application that the compiler accepts is free of code injection vulnerabilities.”
- from the Ur/Web FAQ
You may wonder how we started at static analysis and ended up with a small academic language maintained by a single professor. My point is this: it makes sense to catch issues early, and it can be done. As with testing, the later in the software life cycle we find an issue, the more it costs to fix. In the case of security issues, one major incident could mean the end of your company. After-the-fact static analysis is not enough. Source code that contains critical, knowable issues is as incorrect as source code with syntax errors. It shouldn’t compile or run.
We have the know-how and the cycles to detect many code quality issues. What we don’t have is the will. With programming languages and tools all vying for popularity, no one wants to add restrictions that might frustrate a beginner, even when they make development safer. Just look at the frustration caused by Go making unused variables and imports compiler errors. Better yet, compare the popularity of a language with more built-in safety, like Haskell, to that of an easy-to-learn language; I’ve yet to meet a client who uses Haskell. Vendors aren’t motivated to tighten things up, either. Commercial “enterprise” languages like Java and C# have extensive (and expensive) analysis ecosystems, and making a language “safer” by default would alienate business partners as well as novice developers.
If software development is to improve, we need to improve our language implementations and their tools as well as our engineering practices. Safe and sane software shouldn’t be a choice, it shouldn’t be a “Premium” or “Ultimate” level feature, and it shouldn’t be an add-on that you pay for in easy installments. Instead, it should be something from which a developer must opt out. If we agree a practice is bad and it’s tool-detectable, we shouldn’t look for it with tests. We should detect and prevent it automatically, without developer intervention. Anything less is a shame.