The best technologies are often the ones that you, as a user, never have to worry about. They work automatically, behind the scenes, to make a product or service better. There is nothing to configure and nothing new to learn; they just work.
String Analysis is such a technology. If you think about it, that is a thing of beauty: String Analysis is one of the most complex analysis technologies that IBM (or anyone) has ever developed in the world of static analysis, yet users never have to see that complexity.
Simply put, String Analysis is a technology that tracks string values in a program: which variables hold which string values, how these values are manipulated, how they flow from variable to variable, and so on. Think of it as associating with every string variable, at every point in the program, a regular expression pattern that describes its potential contents. It's an exclusive IBM technology, developed by the IBM Tokyo Research Laboratory. It's the smartest static analysis technology in the world, applied to make a complex problem simpler.
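To make the idea of "a pattern per variable" concrete, here is a toy sketch in Python under heavily simplified assumptions: each abstract value is just a Python regular expression, a string constant maps to its escaped text, untrusted input maps to "any string", concatenation concatenates patterns, and a control-flow merge (such as the end of an if/else) takes the alternation. The class and method names are hypothetical, and this is not IBM's implementation, which this post does not describe.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class StrPattern:
    """Toy abstract value (hypothetical): a regex for every string a variable may hold."""

    regex: str

    @staticmethod
    def literal(text: str) -> "StrPattern":
        # A string constant is described exactly by its escaped text.
        return StrPattern(re.escape(text))

    @staticmethod
    def any_string() -> "StrPattern":
        # Untrusted input (for example, a request parameter) may be anything.
        return StrPattern(r"[\s\S]*")

    def concat(self, other: "StrPattern") -> "StrPattern":
        # a + b: a string matching self, followed by a string matching other.
        return StrPattern(f"(?:{self.regex})(?:{other.regex})")

    def join(self, other: "StrPattern") -> "StrPattern":
        # Control-flow merge (e.g. after if/else): either pattern may hold.
        return StrPattern(f"(?:{self.regex}|{other.regex})")

    def may_be(self, value: str) -> bool:
        # Could the variable hold exactly this concrete string?
        return re.fullmatch(self.regex, value) is not None


# Abstractly "run": query = "SELECT * FROM users WHERE name = '" + name + "'"
name = StrPattern.any_string()
query = (
    StrPattern.literal("SELECT * FROM users WHERE name = '")
    .concat(name)
    .concat(StrPattern.literal("'"))
)
print(query.may_be("SELECT * FROM users WHERE name = 'bob'"))          # True
print(query.may_be("SELECT * FROM users WHERE name = '' OR '1'='1'"))  # True
print(query.may_be("DROP TABLE users"))                                # False
```

A real analysis must also model operations such as replace, substring, and case conversion, must widen patterns inside loops so the analysis terminates, and must decide properties of patterns symbolically (for example, with automata) rather than by probing concrete strings the way may_be does here.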
Consider this: all security static analyzers perform taint analysis. They look at where a program reads untrusted data and check where that data flows. In essence, taint analysis tracks a simple binary property for each variable in a program: it is either trusted or not. String Analysis, in contrast, tracks the potential string patterns of each variable. As you can imagine, this is much more complex and costly to compute.
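For contrast, here is a minimal sketch of the taint side, assuming the same toy setting: the abstract value is a single bit per variable. The names Taint and validate_digits are hypothetical. Because one bit cannot record what a check established, a query built from a digit-validated ID looks exactly as dangerous as one built from the raw parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    """Taint analysis abstract value (hypothetical): one bit per variable."""

    tainted: bool

    def concat(self, other: "Taint") -> "Taint":
        # A string built from any untrusted part is itself untrusted.
        return Taint(self.tainted or other.tainted)


def validate_digits(value: Taint) -> Taint:
    # Hypothetical model of `if user_id.isdigit(): ...`. The bit cannot
    # express "contains only digits", so the value stays tainted unless a
    # user has configured this method as a sanitizer.
    return value


TRUSTED = Taint(False)  # string constants in the program
raw_id = Taint(True)    # e.g. an HTTP request parameter
checked_id = validate_digits(raw_id)

query_raw = TRUSTED.concat(raw_id)
query_checked = TRUSTED.concat(checked_id)
print(query_raw.tainted, query_checked.tainted)  # True True
```

Both queries would be reported, so the second report is a false positive unless someone configures the check by hand.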
In security analysis, understanding potential string values is huge. It lets the scanner distinguish values that are merely untrusted from values that can actually contain dangerous data. It also lets the scanner figure out automatically which parts of the code perform input validation, and whether that validation is done correctly. With better insight into what the code is doing, the tool can provide much more accurate results, with far fewer false positives.
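The sketch below illustrates how knowing what a value may contain lets a tool judge a validation routine. For brevity it swaps in a much coarser abstraction than regular-expression patterns: for each value it tracks only the set of characters the value may contain, assuming ASCII input. The two validation models (a re.fullmatch check, and a re.match check that anchors only at the start of the string) and all names are hypothetical.

```python
import string
from dataclasses import dataclass

# Assumption for brevity: inputs are ASCII strings.
ASCII = frozenset(chr(i) for i in range(128))
DIGITS = frozenset(string.digits)


@dataclass(frozen=True)
class CharSet:
    """Coarse string abstraction (hypothetical): the characters a value may contain."""

    chars: frozenset

    def concat(self, other: "CharSet") -> "CharSet":
        # a + b may contain any character of a or of b.
        return CharSet(self.chars | other.chars)

    def may_contain(self, ch: str) -> bool:
        return ch in self.chars


def literal(text: str) -> CharSet:
    return CharSet(frozenset(text))


def after_fullmatch_digits(value: CharSet) -> CharSet:
    # True branch of `if re.fullmatch(r"[0-9]+", v):`: every character is a digit.
    return CharSet(value.chars & DIGITS)


def after_match_digits(value: CharSet) -> CharSet:
    # True branch of `if re.match(r"[0-9]+", v):`: re.match anchors only at the
    # start, so only the first character is constrained and the rest can be anything.
    return value


user_id = CharSet(ASCII)  # untrusted request parameter
prefix = literal("SELECT * FROM users WHERE id = ")
good = prefix.concat(after_fullmatch_digits(user_id))
bad = prefix.concat(after_match_digits(user_id))
print(good.may_contain("'"), bad.may_contain("'"))  # False True
```

Even this crude domain separates the correct check from the flawed one: only the query built after the re.match check can contain a single quote, so only that one needs to be reported.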
Traditional tools rely on users to configure which methods perform input validation. String Analysis can infer this information accurately and automatically, without any configuration. This means less work for the user and more accurate results out of the box.
Any thoughts? Leave a comment after the beep :)