The following is an overview taken from my whitepaper that discusses Larry Suto's recent paper "Analyzing the Effectiveness and Coverage of Web Application Security Scanners".
Suto's paper, as you may or may not know, caused quite a stir in the industry, and got some of us busy doing our own analyses to rebut some of its findings and methodology.
In October 2007, Larry Suto published the whitepaper "Analyzing the Effectiveness and Coverage of Web Application Security Scanners". The paper attempted to quantify the effectiveness of automated web application scanners, comparing in particular NTObjectives' NTOSpider, HP (SPI Dynamics) WebInspect, and IBM Rational AppScan.
Larry's comparative work attempted to analyze the effectiveness of the scanners in four areas:
- Number of links crawled by each scanner's crawler
- Coverage of application code (measured using Fortify's Tracer tool)
- Number of verified vulnerabilities found
- Number of false positives
The paper's conclusions showed a large discrepancy between the number of vulnerabilities detected by WebInspect and AppScan and the number detected by NTOSpider; in addition, it appeared as if WebInspect and AppScan missed a large number of vulnerabilities (i.e. false negatives).
My overview, which is presented in this paper, will attempt to shed some light on the results presented in Suto's paper, as well as to dispute their accuracy.
This paper will show that there are fundamental flaws in the original report:
- The findings are not appropriate for comparison purposes, as they compare numbers from products that measure vulnerabilities at different levels of granularity (an apples-and-oranges comparison; see the short sketch after this list).
- The methodology used in the original whitepaper is questionable, as it was never fully explained.
- After obtaining details of the test environment and the AppScan scan files from the author, we ran our own experiments, which produced significantly different results from those published in the whitepaper.
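To make the granularity point in the first bullet concrete, here is a minimal, hypothetical sketch: the scanner behaviours and findings below are made up for illustration only, assuming one product reports a separate finding for every vulnerable parameter while another groups findings by affected page and issue type.

```python
# Hypothetical illustration of the "apples and oranges" counting problem:
# the same underlying issues, counted at different levels of granularity.

# Each tuple is (page, parameter, issue_type) -- made-up sample data.
findings = [
    ("/login.php",  "user", "SQL Injection"),
    ("/login.php",  "pass", "SQL Injection"),
    ("/search.php", "q",    "XSS"),
    ("/search.php", "sort", "XSS"),
    ("/search.php", "page", "XSS"),
]

# Style A: one reported finding per vulnerable parameter.
per_parameter = len(findings)

# Style B: one reported finding per (page, issue type) pair.
per_page = len({(page, issue) for page, _param, issue in findings})

print(f"Counted per parameter:  {per_parameter}")  # -> 5
print(f"Counted per page/issue: {per_page}")       # -> 2
```

Comparing the raw totals (5 vs. 2) says nothing about which tool actually found more real issues, which is exactly why such numbers cannot be compared directly.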
The entire whitepaper is available for download here: [Better Untaught Than Ill-Taught]
You can find the original whitepaper written by Larry Suto here.
You should also check out a similar analysis written by Jeff Forristal (HP/SPI), which was published a few weeks ago.
I think we pretty much all drew the same conclusion about the testing methodology. It needs to be real science to be valuable.
Also, code coverage and crawled links are very questionable metrics on their own. To be more valuable, they need to be related to the vulnerabilities found. Let's say a tool A covers 20% of the webapp and catches 80% of the vulnerabilities... I would definitely prefer that ;)
Posted by: Romain Gaucher | December 03, 2007 at 11:03 PM
Here's a cross-post that I just found:
http://altomo.info/?p=29
Posted by: Ory Segal | February 04, 2008 at 09:03 PM