Benchmark: DVNA (Damn Vulnerable Node Application)
DVNA is an intentionally vulnerable Node.js/Express application with 19 documented security flaws spanning the OWASP Top 10. It is a standard benchmark for web application security tools.
- Repository: appsecco/dvna
- Stack: Node.js, Express, Sequelize ORM, EJS templates, SQLite
- Key source files: core/appHandler.js, core/authHandler.js, core/passport.js, routes/app.js, routes/main.js (12 files were scanned in total, including server.js)
V1 vs V2 Comparison
| Metric | V1 (file-level) | V2 (function-level) |
|---|---|---|
| Approach | Whole file + rules in one prompt | tree-sitter call graph, one function per LLM call |
| Files scanned | 12 | 12 (40 functions extracted) |
| LLM calls | 12 (one per file) | ~48 (one per function + flow analysis) |
| Scan time | ~6 min (server.js alone: 228s) | ~5 min (48 calls, ~6s avg) |
| Raw findings | 22 | 58 |
| True positives | 13 | ~33 |
| False positives | 9 (7 server.js garbage + 2 config noise) | ~25 (redirects, ORM, category drift) |
| Signal-to-noise (TP / raw findings) | 59% | 57% |
| Detection rate | 8/19 (42%) | 13/19 (68%) |
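The percentage rows follow directly from the counts: "signal-to-noise" here means true positives divided by raw findings (a precision figure), and detection rate is official vulnerabilities found out of 19. A quick Python check of the table's arithmetic; the helper names are illustrative, and V2's true-positive count is the approximate ~33 from the table.

```python
def signal_to_noise(true_positives: int, raw_findings: int) -> float:
    """Fraction of raw findings that are real issues (TP / raw)."""
    return true_positives / raw_findings


def detection_rate(detected: int, documented: int = 19) -> float:
    """Fraction of DVNA's documented vulnerabilities that were found."""
    return detected / documented


# Counts from the comparison table (V2's TP count is approximate).
RUNS = {
    "V1": {"tp": 13, "raw": 22, "detected": 8},
    "V2": {"tp": 33, "raw": 58, "detected": 13},
}

# V2 scan time: ~48 LLM calls at ~6 s each.
V2_SECONDS = 48 * 6  # 288 s, just under 5 minutes

for name, run in RUNS.items():
    print(f"{name}: signal-to-noise {signal_to_noise(run['tp'], run['raw']):.0%}, "
          f"detection {detection_rate(run['detected']):.0%}")
```

This prints `V1: signal-to-noise 59%, detection 42%` and `V2: signal-to-noise 57%, detection 68%`, matching the table: V2 finds far more of the official vulnerabilities at roughly the same precision.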
Head-to-head: Official 19 Vulnerabilities
| # | Vulnerability | V1 | V2 |
|---|---|---|---|
| 1 | SQL Injection (usersearch) | YES | YES |
| 2 | Command Injection (ping) | YES | YES |
| 3 | Insecure password reset (MD5) | YES | YES |
| 4 | Hardcoded session secret | -- | YES |
| 5 | Password hashes disclosed | -- | -- |
| 6 | Sensitive data in logs | -- | -- |
| 7 | XXE (bulkproducts) | YES | YES |
| 8 | Broken Access Control (admin API) | YES | YES |
| 9 | IDOR in user edit | -- | YES |
| 10 | Stack trace exposure | -- | -- |
| 11 | X-Powered-By header | -- | PARTIAL |
| 12 | Reflected XSS (search) | -- | YES |
| 13 | Stored XSS (products) | -- | YES |
| 14 | DOM XSS (admin users) | -- | YES |
| 15 | Insecure deserialization | YES | YES |
| 16 | mathjs RCE (CVE) | -- | -- |
| 17 | Insufficient logging | -- | -- |
| 18 | CSRF | -- | YES |
| 19 | Open redirect | YES | YES |
| | Total | 8/19 | 13/19 |
Why V2 detects more
- Function-level focus: V1 sends the entire file in one prompt, and the 7B model loses focus on large files: appHandler.js (~450 lines) yielded only 4 findings in V1 vs 20 in V2, where each function gets its own dedicated analysis.
- Targeted rules per function: V2 selects rules based on each function's role (a route handler gets IDOR, XSS, and SQLi rules; an auth function gets session and CSRF rules). V1 applies all matching rules at once, diluting the LLM's attention.
- Multi-phase analysis: V2 runs 5 phases (code map, function review, auth logic, attack surface, data flow), while V1 does a single pass plus optional verification. The auth and data-flow phases catch logic flaws (CSRF, IDOR) that pattern matching misses.
- Call graph context: V2 tells the LLM "this function is called by X and calls Y", which is critical for understanding data flow. V1 has no inter-function awareness.
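Role-based rule selection and call-graph context combine naturally into one per-function prompt. The Python sketch below is a hypothetical illustration, not Foil's code: the `RULES_BY_ROLE` table, the `FunctionNode` record, and the prompt wording are assumptions that mirror the examples in the list (route handlers get IDOR/XSS/SQLi, auth functions get session/CSRF, setup code gets nothing and is skipped). The function and route names in the usage block are illustrative too.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical role -> rule mapping mirroring the examples above.
RULES_BY_ROLE: Dict[str, List[str]] = {
    "route_handler": ["IDOR", "XSS", "SQL Injection"],
    "auth": ["Session management", "CSRF"],
    "utility": [],  # setup/config code: no rules, so no LLM call
}


@dataclass
class FunctionNode:
    name: str
    role: str
    source: str
    callers: List[str] = field(default_factory=list)
    callees: List[str] = field(default_factory=list)


def build_prompt(fn: FunctionNode) -> Optional[str]:
    """Return a per-function prompt, or None when the function is skipped."""
    rules = RULES_BY_ROLE.get(fn.role, [])
    if not rules:
        return None  # unknown or utility role: nothing targeted to check
    callers = ", ".join(fn.callers) or "(none)"
    callees = ", ".join(fn.callees) or "(none)"
    return "\n".join([
        f"Review the function {fn.name} for: {', '.join(rules)}.",
        f"Call graph: it is called by {callers} and calls {callees}.",
        "Source:",
        fn.source,
    ])


if __name__ == "__main__":
    handler = FunctionNode(
        name="userSearch",
        role="route_handler",
        source="function userSearch(req, res) { /* ... */ }",
        callers=["router.post('/usersearch')"],
        callees=["db.sequelize.query"],
    )
    print(build_prompt(handler))
```

Returning `None` for roles without rules is what keeps setup code out of the LLM entirely, so each call carries only a few rules and one function's source.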
Why V1 still has value
- Single-file speed: 10-30s for one file, whereas V2 must map the whole project first
- Simpler: no tree-sitter dependency, works on any single file in isolation
- CI/CD integration: scan only changed files on commit, fast feedback loop
- No project context needed: V1 works on a file you paste in; V2 needs a project root
Note: for whole-project scans, V2 is actually faster. V1's large prompts overwhelm the 7B model: server.js took 228s (~4 min) in V1 because the model received the entire file plus all rules in one prompt. V2 splits the work into many small, fast calls.
V1 noise problem: server.js
V1 produced 7 false positives from server.js — the LLM listed every rule it checked and said "not found" but formatted each as a HIGH finding. This is a known V1 issue with small config files where the model has nothing to report but fills the JSON anyway. V2 doesn't have this problem because it only analyzes security-relevant functions (server.js setup code is classified as utility and skipped).
Extra findings only V2 caught
| Finding | Why V1 missed | Why V2 caught |
|---|---|---|
| IDOR in user edit | Buried in 450-line file, no IDOR rule | OWASP-012 rule + function isolation |
| Reflected/Stored/DOM XSS | File too large, XSS patterns lost | Each render function analyzed individually |
| CSRF on auth flows | No dedicated CSRF rule, auth in separate file | OWASP-013 rule + auth function gets CSRF check |
| Multiple SQLi points | V1 found 1/8, rest buried in large file | Each query function reviewed separately |
| Mass assignment | Not a code pattern V1 looks for | V2's flow analysis traced input to create() |
V2 Test Configuration
| Setting | Value |
|---|---|
| Engine | V2 (function-level) |
| Model | Qwen2.5-Coder-7B-Instruct-4bit |
| Timeout | 30s per LLM call |
| Rules | 162 built-in (OWASP + language + framework packs) |
| Functions analyzed | 40 (34 security-relevant) |
| Total LLM calls | ~48 |
| Scan time | ~5 minutes |
| Findings (raw) | 58 |
Results: Official Vulnerabilities
DVNA documents 19 vulnerabilities across 12 OWASP categories.
| # | Vulnerability | Category | Detected | Severity | Details |
|---|---|---|---|---|---|
| 1 | SQL Injection in user search | SQL Injection | YES | HIGH | appHandler.js:9 — string concatenation in SQL query |
| 2 | Command Injection in ping | Command Injection | YES | HIGH | appHandler.js:41 — shell command with user input |
| 3 | Insecure password reset (MD5 token) | Weak Cryptography | YES | MEDIUM | authHandler.js:74 — MD5 for token generation flagged |
| 4 | Hardcoded session secret | Broken Authentication | YES | MEDIUM | authHandler.js:5 — hardcoded session configuration |
| 5 | Password hashes disclosed in API | Sensitive Data Exposure | NO | -- | API returns full user objects without field filtering |
| 6 | Sensitive data in Sequelize logs | Sensitive Data Exposure | NO | -- | ORM default logging config, not in route handlers |
| 7 | XXE in bulk product import | XXE | YES | MEDIUM | appHandler.js:239 — noent:true in libxmljs parsing |
| 8 | Missing role check on admin API | Broken Access Control | YES | LOW | routes/app.js:36 — no role/permission check on admin endpoint |
| 9 | IDOR in user edit | IDOR | YES | MEDIUM | appHandler.js:144 — user-provided ID used without ownership check |
| 10 | Stack trace exposure in calculator | Security Misconfiguration | NO | -- | Runtime error handling, not visible in source |
| 11 | X-Powered-By header exposed | Security Misconfiguration | PARTIAL | LOW | Generic security misconfiguration flagged |
| 12 | Reflected XSS in product search | XSS | YES | MEDIUM | appHandler.js:121,152 — unescaped user content in response |
| 13 | Stored XSS in product listing | XSS | YES | MEDIUM | appHandler.js:204 — unsanitized data in DOM |
| 14 | DOM XSS in admin user listing | XSS | YES | MEDIUM | appHandler.js:204 — API data injected into DOM |
| 15 | Insecure deserialization (node-serialize) | Insecure Deserialization | YES | HIGH | appHandler.js:220 — unserialize() on user-controlled data |
| 16 | mathjs RCE (known CVE) | Component Vulnerability | NO | -- | Requires SCA tooling (npm audit), not SAST |
| 17 | Insufficient logging/monitoring | Insufficient Logging | NO | -- | Architectural concern — absence of code, not a pattern |
| 18 | CSRF on state-changing forms | CSRF | YES | MEDIUM | authHandler.js:48 — redirect without CSRF protection |
| 19 | Open redirect via URL parameter | Open Redirect | YES | MEDIUM | Multiple findings — req.query.url used in redirects |
Detection Summary
| Result | Count | Rate |
|---|---|---|
| Detected | 13 | 68% |
| Partial | 1 | 5% |
| Missed | 5 | 26% |
Why the 5 misses are largely out of scope for SAST
| # | Vuln | Why missed | What would catch it |
|---|---|---|---|
| 5 | Password hash disclosure | API returns raw DB objects — no dangerous code pattern, just missing filtering | Data-flow rule for "raw model objects in API response" |
| 6 | Sensitive data in logs | Sequelize logging config in models/index.js: default behavior, not explicit code | Config audit / framework-specific default check |
| 10 | Stack trace exposure | Express dev-mode error handling — runtime behavior | Runtime testing / DAST |
| 16 | mathjs RCE | Known CVE in dependency — not a code pattern | SCA tool (npm audit, Snyk) |
| 17 | Insufficient logging | Absence of logging code — detecting what's not there | Architecture review / compliance check |
Extra Findings: Beyond the Official 19
Foil identified several issues not documented in DVNA's official vulnerability list.
Confirmed real issues
- Mass Assignment in bulk product import (appHandler.js:215): Product data from user input is passed directly to create() without field filtering. An attacker could inject extra fields (e.g., price: 0, isAdmin: true) if the model has additional columns. This is a real mass assignment / over-posting vulnerability.
- Multiple SQL injection points (appHandler.js:58, 83, 109, 144, 194; authHandler.js:19, 44, 71): The official docs highlight only the user search endpoint, but DVNA has SQL injection via string interpolation in at least 8 other query locations across login, password reset, product edit, and user lookup flows.
- Missing rate limiting on login (routes/main.js:10): The login endpoint has no rate limiting or account lockout, enabling brute-force attacks. Listed as a finding under Broken Authentication.
- Additional IDOR on vulnerability display (routes/main.js:14): The /app/vulnerabilities/:id endpoint accepts a resource ID without authorization checks.
False positives / noise
- "Open Redirect" on hardcoded redirects (e.g., res.redirect('/login')): Several findings flag res.redirect('/path') with hardcoded paths as "open redirect." These are not exploitable because the redirect target is not user-controlled. About 8 findings follow this pattern.
- "Insecure Deserialization" on Sequelize ORM calls (passport.js:15, 31, 41, 51): The LLM flags normal ORM findById()/findOne() calls as "insecure deserialization." Sequelize parameterizes these queries internally, so they are safe. The deserialization rule is too broad; it should flag only actual serialization libraries, not ORMs. About 4 findings follow this pattern.
- "OSI Model" category names: Some findings use invented category names like "OSI Model - Data Flow" or "OSI Model Layer" instead of standard vuln classes. These are mostly duplicate detections of open redirects; the LLM occasionally ignores the category-naming instruction.
Noise analysis
| Type | Count | Notes |
|---|---|---|
| True positive (official) | ~25 | Matches documented vulns |
| True positive (extra) | ~8 | Real issues not in docs |
| False positive (hardcoded redirect) | ~8 | Not exploitable |
| False positive (ORM = deserialization) | ~4 | Safe Sequelize calls |
| Duplicate / noisy category | ~13 | Same issue, bad category name |
| Total | 58 | |
| Signal-to-noise | ~57% | ~33 true positives (25 official + 8 extra) out of 58 raw findings |
Improvement History
| Version | Detection | Notes |
|---|---|---|
| V2 initial (no targeted rules) | 8/19 (42%) + 4 partial | Generic rules, rule IDs as categories |
| V2 + IDOR/CSRF/XXE/Deser rules | 13/19 (68%) + 1 partial | Proper vuln class names, targeted OWASP rules |
Key improvements that moved the needle
- OWASP-012 (IDOR/BOLA): Specific ownership-check detection on ID parameters
- OWASP-013 (CSRF): Missing anti-CSRF token detection
- OWASP-014 (DOM XSS): innerHTML/unescaped template sinks
- JS-011 (XXE): noent:true in XML parsers
- JS-012 (Insecure Deserialization): unserialize() from node-serialize
- Proper category names: LLM instructed to use standard vuln class names
Reducing False Positives (TODO)
- Hardcoded redirect FPs: Add negative pattern — if the redirect target is a string literal (not user input), skip.
- ORM deserialization FPs: Scope the deserialization rule to actual serialization libraries; add ORM allowlist (Sequelize, TypeORM, Prisma, Mongoose).
- Category name drift: Post-processing normalization catches most but not all. Consider a stricter enum in the JSON schema for response_format.
- Duplicate findings at line 186: Multiple files report a finding at line 186; this appears to be a shared function or template. Improve cross-file dedup.
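All four TODO items can run as a single post-processing pass over findings. The Python sketch below is hypothetical, not Foil's implementation: the `Finding` record, the regexes, the category set, and the dedup key are illustrative assumptions. For the ORM problem it uses the "scope the rule to serialization libraries" half of the TODO instead of an ORM allowlist: a deserialization finding survives only if its snippet hits a known sink (here just node-serialize's unserialize()).

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    category: str
    snippet: str  # source text the finding points at


# Standard vuln class names (a subset of the categories used on this page).
ALLOWED_CATEGORIES = {
    "SQL Injection", "Command Injection", "XSS", "XXE", "CSRF", "IDOR",
    "Open Redirect", "Insecure Deserialization", "Broken Access Control",
    "Broken Authentication", "Weak Cryptography", "Mass Assignment",
}
_CANONICAL = {c.lower(): c for c in ALLOWED_CATEGORIES}

# res.redirect('/login'): the sole argument is a string literal, so the
# target is not user-controlled. Concatenations and template literals don't match.
LITERAL_REDIRECT = re.compile(r"""\.redirect\(\s*(['"])[^'"]*\1\s*\)""")
# node-serialize's sink; ORM lookups such as findOne() never match.
DESERIALIZATION_SINK = re.compile(r"\bunserialize\s*\(")


def normalize_category(raw: str) -> Optional[str]:
    """Map a model-supplied category onto the allowed set, or None if invented."""
    return _CANONICAL.get(raw.strip().lower())


def filter_findings(findings: List[Finding]) -> List[Finding]:
    kept: List[Finding] = []
    seen = set()
    for f in findings:
        category = normalize_category(f.category)
        if category is None:
            continue  # e.g. "OSI Model - Data Flow"
        if category == "Open Redirect" and LITERAL_REDIRECT.search(f.snippet):
            continue  # hardcoded redirect target
        if (category == "Insecure Deserialization"
                and not DESERIALIZATION_SINK.search(f.snippet)):
            continue  # ORM call, not a serialization library
        # The key ignores the file on purpose, to merge the same code
        # reported from several files (the line-186 symptom).
        key = (category, f.line, f.snippet.strip())
        if key in seen:
            continue
        seen.add(key)
        kept.append(Finding(f.file, f.line, category, f.snippet))
    return kept
```

Dropping unknown categories outright is the bluntest option; since the invented "OSI Model" labels are mostly duplicates, routing them to review instead would lose less. Likewise, a file-agnostic dedup key fits the line-186 case but could merge two genuinely separate findings that happen to share a line number and snippet.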