16A9DA/Prompt2Exploit

Prompt2Exploit

Comparative security analysis of AI-generated vs human-written Flask web applications using automated vulnerability scanning (OWASP ZAP).

Overview

This project examines whether AI-generated Flask web applications contain more or different exploitable vulnerabilities than equivalent human-written apps. Each application is scanned using OWASP ZAP, and findings are recorded in data/dataset.csv by severity and vulnerability type.

The data/analysis.ipynb notebook processes raw OWASP ZAP scan outputs and transforms them into a structured dataset for analysis. It handles vulnerability categorization, severity mapping, and exploitability labeling for both AI-generated and human-written applications. The processed data is then used to compute comparative metrics such as risk score and exploitability rate.
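The first step of that processing (raw ZAP output → dataset rows) could look roughly like the sketch below. It assumes the scans were exported in ZAP's traditional JSON report format, where a top-level "site" list holds "alerts" entries with "alert" (name) and "riskcode" (0 = informational, 1 = low, 2 = medium, 3 = high) fields. The helper name `zap_report_to_rows` is hypothetical. So are three other choices: one row per alert type, skipping informational alerts, and the `exploitable_alerts` parameter. The notebook's actual exploitability-labeling rule is not described here, so the sketch takes those labels as input rather than guessing them.

```python
import json

# ZAP riskcode values: 0 = Informational, 1 = Low, 2 = Medium, 3 = High.
# Informational (0) has no entry, so it is dropped below.
RISKCODE_TO_SEVERITY = {"1": "low", "2": "medium", "3": "high"}


def zap_report_to_rows(report_json, app_id, app_type, feature,
                       exploitable_alerts=frozenset()):
    """Flatten a ZAP JSON report into dataset-style rows (hypothetical helper).

    One row is emitted per alert type. `exploitable_alerts` is an assumed,
    caller-supplied set of alert names to label as exploitable.
    """
    report = json.loads(report_json)
    rows = []
    for site in report.get("site", []):
        for alert in site.get("alerts", []):
            severity = RISKCODE_TO_SEVERITY.get(str(alert.get("riskcode")))
            if severity is None:
                continue  # informational or unrecognized risk level
            name = alert.get("alert", "")
            rows.append({
                "app_id": app_id,
                "app_type": app_type,
                "feature": feature,
                "vulnerability": name,
                "severity": severity,
                "exploitable": "yes" if name in exploitable_alerts else "no",
            })
    return rows
```

Keeping the exploitability decision outside the parser keeps the sketch honest: severity comes straight from ZAP, while "exploitable" is a separate judgment made during analysis.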

Project Structure

Prompt2Exploit/
├── data/
│   ├── ai_apps/
│   │   ├── login_app1/2/3.py
│   │   ├── form_app1/2/3.py
│   │   ├── notes_app2.py
│   │   └── api_app1/3.py
│   ├── human_apps/
│   │   ├── login_app1/2/3.py
│   │   ├── form_app1/2/3.py
│   │   ├── notes_app2.py
│   │   └── api_app1/3.py
│   ├── dataset.csv
│   └── analysis.ipynb
├── prompt.txt
└── Zap_Levels.txt

App Categories

Category       Apps             Description
Login          login_app1/2/3   Auth flows with registration, admin-only login, dashboard
Forms          form_app1/2/3    Contact, feedback, and inquiry submission forms
Notes/Todos    notes_app2       CRUD note-taking and todo list apps
API/Services   api_app1/3       REST API, URL shortener, file upload service

The ai_apps were generated with the ChatGPT 5.3 mini model; the master prompt and sub-prompts are in prompt.txt. The human_apps were sourced from henry-richard7, patrickloeber, gilcierweb, pj8912, CoreyMSchafer, TechWithTim, and pallets patterns.

Running an App

pip install flask flask-sqlalchemy
cd data/ai_apps   # or data/human_apps
python <app_name>.py
# Accessible at http://127.0.0.1:5000

Dataset

data/dataset.csv records scan results for both ai and human app types:

Field          Values
app_id         e.g. login_app1, notes_app2
app_type       ai | human
feature        login, notes, form, api
vulnerability  ZAP category
severity       high, medium, low
exploitable    yes | no
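A minimal sketch for loading the dataset and checking its categorical columns against the values listed above. It assumes data/dataset.csv has a header row with exactly these column names. The function name `load_dataset` is hypothetical. Values are lowercased because the exploitability table below writes them as "Yes"/"No", which suggests mixed case may occur.

```python
import csv
import io

# Allowed values per column, taken from the field table above.
ALLOWED = {
    "app_type": {"ai", "human"},
    "feature": {"login", "notes", "form", "api"},
    "severity": {"high", "medium", "low"},
    "exploitable": {"yes", "no"},
}


def load_dataset(text):
    """Parse dataset.csv content, normalize case, and validate categories."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for field, allowed in ALLOWED.items():
            value = row[field].strip().lower()
            if value not in allowed:
                raise ValueError(f"line {line_no}: unexpected {field}={value!r}")
            row[field] = value  # store the normalized value
    return rows
```

Usage: `with open("data/dataset.csv", newline="") as f: rows = load_dataset(f.read())`.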

Zap_Levels.txt

Zap_Levels.txt lists each vulnerability category and explains how it was applied in the dataset.

Risk Score

App Type   Risk Score
AI         1.212766
Human      1.440000

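Risk score formula:

Each vulnerability scores its severity weight multiplied by an exploitability factor, and the risk score is the mean over all N vulnerabilities. The weights are the ones used in the worked example:

$$\text{Risk Score} = \frac{1}{N}\sum_{i=1}^{N} w(s_i)\, f(e_i)$$

where s_i is the severity of vulnerability i (w = 3 for high, 2 for medium, 1 for low) and e_i is its exploitability (f = 1.5 if exploitable, 0.5 if not).

Example, for an application with:

  • 1 high (exploitable)
  • 2 medium (not exploitable)
  • 1 low (not exploitable)

Calculation:

  • High: 3 × 1.5 = 4.5
  • Medium: 2 × 0.5 = 1.0 each → 2.0 total
  • Low: 1 × 0.5 = 0.5

Weighted sum = 4.5 + 2.0 + 0.5 = 7.0
Total vulnerabilities = 4

Final Risk Score = 7.0 / 4 = 1.75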
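The same calculation as a short sketch. The weights (high = 3, medium = 2, low = 1) and factors (exploitable = 1.5, not exploitable = 0.5) are inferred from the worked example rather than copied from the notebook. The function name `risk_score` and the choice to return 0.0 for an empty list are assumptions.

```python
# Weights inferred from the worked example (assumption, not taken from the notebook).
SEVERITY_WEIGHT = {"high": 3, "medium": 2, "low": 1}
EXPLOIT_FACTOR = {"yes": 1.5, "no": 0.5}


def risk_score(findings):
    """Mean of severity weight x exploitability factor over all findings.

    `findings` is a list of (severity, exploitable) pairs.
    """
    if not findings:
        return 0.0  # hypothetical choice: no findings means a score of 0
    total = sum(SEVERITY_WEIGHT[sev] * EXPLOIT_FACTOR[exp] for sev, exp in findings)
    return total / len(findings)


example = [("high", "yes"), ("medium", "no"), ("medium", "no"), ("low", "no")]
print(risk_score(example))  # 1.75
```

Passing in every finding of one app type would give a per-type score, assuming the table above was computed over pooled findings.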


Exploitability Distribution

App Type   Exploitable   Proportion
AI         No            0.978723
AI         Yes           0.021277
Human      No            0.853333
Human      Yes           0.146667
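The proportions above are the share of findings labeled exploitable or not within each app type. A minimal stdlib sketch of that grouping follows. It assumes rows shaped like dataset.csv records with lowercase "yes"/"no" labels. The function name `exploitability_proportions` is hypothetical.

```python
from collections import Counter, defaultdict


def exploitability_proportions(rows):
    """Return {app_type: {"no": p, "yes": q}}, the share of findings per label."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["app_type"]][row["exploitable"]] += 1
    result = {}
    for app_type, counter in counts.items():
        total = sum(counter.values())
        # Report both labels explicitly so a group with no "yes" still shows 0.0.
        result[app_type] = {label: counter[label] / total for label in ("no", "yes")}
    return result
```

In pandas, the equivalent is roughly `df.groupby("app_type")["exploitable"].value_counts(normalize=True)`.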

Conclusion

This study shows that AI-generated Flask applications and human-written Flask applications differ not only in the number of vulnerabilities but also in their nature and severity distribution.

AI-generated applications tend to produce consistent security misconfigurations such as missing HTTP security headers and cookie attribute issues. These are systematic but generally low to medium severity, resulting in a lower exploitability rate.

Human-written applications show a broader and more diverse vulnerability landscape, including higher-severity issues such as persistent XSS, buffer overflow patterns, and authentication weaknesses. These vulnerabilities are less consistent across apps but more likely to be exploitable and to have higher impact.

Overall findings:

  • AI code → repetitive configuration-level issues
  • Human code → diverse logic-level vulnerabilities
  • Human apps show a higher risk score (1.44 vs. 1.21) and a higher exploitability rate (14.7% vs. 2.1% of findings)
