Prompt2Exploit

Comparative security analysis of AI-generated vs human-written Flask web applications using automated vulnerability scanning (OWASP ZAP).

Overview

This project examines whether AI-generated Flask web applications contain more or different exploitable vulnerabilities than equivalent human-written apps. Each application is scanned using OWASP ZAP, and findings are recorded in data/dataset.csv by severity and vulnerability type.

The data/analysis.ipynb notebook processes raw OWASP ZAP scan outputs and transforms them into a structured dataset for analysis. It handles vulnerability categorization, severity mapping, and exploitability labeling for both AI-generated and human-written applications. The processed data is then used to compute comparative metrics such as risk score and exploitability rate.
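As a rough illustration of the transformation step, the sketch below flattens a ZAP JSON report into dataset-style rows. It is not the notebook's code. It assumes the report is ZAP's traditional JSON format (a `site` list whose entries hold `alerts` with `alert` and `riskcode` fields), and it assumes informational alerts are dropped because the dataset only lists high, medium, and low. The function name `alerts_to_rows` and the sample report are hypothetical, and exploitability labeling is left out because the source does not describe how it is assigned.

```python
# Assumed mapping from ZAP risk codes to the dataset's severity labels.
RISK_CODE_TO_SEVERITY = {"3": "high", "2": "medium", "1": "low"}


def alerts_to_rows(report, app_id, app_type, feature):
    """Flatten one ZAP JSON report (already parsed into a dict) into rows."""
    rows = []
    for site in report.get("site", []):
        for alert in site.get("alerts", []):
            severity = RISK_CODE_TO_SEVERITY.get(str(alert.get("riskcode")))
            if severity is None:
                continue  # assumption: informational alerts are not recorded
            rows.append({
                "app_id": app_id,
                "app_type": app_type,
                "feature": feature,
                "vulnerability": alert.get("alert", ""),
                "severity": severity,
            })
    return rows


# Hypothetical, heavily trimmed report used only to exercise the function.
sample_report = {
    "site": [{
        "@name": "http://127.0.0.1:5000",
        "alerts": [
            {"alert": "Content Security Policy (CSP) Header Not Set", "riskcode": "2"},
            {"alert": "Cookie without SameSite Attribute", "riskcode": "1"},
            {"alert": "User Agent Fuzzer", "riskcode": "0"},
        ],
    }]
}

zap_rows = alerts_to_rows(sample_report, "login_app1", "ai", "login")
print([row["severity"] for row in zap_rows])
```

A real report would be loaded with `json.load` from the file ZAP exports; the `exploitable` column would then be filled in separately.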

Project Structure

Prompt2Exploit/
├── data/
│   ├── ai_apps/
│   │   ├── login_app1/2/3.py
│   │   ├── form_app1/2/3.py
│   │   ├── notes_app2.py
│   │   └── api_app1/3.py
│   ├── human_apps/
│   │   ├── login_app1/2/3.py
│   │   ├── form_app1/2/3.py
│   │   ├── notes_app2.py
│   │   └── api_app1/3.py
│   ├── dataset.csv
│   └── analysis.ipynb
├── prompt.txt
└── Zap_Levels.txt

App Categories

Category	Apps	Description
Login	`login_app1/2/3`	Auth flows with registration, admin-only login, dashboard
Forms	`form_app1/2/3`	Contact, feedback, and inquiry submission forms
Notes/Todos	`notes_app2`	CRUD note-taking and todo list apps
API/Services	`api_app1/3`	REST API, URL shortener, file upload service

The ai_apps were generated with the ChatGPT 5.3 mini model; the master prompt and sub-prompts are in prompt.txt. The human_apps were sourced from: henry-richard7, patrickloeber, gilcierweb, pj8912, CoreyMSchafer, TechWithTim, pallets patterns.

Running an App

pip install flask flask-sqlalchemy
cd data/ai_apps   # or data/human_apps
python <app_name>.py
# Accessible at http://127.0.0.1:5000

Dataset

data/dataset.csv records scan results for both ai and human app types:

Field	Values
`app_id`	e.g. `login_app1`, `notes_app2`
`app_type`	`ai` | `human`
`feature`	`login`, `notes`, `form`, `api`
`vulnerability`	ZAP category
`severity`	`high`, `medium`, `low`
`exploitable`	`yes` | `no`
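To make the schema concrete, here is a minimal sketch that reads dataset-style rows and checks each field against the values listed in the table. It assumes the CSV has a header row using exactly these column names, which the source does not confirm. The function name `load_rows` and the sample rows are hypothetical.

```python
import csv
import io

# Allowed values, taken from the field table above.
ALLOWED = {
    "app_type": {"ai", "human"},
    "feature": {"login", "notes", "form", "api"},
    "severity": {"high", "medium", "low"},
    "exploitable": {"yes", "no"},
}
REQUIRED = ["app_id", "app_type", "feature", "vulnerability", "severity", "exploitable"]


def load_rows(handle):
    """Read rows from an open CSV file; raise ValueError on the first bad one."""
    reader = csv.DictReader(handle)
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for field, allowed in ALLOWED.items():
            if row[field] not in allowed:
                raise ValueError(f"line {line_no}: bad {field} value {row[field]!r}")
        rows.append(row)
    return rows


# Hypothetical sample rows for illustration only, not real scan results.
SAMPLE_CSV = """app_id,app_type,feature,vulnerability,severity,exploitable
login_app1,ai,login,Example Alert A,medium,no
notes_app2,human,notes,Example Alert B,high,yes
"""

sample_rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(sample_rows), sample_rows[1]["severity"])
```

For the real file, pass `open("data/dataset.csv", newline="")` instead of the in-memory sample.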

Zap_Levels.txt

This file lists the vulnerability categories and how they were applied to the dataset.

Risk Score

App Type	Risk Score
AI	1.212766
Human	1.440000

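Risk score formula:

Risk Score = Σ (severity weight × exploitability factor) / total vulnerabilities

The worked example below uses severity weights of high = 3, medium = 2, low = 1, and an exploitability factor of 1.5 for exploitable findings and 0.5 for non-exploitable ones.

Example:

For an application with:

1 high (exploitable)
2 medium (not exploitable)
1 low (not exploitable)

Calculation:

High: 3 × 1.5 = 4.5
Medium: 2 × 0.5 = 1.0 each → 2.0 total
Low: 1 × 0.5 = 0.5

Weighted sum = 4.5 + 2.0 + 0.5 = 7.0

Total vulnerabilities = 4

Final Risk Score = 7.0 / 4 = 1.75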
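The worked example above implies severity weights of 3, 2, and 1 for high, medium, and low, and a factor of 1.5 for exploitable findings and 0.5 otherwise. Under that reading, the calculation can be sketched as follows. The function name `risk_score` is hypothetical, and returning 0.0 for an app with no findings is an assumption, not something the source specifies.

```python
# Weights inferred from the worked example; treat them as assumptions.
SEVERITY_WEIGHT = {"high": 3, "medium": 2, "low": 1}
EXPLOIT_FACTOR = {"yes": 1.5, "no": 0.5}


def risk_score(findings):
    """Average of severity weight x exploitability factor over all findings.

    `findings` is a list of (severity, exploitable) pairs, for example
    ("high", "yes").
    """
    if not findings:
        return 0.0  # assumption: an app with no findings scores zero
    total = sum(SEVERITY_WEIGHT[sev] * EXPLOIT_FACTOR[exp] for sev, exp in findings)
    return total / len(findings)


# The example application: 1 high (exploitable), 2 medium and 1 low (not exploitable).
example_findings = [("high", "yes"), ("medium", "no"), ("medium", "no"), ("low", "no")]
print(risk_score(example_findings))  # 1.75
```

Because the score is an average per finding, an app with many low-severity findings can score lower than an app with a few severe ones.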

Exploitability Distribution

App Type	Exploitable	Proportion
AI	No	0.978723
AI	Yes	0.021277
Human	No	0.853333
Human	Yes	0.146667
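The proportions in this table are the share of findings labeled exploitable or not within each app type. Here is a minimal sketch of that grouping using only the standard library. The function name `exploitability_distribution` and the sample rows are hypothetical, and the notebook may compute this differently.

```python
from collections import Counter, defaultdict


def exploitability_distribution(rows):
    """Return {app_type: {"no": share, "yes": share}} for dataset-style rows."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["app_type"]][row["exploitable"]] += 1
    return {
        app_type: {label: tally[label] / sum(tally.values()) for label in ("no", "yes")}
        for app_type, tally in counts.items()
    }


# Hypothetical rows for illustration only, not the real dataset.
demo_rows = [
    {"app_type": "ai", "exploitable": "no"},
    {"app_type": "ai", "exploitable": "no"},
    {"app_type": "ai", "exploitable": "no"},
    {"app_type": "ai", "exploitable": "yes"},
    {"app_type": "human", "exploitable": "no"},
    {"app_type": "human", "exploitable": "yes"},
]

dist = exploitability_distribution(demo_rows)
print(dist["ai"]["yes"], dist["human"]["yes"])  # 0.25 0.5
```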

Conclusion

This study shows that AI-generated Flask applications and human-written Flask applications differ not only in the number of vulnerabilities but also in their nature and severity distribution.

AI-generated applications tend to produce consistent security misconfigurations such as missing HTTP security headers and cookie attribute issues. These are systematic but generally low to medium severity, resulting in a lower exploitability rate.

Human-written applications demonstrate a broader and more diverse vulnerability landscape, including higher-severity issues such as persistent XSS, buffer overflow patterns, and authentication weaknesses. These vulnerabilities are less consistent but more likely to be exploitable and higher impact.

Overall findings: AI-generated code shows repetitive, configuration-level issues, while human-written code shows diverse, logic-level vulnerabilities. Human apps have both a higher risk score and a higher exploitability rate.
