Prompt Injection and Data Exfiltration: Threat Modeling for LLM Apps

The core mistake

Teams often treat retrieved text as trusted instructions. Attackers exploit that by embedding malicious directives in documents, web pages, or even emails.

Your model should treat retrieval as untrusted data and only follow system-level policies and tool contracts.

Mitigations that scale

Use strict tool schemas, isolate secrets (never place them in prompts), and apply output filters for sensitive patterns.

Add automated red-teaming prompts to CI. If you can’t reproduce an injection bug, you can’t fix it.