Structure over cleverness

The best prompts are boring: clear instructions, concrete examples, explicit constraints. Avoid poetry. Aim for predictability.

Production prompts are tested like code: version controlled, evaluated on benchmarks, and improved iteratively based on failure analysis.

The anatomy of a production prompt

Start with role and context: 'You are a customer support agent with access to order history.' Follow with task definition: 'Help users track orders, process returns, and answer product questions.'

Add constraints: 'Never make up order details. Always cite the order ID. Escalate to human if refund exceeds $500.' End with output format requirements.

Few-shot examples that actually help

Include 2-4 high-quality examples that demonstrate edge cases and desired formatting. Show both successful outputs and appropriate refusals.

Bad examples hurt performance. Every example must be correct, representative, and demonstrate something the model might otherwise get wrong.

Chain-of-thought for reasoning tasks

For complex reasoning, explicitly instruct the model to think step-by-step before answering. This reduces errors on math, logic, and multi-step planning tasks.

Format matters: 'Think through this step by step in <thinking> tags, then provide your final answer in <answer> tags.' Structured outputs enable better parsing and validation.

Tool use patterns

Define tools with strict schemas. In the prompt, explain when to use each tool, what constitutes success, and how to handle errors.

Example pattern: 'If the user asks about weather, use get_weather(location). If it fails, apologize and ask the user to try again later. Never invent weather data.'

Error handling and graceful degradation

Tell the model what to do when uncertain: 'If you don't have enough information, ask a clarifying question.' or 'If the request violates policy, explain why in simple terms.'

Explicit error instructions reduce hallucination and improve user trust. Users prefer honest 'I don't know' over confident incorrect answers.

Versioning and testing

Treat prompts as code: store them in version control, review changes, and maintain a changelog. When you modify a prompt, regression test against your evaluation set.

Build a test suite of inputs covering: typical cases, edge cases, adversarial inputs, and known past failures. Run this suite on every prompt change.

Iterative improvement from production data

Collect failure cases from production: user corrections, escalations to humans, and low satisfaction ratings. These are your next test cases.

Add examples from these failures to your prompt or evaluation set. The best prompts evolve based on real-world usage patterns.