Prompt Chaining: How to Build Sequential AI Workflows
You've hit the wall. Your single prompt is getting longer, more complex, and increasingly unreliable. The LLM sometimes nails it, sometimes completely misses. Sound familiar?
Prompt chaining is the solution: break your mega-prompt into smaller, focused steps where each LLM call does one thing well.
This guide shows you how to build reliable prompt chains, when to use them, and how to avoid the common pitfalls that trip up most developers.
What Is Prompt Chaining?
Prompt chaining connects multiple LLM calls in sequence. The output of one prompt becomes the input for the next:
```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Prompt 1   │───▶│  Prompt 2   │───▶│  Prompt 3   │
│  Extract    │    │  Transform  │    │  Format     │
└─────────────┘    └─────────────┘    └─────────────┘
       │                  │                  │
       ▼                  ▼                  ▼
   Raw Data          Structured        Final Output
                        Data
```
Instead of asking the LLM to do everything at once:
```
❌ "Read this document, extract the key points, translate them to Spanish,
    summarize each point, and format as a newsletter"
```
You break it into steps:
```
✅ Step 1: "Extract key points from this document"
   Step 2: "Translate these points to Spanish"
   Step 3: "Summarize each point in one sentence"
   Step 4: "Format these summaries as a newsletter"
```
Each step is simpler, more reliable, and easier to debug.
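In code, a chain is just output-to-input plumbing. Here's a minimal sketch using the OpenAI client; the `llm()` helper, the placeholder `document`, and the step prompts are illustrative, not a prescribed API:

```python
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    """Send one prompt, return the model's text reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

document = "..."  # your source text

# Each step's output is spliced into the next step's prompt
key_points = llm(f"Extract key points from this document:\n\n{document}")
spanish = llm(f"Translate these points to Spanish:\n\n{key_points}")
newsletter = llm(f"Format these points as a newsletter:\n\n{spanish}")
```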
Why Prompt Chaining Works
1. Reduced Cognitive Load
LLMs perform better on focused tasks. A prompt that does one thing well consistently outperforms a prompt trying to juggle five things.
Research insight: evaluations consistently find that LLM accuracy degrades as task complexity grows. Breaking a 5-step task into 5 focused prompts is commonly reported to improve overall accuracy by 20-40%, though the gain varies with the task.
2. Debuggability
When something goes wrong in a monolithic prompt, good luck figuring out where. With chains, you can inspect each intermediate output:
```python
# Easy to debug
step1_output = extract_entities(document)       # Check: Are entities correct?
step2_output = classify_entities(step1_output)  # Check: Are classifications correct?
step3_output = generate_summary(step2_output)   # Check: Is summary accurate?
```
3. Reusability
Chain steps become building blocks. Your "translate to Spanish" step works in any pipeline:
```python
# Reuse across different workflows
translate_step = TranslatePrompt(target_language="Spanish")

workflow_a = Chain([extract, translate_step, summarize])
workflow_b = Chain([user_input, translate_step, respond])
```
4. Cost Optimization
You can use smaller, cheaper models for simpler steps and reserve expensive models for complex reasoning:
```python
chain = [
    Step("Extract dates", model="gpt-3.5-turbo"),        # Simple extraction: cheap model
    Step("Parse to ISO format", model="gpt-3.5-turbo"),  # Formatting: cheap model
    Step("Analyze timeline", model="gpt-4o"),            # Complex reasoning: powerful model
]
```
Basic Prompt Chain Implementation
Here's a minimal but complete implementation:
```python
import openai
from dataclasses import dataclass

@dataclass
class ChainStep:
    name: str
    prompt_template: str
    model: str = "gpt-4o"

class PromptChain:
    def __init__(self, steps: list[ChainStep]):
        self.steps = steps
        self.client = openai.OpenAI()
        self.trace = []  # For debugging

    def run(self, initial_input: str) -> str:
        current_input = initial_input

        for step in self.steps:
            # Format prompt with current input
            prompt = step.prompt_template.format(input=current_input)

            # Call LLM
            response = self.client.chat.completions.create(
                model=step.model,
                messages=[{"role": "user", "content": prompt}]
            )

            output = response.choices[0].message.content

            # Save trace for debugging
            self.trace.append({
                "step": step.name,
                "input": current_input[:200],  # Truncate for readability
                "output": output[:200]
            })

            # Output becomes next input
            current_input = output

        return current_input

    def debug(self):
        """Print execution trace"""
        for i, step in enumerate(self.trace):
            print(f"\n{'='*50}")
            print(f"Step {i+1}: {step['step']}")
            print(f"Input: {step['input']}...")
            print(f"Output: {step['output']}...")


# Usage
chain = PromptChain([
    ChainStep(
        name="Extract",
        prompt_template="Extract all person names from this text:\n\n{input}"
    ),
    ChainStep(
        name="Deduplicate",
        prompt_template="Remove duplicates from this list of names:\n\n{input}"
    ),
    ChainStep(
        name="Format",
        prompt_template="Format these names as a numbered list:\n\n{input}"
    )
])

result = chain.run("John met Sarah at the coffee shop. Sarah introduced John to Mike...")
print(result)
chain.debug()  # See what happened at each step
```
Output:
```
1. John
2. Sarah
3. Mike

==================================================
Step 1: Extract
Input: John met Sarah at the coffee shop. Sarah introduced John to Mike...
Output: John, Sarah, John, Mike, Sarah...

==================================================
Step 2: Deduplicate
Input: John, Sarah, John, Mike, Sarah...
Output: John, Sarah, Mike...

==================================================
Step 3: Format
Input: John, Sarah, Mike...
Output: 1. John
2. Sarah
3. Mike...
```
Real-World Example: Document Processing Pipeline
Let's build a practical document processing chain that:
- Extracts key information
- Validates the extraction
- Transforms to structured data
- Generates a summary
```python
from hopx import Sandbox
import openai
import json

class DocumentProcessor:
    def __init__(self):
        self.client = openai.OpenAI()

    def process(self, document: str) -> dict:
        # Step 1: Extract key information
        extracted = self._extract(document)

        # Step 2: Validate extraction (with code execution)
        validated = self._validate(extracted)

        # Step 3: Structure the data
        structured = self._structure(validated)

        # Step 4: Generate summary
        summary = self._summarize(structured)

        return {
            "extracted": extracted,
            "validated": validated,
            "structured": structured,
            "summary": summary
        }

    def _extract(self, document: str) -> str:
        """Step 1: Extract key entities and facts"""
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": """Extract the following from the document:
- People mentioned (with roles)
- Dates and deadlines
- Action items
- Key decisions

Format as a structured list."""
            }, {
                "role": "user",
                "content": document
            }]
        )
        return response.choices[0].message.content

    def _validate(self, extracted: str) -> str:
        """Step 2: Validate with code execution"""
        sandbox = Sandbox.create(template="code-interpreter")

        try:
            # Use code to validate dates and check for inconsistencies.
            # NOTE: naive embedding - this breaks if the extracted text
            # itself contains triple quotes or backslash escapes.
            validation_code = f'''
import re
from datetime import datetime

text = """{extracted}"""

# Find all dates (non-capturing groups so findall returns full matches)
date_patterns = [
    r'\\d{{1,2}}/\\d{{1,2}}/\\d{{4}}',
    r'\\d{{4}}-\\d{{2}}-\\d{{2}}',
    r'(?:January|February|March|April|May|June|July|August|September|October|November|December)\\s+\\d{{1,2}},?\\s+\\d{{4}}'
]

dates_found = []
for pattern in date_patterns:
    dates_found.extend(re.findall(pattern, text))

# Check for potential issues
issues = []
if len(dates_found) == 0:
    issues.append("No dates found - verify manually")

# Output validation result
print("VALIDATION RESULT")
print(f"Dates found: {{dates_found}}")
print(f"Issues: {{issues if issues else 'None'}}")
print("---")
print(text)
'''

            sandbox.files.write("/app/validate.py", validation_code)
            result = sandbox.commands.run("python /app/validate.py")

            return result.stdout
        finally:
            sandbox.kill()

    def _structure(self, validated: str) -> dict:
        """Step 3: Convert to structured JSON"""
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": """Convert this information to JSON with the schema:
{
  "people": [{"name": "", "role": ""}],
  "dates": [{"date": "", "event": ""}],
  "action_items": [{"task": "", "owner": "", "due": ""}],
  "decisions": [""]
}"""
            }, {
                "role": "user",
                "content": validated
            }],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)

    def _summarize(self, structured: dict) -> str:
        """Step 4: Generate executive summary"""
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": "Write a 2-3 sentence executive summary of this meeting/document."
            }, {
                "role": "user",
                "content": json.dumps(structured, indent=2)
            }]
        )
        return response.choices[0].message.content


# Usage
processor = DocumentProcessor()
result = processor.process("""
Meeting Notes - Product Launch Planning
Date: January 15, 2025

Attendees: Sarah Chen (PM), Mike Johnson (Engineering Lead), Lisa Park (Marketing)

Discussion:
Sarah presented the launch timeline. Target launch date is March 1, 2025.
Mike raised concerns about the API stability - needs 2 more weeks of testing.
Lisa confirmed marketing materials will be ready by February 15.

Decisions:
- Soft launch to beta users on February 20
- Full public launch on March 1
- Mike to own the stability testing

Action Items:
- Mike: Complete API load testing by February 1
- Lisa: Finalize press release by February 10
- Sarah: Coordinate with sales team by January 20
""")

print(json.dumps(result, indent=2))
```
Prompt Chaining Patterns
Pattern 1: Linear Chain
The simplest pattern—each step feeds into the next:
```
Input → [A] → [B] → [C] → Output
```
```python
def linear_chain(text):
    extracted = extract(text)
    translated = translate(extracted)
    formatted = format_output(translated)
    return formatted
```
Best for: Sequential transformations, document processing, data pipelines.
Pattern 2: Branching Chain
Different paths based on intermediate results:
```
             ┌─[B1]─┐
Input → [A]──┤      ├──[D] → Output
             └─[B2]─┘
```
```python
def branching_chain(text):
    classification = classify(text)

    if classification == "technical":
        processed = technical_processor(text)
    else:
        processed = general_processor(text)

    return finalize(processed)
```
Best for: Content routing, specialized processing, conditional logic.
Pattern 3: Parallel Chain
Multiple independent steps that merge:
```
        ┌─[A]─┐
Input ──┼─[B]─┼── Merge → Output
        └─[C]─┘
```
```python
import concurrent.futures

def parallel_chain(text):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_summary = executor.submit(summarize, text)
        future_entities = executor.submit(extract_entities, text)
        future_sentiment = executor.submit(analyze_sentiment, text)

        summary = future_summary.result()
        entities = future_entities.result()
        sentiment = future_sentiment.result()

    return merge_results(summary, entities, sentiment)
```
Best for: Independent analyses, multi-perspective processing, speed optimization.
Pattern 4: Iterative Chain (Loop)
Repeat until a condition is met:
```
          ┌──────────────────────────────┐
          ▼                              │
Input → [Process] → [Check] ──(not done)─┘
                       │
                    (done)
                       ▼
                    Output
```
```python
def iterative_chain(text, max_iterations=5):
    current = text

    for i in range(max_iterations):
        # Process
        improved = improve(current)

        # Check if good enough
        score = evaluate(improved)
        if score > 0.9:
            return improved

        current = improved

    return current
```
Best for: Refinement tasks, quality improvement, self-correction.
Pattern 5: Fallback Chain
Try multiple approaches, use first success:
```
Input → [A] ──(fail)──→ [B] ──(fail)──→ [C] → Output
         │               │               │
     (success)       (success)       (success)
         ▼               ▼               ▼
      Output          Output          Output
```
```python
def fallback_chain(text):
    strategies = [
        ("precise", precise_extract),
        ("fuzzy", fuzzy_extract),
        ("llm_only", llm_extract)
    ]

    for name, strategy in strategies:
        try:
            result = strategy(text)
            if validate(result):
                return result
        except Exception as e:
            print(f"{name} failed: {e}")
            continue

    raise ValueError("All strategies failed")
```
Best for: Robust systems, graceful degradation, handling edge cases.
Adding Code Execution to Chains
Many chain steps benefit from actual code execution—not just LLM reasoning. This is where sandboxed execution becomes essential:
```python
from hopx import Sandbox
import openai

class CodeAugmentedChain:
    def __init__(self):
        self.client = openai.OpenAI()

    def analyze_data(self, data_description: str, question: str) -> dict:
        """
        Chain:
        1. LLM generates analysis code
        2. Code executes in sandbox
        3. LLM interprets results
        """

        # Step 1: Generate analysis code
        code = self._generate_code(data_description, question)

        # Step 2: Execute in sandbox
        execution_result = self._execute_code(code)

        # Step 3: Interpret results
        interpretation = self._interpret_results(question, execution_result)

        return {
            "code": code,
            "raw_output": execution_result,
            "interpretation": interpretation
        }

    def _generate_code(self, data_description: str, question: str) -> str:
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": """Generate Python code to analyze data and answer the question.
Use pandas for data manipulation.
Print results clearly.
Do not use plt.show() - save plots to files instead."""
            }, {
                "role": "user",
                "content": f"Data: {data_description}\n\nQuestion: {question}"
            }]
        )

        # Extract code from response
        content = response.choices[0].message.content
        if "```python" in content:
            code = content.split("```python")[1].split("```")[0]
        else:
            code = content

        return code.strip()

    def _execute_code(self, code: str) -> str:
        sandbox = Sandbox.create(template="code-interpreter")

        try:
            # Install required packages
            sandbox.commands.run("pip install pandas numpy -q")

            # Write and execute code
            sandbox.files.write("/app/analysis.py", code)
            result = sandbox.commands.run("python /app/analysis.py")

            if result.exit_code != 0:
                return f"ERROR:\n{result.stderr}"

            return result.stdout

        finally:
            sandbox.kill()

    def _interpret_results(self, question: str, raw_output: str) -> str:
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": "Interpret these analysis results in plain English. Be specific and cite numbers."
            }, {
                "role": "user",
                "content": f"Question: {question}\n\nAnalysis Output:\n{raw_output}"
            }]
        )
        return response.choices[0].message.content

# Usage
chain = CodeAugmentedChain()
result = chain.analyze_data(
    data_description="CSV file at /app/sales.csv with columns: date, product, revenue, units_sold",
    question="What was the best-selling product in Q4 2024?"
)

print(result["interpretation"])
```
Error Handling in Chains
Chains fail. Here's how to handle it gracefully:
```python
from dataclasses import dataclass
from typing import Optional
import time
import traceback

@dataclass
class ChainResult:
    success: bool
    output: Optional[str]
    failed_step: Optional[str]
    error: Optional[str]
    partial_results: dict

class RobustChain:
    def __init__(self, steps: list):
        self.steps = steps

    def run(self, initial_input: str) -> ChainResult:
        current_input = initial_input
        partial_results = {}

        for step in self.steps:
            try:
                output = step.execute(current_input)
                partial_results[step.name] = output
                current_input = output

            except Exception as e:
                return ChainResult(
                    success=False,
                    output=None,
                    failed_step=step.name,
                    error=f"{type(e).__name__}: {str(e)}\n{traceback.format_exc()}",
                    partial_results=partial_results
                )

        return ChainResult(
            success=True,
            output=current_input,
            failed_step=None,
            error=None,
            partial_results=partial_results
        )


# With retry logic
class RetryableChain(RobustChain):
    def run(self, initial_input: str, max_retries: int = 3) -> ChainResult:
        current_input = initial_input
        partial_results = {}

        for step in self.steps:
            for attempt in range(max_retries):
                try:
                    output = step.execute(current_input)
                    partial_results[step.name] = output
                    current_input = output
                    break  # Success, move to next step

                except Exception as e:
                    if attempt == max_retries - 1:
                        return ChainResult(
                            success=False,
                            output=None,
                            failed_step=step.name,
                            error=str(e),
                            partial_results=partial_results
                        )
                    # Wait before retry (exponential backoff)
                    time.sleep(2 ** attempt)

        return ChainResult(
            success=True,
            output=current_input,
            failed_step=None,
            error=None,
            partial_results=partial_results
        )
```
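Here's a sketch of how you might consume a ChainResult, assuming `steps` is a list of step objects with an `execute` method as above and `document_text` holds your input (both hypothetical names):

```python
result = RetryableChain(steps).run(document_text)

if result.success:
    print(result.output)
else:
    # partial_results shows how far the chain got before failing
    print(f"Failed at step: {result.failed_step}")
    print(f"Error: {result.error}")
    for name, output in result.partial_results.items():
        print(f"  {name}: {output[:80]}...")
```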
When NOT to Use Prompt Chaining
Chaining isn't always the answer. Avoid it when:
| Scenario | Why Chaining Hurts | Better Alternative |
|---|---|---|
| Simple, single-step task | Unnecessary complexity | Single prompt |
| Highly interdependent reasoning | Context loss between steps | Long-context model |
| Real-time latency requirements | Each step adds latency | Cached/precomputed |
| Very short inputs | Overhead exceeds benefit | Single prompt |
| Exploratory/creative tasks | Structure kills creativity | Open-ended prompt |
Signs You're Over-Chaining
- Each step is trivial (could be done with string formatting)
- You're passing the same context through every step
- The chain is slower than a single smart prompt
- Steps are so coupled they always fail/succeed together
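A hypothetical illustration of the first sign, reusing the name-formatting step from earlier (and the assumed `llm()` helper from the top of this guide):

```python
names = ["John", "Sarah", "Mike"]

# Over-chained: an LLM call for a deterministic transformation
numbered = llm("Format these names as a numbered list:\n\n" + ", ".join(names))

# Better: plain string formatting - exact, instant, and free
numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(names, 1))
```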
Performance Optimization
1. Parallelize Independent Steps
```python
import asyncio

async def optimized_chain(text):
    # These can run in parallel
    summary_task = asyncio.create_task(summarize(text))
    entities_task = asyncio.create_task(extract_entities(text))

    summary, entities = await asyncio.gather(summary_task, entities_task)

    # This depends on previous results
    final = await generate_report(summary, entities)

    return final
```
2. Use Smaller Models for Simple Steps
```python
steps = [
    Step("Format cleanup", model="gpt-3.5-turbo"),     # Simple
    Step("Entity extraction", model="gpt-3.5-turbo"),  # Pattern matching
    Step("Complex reasoning", model="gpt-4o"),         # Needs power
    Step("Final formatting", model="gpt-3.5-turbo"),   # Simple
]
# Cost: roughly 60% less than using gpt-4o for everything
# (actual savings depend on your token mix)
```
3. Cache Repeated Steps
```python
import hashlib

# Simple in-memory cache keyed by (step name, input hash).
# md5 is fine here - it's a cache key, not a security boundary.
_cache: dict[tuple[str, str], str] = {}

def chain_with_cache(text: str) -> str:
    key = ("extract", hashlib.md5(text.encode()).hexdigest())

    # Check cache first
    if key in _cache:
        return _cache[key]

    # Process and cache
    result = extract(text)
    _cache[key] = result
    return result
```
4. Stream Long Chains
```python
async def streaming_chain(text):
    """Yield results as each step completes"""

    yield {"step": "extract", "status": "starting"}
    extracted = await extract(text)
    yield {"step": "extract", "status": "complete", "preview": extracted[:100]}

    yield {"step": "transform", "status": "starting"}
    transformed = await transform(extracted)
    yield {"step": "transform", "status": "complete", "preview": transformed[:100]}

    yield {"step": "format", "status": "starting"}
    final = await format_output(transformed)
    yield {"step": "format", "status": "complete", "result": final}
```
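To consume the stream, iterate with `async for`. A minimal sketch, assuming `document_text` holds your input:

```python
import asyncio

async def main():
    async for event in streaming_chain(document_text):
        print(event)  # e.g. {"step": "extract", "status": "complete", ...}

asyncio.run(main())
```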
Prompt Chaining vs. Agent Loops
Don't confuse chaining with agentic systems:
| Prompt Chaining | Agent Loops |
|---|---|
| Fixed sequence of steps | Dynamic, decides next step |
| Predictable execution path | Unpredictable path |
| Faster, cheaper | More flexible, expensive |
| Easier to debug | Harder to debug |
| Best for known workflows | Best for open-ended tasks |
Use chaining when you know the steps upfront.
Use agents when the LLM needs to figure out the steps.
Many production systems combine both: an agent that decides what to do, then triggers chains to do it.
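One common shape: a lightweight routing step decides, then a fixed chain executes. A hypothetical sketch (the `classify_intent` helper and the chain functions are illustrative stand-ins, not a real API):

```python
def handle_request(user_request: str) -> str:
    # Agent-style decision: one LLM call classifies the request
    intent = classify_intent(user_request)  # e.g. "summarize", "translate", "analyze"

    # Chain-style execution: a fixed, predictable pipeline per intent
    chains = {
        "summarize": summarize_chain,
        "translate": translate_chain,
        "analyze": analysis_chain,
    }
    chain = chains.get(intent, fallback_chain)
    return chain(user_request)
```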
Building Your First Chain: Quickstart
```python
# Install
# pip install openai hopx

from openai import OpenAI

client = OpenAI()

def chain_step(prompt: str, input_text: str, model: str = "gpt-4o") -> str:
    """Single chain step"""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{prompt}\n\nInput:\n{input_text}"}]
    )
    return response.choices[0].message.content

# Your first chain
text = "The quick brown fox jumps over the lazy dog. This is a sample text."

step1 = chain_step("Count the words in this text", text)
# Pass the original text along so the model can actually verify the count
step2 = chain_step(f"Original text:\n{text}\n\nIs this word count correct? Verify.", step1)
step3 = chain_step("Summarize your findings in one sentence.", step2)

print(step3)
```
Once you're comfortable, add:
- Error handling
- Logging/tracing
- Parallel execution
- Code execution with sandboxes
Conclusion
Prompt chaining transforms unreliable mega-prompts into robust, debuggable pipelines:
- Break complex tasks into focused steps
- Debug easily by inspecting intermediate outputs
- Optimize costs by using right-sized models per step
- Build reusable components for multiple workflows
Start simple—a 2-3 step chain. Add complexity only when needed.
The best chains feel invisible: they just work, every time.
Ready to add code execution to your chains? Get started with HopX — sandboxes that spin up in 100ms.
Further Reading
- What Is an AI Agent? — Understanding the difference between chains and agents
- Multi-Agent Architectures with HopX — When single chains aren't enough
- Streaming Code Execution for Agents — Real-time output from chain steps
- LangChain Documentation — Popular framework for building chains