
LlamaIndex + HopX: Building RAG Agents with Code Execution

Tutorials · Alin Dobra · 11 min read

LlamaIndex excels at Retrieval-Augmented Generation (RAG), connecting LLMs to your data. But what happens when the answer isn't in your documents? What if the LLM needs to compute something?

That's where code execution comes in. This tutorial shows how to build LlamaIndex agents that can both retrieve information AND execute Python code to analyze, calculate, and visualize.

The Power of RAG + Code

```text
User: "What was our Q3 revenue and how does it compare to
       the industry average growth rate?"
        │
        ▼
LlamaIndex Agent
  1. Query Vector Index  →  "Q3 revenue was $2.4M"
  2. Query Vector Index  →  "Industry avg growth is 12%"
  3. Execute Python      →  Calculate comparison, growth rate
  4. Generate Response   →  Synthesize with computed values
        │
        ▼
Answer: "Q3 revenue was $2.4M, representing 18% YoY growth.
This outperforms the industry average of 12% by 6 percentage
points, ranking us in the top quartile of our sector."
```
 
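To make the flow concrete, here is a plain-Python sketch of the same four steps. The knowledge base is a hypothetical dictionary standing in for the vector index, and `compute` plays the role of the sandboxed Python call. It illustrates the data flow only, not how LlamaIndex implements the agent.

```python
# Hypothetical stand-in for the vector index: maps a topic to retrieved facts
KNOWLEDGE = {
    "q3 revenue": {"revenue_musd": 2.4, "yoy_growth_pct": 18.0},
    "industry growth": {"avg_growth_pct": 12.0},
}


def retrieve(topic: str) -> dict:
    """Steps 1-2: look up facts (a real agent would query the vector index)."""
    return KNOWLEDGE[topic]


def compute(ours: float, industry: float) -> float:
    """Step 3: do the arithmetic in code instead of trusting the LLM."""
    return ours - industry


def answer() -> str:
    """Step 4: synthesize a response from retrieved and computed values."""
    q3 = retrieve("q3 revenue")
    industry = retrieve("industry growth")
    gap = compute(q3["yoy_growth_pct"], industry["avg_growth_pct"])
    return (
        f"Q3 revenue was ${q3['revenue_musd']}M, representing "
        f"{q3['yoy_growth_pct']:.0f}% YoY growth, "
        f"{gap:.0f} percentage points above the industry average."
    )


print(answer())
```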

Prerequisites

```bash
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai hopx-ai
```
 

Set environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export HOPX_API_KEY="..."
```
 

Step 1: Create the Code Execution Tool

Build a LlamaIndex-compatible tool for sandboxed execution:

```python
from llama_index.core.tools import FunctionTool
from hopx import Sandbox


def execute_python(code: str) -> str:
    """
    Execute Python code in an isolated sandbox.

    Use this tool when you need to:
    - Perform calculations or mathematical operations
    - Analyze data with pandas
    - Create visualizations
    - Process or transform data

    Args:
        code: Python code to execute. Must be complete and runnable.
              Always use print() to output results.

    Returns:
        The output from code execution or error message.
    """
    sandbox = None
    try:
        # A fresh sandbox per call: no state is shared between calls
        sandbox = Sandbox.create(template="code-interpreter")
        result = sandbox.runCode(code, language="python", timeout=60)

        if result.exitCode == 0:
            return result.stdout or "Code executed successfully (no output)"
        else:
            return f"Error: {result.stderr}"
    except Exception as e:
        # Return errors as text so the agent can read them and retry
        return f"Execution failed: {str(e)}"
    finally:
        # Always release the sandbox, even on errors
        if sandbox:
            sandbox.kill()


# Create LlamaIndex tool
python_tool = FunctionTool.from_defaults(
    fn=execute_python,
    name="python_executor",
    description="""Execute Python code in a secure sandbox.
Use for calculations, data analysis, and any computational task.
The sandbox has pandas, numpy, matplotlib, scipy installed.
Always print() results you want to see."""
)
```
 
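For unit-testing the tool wiring without a HopX account, you could swap in a local stand-in with the same contract (code in, output or error string out). This sketch runs the code in a separate Python process via `subprocess`; it is not isolated the way a sandbox is, so use it only with code you trust. The name `execute_python_local` is hypothetical.

```python
import subprocess
import sys


def execute_python_local(code: str, timeout: int = 60) -> str:
    """Run code in a child Python process and mirror execute_python's return contract.

    NOT a sandbox: the child process has the same filesystem and network
    access as the caller. Use only for local testing with trusted code.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"Execution failed: timed out after {timeout}s"

    if result.returncode == 0:
        return result.stdout or "Code executed successfully (no output)"
    return f"Error: {result.stderr}"


print(execute_python_local("print(2 + 2)"))  # 4
print(execute_python_local("x = 1"))         # Code executed successfully (no output)
print(execute_python_local("1 / 0")[:6])     # Error:
```

Because it has the same signature, it can be passed to `FunctionTool.from_defaults(fn=...)` in place of `execute_python` while you develop.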

Step 2: Build a RAG Index

Create a simple vector index from documents:

```python
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure LlamaIndex
Settings.llm = OpenAI(model="gpt-4o", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Sample documents (replace with your data)
documents = [
    Document(text="""
    Q3 2024 Financial Report

    Revenue: $2.4 million
    Operating Expenses: $1.8 million
    Net Profit: $600,000

    Year-over-year revenue growth: 18%
    Customer acquisition: 450 new customers
    Churn rate: 3.2%

    Key metrics:
    - Average revenue per user (ARPU): $89
    - Customer lifetime value (LTV): $2,340
    - Customer acquisition cost (CAC): $156
    """),

    Document(text="""
    Industry Benchmarks 2024

    SaaS Industry Average Metrics:
    - Revenue growth: 12% YoY
    - Churn rate: 5.2%
    - ARPU: $75
    - LTV/CAC ratio: 3:1

    Top quartile performance:
    - Revenue growth: >15%
    - Churn rate: <3%
    - LTV/CAC ratio: >4:1
    """),

    Document(text="""
    Customer Segments Analysis

    Enterprise (>1000 employees):
    - 45 customers
    - $450 ARPU
    - 1.5% churn

    Mid-Market (100-1000 employees):
    - 180 customers
    - $120 ARPU
    - 2.8% churn

    SMB (<100 employees):
    - 675 customers
    - $45 ARPU
    - 4.1% churn
    """)
]

# Create index
index = VectorStoreIndex.from_documents(documents)
```
 
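Conceptually, `VectorStoreIndex` embeds each chunk and, at query time, returns the chunks whose vectors are most similar to the query's vector. The toy sketch below shows that idea with bag-of-words counts and cosine similarity instead of OpenAI embeddings; the function names and tokenization are illustrative assumptions, not LlamaIndex internals.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real index uses a neural model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k docs most similar to the query, best first."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "Q3 2024 Financial Report. Revenue: $2.4 million. Churn rate: 3.2%",
    "Industry Benchmarks 2024. SaaS industry average churn rate: 5.2%",
    "Customer Segments Analysis. Enterprise: 45 customers, 1.5% churn",
]
print(top_k("industry benchmark churn", docs))
```

`similarity_top_k=3` in the next step corresponds to `k` here: how many of the best-matching chunks are handed to the LLM.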

Step 3: Create a Query Engine Tool

Wrap the index as a tool the agent can use:

```python
from llama_index.core.tools import QueryEngineTool

# Create query engine
query_engine = index.as_query_engine(similarity_top_k=3)

# Wrap as tool
rag_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="company_knowledge",
    description="""Search the company knowledge base for information about:
- Financial metrics and reports
- Industry benchmarks
- Customer segments
- Performance data

Use this to find specific facts before doing calculations."""
)
```
 
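An agent ultimately invokes a tool by name with JSON arguments, as in the `Action:` / `Action Input:` lines of a ReAct trace. The sketch below shows only that dispatch step with a plain dictionary registry; `Tool`, `REGISTRY`, `search_kb`, and `dispatch` are hypothetical names, and LlamaIndex's real agent loop does considerably more (prompting, output parsing, retries).

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str  # shown to the LLM so it can decide when to call the tool
    fn: Callable[..., str]


def search_kb(input: str) -> str:
    # Hypothetical stand-in for query_engine.query(input)
    return f"Results for {input!r}: SaaS industry average revenue growth is 12% YoY"


REGISTRY = {
    tool.name: tool
    for tool in [
        Tool("company_knowledge", "Search the company knowledge base", search_kb),
    ]
}


def dispatch(action: str, action_input: str) -> str:
    """Call the tool the LLM named, passing its JSON arguments as keywords."""
    tool = REGISTRY.get(action)
    if tool is None:
        # Report the problem back as an observation instead of crashing
        return f"Error: unknown tool {action!r}"
    return tool.fn(**json.loads(action_input))


print(dispatch("company_knowledge", '{"input": "industry average revenue growth"}'))
```

This is why the tool `name` and `description` matter: they are the only things the LLM sees when choosing which entry to call.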

Step 4: Build the Agent

Combine RAG and code execution in an agent:

```python
from llama_index.core.agent import ReActAgent

# Create agent with both tools
agent = ReActAgent.from_tools(
    tools=[rag_tool, python_tool],
    llm=Settings.llm,
    verbose=True,
    max_iterations=10
)

# Test it
response = agent.chat(
    "What was our Q3 revenue and how does it compare to industry average? "
    "Calculate the exact percentage difference."
)

print(response)
```
 

Example output:

```text
Thought: I need to find our Q3 revenue and the industry average, then calculate the comparison.

Action: company_knowledge
Action Input: {"input": "Q3 2024 revenue"}
Observation: Q3 revenue was $2.4 million with 18% YoY growth...

Action: company_knowledge
Action Input: {"input": "industry average revenue growth"}
Observation: SaaS industry average revenue growth is 12% YoY...

Action: python_executor
Action Input: {"code": "our_growth = 18\nindustry_avg = 12\ndiff = our_growth - industry_avg\npercentage_better = (diff / industry_avg) * 100\nprint(f'Difference: {diff} percentage points')\nprint(f'We outperform by: {percentage_better:.1f}%')"}
Observation: Difference: 6 percentage points
We outperform by: 50.0%

Answer: Our Q3 revenue was $2.4 million with 18% year-over-year growth.
Compared to the industry average of 12%, we outperform by 6 percentage points,
which represents a 50% better growth rate than the industry benchmark.
```
 
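The trace reports two different comparisons that are easy to confuse: the absolute gap in percentage points and the relative improvement. Recomputed directly from the figures in the sample documents (the variable names are illustrative):

```python
our_growth = 18.0        # % YoY, from the Q3 report
industry_growth = 12.0   # % YoY, from the benchmarks document
top_quartile_min = 15.0  # % YoY, top-quartile threshold from the benchmarks

gap_pp = our_growth - industry_growth          # absolute gap, in percentage points
relative_pct = gap_pp / industry_growth * 100  # relative improvement, in percent

print(f"Gap: {gap_pp:.0f} percentage points")          # Gap: 6 percentage points
print(f"Relative: {relative_pct:.0f}% higher growth")  # Relative: 50% higher growth
print("Top quartile:", our_growth > top_quartile_min)  # Top quartile: True
```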

Step 5: Persistent Sandbox for Complex Analysis

For multi-step analyses, keep one sandbox alive across calls so variables, imports, and loaded data persist:

```python
from llama_index.core.tools import FunctionTool
from hopx import Sandbox
from typing import Optional


class PersistentSandbox:
    """Manage a persistent sandbox for multi-step analysis."""

    _instance: Optional['PersistentSandbox'] = None

    def __init__(self):
        self.sandbox: Optional[Sandbox] = None

    @classmethod
    def get(cls) -> 'PersistentSandbox':
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def execute(self, code: str) -> str:
        if self.sandbox is None:
            self.sandbox = Sandbox.create(template="code-interpreter", ttl=600)

        result = self.sandbox.runCode(code, language="python", timeout=60)

        if result.exitCode == 0:
            return result.stdout or "Executed (no output)"
        return f"Error: {result.stderr}"

    def cleanup(self):
        if self.sandbox:
            self.sandbox.kill()
            self.sandbox = None


def execute_python_persistent(code: str) -> str:
    """
    Execute Python with persistent state.
    Variables and imports persist between calls.
    """
    return PersistentSandbox.get().execute(code)


persistent_python = FunctionTool.from_defaults(
    fn=execute_python_persistent,
    name="python_persistent",
    description="""Execute Python code with PERSISTENT STATE.
Variables, DataFrames, and imports persist between calls.
Use this for multi-step analysis where you need to build on previous results.
"""
)
```
 
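Persistence comes from reusing one long-lived interpreter, so names defined in one call are visible in the next. A local, non-isolated approximation of that behavior keeps a single globals dictionary and runs each snippet with `exec`, capturing stdout. `LocalSession` is a hypothetical name; unlike a HopX sandbox it runs inside your own process, so treat it as a testing aid only.

```python
import contextlib
import io
import traceback


class LocalSession:
    """Run snippets in one shared namespace, like a persistent sandbox (NOT isolated)."""

    def __init__(self) -> None:
        self.namespace: dict = {"__name__": "__session__"}

    def execute(self, code: str) -> str:
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                # Reusing the same globals dict is what makes state persist
                exec(code, self.namespace)
        except Exception:
            return f"Error: {traceback.format_exc()}"
        return buffer.getvalue() or "Executed (no output)"


session = LocalSession()
session.execute("revenue = [180_000, 195_000, 210_000]")
print(session.execute("print(sum(revenue) / len(revenue))"))  # 195000.0
```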

Step 6: Data Analysis Agent

Build a specialized agent for data analysis:

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from hopx import Sandbox

# Reuses PersistentSandbox and persistent_python (Step 5),
# rag_tool (Step 3) and Settings (Step 2).


# Data upload tool
def upload_data(filename: str, data: str) -> str:
    """
    Upload CSV data to the sandbox for analysis.

    Args:
        filename: Name for the file (e.g., 'sales.csv')
        data: CSV content as a string
    """
    manager = PersistentSandbox.get()
    if manager.sandbox is None:
        manager.sandbox = Sandbox.create(template="code-interpreter", ttl=600)

    # Write into the same sandbox that python_persistent executes in
    manager.sandbox.files.write(f"/app/{filename}", data)
    return f"Uploaded {filename} to /app/{filename}"


upload_tool = FunctionTool.from_defaults(
    fn=upload_data,
    name="upload_data",
    description="Upload CSV data to sandbox. Provide filename and CSV content."
)


# Create data analysis agent. Extra instructions go in `context`,
# which ReActAgent merges into its ReAct system prompt.
data_agent = ReActAgent.from_tools(
    tools=[rag_tool, persistent_python, upload_tool],
    llm=Settings.llm,
    verbose=True,
    context="""You are a data analyst assistant.

When analyzing data:
1. First check if relevant context exists in the knowledge base
2. Upload data files as needed using upload_data
3. Use python_persistent for multi-step analysis (state persists!)
4. Always show your calculations and explain your methodology
5. Create visualizations when helpful (save to /app/chart.png)

For calculations, always use Python to ensure accuracy."""
)


# Example usage
response = data_agent.chat("""
Here's our monthly revenue data:

month,revenue,customers
Jan,180000,520
Feb,195000,545
Mar,210000,580
Apr,225000,610
May,240000,650
Jun,260000,695

Upload this data and analyze:
1. Calculate month-over-month growth rates
2. What's the average growth rate?
3. Project July revenue based on the trend
4. Compare to industry benchmark from our knowledge base
""")

print(response)
```
 
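For reference, this is roughly the computation the agent should arrive at for the sample data. The sketch uses only the standard library, and the projection method (applying the mean month-over-month rate once more to June) is one simple assumption among several reasonable ones. Note that the 12% industry benchmark is a year-over-year figure, so the monthly rate has to be compounded before the two are comparable.

```python
import csv
import io

DATA = """month,revenue,customers
Jan,180000,520
Feb,195000,545
Mar,210000,580
Apr,225000,610
May,240000,650
Jun,260000,695
"""

rows = list(csv.DictReader(io.StringIO(DATA)))
revenue = [float(r["revenue"]) for r in rows]

# 1. Month-over-month growth rates, as fractions
mom = [curr / prev - 1 for prev, curr in zip(revenue, revenue[1:])]

# 2. Arithmetic mean of the MoM rates
avg_mom = sum(mom) / len(mom)

# 3. Naive projection: apply the average rate once more to June
july = revenue[-1] * (1 + avg_mom)

# 4. The benchmark is YoY, so compound the monthly rate over 12 months
annualized = (1 + avg_mom) ** 12 - 1

for r, g in zip(rows[1:], mom):
    print(f"{r['month']}: {g:.2%}")
print(f"Average MoM growth: {avg_mom:.2%}")
print(f"Projected July revenue: ${july:,.0f}")
print(f"Annualized growth: {annualized:.0%} vs 12% industry average")
```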

Advanced: Sub-Question Query Engine

For complex queries, break them into sub-questions:

```python
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Split the Step 2 sample documents by topic
# (replace with your own document collections)
financial_docs = [documents[0]]  # Q3 financial report
market_docs = [documents[1]]     # industry benchmarks
customer_docs = [documents[2]]   # customer segments

# Multiple specialized indices
financial_index = VectorStoreIndex.from_documents(financial_docs)
customer_index = VectorStoreIndex.from_documents(customer_docs)
market_index = VectorStoreIndex.from_documents(market_docs)

# Create query engine tools
query_engine_tools = [
    QueryEngineTool(
        query_engine=financial_index.as_query_engine(),
        metadata=ToolMetadata(
            name="financial_data",
            description="Financial reports, revenue, expenses, profits"
        )
    ),
    QueryEngineTool(
        query_engine=customer_index.as_query_engine(),
        metadata=ToolMetadata(
            name="customer_data",
            description="Customer segments, churn, acquisition metrics"
        )
    ),
    QueryEngineTool(
        query_engine=market_index.as_query_engine(),
        metadata=ToolMetadata(
            name="market_data",
            description="Industry benchmarks, competitor analysis, market trends"
        )
    )
]

# Create sub-question query engine
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools
)

# Wrap as tool for agent
sub_question_tool = QueryEngineTool.from_defaults(
    query_engine=sub_question_engine,
    name="comprehensive_search",
    description="""Search across all company data sources.
Use for complex questions that span multiple topics.
Automatically breaks down into sub-questions."""
)

# Agent combining cross-source search with persistent Python (Step 5)
comprehensive_agent = ReActAgent.from_tools(
    tools=[sub_question_tool, persistent_python],
    llm=Settings.llm,
    verbose=True
)
```
 
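Conceptually, the sub-question engine has the LLM split a question into (sub-question, tool) pairs, answers each one against the named tool, and then synthesizes the partial answers. The sketch below hard-codes the decomposition step that the LLM would normally perform; `decompose`, `answer_with_subquestions`, and the `TOOLS` lambdas are hypothetical stand-ins for the three query engines, with answers taken from the sample data.

```python
from typing import Callable

# Hypothetical stand-ins for the three query engines
TOOLS: dict[str, Callable[[str], str]] = {
    "financial_data": lambda q: "Q3 revenue grew 18% YoY",
    "customer_data": lambda q: "Churn by segment: enterprise 1.5%, mid-market 2.8%, SMB 4.1%",
    "market_data": lambda q: "Industry average growth is 12% YoY",
}


def decompose(question: str) -> list[tuple[str, str]]:
    """Stand-in for the LLM step that emits (sub_question, tool_name) pairs."""
    return [
        ("What was our revenue growth?", "financial_data"),
        ("What is churn by customer segment?", "customer_data"),
        ("What is the industry average growth?", "market_data"),
    ]


def answer_with_subquestions(question: str) -> str:
    # Answer each sub-question against the tool it was routed to
    partials = [f"{sub_q} -> {TOOLS[tool](sub_q)}" for sub_q, tool in decompose(question)]
    # A real engine asks the LLM to synthesize these; here we simply join them
    return "\n".join(partials)


print(answer_with_subquestions("How do our growth and churn compare to the industry?"))
```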

Multi-Document Analysis with Code

Analyze documents and compute insights:

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import QueryEngineTool

# Load documents
report_docs = SimpleDirectoryReader("./data/reports/").load_data()

# Parse into nodes: chunks of up to 512 tokens, 50 tokens of overlap
parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = parser.get_nodes_from_documents(report_docs)

# Create index (separate name so Step 2's index is not overwritten)
reports_index = VectorStoreIndex(nodes)

# Agent for document analysis (persistent_python comes from Step 5)
doc_analysis_agent = ReActAgent.from_tools(
    tools=[
        QueryEngineTool.from_defaults(
            query_engine=reports_index.as_query_engine(),
            name="document_search",
            description="Search uploaded documents for information"
        ),
        persistent_python
    ],
    llm=Settings.llm,
    verbose=True,
    context="""You are a document analysis agent.

Your workflow:
1. Search documents to extract relevant data points
2. Use Python to compute statistics, comparisons, trends
3. Always verify calculations by showing the code
4. Provide data-driven conclusions

When extracting numbers from documents, use Python to validate and compute."""
)
```
 
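`SentenceSplitter(chunk_size=512, chunk_overlap=50)` produces chunks of up to 512 tokens in which consecutive chunks share roughly 50 tokens, so text near a boundary appears in both neighboring chunks. The sketch below shows only the sliding-window idea, approximating tokens with whitespace-separated words; the real splitter counts tokenizer tokens and prefers sentence boundaries. `chunk_words` is a hypothetical helper.

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of words (a rough stand-in for tokens)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(text, chunk_size=4, chunk_overlap=1):
    print(chunk)
```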

Structured Output with Code Validation

Improve accuracy by checking extracted values with code:

```python
from llama_index.core.tools import FunctionTool
from pydantic import BaseModel
from typing import List


# Target schema for a structured financial analysis
class FinancialAnalysis(BaseModel):
    revenue: float
    growth_rate: float
    profit_margin: float
    industry_comparison: str
    recommendations: List[str]


def validated_analysis(query: str) -> str:
    """
    Perform financial analysis with code validation.

    Retrieves data, computes metrics in sandbox, returns validated results.
    """
    sandbox = PersistentSandbox.get()

    # Step 1: Query for raw data (query_engine from Step 3)
    raw_data = str(query_engine.query(query))

    # Step 2: Validate and compute in sandbox.
    # rf-string: backslashes reach the sandbox unchanged, {{ }} are literal braces,
    # and {raw_data!r} embeds the RAG text as a safely quoted string literal.
    validation_code = rf'''
import json
import re

# Parse extracted values (from RAG)
raw_text = {raw_data!r}

# Extract numbers such as 2.4 or 600,000 (each match must start with a digit)
numbers = re.findall(r'\$?(\d[\d,]*(?:\.\d+)?)\s*(?:million|M)?', raw_text)
numbers = [float(n.replace(',', '')) for n in numbers]

# Compute derived metrics
if len(numbers) >= 2:
    revenue = numbers[0]
    if 'million' in raw_text.lower():
        revenue *= 1_000_000

    # Calculate metrics
    analysis = {{
        "revenue": revenue,
        "extracted_values": numbers,
        "validation": "passed" if revenue > 0 else "failed"
    }}
    print(json.dumps(analysis, indent=2))
else:
    print(json.dumps({{"error": "Could not extract values"}}))
'''

    result = sandbox.execute(validation_code)
    return result


validation_tool = FunctionTool.from_defaults(
    fn=validated_analysis,
    name="validated_financial_analysis",
    description="Perform validated financial analysis with code verification"
)
```
 
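The regex approach above is brittle because it has to guess which number is which. A sturdier pattern is to parse each figure explicitly and reject implausible values before any arithmetic. This standard-library sketch uses a dataclass instead of pydantic so it runs anywhere; `parse_money` and `Q3Figures` are hypothetical helpers, and the plausibility bounds are assumptions to tune for your own data.

```python
import re
from dataclasses import dataclass

# A dollar amount such as "$2.4 million", "$600,000" or "$1.5M"
_MONEY = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*(million|m|k|thousand)?\b", re.IGNORECASE)
_SCALE = {"million": 1_000_000, "m": 1_000_000, "thousand": 1_000, "k": 1_000}


def parse_money(text: str) -> float:
    """Parse the first dollar amount in text, e.g. '$2.4 million' -> 2400000.0."""
    match = _MONEY.search(text)
    if match is None:
        raise ValueError(f"no dollar amount in {text!r}")
    amount = float(match.group(1).replace(",", ""))
    unit = (match.group(2) or "").lower()
    return amount * _SCALE.get(unit, 1)


@dataclass
class Q3Figures:
    revenue: float
    net_profit: float

    def __post_init__(self) -> None:
        # Sanity checks on retrieved values before they feed any calculation
        if not 0 < self.revenue < 1e12:
            raise ValueError(f"implausible revenue: {self.revenue}")
        if self.net_profit > self.revenue:
            raise ValueError("net profit cannot exceed revenue")

    @property
    def profit_margin(self) -> float:
        return self.net_profit / self.revenue


figures = Q3Figures(
    revenue=parse_money("Revenue: $2.4 million"),
    net_profit=parse_money("Net Profit: $600,000"),
)
print(f"Profit margin: {figures.profit_margin:.1%}")  # Profit margin: 25.0%
```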

Complete Working Example

Here's a complete script that combines the pieces above:

```python
"""
LlamaIndex RAG Agent with HopX Code Execution
"""

from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from hopx import Sandbox
from typing import Optional
import os

# Verify environment (explicit check; assert statements are skipped under python -O)
for var in ("OPENAI_API_KEY", "HOPX_API_KEY"):
    if not os.environ.get(var):
        raise RuntimeError(f"Set {var}")

# Configure LlamaIndex
Settings.llm = OpenAI(model="gpt-4o", temperature=0)
Settings.embed_model = OpenAIEmbedding()


class SandboxManager:
    """Singleton sandbox manager: one sandbox shared by all tool calls."""
    _sandbox: Optional[Sandbox] = None

    @classmethod
    def execute(cls, code: str) -> str:
        # Lazily create the sandbox on first use
        if cls._sandbox is None:
            cls._sandbox = Sandbox.create(template="code-interpreter", ttl=600)
        result = cls._sandbox.runCode(code, language="python", timeout=60)
        if result.exitCode == 0:
            return result.stdout or "Executed (no output)"
        return f"Error: {result.stderr}"

    @classmethod
    def cleanup(cls):
        if cls._sandbox:
            cls._sandbox.kill()
            cls._sandbox = None


def python_executor(code: str) -> str:
    """Execute Python code with persistent state."""
    return SandboxManager.execute(code)


def create_rag_agent(documents: list) -> ReActAgent:
    """Create a RAG agent with code execution."""

    # Build index
    index = VectorStoreIndex.from_documents(
        [Document(text=d) for d in documents]
    )

    # Tools
    tools = [
        QueryEngineTool.from_defaults(
            query_engine=index.as_query_engine(),
            name="knowledge_base",
            description="Search the knowledge base for information"
        ),
        FunctionTool.from_defaults(
            fn=python_executor,
            name="python",
            description="Execute Python for calculations. State persists."
        )
    ]

    # Extra instructions go in `context`, which ReActAgent merges into its ReAct prompt
    return ReActAgent.from_tools(
        tools=tools,
        llm=Settings.llm,
        verbose=True,
        context="""You are an analytical assistant.
1. Search knowledge base for facts
2. Use Python for all calculations
3. Always verify numbers with code
4. Explain your methodology"""
    )


# Example usage
if __name__ == "__main__":
    docs = [
        "Q3 2024: Revenue $2.4M, Growth 18%, Profit margin 25%",
        "Industry benchmark: Average growth 12%, Top quartile >15%",
        "Customers: 900 total, 45 enterprise ($450 ARPU), "
        "180 mid-market ($120 ARPU), 675 SMB ($45 ARPU)"
    ]

    agent = create_rag_agent(docs)

    try:
        response = agent.chat(
            "What's our revenue per customer segment? "
            "Calculate the contribution of each segment."
        )
        print("\n" + "=" * 50)
        print(response)
    finally:
        # Release the sandbox even if the agent run fails
        SandboxManager.cleanup()
```
 
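As a sanity check on what the agent should report, here is the segment arithmetic done directly. It assumes segment revenue = customers × ARPU over the same period, and it includes the mid-market segment from the Step 2 customer data (180 customers, $120 ARPU).

```python
segments = {
    # segment: (customers, ARPU in $), from the customer segments data
    "Enterprise": (45, 450),
    "Mid-Market": (180, 120),
    "SMB": (675, 45),
}

revenue = {name: n * arpu for name, (n, arpu) in segments.items()}
total = sum(revenue.values())

for name, value in revenue.items():
    print(f"{name:<11} ${value:>7,}  {value / total:.1%}")
print(f"{'Total':<11} ${total:>7,}")
```

Despite having the lowest ARPU, SMB contributes the largest share (about 42%) because of its customer count, which is exactly the kind of conclusion worth having the agent compute rather than estimate.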

Best Practices

1. Query First, Compute Second

```python
# Good pattern:
# 1. Retrieve facts from RAG
# 2. Compute with Python
# 3. Synthesize response

# Don't hallucinate numbers - always verify with code
```
 

2. Use Persistent Sandbox for Multi-Step

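```python
# For complex analysis, keep chatting with the same agent: its python tool
# (SandboxManager.execute) reuses one sandbox, so variables carry over
step1 = agent.chat("Load the sales data and show structure")
step2 = agent.chat("Calculate monthly averages")  # Uses same sandbox
step3 = agent.chat("Create visualization")  # State persists
```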
 

3. Validate RAG Extractions

```python
# After RAG retrieval, validate numbers in the sandbox:
value = 2_400_000  # e.g. the Q3 revenue figure extracted via RAG

validation_code = f"""
extracted_value = {value!r}
# Sanity checks
assert extracted_value > 0, "Value should be positive"
assert extracted_value < 1e12, "Value seems too large"
print(f"Validated: {{extracted_value}}")
"""
print(SandboxManager.execute(validation_code))
```
 
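When you splice retrieved text into code that runs in the sandbox, interpolate its `repr()` rather than wrapping raw text in quotes, so quotes or newlines in RAG output cannot break or alter the generated program. A minimal sketch; `build_code` and `run_locally` are hypothetical helpers, and `exec` stands in for the sandbox call only to show that the generated code runs.

```python
import contextlib
import io


def build_code(raw_text: str) -> str:
    """Generate sandbox code that embeds retrieved text as a Python string literal."""
    # repr() escapes quotes, backslashes and newlines, so the literal always parses
    return f"raw_text = {raw_text!r}\nprint(len(raw_text.splitlines()))\n"


def run_locally(code: str) -> str:
    """Stand-in for the sandbox call: exec the code and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue()


# Retrieved text that happens to contain triple quotes
tricky = 'Revenue: $2.4 million\nNote: see """appendix"""'

# A naive triple-quoted template breaks on such text
naive = f'raw_text = """{tricky}"""'
try:
    compile(naive, "<naive>", "exec")
except SyntaxError:
    print("naive template: SyntaxError")

print(run_locally(build_code(tricky)))  # 2
```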

4. Clean Up Resources

```python
try:
    result = agent.chat(query)
finally:
    SandboxManager.cleanup()
```
 
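If you create sandboxes in several places, a context manager keeps the `try`/`finally` in one spot. A minimal sketch using `contextlib.contextmanager`; `sandbox_session` is a hypothetical helper that accepts any factory returning an object with a `kill()` method (for example `lambda: Sandbox.create(template="code-interpreter")`), and `FakeSandbox` exists only so the example runs offline.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol


class Killable(Protocol):
    def kill(self) -> None: ...


@contextmanager
def sandbox_session(factory: Callable[[], Killable]) -> Iterator[Killable]:
    """Create a sandbox, hand it to the caller, and always kill it afterwards."""
    sandbox = factory()
    try:
        yield sandbox
    finally:
        sandbox.kill()


class FakeSandbox:
    """Offline stand-in that records whether kill() was called."""

    def __init__(self) -> None:
        self.killed = False

    def kill(self) -> None:
        self.killed = True


with sandbox_session(FakeSandbox) as sb:
    print("working, killed =", sb.killed)  # working, killed = False
print("after block, killed =", sb.killed)  # after block, killed = True
```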

Conclusion

LlamaIndex + HopX enables agents that:

  • Retrieve facts from your documents
  • Compute accurate answers with Python
  • Validate numbers through code execution
  • Persist state for complex analyses

Calculations no longer rest on the LLM's mental arithmetic: your agent can reason about data with the precision of code.


Ready to add code execution to your RAG app? Get started with HopX — sandboxes that spin up in 100ms.
