
Tool Use: How AI Agents Interact with the Real World

AI Agents · Alin Dobra · 14 min read


An LLM without tools is like a brain without a body. It can think, reason, and generate text—but it can't do anything.

Tool use changes everything. Give an LLM access to tools, and suddenly it can search the web, query databases, execute code, send emails, and interact with any API. It transforms from a text generator into an autonomous agent.

This guide shows you how to implement tool use properly—from basic function calling to complex multi-tool orchestration.

What Is Tool Use?

Tool use is a pattern where an LLM decides when and how to call external functions to accomplish a task:

text
User Query
    │
    ▼
LLM: "I need to check the weather. I'll use the weather tool"
    │
    ▼
Tool Call: get_weather
Args: {"city": "London"}
    │
    ▼
Tool Result: "15°C, Cloudy"
    │
    ▼
LLM: "The weather in London is 15°C and cloudy."

The key insight: the LLM doesn't execute tools directly. It outputs a structured request (tool name + arguments), your code executes the tool, and you feed the result back to the LLM.
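Concretely, the structured request is just data. Here is a minimal sketch of the OpenAI-style shape (the ID and exact wire format are illustrative and vary by provider):

```python
import json

# Hypothetical tool call as it appears in a model response (OpenAI-style shape)
raw_tool_call = {
    "id": "call_abc123",  # illustrative ID
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": '{"city": "London"}'  # arguments arrive as a JSON string
    }
}

# Your code, not the model, parses the arguments and decides whether to run the function
args = json.loads(raw_tool_call["function"]["arguments"])
print(raw_tool_call["function"]["name"], args)  # get_weather {'city': 'London'}
```

The model only proposes this structure; execution is entirely up to your application code.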

Why Tools Matter

Without tools, LLMs are limited to:

  • Knowledge frozen at training time
  • No access to private data
  • Can't take actions in the world
  • Can only generate text

With tools, LLMs can:

  • Access real-time information
  • Query your databases
  • Execute code and analyze data
  • Send emails, create tickets, deploy code
  • Integrate with any API

Tools are what turn chat into action.

Basic Tool Implementation

OpenAI Function Calling

Here's the standard pattern with OpenAI:

python
import openai
import json

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g., 'London'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Tool implementations
def get_weather(city: str, units: str = "celsius") -> str:
    # In production, call a real weather API
    return f"Weather in {city}: 15°C, Cloudy"

def search_web(query: str) -> str:
    # In production, use a search API
    return f"Search results for '{query}': ..."

tool_functions = {
    "get_weather": get_weather,
    "search_web": search_web
}

# Main loop
def run_agent(user_message: str) -> str:
    client = openai.OpenAI()
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools
        )

        message = response.choices[0].message

        # Check if LLM wants to use tools
        if message.tool_calls:
            # Add assistant message with tool calls
            messages.append(message)

            # Execute each tool
            for tool_call in message.tool_calls:
                function_name = tool_call.function.name
                arguments = json.loads(tool_call.function.arguments)

                # Call the actual function
                result = tool_functions[function_name](**arguments)

                # Add tool result to messages
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })
        else:
            # No more tool calls, return final response
            return message.content


# Usage
response = run_agent("What's the weather in Tokyo and London?")
print(response)

Tool Design Principles

1. Clear, Specific Descriptions

The LLM decides which tool to use based on descriptions. Be precise:

python
# ❌ Bad: Vague description
{
    "name": "search",
    "description": "Search for things"
}

# ✅ Good: Specific description
{
    "name": "search_documentation",
    "description": "Search the official product documentation for API references, tutorials, and guides. Use for technical questions about how to use our product."
}

2. Constrained Parameters

Use enums and clear types to prevent errors:

python
# ❌ Bad: Open-ended parameter
{
    "name": "priority",
    "type": "string",
    "description": "Task priority"
}

# ✅ Good: Constrained parameter
{
    "name": "priority",
    "type": "string",
    "enum": ["low", "medium", "high", "critical"],
    "description": "Task priority level"
}

3. Atomic Operations

Each tool should do one thing well:

python
# ❌ Bad: Tool does too much
{
    "name": "manage_user",
    "description": "Create, update, delete, or fetch user"
}

# ✅ Good: Separate tools
{
    "name": "create_user",
    "description": "Create a new user account"
}
{
    "name": "get_user",
    "description": "Fetch user details by ID or email"
}
{
    "name": "update_user",
    "description": "Update user profile information"
}

4. Meaningful Return Values

Return structured, actionable data:

python
# ❌ Bad: Just a status
def create_task(title: str) -> str:
    # ... create task ...
    return "Task created"

# ✅ Good: Return useful information
def create_task(title: str) -> str:
    task = db.tasks.create(title=title)
    return json.dumps({
        "task_id": task.id,
        "title": task.title,
        "status": "created",
        "url": f"https://app.example.com/tasks/{task.id}"
    })

Code Execution as a Tool

The most powerful tool you can give an LLM is the ability to execute code. But it's also the most dangerous.

The Wrong Way (Never Do This)

python
# ⚠️ DANGEROUS: Never execute LLM-generated code directly
def run_code(code: str) -> str:
    exec(code)  # This can delete files, exfiltrate data, anything
    return "Done"

The Right Way: Sandboxed Execution

python
from hopx import Sandbox

def run_python_code(code: str) -> str:
    """Execute Python code in an isolated sandbox"""
    sandbox = Sandbox.create(template="code-interpreter")

    try:
        # Write code to sandbox
        sandbox.files.write("/app/script.py", code)

        # Execute in isolation
        result = sandbox.commands.run("python /app/script.py", timeout=30)

        if result.exit_code == 0:
            return result.stdout
        else:
            return f"Error: {result.stderr}"

    finally:
        sandbox.kill()  # Destroy sandbox completely


# Define as a tool
code_execution_tool = {
    "type": "function",
    "function": {
        "name": "run_python_code",
        "description": "Execute Python code to perform calculations, data analysis, or any programmatic task. Use this when you need to compute something precisely.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute. Must be complete and runnable."
                }
            },
            "required": ["code"]
        }
    }
}

The sandbox ensures:

  • Code can't access your host filesystem
  • Code can't make unauthorized network requests
  • Code can't persist beyond the execution
  • Resource limits prevent infinite loops

See Why AI Agents Need Isolated Code Execution for more on security.

Common Tool Categories

Information Retrieval

python
tools = [
    {
        "name": "search_web",
        "description": "Search the internet for current information"
    },
    {
        "name": "search_docs",
        "description": "Search internal documentation and knowledge base"
    },
    {
        "name": "get_url_content",
        "description": "Fetch and read content from a specific URL"
    },
    {
        "name": "query_database",
        "description": "Run a read-only SQL query against the database"
    }
]

Data Operations

python
tools = [
    {
        "name": "read_file",
        "description": "Read contents of a file"
    },
    {
        "name": "write_file",
        "description": "Write content to a file"
    },
    {
        "name": "analyze_csv",
        "description": "Load and analyze a CSV file using pandas"
    },
    {
        "name": "create_chart",
        "description": "Generate a chart from data"
    }
]

Communication

python
tools = [
    {
        "name": "send_email",
        "description": "Send an email to specified recipients"
    },
    {
        "name": "send_slack_message",
        "description": "Post a message to a Slack channel"
    },
    {
        "name": "create_ticket",
        "description": "Create a support ticket in the ticketing system"
    }
]

Actions

python
tools = [
    {
        "name": "run_python_code",
        "description": "Execute Python code in a sandbox"
    },
    {
        "name": "deploy_to_staging",
        "description": "Deploy the current branch to staging environment"
    },
    {
        "name": "run_tests",
        "description": "Run the test suite and return results"
    }
]

Advanced: Multi-Tool Orchestration

Real agents often need to use multiple tools in sequence:

python
import openai
import json
from hopx import Sandbox

class ToolOrchestrator:
    def __init__(self):
        self.client = openai.OpenAI()
        self.tools = self._define_tools()
        self.max_iterations = 10

    def run(self, task: str) -> str:
        messages = [
            {"role": "system", "content": self._system_prompt()},
            {"role": "user", "content": task}
        ]

        for _ in range(self.max_iterations):
            response = self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=self.tools
            )

            message = response.choices[0].message

            if not message.tool_calls:
                return message.content

            messages.append(message)

            # Execute all tool calls
            for tool_call in message.tool_calls:
                result = self._execute_tool(
                    tool_call.function.name,
                    json.loads(tool_call.function.arguments)
                )

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })

        return "Max iterations reached"

    def _execute_tool(self, name: str, args: dict) -> str:
        """Route to the appropriate tool implementation"""
        if name == "search_web":
            return self._search_web(args["query"])
        elif name == "run_python":
            return self._run_python(args["code"])
        elif name == "read_file":
            return self._read_file(args["path"])
        elif name == "write_file":
            return self._write_file(args["path"], args["content"])
        else:
            return f"Unknown tool: {name}"

    def _run_python(self, code: str) -> str:
        """Execute Python in a sandbox"""
        sandbox = Sandbox.create(template="code-interpreter")
        try:
            sandbox.files.write("/app/code.py", code)
            result = sandbox.commands.run("python /app/code.py", timeout=60)

            output = result.stdout if result.exit_code == 0 else f"Error: {result.stderr}"
            return output[:5000]  # Truncate long outputs
        finally:
            sandbox.kill()

    def _search_web(self, query: str) -> str:
        # Implement with your preferred search API
        return f"Search results for: {query}"

    def _read_file(self, path: str) -> str:
        sandbox = Sandbox.create(template="code-interpreter")
        try:
            content = sandbox.files.read(path)
            return content[:10000]
        except Exception:
            return f"File not found: {path}"
        finally:
            sandbox.kill()

    def _write_file(self, path: str, content: str) -> str:
        sandbox = Sandbox.create(template="code-interpreter")
        try:
            sandbox.files.write(path, content)
            return f"Successfully wrote to {path}"
        finally:
            sandbox.kill()

    def _define_tools(self) -> list:
        return [
            {
                "type": "function",
                "function": {
                    "name": "search_web",
                    "description": "Search the web for current information",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {"type": "string", "description": "Search query"}
                        },
                        "required": ["query"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "run_python",
                    "description": "Execute Python code for calculations, data analysis, or any programmatic task",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "code": {"type": "string", "description": "Complete Python code to execute"}
                        },
                        "required": ["code"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "read_file",
                    "description": "Read the contents of a file",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "path": {"type": "string", "description": "Path to the file"}
                        },
                        "required": ["path"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "write_file",
                    "description": "Write content to a file",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "path": {"type": "string", "description": "Path to the file"},
                            "content": {"type": "string", "description": "Content to write"}
                        },
                        "required": ["path", "content"]
                    }
                }
            }
        ]

    def _system_prompt(self) -> str:
        return """You are a helpful AI assistant with access to tools.

Use tools when needed to complete tasks. You can:
- Search the web for current information
- Execute Python code for calculations and data analysis
- Read and write files

Think step by step. Use the most appropriate tool for each sub-task.
When you have enough information to answer, provide a clear response."""


# Usage
orchestrator = ToolOrchestrator()
result = orchestrator.run(
    "Find the current Bitcoin price, calculate what 0.5 BTC would be worth, "
    "and save the result to a file called 'btc_value.txt'"
)
print(result)

Parallel Tool Execution

When tools are independent, run them in parallel:

python
import asyncio
import json
import openai

async def execute_tools_parallel(tool_calls: list) -> list:
    """Execute multiple tool calls concurrently"""

    async def execute_single(tool_call):
        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)

        # Run in a thread pool to avoid blocking the event loop
        loop = asyncio.get_event_loop()
        result = await loop.run_in_executor(
            None,
            lambda: tool_functions[name](**args)
        )

        return {
            "tool_call_id": tool_call.id,
            "content": result
        }

    # Execute all tools concurrently
    results = await asyncio.gather(*[
        execute_single(tc) for tc in tool_calls
    ])

    return results


# In the main loop
if message.tool_calls:
    results = asyncio.run(execute_tools_parallel(message.tool_calls))
    for result in results:
        messages.append({"role": "tool", **result})

Error Handling

Tools fail. Handle it gracefully:

python
import json
import signal

def execute_tool_safely(name: str, args: dict) -> str:
    """Execute a tool with proper error handling"""

    try:
        # Validate that the tool exists
        if name not in tool_functions:
            return json.dumps({
                "error": f"Unknown tool: {name}",
                "available_tools": list(tool_functions.keys())
            })

        # Execute with a timeout (SIGALRM is Unix-only and works in the main thread)
        def timeout_handler(signum, frame):
            raise TimeoutError("Tool execution timed out")

        signal.signal(signal.SIGALRM, timeout_handler)
        signal.alarm(30)  # 30 second timeout

        try:
            result = tool_functions[name](**args)
        finally:
            signal.alarm(0)  # Cancel timeout

        return result

    except TimeoutError:
        return json.dumps({
            "error": "Tool execution timed out",
            "tool": name,
            "suggestion": "Try a simpler query or break into smaller steps"
        })

    except TypeError as e:
        return json.dumps({
            "error": f"Invalid arguments: {str(e)}",
            "tool": name,
            "received_args": args
        })

    except Exception as e:
        return json.dumps({
            "error": f"Tool execution failed: {str(e)}",
            "tool": name,
            "error_type": type(e).__name__
        })

Tool Use Patterns

Pattern 1: Retrieval-Augmented Generation (RAG)

Search first, then answer:

python
def rag_answer(question: str) -> str:
    # Step 1: Search for relevant information
    search_results = search_knowledge_base(question)

    # Step 2: Generate answer using retrieved context
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Use this context to answer: {search_results}"},
            {"role": "user", "content": question}
        ]
    )

    return response.choices[0].message.content

Pattern 2: Verification Loop

Use tools to verify LLM outputs:

python
def verified_answer(question: str) -> str:
    # Generate initial answer
    answer = generate_answer(question)

    # Verify with tools
    verification = run_python(f"""
# Verify the claim: {answer}
# Check against authoritative sources
result = verify_claim("{answer}")
print(result)
""")

    if "verified" in verification.lower():
        return answer
    else:
        # Regenerate with verification feedback
        return generate_answer(f"{question}\n\nNote: {verification}")

Pattern 3: Progressive Disclosure

Start with cheap tools, escalate as needed:

python
# check_cache, search_local_docs, is_sufficient, search_web, and cache_result
# are placeholders for your own implementations
def progressive_search(query: str) -> str:
    # Level 1: Check cache (free, instant)
    cached = check_cache(query)
    if cached:
        return cached

    # Level 2: Search local docs (cheap, fast)
    local = search_local_docs(query)
    if is_sufficient(local):
        return local

    # Level 3: Search web (expensive, slow)
    web = search_web(query)
    cache_result(query, web)
    return web

Security Best Practices

1. Allowlist Tools Per Use Case

python
# Different tool sets for different contexts
CUSTOMER_SUPPORT_TOOLS = ["search_faq", "create_ticket", "get_order_status"]
ADMIN_TOOLS = ["run_sql", "modify_user", "deploy_code"]

def get_tools_for_user(user_role: str) -> list:
    if user_role == "admin":
        return ADMIN_TOOLS
    else:
        return CUSTOMER_SUPPORT_TOOLS

2. Validate All Inputs

python
def run_sql(query: str) -> str:
    # Validate: read-only queries only
    # (a naive keyword check; also enforce a read-only database role in production)
    if any(word in query.upper() for word in ["INSERT", "UPDATE", "DELETE", "DROP"]):
        return "Error: Only SELECT queries are allowed"

    # Validate: no system tables
    if "information_schema" in query.lower():
        return "Error: System table access not allowed"

    # Execute
    return execute_query(query)

3. Rate Limit Tool Calls

python
from collections import defaultdict
import time

tool_calls = defaultdict(list)

def rate_limited_execute(user_id: str, tool_name: str, args: dict) -> str:
    now = time.time()

    # Keep only calls from the last 60 seconds (prune so the list doesn't grow forever)
    recent_calls = [t for t in tool_calls[user_id] if now - t < 60]
    tool_calls[user_id] = recent_calls

    if len(recent_calls) >= 10:
        return "Error: Rate limit exceeded. Try again in a minute."

    tool_calls[user_id].append(now)
    return execute_tool(tool_name, args)

4. Audit All Tool Usage

python
import logging

def audited_execute(user_id: str, tool_name: str, args: dict) -> str:
    logging.info(f"TOOL_CALL | user={user_id} | tool={tool_name} | args={args}")

    result = execute_tool(tool_name, args)

    logging.info(f"TOOL_RESULT | user={user_id} | tool={tool_name} | result_length={len(result)}")

    return result

Measuring Tool Effectiveness

Track these metrics:

python
from dataclasses import dataclass

@dataclass
class ToolMetrics:
    tool_name: str
    call_count: int
    success_rate: float
    avg_latency_ms: float
    error_types: dict

def analyze_tool_usage(logs: list) -> dict:
    metrics = {}

    for tool_name in set(log["tool"] for log in logs):
        tool_logs = [l for l in logs if l["tool"] == tool_name]

        metrics[tool_name] = ToolMetrics(
            tool_name=tool_name,
            call_count=len(tool_logs),
            success_rate=sum(1 for l in tool_logs if l["success"]) / len(tool_logs),
            avg_latency_ms=sum(l["latency"] for l in tool_logs) / len(tool_logs),
            error_types=count_errors(tool_logs)  # helper that tallies error types (not shown)
        )

    return metrics

Key questions:

  • Which tools are used most?
  • Which tools fail most often?
  • Are there tools the LLM never uses? (Remove or improve descriptions)
  • Are there missing tools? (Check for failed attempts)
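These questions can be turned into automated checks. A small sketch (the `review_tools` helper and the 90% success threshold are illustrative, using a trimmed-down metrics record):

```python
from dataclasses import dataclass

@dataclass
class ToolMetrics:
    tool_name: str
    call_count: int
    success_rate: float

def review_tools(defined_tools: list, metrics: dict) -> dict:
    """Flag tools that are never called or fail too often (threshold is an example)."""
    unused = [name for name in defined_tools if name not in metrics]
    flaky = [m.tool_name for m in metrics.values() if m.success_rate < 0.9]
    return {"unused": unused, "flaky": flaky}

# Hypothetical usage data
metrics = {
    "search_web": ToolMetrics("search_web", 120, 0.98),
    "run_python": ToolMetrics("run_python", 40, 0.75),
}
report = review_tools(["search_web", "run_python", "create_chart"], metrics)
print(report)  # {'unused': ['create_chart'], 'flaky': ['run_python']}
```

Unused tools are candidates for removal or a better description; flaky ones need fixes before the agent can rely on them.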

Conclusion

Tool use is what transforms LLMs from text generators into agents that can act in the world:

  • Define clear tools with specific descriptions
  • Sandbox code execution — never run LLM code directly
  • Handle errors gracefully — tools fail, plan for it
  • Secure by default — allowlist, validate, rate limit, audit

Start with 2-3 essential tools. Add more only when you see the need. A focused agent with good tools beats a confused agent with many.


Ready to add secure code execution to your agent's toolkit? Get started with HopX — sandboxes that spin up in 100ms.
