
Tool Use: How AI Agents Interact with the Real World

AI Agents · Alin Dobra · 14 min read


An LLM without tools is like a brain without a body. It can think, reason, and generate text—but it can't do anything.

Tool use changes everything. Give an LLM access to tools, and suddenly it can search the web, query databases, execute code, send emails, and interact with any API. It transforms from a text generator into an autonomous agent.

This guide shows you how to implement tool use properly—from basic function calling to complex multi-tool orchestration.

What Is Tool Use?

Tool use is a pattern where an LLM decides when and how to call external functions to accomplish a task:

text
User Query
    │
    ▼
LLM: "I need to check the weather. I'll use the weather tool"
    │
    ▼
Tool Call: get_weather
Args: {"city": "London"}
    │
    ▼
Tool Result: "15°C, Cloudy"
    │
    ▼
LLM: "The weather in London is 15°C and cloudy."

The key insight: the LLM doesn't execute tools directly. It outputs a structured request (tool name + arguments), your code executes the tool, and you feed the result back to the LLM.
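Concretely, the structured request is just data. Here is a minimal sketch of the OpenAI-style shape (the ID and exact wire format are illustrative and vary by provider):

```python
import json

# Hypothetical tool call as it appears in a model response (OpenAI-style shape)
raw_tool_call = {
    "id": "call_abc123",  # illustrative ID
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": '{"city": "London"}'  # arguments arrive as a JSON string
    }
}

# Your code, not the model, parses the arguments and decides whether to run the function
args = json.loads(raw_tool_call["function"]["arguments"])
print(raw_tool_call["function"]["name"], args)  # get_weather {'city': 'London'}
```

The model only proposes this structure; execution is entirely up to your application code.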

Why Tools Matter

Without tools, LLMs are limited to:

  • Knowledge frozen at training time
  • No access to private data
  • Can't take actions in the world
  • Can only generate text

With tools, LLMs can:

  • Access real-time information
  • Query your databases
  • Execute code and analyze data
  • Send emails, create tickets, deploy code
  • Integrate with any API

Tools are what turn chat into action.

Basic Tool Implementation

OpenAI Function Calling

Here's the standard pattern with OpenAI:

python
import openai
import json

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g., 'London'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Tool implementations
def get_weather(city: str, units: str = "celsius") -> str:
    # In production, call a real weather API
    return f"Weather in {city}: 15°C, Cloudy"

def search_web(query: str) -> str:
    # In production, use a search API
    return f"Search results for '{query}': ..."

tool_functions = {
    "get_weather": get_weather,
    "search_web": search_web
}

# Main loop
def run_agent(user_message: str) -> str:
    client = openai.OpenAI()
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools
        )

        message = response.choices[0].message

        # Check if LLM wants to use tools
        if message.tool_calls:
            # Add assistant message with tool calls
            messages.append(message)

            # Execute each tool
            for tool_call in message.tool_calls:
                function_name = tool_call.function.name
                arguments = json.loads(tool_call.function.arguments)

                # Call the actual function
                result = tool_functions[function_name](**arguments)

                # Add tool result to messages
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })
        else:
            # No more tool calls, return final response
            return message.content


# Usage
response = run_agent("What's the weather in Tokyo and London?")
print(response)

Tool Design Principles

1. Clear, Specific Descriptions

The LLM decides which tool to use based on descriptions. Be precise:

python
# ❌ Bad: Vague description
{
    "name": "search",
    "description": "Search for things"
}

# ✅ Good: Specific description
{
    "name": "search_documentation",
    "description": "Search the official product documentation for API references, tutorials, and guides. Use for technical questions about how to use our product."
}

2. Constrained Parameters

Use enums and clear types to prevent errors:

python
# ❌ Bad: Open-ended parameter
{
    "name": "priority",
    "type": "string",
    "description": "Task priority"
}

# ✅ Good: Constrained parameter
{
    "name": "priority",
    "type": "string",
    "enum": ["low", "medium", "high", "critical"],
    "description": "Task priority level"
}

3. Atomic Operations

Each tool should do one thing well:

python
# ❌ Bad: Tool does too much
{
    "name": "manage_user",
    "description": "Create, update, delete, or fetch user"
}

# ✅ Good: Separate tools
{
    "name": "create_user",
    "description": "Create a new user account"
}
{
    "name": "get_user",
    "description": "Fetch user details by ID or email"
}
{
    "name": "update_user",
    "description": "Update user profile information"
}

4. Meaningful Return Values

Return structured, actionable data:

python
# ❌ Bad: Just a status
def create_task(title: str) -> str:
    # ... create task ...
    return "Task created"

# ✅ Good: Return useful information
def create_task(title: str) -> str:
    task = db.tasks.create(title=title)
    return json.dumps({
        "task_id": task.id,
        "title": task.title,
        "status": "created",
        "url": f"https://app.example.com/tasks/{task.id}"
    })

Code Execution as a Tool

The most powerful tool you can give an LLM is the ability to execute code. But it's also the most dangerous.

The Wrong Way (Never Do This)

python
# ⚠️ DANGEROUS: Never execute LLM-generated code directly
def run_code(code: str) -> str:
    exec(code)  # This can delete files, exfiltrate data, anything
    return "Done"

The Right Way: Sandboxed Execution

python
from hopx import Sandbox

def run_python_code(code: str) -> str:
    """Execute Python code in an isolated sandbox"""
    sandbox = Sandbox.create(template="code-interpreter")

    try:
        # Write code to sandbox
        sandbox.files.write("/app/script.py", code)

        # Execute in isolation
        result = sandbox.commands.run("python /app/script.py", timeout=30)

        if result.exit_code == 0:
            return result.stdout
        else:
            return f"Error: {result.stderr}"

    finally:
        sandbox.kill()  # Destroy sandbox completely


# Define as a tool
code_execution_tool = {
    "type": "function",
    "function": {
        "name": "run_python_code",
        "description": "Execute Python code to perform calculations, data analysis, or any programmatic task. Use this when you need to compute something precisely.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute. Must be complete and runnable."
                }
            },
            "required": ["code"]
        }
    }
}

The sandbox ensures:

  • Code can't access your host filesystem
  • Code can't make unauthorized network requests
  • Code can't persist beyond the execution
  • Resource limits prevent infinite loops

See Why AI Agents Need Isolated Code Execution for more on security.

Common Tool Categories

Information Retrieval

python
tools = [
    {
        "name": "search_web",
        "description": "Search the internet for current information"
    },
    {
        "name": "search_docs",
        "description": "Search internal documentation and knowledge base"
    },
    {
        "name": "get_url_content",
        "description": "Fetch and read content from a specific URL"
    },
    {
        "name": "query_database",
        "description": "Run a read-only SQL query against the database"
    }
]

Data Operations

python
tools = [
    {
        "name": "read_file",
        "description": "Read contents of a file"
    },
    {
        "name": "write_file",
        "description": "Write content to a file"
    },
    {
        "name": "analyze_csv",
        "description": "Load and analyze a CSV file using pandas"
    },
    {
        "name": "create_chart",
        "description": "Generate a chart from data"
    }
]

Communication

python
tools = [
    {
        "name": "send_email",
        "description": "Send an email to specified recipients"
    },
    {
        "name": "send_slack_message",
        "description": "Post a message to a Slack channel"
    },
    {
        "name": "create_ticket",
        "description": "Create a support ticket in the ticketing system"
    }
]

Actions

python
tools = [
    {
        "name": "run_python_code",
        "description": "Execute Python code in a sandbox"
    },
    {
        "name": "deploy_to_staging",
        "description": "Deploy the current branch to staging environment"
    },
    {
        "name": "run_tests",
        "description": "Run the test suite and return results"
    }
]

Advanced: Multi-Tool Orchestration

Real agents often need to use multiple tools in sequence:

python
import openai
import json
from hopx import Sandbox

class ToolOrchestrator:
    def __init__(self):
        self.client = openai.OpenAI()
        self.tools = self._define_tools()
        self.max_iterations = 10

    def run(self, task: str) -> str:
        messages = [
            {"role": "system", "content": self._system_prompt()},
            {"role": "user", "content": task}
        ]

        for _ in range(self.max_iterations):
            response = self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=self.tools
            )

            message = response.choices[0].message

            if not message.tool_calls:
                return message.content

            messages.append(message)

            # Execute all tool calls
            for tool_call in message.tool_calls:
                result = self._execute_tool(
                    tool_call.function.name,
                    json.loads(tool_call.function.arguments)
                )

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })

        return "Max iterations reached"

    def _execute_tool(self, name: str, args: dict) -> str:
        """Route to the appropriate tool implementation"""
        if name == "search_web":
            return self._search_web(args["query"])
        elif name == "run_python":
            return self._run_python(args["code"])
        elif name == "read_file":
            return self._read_file(args["path"])
        elif name == "write_file":
            return self._write_file(args["path"], args["content"])
        else:
            return f"Unknown tool: {name}"

    def _run_python(self, code: str) -> str:
        """Execute Python in a sandbox"""
        sandbox = Sandbox.create(template="code-interpreter")
        try:
            sandbox.files.write("/app/code.py", code)
            result = sandbox.commands.run("python /app/code.py", timeout=60)

            output = result.stdout if result.exit_code == 0 else f"Error: {result.stderr}"
            return output[:5000]  # Truncate long outputs
        finally:
            sandbox.kill()

    def _search_web(self, query: str) -> str:
        # Implement with your preferred search API
        return f"Search results for: {query}"

    def _read_file(self, path: str) -> str:
        sandbox = Sandbox.create(template="code-interpreter")
        try:
            content = sandbox.files.read(path)
            return content[:10000]
        except Exception:
            return f"File not found: {path}"
        finally:
            sandbox.kill()

    def _write_file(self, path: str, content: str) -> str:
        sandbox = Sandbox.create(template="code-interpreter")
        try:
            sandbox.files.write(path, content)
            return f"Successfully wrote to {path}"
        finally:
            sandbox.kill()

    def _define_tools(self) -> list:
        return [
            {
                "type": "function",
                "function": {
                    "name": "search_web",
                    "description": "Search the web for current information",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {"type": "string", "description": "Search query"}
                        },
                        "required": ["query"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "run_python",
                    "description": "Execute Python code for calculations, data analysis, or any programmatic task",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "code": {"type": "string", "description": "Complete Python code to execute"}
                        },
                        "required": ["code"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "read_file",
                    "description": "Read the contents of a file",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "path": {"type": "string", "description": "Path to the file"}
                        },
                        "required": ["path"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "write_file",
                    "description": "Write content to a file",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "path": {"type": "string", "description": "Path to the file"},
                            "content": {"type": "string", "description": "Content to write"}
                        },
                        "required": ["path", "content"]
                    }
                }
            }
        ]

    def _system_prompt(self) -> str:
        return """You are a helpful AI assistant with access to tools.

Use tools when needed to complete tasks. You can:
- Search the web for current information
- Execute Python code for calculations and data analysis
- Read and write files

Think step by step. Use the most appropriate tool for each sub-task.
When you have enough information to answer, provide a clear response."""


# Usage
orchestrator = ToolOrchestrator()
result = orchestrator.run(
    "Find the current Bitcoin price, calculate what 0.5 BTC would be worth, "
    "and save the result to a file called 'btc_value.txt'"
)
print(result)

Parallel Tool Execution

When tools are independent, run them in parallel:

python
import asyncio
import json
import openai

async def execute_tools_parallel(tool_calls: list) -> list:
    """Execute multiple tool calls concurrently"""

    async def execute_single(tool_call):
        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)

        # Run in a thread pool to avoid blocking the event loop
        loop = asyncio.get_event_loop()
        result = await loop.run_in_executor(
            None,
            lambda: tool_functions[name](**args)
        )

        return {
            "tool_call_id": tool_call.id,
            "content": result
        }

    # Execute all tools concurrently
    results = await asyncio.gather(*[
        execute_single(tc) for tc in tool_calls
    ])

    return results


# In the main loop
if message.tool_calls:
    results = asyncio.run(execute_tools_parallel(message.tool_calls))
    for result in results:
        messages.append({"role": "tool", **result})

Error Handling

Tools fail. Handle it gracefully:

python
import json
import signal

def execute_tool_safely(name: str, args: dict) -> str:
    """Execute a tool with proper error handling"""

    try:
        # Validate that the tool exists
        if name not in tool_functions:
            return json.dumps({
                "error": f"Unknown tool: {name}",
                "available_tools": list(tool_functions.keys())
            })

        # Execute with a timeout (SIGALRM is Unix-only and works in the main thread)
        def timeout_handler(signum, frame):
            raise TimeoutError("Tool execution timed out")

        signal.signal(signal.SIGALRM, timeout_handler)
        signal.alarm(30)  # 30 second timeout

        try:
            result = tool_functions[name](**args)
        finally:
            signal.alarm(0)  # Cancel timeout

        return result

    except TimeoutError:
        return json.dumps({
            "error": "Tool execution timed out",
            "tool": name,
            "suggestion": "Try a simpler query or break into smaller steps"
        })

    except TypeError as e:
        return json.dumps({
            "error": f"Invalid arguments: {str(e)}",
            "tool": name,
            "received_args": args
        })

    except Exception as e:
        return json.dumps({
            "error": f"Tool execution failed: {str(e)}",
            "tool": name,
            "error_type": type(e).__name__
        })

Tool Use Patterns

Pattern 1: Retrieval-Augmented Generation (RAG)

Search first, then answer:

python
def rag_answer(question: str) -> str:
    # Step 1: Search for relevant information
    search_results = search_knowledge_base(question)

    # Step 2: Generate answer using retrieved context
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Use this context to answer: {search_results}"},
            {"role": "user", "content": question}
        ]
    )

    return response.choices[0].message.content

Pattern 2: Verification Loop

Use tools to verify LLM outputs:

python
def verified_answer(question: str) -> str:
    # Generate initial answer
    answer = generate_answer(question)

    # Verify with tools
    verification = run_python(f"""
# Verify the claim: {answer}
# Check against authoritative sources
result = verify_claim("{answer}")
print(result)
""")

    if "verified" in verification.lower():
        return answer
    else:
        # Regenerate with verification feedback
        return generate_answer(f"{question}\n\nNote: {verification}")

Pattern 3: Progressive Disclosure

Start with cheap tools, escalate as needed:

python
# check_cache, search_local_docs, is_sufficient, search_web, and cache_result
# are placeholders for your own implementations
def progressive_search(query: str) -> str:
    # Level 1: Check cache (free, instant)
    cached = check_cache(query)
    if cached:
        return cached

    # Level 2: Search local docs (cheap, fast)
    local = search_local_docs(query)
    if is_sufficient(local):
        return local

    # Level 3: Search web (expensive, slow)
    web = search_web(query)
    cache_result(query, web)
    return web

Security Best Practices

1. Allowlist Tools Per Use Case

python
# Different tool sets for different contexts
CUSTOMER_SUPPORT_TOOLS = ["search_faq", "create_ticket", "get_order_status"]
ADMIN_TOOLS = ["run_sql", "modify_user", "deploy_code"]

def get_tools_for_user(user_role: str) -> list:
    if user_role == "admin":
        return ADMIN_TOOLS
    else:
        return CUSTOMER_SUPPORT_TOOLS

2. Validate All Inputs

python
def run_sql(query: str) -> str:
    # Validate: read-only queries only
    # (a naive keyword check; also enforce a read-only database role in production)
    if any(word in query.upper() for word in ["INSERT", "UPDATE", "DELETE", "DROP"]):
        return "Error: Only SELECT queries are allowed"

    # Validate: no system tables
    if "information_schema" in query.lower():
        return "Error: System table access not allowed"

    # Execute
    return execute_query(query)

3. Rate Limit Tool Calls

python
from collections import defaultdict
import time

tool_calls = defaultdict(list)

def rate_limited_execute(user_id: str, tool_name: str, args: dict) -> str:
    now = time.time()

    # Keep only calls from the last 60 seconds (prune so the list doesn't grow forever)
    recent_calls = [t for t in tool_calls[user_id] if now - t < 60]
    tool_calls[user_id] = recent_calls

    if len(recent_calls) >= 10:
        return "Error: Rate limit exceeded. Try again in a minute."

    tool_calls[user_id].append(now)
    return execute_tool(tool_name, args)

4. Audit All Tool Usage

python
import logging

def audited_execute(user_id: str, tool_name: str, args: dict) -> str:
    logging.info(f"TOOL_CALL | user={user_id} | tool={tool_name} | args={args}")

    result = execute_tool(tool_name, args)

    logging.info(f"TOOL_RESULT | user={user_id} | tool={tool_name} | result_length={len(result)}")

    return result

Measuring Tool Effectiveness

Track these metrics:

python
from dataclasses import dataclass

@dataclass
class ToolMetrics:
    tool_name: str
    call_count: int
    success_rate: float
    avg_latency_ms: float
    error_types: dict

def analyze_tool_usage(logs: list) -> dict:
    metrics = {}

    for tool_name in set(log["tool"] for log in logs):
        tool_logs = [l for l in logs if l["tool"] == tool_name]

        metrics[tool_name] = ToolMetrics(
            tool_name=tool_name,
            call_count=len(tool_logs),
            success_rate=sum(1 for l in tool_logs if l["success"]) / len(tool_logs),
            avg_latency_ms=sum(l["latency"] for l in tool_logs) / len(tool_logs),
            error_types=count_errors(tool_logs)  # helper that tallies error types (not shown)
        )

    return metrics

Key questions:

  • Which tools are used most?
  • Which tools fail most often?
  • Are there tools the LLM never uses? (Remove or improve descriptions)
  • Are there missing tools? (Check for failed attempts)
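These questions can be turned into automated checks. A small sketch (the `review_tools` helper and the 90% success threshold are illustrative, using a trimmed-down metrics record):

```python
from dataclasses import dataclass

@dataclass
class ToolMetrics:
    tool_name: str
    call_count: int
    success_rate: float

def review_tools(defined_tools: list, metrics: dict) -> dict:
    """Flag tools that are never called or fail too often (threshold is an example)."""
    unused = [name for name in defined_tools if name not in metrics]
    flaky = [m.tool_name for m in metrics.values() if m.success_rate < 0.9]
    return {"unused": unused, "flaky": flaky}

# Hypothetical usage data
metrics = {
    "search_web": ToolMetrics("search_web", 120, 0.98),
    "run_python": ToolMetrics("run_python", 40, 0.75),
}
report = review_tools(["search_web", "run_python", "create_chart"], metrics)
print(report)  # {'unused': ['create_chart'], 'flaky': ['run_python']}
```

Unused tools are candidates for removal or a better description; flaky ones need fixes before the agent can rely on them.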

Conclusion

Tool use is what transforms LLMs from text generators into agents that can act in the world:

  • Define clear tools with specific descriptions
  • Sandbox code execution — never run LLM code directly
  • Handle errors gracefully — tools fail, plan for it
  • Secure by default — allowlist, validate, rate limit, audit

Start with 2-3 essential tools. Add more only when you see the need. A focused agent with good tools beats a confused agent with many.


Ready to add secure code execution to your agent's toolkit? Get started with HopX — sandboxes that spin up in 100ms.
