Tool Use: How AI Agents Interact with the Real World
An LLM without tools is like a brain without a body. It can think, reason, and generate text—but it can't do anything.
Tool use changes everything. Give an LLM access to tools, and suddenly it can search the web, query databases, execute code, send emails, and interact with any API. It transforms from a text generator into an autonomous agent.
This guide shows you how to implement tool use properly—from basic function calling to complex multi-tool orchestration.
What Is Tool Use?
Tool use is a pattern where an LLM decides when and how to call external functions to accomplish a task:
```
┌─────────────────────────────────────────────────────────────┐
│                         User Query                          │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                            LLM                              │
│                                                             │
│  "I need to check the weather. I'll use the weather tool"   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
               ┌───────────────────────────────┐
               │    Tool Call: get_weather     │
               │    Args: {"city": "London"}   │
               └───────────────────────────────┘
                              │
                              ▼
               ┌───────────────────────────────┐
               │  Tool Result: "15°C, Cloudy"  │
               └───────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                            LLM                              │
│                                                             │
│      "The weather in London is 15°C and cloudy."            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
The key insight: the LLM doesn't execute tools directly. It outputs a structured request (tool name + arguments), your code executes the tool, and you feed the result back to the LLM.
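Concretely, here is roughly what that structured request looks like in OpenAI's Chat Completions format (the `id` value below is illustrative). Note that the arguments arrive as a JSON string that your code must parse:

```python
import json

# An assistant turn requesting a tool call (shape per OpenAI's
# Chat Completions API; the id value here is made up)
assistant_turn = {
    "role": "assistant",
    "content": None,  # no prose: the model is asking for a tool instead
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": '{"city": "London"}'  # a JSON *string*, not an object
        }
    }]
}

# Your code parses the arguments and runs the real function
args = json.loads(assistant_turn["tool_calls"][0]["function"]["arguments"])
print(args["city"])  # → London
```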
Why Tools Matter
Without tools, LLMs are constrained:
- Knowledge is frozen at training time
- No access to private data
- No way to take actions in the world
- Output is limited to text
With tools, LLMs can:
- Access real-time information
- Query your databases
- Execute code and analyze data
- Send emails, create tickets, deploy code
- Integrate with any API
Tools are what turn chat into action.
Basic Tool Implementation
OpenAI Function Calling
Here's the standard pattern with OpenAI:
```python
import openai
import json

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g., 'London'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Tool implementations
def get_weather(city: str, units: str = "celsius") -> str:
    # In production, call a real weather API
    return f"Weather in {city}: 15°C, Cloudy"

def search_web(query: str) -> str:
    # In production, use a search API
    return f"Search results for '{query}': ..."

tool_functions = {
    "get_weather": get_weather,
    "search_web": search_web
}

# Main loop
def run_agent(user_message: str) -> str:
    client = openai.OpenAI()
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools
        )

        message = response.choices[0].message

        # Check if LLM wants to use tools
        if message.tool_calls:
            # Add assistant message with tool calls
            messages.append(message)

            # Execute each tool
            for tool_call in message.tool_calls:
                function_name = tool_call.function.name
                arguments = json.loads(tool_call.function.arguments)

                # Call the actual function
                result = tool_functions[function_name](**arguments)

                # Add tool result to messages
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })
        else:
            # No more tool calls, return final response
            return message.content


# Usage
response = run_agent("What's the weather in Tokyo and London?")
print(response)
```
Tool Design Principles
1. Clear, Specific Descriptions
The LLM decides which tool to use based on descriptions. Be precise:
```python
# ❌ Bad: Vague description
{
    "name": "search",
    "description": "Search for things"
}

# ✅ Good: Specific description
{
    "name": "search_documentation",
    "description": "Search the official product documentation for API references, tutorials, and guides. Use for technical questions about how to use our product."
}
```
2. Constrained Parameters
Use enums and clear types to prevent errors:
```python
# ❌ Bad: Open-ended parameter
{
    "name": "priority",
    "type": "string",
    "description": "Task priority"
}

# ✅ Good: Constrained parameter
{
    "name": "priority",
    "type": "string",
    "enum": ["low", "medium", "high", "critical"],
    "description": "Task priority level"
}
```
3. Atomic Operations
Each tool should do one thing well:
```python
# ❌ Bad: Tool does too much
{
    "name": "manage_user",
    "description": "Create, update, delete, or fetch user"
}

# ✅ Good: Separate tools
{
    "name": "create_user",
    "description": "Create a new user account"
}
{
    "name": "get_user",
    "description": "Fetch user details by ID or email"
}
{
    "name": "update_user",
    "description": "Update user profile information"
}
```
4. Meaningful Return Values
Return structured, actionable data:
```python
# ❌ Bad: Just a status
def create_task(title: str) -> str:
    # ... create task ...
    return "Task created"

# ✅ Good: Return useful information
def create_task(title: str) -> str:
    task = db.tasks.create(title=title)
    return json.dumps({
        "task_id": task.id,
        "title": task.title,
        "status": "created",
        "url": f"https://app.example.com/tasks/{task.id}"
    })
```
Code Execution as a Tool
The most powerful tool you can give an LLM is the ability to execute code. But it's also the most dangerous.
The Wrong Way (Never Do This)
```python
# ⚠️ DANGEROUS: Never execute LLM-generated code directly
def run_code(code: str) -> str:
    exec(code)  # This can delete files, exfiltrate data, anything
    return "Done"
```
The Right Way: Sandboxed Execution
```python
from hopx import Sandbox

def run_python_code(code: str) -> str:
    """Execute Python code in an isolated sandbox"""
    sandbox = Sandbox.create(template="code-interpreter")

    try:
        # Write code to sandbox
        sandbox.files.write("/app/script.py", code)

        # Execute in isolation
        result = sandbox.commands.run("python /app/script.py", timeout=30)

        if result.exit_code == 0:
            return result.stdout
        else:
            return f"Error: {result.stderr}"

    finally:
        sandbox.kill()  # Destroy sandbox completely


# Define as a tool
code_execution_tool = {
    "type": "function",
    "function": {
        "name": "run_python_code",
        "description": "Execute Python code to perform calculations, data analysis, or any programmatic task. Use this when you need to compute something precisely.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute. Must be complete and runnable."
                }
            },
            "required": ["code"]
        }
    }
}
```
The sandbox ensures:
- Code can't access your host filesystem
- Code can't make unauthorized network requests
- Code can't persist beyond the execution
- Resource limits prevent infinite loops
See Why AI Agents Need Isolated Code Execution for more on security.
Common Tool Categories
Information Retrieval
```python
tools = [
    {
        "name": "search_web",
        "description": "Search the internet for current information"
    },
    {
        "name": "search_docs",
        "description": "Search internal documentation and knowledge base"
    },
    {
        "name": "get_url_content",
        "description": "Fetch and read content from a specific URL"
    },
    {
        "name": "query_database",
        "description": "Run a read-only SQL query against the database"
    }
]
```
Data Operations
```python
tools = [
    {
        "name": "read_file",
        "description": "Read contents of a file"
    },
    {
        "name": "write_file",
        "description": "Write content to a file"
    },
    {
        "name": "analyze_csv",
        "description": "Load and analyze a CSV file using pandas"
    },
    {
        "name": "create_chart",
        "description": "Generate a chart from data"
    }
]
```
Communication
```python
tools = [
    {
        "name": "send_email",
        "description": "Send an email to specified recipients"
    },
    {
        "name": "send_slack_message",
        "description": "Post a message to a Slack channel"
    },
    {
        "name": "create_ticket",
        "description": "Create a support ticket in the ticketing system"
    }
]
```
Actions
```python
tools = [
    {
        "name": "run_python_code",
        "description": "Execute Python code in a sandbox"
    },
    {
        "name": "deploy_to_staging",
        "description": "Deploy the current branch to staging environment"
    },
    {
        "name": "run_tests",
        "description": "Run the test suite and return results"
    }
]
```
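The category entries above are abbreviated to a name and description; the API expects the full function schema shown earlier, with a `parameters` object. A small helper can keep that boilerplate out of the way (the `make_tool` name is ours, not part of any SDK; the `send_email` parameters are illustrative):

```python
def make_tool(name: str, description: str, params: dict, required: list) -> dict:
    """Wrap a name/description/parameter map in the OpenAI tool-schema shape."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": params,
                "required": required,
            },
        },
    }

# Example: expand one abbreviated entry into a full schema
tool = make_tool(
    "send_email",
    "Send an email to specified recipients",
    {
        "to": {"type": "string", "description": "Recipient address"},
        "subject": {"type": "string", "description": "Subject line"},
        "body": {"type": "string", "description": "Message body"},
    },
    ["to", "subject", "body"],
)
print(tool["function"]["name"])  # → send_email
```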
Advanced: Multi-Tool Orchestration
Real agents often need to use multiple tools in sequence:
```python
import openai
import json
from hopx import Sandbox

class ToolOrchestrator:
    def __init__(self):
        self.client = openai.OpenAI()
        self.tools = self._define_tools()
        self.max_iterations = 10
        # One sandbox shared across a run, so files written by one
        # tool call are visible to later calls
        self._sandbox = None

    def _get_sandbox(self) -> Sandbox:
        if self._sandbox is None:
            self._sandbox = Sandbox.create(template="code-interpreter")
        return self._sandbox

    def run(self, task: str) -> str:
        messages = [
            {"role": "system", "content": self._system_prompt()},
            {"role": "user", "content": task}
        ]

        try:
            for _ in range(self.max_iterations):
                response = self.client.chat.completions.create(
                    model="gpt-4o",
                    messages=messages,
                    tools=self.tools
                )

                message = response.choices[0].message

                if not message.tool_calls:
                    return message.content

                messages.append(message)

                # Execute all tool calls
                for tool_call in message.tool_calls:
                    result = self._execute_tool(
                        tool_call.function.name,
                        json.loads(tool_call.function.arguments)
                    )

                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result
                    })

            return "Max iterations reached"

        finally:
            # Destroy the sandbox when the run ends, success or not
            if self._sandbox is not None:
                self._sandbox.kill()
                self._sandbox = None

    def _execute_tool(self, name: str, args: dict) -> str:
        """Route to the appropriate tool implementation"""

        if name == "search_web":
            return self._search_web(args["query"])
        elif name == "run_python":
            return self._run_python(args["code"])
        elif name == "read_file":
            return self._read_file(args["path"])
        elif name == "write_file":
            return self._write_file(args["path"], args["content"])
        else:
            return f"Unknown tool: {name}"

    def _run_python(self, code: str) -> str:
        """Execute Python in the shared sandbox"""
        sandbox = self._get_sandbox()
        sandbox.files.write("/app/code.py", code)
        result = sandbox.commands.run("python /app/code.py", timeout=60)

        output = result.stdout if result.exit_code == 0 else f"Error: {result.stderr}"
        return output[:5000]  # Truncate long outputs

    def _search_web(self, query: str) -> str:
        # Implement with your preferred search API
        return f"Search results for: {query}"

    def _read_file(self, path: str) -> str:
        try:
            content = self._get_sandbox().files.read(path)
            return content[:10000]
        except Exception:
            return f"File not found: {path}"

    def _write_file(self, path: str, content: str) -> str:
        self._get_sandbox().files.write(path, content)
        return f"Successfully wrote to {path}"

    def _define_tools(self) -> list:
        return [
            {
                "type": "function",
                "function": {
                    "name": "search_web",
                    "description": "Search the web for current information",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {"type": "string", "description": "Search query"}
                        },
                        "required": ["query"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "run_python",
                    "description": "Execute Python code for calculations, data analysis, or any programmatic task",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "code": {"type": "string", "description": "Complete Python code to execute"}
                        },
                        "required": ["code"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "read_file",
                    "description": "Read the contents of a file",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "path": {"type": "string", "description": "Path to the file"}
                        },
                        "required": ["path"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "write_file",
                    "description": "Write content to a file",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "path": {"type": "string", "description": "Path to the file"},
                            "content": {"type": "string", "description": "Content to write"}
                        },
                        "required": ["path", "content"]
                    }
                }
            }
        ]

    def _system_prompt(self) -> str:
        return """You are a helpful AI assistant with access to tools.

Use tools when needed to complete tasks. You can:
- Search the web for current information
- Execute Python code for calculations and data analysis
- Read and write files

Think step by step. Use the most appropriate tool for each sub-task.
When you have enough information to answer, provide a clear response."""


# Usage
orchestrator = ToolOrchestrator()
result = orchestrator.run(
    "Find the current Bitcoin price, calculate what 0.5 BTC would be worth, "
    "and save the result to a file called 'btc_value.txt'"
)
print(result)
```
Parallel Tool Execution
When tools are independent, run them in parallel:
```python
import asyncio
import json

async def execute_tools_parallel(tool_calls: list) -> list:
    """Execute multiple tool calls concurrently"""

    async def execute_single(tool_call):
        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)

        # Run in a thread pool so blocking tool code doesn't stall the event loop
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(
            None,
            lambda: tool_functions[name](**args)
        )

        return {
            "tool_call_id": tool_call.id,
            "content": result
        }

    # Execute all tools concurrently
    return await asyncio.gather(*[
        execute_single(tc) for tc in tool_calls
    ])


# In the main loop
if message.tool_calls:
    results = asyncio.run(execute_tools_parallel(message.tool_calls))
    for result in results:
        messages.append({"role": "tool", **result})
```
Error Handling
Tools fail. Handle it gracefully:
```python
def execute_tool_safely(name: str, args: dict) -> str:
    """Execute a tool with proper error handling"""

    try:
        # Validate tool exists
        if name not in tool_functions:
            return json.dumps({
                "error": f"Unknown tool: {name}",
                "available_tools": list(tool_functions.keys())
            })

        # Execute with a timeout (SIGALRM is Unix-only and main-thread-only)
        import signal

        def timeout_handler(signum, frame):
            raise TimeoutError("Tool execution timed out")

        signal.signal(signal.SIGALRM, timeout_handler)
        signal.alarm(30)  # 30 second timeout

        try:
            result = tool_functions[name](**args)
        finally:
            signal.alarm(0)  # Cancel timeout

        return result

    except TimeoutError:
        return json.dumps({
            "error": "Tool execution timed out",
            "tool": name,
            "suggestion": "Try a simpler query or break into smaller steps"
        })

    except TypeError as e:
        return json.dumps({
            "error": f"Invalid arguments: {str(e)}",
            "tool": name,
            "received_args": args
        })

    except Exception as e:
        return json.dumps({
            "error": f"Tool execution failed: {str(e)}",
            "tool": name,
            "error_type": type(e).__name__
        })
```
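One caveat: `signal.SIGALRM` exists only on Unix and only works in the main thread. A portable sketch of the same timeout uses a worker thread instead. Note that a timed-out thread is abandoned, not killed; true cancellation requires process isolation, such as a sandbox:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def execute_with_timeout(func, args: dict, timeout_s: float = 30.0) -> str:
    """Run a tool function in a worker thread and bound how long we wait."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(func, **args)
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker keeps running in the background; we just stop waiting
        return '{"error": "Tool execution timed out"}'
    finally:
        pool.shutdown(wait=False)

# Demo with a hypothetical slow tool
def slow_tool() -> str:
    time.sleep(1)
    return "finished"

print(execute_with_timeout(slow_tool, {}, timeout_s=5.0))   # → finished
print(execute_with_timeout(slow_tool, {}, timeout_s=0.05))  # → {"error": "Tool execution timed out"}
```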
Tool Use Patterns
Pattern 1: Retrieval-Augmented Generation (RAG)
Search first, then answer:
```python
def rag_answer(question: str) -> str:
    # Step 1: Search for relevant information
    search_results = search_knowledge_base(question)

    # Step 2: Generate answer using retrieved context
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Use this context to answer: {search_results}"},
            {"role": "user", "content": question}
        ]
    )

    return response.choices[0].message.content
```
Pattern 2: Verification Loop
Use tools to verify LLM outputs:
```python
def verified_answer(question: str) -> str:
    # Generate initial answer
    answer = generate_answer(question)

    # Verify with tools
    verification = run_python(f"""
# Verify the claim: {answer}
# Check against authoritative sources
result = verify_claim("{answer}")
print(result)
""")

    if "verified" in verification.lower():
        return answer
    else:
        # Regenerate with verification feedback
        return generate_answer(f"{question}\n\nNote: {verification}")
```
Pattern 3: Progressive Disclosure
Start with cheap tools, escalate as needed:
```python
def progressive_search(query: str) -> str:
    # Level 1: Check cache (free, instant)
    cached = check_cache(query)
    if cached:
        return cached

    # Level 2: Search local docs (cheap, fast)
    local = search_local_docs(query)
    if is_sufficient(local):
        return local

    # Level 3: Search web (expensive, slow)
    web = search_web(query)
    cache_result(query, web)
    return web
```
Security Best Practices
1. Allowlist Tools Per Use Case
```python
# Different tool sets for different contexts
CUSTOMER_SUPPORT_TOOLS = ["search_faq", "create_ticket", "get_order_status"]
ADMIN_TOOLS = ["run_sql", "modify_user", "deploy_code"]

def get_tools_for_user(user_role: str) -> list:
    if user_role == "admin":
        return ADMIN_TOOLS
    else:
        return CUSTOMER_SUPPORT_TOOLS
```
2. Validate All Inputs
```python
def run_sql(query: str) -> str:
    # Validate: read-only queries only (a keyword blocklist is only a
    # backstop; the primary control should be a read-only database role)
    if any(word in query.upper() for word in ["INSERT", "UPDATE", "DELETE", "DROP"]):
        return "Error: Only SELECT queries are allowed"

    # Validate: No system tables
    if "information_schema" in query.lower():
        return "Error: System table access not allowed"

    # Execute
    return execute_query(query)
```
3. Rate Limit Tool Calls
```python
from collections import defaultdict
import time

tool_calls = defaultdict(list)

def rate_limited_execute(user_id: str, tool_name: str, args: dict) -> str:
    now = time.time()
    # Keep only calls from the last 60 seconds (also prunes stale entries)
    tool_calls[user_id] = [t for t in tool_calls[user_id] if now - t < 60]

    if len(tool_calls[user_id]) >= 10:
        return "Error: Rate limit exceeded. Try again in a minute."

    tool_calls[user_id].append(now)
    return execute_tool(tool_name, args)
```
4. Audit All Tool Usage
```python
import logging

def audited_execute(user_id: str, tool_name: str, args: dict) -> str:
    logging.info(f"TOOL_CALL | user={user_id} | tool={tool_name} | args={args}")

    result = execute_tool(tool_name, args)

    logging.info(f"TOOL_RESULT | user={user_id} | tool={tool_name} | result_length={len(result)}")

    return result
```
Measuring Tool Effectiveness
Track these metrics:
```python
from dataclasses import dataclass

@dataclass
class ToolMetrics:
    tool_name: str
    call_count: int
    success_rate: float
    avg_latency_ms: float
    error_types: dict

def analyze_tool_usage(logs: list) -> dict:
    metrics = {}

    for tool_name in set(log["tool"] for log in logs):
        tool_logs = [l for l in logs if l["tool"] == tool_name]

        metrics[tool_name] = ToolMetrics(
            tool_name=tool_name,
            call_count=len(tool_logs),
            success_rate=sum(1 for l in tool_logs if l["success"]) / len(tool_logs),
            avg_latency_ms=sum(l["latency"] for l in tool_logs) / len(tool_logs),
            error_types=count_errors(tool_logs)
        )

    return metrics
```
Key questions:
- Which tools are used most?
- Which tools fail most often?
- Are there tools the LLM never uses? (Remove or improve descriptions)
- Are there missing tools? (Check for failed attempts)
Conclusion
Tool use is what transforms LLMs from text generators into agents that can act in the world:
- Define clear tools with specific descriptions
- Sandbox code execution — never run LLM code directly
- Handle errors gracefully — tools fail, plan for it
- Secure by default — allowlist, validate, rate limit, audit
Start with 2-3 essential tools. Add more only when you see the need. A focused agent with good tools beats a confused agent with many.
Ready to add secure code execution to your agent's toolkit? Get started with HopX — sandboxes that spin up in 100ms.
Further Reading
- What Is an AI Agent? — The fundamentals of agentic systems
- Prompt Chaining — Combine tools in sequential workflows
- The Reflection Pattern — Use tools to verify and improve outputs
- Why AI Agents Need Isolated Code Execution — Security for code execution tools
- OpenAI Function Calling Guide — Official documentation