Desktop Automation with HopX: Browser Testing & RPA
Browser automation and RPA (Robotic Process Automation) are powerful tools for testing, web scraping, and automating repetitive tasks. But running browsers locally creates problems: resource usage, security risks, and scalability limitations.
HopX sandboxes provide isolated environments with full desktop capabilities, including browsers with GPU acceleration. This guide shows you how to run browser automation at scale.
Why Cloud-Based Browser Automation?
Local automation challenges:
- Browsers consume significant RAM and CPU
- Parallel execution requires expensive hardware
- Security risks from executing untrusted code
- Difficult to scale beyond a single machine
- Environment inconsistency across machines
HopX sandbox advantages:
- Each sandbox runs in an isolated micro-VM
- Full browser support with virtual display
- Scale to hundreds of parallel sessions
- Consistent, reproducible environments
- No browser workload on your local machine
Setting Up Browser Automation
HopX provides a pre-configured desktop template with browsers and automation tools installed:
```python
from hopx import Sandbox

# Create a sandbox with desktop capabilities
sandbox = Sandbox.create(template="desktop")

# Verify the browser installation
result = sandbox.commands.run("chromium --version")
print(result.stdout)  # e.g. Chromium 120.0.6099.71
```
The desktop template includes:
- Chromium - Full browser with DevTools
- Firefox - Alternative browser engine
- Virtual Display (Xvfb) - Headless display server
- Playwright - Modern automation framework
- Selenium - Traditional WebDriver
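To fail fast when the template's browser is missing or older than your scripts expect, you can parse the version string that `chromium --version` prints. This is a minimal sketch that assumes output shaped like `Chromium 120.0.6099.71`. `parse_chromium_version`, `check_min_version`, and the minimum version are hypothetical choices, not part of the HopX SDK.

```python
import re


def parse_chromium_version(output: str) -> tuple[int, ...]:
    """Extract the dotted version number from `chromium --version` output."""
    match = re.search(r"(\d+(?:\.\d+)+)", output)
    if match is None:
        raise ValueError(f"No version number found in: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def check_min_version(output: str, minimum: tuple[int, ...]) -> bool:
    """Return True if the reported version is at least `minimum`."""
    # Tuples compare element by element, so (120, 0, ...) >= (115,) is True
    return parse_chromium_version(output) >= minimum


# Usage with the output shown above and a hypothetical minimum of 115
print(check_min_version("Chromium 120.0.6099.71", (115,)))
```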
Playwright Automation Examples
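Playwright is the recommended tool for modern browser automation. It drives multiple browser engines through one API, and its actions automatically wait for elements to be ready, which makes scripts less prone to timing-related flakiness.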
Basic Navigation and Screenshots
```python
from hopx import Sandbox

sandbox = Sandbox.create(template="desktop")

# Write the Playwright script that will run inside the sandbox
playwright_script = '''
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        # Navigate and take a screenshot
        await page.goto('https://hopx.ai')
        await page.screenshot(path='/tmp/homepage.png')

        # Get the page title
        title = await page.title()
        print(f"Page title: {title}")

        await browser.close()

asyncio.run(main())
'''

sandbox.files.write("/app/scrape.py", playwright_script)
result = sandbox.commands.run("cd /app && python scrape.py")
print(result.stdout)

# Download the screenshot to the local machine
screenshot = sandbox.files.read("/tmp/homepage.png")
with open("homepage.png", "wb") as f:
    f.write(screenshot)
```
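If the script fails, the screenshot path may be empty or missing, so it can help to confirm the downloaded data really is a PNG before saving it. This minimal sketch checks the standard 8-byte PNG signature. It assumes `sandbox.files.read` returns `bytes` for binary files, which you should verify against the HopX SDK. `save_png` is a hypothetical helper.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # First 8 bytes of every PNG file


def save_png(data: bytes, path: str) -> int:
    """Check that `data` is a PNG, write it to `path`, and return bytes written."""
    if not isinstance(data, (bytes, bytearray)):
        # Text here means the file was decoded on the way back
        raise TypeError(f"Expected bytes, got {type(data).__name__}")
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("Downloaded data is not a PNG; check the script's output")
    with open(path, "wb") as f:
        return f.write(data)
```

With the example above, this would be called as `save_png(sandbox.files.read("/tmp/homepage.png"), "homepage.png")`.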
Form Automation and Data Extraction
```python
from hopx import Sandbox

sandbox = Sandbox.create(template="desktop")

form_script = '''
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        # Navigate to a form
        await page.goto('https://example.com/contact')

        # Fill form fields
        await page.fill('#name', 'John Doe')
        await page.fill('#email', 'john@example.com')
        await page.fill('#message', 'Hello from HopX!')

        # Submit and wait for the response
        await page.click('button[type="submit"]')
        await page.wait_for_selector('.success-message')

        # Extract the confirmation
        confirmation = await page.text_content('.success-message')
        print(f"Result: {confirmation}")

        await browser.close()

asyncio.run(main())
'''

sandbox.files.write("/app/form.py", form_script)
result = sandbox.commands.run("cd /app && python form.py")
print(result.stdout)
```
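Hard-coded values make the form script awkward to reuse. One option is to render the script from a Python dict and embed the data as a JSON string literal, so quotes or newlines in the values cannot break the generated code. This is a minimal sketch. `render_form_script`, `SCRIPT_TEMPLATE`, and the selector-to-value mapping are hypothetical.

```python
import json

# Generated script; {url} and {fields_json} are filled in by str.format
SCRIPT_TEMPLATE = '''
import asyncio
import json
from playwright.async_api import async_playwright

FIELDS = json.loads({fields_json})

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto({url})
        # Fill every selector -> value pair from the embedded mapping
        for selector, value in FIELDS.items():
            await page.fill(selector, value)
        await page.click('button[type="submit"]')
        await browser.close()

asyncio.run(main())
'''


def render_form_script(url: str, fields: dict[str, str]) -> str:
    """Return Playwright source with `url` and `fields` safely embedded."""
    # repr() turns each value into a valid Python string literal
    return SCRIPT_TEMPLATE.format(
        url=repr(url),
        fields_json=repr(json.dumps(fields)),
    )
```

The rendered string would then be written with `sandbox.files.write("/app/form.py", ...)` exactly as in the example above.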
Parallel Browser Sessions
One of the biggest advantages of HopX is running multiple browsers in parallel:
```python
from hopx import Sandbox
import concurrent.futures

def scrape_url(url):
    """Scrape a single URL in its own sandbox."""
    sandbox = Sandbox.create(template="desktop")
    try:
        # {url!r} embeds the URL as a quoted string literal;
        # doubled braces stay literal braces in the generated script
        script = f'''
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        await page.goto({url!r})
        title = await page.title()
        content = await page.content()

        print(f"Title: {{title}}")
        print(f"Length: {{len(content)}} chars")

        await browser.close()

asyncio.run(main())
'''
        sandbox.files.write("/app/scrape.py", script)
        result = sandbox.commands.run("cd /app && python scrape.py")
        return result.stdout
    finally:
        sandbox.kill()  # Clean up even if the script fails

# Scrape multiple URLs in parallel
urls = [
    "https://news.ycombinator.com",
    "https://github.com/trending",
    "https://reddit.com/r/programming",
    "https://dev.to",
    "https://lobste.rs",
]

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(scrape_url, urls))

for url, result in zip(urls, results):
    print(f"\n{url}:")
    print(result)
```
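`executor.map` re-raises the first exception when you iterate its results, so a single failed sandbox stops the loop and hides the other results. The sketch below records each URL's output or error separately, using only the standard library. `run_all` is a hypothetical helper, and `worker` stands in for `scrape_url` above.

```python
import concurrent.futures
from typing import Callable


def run_all(urls: list[str], worker: Callable[[str], str], max_workers: int = 5) -> dict[str, dict]:
    """Run `worker` on every URL in parallel; keep successes and failures apart."""
    outcomes: dict[str, dict] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the URL it was submitted for
        futures = {executor.submit(worker, url): url for url in urls}
        for future in concurrent.futures.as_completed(futures):
            url = futures[future]
            try:
                outcomes[url] = {"ok": True, "output": future.result()}
            except Exception as exc:  # One failure doesn't lose the others
                outcomes[url] = {"ok": False, "error": repr(exc)}
    return outcomes
```

You might call it as `results = run_all(urls, scrape_url)`, then retry or report `[u for u, r in results.items() if not r["ok"]]`.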
Selenium WebDriver Examples
For projects already using Selenium, HopX sandboxes work seamlessly:
```python
from hopx import Sandbox

sandbox = Sandbox.create(template="desktop")

selenium_script = '''
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Configure headless Chrome
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')

driver = webdriver.Chrome(options=options)

try:
    driver.get('https://hopx.ai')

    # Wait up to 10 seconds for the element to load
    wait = WebDriverWait(driver, 10)
    element = wait.until(
        EC.presence_of_element_located((By.TAG_NAME, "h1"))
    )

    print(f"Title: {driver.title}")
    print(f"H1: {element.text}")

    # Take a screenshot
    driver.save_screenshot('/tmp/selenium-shot.png')

finally:
    driver.quit()
'''

sandbox.files.write("/app/selenium_test.py", selenium_script)
result = sandbox.commands.run("cd /app && python selenium_test.py")
print(result.stdout)
```
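`WebDriverWait(driver, 10).until(...)` repeatedly polls a condition until it returns a truthy value or the timeout expires. The same polling loop in plain Python is useful for waiting on things Selenium cannot see, such as a file appearing in the sandbox. This is a minimal sketch. `wait_until` is hypothetical, and unlike `WebDriverWait` it does not ignore exceptions raised by the condition.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # Like WebDriverWait, hand back the truthy result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(poll)
```

For example, inside the sandbox script: `wait_until(lambda: os.path.exists('/tmp/selenium-shot.png'), timeout=5)`.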
RPA Workflow Automation
HopX is perfect for Robotic Process Automation (RPA) tasks that interact with web applications:
Example: Invoice Processing Automation
```python
from hopx import Sandbox

sandbox = Sandbox.create(template="desktop")

rpa_script = '''
import asyncio
import json
from playwright.async_api import async_playwright

async def process_invoices():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await context.new_page()

        # Log in to the invoice portal
        # (placeholder credentials; load real ones from a secret store)
        await page.goto('https://invoices.example.com/login')
        await page.fill('#username', 'automation@company.com')
        await page.fill('#password', 'secure_password')
        await page.click('#login-button')

        # Wait for the dashboard
        await page.wait_for_selector('.invoice-list')

        # Collect IDs and amounts first: element handles go stale
        # once the page navigates away from the list
        pending = []
        for invoice in await page.query_selector_all('.invoice-item.pending'):
            amount_el = await invoice.query_selector('.amount')
            pending.append({
                'id': await invoice.get_attribute('data-id'),
                'amount': await amount_el.text_content() if amount_el else None,
            })

        processed = []
        for item in pending:
            # Open the invoice by its ID
            await page.click(f".invoice-item[data-id='{item['id']}']")
            await page.wait_for_selector('.invoice-details')

            # Approve the invoice
            await page.click('#approve-button')
            await page.wait_for_selector('.approval-success')
            processed.append({**item, 'status': 'approved'})

            # Go back to the list
            await page.click('.back-to-list')
            await page.wait_for_selector('.invoice-list')

        print(json.dumps(processed, indent=2))
        await browser.close()

asyncio.run(process_invoices())
'''

sandbox.files.write("/app/rpa_invoices.py", rpa_script)
result = sandbox.commands.run("cd /app && python rpa_invoices.py")
print(result.stdout)
```
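The `amount` values the script prints are raw display text such as `$1,234.56`. If you total or compare them after parsing `result.stdout` with `json.loads`, converting with `Decimal` instead of `float` avoids binary rounding errors. This sketch assumes US-style formatting (comma thousands separators, dot decimals, parentheses for negatives). `parse_amount` is a hypothetical helper.

```python
import re
from decimal import Decimal, InvalidOperation


def parse_amount(text: str) -> Decimal:
    """Turn display text like ' $1,234.56 ' or '(45.00)' into a Decimal."""
    cleaned = text.strip()
    # Accounting style: parentheses mark a negative amount
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    # Drop currency symbols, thousands separators, and parentheses
    cleaned = re.sub(r"[^\d.\-]", "", cleaned)
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"Unparseable amount: {text!r}") from None
    return -value if negative else value
```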
Example: Data Entry Automation
```python
from hopx import Sandbox
import json

# Data to be entered
records = [
    {"name": "Alice Johnson", "email": "alice@example.com", "role": "Developer"},
    {"name": "Bob Smith", "email": "bob@example.com", "role": "Designer"},
    {"name": "Carol White", "email": "carol@example.com", "role": "Manager"},
]

sandbox = Sandbox.create(template="desktop")

rpa_script = '''
import asyncio
import json
from playwright.async_api import async_playwright

async def enter_records(records):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        await page.goto('https://hr-portal.example.com/employees')

        for record in records:
            # Click the "Add Employee" button
            await page.click('#add-employee')
            await page.wait_for_selector('#employee-form')

            # Fill the form
            await page.fill('#name', record['name'])
            await page.fill('#email', record['email'])
            await page.select_option('#role', record['role'])

            # Submit
            await page.click('#submit-employee')
            await page.wait_for_selector('.success-toast')

            print(f"Added: {record['name']}")

        await browser.close()

# Load the records shipped alongside the script
with open('/app/records.json') as f:
    records = json.load(f)

asyncio.run(enter_records(records))
'''

# Ship the data as a JSON file instead of splicing it into the source code
sandbox.files.write("/app/records.json", json.dumps(records))
sandbox.files.write("/app/data_entry.py", rpa_script)
result = sandbox.commands.run("cd /app && python data_entry.py")
print(result.stdout)
```
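Each record triggers real form submissions, so it is worth validating the batch locally before creating a sandbox. This minimal sketch checks the fields used above. `ALLOWED_ROLES`, `EMAIL_RE`, and `validate_records` are hypothetical; the role set should match the options your `#role` dropdown actually offers.

```python
import re

REQUIRED_FIELDS = ("name", "email", "role")
ALLOWED_ROLES = {"Developer", "Designer", "Manager"}  # hypothetical dropdown options
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose sanity check only


def validate_records(records: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the batch looks valid."""
    problems = []
    seen_emails = set()
    for i, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
            continue
        if not EMAIL_RE.match(record["email"]):
            problems.append(f"record {i}: invalid email {record['email']!r}")
        if record["role"] not in ALLOWED_ROLES:
            problems.append(f"record {i}: unknown role {record['role']!r}")
        # Duplicate emails would usually create duplicate employees
        if record["email"] in seen_emails:
            problems.append(f"record {i}: duplicate email {record['email']!r}")
        seen_emails.add(record["email"])
    return problems
```

A run would start only if `validate_records(records)` returns an empty list.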
Browser Testing for CI/CD
HopX sandboxes are ideal for running end-to-end tests in your CI/CD pipeline:
```python
from hopx import Sandbox

sandbox = Sandbox.create(template="desktop")

# Write the test file. Playwright's sync API fits pytest's synchronous
# fixtures without any manual event-loop handling.
test_script = '''
import pytest
from playwright.sync_api import sync_playwright

@pytest.fixture(scope="module")
def browser():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        yield browser
        browser.close()

@pytest.fixture
def page(browser):
    page = browser.new_page()
    yield page
    page.close()

def test_homepage_loads(page):
    response = page.goto('https://hopx.ai')
    assert response is not None and response.status == 200

def test_title_correct(page):
    page.goto('https://hopx.ai')
    assert 'HopX' in page.title()

def test_navigation_works(page):
    page.goto('https://hopx.ai')
    page.click('a[href="/docs"]')
    page.wait_for_url('**/docs**')
    assert '/docs' in page.url
'''

sandbox.files.write("/app/test_homepage.py", test_script)
result = sandbox.commands.run("cd /app && pytest test_homepage.py -v")
print(result.stdout)
```
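Printing `result.stdout` does not tell your pipeline whether the tests passed; pytest's exit status does. It returns 0 when all tests pass, 1 when some fail, 2 when the run was interrupted, 3 on an internal error, 4 on a usage error, and 5 when no tests were collected. The sketch below maps that status to a CI decision. How the HopX command result exposes the exit code is an assumption to check in the SDK docs, so `ci_verdict` (a hypothetical helper) takes a plain integer.

```python
# Exit codes documented by pytest (pytest.ExitCode)
PYTEST_EXIT_MESSAGES = {
    0: "all tests passed",
    1: "some tests failed",
    2: "test run was interrupted",
    3: "internal pytest error",
    4: "pytest usage error",
    5: "no tests were collected",
}


def ci_verdict(exit_code: int, allow_empty: bool = False) -> tuple[bool, str]:
    """Return (passed, message) for a pytest exit code."""
    message = PYTEST_EXIT_MESSAGES.get(exit_code, f"unknown exit code {exit_code}")
    # Treat "no tests collected" as a pass only when explicitly allowed
    passed = exit_code == 0 or (allow_empty and exit_code == 5)
    return passed, message
```

In a pipeline step you would then finish with `sys.exit(0 if passed else 1)` so the CI job reflects the test outcome.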
Handling Authentication & Sessions
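For RPA tasks that require authentication, log in once, save the browser context's storage state (cookies and local storage) to a file, and load that file into new contexts later: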
```python
from hopx import Sandbox

sandbox = Sandbox.create(template="desktop")

auth_script = '''
import asyncio
from playwright.async_api import async_playwright

async def authenticated_session():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)

        # Create a fresh context with no saved state
        context = await browser.new_context(
            storage_state=None  # Start fresh
        )

        page = await context.new_page()

        # Log in (placeholder credentials)
        await page.goto('https://app.example.com/login')
        await page.fill('#email', 'user@example.com')
        await page.fill('#password', 'password123')
        await page.click('#login')

        # Wait for login to complete
        await page.wait_for_url('**/dashboard**')

        # Save cookies and local storage to a file
        await context.storage_state(path='/tmp/auth_state.json')
        print("Authentication state saved!")

        # The saved state can be reused in later sessions
        await browser.close()

asyncio.run(authenticated_session())
'''

sandbox.files.write("/app/auth.py", auth_script)
result = sandbox.commands.run("cd /app && python auth.py")
print(result.stdout)

# Later, in the same sandbox, reuse the saved state
reuse_script = '''
import asyncio
from playwright.async_api import async_playwright

async def reuse_session():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)

        # Load the saved authentication state
        context = await browser.new_context(
            storage_state='/tmp/auth_state.json'
        )

        page = await context.new_page()
        await page.goto('https://app.example.com/dashboard')

        # Already logged in!
        print(f"Current page: {page.url}")

        await browser.close()

asyncio.run(reuse_session())
'''

sandbox.files.write("/app/reuse.py", reuse_script)
result = sandbox.commands.run("cd /app && python reuse.py")
print(result.stdout)
```
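A saved state file goes stale once the site's session cookies expire. Playwright's storage-state JSON lists cookies with an `expires` field in Unix seconds (`-1` for session cookies), so you can inspect the file before reusing it and log in again when needed. This is a sketch. `has_valid_cookies` and `load_state` are hypothetical helpers, and treating session cookies as valid is an assumption that may not match how your site behaves.

```python
import json
import time
from typing import Optional


def has_valid_cookies(state: dict, domain: str, now: Optional[float] = None) -> bool:
    """True if `state` holds at least one unexpired cookie for `domain`."""
    now = time.time() if now is None else now
    for cookie in state.get("cookies", []):
        # Cookie domains may carry a leading dot, e.g. ".example.com"
        cookie_domain = cookie.get("domain", "").lstrip(".")
        if cookie_domain != domain and not domain.endswith("." + cookie_domain):
            continue
        expires = cookie.get("expires", -1)
        if expires == -1 or expires > now:
            return True
    return False


def load_state(path: str) -> dict:
    """Read a file written by context.storage_state(path=...)."""
    with open(path) as f:
        return json.load(f)
```

The check can run inside the sandbox before `reuse.py`, or locally after downloading the file with `sandbox.files.read`.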
Best Practices
1. Always Use Headless Mode in Production
```python
browser = await p.chromium.launch(
    headless=True,
    args=['--no-sandbox', '--disable-dev-shm-usage']
)
```
2. Implement Proper Timeouts
```python
page.set_default_timeout(30000)  # 30 seconds, default for all page actions
await page.wait_for_selector('.element', timeout=10000)  # per-call override: 10 seconds
```
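Per-action timeouts do not bound the total runtime of a long workflow. Wrapping the whole coroutine in `asyncio.wait_for` adds an overall deadline on top of them. This minimal sketch uses only `asyncio`. `with_deadline` is a hypothetical helper, and `slow_step` uses `asyncio.sleep` as a stand-in for real Playwright work.

```python
import asyncio


async def with_deadline(coro, seconds: float, label: str = "workflow"):
    """Await `coro`, cancelling it if it runs longer than `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        raise TimeoutError(f"{label} exceeded {seconds}s overall deadline") from None


async def slow_step(delay: float) -> str:
    await asyncio.sleep(delay)  # Placeholder for page.goto(), page.fill(), ...
    return "done"
```

In a script, `asyncio.run(with_deadline(main(), 120, "invoice run"))` would cancel the run after two minutes.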
3. Handle Errors Gracefully
```python
from playwright.async_api import TimeoutError as PlaywrightTimeoutError

try:
    await page.click('#button')
except PlaywrightTimeoutError:
    print("Button not found, taking screenshot for debugging")
    await page.screenshot(path='/tmp/error.png')
    raise  # Re-raise so the failure still surfaces
```
4. Clean Up Resources
```python
from hopx import Sandbox

sandbox = Sandbox.create(template="desktop")
try:
    # ... automation code, for example:
    result = sandbox.commands.run("cd /app && python scrape.py")
finally:
    sandbox.kill()  # Always clean up
```
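If many scripts create sandboxes, a small context manager keeps the `try`/`finally` in one place. This is a generic sketch built on `contextlib`. `managed` is a hypothetical helper, and you should check whether the HopX SDK already supports `with` directly before adding it.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def managed(create: Callable[[], T], cleanup: Callable[[T], None]) -> Iterator[T]:
    """Create a resource, yield it, and always run `cleanup` afterwards."""
    resource = create()
    try:
        yield resource
    finally:
        cleanup(resource)  # Runs on normal exit and on exceptions


# Usage with the HopX calls from the examples above:
# with managed(lambda: Sandbox.create(template="desktop"), lambda s: s.kill()) as sandbox:
#     print(sandbox.commands.run("chromium --version").stdout)
```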
5. Use Retry Logic for Flaky Operations
```python
import asyncio

async def retry_click(page, selector, max_retries=3):
    for attempt in range(max_retries):
        try:
            await page.click(selector)
            return True
        except Exception:
            if attempt == max_retries - 1:
                raise  # Give up after the last attempt
            await asyncio.sleep(1)  # Brief pause before retrying
```
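A fixed one-second pause handles occasional flakiness. For rate-limited or slow pages, exponential backoff with jitter spreads retries out instead. This is a generic sketch for any async operation; `retry_async` and its defaults are hypothetical choices. With Playwright you might call `await retry_async(lambda: page.click('#button'))`.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
) -> T:
    """Await `operation()` up to `max_retries` times with exponential backoff."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return await operation()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error
            # Delay doubles each attempt (base, 2x, 4x, ...) plus random jitter
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```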
Performance Tips
- Reuse sandbox sessions for multiple operations when possible
- Disable images and CSS for faster scraping:
```python
await page.route('**/*.{png,jpg,jpeg,gif,css}', lambda route: route.abort())
```
- Use `page.wait_for_load_state('networkidle')` to wait until the network has been idle for at least 500 ms. Playwright's documentation discourages relying on it in tests; waiting for a specific element is usually more reliable.
- Run independent tasks in parallel, one sandbox per task
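Matching on file extensions misses assets served without one (for example `/avatar?id=42`). Playwright also reports each request's `resource_type` (such as `image`, `stylesheet`, `font`, or `media`), which a route handler can filter on. This is a sketch. The blocked set is a choice you should tune, and `should_block` and `block_heavy_resources` are hypothetical helpers, with the decision kept in its own function so it can be tested without a browser.

```python
BLOCKED_RESOURCE_TYPES = frozenset({"image", "stylesheet", "font", "media"})


def should_block(resource_type: str, blocked: frozenset = BLOCKED_RESOURCE_TYPES) -> bool:
    """Decide whether a request of this Playwright resource type is skipped."""
    return resource_type in blocked


async def block_heavy_resources(route):
    """Route handler: abort heavy resources, let everything else through."""
    if should_block(route.request.resource_type):
        await route.abort()
    else:
        await route.continue_()


# Register it for every request inside your Playwright script:
# await page.route("**/*", block_heavy_resources)
```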
Conclusion
HopX sandboxes provide the perfect environment for browser automation and RPA:
- Isolated - Each automation runs in its own secure VM
- Scalable - Run hundreds of browsers in parallel
- Consistent - Same environment every time
- Secure - No risk to your local machine
Start automating with HopX today and scale your browser automation to new levels.