The first time one of our AI agents died mid-task because a serverless function hit its 15-minute timeout, we thought: fine, we'll just refactor it to be faster. The second time it happened, we started looking at VPS pricing. The third time, we stopped pretending cloud functions were built for this use case.
Running AI agents 24/7 is fundamentally different from running a web API. Your agent holds state. It has memory it's actively building. It might be in the middle of a 40-minute research task when AWS Lambda decides it's been alive too long. That's not a bug you can engineer around; it's the wrong tool for the job.
Here's what we actually use, how we set it up, and what we'd do differently.
Why VPS Over Cloud Functions for AI Agents
The "serverless is cheaper" argument breaks down quickly when you run agents continuously. Cloud functions charge per invocation and execution time. An agent that's polling, checking, deciding, and acting 24/7 generates a lot of invocations. We ran the numbers after one month: our two persistent AI agents would have cost roughly 4x more on Lambda than on a dedicated VPS with predictable monthly billing.
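The back-of-the-envelope math is easy to reproduce. Prices below are illustrative assumptions (check current Lambda and VPS pricing), but the shape of the result holds for any always-on workload:

```python
# Rough cost comparison: an always-on agent on Lambda vs. a small VPS.
# All prices are illustrative assumptions, not quotes.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667  # example x86 Lambda compute price (USD)
SECONDS_PER_MONTH = 30 * 24 * 3600
AGENT_MEMORY_GB = 2

# An agent that is effectively always running bills for every second
lambda_monthly = AGENT_MEMORY_GB * SECONDS_PER_MONTH * LAMBDA_PRICE_PER_GB_SECOND
vps_monthly = 9  # e.g. a ~9 EUR/month VPS with far more headroom

print(f"Lambda, always on at {AGENT_MEMORY_GB}GB: ${lambda_monthly:.2f}/month")
print(f"VPS: ~{vps_monthly} EUR/month")
```

Even before request charges, an always-on 2GB Lambda footprint lands around $86/month at that example rate, which is why the gap widens the busier your agents get.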
Beyond cost, there are architectural reasons VPS wins for agents:
- Persistent memory: Agents can write to disk, maintain SQLite state, keep long-running sessions. No cold-start memory wipes.
- No timeout anxiety: Let your agent think for 3 minutes if it needs to. Nobody's cutting it off at 900 seconds.
- Process control: tmux, systemd, screen. You can keep processes alive across disconnects, restart on failure, and attach a human terminal anytime.
- Predictable cost: €8–15/month vs. unpredictable per-invocation billing that spikes when your agent is busy.
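The persistent-memory point is the big one in practice. A minimal sketch of on-disk agent state with SQLite (the table schema and function names here are illustrative, not our exact code):

```python
import sqlite3

def open_memory(path="agent_memory.db"):
    """Open (or create) the agent's on-disk memory. Survives restarts."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS memory (
        key TEXT PRIMARY KEY,
        value TEXT,
        updated_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return conn

def remember(conn, key, value):
    """Upsert a key/value pair; the agent's state outlives the process."""
    conn.execute(
        "INSERT INTO memory (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value))
    conn.commit()

def recall(conn, key):
    row = conn.execute("SELECT value FROM memory WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None
```

On a cloud function, that `agent_memory.db` file evaporates between invocations; on a VPS it just sits on disk.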
Cloud functions are excellent for event-driven tasks triggered externally with short execution windows. If your AI agent is doing that, use Lambda. If it's a persistent autonomous system, use a VPS. They're just different tools.
Our Setup: Two VPS Boxes, One Architecture
We run two VPS instances in a simple primary/secondary pattern. Not for high availability in the traditional sense; more for workload separation. Compute-heavy agent tasks run on one box; data, automation pipelines, and coordination tooling run on the other.
The boxes we use
We've settled on Netcup for our primary European compute. The value-per-euro is genuinely hard to beat: a 4-core, 8GB RAM VPS for under €10/month. We've been running on the same instance for months with zero unexpected downtime. If you're building something similar, their RS 1000 G11 tier is where we'd start.
For secondary infrastructure and managed services, we use Hostinger, specifically their VPS plans for anything that benefits from their managed setup. The onboarding is smoother if you're new to VPS administration, and their support has been responsive when we've needed it.
The Stack That Actually Keeps Agents Running
Getting the VPS is step one. Keeping agents alive persistently is the part nobody writes about.
tmux for session persistence
Every agent runs in a named tmux session. When you SSH in, the agent is already running in a named pane. You can detach and it keeps going. If you need to inspect what it's doing, you attach and watch. It's low-tech and it works perfectly.
# Start a named agent session
tmux new-session -d -s marketing-agent "python agents/marketing.py"
# Attach to see what's happening
tmux attach -t marketing-agent
# Detach without killing it
Ctrl+B then D
systemd for crash recovery
tmux handles persistence during normal operation. systemd handles the "agent crashed at 3am and nobody noticed" scenario. We write a simple service file for each agent and let systemd handle auto-restart with exponential backoff.
# /etc/systemd/system/trading-agent.service
[Unit]
Description=Trading AI Agent
After=network.target
[Service]
Type=simple
User=claude
WorkingDirectory=/home/claude/agents
ExecStart=/home/claude/agents/venv/bin/python trading_agent.py
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
Heartbeat monitoring
We write a small JSON heartbeat file every 5 minutes. A separate watchdog checks whether the heartbeat timestamp is fresh. If an agent goes silent for 15 minutes, we get a Telegram notification. Simple, effective, doesn't require a paid monitoring service.
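A minimal version of that heartbeat pattern looks like this. The file path and the Telegram step are assumptions for illustration; the writer runs inside each agent's main loop, the checker runs as a separate watchdog (cron or a systemd timer):

```python
import json
import time

HEARTBEAT_FILE = "/tmp/marketing-agent.heartbeat"  # illustrative path
STALE_AFTER = 15 * 60  # seconds of silence before we consider the agent dead

def write_heartbeat(path=HEARTBEAT_FILE, status="ok"):
    """Called from the agent's main loop every ~5 minutes."""
    with open(path, "w") as f:
        json.dump({"ts": time.time(), "status": status}, f)

def heartbeat_is_fresh(path=HEARTBEAT_FILE, stale_after=STALE_AFTER):
    """Run by the watchdog: is the last heartbeat recent enough?"""
    try:
        with open(path) as f:
            beat = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return False  # no heartbeat yet, or a partial write: treat as stale
    return (time.time() - beat["ts"]) < stale_after

# On a stale heartbeat, the watchdog sends a Telegram message
# via the Bot API (not shown here).
```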
What to Watch Out For
A few lessons from running this in production:
- API key rotation burns you at 3am. Keep all keys in a central .env file that's easy to update. Agents should always read from env on each invocation, not at startup.
- Disk space creeps up. Logs, agent memory files, caches: a 40GB VPS disk can fill up faster than you expect when agents are writing persistently. Set up log rotation early.
- LLM rate limits don't care about your architecture. You can have the best VPS setup in the world and still hit Anthropic's per-minute token limits. Build exponential backoff into your agent's API calls from day one.
- One VPS box is a single point of failure. We learned this when network maintenance took our primary box offline for 45 minutes. Even a cheap secondary box for critical agents is worth it.
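The backoff advice above is worth showing in code. A sketch of exponential backoff with jitter around an LLM call; `RateLimitError` here is a placeholder for whatever exception your client raises (e.g. the Anthropic SDK's rate-limit error):

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for your LLM client's rate-limit exception."""

def with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call fn(), retrying with exponential backoff + jitter on rate limits."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: let the caller (or systemd) deal with it
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.5))  # jitter avoids synchronized retries
```

Then wrap every API call, e.g. `with_backoff(lambda: client.messages.create(...))`, rather than sprinkling retry logic through the agent.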
For a single AI agent doing LLM calls and light automation: 2 vCPU / 4GB RAM is fine. For 5+ concurrent agents with memory, vector operations, and automation workflows: 4–8 vCPU / 8–16GB RAM. RAM matters more than CPU for most LLM agent workloads.
The Costs in Real Numbers
Our current monthly infrastructure bill for running a multi-agent autonomous company:
- Netcup VPS (primary, 8GB RAM): ~€9/month
- Hostinger VPS (secondary, 4GB RAM): ~€8/month
- Total VPS: ~€17/month for always-on AI infrastructure
The LLM API costs (Anthropic Claude) are separate and variable depending on agent activity. But the compute infrastructure itself costs less than a gym membership we'd actually use.
Getting Started
If you're spinning this up for the first time, the path looks like: pick a VPS (we'd suggest starting with Netcup or Hostinger), set up Ubuntu 22.04 LTS, install Python 3.11+, configure tmux and systemd, write your heartbeat monitor, and deploy your first agent. The whole setup takes an afternoon the first time, 30 minutes once you have a template.
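As a rough provisioning sketch for a fresh box (the agent path and entry point are placeholders; on Ubuntu 22.04 you may need the deadsnakes PPA for Python 3.11):

```shell
# Base tooling for a persistent agent host (fresh Ubuntu 22.04)
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3.11 python3.11-venv tmux git

# Project directory with an isolated virtualenv
mkdir -p ~/agents && cd ~/agents
python3.11 -m venv venv
./venv/bin/pip install --upgrade pip

# First agent under tmux (agents/marketing.py is a placeholder)
tmux new-session -d -s marketing-agent "./venv/bin/python agents/marketing.py"
```

From there it's the systemd unit and heartbeat watchdog described above, saved as a template you can copy for each new agent.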
The architecture we run at agentic-movers.com is more complex (multiple agents coordinating, shared memory, a CEO agent orchestrating worker agents), but it all sits on the same foundation: a couple of reliable VPS boxes and a few systemd services.