DevOps & Monitoring · May 2, 2026 · 12 min read

Claude Code Monitoring & Observability: Monitoring AI Agents in Production

An agent that runs is good. An agent whose costs, errors, and performance you can watch live is better. This is a complete observability strategy for Claude Code agents, from simple file logging to a dashboard.

Why Monitoring AI Agents Is Different

Traditional app monitoring tracks CPU, RAM, and HTTP status codes. That is not enough for AI agents. You also need to watch token usage (direct cost), model latency (user experience), tool call success rate (does the agent actually work?), and output quality (is it hallucinating?).

Without monitoring, you only find out from the end-of-month invoice that your agent cost 10x more than planned. Or worse: you notice days later that it has been failing silently.

The 4 Monitoring Dimensions

💰 Cost Tracking

The most important metric. Claude Code agents consume tokens, and tokens cost money. Always track input tokens, output tokens, cache hits (billed at a lower rate), and the model (Haiku vs. Sonnet vs. Opus).

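// Log token cost per request (Node.js)
const fs = require('fs');

// Illustrative rates in USD per token; check them against the current price list.
const COST_PER_TOKEN = {
  'claude-haiku-4-5': { input: 0.00000025, output: 0.00000125 },
  'claude-sonnet-4-6': { input: 0.000003, output: 0.000015 },
  'claude-opus-4-6': { input: 0.000015, output: 0.000075 }
};

// getDailyCost(), alertHighCost() and DAILY_BUDGET_USD are assumed to be defined elsewhere.
function logUsage(response, taskName) {
  // The model ID is a top-level field of the response, not part of `usage`.
  const { input_tokens, output_tokens } = response.usage;
  const model = response.model;
  const rates = COST_PER_TOKEN[model];
  // Fail loudly rather than log a wrong cost for an unknown model ID.
  if (!rates) throw new Error(`No rates configured for model: ${model}`);
  const cost = (input_tokens * rates.input) + (output_tokens * rates.output);

  fs.appendFileSync('/logs/agent-costs.jsonl', JSON.stringify({
    ts: Date.now(),
    task: taskName,
    model,
    input_tokens,
    output_tokens,
    cost_usd: cost.toFixed(6)
  }) + '\n');

  // Daily budget check
  if (getDailyCost() > DAILY_BUDGET_USD) {
    alertHighCost(cost);
  }
}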
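Cache hits are on the tracking list, but the cost formula in logUsage only counts regular input and output tokens. The following is a minimal sketch of a cache-aware calculation, not a definitive billing formula. It assumes the usage object carries cache_read_input_tokens and cache_creation_input_tokens, and both multipliers are assumptions that should be checked against current pricing. costWithCache is a hypothetical helper name.

```javascript
// Assumed multipliers relative to the base input rate; verify before relying on them.
const CACHE_READ_MULTIPLIER = 0.1;   // assumption: cache reads are billed at a fraction of input
const CACHE_WRITE_MULTIPLIER = 1.25; // assumption: cache writes carry a surcharge

// Hypothetical helper: total cost in USD for one response, including cache tokens.
function costWithCache(usage, rates) {
  const input = usage.input_tokens || 0;
  const output = usage.output_tokens || 0;
  const cacheRead = usage.cache_read_input_tokens || 0;
  const cacheWrite = usage.cache_creation_input_tokens || 0;

  return (
    input * rates.input +
    output * rates.output +
    cacheRead * rates.input * CACHE_READ_MULTIPLIER +
    cacheWrite * rates.input * CACHE_WRITE_MULTIPLIER
  );
}
```

Logging the cache token counts as separate fields next to the cost also makes it possible to recompute costs later if the rates change.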

⏱️ Latency & Performance

How long does each agent step take? This matters most in interactive applications. Track time to first token (TTFT), total response time, and tool call duration.

// Latency wrapper for every API call
// logMetric() is assumed to be defined elsewhere (for example, a JSONL appender).
async function timedApiCall(fn, label) {
  const start = Date.now();
  try {
    const result = await fn();
    logMetric({ label, duration_ms: Date.now() - start, status: 'ok' });
    return result;
  } catch (err) {
    logMetric({ label, duration_ms: Date.now() - start, status: 'error', error: err.message });
    throw err;
  }
}

// Usage:
const response = await timedApiCall(
  () => claude.messages.create({...}),
  'strategy-analysis'
);
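The wrapper above captures total response time only. Time to first token needs a streaming response, so here is a rough sketch that measures both values over any async iterable of chunks. How your client exposes streaming is an assumption here, and measureStream and its injectable clock parameter are hypothetical names.

```javascript
// Hypothetical helper: consumes an async iterable and records TTFT and total time.
// `now` is injectable so the function can be tested with a fake clock.
async function measureStream(stream, now = () => performance.now()) {
  const start = now();
  let ttftMs = null;
  let chunks = 0;

  for await (const _chunk of stream) {
    // The first chunk marks the time to first token.
    if (ttftMs === null) ttftMs = now() - start;
    chunks += 1;
  }

  return { ttft_ms: ttftMs, total_ms: now() - start, chunks };
}
```

If the stream yields nothing, ttft_ms stays null, which is worth logging as its own signal.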

🔧 Tool Call Success Rate

Which tools does the agent call? Which ones fail? A high error rate for one specific tool points to a bug in that tool or to a poorly worded prompt instruction.

// Instrument tool calls
// `logger` and `tools` are assumed to be defined elsewhere.
const crypto = require('crypto');

function instrumentedTool(name, fn) {
  return async (...args) => {
    const callId = crypto.randomUUID();
    const start = Date.now();

    logger.info({ event: 'tool_call_start', tool: name, callId });

    try {
      const result = await fn(...args);
      logger.info({
        event: 'tool_call_success', tool: name,
        callId, duration_ms: Date.now() - start
      });
      return result;
    } catch (err) {
      logger.error({
        event: 'tool_call_error', tool: name,
        callId, error: err.message, duration_ms: Date.now() - start
      });
      throw err;
    }
  };
}

// Wrap every tool the same way:
tools.readFile = instrumentedTool('readFile', tools.readFile);
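Once tools emit tool_call_success and tool_call_error events, the per-tool error rate can be derived from the log. This is one possible aggregation, assuming the events have already been parsed into objects. toolErrorRates is a hypothetical helper name.

```javascript
// Hypothetical helper: aggregates tool events into per-tool call and error counts.
function toolErrorRates(events) {
  const stats = new Map();

  for (const e of events) {
    // Only finished calls count; tool_call_start events are ignored.
    if (e.event !== 'tool_call_success' && e.event !== 'tool_call_error') continue;

    let s = stats.get(e.tool);
    if (!s) {
      s = { calls: 0, errors: 0 };
      stats.set(e.tool, s);
    }
    s.calls += 1;
    if (e.event === 'tool_call_error') s.errors += 1;
  }

  return Object.fromEntries(
    [...stats].map(([tool, s]) => [tool, { ...s, error_rate: s.errors / s.calls }])
  );
}
```

Running it over a sliding window (for example, the last hour of events) gives the figure that a per-tool alert threshold can be compared against.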

🧠 Output Quality

The hardest metric: does the agent deliver the right result? The basics: length (too short may mean truncated), format compliance (did it return valid JSON?), and a confidence score if one is available.

// Validate structured output
// `output` is the agent's text output; `logger` is assumed to be defined elsewhere.
function validateAgentOutput(output, schema) {
  const issues = [];

  // Length check
  if (output.length < 50) issues.push('OUTPUT_TOO_SHORT');
  if (output.length > 50000) issues.push('OUTPUT_TOO_LONG');

  // Schema validation, if a schema is expected
  if (schema) {
    try {
      const parsed = JSON.parse(output);
      schema.parse(parsed); // Zod schema
    } catch (e) {
      issues.push(`SCHEMA_VIOLATION: ${e.message}`);
    }
  }

  if (issues.length > 0) {
    logger.warn({ event: 'output_quality_issue', issues });
  }

  return issues.length === 0;
}
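Length is only a heuristic for truncation. When the raw API response is available, its stop_reason field states directly whether the output hit the token limit. A small sketch follows; truncationIssues and the OUTPUT_TRUNCATED label are hypothetical names chosen to match the issue codes above.

```javascript
// Hypothetical helper: flags responses that stopped because the token limit was reached.
function truncationIssues(response) {
  const issues = [];
  if (response.stop_reason === 'max_tokens') {
    issues.push('OUTPUT_TRUNCATED');
  }
  return issues;
}
```

The returned list can be merged with the issues collected by the length and schema checks before logging.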

Structured Logging with JSONL

The most important log format for agents is JSONL (JSON Lines). Every line is a valid JSON document: easy to append, easy to query, and supported by most log aggregation tools.

# Complete agent log format (pretty-printed here; one line per record in the file)
{
  "ts": 1746173400000,
  "session_id": "sess_abc123",
  "agent": "strategy-agent",
  "event": "task_complete",
  "task": "weekly-report",
  "duration_ms": 12450,
  "tokens": { "input": 4200, "output": 1800, "cache_read": 8000 },
  "cost_usd": "0.0392",
  "model": "claude-sonnet-4-6",
  "tools_called": ["readFile", "writeFile", "searchWeb"],
  "tool_errors": 0,
  "output_bytes": 4200,
  "status": "ok"
}

# Query with jq (-r prints raw values so awk can sum them):
jq -r 'select(.event == "task_complete") | .cost_usd' /logs/agent.jsonl | \
  awk '{sum += $1} END {print "Total: $" sum}'

# Errors in the last hour (ts is in milliseconds, jq's now is in seconds):
jq 'select(.status == "error" and .ts > (now - 3600) * 1000)' /logs/agent.jsonl
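The same queries can be run from Node.js. One practical detail: a process killed mid-write can leave a partial last line, so a reader should tolerate lines that fail to parse. The sketch below assumes the file content has already been read into a string. parseJsonl and totalCost are hypothetical helper names.

```javascript
// Hypothetical helper: parses JSONL text, skipping blank and malformed lines.
function parseJsonl(text) {
  const records = [];
  let skipped = 0;

  for (const line of text.split('\n')) {
    if (!line.trim()) continue;
    try {
      records.push(JSON.parse(line));
    } catch {
      skipped += 1; // e.g. a partially written last line
    }
  }

  return { records, skipped };
}

// Sums cost_usd over task_complete events; cost_usd may be a string or a number.
function totalCost(records) {
  return records
    .filter((r) => r.event === 'task_complete')
    .reduce((sum, r) => sum + Number(r.cost_usd || 0), 0);
}
```

A non-zero skipped count is itself worth logging, because it means some records were lost.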

Alert System: What to Escalate and When

Alert	Threshold	Severity	Action
Daily cost	> 2× budget	P0	Stop the agent, notify immediately
API error rate	> 10% in 5 min	P0	Retry limit, Slack alert
Latency p95	> 30s	P1	Log and investigate, no auto-stop
Tool error rate	> 20% for one tool	P1	Telegram message, check manually
No output	> 15 min of silence	P1	Heartbeat check, restart if needed
Schema violations	> 5 in one hour	P2	Log analysis, review the prompt

// Alert dispatcher (Node.js)
// `telegram`, `CHAT_ID`, `pauseAgent()` and `logger` are assumed to be defined elsewhere.
async function sendAlert({ severity, title, details }) {
  const msg = `[${severity}] ${title}\n\n${details}`;

  if (severity === 'P0') {
    // Telegram for immediate escalation
    await telegram.sendMessage({ chat_id: CHAT_ID, text: msg });
    // In addition: pause the agent
    await pauseAgent();
  } else if (severity === 'P1') {
    // Telegram only, no auto-stop
    await telegram.sendMessage({ chat_id: CHAT_ID, text: msg });
  } else {
    // P2: log only
    logger.warn({ alert: title, details });
  }
}
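The thresholds from the alert table can be kept as data and evaluated against a metrics snapshot, so that changing a threshold does not touch the dispatch logic. This is a sketch under assumptions: every metric field name below (daily_cost_usd, api_error_rate_5m, and so on) is hypothetical, as are ALERT_RULES and evaluateAlerts.

```javascript
// Thresholds taken from the alert table; metric field names are hypothetical.
const ALERT_RULES = [
  { title: 'Daily cost', severity: 'P0', triggered: (m) => m.daily_cost_usd > 2 * m.daily_budget_usd },
  { title: 'API error rate', severity: 'P0', triggered: (m) => m.api_error_rate_5m > 0.10 },
  { title: 'Latency p95', severity: 'P1', triggered: (m) => m.latency_p95_ms > 30000 },
  {
    title: 'Tool error rate',
    severity: 'P1',
    // Fires if any single tool exceeds 20%.
    triggered: (m) => Object.values(m.tool_error_rates || {}).some((rate) => rate > 0.20)
  },
  { title: 'No output', severity: 'P1', triggered: (m) => m.minutes_since_last_output > 15 },
  { title: 'Schema violations', severity: 'P2', triggered: (m) => m.schema_violations_1h > 5 }
];

// Returns the alerts whose threshold is exceeded by the given metrics snapshot.
function evaluateAlerts(metrics) {
  return ALERT_RULES
    .filter((rule) => rule.triggered(metrics))
    .map(({ title, severity }) => ({ title, severity }));
}
```

Each returned entry can be passed on to sendAlert together with a details string built from the snapshot.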

Simple Monitoring Dashboard

You don't need Grafana. A minimal HTML dashboard that evaluates /logs/agent.jsonl is enough to get started:

Agent Monitoring Dashboard (live preview)

Today (budget: $5): $2.34
Success rate (24h): 98.2%
Latency p50: 4.2s
Tasks (today): 142

// Express server for the mini dashboard (dashboard.js)
const fs = require('fs');
const express = require('express');
const app = express();

app.get('/metrics', (req, res) => {
  const logs = fs.readFileSync('/logs/agent.jsonl', 'utf8')
    .split('\n').filter(Boolean).map(line => JSON.parse(line));

  // Rolling 24-hour window
  const today = logs.filter(l => l.ts > Date.now() - 86400000);
  const n = today.length;

  res.json({
    total_cost_today: today.reduce((s, l) => s + parseFloat(l.cost_usd || 0), 0).toFixed(4),
    // Report null instead of NaN when there are no entries yet
    success_rate: n ? today.filter(l => l.status === 'ok').length / n : null,
    avg_latency_ms: n ? today.reduce((s, l) => s + (l.duration_ms || 0), 0) / n : null,
    task_count: n
  });
});

app.listen(3001, () => console.log('Dashboard: http://localhost:3001/metrics'));
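The endpoint above reports the average latency, while the preview and the alert table refer to percentiles (p50, p95). A nearest-rank percentile is one simple way to compute them from the logged duration_ms values. Other percentile definitions interpolate between values, so treat this as a sketch of one convention. percentile is a hypothetical helper name.

```javascript
// Nearest-rank percentile: returns the smallest value such that at least
// p percent of the values are less than or equal to it.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}
```

In the /metrics handler this could be called as percentile(today.map(l => l.duration_ms || 0), 95). Percentiles are less distorted by a single very slow request than the average is.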

Keeping Costs Under Control

Tip: build in a budget limiter. Set a hard daily stop in code. No external monitoring can react as fast as a cost brake built into the agent itself.

// Budget guard (resets daily)
class BudgetGuard {
  constructor(dailyLimitUsd) {
    this.limit = dailyLimitUsd;
    this.spent = 0;
    this.resetAt = this.getTomorrowMidnight();
  }

  // Timestamp (ms) of the next local midnight
  getTomorrowMidnight() {
    const d = new Date();
    d.setHours(24, 0, 0, 0);
    return d.getTime();
  }

  check(estimatedCost) {
    if (Date.now() > this.resetAt) {
      this.spent = 0;
      this.resetAt = this.getTomorrowMidnight();
    }
    if (this.spent + estimatedCost > this.limit) {
      throw new Error(
        `Budget exceeded: $${this.spent.toFixed(2)} of $${this.limit} used today`
      );
    }
    this.spent += estimatedCost;
  }
}

// BEFORE every API call (estimateCost() is assumed to be defined elsewhere):
const budget = new BudgetGuard(DAILY_BUDGET_USD);
budget.check(estimateCost(inputTokens, model));
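The budget check relies on an estimateCost function that is not shown. One conservative way to write it is to assume the response uses its full output allowance. The sketch below is an assumption about how such an estimate could look: the extra maxOutputTokens and rates parameters, the EXAMPLE_RATES table, and the model ID 'example-model' are all hypothetical.

```javascript
// Hypothetical rate table in USD per token; replace with your real rates.
const EXAMPLE_RATES = {
  'example-model': { input: 0.000003, output: 0.000015 }
};

// Upper-bound estimate: known input tokens plus the maximum possible output.
function estimateCost(inputTokens, model, maxOutputTokens = 1024, rates = EXAMPLE_RATES) {
  const r = rates[model];
  if (!r) throw new Error(`No rates configured for model: ${model}`);
  return inputTokens * r.input + maxOutputTokens * r.output;
}
```

Because the estimate assumes the full output allowance is used, the guard errs on the side of stopping early rather than overspending.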

Long-Term Analysis with SQLite

JSONL works well for short-term monitoring. For trend analysis over weeks or months, import the JSONL file into SQLite once a day.

# JSONL → SQLite (daily via cron)
import json
import sqlite3

def import_logs(jsonl_path, db_path):
    conn = sqlite3.connect(db_path)
    try:
        conn.execute('''CREATE TABLE IF NOT EXISTS agent_logs (
            ts INTEGER, agent TEXT, task TEXT,
            cost_usd REAL, tokens_total INTEGER,
            duration_ms INTEGER, status TEXT
        )''')

        with open(jsonl_path, encoding='utf-8') as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                log = json.loads(line)
                tokens = log.get('tokens') or {}
                conn.execute('INSERT INTO agent_logs VALUES (?,?,?,?,?,?,?)', [
                    log['ts'], log.get('agent'), log.get('task'),
                    float(log.get('cost_usd') or 0),
                    tokens.get('input', 0) + tokens.get('output', 0),
                    log.get('duration_ms'), log.get('status')
                ])
        conn.commit()
    finally:
        conn.close()

# Note: importing the same file twice duplicates rows, so rotate the JSONL file after each import.

# Query the weekly cost trend:
# SELECT strftime('%Y-W%W', ts/1000, 'unixepoch') week, SUM(cost_usd) FROM agent_logs GROUP BY week

Caution: no secrets in logs! Never write API keys, passwords, or PII to agent logs. Truncate long tool inputs to 200 characters and mask patterns such as sk-ant-... or Bearer ....
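A small redaction helper applied to every string before it is logged can enforce this rule. The sketch below masks first and truncates second, so a secret that straddles the 200-character cut is never partially written. The regular expressions are assumptions about what the tokens look like and will not catch every secret format. redact is a hypothetical helper name.

```javascript
// Assumed token shapes; extend these patterns for other secret formats you use.
const BEARER_PATTERN = /Bearer\s+[A-Za-z0-9._~+\/=-]+/g;
const ANTHROPIC_KEY_PATTERN = /sk-ant-[A-Za-z0-9_-]+/g;

// Hypothetical helper: masks known secret patterns, then truncates long values.
function redact(value, maxLength = 200) {
  const masked = String(value)
    .replace(BEARER_PATTERN, 'Bearer ***')
    .replace(ANTHROPIC_KEY_PATTERN, 'sk-ant-***');

  return masked.length > maxLength
    ? masked.slice(0, maxLength) + '...[truncated]'
    : masked;
}
```

Pattern-based masking is a safety net, not a guarantee. The more reliable approach is to avoid passing secrets into loggable fields in the first place.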

Integration into Existing Monitoring Stacks

If you already use Grafana, Datadog, or a similar tool, JSONL logs are easy to integrate.

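// Prometheus metrics endpoint (for Grafana)
const http = require('http');
const client = require('prom-client');

const agentCost = new client.Counter({
  name: 'agent_cost_usd_total',
  help: 'Total cost in USD',
  labelNames: ['agent', 'model']
});

const taskDuration = new client.Histogram({
  name: 'agent_task_duration_ms',
  help: 'Task duration in milliseconds',
  buckets: [1000, 5000, 15000, 30000, 60000]
});

// Expose the default registry for Prometheus to scrape (port 9464 is an arbitrary choice)
http.createServer(async (req, res) => {
  res.setHeader('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
}).listen(9464);

// After every task (model, cost and duration_ms come from the finished task):
agentCost.inc({ agent: 'strategy', model }, cost);
taskDuration.observe(duration_ms);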

Monitoring Templates in the Course

The Claude Code Mastery course includes ready-made monitoring templates: budget guard, JSONL logger, mini dashboard, and Telegram alerting, ready to drop into your agent stack.
