print() Won't Debug a Multi-Agent Swarm. Here Is What Does.
When a single agent fails, logs help. When five agents fail together in ways that only emerge from their interaction, you need structured events, shared memory, and a live timeline — not more console output.
A single agent misbehaving is a debugging problem. Five agents misbehaving in ways that only emerge from how they interact is an observability problem.
Most swarm debugging happens by staring at interleaved logs from multiple processes and trying to reconstruct what happened. This does not scale past two agents.
Console logging answers: what did this agent output? It does not answer: why did agent B make that decision given what agent A left in shared state?
These are different questions. In a multi-agent system, the second one is almost always the one that matters.
A flight data recorder does not capture the plane's thoughts. It captures structured events with timestamps, sources, and severity — so you can replay what happened in sequence.
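To make the flight-recorder idea concrete, here is a minimal sketch of what a structured event and a replay step might look like. The `SwarmEvent` shape, the `replay` and `formatTimeline` helpers, and the sample data are assumptions made for illustration; they are not the schema of any particular library.

```typescript
// Hypothetical event shape for illustration only.
type Severity = 'info' | 'warn' | 'error';

interface SwarmEvent {
  timestamp: number; // epoch milliseconds
  source: string;    // which agent emitted the event
  eventType: string; // e.g. 'task.start', 'task.failed'
  severity: Severity;
  summary: string;
}

// Return a copy of the events ordered by time. Events with equal timestamps
// keep their original relative order because Array.prototype.sort is stable.
function replay(events: SwarmEvent[]): SwarmEvent[] {
  return [...events].sort((a, b) => a.timestamp - b.timestamp);
}

// Render one readable line per event, in replay order.
function formatTimeline(events: SwarmEvent[]): string[] {
  return replay(events).map(
    (e) => `${e.timestamp} [${e.severity}] ${e.source} ${e.eventType}: ${e.summary}`,
  );
}

// Events as they might arrive from two agents, out of order.
const recorded: SwarmEvent[] = [
  { timestamp: 1030, source: 'agent-b', eventType: 'task.failed', severity: 'error', summary: 'Upstream context not ready' },
  { timestamp: 1000, source: 'agent-a', eventType: 'task.start', severity: 'info', summary: 'Processing invoice #4471' },
  { timestamp: 1020, source: 'agent-a', eventType: 'task.complete', severity: 'info', summary: 'Invoice #4471 processed' },
];

const timeline = formatTimeline(recorded);
for (const line of timeline) console.log(line);
```

Because every event carries its own timestamp, source, and severity, the sequence can be rebuilt regardless of the order in which entries arrived.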
What swarm logs miss
No shared state visibility. When agent B fails, the question is usually "what did it read from shared memory that led to that decision?" If shared memory is not visible in your event history, you cannot answer that.
No causal chain. Log entries from five agents interleaved by timestamp do not tell you that agent A's output caused agent B's failure, which caused agent C to retry, which set off the cascade. Causal structure is lost.
Events without context. "task.failed" is a log entry. "task.failed — agent read 'pending' from upstream_context, expected 'ready'" is a structured event with a root cause already visible. These are not equivalent.
No live intervention surface. Debugging by reading logs is passive. When a swarm is actively failing, you need to see what is happening in real time, not reconstruct it afterward.
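The gaps above come down to two pieces of missing data: which event triggered which, and what an agent read from shared memory before it decided. A rough sketch of how events carrying that data could be walked back to a root cause follows. The `causedBy` and `memoryReads` fields and the `causalChain` helper are hypothetical names chosen for this example, not part of any specific SDK.

```typescript
// Hypothetical shape: `causedBy` and `memoryReads` are illustrative fields.
interface TracedEvent {
  id: string;
  source: string;
  eventType: string;
  causedBy?: string;                    // id of the event that triggered this one
  memoryReads?: Record<string, string>; // shared-memory values read before deciding
}

// Follow causedBy links from a failure back to its origin.
// Returns the chain root-first; the `seen` set guards against cycles.
function causalChain(events: TracedEvent[], failureId: string): TracedEvent[] {
  const byId = new Map(events.map((e): [string, TracedEvent] => [e.id, e]));
  const chain: TracedEvent[] = [];
  const seen = new Set<string>();
  let current = byId.get(failureId);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    chain.unshift(current);
    current = current.causedBy ? byId.get(current.causedBy) : undefined;
  }
  return chain;
}

const events: TracedEvent[] = [
  { id: 'e1', source: 'agent-a', eventType: 'handoff.written' },
  {
    id: 'e2',
    source: 'agent-b',
    eventType: 'task.failed',
    causedBy: 'e1',
    memoryReads: { upstream_context: 'pending' }, // what agent B actually saw
  },
  { id: 'e3', source: 'agent-c', eventType: 'task.retry', causedBy: 'e2' },
];

const chain = causalChain(events, 'e3');
const path = chain.map((e) => `${e.source}:${e.eventType}`).join(' -> ');
console.log(path);
// agent-a:handoff.written -> agent-b:task.failed -> agent-c:task.retry
```

With the read snapshot attached to the failing event, the question "what did agent B read that led to that decision?" can be answered from the event history alone.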
Structured events and shared memory for any swarm
```typescript
import { RoomAgent } from '@armalo/core';

const agent = new RoomAgent({
  apiKey: 'YOUR_API_KEY',
  swarmId: 'swarm_abc123',
});

await agent.connect();

// Placeholder: use the trace ID from your existing distributed trace
const requestTraceId = 'YOUR_TRACE_ID';

// Emit structured events at every decision point
await agent.emit({
  eventType: 'task.start',
  summary: 'Processing invoice #4471',
  severity: 'info',
  traceId: requestTraceId, // ties to your existing distributed trace
});

// Write shared memory, visible to every agent in the swarm
await agent.memory.write('last_processed_invoice', '4471');

// Read what an upstream agent left
const handoff = await agent.memory.read('upstream_context');

await agent.emit({
  eventType: 'task.complete',
  summary: 'Invoice #4471 processed',
  detail: JSON.stringify({ lineItems: 12, total: 4200 }),
  severity: 'info',
});

await agent.disconnect();
```
What you get: A live event feed showing what every agent is doing — visible in the Swarm Room dashboard as events arrive. Shared memory readable by operators and other agents. Token auto-refresh so connections don't drop mid-task. Flush retries with exponential backoff — events don't silently disappear on transient failures.
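For readers unfamiliar with the retry pattern mentioned above, here is a generic sketch of flushing with exponential backoff. It is not the library's implementation: `flushWithBackoff`, its parameters, and the simulated `flakyFlush` are hypothetical names used only to show the technique.

```typescript
// Hypothetical helper: retry a flush function, doubling the wait after each failure.
async function flushWithBackoff(
  flush: () => Promise<void>,
  maxAttempts = 5,
  baseDelayMs = 100,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await flush();
      return attempt; // number of attempts it took
    } catch (err) {
      // On the last attempt, surface the error instead of silently dropping events.
      if (attempt === maxAttempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 100, 200, 400, ...
    }
  }
  throw new Error('maxAttempts must be at least 1');
}

// Simulated sink that fails twice with a transient error, then succeeds.
let calls = 0;
const flakyFlush = async (): Promise<void> => {
  calls++;
  if (calls < 3) throw new Error('transient network error');
};

// Record the requested delays instead of actually waiting, to keep the demo fast.
const delays: number[] = [];
const recordDelay = async (ms: number): Promise<void> => {
  delays.push(ms);
};

const attemptsPromise = flushWithBackoff(flakyFlush, 5, 100, recordDelay);
attemptsPromise.then((n) =>
  console.log(`flushed after ${n} attempts, delays: ${delays.join(', ')} ms`),
);
```

Rethrowing after the final attempt is a deliberate choice in this sketch: a failed flush becomes visible to the caller rather than disappearing.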
Replaying a cascade failure from structured events with timeline context takes minutes. Reconstructing it from interleaved console logs takes hours, if the data is even there.
Multi-agent systems need multi-agent observability. print() is not multi-agent observability.
→ Get your API key: armalo.ai (free signup → API Keys)
→ Docs: armalo.ai/docs