OAuth Token Lifecycle Management for AI Agents: Refresh, Revocation, and Rotation
OAuth flows designed for human users break for AI agents. This guide covers client credentials flow, refresh token rotation, revocation in distributed agent systems, PKCE adaptations, device authorization flow for headless agents, and RFC 9068 JWT access tokens.
The OAuth 2.0 specification was designed in 2012 with human users in mind. The authorization code flow assumes a user is present at a browser, can consent to permissions, and can perform redirects. Refresh tokens were designed to work with a single client application — not a fleet of 500 agents that all need to authenticate to the same service simultaneously.
AI agent systems strain every assumption OAuth was built on: there's no human at the keyboard, there may be thousands of concurrent agent instances, sessions may last days rather than minutes, and the "client application" boundary that OAuth was designed around doesn't map cleanly to a multi-agent architecture where orchestrators spawn sub-agents that spawn their own sub-agents.
This guide addresses the specific OAuth implementation challenges that arise in AI agent systems, covering every major OAuth flow type, the token lifecycle events that require careful handling (refresh, revocation, rotation), and the adaptations needed to make OAuth work reliably in agent environments that its original designers never anticipated.
TL;DR
- Client credentials flow is the correct OAuth flow for agent-to-service authentication when the agent acts as itself (not on behalf of a user) — but it requires careful scope management to avoid granting agents more access than individual tasks need.
- Refresh token rotation (refresh tokens are defined in RFC 6749; rotation is recommended by the OAuth 2.0 Security Best Current Practice) is the mechanism for maintaining long-lived agent sessions without long-lived access tokens, but distributed agent systems must implement single-use refresh token coordination to prevent race conditions.
- Token revocation (RFC 7009) in distributed systems requires active propagation — relying on token expiration alone means a compromised token is valid for up to the access token's lifetime (typically 1 hour).
- PKCE (RFC 7636) was designed to protect authorization code flows against interception; its principles apply to agent authentication contexts where the "code verifier" can be derived from agent identity material.
- RFC 9068 defines a standardized JWT format for access tokens that includes fields agents can use for authorization decisions and audit trail generation.
- Armalo's inter-agent authentication uses RFC 9068-compatible JWTs signed with keys registered in agents' behavioral pacts, enabling cryptographic verification of token authenticity at the agent trust layer.
The OAuth Flow Selection Problem for AI Agents
OAuth 2.0 defines four authorization grant types (and many extensions). Choosing the wrong one for agent workloads leads to security failures, operational complexity, and reliability problems.
Authorization Code Flow: Wrong for Agents
Authorization code flow (RFC 6749 Section 4.1) requires a human user to authenticate, grant consent, and be redirected back to the client. For AI agents operating autonomously — processing invoices at 3am, monitoring data pipelines continuously, responding to events without human oversight — there is no human to perform the authorization interaction.
Some teams attempt to work around this by performing the authorization code flow once during setup (human grants consent), then storing the resulting refresh token for the agent to use indefinitely. This approach has serious problems: the refresh token is a long-lived, high-value credential that can be used to impersonate the user across any service they've authorized; the token is tied to a specific user's identity rather than the agent's identity; and if the user's account is deprovisioned, the agent's refresh token is immediately revoked, causing silent operational failures.
Verdict: Avoid authorization code flow for AI agents except in specific user-delegation scenarios (where the agent genuinely needs to act as a specific user and that user has explicitly consented to ongoing delegation).
Client Credentials Flow: The Right Choice for Service-to-Service Agent Authentication
Client credentials flow (RFC 6749 Section 4.4) is designed for machine-to-machine authentication where the client acts in its own name, not on behalf of a user. The client authenticates with its client ID and client secret, receives an access token scoped to the requested permissions, and uses that token for service calls. No user interaction required.
For AI agents authenticating to services on behalf of an organization (not a specific user), client credentials flow is the appropriate choice. The organization registers the agent as an OAuth client, grants the client the scopes it needs, and the agent authenticates autonomously.
Client credentials flow implementation for agents:
interface CachedToken {
  token: string;
  expiresAt: number; // epoch ms
  totalLifetimeMs: number;
}

class AgentOAuthClient {
  private tokenCache: Map<string, CachedToken> = new Map();

  constructor(
    private readonly tokenEndpoint: string,
    private readonly clientId: string,
    private readonly clientSecret: string,
    private readonly agentId: string
  ) {}

  async getAccessToken(scopes: string[]): Promise<string> {
    const scopeKey = [...scopes].sort().join(' ');
    const cached = this.tokenCache.get(scopeKey);

    // Return cached token if it has >20% of lifetime remaining
    if (cached && (cached.expiresAt - Date.now()) > (cached.totalLifetimeMs * 0.2)) {
      return cached.token;
    }

    const response = await fetch(this.tokenEndpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams({
        grant_type: 'client_credentials',
        client_id: this.clientId,
        client_secret: this.clientSecret,
        scope: scopes.join(' ')
      })
    });

    if (!response.ok) {
      throw new OAuthError(`Token request failed: ${response.status}`);
    }

    const tokenResponse = await response.json();
    const expiresAt = Date.now() + (tokenResponse.expires_in * 1000);

    this.tokenCache.set(scopeKey, {
      token: tokenResponse.access_token,
      expiresAt,
      totalLifetimeMs: tokenResponse.expires_in * 1000
    });

    auditLogger.logTokenAcquired({
      agentId: this.agentId,
      scopes,
      expiresAt: new Date(expiresAt).toISOString(),
      tokenFingerprint: sha256(tokenResponse.access_token).substring(0, 16)
    });

    return tokenResponse.access_token;
  }
}
Scope minimization: Client credentials flow tokens should be requested with the minimum scope needed for the current operation. Don't request all scopes at startup — request per-operation or per-tool. Implement separate OAuth clients (separate client_id) for different agent types with different permission requirements.
Refresh Token Rotation: The Security Mechanism That Breaks in Agent Fleets
Refresh token rotation (sometimes called "automatic refresh token rotation" or "refresh token family") is a security mechanism where every time a refresh token is used to obtain a new access token, the server also issues a new refresh token and invalidates the old one. If an attacker steals and uses a refresh token, the legitimate client's attempt to use the same token will fail (and may trigger revocation of the entire token family), alerting the server to the theft.
This security mechanism, while valuable for human users with single-client applications, creates serious problems in agent fleets:
Race condition on concurrent refresh: Two agent instances simultaneously detect that the access token has expired. Both attempt to refresh using the same refresh token. The first request succeeds; the server marks the refresh token as used and issues a new one. The second request arrives with the now-invalidated old refresh token — which the server treats as a potential theft indicator, potentially revoking the entire token family.
The correct solution for agent fleets is centralized refresh token management:
class CentralizedRefreshTokenManager {
  private refreshLock: Map<string, Promise<TokenSet>> = new Map();

  constructor(
    private readonly storage: TokenSetStorage,
    private readonly tokenEndpoint: string,
    private readonly clientId: string
  ) {}

  async getAccessToken(agentClass: string): Promise<string> {
    const tokenSet = await this.storage.getTokenSet(agentClass);

    // If access token is still valid, return it
    if (tokenSet.accessTokenExpiresAt > Date.now() + 30_000) {
      return tokenSet.accessToken;
    }

    // Need to refresh — serialize to prevent concurrent refresh
    return (await this.serializedRefresh(agentClass)).accessToken;
  }

  private serializedRefresh(agentClass: string): Promise<TokenSet> {
    // If a refresh is already in progress, join it
    const existing = this.refreshLock.get(agentClass);
    if (existing) return existing;

    const refreshPromise = this.doRefresh(agentClass)
      .finally(() => this.refreshLock.delete(agentClass));
    this.refreshLock.set(agentClass, refreshPromise);
    return refreshPromise;
  }

  private async doRefresh(agentClass: string): Promise<TokenSet> {
    const currentTokenSet = await this.storage.getTokenSet(agentClass);

    const response = await fetch(this.tokenEndpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams({
        grant_type: 'refresh_token',
        refresh_token: currentTokenSet.refreshToken,
        client_id: this.clientId
      })
    });

    if (!response.ok) {
      // Refresh failed — may need re-authorization
      await this.alertOnRefreshFailure(agentClass, response.status);
      throw new OAuthRefreshError(response.status);
    }

    const newTokenSet = await response.json();

    // Atomically store new token set (overwrites old refresh token)
    await this.storage.updateTokenSet(agentClass, {
      accessToken: newTokenSet.access_token,
      refreshToken: newTokenSet.refresh_token, // New refresh token from rotation
      accessTokenExpiresAt: Date.now() + (newTokenSet.expires_in * 1000)
    });

    return this.storage.getTokenSet(agentClass);
  }
}
This centralized manager ensures that only one agent instance ever attempts to use the refresh token at a time. All other instances wait for the ongoing refresh to complete and share its result.
For large agent fleets (10,000+ instances), the centralized manager itself must be distributed and highly available. Use Redis with distributed locking (SET key value NX PX ttl) to implement the serialized refresh across multiple manager instances.
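The serialized-refresh lock can be sketched with those semantics; the `InMemoryLockStore` below is a stand-in for Redis (a production deployment would issue the `SET key value NX PX ttl` command against a real Redis client), and the key name and 10-second TTL are illustrative:

```typescript
// Minimal distributed-lock sketch. `LockStore` mimics the semantics of
// Redis `SET key value NX PX ttl`; swap in a real Redis client in production.
interface LockStore {
  // Returns true if the key was set (lock acquired), false if it is already held.
  setIfAbsent(key: string, value: string, ttlMs: number, nowMs: number): boolean;
  // Deletes the key only if it still holds `value` (never release someone else's lock).
  releaseIfOwner(key: string, value: string, nowMs: number): boolean;
}

class InMemoryLockStore implements LockStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  setIfAbsent(key: string, value: string, ttlMs: number, nowMs: number): boolean {
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > nowMs) return false; // lock still held
    this.entries.set(key, { value, expiresAt: nowMs + ttlMs });
    return true;
  }

  releaseIfOwner(key: string, value: string, nowMs: number): boolean {
    const existing = this.entries.get(key);
    if (!existing || existing.expiresAt <= nowMs || existing.value !== value) return false;
    this.entries.delete(key);
    return true;
  }
}

// Acquire the refresh lock for an agent class; only the winner performs the refresh.
// The 10s TTL bounds how long a crashed manager instance can block refreshes.
function tryAcquireRefreshLock(
  store: LockStore, agentClass: string, ownerId: string, nowMs: number
): boolean {
  return store.setIfAbsent(`refresh-lock:${agentClass}`, ownerId, 10_000, nowMs);
}
```

Manager instances that fail to acquire the lock wait and re-read the token set from shared storage instead of refreshing themselves.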
Device Authorization Flow for Headless Agent Setup
The device authorization flow (RFC 8628) was designed for devices without a browser or keyboard (smart TVs, IoT devices) that need to be authorized by a human. It's relevant for AI agent setup: the device authorization flow allows a human administrator to authorize an agent by visiting a URL on a separate device, without the agent needing to handle browser redirects.
For AI agents that need initial authorization to act on behalf of a user (where client credentials flow is insufficient because user delegation is needed), device authorization flow provides the cleanest setup experience:
- Agent requests a device code from the authorization server
- Agent displays the user code and verification URL (or sends them to the administrator's dashboard)
- Human administrator opens the verification URL, enters the user code, and authorizes the agent
- Agent polls the authorization server until the human has completed authorization
- Agent receives refresh token and stores it securely
This flow is appropriate for initial authorization; the subsequent token lifecycle uses refresh token management.
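The polling step above can be isolated into a pure decision function over the RFC 8628 token-endpoint responses (`authorization_pending`, `slow_down`, `expired_token`, `access_denied`), keeping the surrounding HTTP loop trivial. A sketch; the retry/done/abort shape is an illustrative design choice, not part of the RFC:

```typescript
// Decide the next polling action from an RFC 8628 token-endpoint response.
type PollAction =
  | { kind: 'retry'; nextIntervalSec: number } // keep polling
  | { kind: 'done'; accessToken: string }      // authorization complete
  | { kind: 'abort'; reason: string };         // give up

function nextPollAction(
  response: { access_token?: string; error?: string },
  currentIntervalSec: number
): PollAction {
  if (response.access_token) {
    return { kind: 'done', accessToken: response.access_token };
  }
  switch (response.error) {
    case 'authorization_pending': // human hasn't approved yet; poll again
      return { kind: 'retry', nextIntervalSec: currentIntervalSec };
    case 'slow_down': // RFC 8628: increase the polling interval by 5 seconds
      return { kind: 'retry', nextIntervalSec: currentIntervalSec + 5 };
    case 'expired_token': // device code expired before approval
      return { kind: 'abort', reason: 'device code expired; restart the flow' };
    case 'access_denied': // administrator declined the authorization
      return { kind: 'abort', reason: 'authorization denied' };
    default:
      return { kind: 'abort', reason: `unexpected error: ${response.error}` };
  }
}
```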
RFC 9068: Standardized JWT Access Tokens for Agent Authorization
RFC 9068 (JSON Web Token Profile for OAuth 2.0 Access Tokens) defines a standardized JWT format for access tokens, including specific claims for agent authorization and audit purposes. Before RFC 9068, access token formats were opaque and non-standardized — a token from one provider couldn't be validated or used for authorization decisions by another service without calling the issuer's token introspection endpoint.
JWT Structure for Agent Tokens
An RFC 9068-compliant access token contains:
{
  "iss": "https://auth.example.com",        // Token issuer
  "sub": "agent:invoice-processor-v2",      // Agent identity
  "aud": ["https://api.supplier.com"],      // Intended resource server(s)
  "iat": 1715000000,                        // Issue time
  "exp": 1715003600,                        // Expiry (1 hour after issue)
  "jti": "a1b2c3d4-e5f6-...",               // Unique token ID (for revocation)
  "client_id": "agent-invoice-processor",   // OAuth client ID
  "scope": "invoices.read vendor.read",     // Granted scopes
  "authorization_details": [{               // RFC 9396 rich authorization
    "type": "customer_data_access",
    "customer_id": "cust_12345",
    "access_level": "read"
  }]
}
For AI agent systems, the JWT access token format provides several operational advantages:
Self-contained authorization: Resource servers can validate the JWT signature and extract authorization information without calling the token introspection endpoint. This reduces latency and eliminates the token introspection service as a single point of failure.
Agent identity in audit trails: The sub claim identifies the specific agent, and the client_id identifies the agent class. This appears directly in downstream service logs, enabling correlation between service access logs and agent activity logs without requiring token introspection.
Authorization details for fine-grained control: RFC 9396 (Rich Authorization Requests) extends JWT access tokens with structured authorization details. For AI agents, this enables tokens that carry not just scopes (coarse-grained) but specific resource identifiers and access levels (fine-grained), enabling resource servers to make precise authorization decisions without maintaining their own permission tables.
JWT Validation for Agent-to-Agent Communication
When agents communicate with each other using JWT bearer tokens, the receiving agent must validate the token before processing the request. RFC 9068 validation steps for agents:
1. Verify the JWT signature using the issuer's public key (obtained from the issuer's JWKS endpoint)
2. Verify that aud includes the receiving agent's identifier
3. Verify that exp is in the future (with a small clock skew allowance, e.g., 30 seconds)
4. Verify that iat is in the past (prevents pre-issued tokens from being used too early)
5. Check the jti against a revocation list (or maintain a seen-JTI set for recently issued tokens)
6. Verify that the scopes or authorization_details in the token are sufficient for the requested operation
Step 5 (jti revocation checking) is the most operationally complex. For high-volume agent systems, maintaining a distributed revocation list and checking every token against it adds latency. A practical compromise: use short token lifetimes (5-15 minutes) and only check the revocation list for tokens issued more than 1 minute ago (tokens issued in the last minute are unlikely to have been revoked).
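The claim checks reduce to a pure function over the decoded payload, assuming the signature has already been verified (e.g., with a JOSE library against the issuer's JWKS). A hedged sketch; the option names, injected clock, and violations-list return shape are illustrative:

```typescript
interface AgentTokenClaims {
  iss: string;
  aud: string | string[];
  exp: number; // seconds since epoch
  iat: number;
  jti: string;
  scope?: string; // space-delimited, per RFC 6749
}

// Validate the claims of an already-signature-verified RFC 9068 token.
// Returns a list of violations; an empty array means the token passes.
function checkAgentTokenClaims(
  claims: AgentTokenClaims,
  opts: {
    selfIdentifier: string;              // the receiving agent's identifier
    nowSec: number;                      // injected clock for testability
    clockSkewSec?: number;               // default 30s
    isRevoked: (jti: string) => boolean; // revocation-list lookup
    requiredScopes: string[];
  }
): string[] {
  const skew = opts.clockSkewSec ?? 30;
  const violations: string[] = [];
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!audiences.includes(opts.selfIdentifier)) violations.push('aud does not include this agent');
  if (claims.exp + skew <= opts.nowSec) violations.push('token expired');
  if (claims.iat - skew > opts.nowSec) violations.push('token issued in the future');
  if (opts.isRevoked(claims.jti)) violations.push('token revoked (jti)');
  const granted = new Set((claims.scope ?? '').split(' ').filter(Boolean));
  for (const s of opts.requiredScopes) {
    if (!granted.has(s)) violations.push(`missing scope: ${s}`);
  }
  return violations;
}
```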
Token Revocation: The Distributed Systems Problem
Token revocation (RFC 7009) allows a client to notify the authorization server that a token should be considered invalid. For human users with a single client application, revocation is straightforward: the client calls the revocation endpoint, the server marks the token as invalid, and subsequent token introspection or JWT validation fails.
For distributed AI agent systems, revocation is a distributed systems problem: there's no instantaneous global state update that makes a revoked token immediately invalid across all services that might receive it.
The Revocation Propagation Problem
Consider a scenario: an AI agent is compromised. The security team revokes the agent's access tokens and refresh tokens at the authorization server. But the agent is currently mid-execution and has already obtained access tokens for a dozen different resource servers. Those resource servers have already validated the JWT signatures and don't call the introspection endpoint on every request — they rely on the expiry time in the JWT.
The revoked tokens remain valid at every resource server until they expire. If the access token lifetime is 1 hour, the compromised agent has up to 1 hour of continued access across all its authorized resource servers after revocation.
Mitigation Strategy 1: Short Token Lifetimes
The simplest mitigation: use very short access token lifetimes (5-15 minutes) for agent tokens. A 5-minute token that's been revoked is invalid within 5 minutes of revocation — acceptable for most incident response timelines.
The cost: more frequent token refresh calls. With 1,000 agents each holding a 5-minute token, every agent refreshes once every 5 minutes — 200 token requests per minute, or roughly 3.3 requests per second. This is manageable but about 12 times higher than with 1-hour tokens.
Mitigation Strategy 2: Token Introspection with Shared Caches
Resource servers that need to honor revocation faster than the token expiry allows should call the token introspection endpoint (RFC 7662) to verify token validity on each request. To mitigate the latency of introspection, cache introspection results for a short TTL (1-2 minutes) in a shared cache.
With a 2-minute introspection cache, effective revocation latency is 2 minutes regardless of the access token's declared lifetime — a significant improvement over the worst-case scenario without introspection.
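A minimal sketch of such a cache; the `IntrospectFn` abstraction, the 2-minute default TTL, and the injected clock are illustrative, and a real implementation would POST the token to the authorization server's RFC 7662 introspection endpoint:

```typescript
// Cache RFC 7662 introspection results for a short TTL so resource servers
// can honor revocation quickly without introspecting on every request.
type IntrospectFn = (token: string) => Promise<{ active: boolean }>;

class CachedIntrospector {
  private cache = new Map<string, { active: boolean; cachedAt: number }>();

  constructor(
    private readonly introspect: IntrospectFn,  // real impl calls the AS introspection endpoint
    private readonly ttlMs: number = 120_000,   // 2-minute cache => ~2-minute revocation latency
    private readonly now: () => number = Date.now // injectable clock for testing
  ) {}

  async isActive(token: string): Promise<boolean> {
    const hit = this.cache.get(token);
    if (hit && this.now() - hit.cachedAt < this.ttlMs) return hit.active; // cache hit
    const result = await this.introspect(token); // cache miss: ask the AS
    this.cache.set(token, { active: result.active, cachedAt: this.now() });
    return result.active;
  }
}
```

In a multi-instance resource server, the `Map` would be replaced by a shared cache (e.g., Redis) so all instances observe revocation within the same TTL window.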
Mitigation Strategy 3: Revocation Lists in JWKS
A non-standard but practical approach: publish revoked JTI values in the authorization server's JWKS response (or in a separate revocation feed). Resource servers that parse the JWKS periodically (e.g., every 5 minutes for public key updates) can also consume the revocation feed on the same schedule.
This approach trades exact revocation guarantees for operational simplicity — a revoked token may be used for up to the JWKS poll interval (typically 5 minutes) after revocation, but no token introspection calls are required.
Proof Key for Code Exchange (PKCE) Adaptations for Agent Authentication
PKCE (RFC 7636) was designed to protect authorization code flows against authorization code interception attacks. Its relevance to AI agents lies in its underlying mechanism — a client-side proof of possession — which can be adapted to provide similar protections for agent authentication flows.
PKCE Mechanics
In standard PKCE:
- The client generates a code_verifier (a random 43-128 character string)
- The client computes code_challenge = BASE64URL(SHA256(code_verifier))
- The client sends code_challenge in the authorization request
- After receiving the authorization code, the client sends both the code and the code_verifier to the token endpoint
- The server verifies that SHA256(code_verifier) matches the stored code_challenge
This proves that the client that receives the tokens is the same client that initiated the authorization flow — preventing authorization code injection attacks.
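The verifier and S256 challenge computation are a few lines with Node's built-in crypto module (a sketch; 32 random bytes yields the minimum-length 43-character verifier):

```typescript
import { createHash, randomBytes } from 'node:crypto';

// Generate a PKCE code_verifier: 32 random bytes -> 43-character base64url string.
function generateCodeVerifier(): string {
  return randomBytes(32).toString('base64url');
}

// Derive the S256 code_challenge: BASE64URL(SHA256(code_verifier)).
function codeChallengeS256(codeVerifier: string): string {
  return createHash('sha256').update(codeVerifier).digest('base64url');
}
```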
PKCE-Inspired Proof for Agent Token Requests
For agent authentication scenarios where the agent's identity needs to be proven without a user authorization flow, a PKCE-inspired proof can be derived from the agent's hardware identity or TPM-bound key:
- Agent generates an ephemeral key pair at task start
- Public key is registered with the authorization server as a token binding
- Token requests include a DPoP proof (RFC 9449) demonstrating possession of the corresponding private key
- The private key is ephemeral — exists only for the task duration
This provides agent authenticity guarantees beyond what client credentials alone provide: even if the client secret is stolen, the attacker cannot produce valid DPoP proofs without the private key.
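A minimal sketch of the ephemeral-key idea using Node's built-in crypto with Ed25519. This is illustrative only: a real RFC 9449 DPoP proof is a signed JWT with a typ of dpop+jwt and htm/htu/jti claims, not a raw signature over the request line.

```typescript
import { generateKeyPairSync, sign, verify } from 'node:crypto';

// 1. Agent generates an ephemeral key pair at task start.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');

// 2. The public key would be registered with the authorization server as the
//    token binding (e.g., exported as a JWK); the private key never leaves the agent.

// 3. Each token request carries a signature over request material,
//    proving possession of the ephemeral private key.
function signProof(requestMaterial: string): Buffer {
  return sign(null, Buffer.from(requestMaterial), privateKey);
}

// The authorization server verifies the proof against the registered public key.
function verifyProof(requestMaterial: string, proof: Buffer): boolean {
  return verify(null, Buffer.from(requestMaterial), publicKey, proof);
}
```

Because the key pair exists only for the task duration, a stolen client secret alone is useless: the attacker cannot produce a valid proof for new requests.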
Armalo's Inter-Agent OAuth Implementation
Armalo uses RFC 9068-compatible JWT access tokens for inter-agent authentication. When an orchestrator agent delegates a task to a sub-agent, the orchestrator issues a JWT access token scoped to the sub-task:
- iss: The orchestrator agent's Armalo-registered identifier
- sub: The sub-agent's Armalo-registered identifier
- aud: The services the sub-agent is authorized to call for this task
- scope: Exactly the scopes needed for the sub-task
- authorization_details: Specific resources and access levels
- Armalo-custom claims: pact_id (the behavioral pact authorizing this delegation), task_id (for audit correlation), delegation_depth (to prevent unlimited delegation chains)
Orchestrators sign these delegation tokens with keys registered in their behavioral pacts. Sub-agents validate delegation token signatures against the orchestrator's registered public keys (available via Armalo's JWKS endpoint). This creates a verifiable chain of authority: any resource server receiving a request from a sub-agent can trace the authorization chain back to the original orchestrator's pact and the behavioral commitments that pact carries.
The trust oracle at /api/v1/trust/ exposes OAuth implementation quality metrics for registered agents, including token lifetime configuration, revocation mechanism support, and scope minimization practices — giving enterprise buyers visibility into the credential hygiene practices of agents they're considering deploying.
Token Storage Security: Where Agent Tokens Live Matters as Much as How They're Issued
The OAuth lifecycle doesn't end when a token is issued — the token must be stored somewhere until it's used. How and where tokens are stored determines whether a perfect issuance architecture is undone by a simple storage vulnerability.
The Token Storage Threat Model for AI Agents
AI agent processes face a different token storage threat model than browser-based applications or native mobile apps:
Process memory: Agent tokens stored in process memory are vulnerable to memory dump attacks. In containerized environments, a container escape gives an attacker access to the process's heap, where tokens may reside as strings or objects. This risk is mitigated but not eliminated by container runtime security controls (seccomp, AppArmor, gVisor).
Environment variables: Environment variable storage (ANTHROPIC_API_KEY=... in the agent's environment) is the most common token storage pattern and one of the least secure. Environment variables are visible to all processes running as the same user, logged by many container runtimes (particularly at startup), and often captured in crash dump reports. For short-lived tokens, environment variable storage is acceptable; for long-lived credentials, it's a known anti-pattern.
File system: Token files on the container's writable layer persist across process restarts but are accessible to any process with file system read access in the same container. Files with proper permissions (chmod 600, owned by the agent user) are significantly more secure than world-readable files, but still vulnerable to container escape attacks.
Encrypted vault with runtime decryption: The most secure pattern — tokens are encrypted at rest in a vault, decrypted in memory only when needed for an API call, and not persisted in any storage layer. This requires vault integration code in the agent runtime, but the security improvement is substantial.
Hardware security modules (HSMs) or TPMs: For the highest-security agent deployments, long-lived private keys (used for token signing or client certificate authentication) should be stored in HSMs or TPMs. The private key never leaves the HSM; the agent requests signing operations through the HSM interface. This completely prevents key extraction attacks.
The Runtime Token Lifecycle
For short-lived access tokens (the preferred pattern), the storage concerns are less acute — a 5-minute token is worthless to an attacker who extracts it after 6 minutes. But the client credential (client_id + client_secret) that generates those tokens is long-lived and must be secured:
class SecureTokenProvider {
  private readonly tokenCache: Map<string, { token: string; expiresAt: number }> = new Map();

  constructor(
    private readonly vaultClient: VaultClient,
    private readonly clientCredentialPath: string
  ) {}

  async getAccessToken(scope: string): Promise<string> {
    const cacheKey = scope;
    const cached = this.tokenCache.get(cacheKey);
    if (cached && cached.expiresAt > Date.now() + 30_000) {
      return cached.token; // Use cached if >30 seconds remaining
    }

    // Retrieve client credential from vault (decrypted only for this call)
    const clientCredential = await this.vaultClient.readSecret(this.clientCredentialPath);

    // Exchange for access token
    const tokenResponse = await this.requestClientCredentialsToken({
      clientId: clientCredential.client_id,
      clientSecret: clientCredential.client_secret,
      scope,
    });

    // Cache the access token; never cache the client credential beyond this scope
    this.tokenCache.set(cacheKey, {
      token: tokenResponse.access_token,
      expiresAt: Date.now() + tokenResponse.expires_in * 1000,
    });

    // Explicitly clear the client credential reference
    clientCredential.client_secret = ''; // Encourage GC

    return tokenResponse.access_token;
  }
}
The pattern: retrieve the long-lived client credential from the vault for exactly the duration of the token exchange, then immediately clear the reference. The access token (short-lived) is cached in process memory — acceptable given its short lifetime. The client credential never persists in memory longer than the vault retrieval + token exchange duration.
Authorization Server Selection for Multi-Agent Systems
The choice of authorization server architecture has long-term implications for the scalability and security of the agent fleet's OAuth infrastructure. Three primary patterns apply:
Pattern 1: Shared Third-Party Authorization Server
All agents in the fleet authenticate through a single third-party authorization server (Auth0, Okta, Keycloak, Entra ID). Agents are registered as machine-to-machine applications with appropriate scopes.
Advantages: Minimal infrastructure to operate. Consistent audit trail. Easy token revocation via AS admin console.
Disadvantages: All agents' credentials are in a single third-party system. Rate limits on the AS can affect all agents simultaneously. Limited customization of token issuance logic.
Best for: Small to medium agent fleets (<100 agents) where operational simplicity outweighs customization needs.
Pattern 2: Self-Hosted Authorization Server
The organization operates its own authorization server (Ory Hydra, Spring Authorization Server, or a custom implementation). Agents authenticate through this server.
Advantages: Full control over token issuance logic. No external dependency for token operations. Can implement custom claims, scopes, and authorization flows tailored to the agent fleet.
Disadvantages: Significant infrastructure investment (HA, scaling, key management for the AS itself). Security of the AS becomes a critical internal responsibility. Expertise required to operate correctly.
Best for: Large agent fleets (>500 agents) or organizations with specific compliance requirements that third-party services can't satisfy.
Pattern 3: Per-Agent Authorization Server (Rare, High-Security)
Each major agent class has its own authorization server namespace. Agent classes can only issue tokens to each other's registered clients, preventing cross-class authorization.
Advantages: Blast radius of AS compromise is limited to one agent class. Per-class token issuance policies. Strict boundary enforcement between agent classes.
Disadvantages: Highest operational complexity. Requires federation between ASes for cross-class delegation. Not practical for most organizations.
Best for: Large security-critical deployments with clear agent class boundaries and dedicated security engineering resources.
OAuth Rate Limiting and Back-Pressure in Agent Fleets
A naive agent fleet implementation will generate token request bursts that exceed authorization server rate limits — particularly during fleet startup (when all agents authenticate simultaneously) and during incident recovery (when agents restart after a failure and all need fresh tokens).
The Thundering Herd Problem
If 1,000 agents restart simultaneously (due to a deployment or recovery from a cluster failure), they all attempt to authenticate within seconds of each other. At 100ms per token request, 1,000 simultaneous requests create a 100-second backlog at a single-threaded AS endpoint.
The mitigation: stagger token requests with random jitter at startup (and use exponential backoff for retries):
async function startupAuthentication(): Promise<void> {
  // Spread the fleet's startup authentication across a 60-second window
  const jitterMs = Math.random() * 60_000;
  await new Promise(resolve => setTimeout(resolve, jitterMs));

  const token = await authClient.getClientCredentialsToken(['required:scope']);
  await initializeAgentWithToken(token);
}
Even with 1,000 agents, jittered startup distributes the authentication load across 60 seconds (~17 requests/second) instead of all at once. The authorization server handles 17 requests/second trivially; 1,000 requests/second may exceed its capacity.
Token Refresh Rate Limiting
Implement token refresh with exponential back-off when the authorization server returns rate-limit responses:
async function refreshWithBackoff(
  client: OAuthClient,
  scope: string,
  maxAttempts = 5
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.getClientCredentialsToken([scope]);
    } catch (err: any) {
      if (err.status === 429 || err.status === 503) {
        const backoffMs = Math.min(1000 * Math.pow(2, attempt), 30_000);
        const jitter = Math.random() * backoffMs * 0.2;
        await new Promise(r => setTimeout(r, backoffMs + jitter));
        continue;
      }
      throw err; // Non-retryable error
    }
  }
  throw new Error(`Token refresh failed after ${maxAttempts} attempts`);
}
OAuth for Multi-Agent Authorization Chains
In multi-agent systems, OAuth authorization chains extend across multiple agent-to-agent delegations. An orchestrator agent may delegate to a sub-agent which further delegates to a tool-execution agent. Each step in the chain must be authorized, and the final recipient must be able to verify the entire chain.
Token Chaining Patterns
Pattern 1: Token exchange (RFC 8693)
The orchestrator passes its access token to a token exchange service, requesting a new token scoped for the sub-agent's specific task. The exchanged token preserves the original authorization context (who initiated the chain) while scoping the access to the subtask:
POST /oauth/token
Content-Type: application/x-www-form-urlencoded
grant_type=urn:ietf:params:oauth:grant-type:token-exchange
&subject_token=<orchestrator-access-token>
&subject_token_type=urn:ietf:params:oauth:token-type:access_token
&requested_token_type=urn:ietf:params:oauth:token-type:access_token
&scope=document:read invoice:process
&audience=invoice-processing-service
The exchanged token contains the act claim (actor) and sub claim (subject), providing audit trail continuity across the chain: "sub-agent-B (act) operating on behalf of orchestrator-A (sub) requested document:read."
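Decoded, the exchanged token for the example above might look like this (an illustrative sketch; per RFC 8693 the sub claim stays with the original party while the act claim identifies the current actor, and all values here are placeholders):

```json
{
  "iss": "https://auth.example.com",
  "sub": "urn:agent:orchestrator-A",
  "act": { "sub": "urn:agent:sub-agent-B" },
  "aud": "invoice-processing-service",
  "scope": "document:read invoice:process",
  "exp": 1715003600
}
```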
Pattern 2: Token delegation with delegation_chain claim
A custom approach where each token in the chain includes a delegation chain claim listing all prior delegators:
{
  "sub": "sub-agent-B",
  "iss": "https://auth.example.com",
  "aud": "document-service",
  "scope": "document:read",
  "delegation_chain": [
    "urn:agent:orchestrator-A",
    "urn:agent:sub-agent-B"
  ],
  "pact_id": "pact-abc123",
  "task_id": "task-xyz789",
  "exp": 1714089600
}
The delegation_chain claim lets any resource server verify the full authorization lineage and apply policies based on the chain (e.g., "only allow delegations up to depth 3").
Preventing Delegation Loops and Unbounded Chains
Delegation loops (A delegates to B which delegates back to A) and unbounded chains (A→B→C→D→...→N without limit) are security risks that must be detected and prevented.
Detection mechanisms:
- delegation_depth claim: Each token includes an integer claim tracking the number of delegations. The token exchange service rejects requests where delegation_depth + 1 > MAX_DELEGATION_DEPTH (typically 3-5).
- delegation_chain verification: Before issuing a delegated token, check whether the requesting agent already appears in the chain. If yes, reject with a circular delegation error.
- Time-based chain expiry: The first token in a chain sets an absolute expiry (chain_exp). All subsequent delegations must expire before chain_exp, preventing chain extension as a way to extend effective access beyond the original authorization's intended lifetime.
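The first two checks reduce to a small pure function that a token exchange service could run before issuing the next token in a chain (a sketch; the default depth limit of 3 and the return shape are illustrative):

```typescript
// Validate a proposed delegation before issuing the next token in the chain.
function validateDelegation(
  chain: string[],       // delegation_chain from the presented token
  nextAgent: string,     // agent requesting the delegated token
  maxDepth: number = 3   // MAX_DELEGATION_DEPTH policy
): { ok: boolean; reason?: string } {
  // Loop check: the requester must not already appear in the chain.
  if (chain.includes(nextAgent)) {
    return { ok: false, reason: `circular delegation: ${nextAgent} already in chain` };
  }
  // Depth check: adding this delegation must not exceed the policy limit.
  if (chain.length + 1 > maxDepth) {
    return { ok: false, reason: `chain would exceed max depth ${maxDepth}` };
  }
  return { ok: true };
}
```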
Audit Trail Continuity Across Agent Boundaries
The most challenging OAuth governance problem in multi-agent systems: maintaining audit trail continuity when authorization crosses agent boundaries. If orchestrator-A calls document-service via sub-agent-B's token, the document-service audit log shows sub-agent-B as the requestor. The connection to orchestrator-A requires the delegation_chain or act claim in the token.
Resource servers must be configured to log the full delegation context, not just the immediate requestor:
{
"timestamp": "2026-05-10T14:00:00Z",
"resource": "invoice-12345",
"operation": "read",
"requestor": "sub-agent-B",
"delegation_chain": ["orchestrator-A", "sub-agent-B"],
"task_id": "task-xyz789",
"pact_id": "pact-abc123",
"token_jti": "token-aaabbbccc"
}
This log record allows a forensic investigator asking "who authorized the read of invoice-12345?" to trace the chain back to orchestrator-A and the original task context.
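A resource server can assemble such a record directly from the validated token's claims. This sketch assumes the claim names from the example token earlier in this section (delegation_chain, task_id, pact_id, jti); the function names are illustrative:

```python
import json
import time

def audit_record(token_claims: dict, resource: str, operation: str) -> str:
    """Build an audit log entry that preserves the full delegation
    context from the token, not just the immediate requestor."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "resource": resource,
        "operation": operation,
        "requestor": token_claims["sub"],
        # Fall back to a single-element chain for non-delegated tokens.
        "delegation_chain": token_claims.get("delegation_chain", [token_claims["sub"]]),
        "task_id": token_claims.get("task_id"),
        "pact_id": token_claims.get("pact_id"),
        "token_jti": token_claims.get("jti"),
    }
    return json.dumps(record)
```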
OAuth Token Security Testing for AI Agent Systems
Before deploying an AI agent fleet in production, OAuth security should be tested explicitly. These test cases expose the most common OAuth implementation vulnerabilities in agent systems.
Test Suite: OAuth Security Validation for AI Agents
Test 1 — Token lifetime enforcement: Issue a token, wait for it to expire, then use it. Verify the resource server returns 401. If the resource server accepts the expired token, either the server isn't checking expiry or the token clock is misconfigured.
Test 2 — Scope enforcement at resource servers: Issue a token with scope=document:read. Attempt a write operation (document:write). Verify the resource server returns 403. If the write succeeds, the resource server isn't enforcing scope restrictions.
Test 3 — Token reuse across agents: Capture a valid token issued to Agent A. Present it from Agent B's process (same IP, different process identity). Verify the resource server rejects the request (if DPoP or mTLS binding is implemented) or accepts it (if only bearer token validation is implemented). If bearer-only, assess whether additional controls are needed.
Test 4 — Concurrent refresh serialization: Trigger 100 concurrent token refresh requests for the same scope from the same agent process. Verify that only 1 outbound request reaches the authorization server. Count the AS-side log entries — they should show 1 request, not 100.
Test 5 — Revocation propagation speed: Revoke a valid token at the authorization server. Measure how long before resource servers stop accepting the revoked token. The measured propagation latency determines the maximum exposure window after revocation.
Test 6 — Delegation chain validation: Issue a token with delegation_depth=MAX_DEPTH. Attempt to exchange it for a new token (one more level of delegation). Verify the exchange fails with an appropriate error, confirming that unbounded delegation chains are prevented.
Test 7 — Audience restriction enforcement: Issue a token with aud=service-a.example.com. Present it to service-b.example.com. Verify service-b rejects the token. If service-b accepts a token not issued for its audience, the audience restriction isn't being enforced.
Running these tests before production deployment and as part of a quarterly security regression suite catches OAuth implementation failures before they become incidents.
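To make the intent of these tests concrete, here is a toy in-memory resource server exercising Tests 1, 2, and 7. It is an illustration of the assertions, not a real test harness: production tests would run against live authorization and resource servers over HTTP, and all names here are hypothetical.

```python
import time

def resource_server(token: dict, operation: str, audience: str = "service-a.example.com") -> int:
    """Toy resource server returning HTTP-style status codes."""
    if token["aud"] != audience:
        return 401  # Test 7: audience restriction enforcement
    if token["exp"] <= time.time():
        return 401  # Test 1: token lifetime enforcement
    if operation not in token["scope"].split():
        return 403  # Test 2: scope enforcement
    return 200

def make_token(scope="document:read", aud="service-a.example.com", ttl=300) -> dict:
    return {"scope": scope, "aud": aud, "exp": time.time() + ttl}

# Test 1: an expired token must be rejected with 401
assert resource_server(make_token(ttl=-1), "document:read") == 401
# Test 2: an out-of-scope write must be rejected with 403
assert resource_server(make_token(), "document:write") == 403
# Test 7: a token issued for service-a must be rejected by service-b
assert resource_server(make_token(aud="service-b.example.com"), "document:read") == 401
```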
Conclusion
OAuth token lifecycle management for AI agents requires deliberate re-engineering of patterns designed for human users. The key adaptations:
- Use client credentials flow for service-to-service agent authentication, not authorization code flow
- Centralize refresh token management to prevent concurrent refresh race conditions in agent fleets
- Implement RFC 9068 JWT access tokens for agent identity and authorization information in audit trails
- Keep access token lifetimes short (5-15 minutes) to bound revocation latency
- Use DPoP (RFC 9449) for token binding where agent authenticity guarantees are critical
- Minimize scopes at tool/operation granularity, not agent granularity
- Maintain a revocation propagation strategy that fits your incident response timeline requirements
The investment in proper OAuth lifecycle management pays for itself the first time a compromised agent needs to be isolated from production systems. The difference between a 5-minute containment window (short token lifetime + introspection) and a 1-hour exposure window (long token lifetime, no introspection) can determine whether an incident becomes a headline.