TLO Gateway Routing Specification

Implementation specification for TLO Gateway (:8001) as the River Agents entry point -- JWT validation, context extraction and propagation, route registration with FastAPI precedence rules, ACL enforcement, proxy client configuration, WebSocket channel management, Temporal workflow integration, request/response transformation, and security enforcement.

Architecture Overview

Position in the Request Flow

TLO Gateway (:8001) is the sole entry point from the frontend for all River Agent operations. No frontend request reaches Backend or river-agent directly. The gateway performs three functions on every request in this order:

  1. Authentication -- Validates the JWT, decodes claims, and extracts user context
  2. ACL Enforcement -- Checks the user's permissions against the required permission for the target endpoint
  3. Proxy Routing -- Forwards the request to the appropriate downstream service with injected internal headers

For River Agents, TLO proxies to two downstream targets:

| Target | Port | Scope | Latency Profile |
| --- | --- | --- | --- |
| Backend (rgen-backend) | 8005 | All CRUD, lifecycle, state, audit, trigger management, run history, metrics, settings, approval resolution | Low -- database reads and writes |
| river-agent microservice (rgen-river-agent) | 8007 | LLM-powered agent generation from natural language; internal runtime execution calls from Temporal | High -- LLM inference |

Security invariant: Backend (:8005) and river-agent (:8007) trust all X-* context headers from TLO without independent JWT validation. They assume TLO has already validated the token. Network-level controls (VPC, security groups) must ensure these services are only reachable from TLO Gateway.

Proxy Topology

Comparison with Existing TLO Subsystems

River Agents follow the same architectural patterns as existing TLO subsystems:

| Subsystem | Route Prefix | Target(s) | Workflow Engine | WebSocket |
| --- | --- | --- | --- | --- |
| PSA (Prompt Studio AI) | /api/v1/prompt-studio/** | Backend :8005, PSA :8010 | Temporal (psa-execute-{id}) | /ws/executions/{id} |
| Data Sources | /api/v1/data-sources/** | Backend :8005, Data Orchestration :8002 | Temporal (test, discover) | None |
| Automation | /api/v1/automation/** | Automation Service :8007 | None | None |
| Model Studio | /api/v1/models/** | Backend :8005 | None | None |
| River Agents | /api/v1/agents/** | Backend :8005, river-agent :8007 | Temporal (agent-run-{id}, agent-schedule-{id}) | /ws/agents/{id} |

Key difference from other subsystems: River Agents require split routing within the same /agents/ prefix -- most routes go to Backend, but /agents/ai/generate targets the river-agent microservice. Specific lifecycle routes trigger Temporal workflows directly from TLO. This is analogous to how /data-sources/{id}/test routes through Temporal to Data Orchestration while other /data-sources/** routes go directly to Backend.


JWT Authentication and Context Propagation

JWT Validation Flow

Validation steps in order:

  1. Token presence -- missing Authorization header returns 401
  2. Token signature -- invalid signature returns 401
  3. Token expiry -- expired token returns 401
  4. Required claims -- missing org_id or workspace_id returns 401
  5. is_active check -- disabled accounts return 401
  6. RBAC check -- valid token with insufficient permissions returns 403
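
The ordering above can be sketched as a single function over already-decoded claims. This is a minimal illustration, not the gateway's implementation: the name authorize, the use of None to stand for a missing or badly signed token, and the invalid_token code for steps 4 and 5 are assumptions. The admin bypass follows the rule stated under Permission Definitions.

```python
from typing import Optional


def authorize(
    claims: Optional[dict], required_permission: str, now: float
) -> tuple[int, Optional[str]]:
    """Apply the validation steps in order; return (http_status, error_code)."""
    # Steps 1-2: None stands in for a missing token or a failed signature
    # check, which a JWT library reports before any claims are available.
    if claims is None:
        return 401, "invalid_token"
    # Step 3: token expiry.
    if claims.get("exp", 0) <= now:
        return 401, "expired_token"
    # Step 4: required tenant claims (org_id may arrive as organization_id).
    org_id = claims.get("org_id", claims.get("organization_id"))
    if org_id is None or claims.get("workspace_id") is None:
        return 401, "invalid_token"  # assumed error code
    # Step 5: disabled accounts.
    if not claims.get("is_active", False):
        return 401, "invalid_token"  # assumed error code
    # Step 6: RBAC. The admin role bypasses the permission check.
    if "admin" in claims.get("roles", []):
        return 200, None
    if required_permission not in claims.get("permissions", []):
        return 403, "permission_denied"
    return 200, None


valid = {
    "exp": 2_000, "org_id": 1, "workspace_id": 7, "is_active": True,
    "roles": ["editor"], "permissions": ["agent:view"],
}
print(authorize(valid, "agent:view", now=1_000))    # (200, None)
print(authorize(valid, "agent:deploy", now=1_000))  # (403, 'permission_denied')
```

Because the 401 checks all run before the RBAC check, a caller with an expired token never learns whether the permission would have been granted.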

JWTPayload Extraction

TLO decodes the JWT into a JWTPayload dataclass. The following claims are extracted:

| JWT Claim | JWTPayload Field | Type | Notes |
| --- | --- | --- | --- |
| sub | sub | str | Standard JWT subject |
| user_id or sub | user_id | int | Custom RiverGen claim |
| email | email | str | User email |
| org_id or organization_id | organization_id | int | Tenant identifier |
| workspace_id | workspace_id | int | Workspace scope |
| roles | roles | list[str] | e.g., ["admin"], ["editor"] |
| permissions | permissions | list[str] | e.g., ["agent:view", "agent:create"] |
| is_active | is_active | bool | Disabled accounts are rejected at step 5 above |
| session_id | session_id | str | Session tracking |
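
A sketch of how the claim fallbacks in this table might be applied. The field names come from the table; the helper payload_from_claims and the defaults chosen for absent optional claims are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class JWTPayload:
    sub: str
    user_id: int
    email: str
    organization_id: int
    workspace_id: int
    roles: list[str]
    permissions: list[str]
    is_active: bool
    session_id: str


def payload_from_claims(claims: dict) -> JWTPayload:
    """Map decoded JWT claims onto JWTPayload (hypothetical helper)."""
    return JWTPayload(
        sub=str(claims["sub"]),
        # user_id falls back to sub when the custom claim is absent.
        user_id=int(claims.get("user_id", claims["sub"])),
        email=claims.get("email", ""),
        # Either org_id or organization_id may carry the tenant identifier.
        organization_id=int(claims.get("org_id", claims.get("organization_id"))),
        workspace_id=int(claims["workspace_id"]),
        roles=list(claims.get("roles", [])),
        permissions=list(claims.get("permissions", [])),
        # Treat a missing flag as inactive so the account is rejected (assumption).
        is_active=bool(claims.get("is_active", False)),
        session_id=claims.get("session_id", ""),
    )


payload = payload_from_claims({
    "sub": "42", "email": "user@example.com", "organization_id": 3,
    "workspace_id": 7, "roles": ["editor"], "permissions": ["agent:view"],
    "is_active": True, "session_id": "sess-1",
})
print(payload.user_id, payload.organization_id)  # 42 3
```

The helper assumes the required-claims check has already passed, so a missing tenant claim surfaces as an exception here rather than as a 401.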

Context Headers Injected by TLO

After authentication, TLO injects the following headers into every proxied request. The target service must treat these as authoritative. Any client-supplied versions of these headers are overwritten.

| Header | Value Source | Purpose |
| --- | --- | --- |
| X-User-ID | user.user_id | Identity of the requesting user |
| X-Org-ID | user.organization_id | Tenant isolation -- all queries MUST filter by this value |
| X-Organization-ID | user.organization_id | Alias for backward compatibility |
| X-Workspace-ID | user.workspace_id | Workspace scoping |
| X-Email | user.email | User email for audit logs |
| X-Roles | user.roles (comma-separated) | Role list for downstream authorization decisions |
| X-Session-ID | user.session_id | Session tracking |
| X-Request-ID | Generated UUID v4 or propagated | Distributed tracing -- logged by all services |
| X-Trace-ID | Propagated from client or generated | OpenTelemetry trace correlation |
| X-Internal-Call | "true" | Marks the request as TLO-originated (service-to-service) |

For agent-related requests, two additional headers are injected:

| Header | Value Source | Purpose |
| --- | --- | --- |
| X-Agent-ID | {id} from request path | Identifies the target agent for audit and ACL scoping |
| X-Execution-ID | {execution_id} from request path | Identifies the target execution (for run and approval routes) |
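
The overwrite rule can be illustrated with a small header builder. This is a sketch under assumptions: build_proxy_headers is a hypothetical name, the user is passed as a plain dict, and a generated trace ID is shown as 32 hex characters. Only X-Request-ID and X-Trace-ID are propagated from the client; every other context header is replaced.

```python
import uuid
from typing import Optional


def build_proxy_headers(
    client_headers: dict,
    user: dict,
    agent_id: Optional[str] = None,
    execution_id: Optional[str] = None,
) -> dict:
    """Return downstream headers with TLO-owned context headers injected."""
    by_lower = {k.lower(): v for k, v in client_headers.items()}
    injected = {
        "X-User-ID": str(user["user_id"]),
        "X-Org-ID": str(user["organization_id"]),
        "X-Organization-ID": str(user["organization_id"]),
        "X-Workspace-ID": str(user["workspace_id"]),
        "X-Email": user["email"],
        "X-Roles": ",".join(user["roles"]),
        "X-Session-ID": user["session_id"],
        # Propagate tracing IDs when the client sent them, otherwise generate.
        "X-Request-ID": by_lower.get("x-request-id") or str(uuid.uuid4()),
        "X-Trace-ID": by_lower.get("x-trace-id") or uuid.uuid4().hex,
        "X-Internal-Call": "true",
    }
    if agent_id:
        injected["X-Agent-ID"] = agent_id
    if execution_id:
        injected["X-Execution-ID"] = execution_id
    # Drop client copies of every TLO-owned header, including the agent
    # headers when TLO has no value of its own to inject.
    owned = {name.lower() for name in injected} | {"x-agent-id", "x-execution-id"}
    passthrough = {k: v for k, v in client_headers.items() if k.lower() not in owned}
    return {**passthrough, **injected}


user = {"user_id": 42, "organization_id": 3, "workspace_id": 7,
        "email": "user@example.com", "roles": ["editor"], "session_id": "sess-1"}
headers = build_proxy_headers({"X-Org-ID": "999", "Accept": "application/json"}, user)
print(headers["X-Org-ID"])  # 3
```

Header names are compared case-insensitively, so a client cannot slip a spoofed value past the filter by changing the casing.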

RequestContextData Structure

The RequestContextData dataclass (from middleware/context.py) is attached to request.state.context on every request:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class RequestContextData:
    request_id: str  # UUID v4, generated or propagated from X-Request-ID
    timestamp: datetime  # UTC request start time
    trace_id: str | None  # Distributed tracing ID
    span_id: str | None  # Current span ID
    parent_span_id: str | None  # Parent span ID
    organization_id: int | None  # From X-Org-ID header or JWT
    workspace_id: int | None  # From X-Workspace-ID header or path
    client_ip: str | None  # From X-Forwarded-For, X-Real-IP, or direct connection
    user_agent: str | None  # From User-Agent header
    user_id: int | None  # Set after authentication

Route Registration

Routing Pattern and FastAPI Precedence

TLO uses FastAPI's @router.api_route with wildcard path parameters to proxy entire resource trees:

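from fastapi import Depends, Request

@router.api_route(
    "/{resource}/{path:path}",
    methods=["GET", "POST", "PUT", "PATCH", "DELETE"],
)
async def proxy_routes(
    request: Request, resource: str, path: str, user=Depends(get_current_user)
):
    # resource and path are both path parameters, so both must be declared.
    return await proxy_to_backend(request, f"/api/v1/{resource}/{path}")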

FastAPI matches routes in registration order and dispatches to the first route that matches, so a specific route takes precedence over the wildcard catch-all only if it is registered first. River Agent routes must be declared in this order:

  1. Specific routes that target the river-agent service (/agents/ai/generate, /agents/ai/suggest-goal, /agents/ai/refine)
  2. Specific POST routes with Temporal workflow side-effects (/agents/{id}/runs, /agents/{id}/deploy)
  3. Approval resolution routes with Temporal signals (PATCH /agents/approvals/{id})
  4. Run control routes with Temporal cancellation (/agents/runs/{execution_id}/stop, /agents/runs/{execution_id}/retry)
  5. Wildcard catch-all for remaining CRUD (/agents/{path:path}, /templates/{path:path}, etc.)
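
To see why the order matters, the sketch below imitates first-match dispatch with a small regex matcher. It is not FastAPI code: compile_pattern, resolve, and the target labels are hypothetical, and only the two path-parameter forms used in this document are handled.

```python
import re
from typing import Optional


def compile_pattern(template: str) -> re.Pattern:
    """Translate a FastAPI-style path template into an anchored regex."""
    regex = ""
    for part in re.split(r"(\{[^}]+\})", template):
        if part.startswith("{") and part.endswith("}"):
            # {name:path} spans slashes; a plain {name} stops at the next slash.
            regex += "(.*)" if part.endswith(":path}") else "([^/]+)"
        else:
            regex += re.escape(part)
    return re.compile(f"^{regex}$")


def resolve(routes: list, method: str, path: str) -> Optional[str]:
    """Return the target of the first route whose method and path both match."""
    for methods, template, target in routes:
        if method in methods and compile_pattern(template).match(path):
            return target
    return None


ALL = {"GET", "POST", "PUT", "PATCH", "DELETE"}
ROUTES = [
    ({"POST"}, "/api/v1/agents/ai/generate", "river_agent"),
    ({"POST"}, "/api/v1/agents/{id}/runs", "temporal"),
    (ALL, "/api/v1/agents/{path:path}", "backend"),  # catch-all goes last
]

print(resolve(ROUTES, "POST", "/api/v1/agents/ai/generate"))                  # river_agent
print(resolve(list(reversed(ROUTES)), "POST", "/api/v1/agents/ai/generate"))  # backend
```

With the catch-all registered first, the generation request would silently go to Backend instead of river-agent, which is the failure the ordering rule prevents.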

Route Precedence Decision Tree

Complete Route Registration Table

Agent Management (Backend :8005)

| Method | Path | Required Permission | Notes |
| --- | --- | --- | --- |
| GET | /api/v1/agents | agent:view | List with search, filter, pagination |
| POST | /api/v1/agents | agent:create | Create agent in draft status |
| GET | /api/v1/agents/{id} | agent:view | Full agent detail |
| PUT | /api/v1/agents/{id} | agent:update | Update config; auto-versions if deployed |
| DELETE | /api/v1/agents/{id} | agent:delete | Soft-delete |
| POST | /api/v1/agents/{id}/validate | agent:deploy | Pre-deployment validation only |
| POST | /api/v1/agents/{id}/deploy | agent:deploy | Deploy and activate agent; confirm = yes |
| POST | /api/v1/agents/{id}/pause | agent:deploy | Suspend triggers |
| POST | /api/v1/agents/{id}/resume | agent:deploy | Reactivate triggers |
| POST | /api/v1/agents/{id}/archive | agent:delete | Archive; confirm = yes |
| GET | /api/v1/agents/{id}/versions | agent:view | List versions |
| GET | /api/v1/agents/{id}/versions/{version_id} | agent:view | Get single version |
| GET | /api/v1/agents/{id}/versions/diff | agent:view | Diff two versions |
| POST | /api/v1/agents/{id}/versions/{version_id}/rollback | agent:deploy | Rollback; confirm = yes |

Agent Configuration Sub-Resources (Backend :8005)

| Method | Path | Required Permission |
| --- | --- | --- |
| GET | /api/v1/agents/{id}/triggers | agent:view |
| POST | /api/v1/agents/{id}/triggers | agent:update |
| PUT | /api/v1/agents/{id}/triggers/{trigger_id} | agent:update |
| DELETE | /api/v1/agents/{id}/triggers/{trigger_id} | agent:update |
| GET | /api/v1/agents/{id}/data-sources | agent:view |
| POST | /api/v1/agents/{id}/data-sources | agent:update |
| DELETE | /api/v1/agents/{id}/data-sources/{binding_id} | agent:update |
| GET | /api/v1/agents/{id}/tools | agent:view |
| PUT | /api/v1/agents/{id}/tools | agent:update |
| GET | /api/v1/agents/{id}/policies | agent:view |
| POST | /api/v1/agents/{id}/policies | agent:update |
| PUT | /api/v1/agents/{id}/policies/{policy_id} | agent:update |
| DELETE | /api/v1/agents/{id}/policies/{policy_id} | agent:update |

AI Generation (river-agent :8007)

These routes must be registered before the /agents/{path:path} wildcard catch-all.

| Method | Path | Required Permission | Target |
| --- | --- | --- | --- |
| POST | /api/v1/agents/ai/generate | agent:create | river-agent :8007 (SSE stream) |
| POST | /api/v1/agents/ai/suggest-goal | agent:create | river-agent :8007 |
| POST | /api/v1/agents/ai/refine | agent:create | river-agent :8007 |

Templates (Backend :8005)

| Method | Path | Required Permission |
| --- | --- | --- |
| GET | /api/v1/templates | agent:view |
| GET | /api/v1/templates/{id} | agent:view |
| POST | /api/v1/templates | agent:create |
| PATCH | /api/v1/templates/{id} | agent:update |
| DELETE | /api/v1/templates/{id} | agent:delete |

Executions and Runs (Backend :8005 + Temporal)

| Method | Path | Required Permission | Routing Note |
| --- | --- | --- | --- |
| POST | /api/v1/agents/{id}/runs | agent:execute | Starts Temporal workflow agent-run-{execution_id}; returns 202 |
| GET | /api/v1/agents/runs | agent:view | Backend direct |
| GET | /api/v1/agents/{id}/runs | agent:view | Backend direct |
| GET | /api/v1/agents/runs/{execution_id} | agent:view | Backend direct |
| GET | /api/v1/agents/runs/{execution_id}/logs | agent:view | Backend direct |
| POST | /api/v1/agents/runs/{execution_id}/stop | agent:execute | Sends Temporal cancellation signal |
| POST | /api/v1/agents/runs/{execution_id}/retry | agent:execute | Starts new Temporal workflow with same context |

Inbound Triggers (Backend :8005)

These routes accept traffic from external systems (webhooks, API callers). They do not require a user JWT -- they authenticate via agent API keys validated at TLO.

| Method | Path | Auth Method | Notes |
| --- | --- | --- | --- |
| POST | /api/v1/agent-webhooks/{agent_id} | Agent API key | Event-based trigger from external webhook |
| POST | /api/v1/agent-api/{agent_id}/execute | Agent API key | Authenticated API trigger from external systems |

Approvals (Backend :8005 + Temporal)

| Method | Path | Required Permission | Routing Note |
| --- | --- | --- | --- |
| GET | /api/v1/agents/approvals | agent:approve | Backend direct |
| GET | /api/v1/agents/approvals/{id} | agent:approve | Backend direct |
| GET | /api/v1/agents/{id}/approvals | agent:view | Backend direct |
| PATCH | /api/v1/agents/approvals/{id} | agent:approve | Backend writes decision; TLO signals Temporal workflow |
| POST | /api/v1/agents/approvals/{id}/expire | agent:admin | Force-expire pending approval (Admin only) |

Metrics, Monitoring, and Audit (Backend :8005)

| Method | Path | Required Permission |
| --- | --- | --- |
| GET | /api/v1/agents/{id}/metrics | agent:view |
| GET | /api/v1/agents/stats | agent:view |
| GET | /api/v1/agents/monitoring/summary | agent:monitor |
| GET | /api/v1/agents/monitoring/throughput | agent:monitor |
| GET | /api/v1/agents/monitoring/cluster | agent:monitor |
| GET | /api/v1/agents/monitoring/alerts | agent:monitor |
| GET | /api/v1/audit | agent:audit |
| GET | /api/v1/audit/{entry_id} | agent:audit |
| POST | /api/v1/audit/export | agent:audit |

Settings (Backend :8005)

| Method | Path | Required Permission |
| --- | --- | --- |
| GET | /api/v1/workspace/settings | agent:view |
| PATCH | /api/v1/workspace/settings | agent:update |
| GET | /api/v1/workspace/settings/api-keys | agent:admin |
| POST | /api/v1/workspace/settings/api-keys | agent:admin |
| DELETE | /api/v1/workspace/settings/api-keys/{key_id} | agent:admin |
| POST | /api/v1/workspace/settings/webhooks | agent:admin |
| DELETE | /api/v1/workspace/settings/webhooks/{webhook_id} | agent:admin |

Emergency Controls (Backend :8005)

| Method | Path | Required Permission |
| --- | --- | --- |
| POST | /api/v1/governance/emergency/pause-all | agent:admin |
| POST | /api/v1/governance/emergency/policy | agent:admin |

Wildcard Catch-All (Backend :8005)

These must be declared last in the route file.

| Pattern | Methods | Notes |
| --- | --- | --- |
| /api/v1/agents/{path:path} | GET, POST, PUT, PATCH, DELETE | Remaining /agents/** routes |
| /api/v1/templates/{path:path} | GET, POST, PUT, PATCH, DELETE | Remaining template routes |
| /api/v1/agent-webhooks/{path:path} | POST | Remaining webhook routes |
| /api/v1/agent-api/{path:path} | POST | Remaining API trigger routes |

WebSocket Routes

| Path | Required Permission | Auth Method | Notes |
| --- | --- | --- | --- |
| /ws/agents/{execution_id} | agent:view | Sec-WebSocket-Protocol: Bearer {jwt} | Per-execution live stream |
| /ws/agents/workspace | agent:view | Sec-WebSocket-Protocol: Bearer {jwt} | Workspace-wide activity feed |
| /ws/monitoring | agent:monitor | Sec-WebSocket-Protocol: Bearer {jwt} | Infrastructure health stream |
| /ws/approvals | agent:approve | Sec-WebSocket-Protocol: Bearer {jwt} | Pending approval notifications |

Configuration Additions for river-agent Service

Add to services/tlo_gateway/config.py:

# River Agent Service
river_agent_service_url: str = "http://localhost:8007"
river_agent_timeout_seconds: int = 120
river_agent_retry_count: int = 2

Add to middleware/acl.py get_service_url():

"river_agent": settings.river_agent_service_url,

ACL Permission Matrix

Permission Definitions

River Agents introduce 10 permissions in the agent: namespace:

| Permission | Scope | Description |
| --- | --- | --- |
| agent:view | Read | View agents, templates, runs, logs, metrics, and settings |
| agent:create | Write | Create agents, bind data sources and tools, create from template |
| agent:update | Write | Update agent config, triggers, policies, notification settings |
| agent:delete | Write | Soft-delete (archive) agents |
| agent:deploy | Write | Deploy, pause, resume, validate agents; manage lifecycle transitions |
| agent:execute | Write | Trigger manual runs, retry failed runs, send stop signals |
| agent:approve | Write | Approve, reject, or edit-approve pending approval requests |
| agent:audit | Read | View audit logs and execution traces |
| agent:monitor | Read | View monitoring metrics and system health |
| agent:admin | Admin | Manage workspace settings, API keys, webhooks, emergency controls |

Admin bypass: Users with the admin role in their JWT bypass all ACL checks, consistent with the existing check_tool_acl() behavior.

Full Endpoint-to-Permission Matrix

| Endpoint | Method | Required Permission | Confirm |
| --- | --- | --- | --- |
| /api/v1/agents | GET | agent:view | No |
| /api/v1/agents | POST | agent:create | No |
| /api/v1/agents/{id} | GET | agent:view | No |
| /api/v1/agents/{id} | PUT | agent:update | No |
| /api/v1/agents/{id} | DELETE | agent:delete | Yes |
| /api/v1/agents/{id}/validate | POST | agent:deploy | No |
| /api/v1/agents/{id}/deploy | POST | agent:deploy | Yes |
| /api/v1/agents/{id}/pause | POST | agent:deploy | No |
| /api/v1/agents/{id}/resume | POST | agent:deploy | No |
| /api/v1/agents/{id}/archive | POST | agent:delete | Yes |
| /api/v1/agents/{id}/versions | GET | agent:view | No |
| /api/v1/agents/{id}/versions/{vid}/rollback | POST | agent:deploy | Yes |
| /api/v1/agents/ai/generate | POST | agent:create | No |
| /api/v1/agents/ai/suggest-goal | POST | agent:create | No |
| /api/v1/agents/ai/refine | POST | agent:create | No |
| /api/v1/templates | GET | agent:view | No |
| /api/v1/templates | POST | agent:create | No |
| /api/v1/templates/{id} | PATCH | agent:update | No |
| /api/v1/templates/{id} | DELETE | agent:delete | Yes |
| /api/v1/agents/{id}/triggers | GET | agent:view | No |
| /api/v1/agents/{id}/triggers | POST | agent:update | No |
| /api/v1/agents/{id}/triggers/{tid} | PUT | agent:update | No |
| /api/v1/agents/{id}/triggers/{tid} | DELETE | agent:update | No |
| /api/v1/agents/{id}/runs | POST | agent:execute | No |
| /api/v1/agents/runs | GET | agent:view | No |
| /api/v1/agents/runs/{id} | GET | agent:view | No |
| /api/v1/agents/runs/{id}/stop | POST | agent:execute | No |
| /api/v1/agents/runs/{id}/retry | POST | agent:execute | No |
| /api/v1/agents/runs/{id}/logs | GET | agent:view | No |
| /api/v1/agent-webhooks/{agent_id} | POST | Agent API key | No |
| /api/v1/agent-api/{agent_id}/execute | POST | Agent API key | No |
| /api/v1/agents/approvals | GET | agent:approve | No |
| /api/v1/agents/approvals/{id} | GET | agent:approve | No |
| /api/v1/agents/approvals/{id} | PATCH | agent:approve | No |
| /api/v1/agents/approvals/{id}/expire | POST | agent:admin | No |
| /api/v1/agents/{id}/metrics | GET | agent:view | No |
| /api/v1/agents/monitoring/summary | GET | agent:monitor | No |
| /api/v1/agents/monitoring/throughput | GET | agent:monitor | No |
| /api/v1/agents/monitoring/cluster | GET | agent:monitor | No |
| /api/v1/agents/monitoring/alerts | GET | agent:monitor | No |
| /api/v1/workspace/settings | GET | agent:view | No |
| /api/v1/workspace/settings | PATCH | agent:update | No |
| /api/v1/workspace/settings/api-keys | GET, POST | agent:admin | No |
| /api/v1/workspace/settings/api-keys/{key_id} | DELETE | agent:admin | Yes |
| /api/v1/workspace/settings/webhooks | POST | agent:admin | No |
| /api/v1/governance/emergency/pause-all | POST | agent:admin | Yes |

Role-to-Permission Mapping

River Agents RBAC operates on two tiers. Both tiers are enforced; a user must satisfy the relevant tier for the requested operation.

Organization-level roles:

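| Permission | Org Admin | Org Editor | Org Viewer |
| --- | --- | --- | --- |
| agent:view | Yes | Yes | Yes |
| agent:create | Yes | Yes | No |
| agent:update | Yes | Yes | No |
| agent:delete | Yes | No | No |
| agent:deploy | Yes | Yes | No |
| agent:execute | Yes | Yes | No |
| agent:approve | Yes | Yes | No |
| agent:audit | Yes | No | No |
| agent:monitor | Yes | No | No |
| agent:admin | Yes | No | No |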

Workspace-level roles:

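| Permission | WS Admin | WS Editor | WS Analyst | WS Viewer | WS Auditor |
| --- | --- | --- | --- | --- | --- |
| agent:view | Yes | Yes | Yes | Yes | Yes |
| agent:create | Yes | Yes | No | No | No |
| agent:update | Yes | Yes | No | No | No |
| agent:delete | Yes | No | No | No | No |
| agent:deploy | Yes | Yes | No | No | No |
| agent:execute | Yes | Yes | Yes | No | No |
| agent:approve | Yes | Yes | No | No | No |
| agent:audit | Yes | No | No | No | Yes |
| agent:monitor | Yes | No | Yes | No | Yes |
| agent:admin | Yes | No | No | No | No |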

TOOL_ACL_MAP Entries for Agent Execution

When River Agents execute tools during their agentic loop via TLO's ACL proxy, the existing TOOL_ACL_MAP in middleware/acl.py applies. The agent runs with the triggering user's permissions -- see Agent Execution Security Model. No new entries are required for agent-invoked platform tools (e.g., execute_query still maps to data_source:query). However, TLO must pass X-Agent-ID alongside the existing user context headers so downstream services can distinguish agent-initiated requests from direct user requests in audit logs.

See Appendix B: TOOL_ACL_MAP Additions for the internal activity tools used by Temporal workflow activities.


Proxy Client Configuration

Circuit Breaker Settings

River Agents require separate circuit breakers for each downstream target:

| Circuit Breaker | Target | Failure Threshold | Recovery Timeout | Rationale |
| --- | --- | --- | --- | --- |
| backend | Backend :8005 | 5 consecutive failures | 30 seconds | Shared with all TLO routes; conservative threshold |
| river_agent | river-agent :8007 | 3 consecutive failures | 60 seconds | LLM service may have cold starts; longer recovery window |

Circuit breaker state machine:

CLOSED    --[failure_count >= threshold]--> OPEN
OPEN      --[recovery_timeout elapsed]-->   HALF_OPEN
HALF_OPEN --[success]-->                    CLOSED
HALF_OPEN --[failure]-->                    OPEN

When the circuit is OPEN, TLO returns 503 Service Unavailable immediately without attempting the downstream call.
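
A minimal sketch of this state machine follows. The class and method names are assumptions, as is the injectable clock, which is there only to make the example deterministic. The thresholds in the usage lines are the river_agent values from the table.

```python
import time
from typing import Callable


class CircuitBreaker:
    """CLOSED -> OPEN -> HALF_OPEN -> CLOSED, counting consecutive failures."""

    def __init__(
        self,
        failure_threshold: int,
        recovery_timeout: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "CLOSED"

    def allow_request(self) -> bool:
        """Return False while OPEN; the caller then answers 503 immediately."""
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.recovery_timeout:
                return False
            self.state = "HALF_OPEN"  # recovery timeout elapsed: allow a probe
        return True

    def record_success(self) -> None:
        self._failures = 0
        self.state = "CLOSED"

    def record_failure(self) -> None:
        self._failures += 1
        # A failed probe reopens at once; otherwise open at the threshold.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()


now = [0.0]
breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=60, clock=lambda: now[0])
for _ in range(3):
    breaker.record_failure()
print(breaker.state, breaker.allow_request())  # OPEN False
now[0] = 60.0
print(breaker.allow_request(), breaker.state)  # True HALF_OPEN
```

This sketch lets every request through while HALF_OPEN; a production breaker would usually limit that state to a single probe.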

Retry Policy

| Error Condition | Retry? | Max Retries | Backoff | Applied To |
| --- | --- | --- | --- | --- |
| HTTP 502 Bad Gateway | Yes | 3 | Exponential (0.5s, 1s, 2s) | All routes |
| HTTP 503 Service Unavailable | Yes | 3 | Exponential | All routes |
| HTTP 504 Gateway Timeout | Yes | 3 | Exponential | All routes |
| HTTP 429 Rate Limited | Yes | 2 | Fixed (1s) | All routes |
| httpx.ConnectError | Yes | 3 | Exponential | All routes |
| httpx.TimeoutException | Yes | 1 | None | CRUD routes only |
| HTTP 4xx client errors | No | -- | -- | Never retry |
| HTTP 5xx other server errors | No | -- | -- | Only 502/503/504 retry |
| Execution trigger (POST .../runs) | No | -- | -- | Not idempotent -- never retry |
| Approval resolution (PATCH .../approvals/{id}) | No | -- | -- | Not idempotent -- never retry |

No retries on execute and approve routes: A retry on a successful-but-slow run trigger would start a duplicate execution. The client is responsible for retry logic with idempotency keys where required.
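
The status-code rows of the retry table can be expressed as one decision function. This sketch covers HTTP statuses only (the httpx exception rows are left out), and the names retry_delay and is_non_idempotent, along with the path tests used to spot run triggers and approval resolutions, are assumptions.

```python
from typing import Optional

# Maximum retries per retryable status, taken from the table above.
MAX_RETRIES = {502: 3, 503: 3, 504: 3, 429: 2}


def is_non_idempotent(method: str, path: str) -> bool:
    """Run triggers and approval resolutions must never be retried."""
    return (method == "POST" and path.endswith("/runs")) or (
        method == "PATCH" and "/approvals/" in path
    )


def retry_delay(method: str, path: str, status: int, attempt: int) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based), or None to stop."""
    if is_non_idempotent(method, path):
        return None
    allowed = MAX_RETRIES.get(status)
    if allowed is None or attempt > allowed:
        return None
    if status == 429:
        return 1.0  # fixed backoff for rate limiting
    return 0.5 * 2 ** (attempt - 1)  # exponential: 0.5s, 1s, 2s


print([retry_delay("GET", "/api/v1/agents", 502, n) for n in (1, 2, 3, 4)])
# [0.5, 1.0, 2.0, None]
print(retry_delay("POST", "/api/v1/agents/abc/runs", 503, 1))  # None
```

Checking the route before the status keeps the non-idempotent rule from being bypassed by a status that would otherwise be retryable.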

Timeout Configuration

| Route Category | Connect Timeout | Read Timeout | Total Timeout | Rationale |
| --- | --- | --- | --- | --- |
| CRUD (list, get, create, update, delete) | 5s | 10s | 15s | Standard database operations |
| Lifecycle (deploy, pause, resume, archive) | 5s | 15s | 20s | May trigger validation and trigger registration |
| Validation (/validate) | 5s | 30s | 35s | Connectivity tests to data sources |
| AI Generation (/ai/generate) | 5s | 120s | 125s | LLM inference; SSE stream may run for the full duration |
| Manual Run (/agents/{id}/runs) | 5s | 10s | 15s | Only starts the Temporal workflow; does not wait for completion |
| Execution Activity (Temporal to river-agent) | 5s | 300s | 305s | Full agentic loop; up to 15 turns |
| Approval Resolution (PATCH .../approvals/{id}) | 5s | 10s | 15s | Temporal signal dispatch; near-instant |

Health Check Endpoints

| Service | Health Endpoint | Expected Response | Failure Mode |
| --- | --- | --- | --- |
| Backend :8005 | GET /health | HTTP 200 | "backend": "failed" -- critical; /ready returns unhealthy |
| river-agent :8007 | GET /health | HTTP 200 | "river_agent": "failed" -- non-critical; CRUD and lifecycle still work |

river-agent is not critical for readiness. If it is unavailable, only AI generation and runtime LLM execution are degraded. CRUD, lifecycle, and approval operations continue normally.


WebSocket Integration

Dedicated Endpoint

River Agents use a dedicated WebSocket endpoint: /ws/agents/{execution_id}.

This is separate from PSA's /ws/executions/{execution_id} because River Agent execution messages have different semantics -- approval notifications, governance gate events, turn-by-turn reasoning with action level enforcement, and long-running execution spans that may pause for hours at an approval gate. A dedicated endpoint allows:

  • Agent-specific message type definitions
  • Agent API key authentication in addition to user JWT (for external system integrations)
  • Independent connection management and per-execution message queuing

WebSocket Authentication

WebSocket connections authenticate via the Sec-WebSocket-Protocol header. This is the browser-compatible mechanism: browsers cannot set arbitrary headers on a WebSocket handshake, but the protocols parameter of the WebSocket() constructor populates this header:

GET /ws/agents/exec-001-uuid HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Protocol: Bearer eyJhbGci...

TLO validates the token identically to HTTP request tokens and injects context headers into the forwarded upgrade request. The Backend WebSocket service never receives the client's raw token.

Queued messages: if the Temporal workflow has already emitted events before the WebSocket connection is established, those messages are queued in the ConnectionManager and flushed immediately upon connection.
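
The queue-and-flush behavior can be sketched as follows. This is an illustration rather than the actual ConnectionManager: it tracks one socket per execution, the bounded queue size is an assumption, and FakeSocket stands in for a real WebSocket object exposing an async send_json method.

```python
import asyncio
from collections import defaultdict, deque


class ConnectionManager:
    """Queue messages per execution until a client connects, then flush."""

    def __init__(self, max_queued: int = 100) -> None:
        self._sockets: dict = {}
        # Bounded queues so an execution nobody watches cannot grow unbounded.
        self._pending = defaultdict(lambda: deque(maxlen=max_queued))

    async def push(self, execution_id: str, message: dict) -> None:
        socket = self._sockets.get(execution_id)
        if socket is None:
            self._pending[execution_id].append(message)
        else:
            await socket.send_json(message)

    async def connect(self, execution_id: str, socket) -> None:
        self._sockets[execution_id] = socket
        await socket.send_json({"type": "connected", "execution_id": execution_id})
        # Flush anything emitted before the connection existed, in order.
        for message in self._pending.pop(execution_id, ()):
            await socket.send_json(message)


class FakeSocket:
    """Test double that records what would be sent to the client."""

    def __init__(self) -> None:
        self.sent: list = []

    async def send_json(self, message: dict) -> None:
        self.sent.append(message)


async def demo() -> list:
    manager = ConnectionManager()
    await manager.push("exec-001-uuid", {"type": "run_started"})
    socket = FakeSocket()
    await manager.connect("exec-001-uuid", socket)
    await manager.push("exec-001-uuid", {"type": "turn_update"})
    return [m["type"] for m in socket.sent]


print(asyncio.run(demo()))  # ['connected', 'run_started', 'turn_update']
```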

Message Types

Server to Client (8 types)

| Type | When Sent | Description |
| --- | --- | --- |
| connected | Immediately after WebSocket accept | Connection bound to execution_id |
| run_started | When Temporal workflow begins | Execution has started |
| turn_update | After each agentic loop turn | Reasoning turn progress with tool call and observation |
| governance_check | After governance evaluation | Decision for a tool call: proceed, blocked, suggest, or gate |
| approval_required | When workflow enters approval wait | Action requires human approval; execution is paused |
| approval_resolved | When approval signal is processed | Approval decision received |
| run_completed | When Temporal workflow finishes | Execution finished with final result and status |
| error | On any unrecoverable error | Timeout, service failure, or policy violation |

Client to Server (3 types)

| Type | When Sent | Description |
| --- | --- | --- |
| approval_response | In response to approval_required | User's approval decision (approve, reject, edit-approve) |
| stop_request | When user clicks "Stop" | Emergency stop signal |
| ping | Periodically | Keepalive; server responds with pong |

Key Message Structures

turn_update:

{
"type": "turn_update",
"execution_id": "exec-001-uuid",
"turn": 3,
"max_turns": 15,
"phase": "reasoning",
"tool_name": "execute_query",
"tool_category": "execution",
"message": "Querying sales data for Q1 2026...",
"progress_percentage": 40.0,
"timestamp": "2026-04-23T10:15:05Z"
}

approval_required:

{
"type": "approval_required",
"execution_id": "exec-001-uuid",
"approval_id": "apr-001-uuid",
"agent_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"action": "write_back",
"severity": "warning",
"title": "Write 1,250 rows to production table",
"details": {
"tool_name": "write_back",
"tool_args": { "target_table": "customer_segments", "row_count": 1250 },
"agent_name": "Customer Segmentation Agent",
"reasoning": "Based on updated model scores, 1,250 customer records need segment reassignment."
},
"timeout_seconds": 3600,
"timestamp": "2026-04-23T10:15:12Z"
}

run_completed:

{
"type": "run_completed",
"execution_id": "exec-001-uuid",
"status": "completed",
"result": {
"summary": "Revenue anomaly detected: Q1 revenue dropped 15% vs Q4.",
"actions_taken": [
{ "tool": "execute_query", "status": "success" },
{ "tool": "send_notification", "status": "success" }
],
"turns_used": 8,
"total_duration_ms": 45200
},
"timestamp": "2026-04-23T10:15:50Z"
}

Client approval_response:

{
"type": "approval_response",
"execution_id": "exec-001-uuid",
"approval_id": "apr-001-uuid",
"decision": "edited",
"edited_args": { "target_table": "customer_segments_staging", "row_count": 1250 },
"reason": "Write to staging first for review.",
"timestamp": "2026-04-23T10:16:30Z"
}

Channel Subscription Model

The ConnectionManager tracks WebSocket connections at three scopes. Each run event is fanned out to all three scopes:

| Scope | Subscription Key | Use Case |
| --- | --- | --- |
| Per-execution | agent-run-{execution_id} | Live progress for a single running execution |
| Per-agent | agent-{agent_id} | Monitoring dashboard for one agent (all run starts and completions) |
| Workspace-wide | workspace-agents-{workspace_id} | Dashboard showing all agent activity in a workspace |
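
As a sketch of the fan-out, the helper below derives the three subscription keys for an event and delivers it to every subscriber registered under any of them. The function names and the use of plain lists as subscriber inboxes are assumptions made for illustration.

```python
def subscription_keys(execution_id: str, agent_id: str, workspace_id: int) -> list[str]:
    """Build the three subscription keys from the table above."""
    return [
        f"agent-run-{execution_id}",
        f"agent-{agent_id}",
        f"workspace-agents-{workspace_id}",
    ]


def fan_out(subscribers: dict[str, list], event: dict) -> int:
    """Deliver the event to every inbox under its three keys; return the count."""
    delivered = 0
    keys = subscription_keys(
        event["execution_id"], event["agent_id"], event["workspace_id"]
    )
    for key in keys:
        for inbox in subscribers.get(key, []):
            inbox.append(event)
            delivered += 1
    return delivered


run_view, dashboard = [], []
subscribers = {
    "agent-run-exec-001-uuid": [run_view],
    "workspace-agents-7": [dashboard],
}
event = {"type": "run_started", "execution_id": "exec-001-uuid",
         "agent_id": "a1b2", "workspace_id": 7}
print(fan_out(subscribers, event))  # 2
```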

WebSocket Flow During Execution


Temporal Workflow Integration

river_agent_execution_workflow

The primary workflow for every agent run, regardless of trigger type.

  • Workflow ID pattern: agent-run-{execution_id}
  • Task queue: tlo-gateway (shared with PSA workflows)
  • Execution timeout: 1 hour (configurable per agent; max 4 hours)

Activities

| Activity | Timeout | Retry | Target | Description |
| --- | --- | --- | --- | --- |
| resolve_agent | 10s | 2 retries | Backend :8005 | Load agent config, validate Active status, load current version |
| build_context | 30s | 2 retries | Backend :8005 | Assemble AgentContext: instructions, tools, data sources, memory, trigger payload, schema metadata |
| call_river_agent | 150s | 1 retry | river-agent :8007 | Send context and observations to LLM for next reasoning turn; returns tool call or final answer |
| check_governance | 10s | 2 retries | Backend :8005 | Evaluate action level, policies, and approval rules for the proposed tool call |
| execute_tool | 60s (120s for queries) | 1 retry | TLO ACL proxy | Execute tool call through TLO's ACL validation and service routing |
| log_turn | 5s | 3 retries | Backend :8005 | Persist turn log entry (reasoning, tool call, observation, governance decision) |
| update_execution_status | 5s | 3 retries | Backend :8005 | Update agent_executions record status and timing |
| update_long_term_context | 10s | 2 retries | Backend :8005 | Persist learnings from the run into agent's long_term_context |
| send_ws_message | 10s | 1 retry | TLO internal | Push WebSocket message via /internal/ws/push |
| send_notification | 10s | 2 retries | Notification :8006 | Send approval/completion notifications via Novu |

Signal Handling for Approval Resolution

The workflow uses Temporal's workflow.wait_condition() to hibernate at zero compute cost while waiting for an approval signal.

Signal flow:

  1. User clicks Approve/Reject/Edit in the frontend
  2. Frontend calls PATCH /api/v1/agents/approvals/{id}
  3. Backend writes the resolution to approval_requests
  4. TLO resolves the execution_id from the approval_request record
  5. TLO sends Temporal signal to workflow agent-run-{execution_id}

Signal name: approval_resolution

Signal payload:

{
"approval_id": "apr-001-uuid",
"decision": "approved",
"edited_args": {},
"approver_user_id": 42,
"reason": "Verified against policy; proceed.",
"timestamp": "2026-04-23T10:16:30Z"
}
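
The control flow of the approval wait can be mirrored in plain asyncio. This is an analogy, not Temporal code: in a Temporal workflow the same shape would typically be a signal handler plus workflow.wait_condition with a timeout, and nothing here hibernates or persists state. ApprovalGate and its method names are hypothetical.

```python
import asyncio
from typing import Optional


class ApprovalGate:
    """Plain-asyncio stand-in for a workflow waiting on approval_resolution."""

    def __init__(self) -> None:
        self._resolution: Optional[dict] = None
        self._event = asyncio.Event()

    def approval_resolution(self, payload: dict) -> None:
        """Signal handler: record the decision and wake the waiter."""
        self._resolution = payload
        self._event.set()

    async def wait_for_decision(self, timeout_seconds: float) -> str:
        """Block until a decision arrives or the approval times out."""
        try:
            await asyncio.wait_for(self._event.wait(), timeout=timeout_seconds)
        except asyncio.TimeoutError:
            return "approval_timeout"
        return self._resolution["decision"]


async def demo() -> str:
    gate = ApprovalGate()
    # Deliver the signal shortly after the wait begins.
    asyncio.get_running_loop().call_later(
        0.01,
        gate.approval_resolution,
        {"approval_id": "apr-001-uuid", "decision": "approved"},
    )
    return await gate.wait_for_decision(timeout_seconds=1.0)


print(asyncio.run(demo()))  # approved
```

When no signal arrives before the deadline, wait_for_decision returns approval_timeout, the terminal status this document assigns to an expired approval.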

State Serialization for Approval Hibernation

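When the workflow enters the approval wait, Temporal holds no worker resources for it: the workflow's event history is persisted, and workflow state is rebuilt from that history when the signal arrives. Workflow inputs, activity results, and signal payloads must therefore be serializable by the worker's data converter (standard Python types and dataclasses by default; Pydantic models require a converter that supports them). The state carried across the wait includes: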

from dataclasses import dataclass
from datetime import datetime

@dataclass
class HibernatedState:
    execution_id: str
    agent_id: str
    current_turn: int
    pending_tool_call: dict  # {tool_name, tool_args, governance_decision}
    conversation_history: list  # list[TurnEntry]
    agent_context: dict  # serialized AgentContext
    started_at: datetime
    hibernated_at: datetime

No manual serialization is needed beyond ensuring that all workflow state uses Pydantic models, dataclasses, or standard Python types.

Error Handling and Retry Policy

| Error Type | Handling | Retry | Notification |
| --- | --- | --- | --- |
| resolve_agent failure | Abort workflow immediately | 2 retries, 1s backoff | None |
| build_context failure | Abort workflow | 2 retries, 2s backoff | error WebSocket message |
| call_river_agent timeout | Retry once, then abort | 1 retry, 5s backoff | error WebSocket message |
| call_river_agent LLM error | Retry once, then abort | 1 retry, 2s backoff | error WebSocket message |
| execute_tool timeout | Skip tool; feed error as observation | 1 retry, 2s backoff | turn_update with error |
| execute_tool ACL denied | Feed denial as observation; do not retry | No retry | turn_update with denial |
| execute_tool service error | Retry once; then feed error as observation | 1 retry, 2s backoff | turn_update with error |
| Approval timeout (1h default) | Terminate with approval_timeout status | No retry | Notification to agent owner |
| Max turns exceeded | Terminate with max_turns_exceeded status | No retry | Notification to agent owner |
| Workflow cancellation | Clean up; set status to stopped | No retry | error WebSocket message |

river_agent_scheduled_trigger_workflow

Handles cron-based scheduled triggers for active agents.

  • Workflow ID pattern: agent-schedule-{agent_id}
  • Task queue: tlo-gateway
  • Pattern: ContinueAsNew after each execution to maintain the cron schedule indefinitely

Sequence

Cron Configuration

| Parameter | Value |
| --- | --- |
| Schedule ID | agent-schedule-{agent_id} |
| Cron Expression | From agent_triggers.trigger_config.cron_expression |
| Timezone | From trigger_config.timezone (default: UTC) |
| Overlap Policy | SKIP -- do not start a new run if previous is still running |
| Catchup Window | 1 hour -- catch up on missed triggers within this window |
| Pause on Deploy | Yes -- paused until agent reaches Active state |

Request and Response Transformation

Header Injection

| Transformation | Trigger | Description |
| --- | --- | --- |
| Standard context headers | All requests | X-User-ID, X-Org-ID, X-Workspace-ID, X-Roles, X-Request-ID, X-Internal-Call injected; client-supplied values overwritten |
| X-Agent-ID injection | Path contains /agents/{id} | Agent ID extracted from path parameter |
| X-Execution-ID injection | Path contains /runs/{execution_id} | Execution ID extracted from path parameter |
| X-Trigger-Type injection | POST /agents/{id}/runs | Trigger type extracted from request body |

Request Validation and Sanitization

TLO performs lightweight validation before proxying. Deep schema validation is the responsibility of Backend.

| Validation | Applied To | Behavior on Failure |
| --- | --- | --- |
| JSON body parse | All POST, PUT, PATCH requests | Return 400 if body is not valid JSON |
| agent_id format | Path parameter {id} | Must be valid UUID; return 400 otherwise |
| Required fields for create | POST /api/v1/agents | name and instruction_set must be present |
| State pre-check for deploy | POST /api/v1/agents/{id}/deploy | Agent must not be archived; return 409 otherwise |
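
The first three checks need no Backend call and can be sketched with the standard library alone. The function names and message strings are assumptions, as is the 400 status for missing create fields, which the table does not state. The deploy pre-check is omitted because it depends on agent state held by Backend.

```python
import json
import uuid
from typing import Optional

Failure = tuple[int, str]  # (HTTP status, message)


def check_agent_id(agent_id: str) -> Optional[Failure]:
    """Reject path parameters that are not valid UUIDs."""
    try:
        uuid.UUID(agent_id)
    except ValueError:
        return 400, "agent_id must be a valid UUID"
    return None


def check_create_agent(raw_body: bytes) -> Optional[Failure]:
    """Parse the body and require the fields needed to create an agent."""
    try:
        body = json.loads(raw_body)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass it
        return 400, "Request body is not valid JSON"
    if not isinstance(body, dict):
        return 400, "Request body must be a JSON object"
    missing = [name for name in ("name", "instruction_set") if name not in body]
    if missing:
        return 400, f"Missing required fields: {', '.join(missing)}"
    return None


print(check_agent_id("a1b2c3d4-e5f6-7890-abcd-ef1234567890"))  # None
print(check_create_agent(b'{"name": "Revenue Watcher"}'))
# (400, 'Missing required fields: instruction_set')
```

Each function returns None on success, so a caller can run the checks in order and stop at the first failure before proxying.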

Response Envelope

For routes handled directly by TLO (Temporal workflow triggers, approval signals), the response is wrapped in the standard RiverGen envelope:

{
"success": true,
"status": 200,
"message": "Human-readable description",
"data": {},
"error": null,
"meta": {
"request_id": "uuid",
"timestamp": "2026-04-23T10:15:00Z"
}
}

For proxy routes (pass-through to Backend), the Backend response is forwarded unchanged -- Backend already returns this format.
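
A small helper shows how a TLO-handled route might build this envelope. The function name envelope and the choice to derive success from the status code are assumptions; the field names follow the example above.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def envelope(
    status: int,
    message: str,
    request_id: str,
    data: Any = None,
    error: Optional[dict] = None,
) -> dict:
    """Wrap a TLO-handled result in the standard response envelope."""
    return {
        "success": 200 <= status < 300,
        "status": status,
        "message": message,
        "data": data if data is not None else {},
        "error": error,
        "meta": {
            "request_id": request_id,
            # UTC timestamp in the same format as the example above.
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
    }


accepted = envelope(
    202, "Run accepted", "req-123", data={"execution_id": "exec-001-uuid"}
)
print(accepted["success"], accepted["status"])  # True 202
```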

Error Response Normalization

| HTTP Status | Error Code | Message Template |
| --- | --- | --- |
| 400 | validation_error | Specific validation failure message |
| 401 | missing_token, invalid_token, expired_token | Authentication failure message |
| 403 | permission_denied | "Permission denied: requires '{permission}'" |
| 404 | not_found | "Agent not found", "Run not found" |
| 409 | invalid_state_transition | "Cannot deploy agent in '{current_status}' state" |
| 429 | rate_limited | "Rate limit exceeded. Retry after {retry_after}s" |
| 503 | service_unavailable | "Service {name} is temporarily unavailable" |
| 504 | gateway_timeout | "Service {name} timed out after {timeout}s" |

Rate Limiting Headers

TLO injects rate limiting headers on every response:

| Header | Value | Description |
| --- | --- | --- |
| X-RateLimit-Limit | Requests per minute | Window size for the current endpoint group |
| X-RateLimit-Remaining | Remaining requests in window | Count down to limit |
| X-RateLimit-Reset | Unix timestamp | When the window resets |
| Retry-After | Seconds | Only present on 429 responses |

Security Enforcement

Per-Request ACL Validation Flow

Tenant Isolation Enforcement

Tenant isolation is enforced at three independent layers:

TLO Layer: org_id and workspace_id come exclusively from the decoded JWT. TLO never allows the client to override these values via query parameters, request body fields, or path parameters. If a client sends ?workspace_id=999 but their JWT contains workspace_id=7, TLO uses 7.

Backend Layer: Every database query includes WHERE organization_id = :org_id using the value from X-Org-ID. This is enforced by SQLAlchemy query patterns and the organization_id dependency injected into all service methods.

Agent Layer: During execution, the agent can only access data sources belonging to the same organization_id as the agent. agent_version_data_sources bindings are validated against the user's org at deployment time.
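
The TLO-layer rule can be sketched as follows. This is an illustrative example, assuming the decoded JWT is available as a mapping with `org_id` and `workspace_id` claims; `TenantContext`, `resolve_tenant_context`, and `internal_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class TenantContext:
    org_id: int
    workspace_id: int


def resolve_tenant_context(
    jwt_claims: Mapping[str, Any],
    query_params: Mapping[str, str],
) -> TenantContext:
    """Derive tenant scope from the JWT only.

    query_params is accepted to make the point explicit: client-supplied
    values such as ?workspace_id=999 are never consulted.
    """
    return TenantContext(
        org_id=int(jwt_claims["org_id"]),
        workspace_id=int(jwt_claims["workspace_id"]),
    )


def internal_headers(ctx: TenantContext) -> Dict[str, str]:
    """Headers forwarded to downstream services."""
    return {"X-Org-ID": str(ctx.org_id), "X-Workspace-ID": str(ctx.workspace_id)}
```

With claims `{"org_id": 3, "workspace_id": 7}` and a query string of `?workspace_id=999`, the forwarded `X-Workspace-ID` is `7`.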

Rate Limiting

Rate limits are enforced at four granularities:

| Granularity | Window | Limit | Applies To |
| --- | --- | --- | --- |
| Per user (workspace) | 1 minute | 100 requests | All agent API endpoints |
| Per API key | 1 minute | 60 requests | Agent API trigger endpoint (/agent-api/{id}/execute) |
| Per agent (execution) | 1 hour | 50 manual runs | Manual trigger endpoint (/agents/{id}/runs) |
| Burst (token bucket) | -- | 20 concurrent | All endpoints |

Rate limiting is implemented in TLO middleware. When Redis is available (redis_url configured), limits are distributed across TLO instances. Otherwise, in-memory rate limiting applies (suitable for single-instance development only).
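
For the single-instance development case, an in-memory limiter might look like the sketch below. It uses a simple fixed-window counter, which is an assumption; the specification does not state which algorithm the windowed limits use, and the burst limit is described as a token bucket, which this sketch does not implement. The class name and injectable clock are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """In-memory fixed-window counter; state is lost on restart."""

    def __init__(
        self,
        limit: int,
        window_seconds: int,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        # key -> (window index, requests counted in that window)
        self._state: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str) -> bool:
        window = int(self._clock() // self.window_seconds)
        seen_window, count = self._state.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started; reset the counter
        if count >= self.limit:
            self._state[key] = (window, count)
            return False
        self._state[key] = (window, count + 1)
        return True
```

The per-user limit from the table would correspond to `FixedWindowLimiter(limit=100, window_seconds=60)` keyed by user and workspace. Because the counters live in process memory, two TLO instances would each grant the full quota, which is why Redis is required outside development.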

Request Logging for Audit Trail

Every River Agent request is logged with the following fields:

| Field | Source | Purpose |
| --- | --- | --- |
| request_id | Generated UUID | Distributed trace correlation |
| timestamp | Server clock (UTC) | Event ordering |
| method | HTTP method | Operation type |
| path | Request URL path | Target resource |
| user_id | JWT claim | Who made the request |
| organization_id | JWT claim | Tenant identifier |
| workspace_id | JWT claim | Workspace scope |
| agent_id | Path parameter | Target agent (if applicable) |
| execution_id | Path parameter or body | Target execution (if applicable) |
| status_code | Response | Success or failure |
| duration_ms | Computed | Performance tracking |
| acl_decision | ACL check result | Whether permission was granted or denied |

Audit logs are:

  • Written synchronously to the application log (structured JSON)
  • Persisted asynchronously to audit_logs in PostgreSQL via Backend
  • Stripped of request and response bodies by default in production (body logging is controlled by log_include_request_body / log_include_response_body)
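
The structured JSON record could be assembled as in the sketch below. The function name, the claim keys (`user_id`, `org_id`, `workspace_id`), and the `"granted"` / `"denied"` values for `acl_decision` are assumptions; the field names match the table above.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


def build_audit_record(
    *,
    method: str,
    path: str,
    claims: Mapping[str, Any],
    status_code: int,
    duration_ms: float,
    acl_decision: str,
    agent_id: Optional[str] = None,
    execution_id: Optional[str] = None,
    request_id: Optional[str] = None,
) -> str:
    """Serialize one audit entry as a JSON line. Bodies are never included."""
    record = {
        "request_id": request_id or str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "method": method,
        "path": path,
        "user_id": claims.get("user_id"),
        "organization_id": claims.get("org_id"),
        "workspace_id": claims.get("workspace_id"),
        "agent_id": agent_id,
        "execution_id": execution_id,
        "status_code": status_code,
        "duration_ms": duration_ms,
        "acl_decision": acl_decision,
    }
    return json.dumps(record)
```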

Sensitive Data Redaction

The following data is never logged in clear text, even in debug mode:

| Data | Redaction Rule |
| --- | --- |
| JWT token | Logged as Bearer *** |
| API keys | Logged as ***{last4} (last 4 characters only) |
| Database credentials | Never present in TLO context -- fetched by Backend only |
| Auth endpoint request bodies | Never logged (passwords, MFA codes) |
| WebSocket message bodies | Logged as type + execution_id only -- no payload content |
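
The redaction rules for tokens, API keys, and WebSocket messages can be sketched as small pure functions. All three function names are hypothetical, and fully masking keys of four characters or fewer is an added assumption so that short keys are not logged in full.

```python
from typing import Any, Dict, Mapping


def redact_authorization(value: str) -> str:
    """Mask an Authorization header value; the token itself is never kept."""
    return "Bearer ***" if value.lower().startswith("bearer ") else "***"


def redact_api_key(key: str) -> str:
    """Keep only the last 4 characters of an API key."""
    if len(key) <= 4:
        # Assumption: very short keys are masked entirely.
        return "***"
    return "***" + key[-4:]


def summarize_ws_message(message: Mapping[str, Any]) -> Dict[str, Any]:
    """Reduce a WebSocket message to type + execution_id; drop the payload."""
    return {
        "type": message.get("type"),
        "execution_id": message.get("execution_id"),
    }
```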

Agent Execution Security Model

During an agent execution, the security context is the triggering user's identity, not the agent's identity. This means:

  • The agent can only access data sources the triggering user has permission to access
  • The agent can only execute tools the triggering user has permissions for
  • X-User-ID in tool calls contains the triggering user's ID, not a service account
  • X-Agent-ID is added for audit purposes but does not grant any additional permissions

This prevents privilege escalation: an agent configured by an Admin but triggered by a Viewer cannot perform Admin-level operations. The Viewer's permissions are the ceiling.

Scheduled triggers (no interactive user): The agent runs with the permissions of the agent owner -- the user who last deployed the agent. The owner's user_id and permissions are stored in the agent configuration at deployment time and used as the security context for all scheduled and event-triggered runs.
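
The selection of the security context can be illustrated with the sketch below. `Principal`, `resolve_security_context`, and `can_execute_tool` are hypothetical names, and representing permissions as a set of strings such as `agent:update` is an assumption based on the permission names used elsewhere in this specification.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Principal:
    user_id: str
    permissions: FrozenSet[str]


def resolve_security_context(
    triggering_user: Optional[Principal],
    owner: Principal,
) -> Principal:
    """Interactive runs use the triggering user; runs with no interactive
    user (scheduled or event-triggered) fall back to the agent owner."""
    return triggering_user if triggering_user is not None else owner


def can_execute_tool(principal: Principal, required_permission: str) -> bool:
    """The principal's own permissions are the ceiling for the run."""
    return required_permission in principal.permissions
```

A Viewer who triggers an Admin-configured agent resolves to the Viewer's principal, so a tool requiring `agent:update` is refused even though the agent's owner holds that permission.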


Appendix A: Configuration Reference

TLO Gateway Settings for River Agents

| Setting | Env Variable | Default | Description |
| --- | --- | --- | --- |
| river_agent_service_url | TLO_RIVER_AGENT_SERVICE_URL | http://localhost:8007 | river-agent microservice base URL |
| river_agent_timeout_seconds | TLO_RIVER_AGENT_TIMEOUT_SECONDS | 120 | Default timeout for river-agent calls |
| river_agent_retry_count | TLO_RIVER_AGENT_RETRY_COUNT | 2 | Max retries for river-agent calls |
| agent_execution_timeout_seconds | TLO_AGENT_EXECUTION_TIMEOUT_SECONDS | 3600 | Max duration for a single agent run |
| agent_turn_timeout_seconds | TLO_AGENT_TURN_TIMEOUT_SECONDS | 150 | Max duration for a single reasoning turn |
| agent_max_turns | TLO_AGENT_MAX_TURNS | 15 | Max reasoning turns per execution |
| agent_approval_timeout_seconds | TLO_AGENT_APPROVAL_TIMEOUT_SECONDS | 3600 | Max wait time for approval (1 hour) |
| agent_ws_message_queue_size | TLO_AGENT_WS_MESSAGE_QUEUE_SIZE | 200 | Max queued WebSocket messages per execution |
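
The mapping from setting names to `TLO_`-prefixed environment variables can be sketched with the standard library alone. This is an assumption-laden example: the specification does not say which settings mechanism TLO uses, `RiverAgentSettings` and `load_settings` are hypothetical names, and only a subset of the settings is shown.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class RiverAgentSettings:
    # Defaults copied from the table above (subset only).
    river_agent_service_url: str = "http://localhost:8007"
    river_agent_timeout_seconds: int = 120
    river_agent_retry_count: int = 2
    agent_max_turns: int = 15


def load_settings(env: Mapping[str, str] = os.environ) -> RiverAgentSettings:
    """Read TLO_<SETTING_NAME> overrides, falling back to the defaults."""
    overrides = {}
    for field in fields(RiverAgentSettings):
        raw = env.get("TLO_" + field.name.upper())
        if raw is not None:
            # field.type is the annotated class (str or int) here.
            overrides[field.name] = field.type(raw)
    return RiverAgentSettings(**overrides)
```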

Circuit Breaker Settings

| Setting | Env Variable | Default | Description |
| --- | --- | --- | --- |
| river_agent_cb_failure_threshold | TLO_RIVER_AGENT_CB_FAILURE_THRESHOLD | 3 | Failures before circuit opens |
| river_agent_cb_recovery_timeout | TLO_RIVER_AGENT_CB_RECOVERY_TIMEOUT | 60 | Seconds before half-open test |
| backend_cb_failure_threshold | TLO_BACKEND_CB_FAILURE_THRESHOLD | 5 | Failures before circuit opens |
| backend_cb_recovery_timeout | TLO_BACKEND_CB_RECOVERY_TIMEOUT | 30 | Seconds before half-open test |
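
To show how the two settings interact, here is a minimal circuit breaker sketch. It assumes the threshold counts consecutive failures and that a failed half-open probe reopens the circuit for another full recovery timeout; neither detail is stated in this specification, and the class itself is hypothetical rather than TLO's actual implementation.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int,
        recovery_timeout: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half_open"  # one probe request may go through
        return "open"

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._opened_at = self._clock()  # probe failed: reopen
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

With the river-agent defaults this would be `CircuitBreaker(failure_threshold=3, recovery_timeout=60)`; while the circuit is open, TLO would return the 503 `service_unavailable` error instead of calling the downstream service.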

Appendix B: TOOL_ACL_MAP Additions

The following entries must be added to TOOL_ACL_MAP in middleware/acl.py for agent-internal tools invoked by Temporal workflow activities. Temporal workers call these tools with the X-Internal-Call: true header; such calls bypass JWT validation and use the triggering user's context from the X-User-ID, X-Org-ID, and X-Workspace-ID headers.

# River Agent internal activity tools (called by Temporal workflow activities)
# These use X-Internal-Call: true and bypass JWT validation.
# User context is propagated via X-User-ID, X-Org-ID, X-Workspace-ID headers.
"resolve_agent_config": {"permission": "agent:view", "service": "backend"},
"list_agent_data_sources": {"permission": "agent:view", "service": "backend"},
"get_agent_memory": {"permission": "agent:view", "service": "backend"},
"update_agent_memory": {"permission": "agent:update", "service": "backend"},
"create_approval_request": {"permission": "agent:execute", "service": "backend"},
"log_agent_turn": {"permission": "agent:execute", "service": "backend"},
"update_execution_status": {"permission": "agent:execute", "service": "backend"},

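A lookup against these entries might work as in the sketch below. The `authorize_tool` function, its signature, and the use of `PermissionError` are assumptions for illustration; only two of the entries above are reproduced, and how TLO actually consumes TOOL_ACL_MAP is not described in this section.

```python
from typing import Dict, Set

# Subset of the entries listed above, reproduced for a self-contained example.
TOOL_ACL_MAP: Dict[str, Dict[str, str]] = {
    "get_agent_memory": {"permission": "agent:view", "service": "backend"},
    "update_agent_memory": {"permission": "agent:update", "service": "backend"},
}


def authorize_tool(tool_name: str, user_permissions: Set[str]) -> str:
    """Return the target service if the user may call the tool."""
    entry = TOOL_ACL_MAP.get(tool_name)
    if entry is None:
        # Assumption: tools missing from the map are denied by default.
        raise PermissionError(f"Unknown tool: {tool_name}")
    if entry["permission"] not in user_permissions:
        raise PermissionError(
            f"Permission denied: requires '{entry['permission']}'"
        )
    return entry["service"]
```
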
Appendix C: Comparison with PSA Workflow

| Aspect | PSA Workflow | River Agent Workflow |
| --- | --- | --- |
| Workflow ID | psa-execute-{execution_id} | agent-run-{execution_id} |
| Trigger types | Always manual (user prompt) | Manual, scheduled, event, API, threshold, workflow |
| Context assembly | 5 parallel enrichment activities | Context builder loads agent config, memory, and trigger payload |
| Reasoning target | PSA Service :8010 | river-agent :8007 |
| Governance | Per-tool ACL only | Per-tool ACL + Action Level + Approval Gates + Policy Engine |
| Approval gate | Confirmation token (destructive ops only) | Full approval workflow with Temporal hibernate/resume |
| Max turns | 15 | 15 (configurable per agent version) |
| WebSocket path | /ws/executions/{id} | /ws/agents/{id} |
| Post-execution | Store prompt history | Update long-term context + metrics + notifications |
| State persistence | Session-based (lost on browser close) | Long-lived memory across runs |
| Tool execution | Client-side (WebSocket round-trip) | Server-side (Temporal activity via TLO ACL proxy) |
| Scheduled runs | Not supported | Temporal Schedule with ContinueAsNew cron pattern |