System Overview

Engineering reference for the River Agents platform: three-service architecture, four action levels, six trigger types, nine lifecycle states, agent configuration model, and component integration contracts.

Architecture Overview

River Agents is a three-service platform. Every execution crosses all three services in sequence.

| Service | Port | Responsibility |
| --- | --- | --- |
| Backend (FastAPI) | :8005 | Agent CRUD, lifecycle state machine, trigger registration, approval processing, audit writes |
| TLO Gateway | :8001 | JWT validation, per-tool ACL enforcement, route proxying to Backend and river-agent |
| river-agent (FastAPI) | :8007 | Stateless reasoning service -- runs the full agentic loop per execution request |

The Backend owns all persistent state (PostgreSQL river_agents schema) and is the only process that writes to it. TLO Gateway is the single security boundary for all external-facing API calls and all tool dispatch. river-agent is entirely stateless -- it receives the full AgentContext in the request body and returns a complete ReasoningResult; it holds no database connections and no in-memory session state.

River Agents layers the following on top of the core reasoning engine: persistent agent identity with immutable version snapshots, six trigger types handled by a normalization layer, a 9-state lifecycle state machine with guarded transitions, long-term memory accumulated across runs in agent_memory, approval gate orchestration via Temporal workflow.await(), governance policy evaluation on every write-capable action, and a real-time observability surface via WebSocket and agent_metrics.

High-Level Conceptual Model


The Action Level Enforcement Model

Action levels are the primary governance control for every deployed agent. They are not a hint to the LLM -- they are an enforcement boundary in the execution infrastructure.

| Level | Name | Enforcement Behavior |
| --- | --- | --- |
| 1 | Read and Respond | Tool registry presented to river-agent contains only read-classified tools. No write-capable tool is dispatched. |
| 2 | Recommend Actions | Write-capable tools are available for reasoning and planning but blocked at dispatch. TLO returns a staged proposal instead of executing. |
| 3 | Act with Approval | Write-capable tools are dispatched only after an approval record is resolved. Temporal workflow serializes state at the gate. |
| 4 | Fully Automated | All tools in the agent's configured tool set are dispatched without gates. ACL still enforces the outer boundary. |

The enforcement chain for a tool call is:

  1. river-agent selects a tool via the agentic loop reasoning step
  2. river-agent sends the tool call to Backend :8005
  3. Backend checks the agent's action_level against the tool's write classification
  4. If the level requires a gate, Backend creates an approval_requests record and signals Temporal to suspend via workflow.await()
  5. On approval resolution, Temporal resumes; Backend re-submits the tool call to TLO Gateway
  6. TLO validates the per-tool ACL before dispatching to the target service

The LLM never bypasses this chain. Even if river-agent reasons toward a write action that the action level disallows, the enforcement happens downstream of the reasoning output.
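
The per-level behavior in the table and the enforcement chain above can be sketched as a single decision function. This is a minimal illustration, not the Backend's actual code: the names `ActionLevel` and `decide_dispatch` and the returned strings are hypothetical, and the per-tool ACL check performed by TLO Gateway is deliberately left out.

```python
from enum import IntEnum


class ActionLevel(IntEnum):
    # Numeric levels follow the action level table; member names are illustrative.
    READ_AND_RESPOND = 1
    RECOMMEND_ACTIONS = 2
    ACT_WITH_APPROVAL = 3
    FULLY_AUTOMATED = 4


def decide_dispatch(level: ActionLevel, tool_is_write: bool, approved: bool = False) -> str:
    """Return 'dispatch', 'stage_proposal', 'await_approval', or 'reject'."""
    if not tool_is_write:
        # Read-classified tools are never gated by the action level.
        return "dispatch"
    if level == ActionLevel.READ_AND_RESPOND:
        # Write tools should not even be in the registry at level 1;
        # rejecting here is a defensive assumption of this sketch.
        return "reject"
    if level == ActionLevel.RECOMMEND_ACTIONS:
        return "stage_proposal"
    if level == ActionLevel.ACT_WITH_APPROVAL:
        return "dispatch" if approved else "await_approval"
    # Level 4: no gate; the TLO ACL check still applies afterwards.
    return "dispatch"
```

Because the decision depends only on the configured level and the tool's classification, nothing the model outputs can change the outcome.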

Risk profile by level:

| Level | Risk | Deployment Requirement |
| --- | --- | --- |
| 1 | None | Safe for any user and any data source |
| 2 | Low | Requires a human operator available to review proposals |
| 3 | Medium | Requires configured approver assignment and notification routing |
| 4 | High | Requires a validated instruction set, scoped tool list, and active monitoring before activation |

Trigger Architecture

Six trigger types feed into the same Trigger Ingestion Service. The ingestion service normalizes each trigger into a standard envelope and routes it to the Execution Queue. The Agent Runner Dispatcher pulls from this queue and initiates a Temporal workflow per run.

| Trigger Type | Ingestion Mechanism | Key Configuration Fields |
| --- | --- | --- |
| Manual | POST to /api/v1/agents/:id/runs | None required |
| Scheduled | Internal cron scheduler reads trigger_config.cron_expression | cron_expression (e.g., 0 8 * * *), timezone |
| Event-based | Kafka consumer or webhook endpoint | event_type, optional payload_conditions |
| Workflow | Temporal signal listener | workflow_id, signal_name |
| API | External POST to agent's dedicated endpoint via TLO | api_key, optional payload_schema |
| Threshold / Policy | Metric monitor polls configured metric source | metric_source, threshold_expression, cooldown_seconds |

The trigger ingestion service performs two checks before enqueuing:

  1. Agent state check -- agent must be in active status. Any other state results in the trigger being logged and rejected.
  2. Concurrency check -- if allow_concurrent_runs is false (default), a trigger arriving while a run is already in running status is handled according to the on_concurrent_trigger policy: queue, drop, or replace.
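
The two pre-enqueue checks can be sketched as follows. This is an illustration under stated assumptions: `AgentSnapshot`, `admit_trigger`, and the return strings are hypothetical names, and the sketch returns only the policy name without modeling what each policy does to the queue.

```python
from dataclasses import dataclass

# Policy values as listed for on_concurrent_trigger.
POLICIES = {"queue", "drop", "replace"}


@dataclass
class AgentSnapshot:
    status: str                      # lifecycle state of the agent
    has_running_execution: bool      # True if a run is currently in running status
    on_concurrent_trigger: str       # one of POLICIES
    allow_concurrent_runs: bool = False  # documented default is false


def admit_trigger(agent: AgentSnapshot) -> str:
    """Return 'reject', 'enqueue', or the concurrency policy to apply."""
    # Check 1: only agents in active status accept triggers.
    if agent.status != "active":
        return "reject"
    # Check 2: defer to the configured policy when a run is already in flight.
    if agent.has_running_execution and not agent.allow_concurrent_runs:
        if agent.on_concurrent_trigger not in POLICIES:
            raise ValueError(f"unknown policy: {agent.on_concurrent_trigger!r}")
        return agent.on_concurrent_trigger
    return "enqueue"
```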

Trigger Flow Diagram


Agent Lifecycle State Machine

Agent state is stored in river_agents.agents.status and transitions are enforced by the Agent Management Service. No component other than the Backend service may write to this field.

Agent Lifecycle States

Transition guards and enforcement:

| Transition | Guard | Enforced By |
| --- | --- | --- |
| Configured -> Validated | Data source connectivity test passes; all bound tools exist in registry; governance policy resolves | Agent Management Service (validation job) |
| Validated -> Deployed | Temporal workflow registered; trigger listeners activated | Agent Management Service + Trigger Ingestion Service |
| Active -> Running | Trigger ingested; concurrency check passes; rate limit not exceeded | Trigger Ingestion Service + Agent Runner Dispatcher |
| Running -> Awaiting Approval | Tool call classified as requiring approval at current action_level | Backend governance check during execution |
| Any state -> Archived | Caller has agent:delete permission; running executions must be cancelled first | Agent Management Service (permission check) |

Validation (Configured -> Validated) is the only transition that runs asynchronous checks. All other transitions are synchronous state writes with guard evaluation inline.
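
A guarded transition table of this kind might be modeled as below. The sketch covers only the transitions listed above, not all nine states; the lowercase state strings, the context keys, and the `transition` helper are assumptions made for illustration. The asynchronous validation job is reduced to boolean results already present in the context.

```python
from typing import Callable, Dict, Tuple

Guard = Callable[[dict], bool]

# Keys are (from_state, to_state); only the documented transitions appear.
GUARDS: Dict[Tuple[str, str], Guard] = {
    ("configured", "validated"): lambda c: bool(
        c.get("sources_reachable") and c.get("tools_in_registry") and c.get("policy_resolves")
    ),
    ("validated", "deployed"): lambda c: bool(
        c.get("workflow_registered") and c.get("listeners_active")
    ),
    ("active", "running"): lambda c: bool(
        c.get("concurrency_ok") and not c.get("rate_limited")
    ),
    ("running", "awaiting_approval"): lambda c: bool(c.get("tool_requires_approval")),
}


def archive_guard(c: dict) -> bool:
    # Archiving is allowed from any state, given permission and no live runs.
    return bool(c.get("has_delete_permission")) and not c.get("has_running_executions")


class TransitionRejected(Exception):
    """Raised when a transition is unknown or its guard fails."""


def transition(current: str, target: str, ctx: dict) -> str:
    guard = archive_guard if target == "archived" else GUARDS.get((current, target))
    if guard is None:
        raise TransitionRejected(f"no documented transition {current} -> {target}")
    if not guard(ctx):
        raise TransitionRejected(f"guard failed for {current} -> {target}")
    return target  # the caller would persist this to agents.status
```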


Agent Categories: Technical Reference

Eight categories are supported. A category is a classification tag (the business_function enum) that determines the default template, recommended tool set, and default action level applied at creation time. Category does not change runtime behavior -- the runtime enforces only the action level and tool bindings.

| Category | business_function Value | Default action_level | Typical Write Operations | Primary Trigger |
| --- | --- | --- | --- | --- |
| Customer Support | customer_support | act_with_approval | Send response, issue refund, create escalation ticket | Event (ticket.created) |
| Sales and Lead Qualification | sales | recommend | CRM record update | Event (lead.created) |
| Finance and Reconciliation | finance | act_with_approval | Ledger write-back | Scheduled (daily) |
| Risk and Compliance | risk_compliance | act_with_approval | Access revocation, case creation | Event + Threshold |
| Data Analyst and Insights | data_analyst | read_only | None | Manual + Scheduled |
| Operations Monitoring | operations | fully_automated | Workflow restart, infrastructure scaling | Threshold + Event |
| Executive Decision Support | executive | read_only | None | Scheduled |
| Custom Enterprise | custom | Configurable | Configurable | Configurable |

The defaults are applied at agent creation time as pre-populated values. Operators may override any of them. The Operations Monitoring category ships with fully_automated as its default because time-critical remediation actions (restarting a failed workflow, scaling a service) lose their value if blocked by an approval gate. This is the only category where fully_automated is the recommended starting point -- and it requires the most careful tool scoping before activation.
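
The idea of a pre-populated default with operator override can be sketched like this. The mapping mirrors the category table, using the fully_automated spelling from the paragraph above for Operations Monitoring. The function name `initial_action_level` is hypothetical, and requiring an explicit value for the custom category is an assumption, since that row lists no default.

```python
from typing import Optional

# Default action_level per business_function value.
CATEGORY_DEFAULT_ACTION_LEVEL = {
    "customer_support": "act_with_approval",
    "sales": "recommend",
    "finance": "act_with_approval",
    "risk_compliance": "act_with_approval",
    "data_analyst": "read_only",
    "operations": "fully_automated",
    "executive": "read_only",
}


def initial_action_level(business_function: str, override: Optional[str] = None) -> str:
    """Pick the action level stored at creation time."""
    if override is not None:
        # Operators may override any pre-populated default.
        return override
    if business_function not in CATEGORY_DEFAULT_ACTION_LEVEL:
        # Covers "custom" and unknown values: no default to fall back on.
        raise ValueError(f"no default action level for {business_function!r}")
    return CATEGORY_DEFAULT_ACTION_LEVEL[business_function]
```

After creation the category plays no further role: only the stored action level is consulted at run time.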


Agent Configuration Schema

The river_agents.agents table stores the root entity. The canonical object model covers all configurable and system-managed fields.

| Field | Type | Description |
| --- | --- | --- |
| agent_id | UUID | Immutable identifier generated at creation |
| name | string | Human-readable identifier used in logs, notifications, and the UI |
| description | text | Free-text description of the agent's purpose; not used by the reasoning engine |
| business_function | enum | Category classification; determines default template and recommended tools |
| domain | string | Business domain tag used for grouping and filtering in the UI (e.g., "Revenue Operations") |
| owner_user_id | UUID | Foreign key to iam.users; receives approval notifications and is the default approver |
| status | enum | Current lifecycle state; written only by Agent Management Service |
| action_level | enum | Autonomy ceiling enforced at the Backend governance layer, not by the LLM |
| governance_level | enum | Policy strictness preset: standard, strict, custom |
| instruction_set | text | Natural language goal and behavioral constraints passed to river-agent as the system prompt component |
| selected_data_sources | array[UUID] | Foreign keys to platform.data_sources; scopes what the agent may query |
| selected_tools | array[string] | Tool identifiers from the tool registry; defines what the agent may invoke |
| selected_workflows | array[UUID] | Temporal workflow IDs the agent may trigger as sub-processes |
| trigger_config | JSONB | Type-discriminated union; deserialized by trigger type at ingestion time |
| approval_rules | JSONB | Approver assignment, auto-approve conditions, and escalation timeout (seconds) |
| notification_config | JSONB | Channel config (Slack webhook, email, in-app) and the event types that trigger each |
| long_term_context | JSONB | Accumulated memory written back by Backend after each run; read by river-agent during context assembly |
| current_version_id | integer | Points to the active agent_versions record; pinned on deploy, updated on version rollout |
| health_status | enum | Computed by Monitoring Service from recent execution outcomes: healthy, degraded, critical, unknown |
| created_at | timestamptz | Set at INSERT; never updated |
| updated_at | timestamptz | Updated on every configuration change |
| organization_id | UUID | Tenant isolation key; all queries against agent tables must include this filter |
| workspace_id | UUID | Workspace scope within the organization |

Implementation notes on key fields:

  • instruction_set is injected into the river-agent context bundle as a system-level prompt segment, separate from the trigger payload. river-agent sees both.
  • trigger_config uses a type discriminator field. The trigger ingestion service switches on this field to determine the deserializer. Unknown type values cause the trigger registration to fail at validation time, not at ingestion time.
  • long_term_context is not a free-form log. The Backend writes a structured delta after each run. river-agent reads the full snapshot on context assembly. The schema for this JSONB is documented in the Schema Design section.
  • action_level is checked by the Backend, not by river-agent. river-agent has no knowledge of the action level -- it receives a tool registry that may or may not include write-capable tools, depending on what Backend decided to pass.
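
The type-discriminated handling of trigger_config described in the notes above could look roughly like the following at registration time. The discriminator key `type`, its six values, and the choice of which fields count as required are assumptions for illustration; the source names the configuration fields but not the discriminator values.

```python
# Required keys per trigger type, taken from the "Key Configuration Fields"
# column; fields marked optional there are omitted. Type names are hypothetical.
REQUIRED_FIELDS = {
    "manual": set(),
    "scheduled": {"cron_expression", "timezone"},
    "event": {"event_type"},
    "workflow": {"workflow_id", "signal_name"},
    "api": {"api_key"},
    "threshold": {"metric_source", "threshold_expression", "cooldown_seconds"},
}


def validate_trigger_config(config: dict) -> str:
    """Validate a trigger_config at registration and return its type."""
    trigger_type = config.get("type")
    if trigger_type not in REQUIRED_FIELDS:
        # Unknown discriminators fail here, before any trigger is ingested.
        raise ValueError(f"unknown trigger type: {trigger_type!r}")
    missing = REQUIRED_FIELDS[trigger_type] - set(config)
    if missing:
        raise ValueError(f"{trigger_type} trigger missing fields: {sorted(missing)}")
    return trigger_type
```

Rejecting unknown types at registration means the ingestion path can switch on the discriminator without a fallback branch.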

Component Interaction Model

The sequence below covers a standard execution run at action_level 3 (Act with Approval) to illustrate all service interactions.

Architectural Relationship and Component Interaction


Platform Integration Points

River Agents consumes the following RiverGen platform components. Each component must be available for the functionality described in its row to work; the last column states what fails or degrades without it.

| Component | Integration Point | What Breaks Without It |
| --- | --- | --- |
| Reasoning engine (river-agent :8007) | Context bundle POST for every agent run | All executions fail; no reasoning is possible |
| TLO Gateway :8001 | Tool dispatch, JWT validation, ACL enforcement | No tools can be invoked; no external actions |
| Temporal.io | Workflow orchestration, workflow.await() for approval gates, retry and timeout handling | Approval gates become synchronous (blocking); retry policies are lost; no durable execution |
| PostgreSQL (river_agents schema) | Agent state, version history, execution records, audit log, approval requests | All state is lost; platform cannot function |
| Redis | Agent context caching, session state, rate limiting | Performance degrades; rate limiting is bypassed |
| Qdrant | Semantic search in tool calls, knowledge base retrieval for support/analyst agents | Knowledge retrieval tools fail; affects customer support and data analyst categories |
| Novu | Approval request notifications, agent alert delivery | Approvers are not notified; approval gates time out |
| MinIO | Report and export artifact storage for analyst and executive categories | Report generation tools fail; artifacts cannot be stored |
| WebSocket (via TLO) | Live execution feed consumed by the Run Detail UI | Live log view is unavailable; executions still complete |

Key Architectural Decisions

These are the non-obvious decisions made during the design of River Agents. Understanding the rationale helps when evaluating changes or debugging edge cases.

river-agent is stateless by design

All state -- agent configuration, persistent memory, execution context -- lives in PostgreSQL and is assembled by the Backend before each call to river-agent. river-agent receives a complete context bundle and returns a complete reasoning result. This means river-agent scales horizontally with no coordination overhead and can be restarted at any point without affecting in-flight executions (Temporal handles re-delivery).
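
As a rough sketch of what "assembled by the Backend" could mean, the function below builds an immutable context bundle from an agent row and a trigger payload. The `AgentContext` fields shown here are guesses at a plausible shape, not the real schema, and `assemble_context` is a hypothetical helper. The read_only filter reflects the level 1 rule that the tool registry contains only read-classified tools.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class AgentContext:
    # Hypothetical field set; note that action_level is deliberately absent,
    # since river-agent has no knowledge of it.
    agent_id: str
    instruction_set: str
    trigger_payload: Dict[str, Any]
    long_term_context: Dict[str, Any]
    tool_registry: Tuple[str, ...]


def assemble_context(
    agent: Dict[str, Any],
    trigger_payload: Dict[str, Any],
    write_tools: FrozenSet[str],
) -> AgentContext:
    """Build the full bundle sent in the request body to river-agent."""
    tools = list(agent["selected_tools"])
    if agent["action_level"] == "read_only":
        # Level 1: write-capable tools are never presented to the model.
        tools = [t for t in tools if t not in write_tools]
    return AgentContext(
        agent_id=agent["agent_id"],
        instruction_set=agent["instruction_set"],
        trigger_payload=trigger_payload,
        long_term_context=agent["long_term_context"],
        tool_registry=tuple(tools),
    )
```

Since everything the reasoning step needs travels in this one object, any river-agent replica can serve any request.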

Action levels are enforced downstream of the reasoning output

The LLM cannot be trusted to self-enforce autonomy boundaries. If river-agent reasons toward a write action that the action level disallows, the enforcement happens in the Backend governance layer and TLO Gateway ACL check -- not in the prompt. This means governance is not prompt-engineering. A compromised or hallucinating model cannot bypass the action level by reasoning around a restriction in the instruction set.

Temporal is used for approval hibernation, not a custom wait loop

When an approval gate is triggered, the Temporal workflow suspends via workflow.await(). There is no polling, no scheduled job checking for resolution, and no in-memory state that could be lost on a Backend restart. The workflow resumes only when an explicit signal is sent from the Backend on approval resolution. This makes approval gates durable across restarts, arbitrarily long-lived, and zero-cost in terms of compute during the wait.
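
The signal-driven suspension described above can be illustrated with a small in-process analogy. This sketch uses Python's asyncio rather than Temporal, so it is not durable across restarts and does not show the Temporal API; it only demonstrates the control flow of waiting without polling until an explicit signal arrives. All names are hypothetical.

```python
import asyncio
from typing import Tuple


async def run_with_approval_gate(resolved: asyncio.Event, decision: dict) -> str:
    # Suspends here with no polling; resumes only when the event is set.
    await resolved.wait()
    return "dispatched" if decision.get("approved") else "cancelled"


async def demo() -> Tuple[bool, str]:
    resolved = asyncio.Event()
    decision: dict = {}
    task = asyncio.create_task(run_with_approval_gate(resolved, decision))
    await asyncio.sleep(0)            # let the task run up to the gate
    was_suspended = not task.done()   # still waiting for the signal
    decision["approved"] = True       # the approver resolves the request
    resolved.set()                    # the explicit "signal"
    return was_suspended, await task


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the real system the waiting state is persisted by Temporal, which is what lets the gate survive a Backend restart; the asyncio version loses it when the process exits.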

TLO Gateway is the single security boundary for all tool dispatch

Neither river-agent nor the Backend dispatches tool calls directly to external systems. All tool invocations pass through TLO Gateway, which re-validates the per-tool ACL at dispatch time using the propagated JWT context. This means ACL changes take effect immediately on the next tool call, even within a running execution. It also means no internal service can execute a tool by bypassing TLO.

Versioning is immutable

Once an agent version is deployed, its configuration snapshot in agent_versions is write-locked. Any configuration change on a deployed agent creates a new version. This ensures that the exact configuration that produced any execution can be retrieved from the audit trail -- the agent_version_id on every agent_executions record is a stable pointer to the full configuration at run time. Rollback is a version promotion, not a revert.
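
One way to read "rollback is a version promotion" is an append-only store in which rollback only moves the current pointer. The sketch below is in-memory and hypothetical (`VersionStore`, `publish`, and `promote` are illustrative names); whether the real system re-points current_version_id or copies the old snapshot into a new version is not stated in the source.

```python
import copy
from types import MappingProxyType
from typing import List, Mapping, Optional


class VersionStore:
    """Append-only list of configuration snapshots plus a current pointer."""

    def __init__(self) -> None:
        self._versions: List[Mapping] = []
        self.current_version_id: Optional[int] = None

    def publish(self, config: dict) -> int:
        # Deep-copy, then wrap read-only so the stored snapshot cannot be
        # reassigned at the top level (nested values are not frozen here).
        self._versions.append(MappingProxyType(copy.deepcopy(config)))
        self.current_version_id = len(self._versions)
        return self.current_version_id

    def get(self, version_id: int) -> Mapping:
        if not 1 <= version_id <= len(self._versions):
            raise KeyError(version_id)
        return self._versions[version_id - 1]

    def promote(self, version_id: int) -> None:
        # Rollback: point at an existing snapshot; nothing is rewritten.
        self.get(version_id)  # raises KeyError if the version is unknown
        self.current_version_id = version_id
```

Because snapshots are never edited, an execution record that stores a version id can always be traced back to the exact configuration it ran with.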