Core Services Specification

River Agents are implemented across 9 backend services. Services 1, 2, 3, 5, 7, 8, and 9 run as internal modules within Backend :8005. Service 4 (Reasoning) is a separate stateless process at river-agent :8007. Service 6 (Tool and Workflow Invocation) is not a standalone process: it describes the dispatch path Service 3 uses to route tool calls through TLO Gateway :8001. This document specifies the API contracts, internal components, owned tables, and inter-service dependencies for each.

Quick Navigation

Service Dependency Map
Service 1: Agent Management
Service 2: Trigger Ingestion
Service 3: Agent Execution Runner
Service 4: Reasoning Service
Service 5: Data Access and Schema
Service 6: Tool and Workflow Invocation
Service 7: Governance and Approval
Service 8: Monitoring and Telemetry
Service 9: Execution Logging and Audit

Service Dependency Map

Ownership boundary: Backend :8005 owns the river_agents PostgreSQL schema and is the only process that writes to it. river-agent :8007 is stateless -- it receives all necessary state in the request body and has no direct database connection.

Service 1: Agent Management

Owns all persistent agent state. Every other service reads agent configuration from records this service writes.

Owns: river_agents.agents, river_agents.agent_versions, river_agents.agent_version_tools, river_agents.agent_version_data_sources, river_agents.agent_templates

Internal Components

Component	Responsibility
`AgentService`	CRUD business logic for the `agents` table. Enforces org/workspace scoping on all reads and writes.
`LifecycleStateMachine`	The single code path for all `agents.status` writes. Validates pre-conditions before any transition. No other component may update `agents.status` directly.
`AgentValidator`	Runs the validation job triggered by `POST /api/v1/agents/{id}/validate`. Tests data source connectivity, checks that all bound tools exist in the registry, resolves governance policy references, and validates trigger config syntax.
`VersionManager`	Creates an immutable `agent_versions` snapshot on every post-deployment configuration change. Updates `agents.current_version_id` on deploy and rollback. Maintains the invariant, backed by a partial unique index, that only one `active` version exists per agent.
`TemplateService`	Manages `agent_templates` records and the conversion of a template into a pre-filled agent configuration for the creation wizard.

API Endpoints

Method	Endpoint	Permission	Description
GET	`/api/v1/agents`	`agent:read`	List agents in workspace with search, status filter, and pagination
POST	`/api/v1/agents`	`agent:create`	Create agent in `draft` status
GET	`/api/v1/agents/{id}`	`agent:read`	Full agent detail including current version, health status, and recent run summary
PUT	`/api/v1/agents/{id}`	`agent:update`	Update configuration -- creates a new version if agent is deployed or active
DELETE	`/api/v1/agents/{id}`	`agent:delete`	Soft delete -- transitions to `archived`
POST	`/api/v1/agents/{id}/validate`	`agent:update`	Queue async validation checks; returns 202 Accepted. The validation result, including field-level errors, is read from the agent detail endpoint
POST	`/api/v1/agents/{id}/deploy`	`agent:deploy`	Deploy validated agent; registers triggers and transitions to `active`
POST	`/api/v1/agents/{id}/pause`	`agent:deploy`	Suspend all triggers; transition to `paused`
POST	`/api/v1/agents/{id}/resume`	`agent:deploy`	Re-activate triggers; transition back to `active`
POST	`/api/v1/agents/{id}/archive`	`agent:delete`	Retire agent; preserve for audit; block all future runs
GET	`/api/v1/agents/{id}/versions`	`agent:read`	List all version records for an agent
POST	`/api/v1/agents/{id}/versions/{vid}/rollback`	`agent:deploy`	Update `current_version_id` to a previous version; emits audit event
GET	`/api/v1/agent-templates`	`agent:read`	List available agent templates
GET	`/api/v1/agent-templates/{id}`	`agent:read`	Full template detail including configuration defaults
POST	`/api/v1/agents/from-template/{template_id}`	`agent:create`	Create pre-filled agent from template
POST	`/api/v1/agents/generate`	`agent:create`	AI-powered agent generation from natural language via river-agent :8007
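The lifecycle endpoints above imply a small set of status transitions, all funneled through `LifecycleStateMachine`. The sketch below is a minimal illustration under stated assumptions, not the service's implementation: the transition table is inferred from the deploy, pause, resume, and archive endpoints, it assumes deploy requires `validation_status == "passed"` and that any non-archived agent may be archived, and the names `TRANSITIONS`, `InvalidTransition`, and `transition()` are hypothetical.

```python
from enum import Enum


class AgentStatus(str, Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    PAUSED = "paused"
    ARCHIVED = "archived"


class InvalidTransition(Exception):
    """Raised when a requested status change fails its pre-conditions."""


# Hypothetical transition table inferred from the lifecycle endpoints:
# action -> (allowed source statuses, target status).
# ACTIVE -> ACTIVE under "deploy" covers deploying a newer draft version.
TRANSITIONS = {
    "deploy": ({AgentStatus.DRAFT, AgentStatus.ACTIVE}, AgentStatus.ACTIVE),
    "pause": ({AgentStatus.ACTIVE}, AgentStatus.PAUSED),
    "resume": ({AgentStatus.PAUSED}, AgentStatus.ACTIVE),
    "archive": (
        {AgentStatus.DRAFT, AgentStatus.ACTIVE, AgentStatus.PAUSED},
        AgentStatus.ARCHIVED,
    ),
}


def transition(current: AgentStatus, action: str, validation_status: str) -> AgentStatus:
    """Return the next status for `action`, or raise InvalidTransition."""
    if action not in TRANSITIONS:
        raise InvalidTransition(f"unknown action: {action}")
    allowed_sources, target = TRANSITIONS[action]
    if current not in allowed_sources:
        raise InvalidTransition(f"cannot {action} an agent in {current.value} status")
    # Assumed pre-condition: only a validated configuration may be deployed.
    if action == "deploy" and validation_status != "passed":
        raise InvalidTransition("validation must pass before deploy")
    return target
```

Because `LifecycleStateMachine` is the single writer of `agents.status`, a check like `transition()` would run before the status UPDATE, and the endpoint would translate `InvalidTransition` into a client error response.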

Implementation Notes

Validation is asynchronous. The POST /validate endpoint queues a background job and returns a 202. The frontend polls the agent detail endpoint until validation_status is passed or failed. Validation can take up to 30 seconds for agents with multiple data sources.
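A client that starts validation and then polls might look like the sketch below. It is a minimal illustration, not the frontend's code: `get_agent` is a hypothetical callable wrapping GET /api/v1/agents/{id}, and the 2-second interval and 60-second deadline are assumptions sized to cover the 30-second worst case above. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable

TERMINAL_STATES = {"passed", "failed"}


def wait_for_validation(
    get_agent: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll the agent detail until validation_status is passed or failed.

    get_agent is a hypothetical wrapper around GET /api/v1/agents/{id}
    that returns the parsed JSON body as a dict.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_agent().get("validation_status")
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("validation did not finish before the deadline")
        sleep(interval_s)
```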

PUT on a deployed agent does not modify the active version. It creates a new version with status = 'draft'. The agent continues running on the existing active version until the operator explicitly deploys the new one. This prevents configuration changes from affecting in-flight runs.

Rollback does not create a new version. It updates current_version_id and emits an agent_version_rolled_back audit event. The version history is preserved intact.

Service 2: Trigger Ingestion

Normalizes all trigger types into a uniform ExecutionRequest and routes to the Agent Execution Runner. This service has no knowledge of what happens during a run -- its only job is to ingest, validate, and enqueue.

Owns: river_agents.agent_triggers, river_agents.trigger_rejection_log

Internal Components

Component	Handles	Ingestion Mechanism
`ManualTriggerHandler`	`manual`	HTTP POST from frontend or external API call
`ScheduledTriggerHandler`	`scheduled`	Internal scheduler polls `agent_triggers` for due expressions every 10 seconds; handles timezone normalization
`EventTriggerHandler`	`event`	Kafka consumer group `river-agents-events`; evaluates `event_type` and optional `payload_conditions` filter per agent
`APITriggerHandler`	`api`	Authenticated endpoint; validates API key, enforces per-agent rate limits, optionally validates payload against `trigger_config.payload_schema`
`ThresholdTriggerHandler`	`threshold`	Subscribes to metric stream from Service 8; evaluates `threshold_expression`; enforces `cooldown_seconds` to prevent re-fire storms
`WorkflowTriggerHandler`	`workflow`	Listens for Temporal workflow signals matching `trigger_config.workflow_id` and `signal_name`
`TriggerDispatcher`	All	Normalizes all handler outputs into `ExecutionRequest`; validates agent is `active`; enforces concurrency policy; enqueues to Execution Queue

API Endpoints

Method	Endpoint	Permission	Description
POST	`/api/v1/agents/{id}/run`	`agent:execute`	Manual trigger -- immediately enqueues execution
POST	`/api/v1/agents/{id}/triggers`	`agent:update`	Register a new trigger configuration for an agent
GET	`/api/v1/agents/{id}/triggers`	`agent:read`	List all trigger configurations for an agent
PUT	`/api/v1/agents/{id}/triggers/{tid}`	`agent:update`	Update trigger configuration
DELETE	`/api/v1/agents/{id}/triggers/{tid}`	`agent:update`	Remove trigger
POST	`/api/v1/agent-webhooks/{agent_id}`	API key auth	Inbound webhook endpoint for event-based triggers
POST	`/api/v1/agent-api/{agent_id}/execute`	API key auth	Authenticated execution endpoint for external system triggers

ExecutionRequest Schema

Every trigger normalizes to this structure before the Dispatcher enqueues it.

Field	Type	Description
`request_id`	UUID	Idempotency key; duplicate `request_id` values are rejected by the Runner
`agent_id`	UUID	Target agent
`agent_version_id`	integer	Pinned at ingestion time from `agents.current_version_id`; immutable for the run
`trigger_type`	enum	`manual`, `scheduled`, `event`, `api`, `threshold`, `workflow`
`trigger_source`	string	Audit label (e.g., `"cron:daily-8am"`, `"webhook:zendesk-ticket"`)
`trigger_payload`	JSONB	Contextual data from the trigger source; passed to river-agent as part of the context bundle
`requested_by`	UUID	User ID for manual/API triggers; service identifier for automated triggers
`requested_at`	timestamptz	Ingestion timestamp; used for SLA and latency tracking
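The schema above maps naturally onto an immutable Python dataclass. This is an illustrative sketch only: the field names and types follow the table, while `IdempotencyGuard` is a hypothetical name and its in-memory set stands in for whatever durable store the Runner actually uses to reject duplicate `request_id` values.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict, Set


class TriggerType(str, Enum):
    MANUAL = "manual"
    SCHEDULED = "scheduled"
    EVENT = "event"
    API = "api"
    THRESHOLD = "threshold"
    WORKFLOW = "workflow"


@dataclass(frozen=True)  # frozen: the request is immutable once normalized
class ExecutionRequest:
    request_id: uuid.UUID           # idempotency key
    agent_id: uuid.UUID
    agent_version_id: int           # pinned from agents.current_version_id at ingestion
    trigger_type: TriggerType
    trigger_source: str             # audit label, e.g. "cron:daily-8am"
    trigger_payload: Dict[str, Any]
    requested_by: uuid.UUID         # user ID, or service identifier for automated triggers
    requested_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class IdempotencyGuard:
    """Hypothetical in-memory stand-in for the Runner's duplicate check."""

    def __init__(self) -> None:
        self._seen: Set[uuid.UUID] = set()

    def accept(self, request: ExecutionRequest) -> bool:
        """Return True the first time a request_id is seen, False afterwards."""
        if request.request_id in self._seen:
            return False
        self._seen.add(request.request_id)
        return True
```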

Implementation Notes

Concurrency is enforced at dispatch time. If agent_versions.runtime_config.allow_concurrent_runs is false and a run is already in running status, the TriggerDispatcher applies the on_concurrent_trigger policy: queue holds the request until the current run finishes, drop discards it with a rejection log entry, and replace cancels the running execution and starts fresh.
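The three `on_concurrent_trigger` policies can be sketched as follows. This is a minimal in-memory model, not the `TriggerDispatcher` itself: `AgentRunState`, `dispatch()`, and `finish()` are hypothetical names, the `rejections` list stands in for `trigger_rejection_log`, and "cancel" is modeled by recording the cancelled request rather than signaling Temporal.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Set


@dataclass
class AgentRunState:
    """Per-agent dispatch state (hypothetical in-memory model)."""
    running: Set[str] = field(default_factory=set)
    waiting: Deque[str] = field(default_factory=deque)
    rejections: List[str] = field(default_factory=list)  # stands in for trigger_rejection_log
    cancelled: List[str] = field(default_factory=list)


def dispatch(state: AgentRunState, request_id: str,
             allow_concurrent_runs: bool, on_concurrent_trigger: str) -> str:
    """Apply the concurrency policy to a new request and report the outcome."""
    if allow_concurrent_runs or not state.running:
        state.running.add(request_id)
        return "started"
    if on_concurrent_trigger == "queue":
        state.waiting.append(request_id)  # held until the current run finishes
        return "queued"
    if on_concurrent_trigger == "drop":
        state.rejections.append(request_id)  # discarded with a rejection entry
        return "dropped"
    if on_concurrent_trigger == "replace":
        state.cancelled.extend(sorted(state.running))  # cancel the in-flight run
        state.running = {request_id}
        return "replaced"
    raise ValueError(f"unknown on_concurrent_trigger policy: {on_concurrent_trigger}")


def finish(state: AgentRunState, request_id: str) -> Optional[str]:
    """Mark a run finished and start the next queued request, if any."""
    state.running.discard(request_id)
    if state.waiting and not state.running:
        next_id = state.waiting.popleft()
        state.running.add(next_id)
        return next_id
    return None
```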

Event triggers use payload condition matching. The EventTriggerHandler evaluates trigger_config.payload_conditions as a JSON path expression against the incoming webhook payload. Agents with overlapping event type subscriptions are each evaluated independently.
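The exact format of `payload_conditions` is not specified here. As a simplified stand-in for a full JSON path evaluator, the sketch below assumes conditions are a mapping from dot-separated paths (for example `ticket.priority`) to expected values, all of which must match; `get_path()` and `matches()` are hypothetical names.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel: path does not exist in the payload


def get_path(payload: Any, path: str) -> Any:
    """Follow a dot-separated path through nested dicts and lists."""
    current = payload
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return _MISSING
    return current


def matches(event_type: str, payload: Dict[str, Any], trigger_config: Dict[str, Any]) -> bool:
    """Return True if this event should fire the agent's trigger."""
    if event_type != trigger_config.get("event_type"):
        return False
    conditions = trigger_config.get("payload_conditions") or {}
    # Every condition must hold; a missing path never equals an expected value.
    return all(get_path(payload, path) == expected for path, expected in conditions.items())
```

Because each agent's trigger is evaluated independently, fan-out for one event reduces to a filter over the subscribed configurations, for example `[agent_id for agent_id, cfg in configs.items() if matches(event_type, payload, cfg)]`.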

Service 3: Agent Execution Runner

The central coordinator for every agent run. Bridges the normalized trigger from Service 2 to the reasoning engine in Service 4, manages approval gate hibernation, enforces execution limits, and finalizes the run record.

Owns: river_agents.agent_executions, river_agents.agent_memory

Internal Components

Component	Responsibility
`AgentRunWorkflow`	The root Temporal workflow. Owns the full execution lifecycle from `running` to a terminal state (`completed`, `failed`, `cancelled`, `budget_exhausted`). All execution state lives in this workflow and the linked DB records.
`ContextBuilder`	Assembles the `AgentContext` bundle before the first reasoning call and after each approval gate resume. Loads agent version config, `long_term_context`, trigger payload, data source schema metadata, and the active tool registry.
`StateSerializer`	On gate: writes `agent_executions.serialized_state` (JSONB containing conversation history, pending tool call, observation buffer, memory snapshot) and transitions execution to `paused`. On resume: deserializes and reconstructs `AgentContext` from the stored snapshot.
`RetryManager`	Applies Temporal retry policy per activity: 3 retries with exponential backoff for transient failures. Marks tool calls as permanently failed after retry exhaustion.
`TimeoutManager`	Enforces per-tool timeout (default 30 seconds) and per-run turn limit (default 15 turns). A run that exhausts turns without reaching finalization terminates with `budget_exhausted` status.
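The retry, timeout, and turn limits above can be sketched as plain functions. The 1-second initial interval, backoff coefficient of 2.0, and 100-second cap mirror Temporal's documented RetryPolicy defaults but are assumptions here, since the spec states only the retry count. Note that Temporal's `maximum_attempts` counts the first attempt, so "3 retries" would correspond to `maximum_attempts=4`. In a Temporal deployment the per-tool timeout would more likely be an activity start-to-close timeout; `asyncio.wait_for` is used only to keep the sketch self-contained. All function names are hypothetical.

```python
import asyncio
from typing import Awaitable, List, TypeVar

T = TypeVar("T")


def retry_delays(retries: int = 3, initial_s: float = 1.0,
                 coefficient: float = 2.0, max_interval_s: float = 100.0) -> List[float]:
    """Wait before each retry under exponential backoff, capped at max_interval_s."""
    return [min(initial_s * coefficient ** i, max_interval_s) for i in range(retries)]


async def call_tool_with_timeout(call: Awaitable[T], timeout_s: float = 30.0) -> T:
    """Enforce the per-tool timeout; raises asyncio.TimeoutError when exceeded."""
    return await asyncio.wait_for(call, timeout=timeout_s)


def status_after_turn(turns_used: int, finalized: bool, max_turns: int = 15) -> str:
    """Run status after a turn: completed, budget_exhausted, or still running."""
    if finalized:
        return "completed"
    if turns_used >= max_turns:
        return "budget_exhausted"
    return "running"
```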

API Endpoints

Method	Endpoint	Permission	Description
GET	`/api/v1/agent-runs`	`agent:read`	List runs with filters: agent, status, trigger type, date range, pagination
GET	`/api/v1/agent-runs/{id}`	`agent:read`	Full run detail: status, timing, token usage, final output
POST	`/api/v1/agent-runs/{id}/stop`	`agent:execute`	Emergency cancel a running execution; sends cancellation signal to Temporal workflow
POST	`/api/v1/agent-runs/{id}/retry`	`agent:execute`	Re-enqueue a failed run using the same trigger payload and version
GET	`/api/v1/agent-runs/{id}/logs`	`agent:read`	Paginated turn-level log entries for a run
GET	`/api/v1/agent-runs/{id}/stream`	`agent:read`	WebSocket upgrade -- real-time run progress events

Implementation Notes

The Temporal workflow ID is derived from execution_id. The pattern is agent-run-{execution_id}. This means the workflow can be looked up by execution ID directly from Temporal's API without a separate index table. It also means the execution record must be created before the Temporal workflow is started -- the agent_executions.id is the source of truth.
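The workflow ID convention is simple enough to centralize in two helpers, so every caller builds and parses it the same way. The helper names below are hypothetical; only the `agent-run-{execution_id}` pattern comes from the spec.

```python
import uuid

WORKFLOW_ID_PREFIX = "agent-run-"


def workflow_id_for(execution_id: uuid.UUID) -> str:
    """Build the Temporal workflow ID from an agent_executions.id."""
    return f"{WORKFLOW_ID_PREFIX}{execution_id}"


def execution_id_from(workflow_id: str) -> uuid.UUID:
    """Recover the execution ID, rejecting IDs that do not follow the pattern."""
    if not workflow_id.startswith(WORKFLOW_ID_PREFIX):
        raise ValueError(f"not an agent run workflow ID: {workflow_id}")
    return uuid.UUID(workflow_id[len(WORKFLOW_ID_PREFIX):])
```

With the temporalio Python SDK, the derived ID would be passed as the `id` argument to `Client.start_workflow`, and `Client.get_workflow_handle(workflow_id_for(execution_id))` would later locate the same workflow. The `agent_executions` row must already exist before the start call.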

Approval signals go directly to Temporal, not through the REST layer. When an operator calls PATCH /api/v1/approvals/{id}, Service 7 updates the approval_requests record and sends a signal via the Temporal client SDK, schematically: temporal.signal_workflow(workflow_id="agent-run-{execution_id}", signal="approval_resolved", payload={...}). The REST endpoint does not block waiting for the workflow to resume.

Memory write-back is non-blocking. After finalization, the ContextBuilder writes the memory_updates delta from the reasoning result back to river_agents.agent_memory. This write happens asynchronously after the run record is marked completed -- a write failure does not change the run's terminal status.

Service 4: Reasoning Service

The LLM reasoning engine. This service is implemented as a separate FastAPI process at river-agent :8007. It is entirely stateless -- all state arrives in the request body and all outputs are returned in the response. It is called once per reasoning turn by Service 3.

Owns: No database tables. Stateless.

Internal Components

Component	Responsibility
`AgentLoop`	Entry point. Receives the `AgentContext` bundle, runs one Reason -> Act -> Observe iteration, and returns a structured `ReasoningResult` containing the next tool call or a finalization signal.
`SystemPromptBuilder`	Dynamically composes the system prompt per turn by assembling sections from the `AgentContext`. The prompt is rebuilt on every turn to reflect the current observation state.
`RiverCore`	Multi-provider LLM router. Classifies the turn complexity, selects the appropriate model tier and provider, executes the inference call, and handles failover to the next provider in the chain on error.
`ToolCallParser`	Parses the raw LLM output into a structured `ToolCall` object (tool name + validated arguments). Rejects malformed tool selections before returning to Service 3.

System Prompt Composition

The SystemPromptBuilder assembles the system prompt from these sections in order:

Section	Content	Source
Role Definition	`"You are {agent.name}, a {agent.business_function} agent..."`	`agent_versions.name`, `business_function`
Goal and Instructions	The agent's natural language instruction set	`agent_versions.instruction_set`
Available Tools	JSON schema definitions for all tools in the active tool registry	Tool registry filtered by `agent_versions.selected_tools`
Data Context	Schema metadata for all connected data sources	Assembled by `ContextBuilder` from Service 5
Governance Constraints	Action level and applicable policy constraints in natural language	`agent_versions.action_level`, resolved policies
Long-Term Memory	Structured summary of learnings from past runs	`agent_memory.context_snapshot`
Trigger Context	The trigger type and payload for this run	`ExecutionRequest.trigger_payload`
Conversation History	All prior turns in this execution (reasoning + observations)	Accumulated in `AgentContext` across turns
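The ordering rule in the table can be captured with a fixed section list. This is a sketch under assumptions, not `SystemPromptBuilder` itself: sections arrive already rendered as strings, empty sections (such as Long-Term Memory on an agent's first run) are skipped, and the `## Title` separator is an arbitrary presentation choice.

```python
from typing import Dict, List

# Section order as listed in the System Prompt Composition table.
SECTION_ORDER: List[str] = [
    "Role Definition",
    "Goal and Instructions",
    "Available Tools",
    "Data Context",
    "Governance Constraints",
    "Long-Term Memory",
    "Trigger Context",
    "Conversation History",
]


def build_system_prompt(sections: Dict[str, str]) -> str:
    """Assemble rendered sections in the fixed order, skipping empty ones."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"unknown prompt sections: {sorted(unknown)}")
    parts = [
        f"## {title}\n{sections[title]}"
        for title in SECTION_ORDER
        if sections.get(title)  # assumed: empty sections are omitted entirely
    ]
    return "\n\n".join(parts)
```

Since the prompt is rebuilt every turn, only the Conversation History section would normally grow between calls; the other sections are re-rendered from the same `AgentContext`.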

Implementation Notes

Service 3 calls Service 4 once per turn, not once per run. The request body contains the full AgentContext at the current turn's state. This is intentional -- it keeps river-agent stateless and allows Service 3 (via Temporal) to be the durable state holder. The tradeoff is larger request payloads on longer runs.

river-agent does not know its action level. The tool registry it receives in the context bundle is already filtered by Service 3 based on action level. If a tool requires approval, that check happens in Service 7 after river-agent returns its tool selection. From river-agent's perspective, it selects from the tools it was given -- governance enforcement is downstream.

Finalization is signaled by a special tool call. When river-agent determines the goal is reached, it returns a ToolCall with tool_name = "finalize" and a structured final_output in the arguments. Service 3 detects this and begins the run finalization sequence without invoking Service 6.
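On the Service 3 side, finalization detection is a branch on the tool name before any dispatch happens. A minimal sketch, assuming a simple `ToolCall` shape of name plus arguments; `handle_tool_call()` and the injected `dispatch` callable (standing in for the Service 6 dispatch path) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

FINALIZE_TOOL = "finalize"


@dataclass
class ToolCall:
    tool_name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


def handle_tool_call(call: ToolCall, dispatch: Callable[[ToolCall], Any]) -> Dict[str, Any]:
    """Route a reasoning result's tool call: finalize the run or dispatch the tool."""
    if call.tool_name == FINALIZE_TOOL:
        # Goal reached: hand final_output to the finalization sequence.
        # The dispatch path (Service 6) is deliberately never invoked here.
        return {"action": "finalize", "final_output": call.arguments.get("final_output")}
    # Any other tool goes through dispatch; its result becomes the next observation.
    return {"action": "observe", "observation": dispatch(call)}
```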

Service 5: Data Access and Schema

Provides the reasoning engine with schema-aware, ACL-governed access to all connected data sources. All reads pass through TLO Gateway -- this service never connects directly to external data sources.

Owns: No dedicated tables. Reads from platform.data_sources (cross-schema). Schema metadata is cached in Redis with per-source TTL.

Internal Components

Component	Responsibility
`SchemaDiscoveryService`	Retrieves and caches table, column, and relationship metadata for connected data sources. Cache TTL is configurable per data source (default: 3600 seconds). Force-refresh is triggered by `POST /data-sources/{id}/discover-schema`.
`QueryExecutionEngine`	Translates a validated query spec into a Data Orchestration Service request. Handles result pagination, type normalization, and error classification (user error vs. connectivity error vs. timeout).
`SemanticCatalog`	Vector search over data source metadata in Qdrant. Used by the reasoning engine to resolve table/column references by semantic meaning when exact names are unknown.
`DataConnectorProxy`	Routes all data source interactions through TLO Gateway to the Data Orchestration Service. Injects the `X-Agent-ID` and `X-Execution-ID` headers for audit trail correlation in the downstream service.

API Endpoints

Method	Endpoint	Permission	Description
GET	`/api/v1/data-sources/{id}/schema`	`data_source:view`	Retrieve cached schema metadata for a connected data source
POST	`/api/v1/query/execute`	`data_source:query`	Execute a query against a connected data source via Data Orchestration
GET	`/api/v1/catalog/search`	`data_source:view`	Semantic search over data source metadata using Qdrant vectors
POST	`/api/v1/data-sources/{id}/test`	`data_source:view`	Test connectivity to a data source

Supported Data Source Types

Category	Sources
SQL Databases	PostgreSQL, MySQL, SQL Server, Snowflake, BigQuery, Redshift
NoSQL Databases	MongoDB, DynamoDB, Elasticsearch
SaaS APIs	Salesforce, HubSpot, Zendesk, Stripe, Shopify
File Storage	CSV and Excel files in MinIO, Google Sheets
Custom APIs	Any REST API with a registered OpenAPI specification

Implementation Notes

Schema metadata is cached, not live. The reasoning engine sees a schema snapshot, not the live database state, so a stale snapshot can cause query generation errors at execution time. The SchemaDiscoveryService detects schema-related query errors and triggers a background refresh for the affected data source.
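The caching behavior can be sketched with an in-process stand-in for the Redis cache. This is illustrative only: `SchemaCache` and `on_query_error()` are hypothetical, the `"schema"` error class is an assumed label (the documented classes are user error, connectivity error, and timeout), and the refresh is modeled as invalidate-then-lazy-reload rather than a true background job.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SchemaCache:
    """In-process stand-in for the per-source Redis schema cache (hypothetical)."""

    def __init__(self, discover: Callable[[str], Any], default_ttl_s: float = 3600.0,
                 ttl_overrides: Optional[Dict[str, float]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._discover = discover                 # performs live schema discovery
        self._default_ttl_s = default_ttl_s       # default TTL: 3600 seconds
        self._ttl_overrides = ttl_overrides or {}  # per-data-source TTLs
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def _ttl(self, source_id: str) -> float:
        return self._ttl_overrides.get(source_id, self._default_ttl_s)

    def get(self, source_id: str) -> Any:
        """Return the cached schema, rediscovering it once the TTL has elapsed."""
        entry = self._entries.get(source_id)
        now = self._clock()
        if entry is None or now - entry[0] >= self._ttl(source_id):
            schema = self._discover(source_id)
            self._entries[source_id] = (now, schema)
            return schema
        return entry[1]

    def invalidate(self, source_id: str) -> None:
        self._entries.pop(source_id, None)


def on_query_error(cache: SchemaCache, source_id: str, error_class: str) -> None:
    """Drop the cached snapshot when a query fails for schema-related reasons."""
    if error_class == "schema":
        cache.invalidate(source_id)  # the next get() triggers rediscovery
```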

ACL is checked per call at TLO Gateway. A data_source:view permission is required for schema discovery; data_source:query is required for query execution. These checks happen at TLO on every call. A data source permission revoked mid-run takes effect on the next tool invocation within that run.

Service 6: Tool and Workflow Invocation

Describes the path a tool call takes from the reasoning result to execution on a target service. This is not a standalone process -- it is the pattern implemented by Service 3 when dispatching an approved tool call through TLO Gateway.

Owns: No tables. Operates as a dispatch path within Service 3.

Tool Execution Steps

Step	Action	Owner
1	river-agent selects tool and returns structured `ToolCall`	Service 4
2	Service 3 validates arguments against tool's Pydantic input schema	Service 3 / Tool Registry
3	Service 3 calls Service 7 for action level and policy check	Service 7
4	If gated: approval flow; if blocked: turn error; if allowed: proceed	Service 7
5	Service 3 sends tool dispatch request to TLO Gateway with governance token	Service 3 -> TLO
6	TLO validates JWT and per-tool ACL permission	TLO Gateway
7	TLO routes to target service (Backend, Data Orchestration, external API)	TLO Gateway
8	Target service executes and returns result	Target service
9	Service 3 receives result; passes to `ResultValidator`	Service 3
10	Validated result is formatted as an observation and injected into the next `AgentContext`	Service 3

Tool Registry

All tools available to River Agents are registered in the Tool Registry. Engineers adding new tools must define:

A unique tool_name identifier
A Pydantic input schema (used for argument validation at step 2 and for prompt injection at system prompt build time)
A Pydantic output schema (used for result validation at step 9)
A write_classified boolean (determines whether the tool triggers action level checks in Service 7)
A required TLO ACL permission string (e.g., "agent:execute", "data_source:query")
A target service route (Backend, Data Orchestration, or external service URL)

Custom tools registered via OpenAPI spec upload are validated against this schema at registration time. A spec that cannot be mapped to a valid tool definition is rejected.
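A registry entry and its registration-time check might look like the sketch below. To stay dependency-free it uses a dataclass with plain classes in place of the Pydantic input and output models; `ToolDefinition`, `validate_definition()`, the `resource:action` format check, and treating `finalize` as a reserved name are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed reserved: "finalize" is the special tool Service 3 intercepts.
RESERVED_TOOL_NAMES = {"finalize"}


@dataclass(frozen=True)
class ToolDefinition:
    tool_name: str
    input_schema: type        # a Pydantic model class in the real registry
    output_schema: type       # a Pydantic model class in the real registry
    write_classified: bool    # True routes the call through action level checks
    acl_permission: str       # e.g. "data_source:query"
    target_route: str         # Backend, Data Orchestration, or external service URL


def validate_definition(defn: ToolDefinition, registry: Dict[str, ToolDefinition]) -> List[str]:
    """Return registration errors; an empty list means the tool can be registered."""
    errors: List[str] = []
    if not defn.tool_name:
        errors.append("tool_name is required")
    elif defn.tool_name in RESERVED_TOOL_NAMES:
        errors.append(f"tool_name {defn.tool_name!r} is reserved")
    elif defn.tool_name in registry:
        errors.append(f"tool_name {defn.tool_name!r} is already registered")
    if not (isinstance(defn.input_schema, type) and isinstance(defn.output_schema, type)):
        errors.append("input_schema and output_schema must be classes")
    resource, _, action = defn.acl_permission.partition(":")
    if not (resource and action):
        errors.append("acl_permission must look like 'resource:action'")
    if not defn.target_route:
        errors.append("target_route is required")
    return errors
```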

Workflow Invocation

For operations that span multiple steps or services, Service 3 can invoke a Temporal sub-workflow rather than a single tool call. The WorkflowInvoker starts a child workflow using the workflow_id specified in agent_versions.selected_workflows and passes the arguments from the reasoning engine's tool call as the workflow input. The parent run records the child workflow's execution ID and, if the parent workflow is configured to wait, awaits the child's result.

Service 7: Governance and Approval

The enforcement boundary between what an agent reasons it should do and what it is permitted to do. Every write-capable tool call passes through this service. No tool dispatch occurs without this service's sign-off.

Owns: river_agents.approval_requests, river_agents.governance_policies, river_agents.agent_policy_bindings

Internal Components

Component	Responsibility
`ActionLevelChecker`	Evaluates a proposed tool call against the agent's `action_level`. Returns `execute`, `block`, `stage_only`, or `gate` as the enforcement outcome.
`PolicyEngine`	Evaluates the current execution context (agent identity, workspace, tool name, tool arguments, data classification) against all bound governance policies. Returns `allow`, `block`, `gate`, or `alert` per matching policy.
`ApprovalGateService`	Creates `approval_requests` records, assigns approvers from `approval_rules`, and sends the approval signal to the Temporal workflow on resolution.
`ApprovalNotifier`	Dispatches approval request notifications via Novu to configured channels: Slack, email, in-app, or PagerDuty. Handles escalation if no response within `approval_rules.timeout_hours`.

API Endpoints

Method	Endpoint	Permission	Description
GET	`/api/v1/approvals`	`agent:approve`	List approval requests with status, agent, and date filters
GET	`/api/v1/approvals/{id}`	`agent:approve`	Full approval detail: proposed action, reasoning context, risk assessment
PATCH	`/api/v1/approvals/{id}`	`agent:approve`	Resolve: `approve`, `reject`, or `edit_and_approve` with modified arguments
GET	`/api/v1/approvals/pending`	`agent:approve`	Count of pending approvals (used for badge display)
GET	`/api/v1/agents/{id}/policies`	`agent:read`	List governance policies bound to an agent
POST	`/api/v1/agents/{id}/policies`	`agent:update`	Bind a governance policy to an agent
DELETE	`/api/v1/agents/{id}/policies/{pid}`	`agent:update`	Remove a policy binding

Action Level Enforcement Matrix

Agent `action_level`	Tool `write_classified`	Outcome
`read_only`	false (read tool)	Execute
`read_only`	true (write tool)	Block -- write operations not permitted at this level
`recommend`	false or true	Stage as proposal -- no execution; returned to user as recommendation
`act_with_approval`	false (read tool)	Execute
`act_with_approval`	true -- tool in `approval_rules`	Gate -- create `ApprovalRequest`, pause execution, notify approver
`act_with_approval`	true -- tool not in `approval_rules`	Execute
`automated`	false or true	Execute -- all configured tools run without gates
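The matrix translates directly into a single function. In this sketch, `check_action_level()` and `Outcome` are hypothetical names, and `approval_rules` is modeled as a set of tool names that require approval, which is an assumption about its shape.

```python
from enum import Enum
from typing import Set


class Outcome(str, Enum):
    EXECUTE = "execute"
    BLOCK = "block"
    STAGE_ONLY = "stage_only"
    GATE = "gate"


def check_action_level(action_level: str, tool_name: str, write_classified: bool,
                       approval_rule_tools: Set[str]) -> Outcome:
    """Apply the Action Level Enforcement Matrix to one proposed tool call."""
    if action_level == "read_only":
        # Read tools run; write tools are not permitted at this level.
        return Outcome.BLOCK if write_classified else Outcome.EXECUTE
    if action_level == "recommend":
        # Nothing executes; the call is staged as a proposal.
        return Outcome.STAGE_ONLY
    if action_level == "act_with_approval":
        if write_classified and tool_name in approval_rule_tools:
            return Outcome.GATE  # create ApprovalRequest, pause, notify
        return Outcome.EXECUTE
    if action_level == "automated":
        return Outcome.EXECUTE
    raise ValueError(f"unknown action_level: {action_level}")
```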

Approval Request Lifecycle

Approval Request State Machine

Implementation Notes

Policy evaluation happens after action level check. A tool call blocked at the action level never reaches the PolicyEngine. The policy engine evaluates only calls that pass the action level check. This ordering means action level is always the outer constraint.

edit_and_approve substitutes arguments entirely. When an approver uses edit-and-approve, the original ToolCall arguments are discarded and the approver's revised arguments are used. The approval_requests record stores both the original and modified arguments for audit. Before dispatch, Service 3 re-validates the modified arguments against the tool's input schema, repeating step 2 of the Tool Execution Steps in Service 6.

Every governance decision -- including execute outcomes -- emits an audit event. Service 7 calls Service 9 on every enforcement outcome, not only on blocks and gates. This ensures the audit trail is complete for compliance purposes.

Service 8: Monitoring and Telemetry

Collects and aggregates runtime telemetry across all agent executions. Powers the system-wide monitoring dashboard, per-agent metrics, real-time execution views, and the alert engine.

Owns: river_agents.agent_metrics_hourly, river_agents.agent_alerts

Internal Components

Component	Responsibility
`TelemetryEmitter`	Writes structured turn-level events to the WebSocket channel during active runs. Called by Service 3 at each turn boundary, tool dispatch, and approval gate event.
`MetricAggregator`	Aggregates execution outcomes into rolling windows (1h, 24h, 7d, 30d) and writes to `agent_metrics_hourly`. Runs as a background job after each run completion.
`HealthEvaluator`	Evaluates `agent_metrics_hourly` on a 5-minute cycle to derive `agents.health_status` (`healthy`, `degraded`, `critical`, `unknown`). Degraded threshold: success rate below 80% over 24 hours. Critical threshold: below 50%.
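The health thresholds reduce to a small pure function. This sketch assumes both thresholds apply to the same 24-hour window and that `unknown` means no runs in that window; neither point is stated explicitly above. Integer comparisons avoid floating-point edge cases right at 50% and 80%, and `derive_health()` is a hypothetical name.

```python
DEGRADED_BELOW_PCT = 80  # success rate below 80% over 24 hours
CRITICAL_BELOW_PCT = 50  # success rate below 50% (assumed: same 24-hour window)


def derive_health(completed: int, total: int) -> str:
    """Map 24-hour run counts to healthy, degraded, critical, or unknown."""
    if total == 0:
        return "unknown"  # assumed: no runs means no signal
    # completed / total < pct / 100, rearranged to stay in integer arithmetic
    if completed * 100 < CRITICAL_BELOW_PCT * total:
        return "critical"
    if completed * 100 < DEGRADED_BELOW_PCT * total:
        return "degraded"
    return "healthy"
```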
`AlertEngine`	Monitors per-agent and system-wide metrics against configured alert rules. Dispatches to Novu on threshold breach. Implements a cooldown period to prevent alert floods for sustained degradation.

API Endpoints

Method	Endpoint	Permission	Description
GET	`/api/v1/agents/{id}/metrics`	`agent:read`	Per-agent metrics for a configurable time window
GET	`/api/v1/monitoring/overview`	`agent:monitor`	System-wide: total agents, active runs, throughput, health distribution
GET	`/api/v1/monitoring/throughput`	`agent:monitor`	Time-series throughput data for the system chart
GET	`/api/v1/monitoring/alerts`	`agent:monitor`	Recent alert stream with severity and agent reference
GET	`/api/v1/monitoring/cluster`	`agent:monitor`	Runtime instance health table
WS	`/ws/agent-runs/{id}`	`agent:read`	Real-time run progress events for a specific execution
WS	`/ws/monitoring`	`agent:monitor`	System-wide real-time monitoring event stream

Metric Definitions

Metric	Aggregation	Scope	Storage
Total Runs	Count	Per-agent, System-wide	`agent_metrics_hourly.run_count`
Success Rate	`completed / total` as percentage	Per-agent	`agent_metrics_hourly.success_rate`
Latency	P50, P90, P99 in milliseconds	Per-agent, Per-tool	`agent_metrics_hourly.latency_p50/p90/p99`
Failure Count	Count with categorized reasons	Per-agent, System-wide	`agent_metrics_hourly.failure_count` + `failure_reasons` JSONB
Throughput	Runs per hour	System-wide	Derived from `agent_metrics_hourly` at query time
Pending Approvals	Count of `approval_requests` where `status = 'pending'`	Per-agent, System-wide	Queried live from `approval_requests`
Token Cost	Total tokens multiplied by model unit pricing	Per-agent, Per-run	`agent_executions.token_usage` JSONB
Actions Taken	Count grouped by `tool_name`	Per-agent	Aggregated from `agent_logs` at query time
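Success rate and the latency percentiles can be computed as below. The spec does not say which percentile definition `MetricAggregator` uses; this sketch assumes the nearest-rank method, and the function names are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def success_rate(statuses: Sequence[str]) -> Optional[float]:
    """completed / total as a percentage; None when there are no runs."""
    if not statuses:
        return None
    completed = sum(1 for s in statuses if s == "completed")
    return 100.0 * completed / len(statuses)


def percentile(latencies_ms: Iterable[float], p: int) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("no samples")
    n = len(ordered)
    rank = -(-p * n // 100)  # integer ceiling of p * n / 100
    return ordered[max(rank, 1) - 1]
```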

WebSocket Event Types Emitted

Event	Payload Fields	Emitted When
`run_started`	`execution_id`, `agent_id`, `started_at`	Execution begins
`turn_reasoning`	`turn`, `content`, `model_used`, `tokens`	Reasoning turn complete
`tool_called`	`turn`, `tool_name`, `inputs`	Tool dispatch initiated
`tool_result`	`turn`, `tool_name`, `output`, `duration_ms`	Tool result received
`approval_requested`	`approval_id`, `tool_name`, `pending_since`	Approval gate triggered
`approval_resolved`	`approval_id`, `resolution`, `resolved_by`	Approval gate resolved
`run_completed`	`execution_id`, `status`, `final_output`, `duration_ms`	Execution finalized
`run_failed`	`execution_id`, `error_code`, `error_message`	Execution terminated with error

Service 9: Execution Logging and Audit

The write-once source of truth for all historical agent behavior. Provides queryable, filterable, and exportable audit trails for debugging, compliance, and analysis.

Owns: river_agents.audit_logs, river_agents.agent_logs

Internal Components

Component	Responsibility
`AuditWriter`	Single entry point for all writes to `audit_logs`. Enforces write-once semantics at the application layer -- no UPDATE or DELETE is permitted on this table through this component.
`ExecutionLogWriter`	Writes turn-level records to `agent_logs` during active runs. Called by Service 3 after every reasoning turn, tool call, and approval gate event.
`AuditQueryService`	Provides the read interface for both `audit_logs` and `agent_logs`. Handles filter compilation, pagination, and export formatting.
`RetentionManager`	Applies retention policies: archives or deletes audit records older than the configured retention period per organization. Retention periods are stored in `iam.organizations.audit_retention_days`.

API Endpoints

Method	Endpoint	Permission	Description
GET	`/api/v1/audit/logs`	`agent:audit`	Search audit logs with filters: agent, user, event type, date range, outcome, pagination
GET	`/api/v1/audit/logs/{id}`	`agent:audit`	Full audit entry with complete payload detail
GET	`/api/v1/audit/stats`	`agent:audit`	Aggregate counts: total events, by type, by agent, by user
GET	`/api/v1/audit/export`	`agent:audit`	Export filtered audit logs as CSV or JSON for compliance reporting
GET	`/api/v1/agent-runs/{id}/trace`	`agent:read`	Complete turn-by-turn execution trace for a single run

Audit Log Entry Schema

Field	Type	Description
`log_id`	UUID	Immutable identifier; set at write time and never changed
`timestamp`	timestamptz	Event time; indexed for range queries
`event_type`	enum	`agent.created`, `agent.deployed`, `agent.archived`, `run.started`, `run.completed`, `run.failed`, `tool.executed`, `tool.blocked`, `approval.requested`, `approval.resolved`, `policy.violated`, `permission.denied`
`agent_id`	UUID	Related agent (nullable for non-agent events)
`execution_id`	UUID	Related run (null for non-execution events)
`user_id`	UUID	Actor who initiated the event; system service ID for automated events
`action`	string	Specific action description (e.g., `"deploy_agent"`, `"execute_tool:update_crm"`)
`payload`	JSONB	Full request context and response (scrubbed of secrets)
`outcome`	enum	`success`, `failure`, `blocked`, `pending`
`ip_address`	inet	Source IP from the originating request; populated from TLO Gateway propagation header
`organization_id`	UUID	Tenant isolation key; all queries must filter by this

Implementation Notes

audit_logs has no UPDATE or DELETE paths in the application code. The write-once constraint is enforced by AuditWriter -- there is no update_audit_log or delete_audit_log method. At the database layer, this is reinforced by a row-level security policy that permits only INSERT (no UPDATE, no DELETE) for the Backend service's database role.
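The two layers of the write-once rule can be demonstrated end to end with SQLite, used here purely as a self-contained stand-in: triggers that abort UPDATE and DELETE play the role of the database-layer guard, and this is not the production PostgreSQL mechanism. In PostgreSQL, a comparable effect typically comes from table privileges or row-level security policies. The trimmed column set and the `AuditWriter.append()` signature are hypothetical.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Any, Dict

SCHEMA = """
CREATE TABLE audit_logs (
    log_id TEXT PRIMARY KEY,
    timestamp TEXT NOT NULL,
    event_type TEXT NOT NULL,
    organization_id TEXT NOT NULL,
    outcome TEXT NOT NULL,
    payload TEXT NOT NULL
);
-- Database-layer guard: any UPDATE or DELETE aborts.
CREATE TRIGGER audit_logs_no_update BEFORE UPDATE ON audit_logs
BEGIN SELECT RAISE(ABORT, 'audit_logs is write-once'); END;
CREATE TRIGGER audit_logs_no_delete BEFORE DELETE ON audit_logs
BEGIN SELECT RAISE(ABORT, 'audit_logs is write-once'); END;
"""


class AuditWriter:
    """Application-layer guard: append-only, with no update or delete method."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def append(self, event_type: str, organization_id: str, outcome: str,
               payload: Dict[str, Any]) -> str:
        log_id = str(uuid.uuid4())  # immutable identifier, set at write time
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO audit_logs VALUES (?, ?, ?, ?, ?, ?)",
                (log_id, datetime.now(timezone.utc).isoformat(), event_type,
                 organization_id, outcome, json.dumps(payload)),
            )
        return log_id
```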

Turn-level logs in agent_logs are separate from audit logs. agent_logs stores the detailed reasoning trace (turn content, tool arguments, observations) and is queried for debugging and run detail views. audit_logs stores governance events, lifecycle transitions, and compliance-relevant actions. The two tables serve different consumers: agent_logs is for operators debugging a run; audit_logs is for compliance and security review.

Export uses streaming for large result sets. The GET /api/v1/audit/export endpoint streams the response rather than buffering the full result set in memory. Callers should handle chunked transfer encoding. Result sets over 100,000 rows are automatically split into multiple files in a ZIP archive.
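The two export behaviors, incremental streaming and splitting into 100,000-row files, can be sketched with the standard library. `stream_csv()` yields CSV text in batches so that a web framework could send it with chunked transfer encoding. `export_zip()` is simplified: for brevity it assembles the archive in memory, holding one part at a time, where a production version would write to a temporary file or a streaming ZIP writer. Function names, the batch size, and the `audit_export_part_NNN.csv` naming are hypothetical.

```python
import csv
import io
import zipfile
from itertools import islice
from typing import Iterable, Iterator, Sequence

MAX_ROWS_PER_FILE = 100_000


def stream_csv(header: Sequence[str], rows: Iterable[Sequence], batch_size: int = 1000) -> Iterator[str]:
    """Yield CSV text incrementally so the full result set is never buffered."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    for i, row in enumerate(rows, start=1):
        writer.writerow(row)
        if i % batch_size == 0:
            yield buf.getvalue()
            buf.seek(0)
            buf.truncate(0)  # reuse the buffer for the next batch
    if buf.getvalue():
        yield buf.getvalue()


def export_zip(header: Sequence[str], rows: Iterable[Sequence], max_rows: int = MAX_ROWS_PER_FILE) -> bytes:
    """Split rows into CSV parts of at most max_rows rows each, inside one ZIP."""
    out = io.BytesIO()
    iterator = iter(rows)
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        part = 1
        while True:
            chunk = list(islice(iterator, max_rows))
            if not chunk and part > 1:
                break  # previous part was exactly full and nothing remains
            buf = io.StringIO()
            writer = csv.writer(buf)
            writer.writerow(header)  # every part is a standalone CSV file
            writer.writerows(chunk)
            zf.writestr(f"audit_export_part_{part:03d}.csv", buf.getvalue())
            part += 1
            if len(chunk) < max_rows:
                break
    return out.getvalue()
```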
