Execution Flow and Runtime Lifecycle

This document traces every agent execution from trigger reception through post-run processing -- step by step, service by service. It covers the 9-step runtime flow, complete sequence diagrams, trigger normalization, the approval gate hibernation mechanism, context assembly, real-time telemetry, error recovery, and multi-agent pipeline chaining.

Quick Navigation

Universal Runtime Flow
Complete Execution Sequence
Trigger Ingestion Flow
Approval Gate Sequence
Context Building Phase
Real-Time Telemetry Flow
Error Handling and Recovery
Post-Execution Processing
Multi-Agent Pipeline Flow

Universal Runtime Flow

Every agent execution follows these 9 steps regardless of trigger type, agent category, or action level. Steps 5 through 8 repeat per reasoning turn until the agent finalizes or a terminal condition is reached. Step 9 executes exactly once on any terminal transition.

9-Step Runtime Flow

Step	Name	Owning Service	Key Contract
1	Trigger Reception	Trigger Ingestion Service	Normalizes any signal source into a uniform `ExecutionRequest` schema
2	Agent Resolution	Agent Management Service	Validates agent is `Active`; rejects with log if `paused`, `archived`, or `draft`
3	Context Building	Context Builder	Assembles `AgentContext` -- instructions, tools, memory, schemas, trigger payload
4	Retrieval and Data Access	Data Access Service via TLO Gateway	Schema discovery + query execution; ACL validated per call at TLO Gateway
5	Reasoning and Planning	Reasoning Engine (river-agent :8007)	LLM selects next tool call or emits finalize signal; does not enforce governance
6	Governance and Policy Check	Governance Service (Backend :8005)	Enforces `action_level` and org policies before every write tool dispatch
7	Action Execution	Tool Execution Service via TLO Gateway	Routes approved tool calls to target services; formats result as observation
8	Logging and Observability	Logging Service + Monitoring Service	Writes per-turn log entry; updates metrics; streams WebSocket event to UI
9	Post-Execution Outcome	Execution Service + Notification Service	Terminal state write, async memory write-back, health recalculation, notifications

Reasoning loop bounds: The loop runs a maximum of 15 turns by default, configurable per agent via max_turns. The runner also enforces two additional loop-break conditions independent of the turn counter: no tool calls for 3 consecutive turns (reasoning stall), and the same tool called 3 times with identical arguments (infinite loop detection). Both conditions are evaluated by AgentRunWorkflow, not by the LLM.

Complete Execution Sequence

The following sequence covers a single execution that includes one approval-gated tool call. Non-governed tool calls follow the Fully Automated branch only.

Trigger Ingestion Flow

All six trigger types funnel through a common normalization step before the Agent Runner takes control. The ExecutionRequest schema is the interface boundary -- everything upstream of it is trigger-type-specific; everything downstream is uniform.

Trigger Ingestion Flow

Trigger-Specific Handler Behavior

Trigger Type	Handler Logic
Manual	Immediate dispatch. `requested_by` is set to the initiating user ID. Optional payload from the UI form is forwarded in `trigger_payload`.
Scheduled	Scheduler polls `scheduled_jobs` every 30 seconds. On cron match, creates `ExecutionRequest` with cron expression context. Handles timezone conversion. Prevents duplicate fires via distributed lock for the scheduled interval.
Event	Webhook endpoint receives event payload. Handler matches event type against all registered agent trigger rules. Applies payload condition filters (e.g., only fire if `priority = high`). A single incoming event may trigger multiple agents in parallel.
API	Validates API key against the issuing org. Parses and validates payload against optional JSON schema. Rate limiting enforced per API key at a default configurable per workspace.
Threshold	Subscribes to metric stream. Evaluates threshold expression (e.g., `cpu_percent > 90 for 5m`). Enforces cooldown period after each firing to prevent trigger storms on sustained threshold breaches.
Workflow	Listens for Temporal signals from parent workflows. Receives parent workflow context in payload. Enables ordered pipeline chaining: Agent N signals Agent N+1 with its output as the trigger payload.

Approval Gate Sequence

The approval gate uses Temporal's workflow.await() to pause execution state without consuming compute resources. The full AgentContext is serialized to the database; the Temporal workflow thread suspends. On signal receipt, the thread resumes by deserializing the saved state and continuing from the exact point of interruption.

Approval Gate Implementation Notes

Aspect	Behavior
Context serialization scope	Serialized context includes full conversation history up to the paused turn, the pending tool call and its arguments, and all observations received so far. Stored in `AgentExecution.context_snapshot`.
Edit and Approve semantics	Modified arguments replace the original tool call arguments entirely. The LLM never sees the original arguments again. The observation reflects the modified values so reasoning history remains internally consistent.
Approval expiry	`ApprovalRequest` records that remain `PENDING` beyond the configured expiry duration are transitioned to `EXPIRED`. The Temporal workflow receives an `approval_resolved (EXPIRED)` signal and the run terminates with status `approval_expired`.
Audit on every path	Every approval gate resolution -- approved, rejected, edited, or expired -- emits an `audit_logs` entry before the Temporal signal is sent. The audit record is the authoritative source of the governance decision; the `ApprovalRequest` record is the operational state tracker.
Approver notification routing	Notification channels (Slack, email, in-app) are configured per workspace in `WorkspaceSettings.approval_notification_config`. At least one channel must be active for approval-gated agents to deploy.

Context Building Phase

The Context Builder executes once per execution before the reasoning loop starts. Its output -- the AgentContext -- is passed to the System Prompt Builder, which constructs the full LLM prompt for the first reasoning turn. On subsequent turns, only the new observation is appended; the base context is not rebuilt.

Context Building Phase

AgentContext Composition

Section	Source	Content
Identity	Agent record	`agent_id`, `name`, `business_function`, `owner_user_id`
Instructions	Agent version	`instruction_set` -- the agent's goal expressed in natural language
Allowed Tools	Agent version + Tool Registry	Tool names and Pydantic schemas in LLM function-calling format
Data Sources	Agent version + Schema Service	`data_source_ids` with resolved table and column metadata
Governance	Agent config + Policy Engine	`action_level`, `approval_rules`, active org and workspace policies
Memory	Long-term context DB	Accumulated learnings from previous runs (summarized, not raw log entries)
Trigger Context	ExecutionRequest	`trigger_type`, `trigger_source`, `trigger_payload`
Execution Metadata	Execution record	`execution_id`, `turn_count` (starts at 0), `start_time`

Schema metadata caching: The Schema Discovery Service caches schema metadata -- it is not fetched live on every execution. The cache is invalidated when a data source connection is updated. Schema changes in a connected database do not propagate automatically; engineers adding new tables to a connected data source must trigger a manual schema refresh from the Data Sources configuration panel.

Memory summarization: Long-term context is stored as summarized learnings, not raw log entries. The summary is generated at the end of each successful run and stored per-agent. This keeps the system prompt within token budget even for agents with hundreds of prior runs.

Real-Time Telemetry Flow

During an active run, the Agent Runner emits WebSocket events after each turn transition. The UI subscribes to the run-specific channel on page load and renders each event incrementally without polling.

WebSocket Event Reference

Event	Direction	Key Payload Fields	UI Action
`run_started`	Server to Client	`execution_id`, `agent_id`, `trigger_type`, `started_at`	Open live feed; render run start indicator
`turn_reasoning`	Server to Client	`turn`, `thought_summary`, `model_used`	Append reasoning block to activity feed
`tool_called`	Server to Client	`turn`, `tool_name`, `args_summary`	Render tool call badge
`observation_received`	Server to Client	`turn`, `tool_name`, `result_summary`, `duration_ms`	Render observation block
`turn_complete`	Server to Client	`turn`, `duration_ms`	Update per-turn latency metric
`approval_required`	Server to Client	`approval_id`, `tool_name`, `pending_since`	Render inline approval card with action controls
`approval_resolved`	Server to Client	`approval_id`, `resolution`, `resolved_by`	Update approval card to resolved state
`run_completed`	Server to Client	`status`, `duration_ms`, `turns_used`, `final_output_summary`	Render completion badge; close live feed

WebSocket authentication: The connection uses a Bearer token passed as a subprotocol header (Sec-WebSocket-Protocol: Bearer <access_token>). The token is validated on the upgrade handshake. The connection is scoped to executions belonging to the authenticated user's workspace -- clients cannot subscribe to runs from other workspaces.

Error Handling and Recovery

Tool failures return structured error observations to the reasoning engine rather than terminating the execution. The LLM decides whether to retry with different arguments, use an alternative tool, escalate to a human via ask_user, or concede failure. The runner enforces hard limits independently of the LLM decision.

Error Handling and Recovery Flow

Turn Limit Enforcement

These three conditions are evaluated by AgentRunWorkflow independently of LLM behavior. They are safety rails, not LLM features.

Condition	Detection	Runner Response
Turn count reaches `max_turns` (default: 15)	Turn counter incremented per turn by `AgentRunWorkflow`	Runner requests a final progress summary from river-agent :8007, then terminates. Status: `max_turns_exceeded`.
No tool calls for 3 consecutive turns	Zero `tool_call` events in 3 sequential responses from river-agent	Runner injects system message: "You appear to be repeating yourself. Please take action or conclude."
Same tool called 3 times with identical arguments	SHA hash of `(tool_name + serialized_args)` compared across turns	Runner forces immediate termination with diagnostic log entry. Status: `failed`, error code: `infinite_tool_loop`.

Timeout separation: The 30-second per-tool timeout is enforced at the TLO Gateway level. The reasoning engine has a separate per-turn LLM inference timeout (default: 120 seconds), configurable via llm_timeout_seconds in the agent's version config. These are independent counters; a slow LLM response and a slow tool call each consume their own timeout budget.

Post-Execution Processing

Post-execution processing triggers on any terminal state transition: completed, failed, max_turns_exceeded, or cancelled. All steps are sequential and synchronous except long-term context write-back, which is non-blocking to avoid delaying status finalization.

Post-Execution Processing

Memory write-back failure handling: The long-term context update is fire-and-forget from the perspective of the AgentExecution record. If the memory write fails, the run status is still marked completed and the failure is logged as a non-fatal error in the execution service log. Memory consistency is best-effort; execution state integrity is not.

Health threshold evaluation: The 3-failure degraded and 5-failure critical thresholds are evaluated against the agent's most recent consecutive run history, not cumulative all-time counts. A single successful run resets the consecutive failure counter. Health status changes emit an audit_logs entry with event_type: agent_health_changed.

Multi-Agent Pipeline Flow

Agents chain using Workflow triggers: one agent emits a Temporal signal on completion, the downstream agent's trigger handler receives it, extracts the output as the trigger payload, and starts a new independent execution. The agents do not communicate directly -- the Temporal signal is the only coupling between pipeline stages.

Pipeline Design Constraints

Concern	Behavior
Signal durability	Temporal signals are durable. If the downstream agent's workflow has not started yet when the signal arrives, the signal is buffered until the workflow is ready. No data is lost on transient delays.
Error propagation	Pipeline stages are independent executions. A failure in Agent 2 does not automatically fail Agent 3 -- Agent 3 simply never receives its trigger signal. Downstream pipeline stages must be monitored independently.
Action level variation	Each pipeline stage can have a different action level. An approval gate in Agent 2 pauses that stage's execution but does not affect Agent 1's already-completed status.
Payload size limit	The Temporal signal carries a reference (e.g., MinIO path or `execution_id`), not the raw data. Agents that need the actual dataset fetch it from the referenced location during their data access step (Step 4). Large data must never be embedded directly in the signal payload.
Cycle prevention	Circular pipeline references -- Agent A signals Agent B which signals Agent A -- are not automatically detected at configuration time. Engineers designing pipelines are responsible for ensuring acyclicity. A circular pipeline will consume execution budget indefinitely until budget exhaustion or manual cancellation.

Universal Runtime Flow​

Complete Execution Sequence​

Trigger Ingestion Flow​

Trigger-Specific Handler Behavior​

Approval Gate Sequence​

Approval Gate Implementation Notes​

Context Building Phase​

AgentContext Composition​

Real-Time Telemetry Flow​

WebSocket Event Reference​

Error Handling and Recovery​

Turn Limit Enforcement​

Post-Execution Processing​

Multi-Agent Pipeline Flow​

Pipeline Design Constraints​

Universal Runtime Flow

Complete Execution Sequence

Trigger Ingestion Flow

Trigger-Specific Handler Behavior

Approval Gate Sequence

Approval Gate Implementation Notes

Context Building Phase

AgentContext Composition

Real-Time Telemetry Flow

WebSocket Event Reference

Error Handling and Recovery

Turn Limit Enforcement

Post-Execution Processing

Multi-Agent Pipeline Flow

Pipeline Design Constraints