AI Tools and Technologies

This document provides a comprehensive technical reference for the RiverGen Intelligence Stack, including detailed sub-service inventories and provider-routing specifications.

Overview

The RiverGen Intelligence layer is a modular architecture composed of domain-specific agents and stateless AI services. This separation allows for high-performance reasoning while maintaining a provider-agnostic infrastructure.

The stack follows the primary engineering rule: Agents coordinate reasoning workflows, AI Services provide specialized intelligence, and Compute layers execute the final physical operations.

Intelligence Stack Overview

Use Cases

The tools within the Intelligence Stack are designed to support a broad range of automated data engineering and analytics tasks.

  • Cross-Source Federation: Planning and executing join operations across SQL, NoSQL, and Cloud Data Warehouses.
  • Resilient AI Execution: Implementing multi-provider fallback chains to ensure execution availability during LLM outages.
  • Automated Cataloging: Using vector-based semantic search to discover and map technical assets to business terminology.
  • Governed Reasoning: Ensuring that every turn in an agentic loop is subject to policy-aware validation and safety gates.

RiverCore: Multi-Provider Control

RiverCore is the foundational service that manages all Large Language Model (LLM) provider interactions. It abstracts the underlying complexity of specific SDKs into a unified platform interface.

![Routing Decision Flow](/img/AI-documentation/psa/routing decision flow.png)

Model Capability Categories

To optimize performance and cost, RiverCore classifies models into four capability categories. The system selects the best available model for each specific turn in an agentic loop.

  • Fast: Quick classification and pattern matching (e.g., Gemini Flash Lite, GPT-4o-mini).
  • Balanced: Standard query generation and structured reasoning (e.g., Gemini Flash, Claude Sonnet).
  • Reasoning: Complex joins, federation, and error recovery (e.g., Gemini Pro, o3, Claude Opus).
  • Coding: Specialized SQL/NoSQL generation and dialect-specific optimization (e.g., GPT-4o, DeepSeek-V3).
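
The category selection described above can be sketched as a simple lookup. This is a minimal illustration, not RiverCore's actual router: the task labels and the `select_category` helper are hypothetical, and only the four category names come from this page.

```python
from enum import Enum


class Category(Enum):
    FAST = "fast"
    BALANCED = "balanced"
    REASONING = "reasoning"
    CODING = "coding"


# Hypothetical task labels; the real router's inputs are not documented here.
TASK_TO_CATEGORY = {
    "classify_intent": Category.FAST,
    "generate_query": Category.BALANCED,
    "plan_federated_join": Category.REASONING,
    "recover_from_error": Category.REASONING,
    "optimize_sql_dialect": Category.CODING,
}


def select_category(task: str) -> Category:
    """Pick a capability category for one turn; unknown tasks default to BALANCED."""
    return TASK_TO_CATEGORY.get(task, Category.BALANCED)
```

Defaulting unknown tasks to the Balanced category is an assumption made for the sketch; a production router might instead reject unknown tasks or escalate them to Reasoning.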

RiverCore Sub-Service Inventory

| Sub-Service | Description | Status |
| --- | --- | --- |
| Provider Registry | Manages LLM provider registration and health checks. | [ACTIVE] |
| Complexity Router | Performs per-turn model selection based on task tier. | [ACTIVE] |
| Tool Call Adapter | Normalizes function-calling formats across providers. | [ACTIVE] |
| Output Normalizer | Ensures canonical JSON responses regardless of model. | [ACTIVE] |
| Fallback Controller | Manages automatic provider failover within a category. | [ACTIVE] |
| Cost Metering | Tracks token usage and operational costs per tenant. | [ACTIVE] |
| Latency Telemetry | Monitors end-to-end response times for AI calls. | [ACTIVE] |
| Policy Model Selector | Selects models based on sensitivity and tenant rules. | [PHASE 2] |
| Prompt Library | Centralized management of specialized prompt templates. | [PLANNED] |
| Context Manager | Handles chunking and packing for large input contexts. | [PLANNED] |
| Retrieval Inserter | Injects top-K semantic results into the reasoner. | [PLANNED] |
| Caching Layer | Deduplicates identical requests to minimize cost. | [PLANNED] |
| Retry Controller | Manages provider-specific exponential backoff logic. | [PLANNED] |
| Safety Filter | Prevents PII leakage and blocks unsafe actions. | [PLANNED] |
| A/B Routing | Facilitates model comparison and quality experiments. | [PHASE 2] |
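
The failover behavior attributed to the Fallback Controller can be illustrated with a short sketch. It assumes each provider is a plain callable that raises a `ProviderError` on failure; that exception, the `call_with_fallback` helper, and the two stand-in providers are hypothetical names, not part of RiverCore's documented interface.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider callable when a request fails (hypothetical)."""


def call_with_fallback(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
) -> tuple[str, str]:
    """Try each provider in order; return (name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    """Stand-in for a provider that is currently down."""
    raise ProviderError("simulated outage")


def healthy(prompt: str) -> str:
    """Stand-in for a provider that answers normally."""
    return f"echo: {prompt}"


chain = [("primary", flaky), ("secondary", healthy)]
result = call_with_fallback(chain, "SELECT 1")
```

Here `result` names the second provider, because the first one raised. Collecting the per-provider errors keeps the final exception useful when the whole chain is exhausted.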

RiverPlan: Execution Planning

RiverPlan is the engine responsible for converting natural language instructions into precise execution steps. It handles the decomposition of complex goals into sequential tool calls.

| Sub-Service | Description | Status |
| --- | --- | --- |
| Intent Classifier | Maps instructions to structured intent categories. | [PLANNED] |
| Prompt Parser | Processes natural language and SPL keyword hints. | [PLANNED] |
| Context Assembler | Merges schemas, governance, and user role metadata. | [ACTIVE] |
| Source Selector | Identifies which systems are required for execution. | [PLANNED] |
| Query Generator | Performs NL2SQL translation for target dialects. | [PLANNED] |
| Plan Normalizer | Standardizes plans across all data source types. | [PLANNED] |
| Plan Validator | Validates plans against schema and logic constraints. | [PLANNED] |
| Plan Repairer | Automatically fixes invalid fields in generated plans. | [PLANNED] |
| Plan Decomposer | Splits complex multi-source plans into discrete stages. | [PLANNED] |
| Federation Builder | Orchestrates cross-source join order and compute. | [PLANNED] |
| Write Plan Builder | Builds safe patterns for state-changing operations. | [PLANNED] |
| Materialization | Manages views, snapshots, and scheduled exports. | [PLANNED] |
| Hint Generator | Injects pushdown and partition hints into queries. | [PLANNED] |
| Plan Summarizer | Generates human-readable execution summaries. | [PLANNED] |
| Plan Explainer | Provides step-by-step technical reasoning details. | [PLANNED] |
| Plan Diff | Compares current execution against previous versions. | [PLANNED] |
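
To make the idea of validating a plan against a schema concrete, here is a minimal sketch. The `PlanStep` shape, the `validate_plan` helper, and the toy schema are assumptions made for illustration; the page does not document RiverPlan's real plan format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    """One step of a hypothetical plan: read some columns from one table."""
    table: str
    columns: tuple[str, ...]


def validate_plan(steps, schema):
    """Return a list of problems; an empty list means every step matches the schema."""
    problems = []
    for index, step in enumerate(steps):
        known = schema.get(step.table)
        if known is None:
            problems.append(f"step {index}: unknown table '{step.table}'")
            continue
        for column in step.columns:
            if column not in known:
                problems.append(f"step {index}: unknown column '{step.table}.{column}'")
    return problems


# Toy schema: table name -> set of column names.
schema = {"orders": {"id", "customer_id", "total"}, "customers": {"id", "name"}}
plan = [
    PlanStep("orders", ("id", "total")),
    PlanStep("customers", ("id", "email")),
    PlanStep("invoices", ("id",)),
]
problems = validate_plan(plan, schema)
```

Returning every problem, rather than stopping at the first, is a design choice for the sketch: a repair stage could then address all invalid fields in one pass.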

RiverGuard: Governance Intelligence

RiverGuard ensures that every proposed action complies with the organization's security and regulatory policies. It injects constraints directly into the reasoning context.

| Sub-Service | Description | Status |
| --- | --- | --- |
| Identity Resolver | Maps user claims to internal roles and workspaces. | [ACTIVE] |
| RBAC Evaluator | Validates permissions for specific tool executions. | [ACTIVE] |
| RLS Engine | Generates dynamic row-level security filters. | [ACTIVE] |
| Masking Engine | Applies obfuscation rules to sensitive columns. | [ACTIVE] |
| Sensitivity Classify | Identifies PII/PHI/PCI patterns in datasets. | [PLANNED] |
| Approval Gate | Manages human-in-the-loop gates for write actions. | [PLANNED] |
| Write Guard | Enforces safety rules for update and delete patterns. | [PLANNED] |
| Policy Compiler | Converts definitions into enforceable constraints. | [ACTIVE] |
| Policy Injector | Merges constraints into the active execution plan. | [ACTIVE] |
| Exception Handler | Manages break-glass flows for emergency access. | [PLANNED] |
| Quota Guard | Enforces spend caps and compute resource limits. | [PLANNED] |
| Connector Guard | Limits allowed operations per individual connector. | [PLANNED] |
| Audit Builder | Generates standardized audit events for all actions. | [PLANNED] |
| Compliance Reporter | Produces SOC2 and GDPR-ready audit exports. | [PLANNED] |
| Explanation Gen | Explains why access was permitted or blocked. | [PLANNED] |
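
As an illustration of column-level masking, the sketch below applies per-column obfuscation functions to a result row. The rule set, the function names, and the `exempt` flag are hypothetical; RiverGuard's actual rule format is not described on this page.

```python
def mask_email(value: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = value.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


def mask_last4(value: str) -> str:
    """Hide everything except the last four characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


# Hypothetical rule set: column name -> masking function.
MASKING_RULES = {"email": mask_email, "card_number": mask_last4}


def apply_masking(row: dict, rules: dict, exempt: bool = False) -> dict:
    """Return a copy of the row with masked values unless the caller is exempt."""
    if exempt:
        return dict(row)
    return {key: rules[key](value) if key in rules else value for key, value in row.items()}


row = {"id": 7, "email": "ana@example.com", "card_number": "4111111111111111"}
masked = apply_masking(row, MASKING_RULES)
```

Columns without a rule pass through unchanged, and the original row is never mutated, so the unmasked data stays available to exempt callers.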

RiverSemantic: Catalog Intelligence

RiverSemantic provides the mapping between business terminology and technical data assets. It uses vector search to identify relevant tables and columns during the reasoning loop.

| Sub-Service | Description | Status |
| --- | --- | --- |
| Ingestion | Imports schemas and metadata from all connectors. | [ACTIVE] |
| Embedding Builder | Generates vectors for all table and column names. | [ACTIVE] |
| Semantic Retriever | Performs fast top-K matching using Qdrant. | [ACTIVE] |
| Glossary Manager | Manages business terms and their relationships. | [PLANNED] |
| Term-to-Field | Maps glossary terms to technical schema fields. | [PLANNED] |
| Entity Resolution | Builds a unified identity graph across sources. | [PLANNED] |
| Join Recommender | Identifies optimal join keys based on data similarity. | [PLANNED] |
| Change Detector | Identifies drift in source system schemas. | [PLANNED] |
| Freshness Tracker | Monitors data staleness using watermarking. | [PLANNED] |
| Quality Engine | Tracks completeness and null rate signals. | [PLANNED] |
| Relationship Infer | Automatically detects table-level relationships. | [PLANNED] |
| Sample Profiler | Prepares statistics and distributions for profiling. | [PLANNED] |
| Lineage Store | Tracks structural lineage across the ecosystem. | [PLANNED] |
| Context Packager | Prepares enriched payloads for the AI model. | [ACTIVE] |
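
The top-K matching performed by the Semantic Retriever can be illustrated without a vector database. The sketch below is an in-memory stand-in using cosine similarity over toy vectors; it does not use the Qdrant client, and the asset names, vectors, and helper functions are all made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, catalog, k=2):
    """Return the k asset names whose vectors are most similar to the query vector."""
    scored = sorted(catalog.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in scored[:k]]


# Toy 3-dimensional vectors; real embeddings have far more dimensions.
catalog = {
    "sales.orders.total_amount": [0.9, 0.1, 0.0],
    "sales.orders.created_at": [0.1, 0.9, 0.0],
    "crm.customers.email": [0.0, 0.1, 0.9],
}
matches = top_k([1.0, 0.2, 0.0], catalog, k=2)
```

A dedicated vector store replaces the full sort with an approximate nearest-neighbor index, which is what makes the lookup fast at catalog scale.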

RiverDecide: Decision Engine

RiverDecide evaluates data streams against trained ML models to automate platform operations and alerts.

| Sub-Service | Description | Status |
| --- | --- | --- |
| Workflow Builder | Creates decision graphs for automated actions. | [PLANNED] |
| Rule Engine | Manages business thresholds and rule logic. | [PLANNED] |
| Recommend Engine | Suggests next best actions based on model output. | [PLANNED] |
| Impact Estimator | Estimates ROI and risk for proposed decisions. | [PLANNED] |
| Simulation Engine | Performs "what-if" backtesting on decision models. | [PLANNED] |
| Counterfactual | Analyzes alternative outcomes for past decisions. | [PLANNED] |
| Confidence Scorer | Assigns certainty scores to automated actions. | [PLANNED] |
| Approval Routing | Routes decisions to the appropriate stakeholders. | [PLANNED] |
| Action Selector | Targets policy-compliant actions for execution. | [PLANNED] |
| Decision Explainer | Provides a narrative for automated conclusions. | [PLANNED] |
| Outcome Tracker | Measures the effectiveness of live decisions. | [PLANNED] |
| Experiment Engine | Manages A/B tests for decision logic versions. | [PLANNED] |
| Promote Lifecycle | Versioning and promotion for decision workflows. | [PLANNED] |
| Decision Registry | Central storage for all active decision assets. | [PLANNED] |

RiverOptimize: Performance Strategy

RiverOptimize identifies the most efficient execution path for each plan to minimize cost and latency.

| Sub-Service | Description | Status |
| --- | --- | --- |
| Routing Controller | Decides between pushdown and internal compute. | [PLANNED] |
| Join Optimizer | Determines the placement of cross-source joins. | [PLANNED] |
| Staging Strategy | Manages intermediate results for large federation. | [PLANNED] |
| Layout Advisor | Advises on partitioning and clustering strategies. | [PLANNED] |
| Predicate Advisor | Optimizes filters for source-side pushdown. | [PLANNED] |
| Cost Estimator | Provides pre-execution cost predictions. | [PLANNED] |
| Latency Estimator | Predicts total runtime for complex workflows. | [PLANNED] |
| Query Rewriter | Optimizes SQL and API calls for performance. | [PLANNED] |
| Workload Shaper | Manages batching and request rate limits. | [PLANNED] |
| Cache Policy | Defines TTL and storage rules for AI caching. | [PLANNED] |
| Concurrency Guard | Limits concurrent loops per workspace/tenant. | [PLANNED] |
| Adaptive Retry | Provides source-aware retry and backoff logic. | [PLANNED] |
| Spill Strategy | Manages memory strategy for heavy aggregations. | [PLANNED] |
| Telemetry Analyzer | Learns from past runs to tune future plans. | [PLANNED] |
| Policy Learner | Auto-tunes routing based on historical data. | [PLANNED] |
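
Source-aware retry and backoff can be sketched as a capped exponential schedule with per-source parameters. The profile names and numbers below are invented for illustration, and jitter is deliberately left out to keep the output deterministic; the Adaptive Retry service itself is still planned and its design is not documented here.

```python
def backoff_delays(base: float, factor: float, cap: float, attempts: int) -> list[float]:
    """Delay before each retry: base * factor**n, never exceeding cap."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


# Hypothetical per-source settings; real values would come from telemetry or config.
SOURCE_PROFILES = {
    "warehouse": {"base": 1.0, "factor": 2.0, "cap": 30.0},
    "rest_api": {"base": 0.5, "factor": 2.0, "cap": 8.0},
}


def delays_for(source: str, attempts: int) -> list[float]:
    """Look up a source's profile and compute its retry schedule in seconds."""
    return backoff_delays(attempts=attempts, **SOURCE_PROFILES[source])
```

For example, `delays_for("rest_api", 6)` yields `[0.5, 1.0, 2.0, 4.0, 8.0, 8.0]`: the delay doubles until it reaches the 8-second cap. A production version would normally add random jitter so that many clients do not retry in lockstep.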

RiverViz: Visualization and Rendering

RiverViz provides the components for rendering query results and architectural explanations.

| Sub-Service | Description | Status |
| --- | --- | --- |
| Chart Recommender | Matches charts to specific data distributions. | [PLANNED] |
| Bar/Line/Area | Standard categorical and time-series charts. | [PLANNED] |
| Table Renderer | Dynamic tables with sorting and pagination. | [PLANNED] |
| Pivot Visualizer | Pivot tables and hierarchical aggregations. | [PLANNED] |
| Trend Analyzer | Visuals for seasonality and trend detection. | [PLANNED] |
| Outlier Visuals | Specialized charts for anomaly identification. | [PLANNED] |
| Correlation Map | Heatmaps and pairwise correlation matrices. | [PLANNED] |
| Map Visualizer | Geospatial data rendering on interactive maps. | [PLANNED] |
| Lineage Renderer | Graph view of data and prompt relationships. | [PLANNED] |
| Plan Graph | DAG visualization of execution workflow stages. | [PLANNED] |
| Policy Overlay | Displays masking and RLS rules in context. | [PLANNED] |
| Decision Graph | Flow diagrams for Decision Intelligence runs. | [PLANNED] |
| Status Timeline | Real-time progress and dependency visualization. | [PLANNED] |
| Training Visuals | Metric and performance charts for Model Studio. | [PLANNED] |
| Compare Visuals | Side-by-side Champion vs Challenger charts. | [PLANNED] |
| Drift Visuals | Visualizations of data and prediction drift events. | [PLANNED] |
| Resource Charts | GPU, CPU, and memory usage visualization. | [PLANNED] |
| Explain Panel | Narrative explanations for system decisions. | [PLANNED] |
| Report Builder | Assembly of charts into exportable reports. | [PLANNED] |
| Data Exporters | PNG, PDF, and CSV/JSON export modules. | [PLANNED] |

RiverLearn: Adaptive Learning

RiverLearn captures and analyzes execution outcomes to improve system reasoning over time.

| Sub-Service | Description | Status |
| --- | --- | --- |
| Memory Store | Persists plans, outcomes, and error signatures. | [PLANNED] |
| Feedback Loop | Captures user thumbs-up/down and corrections. | [PLANNED] |
| Pattern Extractor | Identifies successful multi-step workflow plans. | [PLANNED] |
| Quality Scorer | Scores plans based on validity and efficiency. | [PLANNED] |
| Routing Learner | Tunes pushdown decisions using past performance. | [PLANNED] |
| Staging Learner | Optimizes intermediate data handling strategies. | [PLANNED] |
| Semantic Reinforce | Improves schema matching using user corrections. | [PLANNED] |
| Governance Learner | Tracks and learns from policy violation patterns. | [PLANNED] |
| Automation Learner | Tunes retry logic based on success signatures. | [PLANNED] |
| Model Learner | Correlates training data changes with accuracy. | [PLANNED] |
| Memory Profiles | Org-specific reasoning and dictionary profiles. | [PLANNED] |
| Test Gen | Converts failures into validated test cases. | [PLANNED] |

RiverObserve: Operational Monitoring

RiverObserve provides platform-wide observability and operational health tracking.

| Sub-Service | Description | Status |
| --- | --- | --- |
| Trace Store | Full request/response logging across services. | [PHASE 1] |
| Metrics Aggregator | Real-time SLI and SLO tracking for AI calls. | [PLANNED] |
| Alerting Engine | Notification engine for system incidents. | [PLANNED] |
| Health Monitor | Tracks data source and LLM provider health. | [PHASE 1] |
| Job Monitor | Monitoring for Temporal workflows and queues. | [PHASE 1] |
| Audit Search | Indexed search for action and audit history. | [PLANNED] |
| Cost Aggregator | Token and compute spend tracking by tenant. | [PLANNED] |
| SLA Tracker | Monitors response time and data freshness goals. | [PLANNED] |
| Incident Correlate | Identifies root causes from cross-service logs. | [PLANNED] |

Infrastructure Specifications

The Intelligence Stack relies on several high-performance infrastructure components to maintain its reasoning capabilities.

| Component | Role | Technology |
| --- | --- | --- |
| Vector Store | Semantic catalog search | Qdrant (Port 6333) |
| Workflow Engine | Reasoning orchestration | Temporal (Port 7233) |
| Relational DB | Metadata persistence | PostgreSQL (Port 5433) |
| Document Store | Connector configuration | MongoDB (Port 27017) |
| Cache Layer | Session and turn state | Redis (Port 6381) |
| Object Storage | File and artifact storage | MinIO/S3 (Port 9002) |

LLM Implementation Details

PSA interacts with multiple Large Language Models via the RiverCore abstraction layer.

| Logic | Specification |
| --- | --- |
| Orchestration | Agentic loop with tool-calling support. |
| Output Type | Strict JSON validation via Pydantic. |
| Model Selection | Per-turn complexity-based routing. |
| Fallback | Provider fallback chain per category. |
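
The idea behind strict output validation can be shown with a small standard-library sketch. The stack itself uses Pydantic for this; the version below swaps in the `json` module and manual type checks so that it runs without extra dependencies. The two-field response contract and the `parse_model_output` helper are hypothetical.

```python
import json

# Hypothetical response contract: field name -> required Python type.
REQUIRED_FIELDS = {"tool": str, "arguments": dict}


def parse_model_output(raw: str) -> dict:
    """Parse raw model text as JSON and reject anything that breaks the contract."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    extra = set(data) - set(REQUIRED_FIELDS)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(f"field '{name}' must be {expected.__name__}")
    return data
```

Every failure mode raises `ValueError`, so a calling loop can catch a single exception type and decide whether to retry the turn or fall back to another provider.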

![Agent Design Loop](/img/AI-documentation/psa/agent Loop.png)

Limitations and Constraints

Developers must build within the following technical constraints for the current stack version.

  • Reasoning Overhead: Distributed tool calls add at least 1.2 s of latency per turn.
  • Context Boundaries: Model reasoning is limited by the provider's context window (typically 128k tokens).
  • Concurrency Guard: Global limit of 50 concurrent loops per instance.
  • Provider SLA: Total execution time is subject to the availability of external LLM APIs.
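
One common way to enforce a cap like the 50-loop limit above is a semaphore. The sketch below is an assumption about how such a guard could look, not the platform's implementation: `run_guarded` and the dummy `agent_loop` are hypothetical, and the short sleep stands in for a real reasoning turn.

```python
import asyncio

MAX_CONCURRENT_LOOPS = 50  # limit stated in the constraints above


async def run_guarded(n_loops: int, limit: int = MAX_CONCURRENT_LOOPS) -> int:
    """Run n_loops dummy loops and return the peak number running at once."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def agent_loop() -> None:
        nonlocal running, peak
        async with semaphore:  # waits here once `limit` loops are active
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for one reasoning turn
            running -= 1

    await asyncio.gather(*(agent_loop() for _ in range(n_loops)))
    return peak


peak = asyncio.run(run_guarded(120))
```

Even with 120 loops submitted, `peak` never exceeds the limit; the remaining loops queue on the semaphore until a slot frees up.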