System Architecture

This page gives an overview of the RiverGen system architecture: the platform's infrastructure, backend design, and system components.

Quick Navigation

Overview
Architecture Diagram
Architecture Layers
Communication Patterns
Key Design Principles
Data Flow
Future Documentation
Related Documentation

Overview

RiverGen supports AI-powered river generation through a multi-layered system design. The architecture is organized into five layers: Client, Orchestration, Service, Connection Pooling, and Infrastructure. Each layer has a specific responsibility and communicates with the others over defined channels (REST, WebSocket, Kafka, and pooled database connections).

Architecture Diagram

[Diagram: complete RiverGen system architecture, showing the five layers and the connection types listed under Communication Patterns.]

Architecture Layers

The RiverGen platform is structured into five main layers:

1. Client Layer

The Client Layer serves as the entry point for users interacting with the platform.

Components:

Web Client (User Interface): The primary user interface that provides access to all platform features

Communication Patterns:

Direct Data Transfer (High I/O): Direct upload connections to Storage Orchestration for high-throughput data transfers
REST/API Calls: Standard HTTP/REST API calls to the Gateway for all API operations
WebSocket Connections: Real-time bidirectional communication with the Socket.io Cluster for live updates and notifications
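
As a minimal sketch of the three client channels listed above, the mapping below pairs an operation kind with the component that receives it. The operation names (`upload`, `api`, `live_update`) and the `channel_for` helper are hypothetical and only illustrate the split; the real client may classify requests differently.

```python
from enum import Enum


class Channel(Enum):
    """The three client-side channels named in this section."""
    DIRECT_TRANSFER = "Storage Orchestration"
    REST = "Gateway"
    WEBSOCKET = "Socket.io Cluster"


# Hypothetical operation kinds, chosen for illustration only.
OPERATION_CHANNELS = {
    "upload": Channel.DIRECT_TRANSFER,
    "api": Channel.REST,
    "live_update": Channel.WEBSOCKET,
}


def channel_for(operation: str) -> Channel:
    """Return the channel a client would use for the given operation kind."""
    try:
        return OPERATION_CHANNELS[operation]
    except KeyError:
        raise ValueError(f"unknown operation kind: {operation!r}") from None
```

Keeping uploads on a separate channel means large transfers do not compete with ordinary API calls at the Gateway.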

2. Orchestration Layer

The Orchestration Layer manages request routing, workspace context, and real-time communication.

Components:

Storage Orchestration (High I/O): Specialized orchestration component for handling high-throughput data uploads and storage operations
- Receives direct data transfers from clients
- Routes requests to Storage Service
- Communicates with AI services via REST/Kafka messages
Gateway + Routing + Workspace Context: Main API gateway that handles:
- Request routing and load balancing
- Workspace context management
- Authentication and authorization
- REST/API call processing
- Event distribution to Socket.io Cluster
Socket.io Cluster (4 Instances): Real-time communication cluster
- Handles WebSocket connections from clients
- Distributes events across instances using Redis adapter
- Provides real-time updates and notifications
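
The role of the Redis adapter can be illustrated with an in-memory stand-in. The sketch below is not Socket.io code: `InMemoryBus` and `SocketInstance` are hypothetical classes that mimic a shared publish/subscribe channel, so that an event emitted on one instance reaches clients connected to any of the four.

```python
from collections import defaultdict


class InMemoryBus:
    """Stand-in for the shared pub/sub channel that the adapter provides."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, room, event):
        for callback in self._subscribers:
            callback(room, event)


class SocketInstance:
    """One node of the cluster; it only knows its own client connections."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.rooms = defaultdict(list)  # room -> inboxes of locally connected clients
        bus.subscribe(self._deliver_local)

    def connect(self, room):
        inbox = []
        self.rooms[room].append(inbox)
        return inbox

    def emit(self, room, event):
        # Publish to the shared bus so every instance, including this one, delivers it.
        self.bus.publish(room, event)

    def _deliver_local(self, room, event):
        for inbox in self.rooms.get(room, []):
            inbox.append(event)


bus = InMemoryBus()
instances = [SocketInstance(f"io-{i}", bus) for i in range(4)]
alice = instances[0].connect("workspace-1")
bob = instances[3].connect("workspace-1")
instances[1].emit("workspace-1", "job.finished")  # emitted on a third instance
```

After the last line, both `alice` and `bob` hold the event even though neither is connected to the instance that emitted it.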

Communication Patterns:

REST/API calls to Service Layer
REST/Kafka messages to AI services
WebSocket connections to clients
Redis adapter for Socket.io clustering
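
The Gateway responsibilities described above (authentication, workspace context, routing) could be sketched roughly as follows. The path prefixes, the `X-Workspace-Id` header name, and the `route` function are assumptions made for illustration; the actual routing table is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    path: str
    headers: dict


# Hypothetical path prefixes mapped to Service Layer components.
ROUTES = {
    "/storage/": "Storage Service",
    "/billing/": "Backend Core",
    "/accounts/": "Backend Core",
    "/agents/": "AI Agents",
}


def route(request: Request) -> tuple[str, str]:
    """Return (target service, workspace id), or raise if the request is rejected."""
    if "Authorization" not in request.headers:
        raise PermissionError("missing credentials")
    workspace = request.headers.get("X-Workspace-Id")  # assumed header name
    if not workspace:
        raise ValueError("missing workspace context")
    for prefix, service in ROUTES.items():
        if request.path.startswith(prefix):
            return service, workspace
    raise LookupError(f"no route for {request.path}")
```

Resolving the workspace once at the Gateway lets downstream services receive it as an explicit value instead of each re-deriving it from the request.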

3. Service Layer

The Service Layer contains the core business logic and AI capabilities, organized into three sub-layers:

Core Services

Storage Service

Manages file storage operations
Handles direct data transfers to Object Storage
Provides file management APIs

Backend Core (SAM + Billing)

System Account Management (SAM)
Billing and subscription management
Core business logic
Connects to database via PgBouncer

AI Services (Sub-layer)

AI Agents (Intelligent Orchestrators)

PSA - Prompt Studio Agent: Handles prompt creation and management
MSA - Model Studio Agent: Handles model development and training
DIA - Decision Intelligence Agent: Provides decision-making capabilities
GA - Governance Agent: Manages governance and compliance
OSA - Operational Service Agent: Handles operational tasks

Core AI Services (Stateless Capability Services)

Planning Services
Schema and Metadata Services
Governance Services
Machine Learning Services
Data Quality and Profiling Services
Language and Explanation Services
Execution Support Services
Model Monitoring Services
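
The split between agents (orchestrators) and stateless capability services might look like the sketch below. `profile_column` stands in for a data profiling capability and `DecisionAgent` for an orchestrating agent; both names and the "usable" rule are hypothetical.

```python
from statistics import mean


def profile_column(values):
    """Stateless capability: the result depends only on the input."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": mean(present) if present else None,
    }


class DecisionAgent:
    """Hypothetical orchestrator: keeps task state and delegates the computation."""

    def __init__(self, profiler=profile_column):
        self.profiler = profiler
        self.history = []  # state lives in the agent, not in the capability

    def assess(self, column_name, values):
        report = self.profiler(values)
        self.history.append(column_name)
        return {"column": column_name, "usable": report["missing"] == 0, **report}
```

Because `profile_column` holds no state between calls, any number of copies can serve requests in parallel, which is the property the stateless design relies on for scaling.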

Hybrid Compute (Sub-layer)

Internal Compute: Internal computation resources
Sparse Native: Sparse computation capabilities
Connector: Data connector services
Data Staging: Data staging for Model Studio
Notification Service: Sends notifications to Kafka

Communication Patterns:

REST/API calls from Orchestration Layer
REST/Kafka messages for async operations
Database connections via PgBouncer
Kafka integration for event streaming
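
For the asynchronous path, a service typically wraps its payload in an envelope before handing it to a Kafka producer. The sketch below builds only the key and value bytes and does not call any Kafka client. The field names and the choice of workspace ID as the message key are assumptions, not the platform's documented event schema.

```python
import json
import uuid
from datetime import datetime, timezone


def build_event(event_type, workspace_id, payload):
    """Wrap a payload in an envelope that a Kafka producer could send as bytes."""
    envelope = {
        "id": str(uuid.uuid4()),
        "type": event_type,
        "workspace_id": workspace_id,
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    # With Kafka's default partitioner, records sharing a key land on the same
    # partition, which preserves their relative order per workspace.
    key = workspace_id.encode("utf-8")
    value = json.dumps(envelope, sort_keys=True).encode("utf-8")
    return key, value
```

A consumer would decode the value with `json.loads` and dispatch on the `type` field.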

4. Connection Pooling Layer

The Connection Pooling Layer optimizes database connections and manages connection resources.

Components:

PgBouncer Connection Pooler: Connection pool manager that:
- Manages database connections efficiently
- Reduces connection overhead
- Provides connection pooling for multiple services
- Routes connections to Postgres Cluster
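- Note: Redis App (cache and sessions) appears on the same managed-connection path, but PgBouncer speaks only the PostgreSQL protocol and does not pool Redis connections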

Services Connected:

Backend Core
AI Agents
Core AI Services

5. Infrastructure Layer

The Infrastructure Layer provides the foundational storage, caching, and messaging infrastructure.

Components:

Object Storage: High-performance object storage for files and data
- Receives direct data transfers from Storage Service
- Optimized for high I/O operations
Postgres Cluster: Primary database cluster
- Managed connections via PgBouncer
- Stores application data, user information, and metadata
Redis App (Cache + Sessions): Redis instance for:
- Application caching
- Session storage
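- Shown with managed connections in the diagram (PgBouncer pools only PostgreSQL connections, so Redis traffic does not pass through it)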
Kafka: Event streaming and messaging platform
- Receives internal layer calls from AI services
- Handles event streaming and async communication
- Processes notifications from Hybrid Compute services
Redis Socket (Socket.io Adapter): Redis instance for Socket.io clustering
- Enables horizontal scaling of Socket.io instances
- Relays events between Socket.io cluster nodes, so a message emitted on one node reaches clients connected to the others

Communication Patterns

The architecture uses several communication patterns:

Connection Types

Direct Data Transfer (High I/O): Blue thick lines - Used for high-throughput file uploads
REST/API Call: Black solid lines - Standard HTTP/REST API communication
WebSocket Connection: Black double arrows - Real-time bidirectional communication
REST/Kafka Message: Black dashed lines - Asynchronous messaging via REST or Kafka
Internal Layer Call: Black dotted lines - Internal service-to-service communication
Managed Connection: Black thick solid lines - Database connections managed by PgBouncer

Key Design Principles

Layered Architecture: Clear separation of concerns across five distinct layers
Scalability: Horizontal scaling through clustering (Socket.io, Postgres)
High Performance: Direct data transfer paths for high I/O operations
Real-time Capabilities: WebSocket support for live updates
Event-Driven: Kafka integration for asynchronous processing
Connection Efficiency: PgBouncer for optimized database connection management
Stateless Services: Core AI services designed as stateless for easy scaling

Data Flow

Upload Flow

Client → Storage Orchestration (Direct Data Transfer)
Storage Orchestration → Storage Service (REST/API)
Storage Service → Object Storage (Direct Data Transfer)

API Request Flow

Client → Gateway (REST/API Call)
Gateway → Backend Core or AI Services (REST/API or Kafka)
Services → PgBouncer → Postgres Cluster (Managed Connection)

Real-time Update Flow

Gateway → Socket.io Cluster (Events)
Socket.io Cluster → Client (WebSocket)
Socket.io instances communicate via Redis Socket adapter

AI Processing Flow

Gateway → AI Agents (REST/Kafka Message)
AI Agents → Core AI Services (Internal calls)
Core AI Services → Kafka (Event streaming)
Services → PgBouncer → Database (Managed Connection)
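
The four flows above can be written down as ordered hop lists, which makes it easy to ask which flows depend on a given component. The flow keys and helper names are hypothetical, and the API request flow follows only the Backend Core branch for brevity.

```python
FLOWS = {
    "upload": ["Client", "Storage Orchestration", "Storage Service", "Object Storage"],
    # The Gateway may also route to AI Services; only one branch is listed here.
    "api_request": ["Client", "Gateway", "Backend Core", "PgBouncer", "Postgres Cluster"],
    "realtime_update": ["Gateway", "Socket.io Cluster", "Client"],
    "ai_processing": ["Gateway", "AI Agents", "Core AI Services", "Kafka"],
}


def hops(flow_name):
    """Return the (source, destination) pairs for a named flow."""
    path = FLOWS[flow_name]
    return list(zip(path, path[1:]))


def flows_through(component):
    """List the flows that touch a given component, for impact analysis."""
    return sorted(name for name, path in FLOWS.items() if component in path)
```

For example, `flows_through("Gateway")` returns every flow except the upload flow, which reflects the separate high I/O path through Storage Orchestration.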

Future Documentation

This overview provides the foundation for detailed documentation of:

Microservices Architecture: Detailed breakdown of microservices design
Database Architecture: Database schema, replication, and optimization strategies
Event Bus Architecture: Kafka configuration, topics, and event patterns
Caching Architecture: Redis usage patterns and cache strategies
Security Architecture: Authentication, authorization, and security measures

Related Documentation

API Documentation: Individual account APIs
Organization APIs: Team and enterprise APIs
Microservices APIs: Internal microservices documentation