System Architecture
Welcome to the RiverGen System Architecture documentation. This documentation provides a comprehensive overview of the platform's infrastructure, backend design, and system components.
Overview
RiverGen is built on a scalable, multi-layered architecture that supports AI-powered river generation. The architecture is organized into five layers, each with a specific role, that communicate through well-defined protocols.
Architecture Diagram
The following diagram illustrates the complete system architecture:

Architecture Layers
The RiverGen platform is structured into five main layers:
1. Client Layer
The Client Layer serves as the entry point for users interacting with the platform.
Components:
- Web Client (User Interface): The primary user interface that provides access to all platform features
Communication Patterns:
- Direct Data Transfer (High I/O): Direct upload connections to Storage Orchestration for high-throughput data transfers
- REST/API Calls: Standard HTTP/REST API calls to the Gateway for all API operations
- WebSocket Connections: Real-time bidirectional communication with the Socket.io Cluster for live updates and notifications
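As an illustration of the three communication patterns above, the following minimal sketch maps a client operation to the component it would contact. The `Channel` enum, the `choose_channel` function, and the operation labels (`"upload"`, `"subscribe"`) are hypothetical names introduced for this example and are not part of the RiverGen API.

```python
from enum import Enum


class Channel(Enum):
    """Targets a client can talk to, one per communication pattern."""

    DIRECT_TRANSFER = "storage-orchestration"  # high-I/O uploads
    REST = "gateway"                           # standard API operations
    WEBSOCKET = "socketio-cluster"             # live updates and notifications


def choose_channel(operation: str) -> Channel:
    """Pick the channel for a client operation (labels are hypothetical)."""
    if operation == "upload":
        return Channel.DIRECT_TRANSFER
    if operation == "subscribe":
        return Channel.WEBSOCKET
    # Anything else is treated as an ordinary API call through the Gateway.
    return Channel.REST
```

Keeping bulk uploads off the Gateway path means large transfers do not compete with ordinary API traffic for the same request handlers.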
2. Orchestration Layer
The Orchestration Layer manages request routing, workspace context, and real-time communication.
Components:
- Storage Orchestration (High I/O): Specialized orchestration component for handling high-throughput data uploads and storage operations
  - Receives direct data transfers from clients
  - Routes requests to Storage Service
  - Communicates with AI services via REST/Kafka messages
- Gateway + Routing + Workspace Context: Main API gateway that handles:
  - Request routing and load balancing
  - Workspace context management
  - Authentication and authorization
  - REST/API call processing
  - Event distribution to Socket.io Cluster
- Socket.io Cluster (4 Instances): Real-time communication cluster
  - Handles WebSocket connections from clients
  - Distributes events across instances using Redis adapter
  - Provides real-time updates and notifications
Communication Patterns:
- REST/API calls to Service Layer
- REST/Kafka messages to AI services
- WebSocket connections to clients
- Redis adapter for Socket.io clustering
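The Gateway's responsibilities (authentication, routing, and workspace context) can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Request` type, the `ROUTES` table, the path prefixes, and the service names are hypothetical, and the credential check only tests that a token is present rather than validating it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Request:
    path: str
    workspace_id: str
    token: Optional[str] = None


# Hypothetical route table: path prefix -> Service Layer target.
ROUTES: Dict[str, str] = {
    "/storage": "storage-service",
    "/billing": "backend-core",
    "/agents": "ai-agents",
}


def route(request: Request) -> Dict[str, str]:
    """Check credentials, resolve the target service, attach workspace context."""
    if not request.token:
        # Placeholder check only: a real gateway would validate the token.
        raise PermissionError("request has no credentials")
    # Try longer prefixes first so the most specific route wins.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if request.path == prefix or request.path.startswith(prefix + "/"):
            return {
                "service": ROUTES[prefix],
                "workspace_id": request.workspace_id,
                "path": request.path,
            }
    raise LookupError(f"no route for {request.path}")
```

For example, `route(Request("/billing/invoices", "ws-1", token="t"))` resolves to the `backend-core` target and carries `ws-1` along as the workspace context.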
3. Service Layer
The Service Layer contains the core business logic and AI capabilities, organized into three sub-layers:
Core Services
Storage Service
- Manages file storage operations
- Handles direct data transfers to Object Storage
- Provides file management APIs
Backend Core (SAM + Billing)
- System Account Management (SAM)
- Billing and subscription management
- Core business logic
- Connects to database via PgBouncer
AI Services (Sub-layer)
AI Agents (Intelligent Orchestrators)
- PSA - Prompt Studio Agent: Handles prompt creation and management
- MSA - Model Studio Agent: Handles model development and training
- DIA - Decision Intelligence Agent: Provides decision-making capabilities
- GA - Governance Agent: Manages governance and compliance
- OSA - Operational Service Agent: Handles operational tasks
Core AI Services (Stateless Capability Services)
- Planning Services
- Schema and Metadata Services
- Governance Services
- Machine Learning Services
- Data Quality and Profiling Services
- Language and Explanation Services
- Execution Support Services
- Model Monitoring Services
Hybrid Compute (Sub-layer)
- Internal Compute: Internal computation resources
- Sparse Native: Sparse computation capabilities
- Connector: Data connector services
- Data Staging: Data staging for Model Studio
- Notification Service: Sends notifications to Kafka
Communication Patterns:
- REST/API calls from Orchestration Layer
- REST/Kafka messages for async operations
- Database connections via PgBouncer
- Kafka integration for event streaming
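The split between AI Agents (orchestrators) and Core AI Services (stateless capabilities) can be sketched as an agent composing pure functions. This is a minimal illustration under stated assumptions: `profile_column`, `explain_profile`, and the `Agent` class are hypothetical stand-ins for a Data Quality and Profiling Service, a Language and Explanation Service, and an agent, not the platform's actual interfaces.

```python
from typing import Callable, Dict, List, Optional


def profile_column(values: List[Optional[float]]) -> Dict[str, float]:
    """Stateless profiling: the result depends only on the input values."""
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "null_ratio": (len(values) - len(present)) / len(values) if values else 0.0,
        "mean": sum(present) / len(present) if present else 0.0,
    }


def explain_profile(profile: Dict[str, float]) -> str:
    """Stateless explanation: turns a profile into a readable summary."""
    return (
        f"{profile['rows']} rows, {profile['null_ratio']:.0%} null, "
        f"mean {profile['mean']:.2f}"
    )


class Agent:
    """Orchestrator that holds the workflow and delegates work to services."""

    def __init__(self, name: str, services: Dict[str, Callable]) -> None:
        self.name = name
        self._services = services

    def run(self, values: List[Optional[float]]) -> Dict[str, object]:
        profile = self._services["profile"](values)
        summary = self._services["explain"](profile)
        return {"agent": self.name, "profile": profile, "summary": summary}
```

Because the services keep no state between calls, any instance can serve any request, which is what makes them straightforward to scale horizontally.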
4. Connection Pooling Layer
The Connection Pooling Layer optimizes database connections and manages connection resources.
Components:
- PgBouncer Connection Pooler: Connection pool manager that:
  - Manages database connections efficiently
  - Reduces connection overhead
  - Provides connection pooling for multiple services
  - Routes connections to Postgres Cluster
  - Sits on the same managed-connection path as Redis App (cache and sessions); PgBouncer itself speaks only the PostgreSQL protocol, so Redis connections are not pooled by it
Services Connected:
- Backend Core
- AI Agents
- Core AI Services
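To show why pooling reduces connection overhead, here is a minimal in-process sketch of the idea: many requests share a small, fixed set of connections. It illustrates the general technique only and is not how PgBouncer is implemented or configured; `FakeConnection` and `ConnectionPool` are hypothetical names, and no real database is involved.

```python
import queue
from contextlib import contextmanager
from typing import Iterator, List


class FakeConnection:
    """Stand-in for a database connection; counts the queries it serves."""

    def __init__(self, conn_id: int) -> None:
        self.conn_id = conn_id
        self.queries = 0

    def execute(self, sql: str) -> str:
        self.queries += 1
        return f"conn{self.conn_id}: {sql}"


class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size: int) -> None:
        self._all: List[FakeConnection] = [FakeConnection(i) for i in range(size)]
        self._idle: "queue.Queue[FakeConnection]" = queue.Queue(maxsize=size)
        for conn in self._all:
            self._idle.put(conn)

    @contextmanager
    def acquire(self, timeout: float = 1.0) -> Iterator[FakeConnection]:
        # Blocks while every connection is busy; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back

    def total_queries(self) -> int:
        return sum(conn.queries for conn in self._all)
```

With `ConnectionPool(size=2)`, ten sequential requests are all served by the same two connections, so the database sees two sessions instead of ten.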
5. Infrastructure Layer
The Infrastructure Layer provides the foundational storage, caching, and messaging infrastructure.
Components:
- Object Storage: High-performance object storage for files and data
  - Receives direct data transfers from Storage Service
  - Optimized for high I/O operations
- Postgres Cluster: Primary database cluster
  - Managed connections via PgBouncer
  - Stores application data, user information, and metadata
- Redis App (Cache + Sessions): Redis instance for:
  - Application caching
  - Session storage
  - Reached over managed connections (PgBouncer itself pools only PostgreSQL connections)
- Kafka: Event streaming and messaging platform
  - Receives internal layer calls from AI services
  - Handles event streaming and async communication
  - Processes notifications from Hybrid Compute services
- Redis Socket (Socket.io Adapter): Redis instance for Socket.io clustering
  - Enables horizontal scaling of Socket.io instances
  - Relays events between Socket.io cluster nodes, so a message emitted on one instance reaches clients connected to any other
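The role of the Redis Socket instance can be pictured with an in-memory simulation of cross-node fan-out: each node subscribes to a shared channel, and a broadcast on any node is delivered to the clients of every node. This is a sketch of the publish/subscribe idea only. `InMemoryBroker` and `SocketNode` are hypothetical classes that stand in for Redis and a Socket.io instance; they do not use the Socket.io or Redis APIs.

```python
from typing import Callable, Dict, List


class InMemoryBroker:
    """Stand-in for a shared pub/sub channel."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: str) -> None:
        for callback in list(self._subscribers):
            callback(event)


class SocketNode:
    """One instance in the cluster, holding only its own client connections."""

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.clients: Dict[str, List[str]] = {}  # client id -> received events
        self._broker = broker
        broker.subscribe(self._deliver_locally)

    def connect(self, client_id: str) -> None:
        self.clients[client_id] = []

    def broadcast(self, event: str) -> None:
        # Publish to the shared channel instead of delivering directly,
        # so every node (including this one) sees the event.
        self._broker.publish(event)

    def _deliver_locally(self, event: str) -> None:
        for inbox in self.clients.values():
            inbox.append(event)
```

With four nodes sharing one broker, a `broadcast` call on the first node reaches a client connected to the fourth, which is the property that lets clients land on any instance behind a load balancer.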
Communication Patterns
The architecture uses several communication patterns:
Connection Types
- Direct Data Transfer (High I/O): Blue thick lines - Used for high-throughput file uploads
- REST/API Call: Black solid lines - Standard HTTP/REST API communication
- WebSocket Connection: Black double arrows - Real-time bidirectional communication
- REST/Kafka Message: Black dashed lines - Asynchronous messaging via REST or Kafka
- Internal Layer Call: Black dotted lines - Internal service-to-service communication
- Managed Connection: Black thick solid lines - Database connections managed by PgBouncer
Key Design Principles
- Layered Architecture: Clear separation of concerns across five distinct layers
- Scalability: Horizontal scaling through clustering (Socket.io, Postgres)
- High Performance: Direct data transfer paths for high I/O operations
- Real-time Capabilities: WebSocket support for live updates
- Event-Driven: Kafka integration for asynchronous processing
- Connection Efficiency: PgBouncer for optimized database connection management
- Stateless Services: Core AI services designed as stateless for easy scaling
Data Flow
Upload Flow
- Client → Storage Orchestration (Direct Data Transfer)
- Storage Orchestration → Storage Service (REST/API)
- Storage Service → Object Storage (Direct Data Transfer)
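The three hops above can be sketched as a streaming pipeline in which chunks pass through each component without the whole file being buffered in between. All class names, the `workspace_id/filename` key layout, and the returned receipt fields are assumptions for illustration; the object store is an in-memory stand-in.

```python
import hashlib
from typing import Dict, Iterable, Iterator


class ObjectStorage:
    """In-memory stand-in for the object store."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, chunks: Iterable[bytes]) -> None:
        # A real store would write chunks as they arrive; joining suffices here.
        self.objects[key] = b"".join(chunks)


class StorageService:
    def __init__(self, object_storage: ObjectStorage) -> None:
        self._object_storage = object_storage

    def save(self, key: str, chunks: Iterable[bytes]) -> Dict[str, object]:
        size = 0
        digest = hashlib.sha256()

        def metered() -> Iterator[bytes]:
            nonlocal size
            for chunk in chunks:
                size += len(chunk)
                digest.update(chunk)
                yield chunk  # forward each chunk as soon as it is measured

        self._object_storage.put(key, metered())
        return {"key": key, "size": size, "sha256": digest.hexdigest()}


class StorageOrchestration:
    def __init__(self, storage_service: StorageService) -> None:
        self._storage_service = storage_service

    def upload(
        self, workspace_id: str, filename: str, chunks: Iterable[bytes]
    ) -> Dict[str, object]:
        key = f"{workspace_id}/{filename}"  # hypothetical key layout
        return self._storage_service.save(key, chunks)
```

Calling `upload("ws-1", "river.bin", chunks)` on a `StorageOrchestration` wired to these stand-ins stores the concatenated chunks under `ws-1/river.bin` and returns the byte count and checksum computed while streaming.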
API Request Flow
- Client → Gateway (REST/API Call)
- Gateway → Backend Core or AI Services (REST/API or Kafka)
- Services → PgBouncer → Postgres Cluster (Managed Connection)
Real-time Update Flow
- Gateway → Socket.io Cluster (Events)
- Socket.io Cluster → Client (WebSocket)
- Socket.io instances communicate via Redis Socket adapter
AI Processing Flow
- Gateway → AI Agents (REST/Kafka Message)
- AI Agents → Core AI Services (Internal calls)
- Core AI Services → Kafka (Event streaming)
- Services → PgBouncer → Database (Managed Connection)
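The event-streaming step in this flow can be illustrated with an in-memory append-only log in which each consumer group tracks its own read position. This is a simplified model of the Kafka-style pattern, not the Kafka client API; `EventLog`, `run_agent`, the `ai.results` topic name, and the group names are all hypothetical, and the "AI work" is a placeholder string transform.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class EventLog:
    """In-memory stand-in for a topic-based event log."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[Any]] = defaultdict(list)
        self._offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, event: Any) -> int:
        self._topics[topic].append(event)
        return len(self._topics[topic]) - 1  # offset of the new event

    def poll(self, topic: str, group: str) -> List[Any]:
        """Return events this group has not seen yet and advance its offset."""
        start = self._offsets[(topic, group)]
        events = self._topics[topic][start:]
        self._offsets[(topic, group)] = start + len(events)
        return events


def run_agent(log: EventLog, job_id: str, prompt: str) -> None:
    """Hypothetical agent step: do the work, then publish the result."""
    result = prompt.strip().upper()  # placeholder for a Core AI Service call
    log.publish("ai.results", {"job_id": job_id, "result": result})
```

Because the producer only appends to the log, the agent does not wait for consumers, and separate groups (for example a notifier and an audit reader) each receive every event independently.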
Future Documentation
This overview provides the foundation for detailed documentation of:
- Microservices Architecture: Detailed breakdown of microservices design
- Database Architecture: Database schema, replication, and optimization strategies
- Event Bus Architecture: Kafka configuration, topics, and event patterns
- Caching Architecture: Redis usage patterns and cache strategies
- Security Architecture: Authentication, authorization, and security measures
Related Documentation
- API Documentation - Individual account APIs
- Organization APIs - Team and enterprise APIs
- Microservices APIs - Internal microservices documentation