System Architecture

Welcome to the RiverGen System Architecture documentation. This page provides an overview of the platform's infrastructure, backend design, and system components.

Overview

RiverGen is built on a layered, horizontally scalable architecture that supports the platform's AI capabilities. The architecture is organized into five distinct layers, each with a specific responsibility, which communicate through well-defined protocols: REST/API calls, WebSocket connections, REST/Kafka messages, and managed database connections.

Architecture Diagram

The following diagram illustrates the complete system architecture:

Figure: RiverGen System Architecture Diagram

Architecture Layers

The RiverGen platform is structured into five main layers:

1. Client Layer

The Client Layer serves as the entry point for users interacting with the platform.

Components:

  • Web Client (User Interface): The primary user interface that provides access to all platform features

Communication Patterns:

  • Direct Data Transfer (High I/O): Direct upload connections to Storage Orchestration for high-throughput data transfers
  • REST/API Calls: Standard HTTP/REST API calls to the Gateway for all API operations
  • WebSocket Connections: Real-time bidirectional communication with the Socket.io Cluster for live updates and notifications

2. Orchestration Layer

The Orchestration Layer manages request routing, workspace context, high-throughput storage uploads, and real-time communication.

Components:

  • Storage Orchestration (High I/O): Specialized orchestration component for handling high-throughput data uploads and storage operations

    • Receives direct data transfers from clients
    • Routes requests to Storage Service
    • Communicates with AI services via REST/Kafka messages
  • Gateway + Routing + Workspace Context: Main API gateway that handles:

    • Request routing and load balancing
    • Workspace context management
    • Authentication and authorization
    • REST/API call processing
    • Event distribution to Socket.io Cluster
  • Socket.io Cluster (4 Instances): Real-time communication cluster

    • Handles WebSocket connections from clients
    • Distributes events across instances using the Redis adapter (backed by the Redis Socket instance)
    • Provides real-time updates and notifications
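The Redis adapter is what lets four independent instances behave as one cluster: an event emitted on any instance is relayed through Redis to the others, and each instance delivers it to its own locally connected clients. The in-process sketch below illustrates that idea only; it is not Socket.io or Redis code. `Broker` is a hypothetical stand-in for the Redis channel, `SocketNode` for a Socket.io instance, and the room and client names are made up.

```python
from typing import Callable

Handler = Callable[[str, str], None]


class Broker:
    """Hypothetical stand-in for the Redis channel shared by all Socket.io instances."""

    def __init__(self) -> None:
        self._handlers: list[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def publish(self, room: str, event: str) -> None:
        # Every subscribed instance receives the event, including the one that sent it.
        for handler in self._handlers:
            handler(room, event)


class SocketNode:
    """Hypothetical model of one Socket.io instance and its locally connected clients."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.broker = broker
        self.rooms: dict[str, list[str]] = {}  # room -> local client ids
        self.delivered: list[tuple[str, str]] = []  # (client id, event) pairs sent out
        broker.subscribe(self._deliver_locally)

    def connect(self, client_id: str, room: str) -> None:
        self.rooms.setdefault(room, []).append(client_id)

    def emit(self, room: str, event: str) -> None:
        # Publish through the broker instead of delivering only to local clients.
        self.broker.publish(room, event)

    def _deliver_locally(self, room: str, event: str) -> None:
        for client_id in self.rooms.get(room, []):
            self.delivered.append((client_id, event))


broker = Broker()
nodes = [SocketNode(f"socket-{i}", broker) for i in range(4)]  # four instances
nodes[0].connect("alice", "workspace-42")
nodes[3].connect("bob", "workspace-42")
nodes[1].emit("workspace-42", "job.finished")  # emitted on an instance with no local clients
```

Without the shared broker, `nodes[1].emit` would reach only clients connected to `socket-1`, which in this example is nobody.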

Communication Patterns:

  • REST/API calls to Service Layer
  • REST/Kafka messages to AI services
  • WebSocket connections to clients
  • Redis adapter for Socket.io clustering
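A gateway that attaches workspace context and chooses between a synchronous REST call and an asynchronous Kafka message might look roughly like the sketch below. Everything in it is hypothetical: the `X-Workspace-Id` header, the path prefixes, the service names, and the in-memory `outbox` that stands in for a Kafka topic. Authentication and authorization, which the Gateway also handles, are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    path: str
    headers: dict[str, str]
    body: dict = field(default_factory=dict)


@dataclass
class Route:
    prefix: str
    service: str
    mode: str  # "rest" = synchronous call, "kafka" = asynchronous message


# Hypothetical route table; the real prefixes and service names are not documented.
ROUTES = [
    Route("/billing", "backend-core", "rest"),
    Route("/files", "storage-service", "rest"),
    Route("/agents", "ai-agents", "kafka"),
]


class Gateway:
    def __init__(self, routes: list[Route]) -> None:
        # Check longer prefixes first so more specific routes win.
        self.routes = sorted(routes, key=lambda r: len(r.prefix), reverse=True)
        self.outbox: list[dict] = []  # stands in for a Kafka topic

    def handle(self, req: Request) -> tuple[int, dict]:
        workspace = req.headers.get("X-Workspace-Id")  # hypothetical header name
        if not workspace:
            return 400, {"error": "missing workspace context"}
        route = next((r for r in self.routes if req.path.startswith(r.prefix)), None)
        if route is None:
            return 404, {"error": "no route"}
        message = {"service": route.service, "workspace": workspace,
                   "path": req.path, "body": req.body}
        if route.mode == "kafka":
            # Asynchronous: enqueue and acknowledge; the result is delivered later
            # (assumed here to be pushed to the client via the Socket.io Cluster).
            self.outbox.append(message)
            return 202, {"accepted": True}
        return 200, self.call_service(message)

    def call_service(self, message: dict) -> dict:
        # Placeholder for a synchronous REST call to the target service.
        return {"handled_by": message["service"], "workspace": message["workspace"]}


gateway = Gateway(ROUTES)
print(gateway.handle(Request("/billing/invoices", {"X-Workspace-Id": "ws-1"})))
# (200, {'handled_by': 'backend-core', 'workspace': 'ws-1'})
print(gateway.handle(Request("/agents/psa/run", {"X-Workspace-Id": "ws-1"}, {"prompt": "hi"})))
# (202, {'accepted': True})
print(gateway.handle(Request("/files/report.csv", {})))
# (400, {'error': 'missing workspace context'})
```

Returning an acknowledgement immediately for Kafka-routed requests keeps long-running AI work from holding an HTTP connection open.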

3. Service Layer

The Service Layer contains the core business logic and AI capabilities, organized into three sub-layers:

Core Services

Storage Service

  • Manages file storage operations
  • Handles direct data transfers to Object Storage
  • Provides file management APIs

Backend Core (SAM + Billing)

  • System Account Management (SAM)
  • Billing and subscription management
  • Core business logic
  • Connects to the Postgres Cluster via PgBouncer

AI Services (Sub-layer)

AI Agents (Intelligent Orchestrators)

  • PSA - Prompt Studio Agent: Handles prompt creation and management
  • MSA - Model Studio Agent: Handles model development and training
  • DIA - Decision Intelligence Agent: Provides decision-making capabilities
  • GA - Governance Agent: Manages governance and compliance
  • OSA - Operational Service Agent: Handles operational tasks

Core AI Services (Stateless Capability Services)

  • Planning Services
  • Schema and Metadata Services
  • Governance Services
  • Machine Learning Services
  • Data Quality and Profiling Services
  • Language and Explanation Services
  • Execution Support Services
  • Model Monitoring Services
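The split between agents (which own the orchestration logic) and stateless capability services can be sketched with plain functions. In the example below, each capability is a pure function of its input and the agent only decides which capabilities to call and in what order. The function names are hypothetical stand-ins for the Schema and Metadata, Data Quality and Profiling, and Language and Explanation services; their real interfaces are not documented here.

```python
from typing import Any, Callable

Capability = Callable[[dict[str, Any]], dict[str, Any]]


def infer_schema(payload: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a Schema and Metadata Service: derives column types from the first row."""
    columns = {key: type(value).__name__ for key, value in payload["rows"][0].items()}
    return {**payload, "schema": columns}


def profile_quality(payload: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a Data Quality and Profiling Service: counts missing values per column."""
    nulls = {col: sum(1 for row in payload["rows"] if row.get(col) is None)
             for col in payload["schema"]}
    return {**payload, "null_counts": nulls}


def explain(payload: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a Language and Explanation Service: summarizes the profile in words."""
    worst = max(payload["null_counts"], key=payload["null_counts"].get)
    return {**payload, "summary": f"column '{worst}' has the most missing values"}


class Agent:
    """Hypothetical agent: holds the plan (which capabilities, in what order)."""

    def __init__(self, name: str, plan: list[Capability]) -> None:
        self.name = name
        self.plan = plan

    def run(self, payload: dict[str, Any]) -> dict[str, Any]:
        for capability in self.plan:
            payload = capability(payload)  # each call depends only on its input
        return payload


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": None},
]
agent = Agent("example-agent", [infer_schema, profile_quality, explain])
result = agent.run({"rows": rows})
print(result["summary"])  # column 'email' has the most missing values
```

Because the capability functions keep no state between calls, any instance of a service can handle any request, which is what makes them straightforward to scale out.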

Hybrid Compute (Sub-layer)

  • Internal Compute: Internal computation resources
  • Sparse Native: Sparse computation capabilities
  • Connector: Data connector services
  • Data Staging: Data staging for Model Studio
  • Notification Service: Sends notifications to Kafka

Communication Patterns:

  • REST/API calls from Orchestration Layer
  • REST/Kafka messages for async operations
  • Database connections via PgBouncer
  • Kafka integration for event streaming

4. Connection Pooling Layer

The Connection Pooling Layer optimizes database connections and manages connection resources.

Components:

  • PgBouncer Connection Pooler: Connection pool manager that:
    • Manages database connections efficiently
    • Reduces connection overhead
    • Provides connection pooling for multiple services
    • Routes connections to Postgres Cluster
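    • Sits in the same layer as the managed connections to Redis for caching and sessions (PgBouncer itself pools only PostgreSQL connections, so Redis traffic does not pass through it)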

Services Connected:

  • Backend Core
  • AI Agents
  • Core AI Services
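PgBouncer's savings come from letting many client connections share a small number of server connections. The sketch below models that idea in plain Python, assuming transaction-level pooling (PgBouncer's `pool_mode = transaction`), where a server connection is lent out for one transaction and returned immediately afterwards. It is not PgBouncer code: `ServerConnection` and `TransactionPool` are hypothetical. In PgBouncer itself, the equivalent limits are set with options such as `default_pool_size` (server connections per user/database pair) and `max_client_conn` (client connections accepted).

```python
import threading
from contextlib import contextmanager
from queue import Queue


class ServerConnection:
    """Hypothetical stand-in for one real Postgres backend connection."""

    opened = 0  # how many server connections were ever created

    def __init__(self) -> None:
        ServerConnection.opened += 1
        self.id = ServerConnection.opened

    def execute(self, sql: str) -> str:
        return f"conn-{self.id}: {sql}"


class TransactionPool:
    """Lends a server connection for one transaction, then takes it back."""

    def __init__(self, size: int) -> None:
        self._idle: Queue = Queue()
        for _ in range(size):
            self._idle.put(ServerConnection())

    @contextmanager
    def transaction(self):
        conn = self._idle.get()  # blocks while every server connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)  # returned as soon as the transaction ends


pool = TransactionPool(size=3)
results: list[str] = []
lock = threading.Lock()


def client(n: int) -> None:
    with pool.transaction() as conn:
        row = conn.execute(f"SELECT {n}")
    with lock:
        results.append(row)


threads = [threading.Thread(target=client, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), ServerConnection.opened)  # 50 3
```

Fifty client transactions complete over three server connections; Postgres only ever sees the three.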

5. Infrastructure Layer

The Infrastructure Layer provides the foundational storage, caching, and messaging infrastructure.

Components:

  • Object Storage: High-performance object storage for files and data

    • Receives direct data transfers from Storage Service
    • Optimized for high I/O operations
  • Postgres Cluster: Primary database cluster

    • Managed connections via PgBouncer
    • Stores application data, user information, and metadata
  • Redis App (Cache + Sessions): Redis instance for:

    • Application caching
    • Session storage
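    • Accessed through managed connections from the Connection Pooling Layer (not through PgBouncer itself, which pools only PostgreSQL connections)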
  • Kafka: Event streaming and messaging platform

    • Receives internal layer calls from AI services
    • Handles event streaming and async communication
    • Processes notifications from Hybrid Compute services
  • Redis Socket (Socket.io Adapter): Redis instance for Socket.io clustering

    • Enables horizontal scaling of Socket.io instances
    • Relays events between Socket.io cluster nodes, so a client connected to one instance receives events emitted on another
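Kafka's ordering guarantee is per partition: records with the same key go to the same partition and are read back in the order they were written, while a consumer tracks its own offset in each partition. The in-memory sketch below illustrates those semantics only; it is not a Kafka client. The topic name `notifications` and the workspace keys are hypothetical, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner actually uses for keyed records.

```python
import zlib


class Topic:
    """In-memory model of a partitioned topic; each partition is an append-only log."""

    def __init__(self, name: str, partitions: int) -> None:
        self.name = name
        self.logs: list[list[tuple[str, str]]] = [[] for _ in range(partitions)]

    def produce(self, key: str, value: str) -> tuple[int, int]:
        """Append the record to its key's partition and return (partition, offset)."""
        partition = zlib.crc32(key.encode()) % len(self.logs)
        self.logs[partition].append((key, value))
        return partition, len(self.logs[partition]) - 1


class Consumer:
    """Single consumer reading all partitions; no consumer groups or rebalancing."""

    def __init__(self, topic: Topic) -> None:
        self.topic = topic
        self.offsets = [0] * len(topic.logs)  # next offset to read, per partition

    def poll(self) -> list[tuple[str, str]]:
        batch: list[tuple[str, str]] = []
        for p, log in enumerate(self.topic.logs):
            batch.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)  # advance ("commit") the offset after reading
        return batch


notifications = Topic("notifications", partitions=3)
for event in ["job.started", "job.progress", "job.finished"]:
    notifications.produce("workspace-42", event)
notifications.produce("workspace-7", "job.started")

consumer = Consumer(notifications)
first = consumer.poll()   # all four records
second = consumer.poll()  # empty: offsets already advanced
```

Keying by workspace (an assumption, not a documented choice) would keep each workspace's events in order while still spreading different workspaces across partitions.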

Communication Patterns

The architecture uses several communication patterns:

Connection Types

  • Direct Data Transfer (High I/O): Blue thick lines - Used for high-throughput file uploads
  • REST/API Call: Black solid lines - Standard HTTP/REST API communication
  • WebSocket Connection: Black double arrows - Real-time bidirectional communication
  • REST/Kafka Message: Black dashed lines - Messages sent via REST or Kafka for asynchronous operations
  • Internal Layer Call: Black dotted lines - Internal service-to-service communication
  • Managed Connection: Black thick solid lines - Database connections managed by PgBouncer
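The legend can also be written down as data, which makes it easy to check that a documented flow only uses connections that exist in the architecture. In the sketch below the component identifiers are hypothetical, and the edge list is transcribed from this page's text rather than from the diagram itself, so it may be incomplete.

```python
from enum import Enum


class Conn(Enum):
    DIRECT_DATA = "Direct Data Transfer (High I/O)"
    REST = "REST/API Call"
    WEBSOCKET = "WebSocket Connection"
    REST_OR_KAFKA = "REST/Kafka Message"
    INTERNAL = "Internal Layer Call"
    MANAGED = "Managed Connection"


# (source, destination) -> connection type, transcribed from the text on this page.
EDGES = {
    ("client", "storage_orchestration"): Conn.DIRECT_DATA,
    ("client", "gateway"): Conn.REST,
    ("client", "socketio_cluster"): Conn.WEBSOCKET,
    ("storage_orchestration", "storage_service"): Conn.REST,
    ("storage_service", "object_storage"): Conn.DIRECT_DATA,
    ("gateway", "backend_core"): Conn.REST,
    ("gateway", "ai_agents"): Conn.REST_OR_KAFKA,
    ("ai_agents", "core_ai_services"): Conn.INTERNAL,
    ("core_ai_services", "kafka"): Conn.INTERNAL,
    ("backend_core", "pgbouncer"): Conn.MANAGED,
    ("pgbouncer", "postgres_cluster"): Conn.MANAGED,
}


def check_flow(hops: list[str]) -> list[Conn]:
    """Return the connection type of each hop, or raise if a hop is undocumented."""
    types = []
    for src, dst in zip(hops, hops[1:]):
        if (src, dst) not in EDGES:
            raise ValueError(f"undocumented hop: {src} -> {dst}")
        types.append(EDGES[(src, dst)])
    return types


upload = check_flow(["client", "storage_orchestration", "storage_service", "object_storage"])
print([c.name for c in upload])  # ['DIRECT_DATA', 'REST', 'DIRECT_DATA']
```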

Key Design Principles

  1. Layered Architecture: Clear separation of concerns across five distinct layers
  2. Scalability: Horizontal scaling through clustering (Socket.io, Postgres)
  3. High Performance: Direct data transfer paths for high I/O operations
  4. Real-time Capabilities: WebSocket support for live updates
  5. Event-Driven: Kafka integration for asynchronous processing
  6. Connection Efficiency: PgBouncer for optimized database connection management
  7. Stateless Services: Core AI services are stateless, so they can be scaled horizontally by adding instances

Data Flow

Upload Flow

  1. Client → Storage Orchestration (Direct Data Transfer)
  2. Storage Orchestration → Storage Service (REST/API)
  3. Storage Service → Object Storage (Direct Data Transfer)
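The upload path keeps large payloads away from the Gateway. The minimal sketch below walks through the three hops, assuming the client streams the file in fixed-size chunks; the chunk size, class names, and object key are hypothetical, since the actual upload protocol is not documented here. The REST hop between Storage Orchestration and Storage Service is modelled as a plain method call.

```python
import hashlib
import io
from typing import Iterable

CHUNK_SIZE = 4  # bytes; deliberately tiny for the example (real chunk size not documented)


class ObjectStorage:
    """Hypothetical in-memory object store."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


class StorageService:
    """Receives chunks, checksums them, and writes the object to Object Storage."""

    def __init__(self, store: ObjectStorage) -> None:
        self.store = store

    def write(self, key: str, chunks: Iterable[bytes]) -> str:
        digest = hashlib.sha256()
        buffer = bytearray()
        for chunk in chunks:
            digest.update(chunk)
            buffer.extend(chunk)
        self.store.put(key, bytes(buffer))
        return digest.hexdigest()


class StorageOrchestration:
    """Entry point for client uploads; forwards chunks to the Storage Service."""

    def __init__(self, service: StorageService) -> None:
        self.service = service

    def upload(self, key: str, stream: io.BufferedIOBase) -> str:
        # Read fixed-size chunks until read() returns b"" (end of stream).
        chunks = iter(lambda: stream.read(CHUNK_SIZE), b"")
        return self.service.write(key, chunks)


store = ObjectStorage()
orchestrator = StorageOrchestration(StorageService(store))
payload = b"example payload bytes"
checksum = orchestrator.upload("ws-42/raw/sample.bin", io.BytesIO(payload))
```

The Storage Service in this sketch buffers the whole object before writing it; for large files, an object store's multipart upload API (where available) would avoid holding the file in memory.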

API Request Flow

  1. Client → Gateway (REST/API Call)
  2. Gateway → Backend Core or AI Services (REST/API or Kafka)
  3. Services → PgBouncer → Postgres Cluster (Managed Connection)

Real-time Update Flow

  1. Gateway → Socket.io Cluster (Events)
  2. Socket.io Cluster → Client (WebSocket)
  3. Socket.io instances relay events to each other via the Redis Socket adapter

AI Processing Flow

  1. Gateway → AI Agents (REST/Kafka Message)
  2. AI Agents → Core AI Services (Internal calls)
  3. Core AI Services → Kafka (Event streaming)
  4. Services → PgBouncer → Postgres Cluster (Managed Connection)

Future Documentation

This overview provides the foundation for detailed documentation of:

  • Microservices Architecture: Detailed breakdown of microservices design
  • Database Architecture: Database schema, replication, and optimization strategies
  • Event Bus Architecture: Kafka configuration, topics, and event patterns
  • Caching Architecture: Redis usage patterns and cache strategies
  • Security Architecture: Authentication, authorization, and security measures