Hybrid Computing Layer

Page Outline

Architecture Overview
- Resource Selection Flow
Resource Types
API Reference
Resource Allocation Strategy
- Automatic Selection (compute_preference: "auto")
- Manual Selection
Best Practices
Related Documentation

The Hybrid Computing Layer enables RiverGen's Model Studio to orchestrate ML training across heterogeneous infrastructure—on-premise clusters, cloud providers (AWS, Azure, GCP), and edge devices—while optimizing for cost, latency, and resource availability.

Key Capability

Model Studio automatically selects the optimal compute resource based on task requirements, data locality, cost constraints, and current resource availability.

Architecture Overview


Resource Selection Flow

1. Task Analysis: Model Studio analyzes the training task (data size, model complexity, time constraints).
2. Resource Matching: The scheduler queries the registry for available resources that match the task's requirements.
3. Cost Optimization: Evaluates cost vs. performance trade-offs based on user preferences.
4. Allocation: Reserves the selected resource and updates its status to busy.
5. Monitoring: Continuously tracks resource utilization and job progress.
6. Release: Returns the resource to available status when the job completes.
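The Allocation and Release steps amount to a small status transition on the resource record. The Python sketch below models only those two transitions; `ComputeResource`, `allocate`, `release`, and the `current_job` field are hypothetical names for illustration, not Model Studio's scheduler code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComputeResource:
    """Hypothetical in-memory stand-in for one registry entry."""
    resource_id: str
    status: str = "available"
    current_job: Optional[str] = None  # hypothetical field, not part of the API schema


def allocate(resource: ComputeResource, job_id: str) -> None:
    """Allocation: reserve an available resource and mark it busy."""
    if resource.status != "available":
        raise RuntimeError(f"{resource.resource_id} is {resource.status}, not available")
    resource.status = "busy"
    resource.current_job = job_id


def release(resource: ComputeResource, job_id: str) -> None:
    """Release: return the resource to the pool when its job completes."""
    if resource.current_job != job_id:
        raise RuntimeError(f"{job_id} does not hold {resource.resource_id}")
    resource.current_job = None
    resource.status = "available"


cluster = ComputeResource("rivergen-gpu-cluster-01")
allocate(cluster, "job-42")
print(cluster.status)  # busy
release(cluster, "job-42")
print(cluster.status)  # available
```

Because it follows the flow literally, the sketch lets a resource hold one job at a time and does not model max_parallel_jobs.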

Selection Criteria

Criterion	Weight	Description
Data Locality	High	Prefers resources close to data sources to minimize transfer time
Cost	Medium	Balances cost per hour against training time estimates
Availability	High	Only considers resources with `available` status
Capability Match	Critical	Ensures resource has required GPU/CPU/memory
Network Latency	Medium	Considers network latency for distributed training
Framework Support	Critical	Verifies resource supports required ML framework
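These criteria can be combined into a filter-then-score routine: the Critical rows and Availability act as hard filters, and the remaining rows become weighted scores. The Python sketch below is illustrative only. The numeric weights (High = 3, Medium = 2), the locality test (exact match between a resource's location and a hypothetical `data_location` field on the task), and the cost and latency scaling are assumptions, and cost is scored from the hourly rate alone rather than against training-time estimates.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical numeric weights for the table's "High" and "Medium" levels.
WEIGHTS = {"data_locality": 3.0, "cost": 2.0, "network_latency": 2.0}


@dataclass
class Resource:
    resource_id: str
    status: str
    gpu_count: int
    memory_gb: int
    frameworks: Set[str]
    cost_per_hour_usd: float
    latency_ms: int
    location: str


@dataclass
class Task:
    gpu_count: int
    memory_gb: int
    framework: str
    data_location: str  # hypothetical: where the training data lives


def eligible(r: Resource, t: Task) -> bool:
    """Hard filters: Availability, Capability Match, and Framework Support."""
    return (
        r.status == "available"
        and r.gpu_count >= t.gpu_count
        and r.memory_gb >= t.memory_gb
        and t.framework in r.frameworks
    )


def score(r: Resource, t: Task) -> float:
    """Weighted soft criteria; each term lies between 0 and 1 before weighting."""
    locality = 1.0 if r.location == t.data_location else 0.0
    cost = 1.0 / (1.0 + r.cost_per_hour_usd)      # $0/hour (on-premise) scores 1.0
    latency = 1.0 / (1.0 + r.latency_ms / 100.0)  # 0 ms scores 1.0
    return (WEIGHTS["data_locality"] * locality
            + WEIGHTS["cost"] * cost
            + WEIGHTS["network_latency"] * latency)


def select(resources: List[Resource], t: Task) -> Optional[Resource]:
    """Return the highest-scoring eligible resource, or None if nothing fits."""
    candidates = [r for r in resources if eligible(r, t)]
    return max(candidates, key=lambda r: score(r, t), default=None)
```

With these weights, an available on-premise cluster in the same location as the data always outranks a remote cloud instance at $32.77/hour, because it wins on both locality and cost.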

Resource Types

1. On-Premise Resources

Use Cases:

Sensitive data that cannot leave the organization
High-performance GPU clusters for large-scale training
Cost-effective for sustained workloads

Example Configuration:

{
  "resource_id": "rivergen-gpu-cluster-01",
  "resource_type": "on-premise",
  "location": "RiverGen Data Center - Building A",
  "capabilities": {
    "gpu": true,
    "gpu_count": 8,
    "gpu_type": "NVIDIA A100",
    "cpu_cores": 128,
    "memory_gb": 1024,
    "storage_tb": 50
  },
  "supported_frameworks": ["pytorch", "tensorflow", "sklearn", "xgboost"],
  "max_parallel_jobs": 4,
  "cost_per_hour_usd": 0.0
}

2. Cloud Resources

Use Cases:

Elastic scaling for variable workloads
Access to specialized hardware (TPUs, Inferentia)
Geographic distribution for global deployments

Example Configuration:

{
  "resource_id": "aws-us-east-1-p4d",
  "resource_type": "cloud",
  "provider": "AWS",
  "region": "us-east-1",
  "capabilities": {
    "gpu": true,
    "gpu_count": 8,
    "gpu_type": "NVIDIA A100",
    "cpu_cores": 96,
    "memory_gb": 1152
  },
  "supported_frameworks": ["pytorch", "tensorflow", "mxnet"],
  "max_parallel_jobs": 10,
  "cost_per_hour_usd": 32.77,
  "network_bandwidth_gbps": 400.0
}

3. Edge Resources

Use Cases:

Federated learning on edge devices
Privacy-preserving local model training
Deploying models for low-latency inference

Example Configuration:

{
  "resource_id": "edge-device-fleet-01",
  "resource_type": "edge",
  "location": "Distributed IoT Network",
  "capabilities": {
    "gpu": false,
    "cpu_cores": 4,
    "memory_gb": 8,
    "storage_gb": 64
  },
  "supported_frameworks": ["tensorflow-lite", "onnx"],
  "max_parallel_jobs": 1,
  "latency_ms": 2
}

API Reference

Register Compute Resource

Register a new on-premise, cloud, or edge compute resource with Model Studio.

Endpoint: POST /api/model-studio/compute-resources
Status Code: 201 Created

Request:

{
  "resource_id": "rivergen-gpu-cluster-02",
  "resource_type": "on-premise",
  "location": "RiverGen Data Center - Building B",
  "capabilities": {
    "gpu": true,
    "gpu_count": 4,
    "gpu_type": "NVIDIA V100",
    "cpu_cores": 64,
    "memory_gb": 512
  },
  "supported_frameworks": ["pytorch", "tensorflow", "sklearn"],
  "status": "available",
  "max_parallel_jobs": 2,
  "network_bandwidth_gbps": 10.0,
  "cost_per_hour_usd": 0.0,
  "auth_method": "ssh_key",
  "latency_ms": 5
}

Request Schema:

Field	Type	Required	Description
`resource_id`	string	Yes	Unique identifier (e.g., `rivergen-gpu-cluster-02`)
`resource_type`	string	Yes	Type: `on-premise`, `cloud`, `edge`
`provider`	string	Conditional	Cloud provider: `AWS`, `Azure`, `GCP` (required when `resource_type` is `cloud`)
`location`	string	No	Physical location or data center name
`region`	string	No	Cloud region (e.g., `us-east-1`)
`capabilities`	object	Yes	Hardware capabilities (GPU, CPU, memory)
`supported_frameworks`	array	No	ML frameworks: `pytorch`, `tensorflow`, `sklearn`, etc.
`status`	string	No	Initial status (default: `available`)
`max_parallel_jobs`	integer	No	Maximum concurrent training jobs
`network_bandwidth_gbps`	float	No	Network bandwidth in Gbps
`cost_per_hour_usd`	float	No	Cost per hour in USD (0.0 for on-premise)
`auth_method`	string	No	Authentication method: `ssh_key`, `api_token`, `iam_role`
`latency_ms`	integer	No	Network latency to Model Studio in milliseconds
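The schema rules above can be checked client-side before calling the endpoint. A minimal Python sketch follows; `validate_registration` is a hypothetical helper, and it treats the listed values for `resource_type`, `provider`, `status`, and `auth_method` as the complete sets, which the table suggests but does not state outright.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("resource_id", "resource_type", "capabilities")
RESOURCE_TYPES = {"on-premise", "cloud", "edge"}
PROVIDERS = {"AWS", "Azure", "GCP"}
STATUSES = {"available", "busy", "maintenance", "unavailable"}
AUTH_METHODS = {"ssh_key", "api_token", "iam_role"}


def validate_registration(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload passed every check."""
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if name not in payload]

    rtype = payload.get("resource_type")
    if rtype is not None and rtype not in RESOURCE_TYPES:
        errors.append(f"invalid resource_type: {rtype!r}")
    # provider is optional in general but required for cloud resources.
    if rtype == "cloud" and payload.get("provider") not in PROVIDERS:
        errors.append("cloud resources require provider: AWS, Azure, or GCP")

    if "capabilities" in payload and not isinstance(payload["capabilities"], dict):
        errors.append("capabilities must be an object")
    if "status" in payload and payload["status"] not in STATUSES:
        errors.append(f"invalid status: {payload['status']!r}")
    if "auth_method" in payload and payload["auth_method"] not in AUTH_METHODS:
        errors.append(f"invalid auth_method: {payload['auth_method']!r}")
    return errors
```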

Response:

{
  "id": 5,
  "resource_id": "rivergen-gpu-cluster-02",
  "resource_type": "on-premise",
  "provider": null,
  "location": "RiverGen Data Center - Building B",
  "region": null,
  "capabilities": {
    "gpu": true,
    "gpu_count": 4,
    "gpu_type": "NVIDIA V100",
    "cpu_cores": 64,
    "memory_gb": 512
  },
  "supported_frameworks": ["pytorch", "tensorflow", "sklearn"],
  "status": "available",
  "max_parallel_jobs": 2,
  "network_bandwidth_gbps": 10.0,
  "cost_per_hour_usd": 0.0,
  "auth_method": "ssh_key",
  "latency_ms": 5,
  "created_at": "2026-01-30T18:00:00Z",
  "updated_at": null
}

cURL Example:

curl -X POST "http://localhost:8000/api/model-studio/compute-resources" \
  -H "Content-Type: application/json" \
  -d '{
    "resource_id": "rivergen-gpu-cluster-02",
    "resource_type": "on-premise",
    "location": "RiverGen Data Center - Building B",
    "capabilities": {
      "gpu": true,
      "gpu_count": 4,
      "gpu_type": "NVIDIA V100",
      "cpu_cores": 64,
      "memory_gb": 512
    },
    "supported_frameworks": ["pytorch", "tensorflow", "sklearn"],
    "status": "available",
    "max_parallel_jobs": 2
  }'
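For scripted registration, the same call can be made with Python's standard library. This is a sketch: `build_register_request` and `register` are hypothetical helper names, and `BASE_URL` assumes the local development server used in the cURL example.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"  # same server as the cURL example


def build_register_request(resource: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST request that registers one compute resource."""
    return urllib.request.Request(
        f"{BASE_URL}/api/model-studio/compute-resources",
        data=json.dumps(resource).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(resource: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request and return the stored resource from the 201 response.

    urlopen raises urllib.error.HTTPError for error statuses, such as
    the 400 Bad Request returned for a duplicate resource_id.
    """
    with urllib.request.urlopen(build_register_request(resource)) as resp:
        return json.loads(resp.read())
```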

Unique Resource IDs

Each resource_id must be globally unique across all registered resources. Registration will fail with 400 Bad Request if a duplicate ID is detected.
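The uniqueness rule behaves like a dictionary keyed by resource_id. In the hypothetical in-memory sketch below, `ResourceRegistry` and `DuplicateResourceError` are illustrative names; a server would translate the exception into the 400 Bad Request described above. The sketch also applies the schema's default status of `available`.

```python
from typing import Any, Dict


class DuplicateResourceError(ValueError):
    """Raised when a resource_id is already registered (reported as 400 by the API)."""


class ResourceRegistry:
    """Hypothetical in-memory registry enforcing globally unique resource IDs."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Dict[str, Any]] = {}

    def register(self, resource: Dict[str, Any]) -> Dict[str, Any]:
        rid = resource["resource_id"]
        if rid in self._by_id:
            raise DuplicateResourceError(f"resource_id already registered: {rid}")
        # status is optional in the request schema and defaults to "available".
        stored = {"status": "available", **resource}
        self._by_id[rid] = stored
        return stored
```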

List Compute Resources

Retrieve all registered compute resources with optional filtering.

Endpoint: GET /api/model-studio/compute-resources
Status Code: 200 OK

Query Parameters:

Parameter	Type	Description
`status_filter`	string	Filter by status: `available`, `busy`, `maintenance`, `unavailable`
`resource_type`	string	Filter by type: `on-premise`, `cloud`, `edge`
`provider`	string	Filter by cloud provider: `AWS`, `Azure`, `GCP`

Example Request:

curl "http://localhost:8000/api/model-studio/compute-resources?status_filter=available&resource_type=on-premise"

Response:

[
  {
    "id": 1,
    "resource_id": "rivergen-gpu-cluster-01",
    "resource_type": "on-premise",
    "status": "available",
    "capabilities": {
      "gpu": true,
      "gpu_count": 8,
      "cpu_cores": 128,
      "memory_gb": 1024
    },
    "max_parallel_jobs": 4,
    "cost_per_hour_usd": 0.0
  },
  {
    "id": 5,
    "resource_id": "rivergen-gpu-cluster-02",
    "resource_type": "on-premise",
    "status": "available",
    "capabilities": {
      "gpu": true,
      "gpu_count": 4,
      "cpu_cores": 64,
      "memory_gb": 512
    },
    "max_parallel_jobs": 2,
    "cost_per_hour_usd": 0.0
  }
]
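When calling this endpoint from code, only the filters that are actually set should appear in the query string. A small Python sketch using `urllib.parse.urlencode`; `list_resources_url` is a hypothetical helper, and `BASE_URL` assumes the local server from the example request.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # local server, as in the example request


def list_resources_url(status_filter: Optional[str] = None,
                       resource_type: Optional[str] = None,
                       provider: Optional[str] = None) -> str:
    """Build the list URL, including only the filters that were supplied."""
    params = {
        "status_filter": status_filter,
        "resource_type": resource_type,
        "provider": provider,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/api/model-studio/compute-resources"
    return f"{url}?{query}" if query else url
```

Calling `list_resources_url(status_filter="available", resource_type="on-premise")` reproduces the URL from the example request.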

Update Compute Resource

Update the status or configuration of an existing compute resource.

Endpoint: PUT /api/model-studio/compute-resources/{resource_id}
Status Code: 200 OK

Example - Mark Resource for Maintenance:

curl -X PUT "http://localhost:8000/api/model-studio/compute-resources/rivergen-gpu-cluster-02" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "maintenance"
  }'

Resource Status Management

available: Ready to accept new training jobs
busy: Currently executing jobs (auto-managed by scheduler)
maintenance: Temporarily unavailable for scheduled maintenance
unavailable: Permanently offline or decommissioned
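A client that changes status through the update endpoint can validate the value before sending it. In this Python sketch, `build_status_update` is a hypothetical helper; refusing to set busy is an assumption drawn from the note that the scheduler manages that state, not documented API behavior.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # local server, as in the cURL examples
STATUSES = {"available", "busy", "maintenance", "unavailable"}


def build_status_update(resource_id: str, status: str) -> urllib.request.Request:
    """Build the PUT request that changes a resource's status."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if status == "busy":
        # Assumption: busy is left to the scheduler, so this client never sets it.
        raise ValueError("busy is managed by the scheduler")
    return urllib.request.Request(
        f"{BASE_URL}/api/model-studio/compute-resources/{quote(resource_id, safe='')}",
        data=json.dumps({"status": status}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the result of `build_status_update("rivergen-gpu-cluster-02", "maintenance")` to `urllib.request.urlopen` sends the same request as the maintenance cURL example.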

Resource Allocation Strategy

Automatic Selection (`compute_preference: "auto"`)

When a task intent specifies "compute_preference": "auto", Model Studio uses the following decision tree:

Manual Selection

To override automatic selection, set compute_preference to a specific resource type and add constraints to the task intent:

{
  "task_name": "High-Priority Training",
  "ml_task": "classification",
  "compute_preference": "cloud",
  "constraints": {
    "max_cost_per_hour_usd": 50.0,
    "preferred_provider": "AWS",
    "required_gpu_count": 8
  }
}
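One plausible reading of these constraints is that max_cost_per_hour_usd and required_gpu_count are hard limits while preferred_provider only orders the remaining candidates. The Python sketch below encodes that reading; `rank_candidates` is a hypothetical helper, and the hard-versus-soft split is an assumption rather than documented scheduler behavior.

```python
from typing import Any, Dict, List


def rank_candidates(resources: List[Dict[str, Any]],
                    intent: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Filter registered resources by a manual task intent, best match first."""
    constraints = intent.get("constraints", {})
    max_cost = constraints.get("max_cost_per_hour_usd", float("inf"))
    min_gpus = constraints.get("required_gpu_count", 0)
    preferred = constraints.get("preferred_provider")

    eligible = [
        r for r in resources
        if r["resource_type"] == intent["compute_preference"]
        and r.get("status", "available") == "available"
        and r.get("cost_per_hour_usd", 0.0) <= max_cost
        and r.get("capabilities", {}).get("gpu_count", 0) >= min_gpus
    ]
    # Sort key: preferred provider first (False sorts before True), then cheapest.
    return sorted(eligible, key=lambda r: (r.get("provider") != preferred,
                                           r.get("cost_per_hour_usd", 0.0)))
```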

Best Practices

1. Resource Registration

✅ Register all available compute resources during initial setup
✅ Use descriptive resource_id values (e.g., aws-us-east-1-p4d-01)
✅ Keep capabilities metadata accurate and up-to-date
✅ Set realistic max_parallel_jobs based on resource capacity

2. Cost Optimization

💰 Use on-premise resources for sustained, predictable workloads
💰 Reserve cloud resources for burst capacity and experimentation
💰 Set max_cost_per_hour_usd constraints in task intents
💰 Monitor actual vs. estimated training times to refine cost models
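Refining cost estimates can be as simple as scaling the hourly rate by a correction factor learned from past runs. In this Python sketch, `correction_factor` and `estimate_cost` are hypothetical helpers, and the mean ratio of actual to estimated hours is one simple choice of correction, not Model Studio's cost model.

```python
from typing import List, Tuple


def correction_factor(history: List[Tuple[float, float]]) -> float:
    """Mean ratio of actual to estimated hours over past (estimated, actual) runs."""
    if not history:
        return 1.0  # no history yet: trust the raw estimate
    return sum(actual / estimated for estimated, actual in history) / len(history)


def estimate_cost(cost_per_hour_usd: float, estimated_hours: float,
                  factor: float = 1.0) -> float:
    """Projected job cost in USD, rounded to cents."""
    return round(cost_per_hour_usd * estimated_hours * factor, 2)


# Past jobs ran 20% longer than estimated, so a 10-hour estimate on a
# $32.77/hour instance is projected at about $393.24 instead of $327.70.
factor = correction_factor([(10.0, 12.0), (5.0, 6.0)])
print(estimate_cost(32.77, 10.0, factor))
```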

3. Monitoring & Maintenance

🔧 Regularly update resource status (e.g., mark for maintenance)
🔧 Monitor resource utilization via the Resource Monitor
🔧 Remove decommissioned resources or mark them as unavailable
🔧 Validate framework compatibility before major upgrades

Related Documentation

Model Studio Overview - High-level architecture
API Endpoints - Complete API reference
Implementation Deep-Dive - Technical internals
