Hybrid Computing Layer
The Hybrid Computing Layer enables RiverGen's Model Studio to orchestrate ML training across heterogeneous infrastructure—on-premise clusters, cloud providers (AWS, Azure, GCP), and edge devices—while optimizing for cost, latency, and resource availability.
Model Studio automatically selects the optimal compute resource based on task requirements, data locality, cost constraints, and current resource availability.
Architecture Overview
Resource Selection Flow
- Task Analysis: Model Studio analyzes the training task (data size, model complexity, time constraints)
- Resource Matching: The scheduler queries the registry for available resources matching requirements
- Cost Optimization: Evaluates cost vs. performance trade-offs based on user preferences
- Allocation: Reserves the selected resource and updates its status to busy
- Monitoring: Continuously tracks resource utilization and job progress
- Release: Returns the resource to available status upon job completion
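The allocation and release steps above amount to a small state machine over a resource's status. The following is a minimal sketch, assuming an in-memory registry and a single job slot per resource (it ignores max_parallel_jobs). The `Resource` class and the `allocate` and `release` helpers are hypothetical names for illustration, not Model Studio's scheduler API.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical in-memory stand-in for a registry entry."""
    resource_id: str
    status: str = "available"


def allocate(resource: Resource) -> None:
    """Reserve a resource for a job: available -> busy."""
    if resource.status != "available":
        raise RuntimeError(f"{resource.resource_id} is {resource.status}, not available")
    resource.status = "busy"


def release(resource: Resource) -> None:
    """Return a resource after job completion: busy -> available."""
    if resource.status != "busy":
        raise RuntimeError(f"{resource.resource_id} is {resource.status}, not busy")
    resource.status = "available"


# Walk one resource through the allocate/release cycle.
cluster = Resource("rivergen-gpu-cluster-01")
allocate(cluster)
status_during_job = cluster.status
release(cluster)
status_after_job = cluster.status
```

Rejecting invalid transitions explicitly keeps a resource in maintenance or unavailable status from being reserved by accident.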
Selection Criteria
| Criterion | Weight | Description |
|---|---|---|
| Data Locality | High | Prefers resources close to data sources to minimize transfer time |
| Cost | Medium | Balances cost per hour against training time estimates |
| Availability | High | Only considers resources with available status |
| Capability Match | Critical | Ensures resource has required GPU/CPU/memory |
| Network Latency | Medium | Considers network latency for distributed training |
| Framework Support | Critical | Verifies resource supports required ML framework |
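One way to read the criteria table is that Critical criteria and Availability act as hard filters, while the remaining criteria feed a ranking score. The sketch below illustrates that reading. The numeric weights, the scaling factors, the task fields (`framework`, `required_gpu_count`, `required_memory_gb`, `data_location`, `estimated_hours`), and the function names are all assumptions; the table gives only qualitative weights.

```python
from typing import Optional

# Hypothetical numeric weights; the table only states High/Medium levels.
WEIGHTS = {"data_locality": 3.0, "cost": 2.0, "latency": 2.0}


def is_eligible(resource: dict, task: dict) -> bool:
    """Hard filters: availability, capability match, framework support."""
    caps = resource.get("capabilities", {})
    return (
        resource.get("status") == "available"
        and caps.get("gpu_count", 0) >= task.get("required_gpu_count", 0)
        and caps.get("memory_gb", 0) >= task.get("required_memory_gb", 0)
        and task["framework"] in resource.get("supported_frameworks", [])
    )


def score(resource: dict, task: dict) -> float:
    """Higher is better: rewards data locality, penalizes cost and latency."""
    data_location = task.get("data_location")
    local = data_location is not None and resource.get("location") == data_location
    # Estimated job cost = hourly rate x estimated training time.
    job_cost = resource.get("cost_per_hour_usd", 0.0) * task.get("estimated_hours", 1.0)
    latency = resource.get("latency_ms", 0)
    return (
        WEIGHTS["data_locality"] * (1.0 if local else 0.0)
        - WEIGHTS["cost"] * job_cost / 100.0    # arbitrary scale: per 100 USD
        - WEIGHTS["latency"] * latency / 100.0  # arbitrary scale: per 100 ms
    )


def select_resource(resources: list, task: dict) -> Optional[dict]:
    """Return the best-scoring eligible resource, or None if none qualifies."""
    candidates = [r for r in resources if is_eligible(r, task)]
    return max(candidates, key=lambda r: score(r, task), default=None)


resources = [
    {"resource_id": "rivergen-gpu-cluster-01", "status": "available",
     "location": "RiverGen Data Center - Building A",
     "capabilities": {"gpu_count": 8, "memory_gb": 1024},
     "supported_frameworks": ["pytorch", "tensorflow"], "cost_per_hour_usd": 0.0},
    {"resource_id": "aws-us-east-1-p4d", "status": "available",
     "capabilities": {"gpu_count": 8, "memory_gb": 1152},
     "supported_frameworks": ["pytorch", "tensorflow", "mxnet"],
     "cost_per_hour_usd": 32.77},
]
task = {"framework": "pytorch", "required_gpu_count": 8,
        "data_location": "RiverGen Data Center - Building A", "estimated_hours": 4.0}
best = select_resource(resources, task)
```

With these sample inputs the on-premise cluster is chosen because it is co-located with the data and has no hourly cost. If it were busy, the cloud instance would be the only eligible candidate.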
Resource Types
1. On-Premise Resources
Use Cases:
- Sensitive data that cannot leave the organization
- High-performance GPU clusters for large-scale training
- Cost-effective for sustained workloads
Example Configuration:
{
"resource_id": "rivergen-gpu-cluster-01",
"resource_type": "on-premise",
"location": "RiverGen Data Center - Building A",
"capabilities": {
"gpu": true,
"gpu_count": 8,
"gpu_type": "NVIDIA A100",
"cpu_cores": 128,
"memory_gb": 1024,
"storage_tb": 50
},
"supported_frameworks": ["pytorch", "tensorflow", "sklearn", "xgboost"],
"max_parallel_jobs": 4,
"cost_per_hour_usd": 0.0
}
2. Cloud Resources
Use Cases:
- Elastic scaling for variable workloads
- Access to specialized hardware (TPUs, Inferentia)
- Geographic distribution for global deployments
Example Configuration:
{
"resource_id": "aws-us-east-1-p4d",
"resource_type": "cloud",
"provider": "AWS",
"region": "us-east-1",
"capabilities": {
"gpu": true,
"gpu_count": 8,
"gpu_type": "NVIDIA A100",
"cpu_cores": 96,
"memory_gb": 1152
},
"supported_frameworks": ["pytorch", "tensorflow", "mxnet"],
"max_parallel_jobs": 10,
"cost_per_hour_usd": 32.77,
"network_bandwidth_gbps": 400.0
}
3. Edge Resources
Use Cases:
- Federated learning on edge devices
- Privacy-preserving local model training
- Low-latency inference model deployment
Example Configuration:
{
"resource_id": "edge-device-fleet-01",
"resource_type": "edge",
"location": "Distributed IoT Network",
"capabilities": {
"gpu": false,
"cpu_cores": 4,
"memory_gb": 8,
"storage_gb": 64
},
"supported_frameworks": ["tensorflow-lite", "onnx"],
"max_parallel_jobs": 1,
"latency_ms": 2
}
API Reference
Register Compute Resource
Register a new compute resource in the hybrid computing registry.
Endpoint: POST /api/model-studio/compute-resources
Status Code: 201 Created
Request:
{
"resource_id": "rivergen-gpu-cluster-02",
"resource_type": "on-premise",
"location": "RiverGen Data Center - Building B",
"capabilities": {
"gpu": true,
"gpu_count": 4,
"gpu_type": "NVIDIA V100",
"cpu_cores": 64,
"memory_gb": 512
},
"supported_frameworks": ["pytorch", "tensorflow", "sklearn"],
"status": "available",
"max_parallel_jobs": 2,
"network_bandwidth_gbps": 10.0,
"cost_per_hour_usd": 0.0,
"auth_method": "ssh_key",
"latency_ms": 5
}
Request Schema:
| Field | Type | Required | Description |
|---|---|---|---|
| resource_id | string | Yes | Unique identifier (e.g., rivergen-gpu-cluster-02) |
| resource_type | string | Yes | Type: on-premise, cloud, edge |
| provider | string | No | Cloud provider: AWS, Azure, GCP (required for cloud) |
| location | string | No | Physical location or data center name |
| region | string | No | Cloud region (e.g., us-east-1) |
| capabilities | object | Yes | Hardware capabilities (GPU, CPU, memory) |
| supported_frameworks | array | No | ML frameworks: pytorch, tensorflow, sklearn, etc. |
| status | string | No | Initial status (default: available) |
| max_parallel_jobs | integer | No | Maximum concurrent training jobs |
| network_bandwidth_gbps | float | No | Network bandwidth in Gbps |
| cost_per_hour_usd | float | No | Cost per hour in USD (0.0 for on-premise) |
| auth_method | string | No | Authentication method: ssh_key, api_token, iam_role |
| latency_ms | integer | No | Network latency to Model Studio in milliseconds |
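The schema above can be checked on the client before a request is sent. The sketch below covers only the rules the table states: three required fields, the allowed resource_type values, and provider being required for cloud resources. `validate_registration` is a hypothetical helper, and the server's own validation may be stricter or differ in detail.

```python
# Required fields and their expected Python types, taken from the schema table.
REQUIRED_FIELDS = {"resource_id": str, "resource_type": str, "capabilities": dict}
RESOURCE_TYPES = {"on-premise", "cloud", "edge"}
PROVIDERS = {"AWS", "Azure", "GCP"}


def validate_registration(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing required field: {name}")
        elif not isinstance(payload[name], expected):
            errors.append(f"{name} must be of type {expected.__name__}")

    rtype = payload.get("resource_type")
    if isinstance(rtype, str) and rtype not in RESOURCE_TYPES:
        errors.append(f"unknown resource_type: {rtype}")
    # The schema marks provider as optional, but required for cloud resources.
    if rtype == "cloud" and payload.get("provider") not in PROVIDERS:
        errors.append("provider must be AWS, Azure, or GCP for cloud resources")
    return errors


on_prem_errors = validate_registration({
    "resource_id": "rivergen-gpu-cluster-02",
    "resource_type": "on-premise",
    "capabilities": {"gpu": True, "gpu_count": 4},
})
cloud_errors = validate_registration({
    "resource_id": "aws-us-east-1-p4d",
    "resource_type": "cloud",
    "capabilities": {"gpu": True, "gpu_count": 8},
})
```

Here `on_prem_errors` is empty, while `cloud_errors` reports the missing provider.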
Response:
{
"id": 5,
"resource_id": "rivergen-gpu-cluster-02",
"resource_type": "on-premise",
"provider": null,
"location": "RiverGen Data Center - Building B",
"region": null,
"capabilities": {
"gpu": true,
"gpu_count": 4,
"gpu_type": "NVIDIA V100",
"cpu_cores": 64,
"memory_gb": 512
},
"supported_frameworks": ["pytorch", "tensorflow", "sklearn"],
"status": "available",
"max_parallel_jobs": 2,
"network_bandwidth_gbps": 10.0,
"cost_per_hour_usd": 0.0,
"auth_method": "ssh_key",
"latency_ms": 5,
"created_at": "2026-01-30T18:00:00Z",
"updated_at": null
}
cURL Example:
curl -X POST "http://localhost:8000/api/model-studio/compute-resources" \
-H "Content-Type: application/json" \
-d '{
"resource_id": "rivergen-gpu-cluster-02",
"resource_type": "on-premise",
"location": "RiverGen Data Center - Building B",
"capabilities": {
"gpu": true,
"gpu_count": 4,
"gpu_type": "NVIDIA V100",
"cpu_cores": 64,
"memory_gb": 512
},
"supported_frameworks": ["pytorch", "tensorflow", "sklearn"],
"status": "available",
"max_parallel_jobs": 2
}'
Each resource_id must be globally unique across all registered resources. Registration will fail with 400 Bad Request if a duplicate ID is detected.
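The same registration call can be made from Python using only the standard library. This is a minimal sketch: `build_register_request` and `register_resource` are hypothetical helper names, the base URL is the localhost address from the cURL example, and the error handling assumes that a 400 response most likely signals a duplicate resource_id, as described above.

```python
import json
import urllib.error
import urllib.request


def build_register_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for registering a resource."""
    return urllib.request.Request(
        url=f"{base_url}/api/model-studio/compute-resources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_resource(base_url: str, payload: dict) -> dict:
    """Send the registration and return the created resource as a dict."""
    request = build_register_request(base_url, payload)
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # Assumption: 400 here usually means the resource_id already exists.
            raise ValueError(
                f"registration rejected, possibly a duplicate resource_id: "
                f"{payload.get('resource_id')}"
            ) from err
        raise


payload = {
    "resource_id": "rivergen-gpu-cluster-02",
    "resource_type": "on-premise",
    "capabilities": {"gpu": True, "gpu_count": 4, "cpu_cores": 64, "memory_gb": 512},
}
request = build_register_request("http://localhost:8000", payload)
```

Calling `register_resource("http://localhost:8000", payload)` would send the request; it is left out here so the sketch runs without a server.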
List Compute Resources
Retrieve all registered compute resources with optional filtering.
Endpoint: GET /api/model-studio/compute-resources
Status Code: 200 OK
Query Parameters:
| Parameter | Type | Description |
|---|---|---|
| status_filter | string | Filter by status: available, busy, maintenance, unavailable |
| resource_type | string | Filter by type: on-premise, cloud, edge |
| provider | string | Filter by cloud provider: AWS, Azure, GCP |
Example Request:
curl "http://localhost:8000/api/model-studio/compute-resources?status_filter=available&resource_type=on-premise"
Response:
[
{
"id": 1,
"resource_id": "rivergen-gpu-cluster-01",
"resource_type": "on-premise",
"status": "available",
"capabilities": {
"gpu": true,
"gpu_count": 8,
"cpu_cores": 128,
"memory_gb": 1024
},
"max_parallel_jobs": 4,
"cost_per_hour_usd": 0.0
},
{
"id": 5,
"resource_id": "rivergen-gpu-cluster-02",
"resource_type": "on-premise",
"status": "available",
"capabilities": {
"gpu": true,
"gpu_count": 4,
"cpu_cores": 64,
"memory_gb": 512
},
"max_parallel_jobs": 2,
"cost_per_hour_usd": 0.0
}
]
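The query parameters can be assembled programmatically rather than by hand. The sketch below builds the list URL from optional filters and omits any that are not set. `list_resources_url` is a hypothetical helper, and the base URL is the localhost address used in the example request.

```python
from typing import Optional
from urllib.parse import urlencode


def list_resources_url(
    base_url: str,
    status_filter: Optional[str] = None,
    resource_type: Optional[str] = None,
    provider: Optional[str] = None,
) -> str:
    """Return the list endpoint URL with only the filters that were provided."""
    params = {
        "status_filter": status_filter,
        "resource_type": resource_type,
        "provider": provider,
    }
    # Drop unset filters so they are not sent as empty query parameters.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url}/api/model-studio/compute-resources"
    return f"{url}?{query}" if query else url


url = list_resources_url(
    "http://localhost:8000", status_filter="available", resource_type="on-premise"
)
```

For these arguments the result matches the URL in the example request.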
Update Compute Resource
Update the status or configuration of an existing compute resource.
Endpoint: PUT /api/model-studio/compute-resources/{resource_id}
Status Code: 200 OK
Example - Mark Resource for Maintenance:
curl -X PUT "http://localhost:8000/api/model-studio/compute-resources/rivergen-gpu-cluster-02" \
-H "Content-Type: application/json" \
-d '{
"status": "maintenance"
}'
- available: Ready to accept new training jobs
- busy: Currently executing jobs (auto-managed by scheduler)
- maintenance: Temporarily unavailable for scheduled maintenance
- unavailable: Permanently offline or decommissioned
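A status update can be guarded on the client so that only the four documented values are ever sent. The sketch below builds the PUT request without sending it. `build_status_update` is a hypothetical helper name, and the base URL is the localhost address from the cURL example.

```python
import json
import urllib.request
from urllib.parse import quote

# The four status values documented for compute resources.
VALID_STATUSES = {"available", "busy", "maintenance", "unavailable"}


def build_status_update(
    base_url: str, resource_id: str, status: str
) -> urllib.request.Request:
    """Build (but do not send) a PUT request that changes a resource's status."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return urllib.request.Request(
        # Percent-encode the ID so it is safe as a single path segment.
        url=f"{base_url}/api/model-studio/compute-resources/{quote(resource_id, safe='')}",
        data=json.dumps({"status": status}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


update = build_status_update(
    "http://localhost:8000", "rivergen-gpu-cluster-02", "maintenance"
)
```

Passing the returned request to `urllib.request.urlopen` would perform the update. Because busy is managed by the scheduler, a stricter client might refuse to set it manually; that choice is left open here.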
Resource Allocation Strategy
Automatic Selection (compute_preference: "auto")
When a task intent specifies "compute_preference": "auto", Model Studio uses the following decision tree:
Manual Selection
Users can explicitly specify compute preferences:
{
"task_name": "High-Priority Training",
"ml_task": "classification",
"compute_preference": "cloud",
"constraints": {
"max_cost_per_hour_usd": 50.0,
"preferred_provider": "AWS",
"required_gpu_count": 8
}
}
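To make the effect of these constraints concrete, the sketch below checks a registered resource against a task intent. It is an illustration under assumptions: `satisfies_constraints` is a hypothetical helper, compute_preference values are assumed to match resource_type values, and preferred_provider is treated as a hard requirement even though Model Studio may treat it as a soft preference.

```python
def satisfies_constraints(resource: dict, intent: dict) -> bool:
    """Return True if the resource meets the intent's compute preference and constraints."""
    constraints = intent.get("constraints", {})

    # Assumption: a non-"auto" preference must equal the resource_type.
    preference = intent.get("compute_preference", "auto")
    if preference != "auto" and resource.get("resource_type") != preference:
        return False

    max_cost = constraints.get("max_cost_per_hour_usd")
    if max_cost is not None and resource.get("cost_per_hour_usd", 0.0) > max_cost:
        return False

    # Assumption: preferred_provider is enforced strictly in this sketch.
    provider = constraints.get("preferred_provider")
    if provider is not None and resource.get("provider") != provider:
        return False

    required_gpus = constraints.get("required_gpu_count", 0)
    return resource.get("capabilities", {}).get("gpu_count", 0) >= required_gpus


intent = {
    "task_name": "High-Priority Training",
    "ml_task": "classification",
    "compute_preference": "cloud",
    "constraints": {
        "max_cost_per_hour_usd": 50.0,
        "preferred_provider": "AWS",
        "required_gpu_count": 8,
    },
}
aws_p4d = {
    "resource_id": "aws-us-east-1-p4d", "resource_type": "cloud", "provider": "AWS",
    "capabilities": {"gpu_count": 8}, "cost_per_hour_usd": 32.77,
}
on_prem = {
    "resource_id": "rivergen-gpu-cluster-01", "resource_type": "on-premise",
    "capabilities": {"gpu_count": 8}, "cost_per_hour_usd": 0.0,
}
aws_ok = satisfies_constraints(aws_p4d, intent)
on_prem_ok = satisfies_constraints(on_prem, intent)
```

With the intent shown, the AWS instance qualifies (32.77 USD per hour is under the 50.0 limit and it has 8 GPUs), while the on-premise cluster is excluded by the cloud preference.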
Best Practices
1. Resource Registration
- ✅ Register all available compute resources during initial setup
- ✅ Use descriptive resource_id values (e.g., aws-us-east-1-p4d-01)
- ✅ Keep capabilities metadata accurate and up-to-date
- ✅ Set realistic max_parallel_jobs based on resource capacity
2. Cost Optimization
- 💰 Use on-premise resources for sustained, predictable workloads
- 💰 Reserve cloud resources for burst capacity and experimentation
- 💰 Set max_cost_per_hour_usd constraints in task intents
- 💰 Monitor actual vs. estimated training times to refine cost models
3. Monitoring & Maintenance
- 🔧 Regularly update resource status (e.g., mark for maintenance)
- 🔧 Monitor resource utilization via the Resource Monitor
- 🔧 Remove decommissioned resources or mark them as unavailable
- 🔧 Validate framework compatibility before major upgrades
Related Documentation
- Model Studio Overview - High-level architecture
- API Endpoints - Complete API reference
- Implementation Deep-Dive - Technical internals