Staging Service
The Staging Service handles data staging operations for Model Studio, converting data sources to Parquet format for machine learning consumption.
Overview
The Staging Service is designed to:
- Extract data from data sources
- Convert data to Parquet format
- Stage data in chunks for Model Studio consumption
- Manage staging jobs with Temporal workflows
- Provide presigned URLs for chunk access
Note: This service is currently under development. All endpoints return 501 Not Implemented.
Service Architecture Flow Diagram
Staging Service Flow Overview:
The flow diagram covers the full staging workflow: job creation, data extraction and Parquet conversion, chunk management with presigned URLs, and cleanup. Temporal workflows handle the job lifecycle throughout.
Key Flow Components:
- Job Creation: Model Studio creates staging jobs with data source parameters
- Temporal Workflow: The service starts a Temporal workflow for each job so that execution is reliable
- Data Extraction: Connects to data source and extracts data in chunks
- Parquet Conversion: Converts data to Parquet format with compression
- Chunk Management: Uploads chunks to MinIO/S3 and generates presigned URLs
- Job Control: Supports pause, resume, and cancel operations
- Chunk Consumption: Model Studio consumes chunks via presigned URLs
- Cleanup: Chunks are cleaned up automatically when their TTL expires or the job completes
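The job control operations listed above (pause, resume, cancel) imply a small state machine. The following is a minimal sketch of one way to model it; the state names, the action names, and the `apply_action` helper are assumptions made for illustration and are not taken from the service, which does not yet define its job states.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical state names; the service does not document its states.
    PENDING = "pending"
    RUNNING = "running"
    PAUSED = "paused"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


# Allowed transitions, keyed by (current state, action).
TRANSITIONS = {
    (JobState.PENDING, "start"): JobState.RUNNING,
    (JobState.RUNNING, "pause"): JobState.PAUSED,
    (JobState.PAUSED, "resume"): JobState.RUNNING,
    (JobState.RUNNING, "complete"): JobState.COMPLETED,
    (JobState.PENDING, "cancel"): JobState.CANCELLED,
    (JobState.RUNNING, "cancel"): JobState.CANCELLED,
    (JobState.PAUSED, "cancel"): JobState.CANCELLED,
}


def apply_action(state: JobState, action: str) -> JobState:
    """Return the next state, or raise ValueError for an invalid action."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a job in state {state.value}") from None
```

Keeping the transitions in a single table makes it easy to reject invalid requests, such as pausing a job that has already completed, before any workflow signal is sent.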
Base Path
All endpoints are prefixed with /api/v1/staging.
Base URL: http://staging-service:8003 (internal) or http://localhost:8003 (development)
Authentication
The service uses header-based authentication:
X-Org-ID: <organization_id> (expected on every request; defaults to 1 if omitted)
X-User-ID: <user_id> (optional)
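As an illustration, a client could assemble these headers as follows. This is a sketch only: the `staging_headers` helper is hypothetical, and it assumes both IDs are integers, which the service does not specify.

```python
from typing import Dict, Optional


def staging_headers(org_id: int = 1, user_id: Optional[int] = None) -> Dict[str, str]:
    """Build the authentication headers described above.

    The org_id default of 1 mirrors the documented default; integer IDs
    are an assumption.
    """
    headers = {"X-Org-ID": str(org_id)}
    if user_id is not None:
        # X-User-ID is optional, so it is only sent when provided.
        headers["X-User-ID"] = str(user_id)
    return headers
```

For example, `staging_headers(42, user_id=7)` yields both headers, while `staging_headers()` sends only `X-Org-ID: 1`.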
Endpoints
Job Management
| Method | Endpoint | Description | Status |
|---|---|---|---|
| POST | /api/v1/staging/jobs | Create a new data staging job | Not Implemented |
| GET | /api/v1/staging/jobs | List staging jobs for the organization | Not Implemented |
| GET | /api/v1/staging/jobs/{job_id} | Get status and progress of a staging job | Not Implemented |
| DELETE | /api/v1/staging/jobs/{job_id} | Cancel a staging job and clean up its resources | Not Implemented |
Job Control
| Method | Endpoint | Description | Status |
|---|---|---|---|
| POST | /api/v1/staging/jobs/{job_id}/pause | Pause a running staging job | Not Implemented |
| POST | /api/v1/staging/jobs/{job_id}/resume | Resume a paused staging job | Not Implemented |
Chunk Management
| Method | Endpoint | Description | Status |
|---|---|---|---|
| POST | /api/v1/staging/jobs/{job_id}/chunks/{chunk_id}/ack | Acknowledge consumption of a chunk | Not Implemented |
| GET | /api/v1/staging/jobs/{job_id}/chunks/{chunk_id}/refresh | Refresh the presigned URL for a chunk | Not Implemented |
Total: 8 endpoints (none implemented yet)
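The endpoint tables above can be captured in a small route map for client code. The sketch below is illustrative: the route names (`create_job`, `ack_chunk`, and so on) and the `build_request` helper are hypothetical, while the methods, paths, and development base URL come from this document.

```python
from urllib.parse import quote

BASE_PATH = "/api/v1/staging"

# Hypothetical route names mapped to the documented (method, path) pairs.
ROUTES = {
    "create_job": ("POST", "/jobs"),
    "list_jobs": ("GET", "/jobs"),
    "get_job": ("GET", "/jobs/{job_id}"),
    "cancel_job": ("DELETE", "/jobs/{job_id}"),
    "pause_job": ("POST", "/jobs/{job_id}/pause"),
    "resume_job": ("POST", "/jobs/{job_id}/resume"),
    "ack_chunk": ("POST", "/jobs/{job_id}/chunks/{chunk_id}/ack"),
    "refresh_chunk_url": ("GET", "/jobs/{job_id}/chunks/{chunk_id}/refresh"),
}


def build_request(name, base_url="http://localhost:8003", **params):
    """Return (method, full URL) for a named route, URL-quoting path parameters."""
    method, template = ROUTES[name]
    quoted = {key: quote(str(value), safe="") for key, value in params.items()}
    return method, base_url + BASE_PATH + template.format(**quoted)
```

Calling `build_request("get_job", job_id="abc")` returns the method and URL to pass to any HTTP client.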
Status
Current Status: All endpoints return 501 Not Implemented
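Because every endpoint currently answers 501, callers may want to distinguish "not implemented yet" from other failures. The following is one possible sketch using only the Python standard library; the `call_staging` function, the `StagingNotImplemented` exception, and the injectable `opener` parameter are assumptions made for illustration.

```python
import urllib.error
import urllib.request


class StagingNotImplemented(Exception):
    """Raised when the service answers 501 Not Implemented."""


def call_staging(method, url, headers=None, opener=urllib.request.urlopen):
    """Send a request and translate a 501 response into StagingNotImplemented.

    `opener` defaults to urllib.request.urlopen and can be replaced in tests.
    """
    request = urllib.request.Request(url, method=method, headers=headers or {})
    try:
        with opener(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        if err.code == 501:
            raise StagingNotImplemented(f"{method} {url} is not implemented yet") from err
        raise  # Other HTTP errors propagate unchanged.
```

Client code can catch `StagingNotImplemented` and fall back or skip the call until the endpoint is available.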
Planned Implementation:
- Temporal workflow integration for job orchestration
- Parquet conversion pipeline
- Chunk-based data streaming
- Presigned URL generation for chunk access
- Job status tracking and progress monitoring
Internal Notes
- Service will use Temporal for workflow orchestration
- Data will be converted to Parquet format for ML consumption
- Chunks will be streamed to Model Studio with acknowledgment mechanism
- Presigned URLs will be generated for secure chunk access
- Jobs will support pause/resume functionality
- All endpoints are currently stubs returning 501 Not Implemented