Create Data Source
Sprint 3
Platform
Create a new data source connection.
Account Type & Use Case
Platform Account
The Platform APIs let data engineers create data source connections that provide unified access to databases, warehouses, files, and APIs without moving data. Use this endpoint to onboard a new data source quickly, with automatic schema discovery and health monitoring.
Endpoint
POST /api/v1/data-sources
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <access_token> |
| Content-Type | Yes | application/json |
Request Body
{
  "workspace_id": 1,
  "name": "Production PostgreSQL",
  "type": "postgresql",
  "description": "Main production database",
  "host": "db.example.com",
  "port": 5432,
  "database_name": "production",
  "schema_name": "public",
  "connection_string": null,
  "file_path": null,
  "use_ssh_tunnel": false,
  "ssh_host": null,
  "ssh_port": 22,
  "ssh_username": null,
  "ssh_local_port": null,
  "connection_pool_size": 5,
  "connection_timeout": 30,
  "query_timeout": 300,
  "max_retries": 3,
  "schema_auto_refresh": true,
  "schema_refresh_interval": 3600,
  "credentials": {
    "username": "dbuser",
    "password": "securepassword",
    "api_key": null,
    "oauth_token": null,
    "ssh_private_key": null,
    "ssh_passphrase": null,
    "warehouse": null,
    "role": null
  },
  "metadata": {},
  "tags": ["production", "database"]
}
Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| workspace_id | integer | No | Workspace ID |
| name | string | Yes | Data source name (1-255 characters, cannot be empty) |
| type | string | Yes | Data source type (see DataSourceTypeEnum) |
| description | string | No | Description |
| host | string | No | Database host (required for database types) |
| port | integer | No | Database port (required for database types) |
| database_name | string | No | Database name (required for database types) |
| schema_name | string | No | Schema name |
| connection_string | string | No | Full connection string (alternative to host/port/database) |
| file_path | string | No | Object path in storage for file-based sources (e.g., "organizations/1/files/1/file.csv"). Required for file-based data sources. Use the 'name' field from the storage API response. |
| storage_file_id | integer | No | [DEPRECATED] Use file_path instead. This field is ignored. |
| use_ssh_tunnel | boolean | No | Use SSH tunnel (default: false) |
| ssh_host | string | No | SSH tunnel host |
| ssh_port | integer | No | SSH tunnel port (default: 22) |
| ssh_username | string | No | SSH username |
| ssh_local_port | integer | No | SSH local port (auto-assigned if not provided) |
| connection_pool_size | integer | No | Connection pool size (default: 5) |
| connection_timeout | integer | No | Connection timeout in seconds (default: 30) |
| query_timeout | integer | No | Query timeout in seconds (default: 300) |
| max_retries | integer | No | Maximum retry attempts (default: 3) |
| schema_auto_refresh | boolean | No | Enable automatic schema refresh (default: true) |
| schema_refresh_interval | integer | No | Schema refresh interval in seconds (default: 3600) |
| credentials | object | Yes | Connection credentials (see DataSourceCredentials) |
| metadata | object | No | Additional metadata |
| tags | array[string] | No | Tags for categorization |
DataSourceTypeEnum Values
- SQL: postgresql, mysql, mariadb, sqlserver, oracle, snowflake, bigquery, redshift
- NoSQL: mongodb, elasticsearch, redis, cassandra, dynamodb
- Files: csv, excel, json, parquet, orc, delta_lake, iceberg, hudi, s3
- Other: http_api, other
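When building a client, it can help to check the type value before sending the request. The following is a minimal Python sketch: the type names are copied from the list above, while the dictionary name `DATA_SOURCE_TYPES` and the helper `category_of` are hypothetical and not part of the API.

```python
# Type names come from the DataSourceTypeEnum list above.
# The grouping dict and helper are illustrative, not part of the API.
DATA_SOURCE_TYPES = {
    "sql": ["postgresql", "mysql", "mariadb", "sqlserver", "oracle",
            "snowflake", "bigquery", "redshift"],
    "nosql": ["mongodb", "elasticsearch", "redis", "cassandra", "dynamodb"],
    "files": ["csv", "excel", "json", "parquet", "orc", "delta_lake",
              "iceberg", "hudi", "s3"],
    "other": ["http_api", "other"],
}


def category_of(source_type: str) -> str:
    """Return the category a type belongs to, or raise ValueError if unknown."""
    for category, names in DATA_SOURCE_TYPES.items():
        if source_type in names:
            return category
    raise ValueError(f"unknown data source type: {source_type!r}")
```

Catching an unknown type locally avoids a round trip that would presumably end in a 400 response.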
DataSourceCredentials Object
| Field | Type | Required | Description |
|---|---|---|---|
| username | string | No | Database username |
| password | string | No | Database password |
| api_key | string | No | API key for API-based sources |
| oauth_token | string | No | OAuth token |
| ssh_private_key | string | No | SSH private key for tunnel |
| ssh_passphrase | string | No | SSH key passphrase |
| warehouse | string | No | Snowflake warehouse |
| role | string | No | Snowflake role |
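Because every credential field is optional, a client usually sends only the fields relevant to the source type. The sketch below is one way to assemble the object in Python. The helper `build_credentials` is hypothetical, and the Snowflake warehouse and role values are placeholders. Dropping unset fields instead of sending null follows the curl example in this document, which sends only username and password; that the API treats absent and null fields identically is an assumption.

```python
# Field names come from the DataSourceCredentials table above.
CREDENTIAL_FIELDS = {
    "username", "password", "api_key", "oauth_token",
    "ssh_private_key", "ssh_passphrase", "warehouse", "role",
}


def build_credentials(**fields) -> dict:
    """Build a credentials object, rejecting unknown keys and dropping None values."""
    unknown = set(fields) - CREDENTIAL_FIELDS
    if unknown:
        raise ValueError(f"unknown credential fields: {sorted(unknown)}")
    return {key: value for key, value in fields.items() if value is not None}


# Placeholder values for a Snowflake source (hypothetical warehouse and role).
snowflake_credentials = build_credentials(
    username="dbuser",
    password="securepassword",
    warehouse="EXAMPLE_WH",
    role="EXAMPLE_ROLE",
    api_key=None,  # dropped from the result because it is None
)
```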
Response
Success (201)
{
  "success": true,
  "data": {
    "id": 1,
    "organization_id": 1,
    "workspace_id": 1,
    "name": "Production PostgreSQL",
    "type": "postgresql",
    "description": "Main production database",
    "host": "db.example.com",
    "port": 5432,
    "database_name": "production",
    "schema_name": "public",
    "file_path": null,
    "use_ssh_tunnel": false,
    "ssh_host": null,
    "ssh_port": null,
    "ssh_username": null,
    "ssh_key_id": null,
    "ssh_local_port": null,
    "connection_pool_size": 5,
    "connection_timeout": 30,
    "query_timeout": 300,
    "max_retries": 3,
    "status": "inactive",
    "last_tested_at": null,
    "last_successful_connection_at": null,
    "last_failed_connection_at": null,
    "failure_count": 0,
    "failure_reason": null,
    "schema_discovered_at": null,
    "schema_auto_refresh": true,
    "schema_refresh_interval": 3600,
    "metadata": {},
    "tags": ["production", "database"],
    "created_by_user_id": 1,
    "created_at": "2024-12-01T08:00:00Z",
    "updated_at": "2024-12-01T08:00:00Z",
    "updated_by_user_id": null
  },
  "message": "Data source created successfully"
}
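In the sample response the new data source has status "inactive" and null values for last_tested_at and schema_discovered_at, which suggests a client should follow up with a connection test and schema discovery. The Python sketch below reads those fields from a trimmed copy of the response. The function `summarize_created` and its output keys are hypothetical, and treating a null timestamp as "not yet done" is an assumption based on the sample only.

```python
import json

# Trimmed copy of the success response shown above.
SAMPLE_RESPONSE = """
{
  "success": true,
  "data": {
    "id": 1,
    "name": "Production PostgreSQL",
    "status": "inactive",
    "last_tested_at": null,
    "schema_discovered_at": null
  },
  "message": "Data source created successfully"
}
"""


def summarize_created(body: dict) -> dict:
    """Extract the ID and follow-up hints from a create response."""
    if not body.get("success"):
        raise ValueError(body.get("message", "request failed"))
    data = body["data"]
    return {
        "id": data["id"],
        "status": data["status"],
        # Assumption: a null timestamp means the step has not run yet.
        "needs_connection_test": data["last_tested_at"] is None,
        "needs_schema_discovery": data["schema_discovered_at"] is None,
    }


summary = summarize_created(json.loads(SAMPLE_RESPONSE))
```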
Error Codes
| Status | Code | Description |
|---|---|---|
| 400 | BAD_REQUEST | Invalid request data or validation error |
| 401 | UNAUTHORIZED | Invalid or missing authentication token |
| 403 | FORBIDDEN | User is not a member of any organization |
| 409 | CONFLICT | Data source with this name already exists |
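A client can translate these statuses into a typed exception. The following Python sketch relies only on the HTTP status code, because the shape of the error response body is not documented here. The class `DataSourceApiError` and the function `raise_for_status` are hypothetical names.

```python
# Status codes and descriptions come from the Error Codes table above.
ERROR_CODES = {
    400: ("BAD_REQUEST", "Invalid request data or validation error"),
    401: ("UNAUTHORIZED", "Invalid or missing authentication token"),
    403: ("FORBIDDEN", "User is not a member of any organization"),
    409: ("CONFLICT", "Data source with this name already exists"),
}


class DataSourceApiError(Exception):
    """Raised when the create call returns a non-201 status."""

    def __init__(self, status: int, code: str, description: str):
        super().__init__(f"{status} {code}: {description}")
        self.status = status
        self.code = code
        self.description = description


def raise_for_status(status: int) -> None:
    """Return None on 201, otherwise raise DataSourceApiError."""
    if status == 201:
        return
    code, description = ERROR_CODES.get(
        status, ("UNKNOWN", "Unexpected status code")
    )
    raise DataSourceApiError(status, code, description)
```

A caller might catch the error and, for a 409, retry with a different name; a 401 usually means the access token should be refreshed first.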
Validations
- name must be 1-255 characters and cannot be empty (whitespace is trimmed)
- file_path must be a valid storage path format (e.g., "organizations/1/files/1/file.csv")
- file_path cannot be empty if provided
- storage_file_id is deprecated and ignored if provided
- For file-based data sources, file_path is required
- For database types, either host/port/database_name or connection_string is required
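These rules can be mirrored on the client to catch mistakes before the request is sent. The Python sketch below is not the server's implementation. It assumes that "file-based" means the Files group of DataSourceTypeEnum and that "database types" means the SQL and NoSQL groups. It does not check the storage path format, because this document gives only an example path and no formal pattern. The function `validate_payload` is a hypothetical name.

```python
# Assumption: file-based = Files group, database = SQL + NoSQL groups.
FILE_TYPES = {"csv", "excel", "json", "parquet", "orc", "delta_lake",
              "iceberg", "hudi", "s3"}
DATABASE_TYPES = {"postgresql", "mysql", "mariadb", "sqlserver", "oracle",
                  "snowflake", "bigquery", "redshift", "mongodb",
                  "elasticsearch", "redis", "cassandra", "dynamodb"}


def validate_payload(payload: dict) -> list[str]:
    """Return a list of rule violations; an empty list means no issues found."""
    errors = []

    # name: 1-255 characters after trimming whitespace.
    name = (payload.get("name") or "").strip()
    if not 1 <= len(name) <= 255:
        errors.append("name must be 1-255 characters after trimming")

    # file_path: may be omitted, but not provided as an empty string.
    file_path = payload.get("file_path")
    if file_path is not None and not file_path.strip():
        errors.append("file_path cannot be empty if provided")

    source_type = payload.get("type")
    if source_type in FILE_TYPES and not file_path:
        errors.append("file_path is required for file-based data sources")

    if source_type in DATABASE_TYPES:
        has_parts = all(payload.get(k) for k in ("host", "port", "database_name"))
        if not has_parts and not payload.get("connection_string"):
            errors.append(
                "database types need host/port/database_name or connection_string"
            )
    return errors
```

For example, `validate_payload({"name": "Sales CSV", "type": "csv"})` reports the missing file_path, while the PostgreSQL request body shown earlier passes without errors.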
Features
- Supports all data source types (SQL, NoSQL, Files, API)
- SSH tunnel configuration
- Connection pooling configuration
- Credentials are encrypted at rest
- Automatic organization assignment
- File-based sources require file_path from storage API
Example
curl -X POST "https://api.rivergen.com/api/v1/data-sources" \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production PostgreSQL",
    "type": "postgresql",
    "host": "db.example.com",
    "port": 5432,
    "database_name": "production",
    "credentials": {
      "username": "dbuser",
      "password": "securepassword"
    }
  }'
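The same request can be prepared from Python using only the standard library. This is a sketch rather than an official client: the function `build_create_request` is a hypothetical name, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.rivergen.com/api/v1/data-sources"


def build_create_request(payload: dict, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for creating a data source."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "name": "Production PostgreSQL",
    "type": "postgresql",
    "host": "db.example.com",
    "port": 5432,
    "database_name": "production",
    "credentials": {"username": "dbuser", "password": "securepassword"},
}
request = build_create_request(payload, "<access_token>")

# To send it:
#   with urllib.request.urlopen(request) as response:
#       body = json.load(response)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so the statuses in the Error Codes table would surface as exceptions rather than return values.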
Related Endpoints
- Test Connection - Test the connection after creation
- Discover Schema - Discover database schema
- List Data Sources - List all data sources