Create Data Source

Sprint 3

Create a new data source connection.

Endpoint

POST /api/v1/data-sources

Headers

| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <access_token> |
| Content-Type | Yes | application/json |

Request Body

{
  "workspace_id": 1,
  "name": "Production PostgreSQL",
  "type": "postgresql",
  "description": "Main production database",
  "host": "db.example.com",
  "port": 5432,
  "database_name": "production",
  "schema_name": "public",
  "connection_string": null,
  "file_path": null,
  "use_ssh_tunnel": false,
  "ssh_host": null,
  "ssh_port": 22,
  "ssh_username": null,
  "ssh_local_port": null,
  "connection_pool_size": 5,
  "connection_timeout": 30,
  "query_timeout": 300,
  "max_retries": 3,
  "schema_auto_refresh": true,
  "schema_refresh_interval": 3600,
  "credentials": {
    "username": "dbuser",
    "password": "securepassword",
    "api_key": null,
    "oauth_token": null,
    "ssh_private_key": null,
    "ssh_passphrase": null,
    "warehouse": null,
    "role": null
  },
  "metadata": {},
  "tags": ["production", "database"]
}

Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| workspace_id | integer | No | Workspace ID |
| name | string | Yes | Data source name (1-255 characters, cannot be empty) |
| type | string | Yes | Data source type (see DataSourceTypeEnum) |
| description | string | No | Description |
| host | string | No | Database host (required for database types unless connection_string is provided) |
| port | integer | No | Database port (required for database types unless connection_string is provided) |
| database_name | string | No | Database name (required for database types unless connection_string is provided) |
| schema_name | string | No | Schema name |
| connection_string | string | No | Full connection string (alternative to host/port/database_name) |
| file_path | string | No | Object path in storage for file-based sources (e.g., "organizations/1/files/1/file.csv"). Required for file-based data sources. Use the name field from the storage API response. |
| storage_file_id | integer | No | Deprecated and ignored. Use file_path instead. |
| use_ssh_tunnel | boolean | No | Use SSH tunnel (default: false) |
| ssh_host | string | No | SSH tunnel host |
| ssh_port | integer | No | SSH tunnel port (default: 22) |
| ssh_username | string | No | SSH username |
| ssh_local_port | integer | No | SSH local port (auto-assigned if not provided) |
| connection_pool_size | integer | No | Connection pool size (default: 5) |
| connection_timeout | integer | No | Connection timeout in seconds (default: 30) |
| query_timeout | integer | No | Query timeout in seconds (default: 300) |
| max_retries | integer | No | Maximum retry attempts (default: 3) |
| schema_auto_refresh | boolean | No | Enable automatic schema refresh (default: true) |
| schema_refresh_interval | integer | No | Schema refresh interval in seconds (default: 3600) |
| credentials | object | Yes | Connection credentials (see DataSourceCredentials) |
| metadata | object | No | Additional metadata |
| tags | array[string] | No | Tags for categorization |
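
Because only name, type, and credentials are marked as required, a client can send a compact body and rely on the documented defaults for the rest. The following is a minimal sketch rather than part of the API: build_payload is a hypothetical helper, and it assumes that omitting an optional field is equivalent to sending null (the curl example on this page omits such fields, but the page does not state this explicitly).

```python
import json
from typing import Any


def build_payload(name: str, type_: str, credentials: dict[str, Any],
                  **optional: Any) -> dict[str, Any]:
    """Assemble a request body, leaving out optional fields that are None."""
    payload: dict[str, Any] = {
        "name": name,
        "type": type_,
        "credentials": credentials,
    }
    # Assumption: an omitted field and a null field are treated the same.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


body = build_payload(
    "Production PostgreSQL",
    "postgresql",
    {"username": "dbuser", "password": "securepassword"},
    host="db.example.com",
    port=5432,
    database_name="production",
    ssh_host=None,  # None, so it is dropped from the body
)
print(json.dumps(body, indent=2))
```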

DataSourceTypeEnum Values

  • SQL: postgresql, mysql, mariadb, sqlserver, oracle, snowflake, bigquery, redshift
  • NoSQL: mongodb, elasticsearch, redis, cassandra, dynamodb
  • Files: csv, excel, json, parquet, orc, delta_lake, iceberg, hudi, s3
  • Other: http_api, other
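
A client may want to reject unknown type values before calling the endpoint. The sketch below mirrors the four groups listed above; the lowercase category keys and the category_of helper are illustrative names, not identifiers defined by the API.

```python
# Type values copied from the DataSourceTypeEnum list; category keys are hypothetical.
DATA_SOURCE_TYPES = {
    "sql": {"postgresql", "mysql", "mariadb", "sqlserver", "oracle",
            "snowflake", "bigquery", "redshift"},
    "nosql": {"mongodb", "elasticsearch", "redis", "cassandra", "dynamodb"},
    "files": {"csv", "excel", "json", "parquet", "orc", "delta_lake",
              "iceberg", "hudi", "s3"},
    "other": {"http_api", "other"},
}


def category_of(type_: str) -> str:
    """Return the group a type value belongs to, or raise ValueError."""
    for category, members in DATA_SOURCE_TYPES.items():
        if type_ in members:
            return category
    raise ValueError(f"unknown data source type: {type_!r}")


print(category_of("snowflake"))  # sql
```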

DataSourceCredentials Object

| Field | Type | Required | Description |
|---|---|---|---|
| username | string | No | Database username |
| password | string | No | Database password |
| api_key | string | No | API key for API-based sources |
| oauth_token | string | No | OAuth token |
| ssh_private_key | string | No | SSH private key for tunnel |
| ssh_passphrase | string | No | SSH key passphrase |
| warehouse | string | No | Snowflake warehouse |
| role | string | No | Snowflake role |
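
One way to model this object on the client side is a small dataclass that keeps secrets out of logs. This is only a sketch: the Credentials class and its to_payload method are hypothetical, the warehouse and role values are made up for illustration, and dropping None fields assumes the API treats omitted and null fields alike.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Credentials:
    username: Optional[str] = None
    # repr=False keeps secret values out of repr() and therefore out of most logs.
    password: Optional[str] = field(default=None, repr=False)
    api_key: Optional[str] = field(default=None, repr=False)
    oauth_token: Optional[str] = field(default=None, repr=False)
    ssh_private_key: Optional[str] = field(default=None, repr=False)
    ssh_passphrase: Optional[str] = field(default=None, repr=False)
    warehouse: Optional[str] = None
    role: Optional[str] = None

    def to_payload(self) -> dict:
        """Return only the fields that were set, ready for the credentials key."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Hypothetical Snowflake credentials; warehouse and role names are placeholders.
snowflake = Credentials(username="dbuser", password="securepassword",
                        warehouse="ANALYTICS_WH", role="ANALYST")
print(snowflake)               # password is not shown
print(snowflake.to_payload())  # password is included for the request
```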

Response

Success (201)

{
  "success": true,
  "data": {
    "id": 1,
    "organization_id": 1,
    "workspace_id": 1,
    "name": "Production PostgreSQL",
    "type": "postgresql",
    "description": "Main production database",
    "host": "db.example.com",
    "port": 5432,
    "database_name": "production",
    "schema_name": "public",
    "file_path": null,
    "use_ssh_tunnel": false,
    "ssh_host": null,
    "ssh_port": null,
    "ssh_username": null,
    "ssh_key_id": null,
    "ssh_local_port": null,
    "connection_pool_size": 5,
    "connection_timeout": 30,
    "query_timeout": 300,
    "max_retries": 3,
    "status": "inactive",
    "last_tested_at": null,
    "last_successful_connection_at": null,
    "last_failed_connection_at": null,
    "failure_count": 0,
    "failure_reason": null,
    "schema_discovered_at": null,
    "schema_auto_refresh": true,
    "schema_refresh_interval": 3600,
    "metadata": {},
    "tags": ["production", "database"],
    "created_by_user_id": 1,
    "created_at": "2024-12-01T08:00:00Z",
    "updated_at": "2024-12-01T08:00:00Z",
    "updated_by_user_id": null
  },
  "message": "Data source created successfully"
}
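
The response wraps the created record in a success/data/message envelope, and the sample shows a new data source starting with status "inactive" and no connection test yet. A minimal sketch of unpacking it follows; parse_created is a hypothetical helper, and the handling of a success value of false is an assumption, since this page does not show a failure body.

```python
import json
from datetime import datetime


def parse_created(body: str) -> dict:
    """Unwrap the envelope and convert the timestamps to aware datetimes."""
    envelope = json.loads(body)
    if not envelope.get("success"):
        # Assumption: failures also carry a message field.
        raise RuntimeError(envelope.get("message", "request failed"))
    data = envelope["data"]
    for key in ("created_at", "updated_at"):
        # Replace the trailing Z so fromisoformat also works before Python 3.11.
        data[key] = datetime.fromisoformat(data[key].replace("Z", "+00:00"))
    return data


# Trimmed copy of the sample response above.
sample = json.dumps({
    "success": True,
    "data": {
        "id": 1,
        "status": "inactive",
        "last_tested_at": None,
        "created_at": "2024-12-01T08:00:00Z",
        "updated_at": "2024-12-01T08:00:00Z",
    },
    "message": "Data source created successfully",
})
created = parse_created(sample)
print(created["id"], created["status"], created["created_at"].isoformat())
```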

Error Codes

| Status | Code | Description |
|---|---|---|
| 400 | BAD_REQUEST | Invalid request data or validation error |
| 401 | UNAUTHORIZED | Invalid or missing authentication token |
| 403 | FORBIDDEN | User is not a member of any organization |
| 409 | CONFLICT | Data source with this name already exists |
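
Client code can translate these statuses into a single exception type. The sketch below relies only on the HTTP status, because the shape of the error response body is not documented on this page; DataSourceApiError and check_status are hypothetical names.

```python
class DataSourceApiError(Exception):
    """Raised for any status other than 201 (hypothetical client-side type)."""

    def __init__(self, status: int, code: str, description: str) -> None:
        super().__init__(f"{status} {code}: {description}")
        self.status = status
        self.code = code


# Copied from the error table above.
ERRORS = {
    400: ("BAD_REQUEST", "Invalid request data or validation error"),
    401: ("UNAUTHORIZED", "Invalid or missing authentication token"),
    403: ("FORBIDDEN", "User is not a member of any organization"),
    409: ("CONFLICT", "Data source with this name already exists"),
}


def check_status(status: int) -> None:
    """Return quietly on 201, otherwise raise with the documented code."""
    if status == 201:
        return
    code, description = ERRORS.get(status, ("UNKNOWN", "Undocumented status"))
    raise DataSourceApiError(status, code, description)
```

A caller might catch the exception and branch on its code attribute, for example to prompt for a different name when the code is CONFLICT.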

Validations

  • name must be 1-255 characters and cannot be empty (whitespace is trimmed)
  • file_path must be a valid storage path format (e.g., "organizations/1/files/1/file.csv")
  • file_path cannot be empty if provided
  • storage_file_id is deprecated and ignored if provided
  • For file-based data sources, file_path is required
  • For database types, either host/port/database_name or connection_string is required
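
The rules above can be checked on the client before sending a request. This is a sketch under stated assumptions, not the server's implementation: validate is a hypothetical helper, "database types" is taken to mean the SQL and NoSQL groups, "file-based" is taken to mean the Files group, and the storage path format is not checked because its exact pattern is not specified here.

```python
# Assumption: database types are the SQL and NoSQL groups of DataSourceTypeEnum.
DATABASE_TYPES = {
    "postgresql", "mysql", "mariadb", "sqlserver", "oracle", "snowflake",
    "bigquery", "redshift", "mongodb", "elasticsearch", "redis",
    "cassandra", "dynamodb",
}
# Assumption: file-based types are the Files group.
FILE_TYPES = {"csv", "excel", "json", "parquet", "orc", "delta_lake",
              "iceberg", "hudi", "s3"}


def validate(payload: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the body passes."""
    errors = []
    name = (payload.get("name") or "").strip()  # whitespace is trimmed
    if not 1 <= len(name) <= 255:
        errors.append("name must be 1-255 characters")
    type_ = payload.get("type")
    file_path = payload.get("file_path")
    if file_path is not None and not file_path.strip():
        errors.append("file_path cannot be empty if provided")
    if type_ in FILE_TYPES and not file_path:
        errors.append("file_path is required for file-based data sources")
    if type_ in DATABASE_TYPES:
        has_parts = all(payload.get(k) is not None
                        for k in ("host", "port", "database_name"))
        if not has_parts and not payload.get("connection_string"):
            errors.append("host/port/database_name or connection_string is required")
    return errors


print(validate({"name": "Sales export", "type": "csv"}))
```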

Features

  • Supports all data source types (SQL, NoSQL, files, HTTP API, and other)
  • SSH tunnel configuration
  • Connection pooling configuration
  • Credentials are encrypted at rest
  • The organization is assigned automatically
  • File-based sources require a file_path obtained from the storage API
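
As an illustration of the SSH tunnel option, the sketch below adds the tunnel fields from the parameter and credentials tables to an existing request body. It assumes key-based authentication and leaves ssh_local_port unset so that it is auto-assigned; which SSH fields the server requires when use_ssh_tunnel is true is not stated on this page. The with_ssh_tunnel helper, the bastion host name, and the key placeholder are hypothetical.

```python
import copy
from typing import Any, Optional


def with_ssh_tunnel(payload: dict[str, Any], ssh_host: str, ssh_username: str,
                    ssh_private_key: str, ssh_port: int = 22,
                    ssh_passphrase: Optional[str] = None) -> dict[str, Any]:
    """Return a copy of payload with the SSH tunnel fields filled in."""
    tunneled = copy.deepcopy(payload)  # leave the caller's dict untouched
    tunneled.update(
        use_ssh_tunnel=True,
        ssh_host=ssh_host,
        ssh_port=ssh_port,
        ssh_username=ssh_username,
    )
    credentials = tunneled.setdefault("credentials", {})
    credentials["ssh_private_key"] = ssh_private_key
    if ssh_passphrase is not None:
        credentials["ssh_passphrase"] = ssh_passphrase
    return tunneled


base = {
    "name": "Production PostgreSQL",
    "type": "postgresql",
    "host": "db.example.com",
    "port": 5432,
    "database_name": "production",
    "credentials": {"username": "dbuser", "password": "securepassword"},
}
tunneled = with_ssh_tunnel(base, "bastion.example.com", "tunneluser",
                           "<private key contents>")
print(sorted(tunneled["credentials"]))
```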

Example

curl -X POST "https://api.rivergen.com/api/v1/data-sources" \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production PostgreSQL",
    "type": "postgresql",
    "host": "db.example.com",
    "port": 5432,
    "database_name": "production",
    "credentials": {
      "username": "dbuser",
      "password": "securepassword"
    }
  }'
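
The same request can be prepared from Python using only the standard library. This sketch builds the request object without sending it; build_request is a hypothetical helper, and the access token is a placeholder that must be replaced with a real one.

```python
import json
import urllib.request

API_URL = "https://api.rivergen.com/api/v1/data-sources"


def build_request(access_token: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request with the two required headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request("<access_token>", {
    "name": "Production PostgreSQL",
    "type": "postgresql",
    "host": "db.example.com",
    "port": 5432,
    "database_name": "production",
    "credentials": {"username": "dbuser", "password": "securepassword"},
})
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen sends it. Note that urlopen raises urllib.error.HTTPError for 4xx responses, so the documented error statuses arrive as exceptions rather than return values.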