Custom Voice Management API Documentation
This document describes only the request and response formats of the custom voice management interfaces (upload/list/delete). It is intended as a reference for readers who already have development experience. For how to actually use custom voices in /v1/audio/speech, please refer to the section “Using Custom Voices (Optional)” in the OpenAI TTS API Call Documentation.
Overview
- Domain Example: https://api.umodelverse.ai
- Authentication Method: All interfaces require Authorization: Bearer <MODELVERSE_API_KEY> in the Header.
- Organization Isolation:
  - Custom voices are isolated by organization; all sub-accounts in the same organization can share the custom voices within that organization.
  - Custom voices cannot be shared between different organizations.
- Lifecycle: Custom voices are saved by default for 7 days, after which a background task will clean them up. For long-term storage needs, please contact the business team for evaluation.
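As a minimal sketch of the authentication rule above, the following Python snippet builds (but does not send) a request that carries the Bearer header. The helper name build_request, the placeholder key sk-example, and the choice of the standard-library urllib module are illustrative assumptions, not part of the API; the base URL is the domain example given above.

```python
import urllib.request

BASE_URL = "https://api.umodelverse.ai"  # domain example from this document


def build_request(path, api_key, method="GET", data=None, headers=None):
    """Build a urllib Request carrying the Bearer header (hypothetical helper)."""
    all_headers = {"Authorization": f"Bearer {api_key}"}
    all_headers.update(headers or {})
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=all_headers, method=method
    )


# "sk-example" is a placeholder key; nothing is sent until the request
# is passed to urllib.request.urlopen().
req = build_request("/v1/audio/voice/list", "sk-example")
```

Keeping the header construction in one place makes it harder to forget the Authorization header on any of the three interfaces.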
1. Upload Custom Voice
- HTTP Method: POST
- Path: /v1/audio/voice/upload
- Content-Type:
  - Recommended: multipart/form-data (direct file upload)
  - Also supported: Form fields passing Base64 strings or remote URLs
1.1 Request Parameters
Common Fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | The name of the voice, used for list display, e.g., “Gentle Female Voice”, “Customer Service Voice A”. |
| model | string | Yes | The corresponding TTS model name when using this voice, e.g., IndexTeam/IndexTTS-2. It should match the model in subsequent /v1/audio/speech requests. |
Speaker (Voice Sample Audio - Choose One, Required)
| Field | Type | Required | Description |
|---|---|---|---|
| speaker_file | file | Yes (choose one) | Local audio file (recommended method), uploaded via multipart/form-data. |
| speaker_file_base64 | string | Yes (choose one) | The Base64 string of speaker_file, passed through a standard form field. |
| speaker_url | string | Yes (choose one) | A publicly accessible URL pointing to the voice audio file. |
Notes:
- The three speaker_* fields are alternatives; at least one must be provided.
- If multiple are provided, the priority is: speaker_file → speaker_file_base64 → speaker_url.
- If none of the three is provided, the request will be rejected (error code: missing_speaker).
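When a direct file upload is not convenient, the speaker sample can be sent through speaker_file_base64 instead. The sketch below shows one way to prepare that field in Python. It assumes the server expects plain standard Base64 without a data: URI prefix, which this document does not state explicitly; the helper name speaker_base64_field is hypothetical.

```python
import base64


def speaker_base64_field(audio_bytes: bytes) -> dict:
    """Return the form field carrying the speaker sample as Base64 text."""
    # Assumption: plain standard Base64, no "data:" prefix and no line breaks.
    return {"speaker_file_base64": base64.b64encode(audio_bytes).decode("ascii")}


# Placeholder bytes for illustration; in practice read the sample with
# open(path, "rb").read().
field = speaker_base64_field(b"RIFF....WAVE")
```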
Emotion (Emotion Sample Audio - Choose One, Optional)
| Field | Type | Required | Description |
|---|---|---|---|
| emotion_file | file | No (choose one) | Emotion sample audio file, uploaded via multipart/form-data. |
| emotion_file_base64 | string | No (choose one) | The Base64 string of emotion_file, passed through a standard form field. |
| emotion_url | string | No (choose one) | A publicly accessible URL pointing to the emotion sample audio file. |
Notes:
- The emotion_* fields are entirely optional and can be omitted.
- If multiple are provided, the priority is: emotion_file → emotion_file_base64 → emotion_url.
- If none of the emotion_* fields is provided, the voice characteristics will be constructed based solely on the Speaker.
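The priority rules for the speaker_* and emotion_* groups can be mirrored on the client to predict which field the server will use. The following is a sketch only: effective_source is a hypothetical helper, and the real resolution happens on the server.

```python
PRIORITY = ("file", "file_base64", "url")  # order stated in the notes above


def effective_source(fields: dict, prefix: str, required: bool):
    """Return the field name that takes priority for a group, or None."""
    for suffix in PRIORITY:
        key = f"{prefix}_{suffix}"
        if fields.get(key):
            return key
    if required:
        # Mirrors the documented rejection when no speaker sample is given.
        raise ValueError(f"missing_{prefix}")
    return None


fields = {
    "speaker_url": "https://example.com/a.wav",  # placeholder URL
    "speaker_file_base64": "UklGRg==",           # placeholder Base64 text
}
chosen_speaker = effective_source(fields, "speaker", required=True)
chosen_emotion = effective_source(fields, "emotion", required=False)
```

Here chosen_speaker is speaker_file_base64, because it outranks speaker_url, and chosen_emotion is None, because the emotion group is optional.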
1.2 Audio File Constraints
The following constraints apply to the uploaded speaker_* and emotion_* audio:
- Format: Only MP3 and WAV formats are supported.
- Size: Each audio file should be ≤ 20MB.
- Duration: 5–30 seconds.
- Sample Rate: 16kHz or higher.
If any of the above conditions are not met, the interface will return a 4xx error with the specific reason indicated in the error.code (e.g., file_too_large, duration_out_of_range, sample_rate_too_low).
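Checking a sample locally before uploading can save a round trip. The sketch below applies the constraints above to WAV data using Python's standard wave module. It makes two assumptions: 20MB is read as 20 × 1024 × 1024 bytes, and only WAV is inspected (MP3 files would need a separate check). The function names are hypothetical, and the server's own validation remains authoritative.

```python
import io
import wave

MAX_BYTES = 20 * 1024 * 1024  # assumption: "20MB" read as 20 MiB
MIN_SECONDS, MAX_SECONDS = 5.0, 30.0
MIN_SAMPLE_RATE = 16_000


def check_wav(data: bytes) -> list:
    """Return the documented error codes these WAV bytes would likely trigger."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file_too_large")
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            rate = wav.getframerate()
            seconds = wav.getnframes() / rate if rate else 0.0
    except (wave.Error, EOFError):
        # Not parseable as WAV; MP3 input is outside the scope of this sketch.
        return problems + ["unsupported_audio_format"]
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append("duration_out_of_range")
    if rate < MIN_SAMPLE_RATE:
        problems.append("sample_rate_too_low")
    return problems


def make_silent_wav(seconds: float, rate: int) -> bytes:
    """Generate mono 16-bit silence, used here only to exercise check_wav."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()
```

For example, check_wav(make_silent_wav(6, 16000)) returns an empty list, while a 6-second sample at 8000 Hz yields sample_rate_too_low.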
1.3 Request Example (Recommended: multipart file upload)
curl -X POST "https://api.umodelverse.ai/v1/audio/voice/upload" \
-H "Authorization: Bearer $MODELVERSE_API_KEY" \
-H "Content-Type: multipart/form-data" \
-F "name=Gentle Female Voice" \
-F "model=IndexTeam/IndexTTS-2" \
-F "speaker_file=@/path/to/speaker.wav" \
-F "emotion_file=@/path/to/emotion.wav"
1.4 Successful Response Example
{
"id": "uspeech:xxxx-xxxx-xxxx-xxxx"
}
- id: Custom voice ID, to be referenced in subsequent /v1/audio/speech requests using the voice field (e.g., "voice": "uspeech:xxxx-xxxx-xxxx-xxxx").
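For readers not using curl, the upload and the extraction of id can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions, not an official client: the function names are hypothetical, the file part is labelled application/octet-stream, and the request is only built, not sent.

```python
import json
import urllib.request
import uuid

UPLOAD_URL = "https://api.umodelverse.ai/v1/audio/voice/upload"


def encode_multipart(fields: dict, files: dict):
    """Encode text fields and files as multipart/form-data.

    Returns (body, content_type).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, content) in files.items():
        disposition = f'form-data; name="{name}"; filename="{filename}"'
        lines += [
            f"--{boundary}".encode(),
            f"Content-Disposition: {disposition}".encode(),
            b"Content-Type: application/octet-stream",  # assumed acceptable
            b"",
            content,
        ]
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_upload_request(api_key, name, model, speaker_bytes):
    """Build (but do not send) the upload request for a speaker sample."""
    body, content_type = encode_multipart(
        {"name": name, "model": model},
        {"speaker_file": ("speaker.wav", speaker_bytes)},
    )
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": content_type}
    return urllib.request.Request(
        UPLOAD_URL, data=body, headers=headers, method="POST"
    )


def voice_id_from_response(raw: bytes) -> str:
    """Extract the id field from a successful upload response."""
    return json.loads(raw)["id"]
```

To send the request, pass the result of build_upload_request to urllib.request.urlopen and feed the response body to voice_id_from_response; the returned value is what goes into the voice field of /v1/audio/speech.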
1.5 Failed Response Example
All error responses use a unified format:
{
"error": {
"message": "Error description",
"type": "invalid_request_error",
"code": "missing_speaker",
"param": "<Request ID or Parameter Name>"
}
}
Common error code examples:
- missing_name: name field not provided;
- missing_speaker: No speaker_* field provided;
- invalid_speaker_base64: Failed to decode speaker_file_base64;
- unsupported_audio_format: Audio format is not MP3/WAV;
- file_too_large / duration_out_of_range / sample_rate_too_low: Audio does not meet size, duration, or sample rate requirements.
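Because every failure shares the same envelope, a client can turn it into a single exception type and branch on code. The sketch below assumes only the four fields shown in the example above; VoiceApiError and parse_error are hypothetical names.

```python
import json


class VoiceApiError(Exception):
    """Hypothetical client-side exception mirroring the unified error object."""

    def __init__(self, message, type_, code, param=None):
        super().__init__(f"{code}: {message}")
        self.message = message
        self.type = type_
        self.code = code
        self.param = param


def parse_error(raw: str) -> VoiceApiError:
    """Convert an error response body into a VoiceApiError."""
    err = json.loads(raw)["error"]
    return VoiceApiError(
        err.get("message", ""),
        err.get("type", ""),
        err.get("code", ""),
        err.get("param"),
    )


# Body shaped like the failed response example; the param value is a placeholder.
sample = (
    '{"error": {"message": "Error description", '
    '"type": "invalid_request_error", '
    '"code": "missing_speaker", "param": "speaker_file"}}'
)
error = parse_error(sample)
```

Callers can then compare error.code against the codes listed above instead of matching on the free-text message.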
2. Query Custom Voice List
- HTTP Method: GET
- Path: /v1/audio/voice/list
2.1 Request Description
- No request body is required; only the authentication information needs to be included in the Header.
- The system returns the list of custom voices for the organization that the current API Key belongs to (top_org_id).
- To ensure interface performance, a maximum of 1000 records is returned per call.
2.2 Response Example
{
"list": [
{ "id": "uspeech:xxxx", "name": "Gentle Female Voice" },
{ "id": "uspeech:yyyy", "name": "Steady Male Voice" }
]
}
Field Explanation:
| Field | Type | Description |
|---|---|---|
| list | array | A list of custom voices. |
| list[].id | string | Custom voice ID, which can be referenced in the voice field of /v1/audio/speech. |
| list[].name | string | The name of the voice entered during creation, intended for display only. |
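Since name is for display only and id is what /v1/audio/speech needs, a client will often want to look up IDs by name. The sketch below indexes the list response that way. It assumes names are not guaranteed to be unique (this document does not say either way), so each name maps to a list of IDs; voices_by_name is a hypothetical helper.

```python
import json


def voices_by_name(raw: str) -> dict:
    """Map each display name to the list of voice IDs carrying that name."""
    index = {}
    for item in json.loads(raw).get("list", []):
        index.setdefault(item["name"], []).append(item["id"])
    return index


# Body copied from the response example above.
sample = """
{
  "list": [
    { "id": "uspeech:xxxx", "name": "Gentle Female Voice" },
    { "id": "uspeech:yyyy", "name": "Steady Male Voice" }
  ]
}
"""
index = voices_by_name(sample)
```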
3. Delete Custom Voice
- HTTP Method: POST
- Path: /v1/audio/voice/delete
- Content-Type: application/json
3.1 Request Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | ID of the custom voice to be deleted, i.e., the id returned by the upload interface. |
3.2 Request Example
curl -X POST "https://api.umodelverse.ai/v1/audio/voice/delete" \
-H "Authorization: Bearer $MODELVERSE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"id": "uspeech:xxxx"
}'
3.3 Successful Response Example
{
"success": true
}
Note: After a successful deletion, the voice_id can no longer be used in /v1/audio/speech requests. Please confirm that the voice_id is no longer in use before deletion.
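The delete call can be expressed in Python in the same spirit as the curl example. The sketch below builds the JSON request without sending it and checks the success flag of a response body; the function names are hypothetical.

```python
import json
import urllib.request

DELETE_URL = "https://api.umodelverse.ai/v1/audio/voice/delete"


def build_delete_request(api_key: str, voice_id: str):
    """Build (but do not send) the JSON delete request for one voice."""
    body = json.dumps({"id": voice_id}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        DELETE_URL, data=body, headers=headers, method="POST"
    )


def deletion_succeeded(raw: bytes) -> bool:
    """Return True only when the response body reports success: true."""
    return json.loads(raw).get("success") is True
```

Sending is a matter of passing the request to urllib.request.urlopen and giving the response body to deletion_succeeded.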
3.4 Possible Error Codes
- missing_id: The id field was not provided in the request body;
- invalid_voice_id: The specified id does not exist under the current organization or has already been deleted;
- Other server_error: Internal errors or object storage anomalies; investigate using the returned message and request ID.
Through these three interfaces, you can complete the full lifecycle management of custom voices:
- Create a voice using the upload interface and obtain the voice_id;
- Reference the voice_id in TTS calls using the voice field;
- Manage existing voice resources through the list/delete interfaces and control storage costs in conjunction with the 7-day validity strategy.
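The responses shown in this document do not include a creation time, so a client that wants to anticipate the 7-day cleanup has to record the upload time itself. The sketch below illustrates that bookkeeping. The 12-hour safety margin and the function names are assumptions, and the exact moment of cleanup depends on the background task.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # default retention stated in this document


def expected_expiry(uploaded_at: datetime) -> datetime:
    """Earliest time at which the voice may be cleaned up."""
    return uploaded_at + RETENTION


def should_reupload(uploaded_at, now, margin=timedelta(hours=12)) -> bool:
    """True once the voice is within margin of its expected cleanup time."""
    return now >= expected_expiry(uploaded_at) - margin


# Example with a fixed, client-recorded upload time.
uploaded = datetime(2025, 1, 1, tzinfo=timezone.utc)
expiry = expected_expiry(uploaded)
```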