Custom Voice Management API Documentation

This document describes only the request and response formats of the custom voice management interfaces (upload, list, delete) and is intended as a reference for readers with development experience. For how to use a custom voice in /v1/audio/speech, see the section “Using Custom Voices (Optional)” in the OpenAI TTS API Call Documentation.

Overview

  • Domain Example: https://api.umodelverse.ai
  • Authentication Method: All interfaces require the request header Authorization: Bearer <MODELVERSE_API_KEY>.
  • Organization Isolation:
    • Custom voices are isolated by organization; all sub-accounts in the same organization can share the custom voices within that organization.
    • Sharing is not possible between different organizations.
  • Lifecycle: Custom voices are saved by default for 7 days, after which a background task will clean them up. For long-term storage needs, please contact the business team for evaluation.
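
As a minimal sketch of the points above, the following Python helpers build the required Authorization header and estimate when a voice becomes eligible for cleanup. The names auth_headers and estimated_expiry are hypothetical, and the expiry calculation assumes the 7-day period starts at upload time, which this document does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

BASE_URL = "https://api.umodelverse.ai"  # domain example from this document
RETENTION = timedelta(days=7)            # default retention period stated above


def auth_headers(api_key: str) -> dict:
    """Return the Authorization header that every interface requires."""
    return {"Authorization": f"Bearer {api_key}"}


def estimated_expiry(uploaded_at: datetime) -> datetime:
    """Estimate cleanup eligibility (assumption: upload time plus 7 days)."""
    return uploaded_at + RETENTION


if __name__ == "__main__":
    print(auth_headers("<MODELVERSE_API_KEY>"))
    print(estimated_expiry(datetime.now(timezone.utc)).isoformat())
```

Because the list interface described later returns only id and name, a client that wants to re-upload before cleanup would need to record upload times itself.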

1. Upload Custom Voice

  • HTTP Method: POST
  • Path: /v1/audio/voice/upload
  • Content-Type:
    • Recommended: multipart/form-data (direct file upload)
    • Also supported: Form fields passing Base64 strings or remote URLs

1.1 Request Parameters

Common Fields

Field | Type | Required | Description
name | string | Yes | The name of the voice, used for list display, e.g., “Gentle Female Voice”, “Customer Service Voice A”.
model | string | Yes | The TTS model name to use with this voice, e.g., IndexTeam/IndexTTS-2. It should match the model in subsequent /v1/audio/speech requests.

Speaker (Voice Sample Audio - Choose One, Required)

Field | Type | Required | Description
speaker_file | file | Yes (choose one) | Local audio file (recommended method), uploaded via multipart/form-data.
speaker_file_base64 | string | Yes (choose one) | The Base64 string of speaker_file, passed through a standard form field.
speaker_url | string | Yes (choose one) | A publicly accessible URL pointing to the voice audio file.

Notes:

  • Only one of the three speaker_* fields is needed, and at least one must be provided;
  • If multiple are provided, the priority is: speaker_file > speaker_file_base64 > speaker_url;
  • If none of the three are provided, the request will be rejected (error code: missing_speaker).
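
The selection rule above can be mirrored on the client so that a request is checked before it is sent. This is a sketch only: pick_source, pick_speaker, and MissingSpeakerError are hypothetical names, and the priority order is assumed to be the order in which the fields are listed (file, then Base64, then URL). The same helper applies to the optional emotion_* fields, where a missing source is not an error.

```python
PRIORITY_SUFFIXES = ("file", "file_base64", "url")  # highest priority first


class MissingSpeakerError(ValueError):
    """Hypothetical client-side counterpart of the missing_speaker error code."""


def pick_source(prefix, file=None, file_base64=None, url=None):
    """Return (field_name, value) for the highest-priority source given, or None."""
    candidates = {"file": file, "file_base64": file_base64, "url": url}
    for suffix in PRIORITY_SUFFIXES:
        value = candidates[suffix]
        if value:
            return f"{prefix}_{suffix}", value
    return None


def pick_speaker(**sources):
    """Like pick_source("speaker", ...), but a missing source is an error."""
    chosen = pick_source("speaker", **sources)
    if chosen is None:
        raise MissingSpeakerError("missing_speaker: provide one speaker_* field")
    return chosen
```

For example, pick_speaker(file_base64="QUJD", url="https://example.com/a.wav") selects the Base64 field, since it outranks the URL.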

Emotion (Emotion Sample Audio - Choose One, Optional)

Field | Type | Required | Description
emotion_file | file | No (choose one) | Emotion sample audio file, uploaded via multipart/form-data.
emotion_file_base64 | string | No (choose one) | The Base64 string of emotion_file, passed through a standard form field.
emotion_url | string | No (choose one) | A publicly accessible URL pointing to the emotion sample audio file.

Notes:

  • The emotion_* fields are entirely optional and can be omitted;
  • If multiple are provided, the priority is: emotion_file > emotion_file_base64 > emotion_url;
  • If none of the emotion_* fields are provided, the voice characteristics will be constructed based solely on the Speaker.

1.2 Audio File Constraints

The following constraints apply to the uploaded speaker_* and emotion_* audio:

  • Format: Only MP3 and WAV are supported.
  • Size: Each audio file should be ≤ 20MB.
  • Duration: 5–30 seconds.
  • Sample Rate: 16kHz or higher.

If any of the above conditions are not met, the interface will return a 4xx error with the specific reason indicated in the error.code (e.g., file_too_large, duration_out_of_range, sample_rate_too_low).
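
A client can check these constraints locally before uploading. The sketch below handles WAV input only, using Python's standard wave module (MP3 would need a third-party decoder). The function names are hypothetical, "20MB" is assumed to mean 20 × 1024 × 1024 bytes, and the order in which the server evaluates the checks is not documented, so the order here is an assumption.

```python
import io
import wave

MAX_BYTES = 20 * 1024 * 1024        # assumption: "20MB" read as 20 MiB
MIN_SECONDS, MAX_SECONDS = 5.0, 30.0
MIN_SAMPLE_RATE = 16_000            # 16 kHz


def check_wav(data: bytes):
    """Return the error code of the first violated constraint, or None.

    Raises wave.Error (or EOFError) if the bytes are not a readable WAV file.
    """
    if len(data) > MAX_BYTES:
        return "file_too_large"
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    if rate <= 0:
        raise wave.Error("invalid sample rate in WAV header")
    duration = frames / rate
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        return "duration_out_of_range"
    if rate < MIN_SAMPLE_RATE:
        return "sample_rate_too_low"
    return None


def make_silent_wav(seconds: float, rate: int) -> bytes:
    """Build a mono 16-bit silent WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

A local check like this only saves a round trip; the server's response remains the authoritative result.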

1.3 Request Example (multipart/form-data)

curl -X POST "https://api.umodelverse.ai/v1/audio/voice/upload" \
  -H "Authorization: Bearer $MODELVERSE_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "name=Gentle Female Voice" \
  -F "model=IndexTeam/IndexTTS-2" \
  -F "speaker_file=@/path/to/speaker.wav" \
  -F "emotion_file=@/path/to/emotion.wav"
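
The same upload can be sketched in Python using only the standard library. The helpers encode_multipart and build_upload_request are hypothetical names, and the per-file Content-Type of application/octet-stream is an assumption; the document does not say which part content types the server expects. The sketch builds the request without sending it.

```python
import urllib.request
import uuid

UPLOAD_URL = "https://api.umodelverse.ai/v1/audio/voice/upload"


def encode_multipart(fields: dict, files: dict):
    """Encode text fields and {name: (filename, bytes)} files as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(head.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    for name, (filename, content) in files.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(head.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(api_key: str, name: str, model: str, speaker_wav: bytes):
    """Build (but do not send) the POST request for the upload interface."""
    body, content_type = encode_multipart(
        {"name": name, "model": model},
        {"speaker_file": ("speaker.wav", speaker_wav)},
    )
    return urllib.request.Request(
        UPLOAD_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
```

To send the request, pass the result to urllib.request.urlopen and decode the JSON body of the response.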

1.4 Successful Response Example

{ "id": "uspeech:xxxx-xxxx-xxxx-xxxx" }
  • id: Custom voice ID, to be referenced in subsequent /v1/audio/speech requests using the voice field (e.g., "voice": "uspeech:xxxx-xxxx-xxxx-xxxx").

1.5 Failed Response Example

All error responses use a unified format:

{
  "error": {
    "message": "Error description",
    "type": "invalid_request_error",
    "code": "missing_speaker",
    "param": "<Request ID or Parameter Name>"
  }
}

Common error code examples:

  • missing_name: name field not provided;
  • missing_speaker: None of the speaker_* fields was provided;
  • invalid_speaker_base64: Failed to decode speaker_file_base64;
  • unsupported_audio_format: Audio format is not MP3/WAV;
  • file_too_large / duration_out_of_range / sample_rate_too_low: Audio does not meet size, duration, or sample rate requirements.
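
Since every error uses the same envelope, a client can turn it into an exception in one place. The following is a minimal sketch; VoiceApiError and raise_for_error are hypothetical names, not part of the API.

```python
import json


class VoiceApiError(Exception):
    """Hypothetical exception carrying the fields of the unified error format."""

    def __init__(self, code, message, error_type=None, param=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.error_type = error_type
        self.param = param


def raise_for_error(body: str) -> dict:
    """Parse a JSON response body; raise VoiceApiError if it has an "error" object."""
    payload = json.loads(body)
    error = payload.get("error") if isinstance(payload, dict) else None
    if error:
        raise VoiceApiError(
            code=error.get("code"),
            message=error.get("message"),
            error_type=error.get("type"),
            param=error.get("param"),
        )
    return payload
```

Callers can then branch on exc.code, for example prompting for a different file on file_too_large.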

2. Query Custom Voice List

  • HTTP Method: GET
  • Path: /v1/audio/voice/list

2.1 Request Description

  • No request body is required; only the authentication information needs to be included in the Header.
  • The system will return the list of custom voices for the organization that the current API Key belongs to (top_org_id).
  • To ensure interface performance, a maximum of 1000 records is returned per call.

2.2 Response Example

{
  "list": [
    { "id": "uspeech:xxxx", "name": "Gentle Female Voice" },
    { "id": "uspeech:yyyy", "name": "Steady Male Voice" }
  ]
}

Field Explanation:

Field | Type | Description
list | array | A list of custom voices.
list[].id | string | Custom voice ID, which can be referenced in the voice field of /v1/audio/speech.
list[].name | string | The name of the voice entered during creation, intended for display only.
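
A common use of the list response is looking up a voice ID by its display name. The sketch below parses a response body of the shape shown above; voices_by_name is a hypothetical helper. Because the document does not say whether names are unique, each name maps to a list of IDs.

```python
import json


def voices_by_name(body: str) -> dict:
    """Map each voice name in a list response to the IDs registered under it."""
    payload = json.loads(body)
    index = {}
    for voice in payload.get("list", []):
        index.setdefault(voice["name"], []).append(voice["id"])
    return index


if __name__ == "__main__":
    sample = '{"list": [{"id": "uspeech:xxxx", "name": "Gentle Female Voice"}]}'
    print(voices_by_name(sample))
```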

3. Delete Custom Voice

  • HTTP Method: POST
  • Path: /v1/audio/voice/delete
  • Content-Type: application/json

3.1 Request Parameters

Field | Type | Required | Description
id | string | Yes | ID of the custom voice to be deleted, i.e., the id returned by the upload interface.

3.2 Request Example

curl -X POST "https://api.umodelverse.ai/v1/audio/voice/delete" \
  -H "Authorization: Bearer $MODELVERSE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "id": "uspeech:xxxx" }'

3.3 Successful Response Example

{ "success": true }

Note: After a successful deletion, the voice ID can no longer be used in /v1/audio/speech requests. Confirm that the ID is no longer in use before deleting it.

3.4 Possible Error Codes

  • missing_id: The id field not provided in the request body;
  • invalid_voice_id: The specified id does not exist under the current organization or has already been deleted;
  • server_error (other cases): An internal error or object storage anomaly; investigate using the returned message and request ID.
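
For reference, a delete call could be prepared in Python along the following lines. build_delete_request is a hypothetical helper; it rejects an empty id locally, mirroring missing_id, and builds the request without sending it.

```python
import json
import urllib.request

DELETE_URL = "https://api.umodelverse.ai/v1/audio/voice/delete"


def build_delete_request(api_key: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) the JSON POST request for the delete interface."""
    if not voice_id:
        # Local mirror of the server-side missing_id check.
        raise ValueError("missing_id: the id field is required")
    return urllib.request.Request(
        DELETE_URL,
        data=json.dumps({"id": voice_id}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it with urllib.request.urlopen should yield { "success": true } on success. An invalid_voice_id error on a repeated delete means the voice is already gone, which a caller may choose to treat as a non-fatal outcome.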

Through these three interfaces, you can complete the full lifecycle management of custom voices:

  1. Create a voice using the upload interface and obtain its id;
  2. Reference that id in TTS calls using the voice field;
  3. Manage existing voice resources through the list and delete interfaces, and control storage costs in conjunction with the 7-day retention policy.