OpenAI TTS API Documentation

The UModelverse platform provides a speech synthesis interface that is fully compatible with the OpenAI TTS (Text-to-Speech) API. Developers can call the high-quality TTS models deployed on Modelverse with the familiar OpenAI SDKs or any standard HTTP client.

🎉 Limited-Time Free: The TTS service is currently free to use for a limited time. Give it a try!

Quick Start

You can call the Modelverse TTS API with curl or with any client library that supports the OpenAI API. The platform offers a range of high-quality speech synthesis models, such as IndexTeam/IndexTTS-2.

Request Parameters

Parameter	Type	Required	Description
model	string	Yes	The TTS model name to use, e.g., `IndexTeam/IndexTTS-2`
input	string	Yes	The text to convert to speech (up to 600 characters for IndexTeam/IndexTTS-2; the limit depends on the model)
voice	string	Yes	The voice to use: either a built-in voice (`jack_cheng`, `sales_voice`, `crystla_liu`, `stephen_chow`, `xiaoyueyue`, `mkas`, `entertain`, `novel`, `movie`) or a custom voice ID such as `uspeech:xxxx` (see “Using Custom Voices” below)
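The three parameters map directly onto a JSON request body. Below is a minimal Python sketch of assembling that body with basic client-side checks. `build_speech_payload` is a hypothetical helper, and the 600-character limit is the one this page states for IndexTeam/IndexTTS-2; other models may allow more or less.

```python
# Hypothetical helper: assemble the JSON body for POST /v1/audio/speech.
# Assumption: the 600-character limit (stated for IndexTeam/IndexTTS-2) is
# counted the same way as Python's len(); other models may differ.
MAX_INPUT_CHARS = 600


def build_speech_payload(model: str, text: str, voice: str,
                         max_chars: int = MAX_INPUT_CHARS) -> dict:
    """Return a request body with the three required fields, after basic checks."""
    if not model:
        raise ValueError("model is required")
    if not voice:
        raise ValueError("voice is required")
    if not text:
        raise ValueError("input must not be empty")
    if len(text) > max_chars:
        raise ValueError(f"input has {len(text)} characters; the limit is {max_chars}")
    return {"model": model, "input": text, "voice": voice}


payload = build_speech_payload(
    model="IndexTeam/IndexTTS-2",
    text="Hello! Welcome to Modelverse voice synthesis service.",
    voice="jack_cheng",
)
print(payload)
```

Checking the length locally avoids sending a request that the model would reject anyway.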

Call Example

The following example uses the IndexTeam/IndexTTS-2 model for speech synthesis. Replace $MODELVERSE_API_KEY with your own API key.

curl


curl https://api.umodelverse.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODELVERSE_API_KEY" \
  -d '{
    "model": "IndexTeam/IndexTTS-2",
    "input": "Hello! Welcome to Modelverse voice synthesis service.",
    "voice": "jack_cheng"
  }' \
  --output speech.wav

python


import os
from pathlib import Path

from openai import OpenAI

# Point the OpenAI SDK at the Modelverse endpoint.
client = OpenAI(
    api_key=os.getenv("MODELVERSE_API_KEY", "<YOUR_MODELVERSE_API_KEY>"),
    base_url="https://api.umodelverse.ai/v1/",
)

# The service returns WAV audio, so save it with a .wav extension.
speech_file_path = Path(__file__).parent / "generated-speech.wav"

# Stream the binary response body straight to disk.
with client.audio.speech.with_streaming_response.create(
    model="IndexTeam/IndexTTS-2",
    voice="jack_cheng",
    input="Hello! Welcome to Modelverse voice synthesis service.",
) as response:
    response.stream_to_file(speech_file_path)

print(f"Audio saved to {speech_file_path}")
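Because the interface is plain HTTP, the OpenAI SDK is optional. The following is a minimal sketch that mirrors the curl example using only the Python standard library (`urllib.request`). `build_speech_request` and `synthesize` are hypothetical helper names, not part of any SDK.

```python
# Sketch: call POST /v1/audio/speech with the standard library only.
# Endpoint, headers, and body follow the curl example on this page.
import json
import urllib.request

BASE_URL = "https://api.umodelverse.ai/v1"


def build_speech_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the speech request."""
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize(api_key: str, payload: dict, out_path: str) -> None:
    """Send the request and write the returned WAV bytes to out_path."""
    req = build_speech_request(api_key, payload)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=60) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

For example, `synthesize(os.environ["MODELVERSE_API_KEY"], payload, "speech.wav")` with the same JSON body as the curl example. Keeping request construction separate from sending makes the request easy to inspect or test offline.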

Using Custom Voices (Optional)

In addition to the built-in voices, you can upload a sample of your own voice (including samples that carry a specific emotion) to obtain a dedicated voice_id, which you then reference through the voice field of TTS requests.

Note: Custom voices are currently deleted automatically 7 days after upload. If you need a voice for longer, keep a backup of the sample and re-upload it before it expires. For long-term storage, contact the sales team.
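A client that reuses a voice_id may want to track when it will expire. A small sketch, assuming the retention period is exactly 7 × 24 hours measured from an upload time you record yourself (the platform may measure it differently). The one-day safety margin is an arbitrary choice, and `expires_at` and `should_reupload` are hypothetical helpers.

```python
# Sketch: track when an uploaded custom voice is expected to be cleaned up.
# Assumption: retention is exactly 7 days from the locally recorded upload time.
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)


def expires_at(uploaded_at: datetime) -> datetime:
    """Time at which the platform is expected to delete the voice."""
    return uploaded_at + RETENTION


def should_reupload(uploaded_at: datetime, now: datetime,
                    margin: timedelta = timedelta(days=1)) -> bool:
    """True once `now` is within `margin` of the expected cleanup time."""
    return now >= expires_at(uploaded_at) - margin


uploaded = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expires_at(uploaded))  # 2025-01-08 12:00:00+00:00
```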

Using a custom voice takes three steps:

1. Upload a voice sample and get a voice_id
   Call POST /v1/audio/voice/upload with a sample that meets the requirements (see the FAQ below). The response contains an id, which is your custom voice ID.
   For the full request parameters and response structure, see the “Custom Voice Management API Documentation”.
2. Use the voice_id in TTS requests
   In the /v1/audio/speech request, set the voice field to the id returned in step 1. A complete example is given below.
3. Manage existing custom voices (optional)
   To view or delete existing voices, call GET /v1/audio/voice/list or POST /v1/audio/voice/delete.
   Their request and response formats are also described in the “Custom Voice Management API Documentation”.

Tip: Custom voices remain fully compatible with the OpenAI protocol because they reuse the existing voice field; there are no new parameter names to learn.

Using Custom voice_id in TTS Requests

Once you have a custom voice id (e.g., uspeech:xxxx), simply set the voice field of the /v1/audio/speech request to that id.


VOICE_ID="uspeech:xxxx-xxxx-xxxx-xxxx"

curl https://api.umodelverse.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODELVERSE_API_KEY" \
  -d "{
    \"model\": \"IndexTeam/IndexTTS-2\",
    \"input\": \"Hello, I am the gentle custom female voice.\",
    \"voice\": \"$VOICE_ID\"
  }" \
  --output speech-custom.wav

How the voice field is interpreted:

voice is empty: the default voice (the model's built-in default configuration) is used.
voice is a built-in name (e.g., jack_cheng): the corresponding platform preset voice is used.
voice has the form uspeech:xxxx: your uploaded custom voice is used. The platform looks up the voice/emotion sample associated with that ID and applies it; no extra configuration is needed.
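These rules can be mirrored on the client side, for example to log which kind of voice a request uses. A sketch only: `classify_voice` is a hypothetical helper, the built-in names are copied from the parameter table, and a value that matches neither rule is reported as "unknown" rather than rejected, since the platform's built-in list may change.

```python
# Hypothetical helper mirroring the voice rules above. The built-in names
# are copied from the parameter table and may change over time.
from typing import Optional

BUILT_IN_VOICES = frozenset({
    "jack_cheng", "sales_voice", "crystla_liu", "stephen_chow",
    "xiaoyueyue", "mkas", "entertain", "novel", "movie",
})
CUSTOM_VOICE_PREFIX = "uspeech:"


def classify_voice(voice: Optional[str]) -> str:
    """Return "default", "custom", "builtin", or "unknown" for a voice value."""
    if not voice:
        return "default"   # empty: the model's default voice is used
    if voice.startswith(CUSTOM_VOICE_PREFIX):
        return "custom"    # resolved against your uploaded voices
    if voice in BUILT_IN_VOICES:
        return "builtin"   # platform preset voice
    return "unknown"       # not recognized here; the server has the final say


for v in ["", "jack_cheng", "uspeech:xxxx-xxxx-xxxx-xxxx", "alloy"]:
    print(repr(v), "->", classify_voice(v))
```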

Response Format

The API returns the audio as a binary stream.

Audio Format: WAV (currently the only supported output format)
Content-Type: audio/wav
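To confirm that a response really is WAV audio before saving or playing it, you can inspect its header. This sketch uses the standard library `wave` module and assumes the platform returns uncompressed PCM WAV, which is the format `wave` reads. `describe_wav` is a hypothetical helper.

```python
# Sketch: sanity-check the bytes returned by /v1/audio/speech.
# Assumption: the WAV data is uncompressed PCM, which the wave module supports.
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic properties of WAV bytes; raise ValueError if not RIFF/WAVE."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("response is not a RIFF/WAVE file")
    with wave.open(io.BytesIO(data), "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

For a saved file, `describe_wav(Path("speech.wav").read_bytes())` returns the channel count, sample rate, and duration in seconds.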

Notes

Limited-Time Free: The TTS service is currently free for a limited time. Pricing will be announced separately.
Audio Format: Responses are currently returned as WAV only; other audio formats are not yet supported.
Text Length Limit: The maximum text length per request depends on the model. IndexTeam/IndexTTS-2 typically accepts up to 600 characters.
Voice Type: Each voice has its own timbre and style. Choose the voice that best fits your scenario.
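When text exceeds the model's limit, one option is to split it and send one request per chunk. A sketch, assuming the 600-character limit stated above for IndexTeam/IndexTTS-2. The sentence-boundary regex is a simple heuristic chosen for illustration, and `chunk_text` is a hypothetical helper.

```python
# Sketch: split long text into chunks of at most `limit` characters so that
# each chunk can be sent in its own /v1/audio/speech request.
# Assumptions: a 600-character limit and a simple sentence heuristic; split
# points may fall after abbreviations or inside decimals such as "3.14".
import re

# Zero-width split point right after sentence-ending punctuation (ASCII or CJK),
# so the original whitespace stays attached to the following sentence.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?。！？])")


def chunk_text(text: str, limit: int = 600) -> list[str]:
    """Greedily pack sentences into chunks no longer than `limit`."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    pieces = []
    for sentence in SENTENCE_BOUNDARY.split(text):
        # A sentence longer than the limit is cut into limit-sized slices.
        pieces.extend(sentence[i:i + limit] for i in range(0, len(sentence), limit))
    chunks, current = [], ""
    for piece in pieces:
        if len(current) + len(piece) > limit:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    # Trim surrounding whitespace and drop chunks that were only whitespace.
    return [c.strip() for c in chunks if c.strip()]
```

Each chunk produces its own WAV file; combining them into one recording is left to the caller.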

Error Handling

When a request fails, the API returns an error in the following standard JSON format:


{
  "error": {
    "message": "Error description",
    "type": "invalid_request_error",
    "code": "error_code",
    "param": "<Request ID, for feedback or troubleshooting the cause of the error>"
  }
}
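A sketch of turning this error body into a Python exception so that the message, code, and request ID are easy to log. Field names follow the example above (with the request ID read from "param"). `parse_error` and `ModelverseAPIError` are hypothetical names, and a non-JSON body (for example from a proxy) falls back to the raw text.

```python
# Sketch: convert a failed response's status and JSON body into an exception.
import json


class ModelverseAPIError(Exception):
    def __init__(self, status, message, err_type, code, request_id):
        super().__init__(f"[{status}] {code}: {message} (request id: {request_id})")
        self.status = status
        self.message = message
        self.type = err_type
        self.code = code
        self.request_id = request_id


def parse_error(status: int, body: bytes) -> ModelverseAPIError:
    """Build an exception from the {"error": {...}} body; tolerate non-JSON."""
    try:
        payload = json.loads(body)
    except ValueError:  # includes JSONDecodeError and UnicodeDecodeError
        payload = None
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}
    return ModelverseAPIError(
        status=status,
        message=err.get("message") or body.decode("utf-8", "replace"),
        err_type=err.get("type"),
        code=err.get("code"),
        request_id=err.get("param"),  # this page says "param" holds the request ID
    )
```

With `urllib`, for example: `except urllib.error.HTTPError as e: raise parse_error(e.code, e.read())`.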

Frequently Asked Questions (FAQ)

Q: What are the requirements for the audio I upload?
A: The file must be MP3 or WAV, no larger than 20MB, 5–30 seconds long, with a sampling rate of at least 16kHz. If the file does not meet these requirements, the API returns a 4xx error and error.code states the reason.
Q: What should be filled in the voice field?
A:
- To use a preset voice provided by the platform, enter its built-in name (e.g., jack_cheng);
- To use your own recorded voice, first call the upload interface to obtain an id, then set voice to that id in the TTS request (e.g., uspeech:xxxx).
Q: What should I do if an invalid_voice_id / voice_not_found error occurs?
A: The voice_id cannot be found under the current account, or it has been deleted. Call /v1/audio/voice/list to confirm which IDs are available, then use a valid one in your request.
Q: Can custom voices be shared between different accounts?
A: Custom voices can be shared among all sub-accounts within the same organization, but not across different accounts.
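The upload requirements above can be checked locally before calling POST /v1/audio/voice/upload, which avoids a round trip that would end in a 4xx error. A sketch with these assumptions: "20MB" is taken as 20 × 1024 × 1024 bytes, the file type is judged by extension, and duration and sample rate are checked only for PCM WAV files via the standard `wave` module (MP3 would need a third-party decoder). `check_voice_sample` is a hypothetical helper.

```python
# Sketch: pre-check a voice sample against the FAQ's upload requirements.
# Assumptions: 20MB = 20 * 1024 * 1024 bytes; type judged by file extension;
# duration/sample rate checked only for PCM WAV (stdlib wave cannot read MP3).
import wave
from pathlib import Path

MAX_BYTES = 20 * 1024 * 1024
MIN_SECONDS, MAX_SECONDS = 5.0, 30.0
MIN_RATE = 16_000


def check_voice_sample(path: str) -> list[str]:
    """Return the problems found; an empty list means none were detected."""
    p = Path(path)
    problems = []
    suffix = p.suffix.lower()
    if suffix not in {".mp3", ".wav"}:
        problems.append(f"unsupported file type {suffix or '(none)'}; use MP3 or WAV")
    size = p.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes; the limit is {MAX_BYTES}")
    if suffix == ".wav":
        with wave.open(str(p), "rb") as w:
            rate = w.getframerate()
            seconds = w.getnframes() / rate
        if rate < MIN_RATE:
            problems.append(f"sample rate {rate} Hz is below {MIN_RATE} Hz")
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(f"duration {seconds:.1f} s is outside 5-30 s")
    return problems
```

An empty result does not guarantee acceptance; the server's checks remain authoritative.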