OpenAI TTS API Documentation
The UModelverse platform offers a speech synthesis interface that is fully compatible with the OpenAI TTS (Text-to-Speech) API. Developers can seamlessly invoke high-quality TTS models deployed on Modelverse using familiar OpenAI SDKs or any standard HTTP client.
🎉 Limited-Time Free: The TTS voice synthesis service is currently free for a limited time. You are welcome to try it out!
Quick Start
You can call the Modelverse TTS API using the curl command or any client library that supports the OpenAI API. The platform supports a variety of high-quality speech synthesis models, such as IndexTeam/IndexTTS-2.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The TTS model name to use, e.g., `IndexTeam/IndexTTS-2` |
| `input` | string | Yes | The text content to be converted to speech (up to 600 characters) |
| `voice` | string | Yes | The voice to use. Built-in voices: `jack_cheng`, `sales_voice`, `crystla_liu`, `stephen_chow`, `xiaoyueyue`, `mkas`, `entertain`, `novel`, `movie`. Alternatively, provide a custom voice ID (like `uspeech:xxxx`; see "Using Custom Voices" below) |
Call Example
The following example demonstrates using the IndexTeam/IndexTTS-2 model for speech synthesis. Make sure to replace $MODELVERSE_API_KEY with your own API Key.
**curl**

```shell
curl https://api.umodelverse.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODELVERSE_API_KEY" \
  -d '{
    "model": "IndexTeam/IndexTTS-2",
    "input": "Hello! Welcome to Modelverse voice synthesis service.",
    "voice": "jack_cheng"
  }' \
  --output speech.wav
```

**python**
```python
import os
from pathlib import Path

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("MODELVERSE_API_KEY", "<YOUR_MODELVERSE_API_KEY>"),
    base_url="https://api.umodelverse.ai/v1/",
)

speech_file_path = Path(__file__).parent / "generated-speech.wav"
with client.audio.speech.with_streaming_response.create(
    model="IndexTeam/IndexTTS-2",
    voice="jack_cheng",
    input="Hello! Welcome to Modelverse voice synthesis service.",
) as response:
    response.stream_to_file(speech_file_path)
print(f"Audio saved to {speech_file_path}")
```

Using Custom Voices (Optional)
In addition to the built-in voice names, you can upload your own voice samples (including samples with specific emotions) to generate a dedicated `voice_id`, which can then be referenced in TTS requests through the `voice` field.
Note: Custom voice resources are automatically cleaned up 7 days after upload. If you need them long term, remember to back up or re-upload in advance. (Contact the sales team for long-term storage requirements.)
The workflow can be summarized in three steps:
1. Upload a voice and get a `voice_id`: Call `POST /v1/audio/voice/upload` to upload a sample voice that meets the requirements. The interface returns an `id` that serves as the custom voice ID. For detailed request parameters and response structures, refer to the "Custom Voice Management API Documentation".
2. Use the `voice_id` in TTS requests: In the `/v1/audio/speech` request, set the `voice` field to the `id` returned in step one. A complete example is given below.
3. Manage existing custom voices (optional): To view or delete existing voices, call `GET /v1/audio/voice/list` and `POST /v1/audio/voice/delete`. The specific request/response formats are also in the "Custom Voice Management API Documentation".

Tip: Custom voices are fully compatible with the OpenAI protocol and reuse the `voice` field, so there are no new parameter names to learn.
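The three steps above can be sketched end to end with the Python standard library. This is a minimal sketch, not an authoritative client: the multipart field name `file` and the audio content type in the upload request are assumptions here; the exact upload parameters are defined in the "Custom Voice Management API Documentation".

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = "https://api.umodelverse.ai/v1"


def _request(method: str, path: str, body: Optional[bytes] = None,
             content_type: Optional[str] = None) -> bytes:
    """Minimal stdlib HTTP helper; reads MODELVERSE_API_KEY from the environment."""
    req = urllib.request.Request(BASE_URL + path, data=body, method=method)
    req.add_header("Authorization",
                   "Bearer " + os.environ.get("MODELVERSE_API_KEY", ""))
    if content_type:
        req.add_header("Content-Type", content_type)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def upload_voice(sample_path: str) -> str:
    """Step 1: upload a sample voice and return the custom voice ID.
    The form field name "file" is an assumption; see the Custom Voice
    Management API Documentation for the authoritative parameters."""
    boundary = "----modelverse-voice-upload"
    with open(sample_path, "rb") as f:
        file_bytes = f.read()
    body = (
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; '
         f'filename="{os.path.basename(sample_path)}"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode()
        + file_bytes
        + f"\r\n--{boundary}--\r\n".encode()
    )
    raw = _request("POST", "/audio/voice/upload", body,
                   f"multipart/form-data; boundary={boundary}")
    return json.loads(raw)["id"]  # e.g. "uspeech:xxxx"


def synthesize(text: str, voice: str, out_path: str) -> None:
    """Step 2: reference the returned ID through the standard `voice` field."""
    body = json.dumps({"model": "IndexTeam/IndexTTS-2",
                       "input": text,
                       "voice": voice}).encode()
    audio = _request("POST", "/audio/speech", body, "application/json")
    with open(out_path, "wb") as f:
        f.write(audio)


def list_voices() -> dict:
    """Step 3 (optional): enumerate existing custom voices."""
    return json.loads(_request("GET", "/audio/voice/list"))


if __name__ == "__main__":
    voice_id = upload_voice("sample.wav")
    synthesize("Hello from my custom voice.", voice_id, "custom-speech.wav")
    print(list_voices())
```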
Using a Custom `voice_id` in TTS Requests
Once you have a custom voice ID (e.g., `uspeech:xxxx`), simply set the `voice` field to that ID in the `/v1/audio/speech` request.
```shell
VOICE_ID="uspeech:xxxx-xxxx-xxxx-xxxx"

curl https://api.umodelverse.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODELVERSE_API_KEY" \
  -d "{
    \"model\": \"IndexTeam/IndexTTS-2\",
    \"input\": \"Hello, I am the gentle custom female voice.\",
    \"voice\": \"$VOICE_ID\"
  }" \
  --output speech-custom.wav
```

Behavior Description:
- `voice` empty: the default voice or the model's built-in default configuration is used.
- `voice` is a built-in name (e.g., `jack_cheng`): the platform's built-in preset voice is used.
- `voice` is a value like `uspeech:xxxx`: your uploaded custom voice is used. The platform finds and applies the corresponding voice/emotion material for that ID; no extra configuration is needed.

Response Format
The API returns a binary audio file stream.
- Audio Format: currently only WAV output is supported
- Content-Type: `audio/wav`
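Because the endpoint streams raw bytes, a failed request saved straight to disk can silently produce a non-audio file. A small client-side sanity check (the helper name is ours, not part of the API) that the response bytes really begin with a RIFF/WAVE header:

```python
def looks_like_wav(data: bytes) -> bool:
    """Return True if `data` starts with a RIFF/WAVE header, as expected
    from /v1/audio/speech; a JSON error body saved by mistake will fail."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


# A valid WAV file starts with "RIFF<size>WAVE".
header = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE"
assert looks_like_wav(header)
assert not looks_like_wav(b'{"error": {"message": "..."}}')
```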
Notes
- Limited-Time Free: The TTS service is currently free for a limited time; the official pricing will be announced separately.
- Audio Format: The binary stream response currently supports only WAV; other audio formats are not supported yet.
- Text Length Limit: The per-request text length limit depends on the specific model. The `IndexTeam/IndexTTS-2` model supports text of up to 600 characters.
- Voice Type: Different `voice` values produce different timbres and styles. Choose the voice that suits your scenario.
Error Handling
When a request fails, the API returns a standard JSON format error response:
```json
{
  "error": {
    "message": "Error description",
    "type": "invalid_request_error",
    "code": "error_code",
    "param": "<Request ID, for feedback or troubleshooting the cause of the error>"
  }
}
```

Frequently Asked Questions (FAQ)
- Q: What are the requirements for the audio I upload?
  A: It must be MP3 or WAV, a single file of at most 20 MB, 5–30 seconds long, with a sampling rate of at least 16 kHz. If the requirements are not met, a 4xx error is returned and the reason is given in `error.code`.
- Q: What should I fill in the `voice` field?
  A: To use one of the platform's fixed voices, fill in a built-in name (e.g., `jack_cheng`). To use your own recorded voice, first call the upload interface to obtain an `id`, then set `voice` to that `id` in the TTS request (e.g., `uspeech:xxxx`).
- Q: What should I do if an `invalid_voice_id`/`voice_not_found` error occurs?
  A: The `voice_id` cannot be found under the current account, or it has been deleted. Call `/v1/audio/voice/list` to confirm the available IDs, then use the correct value in your request.
- Q: Can custom voices be shared between different accounts?
  A: Custom voices can be shared among all sub-accounts under the same organization; they cannot be shared across different accounts.
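When debugging `voice_not_found`, it can help to first classify what kind of value is being sent in `voice`, since only custom `uspeech:` IDs need to be validated against `/v1/audio/voice/list`. A small client-side sketch (the helper is ours; the built-in list mirrors the request parameter table above):

```python
# Built-in preset voices from the request parameter table.
BUILT_IN_VOICES = {
    "jack_cheng", "sales_voice", "crystla_liu", "stephen_chow",
    "xiaoyueyue", "mkas", "entertain", "novel", "movie",
}


def classify_voice(voice: str) -> str:
    """Classify a `voice` value per the behavior description: empty means
    the platform default, a known name means a built-in preset, and a
    "uspeech:" prefix means a custom voice ID that should exist in
    GET /v1/audio/voice/list."""
    if not voice:
        return "default"
    if voice in BUILT_IN_VOICES:
        return "built-in"
    if voice.startswith("uspeech:"):
        return "custom"
    raise ValueError(f"unrecognized voice value: {voice!r}")
```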