This project re-implements GPT-model inference from index-tts on top of the vllm library, accelerating index-tts inference.
Inference speed improvements (Index-TTS-v1) on a single RTX 4090 are as follows:
- With `gpu_memory_utilization` set to 0.25 (about 5 GB of VRAM), a concurrency of around 16 was tested without issues (refer to `simple_test.py` for the speed test script)
- Supports deployment with `docker compose up`
- `/audio/speech` API path for OpenAI compatibility
- `/audio/voices` API path to get the list of voices/characters

Clone the repository:

```bash
git clone https://github.com/Ksuriuri/index-tts-vllm.git
cd index-tts-vllm
```
```bash
conda create -n index-tts-vllm python=3.12
conda activate index-tts-vllm
```
PyTorch version 2.8.0 is required (corresponding to vllm 0.10.2). For specific installation instructions, please refer to the PyTorch official website.
```bash
pip install -r requirements.txt
```
(Recommended) Download the corresponding version of the model weights to the checkpoints/ directory:
```bash
# Index-TTS
modelscope download --model kusuriuri/Index-TTS-vLLM --local_dir ./checkpoints/Index-TTS-vLLM

# IndexTTS-1.5
modelscope download --model kusuriuri/Index-TTS-1.5-vLLM --local_dir ./checkpoints/Index-TTS-1.5-vLLM

# IndexTTS-2
modelscope download --model kusuriuri/IndexTTS-2-vLLM --local_dir ./checkpoints/IndexTTS-2-vLLM
```
(Optional, not recommended) You can also use convert_hf_format.sh to convert the official weight files yourself:
```bash
bash convert_hf_format.sh /path/to/your/model_dir
```
Run the corresponding version:
```bash
# Index-TTS 1.0
python webui.py

# IndexTTS-1.5
python webui.py --version 1.5

# IndexTTS-2
python webui_v2.py
```
The first launch may take some time while the CUDA kernels for BigVGAN are compiled.
The API is encapsulated using FastAPI. Here is an example of how to start it:
```bash
# Index-TTS-1.0/1.5
python api_server.py

# IndexTTS-2
python api_server_v2.py
```
Startup arguments:

- `--model_dir`: Required, the path to the model weights
- `--host`: Service IP address, defaults to `0.0.0.0`
- `--port`: Service port, defaults to `6006`
- `--gpu_memory_utilization`: vllm GPU memory utilization, defaults to `0.25`

For request examples, see `api_example.py` and `api_example_v2.py`.

OpenAI-compatible endpoints:

- `/audio/speech` API path for OpenAI compatibility
- `/audio/voices` API path to get the list of voices/characters

For details, see: createSpeech
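A minimal client sketch for the OpenAI-compatible endpoint, using only the standard library. The payload fields follow OpenAI's createSpeech schema; the model and voice names below are placeholders, so query `/audio/voices` on your own deployment for the real ones:

```python
import json
import urllib.request

# Default host/port of the API server; adjust to your deployment.
API_URL = "http://127.0.0.1:6006/audio/speech"

def build_speech_request(text: str, voice: str) -> urllib.request.Request:
    """Build an OpenAI-style POST /audio/speech request (createSpeech schema)."""
    payload = json.dumps({
        "model": "index-tts",  # placeholder model name; check your server
        "input": text,
        "voice": voice,        # placeholder; list real voices via /audio/voices
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_speech_request("Hello from index-tts-vllm!", "your_voice")
# With the server running, send the request and save the audio:
# with urllib.request.urlopen(req) as resp:
#     open("output.wav", "wb").write(resp.read())
```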
Word Error Rate (WER) Results for IndexTTS and Baseline Models on the seed-test
| model | zh | en |
|---|---|---|
| Human | 1.254 | 2.143 |
| index-tts (num_beams=3) | 1.005 | 1.943 |
| index-tts (num_beams=1) | 1.107 | 2.032 |
| index-tts-vllm | 1.12 | 1.987 |
WER is essentially on par with the original project.
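For context, the WER values in the table are word-level edit distance divided by reference length. A minimal implementation for illustration (not the evaluation script used for the table above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length,
    computed via Levenshtein distance over whitespace-split words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

(For Chinese, evaluation is typically done at the character level rather than on whitespace-split words.)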
Refer to `simple_test.py`; the API service needs to be started first.
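A concurrency test of this kind amounts to firing many synthesis requests at once and timing the batch. A minimal harness sketch, where the `task` callable would wrap a real API call (a `sleep` stands in for it here):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_concurrent(task, n_requests: int, concurrency: int) -> float:
    """Run `task` n_requests times on a pool of `concurrency` worker threads;
    return total wall-clock seconds for the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(task) for _ in range(n_requests)]
        for fut in futures:
            fut.result()  # re-raise any exception from a worker
    return time.perf_counter() - start

# Stand-in for a real synthesis request (e.g. a POST to /audio/speech):
elapsed = run_concurrent(lambda: time.sleep(0.05), n_requests=16, concurrency=16)
```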