Model overview
This is a pretrained model built on GPT-SoVITS, a speech synthesis framework that combines a GPT (Generative Pre-trained Transformer) model with SoVITS voice conversion technology. The framework needs only a small amount of sample audio to produce high-quality voice cloning and text-to-speech output.
The model was trained on a single recording of Xu Silong speaking live. The audio was denoised and then segmented, and training ran on an NVIDIA GeForce RTX 4090.
Included files
The release contains two files:
- 虚似龙-e10.ckpt
- 虚似龙_e4_s32.pth
虚似龙-e10.ckpt
- File type: CKPT
- File size: 148 MB (155,312,957 bytes)
- SHA256:
AC74F39FDC648BF09FDCC34B7262E201B892715A1D0EF654EE5A986318704017
虚似龙_e4_s32.pth
- File type: PTH
- File size: 81.0 MB (85,007,879 bytes)
- SHA256:
3C3ADF3C717B9DE0EF2676359C775B899B51F82FCC6B25F666F82235C4CBB648
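To confirm that a download is intact, the byte sizes and SHA256 values listed above can be checked with a short script. The following is a minimal sketch, not part of the release: the helper names `sha256_of` and `verify` are hypothetical, and it assumes both files sit in the current working directory.

```python
import hashlib
from pathlib import Path

# Expected byte sizes and SHA256 digests, copied from the file listing above.
EXPECTED = {
    "虚似龙-e10.ckpt": (
        155_312_957,
        "AC74F39FDC648BF09FDCC34B7262E201B892715A1D0EF654EE5A986318704017",
    ),
    "虚似龙_e4_s32.pth": (
        85_007_879,
        "3C3ADF3C717B9DE0EF2676359C775B899B51F82FCC6B25F666F82235C4CBB648",
    ),
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify(path, expected_size, expected_sha256):
    """Return True only if both the byte size and the SHA256 digest match."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False  # cheap size check first, so a truncated file is not hashed
    return sha256_of(path) == expected_sha256.upper()


if __name__ == "__main__":
    download_dir = Path(".")  # assumption: the files were saved here
    for name, (size, digest) in EXPECTED.items():
        target = download_dir / name
        if not target.exists():
            print(f"{name}: not found")
            continue
        status = "OK" if verify(target, size, digest) else "MISMATCH"
        print(f"{name}: {status}")
```

A MISMATCH result usually points to an interrupted or corrupted download, in which case the file should be fetched again before use.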
Running it locally
If you have a graphics card at home and already know your way around AI models, you can download the files and run the model locally just for fun.
Base project
The model is based on the GPT-SoVITS project by RVC-Boss.