Unofficial PyTorch implementation of Higgs Audio V2 Tokenizer with HuBERT semantic features. Complete training pipeline for semantic-acoustic audio tokenization with 960x downsampling and 8-layer RVQ.
Unofficial PyTorch implementation of VALL-E: zero-shot text-to-speech and voice cloning using neural codec language models. Train and synthesize speech from text with a single reference audio.
Hide digital data inside speech-shaped audio that survives Zoom, Discord, WhatsApp, and cellular voice. Reproducible Pareto curve of six trained codecs spanning 76 bps (cellular) to 3196 bps (Zoom-class) with listenable demos.
Audio-driven facial animation via neural codec features and adaptive channel-grouped losses. Engineered for ultra-fast, production-ready inference. A neural audio-to-animation conversion engine for AI avatars.
A from-scratch PyTorch implementation of a neural audio codec (Encodec/SoundStream-style) at 3.2 kbps on LibriSpeech, with experiments on perceptual loss for phase recovery in GAN-less settings.
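
Several of the descriptions above quote a downsampling factor, a number of RVQ layers, or a bitrate. As a rough sketch of how those figures relate, the snippet below computes the bitrate of a residual-vector-quantized codec from its frame rate and codebook sizes. The function name `rvq_bitrate_bps`, the 24 kHz sample rate, and the 1024-entry codebooks are illustrative assumptions, not values taken from any of the listed repositories; only the 960x downsampling and 8 layers come from the text.

```python
import math


def rvq_bitrate_bps(sample_rate_hz, downsample_factor, n_codebooks, codebook_size):
    """Bitrate of an RVQ codec, assuming one index per codebook per frame.

    Hypothetical helper for illustration only.
    """
    # Each latent frame covers `downsample_factor` input samples.
    frames_per_second = sample_rate_hz / downsample_factor
    # Each codebook index costs log2(codebook_size) bits.
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    return frames_per_second * bits_per_frame


# 960x downsampling and 8 RVQ layers (from the description above);
# 24 kHz audio and 1024-entry codebooks are assumed here.
print(rvq_bitrate_bps(24_000, 960, 8, 1024))
```

Under these assumptions the codec emits 25 frames per second at 80 bits per frame. A different sample rate or codebook size changes the result proportionally.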