A PyTorch-based Speech Toolkit
Updated Apr 14, 2025 - Python
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Foundation Architecture for (M)LLMs
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
WaveNet vocoder
PyTorch implementation of convolutional neural network-based text-to-speech synthesis models
AI-powered speech denoising and enhancement
Controllable and fast Text-to-Speech for over 7000 languages!
SincNet is a neural architecture for efficiently processing raw audio samples.
General Speech Restoration
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Speech, Language, Audio, Music Processing with Large Language Model
A neural network for end-to-end speech denoising
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-Lite, ONNX, and real-time audio processing support.
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
🔉 spafe: Simplified Python Audio Features Extraction
UniSpeech - Large Scale Self-Supervised Learning for Speech
A Python wrapper for Speech Signal Processing Toolkit (SPTK).
Problem Agnostic Speech Encoder