LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Python 164 18 Updated Mar 14, 2025

Vector (and Scalar) Quantization, in PyTorch

Python 3,018 243 Updated Mar 11, 2025

Official Jax Implementation of MaskGIT

Jupyter Notebook 492 50 Updated Nov 18, 2022

[ICLR2025] Halton Scheduler for Masked Generative Image Transformer

Python 197 21 Updated Feb 27, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 2,412 184 Updated Feb 14, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 2,171 105 Updated Jan 2, 2025

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 335 21 Updated Jan 14, 2025

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python 83 3 Updated Dec 3, 2024

Automatically Update Text-to-Speech (TTS) Papers Daily Using GitHub Actions (Updates Every 12 Hours)

Python 385 23 Updated Mar 15, 2025

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 3,213 278 Updated Nov 5, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 10,362 1,422 Updated Mar 15, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 7,806 633 Updated Mar 13, 2025

Official Code Implementation for 'A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models'

Python 16 Updated Jul 24, 2024

Voice Conversion With Just Nearest Neighbors

Python 474 67 Updated Mar 18, 2024

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,058 77 Updated Mar 2, 2025

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python 76 9 Updated Mar 12, 2025

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Python 2,249 396 Updated Mar 15, 2025

Inference and training library for high-quality TTS models.

Python 5,128 541 Updated Dec 10, 2024

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,155 164 Updated Feb 13, 2025
Python 36 2 Updated Sep 19, 2024

Official Implementation for "Consistency Flow Matching: Defining Straight Flows with Velocity Consistency"

Python 196 6 Updated Jan 17, 2025

LLM101n: Let's build a Storyteller

32,555 1,774 Updated Aug 1, 2024

NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates @ INTERSPEECH 2022

Python 285 23 Updated Sep 16, 2023

A playbook for systematically maximizing the performance of deep learning models.

28,135 2,319 Updated Jun 18, 2024

Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech, ICASSP 2023

Python 212 13 Updated Mar 13, 2023

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 892 107 Updated Aug 7, 2024

A Non-Autoregressive End-to-End Text-to-Speech (text-to-wav), supporting a family of SOTA unsupervised duration modelings. This project grows with the research community, aiming to achieve the ulti…

Python 146 19 Updated Jun 6, 2022

A Non-Autoregressive Transformer based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, …

Python 325 41 Updated Sep 24, 2022