A-MEM: Agentic Memory for LLM Agents

Python 110 14 Updated Mar 7, 2025

MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Python 289 6 Updated Mar 12, 2025

Explore the Multimodal “Aha Moment” on 2B Model

Python 453 15 Updated Mar 10, 2025

Subjects200K dataset

Jupyter Notebook 100 3 Updated Jan 17, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 8,601 612 Updated Mar 7, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 43,902 5,371 Updated Mar 12, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 1,359 76 Updated Mar 12, 2025

Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)

Python 165 11 Updated Jul 5, 2024

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 141,072 28,255 Updated Mar 12, 2025

PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437

Python 911 46 Updated Feb 25, 2025

Official PyTorch Implementation of "History-Guided Video Diffusion"

Python 218 8 Updated Mar 6, 2025

Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python 767 39 Updated Mar 6, 2025

EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling.

Python 77 2 Updated Feb 25, 2025

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,363 88 Updated Mar 10, 2025

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Python 432 39 Updated Mar 12, 2025

Solve Visual Understanding with Reinforced VLMs

Python 4,030 249 Updated Mar 9, 2025

Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"

Python 161 8 Updated Feb 24, 2025

Code for the paper "Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning"

Python 7 Updated Oct 15, 2024

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,690 2,193 Updated Feb 1, 2025

Arbitrary-steps Image Super-resolution via Diffusion Inversion (CVPR 2025)

Python 979 62 Updated Mar 7, 2025

Towards training VQ-VAE models robustly!

Python 55 2 Updated Jan 9, 2025

Implementation of the paper "MaskBit: Embedding-free Image Generation from Bit Tokens"

Jupyter Notebook 57 3 Updated Feb 27, 2025

Efficient vision foundation models for high-resolution generation and perception.

Python 2,705 214 Updated Jan 24, 2025

Unofficial Implementation of E-LatentLPIPS(Ensembled-LatentLPIPS) of Diffusion2GAN

Python 40 2 Updated Jul 11, 2024

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 2,161 104 Updated Jan 2, 2025

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python 83 2 Updated Dec 3, 2024

Official inference repo for FLUX.1 models

Python 20,757 1,462 Updated Feb 6, 2025
Python 1,043 79 Updated Jan 8, 2025

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Python 783 48 Updated Mar 12, 2024