yeziyang1992

yeziyang yeziyang1992

22 followers · 4 following

Stars

harry0703 / AudioNotes

Quickly extract content from audio and video and organize it into structured Markdown notes

Python 1,537 225 Updated Jul 26, 2024

Ola-Omni / Ola

Ola: Pushing the Frontiers of Omni-Modal Language Model

Python 306 14 Updated Feb 28, 2025

IDEA-Research / Grounded-SAM-2

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2

Jupyter Notebook 1,838 175 Updated Dec 21, 2024

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 22,813 2,054 Updated Mar 15, 2025

bytedance / tarsier

Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, with strong general video understanding capability.

Python 318 19 Updated Feb 17, 2025

EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module: lmms-eval.

Python 2,202 219 Updated Mar 15, 2025

facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.

Jupyter Notebook 9,991 900 Updated Aug 7, 2024

zjysteven / lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

Python 271 26 Updated Feb 25, 2025

xindoo / openai-examples

Jupyter Notebook 6 4 Updated Nov 17, 2024

soCzech / TransNetV2

TransNet V2: Shot Boundary Detection Neural Network

Python 581 97 Updated Dec 4, 2023

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 21,817 2,396 Updated Aug 12, 2024

black-forest-labs / flux

Official inference repo for FLUX.1 models

Python 20,832 1,468 Updated Feb 6, 2025

ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python 924 126 Updated Apr 12, 2024

QwenLM / Qwen2.5-VL

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 8,704 618 Updated Mar 7, 2025

modelscope / ms-swift

Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…

Python 6,273 536 Updated Mar 15, 2025