Quickly extract audio and video content and organize it into structured Markdown notes

Python 1,537 225 Updated Jul 26, 2024

Ola: Pushing the Frontiers of Omni-Modal Language Model

Python 306 14 Updated Feb 28, 2025

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2

Jupyter Notebook 1,838 175 Updated Dec 21, 2024

Fully open reproduction of DeepSeek-R1

Python 22,813 2,054 Updated Mar 15, 2025

Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, with strong general video understanding capability.

Python 318 19 Updated Feb 17, 2025

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.

Python 2,202 219 Updated Mar 15, 2025

PyTorch code and models for the DINOv2 self-supervised learning method.

Jupyter Notebook 9,991 900 Updated Aug 7, 2024

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

Python 271 26 Updated Feb 25, 2025
Jupyter Notebook 6 4 Updated Nov 17, 2024

TransNet V2: Shot Boundary Detection Neural Network

Python 581 97 Updated Dec 4, 2023

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 21,817 2,396 Updated Aug 12, 2024

Official inference repo for FLUX.1 models

Python 20,832 1,468 Updated Feb 6, 2025

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python 924 126 Updated Apr 12, 2024

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 8,704 618 Updated Mar 7, 2025

Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…

Python 6,273 536 Updated Mar 15, 2025

LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning

Python 1,896 72 Updated Jan 22, 2025

Inpaint anything using Segment Anything and inpainting models.

Jupyter Notebook 6,993 596 Updated Feb 29, 2024

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

2,042 95 Updated Jan 26, 2025

Collection of AWESOME vision-language models for vision tasks

2,574 200 Updated Dec 3, 2024

A state-of-the-art-level open visual language model | multimodal pretrained model

Python 6,421 427 Updated May 29, 2024

Sync WeChat Reading (微信读书) highlights to Notion

Python 2,635 6,333 Updated Mar 15, 2025

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Python 7,473 823 Updated Mar 15, 2025
Python 41 7 Updated Apr 15, 2023

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

Python 7,631 1,250 Updated Jul 23, 2024

We use MixedWM38, the mixed-type wafer defect pattern dataset, for wafer defect pattern recognition with visual transformers.

Jupyter Notebook 28 7 Updated Oct 1, 2023

Bag of Visual Feature with Hamming Embedding, Reranking

Python 54 17 Updated Jun 20, 2018

Unofficial PyTorch implementation of "Meta Pseudo Labels"

Python 385 70 Updated Jan 18, 2024

Code for "MultiGrain: a unified image embedding for classes and instances"

Python 232 38 Updated Nov 6, 2019

📈 Currently the largest industrial defect detection database and paper collection. Constantly summarizing open-source datasets and critical papers in the field of surface defect research, which are of great importance.

Python 3,446 555 Updated May 27, 2024

A static blog built with NextJS and the Notion API, supporting multiple deployment options. No server required, zero-barrier site setup; designed for Notion and all creators.

JavaScript 8,843 11,946 Updated Mar 14, 2025