Stars
Ola: Pushing the Frontiers of Omni-Modal Language Model
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Fully open reproduction of DeepSeek-R1
Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, together with good capability of general video understanding.
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
PyTorch code and models for the DINOv2 self-supervised learning method.
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
TransNet V2: Shot Boundary Detection Neural Network
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Official inference repo for FLUX.1 models
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Inpaint anything using Segment Anything and inpainting models.
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Collection of AWESOME vision-language models for vision tasks
A state-of-the-art-level open visual language model | multimodal pretrained model
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
We use MixedWM38, the mixed-type wafer defect pattern dataset, for wafer defect pattern recognition with visual transformers.
willard-yuan / SoTu
Forked from yzhangcs/SoTu. Bag of Visual Feature with Hamming Embedding, Reranking
Unofficial PyTorch implementation of "Meta Pseudo Labels"
Code for "MultiGrain: a unified image embedding for classes and instances"
📈 The largest industrial defect detection database and paper collection to date. Constantly summarizing open source datasets and critical papers in the field of surface defect research which are of great importance.
A static blog built with NextJS + Notion API, supporting multiple deployment options. No server required, zero-barrier website setup, designed for Notion and all creators.