Stars
EVE Series: Encoder-Free Vision-Language Models from BAAI
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
[NeurIPS 2023] HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
PyTorch implementation of "SMITE: Segment Me In TimE"
Code for the AAAI 2025 paper "VIoTGPT: Learning to Schedule Vision Tools in LLMs towards Intelligent Video Internet of Things"
An official implementation of "Hulk: A Universal Knowledge Translator for Human-Centric Tasks"
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
High-resolution models for human tasks.
Free, simple, and intuitive online database diagram editor and SQL generator.
[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
A programming language exclusively designed for cybersecurity
RTMPose series (RTMPose, DWPose, RTMO, RTMW) without mmcv, mmpose, mmdet etc.
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Robust Speech Recognition via Large-Scale Weak Supervision
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
[NeurIPS 2024] VideoTetris: Towards Compositional Text-To-Video Generation
A generative speech model for daily dialogue.