EVE Series: Encoder-Free Vision-Language Models from BAAI
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
[NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing, built for high-resolution imagery analysis with multi-target pixel grounding capabilities.
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
[ICLR 2024 Spotlight 🔥] [Best Paper Award, SoCal NLP 2023 🏆] Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
[ICASSP 2025] Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification"
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
[ICML 2024] Official code repo for the paper "Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data"
Official repository for "Harnessing Vision Models for Time Series Analysis: A Survey".
Official code for "Can We Talk Models Into Seeing the World Differently?" (ICLR 2025).
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
[ECCV 2024] Official implementation of "Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models"
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
[ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities"
Official repo of the paper "Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models"
Code for Post-hoc Probabilistic Vision-Language Models
PicQ: A demo of MiniCPM-o 2.6 answering natural-language questions about images.
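PicQ itself is a hosted demo rather than a library. As a general illustration of the image question-answering pattern such demos wrap, here is a minimal sketch using the Hugging Face transformers visual-question-answering pipeline; the checkpoint (Salesforce/blip-vqa-base) and the image path are illustrative assumptions, not PicQ's actual MiniCPM-o 2.6 backend:

# Minimal visual question answering sketch (assumed generic setup,
# not PicQ's code). Requires: pip install transformers pillow torch
from PIL import Image
from transformers import pipeline

# Load an off-the-shelf VQA model; BLIP is used here purely for illustration.
vqa = pipeline("visual-question-answering", model="Salesforce/blip-vqa-base")

image = Image.open("example.jpg")  # hypothetical local image
result = vqa(image=image, question="What is shown in this picture?")
print(result)  # list of {"answer": ..., "score": ...} candidates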