EVE Series: Encoder-Free Vision-Language Models from BAAI
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
[NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing, built for high-resolution imagery analysis with multi-target pixel grounding capabilities.
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
[ICLR 2024 Spotlight 🔥] [Best Paper Award, SoCal NLP 2023 🏆] Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
[ICASSP 2025] Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification"
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
[ICML 2024] Official code repo for the paper "Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data"
Official repository for "Harnessing Vision Models for Time Series Analysis: A Survey".
Official code for "Can We Talk Models Into Seeing the World Differently?" (ICLR 2025).
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
[ECCV 2024] Official implementation of "Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models"
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
[ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities"
Official repo of the paper "Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models"
Code for Post-hoc Probabilistic Vision-Language Models
PicQ: A demo of MiniCPM-o 2.6 answering natural-language questions about images.
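PicQ itself is a hosted demo rather than a library. As a general illustration of the image question-answering pattern such demos wrap, here is a minimal sketch using the Hugging Face transformers visual-question-answering pipeline; the checkpoint (Salesforce/blip-vqa-base) and the image path are illustrative assumptions, not PicQ's actual MiniCPM-o 2.6 backend:

# Minimal visual question answering sketch (assumed generic setup,
# not PicQ's code). Requires: pip install transformers pillow torch
from PIL import Image
from transformers import pipeline

# Load an off-the-shelf VQA model; BLIP is used here purely for illustration.
vqa = pipeline("visual-question-answering", model="Salesforce/blip-vqa-base")

image = Image.open("example.jpg")  # hypothetical local image
result = vqa(image=image, question="What is shown in this picture?")
print(result)  # list of {"answer": ..., "score": ...} candidates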