Stars
A fork to add multimodal model training to open-r1
Witness the aha moment of VLM with less than $3.
Frontier Multimodal Foundation Models for Image and Video Understanding
This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.
A list of awesome papers and resources on recommender systems based on large language models (LLMs).
Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
[ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
A generalist foundation model for healthcare capable of handling diverse medical data modalities.
Neural Code Intelligence Survey 2024; Reading lists and resources
[Nature Reviews Bioengineering🔥] Application of Large Language Models in Medicine. A curated list of practical guide resources for Medical LLMs (Medical LLM Tree, Tables, and Papers)
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
[ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs
Code for EMNLP 2024 paper "DVD: Dynamic Contrastive Decoding for Knowledge Amplification in Multi-Document Question Answering"
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
Lightweight tool to identify data contamination in LLM evaluation
[EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences"
✨ Light and fast AI assistant. Supports Web | iOS | macOS | Android | Linux | Windows
[NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLM inference, approximates the attention with dynamic sparse computation, reducing inference latency by up to 10x for pre-filling on an …
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LoFiT: Localized Fine-tuning on LLM Representations
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
The original implementation of CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control.