SwiftAnnotate is a comprehensive auto-labeling tool designed for Text, Image, and Video data. It leverages state-of-the-art (SOTA) Vision Language Models (VLMs) and Large Language Models (LLMs) through a robust annotator-validator pipeline, ensuring high-quality, grounded annotations while minimizing hallucinations. SwiftAnnotate also supports annotation tasks like Object Detection and Segmentation through SOTA CV models like SAM2, YOLOWorld, and OWL-ViT.
- Text Processing 📝: Perform classification, summarization, and text generation with state-of-the-art NLP models. Solve real-world problems like spam detection, sentiment analysis, and content creation.
- Image Analysis 🖼️: Generate captions for images to provide meaningful descriptions. Classify images into predefined categories with high precision. Detect objects in images using models like YOLOWorld. Achieve pixel-perfect segmentation with SAM2 and OWL-ViT.
- Video Processing 🎥: Generate captions for videos with frame-level analysis and temporal understanding. Understand video content by detecting scenes and actions effortlessly.
- Quality Assurance ✅: Use a two-stage pipeline for annotation and validation to ensure high data quality. Validate outputs rigorously to maintain reliability before deployment.
- Multi-modal Support 🌐: Seamlessly process text, images, and videos within a unified framework. Combine data types for powerful multi-modal insights and applications.
- Customization 🛠️: Easily extend and adapt the framework to suit specific project needs. Integrate new models and tasks seamlessly with modular architecture.
- Developer-Friendly 👩💻👨💻: Easy-to-use package and detailed documentation to get started quickly.
To install SwiftAnnotate from PyPI and set up the project environment, follow these steps:
- Install from PyPI: Run the following command to install the package directly:
pip install swiftannotate
- For Development (Using Poetry): If you want to contribute or explore the project codebase, ensure you have Poetry installed. Follow the steps given below:
git clone https://github.com/yasho191/SwiftAnnotate
cd SwiftAnnotate
poetry install
You're now ready to explore and develop SwiftAnnotate!
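As a quick sanity check after installing, you can confirm the package is importable and query its installed version through Python's standard packaging metadata (a generic check, not a SwiftAnnotate-specific command):
# Verify the installation: import the package and print the installed distribution version
import importlib.metadata
import swiftannotate  # should import without errors
print(importlib.metadata.version("swiftannotate"))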
The annotator-validator pipeline ensures high-quality annotations through a two-stage process:
Stage 1: Annotation
- Primary LLM/VLM generates initial annotations
- Configurable model selection (OpenAI, Google Gemini, Anthropic, Mistral, Qwen-VL)
Stage 2: Validation
- Secondary model validates initial annotations
- Cross-checks for hallucinations and factual accuracy
- Provides confidence scores and correction suggestions
- Option to regenerate annotations if validation fails
- Structured output format for consistency
Benefits
- Reduced hallucinations through two-stage verification
- Higher annotation quality and consistency
- Automated quality control
- Traceable annotation process
The pipeline can be customized with different model combinations and validation thresholds based on specific use cases.
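For illustration, the control flow of such a pipeline can be sketched as below. This is a minimal, hypothetical sketch and not SwiftAnnotate's internal API: the annotate and validate callables, the 0.8 confidence threshold, and the retry count are assumptions chosen for clarity.
# Hypothetical sketch of an annotator-validator loop (not the library's actual API)
def run_annotation_pipeline(sample, annotate, validate, threshold=0.8, max_retries=2):
    """Generate an annotation, validate it, and regenerate if validation fails."""
    feedback = None
    for attempt in range(1, max_retries + 2):
        annotation = annotate(sample, feedback=feedback)     # Stage 1: primary LLM/VLM annotates
        confidence, feedback = validate(sample, annotation)  # Stage 2: secondary model cross-checks
        if confidence >= threshold:                          # accept once the validator is confident
            return {"annotation": annotation, "confidence": confidence, "attempts": attempt}
    # Validation never passed: keep the last annotation but flag it for manual review
    return {"annotation": annotation, "confidence": confidence, "attempts": attempt, "needs_review": True}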
Currently, we support OpenAI, Google Gemini, Ollama, and Qwen2-VL for image captioning. As Qwen2-VL is not yet available on Ollama, it is supported through Hugging Face. To get started quickly, refer to the code snippets shown below.
OpenAI
import os
from swiftannotate.image import OpenAIForImageCaptioning
caption_model = "gpt-4o"
validation_model = "gpt-4o-mini"
api_key = "<YOUR_OPENAI_API_KEY>"
BASE_DIR = "<IMAGE_DIR>"
image_paths = [os.path.join(BASE_DIR, image) for image in os.listdir(BASE_DIR)]
image_captioning_pipeline = OpenAIForImageCaptioning(
    caption_model=caption_model,
    validation_model=validation_model,
    api_key=api_key,
    output_file="image_captioning_output.json"
)
results = image_captioning_pipeline.generate(image_paths=image_paths)
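The annotations are also written to the output_file given above; assuming that file is standard JSON (as its extension suggests), you can reload and inspect the records like this (no particular field names are assumed):
# Reload the saved annotations and print each record for a quick manual review
import json

with open("image_captioning_output.json") as f:
    saved_annotations = json.load(f)

for record in saved_annotations:
    print(record)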
Qwen2-VL
You can use any variant of Qwen2-VL (7B, 72B) depending on the available resources. vLLM inference is not currently supported, but it will be available soon.
import os
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers import BitsAndBytesConfig
from swiftannotate.image import Qwen2VLForImageCaptioning
# Load the images
BASE_DIR = "<IMAGE_DIR>"
image_paths = [os.path.join(BASE_DIR, image) for image in os.listdir(BASE_DIR)]
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True
)
model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    device_map="auto",
    torch_dtype="auto",
    quantization_config=quantization_config
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
# Load the Caption Model
captioning_pipeline = Qwen2VLForImageCaptioning(
    model=model,
    processor=processor,
    output_file="image_captioning_output.json"
)
results = captioning_pipeline.generate(image_paths)
We welcome contributions to SwiftAnnotate! There are several ways you can help improve the project:
- Enhanced Prompts: Contribute better validation and annotation prompts for improved accuracy
- File Support: Add support for additional input/output file formats
- Cloud Integration: Implement AWS S3 storage support and other cloud services
- Validation Strategies: Develop new validation approaches for different annotation tasks
- Model Support: Integrate additional LLMs and VLMs
- Documentation: Improve guides and examples
Please submit a pull request with your contributions or open an issue to discuss new features.