🚀 Segment Anything Model 2 (SAM 2) is out, advancing visual segmentation! 🖼️ Meta's SAM 2 is a groundbreaking foundation model for promptable visual segmentation in images and videos. 🎥 With its simple transformer architecture and streaming memory, it enables real-time video processing. ⚡

Key features:
* Extends to video by treating images as single-frame videos 🎞️
* Strong performance across diverse tasks and visual domains 💪
* Promptable visual segmentation 🎯

Check out these impressive demos:
* Tracking objects for video effects 🎬
* Segmenting moving cells in microscope footage for scientific research 🔬
* Handling complex, fast-moving objects 🏎️

#ComputerVision #AI #MachineLearning #VisualSegmentation
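For anyone who wants to try it hands-on, here is a minimal sketch of single-image prompting, following the patterns in the public facebookresearch/sam2 repository; the config and checkpoint file names are assumptions, so check the repo for the exact ones:

```python
# Minimal sketch of promptable image segmentation with SAM 2.
# Config/checkpoint names below are assumptions; see the sam2 repo.
import numpy as np
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt")
)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # replace with a real RGB frame
predictor.set_image(image)

# One positive click (x, y) prompts the model to segment the object under it.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),  # 1 = positive click, 0 = negative
)
print(masks.shape, scores)
```

For video, the same prompt types apply per frame, with the streaming memory handling propagation; a sketch of that workflow appears further down this page.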
-
In the recent wave of open-source breakthroughs, Hugging Face shipped an 𝗔𝗻𝗶𝗺𝗮𝘁𝗲𝗗𝗶𝗳𝗳 pipeline in diffusers that lets a diffusion model generate subsequent frames with the help of pre-trained motion-adapter weights. 🚀

To demonstrate this modular approach, we have built an open-source Hugging Face Space. The space generates 2-second animations (16 frames), with the following customizations available:

𝗡𝗲𝗴𝗮𝘁𝗶𝘃𝗲 𝗣𝗿𝗼𝗺𝗽𝘁: Guides the diffusion process away from undesired features and styles.

𝗚𝘂𝗶𝗱𝗮𝗻𝗰𝗲 𝗦𝗰𝗮𝗹𝗲: Determines how strongly the model adheres to the prompt versus exploring randomness (7.5 is a good balance for the diffusion model selected in the space).

𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗦𝘁𝗲𝗽𝘀: The number of iterations used during the diffusion process to gradually denoise a sample; it plays a crucial role in the quality of generated frames. (Our space allows a maximum of 24 steps.)

𝗔𝗱𝗮𝗽𝘁𝗲𝗿 𝗖𝗵𝗼𝗶𝗰𝗲: LoRA weights that define the motion or camera perspective for the AnimateDiff pipeline.

Space link: https://lnkd.in/d83dkHVt

#stablediffusion #huggingface #generativeai
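To show how those knobs map onto code, here is a sketch of the diffusers AnimateDiff pipeline; the base-model and adapter IDs are illustrative, not necessarily the ones used in our space:

```python
# Sketch of the diffusers AnimateDiff pipeline described above.
# Model IDs are illustrative; swap in the checkpoints your space actually uses.
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
pipe = AnimateDiffPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",  # any SD 1.5 base model
    motion_adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

out = pipe(
    prompt="a corgi running on the beach, golden hour",
    negative_prompt="blurry, low quality",  # steer away from unwanted styles
    num_frames=16,            # ~2 seconds of animation
    guidance_scale=7.5,       # adherence to the prompt vs. randomness
    num_inference_steps=24,   # denoising iterations; more = finer detail
)
export_to_gif(out.frames[0], "animation.gif")
```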
-
Exciting news from Meta AI! SAM 2 (Segment Anything Model 2) is here!

Meta AI has unveiled SAM 2, a groundbreaking advancement in computer vision, specifically in image and video segmentation.

Applications: SAM 2 is incredibly versatile and can be applied across numerous fields: driving data, microscopy, egocentric video, robotic surgery, underwater images, paintings, medical imaging, remote sensing, motion segmentation, camouflaged object detection, and many more!

This unified model for Promptable Visual Segmentation (PVS) of images and videos is built on a simple yet powerful transformer architecture with streaming memory for real-time video processing. It handles inputs like points, boxes, or masks on any frame to define segments of interest, predicting spatio-temporal masks (masklets) with precision.

Core components of SAM 2:
1. Image Encoder: Provides unconditioned tokens (feature embeddings) representing each video frame, which downstream components build on.
2. Memory Attention: Conditions the current frame's features on past frames' features, predictions, and any new prompts, drawing on the memory bank to incorporate historical context.
3. Memory Bank: Maintains information on past predictions with a dual-queue system: one queue for up to N recent frames and another for up to M prompted frames.
4. Mask Decoder: Uses "two-way" transformer blocks to update prompt and frame embeddings. For ambiguous prompts it predicts multiple masks, ensuring the most valid output is propagated.
5. Prompt Encoder: Encodes clicks (positive/negative), bounding boxes, or masks to define an object's extent within a frame.

I'm particularly fascinated by how these components work together to make SAM 2 a powerful model for video segmentation tasks. The model's ability to handle occlusions and ambiguous prompts with such elegance is awe-inspiring.

What other use cases can you see for this model? What products could we build on top of SAM 2? I'd love to hear your thoughts! 🤔 Let's dive into a discussion about the potential of SAM 2 in revolutionizing various industries. Your insights and ideas are welcome! 😊

Important links for further exploration:
Paper: https://lnkd.in/gNdZK725
Demo: https://lnkd.in/g5b95W4H
Code: https://lnkd.in/gZdTkiTs
Website: https://ai.meta.com/sam2

#AI #MachineLearning #SAM2 #TechInnovation #ComputerVision #AIEngineering #DataScience #Research #AIEngineer #MLEngineer #DataScientist #ProductDevelopment #MetaAI #Researchers #IIT #IIM #AcademicResearch #TechCommunity #ArtificialIntelligence #DeepLearning #NeuralNetworks #AIResearch #STEM #Innovation #AdvancedTechnology #TechLeadership
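To make the component story concrete, here is a hedged sketch of the video workflow (prompt one frame, let memory attention propagate masklets), following the patterns in the SAM 2 README; method and file names have varied slightly across repo versions, so treat this as illustrative:

```python
# Hedged sketch of SAM 2's video workflow; names follow the public README
# but may differ by repo version.
import numpy as np
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt"
)

# init_state reads the frames and sets up the streaming memory bank.
state = predictor.init_state(video_path="./video_frames")

# Prompt object 1 with a single positive click on frame 0.
predictor.add_new_points_or_box(
    inference_state=state, frame_idx=0, obj_id=1,
    points=np.array([[320, 240]], dtype=np.float32),
    labels=np.array([1], dtype=np.int32),  # 1 = positive click
)

# Memory attention conditions each new frame on past predictions,
# yielding a spatio-temporal mask (masklet) per tracked object.
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
    masks = (mask_logits > 0.0).cpu().numpy()
```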
-
Greetings,

Mixed-modal models are a subset of multimodal models that combine inputs from different modalities, such as text, images, audio, and video, to enhance the understanding and performance of AI systems.

Chameleon is the latest family of mixed-modal models from FAIR at Meta. It can handle visual question answering, image captioning, text generation, image generation, and long-form mixed-modal generation, all in a single model, and it matches or exceeds the performance of much larger models like Gemini Pro and GPT-4V in long-form mixed-modal generation.

The architecture largely follows LLaMA-2: it uses RMSNorm for normalization, the SwiGLU activation function, and rotary positional embeddings (RoPE). A toy sketch of the first two blocks follows below.

Check out: https://lnkd.in/dJhru-fv

#ai #ml #llm #model #meta #genai
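For readers who want to see what those blocks look like, here is a toy PyTorch sketch of RMSNorm and a SwiGLU feed-forward using their standard published definitions; this is illustrative, not Chameleon's actual code:

```python
# Toy sketch of RMSNorm and a SwiGLU feed-forward (standard definitions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Scale by the inverse root-mean-square of the features
        # (no mean subtraction, unlike LayerNorm).
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate branch
        self.w2 = nn.Linear(dim, hidden, bias=False)  # value branch
        self.w3 = nn.Linear(hidden, dim, bias=False)  # projection back

    def forward(self, x):
        # silu(gate) * value, then project back to the model dimension.
        return self.w3(F.silu(self.w1(x)) * self.w2(x))

x = torch.randn(2, 16, 512)
y = SwiGLU(512, 1376)(RMSNorm(512)(x))  # shape: (2, 16, 512)
```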
-
Video editors, imagine removing backgrounds from your videos with one click, cutting up to 90% of your masking time and achieving even better results than rotoscoping or ultra keying.

Meta AI just dropped SAM 2, which will let you do:
✅ Real-time object segmentation in videos
✅ Roughly 3x faster interaction time
✅ Seamless object tracking across frames
✅ Works on ANY object in ANY video

This model is a genuine game-changer and is going to make your workflow far more efficient!
-
Step into the future of weather reporting with The Weather Company's AI tool! 🌀 This tool automates the creation of hyperlocal weather videos, complete with captions and data, and even offers personalized weather updates with an AI voice option. By integrating with the Max platform, it provides precise location-specific graphics for meteorologists, making real-time weather sharing easier than ever. Stay ahead of the curve and experience the power of technology in weather forecasting. #WeatherTech #AIRevolution 📈☀️⛈️ #ai #artificialintelligence #love #humanitywins
-
🔥 SynCamMaster is novel research on synchronizing multi-camera video generation from diverse viewpoints. It leverages recent advancements in video diffusion models to ensure dynamic consistency across viewpoints, making it ideal for applications like virtual filming.

SynCamMaster stands out by generating open-world videos from arbitrary viewpoints, conditioned on 6-DoF camera poses. It introduces a multi-view synchronization module to maintain appearance and geometry consistency across these viewpoints. The tool also uses a hybrid training scheme, combining multi-camera images and monocular videos with Unreal Engine-rendered videos, to overcome the scarcity of high-quality training data.

The release includes a multi-view synchronized video dataset, SynCamVideo-Dataset, and the code is available for public use.

🔗 Paper, code and project page: https://lnkd.in/eQNY-VMW

⤵ Helpful? Follow me and join ⚡️ AI Pulse (https://lnkd.in/eWudwDsd) for daily, curated, bite-sized updates on AI, focused on what truly matters to keep you ahead of the curve 🔥
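As a purely illustrative toy (not SynCamMaster's actual code), a multi-view synchronization module can be pictured as attention across the camera axis with the 6-DoF pose injected into each view's features; every shape and name below is made up:

```python
# Toy illustration of cross-view synchronization: attend over the camera
# dimension so per-view features agree on appearance/geometry.
# All shapes and names are hypothetical, not from the SynCamMaster release.
import torch
import torch.nn as nn

class CrossViewSync(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pose_proj = nn.Linear(12, dim)  # flattened 3x4 camera extrinsics

    def forward(self, feats, poses):
        # feats: (batch, views, tokens, dim); poses: (batch, views, 12)
        b, v, t, d = feats.shape
        feats = feats + self.pose_proj(poses).unsqueeze(2)  # inject 6-DoF pose
        x = feats.transpose(1, 2).reshape(b * t, v, d)      # attend over views
        x, _ = self.attn(x, x, x)
        return x.reshape(b, t, v, d).transpose(1, 2)

sync = CrossViewSync(dim=320)
out = sync(torch.randn(1, 4, 64, 320), torch.randn(1, 4, 12))  # 4 cameras
```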
-
🔥 “𝐆𝐨-𝐰𝐢𝐭𝐡-𝐭𝐡𝐞-𝐅𝐥𝐨𝐰”: 𝐌𝐨𝐭𝐢𝐨𝐧-𝐂𝐨𝐧𝐭𝐫𝐨𝐥𝐥𝐚𝐛𝐥𝐞 𝐕𝐢𝐝𝐞𝐨 𝐃𝐢𝐟𝐟𝐮𝐬𝐢𝐨𝐧 𝐌𝐨𝐝𝐞𝐥𝐬 𝐔𝐬𝐢𝐧𝐠 𝐑𝐞𝐚𝐥-𝐓𝐢𝐦𝐞 𝐖𝐚𝐫𝐩𝐞𝐝 𝐍𝐨𝐢𝐬𝐞! 🔥

A team from 𝐍𝐞𝐭𝐟𝐥𝐢𝐱 𝐄𝐲𝐞𝐥𝐢𝐧𝐞 𝐒𝐭𝐮𝐝𝐢𝐨𝐬, 𝐍𝐞𝐭𝐟𝐥𝐢𝐱, 𝐒𝐭𝐨𝐧𝐲 𝐁𝐫𝐨𝐨𝐤 𝐔𝐧𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲, 𝐔𝐧𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 𝐨𝐟 𝐌𝐚𝐫𝐲𝐥𝐚𝐧𝐝, 𝐚𝐧𝐝 𝐒𝐭𝐚𝐧𝐟𝐨𝐫𝐝 𝐔𝐧𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 has developed “𝐆𝐨-𝐰𝐢𝐭𝐡-𝐭𝐡𝐞-𝐅𝐥𝐨𝐰”, a method that adds motion control to video diffusion models via structured latent noise sampling. The approach achieves state-of-the-art performance without altering model architectures or training pipelines.

👉 𝐏𝐚𝐩𝐞𝐫: https://lnkd.in/gYYVRxrW
👉 𝐑𝐞𝐩𝐨: https://lnkd.in/gaqnqHmW

𝐊𝐞𝐲 𝐅𝐞𝐚𝐭𝐮𝐫𝐞𝐬:
✅ 𝐑𝐞𝐚𝐥-𝐓𝐢𝐦𝐞 𝐖𝐚𝐫𝐩𝐞𝐝 𝐍𝐨𝐢𝐬𝐞: A novel noise-warping algorithm replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields (see the toy sketch below).
✅ 𝐌𝐨𝐭𝐢𝐨𝐧 𝐂𝐨𝐧𝐭𝐫𝐨𝐥: Provides user-friendly motion control for local object motion, global camera movement, and motion transfer.
✅ 𝐄𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲: Fine-tunes modern video diffusion base models with minimal overhead while keeping high-quality per-frame pixel output.
✅ 𝐒𝐜𝐚𝐥𝐚𝐛𝐢𝐥𝐢𝐭𝐲: Demonstrates robust, scalable performance across various benchmarks and user studies.

#GowiththeFlow #VideoDiffusion #AI #Innovation #Technology #WELEARN.ai
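To give a feel for the core idea, here is a hedged toy sketch (not the paper's implementation) of warping the previous frame's noise along optical flow and blending in a little fresh noise so the result stays approximately Gaussian:

```python
# Toy sketch of flow-warped noise: temporally correlated noise for video
# diffusion. Illustrative only; the paper's algorithm differs in detail.
import cv2
import numpy as np

def warp_noise(prev_noise: np.ndarray, flow: np.ndarray, mix: float = 0.1):
    h, w = prev_noise.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Sample each pixel's noise from where the flow says it came from.
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    # Nearest-neighbor sampling preserves the noise distribution better
    # than bilinear, which would smooth it and shrink its variance.
    warped = cv2.remap(prev_noise, map_x, map_y, cv2.INTER_NEAREST)
    fresh = np.random.randn(*prev_noise.shape).astype(np.float32)
    # sqrt-weighted blend keeps unit variance.
    return np.sqrt(1 - mix) * warped + np.sqrt(mix) * fresh

noise = np.random.randn(64, 64, 4).astype(np.float32)   # latent-sized noise
flow = np.zeros((64, 64, 2), np.float32)
flow[..., 0] = 2.0                                       # uniform rightward motion
next_noise = warp_noise(noise, flow)
```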
-
Hello connections 👋🏻

Exploring the capabilities of MediaPipe Studio: a dive into hand gesture recognition, object detection, and pose landmark detection!

MediaPipe is an amazing open-source framework by Google that offers a comprehensive suite of tools for real-time perception. Here's a glimpse of what I've been exploring:

Hand Gesture Recognition: Detects and classifies various hand gestures in real time. Whether it's counting fingers, recognizing specific gestures, or enabling touchless interaction, the potential applications are vast and fascinating.

Object Detection: MediaPipe's object detection capabilities are robust and efficient. From identifying everyday objects to more complex scene understanding, this feature opens up numerous possibilities for automation, augmented reality, and beyond.

Pose Landmark Detection: One of the most exciting features; it tracks human body movements and landmarks with remarkable precision. This can be leveraged in areas such as fitness tracking, motion capture for animation, and even gesture-based control systems.

#AI #MachineLearning #MediaPipe #ComputerVision #HandGestureRecognition #ObjectDetection #PoseLandmarkDetection #Innovation #aimers #aimerssociety
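If you want to reproduce the hand-tracking part locally, here is a short sketch using MediaPipe's classic Python solutions API; note that MediaPipe Studio itself is built on the newer Tasks API, which differs (see Google's docs for that variant):

```python
# Webcam hand-landmark tracking with MediaPipe's classic solutions API.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)
drawer = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB; OpenCV delivers BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for lm in results.multi_hand_landmarks:
            # 21 landmarks per hand; gesture logic (finger counting, etc.)
            # can be built on top of these coordinates.
            drawer.draw_landmarks(frame, lm, mp.solutions.hands.HAND_CONNECTIONS)
    cv2.imshow("hands", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break
cap.release()
```

The same pattern works for pose: swap in mp.solutions.pose.Pose() and read results.pose_landmarks instead.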
-
🚀 Unlock the power of AI with DCGAN (Deep Convolutional GAN) and cGAN (conditional GAN), two architectures transforming image generation! 🎨✨ Both learn to turn random noise into stunning, lifelike visuals; the conditional variant additionally takes a class label or other signal to steer what gets generated. 🤖🖼️

🔍 If you're interested in learning more, check out my GitHub. A minimal generator sketch follows below.

Let's connect and share insights! 🌟

#GenerativeAI #AIArt #DeepLearning #MachineLearning #AIGeneration #AICreativity #TechInnovation #DigitalArt #AIArtistry #AIArtists #SyntheticMedia #NextGenAI
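Here is a minimal sketch of the standard DCGAN generator in PyTorch, independent of any particular repo, showing how transposed convolutions upsample a noise vector into an image:

```python
# Standard DCGAN-style generator: noise vector -> 64x64 RGB image.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim: int = 100, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 8, 4, 1, 0, bias=False),  # 1x1 -> 4x4
            nn.BatchNorm2d(ch * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, 2, 1, bias=False), # 4 -> 8
            nn.BatchNorm2d(ch * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1, bias=False), # 8 -> 16
            nn.BatchNorm2d(ch * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1, bias=False),     # 16 -> 32
            nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1, bias=False),          # 32 -> 64
            nn.Tanh(),  # outputs in [-1, 1], matching normalized training images
        )

    def forward(self, z):
        # Reshape the flat noise vector to (batch, z_dim, 1, 1) for conv input.
        return self.net(z.view(z.size(0), -1, 1, 1))

imgs = Generator()(torch.randn(8, 100))  # shape: (8, 3, 64, 64)
```

A cGAN extends this by concatenating an embedded class label to the noise vector (and to the discriminator's input) so generation can be conditioned on the label.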
-
🚀 **Transforming Perspectives with Computer Vision** 🎯

Proud to share an achievement in my journey with computer vision: successfully transforming a standard carrom board view into a **bird's-eye view** using OpenCV!

This wasn't just a technical challenge; it was a perspective shift. Converting the side view to a top-down perspective required **homography transformations**, meticulous point-mapping, and creative problem-solving. A minimal sketch of the transform appears below.

**Why does this matter?**
- 🧠 **AI Innovation:** Opens the door to AI models understanding gameplay dynamics from an ideal angle.
- 🌍 **Versatility:** The same principles apply to diverse fields, from sports analytics (hockey, football) to parking management systems and autonomous navigation.

This work demonstrates the power of combining creativity with well-established vision techniques.

**Other use cases include:**
- 🎥 Sports broadcasting: turning side views into top-down analytics for games.
- 🚘 Smart traffic systems: tracking vehicles from a bird's-eye perspective.
- 🏗️ Construction monitoring: gaining an aerial perspective for better decision-making.
- 🎮 Gaming and AR: enhancing real-time 3D scene generation.

I'm thrilled about the possibilities and eager to explore more applications! What's your take? How would you use this technique in your domain? Let's connect and discuss!

#ComputerVision #AI #Innovation #OpenCV #PerspectiveTransformation
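Here is a minimal sketch of the transform; the corner coordinates below are placeholders, since in practice you would pick the board's four corners from the source frame (manually or via detection):

```python
# Bird's-eye-view (top-down) warp of a carrom board with a homography.
# File names and corner coordinates are placeholders.
import cv2
import numpy as np

img = cv2.imread("carrom_side_view.jpg")

# Four corners of the board in the source image (TL, TR, BR, BL) ...
src = np.float32([[220, 180], [920, 160], [1050, 620], [100, 650]])
# ... mapped to a square top-down canvas.
dst = np.float32([[0, 0], [800, 0], [800, 800], [0, 800]])

H = cv2.getPerspectiveTransform(src, dst)          # 3x3 homography matrix
top_down = cv2.warpPerspective(img, H, (800, 800)) # resample into the new view
cv2.imwrite("carrom_top_down.jpg", top_down)
```

With more than four point correspondences (e.g. from feature matching), cv2.findHomography with RANSAC is the more robust choice.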
Demo: https://sam2.metademolab.com/demo