Imagine a computer that can understand and manipulate the visual world around you as effortlessly as you do. That future is closer than you might think. Today, we're thrilled to unveil SAM 3 and SAM 3D, the latest additions to our Segment Anything Collection. These models are not incremental updates; they change how we interact with images, videos, and even the physical world in 3D. A central question runs through both: can AI truly bridge the gap between language and visual understanding? Let's dive in.
SAM 3: Redefining Object Detection and Tracking
SAM 3 is a leap forward in object detection and tracking, letting users identify and follow objects in images and videos using both visual and detailed text prompts. Where SAM 1 and SAM 2 relied on visual prompts such as points and boxes, SAM 3 adds the ability to segment objects from descriptive text, like "yellow school bus" or "people sitting down, but not wearing a red baseball cap." This addresses a long-standing challenge in AI: linking language to specific visual elements. Traditional models often struggle with such nuanced descriptions; SAM 3 breaks this barrier and opens up new creative possibilities.
SAM 3 isn't just about detecting objects; it's about empowering creators. We're integrating SAM 3 into tools like Edits, our video creation app, where users can apply effects to specific people or objects in their videos. Similarly, Vibes on the Meta AI app will soon feature SAM 3-enabled experiences, making it easier than ever to transform visual content.
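To make the "effects on specific objects" idea concrete, here is a minimal sketch of how a segmentation mask can drive a selective edit. The mask below is hard-coded for illustration; in a real pipeline it would come from a model such as SAM 3 prompted with text (the model call itself is not shown, since SAM 3's API is not described in this post).

```python
# Toy "spotlight" effect: keep the segmented object at full brightness
# and dim everything else. Pure Python, grayscale values in [0, 255].

def spotlight(image, mask, dim=0.3):
    """image and mask are same-shaped 2D lists; mask holds 0/1 flags."""
    return [
        [px if m else int(px * dim) for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

image = [
    [200, 200, 200],
    [200, 100, 200],
    [200, 200, 200],
]
mask = [  # pretend the model segmented the center pixel as "the object"
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

print(spotlight(image, mask))
# center pixel keeps its value; surroundings are dimmed to 30%
```

The same pattern generalizes to any per-pixel effect: the mask decides *where*, the effect function decides *what*.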
SAM 3D: Bringing the Physical World to Life
SAM 3D takes things a step further by enabling 3D reconstruction from a single image. Comprising two models—SAM 3D Objects for scene and object reconstruction, and SAM 3D Body for human body estimation—this technology sets a new benchmark in AI-guided 3D modeling. SAM 3D Objects, in particular, outperforms existing methods, and our collaboration with artists has led to SAM 3D Artist Objects, a diverse evaluation dataset that raises the bar for measuring progress in 3D research.
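This post doesn't detail how SAM 3D works internally, but one classic building block of single-image 3D reasoning is back-projecting per-pixel depth into a 3D point cloud through a pinhole camera model. The sketch below illustrates that idea only; the depth values and intrinsics are made up, and this is not a description of SAM 3D's actual method.

```python
# Back-project a depth map into 3D points with a pinhole camera model:
#   X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy
# where (u, v) is the pixel coordinate, Z the depth, (cx, cy) the
# principal point, and (fx, fy) the focal lengths.

def backproject(depth, fx, fy, cx, cy):
    """depth: 2D list of per-pixel depths (meters). Returns (X, Y, Z) tuples."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

# A tiny 2x2 "depth map": every pixel 2 m away, principal point at (0.5, 0.5).
pts = backproject([[2.0, 2.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts)
```

Real systems, of course, must also *predict* the depth and complete the unseen back side of objects, which is where learned models come in.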
SAM 3D has the potential to transform fields like robotics, science, and sports medicine, but it also raises questions about the ethical use of such powerful technology. How do we ensure it's used responsibly? We're already seeing practical applications, like the View in Room feature on Facebook Marketplace, which lets users visualize furniture in their spaces before buying. But what about its implications for privacy or artistic expression? We invite you to share your thoughts in the comments.
Explore and Create with Segment Anything Playground
Ready to experiment? Our new Segment Anything Playground (https://www.aidemos.meta.com/segment-anything) makes it easy for anyone to try SAM 3 and SAM 3D. Upload an image or video, use text prompts to segment objects, or explore 3D scenes with virtual rearrangements and effects. Templates range from practical tasks like pixelating sensitive information to fun edits like spotlight effects. No technical expertise required—just creativity.
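The "pixelating sensitive information" template boils down to coarsening only the pixels a mask selects. Here is a minimal, self-contained sketch of that idea (toy values, not Playground code): each masked pixel is snapped to the average of its containing block, leaving unmasked pixels sharp.

```python
# Pixelate only the masked region of a grayscale image by snapping each
# masked pixel to the mean of its block. image/mask: same-shaped 2D lists.

def pixelate_masked(image, mask, block=2):
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cells = [(y, x)
                     for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            avg = sum(image[y][x] for y, x in cells) // len(cells)
            for y, x in cells:
                if mask[y][x]:       # only blur what the mask selects
                    out[y][x] = avg
    return out

img = [[0, 100], [200, 100]]
mask = [[1, 1], [1, 1]]              # mask everything: whole block averaged
print(pixelate_masked(img, mask))
```

Swap in a mask from a segmentation model and a larger block size, and you have the essence of the template.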
Pushing the Boundaries of What’s Possible
As part of this release, we’re sharing model weights, datasets, and research papers to foster innovation. For SAM 3, we’ve partnered with Roboflow to help users fine-tune the model for specific needs. For SAM 3D, we’re introducing a novel benchmark dataset that challenges the field to achieve deeper understanding of the physical world.
We believe these tools will empower creators, researchers, and curious minds alike. But we want to hear from you: What excites you most about SAM 3 and SAM 3D? And what concerns do you have about their potential impact? Let’s start the conversation and shape the future of AI together. Learn more on the AI at Meta blog: SAM 3 and SAM 3D.