How Voice Commands Help Computers Find Objects in Pictures and Videos
A method for using voice commands to tell a computer which object in a photo or video you want to search for, allowing it to automatically isolate that object and perform a visual search.
Patent Number
US 9098533
Status
Active
Filing Date
October 3, 2011
Grant Date
August 4, 2015
Expiration
~October 2031 (estimated)
Claims
22
Assignee
Microsoft Technology Licensing LLC
Inventors
Emmanuel John Athans, Monty Lee Hammontree, Vikram Bapat
Citations
4 forward · 22 backward
What it covers
This patent describes a system that links what you say to what you see on a screen. When you point at an object on a display and ask a question about it, the system uses your voice query to identify that object within the image or video frame. It then selects an edge-detection algorithm (a mathematical method for finding the boundaries of shapes) suited to that object or context. Finally, it crops the object out of the original image and uses the crop to run a 'reverse visual search' for more information, showing the results directly on your screen.
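The isolate-and-crop step can be illustrated with a minimal sketch. It is not the patented method: the brief does not say how boundaries are turned into a crop, so this example assumes a grayscale image stored as a list of rows, a simple neighbor-difference edge test, and a bounding box drawn around every edge pixel. The names `edge_mask`, `bounding_box`, `isolate_object`, and the `threshold` value are all hypothetical. Speech recognition and the search backend are left out.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, values 0-255 (assumed representation)


def edge_mask(img: Image, threshold: int = 50) -> List[List[bool]]:
    """Mark pixels whose brightness differs sharply from the right or lower neighbor."""
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(img[y][x] - img[y][x + 1]) if x + 1 < w else 0
            dy = abs(img[y][x] - img[y + 1][x]) if y + 1 < h else 0
            mask[y][x] = max(dx, dy) >= threshold
    return mask


def bounding_box(mask: List[List[bool]]) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right), inclusive, around all edge pixels."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, hit in enumerate(row) if hit]
    if not ys:
        raise ValueError("no edges found")
    return min(ys), min(xs), max(ys), max(xs)


def isolate_object(img: Image, threshold: int = 50) -> Image:
    """Crop the image to the bounding box of its detected edges."""
    top, left, bottom, right = bounding_box(edge_mask(img, threshold))
    return [row[left:right + 1] for row in img[top:bottom + 1]]


# A 6x6 dark frame with a bright 2x2 "object" in the middle.
frame = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        frame[y][x] = 200

cropped = isolate_object(frame)  # this crop would be the reverse-search input
```

Because the edge test compares each pixel with its right and lower neighbors, the crop keeps a one-pixel background margin on the top and left of the object. A real system would pass `cropped` to an image-search service.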
What it doesn't cover
- Does not cover general voice-to-text transcription that is not linked to visual object extraction.
- Does not cover visual searches that rely solely on manual user selection or cropping without voice input.
- Does not cover object detection methods that use a single, fixed edge-detection algorithm for all image types.
The clever bit
The system dynamically selects the best edge-detection algorithm based on the voice query and context, rather than using a one-size-fits-all approach to finding the object's boundaries.
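One way to picture this selection is a lookup from query words to detectors, with a context-based fallback. The sketch below is an assumption-laden illustration, not the patent's logic: the brief does not name any algorithms, so Sobel and Roberts cross stand in as two well-known edge detectors, and the `RULES` table, the `is_video_frame` flag, and `select_detector` are hypothetical.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # grayscale rows, values 0-255 (assumed representation)
Detector = Callable[[Image], Image]


def roberts_cross(img: Image) -> Image:
    """2x2 diagonal differences: cheap, responds to fine detail."""
    h, w = len(img), len(img[0])
    return [[abs(img[y][x] - img[y + 1][x + 1]) + abs(img[y][x + 1] - img[y + 1][x])
             for x in range(w - 1)] for y in range(h - 1)]


def sobel(img: Image) -> Image:
    """3x3 smoothed gradient: more tolerant of noise. Interior pixels only."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            row.append(abs(gx) + abs(gy))
        out.append(row)
    return out


# Hypothetical rules: which query words suggest which detector.
RULES: Dict[str, Detector] = {
    "watch": roberts_cross,  # small item with fine detail
    "ring": roberts_cross,
    "car": sobel,            # large object, possibly in a noisy frame
    "jacket": sobel,
}


def select_detector(query: str, is_video_frame: bool = False) -> Detector:
    """Pick a detector from the query words, falling back on context."""
    for word in query.lower().split():
        word = word.strip("?.,!")
        if word in RULES:
            return RULES[word]
    # No keyword matched: assume video frames are noisier than still photos.
    return sobel if is_video_frame else roberts_cross


detector = select_detector("Who makes that car?")
edges = detector([[0, 0, 100], [0, 0, 100], [0, 0, 100]])
```

The dispatch table keeps the choice of algorithm separate from the algorithms themselves, so adding a detector or a rule does not touch the rest of the pipeline.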
Why it matters
This technology is a foundational step toward multimodal interfaces where voice and vision work together. It moves beyond keyword searching by letting users query visual media as if it were a searchable database, a capability that augmented reality and smart assistant applications depend on.
Real-world examples
1. Smart TV features that identify actors or products in a movie scene via voice command.
2. Augmented reality shopping apps that isolate items in a live video feed.
3. Digital photo management tools that allow users to search for specific objects within a video.
Generated by PatentBrief · Not legal advice · patentbrief.org