PatentBrief — patentbrief.org

US 9098533

How Voice Commands Help Computers Find Objects in Pictures and Videos

A method for using voice commands to tell a computer which object in a photo or video you want to search for, allowing it to automatically isolate that object and perform a visual search.

Granted 2015 · Active · Expires 2031 · Owned by Microsoft Technology Licensing LLC · Invented by Emmanuel John Athans, Monty Lee Hammontree, Vikram Bapat

Original patent title: “Voice directed context sensitive visual search”

Plain-English explanation by Sahi · Last reviewed June 15, 2026

Granted to Microsoft Technology Licensing LLC in 2015 with 22 claims and 4 forward citations.

Coverage

What does this patent actually cover?

This patent describes a system that bridges the gap between what you say and what you see on a screen. When you point at an object on a display and ask a question about it, the system uses your voice query to identify that object within the image or video frame. It then selects an edge-detection algorithm (a mathematical technique for finding the boundaries of shapes) suited to that particular object or context. Finally, it crops the object out of the original image, uses the cropped region to run a 'reverse visual search' for more information, and shows you the results directly on your screen.
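The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the spoken query has already been transcribed to text, that a hypothetical upstream detector has produced labeled bounding boxes (`Detection`), and that `search` stands in for whatever reverse visual search backend is used. The edge-detection refinement step is left out here; the crop uses the detector's box directly.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]          # grayscale frame, row-major
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


@dataclass
class Detection:
    """Hypothetical output of an upstream object detector."""
    label: str
    box: Box


def pick_object(query: str, detections: List[Detection]) -> Optional[Detection]:
    """Return the first detection whose label appears as a word in the transcribed query."""
    words = {w.strip("?.,!").lower() for w in query.split()}
    for det in detections:
        if det.label.lower() in words:
            return det
    return None


def crop(image: Image, box: Box) -> Image:
    """Cut the bounding box out of the frame."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def voice_visual_search(
    query: str,
    frame: Image,
    detections: List[Detection],
    search: Callable[[Image], List[str]],
) -> List[str]:
    """Isolate the object named in the query and send only that patch to visual search."""
    target = pick_object(query, detections)
    if target is None:
        return []  # no object in the frame matches the spoken query
    return search(crop(frame, target.box))


if __name__ == "__main__":
    frame = [[10 * r + c for c in range(6)] for r in range(4)]
    dets = [Detection("sofa", (0, 0, 2, 3)), Detection("lamp", (1, 3, 3, 6))]

    def fake_search(patch: Image) -> List[str]:
        # Stand-in backend (hypothetical) that just reports the patch size it received.
        return [f"{len(patch)}x{len(patch[0])} patch"]

    print(voice_visual_search("What brand is that lamp?", frame, dets, fake_search))
    # prints ['2x3 patch']
```

Exact word matching on the detector's label keeps the sketch short; a real system would need something more forgiving, such as handling plurals, synonyms, or a pointing gesture that disambiguates between several candidates.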

The gap

What does this patent NOT cover?

Does not cover general voice-to-text transcription that is not linked to visual object extraction.
Does not cover visual searches that rely solely on manual user selection or cropping without voice input.
Does not cover object detection methods that use a single, fixed edge-detection algorithm for all image types.

These exclusions are unique to PatentBrief — derived from the actual claim language, not patent-office boilerplate.

Key facts

Patent number	US 9098533
Status	Active
Field	Consumer Electronics
Assignee	Microsoft Technology Licensing LLC
Inventors	Emmanuel John Athans, Monty Lee Hammontree, Vikram Bapat
Filed	2011
Granted	2015
Claims	22
Times cited	4
Litigation	None on record

Value · $55K–$175K · Modest

What made this novel

The system dynamically selects the best edge-detection algorithm based on the voice query and context, rather than using a one-size-fits-all approach to finding the object's boundaries.
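To make the idea concrete, here is a minimal sketch of context-dependent edge detection in plain Python. It applies one of two standard 3x3 gradient operators, Sobel or Prewitt, and returns the bounding box of pixels whose gradient magnitude exceeds a threshold. The `DETECTOR_FOR_CONTEXT` table, its context labels, and the Sobel fallback are illustrative assumptions; the patent's actual selection criteria and candidate algorithms may differ.

```python
import math
from typing import Dict, List, Optional, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive

# Horizontal-gradient kernels; the vertical kernel is the transpose.
KERNELS: Dict[str, List[List[int]]] = {
    "sobel": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],    # center-weighted, slightly more smoothing
    "prewitt": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],  # uniform weights
}

# Hypothetical policy mapping a query context to an operator (illustrative only).
DETECTOR_FOR_CONTEXT: Dict[str, str] = {
    "face": "sobel",
    "text": "prewitt",
}


def gradient_magnitude(img: Image, kx: List[List[int]]) -> Image:
    """Correlate the image with kx and its transpose; border pixels are left at 0."""
    ky = [list(col) for col in zip(*kx)]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(kx[i][j] * img[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            gy = sum(ky[i][j] * img[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            out[r][c] = math.hypot(gx, gy)
    return out


def edge_bounding_box(img: Image, context: str, threshold: float) -> Tuple[str, Optional[Box]]:
    """Pick an operator for the context, then box every pixel whose edge strength exceeds threshold."""
    name = DETECTOR_FOR_CONTEXT.get(context, "sobel")  # assumed default operator
    mag = gradient_magnitude(img, KERNELS[name])
    hits = [(r, c) for r, row in enumerate(mag) for c, v in enumerate(row) if v > threshold]
    if not hits:
        return name, None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return name, (min(rows), min(cols), max(rows) + 1, max(cols) + 1)
```

On a 6x6 black frame with a bright 2x2 block at rows 2-3, columns 2-3, `edge_bounding_box(img, "face", 50)` returns `("sobel", (1, 1, 5, 5))`: the box extends one pixel past the block because a 3x3 operator responds on both sides of a step edge.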

Where you've seen this

Real-world examples

Smart TV features that identify actors or products in a movie scene via voice command.

Augmented reality shopping apps that isolate items in a live video feed.

Digital photo and video management tools that let users search for specific objects within their media.

Why it matters

The bigger picture

This technology is a foundational step toward multimodal interfaces in which voice and vision work together. It goes beyond simple keyword search by letting users interact with visual media as if it were a searchable database, a capability central to modern augmented reality and smart assistant applications.

Filed

October 3, 2011

Granted

August 4, 2015

Market context

Who's building on this

Companies in this space

Microsoft continues to integrate multimodal search capabilities of this kind into its Azure AI services and Bing visual search products. Other major tech companies, including Google and Amazon, have invested heavily in similar pipelines that combine voice input with visual recognition across their smart home and mobile ecosystems.

Market impact

This patent helped formalize the workflow for voice-activated visual search, a feature that has become standard in modern smart assistants. It laid out a technical roadmap for combining results from different kinds of search, such as text-based queries and visual matching, to improve the accuracy of object identification in media.
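How the patent itself merges text and visual results is not described here. As one generic way to fuse two ranked result lists, the sketch below uses reciprocal rank fusion (RRF), a well-known rank-combination method swapped in purely for illustration. The constant `k = 60` is the value commonly used with RRF, and the product IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: each item scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    text_results = ["p1", "p2", "p3"]    # hypothetical IDs ranked by a text query
    visual_results = ["p2", "p4", "p1"]  # hypothetical IDs ranked by image similarity
    print(reciprocal_rank_fusion([text_results, visual_results]))
    # prints ['p2', 'p1', 'p4', 'p3']
```

Here `p2` comes out on top because it ranks highly in both lists, even though `p1` is the top text hit; agreement between the two kinds of evidence is what the fusion rewards.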

Patent timeline

Filing · Oct 3, 2011

Application submitted to the patent office

Publication · Aug 4, 2015

Application published, typically 18 months after filing

Grant · Aug 4, 2015

Patent officially issued
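The expected 2031 expiry follows from the standard US rule that a utility patent filed on or after June 8, 1995 runs 20 years from its earliest effective US non-provisional filing date, not from its grant date. The sketch below computes only that nominal date. It assumes October 3, 2011 is the earliest effective filing date and ignores patent term adjustment, terminal disclaimers, and lapse for unpaid maintenance fees, any of which can change the real date.

```python
from datetime import date


def nominal_expiry(filing: date, term_years: int = 20) -> date:
    """Filing date plus the statutory term, with no adjustments or extensions."""
    try:
        return filing.replace(year=filing.year + term_years)
    except ValueError:
        # Simplification: a Feb 29 filing date landing in a non-leap year rolls back to Feb 28.
        return filing.replace(year=filing.year + term_years, day=28)


if __name__ == "__main__":
    print(nominal_expiry(date(2011, 10, 3)))  # prints 2031-10-03
```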

PatentBrief Score

Impact Score

Moderate

Citation count

14/40

Early citations

Claim breadth

15/20

Broad claims

Recency

5/20

Granted 10–20 years ago

Assignee scale

20/20

Major company or institution

PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.
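How the four components combine into the overall "Moderate" rating is not stated. If they are simply added (an assumption), the breakdown above totals 54 out of a possible 100; the sketch below performs only that sum and does not attempt to reproduce the rating labels.

```python
from typing import Dict, Tuple

# (points earned, points possible) for each component listed above.
COMPONENTS: Dict[str, Tuple[int, int]] = {
    "citation_count": (14, 40),
    "claim_breadth": (15, 20),
    "recency": (5, 20),
    "assignee_scale": (20, 20),
}


def total_score(components: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum earned and possible points, assuming the components are simply additive."""
    earned = sum(e for e, _ in components.values())
    possible = sum(p for _, p in components.values())
    return earned, possible


if __name__ == "__main__":
    earned, possible = total_score(COMPONENTS)
    print(f"{earned}/{possible}")  # prints 54/100
```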

Heuristic Value Estimate

What this patent might be worth

Modest

$55K – $175K

Midpoint $109K · 5.2 yr remaining · industry ×1.6

Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.

The original legal language

Original claims

The granted patent contains 22 claims. The claim text has not yet been imported for this patent.

Concepts involved

Claim · Prior art · Non-obviousness · Novelty · Specification · Assignee · Patent term

Citations

Patent lineage

Cites earlier patents

earlier patents this invention cites as foundations

Cited by later patents

later patents that build on this invention

Cite this patent

Athans, E. J., Hammontree, M. L., & Bapat, V. (2015). Voice directed context sensitive visual search (U.S. Patent No. 9,098,533). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/9098533/amazon-kinesis

Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.

Embed

Add this patent to your site

Drop this plain-English patent card into any blog post or article — free, no signup. It always links back to the full breakdown here.

<div data-patentlens-widget data-patent-number="US9098533"></div>
<script src="https://patentbrief.org/embed.js" async></script>

Keep exploring

Related patents you should know

US 4683195 · 1987

How to Make Billions of Copies of a DNA Segment

This patent describes the Polymerase Chain Reaction (PCR), a method to rapidly create many copies of a specific piece of DNA or RNA, enabling its detection and analysis.

Cetus Corp

US 8697359 · 2014

How to Edit Genes in Human Cells Using an Engineered CRISPR System

This patent describes an engineered CRISPR-Cas9 system for precisely cutting DNA in eukaryotic cells to change how genes work, opening the door for gene editing in complex organisms.

Massachusetts Institute of Technology

US 7657849 · 2010

How the iPhone's Slide-to-Unlock Gesture Works

Apple's 2010 patent describes unlocking a device by dragging a specific graphical image across the touchscreen along a predefined path, a gesture that became iconic with the original iPhone.

Apple Inc

US 4733665 · 1988

How Doctors Implant a Permanent Stent Using a Balloon

This patent describes the method for placing a permanent, expandable wire mesh tube inside a blood vessel or other body tube using a balloon-tipped catheter to widen it and keep it open.

Expandable Grafts Partnership

US 4965188 · 1990

How to Make Many Copies of a DNA Piece with Heat

This patent describes the Polymerase Chain Reaction (PCR) method, a technique to make millions of copies of a specific DNA segment using a heat-resistant enzyme and repeated temperature changes.

Cetus Corp

US 4235871 · 1980

How to Encapsulate Active Materials in Lipid Bubbles Efficiently

This patent describes a method for trapping biologically active substances inside tiny, multi-layered fat bubbles called liposomes, using a specific water-in-oil emulsion and gel-forming process to improve how much material gets captured.

Individual

Semantically similar

You might also find these interesting

software

US 10311112 · 2019 · Zorroa Corp

How to Search Through Long Videos Using a Compressed Visual Timeline

software

US 8862582 · 2014 · AT&T Intellectual Property I LP

How Computers Automatically Organize and Search Photos Using Contextual Data

US 9965705 · 2018 · Baidu USA LLC

How AI Uses Question-Guided Attention to Answer Questions About Images

consumer electronics

US 8825660 · 2014 · eBay Inc

How eBay Uses Image Fingerprints to Search for Products

Frequently Asked Questions

What does patent US 9098533 cover?

A method for using voice commands to tell a computer which object in a photo or video you want to search for, allowing it to automatically isolate that object and perform a visual search.

Who owns patent US 9098533?

Microsoft Technology Licensing LLC owns this patent, granted in 2015.

When does this patent expire?

This patent is expected to expire in 2031, 20 years after its October 3, 2011 filing date, when the invention enters the public domain.

How many later patents cite US 9098533?

This patent has been cited by 4 later patents that build on its ideas.

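What problem does this patent solve?

It lets you pick out an object in a photo or video by voice, so the system can isolate that object and run a visual search on it without you having to select or crop it by hand.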

What does this patent NOT cover?

Does not cover general voice-to-text transcription that is not linked to visual object extraction.

Same assignee

More from Microsoft Technology Licensing LLC

US 12217035 · 2025

How to Safely Shut Down Microservices Without Breaking Apps

US 11170293 · 2021

How AI Systems Learn to Predict and Act Simultaneously

US 11062228 · 2021

How AI Learns New Tasks Using Old Data Labels

US 10543427 · 2020

How Game Controllers Change Button Functions Using Plug-in Accessories

Last reviewed: June 15, 2026 · PatentBrief is not a law firm and this is not legal advice.

More in Consumer Electronics

How the iPhone's Slide-to-Unlock Gesture Works

How Touchscreens Understand Your Finger Swipes and Scrolls

How Stores Make Custom Products On-Demand with Remote Approval

How Touchscreens Show and Snap Back When You Scroll Past an Edge
