# How Smart Speakers Know You're Talking to Them After a Command

> This patent describes how a smart speaker system can tell if follow-up speech is meant for it, even without a "wake word," by analyzing voice activity and partial speech recognition results using an AI model.

- **Patent:** US 11361763
- **Original title:** Detecting system-directed speech
- **Owner:** Amazon Technologies
- **Granted:** 2022
- **Status:** Active
- **Times cited:** 64
- **Field:** consumer_electronics, software, ai_ml, telecommunications

## What it does

The patent details a method for a system, such as a smart speaker, to identify whether incoming speech is directed at it, especially after an initial interaction. First, the system receives a first command, processes it, and responds. Crucially, it then instructs the device to send subsequent audio without the device first detecting a wake word (Claim 1). When this second audio arrives, the system first checks it for voice activity. It then performs automatic speech recognition (ASR) on the audio. While ASR is still running, the system builds a "feature vector" from the early, partial ASR results and feeds it into a deep neural network (DNN). The DNN outputs a score indicating how likely the speech is to be intended for the system. If the score meets a threshold, the system proceeds to interpret the full utterance and act on it. For example, after you say "Alexa, set a timer for 10 minutes," the system might listen for a follow-up like "and add a reminder for my meeting" without needing you to say "Alexa" again.
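The claimed flow can be sketched end to end as below. This is a minimal illustration, not Amazon's implementation: the energy-based voice activity check, the three features, the tiny two-layer network with hand-picked weights, the `COMMAND_WORDS` set and the 0.5 threshold are all hypothetical choices made for the example. The partial ASR output is passed in as a list of (word, confidence) pairs rather than produced by a real recognizer.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical partial ASR output: (word, confidence) pairs recognized so far.
PartialHypothesis = List[Tuple[str, float]]

# Hypothetical vocabulary of words that often start a request to the assistant.
COMMAND_WORDS = {"add", "set", "play", "stop", "turn", "what"}

# Hypothetical, hand-picked weights for a tiny 3 -> 2 -> 1 network.
# Each layer is (weight rows, biases); a real system would learn these.
LAYERS = [
    ([[2.0, 1.5, 0.5], [-1.0, 0.0, 1.0]], [-1.0, 0.0]),
    ([[2.5, -1.0]], [-1.0]),
]


def has_voice_activity(samples: Sequence[float], frame_len: int = 160,
                       energy_threshold: float = 0.01) -> bool:
    """Crude energy-based VAD: True if any full frame's mean energy exceeds the threshold."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if sum(x * x for x in frame) / frame_len > energy_threshold:
            return True
    return False


def feature_vector(partial: PartialHypothesis) -> List[float]:
    """Illustrative features from partial ASR results: mean word confidence,
    whether the first word is a command word, and length (capped at 10 words)."""
    if not partial:
        return [0.0, 0.0, 0.0]
    mean_conf = sum(conf for _, conf in partial) / len(partial)
    starts_with_command = 1.0 if partial[0][0].lower() in COMMAND_WORDS else 0.0
    length = min(len(partial), 10) / 10.0
    return [mean_conf, starts_with_command, length]


def dnn_score(features: List[float]) -> float:
    """Feedforward pass: ReLU on hidden layers, sigmoid on the single output."""
    h = features
    for i, (weights, biases) in enumerate(LAYERS):
        z = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, biases)]
        h = z if i == len(LAYERS) - 1 else [max(0.0, v) for v in z]
    return 1.0 / (1.0 + math.exp(-h[0]))


def handle_follow_up(samples: Sequence[float], partial: PartialHypothesis,
                     threshold: float = 0.5) -> str:
    """Returns "process" if the follow-up looks system-directed, else "discard"."""
    if not has_voice_activity(samples):
        return "discard"  # no speech detected: skip ASR and scoring entirely
    score = dnn_score(feature_vector(partial))
    return "process" if score >= threshold else "discard"


loud = [0.5] * 1600  # a constant signal standing in for audio with speech energy
print(handle_follow_up(loud, [("add", 0.9), ("a", 0.85), ("reminder", 0.8)]))  # process
print(handle_follow_up(loud, [("yeah", 0.4), ("i", 0.3), ("think", 0.35)]))    # discard
```

A silent buffer is rejected before any scoring, mirroring the voice-activity check that precedes ASR. With these made-up weights, a confident prefix like "add a reminder" scores about 0.99, while low-confidence chatter like "yeah I think" scores about 0.27 and is discarded.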

## What it does NOT cover

- Does not cover systems that always require a wake word for every interaction.
- Does not cover systems that rely solely on voice activity detection to determine system-directed speech.
- Does not cover determining system-directed speech without using a deep neural network on a feature vector derived from partial ASR results.
- Does not cover systems where the device itself determines the presence of a wake word in the second input audio data (Claim 1 explicitly states "without the device determining a presence of a wakeword").
- Does not cover systems that only use full ASR results, rather than partial ASR results, to create the feature vector for the DNN.

## The clever bit

The innovation lies in using partial ASR results, combined with other audio characteristics, to predict if speech is system-directed in parallel with the main ASR process. This allows the system to quickly decide whether to fully process the speech or discard it, saving computational resources and improving responsiveness for follow-up commands.
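The early-decision idea can be sketched as a streaming loop that scores each partial hypothesis as it arrives and stops once the score is confidently high or low. Everything here is hypothetical: `fake_streaming_asr` stands in for a real recognizer and only ever extends its hypothesis (real ASR may revise earlier words), the DNN is swapped for a simple bag-of-words logistic scorer with made-up weights, and the two thresholds are arbitrary. A Python generator interleaves recognition and scoring in one thread rather than running them truly in parallel; stopping iteration models abandoning the rest of the ASR work.

```python
import math
from typing import Iterable, Iterator, List, Tuple

Partial = List[str]

# Hypothetical per-word evidence; a trained DNN would replace this lookup.
WORD_WEIGHTS = {
    "add": 2.0, "set": 2.0, "play": 2.0, "reminder": 1.5, "timer": 1.5,
    "yeah": -2.0, "lol": -2.0,
}
BIAS = -0.5  # hypothetical prior leaning slightly toward "not directed"


def fake_streaming_asr(words: List[str]) -> Iterator[Partial]:
    """Hypothetical stand-in for a streaming recognizer: yields a growing word prefix."""
    for i in range(1, len(words) + 1):
        yield words[:i]


def directedness_score(partial: Partial) -> float:
    """Bag-of-words logistic score in [0, 1]; higher means more likely system-directed."""
    logit = BIAS + sum(WORD_WEIGHTS.get(w.lower(), 0.0) for w in partial)
    return 1.0 / (1.0 + math.exp(-logit))


def decide_early(stream: Iterable[Partial], accept: float = 0.9,
                 reject: float = 0.1) -> Tuple[str, int]:
    """Scores each partial hypothesis as it arrives and stops at the first confident call.
    Returns (decision, number of partial results consumed)."""
    consumed = 0
    score = 0.0  # no evidence yet: an empty stream is discarded
    for partial in stream:
        consumed += 1
        score = directedness_score(partial)
        if score >= accept:
            return "process", consumed
        if score <= reject:
            return "discard", consumed  # the rest of the stream is never generated
    # Utterance ended without a confident call: fall back to a 0.5 cut-off.
    return ("process" if score >= 0.5 else "discard"), consumed


print(decide_early(fake_streaming_asr("add a reminder for my meeting".split())))  # ('process', 3)
print(decide_early(fake_streaming_asr("yeah lol that was funny".split())))        # ('discard', 1)
```

On "add a reminder for my meeting" the loop accepts after three partial results; on "yeah lol that was funny" it rejects after the first word, so the remaining recognition never runs. The two-threshold design is one way to trade a little accuracy for an earlier decision; a single threshold applied only at the end of the utterance would give up that saving.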

## Real-world examples

1. Amazon Alexa's "Follow-Up Mode"
2. Google Assistant's "Continued Conversation"
3. Most modern smart speaker follow-up interactions

## Why it matters

This technology is fundamental for creating more natural and conversational interactions with voice assistants. It allows users to have follow-up conversations without repeatedly saying the wake word, making the experience smoother and less clunky. This capability is key to the user experience of modern smart speakers and virtual assistants, enabling multi-turn dialogues.

## Frequently asked questions

### What does patent US 11361763 cover?

This patent describes how a smart speaker system can tell if follow-up speech is meant for it, even without a "wake word," by analyzing voice activity and partial speech recognition results using an AI model.

### Who owns patent US 11361763?

Amazon Technologies owns this patent, granted in 2022.

### When does this patent expire?

This patent is expected to expire on September 1, 2037, when the invention enters the public domain.

### How many later patents cite US 11361763?

This patent has been cited by 64 later patents.

### What problem does this patent solve?

Once a voice assistant keeps listening after a command without waiting for a wake word, it must separate follow-up requests meant for it from other speech, such as people talking to each other. This patent addresses that by scoring partial ASR results with a DNN while recognition is still running, so the system can act on system-directed follow-ups and quickly discard everything else, letting users hold multi-turn conversations without repeating the wake word.

### What does this patent NOT cover?

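It does not cover systems that always require a wake word for every interaction, systems that rely solely on voice activity detection, systems where the device itself detects a wake word in the follow-up audio, or systems that build the DNN's feature vector only from full rather than partial ASR results.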

**Full plain-English explainer:** https://patentbrief.org/patent/us/11361763/detecting-system-directed-speech

**Original patent:** https://patents.google.com/patent/US11361763

---

_Source: PatentBrief — https://patentbrief.org. Patent facts are from public records; the plain-English explanation is PatentBrief's._


## Related patents

Semantically similar inventions in the PatentBrief corpus:

- [How a Digital Assistant Launches Apps Using Your Voice](https://patentbrief.org/patent/us/9548050/continuity-handoff) — This patent describes how a digital assistant like Siri uses your spoken words and understanding of your conversation to figure out what you want and launch the right app.
- [How Digital Assistants Control Apps and Ask for More Information](https://patentbrief.org/patent/us/11204787/github-copilot-code-generation-ai) — This patent describes how a digital assistant on a device can understand what a user wants from a natural language command, find the right app, get a step-by-step guide from another device, and then ask the user for more details on the screen to complete the task with that app.
- [How Sonos Speakers Use Personalized Wake Words to Recognize Different Users](https://patentbrief.org/patent/us/9965247/icloud-drive) — A system that lets multiple people control a shared speaker by using unique voice-trigger words to link their specific music accounts and preferences.
- [How Software Detects What You Want Based on Your Social Media Posts](https://patentbrief.org/patent/us/8521818/facebook-share-button) — A system that reads your social media posts to figure out your intent, then automatically serves ads or updates your profile based on how likely you are to actually buy or do something.
- [How Computers Automatically Label Different Speakers in Audio Recordings](https://patentbrief.org/patent/us/6424946/amazon-personalized-recommendations) — A method for identifying and labeling speakers in audio recordings, even if the system has never heard the person speak before, by grouping similar voices and asking a user to name them.
