How AI Predicts Who Will Speak Next in a Conversation
IBM's patent describes a system that uses neural networks to analyze speech patterns and intentions to predict which person will talk next in a conversation.
Patent Number: US 11645473
Status: Active
Filing Date: December 23, 2020
Grant Date: May 9, 2023
Expiration: ~December 2040 (estimated)
Claims: 23
Assignee: International Business Machines Corp
Inventors: Emily Mower Provost, Lazaros Polymenakos, Zakaria Aldeneh, Dimitrios B. Dimitriadis
Citations: 0 forward · 25 backward
What it covers
This system uses a neural network to monitor a conversation between two people and figure out who is likely to speak next. It does this by simultaneously analyzing two things: the speaker's 'intention' (like whether they are asking a question or making a statement) and the 'turn type' (like whether they are trying to hold the floor or are about to switch speakers). By combining these predictions using a joint loss function, the system can determine if the first person will keep talking or if the second person will jump in. For example, if the system detects a rising pitch at the end of a sentence (an acoustic cue) and identifies the intention as a question, it can predict a turn switch to the other person.
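To make the idea of a joint loss concrete, here is a minimal sketch in Python. It assumes the joint loss is a weighted sum of two cross-entropy terms, one per task; the brief does not give the patent's actual formula, and the `alpha` weight, the label sets, and the example scores are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability the model assigns to the correct class."""
    return -math.log(softmax(logits)[target_index])

def joint_loss(intent_logits, intent_target, turn_logits, turn_target, alpha=0.5):
    """Weighted sum of the two task losses.

    alpha is a hypothetical mixing weight, not a value from the patent.
    """
    intent_loss = cross_entropy(intent_logits, intent_target)
    turn_loss = cross_entropy(turn_logits, turn_target)
    return alpha * intent_loss + (1.0 - alpha) * turn_loss

# Hypothetical label sets, chosen for illustration only.
INTENTS = ["question", "statement"]
TURN_TYPES = ["hold", "switch"]

# Made-up scores for one utterance: the model leans toward "question"
# and toward "switch", and both happen to match the true labels.
loss = joint_loss(
    intent_logits=[2.0, 0.5], intent_target=INTENTS.index("question"),
    turn_logits=[0.3, 1.7], turn_target=TURN_TYPES.index("switch"),
)
print(f"joint loss: {loss:.4f}")
```

Because both terms feed one number, a training step that lowers this loss has to improve intention recognition and turn-type detection together instead of tuning each in isolation.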
What it doesn't cover
- Does not cover systems that only look at text transcripts without analyzing acoustic cues like pitch or speaking rate.
- Does not cover simple rule-based systems that rely solely on silence duration to detect turn-taking.
- Does not cover systems that predict the content of the next sentence, only the identity of the next speaker.
The clever bit
The innovation lies in training a single neural network to jointly optimize two different tasks—intention recognition and turn-type detection—rather than treating them as separate, sequential steps.
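One common way to build a single network that serves two tasks is a shared encoder feeding two output heads. The sketch below shows that layout in plain Python under that assumption; the brief does not describe the patent's actual architecture, and the class name, layer sizes, and feature vector here are hypothetical. The weights are random and untrained, so the scores carry no meaning beyond showing the data flow.

```python
import random

def linear(weights, bias, x):
    """Apply one dense layer: a weighted sum per output unit, plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    """Zero out negative activations."""
    return [max(0.0, v) for v in values]

def init_layer(n_out, n_in, rng):
    """Create small random weights and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

class TwoHeadModel:
    """Hypothetical multi-task model: one shared encoder, two task heads."""

    def __init__(self, n_features, n_hidden, n_intents, n_turn_types, seed=0):
        rng = random.Random(seed)
        self.encoder = init_layer(n_hidden, n_features, rng)
        self.intent_head = init_layer(n_intents, n_hidden, rng)
        self.turn_head = init_layer(n_turn_types, n_hidden, rng)

    def forward(self, features):
        # Both heads read the same hidden representation, so training on
        # either task also shapes the encoder that the other task relies on.
        hidden = relu(linear(*self.encoder, features))
        intent_logits = linear(*self.intent_head, hidden)
        turn_logits = linear(*self.turn_head, hidden)
        return intent_logits, turn_logits

model = TwoHeadModel(n_features=3, n_hidden=4, n_intents=2, n_turn_types=2)

# Hypothetical acoustic features: [final pitch slope, speaking rate, pause length]
intent_logits, turn_logits = model.forward([0.8, 1.2, 0.1])
print("intent scores:", intent_logits)
print("turn-type scores:", turn_logits)
```

In a sequential pipeline, the turn-type stage would only see the intention stage's final output. Sharing the encoder instead lets both tasks shape the same internal representation.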
Why it matters
Managing natural conversation flow is a major hurdle for voice assistants like Siri or Alexa, which can interrupt users or stop listening too early. By predicting turn-taking more accurately, this technology aims to make human-computer interactions feel more fluid and less robotic. It is part of a broader effort to move beyond simple keyword recognition toward true conversational understanding.
Real-world examples
1. Smart home voice assistants
2. Automated customer service phone bots
3. Real-time speech-to-text transcription software
Generated by PatentBrief · Not legal advice · patentbrief.org