PatentBrief · Patent Brief · US 11645473

How AI Predicts Who Will Speak Next in a Conversation

IBM's patent describes a system that uses neural networks to analyze speech patterns and intentions to predict which person will talk next in a conversation.

Patent Number

US 11645473

Status

Active

Filing Date

December 23, 2020

Grant Date

May 9, 2023

Expiration

~December 2040 (estimated)

Claims

Assignee

International Business Machines Corp

Inventors

Emily Mower Provost, Lazaros Polymenakos, Zakaria Aldeneh, Dimitrios B. Dimitriadis

Citations

0 forward · 25 backward

What it covers

This system uses a neural network to monitor a conversation between two people and predict who is likely to speak next. It does this by analyzing two things at once: the speaker's 'intention' (for example, whether they are asking a question or making a statement) and the 'turn type' (for example, whether they are holding the floor or about to hand over to the other speaker). The two predictions are trained together through a joint loss function, and the system uses them to determine whether the first person will keep talking or the second person will jump in. For example, if the system detects a rising pitch at the end of a sentence (an acoustic cue) and identifies the intention as a question, it can predict a turn switch to the other person.
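The joint loss mentioned above can be illustrated with a minimal sketch. It assumes the loss is a weighted sum of two cross-entropy terms, which is one common way to build a joint objective; this brief does not say which form the patent uses. The label sets, the `intent_weight` parameter, and every function name here are hypothetical.

```python
import math

# Hypothetical label sets; the brief names these categories only as examples.
INTENTS = ["question", "statement"]
TURN_TYPES = ["hold", "switch"]


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])


def joint_loss(intent_logits, intent_target, turn_logits, turn_target,
               intent_weight=0.5):
    """Assumed form: weighted sum of the intention loss and the turn-type loss."""
    intent_term = cross_entropy(intent_logits, intent_target)
    turn_term = cross_entropy(turn_logits, turn_target)
    return intent_weight * intent_term + (1.0 - intent_weight) * turn_term


def predict_next_speaker(turn_logits, current_speaker="A", other_speaker="B"):
    """Map the turn-type prediction to a speaker: 'switch' means the other person."""
    probs = softmax(turn_logits)
    switch = probs[TURN_TYPES.index("switch")]
    hold = probs[TURN_TYPES.index("hold")]
    return other_speaker if switch > hold else current_speaker


if __name__ == "__main__":
    # A question with a likely turn switch, scored against matching labels.
    loss = joint_loss([2.0, 0.1], INTENTS.index("question"),
                      [0.1, 2.0], TURN_TYPES.index("switch"))
    print(loss, predict_next_speaker([0.1, 2.0]))
```

Because both terms feed a single scalar, a training step that lowers this value has to improve intention recognition and turn-type detection together rather than one at the expense of the other.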

What it doesn't cover

- Does not cover systems that only look at text transcripts without analyzing acoustic cues like pitch or speaking rate.
- Does not cover simple rule-based systems that rely solely on silence duration to detect turn-taking.
- Does not cover systems that predict the content of the next sentence; the prediction is limited to the identity of the next speaker.

The clever bit

The innovation lies in training a single neural network to jointly optimize two different tasks—intention recognition and turn-type detection—rather than treating them as separate, sequential steps.
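One way to picture a single network serving both tasks is a shared layer feeding two output heads. The sketch below is an illustration under assumptions, not the patented architecture: the layer sizes, the random initialization, the ReLU activation, and the class name `JointTurnModel` are all hypothetical, and the three input features stand in for acoustic cues such as pitch or speaking rate.

```python
import random


class JointTurnModel:
    """Toy network: one shared layer, then separate intention and turn-type heads."""

    def __init__(self, n_features, n_hidden, n_intents, n_turn_types, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.shared = matrix(n_hidden, n_features)      # used by both tasks
        self.intent_head = matrix(n_intents, n_hidden)  # intention recognition
        self.turn_head = matrix(n_turn_types, n_hidden)  # turn-type detection

    @staticmethod
    def _matvec(weights, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in weights]

    def forward(self, features):
        # Both heads read the same hidden representation, so a joint loss
        # would push updates from both tasks into self.shared.
        hidden = [max(0.0, h) for h in self._matvec(self.shared, features)]
        intent_logits = self._matvec(self.intent_head, hidden)
        turn_logits = self._matvec(self.turn_head, hidden)
        return intent_logits, turn_logits


if __name__ == "__main__":
    model = JointTurnModel(n_features=3, n_hidden=4, n_intents=2, n_turn_types=2)
    # Hypothetical feature vector, e.g. pitch slope, speaking rate, energy.
    print(model.forward([0.2, 1.0, 0.5]))
```

In a sequential pipeline the turn-type stage would only see the output of the intention stage; here both heads draw on the same intermediate features, which is the contrast the brief describes.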

Why it matters

Managing natural conversation flow is a major hurdle for voice assistants like Siri or Alexa, which can interrupt users or stop listening too early. By predicting turn-taking more accurately, this technology aims to make human-computer interactions feel more fluid and less robotic. It is part of a broader effort to move beyond simple keyword recognition toward conversational understanding.

Real-world examples

1. Smart home voice assistants
2. Automated customer service phone bots
3. Real-time speech-to-text transcription software

Generated by PatentBrief · Not legal advice · patentbrief.org
