PatentBrief · Patent Brief · US 6424946

How Computers Automatically Label Different Speakers in Audio Recordings

A method for identifying and labeling speakers in audio recordings, even if the system has never heard the person speak before, by grouping similar voices and asking a user to name them.

Patent Number

US 6424946

Status

Expired

Filing Date

November 5, 1999

Grant Date

July 23, 2002

Expiration

November 5, 2019

Claims

Assignee

International Business Machines Corp

Inventors

Alain Charles Louis Tritschler, Mahesh Viswanathan

Citations

75 forward · 2 backward

What it covers

This system processes audio to identify who is speaking by breaking the audio into segments and comparing each segment against a database of enrolled (already known) speakers. It uses 'background models' for people not yet in the system, such as a generic 'unenrolled male' or 'unenrolled female' profile. When the system detects a recurring voice that it doesn't recognize, it groups those segments together into a cluster. A user can then provide a name for that cluster, and the system automatically updates its database to recognize that person in future recordings.
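The grouping and naming steps can be illustrated with a small sketch. This is not the patent's actual algorithm: it assumes each audio segment has already been reduced to a short list of numbers (a toy "voice fingerprint"), uses plain Euclidean distance in place of a real voice-similarity measure, and the names `cluster_segments`, `enroll_cluster`, and the `threshold` value are hypothetical.

```python
import math


def mean(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def cluster_segments(segments, threshold):
    """Greedy clustering of unrecognized segments (illustrative only).

    Each segment joins the nearest existing cluster if that cluster's
    average is within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of segment vectors
    for seg in segments:
        best, best_d = None, threshold
        for cluster in clusters:
            d = math.dist(seg, mean(cluster))
            if d <= best_d:
                best, best_d = cluster, d
        if best is None:
            clusters.append([seg])
        else:
            best.append(seg)
    return clusters


def enroll_cluster(database, name, cluster):
    """Store a user-supplied name for a cluster so it can be matched later."""
    database[name] = mean(cluster)


if __name__ == "__main__":
    # Toy fingerprints: two unknown voices appearing across five segments.
    unknown = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.1)]
    clusters = cluster_segments(unknown, threshold=1.0)
    print([len(c) for c in clusters])  # [3, 2]

    database = {}
    enroll_cluster(database, "Alice", clusters[0])  # the user names cluster 0
    print(sorted(database))  # ['Alice']
```

Once a cluster has a name, its averaged fingerprint sits in the database like any other enrolled speaker, which is what lets later recordings be labeled without asking again.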

What it doesn't cover

- Does not cover real-time voice synthesis or voice modification.
- Does not cover microphones or other physical audio capture hardware.
- Does not cover methods that require every speaker to be pre-enrolled before identification can begin.
- Does not cover linguistic analysis of the words being spoken, only identification of the speaker.

The clever bit

The system uses a 'background model' for unknown speakers, allowing it to track and cluster voices it hasn't learned yet, rather than simply failing to identify them or labeling them as noise.
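As a rough illustration of the idea, the sketch below places generic background profiles alongside the enrolled speakers and picks whichever profile is closest to a segment. It is a simplification under stated assumptions: the two-number fingerprints, the distance-based matching, and the `identify` function are hypothetical stand-ins, not the statistical models the patent itself describes.

```python
import math

# Hypothetical toy profiles; a real system would use statistical voice models.
ENROLLED = {"alice": (1.0, 1.0), "bob": (4.0, 4.0)}
BACKGROUND = {"unenrolled male": (2.0, 6.0), "unenrolled female": (6.0, 2.0)}


def identify(segment, enrolled, background):
    """Return (label, is_unknown) for one segment.

    The segment is compared against enrolled speakers and background
    profiles together. If a background profile wins, the segment is
    flagged as an unknown voice so it can be clustered later rather
    than discarded or forced onto the wrong enrolled speaker.
    """
    candidates = {**enrolled, **background}
    best = min(candidates, key=lambda name: math.dist(segment, candidates[name]))
    return best, best in background


if __name__ == "__main__":
    print(identify((1.1, 0.9), ENROLLED, BACKGROUND))  # ('alice', False)
    print(identify((6.2, 1.8), ENROLLED, BACKGROUND))  # ('unenrolled female', True)
```

The design point is that an unknown voice still gets a definite label, so its segments remain available for grouping instead of falling through as a failed match.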

Why it matters

This patent describes an early approach to the speaker labeling that modern transcription services use to distinguish between multiple participants in a meeting. It addresses the 'cold start' problem in speaker identification, where a system cannot label a speaker until that person has been manually registered. Speaker labeling of this kind is now a common feature in enterprise meeting software and legal transcription tools.

Real-world examples

1. Automated meeting transcription services like Otter.ai
2. Zoom and Microsoft Teams speaker identification features
3. Legal and medical transcription software
4. Call center analytics platforms

Generated by PatentBrief · Not legal advice · patentbrief.org
