# How Computers Automatically Label Different Speakers in Audio Recordings

> A method for identifying and labeling speakers in audio recordings, even if the system has never heard the person speak before, by grouping similar voices and asking a user to name them.

- **Patent:** US 6424946
- **Original title:** Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering
- **Owner:** International Business Machines Corp
- **Granted:** 2002
- **Status:** Public domain (expired)
- **Times cited:** 75
- **Field:** software, AI/ML, telecommunications

## What it does

This system identifies who is speaking by splitting the audio into segments and comparing each segment against a database of known speakers. Segments from people not yet in the system are matched to 'background models', such as a generic 'unenrolled male' or 'unenrolled female' profile. When the system detects a recurring voice that it does not recognize, it groups those segments into a cluster. A user can then name that cluster, and the system updates its database so that it recognizes that person in future recordings.
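
The flow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the patent's actual method: the two-number "voice features", the nearest-centroid matching, the greedy clustering rule, the threshold, and every name (`nearest`, `cluster_unknown`, `alice`, `bob`) are hypothetical stand-ins for the much richer acoustic models a real system would use.

```python
import math

def nearest(vec, models):
    """Return the name of the model whose centroid is closest to vec."""
    return min(models, key=lambda name: math.dist(vec, models[name]))

def centroid(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def cluster_unknown(vectors, threshold):
    """Greedy clustering: join the first cluster whose centroid is within threshold."""
    clusters = []
    for vec in vectors:
        for members in clusters:
            if math.dist(vec, centroid(members)) <= threshold:
                members.append(vec)
                break
        else:
            clusters.append([vec])  # no close cluster found, start a new one
    return clusters

# Hypothetical database: one enrolled speaker plus two generic background models.
enrolled = {"alice": (1.0, 1.0)}
background = {"unenrolled_male": (6.0, 6.0), "unenrolled_female": (9.0, 1.0)}

# Hypothetical per-segment features (assumed to come from an earlier segmentation step).
segments = [(1.1, 0.9), (5.2, 4.9), (4.8, 5.1), (1.0, 1.2), (5.0, 5.3)]

# Step 1: label each segment with the closest enrolled or background model.
labels = [nearest(seg, {**enrolled, **background}) for seg in segments]

# Step 2: group the segments that only matched a background model.
unknown = [seg for seg, label in zip(segments, labels) if label in background]
clusters = cluster_unknown(unknown, threshold=1.0)

# Step 3: the user names a cluster, which enrolls a new speaker in the database.
enrolled["bob"] = centroid(clusters[0])

# Step 4: the same segments are now attributed to the newly named speaker.
relabeled = [nearest(seg, {**enrolled, **background}) for seg in segments]
print(labels)
print(relabeled)
```

In this toy run, the three segments that first fall under the generic male profile end up in one cluster, and once that cluster is named they are relabeled with the new speaker's name.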

## What it does NOT cover

- Does not cover real-time voice synthesis or voice modification.
- Does not cover hardware-based microphones or physical audio capture devices.
- Does not cover methods that require every speaker to be pre-enrolled before identification can begin.
- Does not cover the specific linguistic analysis of the words being spoken, only the identification of the speaker.

## The clever bit

The system uses a 'background model' for unknown speakers, allowing it to track and cluster voices it hasn't learned yet, rather than simply failing to identify them or labeling them as noise.
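
To make the contrast concrete, here is a minimal sketch comparing a matcher that knows only enrolled speakers with one that also includes background models. Everything in it is an assumption for illustration: the feature values, the distance-based matching, and the names `closed_set_label`, `open_set_label`, `alice`, and `carol` do not come from the patent.

```python
import math

# Hypothetical speaker models, each reduced to a 2-D centroid for illustration.
enrolled = {"alice": (1.0, 1.0), "carol": (2.0, 8.0)}
background = {"unenrolled_male": (6.0, 6.0), "unenrolled_female": (9.0, 1.0)}

def closed_set_label(vec):
    """Without background models: always forced to pick some enrolled speaker."""
    return min(enrolled, key=lambda name: math.dist(vec, enrolled[name]))

def open_set_label(vec):
    """With background models: returns (label, is_unknown)."""
    candidates = {**enrolled, **background}
    best = min(candidates, key=lambda name: math.dist(vec, candidates[name]))
    return best, best in background

stranger = (6.3, 5.8)  # features of a voice that was never enrolled

print(closed_set_label(stranger))  # mislabels the stranger as an enrolled speaker
print(open_set_label(stranger))    # flags the stranger as unknown, ready for clustering
```

The closed-set matcher attributes the stranger's speech to an enrolled person, while the version with background models marks the segment as unknown so it can be grouped and named later.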

## Real-world examples

1. Automated meeting transcription services like Otter.ai
2. Zoom and Microsoft Teams speaker identification features
3. Legal and medical transcription software
4. Call center analytics platforms

## Why it matters

This patent describes an early approach to a task that modern transcription services still perform: distinguishing between multiple participants in a meeting. It addresses the 'cold start' problem in speaker identification, where a system cannot label a speaker until that person has been manually registered. Labeling speakers without prior enrollment is now a common feature in enterprise meeting software and legal transcription tools.

## Frequently asked questions

### What does patent US 6424946 cover?

A method for identifying and labeling speakers in audio recordings, even if the system has never heard the person speak before, by grouping similar voices and asking a user to name them.

### Who owns patent US 6424946?

International Business Machines Corp owns this patent, granted in 2002.

### When does this patent expire?

This patent has already expired and is now in the public domain, so anyone can use the invention freely.

### How often is patent US 6424946 cited?

This patent has been cited by 75 later patents that build on its ideas.

### What problem does this patent solve?

It addresses the 'cold start' problem in speaker identification, where a system cannot label a speaker until that person has been manually registered. By grouping segments from unrecognized voices into clusters and letting a user name each cluster, the system can label speakers it has never heard before.

### What does this patent NOT cover?

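It does not cover real-time voice synthesis or voice modification, hardware-based microphones or physical audio capture devices, methods that require every speaker to be pre-enrolled before identification can begin, or linguistic analysis of the words being spoken.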

**Full plain-English explainer:** https://patentbrief.org/patent/us/6424946/amazon-personalized-recommendations

**Original patent:** https://patents.google.com/patent/US6424946

---

_Source: PatentBrief — https://patentbrief.org. Patent facts are from public records; the plain-English explanation is PatentBrief's._


## Related patents

Semantically similar inventions in the PatentBrief corpus:

- [How to Automatically Generate Musical Harmonies from Audio](https://patentbrief.org/patent/us/8168877/musical-harmony-generation-from-polyphonic-audio-signals) — This 2012 patent describes a system that listens to music and automatically generates harmony notes to accompany a melody, even detecting and ignoring accidental strums on stringed instruments.
- [How AI Models Understand Language Using 'Attention'](https://patentbrief.org/patent/us/10452978/transformer-attention-mechanism) — This patent describes a neural network architecture, known as a Transformer, that uses a "self-attention" mechanism to process sequences of information, like words in a sentence, by weighing the importance of different parts of the input.
- [How to Seamlessly Switch Video Streams for Many Viewers](https://patentbrief.org/patent/us/6732183/video-and-audio-streaming-for-multiple-users) — This patent describes a computer system that allows an administrator or viewer to smoothly switch between different video or audio sources for many people watching at the same time, without interrupting their viewing experience.
- [How AI Uses Question-Guided Attention to Answer Questions About Images](https://patentbrief.org/patent/us/9965705/systems-and-methods-for-attention-based-configurable-convolutional-neural-networks-abc-cnn-for-visual-question-answering) — A method for AI to answer questions about images by dynamically focusing on relevant parts of the picture based on the specific question asked.
- [How Computers Calculate Probabilities in Large Knowledge Bases](https://patentbrief.org/patent/us/9361579/large-scale-probabilistic-ontology-reasoning) — A method for finding answers in a database of uncertain facts by ignoring probabilities to find a solution first, then calculating how likely that solution is based on the underlying evidence.
