How Computers Automatically Label Different Speakers in Audio Recordings

A method for identifying and labeling speakers in audio recordings, even if the system has never heard the person speak before, by grouping similar voices and asking a user to name them.

Granted 2002 · Expired 2019 · Owned by International Business Machines Corp · Invented by Alain Charles Louis Tritschler, Mahesh Viswanathan

Original patent title: “Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering”

Plain-English explanation by Sahi · Last reviewed June 13, 2026

Granted to International Business Machines Corp in 2002 with 33 claims and 75 forward citations, the patent has since expired, so the invention is now in the public domain.

Key facts

Patent number: US 6424946
Status: Expired
Field: Software & Internet
Assignee: International Business Machines Corp
Inventors: Alain Charles Louis Tritschler, Mahesh Viswanathan
Filed: 1999
Granted: 2002
Expires: 2019 (expired)
Claims: 33
Times cited: 75
Litigation: None on record
Value: $58K to $184K (Modest)

Coverage

What does this patent actually cover?

This system processes audio to identify who is speaking by breaking the audio into segments and comparing them against a database. It uses 'background models' for people not yet in the system, such as a generic 'unenrolled male' or 'unenrolled female' profile. When the system detects a recurring voice that it doesn't recognize, it groups those segments together into a cluster. A user can then provide a name for that cluster, and the system automatically updates its database to recognize that person in future recordings.
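The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: each audio segment is assumed to be already reduced to a small feature vector, every speaker model is a single point, and similarity is plain Euclidean distance. The class name `SpeakerLabeler`, the `cluster_radius` threshold, and the `unknown-N` labels are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from math import dist

Vector = tuple[float, ...]


def centroid(vectors: list[Vector]) -> Vector:
    """Average a list of feature vectors component by component."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


@dataclass
class SpeakerLabeler:
    enrolled: dict[str, Vector]    # named speakers already in the database
    background: dict[str, Vector]  # generic models, e.g. "unenrolled male"
    cluster_radius: float = 1.0    # hypothetical threshold for "same unknown voice"
    clusters: list[list[Vector]] = field(default_factory=list)

    def label(self, segment: Vector) -> str:
        # Compare the segment against every model, enrolled and background alike.
        models = {**self.background, **self.enrolled}
        best = min(models, key=lambda name: dist(segment, models[name]))
        if best in self.enrolled:
            return best
        # A background model won, so the speaker is not in the database yet.
        # Group the segment with earlier unknown segments that sound similar.
        for index, members in enumerate(self.clusters):
            if dist(segment, centroid(members)) <= self.cluster_radius:
                members.append(segment)
                return f"unknown-{index}"
        self.clusters.append([segment])
        return f"unknown-{len(self.clusters) - 1}"

    def name_cluster(self, index: int, name: str) -> None:
        # The user supplies a name; the cluster becomes an enrolled speaker.
        self.enrolled[name] = centroid(self.clusters[index])


labeler = SpeakerLabeler(
    enrolled={"Alice": (0.0, 0.0)},
    background={"unenrolled male": (5.0, 5.0), "unenrolled female": (5.0, -5.0)},
)
first = labeler.label((4.0, 4.2))   # closest to a background model: new cluster
second = labeler.label((4.2, 4.0))  # similar voice: joins the same cluster
labeler.name_cluster(0, "Bob")      # user names the cluster
third = labeler.label((4.1, 4.1))   # now recognized by name
```

In this toy run the first two segments are grouped under the same placeholder label, and the third is labeled "Bob" once the user has named the cluster. A real system would use statistical voice models rather than single points, which this page does not describe in detail.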

The gap

What does this patent NOT cover?

  • Does not cover real-time voice synthesis or voice modification.
  • Does not cover hardware-based microphones or physical audio capture devices.
  • Does not cover methods that require every speaker to be pre-enrolled before identification can begin.
  • Does not cover the specific linguistic analysis of the words being spoken, only the identification of the speaker.

These exclusions are unique to PatentBrief — derived from the actual claim language, not patent-office boilerplate.

What made this novel

The system uses a 'background model' for unknown speakers, allowing it to track and cluster voices it hasn't learned yet, rather than simply failing to identify them or labeling them as noise.

The Patent Drawing

Representative patent drawing for Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering (US 6424946)

Where you've seen this

Real-world examples

01. Automated meeting transcription services like Otter.ai
02. Zoom and Microsoft Teams speaker identification features
03. Legal and medical transcription software
04. Call center analytics platforms

Why it matters

The bigger picture

This patent is an early example of the approach behind modern transcription services that distinguish between multiple participants in a meeting. It addressed the 'cold start' problem in speaker identification, where a system cannot label a speaker until that person has been manually registered. Speaker labeling is now a standard feature in enterprise meeting software and legal transcription tools.

Filed: November 5, 1999

Granted: July 23, 2002

Market context

Who's building on this

Companies in this space

IBM remains a significant player in enterprise AI and natural language processing. Cloud providers such as Amazon (Amazon Transcribe), Google (Cloud Speech-to-Text), and Microsoft (Azure Speech) now offer speaker labeling as part of large-scale transcription services that rely on comparable clustering ideas.

Market impact

This technology enabled the transition from simple speech-to-text to 'speaker diarization,' which is the ability to create a 'who spoke when' log. It turned raw transcripts into structured data, which is essential for modern business productivity tools and compliance monitoring in regulated industries.
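To make the 'who spoke when' idea concrete, here is a small sketch that turns labeled segments into a speaker log by merging back-to-back segments from the same speaker. The `Turn` type, the `who_spoke_when` function, and the sample timings are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    start: float  # seconds from the beginning of the recording
    end: float
    speaker: str


def who_spoke_when(segments: list[Turn]) -> list[Turn]:
    """Merge consecutive segments with the same speaker into single turns."""
    turns: list[Turn] = []
    for segment in sorted(segments):  # NamedTuples sort by start time first
        if turns and turns[-1].speaker == segment.speaker:
            last = turns[-1]
            turns[-1] = last._replace(end=max(last.end, segment.end))
        else:
            turns.append(segment)
    return turns


log = who_spoke_when([
    Turn(0.0, 4.0, "Alice"),
    Turn(4.0, 9.0, "Alice"),
    Turn(9.0, 12.0, "unknown-0"),
    Turn(12.0, 15.0, "Alice"),
])
for turn in log:
    print(f"{turn.start:>5.1f}s to {turn.end:>5.1f}s  {turn.speaker}")
```

The two opening Alice segments collapse into one turn from 0 to 9 seconds, so the log contains three entries instead of four. This structured list is the kind of data that downstream productivity or compliance tools can search and filter.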

PatentBrief Score

Impact Score

Moderate

Citation count: 38/40 (Highly cited)

Claim breadth: 20/20 (Very broad protection)

Recency: 0/20 (Older than 20 years)

Assignee scale: 0/20 (Independent or smaller assignee)

PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.
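As a quick check on the arithmetic, the four component scores listed above can be added up. This sketch assumes the overall score is a simple sum of the components, which the page does not state, and it does not try to reproduce how the "Moderate" label is assigned.

```python
# Component scores as (points earned, maximum points), taken from the page.
components = {
    "Citation count": (38, 40),
    "Claim breadth": (20, 20),
    "Recency": (0, 20),
    "Assignee scale": (0, 20),
}

# Assumption: the overall score is the plain sum of the four components.
total = sum(points for points, _ in components.values())
maximum = sum(cap for _, cap in components.values())

print(f"{total}/{maximum}")  # prints 58/100
```

Under that assumption the patent scores 58 out of 100: strong on citations and claim breadth, with nothing earned for recency or assignee scale.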

Heuristic Value Estimate

What this patent might be worth

Modest

$58K to $184K

Midpoint $115K · expired or expiring · industry ×1.6

Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.

The original legal language

Original claims

33 claims as filed with the patent office.

Concepts involved

Claim · Prior art · Non-obviousness · Novelty · Specification · Assignee · Patent term

Citations

Patent lineage

Cites earlier patents

2

earlier patents this invention cites as foundations

Cited by later patents

75

later patents that build on this invention

Cite this patent

Tritschler, A. C. L., & Viswanathan, M. (2002). How Computers Automatically Label Different Speakers in Audio Recordings (U.S. Patent No. 6,424,946). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/6424946/amazon-personalized-recommendations

Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.

Last reviewed: June 13, 2026 · PatentBrief is not a law firm and this is not legal advice.