PatentBrief — patentbrief.org

US 10878335

How Computers Find Similar Text Using Compact Data Structures

This patent describes a method for efficiently identifying similar text records, like documents or product reviews, by using special compact data structures that store text terms probabilistically and then analyzing them with machine learning.

Granted 2020 · Active · Expires 2036 · Owned by Amazon Technologies Inc · Invented by Robert Mark Waugh

Original patent title: “Scalable text analysis using probabilistic data structures”

Plain-English explanation by Sahi · Last reviewed June 15, 2026

Coverage

What does this patent actually cover?

This system (Claim 1) takes a piece of text, such as a product review, and uses a "hashing-based function" to map its words (e.g., "excellent") to specific entries in a "probabilistic data structure." The structure acts as a compact, fuzzy summary of many other text records. When a word is mapped, the system updates the corresponding entry to record the word's presence. Importantly, under Claim 1 a single entry can represent more than one word, which keeps the structure small at the cost of occasional false matches when words collide. The system then applies a "dimensionality reduction algorithm" to simplify the data and feeds the result into a "similarity detection algorithm" to estimate how closely the new text resembles the other records. For example, it could find customer reviews that discuss similar product features.
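The encoding step in Claim 1 can be illustrated with a Bloom-filter-style bit array. This is a minimal sketch under stated assumptions: the summary does not say which probabilistic data structure or hash functions the patent uses, so the class `TermBitArray`, the `encode` helper, the 256-bit size, the three bit positions per term, and the salted SHA-256 hashing are all hypothetical choices for illustration. Because each term sets a few positions in a fixed-size array, different terms can land on the same position, which is how one entry can represent more than one term.

```python
import hashlib


class TermBitArray:
    """Hypothetical Bloom-filter-style summary of a text record.

    Each term sets `num_hashes` bit positions in a fixed-size bit array,
    so unrelated terms may share a position (one entry, several terms).
    """

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array of width num_bits

    def _positions(self, term: str) -> list[int]:
        # Assumed hashing-based function: salted SHA-256, reduced modulo the array size.
        positions = []
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{term}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, term: str) -> None:
        # Update the entries for this term to record its presence.
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term: str) -> bool:
        # Always True for added terms; can also be True for terms never
        # added (a false positive), because positions are shared.
        return all((self.bits >> pos) & 1 for pos in self._positions(term))

    def to_vector(self) -> list[int]:
        # Expose the bit array as a 0/1 feature vector for later ML steps.
        return [(self.bits >> pos) & 1 for pos in range(self.num_bits)]


def encode(text: str, num_bits: int = 256, num_hashes: int = 3) -> TermBitArray:
    """Split on whitespace (a simplifying assumption) and add each term."""
    summary = TermBitArray(num_bits, num_hashes)
    for term in text.lower().split():
        summary.add(term)
    return summary


review = encode("Excellent battery life excellent screen")
print(review.might_contain("excellent"))  # True
print(sum(review.to_vector()))  # at most 12: 4 distinct terms x 3 positions each
```

A real deployment would pick the array size and number of positions per term to balance memory against the false-positive rate; the values here are only for demonstration.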

The gap

What does this patent NOT cover?

Does not cover systems that store every single word explicitly in a traditional database for similarity comparison, because the claims rely on probabilistic storage in which an entry can represent more than one text term.
Does not cover similarity detection that doesn't use a probabilistic data structure as the initial input for further analysis.
Does not cover text analysis methods that do not involve applying a hashing-based function to text terms to update the data structure.
Does not cover systems that omit the step of applying a dimensionality reduction algorithm on the probabilistic data structure before generating similarity indications.
Does not cover, for Claim 3 specifically, combining data structures without bit-level Boolean operations or vector instructions; that limitation narrows Claim 3 and does not by itself place a system outside Claim 1.
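Claim 3, as summarized in the last item above, refers to combining data structures with bit-level Boolean operations or vector instructions. A minimal sketch, assuming each structure is a fixed-size bit array (held here in a Python int) built with the same size and hash functions: OR merges two summaries, AND keeps the positions they share, and the ratio of their set-bit counts gives a rough overlap score. The function names are hypothetical, and Python's big-integer operations stand in for the machine-word or SIMD (vector) instructions a production system might use.

```python
def merge(a: int, b: int) -> int:
    """Bitwise OR: the union of two bit-array summaries."""
    return a | b


def shared(a: int, b: int) -> int:
    """Bitwise AND: positions set in both summaries."""
    return a & b


def popcount(x: int) -> int:
    """Number of set bits (bin() works on any Python 3 version)."""
    return bin(x).count("1")


def bit_overlap(a: int, b: int) -> float:
    """Jaccard-style ratio of shared bits to merged bits.

    Hash collisions can inflate this relative to the true overlap of the
    underlying term sets, so treat it as an approximation.
    """
    merged_bits = popcount(merge(a, b))
    return popcount(shared(a, b)) / merged_bits if merged_bits else 1.0


a = 0b1011_0010
b = 0b1001_0110
print(bin(merge(a, b)))   # 0b10110110
print(bin(shared(a, b)))  # 0b10010010
print(bit_overlap(a, b))  # 0.6
```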

These exclusions are PatentBrief's own reading of the claim language, not patent-office boilerplate.

Key facts

Patent number	US 10878335
Status	Active
Field	Software & Internet
Assignee	Amazon Technologies Inc
Inventor	Robert Mark Waugh
Filed	2016
Granted	2020
Claims	23
Times cited	18
Litigation	None on record

Value · $187K–$599K (Modest)

What made this novel

The novelty lies in using probabilistic data structures, where multiple terms can share entries, as the direct input for machine learning algorithms like dimensionality reduction and similarity detection. This allows for highly scalable text analysis without needing to store full text or traditional, large term-frequency matrices.
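Feeding the bit array straight into dimensionality reduction and similarity detection can be sketched as follows. The summary does not name the specific algorithms, so this uses two common stand-ins as assumptions: a seeded Gaussian random projection for dimensionality reduction and cosine similarity for similarity detection. The input vectors are hypothetical 64-bit summaries, and every function name is illustrative.

```python
import math
import random


def projection_matrix(in_dim: int, out_dim: int, seed: int = 0) -> list[list[float]]:
    """Gaussian random projection; scaling by 1/sqrt(out_dim) roughly
    preserves dot products between the projected vectors."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)] for _ in range(out_dim)]


def project(vector: list[int], matrix: list[list[float]]) -> list[float]:
    """Reduce a 0/1 bit-array vector to len(matrix) dimensions."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical bit-array summaries of three text records.
base = [1 if i % 3 == 0 else 0 for i in range(64)]  # 22 bits set
near = base[:]
near[1] = near[2] = 1                               # two extra bits set
far = [1 - bit for bit in base]                     # no bits in common with base

print(round(cosine(base, near), 3))  # 0.957 in the original 64-dim space
print(cosine(base, far))             # 0.0

P = projection_matrix(in_dim=64, out_dim=32, seed=7)
print(cosine(project(base, P), project(near, P)))  # estimate of the 0.957 above
print(cosine(project(base, P), project(far, P)))   # estimate of 0.0
```

A smaller `out_dim` saves more space but makes the projected similarities noisier; 32 is an arbitrary choice here.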

Schematic visualization of the patent's claim structure.

Where you've seen this

Real-world examples

Amazon product recommendation systems

Customer review analysis for sentiment and trends

Content moderation for online platforms

Document clustering in large datasets

Spam detection in email services

Why it matters

The bigger picture

This patent is important for processing huge amounts of text data efficiently, which is common in cloud services and e-commerce. By using probabilistic data structures, it allows for faster and more resource-friendly analysis of customer reviews, product descriptions, or documents. This efficiency helps companies quickly identify trends, recommend products, or moderate content without needing vast storage for every single word.

Filed

June 14, 2016

Granted

December 29, 2020

Market context

Who's building on this

Companies in this space

Amazon Technologies Inc. is the assignee, and Amazon's e-commerce, cloud computing (AWS), and digital content services are natural settings for this kind of technology. Other major cloud providers such as Google and Microsoft, as well as companies in data analytics and AI, also develop scalable text processing techniques of this general kind.

Market impact

This type of technology enables companies to process and understand massive volumes of unstructured text data more efficiently, which is crucial for modern internet services. It underpins features like personalized recommendations, improved search results, and automated content analysis, allowing for better user experiences and more targeted advertising across various platforms.

Patent timeline

Filing · Jun 14, 2016 · Application submitted to the patent office

Publication · Dec 29, 2020 · Listed publication date, which coincides with the grant (applications are typically published 18 months after filing)

Grant · Dec 29, 2020 · Patent officially issued

PatentBrief Score

Impact Score: Strong

Citation count	26/40	Moderately cited
Claim breadth	15/20	Broad claims
Recency	10/20	Granted 5–10 years ago
Assignee scale	20/20	Major company or institution

PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.

Heuristic Value Estimate

What this patent might be worth

Modest

$187K – $599K

Point estimate $374K · 9.9 yr remaining · industry ×1.6

Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.

The original legal language

Original claims

23 claims as granted. Claim text not yet imported for this patent.

Concepts involved

Claim · Prior art · Non-obviousness · Novelty · Specification · Assignee · Patent term

Citations

Patent lineage

Cites earlier patents

earlier patents this invention cites as foundations

Cited by later patents

later patents that build on this invention

Cite this patent

Waugh, R. M. (2020). Scalable text analysis using probabilistic data structures (U.S. Patent No. 10,878,335). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/10878335/bert-bidirectional-encoder-representations

Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.

Embed

Add this patent to your site

Drop this plain-English patent card into any blog post or article — free, no signup. It always links back to the full breakdown here.

<div data-patentlens-widget data-patent-number="US10878335"></div>
<script src="https://patentbrief.org/embed.js" async></script>

Keep exploring

Related patents you should know

US 4683195 · 1987

How to Make Billions of Copies of a DNA Segment

This patent describes the Polymerase Chain Reaction (PCR), a method to rapidly create many copies of a specific piece of DNA or RNA, enabling its detection and analysis.

Cetus Corp

US 8697359 · 2014

How to Edit Genes in Human Cells Using an Engineered CRISPR System

This patent describes an engineered CRISPR-Cas9 system for precisely cutting DNA in eukaryotic cells to change how genes work, opening the door for gene editing in complex organisms.

Massachusetts Institute of Technology

US 7657849 · 2010

How the iPhone's Slide-to-Unlock Gesture Works

Apple's 2010 patent describes unlocking a device by dragging a specific graphical image across the touchscreen along a predefined path, a gesture that became iconic with the original iPhone.

Apple Inc

US 4733665 · 1988

How Doctors Implant a Permanent Stent Using a Balloon

This patent describes the method for placing a permanent, expandable wire mesh tube inside a blood vessel or other body tube using a balloon-tipped catheter to widen it and keep it open.

Expandable Grafts Partnership

US 4965188 · 1990

How to Make Many Copies of a DNA Piece with Heat

This patent describes the Polymerase Chain Reaction (PCR) method, a technique to make millions of copies of a specific DNA segment using a heat-resistant enzyme and repeated temperature changes.

Cetus Corp

US 4235871 · 1980

How to Encapsulate Active Materials in Lipid Bubbles Efficiently

This patent describes a method for trapping biologically active substances inside tiny, multi-layered fat bubbles called liposomes, using a specific water-in-oil emulsion and gel-forming process to improve how much material gets captured.

Individual

Semantically similar

You might also find these interesting

ai ml

US 6523026 · 2003 · Huntsman International LLC

How Computers Find Hidden Connections Between Different Fields of Knowledge

software

US 10909459 · 2021 · Cognizant Technology Solutions US Corp

Teaching Computers to Understand Document Similarity Using AI

software

US 9330167 · 2016 · Groupon Inc

How Groupon Automatically Categorizes Merchant Services Using Text Analysis

software

US 9607103 · 2017 · Ab Initio Technology LLC

How Computers Match and Join Messy Data from Different Sources

More to explore

Frequently Asked Questions

Who owns patent US 10878335?

Amazon Technologies Inc owns this patent, granted in 2020.

When does this patent expire?

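This patent is expected to expire in 2036, 20 years after its June 14, 2016 filing date (subject to any patent term adjustment or early lapse for unpaid maintenance fees), when the invention enters the public domain.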

How many later patents cite US 10878335?

This patent has been cited by 18 later patents that build on its ideas.

Same assignee

More from Amazon Technologies Inc

US 10824959 · 2020

How to Make Artificial Intelligence Explain Its Own Decisions

US 9881277 · 2018

How Amazon Tracks Warehouse Workers' Hands Using Radio Waves

US 9544394 · 2017

How CDNs Use Client-Side Code to Speed Up Web Downloads

US 9535948 · 2017

How Software Automatically Translates Database Queries for Different Storage Systems

Last reviewed: June 15, 2026 · PatentBrief is not a law firm and this is not legal advice.

More in Software & Internet

How RSA Public-Key Encryption Keeps Digital Messages Secret

How Websites Get Ranked by Importance

How Amazon's One-Click Ordering Works for Online Purchases

Displaying Friends' Activities in a Social Network Feed
