Teaching Computers to Understand Document Similarity Using AI

This patent describes a way to train a computer program (a neural network) to understand how similar documents are to each other, by showing it examples and teaching it to group similar ones together and separate dissimilar ones.

Granted 2021 · Active · Expires 2037 · Owned by Cognizant Technology Solutions US Corp · Invented by Diego Guy M. Legrand, Nigel Duffy, Petr TSATSIN + 1 more

Original patent title: “Content embedding using deep metric learning algorithms”

Plain-English explanation by Sahi · Last reviewed June 15, 2026

Granted to Cognizant Technology Solutions US Corp in 2021 with 22 claims and 53 forward citations.

Key facts

Patent number: US 10909459
Status: Active
Field: Software & Internet
Assignee: Cognizant Technology Solutions US Corp
Inventors: Diego Guy M. Legrand, Nigel Duffy, Petr TSATSIN and 1 other
Filed: 2017
Granted: 2021
Claims: 22
Times cited: 53
Litigation: None on record
Value: $468K to $1.5M (Substantial)

Coverage

What does this patent actually cover?

This patent explains how to train a neural network to build a 'space' in which documents are positioned according to their meaning. Each training example has a target document (say, an article about dogs), a 'favored' document (another article about dogs), and at least two 'unfavored' documents (articles about cats, cars, or anything else). The network learns by pulling the favored document closer to the target in that space and pushing the unfavored documents farther away. It does this by adjusting its internal settings, called parameters, to minimize a 'loss' function, which measures how well the favored document is separated from the unfavored ones relative to the target. For instance, a training example might include an article about 'Golden Retrievers' (target), another about 'Labradors' (favored), and articles about 'Siamese Cats' and 'Electric Cars' (unfavored). The system adjusts itself so that the 'Golden Retriever' and 'Labrador' articles end up 'close' in its internal representation, while the 'Siamese Cat' and 'Electric Car' articles end up 'far' from the 'Golden Retriever' article.
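To make the idea of a distance-based loss concrete, here is a minimal Python sketch. It assumes documents have already been turned into small vectors, and it uses a softmax over negative Euclidean distances as the loss. That particular formula, the function name `relative_distance_loss`, and the toy vectors are illustrative assumptions, not the formula recited in the patent's claims.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relative_distance_loss(target, favored, unfavored):
    """Illustrative loss (an assumption, not the claimed formula).

    It is small when `favored` is much closer to `target` than every
    vector in `unfavored`, and grows as an unfavored vector gets closer.
    """
    if len(unfavored) < 2:
        raise ValueError("expected at least two unfavored documents")
    # Score each candidate by negative distance: nearer means higher score.
    scores = [-euclidean(target, favored)]
    scores += [-euclidean(target, u) for u in unfavored]
    # Numerically stable log-sum-exp, then subtract the favored score.
    top = max(scores)
    log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum - scores[0]

# Toy 2-D embeddings (made up for illustration).
golden_retrievers = [1.0, 0.0]
labradors = [0.9, 0.1]
siamese_cats = [0.0, 1.0]
electric_cars = [-1.0, 0.0]

good = relative_distance_loss(golden_retrievers, labradors,
                              [siamese_cats, electric_cars])
bad = relative_distance_loss(golden_retrievers, siamese_cats,
                             [labradors, electric_cars])
print(good < bad)  # True: the sensible pairing gets the lower loss
```

Training would then mean changing the network's parameters so that this number goes down across many such examples.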

The gap

What does this patent NOT cover?

  • Does not cover methods that do not use a neural network for training.
  • Does not cover training methods that do not involve a target document, a favored document, and at least two unfavored documents.
  • Does not cover systems that do not calculate a 'loss' based on the distance between document representations.
  • Does not cover methods where the computer program is not 'trained' using adjustable parameters.
  • Does not cover creating an embedding space without using document vectors as input.

These exclusions are unique to PatentBrief — derived from the actual claim language, not patent-office boilerplate.
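Read together, the exclusions above imply a particular shape for each training example. The sketch below is one hypothetical way to express that shape in Python. The class and field names are invented for illustration, and the checks reflect this page's summary rather than the exact claim language.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]

@dataclass(frozen=True)
class TrainingExample:
    """One training item (hypothetical names, based on the summary above)."""
    target: Vector            # document vector for the reference document
    favored: Vector           # document vector that should end up close
    unfavored: List[Vector]   # document vectors that should end up far

    def __post_init__(self):
        # The summary says at least two unfavored documents are involved.
        if len(self.unfavored) < 2:
            raise ValueError("need at least two unfavored document vectors")
        # All inputs are document vectors, so they should share a dimension.
        dim = len(self.target)
        for vec in [self.favored, *self.unfavored]:
            if len(vec) != dim:
                raise ValueError("all document vectors must share one dimension")

example = TrainingExample(
    target=[1.0, 0.0],
    favored=[0.9, 0.1],
    unfavored=[[0.0, 1.0], [-1.0, 0.0]],
)
```

A training setup that used only one unfavored document per example would fail the first check, which mirrors the second exclusion in the list.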

What made this novel

The core idea is teaching the AI not just to recognize what a document is about, but to learn the *relative similarity* between documents. By explicitly training it to bring 'good' matches closer to a reference and push 'bad' matches farther away, the network learns distances that reflect graded similarity, something that simply sorting documents into fixed categories does not capture.
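The "bring closer, push farther" training can be sketched as a small gradient-descent loop. Everything here is an assumption made for illustration: a single linear layer stands in for the neural network, the loss is a softmax over negative distances, gradients are estimated by finite differences, and the three-number document vectors are made up.

```python
import math

def embed(weights, doc):
    """Toy linear 'network': one output coordinate per row of weights."""
    return [sum(w * x for w, x in zip(row, doc)) for row in weights]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def loss(weights, target, favored, unfavored):
    """Illustrative loss: low when favored is nearest to target."""
    t = embed(weights, target)
    scores = [-distance(t, embed(weights, favored))]
    scores += [-distance(t, embed(weights, u)) for u in unfavored]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores)) - scores[0]

def train_step(weights, example, lr=0.1, eps=1e-5):
    """One gradient-descent step using finite-difference gradients."""
    base = loss(weights, *example)
    new_weights = []
    for i, row in enumerate(weights):
        new_row = []
        for j, w in enumerate(row):
            bumped = [r[:] for r in weights]   # copy, then nudge one weight
            bumped[i][j] += eps
            grad = (loss(bumped, *example) - base) / eps
            new_row.append(w - lr * grad)      # move against the gradient
        new_weights.append(new_row)
    return new_weights

# Made-up input vectors: (target, favored, [unfavored, unfavored]).
example = (
    [1.0, 0.1, 0.0],                       # Golden Retrievers
    [0.9, 0.0, 0.1],                       # Labradors
    [[0.1, 1.0, 0.0], [0.0, 0.1, 1.0]],    # Siamese Cats, Electric Cars
)
weights = [[0.2, 0.1, 0.1], [0.1, 0.2, 0.1]]

before = loss(weights, *example)
for _ in range(50):
    weights = train_step(weights, example)
after = loss(weights, *example)
print(after < before)  # True: the parameters moved to reduce the loss
```

A real system would use a deeper network, automatic differentiation, and many training examples, but the loop has the same structure: measure the loss, adjust the parameters, repeat.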

[Figure: schematic visualization of the patent's claim structure. Primary claim: Content embedding using deep m… Tags: software, AI/ML, telecommunications.]

Where you've seen this

Real-world examples

01. Search engine result ranking
02. Product recommendation systems
03. Content similarity detection
04. Plagiarism detection tools
05. Customer feedback analysis

Why it matters

The bigger picture

This technology is foundational for many modern AI applications that deal with understanding and organizing large amounts of text or other data. It enables search engines, recommendation systems, and content moderation tools to better grasp the meaning and relationships between different pieces of information.

Filed: June 9, 2017

Granted: February 2, 2021

Market context

Who's building on this

Companies in this space

Cognizant Technology Solutions, the assignee (the entity that owns the patent), is a major IT services company that likely uses and builds upon such AI techniques for its clients. Many other tech companies, including major cloud providers like Google, Amazon, and Microsoft, develop and deploy similar deep metric learning algorithms for their search, recommendation, and AI services.

Market impact

Learning document embeddings with deep metric learning, the approach this patent describes, is a widely used technique in Natural Language Processing (NLP). It helps search and recommendation engines return more relevant results and suggestions by matching on semantic similarity rather than on keywords alone.
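Once documents live in an embedding space, ranking by semantic similarity can be pictured as sorting by distance to the query. The sketch below assumes the embeddings already exist. The vectors, document names, and the `rank_by_similarity` helper are all hypothetical.

```python
import math

def rank_by_similarity(query_vec, corpus):
    """Return document names ordered from nearest to farthest from the query."""
    def dist(vec):
        return math.sqrt(sum((q - v) ** 2 for q, v in zip(query_vec, vec)))
    return sorted(corpus, key=lambda name: dist(corpus[name]))

# Made-up embeddings; a trained network would produce these.
corpus = {
    "electric_cars": [-0.8, -0.2],
    "siamese_cats": [0.1, 0.9],
    "labradors": [0.9, 0.1],
}
query = [1.0, 0.0]  # e.g. the embedding of an article about Golden Retrievers

ranking = rank_by_similarity(query, corpus)
print(ranking)  # ['labradors', 'siamese_cats', 'electric_cars']
```

Note that none of the document names share a keyword with "Golden Retrievers". The ordering comes entirely from where the vectors sit in the space.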

Patent timeline

Filing: Application submitted to the patent office

Publication: Application published, typically 18 months after filing

Grant: Patent officially issued

PatentBrief Score

Impact Score: Strong

Citation count: 35/40 (Highly cited)
Claim breadth: 15/20 (Broad claims)
Recency: 10/20 (Granted 5–10 years ago)
Assignee scale: 0/20 (Independent or smaller assignee)

PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.

Heuristic Value Estimate

What this patent might be worth

Substantial

$468K to $1.5M

Midpoint $936K · 11.0 yr remaining · industry ×1.6

Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.

The original legal language

Original claims

22 claims as filed with the patent office.

Concepts involved

Claim · Prior art · Non-obviousness · Novelty · Specification · Assignee · Patent term

Citations

Patent lineage

Cites earlier patents: 103 (earlier patents this invention cites as foundations)

Cited by later patents: 53 (later patents that build on this invention)

Cite this patent

Legrand, D. G. M., Duffy, N., TSATSIN, P., & Long, P. M. (2021). Teaching Computers to Understand Document Similarity Using AI (U.S. Patent No. 10,909,459). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/10909459/federated-learning

Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.

Embed

Add this patent to your site

Drop this plain-English patent card into any blog post or article — free, no signup. It always links back to the full breakdown here.

<div data-patentlens-widget data-patent-number="US10909459"></div>
<script src="https://patentbrief.org/embed.js" async></script>

Common Questions

Frequently Asked Questions

What does Teaching Computers to Understand Document Similarity Using AI cover?

This patent describes a way to train a computer program (a neural network) to understand how similar documents are to each other, by showing it examples and teaching it to group similar ones together and separate dissimilar ones.

Who owns patent US 10909459?

Cognizant Technology Solutions US Corp owns this patent, granted in 2021.

When does this patent expire?

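This patent is expected to expire in 2037, when the invention enters the public domain. U.S. utility patent terms generally run 20 years from the filing date (here, June 9, 2017), before any term adjustment.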

What is patent US 10909459 cited by?

This patent has been cited by 53 later patents that build on its ideas.

What problem does this patent solve?

This technology is foundational for many modern AI applications that deal with understanding and organizing large amounts of text or other data. It enables search engines, recommendation systems, and content moderation tools to better grasp the meaning and relationships between different pieces of information.

What does this patent NOT cover?

Does not cover methods that do not use a neural network for training.

Last reviewed: June 15, 2026 · PatentBrief is not a law firm and this is not legal advice.