PatentBrief

How Computers Match and Join Messy Data from Different Sources

A method for merging datasets by identifying related but non-identical items using flexible matching rules rather than strict equality.

Granted 2017 · Active · Expires 2033 · Owned by Ab Initio Technology LLC · Invented by Arlen Anderson

Original patent title: “Fuzzy data operations”

Plain-English explanation by Sahi · Last reviewed June 15, 2026

Granted to Ab Initio Technology LLC in 2017 with 23 claims and 15 forward citations.

Key facts

Patent number: US 9607103
Status: Active
Field: Software & Internet
Assignee: Ab Initio Technology LLC
Inventor: Arlen Anderson
Filed: 2013
Granted: 2017
Claims: 23
Times cited: 15
Litigation: None on record
Value: $87K to $280K (Modest)

Coverage

What does this patent actually cover?

This patent describes a way for computer systems to combine data from two different sources even when the information doesn't match perfectly. Instead of looking for identical values, the system uses a 'variant relation' to determine if two objects are close enough to be considered a match, such as checking if the mathematical distance between two values falls below a specific threshold. Once these matches are identified, the system evaluates the surrounding data and joins the records together to create a new, combined dataset. For example, it could link 'John Smith' in one database with 'J. Smith' in another by recognizing they are variants of the same person based on defined similarity rules.
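
As a rough illustration of the idea above, the following Python sketch joins two small record lists using a distance threshold instead of equality. It is not the patented method or Ab Initio's implementation: the choice of Levenshtein edit distance, the threshold of 4, and the names `edit_distance`, `is_variant`, `fuzzy_join`, `crm`, and `billing` are all assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def is_variant(a: str, b: str, threshold: int = 4) -> bool:
    """Hypothetical variant relation: distance falls below the threshold."""
    return edit_distance(a.lower(), b.lower()) < threshold


def fuzzy_join(left, right, key, threshold=4):
    """Pair every left record with every right record whose key is a variant."""
    joined = []
    for l in left:
        for r in right:
            if is_variant(l[key], r[key], threshold):
                # Prefix right-hand fields so both source values survive the join.
                joined.append({**l, **{f"right_{k}": v for k, v in r.items()}})
    return joined


# Made-up sample data for illustration only.
crm = [{"name": "John Smith", "city": "Boston"}]
billing = [{"name": "J. Smith", "balance": 120},
           {"name": "Mary Jones", "balance": 75}]

result = fuzzy_join(crm, billing, key="name")
```

Here 'John Smith' and 'J. Smith' are three edits apart, so they fall below the assumed threshold and are joined, while 'Mary Jones' is not. The sketch compares every pair of records, which is fine for a demonstration but would need a blocking or indexing step at real data volumes.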

The gap

What does this patent NOT cover?

  • Does not cover simple database joins that rely on exact matches (e.g., matching primary keys that are identical).
  • Does not cover human-manual data entry or manual reconciliation processes.
  • Does not cover matching methods that are strictly limited to equivalence relations (where A must equal B).
  • Does not cover the storage hardware itself, only the logical method of processing and joining the data.

These exclusions are unique to PatentBrief — derived from the actual claim language, not patent-office boilerplate.

What made this novel

The system explicitly allows for 'non-equivalence relations,' meaning it can chain matches through intermediate data elements to find connections that aren't directly obvious, effectively building a bridge between disparate data points.
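
One way to picture chaining through intermediate elements is a breadth-first search over a relation that is not transitive. The sketch below is an assumption-laden illustration, not the claimed procedure: it assumes edit distance of at most 1 as the variant relation, and the names `are_variants` and `chained_variants` are hypothetical.

```python
from collections import deque


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def are_variants(a: str, b: str, max_distance: int = 1) -> bool:
    """Assumed variant relation: at most `max_distance` edits apart."""
    return edit_distance(a, b) <= max_distance


def chained_variants(start: str, values, max_distance: int = 1) -> set:
    """Collect everything reachable from `start` through a chain of variants."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for candidate in values:
            if candidate not in seen and are_variants(current, candidate, max_distance):
                seen.add(candidate)
                queue.append(candidate)
    return seen


# Made-up sample values for illustration only.
names = ["smith", "smyth", "smythe", "jones"]

direct = are_variants("smith", "smythe")   # two edits apart, so not a direct match
linked = chained_variants("smith", names)  # reaches "smythe" by way of "smyth"
```

In this example 'smith' and 'smythe' are not variants of each other directly, yet both are variants of 'smyth', so the chain links all three. An equivalence relation could not behave this way, because transitivity would force 'smith' and 'smythe' to match directly.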

Fuzzy data operations (Primary claim) · software · ai ml · finance · ecommerce


Where you've seen this

Real-world examples

01. Enterprise data integration platforms
02. Customer data platforms (CDP) for deduplication
03. Automated financial record reconciliation
04. Master data management systems

Why it matters

The bigger picture

In large-scale data processing, data is rarely clean or perfectly formatted across different departments or companies. This patent provides a formal framework for 'fuzzy' data integration, which is essential for business intelligence, customer relationship management, and regulatory compliance where records must be consolidated despite inconsistent naming or formatting.

Filed: January 23, 2013

Granted: March 28, 2017

Market context

Who's building on this

Companies in this space

Ab Initio Technology remains a primary player in high-performance data processing. Other companies in the data integration and master data management space, such as Informatica and Talend, utilize similar logic for fuzzy matching and data quality workflows.

Market impact

This patent reinforces the shift toward automated data quality tools that reduce the need for manual data cleaning. It supports the infrastructure of modern data lakes and warehouses where disparate data sources must be unified to provide accurate business insights.


Patent timeline

  • Filing: application submitted to the patent office
  • Publication: application published, typically 18 months after filing
  • Grant: patent officially issued

PatentBrief Score

Impact Score: Moderate

  • Citation count: 24/40 (moderately cited)
  • Claim breadth: 15/20 (broad claims)
  • Recency: 10/20 (granted 5–10 years ago)
  • Assignee scale: 0/20 (independent or smaller assignee)

PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.

Heuristic Value Estimate

What this patent might be worth

Modest: $87K to $280K

Midpoint $175K · 6.6 yr remaining · industry ×1.6

Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.

The original legal language

Original claims

23 claims as filed with the patent office.

Concepts involved

Claim · Prior art · Non-obviousness · Novelty · Specification · Assignee · Patent term

Citations

Patent lineage

  • Cites earlier patents: 99 (earlier patents this invention cites as foundations)
  • Cited by later patents: 15 (later patents that build on this invention)

Cite this patent

Anderson, A. (2017). How Computers Match and Join Messy Data from Different Sources (U.S. Patent No. 9,607,103). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/9607103/amazon-athena

Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.

Embed

Add this patent to your site

Drop this plain-English patent card into any blog post or article — free, no signup. It always links back to the full breakdown here.

<div data-patentlens-widget data-patent-number="US9607103"></div>
<script src="https://patentbrief.org/embed.js" async></script>


Keep exploring

Related patents you should know

US 4683195 · 1987

How to Make Billions of Copies of a DNA Segment

This patent describes the Polymerase Chain Reaction (PCR), a method to rapidly create many copies of a specific piece of DNA or RNA, enabling its detection and analysis.

Cetus Corp

US 8697359 · 2014

How to Edit Genes in Human Cells Using an Engineered CRISPR System

This patent describes an engineered CRISPR-Cas9 system for precisely cutting DNA in eukaryotic cells to change how genes work, opening the door for gene editing in complex organisms.

Massachusetts Institute of Technology

US 7657849 · 2010

How the iPhone's Slide-to-Unlock Gesture Works

Apple's 2010 patent describes unlocking a device by dragging a specific graphical image across the touchscreen along a predefined path, a gesture that became iconic with the original iPhone.

Apple Inc

US 4733665 · 1988

How Doctors Implant a Permanent Stent Using a Balloon

This patent describes the method for placing a permanent, expandable wire mesh tube inside a blood vessel or other body tube using a balloon-tipped catheter to widen it and keep it open.

Expandable Grafts Partnership

US 4965188 · 1990

How to Make Many Copies of a DNA Piece with Heat

This patent describes the Polymerase Chain Reaction (PCR) method, a technique to make millions of copies of a specific DNA segment using a heat-resistant enzyme and repeated temperature changes.

Cetus Corp

US 4235871 · 1980

How to Encapsulate Active Materials in Lipid Bubbles Efficiently

This patent describes a method for trapping biologically active substances inside tiny, multi-layered fat bubbles called liposomes, using a specific water-in-oil emulsion and gel-forming process to improve how much material gets captured.

Individual


Common Questions

Frequently Asked Questions

What does “How Computers Match and Join Messy Data from Different Sources” cover?

A method for merging datasets by identifying related but non-identical items using flexible matching rules rather than strict equality.

Who owns patent US 9607103?

Ab Initio Technology LLC owns this patent, granted in 2017.

When does this patent expire?

This patent is expected to expire in 2033, when the invention enters the public domain.

Which patents cite US 9607103?

This patent has been cited by 15 later patents that build on its ideas.

What problem does this patent solve?

In large-scale data processing, data is rarely clean or perfectly formatted across different departments or companies. This patent provides a formal framework for 'fuzzy' data integration, which is essential for business intelligence, customer relationship management, and regulatory compliance where records must be consolidated despite inconsistent naming or formatting.

What does this patent NOT cover?

Does not cover simple database joins that rely on exact matches (e.g., matching primary keys that are identical).


Last reviewed: June 15, 2026 · PatentBrief is not a law firm and this is not legal advice.