# How Computers Match and Join Messy Data from Different Sources

> A method for merging datasets by identifying related but non-identical items using flexible matching rules rather than strict equality.

- **Patent:** US 9607103
- **Original title:** Fuzzy data operations
- **Owner:** Ab Initio Technology LLC
- **Granted:** 2017
- **Status:** Active
- **Times cited:** 15
- **Field:** software, AI/ML, finance, e-commerce

## What it does

This patent describes a way for computer systems to combine data from two different sources even when the information doesn't match perfectly. Instead of looking for identical values, the system uses a 'variant relation' to determine if two objects are close enough to be considered a match, such as checking if the mathematical distance between two values falls below a specific threshold. Once these matches are identified, the system evaluates the surrounding data and joins the records together to create a new, combined dataset. For example, it could link 'John Smith' in one database with 'J. Smith' in another by recognizing they are variants of the same person based on defined similarity rules.
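The distance-below-a-threshold idea can be sketched in a few lines of Python. This is a minimal illustration, not the patent's own implementation: the choice of edit distance as the measure, the threshold of 4, the function names (`edit_distance`, `is_variant`, `fuzzy_join`), and the sample records are all assumptions made for the example.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute, or keep when equal
            ))
        prev = curr
    return prev[-1]


def is_variant(a: str, b: str, threshold: int = 4) -> bool:
    """Assumed variant relation: case-insensitive distance below a threshold."""
    return edit_distance(a.lower(), b.lower()) < threshold


def fuzzy_join(left: List[Dict], right: List[Dict], key: str,
               threshold: int = 4) -> List[Dict]:
    """Pair every left record with every right record whose key is a variant."""
    joined = []
    for l_rec in left:
        for r_rec in right:
            if is_variant(l_rec[key], r_rec[key], threshold):
                # Prefix field names so columns from both sources survive.
                row = {f"left_{k}": v for k, v in l_rec.items()}
                row.update({f"right_{k}": v for k, v in r_rec.items()})
                joined.append(row)
    return joined


# Hypothetical sample data from two sources.
crm = [
    {"name": "John Smith", "city": "Boston"},
    {"name": "Maria Lopez", "city": "Austin"},
]
billing = [
    {"name": "J. Smith", "balance": 120},
    {"name": "Maria Lopes", "balance": 75},
]

matches = fuzzy_join(crm, billing, key="name")
for row in matches:
    print(row["left_name"], "<->", row["right_name"])
```

The nested loop compares every pair of records, which is fine for a demonstration but grows quickly with dataset size. A production system would likely narrow the candidate pairs before scoring them.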

## What it does NOT cover

- Does not cover simple database joins that rely on exact matches (e.g., matching primary keys that are identical).
- Does not cover manual data entry or manual reconciliation performed by people.
- Does not cover matching methods that are strictly limited to equivalence relations (relations that are reflexive, symmetric, and transitive, such as strict equality).
- Does not cover the storage hardware itself, only the logical method of processing and joining the data.

## The clever bit

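The system explicitly allows for 'non-equivalence relations'. Such a relation need not be transitive: A can be a variant of B, and B a variant of C, without A being a direct variant of C. This lets the system chain matches through intermediate data elements to find connections that aren't directly obvious, effectively building a bridge between disparate data points.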
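A small Python sketch can show why a threshold-based relation is not an equivalence relation and how chaining through an intermediate element works. Everything here is assumed for illustration: the integer amounts in cents, the threshold of 5, and the names `is_variant` and `chained_variants` are hypothetical, and breadth-first search is just one plausible way to follow the chain.

```python
from collections import deque
from typing import Callable, List


def is_variant(a: int, b: int, threshold: int = 5) -> bool:
    """Assumed variant relation: absolute difference below a threshold."""
    return abs(a - b) < threshold


def chained_variants(start: int, candidates: List[int],
                     related: Callable[[int, int], bool]) -> List[int]:
    """Collect values reachable from start via one or more variant links."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in candidates:
            if other not in seen and related(current, other):
                seen.add(other)
                queue.append(other)
    seen.discard(start)  # report only the other values
    return sorted(seen)


# Hypothetical invoice amounts in cents.
amounts = [10000, 10004, 10008, 10500]

# The relation is not transitive: the first two checks pass, the third fails.
print(is_variant(10000, 10004))  # True
print(is_variant(10004, 10008))  # True
print(is_variant(10000, 10008))  # False

direct = [x for x in amounts if x != 10000 and is_variant(10000, x)]
chained = chained_variants(10000, amounts, is_variant)
print(direct)   # [10004]
print(chained)  # [10004, 10008]
```

Here 10008 is linked to 10000 only through the intermediate value 10004, while 10500 stays unlinked because no chain of close values reaches it.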

## Real-world examples

1. Enterprise data integration platforms
2. Customer data platforms (CDP) for deduplication
3. Automated financial record reconciliation
4. Master data management systems

## Why it matters

In large-scale data processing, data is rarely clean or perfectly formatted across different departments or companies. This patent provides a formal framework for 'fuzzy' data integration, which is essential for business intelligence, customer relationship management, and regulatory compliance where records must be consolidated despite inconsistent naming or formatting.

## Frequently asked questions

### What does patent US 9607103 cover?

A method for merging datasets by identifying related but non-identical items using flexible matching rules rather than strict equality.

### Who owns patent US 9607103?

Ab Initio Technology LLC owns this patent, granted in 2017.

### When does this patent expire?

This patent is expected to expire on March 28, 2037, when the invention enters the public domain.

### What is patent US 9607103 cited by?

This patent has been cited by 15 later patents that build on its ideas.

### What problem does this patent solve?

In large-scale data processing, data is rarely clean or perfectly formatted across different departments or companies. This patent provides a formal framework for 'fuzzy' data integration, which is essential for business intelligence, customer relationship management, and regulatory compliance where records must be consolidated despite inconsistent naming or formatting.

### What does this patent NOT cover?

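The patent does not cover simple database joins that rely on exact matches (e.g., matching primary keys that are identical), manual data entry or reconciliation, matching methods limited to equivalence relations, or the storage hardware itself.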

**Full plain-English explainer:** https://patentbrief.org/patent/us/9607103/amazon-athena

**Original patent:** https://patents.google.com/patent/US9607103

---

_Source: PatentBrief — https://patentbrief.org. Patent facts are from public records; the plain-English explanation is PatentBrief's._


## Related patents

Semantically similar inventions in the PatentBrief corpus:

- [How Computers Find Similar Text Using Compact Data Structures](https://patentbrief.org/patent/us/10878335/bert-bidirectional-encoder-representations) — This patent describes a method for efficiently identifying similar text records, like documents or product reviews, by using special compact data structures that store text terms probabilistically and then analyzing them with machine learning.
- [How Assistant Systems Combine Information About One Thing from Many Places](https://patentbrief.org/patent/us/11704899/resolving-entities-from-multiple-data-sources-for-assistant-systems) — This patent describes a system that gathers all known information about a single person, place, or thing from various sources and combines it into one complete profile for an assistant system.
- [How a Smart System Verifies and Updates Customer Data](https://patentbrief.org/patent/us/8285656/cortana-virtual-assistant) — This patent describes an automated system that uses artificial intelligence to pick the best ways to check and update information about people or businesses, choosing from methods like web searches, phone calls, or direct mail.
- [How AI Connects Different Databases Using Knowledge Graphs](https://patentbrief.org/patent/us/11507851/system-and-method-of-integrating-databases-based-on-knowledge-graph) — This patent describes a server-based method that uses artificial intelligence and two learning models to automatically find and integrate connections between data fields and data values across multiple databases that have different structures.
- [How to Build Complex Database Searches Using Venn Diagrams](https://patentbrief.org/patent/us/5966126/graphic-user-interface-for-database-system) — A method for searching databases by visually connecting Venn diagrams to represent complex logical relationships between different sets of data.