# How Computers Match and Join Messy Data from Different Sources

> A method for merging datasets by identifying related but non-identical items using flexible matching rules rather than strict equality.

- **Patent:** US 9607103
- **Original title:** Fuzzy data operations
- **Owner:** Ab Initio Technology LLC
- **Granted:** 2017
- **Status:** Active
- **Times cited:** 15
- **Field:** software, AI/ML, finance, e-commerce

## What it does

This patent describes a way for computer systems to combine data from two different sources even when the values don't match exactly. Instead of requiring identical values, the system applies a 'variant relation' to decide whether two data elements are close enough to count as a match, for example by checking whether the distance between two values falls below a set threshold. Once these matches are identified, the system evaluates the associated records and joins them to create a new, combined dataset. For example, it could link 'John Smith' in one database with 'J. Smith' in another by recognizing them as variants of the same name under the defined similarity rules.
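To make the idea concrete, here is a minimal Python sketch of a threshold-based join. It is an illustration under stated assumptions, not the patent's implementation: the function names (`edit_distance`, `names_are_variants`, `fuzzy_join`), the choice of Levenshtein edit distance as the distance measure, and the `max_distance` parameter are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def names_are_variants(a: str, b: str, max_distance: int = 2) -> bool:
    """Hypothetical variant relation: case-insensitive distance within a threshold."""
    return edit_distance(a.lower(), b.lower()) <= max_distance


def fuzzy_join(left, right, key, max_distance=2):
    """Pair every left record with every right record whose key is a variant.

    Compares all pairs, which is fine for a sketch but slow on large inputs.
    """
    joined = []
    for l in left:
        for r in right:
            if names_are_variants(l[key], r[key], max_distance):
                joined.append({"left": l, "right": r})
    return joined


if __name__ == "__main__":
    crm = [{"name": "John Smith", "city": "Boston"}]
    billing = [{"name": "Jon Smith", "balance": 120},
               {"name": "Wei Chen", "balance": 10}]
    for row in fuzzy_join(crm, billing, key="name"):
        print(row["left"]["name"], "<->", row["right"]["name"])
```

Under this particular measure, 'John Smith' and 'J. Smith' are three edits apart, so linking them would need a looser threshold or a name-aware rule (for example, one that treats initials as matching full first names). Choosing the measure and the threshold is the main design decision in any such scheme.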

## What it does NOT cover

- Does not cover simple database joins that rely on exact matches (e.g., matching primary keys that are identical).
- Does not cover human-manual data entry or manual reconciliation processes.
- Does not cover matching methods that are strictly limited to equivalence relations (such as exact equality, where A must equal B).
- Does not cover the storage hardware itself, only the logical method of processing and joining the data.

## The clever bit

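The system explicitly allows for 'non-equivalence relations.' A similarity rule of this kind is typically not transitive: A can be a variant of B, and B a variant of C, without A being a variant of C. The system can therefore chain matches through intermediate data elements to find connections that aren't directly obvious, effectively building a bridge between disparate data points.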
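The chaining idea can be sketched in a few lines of Python. This is an illustrative assumption about how such chaining might be realized (grouping values into connected clusters with a union-find structure), not the method claimed in the patent; `is_variant`, `chain_clusters`, and the threshold of 5 are hypothetical.

```python
def is_variant(a: float, b: float, threshold: float = 5.0) -> bool:
    """Hypothetical variant relation: values closer than the threshold match."""
    return abs(a - b) < threshold


def chain_clusters(values, threshold=5.0):
    """Group values that are linked directly or through intermediate variants."""
    parent = list(range(len(values)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if is_variant(values[i], values[j], threshold):
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, v in enumerate(values):
        clusters.setdefault(find(i), []).append(v)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    # 10 and 18 are not direct variants, but 14 bridges them.
    print(is_variant(10, 14), is_variant(14, 18), is_variant(10, 18))
    print(chain_clusters([10, 14, 18, 40]))
```

Here 10 and 18 fail the direct test but end up in the same cluster because 14 is a variant of both. The trade-off is that long chains can link values that are quite far apart, so a practical system would likely limit or review chained matches.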

## Real-world examples

1. Enterprise data integration platforms
2. Customer data platforms (CDP) for deduplication
3. Automated financial record reconciliation
4. Master data management systems

## Why it matters

In large-scale data processing, data is rarely clean or perfectly formatted across different departments or companies. This patent provides a formal framework for 'fuzzy' data integration, which is essential for business intelligence, customer relationship management, and regulatory compliance where records must be consolidated despite inconsistent naming or formatting.

## Frequently asked questions

### What does patent US 9607103 cover?

A method for merging datasets by identifying related but non-identical items using flexible matching rules rather than strict equality.

### Who owns patent US 9607103?

Ab Initio Technology LLC owns this patent, granted in 2017.

### When does this patent expire?

This patent is expected to expire on March 28, 2037, when the invention enters the public domain.

### How often has patent US 9607103 been cited?

This patent has been cited by 15 later patents that build on its ideas.

### What problem does this patent solve?

In large-scale data processing, data is rarely clean or perfectly formatted across different departments or companies. This patent provides a formal framework for 'fuzzy' data integration, which is essential for business intelligence, customer relationship management, and regulatory compliance where records must be consolidated despite inconsistent naming or formatting.

### What does this patent NOT cover?

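It does not cover simple database joins that rely on exact matches (e.g., matching primary keys that are identical), manual data entry or manual reconciliation, matching methods strictly limited to equivalence relations, or the storage hardware itself.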

**Full plain-English explainer:** https://patentbrief.org/patent/us/9607103/amazon-athena

**Original patent:** https://patents.google.com/patent/US9607103

---

_Source: PatentBrief — https://patentbrief.org. Patent facts are from public records; the plain-English explanation is PatentBrief's._
