How Google Distributed Machine Learning Across Many Computers
A 2003 Google patent describing a way to build machine learning models by splitting the work across a large network of computers rather than a single machine.
Original patent title: “Large scale machine learning systems and methods”
Granted to Google LLC in 2007 with 45 claims and 72 forward citations.
Coverage
What does this patent actually cover?
This patent describes a distributed system where a large machine learning model is built by multiple computer nodes working together. Instead of one computer processing all data, one node selects a 'candidate condition'—a potential rule for the model—and asks other nodes to provide statistics about how often that condition occurs in their specific slice of data. These nodes calculate derivatives of log-likelihood or histograms to help determine if the rule is useful. Finally, the system aggregates this information to decide whether to add the rule to the final model, effectively allowing the model to grow in complexity by leveraging the combined power of the entire network.
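The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's actual method: it assumes a logistic model whose rules are sets of features, it simulates the nodes as plain lists in one process, and the names `predict`, `shard_gradient`, `consider_candidate`, and the `threshold` parameter are all hypothetical. The acceptance test (compare the summed derivative with a threshold) is likewise an assumption about how the aggregated statistics might be used.

```python
import math

def predict(model, features):
    """Probability of a positive label under a logistic model of rule weights.

    `model` maps a condition (a frozenset of features) to a weight. A rule
    fires when every feature in its condition is present in the instance.
    """
    score = sum(w for cond, w in model.items() if cond <= features)
    return 1.0 / (1.0 + math.exp(-score))

def shard_gradient(shard, model, candidate):
    """One node's contribution for its own slice of the data.

    Returns the derivative of the log-likelihood with respect to the
    candidate's weight, evaluated at weight 0, summed over local instances
    that satisfy the candidate condition.
    """
    return sum(
        label - predict(model, features)
        for features, label in shard
        if candidate <= features  # instance satisfies the candidate condition
    )

def consider_candidate(shards, model, candidate, threshold=0.5):
    """Coordinator: collect per-shard statistics, sum them, then decide.

    The threshold rule is an assumption made for this sketch.
    """
    total = sum(shard_gradient(shard, model, candidate) for shard in shards)
    accepted = abs(total) > threshold
    if accepted:
        model[candidate] = 0.0  # rule admitted; its weight would be fit later
    return accepted, total

# Two simulated nodes, each holding (features, label) pairs.
shards = [
    [({"a", "b"}, 1), ({"a"}, 1), ({"c"}, 0)],
    [({"a", "c"}, 1), ({"b"}, 0), ({"a", "b"}, 0)],
]
model = {}
print(consider_candidate(shards, model, frozenset({"a"})))
print(consider_candidate(shards, model, frozenset({"b"})))
```

Only one number per shard travels to the coordinator here, which is the point of the design: the raw instances stay on the node that holds them.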
The gap
What does this patent NOT cover?
- Does not cover machine learning models that run entirely on a single processor or single computer node.
- Does not cover specific neural network architectures like Transformers or CNNs, as the claims focus on rule-based model generation.
- Does not cover real-time inference or prediction methods, only the process of generating the model itself.
These exclusions are unique to PatentBrief — derived from the actual claim language, not patent-office boilerplate.
What made this novel
The system uses a 'feature-to-instance index' to quickly identify which data points satisfy a condition, then offloads the heavy mathematical lifting (calculating derivatives) to the nodes that actually hold the data, minimizing the need to move large datasets across the network.
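One plausible reading of a feature-to-instance index is an inverted index: a map from each feature to the ids of the local instances that contain it. The sketch below assumes that reading and treats a condition as a conjunction of features; `build_index` and `matching_instances` are hypothetical names, and the patent's real data structure may differ.

```python
from collections import defaultdict

def build_index(instances):
    """Map each feature to the set of instance ids that contain it."""
    index = defaultdict(set)
    for instance_id, features in enumerate(instances):
        for feature in features:
            index[feature].add(instance_id)
    return index

def matching_instances(index, condition):
    """Return ids of instances that contain every feature in the condition."""
    postings = [index.get(feature, set()) for feature in condition]
    if not postings:
        # Convention for this sketch only: an empty condition matches nothing.
        return set()
    # Start from the smallest posting set so the intersection stays small.
    postings.sort(key=len)
    return set.intersection(*postings)

# A node's local slice of the data: one feature set per instance.
instances = [{"a", "b"}, {"a"}, {"c"}, {"a", "b", "c"}]
index = build_index(instances)
print(matching_instances(index, {"a", "b"}))
```

With the index in place, a node can find the instances that satisfy a candidate condition without scanning its whole shard, and then compute its statistics over just those instances.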
Where you've seen this
Real-world examples
Google Search ranking algorithms
Large-scale ad-click prediction systems
Distributed training clusters in data centers
Why it matters
The bigger picture
This patent represents an early architectural blueprint for the massive-scale computing that defines modern Google. By enabling models to be trained across distributed clusters, it allowed for the processing of datasets far too large for the hardware of the early 2000s, laying the groundwork for the company's dominance in search ranking and ad-targeting algorithms.
Filed: December 15, 2003
Granted: May 22, 2007
Market context
Who's building on this
Companies in this space
Google continues to iterate on these distributed training concepts through its internal infrastructure and public tools like TensorFlow. Major cloud providers like AWS and Microsoft Azure have since built their own proprietary frameworks that utilize similar principles of sharding data and distributing gradient calculations across compute clusters.
Market impact
This patent helped solidify the shift toward 'Big Data' infrastructure, where the ability to scale compute horizontally became more important than the speed of a single processor. It effectively created a competitive moat for companies that could afford to build massive, interconnected data centers, forcing the rest of the industry to adopt distributed computing paradigms to remain relevant in search and advertising.
Patent timeline
Application submitted to the patent office
Application published, typically 18 months after filing
Patent officially issued
PatentBrief Score
Impact Score
High impact
Citation count: 37/40 (highly cited)
Claim breadth: 20/20 (very broad protection)
Recency: 5/20 (granted 10–20 years ago)
Assignee scale: 20/20 (major company or institution)
PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.
Heuristic Value Estimate
What this patent might be worth
$86K – $276K
Midpoint $173K · expired or expiring · industry ×1.6
Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.
The original legal language
Original claims
45 claims as filed with the patent office.
Cite this patent
Harik, G. R., Tong, S., Shazeer, N., Bem, J., & Levenberg, J. L. (2007). Large scale machine learning systems and methods (U.S. Patent No. 7,222,127). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/7222127/google-adsense
Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.
New to patents?
Common Questions
Frequently Asked Questions
Who owns patent US 7222127?
Google LLC owns this patent, granted in 2007.
When does this patent expire?
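U.S. utility patents filed on or after June 8, 1995 generally run 20 years from the filing date, not from the grant date. For an application filed on December 15, 2003, that gives December 15, 2023 before any patent term adjustment, so check the USPTO record for the exact date. After expiry the invention enters the public domain.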
What is patent US 7222127 cited by?
This patent has been cited by 72 later patents that build on its ideas.