How AI Models Use Self-Attention to Understand Language and Sequences
This patent describes a neural network architecture, known as a Transformer, that uses a self-attention mechanism to process sequences of data, allowing it to understand context and relationships between different parts of the input.
Original patent title: “Attention-based sequence transduction neural networks”
What this patent covers
The actual claim
The patent describes a system for converting one sequence of information into another, like translating text. It uses an "encoder neural network" to take an input sequence, such as a sentence, and create a coded representation of each word or piece of information (Claim 1). This encoder has multiple "subnetworks," each containing a "self-attention sub-layer." This sub-layer looks at all parts of the input sequence simultaneously for each particular input position. It does this by creating a "query" from the current input, and "keys" and "values" from all inputs, then using these to figure out how much attention to pay to other parts of the sequence (Claim 1). For example, when processing the word "bank" in a sentence, the self-attention mechanism helps the network understand if "bank" refers to a financial institution or a river bank by looking at other words in the sentence. The coded representations are then passed to a "decoder neural network" to generate the final output sequence.
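The query, key, and value steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the summary does not spell out the math, so the scaled dot-product scoring and the softmax normalization are assumptions taken from the commonly published Transformer formulation, and the names self_attention, w_q, w_k, w_v, and the toy two-dimensional inputs are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(inputs, w_q, w_k, w_v):
    """Single-head self-attention over a list of input vectors (assumed form)."""
    keys = [matvec(w_k, x) for x in inputs]    # one key per input position
    values = [matvec(w_v, x) for x in inputs]  # one value per input position
    scale = math.sqrt(len(keys[0]))            # assumed scaling by sqrt(key size)
    outputs, all_weights = [], []
    for x in inputs:
        query = matvec(w_q, x)                 # query from the current position only
        scores = [dot(query, k) / scale for k in keys]
        weights = softmax(scores)              # how much attention each position gets
        # Output is the attention-weighted sum of all value vectors.
        out = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy embeddings for three tokens; identity projections keep it readable.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(inputs, identity, identity, identity)
```

Each row of weights shows how strongly one position attends to every position, including itself. In a trained model the three projection matrices are learned, which is what would let a word like "bank" attend more to "river" or to "deposit" depending on the sentence.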
What this patent does NOT cover
The boundaries
- Does not cover neural networks that process sequences without using a self-attention mechanism.
- Does not cover attention mechanisms that do not determine queries, keys, and values from the subnetwork inputs.
- Does not cover models that process data without an encoder-decoder structure for sequence transduction.
- Does not cover recurrent neural networks (RNNs) or convolutional neural networks (CNNs) that process sequences without a self-attention sub-layer.
- Does not cover models that process single data points rather than sequences of inputs.
These exclusions are unique to PatentBrief — derived from the actual claim language, not patent-office boilerplate.
What made this novel
The novel aspect is the "self-attention mechanism." Older models processed a sequence one step at a time. Self-attention instead lets the network weigh, for each position, how relevant every other position in the same sequence is. Because all positions can be processed in parallel and any two positions are connected directly, regardless of how far apart they are, models became more efficient and more effective at capturing long-range dependencies.
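The two properties named above, order-independent computation and a direct link between distant positions, can be seen in a toy comparison. This is only a sketch under stated assumptions: the inputs are single hypothetical numbers rather than learned vectors, attention_weights uses a plain product as its score, and recurrent_influence models a simplified recurrence (h_t = decay * h_(t-1) + x_t), not any specific recurrent network.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(position, xs):
    """Weights one position assigns to every position (toy scalar version)."""
    query = xs[position]
    return softmax([query * key for key in xs])

def recurrent_influence(distance, decay=0.5):
    """In the toy recurrence h_t = decay * h_(t-1) + x_t, an input's
    contribution shrinks by `decay` for every step it has to travel."""
    return decay ** distance

# Hypothetical sequence: the first and last items are related, the rest are filler.
xs = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
last = len(xs) - 1

# Each position's weights depend only on the inputs, so any order works.
in_order = [attention_weights(i, xs) for i in range(len(xs))]
reverse_order = [attention_weights(i, xs) for i in reversed(range(len(xs)))][::-1]
order_independent = in_order == reverse_order

# The last position reaches the first one directly, in a single step.
weight_on_first = in_order[last][0]
weight_on_self = in_order[last][last]

# In the toy recurrence the first input must pass through every step in between.
influence_via_recurrence = recurrent_influence(last)
```

In this toy setup the last position gives the distant first position exactly the same weight it gives itself, while the recurrent contribution of the first input has been halved seven times by the time it arrives.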
Where you've seen this
Real-world examples
- Google Translate
- ChatGPT and other large language models
- Google Search ranking algorithms
- AI assistants like Google Assistant and Amazon Alexa
- Code generation tools
- Text summarization software
Why it matters
The bigger picture
This patent covers the Transformer architecture, which changed how artificial intelligence processes sequential data, especially in natural language processing. It enabled significant advances in machine translation, text generation, and the understanding of complex language structures. The underlying principles are now fundamental to many of the most powerful AI models in use today.
Filed: June 28, 2018
Granted: October 22, 2019
Market context
Who's building on this
Companies in this space
Google LLC, the assignee, continues to be a leader in developing and deploying Transformer-based models across its products. Major AI research labs and companies like OpenAI, Meta, Microsoft, and Anthropic are also heavily invested in building upon and advancing this foundational technology, particularly for large language models and other generative AI applications.
Market impact
This patent's underlying technology, the Transformer architecture, profoundly impacted the AI market. It enabled the creation of large language models, leading to a new era of AI capabilities in natural language understanding and generation. This shift spurred massive investment in AI research and development, created new product categories like AI assistants and generative AI tools, and became a core component of many modern AI systems across various industries.
Patent Journey
From filing to today
Patent Filed: 2018
Patent Granted: 2019 (1 yr after filing)
Active Today: 2026
Expires: 2038
PatentBrief Score
Impact Score: High impact
Citation count: 33/40 (moderately cited)
Claim breadth: 20/20 (very broad protection)
Recency: 10/20 (granted 5–10 years ago)
Assignee scale: 20/20 (major technology company)
PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.
Heuristic Value Estimate
What this patent might be worth
$216K – $691K
Midpoint $432K · 12.1 yr remaining · industry baseline
Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.
The original legal language
Original claims
33 claims as filed with the patent office.
Cite this patent
Shazeer, N. M., Gomez, A. N., Kaiser, L. M., Uszkoreit, J. D., Jones, L. O., Parmar, N. J., Polosukhin, I., & Vaswani, A. T. (2019). Attention-based sequence transduction neural networks (U.S. Patent No. 10,452,978). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/10452978/openai-attention-mechanism
Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.