PatentBrief
How AI Models Use Self-Attention to Understand Language and Sequences

This patent describes a neural network architecture, known as a Transformer, that uses a self-attention mechanism to process sequences of data, allowing it to understand context and relationships between different parts of the input.

Granted 2019 · Active · Expires 2038 · Owned by Google LLC · Invented by Noam M. Shazeer, Aidan Nicholas Gomez, Lukasz Mieczyslaw Kaiser + 5 more

Original patent title: “Attention-based sequence transduction neural networks”

Plain-English explanation by Sahi · Last reviewed May 28, 2026
Value · $216K–$691K · Modest

What this patent covers

The actual claim

The patent describes a system for converting one sequence of information into another, like translating text. It uses an "encoder neural network" to take an input sequence, such as a sentence, and create a coded representation of each word or piece of information (Claim 1). This encoder has multiple "subnetworks," each containing a "self-attention sub-layer." This sub-layer looks at all parts of the input sequence simultaneously for each particular input position. It does this by creating a "query" from the current input, and "keys" and "values" from all inputs, then using these to figure out how much attention to pay to other parts of the sequence (Claim 1). For example, when processing the word "bank" in a sentence, the self-attention mechanism helps the network understand if "bank" refers to a financial institution or a river bank by looking at other words in the sentence. The coded representations are then passed to a "decoder neural network" to generate the final output sequence.
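The query, key, and value steps described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the patented implementation: the function names, the toy weight matrices, and the choice of scaled dot-product scoring followed by a softmax (the scoring used in the published Transformer design) are assumptions, since the summary above only says that queries, keys, and values are used to decide how much attention to pay.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(inputs, Wq, Wk, Wv):
    """Single-head self-attention over a list of input vectors.

    Wq, Wk, Wv are hypothetical projection matrices (assumed names).
    Returns one output vector and one row of attention weights per position.
    """
    # Keys and values are derived from every input position.
    keys = [matvec(Wk, x) for x in inputs]
    values = [matvec(Wv, x) for x in inputs]
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for x in inputs:
        # The query is derived from the position currently being processed.
        q = matvec(Wq, x)
        # Assumed scoring rule: dot product of query and key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy example: three 2-dimensional "token" vectors and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one position attends to every position in the sequence, including itself. In a real model the projection matrices are learned, and the loop over positions is replaced by matrix operations so that all positions are handled at once.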

What this patent does NOT cover

The boundaries

  • Does not cover neural networks that process sequences without using a self-attention mechanism.
  • Does not cover attention mechanisms that do not determine queries, keys, and values from the subnetwork inputs.
  • Does not cover models that process data without an encoder-decoder structure for sequence transduction.
  • Does not cover recurrent neural networks (RNNs) or convolutional neural networks (CNNs) that process sequences without a self-attention sub-layer.
  • Does not cover models that process single data points rather than sequences of inputs.

These exclusions are unique to PatentBrief — derived from the actual claim language, not patent-office boilerplate.

What made this novel

The truly novel aspect is the "self-attention mechanism." Instead of processing a sequence one step at a time, as older recurrent models did, self-attention lets the network weigh how relevant every part of the input sequence is to each individual part as it processes that part. Because all positions can be processed in parallel and distant parts of a sequence are connected directly, models became much more efficient and more effective at capturing long-range dependencies.

The Patent Drawing

Representative figure: patent drawing for Attention-based sequence transduction neural networks (US 10452978).

Where you've seen this

Real-world examples

  • Google Translate
  • ChatGPT and other large language models
  • Google Search ranking algorithms
  • AI assistants like Google Assistant and Amazon Alexa
  • Code generation tools
  • Text summarization software

Why it matters

The bigger picture

This patent is based on the Transformer architecture, which revolutionized how artificial intelligence processes sequential data, especially in natural language processing. It allowed for significant advancements in machine translation, text generation, and understanding complex language structures. The underlying principles are now fundamental to many of the most powerful AI models in use today.

  • Filed: June 28, 2018
  • Granted: October 22, 2019

Market context

Who's building on this

Companies in this space

Google LLC, the assignee, continues to be a leader in developing and deploying Transformer-based models across its products. Major AI research labs and companies like OpenAI, Meta, Microsoft, and Anthropic are also heavily invested in building upon and advancing this foundational technology, particularly for large language models and other generative AI applications.

Market impact

This patent's underlying technology, the Transformer architecture, profoundly impacted the AI market. It enabled the creation of large language models, leading to a new era of AI capabilities in natural language understanding and generation. This shift spurred massive investment in AI research and development, created new product categories like AI assistants and generative AI tools, and became a core component of many modern AI systems across various industries.

Patent Journey

From filing to today

  • Patent Filed: 2018
  • Patent Granted: 2019 (1 yr after filing)
  • Active Today: 2026
  • Expires: 2038

PatentBrief Score

Impact Score: 83/100 (High impact)

  • Citation count: 33/40 (Moderately cited)
  • Claim breadth: 20/20 (Very broad protection)
  • Recency: 10/20 (Granted 5–10 years ago)
  • Assignee scale: 20/20 (Major technology company)

PatentBrief Impact Score — based on citation count, claim breadth, recency, and assignee scale. Not a legal assessment.
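The four component scores appear to add up to the headline figure. Here is a quick check, assuming the Impact Score is a plain sum of its components. The page does not state the exact formula, so the additive rule and the variable names below are assumptions.

```python
# Component scores as (points awarded, maximum points), copied from the page.
components = {
    "Citation count": (33, 40),
    "Claim breadth": (20, 20),
    "Recency": (10, 20),
    "Assignee scale": (20, 20),
}

# Assumption: the headline score is the simple sum of the components.
total = sum(points for points, _ in components.values())
maximum = sum(cap for _, cap in components.values())

print(f"{total}/{maximum}")  # prints "83/100"
```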

Heuristic Value Estimate

What this patent might be worth

Modest

$216K–$691K

Midpoint $432K · 12.1 yr remaining · industry baseline

Heuristic only — blends forward/backward citation counts, claim scope, time remaining, litigation history, and CPC-derived industry baseline. Real valuations need a professional appraisal.
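The "12.1 yr remaining" figure is consistent with a term that runs 20 years from the filing date, measured from the review date. The sketch below reproduces that arithmetic. The 20-year-from-filing rule is an assumption made for illustration only. The page gives just the expiry year, and the actual expiration date can differ because of priority claims, term adjustments, or unpaid maintenance fees.

```python
from datetime import date

filed = date(2018, 6, 28)      # filing date shown on the page
reviewed = date(2026, 5, 28)   # "Last reviewed" date shown on the page

# Assumption: expiry is 20 years after the filing date, with no adjustments.
assumed_expiry = date(filed.year + 20, filed.month, filed.day)

years_remaining = (assumed_expiry - reviewed).days / 365.25
print(assumed_expiry.year, round(years_remaining, 1))  # prints "2038 12.1"
```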

The original legal language

Original claims

33 claims as filed with the patent office.

Citations

Patent lineage

  • Cites earlier patents: 35 (earlier patents this invention cites as foundations)
  • Cited by later patents: 44 (later patents that build on this invention)

Cite this patent

Shazeer, N. M., Gomez, A. N., Kaiser, L. M., Uszkoreit, J. D., Jones, L. O., Parmar, N. J., Polosukhin, I., & Vaswani, A. T. (2019). How AI Models Use Self-Attention to Understand Language and Sequences (U.S. Patent No. 10,452,978). U.S. Patent and Trademark Office. https://patentbrief.org/patent/us/10452978/openai-attention-mechanism

Auto-generated from the patent record. Double-check author order and the issue date against the official USPTO document before submitting.

Last reviewed: May 28, 2026 · PatentBrief is not a law firm and this is not legal advice.