How to Fix Faulty Memory Cells in AI Chips
This patent describes a system that tests individual memory cells in AI chips for uneven responses to positive and negative electrical pulses, then permanently disables the faulty ones before training begins, with the aim of making AI training more efficient.
Patent Number
US 10956815
Status
Active
Filing Date
May 31, 2017
Grant Date
March 23, 2021
Expiration
May 31, 2037
Claims
23
Assignee
International Business Machines
Inventors
Tayfun Gokmen
Citations
3 forward · 50 backward
What it covers
The patent describes a system for improving neural network training on specialized hardware called Resistive Processing Unit (RPU) arrays. An RPU array has many tiny memory cells (RPUs) arranged in a grid, where each RPU's electrical state (its "conduction state") stores one "weight" of the neural network. Before training begins, a controller measures each RPU's "asymmetry value" by sending a positive electrical pulse and a negative electrical pulse and comparing how much the RPU's conduction state changes for each (Claim 1). If an RPU's asymmetry is too high, meaning its conduction state changes by noticeably different amounts depending on the pulse polarity, the system "burns" it by applying a high voltage that permanently disables it (Claim 2, Claim 6). As a result, only RPUs that respond evenly to both polarities are used for training.
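The measure-then-burn flow described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the patent's implementation: the `RPU` class, the fixed `step_up`/`step_down` values, the relative-mismatch formula in `asymmetry_value`, the `threshold` of 0.2, and modeling a burned cell as one with zero conductance that ignores pulses are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class RPU:
    """Toy model of one cell; the step sizes are hypothetical, not from the patent."""
    conductance: float
    step_up: float      # conduction change caused by one positive pulse
    step_down: float    # magnitude of the change caused by one negative pulse
    burned: bool = False

    def pulse(self, polarity: int) -> float:
        """Apply one pulse (+1 or -1) and return the change in conduction state."""
        if self.burned:
            return 0.0  # assumption: a burned cell no longer responds
        delta = self.step_up if polarity > 0 else -self.step_down
        self.conductance += delta
        return delta


def asymmetry_value(cell: RPU) -> float:
    """Assumed metric: relative mismatch between the up and down step magnitudes."""
    up = abs(cell.pulse(+1))
    down = abs(cell.pulse(-1))
    return abs(up - down) / max(up + down, 1e-12)


def burn(cell: RPU) -> None:
    """Stand-in for the high-voltage burn that permanently disables a cell."""
    cell.burned = True
    cell.conductance = 0.0  # assumption: the disabled cell contributes no weight


def screen_array(array, threshold):
    """Before training, measure every cell and burn those above the threshold."""
    burned = []
    for r, row in enumerate(array):
        for c, cell in enumerate(row):
            if asymmetry_value(cell) > threshold:
                burn(cell)
                burned.append((r, c))
    return burned


# A 2x2 array in which two cells respond very differently to each polarity.
array = [
    [RPU(0.5, 0.010, 0.010), RPU(0.5, 0.010, 0.002)],
    [RPU(0.5, 0.008, 0.009), RPU(0.5, 0.001, 0.010)],
]
burned = screen_array(array, threshold=0.2)
print(burned)  # prints [(0, 1), (1, 1)]
```

In this sketch the screening pass runs once, before any training step, which mirrors the ordering the summary emphasizes. A real controller would read conduction changes from hardware rather than from stored step sizes.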
What it doesn't cover
- Does not cover systems that train neural networks on traditional silicon chips like CPUs or GPUs, as it specifically targets RPU arrays.
- Does not cover methods of improving RPU array performance that don't involve measuring and disabling asymmetric RPUs.
- Does not cover RPU arrays where faulty units are simply ignored or remapped instead of being physically "burned" by a high voltage.
- Does not cover identifying faulty RPUs *during* or *after* the neural network training process.
- Does not cover RPUs that store information without also locally performing data processing operations (Claim 9).
The clever bit
The novelty lies in proactively identifying and permanently disabling individual, unevenly behaving memory cells (RPUs) *before* the main AI training process even starts. This prevents faulty components from corrupting the learning process or requiring complex software workarounds later.
Why it matters
Training large neural networks is computationally intensive and energy-hungry. This patent addresses a fundamental challenge in building specialized AI hardware: the inherent imperfections of analog memory components like RPUs. By identifying and disabling faulty RPUs early, it aims to make these AI accelerators more reliable and efficient, potentially speeding up AI development and reducing power consumption for complex models.
Real-world examples
1. IBM's AI hardware accelerators
2. Neuromorphic computing chips
3. In-memory computing architectures
4. Specialized AI training hardware
Generated by PatentBrief · Not legal advice · patentbrief.org