Foundation Model Patents
The transformer patent, §101/Alice, trade secrets, and why AI labs don't patent their core methods: foundation model IP strategy for AI startup founders.
FAQ
Who patents foundation models, and why do most AI labs NOT patent their core methods?
Foundation model patents are notable mostly for their ABSENCE. The leading AI labs rely largely on trade secrets and speed, rather than patents, to protect core model methods, because §101 (Alice) makes pure machine-learning method claims weak and because patenting requires a disclosure that helps competitors.
WHO HOLDS WHAT: GOOGLE holds the foundational TRANSFORMER patent (US 10,452,978, covering the 'Attention Is All You Need' architecture that underlies virtually every modern LLM) and BERT-related patents, but Google has not broadly asserted the transformer patent, and the architecture is published and universally used. NVIDIA holds substantial hardware/systems and CUDA/inference patents (the GPU/systems layer is far more patentable than the model). OPENAI, ANTHROPIC, and META hold comparatively FEW core-method patents: they publish selectively, keep training data, recipes, and weights as TRADE SECRETS, and compete on capability and speed rather than patent exclusivity (Meta also open-sources Llama). MICROSOFT, IBM, and others hold larger AI patent portfolios skewed toward applications and systems.
WHY LABS DON'T PATENT CORE METHODS: (1) §101/Alice: a claim to 'train/run a neural network to do X' is directed to an abstract idea (math) and is likely invalid; (2) patenting requires public disclosure, handing the recipe to competitors, while a trade secret stays hidden and has no 20-year expiration; (3) the field moves faster than the multi-year patent process; (4) infringement by another lab's secret training process is nearly impossible to detect, so enforcement is impractical. The dominant 'IP strategy' in foundation models is therefore TRADE SECRET + DATA SCALE + SPEED, not patents.
Why is §101 (the Alice problem) the central foundation model patent challenge?
§101 is the central foundation-model patentability problem because neural-network training and inference are mathematics, which §101 treats as an unpatentable abstract idea. The issues that matter are abstract-idea risk, the mathematical-concept and mental-process exclusions, technical-improvement claiming, and hardware-integration framing.
THE §101 PROBLEM: under Alice/Mayo, claims directed to abstract ideas, including MATHEMATICAL CONCEPTS and MENTAL PROCESSES, are not patentable unless they recite 'significantly more', such as a specific technical improvement. A neural network is, at its core, matrix math optimized by gradient descent, so a claim to 'a method of training a model to generate text' or 'predicting the next token using a transformer' is highly vulnerable as an abstract mathematical concept implemented on a generic computer.
SURVIVING §101: a foundation-model claim must recite a SPECIFIC TECHNICAL IMPROVEMENT to the functioning of a computer or to a concrete technical process, e.g. a novel hardware-software architecture that measurably improves training efficiency or memory use, a specific systems technique tied to GPUs/accelerators, or integration into a particular machine producing a technical result (not just a better prediction).
HARDWARE-INTEGRATION FRAMING: claims tied to specific accelerator hardware, memory/dataflow architectures, distributed-training systems, or inference-serving infrastructure are far more §101-durable than abstract model or training claims. This is why NVIDIA's systems IP is strong while abstract model-method claims are weak.
APPLICATION FRAMING: claims to a specific technical application (a concrete control system, a signal-processing improvement) can survive where a general model claim cannot.
The lesson: the model math is abstract and §101-weak; patentable AI IP lives in the SYSTEMS/HARDWARE layer and in specific technical improvements, and the core model is better protected as a trade secret.
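To make the "it's matrix math" point concrete, here is a toy sketch, not any lab's model: next-token prediction reduced to a matrix-vector product followed by a softmax, and training reduced to one gradient-descent update on cross-entropy loss, repeated. The 3-token vocabulary, the 2-dimensional context vector, the learning rate, and all function names are hypothetical choices for illustration; a real transformer adds attention layers and billions of parameters, but its core steps are arithmetic of the same kind.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_next_token(W, x):
    """Toy 'model': logits = W @ x, then softmax over the vocabulary."""
    logits = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return softmax(logits)


def sgd_step(W, x, target, lr=0.1):
    """One gradient-descent step on the cross-entropy loss -log p[target].

    For softmax + cross-entropy, dLoss/dlogit_i = p_i - 1[i == target],
    so dLoss/dW[i][j] = (p_i - 1[i == target]) * x_j.
    """
    p = predict_next_token(W, x)
    return [
        [w_ij - lr * (p_i - (1.0 if i == target else 0.0)) * x_j
         for w_ij, x_j in zip(row, x)]
        for i, (row, p_i) in enumerate(zip(W, p))
    ]


# Hypothetical setup: 3-token vocabulary, 2-dimensional context embedding.
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
x = [1.0, 0.5]
target = 2  # index of the "correct" next token

before = predict_next_token(W, x)[target]
for _ in range(50):
    W = sgd_step(W, x, target)
after = predict_next_token(W, x)[target]
print(f"p(target) before: {before:.3f}, after: {after:.3f}")
```

Every step above is generic arithmetic with no tie to a particular machine, which is why a claim that recites only steps like these is exposed under Alice.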
What foundation model, training, and inference innovations are actually patentable?
The actually-patentable foundation-model domains are hardware-coupled systems innovations, training-infrastructure innovations, inference-serving innovations, and specific technical applications. They are concentrated in the SYSTEMS and HARDWARE layer, not the abstract model.
SYSTEMS / HARDWARE PATENTS: AI accelerator architecture and dataflow (see edge-AI/AI-chip patents); distributed/parallel training systems, including pipeline, tensor, and data parallelism, communication scheduling, and memory optimizations such as activation checkpointing and ZeRO/FSDP-style sharding, which are concrete systems improvements with good §101 footing; and hardware-software co-design, the most defensible AI IP.
INFERENCE-SERVING PATENTS: KV-cache management, paged attention, continuous batching, speculative decoding, quantization hardware/datapaths, and serving infrastructure (concrete technical improvements to compute efficiency, tied to systems).
TRAINING-METHOD PATENTS (NARROWER): RLHF (reinforcement learning from human feedback), constitutional AI, fine-tuning (LoRA/adapters), retrieval-augmented generation (RAG), and data pipelines. These are MORE §101-vulnerable as methods but may be patentable if tied to a specific technical implementation or improvement; most labs keep them as trade secrets and publish selectively.
ARCHITECTURE PATENTS (NARROW): specific architecture variants (mixture-of-experts routing, attention variants, long-context mechanisms) framed as technical improvements; the bare architecture is §101-risky.
APPLICATION PATENTS: a foundation model integrated into a specific technical system that produces a concrete result (a control system, a diagnostic device) can be patentable as the application.
The durable, patentable AI IP is in the SYSTEMS/HARDWARE that train and serve models efficiently; abstract model and training methods are weak and usually kept as trade secrets.
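To illustrate what a "concrete technical improvement to compute efficiency" can look like, here is a toy sketch of KV caching, one of the inference-serving techniques named above. It rests on simplifying assumptions: a single attention head over scalar token features, hypothetical weights W_Q, W_K, W_V, and inputs fed one token at a time as in autoregressive decoding. It counts projection operations with and without a cache and produces outputs from both paths so they can be compared.

```python
import math

# Hypothetical single-head attention weights over scalar token features.
W_Q, W_K, W_V = 0.5, 0.8, 1.2


class Counter:
    """Tallies how many projection (weight-times-input) operations run."""

    def __init__(self):
        self.projections = 0


def project(w, x, counter):
    counter.projections += 1
    return w * x


def attend(q, keys, values):
    """Softmax-weighted average of values, scored by q * k."""
    scores = [q * k for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def decode_without_cache(tokens, counter):
    outputs = []
    for t in range(1, len(tokens) + 1):
        prefix = tokens[:t]
        # Recompute K and V for every token in the prefix at every step.
        keys = [project(W_K, x, counter) for x in prefix]
        values = [project(W_V, x, counter) for x in prefix]
        q = project(W_Q, prefix[-1], counter)
        outputs.append(attend(q, keys, values))
    return outputs


def decode_with_cache(tokens, counter):
    outputs, k_cache, v_cache = [], [], []
    for x in tokens:
        # Compute K and V only for the newest token; reuse cached entries.
        k_cache.append(project(W_K, x, counter))
        v_cache.append(project(W_V, x, counter))
        q = project(W_Q, x, counter)
        outputs.append(attend(q, k_cache, v_cache))
    return outputs


tokens = [0.1, -0.4, 0.7, 0.3, -0.2, 0.9, 0.0, 0.5]
c_naive, c_cached = Counter(), Counter()
naive = decode_without_cache(tokens, c_naive)
cached = decode_with_cache(tokens, c_cached)
print(c_naive.projections, c_cached.projections)  # 80 24
```

Without a cache, step t recomputes keys and values for all t prefix tokens, so total projections grow as n² + 2n; with a cache they grow as 3n, while the outputs are identical. A real serving stack would report measured memory, latency, and throughput on actual hardware rather than an operation count.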
What IP strategy should foundation model and AI startup founders use?
Foundation model startup IP strategy is fundamentally DIFFERENT from most deep-tech: the core assets (the model, the training recipe, and the data) are best protected by TRADE SECRET and speed, not patents, because §101 makes model and training-method claims weak and disclosure aids competitors. Understand that you generally cannot durably patent 'a better LLM' or 'a training method' (Alice/abstract idea); that patentable AI IP lives in the systems/hardware layer (training infrastructure, inference serving, accelerators) and in specific technical applications; that the transformer and most architectures are published prior art (Google's transformer patent is not broadly asserted); and that data, talent, compute, and execution speed are the real moats. Look for whitespace in training/inference systems, hardware-coupled techniques, and specific technical applications, and protect the core model as a trade secret.
FOUNDATION-MODEL STARTUP IP STRATEGY:
TRADE-SECRET THE MODEL/RECIPE/DATA; DON'T PATENT THE CORE: the model weights, training recipe, and data are your moat. Patenting them requires disclosure and faces §101 invalidity, so keep them as trade secrets (no expiration, no disclosure) and compete on capability and speed.
PATENT THE SYSTEMS/HARDWARE LAYER (§101-DURABLE): training infrastructure (distributed training, memory/communication optimization), inference serving (KV cache, speculative decoding, quantization), and accelerator co-design are concrete technical improvements with good §101 footing; this is where defensible AI patents live.
SPECIFIC TECHNICAL APPLICATIONS ARE PATENTABLE: a model integrated into a concrete technical system (a device, a control or diagnostic process) that produces a measurable technical result can be patented as the application, where a general model cannot.
DON'T RELY ON PATENTING 'A BETTER MODEL'; IT WON'T SURVIVE ALICE: 'train/run a neural network to do X' is an abstract idea, so frame any model claim as a specific technical improvement to the computer or system, or skip it.
DATA, COMPUTE, TALENT, AND SPEED ARE THE REAL MOATS: in foundation models, execution beats patents; proprietary data, compute access, and shipping fast matter far more than IP.
DEFENSIVE PUBLISHING AND OPEN SOURCE ARE TOOLS: publishing creates prior art that blocks others from patenting the same technique, and open-sourcing (as with Meta's Llama) is a strategic moat play.
WHEN TO PATENT: SYSTEMS/HARDWARE OR APPLICATIONS WITH A MEASURED TECHNICAL IMPROVEMENT: file for training/inference systems or specific applications that show a measured technical improvement (training throughput or memory, inference latency or cost, a concrete application result); measured systems-level efficiency or a concrete technical application is what's patentable.
KEY FREEDOM-TO-OPERATE (FTO) CHECKLIST: Google transformer/attention patent US 10,452,978 and BERT patents (published, not broadly asserted); NVIDIA accelerator/CUDA/inference-systems patents; training infrastructure (parallelism/memory/communication), §101-durable; inference serving (KV cache, speculative decoding, quantization); RLHF, constitutional AI, RAG, and LoRA (§101-weak as methods; usually kept as trade secrets); mixture-of-experts and attention-variant architectures (§101-risky); Alice/Mayo abstract-idea and mathematical-concept §101 analysis (CENTRAL); trade-secret protection of the model, weights, data, and recipe; defensive publication; open-source strategy.
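As an illustration of the kind of measured number the WHEN TO PATENT item refers to, here is a minimal sketch that applies symmetric per-tensor int8 quantization to stand-in weights and reports the memory reduction alongside the worst-case reconstruction error. Assumptions: the weights are random placeholders rather than a real model, the function names are hypothetical, and this is only one simple scheme (production systems often use per-channel or group-wise scales). A real evaluation would also measure the effect on model accuracy, latency, and cost.

```python
import random
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8: scale = max|w| / 127, q = round(w / scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp to [-127, 127] so the signed-byte range stays symmetric.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    return [v * scale for v in q]


random.seed(0)
# Placeholder fp32 weights standing in for one layer of a model.
weights = array("f", (random.uniform(-1.0, 1.0) for _ in range(4096)))
q, scale = quantize_int8(weights)

fp32_bytes = weights.itemsize * len(weights)
int8_bytes = q.itemsize * len(q)
max_err = max(abs(w - d) for w, d in zip(weights, dequantize(q, scale)))
print(f"fp32: {fp32_bytes} B, int8: {int8_bytes} B, "
      f"ratio {fp32_bytes / int8_bytes:.1f}x")
print(f"max abs reconstruction error: {max_err:.5f} "
      f"(bound scale/2 = {scale / 2:.5f})")
```

The memory ratio follows directly from the element sizes (4-byte floats versus 1-byte integers on typical platforms), while the rounding error is bounded by half the scale; whether that error is acceptable can only be shown by measuring model quality.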