Skip to content
PatentBrief

Technology Patents

Natural Language Processing Patents

Transformer, BERT, and GPT LLM patent landscape; RAG retrieval-augmented generation IP; and IP strategy for NLP and LLM startup founders.

FAQ

Who are the major NLP patent holders, and what innovations do Google, IBM, and Microsoft protect?

Natural language processing patents cover transformer and attention mechanism architectures; LLM pre-training and fine-tuning methods; retrieval-augmented generation RAG and vector database systems; named entity recognition NER and information extraction; machine translation neural MT systems; dialogue management and conversational AI; and text embedding and semantic search — with IP concentrated at large technology companies although some startup-specific domains remain: MAJOR NLP PATENT HOLDERS: GOOGLE/ALPHABET: 10,000+; specific NLP+LLM (specific specific transformer architecture: specific specific multi-head self-attention from specific specific scaled dot-product Attention(Q,K,V) = softmax(QK^T/√dk)V from specific specific Vaswani et al. 2017 «Attention Is All You Need» for specific specific encoder-decoder sequence-to-sequence 512 token input from specific specific 8-head attention h=8 dk=64 dmodel=512 from specific specific positional encoding sin/cos for specific specific MT machine translation WMT 2014 English-German 28.4 BLEU; specific specific BERT bidirectional encoder: specific specific masked language model MLM 15% token mask + specific specific next sentence prediction NSP from specific specific 12 transformer layers Lbase 768 hidden 12 heads 110M parameters for specific specific NLU fine-tuning GLUE benchmark 80.5 score from specific specific WordPiece tokenizer 30,000 vocab); IBM: 5,000+; specific Watson NLP (specific specific DeepQA: specific specific IBM Watson Jeopardy! IBM 2011 from specific specific >100 NLP components from specific specific candidate generation + specific specific scoring + specific specific merging for specific specific open-domain QA at specific specific 99% precision >0.5 confidence from specific specific evidence-based answer scoring; specific specific NER named entity recognition: specific specific bidirectional LSTM + specific specific CRF conditional random field from specific specific BIO tagging scheme for specific specific person+organization+location at specific specific CoNLL-2003 91.2 F1 from specific specific LSTM-CRF joint sequence labeling); MICROSOFT: 5,000+; specific Azure AI+Bing (specific specific RLHF reinforcement learning human feedback: specific specific reward model from specific specific human preference ranking from specific specific comparative annotation for specific specific GPT-4+Bing LLM fine-tuning from specific specific PPO proximal policy optimization gradient update clipping ratio 0.1-0.2 for specific specific aligned helpfulness+harmlessness generation; specific specific Bing neural search: specific specific dense+sparse hybrid retrieval from specific specific BM25 sparse + specific specific dense BERT bi-encoder ANN for specific specific dual-encoder retrieval + specific specific cross-encoder reranking for specific specific >+15% NDCG vs. specific specific BM25-only at specific specific 1M document corpus); META: 3,000+; specific LLaMA+Faiss (specific specific LLaMA open-weight: specific specific 7B→70B dense decoder transformer from specific specific RoPE rotary position embedding + specific specific SwiGLU activation + specific specific RMSNorm from specific specific trained on 2T tokens Common Crawl+ArXiv+Wikipedia for specific specific RLHF Llama-2-Chat helpfulness; specific specific Faiss ANN: specific specific inverted file IVF + specific specific product quantization PQ from specific specific coarse 16384 centroid + specific specific fine PQ8 for specific specific billion-scale dense retrieval at specific specific 1 ms/query); OPENAI/MICROSOFT: 3,000+; specific GPT (specific specific GPT autoregressive decoder: specific specific causal LM next-token prediction from specific specific 12→96 transformer layers for specific specific GPT-1→GPT-4 from specific specific BPE byte-pair encoding tokenizer 50,257 vocab for specific specific zero-shot few-shot in-context learning).

What NLP model architecture, fine-tuning, and inference optimization innovations are patentable?

Novel transformer architecture modifications improving efficiency or capability; fine-tuning methods for task-specific adaptation of pre-trained LLMs; quantization and distillation methods for LLM inference optimization; and retrieval-augmented generation RAG systems combining LLMs with external knowledge represent four NLP innovation domains with active patent filing: TRANSFORMER ARCHITECTURE PATENTS: GOOGLE; META; MICROSOFT; NVIDIA; HUGGING FACE: specific transformer (specific specific mixture of experts MoE: specific specific conditional compute from specific specific gating network G(x) = top-k(softmax(Wx)) for specific specific k=2 active experts out of specific specific N=64-128 for specific specific 8× more parameters than specific specific dense model at specific specific same FLOPs from specific specific Mistral 8×7B 47B total 13B active at specific specific same inference cost as specific specific 13B dense; specific specific rotary position embedding RoPE: specific specific rotation matrix R from specific specific complex number e^(im·θk) for specific specific relative position encoding without specific specific additive positional encoding from specific specific query+key rotation for specific specific linear complexity + specific specific length extrapolation to >4× training context from specific specific LLaMA+Mistral RoPE 4k→32k extension via specific specific YaRN or specific specific LongRoPE; specific specific flash attention: specific specific tiling GPU SRAM kernel from specific specific attention A = softmax(QK^T/√dk)V from specific specific tile-by-tile computation for specific specific O(N) HBM memory vs. specific specific O(N²) naive attention for specific specific 2-4× speedup at specific specific 2k→8k sequence for specific specific A100 GPU 40 GB SRAM vs. specific specific 80 GB HBM); FINE-TUNING AND PEFT PATENTS: MICROSOFT; HUGGING FACE; GOOGLE; STANFORD: specific fine-tuning (specific specific LoRA low-rank adaptation: specific specific weight update ΔW = BA rank decomposition from specific specific r=4-16 rank vs. specific specific d=4096 for specific specific 10,000× fewer parameters vs. specific specific full fine-tuning at specific specific same quality for specific specific task-specific LoRA adapter from specific specific αW0 + (α/r)BA frozen+trained for specific specific multi-task adapter switching; specific specific instruction fine-tuning: specific specific FLAN instruction template from specific specific T5 1.5B from specific specific 62 benchmark 473 tasks for specific specific zero-shot generalization at specific specific FLAN >GPT-3 175B few-shot on specific specific natural language inference from specific specific dataset mixing ratio optimization 512 templates/task); LLM INFERENCE OPTIMIZATION PATENTS: NVIDIA; GOOGLE; MICROSOFT; ANYSCALE: specific LLM inference (specific specific continuous batching: specific specific PagedAttention from specific specific vLLM KV-cache paged memory for specific specific multiple request batching within specific specific single forward pass from specific specific 3-24× throughput improvement vs. specific specific static batching for specific specific 4× GPU utilization improvement at specific specific 70B inference; specific specific quantization INT4: specific specific GPTQ one-shot post-training weight quantization from specific specific Hessian second-order from specific specific sequential layer-wise quantization from specific specific 4-bit weight + specific specific 16-bit activation for specific specific 4× memory reduction at specific specific <1% perplexity increase for specific specific LLaMA 70B 40→10 GB VRAM); RAG PATENTS: META; GOOGLE; MICROSOFT; COHERE; DATABRICKS: specific RAG (specific specific dense passage retrieval DPR: specific specific dual-encoder BERT question encoder + specific specific BERT passage encoder from specific specific dot-product similarity from specific specific 21M Wikipedia passages for specific specific top-k=100 passage retrieval + specific specific fusion-in-decoder FiD reader at specific specific NQ 50.5 EM from specific specific single max-pooling CLS token representation; specific specific hybrid search: specific specific sparse BM25 Lucene + specific specific dense ANN Faiss cosine from specific specific RRF reciprocal rank fusion score for specific specific 1/(k+rank) for specific specific superior tail-query recall vs. specific specific dense-only or specific specific sparse-only at specific specific BEIR benchmark +5-10% NDCG@10).

What are key patents in machine translation, conversational AI, and NLP for structured data?

Neural machine translation NMT system patents from Google, Microsoft, and DeepL; conversational AI dialogue management patents covering task-oriented and open-domain systems; and NLP for structured data including text-to-SQL table question answering and document parsing represent three NLP application patent domains: MACHINE TRANSLATION PATENTS: GOOGLE; MICROSOFT; DEEPL (LINGUEE); AMAZON: specific NMT (specific specific sequence-to-sequence Seq2Seq: specific specific encoder LSTM+attention from specific specific Bahdanau 2015 alignment model for specific specific source hidden state context vector for specific specific WMT 2014 BLEU 41.0 English-German outperforming specific specific phrase-based SMT at specific specific PBMT 20.7 BLEU; specific specific transformer NMT: specific specific Google Translate transformer big from specific specific 6L encoder+decoder Lbig 1024 hidden 16 heads 213M for specific specific WMT 2014 English-German 28.4 BLEU from specific specific label smoothing ε=0.1 for specific specific overtrained softmax from specific specific beam search k=4 for specific specific 100+ languages Google Translate 500M+ daily; specific specific multilingual NMT: specific specific single shared encoder-decoder from specific specific 103 language pairs with specific specific language token prepend «en»→«de» for specific specific zero-shot translation between specific specific unseen language pair at specific specific +1-3 BLEU from specific specific transfer from specific specific high-resource language); CONVERSATIONAL AI PATENTS: GOOGLE DIALOGFLOW; AMAZON LEX; MICROSOFT LUIS; RASA: specific dialogue management (specific specific task-oriented DST: specific specific dialogue state tracking from specific specific BERT encoder + specific specific slot-filling CRF for specific specific entity+intent from specific specific SGD Schema-Guided Dialogue from specific specific 16,142 dialogues 26 services 44% new user turns; specific specific multi-turn open-domain: specific specific BlenderBot from specific specific 9.4B parameter encoder-decoder from specific specific Wizard-of-Wikipedia+ConvAI2 multi-task for specific specific engaging+knowledgeable+empathetic+consistent at specific specific ACUTE-Eval human +15% vs. specific specific GPT-2; specific specific voice assistant NLU: specific specific Alexa BERT intent classification + specific specific NER entity slot-filling from specific specific 10M labeled utterances for specific specific <50 ms end-to-end latency from specific specific dynamic quantization INT8); NLP STRUCTURED DATA PATENTS: GOOGLE; SALESFORCE EINSTEIN; OPENAI; DATABRICKS: specific NLP+structured (specific specific text-to-SQL: specific specific CodeX SQL from specific specific GPT-3 Codex fine-tune on specific specific Spider 1.0 text-to-SQL benchmark 70.1% exact match from specific specific schema-linking entity alignment from specific specific column+table mention for specific specific natural language business intelligence query; specific specific table QA: specific specific TAPAS table-BERT from specific specific row+column segment encoding from specific specific Wikipedia infobox table from specific specific 2D position IDs for specific specific cell selection + specific specific aggregation COUNT/SUM/AVG for specific specific WikiTableQuestions 49.6% vs. specific specific traditional SQL; specific specific document parsing: specific specific LayoutLM multimodal BERT from specific specific 2D bounding box position + specific specific image pixel embedding + specific specific text token from specific specific invoice+receipt+form parsing for specific specific FUNSD key-value extraction 79.3 F1 from specific specific DocVQA 72.9%).

What IP strategy should NLP and LLM startup founders use to build a defensible patent portfolio?

NLP startup IP strategy must recognize that large tech companies hold dense foundational IP in transformer architectures and BERT/GPT LLM training methods; identify the more open application-layer patent opportunities in domain-specific fine-tuning, specialized RAG pipeline architectures, and novel evaluation and deployment methods; carefully navigate the § 101 Alice risk for NLP software patents; and understand the significant role that trade secrets and data assets play in LLM differentiation: NLP/LLM STARTUP IP STRATEGY: UNDERSTAND THE NLP PATENT LANDSCAPE: GOOGLE HOLDS TRANSFORMER AND BERT FOUNDATIONAL IP: Google (10,000+) filed the original transformer architecture patent (Vaswani et al. 2017) and BERT (Devlin et al. 2018) — both foundational model architectures; while the papers are published and methods widely used, Google holds process and system claims; most novel LLM startups are already designing around these specific architectures; MICROSOFT AND META HOLD RLHF AND LLAMA IP: Microsoft (5,000+) has built significant IP in RLHF and GPT integration via OpenAI partnership; Meta (3,000+) holds IP in LLaMA open-weight training, Faiss vector search, and dense passage retrieval; HUGGING FACE AND STANFORD HOLD PEFT/LORA IP: Microsoft LoRA and Stanford FLAN IP cover the dominant efficient fine-tuning methods — important FTO for any LLM fine-tuning service; NLP PATENTS ARE HIGH-RISK FOR § 101: NLP software patents face significant § 101 Alice risk — USPTO rejections of abstract idea without significantly more are common for language model method claims; anchoring to specific hardware (GPU tensor core operations, specific inference server architecture) or specific novel input/output formats significantly improves § 101 survival; WHEN TO PATENT IN NLP: NOVEL DOMAIN-SPECIFIC MODEL ARCHITECTURE WITH MEASURED PERFORMANCE IMPROVEMENT: specific novel architecture modification (specific specific attention variant + specific specific positional encoding + specific specific normalization) with specific measured accuracy metric (specific specific F1/BLEU/EM) at specific specific benchmark + specific specific inference speed tokens/second + specific specific memory footprint GB vs. specific specific BERT-large or specific specific LLaMA-7B baseline on specific specific same hardware and specific specific same task — the key differentiator is domain-specific performance on specific realistic benchmarks, not general NLP claims; NOVEL RAG PIPELINE WITH MEASURED RETRIEVAL AND GENERATION ACCURACY: specific novel RAG architecture (specific specific retriever type + specific specific chunking strategy + specific specific reranking model + specific specific context injection method) with specific measured end-to-end accuracy % on specific specific domain benchmark + specific specific latency ms per query at specific specific corpus size + specific specific hallucination rate % measured by specific specific factual consistency metric vs. specific specific vanilla DPR+FiD or specific specific naive Pinecone RAG baseline; NOVEL NLP INFERENCE OPTIMIZATION WITH MEASURED THROUGHPUT AND QUALITY: specific novel inference method (specific specific batching strategy + specific specific quantization scheme + specific specific caching approach) with specific measured tokens/second at specific specific target latency SLA ms + specific specific quality degradation perplexity delta + specific specific GPU memory MB at specific specific model size vs. specific specific vLLM PagedAttention or specific specific GPTQ INT4 baseline; TRADE SECRETS: training data curation and filtering pipeline; RLHF annotation guidelines and rubrics; evaluation benchmark design; proprietary customer domain fine-tuning datasets; hardware-software co-optimization recipe for specific GPU configuration — all are stronger protections for differentiated LLM capability than software patents; § 101 ANCHORING: anchor NLP patents to specific hardware tensor operations (GPU/TPU compute kernel operations) + specific data pipeline transformations + specific inference latency targets rather than algorithmic method claims alone; KEY FTO CHECKLIST: Google transformer multi-head self-attention Vaswani 2017; BERT 12L 768H MLM+NSP fine-tuning GLUE 80.5; T5 text-to-text unified framework; Microsoft RLHF PPO reward model GPT-4 Bing; Meta LLaMA RoPE SwiGLU RMSNorm 2T tokens; Faiss IVF+PQ billion-scale 1ms/query ANN; LoRA BA rank decomposition r=4-16 adapter; Flashattention tiling SRAM O(N) memory; vLLM PagedAttention continuous batching 3-24× throughput; GPTQ INT4 Hessian second-order sequential layer-wise.

Related Guides

AI Machine Learning PatentsMachine Learning PatentsComputer Vision PatentsStartup IP Strategy