Transformer
The neural network architecture behind virtually all modern AI language models, introduced in 2017 by Google and now the foundation of GPT, Claude, Gemini, and Llama.
The Transformer architecture, introduced in the 2017 paper 'Attention Is All You Need,' is the single most important technical innovation behind the current AI boom. It replaced older sequence-processing architectures (RNNs, LSTMs) with a mechanism called 'attention' that processes all parts of an input sequence in parallel rather than one token at a time.
The key innovation is self-attention: when processing each element, the model dynamically focuses on the most relevant parts of the rest of the input. In the sentence 'The cat sat on the mat because it was tired,' self-attention lets the model link 'it' to 'cat' rather than 'mat' by attending to the relevant context.
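The mechanism behind this is scaled dot-product attention: each token's output is a weighted average of all tokens, with weights derived from pairwise similarity scores. The toy sketch below shows the core computation; for clarity it omits the learned query/key/value projection matrices that a real Transformer layer applies first.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Toy scaled dot-product self-attention over token vectors X of shape (n, d).

    Simplified sketch: real Transformer layers first project X into separate
    query, key, and value matrices with learned weights.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)       # (n, n): similarity of each token to every other
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ X                  # each output mixes information from all tokens

# 3 tokens with 4-dimensional embeddings
X = np.random.default_rng(0).normal(size=(3, 4))
out = self_attention(X)
print(out.shape)  # (3, 4): one contextualized vector per token
```

Because every token attends to every other token in a single matrix multiplication, the whole sequence is processed at once, which is what makes Transformers so amenable to parallel hardware like GPUs.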
Transformers scale exceptionally well: increasing model size, training data, and compute yields predictable improvements in loss, a relationship known as 'scaling laws.' This predictability is what convinced AI labs to invest billions in building ever-larger transformer models.
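Scaling laws are typically fit as power laws: loss falls roughly as (N_c/N)^alpha as parameter count N grows. The sketch below uses constants of the rough magnitude reported in the scaling-law literature (Kaplan et al., 2020); treat them as illustrative placeholders, not fitted values for any particular model family.

```python
# Toy power-law scaling curve: predicted loss vs. parameter count.
# alpha and n_c are illustrative constants, roughly the magnitude
# reported by Kaplan et al. (2020), not fitted values.
def predicted_loss(n_params, alpha=0.076, n_c=8.8e13):
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.3f}")
```

The point is the shape, not the numbers: each 10x increase in parameters shaves off a predictable fraction of the loss, so labs could forecast the payoff of a larger training run before spending the compute.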
Real-World Example
The 'T' in GPT stands for Transformer. Every major language model — GPT-4, Claude, Gemini, Llama, Mistral — is built on the Transformer architecture invented by Google in 2017.
FAQ
What is a Transformer?
The neural network architecture behind virtually all modern AI language models — introduced in 2017 by Google and now the foundation of GPT, Claude, Gemini, and Llama.
How is the Transformer used in practice?
The 'T' in GPT stands for Transformer. Every major language model — GPT-4, Claude, Gemini, Llama, Mistral — is built on the Transformer architecture invented by Google in 2017.
What concepts are related to the Transformer?
Key related concepts include LLM (Large Language Model), Neural Network, and Parameters. Understanding these together gives a more complete picture of how the Transformer fits into the AI landscape.