Transformer
The neural network architecture behind virtually all modern AI language models, introduced in 2017 by Google and now the foundation of GPT, Claude, Gemini, and Llama.
The Transformer architecture, introduced in the 2017 paper 'Attention Is All You Need,' is the single most important technical innovation behind the current AI boom. It replaced older sequence-processing architectures (RNNs, LSTMs) with a mechanism called 'attention' that processes all parts of an input sequence in parallel rather than one token at a time.
The key innovation is self-attention: when processing each element, the model dynamically focuses on the most relevant parts of the rest of the input. In the sentence 'The cat sat on the mat because it was tired,' self-attention lets the model link 'it' to 'cat' rather than 'mat' by attending to the relevant context.
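The mechanism behind this is scaled dot-product attention: each token's output is a weighted average of all tokens, with weights derived from pairwise similarity scores. The toy sketch below shows the core computation; for clarity it omits the learned query/key/value projection matrices that a real Transformer layer applies first.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Toy scaled dot-product self-attention over token vectors X of shape (n, d).

    Simplified sketch: real Transformer layers first project X into separate
    query, key, and value matrices with learned weights.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)       # (n, n): similarity of each token to every other
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ X                  # each output mixes information from all tokens

# 3 tokens with 4-dimensional embeddings
X = np.random.default_rng(0).normal(size=(3, 4))
out = self_attention(X)
print(out.shape)  # (3, 4): one contextualized vector per token
```

Because every token attends to every other token in a single matrix multiplication, the whole sequence is processed at once, which is what makes Transformers so amenable to parallel hardware like GPUs.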
Transformers scale exceptionally well: increasing model size, training data, and compute yields predictable improvements in loss, a relationship known as 'scaling laws.' This predictability is what convinced AI labs to invest billions in building ever-larger transformer models.
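Scaling laws are typically fit as power laws: loss falls roughly as (N_c/N)^alpha as parameter count N grows. The sketch below uses constants of the rough magnitude reported in the scaling-law literature (Kaplan et al., 2020); treat them as illustrative placeholders, not fitted values for any particular model family.

```python
# Toy power-law scaling curve: predicted loss vs. parameter count.
# alpha and n_c are illustrative constants, roughly the magnitude
# reported by Kaplan et al. (2020), not fitted values.
def predicted_loss(n_params, alpha=0.076, n_c=8.8e13):
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.3f}")
```

The point is the shape, not the numbers: each 10x increase in parameters shaves off a predictable fraction of the loss, so labs could forecast the payoff of a larger training run before spending the compute.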
Real-World Example
The 'T' in GPT stands for Transformer. Every major language model — GPT-4, Claude, Gemini, Llama, Mistral — is built on the Transformer architecture invented by Google in 2017.
FAQ
What is a Transformer?
The neural network architecture behind virtually all modern AI language models — introduced in 2017 by Google and now the foundation of GPT, Claude, Gemini, and Llama.
How is the Transformer used in practice?
The 'T' in GPT stands for Transformer. Every major language model — GPT-4, Claude, Gemini, Llama, Mistral — is built on the Transformer architecture invented by Google in 2017.
What concepts are related to the Transformer?
Key related concepts include LLM (Large Language Model), Neural Network, and Parameters. Understanding these together gives a more complete picture of how the Transformer fits into the AI landscape.