Transformers

At a high level, Transformers process all positions of the input in parallel and use attention to model the relationships between every pair of positions. The core idea: each word looks at every other word and decides which ones matter most.
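The idea of every word scoring every other word can be sketched in a few lines of plain Python. This is a minimal illustration of scaled dot-product self-attention, not a full Transformer layer: it assumes the queries, keys, and values are the raw input vectors (a real model first applies learned projection matrices), and the `tokens` embeddings are made-up values chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Assumption: queries, keys, and values are all the raw input vectors.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # Blend all vectors using the attention weights.
        mixed = [sum(w * value[i] for w, value in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional embeddings for three tokens.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the input. No iteration of the outer loop depends on another, which is why this computation can be run for all tokens at once.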

History of Transformers

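Before the Transformer architecture, sequence models were typically built on LSTMs (Long Short-Term Memory networks). An LSTM is more capable than a plain recurrent network, but it processes its input one step at a time, so the computation cannot be parallelized across the sequence.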

2017 - Google researchers published the paper "Attention Is All You Need", which introduced a new model architecture called the Transformer.
2018 - GPT-1
2019 - GPT-2
2020 - GPT-3
2022 - RLHF and ChatGPT
2023 - GPT-4
2024 - GPT-4o
2025 - GPT-5