From Perceptrons to GPT: A Journey Through Neural Networks

The Dawn of Neural Networks (1950s–1960s)

In 1957, Frank Rosenblatt introduced the Perceptron, one of the first artificial neural networks capable of learning from data. It generated huge excitement, with newspapers calling it a “thinking machine.”

But in 1969, Marvin Minsky and Seymour Papert published Perceptrons, exposing its mathematical limits: a single-layer perceptron cannot solve non-linearly-separable problems like XOR, nor recognise global properties of images such as connectedness. This critique contributed to reduced funding and the first AI winter.
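Why XOR defeats a single-layer perceptron is easy to see in code. The sketch below is a minimal NumPy implementation of the classic perceptron learning rule (the variable names and learning rate are illustrative choices, not from Rosenblatt's paper); trained on the XOR truth table, it can never reach full accuracy, because no single line separates the 1s from the 0s:

```python
import numpy as np

# XOR truth table: not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Classic perceptron learning rule on a single layer."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b

w, b = train_perceptron(X, y)
preds = (X @ w + b > 0).astype(int)
print("predictions:", preds, "accuracy:", (preds == y).mean())
```

Swapping in a linearly separable target such as AND (`y = [0, 0, 0, 1]`) lets the same loop converge to perfect accuracy, which is exactly the boundary Minsky and Papert formalised.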

Backpropagation and the Revival (1970s–1980s)

In the Soviet Union, Alexey Ivakhnenko experimented with multilayer networks (the Group Method of Data Handling, GMDH). Around the same time, in 1974, Paul Werbos described the backpropagation algorithm in his PhD thesis, but his work went largely unnoticed at the time.

The breakthrough came in 1986, when David Rumelhart, Geoffrey Hinton, and Ronald Williams demonstrated that backpropagation could efficiently train multilayer networks. Neural networks were back in the spotlight.
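The significance of the 1986 result shows up on the very problem that stumped the single-layer perceptron: with one hidden layer and backpropagation, XOR becomes learnable. The sketch below (plain NumPy; the architecture, learning rate, and squared-error loss are illustrative choices, not the exact setup of the 1986 paper) runs full-batch gradient descent on a tiny sigmoid network:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 sigmoid units, trained with mean squared error.
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

lr = 0.5
for _ in range(10_000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule from the output error to each weight.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print("predictions:", (out > 0.5).astype(int).ravel())
print("final MSE:", float(((out - y) ** 2).mean()))
```

The backward pass is just the chain rule applied layer by layer: the output-layer error `d_out` is propagated through `W2` to get the hidden-layer error `d_h`, and each weight matrix is nudged against its gradient.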

Deep Learning Emerges (1990s–2000s)

In 1998, Yann LeCun and colleagues developed LeNet-5, a convolutional neural network (CNN) capable of reading handwritten digits. It laid the groundwork for modern computer vision.

In 2006, Hinton and colleagues revived the field once again, coining the term deep learning to describe networks with many layers.

The Revolution (2010s)

The real explosion came in 2012 with AlexNet, created by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. By winning the ImageNet competition by a huge margin, AlexNet proved deep learning’s superiority in computer vision.

In 2014, Ian Goodfellow introduced GANs (Generative Adversarial Networks), a new way to generate realistic images.
In 2017, the Transformer architecture (Attention Is All You Need) changed the landscape, laying the foundation for today’s large language models.
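At the heart of the Transformer is one equation, Attention(Q, K, V) = softmax(QKᵀ / √d_k)·V: every position computes a weighted average of all value vectors, with weights given by query–key similarity. A minimal NumPy sketch of this scaled dot-product attention (the shapes and names here are illustrative, not from any particular implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core Transformer operation: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of each query to each key
    # Row-wise softmax (shifted by the max for numerical stability).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 query positions, d_k = 4
K = rng.normal(size=(5, 4))   # 5 key positions
V = rng.normal(size=(5, 4))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.sum(axis=-1))  # each attention row sums to 1
```

The 1/√d_k scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients, the motivation given in the paper itself.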

The Age of LLMs (2018–2025)

OpenAI’s GPT series (2018–2020) scaled Transformers to new heights. GPT-2 surprised the world with fluent text generation, and GPT-3 (175 billion parameters) demonstrated few-shot learning abilities.

In 2022, ChatGPT made conversational AI accessible to everyone.
In 2023, GPT-4 introduced multimodality, understanding both text and images.
By 2025, large language models are multimodal assistants, combining text, vision, audio, and tools in everyday life.

For Further Reading (Academic Notes)

  • Rosenblatt, F. (1958). The Perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408.
  • Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.
  • Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences (Doctoral dissertation, Harvard University).
  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
  • Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. NeurIPS, 25, 1097–1105.
  • Goodfellow, I., et al. (2014). Generative adversarial nets. NeurIPS, 27, 2672–2680.
  • Vaswani, A., et al. (2017). Attention is all you need. NeurIPS, 30, 5998–6008.
  • Brown, T. B., et al. (2020). Language models are few-shot learners. NeurIPS, 33, 1877–1901.