Deep Nets and ChatGPT
- In the 80s there was a thaw in the AI winter, largely led by
the Parallel Distributed Processing (PDP) books.
- Among other things, these introduced the Multi-Layer Perceptron
(MLP), which learns via the backpropagation of error.
- A perceptron is like a McCulloch-Pitts neuron (1943) (an integrate
and fire neuron), though it's not continuous.
- What was cool about the MLP was the supervised learning algorithm
that learned via the backpropagation of error (see the minimal backprop sketch after this list).
- These used layers of neurons (unlike the brain), which allowed
some cool maths to help.
- Backpropagation has a problem with the vanishing error gradient, so these
networks generally had no more than four layers of neurons (illustrated numerically after the list).
- More recently (~2010), deep nets came along. These used largely
the same techniques (though cool new ones helped too), but learned
one layer at a time, which addressed the vanishing error gradient problem (see the layer-wise pretraining sketch after the list).
- With this, a lot of new data (big data), and a lot of computational
power (GPUs and Moore's law), the new era of deep nets began.
- These can solve much bigger problems.
- ChatGPT is just a really big deep net (reportedly on the order of hundreds of billions of parameters).
- It was probably trained on all of Wikipedia and a lot of other text.
- Attention helps, but, as far as we can tell (it's proprietary),
it's mostly just a really big deep net (a minimal attention sketch is the last example after this list).
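
To make the backpropagation point concrete, here is a minimal sketch of a two-layer MLP trained by backpropagation of error on XOR. Everything here (NumPy, the XOR task, the layer sizes, the learning rate) is an illustrative assumption, not anything taken from the PDP books themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR: a classic task a single perceptron cannot solve, but an MLP can.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 sigmoid units feeding one sigmoid output unit.
W1 = rng.normal(scale=1.0, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=1.0, size=(8, 1))
b2 = np.zeros(1)
lr = 0.5

for epoch in range(10000):
    # Forward pass: with layers of neurons, this is just matrix products.
    h = sigmoid(X @ W1 + b1)       # hidden activations
    out = sigmoid(h @ W2 + b2)     # network output

    # Backward pass: propagate the output error back through each layer.
    err = out - y                          # error at the output
    d_out = err * out * (1 - out)          # through the output sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)     # through the hidden sigmoid

    # Gradient-descent updates of weights and biases.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out, 2))  # typically approaches [[0], [1], [1], [0]]
```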
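
Next, a rough numerical illustration of the vanishing error gradient: each sigmoid layer multiplies the backpropagated error by the sigmoid derivative (never more than 0.25), so with many layers the signal reaching the early layers becomes tiny. The depth, width, and random weights below are arbitrary assumptions chosen just to show the effect.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

depth, width = 10, 8
Ws = [rng.normal(size=(width, width)) for _ in range(depth)]

# Forward pass through a deep stack of sigmoid layers, keeping each activation.
acts = [rng.normal(size=width)]
for W in Ws:
    acts.append(sigmoid(W @ acts[-1]))

# Backward pass: start with a unit error signal at the output and watch it shrink.
grad = np.ones(width)
for layer in reversed(range(depth)):
    a = acts[layer + 1]                          # this layer's activation
    grad = Ws[layer].T @ (grad * a * (1 - a))    # chain rule through one layer
    print(f"error reaching layer {layer + 1:2d}: mean |grad| = {np.abs(grad).mean():.1e}")
```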
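
The "learned one layer at a time" trick can be sketched in the stacked-autoencoder style of greedy layer-wise pretraining (the notes don't name a specific method, so this choice is my own illustration): each layer is trained on its own to reconstruct the layer below it, so no error gradient ever has to survive a trip through the whole deep stack.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_autoencoder_layer(data, n_hidden, lr=0.1, epochs=200):
    """Train one encode/decode pair to reconstruct `data`; return the encoder weights."""
    n_in = data.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))
    W_dec = rng.normal(scale=0.1, size=(n_hidden, n_in))
    for _ in range(epochs):
        h = sigmoid(data @ W_enc)            # encode
        recon = sigmoid(h @ W_dec)           # decode
        err = recon - data                   # reconstruction error
        d_recon = err * recon * (1 - recon)
        d_h = (d_recon @ W_dec.T) * h * (1 - h)
        W_dec -= lr * h.T @ d_recon
        W_enc -= lr * data.T @ d_h
    return W_enc

# Made-up "data": 200 samples with 32 features, values in [0, 1].
X = rng.random((200, 32))

# Greedy pretraining: layer 1 learns to encode X, layer 2 learns to encode
# layer 1's codes, and so on -- each layer trains shallowly, on its own.
weights, representation = [], X
for n_hidden in [16, 8, 4]:
    W = train_autoencoder_layer(representation, n_hidden)
    weights.append(W)
    representation = sigmoid(representation @ W)

print([W.shape for W in weights])  # [(32, 16), (16, 8), (8, 4)]
# In practice the pretrained stack would then be fine-tuned end to end with
# ordinary backprop on the supervised task.
```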
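
Finally, a minimal sketch of the attention mechanism (scaled dot-product attention, as used in transformers; this is an illustration, not OpenAI's code): each position's output is a weighted mix of every position's values, with the weights set by how well its query matches each key.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how well each query matches each key
    weights = softmax(scores, axis=-1)   # attention weights, one row per query
    return weights @ V                   # weighted mix of the value vectors

seq_len, d_model = 5, 16
X = rng.normal(size=(seq_len, d_model))  # made-up token embeddings

# In a real model these projections are learned; random here just to show shapes.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)  # (5, 16): one attention-mixed vector per input position
```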