Deep Nets and ChatGPT
- In the 80s there was a thaw in the AI winter, largely led by
the Parallel Distributed Processing (PDP) books.
- Among other things, these introduced the Multi-Layer Perceptron
(MLP), which learns via the backpropagation of error.
- A perceptron is like a McCulloch-Pitts neuron (1943) (an integrate
and fire neuron), though it's not continuous.
- What was cool about the MLP was the supervised learning algorithm
that learned via the backpropagation of error (see the minimal backprop sketch after this list).
- These used layers of neurons (unlike the brain), which allowed
some cool maths to help.
- Backpropagation has a problem with the vanishing error gradient, so these
networks generally had no more than four layers of neurons (illustrated numerically after the list).
- More recently (~2010), deep nets came along. These used largely
the same techniques (though cool new ones helped too), but learned
one layer at a time, which addressed the vanishing error gradient problem (see the layer-wise pretraining sketch after the list).
- With this, a lot of new data (big data), and a lot of computational
power (GPUs and Moore's law), the new era of deep nets began.
- These can solve much bigger problems.
- ChatGPT is just a really big deep net (reportedly on the order of hundreds of billions of parameters).
- It was probably trained on all of Wikipedia and a lot of other text.
- Attention helps, but, as far as we can tell (it's proprietary),
it's mostly just a really big deep net (a minimal attention sketch is the last example after this list).
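
To make the backpropagation point concrete, here is a minimal sketch of a two-layer MLP trained by backpropagation of error on XOR. Everything here (NumPy, the XOR task, the layer sizes, the learning rate) is an illustrative assumption, not anything taken from the PDP books themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR: a classic task a single perceptron cannot solve, but an MLP can.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 sigmoid units feeding one sigmoid output unit.
W1 = rng.normal(scale=1.0, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=1.0, size=(8, 1))
b2 = np.zeros(1)
lr = 0.5

for epoch in range(10000):
    # Forward pass: with layers of neurons, this is just matrix products.
    h = sigmoid(X @ W1 + b1)       # hidden activations
    out = sigmoid(h @ W2 + b2)     # network output

    # Backward pass: propagate the output error back through each layer.
    err = out - y                          # error at the output
    d_out = err * out * (1 - out)          # through the output sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)     # through the hidden sigmoid

    # Gradient-descent updates of weights and biases.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out, 2))  # typically approaches [[0], [1], [1], [0]]
```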
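
Next, a rough numerical illustration of the vanishing error gradient: each sigmoid layer multiplies the backpropagated error by the sigmoid derivative (never more than 0.25), so with many layers the signal reaching the early layers becomes tiny. The depth, width, and random weights below are arbitrary assumptions chosen just to show the effect.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

depth, width = 10, 8
Ws = [rng.normal(size=(width, width)) for _ in range(depth)]

# Forward pass through a deep stack of sigmoid layers, keeping each activation.
acts = [rng.normal(size=width)]
for W in Ws:
    acts.append(sigmoid(W @ acts[-1]))

# Backward pass: start with a unit error signal at the output and watch it shrink.
grad = np.ones(width)
for layer in reversed(range(depth)):
    a = acts[layer + 1]                          # this layer's activation
    grad = Ws[layer].T @ (grad * a * (1 - a))    # chain rule through one layer
    print(f"error reaching layer {layer + 1:2d}: mean |grad| = {np.abs(grad).mean():.1e}")
```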
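
The "learned one layer at a time" trick can be sketched in the stacked-autoencoder style of greedy layer-wise pretraining (the notes don't name a specific method, so this choice is my own illustration): each layer is trained on its own to reconstruct the layer below it, so no error gradient ever has to survive a trip through the whole deep stack.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_autoencoder_layer(data, n_hidden, lr=0.1, epochs=200):
    """Train one encode/decode pair to reconstruct `data`; return the encoder weights."""
    n_in = data.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))
    W_dec = rng.normal(scale=0.1, size=(n_hidden, n_in))
    for _ in range(epochs):
        h = sigmoid(data @ W_enc)            # encode
        recon = sigmoid(h @ W_dec)           # decode
        err = recon - data                   # reconstruction error
        d_recon = err * recon * (1 - recon)
        d_h = (d_recon @ W_dec.T) * h * (1 - h)
        W_dec -= lr * h.T @ d_recon
        W_enc -= lr * data.T @ d_h
    return W_enc

# Made-up "data": 200 samples with 32 features, values in [0, 1].
X = rng.random((200, 32))

# Greedy pretraining: layer 1 learns to encode X, layer 2 learns to encode
# layer 1's codes, and so on -- each layer trains shallowly, on its own.
weights, representation = [], X
for n_hidden in [16, 8, 4]:
    W = train_autoencoder_layer(representation, n_hidden)
    weights.append(W)
    representation = sigmoid(representation @ W)

print([W.shape for W in weights])  # [(32, 16), (16, 8), (8, 4)]
# In practice the pretrained stack would then be fine-tuned end to end with
# ordinary backprop on the supervised task.
```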
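
Finally, a minimal sketch of the attention mechanism (scaled dot-product attention, as used in transformers; this is an illustration, not OpenAI's code): each position's output is a weighted mix of every position's values, with the weights set by how well its query matches each key.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how well each query matches each key
    weights = softmax(scores, axis=-1)   # attention weights, one row per query
    return weights @ V                   # weighted mix of the value vectors

seq_len, d_model = 5, 16
X = rng.normal(size=(seq_len, d_model))  # made-up token embeddings

# In a real model these projections are learned; random here just to show shapes.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)  # (5, 16): one attention-mixed vector per input position
```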