MLPs with a lot of layers
- One of the big successes of AI in the 1980s (maybe the biggest) was the MLP (multilayer perceptron).
- It's still widely used today in analytics.
- In these nets, people typically use only one hidden layer, and at most two.
- Why? It's the vanishing gradient problem.
- It's trained by the backprop rule (or some variant).
- Backprop can only adjust weights using the error signal propagated backward from the output.
- That signal weakens at every layer it passes through, so connections further from the output (and thus from the error) learn much more slowly (see the sketch below).
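
To make the effect concrete, here is a minimal NumPy sketch (the layer sizes, weight scale, and pretend output error are my own toy assumptions, not from the original): a deep sigmoid MLP does one backward pass, and the printed gradient norms shrink layer by layer as the error signal moves away from the output.

```python
# Minimal sketch of the vanishing gradient effect in a deep sigmoid MLP.
# Layer sizes, weight scale, and the "error" at the output are arbitrary
# toy choices made for illustration.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

depth, width = 8, 32
weights = [rng.normal(0.0, 0.5, (width, width)) for _ in range(depth)]

# Forward pass, keeping each layer's output for the backward pass.
activations = [rng.normal(size=width)]
for W in weights:
    activations.append(sigmoid(W @ activations[-1]))

# Backward pass: propagate a pretend output error through each layer.
err = np.ones(width)                        # error signal arriving at the output
for i in reversed(range(depth)):
    a = activations[i + 1]                  # output of layer i
    local = err * a * (1.0 - a)             # multiply by the sigmoid derivative a(1 - a)
    grad = np.outer(local, activations[i])  # gradient for layer i's weight matrix
    print(f"layer {i}: gradient norm = {np.linalg.norm(grad):.2e}")
    err = weights[i].T @ local              # pass the (now smaller) error down a layer
```

With these toy settings the printed norms typically shrink by several orders of magnitude between the last layer and the first, so the layers nearest the input get almost no signal to learn from.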