Why deep nets are great
- If you know how multi-layer perceptrons learn via backpropagation of error, you are in pretty good shape to understand deep nets.
- The basic problem with MLPs in the 80s was that you couldn't really go beyond two hidden layers.
- There were two reasons really: not enough data, and not enough
computing power.
- There was some thinking, a few years back when deep nets started to become popular, that the problem had been the learning algorithm. In general, though, backprop still works.
- However, there are a lot of other good algorithms.
- It also helps if you train the system a layer at a time.
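For instance, here is a minimal sketch of greedy layer-wise training: each layer is trained on the outputs of the (frozen) layers below it. The data, layer sizes, and the linear-autoencoder-via-SVD trainer are all assumptions for illustration; in practice the per-layer learner would be an RBM, an autoencoder, or similar.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_layer(data, n_hidden):
    # Stand-in per-layer learner: a linear autoencoder, whose optimal
    # weights are the top principal directions of the data (via SVD).
    # In practice this would be an RBM, an autoencoder, etc.
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_hidden].T                      # shape: (n_inputs, n_hidden)

data = rng.random((500, 100))                   # toy data, sizes made up
layer_sizes = [80, 60, 40]

weights, activations = [], data
for n_hidden in layer_sizes:
    W = train_layer(activations, n_hidden)      # train this layer in isolation
    weights.append(W)
    activations = np.tanh(activations @ W)      # freeze it; feed its outputs upward
```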
- What's really good about deep nets is that there are a lot of parameters to be set by a learning mechanism.
- So, if you have 100 neurons in layer 1 and 100 neurons in layer 2, there are 10,000 connections (or parameters) to set.
- Each layer (of the same size) adds 10,000 new parameters.
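As a quick sanity check on that arithmetic, a two-line count of the connection weights in a stack of equal-sized, fully connected layers (biases ignored; the sizes are just the example from above):

```python
# Weights between consecutive fully connected layers of the given sizes.
layer_sizes = [100, 100, 100, 100]
n_params = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
print(n_params)   # three 100x100 weight matrices -> 30000
```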
- You do not, of course, have to use perceptrons: Restricted Boltzmann Machines are popular, but you can use other things.
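For concreteness, a minimal NumPy sketch of a binary RBM updated with one step of contrastive divergence (CD-1); the layer sizes, learning rate, and random data are chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 784, 100, 0.01        # assumed sizes and learning rate
W = rng.standard_normal((n_visible, n_hidden)) * 0.01
b_v = np.zeros(n_visible)
b_h = np.zeros(n_hidden)

def cd1_step(v0):
    """One contrastive-divergence (CD-1) update for a binary RBM."""
    global W, b_v, b_h
    ph0 = sigmoid(v0 @ W + b_h)                       # hidden probs given the data
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + b_v)                     # reconstruction of the visibles
    ph1 = sigmoid(pv1 @ W + b_h)                      # hidden probs given the reconstruction
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)    # positive minus negative phase
    b_v += lr * (v0 - pv1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

batch = (rng.random((32, n_visible)) < 0.5).astype(float)  # fake binary batch
cd1_step(batch)
```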
- A common technique is to make a generative model between the input and the first layer, so that the system is more robust. You can use an auto-associator (trained with something like a Hebbian rule) for this.
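A bare-bones sketch of such an auto-associator: a single hidden layer trained by gradient descent to reconstruct its own input under squared error. The sizes, learning rate, and random data are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, lr = 100, 100, 0.01             # assumed sizes
W1 = rng.standard_normal((n_in, n_hidden)) * 0.1
W2 = rng.standard_normal((n_hidden, n_in)) * 0.1

def autoencoder_step(x):
    """One gradient step on squared reconstruction error (tanh hidden layer)."""
    global W1, W2
    h = np.tanh(x @ W1)
    x_hat = h @ W2                               # linear reconstruction of the input
    err = x_hat - x
    dW2 = h.T @ err
    dh = (err @ W2.T) * (1 - h ** 2)             # backprop through the tanh
    dW1 = x.T @ dh
    W1 -= lr * dW1 / len(x)
    W2 -= lr * dW2 / len(x)
    return float((err ** 2).mean())

x = rng.random((32, n_in))                       # toy input batch
for _ in range(100):
    loss = autoencoder_step(x)
```

Once this first layer is trained, its hidden activations can serve as the input for training the next layer, which is the layer-at-a-time recipe above.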
- There's no restriction on the connectivity between levels, the number of neurons in each level, or even the arrangement of the levels themselves (e.g. level 2 can split into levels 3a and 3b, which later combine into level 4).
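As a toy illustration of that freedom, a forward pass (untrained random weights, made-up sizes) where level 2 splits into 3a and 3b and the two branches are concatenated again at level 4:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, n_out):
    # Hypothetical helper: a fully connected layer with random weights,
    # standing in for whatever trained layer you would actually use.
    W = rng.standard_normal((x.shape[-1], n_out)) * 0.1
    return np.tanh(x @ W)

x = rng.random((1, 100))                                 # level 1 activations (assumed size)
h2 = layer(x, 100)                                       # level 2
h3a = layer(h2, 50)                                      # level 2 splits...
h3b = layer(h2, 50)                                      # ...into levels 3a and 3b
h4 = layer(np.concatenate([h3a, h3b], axis=-1), 100)     # recombined at level 4
```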
- If you are doing vision, you might want to have a 2D input field, with
spatially local connectivity.
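A sketch of what spatially local connectivity looks like in code: each hidden unit sees only a k x k patch of a 2D input rather than every pixel. All sizes are assumed, and the weights are untied (sharing one patch of weights across locations would give a convolutional layer).

```python
import numpy as np

H, W, k = 28, 28, 5                       # assumed input height/width and patch size
rng = np.random.default_rng(0)
image = rng.random((H, W))                # fake 2D input field

out_h, out_w = H - k + 1, W - k + 1
weights = rng.standard_normal((out_h, out_w, k, k)) * 0.01   # one weight patch per unit

hidden = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + k, j:j + k]               # local receptive field
        hidden[i, j] = np.tanh(np.sum(weights[i, j] * patch))
```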
- You can use different learning algorithms at different levels or even
different algorithms at different times on a given level.
- It's incredibly flexible, or perhaps incredibly underspecified.
- However, there is a lot of solid (published) mathematical theory
behind it.