Stanford Colloquium on Computer Systems: a talk by Geoffrey Hinton, Apr. 27, 2016:
https://www.youtube.com/watch?v=VIRCybGgHts Some of the highlights of this remarkable lecture:
1) Hinton argues for networks with many orders of magnitude more parameters than data points, combined with strong regularization.
(Cf. the recent paper "Outrageously Large Neural Networks",
https://arxiv.org/abs/1701.06538 ; its largest model has on the order of 100 billion parameters, so only 3 orders of magnitude below the size Hinton suggests for the brain.)
2) He specifically explains how "dropout" works as regularization: one considers a huge ensemble of models sharing weights; at each step of gradient descent, one samples from that ensemble by picking one model; the weight sharing between the models of the ensemble is what regularizes; and the resulting model approximates the geometric mean of all the models in the ensemble, hence the relation to the "product of experts" scheme. (A minimal code sketch follows this point.)
This is the most immediate practically applicable part of the talk.
He makes a strong case that spikes do something similar to this dropout regularization.
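Here is a minimal numpy sketch of that view of dropout (my own illustration; the layer sizes and keep probability are made up): each training step samples one member of the huge ensemble by masking hidden units, all members share the same weight matrix, and at test time the weights are scaled by the keep probability as a cheap approximation to averaging the whole ensemble (for a single softmax output layer this scaled network computes exactly the geometric mean of the members' predictions, which is the "product of experts" connection).

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, p_keep = 20, 100, 0.5
W = rng.normal(scale=0.1, size=(n_in, n_hidden))  # weights shared by all models in the ensemble

def hidden_train(x):
    # One training step = sample one of the 2**n_hidden models in the ensemble
    # by dropping each hidden unit independently with probability 1 - p_keep.
    mask = rng.random(n_hidden) < p_keep
    return np.maximum(x @ W, 0.0) * mask

def hidden_test(x):
    # At test time nothing is dropped; scaling activations by p_keep approximates
    # averaging the predictions of all the models sampled during training.
    return np.maximum(x @ W, 0.0) * p_keep

x = rng.normal(size=n_in)
print(hidden_train(x)[:5])
print(hidden_test(x)[:5])
```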
3) He conjectures that the derivative of the error with respect to a neuron's input is coded by the derivative of that neuron's output signal with respect to time (while the derivative of the value detected by that neuron with respect to time, if it is needed, is coded by some other neuron).
Then, strangely enough, backpropagation becomes implemented via Hebbian learning, and moreover, the spike-timing-dependent plasticity rule emerges naturally from applying a derivative filter to the spike train (a toy sketch is given after this point).
This is the most revolutionary part of the talk.
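Here is a toy numpy illustration of that conjecture (my own construction, not code from the talk): if top-down feedback nudges a neuron's output rate by an amount proportional to dE/d(input), so that the temporal derivative of the output codes the error derivative, then a purely local Hebbian rule, delta_w proportional to (presynaptic activity) times d(postsynaptic activity)/dt, coincides with the backprop update, delta_w proportional to (presynaptic activity) times the error derivative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(10)           # presynaptic activities
delta = float(rng.normal())  # dE/d(input) of the postsynaptic neuron, as backprop would compute it

y0 = 0.7                     # postsynaptic rate before the top-down feedback arrives
eps = 1e-3                   # small time interval / perturbation scale
y1 = y0 + eps * delta        # the conjecture: feedback shifts the rate in proportion to the error derivative

dy_dt = (y1 - y0) / eps      # what a synapse could measure locally over time
hebbian_update = x * dy_dt   # pre * temporal derivative of post
backprop_update = x * delta  # pre * error derivative

print(np.allclose(hebbian_update, backprop_update))  # True
```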
4) The problem of the lack of symmetric reverse connections in the brain is addressed via recent work on "feedback alignment" (where one can use fixed random weights instead of the actual forward weights when computing the derivative by the chain rule). He explains the intuition behind "feedback alignment" and why it works, but he also says that the initial hopes that it would beat backprop, thanks to capturing second-derivative information, do not seem to be justified: classic backprop seems to be slightly better, rather than the opposite. (The original article on "feedback alignment" containing those initial hopes is
https://arxiv.org/abs/1411.0247 ; it's still a nice method, even if we no longer think it's better than backprop.) A minimal sketch of the method is given below.
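Here is a minimal numpy sketch of feedback alignment on a toy regression problem (the sizes, learning rate, and target function are made up): the only change relative to backprop is that the output error is sent backwards through a fixed random matrix B instead of the transpose of the forward weights W2.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 30, 50, 10, 0.05

W1 = rng.normal(scale=0.1, size=(n_in, n_hid))
W2 = rng.normal(scale=0.1, size=(n_hid, n_out))
B = rng.normal(scale=0.1, size=(n_out, n_hid))  # fixed random feedback weights, never trained

# A toy target function to learn.
W_true = rng.normal(size=(n_in, n_out))
X = rng.normal(size=(256, n_in))
Y = np.tanh(X @ W_true)

for step in range(2000):
    H = np.tanh(X @ W1)          # forward pass
    Y_hat = H @ W2
    E = Y_hat - Y                # output error

    dW2 = H.T @ E / len(X)       # exact gradient for the output layer
    dH = (E @ B) * (1.0 - H**2)  # backprop would use E @ W2.T here; feedback alignment uses B
    dW1 = X.T @ dH / len(X)

    W1 -= lr * dW1
    W2 -= lr * dW2

print("final MSE:", float(np.mean(E**2)))
```

During training the forward weights W2 gradually come into alignment with B.T, which is why sending the error through the "wrong" backward weights still reduces the loss.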
***
The talk itself is 65 minutes, followed by questions and answers. I found it useful to watch in increments of a few minutes, one thought at a time, rewinding often.