
Report

Teaser, summary, work performed and final results

Periodic Reporting for period 2 - TheoryDL (Practically Relevant Theory of Deep Learning)

Teaser

\"Since the proposal of this project has been written, the impact of Deep Learning on our everyday life has been greatly increased.Translating text, searching and organizing images based on textual content, chat-bots, self-driving cars, are all examples of technologies which...

Summary

\"Since the proposal of this project has been written, the impact of Deep Learning on our everyday life has been greatly increased.
Translating text, searching and organizing images based on textual content, chat-bots, self-driving cars, are all examples of technologies which heavily rely on Deep Learning.
To the general audience, new technologies tend to look like a magic.
The unique situation in deep learning is that this technology looks like a magic even to data scientists.
The goal of the TheoryDL project is to demystify deep learning, by providing a deep (pun intended) theoretical understanding of this technology, and in particular, understanding its potential but also its limitations.
The significance of this goal is two folded. First, I believe that it is dangerous to rely on technology which we do not understand. Second, a better theoretical understanding should enable to improve existing algorithms. Of particular interest is to be able to come up with faster algorithms, which are not of brute-force nature. Current algorithms contain a lot of brute-force components, and therefore the \"\"power\"\" of using deep learning is focused around few industrial companies that have the data and computing resources. A better theoretical understanding may lead to a democratization of this technology.
\"

Work performed

We have tackled the problem from several angles and the findings are summarized in several publications (see the publications list).
Perhaps the most fruitful direction was a systematic study of the failures of deep learning.
People tend to publicize success stories, but failures are even more interesting: mapping the boundaries of a technology makes it possible to better understand why and when it works. We have identified cases in which gradient-based training of deep learning fails miserably. Interestingly, the failures are due neither to overfitting/underfitting nor to spurious local minima or a plethora of saddle points. They are rather due to more subtle issues, such as insufficient information in the gradients or bad signal-to-noise ratios.
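
To give a concrete (and deliberately simplified) picture of what "insufficient information in the gradients" can mean, the following numpy sketch revisits the classic example of learning a parity over an unknown subset of input bits. It is not the project's experimental code; the network size, the tanh activation and the squared loss are arbitrary choices made only for the illustration. For a fixed network, the only part of the gradient that depends on which parity is the target is a single Fourier coefficient of the gradient, so the gradient's variance across candidate targets collapses as the dimension grows, leaving gradient descent with essentially no usable signal.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def all_pm1_inputs(d):
    """All 2^d inputs with entries in {-1, +1}."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=d)))

def population_gradient(W, v, X, y):
    """Exact gradient of the squared loss w.r.t. the hidden weights W of a
    one-hidden-layer tanh network f(x) = v . tanh(W x), averaged over all of X."""
    H = np.tanh(X @ W.T)                              # (n, k) hidden activations
    f = H @ v                                         # (n,) network outputs
    delta = 2.0 * (f - y)[:, None] * v[None, :] * (1.0 - H ** 2)
    return delta.T @ X / len(X)                       # (k, d)

for d in (6, 9, 12):
    X = all_pm1_inputs(d)                             # the full input space
    k = 8                                             # hidden width (arbitrary)
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(k, d))
    v = rng.normal(scale=1.0 / np.sqrt(k), size=k)
    grads = []
    for _ in range(64):                               # 64 random target parities
        S = rng.choice(d, size=d // 2, replace=False)
        y = np.prod(X[:, S], axis=1)                  # chi_S(x) = prod_{i in S} x_i
        grads.append(population_gradient(W, v, X, y).ravel())
    grads = np.asarray(grads)
    # Only the variance across targets carries information about *which*
    # parity we are trying to learn; it shrinks quickly with the dimension d.
    print(f"d={d:2d}  gradient variance across targets ~ {grads.var(axis=0).mean():.2e}")
```
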
This direction led us to an important observation: weight sharing is crucial for the optimization of deep learning. We proved that without weight sharing, deep learning can essentially learn only low frequencies, and completely fails to learn mid and high frequencies. Weight sharing enables a form of coarse-to-fine training.
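
To make the term concrete, "weight sharing" means that a layer reuses one small set of weights at every location, as a convolution does, instead of giving each output its own independent row of weights. The toy numpy sketch below (an illustration only, not the project's code, with arbitrary sizes) writes a 1-D convolution explicitly as a fully-connected layer whose weight matrix is constrained to repeat a single shared filter.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 16, 3                          # input length and filter size (arbitrary)
x = rng.normal(size=d)
filt = rng.normal(size=k)             # the single shared filter: k parameters

# Build the weight matrix of the equivalent fully-connected layer.
# Without weight sharing each of the (d - k + 1) rows would be free,
# i.e. (d - k + 1) * k parameters instead of just k shared ones.
W_shared = np.zeros((d - k + 1, d))
for i in range(d - k + 1):
    W_shared[i, i:i + k] = filt       # the same k weights appear in every row

conv_out = W_shared @ x
# The same computation written directly as a sliding dot product.
direct = np.array([filt @ x[i:i + k] for i in range(d - k + 1)])
assert np.allclose(conv_out, direct)
print("convolution as a weight-shared linear map:", np.round(conv_out[:4], 3))
```
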
From there, we were able to define generative hierarchical models for which there exist provably efficient algorithms that also work well in practice.

Final results

\"We believe that we are currently on the verge of a breakthrough. Neurons in a deep network have dual role: as \"\"selectors\"\" and as \"\"linear aggregators\"\". We are currently studying deep learning models in which these two roles are decoupled. We show that the \"\"linear aggregator\"\" part can be trained efficiently while the \"\"selector\"\" part can be defined by some sort of random initialization. This also shed light on mysteries in the generalization performance of deep networks and in their ability to \"\"memorize\"\" training sets. We believe that this direction will enable to demystify many of the empirically observed phenomena in deep learning.\"