Deep learning is in fact a new name for an approach to artificial intelligence called neural networks, which have been around for more than 70 years. Neural networks were first proposed by Warren McCulloch and Walter Pitts, two University of Chicago researchers who moved to MIT in 1952.
Until 1969, neural nets were a major area of research in neuroscience. Then, according to computer science lore, the research was killed off by the MIT mathematicians Marvin Minsky and Seymour Papert, who went on to become co-directors of the new MIT Artificial Intelligence Laboratory.
Most deep learning applications rely on “convolutional” neural networks, in which the nodes of each layer are clustered, the clusters overlap, and each cluster feeds data to multiple nodes of the next layer.
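As a rough illustration of that clustering, the Python sketch below slides a window over a one-dimensional layer so that neighboring clusters overlap, then feeds each cluster to several next-layer nodes. The function names, the window size, and the filter weights are all made up for this example; real convolutional networks typically work on two-dimensional images and learn their weights rather than having them written in by hand.

```python
def overlapping_clusters(values, size, stride=1):
    """Split a layer's node values into clusters; they overlap when stride < size."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, stride)]


def convolve(values, filters, size, stride=1):
    """Feed every cluster to every filter, so one cluster drives several next-layer nodes."""
    clusters = overlapping_clusters(values, size, stride)
    return [
        [sum(v * w for v, w in zip(cluster, weights)) for weights in filters]
        for cluster in clusters
    ]


layer = [1, 2, 3, 4, 5]               # hypothetical node values in one layer
filters = [[1, 0, -1], [1, 1, 1]]     # two hypothetical sets of weights
clusters = overlapping_clusters(layer, size=3)
next_layer = convolve(layer, filters, size=3)
```

Here the clusters `[1, 2, 3]`, `[2, 3, 4]`, and `[3, 4, 5]` share nodes with their neighbors, and each one produces two values in the next layer, one per filter.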
Neural networks enjoyed a resurgence in the 1980s, fell out of favor again in the first decade of the new century, and have since come back into vogue, fueled largely by the increased processing power of graphics chips.
“There’s this idea that ideas in science are a bit like epidemics of viruses,” says Tomaso Poggio, the Eugene McDermott Professor of Brain and Cognitive Sciences at MIT, an investigator at MIT’s McGovern Institute for Brain Research, and director of MIT’s Center for Brains, Minds, and Machines. As Poggio tells it, there are five or six basic strains of flu virus, each of which comes back with a period of around 25 years. People get infected, develop an immune response, and so don’t get infected again for the next 25 years. Then a new generation is ready to be infected by the same strain. In science, he says, people fall in love with an idea, get excited about it, hammer it to death, and then get immunized. So ideas, he suggests, should have the same kind of periodicity.
Neural nets are a means of doing machine learning, in which a computer learns to perform a task by analyzing training examples. Usually, the examples have been labeled in advance. An object recognition system, for instance, might be fed thousands of labeled images of cars, houses, and coffee cups, and it would look for visual patterns in the images that consistently correlate with particular labels.
Modeled loosely on the human brain, a neural net consists of simple processing nodes that are densely interconnected. Most of today’s neural nets are organized into layers of nodes, and they’re “feed-forward,” meaning that data moves through them in only one direction. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and to several nodes in the layer above it, to which it sends data.
To each of its incoming connections, a node assigns a number known as a “weight.” When the network is active, the node receives a different data item (a different number) over each of its connections and multiplies it by the associated weight. It then adds the resulting products together, yielding a single number. If that number is below a threshold value, the node passes no data to the next layer. If the number exceeds the threshold value, the node “fires,” which in today’s neural nets generally means sending the number (the sum of the weighted inputs) along all its outgoing connections.
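The arithmetic a single node performs can be sketched in a few lines of Python. This is a minimal illustration rather than any particular library’s implementation: the function name `node_output` is invented here, and “passing no data” is modeled, as an assumption, by returning 0.0.

```python
def node_output(inputs, weights, threshold):
    """Weighted sum of the inputs, passed on only if it exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    # Assumption: a node that does not fire is represented by 0.0.
    return total if total > threshold else 0.0


# Two incoming connections carrying 1.0 and 2.0, weighted 0.5 and 0.25.
fired = node_output([1.0, 2.0], [0.5, 0.25], threshold=0.5)   # sum is 1.0, so the node fires
silent = node_output([1.0, 2.0], [0.5, 0.25], threshold=2.0)  # sum is 1.0, below the threshold
```

In the first call the weighted sum of 1.0 clears the threshold and is sent onward; in the second it does not, so nothing is passed to the next layer.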
When a neural net is being trained, all of its weights and thresholds are initially set to random values. Training data is fed to the bottom layer, the input layer, and passes through the succeeding layers, getting multiplied and added together, until it finally arrives, radically transformed, at the output layer. During training, the weights and thresholds are continually adjusted until training data with the same labels consistently yields similar outputs.
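To make the flow of data concrete, here is a small Python sketch of a feed-forward pass through a network whose weights and thresholds start out random. The layer sizes, the value range of the random numbers, and the helper names (`make_layer`, `fire`, `feed_forward`) are assumptions chosen for illustration, and the adjustment of weights during training is deliberately left out.

```python
import random


def make_layer(n_inputs, n_nodes, rng):
    """Build one layer: each node gets random weights and a random threshold."""
    return [
        {
            "weights": [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)],
            "threshold": rng.uniform(-1.0, 1.0),
        }
        for _ in range(n_nodes)
    ]


def fire(node, inputs):
    """Weighted sum of the inputs; 0.0 stands in for 'passes no data'."""
    total = sum(x * w for x, w in zip(inputs, node["weights"]))
    return total if total > node["threshold"] else 0.0


def feed_forward(layers, inputs):
    """Send data through the layers in one direction, bottom to top."""
    values = inputs
    for layer in layers:
        values = [fire(node, values) for node in layer]
    return values


rng = random.Random(0)  # fixed seed so the sketch is repeatable
network = [make_layer(3, 4, rng), make_layer(4, 2, rng)]
output = feed_forward(network, [0.2, 0.7, 0.1])
```

With untrained random settings the two output values carry no meaning yet; training would consist of repeatedly adjusting the weights and thresholds until examples with the same label produce similar outputs.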
The neural nets that McCulloch and Pitts described in 1944 had weights and thresholds, but they weren’t arranged in layers, and the researchers didn’t specify any training method. What McCulloch and Pitts showed was that a neural net could, in principle, compute the same functions as a digital computer. The result was more neuroscience than computer science: the point was to suggest that the human brain could be thought of as a computing device.
Neural nets continue to be an important tool for neuroscientific research. Certain network layouts and rules for adjusting weights and thresholds have reproduced observed features of human neuroanatomy and cognition, which suggests that they capture something about how the brain processes information.
The first neural network that could be trained, the Perceptron, was demonstrated by the Cornell University psychologist Frank Rosenblatt in 1957. The Perceptron’s design was much like that of the modern neural net, except that it had only one layer with adjustable weights and thresholds, sandwiched between an input layer and an output layer.
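What “trainable” means for a single adjustable layer can be sketched with the perceptron learning rule as it is usually taught today. This is not a reconstruction of Rosenblatt’s own procedure, which the text above does not describe; the training set (the logical AND of two inputs), the learning rate, and the function names are assumptions made for the example.

```python
def predict(weights, threshold, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


def train_perceptron(examples, epochs=20, rate=1.0):
    """Nudge the weights and threshold after every misclassified example."""
    weights = [0.0] * len(examples[0][0])
    threshold = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            error = label - predict(weights, threshold, inputs)
            # error is 0 for a correct guess, so nothing changes in that case.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            threshold -= rate * error
    return weights, threshold


# Hypothetical labeled training set: the logical AND of two binary inputs.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train_perceptron(and_examples)
predictions = [predict(weights, threshold, inputs) for inputs, _ in and_examples]
```

After training, the node fires only when both inputs are 1. AND works here because a single threshold node can separate its two classes; not every labeling of the inputs has that property.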
Perceptrons were a popular area of psychology research until 1969, when Minsky and Papert published “Perceptrons,” a book demonstrating that Perceptrons would be impractically slow at certain fairly common computations.
Poggio says that all these limitations “sort of disappear” if you use machinery that’s a little more complex, such as two layers. At the time, however, the book had a chilling effect on neural-net research.
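A commonly cited example of such a limitation, though not one named in the text above, is the exclusive-or (XOR) function: no single threshold node can compute it, but two layers can. The Python sketch below uses hand-picked weights and thresholds, which are assumptions for illustration, and nodes that output 1 when they fire rather than passing on the weighted sum.

```python
def step(total, threshold):
    """Binary threshold node: 1 if the sum exceeds the threshold, else 0."""
    return 1 if total > threshold else 0


def xor_two_layer(a, b):
    # First layer: one node acts as OR, the other as AND (weights of 1 on each input).
    h_or = step(a + b, 0.5)
    h_and = step(a + b, 1.5)
    # Second layer: fires when OR is on and AND is off (weights of +1 and -1).
    return step(h_or - h_and, 0.5)


table = [xor_two_layer(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

The second layer combines two simple decisions that neither first-layer node could make alone, which is the sense in which a little extra machinery removes the restriction.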
“You have to put these things in historical context,” Poggio says. “They were arguing for programming, for languages like Lisp. Not many years before, people were still using analog computers. It was not at all clear at the time that programming was the way to go. I think they went a little bit overboard, but it’s not black and white. If you think of this as a competition between analog computing and digital computing, they fought for what at the time was the right thing.”
By the 1980s, however, researchers had developed algorithms for modifying neural nets’ weights and thresholds that were efficient enough for networks with more than one layer, removing many of the limitations identified by Minsky and Papert. The field enjoyed a renaissance.
Intellectually, however, there is something unsatisfying about neural nets. Enough training may revise a network’s settings to the point that it can usefully classify data, but what do those settings mean? Which image features is an object recognizer looking at, and how does it piece them together into the distinctive visual signatures of cars, houses, and coffee cups? Looking at the weights of individual connections won’t answer that question.
In recent years, computer scientists have begun to develop ingenious methods for deducing the analytic strategies that neural nets adopt. In the 1980s, however, the networks’ strategies were hard to decipher. So around the turn of the century, neural networks were supplanted by support vector machines, an alternative approach to machine learning that is based on very clean and elegant mathematics.
The recent resurgence in neural networks, the deep-learning revolution, comes courtesy of the computer-game industry. Graphics processing units (GPUs) are designed to keep up with the complex imagery and rapid pace of video games, and they pack thousands of processing cores onto a single chip. Researchers quickly realized that the architecture of a GPU is remarkably similar to that of a neural net.
Modern GPUs enabled the one-layer networks of the 1960s and the two- to three-layer networks of the 1980s to grow into the 10-, 15-, and even 50-layer networks of today. That is what the “deep” in “deep learning” refers to: the depth of the network’s layers. Currently, deep learning is responsible for the best-performing systems in almost every area of artificial-intelligence research.
The networks’ opacity is still unsettling to theorists, but there is progress on that front too. Poggio and his colleagues at the Center for Brains, Minds, and Machines (CBMM) have recently released a three-part theoretical study of neural networks.
The first part, published in the International Journal of Automation and Computing, addresses the range of computations that deep-learning networks can execute and when deep networks offer advantages over shallower ones. Parts two and three, released as CBMM technical reports, address the problems of global optimization, or guaranteeing that a network has found the settings that best accord with its training data, and overfitting, or cases in which a network becomes so attuned to the specifics of its training data that it fails to generalize to other instances of the same categories.
Plenty of theoretical questions remain to be answered, but the CBMM researchers’ work could help neural networks finally break the generational cycle that has brought them in and out of favor for seven decades.