Aleksandar (Alex) Vakanski

 

Deep Learning

-- posted March 2016 --

This blog is about deep learning, the use of deep neural networks for machine learning, which is behind many of today's most talked-about innovations in artificial intelligence.

If you have never taken a course in this field: the objective in machine learning is to describe the correlations among the variables in a process or system by fitting mathematical functions to data collected from the process, as opposed to the conventional approach to mathematical modeling, which explicitly defines the relationships between the input and output variables. As the need to deal with big data involving hundreds or thousands of variables and parameters becomes more prominent, developing mathematical functions that encode the dependencies among so many variables has become increasingly challenging; consequently, more and more applications rely on some form of machine learning to extract those dependencies directly from collected data. The most common tasks in machine learning are classification, clustering, and prediction. For example, to classify whether an incoming email is spam or non-spam, an algorithm will first analyze many spam and non-spam messages and, based on the occurrence of certain keywords in the messages, learn to categorize the messages and filter out the spam emails.
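The keyword-based spam filtering described above can be sketched in a few lines. This is a toy illustration, not a real filter: the tiny training messages and the simple word-count scoring are made up for the example.

```python
# Toy spam filter: score a message by how often its words appeared
# in known spam versus known non-spam training messages.
from collections import Counter

# Illustrative (made-up) training data.
spam_messages = ["win a free prize now", "free money win big"]
ham_messages = ["meeting agenda for monday", "lunch at noon on monday"]

def word_counts(messages):
    """Count how many times each word occurs across the messages."""
    counts = Counter()
    for msg in messages:
        counts.update(msg.split())
    return counts

spam_counts = word_counts(spam_messages)
ham_counts = word_counts(ham_messages)

def classify(message):
    # Each word contributes evidence: positive leans spam, negative leans ham.
    score = sum(spam_counts[w] - ham_counts[w] for w in message.split())
    return "spam" if score > 0 else "non-spam"

print(classify("free prize money"))        # → spam
print(classify("agenda for the meeting"))  # → non-spam
```

Real filters use the same idea with far more data and probabilistic scoring (e.g., naive Bayes) rather than raw counts.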

The recent breakthroughs in machine learning with deep artificial neural networks have demonstrated enormous potential across a wide range of tasks. In general, neural networks consist of many simple computational units, called neurons, which are interconnected to form a network. Each inter-neuronal connection is assigned a numerical value, called a weight. By iteratively adjusting the connection weights between the neurons based on the available data, a neural network can be trained to perform different tasks. A shallow neural network employs a single layer of artificial neurons to describe the relationship between the input and output variables in a process, whereas a deep neural network employs multiple layers of artificial neurons to achieve the same objective. The depth has proved very useful for representing features in the data at multiple hierarchical levels of abstraction. For instance, the most common application of deep neural networks is image processing for object recognition. In a deep network, one layer of neurons detects low-level features in the image, such as edges or corners; the next layer builds on the previous one to detect higher-level features, such as contours; subsequent layers recognize progressively more complex features; and the last layer recognizes objects in the image at the highest level of abstraction, such as cars, trees, people, etc.
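The idea of training by iteratively adjusting connection weights can be shown with the simplest possible case: a single artificial neuron learning the logical OR function with the classic perceptron update rule. This is a minimal sketch for intuition only; the data, learning rate, and number of passes are illustrative choices, and deep networks use more sophisticated update rules (backpropagation with gradient descent).

```python
# One artificial neuron learning logical OR by adjusting its weights.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w = [0.0, 0.0]   # connection weights, one per input
b = 0.0          # bias term
lr = 0.1         # learning rate (step size for each adjustment)

def predict(x):
    # The neuron "fires" (outputs 1) when its weighted input exceeds 0.
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for _ in range(20):                 # repeated passes over the data
    for x, target in data:
        error = target - predict(x)
        # Nudge each weight in the direction that reduces the error.
        w[0] += lr * error * x[0]
        w[1] += lr * error * x[1]
        b += lr * error

print([predict(x) for x, _ in data])  # → [0, 1, 1, 1]
```

A deep network stacks many such units in layers, and training adjusts all of their weights at once.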

Oftentimes the computational processes inside deep neural networks are compared to the way the human brain works, which, frankly speaking, is a very loose analogy. Our brains do use a network of about 80 billion biological neurons, which, like artificial neurons, are activated or inhibited and in turn affect the activation or inhibition of neighboring neurons. However, we know very little about how biological neurons are interconnected through synapses and how they are used for reasoning or memorization, to say nothing of our understanding (or lack thereof) of the hierarchy of abstraction in the brain's information processing.

Back to deep artificial neural networks: several different types of network architectures are currently in use, varying in how the artificial neurons are connected and the layers structured. An interesting characteristic of these networks is that there isn't a strong theoretical background or mathematical understanding of their operation. Nevertheless, the potential of deep neural nets to learn patterns in big and complex data is indisputable. They have outperformed other machine learning methods in multiple competitions and challenges on image recognition, handwriting recognition, machine translation, image captioning, etc. In some of those contests, the authors even claimed super-human performance, with the algorithm performing better than people on a particular task. Again, that should be taken with a dose of reserve; I would rather call it human-level performance, and only in a very narrow application area. However, deep neural networks do have the potential in the future to reach and even surpass human-level intelligence.

Deep learning is pervasive across a plethora of applications: miscellaneous image processing tasks (e.g., scanning a paycheck at the ATM to read the written amount), the speech recognition in your cell phone, web search engines, self-driving cars, natural language processing and language translation, customized internet ads and news based on a user's profile and browsing history, and many others. Baidu, Facebook, Google, IBM, and Microsoft today employ the top researchers in this field.

The following 5:58-minute video provides a brief, non-technical introduction to the concept of deep learning:

https://www.youtube.com/watch?v=bHvf7Tagt18


Quote:

Michael Nielsen (author of Neural Networks and Deep Learning): “Suppose that a few decades hence neural networks lead to artificial intelligence (AI). Will we understand how such intelligent networks work? Perhaps the networks will be opaque to us, with weights and biases we don't understand, because they've been learned automatically. In the early days of AI research people hoped that the effort to build an AI would also help us understand the principles behind intelligence and, maybe, the functioning of the human brain. But perhaps the outcome will be that we end up understanding neither the brain nor how artificial intelligence works!”

 

Back