Evolving Deep Neural Networks

This post is by UT Austin grad student Jason Liang

Deep learning has revolutionized the field of machine learning in many ways. From achieving state-of-the-art results in many benchmarks and competitions to effectively exploiting the computational power of the cloud, deep learning has received widespread attention not just in academia but also in industry. It has helped researchers and scientists obtain top results in speech recognition, object detection, time-series prediction, reinforcement learning, sequential decision-making, video/image processing, and many other supervised and unsupervised learning tasks. One of the leaders in this field is Sentient Technologies, an AI startup based in San Francisco that specializes in financial trading, e-commerce, and healthcare applications using deep learning, evolutionary computation, and other machine learning and data science approaches. I am currently working as an intern at Sentient, developing ways to make deep learning not only easier to implement but also applicable to more general problem domains. The internship allows me to transfer my dissertation research to industry, and it gives me access to the computational resources that make such work possible.

Deep learning, despite its newfound popularity in the machine learning and artificial intelligence community, is actually an extension of decades-old neural network research; the major difference is that both the size of the datasets and the available computing power have increased exponentially. One problem with deep learning is that architecture design has a large impact on performance, and some problems require specialized architectures. For example, the GoogLeNet architecture (shown below), which won the 2014 ImageNet competition for image classification, contains specialized submodules that are themselves deep networks. As networks become more complex, the number of parameters and configurations that need to be optimized increases as well. At Sentient, my advisor Risto Miikkulainen and I are developing evolutionary algorithms to automatically discover and train the best deep neural network for a particular problem. Our vision is to eventually create a general framework that is applicable to any problem and uses machines to automate AI and machine learning research.
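Our actual system is not described here, but the core idea can be illustrated with a toy sketch: encode an architecture as a genome, mutate a population of genomes, and keep the fittest. Everything in the sketch below (the list-of-layer-widths encoding, the `evaluate` placeholder, the function names) is an illustrative assumption rather than our implementation; in a real run, `evaluate` would train the encoded network and return its validation accuracy.

```python
import random

WIDTHS = (16, 32, 64, 128, 256)

def random_genome(max_layers=4):
    """A genome is just a list of hidden-layer widths, e.g. [64, 128, 32]."""
    return [random.choice(WIDTHS) for _ in range(random.randint(1, max_layers))]

def mutate(genome):
    """Resize, add, or remove one layer."""
    g = list(genome)
    op = random.choice(("resize", "add", "remove"))
    if op == "resize":
        g[random.randrange(len(g))] = random.choice(WIDTHS)
    elif op == "add" and len(g) < 6:
        g.insert(random.randrange(len(g) + 1), random.choice(WIDTHS))
    elif op == "remove" and len(g) > 1:
        g.pop(random.randrange(len(g)))
    return g

def evaluate(genome):
    # Placeholder fitness. In a real system this is the expensive part:
    # build the network described by `genome`, train it, and return
    # validation accuracy. Here we just reward moderate depth and width.
    return -abs(len(genome) - 3) - abs(sum(genome) / len(genome) - 128) / 128

def evolve(pop_size=20, generations=10):
    population = [random_genome() for _ in range(pop_size)]
    for gen in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        print(f"gen {gen}: best genome {ranked[0]}")
        survivors = ranked[: pop_size // 2]  # truncation selection
        children = [mutate(random.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=evaluate)

print("best architecture found:", evolve())
```

Even in this toy form, the structure of the loop makes the cost clear: every generation requires a full training run per genome, which is why scaling the evaluation step dominates everything else.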

[Figure: the GoogLeNet architecture]

One of the downsides of deep learning is that training a neural network is very computationally intensive. Most networks of moderate complexity or above take hours, if not days, to train on machines with powerful GPUs. The compute cost is even worse when evolving deep networks, since an entire population of networks must be trained and evaluated in every generation. Due to these immense computational requirements, evolutionary deep learning has been considered impractical until now. Fortunately, Sentient has developed a massively scalable evolutionary algorithm that runs on millions of CPUs all over the world to evolve stock trading agents. We are currently extending it to utilize GPUs as well, so that the deep networks in a population can be trained in parallel. This framework will eventually scale to hundreds of thousands of GPUs. Since GPUs are expensive and relatively scarce, we are also looking at ways of utilizing CPUs for training deep neural networks. If the training of a single network can be parallelized across many CPU machines, then it becomes truly possible to scale up the evolution of neural networks to millions of machines.
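At toy scale, the same parallelism can be sketched on a single machine with a process pool: since each genome is trained independently, one generation's evaluations are embarrassingly parallel. As before, `train_and_score` and its dummy fitness are placeholders for the expensive training step, not a description of Sentient's distributed system.

```python
import random
import time
from multiprocessing import Pool

def train_and_score(genome):
    # Stand-in for the expensive step: training the network encoded by
    # `genome` on a worker and returning its validation fitness.
    time.sleep(0.1)                         # simulate training cost
    return sum(genome) / (1 + len(genome))  # dummy fitness

if __name__ == "__main__":
    population = [[random.choice((16, 32, 64, 128))
                   for _ in range(random.randint(1, 4))]
                  for _ in range(16)]
    # Each genome trains independently, so the generation's evaluations
    # can be farmed out to worker processes (or, at scale, machines).
    with Pool(processes=8) as pool:
        fitnesses = pool.map(train_and_score, population)
    best_fitness, best_genome = max(zip(fitnesses, population))
    print("best fitness:", best_fitness, "genome:", best_genome)
```

The design choice here is the important part: because fitness evaluations do not depend on one another within a generation, the same map-style pattern extends from a local process pool to thousands of CPU and GPU workers.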

As computing power becomes faster and cheaper, I believe there is going to be a lot of newfound interest in applying evolutionary algorithms to deep networks. This approach should be particularly useful for automatically discovering new architectures for new problem domains, such as understanding cluttered images, video, and natural language, as well as reinforcement learning and sequential decision-making. This process will demand extreme computational resources, making it productive to combine the resources of academia and industry.
