BEACON Researchers at Work: Developing interactive evolutionary computation for machine learning games

This week’s BEACON Researchers at Work post is by University of Texas at Austin graduate student Igor Karpov.

Photo of Igor Karpov

Igor with a pair of traditional Komi fur skis

When thinking about parts of my work that are most relevant to BEACON, several topics come to mind simultaneously. To avoid making the hard choice myself, I will briefly describe all of them, and leave the choice of what is interesting to follow up on to the reader.

The unifying theme for the projects described below is that I use and extend evolutionary computation methods in the context of a popular and complex domain – video and computer games. The domain has several properties that make it an interesting subject of study from the perspective of artificial intelligence. First, the variety of game genres and complexities allows for a gradient of increasingly complex behaviors and adaptation approaches to be developed. Second, game engines strike a good balance between complex environments and behaviors on the one hand and reasonable computational cost and simulation speed on the other. Finally, and perhaps most importantly, the game domain has plenty of human participation. This means both that the domain itself is interesting and challenging enough to hold our attention, and that we can study how our state-of-the-art autonomous agents perform when they interact with human-level intelligence in its various forms.

Bar graph

The relative ability of bots and human players to pass for human players in the Botprize competition.

3D line graph

An example of the data collected from human games in the Botprize domain.

The most complex of the domains that I will talk about is the Botprize competition. The goal of this competition is to develop a software player for a state-of-the-art first-person game (Unreal Tournament 2004 in our case) that is behaviorally indistinguishable from a human player. To be more concrete, we have to design a bot that fools the human players it interacts with into labeling it as human about as often as another human player is able to do so.

To address this challenge, I have worked with a fellow BEACON researcher and UT Austin graduate student Jacob Schrum (who works on multi-objective evolution of neural network controllers for game domains) and our advisor Risto Miikkulainen, to develop UT2, a game bot that participated in the Botprize competition several times, and placed 2nd in 2010. The overall system is complex and includes a scripted behavior architecture, a module used in combat and evolved by multi-objective constructive evolution of artificial neural networks, and a module that is responsible for human-like movement that is based on playback of human examples (Believable Bot Navigation via Playback of Human Traces). The area is ripe for future work, including imitation learning from human behavior and ways of combining imitation with evolution of autonomous behaviors.
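To give a flavor of what "multi-objective" means for the combat module, here is a minimal sketch of Pareto-based selection, the idea at the core of most multi-objective evolutionary methods. It is not the UT2 implementation: the network names and the two objectives (say, damage dealt and damage avoided) are hypothetical, and a real system would go on to breed the next generation from the surviving candidates.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]  # one value per objective, higher is better


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(population: Dict[str, Scores]) -> List[str]:
    """Names of the candidates that no other candidate dominates, sorted alphabetically."""
    return sorted(
        name
        for name, scores in population.items()
        if not any(
            dominates(other, scores)
            for other_name, other in population.items()
            if other_name != name
        )
    )


# Hypothetical scores: (damage dealt, damage avoided) for four evolved networks.
population = {
    "net_a": (30.0, 5.0),
    "net_b": (10.0, 20.0),
    "net_c": (8.0, 4.0),
    "net_d": (30.0, 2.0),
}
front = pareto_front(population)
print(front)
```

Here net_a and net_b each win on a different objective, so neither can be discarded in favor of the other, while net_c and net_d are both beaten by net_a and drop out.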

Diagram

A schematic diagram explaining the human-assisted neuroevolution method. Three types of human input (advice, example traces and task shaping) are combined with an evolving population of artificial neural networks to produce desired solutions faster.

The second project, which I have worked on together with Vinod Valsalam and Risto Miikkulainen, sets out to study exactly how human users can harness machine learning methods such as neuroevolution. In this human subject study, we compare manual design of game behavior and unassisted evolution of neural networks against three different types of human-assisted, interactive neuroevolution: evolution in the presence of task shaping, evolution with the addition of advice, and evolution with learning from examples (see Human-Assisted Neuroevolution through Shaping, Advice and Examples).

3 line graphs

Relative time to solve the three design tasks manually, by evolving solutions automatically, and with human assistance.

Our results indicate that while unassisted neuroevolution is a powerful game behavior design tool and significantly outperforms manual design, it can be greatly improved by the correct application of an interactive human assistance method. Further, the type of human assistance that works best depends on the task, which raises the hope of developing hybrid methods that automatically combine the strengths of human input and machine evolution.

Screen capture of a maze from OpenNERO

Finally, I will touch on a substantial open source software development project I am leading. The software is called OpenNERO: game platform for AI research and education. It is a system that includes several different game-like mods that are unified by an AI framework and support neuroevolution, reinforcement learning, search methods, planning, and potentially many others. While a description of the entire system is beyond the scope of this post, I encourage the reader to check out our website at opennero.googlecode.com, and see some of the educational and research demos we have made available.

Color matrix

A color matrix representation of score differences in the OpenNERO round robin tournament. Rows and columns of the matrix are the red and blue team playing the match respectively. Redder colors mean more decisive victory for the red team and bluer colors mean more decisive victory for the blue team. Teams are ordered according to their average score across all matches played.

One of the most recent uses of the OpenNERO platform was to run the 2011 OpenNERO Tournament. This tournament, which was run as part of Stanford University’s online Introduction to Artificial Intelligence course, invited students to evolve and/or train behaviors for an RTS-like game, where their teams would compete with other submissions for the right to be called “strongest in the field.” We received 156 submissions and ran a round-robin tournament, resulting in a detailed analysis of behavior diversity and other characteristics. The infrastructure developed for parallel evaluation of games and for analysis and visualization of tournament results gives us confidence that this type of competition can be run with a much larger number of participants, and can potentially even be used to drive the process of evolution of novel behaviors itself.
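For readers curious how a score-difference matrix like the one pictured above can be assembled, here is a minimal sketch. It is not the tournament infrastructure itself: the team names, the toy_play function and its fixed strengths are hypothetical stand-ins for actual games, and I assume that every team meets every other team once as red and once as blue.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Tuple

Match = Tuple[str, str]  # (red team, blue team)


def round_robin(teams: List[str],
                play: Callable[[str, str], Tuple[float, float]]) -> Dict[Match, Tuple[float, float]]:
    """Play every ordered (red, blue) pairing and record (red score, blue score)."""
    return {(red, blue): play(red, blue) for red, blue in permutations(teams, 2)}


def rank_teams(teams: List[str], results: Dict[Match, Tuple[float, float]]) -> List[str]:
    """Order teams by average score across all matches played, best first."""
    totals = {t: 0.0 for t in teams}
    counts = {t: 0 for t in teams}
    for (red, blue), (red_score, blue_score) in results.items():
        totals[red] += red_score
        counts[red] += 1
        totals[blue] += blue_score
        counts[blue] += 1
    return sorted(teams, key=lambda t: totals[t] / max(counts[t], 1), reverse=True)


def difference_matrix(order: List[str],
                      results: Dict[Match, Tuple[float, float]]) -> List[List[Optional[float]]]:
    """Rows are red teams, columns are blue teams; each cell is red score minus blue score."""
    return [
        [None if red == blue else results[(red, blue)][0] - results[(red, blue)][1]
         for blue in order]
        for red in order
    ]


def toy_play(red: str, blue: str) -> Tuple[float, float]:
    """Hypothetical stand-in for a real match: each team always scores its fixed strength."""
    strength = {"alpha": 3.0, "beta": 5.0, "gamma": 1.0}
    return strength[red], strength[blue]


teams = ["alpha", "beta", "gamma"]
results = round_robin(teams, toy_play)
order = rank_teams(teams, results)
matrix = difference_matrix(order, results)
for name, row in zip(order, matrix):
    print(name, row)
```

Positive cells correspond to the redder squares in the figure and negative cells to the bluer ones. Because each match is independent of the others, the loop in round_robin is also the natural place to distribute games across machines.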

For more information about Igor’s work, you can contact him at ikarpov at cs dot utexas dot edu.
