BEACON Researchers at Work: Improving Search Quality in Genetic Programming

This week’s BEACON Researcher at Work blog post is by MSU graduate student Armand Burks.

About Me

PortraitAs a young “army brat,” I was always fascinated with learning how things worked. Trying to figure out how my toys worked, I often found myself amidst a pile of plastic pieces, wires, and motors, desperately trying to perform the magic trick of transforming those pieces back into one while my parents remained none the wiser. To this day, I am still not sure if I ever became more skilled at this trick or at making the toys disappear altogether. Either way, I became a better magician, and my interest in how things worked only grew.

It was this same fascination that led me to study Computer Science, as I could not only safely break things to understand them, but I could also create things from scratch. During my undergraduate studies at Alabama State University, I gained exposure to research through the Historically Black Colleges and Universities Undergraduate Program (HBCU-UP). This program ultimately led me to Michigan State University, where I first learned about Evolutionary Computation. From there, it was “love at first sight.” I was able to make things, break things, and let things evolve, all through the power of research!

My Current Research in Genetic Programming

My current research deals with combating premature convergence in Genetic Programming (GP). Before I delve into premature convergence, a brief summary of Genetic Programming is necessary. GP can be thought of as a special type evolutionary algorithm, where one of the main goals is to automatically create computer programs through evolutionary processes. GP evolves a population of computer programs, where programs are typically executed on an optimization problem and compared based on how close they are to solving that problem.

GP genotypeThe major difference between GP and most evolutionary algorithms is way in which the genotype (genetic makeup) of a program is represented in GP. In the most basic form of GP, genotypes are represented by a tree structure because it lends itself well to the creation of programs. As a concrete example, the figure at right shows the genotype of a program that encodes the mathematical expression: x2 – 1. When this tree’s genetic code is executed, the function nodes (* and -) operate on the terminal nodes (the input variable x and the constant value 1) to produce an output value. The evolutionary process breeds programs from the evolving population by combining their genetic material in order to discover new programs.

Premature Convergence in Genetic Programming

Premature convergence is one of the biggest problems in Genetic Programming, and evolutionary algorithms in general. To sum it up, premature convergence can be thought of as evolutionary stagnation. Since GP is often applied to optimization problems, one way to gain insight into the difficulty of such a problem for GP is to consider the enormity of the problem’s search space (the large number of possible computer programs that could be generated in attempt to solve the problem). Premature convergence takes place when the population quickly becomes so similar that the algorithm can no longer effectively explore the search space. At this point, it is often very difficult or even impossible for the population to improve in fitness (breed better programs). There are several reasons why this occurs, but the loss of genetic diversity is one of the most notable.

This is a particularly interesting problem in GP because of the tree-based genotype. Previous research has confirmed that not only do traditional GP populations very quickly lose genetic diversity in general, but a large contiguous portion of most of the trees become completely identical in a top-down fashion. This has a very profound and stifling impact on the effectiveness of finding solutions to difficult problems.

Combating Premature Convergence and Improving Search Quality

Fortunately, a lot of research has gone into combating premature convergence, and a lot of great strides have been made towards improving search quality. One example from the evolutionary computation literature is the Age-Layered Population Structure (ALPS). The main idea of ALPS is to use a special concept of age to divide the population into several layers. This allows new genetic material to be effectively inserted into the population and flow upwards from the bottom, while older genetic material evolves and improves as it makes its way towards the topmost layers. This helps to keep the population from becoming too genetically similar while it explores the search space. ALPS and other techniques have been successfully demonstrated in a number of problem domains.

Although there have been a lot of improvements that address premature convergence with varying degrees of success, I believe that there is still a lot of room to do better, and this is my current research focus. It is known that techniques designed to combat premature convergence often incur the cost of taking longer to find the optimal solution to a problem, although these approaches generally increase robustness (the reliability of finding optimal solutions). This is because these approaches avoid prematurely converging by aggressively exploring new areas of the very large search space instead of focusing in on one particular area. With this in mind, one of my research questions is: can we design robust algorithms that are capable of effectively avoiding premature convergence while still finding the optimal solution faster?

For more information about Armand’s work, you can contact him at burksarm at msu dot edu.

This entry was posted in BEACON Researchers at Work and tagged , , , , . Bookmark the permalink.

Comments are closed.