BEACON | An NSF Center for the Study of Evolution in Action

BEACON Researchers at Work: Searching for Innovation

Posted on August 6, 2012 by Danielle Whittaker

This week’s BEACON Researchers at Work blog post is by University of Texas graduate student Erkin Bahceci.

In this blog post, I will describe my research (with Risto Miikkulainen) on competitive multi-agent search, and in particular how I used evolutionary computation to optimize agent strategies. I have always been interested in how people come up with new ideas and novel solutions to problems, and similarly how companies create new products. Real-world product design is a complicated process subject to various constraints, which is difficult to model in full detail. However, a higher-level view might provide useful insights as well. Such an abstract approach can be applicable to a broader set of problems, that is, not just creation of new products, but any type of innovation search (such as art and design and scientific discovery).

Figure 1: An example 3D feature space for innovation search

One abstract way to model innovation search is to look at it as combining existing features or ideas to come up with a new product or idea. Let’s say we have a feature space in the form of a 3D cube, where each of the three dimensions corresponds to a feature, and each of the eight cube corners represents a potential product, that is, a combination of features. In this space, points can be encoded as bit strings, where 1 means the corresponding feature is included, and 0 means that feature is missing. For instance, the point 110 may represent a product with wireless communication and touchscreen, but without a camera, whereas 101 may represent a wireless device with a camera, but no touchscreen. Also, let’s say each point has a fitness value, denoting the potential success of the product that has those features, as shown in Figure 1. The goal is to find successful products, that is, points with high fitness values.
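As a minimal sketch of this encoding (not code from the research itself), the snippet below decodes a bit string into the features it includes and enumerates every corner of the cube. The feature names come from the example above; the function names `decode` and `all_points` are hypothetical.

```python
# Feature names taken from the example in the text; the ordering is an assumption.
FEATURES = ("wireless", "touchscreen", "camera")


def decode(point: str) -> dict:
    """Map a bit string such as '110' to a feature -> included? mapping."""
    if len(point) != len(FEATURES) or set(point) - {"0", "1"}:
        raise ValueError(f"expected {len(FEATURES)} bits, got {point!r}")
    return {name: bit == "1" for name, bit in zip(FEATURES, point)}


def all_points(n: int) -> list:
    """Enumerate all 2**n corners of an n-dimensional feature cube."""
    return [format(i, f"0{n}b") for i in range(2 ** n)]


if __name__ == "__main__":
    print(decode("110"))   # wireless and touchscreen, no camera
    print(all_points(3))   # the eight corners of the 3D cube
```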

Of course, we are interested in having more than three features in our search space, which can be achieved by replacing the cube with an N-dimensional hypercube, which has 2^N corner points instead of just eight. Also, to assign fitness values to points in a systematic way, we use the NK fitness landscape formalization, which is outside the scope of this post. On this fitness landscape, the goal of a company (or an agent) is to visit points with the highest possible fitness values. The agent can start with an initial point and try improving on it by adding or removing a small number of features, called exploiting, or it can jump to a drastically different part of the feature space by making a larger number of changes, called exploring.
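The difference between exploiting and exploring can be sketched as flipping few versus many bits of the current point. The thresholds below (at most two flips for exploiting, at least half the bits for exploring) are illustrative assumptions, not values from the research, and the function names are hypothetical.

```python
import random


def flip_bits(point: str, k: int, rng: random.Random) -> str:
    """Return a copy of `point` with k distinct, randomly chosen bits flipped."""
    positions = set(rng.sample(range(len(point)), k))
    return "".join(
        ("1" if bit == "0" else "0") if i in positions else bit
        for i, bit in enumerate(point)
    )


def exploit(point: str, rng: random.Random, max_flips: int = 2) -> str:
    """Small change: add or remove a few features (max_flips is an assumption)."""
    return flip_bits(point, rng.randint(1, max_flips), rng)


def explore(point: str, rng: random.Random, min_flips=None) -> str:
    """Large jump: change many features (default of half the bits is an assumption)."""
    n = len(point)
    if min_flips is None:
        min_flips = max(1, n // 2)
    return flip_bits(point, rng.randint(min_flips, n), rng)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))
```

With a 10-bit starting point, `exploit` lands at Hamming distance 1 or 2 from it, while `explore` lands at distance 5 or more.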

Furthermore, innovation search is usually not done in isolation, but in a competitive environment, which means agents influence each other with their discoveries and inventions. Agents sometimes imitate other agents closely, and they sometimes merely get some inspiration from others’ work. This sort of agent interaction can be added to the abstract model above by letting agents exploit and explore around another agent’s shared points, using those points to start their search instead of one of their own private points.

Another aspect of the fitness landscape we use is that it is dynamic. That is, whenever an agent visits a point, the fitness of that point and of nearby points changes, an effect we call flocking. Two types of flocking are used. Initial agent visits to a point cause boosting, where the region rises in fitness, whereas subsequent visits lead to crowding, where the region sinks. These changes make it possible to model the dynamics of fitness landscapes in innovation search, where boosting corresponds to creating new markets (such as tablet computers) and crowding to the saturation of existing markets (such as desktops).
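A rough sketch of boosting and crowding follows. Everything numeric here is an assumption made for illustration: the "region" is taken to be the visited point plus its one-bit-flip neighbors, the change is a fixed `delta`, and the switch from boosting to crowding happens after `boost_visits` visits to the same point. The class and parameter names are hypothetical.

```python
from collections import defaultdict


def neighbors(point: str) -> list:
    """All points one bit flip away (N neighbors for an N-bit point)."""
    flip = {"0": "1", "1": "0"}
    return [point[:i] + flip[b] + point[i + 1:] for i, b in enumerate(point)]


class DynamicLandscape:
    """Toy landscape whose fitness values shift as agents visit points."""

    def __init__(self, base_fitness: dict, boost_visits: int = 2, delta: float = 0.1):
        self.fitness = dict(base_fitness)   # point -> current fitness
        self.visits = defaultdict(int)      # point -> number of visits so far
        self.boost_visits = boost_visits    # assumed: first visits that boost
        self.delta = delta                  # assumed: size of each change

    def visit(self, point: str) -> float:
        """Record a visit, then raise (boost) or lower (crowd) the region."""
        self.visits[point] += 1
        boosting = self.visits[point] <= self.boost_visits
        change = self.delta if boosting else -self.delta
        for p in [point] + neighbors(point):
            self.fitness[p] += change
        return self.fitness[point]
```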

It would be useful to see such changes in the fitness landscape. However, visualizing the whole fitness landscape is a challenge, since it is not practical to have a 3D visualization of an N-dimensional hypercube with all its 2^N points, where each point has N neighbors (since the flipping of each bit of an N-dimensional point produces a neighbor). To address this challenge, we came up with an alternative way to visualize the fitness landscape. Instead of trying to display all points and their neighbors, we only show the neighborhood around a focus point in full detail, for example, around one of the agents, with the resolution of displayed points diminishing in proportion to their distance to the focus point.

Figure 2: Wave riding with the spherical visualization of NK landscapes

To identify agent strategies that are good at finding high-fitness points, we perform strategy optimization, by evolving artificial neural networks (in particular Compositional Pattern Producing Networks) using the NEAT method. When the agent’s current state is given as input, these evolved networks output a set of values (one for each action), which are then used to probabilistically pick an action for the agent to perform. The possible actions are exploiting or exploring using shared points or private points.
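The step from network outputs to a chosen action might look like the sketch below. How the outputs are turned into probabilities is not described in this post, so the proportional rule used here (clip negatives to zero, then normalize) is an assumption, as are the action labels and function names. The network itself is left out; `outputs` stands in for its output values.

```python
import random

# The four actions named in the text; these labels are hypothetical.
ACTIONS = ("exploit_private", "explore_private", "exploit_shared", "explore_shared")


def action_probabilities(outputs):
    """Turn one output value per action into selection probabilities.

    Assumption: negative outputs are clipped to zero and the rest normalized.
    If every output is zero or negative, fall back to a uniform choice.
    """
    clipped = [max(0.0, v) for v in outputs]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(outputs)] * len(outputs)
    return [v / total for v in clipped]


def pick_action(outputs, rng: random.Random) -> str:
    """Sample one action according to the probabilities above."""
    probs = action_probabilities(outputs)
    return rng.choices(ACTIONS, weights=probs, k=1)[0]
```

For example, outputs of `[1, 1, 2, 0]` would make exploiting around a shared point twice as likely as either private action, and exploring around a shared point impossible.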

We evolved agent strategies in three setups: (1) one where the agent was evaluated in a single homogeneous environment that had opponent agents with identical strategies, (2) one where the agent was evaluated in multiple homogeneous environments (which takes much more time than the first setup), and (3) one where the agent was evaluated in a single heterogeneous environment, that is, against opponent agents that each had a different strategy (which is similar to the first setup in terms of time required).

The second and third setups had the goal of evolving general strategies that can perform well in multiple environments, whereas the first setup did not. The performance of the two general strategies produced by the second and third setups on a given environment was lower than the strategy evolved particularly for that environment in the first setup, which is expected. On the other hand, the strategies that the heterogeneous setup produced performed close to those of the multiple homogeneous setup in the same homogeneous environments that were part of that setup, even though the required time for the heterogeneous setup evolution was much shorter. This result is noteworthy and might be useful in other domains as well.

One of the observed behaviors was a wave riding strategy in certain environments. Short exploitation jumps allow an agent to ride a boosting wave, staying at the forefront of the area that is being boosted as it moves through the landscape, leaving a trail of past visited points that have sunk in fitness (Figure 2). A sample video of this behavior is available.

Another observed phenomenon in this domain was a Twitter effect: when agents share all of their knowledge openly, they are inclined to imitate each other more and follow the same ideas, which reduces diversity and also overall fitness due to crowding (Figure 3), whereas moderate restrictions to openness improve diversity and may increase creativity in the long run.

Figure 3: A conceptual illustration of the Twitter effect

In order to test these ideas in a real-world setting, I am currently working with a dataset of human behavior in a competitive multi-agent search task under laboratory conditions. The two main goals are modeling the human subjects in this dataset as agent strategies, and obtaining strategies that perform better than the human subjects, through optimization. In the future, these methods and results may be applied in various industries by utilizing archival data from companies.

For more information about Erkin’s work, you can contact him at erkin at cs dot texas dot edu.

Posted in BEACON Researchers at Work | Tagged agent strategies, BEACON Researchers at Work, Evolutionary Computation

Big horns trump smooth pickup lines every time

Posted on July 30, 2012 by Danielle Whittaker

From our latest press release:

Elk and rhinoceros beetles aren’t diabetic, but to grow big horns and attract mates it appears that the males are insulin-dependent.

Ian Dworkin, Michigan State University zoologist, was part of a team that for the first time ever showed why horns – from elk to rhinoceros beetles – and other decorative, mate-attracting structures are sensitive to changes in nutrition. As reported in the current issue of Science, the key ingredient for this growth is insulin, Dworkin said.

“Clearly elk antlers, peacock tail feathers and beetle horns are very different, but it appears that they do share similar mechanisms to make these structures so big,” he said. “And lowering insulin levels dramatically reduced the size of their ornate structures.”

The study of sexual selection dates back to Darwin’s research. Subsequent research revealed the so-called “handicap principle,” which labeled the males as burdened for toting such unwieldy baggage. Dworkin’s team, however, believes that when insulin-dependence is part of the picture, the showy males are not in fact handicapped. Instead, the insulin-dependence of these big horns gives the males a way to show how great they are.

“It’s a sign that these males are thriving, made of some pretty sturdy stuff and certainly mate-worthy,” said Dworkin, who conducted the research at BEACON, MSU’s National Science Foundation Center for the Study of Evolution in Action.

Dworkin and the team determined that each time such exaggerated traits evolve, they repeatedly, but independently, seem to use insulin-dependence. This suggests that the traits are more likely to have evolved as honest indicators of quality rather than handicaps.

“While more work needs to be done, our results provide an important way of linking genetic mechanism with the ultimate evolutionary reason for the trait exaggeration,” Dworkin said.

Researchers from the University of Montana and Washington State University contributed to this study.

Original MSU press release is here.

The paper in Science can be accessed here.

Posted in Uncategorized | Tagged Biological Evolution, press release, sexual selection

BEACON Researchers at Work: Evolving Robot Brains using Evolutionary Algorithms

Posted on July 23, 2012 by Danielle Whittaker

This week’s BEACON Researchers at Work post is courtesy of MSU undergraduate student Faisal Tameesh.

For more information about Faisal’s work, you can contact him at tameeshf at msu dot edu.

Posted in BEACON Researchers at Work | Tagged evolutionary algorithms, robots, video

BEACON 2012 Congress Recap

Posted on July 20, 2012 by Danielle Whittaker

The annual BEACON Congress took place this week at Michigan State University. In addition to talks and posters about ongoing research and education progress, one of the most important kinds of sessions held at BEACON Congresses is the “Sandbox Session.” Each of these sessions is organized around one topic, often a kind of “Grand Challenge” in evolutionary science that BEACONites are interested in pursuing. The format of these sessions is informal and collaborative, with one or two moderators encouraging input from all attendees, with the goal of generating ideas and new collaborations for BEACON research. This year’s Congress featured nine Sandbox Sessions on a variety of topics, described below.

Epigenetics, led by Joseph Graves (faculty, NCAT). This session started with a discussion of the definition of epigenetics – what it is, and what it is not. Many broad phenomena are often called “epigenetics” without any rigorous test of that hypothesis, and may simply be polymorphisms. Participants also considered how BEACON is well-suited to push this research forward.

Techniques for Visualizing Evolution, led by Heather Goldsby (postdoc, UW). In this session, participants identified the biggest challenges faced when trying to visualize evolution, including dealing with large amounts of data and representing multidimensional concepts like fitness landscapes. A variety of potential tools were discussed.

Evolution in Action and Global Change, led by Jeff Morris (postdoc, MSU). Participants discussed how understanding evolution is important for making predictions about the effects of climate change and other global phenomena. A variety of methods for studying this topic were also discussed.

Considerations for Next Generation Sequencing, led by Jeff Barrick (faculty, UT). Jeff Barrick gave an introduction to the various kinds of data available and best practices for handling them, and also shared the variety of courses and workshops available at BEACON institutions. Participants shared some new techniques, and discussed the challenges inherent in analyzing certain parts of genomes.

Dynamics of Predator-Prey Systems, led by Aaron Wagner (postdoc, MSU). This session was heavily attended by both biologists and engineers, who were interested in the topic for very different reasons. Predator-prey dynamics are extremely useful for developing search functions, particularly because one is less likely to get locked into a single solution due to the coevolution between predator and prey. Participants discussed ways to improve communication between biologists and engineers in this domain, especially because certain terms were not always used in the same way.

Collaborating with Minority-Serving Institutions, led by Judi Brown Clarke (Diversity Director, MSU). In addition to our partners at North Carolina A&T State University, our Congress was joined by faculty from two other minority-serving institutions: Dr. Aditi Pai from Spelman College, and Dr. Joseph Onyilagha from University of Arkansas at Pine Bluff. The group discussed the challenges of teaching evolutionary biology at minority-serving institutions, the need for mentoring to ensure a smooth transition between college and graduate school, and ways in which BEACON could reach out to these institutions with the goal of increasing diversity in evolutionary science.

Evolution of Sex, led by Barry Williams (faculty, MSU) and Ben Kerr (faculty, UW). Participants in this session discussed the ways in which sexual reproduction may have evolved, and the difficulties in designing experimental research to test these hypotheses. Digital evolution experiments have demonstrated that even though sexual reproduction has enormous benefits, it does not always evolve in experimental systems.

Evolution of Social Interactions, led by Arend Hintze (postdoc, MSU, who, by the way, is looking for a job!). This session addressed questions such as: What is the definition of “social”? Is there a continuum between social and non-social tasks? Does the brain use abilities acquired to solve non-social tasks to solve social tasks, or do social tasks require higher-order solutions? How essential is communication for social interactions?

Harnessing Evolution, led by Betty Cheng (faculty, MSU). This session focused on the huge range of evolutionary computation techniques currently being used in engineering, and the challenges involved in applying these techniques. Finally, the outreach potential of evolutionary computation was discussed – for example, the hands-on experience provided by robots.

This year’s Congress was the best-attended yet, and we look forward to seeing the products of all this dynamic interaction!

BEACONites: do you have photos of Congress that you’d like to share? Please send them to me at djwhitta at msu dot edu.

Posted in BEACONites | Tagged BEACON Congress, collaboration, diversity, Meetings, New research ideas

BEACON Researchers at Work: Does sociality influence disease resistance?

Posted on July 16, 2012 by Danielle Whittaker

This week’s BEACON Researchers at Work blog post is by MSU graduate student Katy Califf.

I’m generally interested in how genetic diversity and behavior influence each other in wild populations of mammals, particularly in the realm of disease ecology. More simply, how do genetics influence behavioral decisions such as where to live, with whom to cooperate, and with whom to mate? Then, over time, as these decisions are made, what does the genetic population structure look like? When you think about it, a lot of these decisions come down to questions about pathogen resistance. If I live here, will I be safe? If I eat this, will I get sick? If I have kids with this individual, will they be healthy?

Obviously, most of these decisions are not conscious ones for many animals. So the next question is: how are these decisions made? Even if you’re not a scientist, you probably know that in the wild, we usually see related individuals living together, and unrelated individuals mating. This is because of genetic diversity. Related individuals are genetically similar and will often cooperate, as they gain what we call “inclusive fitness” by increasing the survival of individuals who share some of the same genes. Similarly, it is more beneficial to mate with unrelated individuals than to mate with closely related individuals, because this increases the genetic diversity of the offspring. There are many reasons why this is beneficial, but the one I am most interested in is the possibility that genetic diversity is an important determinant of disease resistance. In fact, many biologists believe that sex evolved as a mechanism to increase genetic diversity, largely in response to constantly evolving pathogens.

This leads us to another question: how do animals know to which animals they are related? Sometimes, they simply know through experience. But in the wild, animals often encounter related individuals that they’ve never met before. Evolution has favored mechanisms for recognizing these individuals, often through the sense of smell. Many wild animals demonstrate these unconscious preferences.

The well-known “t-shirt study” demonstrates how preferences based on genetic information might operate. In the 90’s, Wedekind & Furi asked female students to sniff sweaty t-shirts that males had worn for several days, and tell the researchers which odors they preferred. Interestingly, the females preferred shirts that had been worn by men who were genetically dissimilar from them at a group of genes known as the major histocompatibility complex, or MHC. When these females were taking an oral contraceptive, this preference was reversed. These same patterns have been seen in other species as well. The researchers concluded that this preference arises because, when the females are able to become pregnant, they prefer the odor of males that will make better mates, genetically speaking. And when they are unable to become pregnant, they prefer the odor of males that are more likely to be kin, and thus more likely to be reproductive helpers.

MHC genes are critical to disease resistance in vertebrates. The molecules encoded in these genes recognize foreign pathogens in the body and present them to immune system cells, thereby initiating immune response. Diversity in MHC genes has been linked in many species to increased reproductive success, decreased parasite loads, and enhanced disease resistance. However, it’s not yet completely clear how the MHC genes influence odor.

Photo by Andy Flies

As a graduate student in Dr. Kay Holekamp’s lab, I am currently addressing questions related to MHC diversity in a highly social carnivore, the spotted hyena (Crocuta crocuta). The Mara Hyena Project, run by Dr. Holekamp, has been studying spotted hyenas in the Masai Mara National Reserve in Kenya since 1988. Spotted hyenas are unique among carnivores in that females are socially dominant to all breeding males. Females have complete control over mating, due to their highly masculinized genitalia. Since female choice is essentially absolute in this species, it’s a great system in which to ask questions about the underlying mechanisms of mate choice. In addition, research has shown that hyenas have a particularly robust immune system. They regularly test positive for many diseases that have caused massive mortality in sympatric carnivores, yet hyenas rarely get sick, and their disease-related mortality is much lower than that of other carnivores. While hyenas are exceptionally adept hunters, they also regularly feed on carrion, and they have specific adaptations to break open large bones. These feeding habits might expose them to certain types of parasites or pathogens to which other carnivores are not exposed.

First, I wanted to know just how diverse the MHC genes are among spotted hyenas. I have sequenced 3 genes in hyenas that have been linked to various fitness measures in other species, and have found that spotted hyenas do indeed exhibit evidence of strong positive selection and high diversity at their MHC loci. I am now analyzing several years of pedigree data to determine whether females tend to mate with males that differ from them at these loci more than would be expected by chance. If immune system diversity is important to hyenas, I would expect them to be mating with individuals that differ from them at MHC. However, it is also possible that all spotted hyenas are already so diverse that something other than mate choice is driving this diversity.
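One common way to ask whether mated pairs differ "more than would be expected by chance" is a permutation test. The sketch below is illustrative only and is not the analysis used in this research: it assumes each individual is summarized as a set of MHC alleles, measures dissimilarity as Jaccard distance, and builds the chance expectation by randomly re-pairing the same females and males. All names and the toy data are hypothetical.

```python
import random


def dissimilarity(a: frozenset, b: frozenset) -> float:
    """Jaccard distance between two allele sets (0 = identical, 1 = no overlap)."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def mean_pair_dissimilarity(females, males) -> float:
    """Average dissimilarity over pairs (female i is paired with male i)."""
    return sum(dissimilarity(f, m) for f, m in zip(females, males)) / len(females)


def permutation_p_value(females, males, n_perm: int = 1000, seed: int = 0) -> float:
    """Fraction of random re-pairings at least as dissimilar as the observed pairs."""
    rng = random.Random(seed)
    observed = mean_pair_dissimilarity(females, males)
    shuffled = list(males)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_pair_dissimilarity(females, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up allele labels, for illustration only.
    females = [frozenset("ab"), frozenset("cd"), frozenset("ef")]
    males = [frozenset("cd"), frozenset("ef"), frozenset("ab")]
    print(permutation_p_value(females, males))
```

A small p-value would suggest that the observed pairs are more dissimilar than random pairings of the same individuals; a real analysis would also need to account for which males were actually available to each female.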

I have also looked at MHC diversity in a closely related hyena species, the striped hyena (Hyaena hyaena). The striped hyena is a more “typical” mammal than the spotted. There are no sex role reversals in this species, and they tend to be solitary or live in small family groups. However, I find levels of MHC diversity in this species that are just the same as those in spotted hyenas. This leads me to believe that there is some evolutionary force in hyenas that is more important in maintaining MHC diversity than sexual selection. Unique traits shared by both of these hyena species are the ability to crack bone and feed on carrion. Perhaps the pathogens encountered via this feeding ecology are more important in hyenas than other carnivores, and this may be what is driving variation.

I am currently sequencing MHC genes in the remaining 2 extant species in the Hyaenidae family to test this hypothesis. The brown hyena (Parahyaena brunnea) also cracks bone and feeds on carrion. However, the aardwolf (Proteles cristata) is a diminutive hyena that feeds only on termites. If the above hypothesis is correct, in the brown hyena I would expect to see levels of MHC diversity similar to the spotted and striped hyenas, whereas I would expect to see lower levels of diversity in the aardwolf. Either way, these data will further clarify the picture of evolution at MHC genes in hyenas. Stay tuned!

For more about Katy’s work, you can contact her at califfka at msu dot edu.

Posted in BEACON Researchers at Work | Tagged BEACON Researchers at Work, Biological Evolution, disease, Field Biology, hyenas, MHC

Evolution of Music Illustrates Epistatic Interactions

Posted on July 10, 2012 by Danielle Whittaker

Idealized music fitness landscape

In today’s issue of Proceedings of the National Academy of Sciences, BEACONite Chris Adami comments on a research article by MacCallum, Mauch, Burt & Leroy on “The evolution of music by public choice.” Much like the digital evolution techniques used by many BEACONites, the authors used a computer program to evolve musical sounds via mutation and recombination. The successful sounds that contributed to the next generation were those selected by human listeners who found the sounds pleasing. Though the “musical appeal” measurement increased significantly in the early evolutionary stages, it quickly flattened out. Two traits that contributed to the music’s appeal – rhythmic complexity and chordal clarity – were highly correlated, and sexual recombination was likely breaking up these adaptive complexes, leading to the adaptation slowdown.

Adami explains that although this experiment may not tell us much about the way music actually evolves in human societies, it demonstrates that the fitness landscape metaphor and epistatic interactions between mutations are important both for understanding evolution on artificial landscapes and for predicting evolution in nature – for example, the predicted evolution of drug resistance in HIV.

Citation: Adami, C. 2012. Adaptive walks on the fitness landscape of music. Proceedings of the National Academy of Sciences. doi:10.1073/pnas.1209301109

Posted in BEACON in the News | Tagged Digital Evolution, fitness landscapes

BEACON Researchers at Work: Evolving Robot Behavior

Posted on July 9, 2012 by Danielle Whittaker

This week’s BEACON Researchers at Work blog post is by MSU graduate student Chad Byers.

I have always been fascinated by the mechanisms that drive both the molecular and digital systems of our world and have been fortunate enough, through the BEACON Center, to work in an environment providing the resources to pursue knowledge in both. A cell, the fundamental building block of an organism, performs a number of internal functions essential to the organism’s survival in order to produce control properties that we, as computer scientists, wish to incorporate into the design of our digital systems. Properties such as decentralization, resiliency to failure, and cooperation have been notoriously difficult to mimic from nature. One alternative is to capture several of the key components from the biological cell “model” into a digital model and allow these components to freely mutate, thereby using evolution to control the behaviors of a system, such as a wheeled robot. In this way, we allow evolution to naturally select for these same properties (robustness, resiliency, etc.) due to the selective advantage they provide over their competition in a digital population.

As most biologists would agree, control within a biological cell is both massively distributed and massively parallel, arising out of complex networks of interactions. When we first started down this route for finding a control system amenable to the process of evolution, we often found ourselves between a rock and a hard place. Many successful bio-inspired techniques, such as artificial neural networks, upheld the qualities of massive parallelism and distributed control; however, it was difficult to truly understand how an evolved network successfully performed the task. On the opposite end of the spectrum were techniques such as Genetic Programming, whose genomes were a sequence of instructions that altered the robot itself (e.g., “move-backward”) or interacted with the robot’s environment (e.g., “if-sense-color”). However, evolved programs of this type often end up littered with regions that endlessly sense information from the environment, or suffer from bloated instruction sets (e.g., “if-sense-red”, “if-sense-blue”, “if-sense-green”, etc.), making control difficult.

With these difficulties in mind, we decided to design a digital model to map components from the biological cell model of control into the digital realm. To do this, we used the process of signal transduction from nature as inspiration, where molecules form a universal medium of information that continually passes into a cell via receptors, encapsulating information concerning the cell’s current environment. These molecules are often manipulated, altered, and communicated internally in order to produce the cell’s response. Fortunately, the world of digital systems is not too different: there, the universal medium is instead based on sequences of bits (bitstrings) that are altered through instructions in the computer to produce the system’s response.

To begin with, we decomposed the process of biological signal transduction into a 3-step process: (1) Sense, (2) Compute, and (3) Respond, in order to provide control for a simulated wheeled robot. Similar to a cell’s receptors, a robot possesses various sensors for detecting stimuli such as obstacles, colors, and sounds. In the first step of our model, Sense, the robot uses its sensors to detect nearby stimuli and sets a corresponding bit within a bitstring to True, signifying the presence of the stimulus in one of the sensors. Once all of the stimuli in the robot’s environment have been mapped to their corresponding bit(s), the second step, Compute, takes place. During this step, evolved computer programs called digital enzymes execute in parallel with one another by reading bitstrings from the environment, altering their stored bits, and using these altered bitstrings to guide the simulated robot’s behaviors. To relieve the human design bias discussed earlier, we allow the number of unique programs, and the instructions contained within each program, to mutate and evolve freely. Finally, in the last step, Respond, we observe the bitstrings that were sent to guide the robot’s behaviors and determine which actions were “voted” upon by digital enzymes during the Compute step. The end result from this process is a majority-bit vote for how the robot should turn, move, emit color, and emit sound. With this design, we are able to relieve the bloated instruction set problem by instead mapping the sounds, colors, and actions of both the system and environment into bitstrings which can be exchanged and interpreted throughout the system.
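The Sense, Compute, Respond loop can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the model described above: the "digital enzymes" here are hand-written Python functions rather than evolved programs, they run one after another instead of in parallel, and ties in the vote resolve to False. The sensor layout and all function names are hypothetical.

```python
def sense(stimuli, sensor_index: dict, width: int) -> list:
    """Sense: set the bit for each detected stimulus to True."""
    bits = [False] * width
    for stimulus in stimuli:
        bits[sensor_index[stimulus]] = True
    return bits


def majority_vote(votes) -> list:
    """Respond: per-bit majority over equal-length bitstrings (ties -> False)."""
    n = len(votes)
    return [sum(column) * 2 > n for column in zip(*votes)]


def step(stimuli, sensor_index: dict, width: int, enzymes) -> list:
    """One control cycle: Sense, then Compute (each enzyme votes), then Respond."""
    sensed = sense(stimuli, sensor_index, width)
    # Compute: every enzyme reads its own copy of the sensed bits and
    # returns an altered bitstring, which counts as its vote.
    votes = [enzyme(list(sensed)) for enzyme in enzymes]
    return majority_vote(votes)


if __name__ == "__main__":
    sensor_index = {"obstacle": 0, "red": 1}     # assumed sensor layout
    enzymes = [
        lambda bits: bits,                       # pass the signal through
        lambda bits: [not b for b in bits],      # invert every bit
        lambda bits: bits,
    ]
    print(step(["red"], sensor_index, 2, enzymes))
```

Because each action is just a position in a bitstring, adding a new color or sound means adding a bit rather than a new instruction, which is the point made above about avoiding bloated instruction sets.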

In our preliminary work, we wanted to test our proof-of-concept design on a target problem that would allow properties such as robustness and cooperation to be selected for in evolved, simulated robot controllers. We decided that one critical task that nearly every organism on our planet faces at some point or another is foraging for food to sustain life. Each robot controller in the population starts as a blank slate, containing only one copy of an empty program. To evaluate each robot design, we created a clonal colony of 6 robots, using a given controller, who were charged with the task of finding and returning 8 food items to a central home region in an unbounded world, as quickly as possible. After 1000 generations of evolution, mutation and natural selection built many surprising strategies, using both sound and color, in order to successfully forage in their digital environments.

One of these evolved strategies is shown in the video below where the robots (arrows) make circular movements near home, mimicking home’s color (red) and effectively broadening the sense of home in the environment. As they search around home and discover an item of food (blue), they immediately switch their behavior to act as a locator beacon, blinking their lights as a notification to others. Over time, this strategy allows the colony to successfully find and return the food items in their environment to home.

As we move forward with this research, we are interested in looking at what aspects of our digital model are most important for driving behavior such as distributed control, interaction, memory, consensus, etc. and how these properties are influenced by the environment.

For more information about Chad’s work, you can contact him at byerscha at msu dot edu.

Posted in BEACON Researchers at Work | Tagged BEACON Researchers at Work, Digital Evolution, evolving behavior, robots, video

BEACON Researchers at Work: Phylotastic!

Posted on July 3, 2012 by Danielle Whittaker

This week’s BEACON Researchers at Work blog post is by University of Texas at Austin graduate student Emily Jane McTavish.

In early June several BEACONites participated in a hackathon at the National Evolutionary Synthesis Center (NESCent), in Durham, NC. I was there in person, and Luke Harmon’s lab at University of Idaho, including Jon Eastman, Joseph Brown and Matt Pennell telecommuted in. The whole phylotastic team included 31 people from more than 10 institutions. Everyone involved is listed here.

Side note: Cross country collaboration required the Idaho team to be up for a first morning start time of 6 AM! (Pacific Time) whereas over on the east coast in Durham I took a swim in the pool before our 9 am start. Yay East Coast!

Phylotastic

Re-use of phylogenies is at the heart of the Phylotastic mission. While phylogeneticists are busy creating bigger and bigger trees (muahaha!), the information in these trees isn’t currently readily accessible to others. For example, an ecologist might use a phylogeny of the grasses found in a study area, or a high school teacher might like their students to explore the relationships among mammal groups.

Hackathon? Is that like Reggaeton? (An actual text I got while at Phylotastic.)

Yes! Great for dance parties, and… wait. No. They are not very similar.

This was my first hackathon, so it was all new to me. The hackathon concept is to get a bunch of people who are interested in the same software end product together, spend a bit of time talking about how it should work, and a concentrated period of time programming and making it happen!

The concept of the Phylotastic hackathon was developed by the NESCent Hackathons, Interoperability, and Phylogenies group. Although interoperability is a bit jargony, it is a concept that any scientist who has wasted huge chunks of time formatting data for the nit-picky specifications of various analysis software can appreciate: data and analyses should be housed in such a way that the information can be easily transferred. One example of this is the NeXML format for storing phylogenetic information. Unlike the standard nexus file, which requires a lot of other information not included in that file to be understood, the NeXML format holds all the information about the phylogenetic analysis together in one place, easily archived, and most importantly, easily re-used.

Phylotastic will be implemented as a web application. The concept is simple: a user types in the names of species they are investigating, and Phylotastic spits out a tree of those species. Phylotastic has three main components: the tree store, the taxonomic name resolution service, and the topology server. In addition, several neat extensions have already been developed.

Hacking happening! (from L, Christian Zmasek, Chris Baron, Arlin Stoltzfus, Megan Pirrung, Holly Bik, Rutger Vos)

Components of Phylotastic:

Tree store: A back-end data store that includes large trees that can be pruned to make the output tree. Phylotastic will be able to output phylogenies based on either these trees or a user-inputted source tree. My role in Phylotastic is working on how to include metadata in association with these trees. A set of standards for the appropriate Minimum Information About a Phylogenetic Analysis (MIAPA) was developed at NESCent last year. MIAPA is designed to standardize the data associated with trees. Sharing of trees in simple parenthetical (Newick) format divorces phylogenies from essential information about how they were constructed, for example, the reference for the publication, molecular vs. morphological data, Bayesian analyses vs. ML vs. parsimony, the units of branch lengths, and much more.
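
To make the pruning idea concrete, here is a minimal Python sketch. It is not Phylotastic's actual code: the nested-tuple tree representation, the function names, and the toy topology are all assumptions made for illustration, and branch lengths and metadata are ignored entirely.

```python
# Hypothetical sketch: a tree is a nested tuple of subtrees; a leaf is a taxon name (str).

def prune(tree, keep):
    """Return the subtree induced by the taxa in `keep`, or None if none remain."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [sub for sub in (prune(child, keep) for child in tree) if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse an internal node left with a single child
    return tuple(kept)


def to_newick(tree):
    """Serialize a nested-tuple tree as a Newick string (topology only)."""
    def fmt(node):
        if isinstance(node, str):
            return node
        return "(" + ",".join(fmt(child) for child in node) + ")"
    return fmt(tree) + ";"


# Toy "big tree" (an illustrative assumption, not data from the tree store).
big_tree = (((("Homo_sapiens", "Pan_troglodytes"), "Mus_musculus"), "Gallus_gallus"), "Danio_rerio")
pruned = prune(big_tree, {"Homo_sapiens", "Mus_musculus", "Gallus_gallus"})
print(to_newick(pruned))  # ((Homo_sapiens,Mus_musculus),Gallus_gallus);
```

Note that the Newick string that comes out carries only names and topology, which is exactly why the metadata discussed above has to travel separately.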

Taxonomic Name Resolution Service: This service translates user input names into those used on trees in the tree store. Name resolution is a trickier problem than it might seem, and it has been very impressively implemented by the TNRS group. Two of the most pervasive problems are the use of different taxon names (sometimes for the same group) and spelling errors. Most biologists are familiar with how heated discussions of naming can be (for an example, see the Sperm Whale Wikipedia talk page. Warning: strong language!).

See the demo Python module to translate common names to species names.

(FYI: TNRS suggests Physeter catodon for the sperm whale)
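
For a feel of what name resolution involves, the sketch below handles the two problems mentioned above (synonyms and spelling errors) with Python's standard difflib module. It is only an illustration and not how TNRS works: the lookup tables, the function name, and the similarity cutoff are assumptions, and a real service would query curated taxonomic databases.

```python
import difflib

# Hypothetical lookup tables for illustration only.
ACCEPTED_NAMES = ["Physeter catodon", "Homo sapiens", "Mus musculus"]
# Treats Physeter catodon as the accepted name, following the TNRS suggestion quoted above.
SYNONYMS = {"Physeter macrocephalus": "Physeter catodon"}


def resolve_name(user_input, cutoff=0.8):
    """Map a user-supplied name to an accepted name, or None if nothing is close enough."""
    # Normalize "genus_species" input, stray whitespace, and capitalization.
    name = " ".join(user_input.replace("_", " ").split()).capitalize()
    if name in SYNONYMS:
        return SYNONYMS[name]
    # Fuzzy-match against accepted names and known synonyms to absorb typos.
    candidates = ACCEPTED_NAMES + list(SYNONYMS)
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    return SYNONYMS.get(best, best)


print(resolve_name("physeter_catadon"))        # misspelled input
print(resolve_name("Physeter macrocephalus"))  # synonym
```

A higher cutoff makes the matcher stricter; where to set it is a judgment call that depends on how costly a wrong match is.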

Phylotastic Topology Demo: Get a tree for your taxa of interest at http://phylotastic-wg.nescent.org/script/phylotastic.cgi. This demo doesn’t have the TNRS implemented yet, and so doesn’t support common names, but if you enter names in the format genus_species, you can get a pruned tree.

Extensions of Phylotastic

Datelife: Many researchers’ questions require dates associated with divergences in trees. The phylotastic project Datelife returns times of divergences for pairs of species. Take a look at the website for a neat demo, and check out the divergence time of your favorite species pair! http://datelife.org/

The Durham Datelife team (Peter Midford, Jeet Sukumaran and Tracy Heath)

Reconciliotastic: Treats the trees in the Phylotastic tree store as species trees and reconciles gene tree relationships with these species trees. http://phylotastic.nescent.org/shiny/reconciliotastic/

Phylotastic playground: Integrates geographic information about species locations and ranges with phylotastic output. http://phylotastic.org/demos/phylotastic-viz/index.html

Mesquite-otastic: Phylotastic is implemented as a module in the phylogenetics software environment Mesquite. This screencast shows how a researcher could perform ancestral state reconstruction using a tree pulled from Phylotastic. See a screencast!

The Phylotastic project is still in progress, and all the components still need to be fully integrated. Take a look at these demonstrations though, and see if it might be of interest to you or your collaborators.

Also, Phylotastic will be presented at iEvoBio, a sister meeting to Evolution, in Ottawa July 11-12. If you are there, come check it out and say hi!

For more information, you can contact Emily Jane at ejmctavish at utexas dot edu.

Posted in BEACON Researchers at Work | Tagged BEACON Researchers at Work, Computer Science, phylogenetics, software

BEACON Researchers at Work: What happens to bacterial communities under selection?

Posted on June 25, 2012 by Danielle Whittaker

This week’s BEACON Researchers at Work blog post is by Michigan State University postdoc Bjørn Østman.

When one gene comes under a new selection pressure, a population can respond by increasing the frequency of the better alleles. This can involve directional selection, whereby the population shifts towards the new optimum, and/or it can entail stabilizing selection, where the genetic diversity of the population decreases. In both cases allele frequencies change, and this is what (biological) evolution is.

This is all fairly straightforward. However, when there are many populations that are distinct species, and they all come under the same new selection pressure, then what is that? If we can detect selection between these distinct populations, is that still evolution? It is not evolution in the traditional sense, which centers its attention on what happens within a population. So if we’re not looking at what happens within one population, can we even say that we are studying evolution?

In a previous post I explained how we have used metagenomics to retrieve DNA sequences of a specific gene called nitrite reductase (nirK) that soil bacteria use to obtain energy from fertilizer. When sequencing the soil, only a limited set of sequences is discovered. Imagine then that some species are more abundant in the soil than others. Because sequencing is random with respect to which species the sequences come from, we are clearly more likely to retrieve sequences from the most abundant species. There are many bacterial species that have a copy of nirK, and we are limited in how many sequences we can obtain. Many species will therefore not be represented in our sample.

Now, comparing these sequences is done using the formalism of d_N/d_S, which measures the ratio between non-synonymous nucleotide substitutions and synonymous substitutions (substitutions that change an amino acid vs. those that do not). d_N/d_S (also designated by ω) is measured between species, so it is perfect for the sequences we have. The analysis showed that ω is very low, indicating purifying selection – there are more synonymous nucleotide changes compared to non-synonymous changes than expected if both were equally likely. That means that nirK is being constrained and optimized, presumably because the gene carries out an important function for the bacteria. Changes to the resulting protein are not tolerated, though a little variation in the amino acid sequence between the species does exist.
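
As a rough illustration of the distinction that ω is built on, the Python sketch below classifies codon differences between two aligned coding sequences as synonymous or non-synonymous using the standard genetic code. It is a simplification and not the full d_N/d_S estimate: real analyses also normalize by the number of synonymous and non-synonymous sites and correct for multiple substitutions. The function name and the toy sequences are assumptions for illustration, not the nirK data.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons.

    Raw counts only: no per-site normalization, no multiple-hit correction.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal-length, and a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: the protein is unchanged
        else:
            nonsyn += 1   # different amino acid: the protein changes
    return syn, nonsyn


# Toy example: GCT -> GCC keeps alanine; AAA -> AGA changes lysine to arginine.
print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))  # (1, 1)
```

Under strong purifying selection one would expect the first number to dominate the second once both are scaled by how many changes of each kind are possible.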

Furthermore, different environments were compared. In one environment, deciduous forest (DF), the soil is not fertilized. In another environment used for standard agriculture (AG), the soil is fertilized. The analysis showed that the sequences in AG are under stronger purifying selection than sequences in DF (figure 1). Presumably this is because the conditions in AG make it more favorable and more important to have a really good copy of nirK that can help the bacteria to obtain energy from the nitrate in the fertilizer.

Figure 1. dN/dS is smaller in AG than in DF, indicating that there is stronger purifying selection in AG compared to DF. ES and SF are environments that have not been used for agriculture for about 20 and 40 years, respectively.

So far, so good. Now here is my question. Given that the bacteria experience purifying selection, do we really know what is happening to the community of species? Take a look at the following figure.

Figure 2. An artist’s representation of different populations in two-dimensional amino acid space.

The farther sequences are from each other, the fewer amino acids they have in common. In (A) several species of bacteria can be seen, each represented by a Gaussian distribution, where the darkest points are the more abundant sequences. The red cross represents the optimal sequence (need there be only one?), but because bacteria in DF get most of their energy from oxygen, nirK is of relatively little consequence. In (B) and (C), AG has been loaded with fertilizer, so now there is ample opportunity to get energy from that. Therefore the species experience a pull towards the optimal sequence. In (B) this results in each of the populations shifting its distribution towards the optimum, while in (C) they do not shift, but instead the species that are already closer to the optimum experience an increase in carrying capacity, such that they become more abundant compared to species that are farther away from the optimum.

d_N/d_S basically measures this distance in amino acid space, and clearly this distance is on average diminished in (B). However, because we are more likely to retrieve sequences from the more abundant species, the average distance between sequences is also diminished in (C). In other words, both models are consistent with d_N/d_S being lower in AG, and we therefore cannot say what is really going on in the soil. Is there a way to distinguish between the two models? Could we take some bacteria to the lab and grow them under DF- and AG-like conditions, and then figure this out? Is there a third model that can explain the data as well?
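
The argument can be checked with a toy calculation. The Python sketch below is an illustration under strong assumptions, not an analysis of the soil data: amino acid space is reduced to one dimension, each species is a single point, the optimum sits at 0, and the positions, abundances, and abundance rule are all made up. It computes the expected distance between two sequences sampled in proportion to species abundance.

```python
def expected_pairwise_distance(positions, abundances):
    """Expected distance between two sequences drawn in proportion to abundance."""
    total = sum(abundances)
    weights = [a / total for a in abundances]
    return sum(
        wi * wj * abs(xi - xj)
        for xi, wi in zip(positions, weights)
        for xj, wj in zip(positions, weights)
    )


# Hypothetical species positions relative to an optimum at 0, all equally abundant (A).
positions = [-4.0, -1.0, 2.0, 5.0]
abundances = [1.0, 1.0, 1.0, 1.0]
baseline = expected_pairwise_distance(positions, abundances)

# Model (B): every species moves halfway toward the optimum; abundances unchanged.
shifting = expected_pairwise_distance([0.5 * x for x in positions], abundances)

# Model (C): nobody moves, but species nearer the optimum become more abundant.
sorting = expected_pairwise_distance(positions, [1.0 / (1.0 + abs(x)) for x in positions])

print(baseline, shifting, sorting)
```

With these made-up numbers both the shifting and the sorting scenario give a smaller expected distance than the baseline, which is the sense in which the sampled sequences alone cannot tell the two models apart.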

And then the question of evolution – is this even evolution? Some biologists simply call this species sorting, and deny that it is evolution. However, I argue that it is evolution, because what we are observing is the effect of natural selection, which in (B) causes a change in allele frequencies within each population, and in (C) changes species abundances, which can lead to long-term changes in community structure.

Evolution or not? What do you think?

I have cross-posted on my own blog Pleiotropy. Please leave comments there.

For more information about Bjørn’s work, you can contact him at ostman at msu dot edu.

Posted in BEACON Researchers at Work | Tagged agriculture, bacterial communities, BEACON Researchers at Work, Biological Evolution, metagenomics

BEACON Researchers at Work: Speciation and genetic incompatibilities in digital organisms

Posted on June 18, 2012 by Danielle Whittaker

This week’s BEACON Researchers at Work post is by MSU graduate student Carlos Anderson.

This blog post is a follow-up to one I wrote last year about my research on speciation with digital organisms. One of my projects tested the hypothesis that compensatory adaptation (an evolutionary process in which secondary mutations can reduce the negative effects of accumulated deleterious mutations) causes hybrid inviability between populations, a form of reproductive isolation that may lead to speciation. Using digital organisms, I showed that hybrids between compensated populations inherited combinations of deleterious and compensatory mutations that turned out to be incompatible in the hybrid. This incompatibility arose because the full combination of deleterious and compensatory mutations from either parent was not always present in the hybrid, causing a mismatch between deleterious and compensatory mutations.

Over the past year, I have been working on complementing my digital work with biological organisms, specifically, the budding yeast. Yeast are unicellular fungi, known primarily for their use in brewing and baking, but they have also been used extensively in scientific research. Using common techniques in yeast genetics, I deleted certain genes from yeast, which caused their growth rate to slow down, simulating the effects of deleterious mutations. Currently, I am allowing populations of yeast with these deleterious mutations to evolve separately, so that each population acquires compensatory mutations independently of other such populations. After 500 generations (they are currently at around 50), I will hybridize different populations and measure the growth rate of the hybrids. If I find that hybrids are less fit than their parents, I will also be able to tease apart the interactions between deleterious or compensatory mutations that cause incompatibilities. If I find such incompatibilities, this work would not only show that reproductive isolation can form via compensatory adaptation, but it would also support the utility of digital organisms in evolutionary research.

In another study using digital organisms, I tested whether speciation could proceed in the face of migration between populations. I also tested whether differences in environments between populations promoted or constrained speciation. I found that when the environments were different, reproductive isolation developed even at high migration rates of 10%. When the environments were the same, however, even a 1% migration prevented reproductive isolation. The reason was that when the environments were the same, any beneficial mutations in one population that were transferred to the other population through migration were also adaptive in the other population. This caused both populations to adapt similarly, preventing reproductive isolation between them.

Since those experiments, I have added two variables that may affect reproductive isolation: dimensionality and type of hybridization. Dimensionality describes the number of selective pressures in an environment. It has been hypothesized that adaptation to multiple selective pressures may increase divergence between populations, and thus increase the probability of genetic incompatibilities. To test this hypothesis, I evolved populations under both low and high dimensionality, where the environments could be the same or different between the populations (as in the previous experiments). I found that, indeed, populations that adapted to a highly dimensional environment produced hybrids that were less fit than hybrids produced by populations adapted to few dimensions.

The other variable I added was the type of hybridization. Originally, hybrids were created by exchanging a single, contiguous region of a random size from the parental genomes. Thus, hybrids often inherited complete sets of tightly-linked co-adapted genetic regions from each parent, hiding potential incompatibilities between populations. In an attempt to expose these co-adapted genetic regions, I implemented a hybridization method in which every site of the genome could potentially recombine, thereby increasing the number of recombination break points. With the original hybridization method (and no migration between populations), populations that adapted to similar environments did not form reproductive isolation as strong as that of populations that adapted to different environments. With the new hybridization method, both treatments (same environments and different environments) showed similar levels of reproductive isolation. These results show that genetic incompatibilities can form even when populations adapt to the same environment.
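
The difference between the two hybridization methods can be sketched in a few lines of Python. This is an illustration only and not the code used in the experiments: it assumes equal-length genomes stored as lists, and the function names and the per-site swap probability are hypothetical.

```python
import random


def single_region_hybrid(parent_a, parent_b, rng):
    """Original method: swap in one contiguous region of random size from parent_b."""
    n = len(parent_a)
    start = rng.randrange(n)
    end = rng.randrange(start, n) + 1  # region covers sites start..end-1, never empty
    return parent_a[:start] + parent_b[start:end] + parent_a[end:]


def per_site_hybrid(parent_a, parent_b, rng, swap_prob=0.5):
    """New method: every site can recombine independently (swap_prob is an assumption)."""
    return [b if rng.random() < swap_prob else a for a, b in zip(parent_a, parent_b)]


def count_breakpoints(hybrid, parent_a):
    """Count switches of parental origin along the genome.

    Assumes the two parents differ at every site, as in the toy genomes below.
    """
    from_a = [h == a for h, a in zip(hybrid, parent_a)]
    return sum(1 for x, y in zip(from_a, from_a[1:]) if x != y)


rng = random.Random(42)
parent_a, parent_b = ["a"] * 20, ["b"] * 20
print(count_breakpoints(single_region_hybrid(parent_a, parent_b, rng), parent_a))
print(count_breakpoints(per_site_hybrid(parent_a, parent_b, rng), parent_a))
```

The single-region method can never produce more than two break points, so linked blocks of co-adapted sites tend to be inherited together, whereas the per-site method breaks them up far more often.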

Pairwise genetic incompatibilities (DMIs) snowball through time

Finally, using digital organisms I explored a hypothesis related to the formation of genetic incompatibilities. As I have discussed, when a population becomes geographically divided into two, each population evolves independently of the other, so that genetic incompatibilities may form. Genetic incompatibilities may cause hybrids to be inviable, which is a form of reproductive isolation (i.e., speciation). One theoretical prediction of this process posits that the number of genetic incompatibilities involving two alleles should increase quadratically through time (the so-called “snowball effect”). This prediction is very difficult to test with biological organisms because a lot of genetic manipulations would have to be performed. Using digital organisms, I tested the snowball effect and found that pairwise incompatibilities do increase quadratically through time. But I also found the presence of “buffer” alleles in hybrids that lessened the fitness effect of these incompatibilities, showing that more complex interactions may explain hybrid inviability.
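
One common way to see where the quadratic growth comes from is sketched below in Python. It assumes a simple textbook-style model that this post does not spell out: after K substitutions have accumulated between two populations there are K(K-1)/2 possible pairs of diverged sites, and each pair is incompatible with some small probability p. The function name and the value of p are made up for illustration.

```python
def expected_pairwise_incompatibilities(k_substitutions, p_incompatible):
    """Expected number of two-allele incompatibilities after K substitutions.

    Assumes every pair of diverged sites is incompatible independently
    with probability p_incompatible (a modeling assumption).
    """
    possible_pairs = k_substitutions * (k_substitutions - 1) / 2
    return p_incompatible * possible_pairs


# Doubling the number of substitutions roughly quadruples the expected count.
for k in (10, 20, 40):
    print(k, expected_pairwise_incompatibilities(k, p_incompatible=0.01))
```

If substitutions accumulate at a roughly steady rate, K grows linearly with time, so the expected number of pairwise incompatibilities grows with the square of time.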

Overall, my research opens the exciting possibility that speciation can arise through complex genetic incompatibilities, some of which may be due to compensatory adaptation, regardless of the environmental differences (or lack thereof) between populations. My next project is to examine compensatory adaptation itself more closely. I will test how population size, mutation rate, and the initial fitness effect of deleterious mutations affect the rate of compensatory adaptation (versus reversion). Knowledge of the effects of these factors on compensation is not only relevant to my research on speciation, but also relevant to broader topics where compensation is important, such as the recovery of endangered species and the fight against antibiotic resistance.

For more information about Carlos’ work, you can contact him at carlosja at msu dot edu.

Posted in BEACON Researchers at Work | Tagged BEACON Researchers at Work, Biological Evolution, Digital Evolution, snowball effect, speciation