This Evolution 101 post is by MSU grad student Tyler Derr

I’m sure you’ve heard the saying that our DNA is the “blueprint” of who we are. Our genes are the sequences in our DNA that actually encode instructions for particular functions. What might come as a surprise to some is that over 98% of human DNA consists of non-coding regions (Elgar and Vavouri 2008), meaning that less than 2% of our DNA actually codes for proteins. Pseudogenes, unlike our genes, fall into the non-coding category, but what exactly are they?

A pseudogene is a sequence in our DNA that is homologous to a known gene but is nonfunctional (i.e. it looks like a gene, but for one reason or another it can’t quite make the cut to produce a functional protein). Since pseudogenes can’t make a functional protein, it might seem logical to assume that they serve no purpose. In fact, until just a few years ago, scientists believed that they had no immediate function. However, we now know that some pseudogenes serve important functions within an organism, in addition to playing a role in evolution. After discussing the types of pseudogenes and an example of how they can sometimes be immediately useful to an organism, we will discuss their relevance to evolution.

There are three main types of pseudogenes: processed duplicated, unprocessed duplicated, and unitary. Figure 1 illustrates the two duplicated types. This categorization is based on how the pseudogenes arise, and we discuss each type below.

Figure 1: Types of duplicated pseudogenes: unprocessed (top) and processed (bottom). Note that both are created in some way from a functional gene. Rouchka, Eric C., and I. Elizabeth Cha. “Current trends in pseudogene detection and characterization.” Current Bioinformatics 4.2 (2009): 112-119.

A processed duplicated pseudogene arises through retrotransposition, in which mature mRNA is reverse transcribed into DNA and inserted back into the genome. This type of pseudogene is the easiest to detect in our DNA because the inserted copy carries the mRNA’s poly-A tail. The poly-A tail is simply a long run of ‘A’s, which is characteristic of mature mRNA and is not usually found in a gene’s DNA sequence. One reason these insertions are classified as pseudogenes is that the inserted copy lacks a promoter sequence, which acts as a flag marking where transcription should start. The steps required to make a processed pseudogene can be seen in the bottom section of Figure 1.
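The poly-A signal described above lends itself to a simple computational check. The sketch below is a toy illustration, not a real pseudogene-detection pipeline: the function names, the example sequences, and the 15-base threshold are our own hypothetical choices, and real tools weigh many more signals than this one.

```python
def poly_a_tail_length(seq: str) -> int:
    """Count the consecutive 'A' bases at the 3' end of a DNA sequence."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


def has_poly_a_signal(seq: str, min_tail: int = 15) -> bool:
    """Flag a sequence whose 3' end carries a long run of A's.

    The 15-base default is an arbitrary, illustrative threshold.
    """
    return poly_a_tail_length(seq) >= min_tail


# Hypothetical sequences: a gene-like stretch with and without a poly-A tail.
candidate = "ATGGCCTTGTAG" + "A" * 20
ordinary = "ATGGCCTTGTAG"
print(poly_a_tail_length(candidate), has_poly_a_signal(candidate))  # prints: 20 True
print(poly_a_tail_length(ordinary), has_poly_a_signal(ordinary))    # prints: 0 False
```

A run of A's alone is weak evidence, since genomes contain A-rich stretches for other reasons; the point is only that the retrotransposition signature is something software can look for.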

The second type, unprocessed duplicated pseudogenes, are created when genes are copied within the DNA (gene duplication). Once a gene has been duplicated, if one of the copies incurs a mutation, such as a nucleotide change that creates an early stop codon in the middle of the gene, it loses its ability to code for a protein and can be thought of as “junk DNA”. Normally there would be strong selection against a mutation that left a gene nonfunctional, but because this mutation hit a duplicated gene, at least one functioning copy remains in the genome. Thus, this type of pseudogene can undergo genetic drift and acquire further mutations that have no direct effect on the fitness of the individuals carrying it. The duplication and mutation steps are shown in the upper section of Figure 1.
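The “early stop codon” scenario can be made concrete by scanning a coding sequence codon by codon. A minimal sketch, assuming the sequence starts in frame at its first base and uses the standard stop codons TAA, TAG and TGA; the function names and the two short sequences are hypothetical examples, not real genes.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 0-based index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(cds: str) -> bool:
    """True if a stop codon appears before the final codon of the sequence."""
    last_codon = len(cds) // 3 - 1
    stop = first_stop_codon(cds)
    return stop is not None and stop < last_codon


functional_copy = "ATGGCCAAATAA"  # ATG GCC AAA TAA: stop only at the end
duplicate_copy = "ATGGCCTAATAA"   # one A->T change turns AAA into TAA
print(has_premature_stop(functional_copy))  # False
print(has_premature_stop(duplicate_copy))   # True
```

A single base change is enough to truncate the duplicate copy, while the untouched original keeps making the full protein.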

The last type, unitary pseudogenes, resemble unprocessed duplicated pseudogenes in that they arise through mutation, but instead of affecting one copy of a duplicated gene, the mutation disables a gene that is the only copy of itself in the genome. The argument for why there would be no selection pressure against a duplicated pseudogene therefore no longer holds: because the mutated gene has no duplicate, its function has been lost entirely.

As mentioned earlier, scientists used to place pseudogenes in the category of “junk DNA”, treating them as nonfunctional gene lookalikes known mostly as sequences that caused problems in their studies (e.g. PCR experiments). However, with further study, a number of surprisingly interesting findings involving pseudogenes have been uncovered. One specific example, published in Nature in 2010, showed that PTENP1, previously known only as a pseudogene, in fact helps suppress tumor growth in many colon cancer cell lines (Poliseno et al. 2010). The basic idea is that although PTENP1 cannot be translated into a functional protein, its transcripts act as a decoy: microRNAs that would otherwise bind the mRNA of the PTEN gene bind PTENP1 transcripts instead. Because microRNA binding represses translation, drawing microRNAs away from PTEN mRNA leaves more of it free to be translated, producing more PTEN protein. This process is shown below in Figure 2. The authors proposed that PTENP1 should no longer be considered a pseudogene, but instead a “bona fide tumor suppressor gene.”

Figure 2: Visual explanation of how the pseudogene PTENP1 can help suppress tumor growth. Image Credit:
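The decoy mechanism in the PTENP1 example can be illustrated with a deliberately crude toy model, which is our own simplification and not the quantitative model from Poliseno et al. (2010). We assume a fixed pool of microRNA molecules that spreads across target transcripts in proportion to their abundance, and that every bound PTEN transcript is silenced; the function name and all numbers (arbitrary units) are hypothetical.

```python
def free_pten_mrna(pten: float, decoy: float, mirna: float) -> float:
    """Toy estimate of PTEN mRNA left unbound (and so translatable).

    Assumes microRNAs distribute across PTEN and decoy (PTENP1) transcripts
    in proportion to their abundance, and each bound transcript is silenced.
    """
    total_targets = pten + decoy
    if total_targets == 0:
        return 0.0
    bound_to_pten = min(pten, mirna * pten / total_targets)
    return pten - bound_to_pten


# Arbitrary units: 100 PTEN transcripts and 80 microRNAs.
without_decoy = free_pten_mrna(pten=100, decoy=0, mirna=80)
with_decoy = free_pten_mrna(pten=100, decoy=100, mirna=80)
print(without_decoy, with_decoy)  # 20.0 60.0
```

In this toy setting, adding decoy transcripts triples the free PTEN mRNA; the direction of the effect, not its size, is what the sketch is meant to show.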

Even though we just discussed an example where a pseudogene performs an immediate function for an organism, we also need to mention how pseudogenes can matter beyond the scope of an individual organism’s lifetime, on an evolutionary scale. As mentioned earlier, a duplicated pseudogene can undergo genetic drift and acquire multiple mutations with no detrimental effects on the fitness of the individual. This sequence of DNA may later be resurrected into a gene, resulting in a new functional protein (Zhang 2003). Sometimes a simple (e.g. single nucleotide) mutation can cause a huge jump in functional space, even to the point of turning the gene off. It might come as a surprise, but in theory, multiple mutations on a duplicated pseudogene (a larger step in DNA sequence space) can result in a smaller step in functional space, and such mutations can improve an individual’s fitness. Pseudogenes thus provide an avenue for genetic drift, and if the mutated duplicated gene is resurrected, an individual can express a function that improves its fitness.

We have discussed the three main types of pseudogenes, introduced an example showing that some pseudogenes are important in their current state, and discussed why they are valuable from an evolutionary standpoint. Although not every pseudogene has a unique and important function today, those that currently have no purpose are still undergoing genetic drift and could possibly, one day, come to serve a purpose for our descendants many years from now.


Elgar, G. and Vavouri, T. (2008). “Tuning in to the signals: noncoding sequence conservation in vertebrate genomes”. Trends in Genetics, 24(7), 344-352.

Poliseno, L., et al. (2010). “A coding-independent function of gene and pseudogene mRNAs regulates tumour biology”. Nature, 465(7301), 1033-1038.

Zhang, J. (2003). “Evolution by gene duplication: an update”. Trends in Ecology and Evolution, 18(6), 292-298.


Posted in Evolution 101

Selfish Genes and the Resulting Gene Conflict

This Evolution 101 post is by MSU grad student Alex Lalejini

The comic strip above might lead one to believe that the phrase ‘selfish genes’ describes genes that make individuals act selfishly; however, this is not at all what the phrase means. This post gives a brief introduction to selfish genes, a story rich in greed, replication, and conflict. Beyond the drama, the effects and implications of selfish genes are far-reaching, which makes them incredibly interesting.

The Gene Perspective

We are accustomed to thinking about evolution from the perspective of whole organisms: Individual organisms in a population have varying observable characteristics, or phenotypes, as a result of inherited genetic variations. Some variations increase an individual’s ability to compete for resources and reproduce, and through natural selection over many generations, the beneficial genetic variations are propagated through the population of organisms.

However, it is interesting to consider evolution from a more gene-centered perspective. With this perspective in mind, Richard Dawkins has described living organisms as “throwaway survival machines for genes”. While individual organisms inevitably die, the information coded by their genes has the chance to continue from generation to generation. Genes whose phenotypic effects improve a host organism’s chance of survival and reproduction are more likely to persist over many generations than genes with less beneficial or harmful effects. This is not the whole story, however: some genetic elements persist and spread in a population without any regard for their host organism.


Genetic elements may exploit alternative methods of persistence and propagation that do not contribute to their host organism’s fitness. Because these elements are not invested in the host’s fitness, their alternative methods of propagation often harm it. Such genes are selfish: their expression advances their own interests at the expense of other genes and of the host organism as a whole. There are many types of gene selfishness; here I provide an overview of one, transposable elements.

In Genes in Conflict, Burt and Trivers (2009) describe transposable elements as elements that “accumulate by copying themselves into new locations of the genome”; they are often referred to as selfish DNA parasites. Transposable elements are analogous to a self-replicating computer virus that, once on a computer, copies itself many times to avoid removal. Just as such a virus can badly degrade a computer’s performance, transposable elements can strongly reduce the host organism’s fitness. This form of selfish expression is surprisingly widespread: at least 45% of the human genome is derived from transposable elements (Lander et al., 2001)!
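The computer-virus analogy can be sketched numerically. A minimal toy model, assuming each transposable element copy inserts one new copy per generation with a fixed probability u (so the expected copy number grows geometrically) and that each copy multiplies host fitness by a constant factor (1 - s). The function names and parameter values are hypothetical, and real genomes also silence and lose elements, which this sketch ignores.

```python
from typing import List


def te_copy_trajectory(n0: float, u: float, generations: int) -> List[float]:
    """Expected copies per genome when each copy duplicates with probability u per generation."""
    copies = [n0]
    for _ in range(generations):
        copies.append(copies[-1] * (1 + u))
    return copies


def host_fitness(copies: float, s: float) -> float:
    """Multiplicative cost: each copy reduces host fitness by a factor (1 - s)."""
    return (1 - s) ** copies


trajectory = te_copy_trajectory(n0=1, u=0.05, generations=100)
for gen in (0, 50, 100):
    print(gen, round(trajectory[gen], 1), round(host_fitness(trajectory[gen], s=0.001), 3))
```

The model has no feedback from host fitness to element spread, so it only captures the “copy first, cost later” dynamic, not the full evolutionary outcome.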

On a bit of a historical note, we must thank Barbara McClintock and her work with the familiar Thanksgiving holiday staple, multi-colored corn, for the discovery of transposable elements. As McClintock discovered, transposable elements are responsible for the vivid mosaics of color seen in Indian corn. For those with further interest in transposable elements, her work is a great place to start.


We have now seen that selfish genes can increase their own fitness at the cost of other genes and the host organism. As a result, selfish genes are often at odds with the ‘unselfish’ genes that rely on the host organism’s reproductive success to increase in frequency. This tension between opposing interests creates gene conflict. If a selfish gene that lowers organismal fitness is present in a population, there is selection pressure favoring other genes that suppress its expression. As a result, genes with contradictory or conflicting effects evolve.
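The spread of a suppressor can be illustrated with the textbook recursion for selection in a haploid population, p' = p(1 + s) / (1 + ps), where p is the suppressor’s frequency and s is the fitness advantage its carriers gain by silencing the selfish element. This is a simplified sketch under assumptions of our own (haploid genetics, constant s, no drift), not a model taken from the sources cited in this post; the function name and parameter values are hypothetical.

```python
from typing import List


def suppressor_frequency(p0: float, s: float, generations: int) -> List[float]:
    """Haploid selection: suppressor carriers have relative fitness 1 + s."""
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Mean fitness is p * (1 + s) + (1 - p) * 1 = 1 + p * s.
        p = p * (1 + s) / (1 + p * s)
        freqs.append(p)
    return freqs


history = suppressor_frequency(p0=0.01, s=0.1, generations=200)
print(round(history[50], 3), round(history[200], 3))
```

Starting rare, the suppressor rises steadily as long as the selfish element imposes a cost that carriers avoid; if s were 0, its frequency would not change at all.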

Evolutionary Implications

While the immediate effects of selfish genetic elements on host organisms are often negative, there is some evidence to suggest that selfish genetic elements and the resulting gene conflict help to drive evolutionary change and innovation (Werren, 2011). As some genetic elements evolve to get ahead at the expense of the rest of the organism, other genetic elements arise to minimize the negative effects of selfish genes. In the case of transposable elements, the rest of the genome is sometimes able to recruit a transposable element for new cellular functions (Werren, 2011). In this way, selfish genetic elements can be instrumental in pushing organisms toward increasing genetic robustness. Selfish genetic elements do not always bestow long-term benefits on organisms, however; they can also drive a species to extinction, which, perhaps ironically, also means the extinction of the selfish genetic element itself.


Burt, A. & Trivers, R. (2009). Genes in conflict: the biology of selfish genetic elements. Harvard University Press.

Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., … & Grafham, D. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921.

Werren, J. H. (2011). Selfish genetic elements, genetic conflict, and evolutionary innovation. Proceedings of the National Academy of Sciences, 108(Supplement 2), 10863-10870.

Posted in Uncategorized

Evolution 101 – Mutations: From the X-Men to the X-Chromosome

This Evolution 101 post is by MSU grad student Douglas Kirkpatrick

Fig 1: The X-Men

Everyone knows what mutation is, right? It’s that magical scientific hand-wave that gives the X-Men their powers. Almost certainly the result of interaction with gamma radiation or toxic substances, mutation always has the most drastic results on the people that it affects. Right?

Unfortunately for us would-be superheroes, these results are neither likely nor commonplace; the effects of mutation are almost always exaggerated, while its everyday occurrence is typically ignored. Ironically enough, the best example of how mutation affects our everyday lives is found in a brief scene from X-Men: First Class. While wooing a young woman, the young Professor Xavier mentions that, based on her blue eyes alone, she is a mutant, since blue eyes are the result of a specific mutation to DNA. This observation is as close to scientific fact as the movie comes.

The word mutation comes from the Latin “mutare”, which means “to change.” [2] By its roots, then, the word mutation indicates a change or modification to something. In biology, it specifically means a change to an organism’s genetic material, its DNA. Within that broad definition fall many forms of mutation, each with varying results and effects. One overarching classification is the scale at which mutations operate: small scale mutations affect only a single gene, or perhaps only a few DNA base pairs, whereas large scale mutations affect large regions of a chromosome, or even entire chromosomes.

Small scale mutations fall into one of three main types. First are point mutations, in which a single DNA base is changed to a different base, from adenine to cytosine perhaps. These can be caused by radiation, among other things, but are often prevented or corrected thanks to the complementary base pairing of DNA and the assistance of repair proteins. The second type is addition (insertion), in which a sequence of DNA is added at a given location in the chromosome. The third, deletion, is the opposite of addition: a sequence of DNA is removed from a given location. Deletion is visually demonstrated in figure 2. Despite their size, these mutations can still have a large effect on the organism. A single point mutation in the gene for β-globin, one of the subunits of hemoglobin, replaces one amino acid in the protein and is the source of sickle cell disease. [1]
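The three small scale mutation types map neatly onto simple string operations. The sketch below is illustrative only: the DNA string is a made-up sequence (not the real HBB gene) whose codons spell the opening residues of β-globin (the start codon’s Met, then Val-His-Leu-Thr-Pro-Glu-Glu-Lys), and the partial codon table covers only the codons used here. Changing the middle base of the GAG codon for residue 6 turns glutamic acid into valine, the kind of single-base substitution behind sickle cell disease. All function names are hypothetical.

```python
CODON_TABLE = {  # partial standard genetic code, just enough for this example
    "ATG": "Met", "GTG": "Val", "CAC": "His", "CTG": "Leu",
    "ACT": "Thr", "CCT": "Pro", "GAG": "Glu", "AAG": "Lys",
}


def translate(seq: str) -> list:
    """Translate codon by codon; codons missing from the partial table become '?'."""
    return [CODON_TABLE.get(seq[i:i + 3], "?") for i in range(0, len(seq) - 2, 3)]


def point_mutation(seq: str, pos: int, base: str) -> str:
    """Replace the single base at index pos."""
    return seq[:pos] + base + seq[pos + 1:]


def insertion(seq: str, pos: int, fragment: str) -> str:
    """Insert fragment before index pos."""
    return seq[:pos] + fragment + seq[pos:]


def deletion(seq: str, start: int, length: int) -> str:
    """Remove length bases starting at index start."""
    return seq[:start] + seq[start + length:]


# Illustrative sequence: ATG GTG CAC CTG ACT CCT GAG GAG AAG
seq = "ATGGTGCACCTGACTCCTGAGGAGAAG"
print(translate(seq))
print(translate(point_mutation(seq, 19, "T")))  # GAG -> GTG: Glu becomes Val
print(translate(insertion(seq, 3, "A")))        # one extra base shifts the reading frame
print(translate(deletion(seq, 3, 3)))           # removing a whole codon drops one residue
```

Note the contrast between the last two lines: inserting a single base scrambles every codon downstream, while deleting exactly three bases removes one amino acid and leaves the rest intact.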

Large scale mutations can be similarly subdivided. Duplications and deletions occur when regions of the genome are repeated or removed, respectively, from a chromosome. These mutations can cause a certain protein to be overproduced or not produced at all. Underproduction of a protein might result in a buildup of toxic waste that the cell cannot clear, while overproduction might use up the cell’s raw materials; neither is good for the cell. A visualization of these mutations can be found in figure 2. Translocation occurs when a portion of the genome moves from one location to another. This can happen within the same chromosome, or a segment can move to an entirely different chromosome; movement to another chromosome is demonstrated in figure 2. Inversion occurs when a region of the genome is flipped end to end. A gene lying entirely within the inverted region can often still be read, but if a break point falls inside a gene, the gene is split apart, much as a sentence becomes unreadable when half of it is written backwards, and it can no longer be used to produce a functional protein. At the extreme end of this classification is the gain or loss of entire chromosomes, known as aneuploidy. A good example is the extra copy of chromosome 21 that leads to Down syndrome.
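Large scale mutations act on whole stretches of a chromosome rather than individual bases, so a chromosome can be modeled as an ordered list of segment labels. A minimal sketch under that simplification; the labels and function names are hypothetical, and the model ignores base-level detail such as the reverse complementing that happens when real DNA is inverted.

```python
from typing import List, Tuple

Chromosome = List[str]  # ordered segment labels, e.g. genes or regions


def duplicate(chrom: Chromosome, start: int, end: int) -> Chromosome:
    """Repeat segments [start, end) immediately after themselves (tandem duplication)."""
    return chrom[:end] + chrom[start:end] + chrom[end:]


def delete(chrom: Chromosome, start: int, end: int) -> Chromosome:
    """Remove segments [start, end)."""
    return chrom[:start] + chrom[end:]


def invert(chrom: Chromosome, start: int, end: int) -> Chromosome:
    """Reverse the order of segments [start, end)."""
    return chrom[:start] + chrom[start:end][::-1] + chrom[end:]


def translocate(src: Chromosome, start: int, end: int,
                dest: Chromosome, pos: int) -> Tuple[Chromosome, Chromosome]:
    """Move segments [start, end) of src into dest at index pos."""
    moved = src[start:end]
    return src[:start] + src[end:], dest[:pos] + moved + dest[pos:]


chr_a = ["A", "B", "C", "D", "E"]
chr_b = ["X", "Y"]
print(duplicate(chr_a, 1, 3))              # ['A', 'B', 'C', 'B', 'C', 'D', 'E']
print(delete(chr_a, 1, 3))                 # ['A', 'D', 'E']
print(invert(chr_a, 1, 4))                 # ['A', 'D', 'C', 'B', 'E']
print(translocate(chr_a, 0, 2, chr_b, 1))  # (['C', 'D', 'E'], ['X', 'A', 'B', 'Y'])
```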

Mutations can be classified further, for example by their effect on the organism or on cellular function. Effects on an organism can be beneficial, aiding the organism’s fitness; detrimental, reducing its fitness or capabilities; or lethal, killing the cell or organism. Beneficial mutations are rare; detrimental mutations, including those that can cause cancer, are much more common. Where a mutation occurs also affects how it influences an organism. A mutation in a region of the genome that codes for a protein can have a major impact on cell or organism behavior, while mutations in other regions may have only a more indirect impact.

The most important alternative classification, however, distinguishes whether or not the mutation can be handed down to the organism’s offspring. If the mutation occurs in somatic, or general body, cells (e.g., skin cells in humans), its effects are limited to the original organism. However, if the mutation occurs in the germ line, the cells whose genetic material is passed on to offspring (e.g., egg or sperm cells in humans), its effects propagate to new generations. These propagating changes cause populations to change and evolve. We are the result of the compounding effects of thousands of different mutations to our ancestors’ DNA, influencing features from our eye color to our height to our mental capabilities. While we may not possess superpowers, we are all X-Men (or Women) in our own way.

[1] Gabriel, A. & Przybylski, J. (2010). Sickle-cell anemia: A Look at Global Haplotype Distribution. Nature Education 3(3):2
Posted in Evolution 101

4th Annual Big Data in Biology Symposium at the University of Texas in Austin

This post is by University of Texas at Austin grad student Rayna Harris

On Wednesday, May 11, 2016, the Center for Computational Biology and Bioinformatics hosted the 4th Annual Big Data in Biology Symposium at the University of Texas at Austin.



A major theme this year was the intersection of synthetic biology and big data in biology. Just as sequencing genomes is becoming easier and cheaper, so too is modifying and manipulating them. Talks by Pam Silver from Harvard University, Jon Laurent from Ed Marcotte’s lab, and Ross Thyer from Andy Ellington’s lab shed light on how synthetic biology can help us better understand evolutionary processes and build new tools for solving real-world problems.

A number of hands go up after Pam Silver’s excellent talk called “Building Biological Control: Living Therapeutics to Cell Factories”. Check out our Facebook page for more photos.

The other major theme (which is highlighted on our T-shirt and fliers) was the juxtaposition of beauty in nature and in computation. I think it’s awesome that we can use similar computational tools to study diverse biological processes. This was showcased by Lucia Carbone from Oregon Health & Science University (OHSU) and Rachel Wright from Misha Matz’s lab, who use genomics and bioinformatics to study phenotypic diversity in primates and coral reefs, respectively. Ila Fiete and James Howison also presented great research on how they use computation to study behavior across scales, from Fiete’s dissection of neural circuits involved in memory to Howison’s analysis of how communities of people develop and maintain open source software.


A) Survey responses suggest that we offered the right number of talks and that the diversity of speakers was appropriate. B) Overall, the variety of social activities promoted networking, although there is room for improvement.


We try to make each event better and more impactful than the last, and based on the feedback we have received we seem to be successful. Here’s my reflection on what went well, what could have gone better, and how I think the event could be improved next year.

The diversity of speakers. This year, 4 out of 7 speakers were female (greater than 50%)! We invited three trainees from UT and two PIs to share their research with the community. Even though historically the event is meant to showcase research done here at UT, we had enough funding to invite two outside speakers (Pam Silver from Harvard Medical School and Lucia Carbone from OHSU). As you can see from the survey, pretty much everyone thought that the diversity of speakers was just right!

The poster session. This year, 28 scientists presented posters. The double-sided poster stands were arranged nicely in a big space, so there was plenty of room to move around. Even before the session began, people were constantly hanging out by posters and sharing ideas. We gave Best Big Data in Biology poster awards to one undergraduate (Abdurrahman Kharbat, David Stein Lab) and one graduate student (Laura Ferguson, Adron Harris Lab).

The Student-Industry Dinner. For the second year in a row, we solicited and received corporate sponsorship to host an industry mixer. This invitation-only event at the Clay Pit Indian restaurant was designed to promote meaningful exchange between industry and academia. My favorite thing about this event is seeing people who would never otherwise have met engaged in animated conversations over delicious food.


The food. I used to strive for perfect marks on food from even the pickiest of eaters, but I’ve realized that catering is one of those things that will never go as well as I plan.

The length of lunch. In the past, lunch was 1.5 hours long because we held a panel discussion on topics related to big data. This year, we kept the 1.5-hour lunch but dropped the panel discussion. As a result, some momentum was lost and attendance dropped a little for the afternoon session.

The length of the poster session. I thought 1.5 hours would be long enough for the poster session, but I was wrong. One judge didn’t have enough time to complete all the evaluations, and one presenter was mid-sentence when the Facilities clean-up crew arrived to break down her poster stand. Maybe next year we can extend the session to two hours or more.

The industry exhibits. We invited representatives from six local companies to host tables, or “exhibits”, during the poster session. I had never arranged anything like this before, so I didn’t really know what to ask of the representatives. The exhibits could have been a little more exciting or vibrant, since they didn’t attract much of a crowd. I would argue, however, that they were useful for promoting networking, if not during the poster session then later in the evening at the dinner.


Overall, I was really happy with the symposium and mixer, and so were the attendees! I think we have a good system for promoting the spread of knowledge in our community. Next year, we will bring back a more interactive session, like the breakout sessions from previous years, to provide a space for group discussions on specific challenges and opportunities for research. We also hope to extend our reach to industry beyond local companies to bring in representatives from national companies with mutual interest in Big Data in Biology. We look forward to seeing you there!


This event would not be possible without the help of many people!

Director: Hans Hofmann
Coordinator: Rayna Harris
Graphic Design: Nicole Elmer
Administration: Laurie Alvarez
Dinner Coordinators: Sean Leonard, Rebecca Tarvin, Rayna Harris
Corporate Relations: Kristine Haskett, Sumaya Saati
Symposium helpers: Laurie Alvarez, Nicole Elmer, Dennis Wylie, Benni Goetz, Dhivya Arrassappan

Event Sponsors
Mirna Therapeutics
The University of Texas at Austin Graduate School
The Graduate Student Assembly


Posted in BEACON Researchers at Work, BEACONites

Evolving antimutator microbial machines


This post is by University of Texas at Austin grad student Dacia Leon (Twitter: @leondacia)

Fluorescence microplate readers are really exciting. These instruments are a staple in any synthetic biology lab because they allow high-throughput quantification of microbial growth and fluorescence over time: so many experiments, so much data. My lab recently purchased one of these microplate readers, and it rarely experiences the “OFF” state. A typical experimental setup consists of up to ninety-six engineered strains, each containing a fluorescent reporter protein that is commonly used as a proxy for expression of a synthetic device. As synthetic biologists, we use microbes as host organisms to “design-build-test” devices that begin as products of the researcher’s imagination. These synthetic devices vary greatly: some address a pressing societal need, while others aim to explore the limits of biology. Our microplate reader is one way in which we can assay and troubleshoot our devices, and it has therefore earned a valued position in our lab.

Our beloved microplate reader machine doing what it does best – taking time points every 15 minutes.

Reflecting on our enthusiasm for the microplate reader reminded me of a review, written over ten years ago, that discusses the fundamental principles of synthetic biology [1]. The author of the review, Drew Endy, states that one of the greatest limitations in engineering is that machines are built to be single-use: once a machine is no longer functional, or simply antiquated, it is discarded or recycled. For example, our microplate reader, unfortunately, is not designed to self-replicate and produce new generations of microplate readers. But if it could, wouldn’t that be incredible? Directions: incubate at 37°C overnight to yield hundreds of healthy microplate reader colonies. The idea of self-replicating machines may seem absurd until we regard microbes as machines.

Genetic engineering of microbes has existed for decades, with many successes in fields such as drug development, bioenergy, production of industrial chemicals, and agriculture. One of my favorite success stories is the biosynthesis of the antimalarial drug artemisinin in an engineered yeast host strain [2]. Artemisinin is a plant-derived compound, and its traditional method of isolation leads to drastic fluctuations in drug price and availability. Plant-based production of artemisinin involves harvesting biomass from full-grown plants, which takes ~8 months, and then treating it with a solvent to extract the artemisinin. Unfortunately, this method is neither cheap nor stable, and most malaria patients are unable to access the treatment, even though the disease kills over a million people annually. To address this problem, a semi-synthetic artemisinin pathway was constructed in yeast to produce an artemisinin precursor, artemisinic acid, which can be chemically converted into artemisinin. The engineered yeast strain contained multiple modifications, including three heterologous genes from the native artemisinin synthesis pathway and a series of alterations that refine metabolic flux in the yeast host. Once synthesized, artemisinic acid is transported out of the cell and retained on the cell’s outer surface, allowing for rapid purification and subsequent conversion to artemisinin. Compared with the plant-based method, the engineered yeast host produces comparable levels of artemisinic acid over a markedly shorter time period (4-5 days), and using a yeast host removes the erratic nature of plant-based isolation from the process. Current research is focused on optimizing industrial-scale production of artemisinin in yeast in order to advance it as a viable strategy against malaria.

There is one salient factor that I have left out of this story so far. Microbes are replicating machines, but they are highly error-prone in their replication. Errors can stem from various sources, such as environmental stress, the nature of the enzymes involved in DNA replication, and toxic byproducts of a cell’s own metabolism. Whatever their source, these replication errors can become fixed mutations. Engineered microbes carry heterologous synthetic devices whose expression consumes a large proportion of the cell’s resources. These devices cause cellular stress and decrease fitness, creating strong selection for mutations that eliminate expression of any part of the device. Over time, random mutations that naturally occur in a living host organism will accumulate and render a synthetic device inactive. This inherent unpredictability limits microbes as machines and poses a challenging problem for engineers.
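The race between mutation and selection described above can be sketched with a toy model. We assume (pessimistically) that any mutation within the device inactivates it, that u = mu × L is the per-generation chance of such a hit for a device of L base pairs, and that cells carrying an intact device grow at a relative rate 1 - c because of the burden. The function name and all parameter values are hypothetical illustrations, not measurements.

```python
def intact_fraction(mu_per_bp: float, device_bp: int, cost: float, generations: int) -> float:
    """Fraction of a population still carrying a working device (toy model).

    Assumes every mutation in the device inactivates it, and that intact cells
    have relative fitness (1 - cost) compared with cells whose device is broken.
    """
    u = mu_per_bp * device_bp  # chance per generation that a cell's device is hit
    f = 1.0
    for _ in range(generations):
        after_mutation = f * (1 - u)          # intact cells that escaped mutation
        intact = after_mutation * (1 - cost)  # burdened cells grow more slowly
        broken = 1 - after_mutation           # newly and previously broken cells
        f = intact / (intact + broken)
    return f


# Arbitrary illustrative values: a 5 kb device with a 5% burden, over 100 generations.
for mu in (1e-7, 1e-8):
    print(mu, round(intact_fraction(mu, 5000, 0.05, 100), 3))
```

In this sketch a tenfold lower mutation rate leaves a much larger share of the population with a working device after the same number of generations, which is the intuition behind engineering the host rather than only the device.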

PResERV method workflow

To address this issue, one could engineer genetic stability into the host organism, the synthetic device, or both. Synthetic devices can be redesigned by applying strategies such as removing repeat regions, optimizing codon usage, and altering gene expression; these strategies have been shown to increase the lifetime of the device encoded in the host. The host side, by contrast, has been largely unexplored. My work hypothesizes that the stability of the host genome can be significantly improved by decreasing the host organism’s natural mutation rate, lowering the probability that the encoded synthetic device will mutate and become inactive. By lowering the baseline mutation rate, the host organism can be altered to accommodate any synthetic device, ensuring its long-term stability in the host. My goal is to engineer genetically stable host organisms and to understand the cellular mechanisms required for genetic stability. Our lab has developed an iterative, universal method called Periodic Reselection for Evolutionarily Reliable Variants (PResERV), which uses a fluorescent reporter gene to enrich a population for genetically stable strains. Mutants that maintain long-term expression of the synthetic fluorescent reporter gene are putative candidates with reduced mutation rates. These mutants are then isolated and sequenced to determine the causative mutations. PResERV will be used to identify genetically stable mutants in the two most commonly used host organisms, E. coli and yeast. By applying PResERV in both organisms, I will engineer reduced-mutation variants and learn about the relevant mechanisms in each organism.
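The logic of periodic reselection can be sketched with a deterministic toy model. This is a hypothetical illustration, not the lab’s actual PResERV protocol or code: we assume two genotypes that lose reporter fluorescence at different per-generation rates (a wild type and a lower-mutation-rate variant), and we treat each reselection round as keeping only the cells that are still fluorescent. The function name and all parameter values are arbitrary.

```python
from typing import List


def reselection_rounds(p_low: float, u_wt: float, u_low: float,
                       gens_per_round: int, rounds: int) -> List[float]:
    """Track the frequency of a low-mutation-rate variant across reselection rounds.

    Within a round, each genotype keeps its fluorescent reporter with probability
    (1 - u) per generation; reselection then keeps only still-fluorescent cells.
    """
    history = [p_low]
    for _ in range(rounds):
        bright_low = p_low * (1 - u_low) ** gens_per_round
        bright_wt = (1 - p_low) * (1 - u_wt) ** gens_per_round
        p_low = bright_low / (bright_low + bright_wt)
        history.append(p_low)
    return history


# A rare variant (1 in 10,000) that loses fluorescence 10x less often than wild type.
history = reselection_rounds(p_low=1e-4, u_wt=1e-2, u_low=1e-3, gens_per_round=50, rounds=20)
for r in (0, 10, 20):
    print(r, f"{history[r]:.4f}")
```

Because the variant’s odds of surviving each round are multiplied by the same factor every time, even a rare low-mutation-rate variant can come to make up a large share of the population after enough rounds, which is why reselection is repeated rather than done once.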

Here are some of the questions I hope to answer with my research:

1.) How conserved are the cellular mechanisms that reduce mutation rates across species?
2.) Can I develop general design principles for lowering mutation rates in any organism?
3.) What is the limit to reducing mutation rates? Does this depend on the host organism?
4.) How is the robustness of a reduced mutation host affected by increasingly complex synthetic devices?

Additionally, I think my work can provide knowledge and resources for the synthetic biology community. A collection of genetically stable host organisms will allow engineers to tackle more challenging problems without being limited by inactivating mutations.

[1] Endy, D. Foundations for engineering biology. Nature 438, 449-453 (2005).
[2] Ro, D.-K. et al. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature 440, 940-943 (2006).

Posted in BEACON Researchers at Work

Anonymity. Does anyone have it?

This post is by North Carolina A&T grad student Siobahn Day

Greetings! My name is Siobahn Day. I’m currently a PhD student in Computer Science at North Carolina A&T State University, where I work as a graduate researcher in the Center for Advanced Studies in Identity Science. I have developed the concept of Adversarial Authorship as a means of preserving author anonymity, and I’m currently developing and evaluating an Interactive Evolutionary Computation approach to Adversarial Authorship that allows users to conceal their writing style.

This research is particularly important to me because, although technology has advanced rapidly over the years, our laws (in the US) have not kept pace. With the rapid growth of the internet and social networks, it has become very hard to remain anonymous, and as a result many Anonymous Social Networks (ASNs) have arisen. Some believe that privacy is dead, and I’d like to see what could be done to change that outlook. I’m excited to share some of my current research, including snippets of a publication that will appear in the proceedings of the 25th International Conference on Computer Communication and Networks (ICCCN 2016) later this year. BEACON has given me new and innovative ways of looking at my problem in order to find an effective solution.

Over the last few years, we have seen an increase in the number of Anonymous Social Networks (ASNs). What many internet users may not know is that their writing style can be tracked across the internet, even through an ASN. The good news is that, using a technique referred to as Adversarial Stylometry, one can effectively imitate another person's writing style or obfuscate one's own, concealing one's true writing style in the short term. The bad news is that recent research has shown that Adversarial Stylometry is not effective at concealing one's writing style over the long term. We introduce a number of underlying concepts that will allow users to conceal their writing style over the long term; one such concept is what we call Adversarial Authorship.
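
As a rough illustration of why writing style is trackable, the sketch below uses one classic stylometric idea: compare how often authors use common function words, then attribute an anonymous text to the closest known author. This is a generic, hypothetical example (the word list, `style_vector`, `cosine`, and `attribute` are all assumptions for illustration), not the EBEC method or the AuthorWeb described in this post.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "i", "but", "so", "very", "just"]


def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two style vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def attribute(anonymous_text, known_samples):
    """Guess which known author's style is closest to the anonymous text."""
    target = style_vector(anonymous_text)
    return max(known_samples,
               key=lambda author: cosine(target, style_vector(known_samples[author])))


samples = {
    "author_a": "I just think it is very nice, but I just do it so very often",
    "author_b": "The study of the structure of the cell and the role of the gene in the organism",
}
print(attribute("I just feel it is so very odd, but I just go", samples))  # prints author_a
```

Even this crude profile links the anonymous sentence to the author who habitually writes "I just" and "so very", which hints at why short-term obfuscation is hard to sustain across many posts.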

In Adversarial Authorship, authors are provided with an AuthorWeb, which lets them see graphically how their writing style compares with those of others in the AuthorWeb. The AuthorWeb presented here uses Entropy-based Evolutionary Clustering (EBEC) to cluster writing styles, and our results show that EBEC outperforms a number of other machine learning techniques for author recognition. Users of an AuthorWeb can then direct their writing toward clusters they specify in an effort to conceal their own writing style.

If this research interests you, feel free to read my previous publication, Towards the Development of a Cyber Analysis & Advisement Tool (CAAT) for Mitigating De-Anonymization Attacks. You can also visit my research team at the Center for Advanced Studies in Identity Science. I look forward to sharing much more of my research with BEACON as it evolves.

Posted in Uncategorized

Evolving Deep Neural Networks

This post is by UT Austin grad student Jason Liang

Deep learning has revolutionized the field of machine learning in many ways. From achieving state-of-the-art results in many benchmarks and competitions to effectively exploiting the computational power of the cloud, deep learning has received widespread attention not just in academia but also in industry. It has helped researchers and scientists obtain state-of-the-art results in speech recognition, object detection, time-series prediction, reinforcement learning, sequential decision-making, video/image processing, and many other supervised and unsupervised learning tasks. One of the leaders in this field is Sentient Technologies, an AI startup based in San Francisco that specializes in financial trading, e-commerce, and healthcare applications using deep learning, evolutionary computation, and other machine learning and data science approaches. I am currently working as an intern at Sentient, developing ways to make deep learning not only easier to implement but also applicable to more general problem domains. This internship allows me to transfer my dissertation research to industry, and it also gives me access to the computational resources that make such work possible.

Deep learning, despite its newfound popularity in the machine learning and artificial intelligence community, is actually an extension of decades-old neural network research; the major difference is that both dataset sizes and available computing power have increased exponentially. One problem with deep learning is that architecture design has a large impact on performance, and some problems require specialized architectures. For example, the GoogLeNet architecture (shown below), which won the 2014 ImageNet competition for image classification, contains specialized submodules that are themselves deep networks. Also, as networks become more complex, the number of parameters and configurations that need to be optimized increases as well. At Sentient, my advisor Risto Miikkulainen and I are developing evolutionary algorithms to automatically discover and train the best deep neural networks for a particular problem. Our vision is to eventually create a general framework that is applicable to any problem and uses machines to automate AI and machine learning research.
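
The core loop of evolving network architectures can be sketched in a few lines of Python. This is a toy, hypothetical example, not Sentient's algorithm: each genome is simply a list of hidden-layer widths, and `toy_fitness` is a made-up stand-in for the expensive step of actually training each network and measuring its accuracy.

```python
import random

# Hypothetical encoding: an architecture is a list of hidden-layer widths.
LAYER_WIDTHS = [16, 32, 64, 128]


def toy_fitness(genome):
    """Stand-in for 'train this network and measure validation accuracy'.

    Assumption for illustration only: the ideal network has three layers of
    width 64. Higher is better; 0 is the maximum.
    """
    target = [64, 64, 64]
    penalty = 50 * abs(len(genome) - len(target))
    penalty += sum(abs(a - b) for a, b in zip(genome, target))
    return -penalty


def mutate(genome, rng):
    """Return a copy of the genome with one random structural change."""
    child = list(genome)
    roll = rng.random()
    if roll < 0.2 and len(child) > 1:
        child.pop(rng.randrange(len(child)))  # remove a layer
    elif roll < 0.4:
        child.insert(rng.randrange(len(child) + 1), rng.choice(LAYER_WIDTHS))  # add a layer
    else:
        i = rng.randrange(len(child))
        child[i] = max(1, child[i] + rng.choice([-16, 16]))  # resize a layer
    return child


def evolve(pop_size=20, generations=30, seed=0):
    """Elitist loop: keep the top quarter, refill the population with mutated copies."""
    rng = random.Random(seed)
    population = [[rng.choice(LAYER_WIDTHS)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=toy_fitness, reverse=True)
        elite = ranked[: max(1, pop_size // 4)]
        best_per_generation.append(toy_fitness(elite[0]))
        children = [mutate(rng.choice(elite), rng) for _ in range(pop_size - len(elite))]
        population = elite + children
    return max(population, key=toy_fitness), best_per_generation


best, history = evolve()
print("best architecture:", best, "fitness:", toy_fitness(best))
```

Carrying the elite over unchanged means the best score never decreases between generations. In a real system, every call to `toy_fitness` would mean training a full network, which is where the computational cost comes from.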

GoogLeNet architecture

One of the downsides of deep learning is that training a neural network is very computationally intensive. Most networks of moderate or greater complexity take hours, if not days, to train on machines with powerful GPUs. This cost is even worse for evolving deep networks, since a whole population of networks must be trained and evaluated in every generation. Because of these immense computational requirements, evolutionary deep learning has until now been considered impractical. Fortunately, Sentient has developed a massively scalable evolutionary algorithm that runs on millions of CPUs all over the world to evolve stock trading agents. We are currently extending it to use GPUs as well, so that the networks in a population can be trained in parallel. This framework will eventually scale to hundreds of thousands of GPUs. Since GPUs are expensive and relatively rare, we are also looking at ways of using CPUs to train deep neural networks. If the training of a single network can be parallelized across many CPU machines, then it becomes truly possible to scale up the evolution of neural networks to millions of machines.

As computing power becomes faster and cheaper, I believe there is going to be a lot of new interest in applying evolutionary algorithms to deep networks. This approach should be particularly useful for automatically discovering new architectures for new problem domains, such as understanding cluttered images, video, and natural language, as well as reinforcement learning and sequential decision making. Because this process depends on extreme computational resources, combining the resources of academia and industry should be especially productive.

Posted in BEACON Researchers at Work

Mass Extinctions, Evolution, and…. Robots?

Check out this great video produced by the UT Alumni Association talking about research by BEACONites Joel Lehman and Risto Miikkulainen at UT Austin.

Lehman and Miikkulainen published an awesome paper in PLOS ONE looking at evolution after a mass extinction.  I, for one, welcome our new robot overlords.

Here’s their abstract:

Extinction events impact the trajectory of biological evolution significantly. They are often viewed as upheavals to the evolutionary process. In contrast, this paper supports the hypothesis that although they are unpredictably destructive, extinction events may in the long term accelerate evolution by increasing evolvability. In particular, if extinction events extinguish indiscriminately many ways of life, indirectly they may select for the ability to expand rapidly through vacated niches. Lineages with such an ability are more likely to persist through multiple extinctions. Lending computational support for this hypothesis, this paper shows how increased evolvability will result from simulated extinction events in two computational models of evolved behavior. The conclusion is that although they are destructive in the short term, extinction events may make evolution more prolific in the long term.

Posted in BEACON in the News, BEACON Researchers at Work, BEACONites, Member Announcements

A microbe-dependent world: studying the legume-rhizobia symbiosis for a more sustainable future

This post is by MSU grad student Shawna Rowe

Living in a world full of fascinating visual elements and intriguing macro-organisms often leads people to forget the most abundant group of Earth’s inhabitants: microbes. Microbes are not only the most abundant and diverse group of living organisms but also, in my personal opinion, the most fascinating. Whether it is the Demodex brevis that colonize human faces, the rhizobia that live in our soils, or the Thermus aquaticus that live in the hot springs of Yellowstone, microbes are inescapable and responsible for countless biological processes.

One group of bacteria, the rhizobia, are soil-dwelling and underappreciated powerhouses of agricultural productivity. These bacteria form a specialized relationship with leguminous plants (soybean, bean, lentils, peanuts, etc.) in which they supply nitrogen, a globally limiting resource, in exchange for carbon. When undisturbed, this interaction naturally increases soil nitrogen content. Agricultural soils are frequently nitrogen-limited, so farmers apply approximately 80 million tons of nitrogen fertilizer to agricultural fields each year! This practice has increased crop yields at the expense of the environment: toxic algal blooms pollute water sources, microbial communities have been destroyed, fossil fuels are burned to produce the fertilizers, and gaseous nitrogen compounds are released into the atmosphere. Fortunately, the relationship between legumes and rhizobia offers an opportunity to offset the excessive use of fertilizers and begin shifting away from these environmentally detrimental practices.

Medicago truncatula, the model legume I conduct research on, growing in two different types of growth containers. The fully encased containers (test tubes) provide sterile conditions for assays that require a more controlled environment.

In this relationship, the host legume provides the infrastructure in the form of specialized organs known as nodules, inside which the hardworking rhizobia live. The nodules serve as a protected space for the microbes to reproduce and expand as they complete the energetically expensive task of converting N2 to NH3. Years of evolutionary pressure have resulted in a very tightly controlled balance of resource trade. However, as in most relationships, there are opportunities for trouble: in this mutually beneficial relationship, rhizobial partners can take more resources from the host plant while supplying comparably less nitrogen. This behavior has been termed “cheating.” Cheaters are problematic because they threaten to destabilize this long-established and important relationship, an outcome that would further strengthen our dependence on nitrogen fertilizers in the agriculture sector. In Dr. Maren Friesen’s lab, I aim to elucidate the molecular mechanisms of this resource trade between legumes and rhizobia. My work focuses on understanding how host plants are able to differentially recognize and respond to rhizobial partners of varying effectiveness. Understanding these response and control mechanisms is critical for understanding how microbes are able to exploit their hosts and how external pressures are driving the emergence of cheaters.

Shawna working in a biosafety cabinet in the Friesen lab space

As a native of southwest Missouri (Missouri ranks 6th in U.S. soybean production), I spent most of my life surrounded by agricultural fields. Traveling to school frequently involved getting stuck behind a tractor when planting season arrived. Future Farmers of America was the largest student organization, and roughly half of the student population had milked a cow before the age of 10. Although charming and hardworking, small agricultural towns are often inherently (but unintentionally) anti-science. STEM education was severely lacking, and evolution was a dirty word capable of eliciting dramatic arguments and endless frustration. Because of this, I loathed the idea of working in agriculture. Upon graduating high school, I entered college as a Biochemistry major with no clear idea of what “biochemistry” was or what I could do with it. I was fortunate enough to land a job in a plant biochemistry research lab that focused on understanding the basic mechanisms of plant immune responses to pathogenic bacteria. That job set the stage for my future research interests. I discovered the complex world of molecular signaling events and microbial associations. I learned about the co-evolution of organisms that commonly associate and how these associations drive the development and establishment of complex features of host-microbe interactions. I fell in love with the unseen world.

Years later, these experiences still serve as the foundation for the questions I ask and the topics I find intriguing. In the Friesen lab, I hope to better understand how hormones, specialized proteins, and various other plant-derived molecules serve as regulatory components of the unique relationship leguminous plants have with the microbial world. Further developing our understanding of these regulatory mechanisms will shed light both on the co-evolution of legumes and rhizobia and on the factors that threaten to destabilize this biologically important relationship.

Posted in BEACON Researchers at Work

Male battles split species apart

Picture of me: Behind me are some of the hundreds of fish tanks in the basement of Giltner containing all the baby sticklebacks we generated for this experiment.

This post is by MSU postdoc Jason Keagy

How do species form?

Stated more precisely, how does one species become two? This turns out to be an immensely difficult question to answer, because 1) species are not always distinct entities (species definitions are argued about ad nauseam [1]) and 2) the formation of species (speciation) is a process that often takes a long time to complete.

One way in which species could form is if selection is divergent and a population responds to that selection [2]: for example, Anolis lizards that have adapted such that each species has limbs optimal for living in different types of vegetation [3], or insects that have specialized on different host plants [4]. One way to represent the relationship between phenotypes (traits such as limbs, coloration, or digestive enzymes) and fitness is with a “fitness landscape” [5], so called because in three-dimensional representations (e.g., two traits as the x and y axes and fitness as the z axis), it can resemble a landscape of peaks and valleys. However, there are few good empirical examples of such landscapes, because fitness is often difficult to measure and often depends on multiple independent phenotypic traits in complicated ways.
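
A two-trait fitness landscape of the kind described above can be written down directly. The sketch below is purely hypothetical: the peak positions, heights, and `WIDTH` are made-up numbers, not measurements from any stickleback study. It models fitness as a sum of two Gaussian peaks and scans a coarse grid of trait values for local maxima, which recovers two peaks separated by a low-fitness valley.

```python
import math

# Hypothetical landscape: two peaks in a space of two standardized traits.
PEAKS = [(-1.0, -1.0, 1.0), (1.0, 1.0, 1.0)]  # (trait 1, trait 2, peak height)
WIDTH = 0.5  # how quickly fitness falls off away from a peak


def fitness(x, y):
    """Fitness at trait values (x, y): a sum of Gaussian bumps, one per peak."""
    return sum(h * math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * WIDTH ** 2))
               for px, py, h in PEAKS)


def local_maxima(step=0.5, half_range=4):
    """Grid points whose fitness is strictly higher than all of their neighbors."""
    coords = [i * step for i in range(-half_range, half_range + 1)]
    grid = {(x, y): fitness(x, y) for x in coords for y in coords}
    maxima = []
    for (x, y), w in grid.items():
        neighbors = [grid.get((x + dx * step, y + dy * step))
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
        # Points on the grid edge have missing (None) neighbors, which are ignored.
        if all(n is None or w > n for n in neighbors):
            maxima.append((x, y))
    return maxima


print("peaks:", local_maxima())
print("fitness at a peak:", round(fitness(-1.0, -1.0), 3))
print("fitness in the valley:", round(fitness(0.0, 0.0), 3))
```

A population sitting between the peaks has low fitness, so selection pushes it toward one peak or the other; that is the intuition for why a multi-peaked landscape can favor divergence.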

The power of sticklebacks

In some freshwater lakes in British Columbia, you can find two different types of stickleback, called “benthics” and “limnetics,” that are reproductively isolated and are therefore typically referred to as species. These benthic and limnetic sticklebacks are descended from marine sticklebacks that bred in glacially fed streams. After the glaciers melted ~12,000 years ago, the removal of the ice’s weight caused the land to rebound, and the uplifted streams became isolated lakes. Because these differences arose over such a relatively short timescale, these fish have become a model system for studying adaptation and speciation.

What is the difference between benthic and limnetic sticklebacks? Limnetics live in open water, eat plankton, and are more visually oriented, whereas benthics eat invertebrates off of plants or the lake bottom, live in complex, spatially structured vegetated habitats, and are more dependent on smell. Limnetic and benthic sticklebacks also differ in body size, shape, and mating traits. In other words, they are really different! Strong reproductive isolation is critical for maintaining these differences, so the Boughman lab has long been interested in understanding what influences this isolation.

The role of male competition

Typically, the focus in speciation research has been on natural selection (even in sticklebacks). Much less studied, and more controversial, is whether sexual selection can drive speciation. The role of intrasexual (often male-male) competition has received especially little attention. That seems like a pretty big oversight to me. Flip on any nature show and you’re sure to see at least one scene of males bashing each other to pieces. It turns out Jenny Boughman, Liliana Lettieri, and I were already working on a project that was perfect for studying how male competition might impact speciation.

Fig. 1. Males compete intensely over territories on which they build nests. Pictured here are three males in a tank at KBS. The male in the foreground is directly over his nest. It’s pretty well concealed!

Male sticklebacks compete for territories on which they build their nests (Fig. 1). They’ll even destroy each other’s nests and steal pieces such as the choicest algae. Eventually, these males will try to attract females via courtship behavior to convince them to lay eggs in their nests. Male competition is extremely important in determining male fitness: if males can’t successfully obtain and keep a territory, and build and keep a nest, they are unable to reproduce (we rarely see sneak spawning). Male competition could have important impacts on speciation because males of each species build nests very close to each other in nature and are therefore direct competitors for space and resources.

Our research

Our main research questions included: How do male phenotypes relate to male competitive fitness? Do the resulting fitness landscapes have multiple peaks? Would these peaks promote speciation? We created hundreds of hybrid males in the laboratory through artificial crosses, which greatly expanded the range of phenotype combinations beyond that seen in the wild. We then put these males in large outdoor tanks at Kellogg Biological Station containing sand, algae, and food collected from nearby ponds. We measured lots of physical traits on the males and spent hundreds of hours recording their competitive behavior (with the help of an awesome army of undergrads).

Fig. 2. Be really careful about what you are taking with you into water bodies. Your actions can have serious evolutionary and ecological consequences!

Our research revealed some surprises [6]. First, there were indeed two fitness peaks corresponding to pure benthic and pure limnetic multivariate phenotypes. But there was also another region of high fitness (a bridge connecting the peaks), which implies that certain intermediate hybrids were also good competitors. Interestingly, these hybrids had phenotypes like those of fish now found in Enos Lake, where, after an anthropogenic disturbance (someone released crayfish into the water, Fig. 2), formerly distinct benthic and limnetic species have collapsed into a hybrid swarm (a depressing example of evolution in action). The hybridization had previously been attributed to a change in natural selection: after the crayfish were introduced, generalist sticklebacks survived better than specialists [7]. These generalists would have been produced by hybridization, which had previously occurred at inconsequential rates, but this trickle would have grown as hybrids began surviving to adulthood. However, our results show that sexual selection through male competition may also have contributed to, and sped up, the species collapse. Hybrid males with phenotypes corresponding to the bridge in our fitness landscape would likely have been very successful at obtaining nests, increasing the likelihood of further hybridization. Our data strongly suggest that male competition could be very important in the speciation process and could affect speciation in complex ways.

[1] As one example of this disagreement, see Wu, C-I. 2001. The genic view of the process of speciation. Journal of Evolutionary Biology. 14: 851-865 and the ten responses.
[2] For a book dedicated to this topic, see Nosil, P. 2012. Ecological speciation. Oxford: Oxford University Press.
[3] A nice HHMI video description of this research is here:
[4] There are many nice examples of this including 1) pea aphids that have diverged to specialize on red clover and alfalfa, 2) fruit flies feeding on different species of cactus, 3) the races of apple maggot fly that feed on either hawthorn or apples, and 4) stick insects adapted to wildly different plants in California.
[5] There is some disagreement over what specifically “fitness landscape” refers to and what the proper term is for what I refer to as a “fitness landscape” here (especially among philosophers of science). You can read about it in the first section of this book: Svensson, E., Calsbeek, R. (eds) 2012. The Adaptive Landscape in Evolutionary Biology. Oxford: Oxford University Press.
[6] Keagy, J., Lettieri, L., Boughman, J.W. 2016. Male competition fitness landscapes predict both forward and reverse speciation. Ecology Letters. 19: 71-80.
[7] Behm, J.E., Ives, A.R., Boughman, J.W. 2010. Breakdown in postmating isolation and the collapse of a species pair through hybridization. American Naturalist. 175: 11–26.

Posted in BEACON Researchers at Work, Notes from the Field