Artificial selection and correlated traits

This Evolution 101 post is by MSU grad student Tyler Derr

One of the basic mechanisms of evolutionary change is natural selection. In his famous book, On the Origin of Species, Charles Darwin defined natural selection as the “principle by which each slight variation [of a trait], if useful, is preserved” (Darwin 1859). Knowing that many people would be skeptical of his argument, Darwin structured his first chapter around selection in the context of breeding, presenting examples in which humans have selectively bred both animals and plants.

Artificial selection (also called selective breeding) is a process in which humans take the place of natural selection to obtain traits we desire in an organism. It is performed by choosing which animals or plants are allowed to mate with each other, in the hope that if both parents have a certain observable trait, their offspring will as well. Shown in Fig. 1 are common vegetables that were all cultivated from wild mustard by past farmers artificially selecting different traits in the plant.

Fig. 1. Vegetables that arose from the wild mustard due to selecting for different traits.

Selective breeding has been performed on many animals, and dogs are a classic example: thanks to it, there are now hundreds of different breeds. Only recently, thanks to whole-genome sequencing, did we discover that gray wolves and dogs diverged from a common ancestor roughly 27,000-40,000 years ago (Skoglund 2015). Although today we breed dogs for traits such as cute floppy ears or a fluffy coat, these would probably not have been at the top of the list for humans that long ago: based on when dogs first arose, this would have been when humans were still hunter-gatherers (Freedman 2014). Fig. 2 below shows a few dog breeds arranged by where they were selectively bred, based on geographical location, and by which earlier breeds they were bred from.

So if humans tens of thousands of years ago did not select for the same traits we select for today, what did they desire? It can be hypothesized that they simply wanted dogs that were non-aggressive. Although we normally think of evolution as a very long process (which it is under natural selection), the game changes when humans get involved. In 1959, the Russian geneticist Dmitry K. Belyaev (shown in Fig. 3) began a study in the hopes of breeding a population of tame foxes (Trut 1999). Belyaev selected solely for tameness and strictly against aggression when breeding. After just ten generations, 18% of the pups were not only tame but also showed signs of affection, such as whimpering for attention and even licking the experimenters (Trut 1999). Evolution in Action: The Silver Fox Experiment, a short clip from the BBC documentary The Secret Life of Dogs, discusses the early stages of the fox experiment and shows the progress that has been made over the last 50 years.

Fig. 2: Examples of how selective dog breeding has branched from the common ancestor based on geographical location. Cain, Michael L., Damman, Hans, Lue, Robert A. and Carol Kaesuk Yoon. Discover Biology Second Edition. New York: W. W. Norton, 2002.

As mentioned before, the only trait selected for in the fox experiment was tameness; interestingly enough, however, quite a number of other traits came with it. Physical traits such as a curly tail instead of a straight one, floppy ears, and shorter limbs began to appear in the tame foxes, traits commonly shared among other domesticated animals (Trut 1999). What was just described is known as the correlation of traits, another interesting topic when discussing artificial selection: selecting for a specific trait passes on not only that trait, but also a set of other traits that are genetically correlated with it.
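To make the correlated response concrete, here is a minimal simulation sketch (my own illustration, not data from Trut's study; the genetic correlation of 0.6 is an invented value) in which breeders select only on tameness, yet a correlated trait shifts too:

```python
# Toy model: two traits share genetic variance, so selecting on one moves the other.
import numpy as np

rng = np.random.default_rng(42)
gen_corr = 0.6  # assumed genetic correlation between tameness and ear floppiness
cov = [[1.0, gen_corr], [gen_corr, 1.0]]

pop = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)
tameness, ears = pop[:, 0], pop[:, 1]

# Breeders keep only the tamest 10%; ear shape is never looked at.
keep = tameness >= np.quantile(tameness, 0.9)
print(f"selected parents: mean tameness = {tameness[keep].mean():.2f}")
print(f"correlated shift: mean ear score = {ears[keep].mean():.2f}")
# Ear score rises anyway, because the two traits are genetically correlated:
# this is the correlated response to selection.
```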

Fig. 3: Belyaev shown with some of the tame foxes that were bred in his experiment.

 

As seen from both the plant and animal examples, humans can have a large impact on a species when performing selective breeding. It will be interesting to see how organisms will be changed to fit the wants and needs of humans in the future.

References:

Darwin, C. (1859). “On the origin of species by means of natural selection, or by preservation of favoured races in the struggle for life”. London: John Murray.

Freedman, A., et al. (2014). “Genome sequencing highlights the dynamic early history of dogs”. PLoS genetics 10(1): e1004016.

Skoglund, P., et al. (2015). “Ancient wolf genome reveals an early divergence of domestic dog ancestors and admixture into high-latitude breeds”. Current Biology 25(11): 1515–9.

Trut, L. (1999). “Early Canid Domestication: The Farm-Fox Experiment”. American Scientist 87(2): 160-169.

 


It’s a (Selective) Sweep for the Good Genes!

This Evolution 101 post is by MSU grad student Douglas Kirkpatrick

Fig. 1: LeBron James helps lead the Cavaliers in a sweep of the Celtics.

In baseball, ice hockey, and basketball, when a team wins every game in a playoff series, it is said to have swept the other team out of the playoffs. In other words, a sweep is the complete victory of one group over another, often thanks to a star player. Funnily enough, this terminology applies almost as well in the world of evolutionary biology. A selective sweep is the process by which strong selection of a beneficial allele causes the surrounding linked alleles to rise to higher frequency in the population. In the same way that superstar LeBron James led his team, the Cleveland Cavaliers, to a sweep of the Hawks and the Celtics in last year's playoffs, a good gene can help linked alleles attain reproductive success in a selective sweep.

The essence of the process is that when a hugely beneficial allele appears, it is rapidly selected for. Because selection is so rapid, many linked or nearby genes are also passed on to descendants, even though these other alleles may be neutral or even mildly deleterious to the organism. This process is illustrated in Figure 2: the extremely beneficial gene is highlighted in red, deleterious or neutral genes are shown in blue, and each line represents the genome of a single member of the population. The pattern of the beneficial gene and its neighbors, seen in the fourth row on the right, is collectively referred to as a “haplotype.” The haplotype containing the advantageous gene becomes much more frequent after selection occurs, as new generations are spawned [3].
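To see the hitchhiking effect in action, here is a toy Wright-Fisher simulation (my own sketch; the population size, selection coefficient, and starting frequency are arbitrary choices, and recombination is ignored so linkage is absolute):

```python
import random

N, s, gens = 1000, 0.05, 300
n_start = 50  # start the beneficial mutation at 5% so drift rarely loses it
# One individual = one haplotype: (beneficial allele present?, linked neutral allele present?)
pop = [(True, True)] * n_start + [(False, random.random() < 0.5) for _ in range(N - n_start)]

for g in range(gens):
    weights = [1 + s if hap[0] else 1.0 for hap in pop]
    pop = random.choices(pop, weights=weights, k=N)  # selection + drift, no recombination
    freq_b = sum(h[0] for h in pop) / N
    freq_n = sum(h[1] for h in pop) / N
    if freq_b in (0.0, 1.0):
        break

print(f"generation {g}: beneficial allele {freq_b:.2f}, hitchhiking neutral allele {freq_n:.2f}")
# The neutral allele started near frequency 0.52 overall, but the copies linked
# to the beneficial mutation ride along with it to high frequency: a sweep.
```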

Fig 2: A basic biological selective sweep

Several methods exist to detect selective sweeps; the primary one is to measure linkage disequilibrium. That is, the distributions of alleles in a population are compared to determine the presence of haplotypes. If a specific haplotype, or collection of linked alleles, is exceedingly common, then a selective sweep likely occurred in the recent past. The linked alleles are primarily those that are collocated within the genome. An alternative way to find instances of a selective sweep is to measure the time to the most recent common ancestor: if the alleles in a region all descend from a recent common ancestor, its haplotype has likely spread quickly through the population, making it probable that a selective sweep has occurred.
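As a rough sketch of the first method, the snippet below computes the classic linkage-disequilibrium statistic r² between two loci from a list of two-locus haplotypes (the haplotype counts are made up for the example):

```python
from itertools import product

def r_squared(haplotypes):
    # haplotypes: list of (allele_at_A, allele_at_B) pairs, alleles coded 0/1
    n = len(haplotypes)
    pA = sum(a for a, _ in haplotypes) / n
    pB = sum(b for _, b in haplotypes) / n
    pAB = sum(1 for a, b in haplotypes if a and b) / n
    D = pAB - pA * pB  # deviation from what independent loci would show
    return D * D / (pA * (1 - pA) * pB * (1 - pB))

# After a sweep, alleles at nearby loci travel together, so r^2 is high:
swept = [(1, 1)] * 90 + [(0, 0)] * 10
print(r_squared(swept))  # 1.0 -> perfect association
# With no sweep, the four haplotypes occur at independent frequencies:
mixed = [(a, b) for a, b in product([0, 1], repeat=2) for _ in range(25)]
print(r_squared(mixed))  # 0.0 -> no association
```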

Two examples of selective sweeps are particularly relevant in the modern world: in pathogens and in agriculture. Pathogenic microbes have short life spans, so any new allele that creates a more virulent form will spread rapidly as the less potent haplotypes die off. This potential for rapid change, combined with strong selective pressure from outside forces such as antibiotics or antiviral drugs, leads to many selective sweeps. The short life cycles and high selective pressures placed on pathogens have been shown to cause selective sweeps in influenza and Toxoplasma gondii [1].

While selective sweeps in pathogens may be detrimental to mankind, other selective sweeps have helped humanity. Data show that selective sweeps were responsible for unifying a diverse population into what we know as modern corn. Artificial pressure, driven by farmers choosing only the best offspring, combined with selective crossbreeding, forced a fast evolution of corn. The traits selected for and the strong evolutionary pressure created a selective sweep [5], which in this case proved largely beneficial for people.

Figure 3: Selective sweeps in human populations. Tishkoff et al., 2007.

Numerous examples of selective sweeps can be found in human DNA; at least six different chromosomes show evidence of them [4]. The most notable, described by Tishkoff et al., results from the alleles for lactose tolerance. There was strong selection for the alleles that allow humans to digest milk as adults, and as a result selective sweeps occurred; this is illustrated in Figure 3. Interestingly, similar mutations arose independently in Africa (Group A) and Eurasia (Group B). Each bar represents the portion of an individual's genome that is shared with the group; a larger bar indicates that more genetic material matches the ancestral haplotype. The horizontal axis is the position relative to the lactase allele. The red and green in each graph show the tolerance haplotype that has become more common, while the blue and orange show older haplotypes that were outcompeted [2]. This information is essential to understanding how similar adaptations evolved independently in humans, and it is likely to remain important given how many selective sweeps have occurred in the human genome.

Selective sweeps are important because they allow for rapid evolution in a short period of time. The strong selection for a specific haplotype can quickly change the distribution of alleles in a population. In addition, finding a selective sweep can help identify key periods of evolutionary change. Sweeps have had a major impact both on the human genome, and that of plants and animals seen in everyday life; they will continue to do so into the foreseeable future.

Sources:

  1. Sá, Juliana M., et al. (2009). “Geographic patterns of Plasmodium falciparum drug resistance distinguished by differential responses to amodiaquine and chloroquine”. PNAS 106(45): 18883-18889.
  2. Tishkoff, Sarah A., et al. (2007). “Convergent adaptation of human lactase persistence in Africa and Europe”. Nature Genetics 39(1): 31-40.
  3. A selective sweep. http://www.nature.com/scitable/content/a-selective-sweep-24827
  4. The International HapMap Consortium (2005). “A haplotype map of the human genome”. Nature 437(7063): 1299-1320. doi:10.1038/nature04226
  5. Gore, Michael A., et al. (2009). “A First-Generation Haplotype Map of Maize”. Science 326: 1115–1117. doi:10.1126/science.1177837. PMID 19965431.

Pseudogenes

This Evolution 101 post is by MSU grad student Tyler Derr

I’m sure you’ve heard the saying that our DNA is the “blueprint” of who we are. Our genes are the sequences in our DNA that actually encode instructions for particular functions. What might come as a surprise to some is that over 98% of human DNA consists of non-coding regions (Elgar and Vavouri 2008), meaning that less than 2% of our DNA actually codes for proteins. Pseudogenes, unlike our genes, fit into the non-coding category, but what exactly are they?

A pseudogene is defined as a sequence in our DNA that is homologous to a known gene but nonfunctional (i.e., it looks like a gene but, for one reason or another, just can’t quite manage to produce a functional protein). It would therefore seem logical to assume that, since they can’t make a functional protein, they serve no purpose; in fact, until just a few years ago, scientists believed they had no immediate function. We now know, however, that some pseudogenes serve an important immediate function for an organism, in addition to playing a critical role in evolution. After discussing the types of pseudogenes and an example of how one can provide an immediately useful function, we will discuss their relevance to evolution.

There are three main types of pseudogenes: unitary, duplicated processed, and duplicated non-processed (Figure 1). This categorization is based on how the pseudogenes arise. We discuss each of the three types below.

Figure 1: Types of duplicated pseudogenes: unprocessed (top) and processed (bottom). Note that they are all created in some way from a functional gene. Rouchka, Eric C., and I. Elizabeth Cha. “Current trends in pseudogene detection and characterization.” Current Bioinformatics 4.2 (2009): 112-119.

A processed duplicated pseudogene arises during what is called retrotransposition, in which a portion of mature mRNA is inserted back into the DNA. This type of pseudogene is the easiest to detect because the process (which relies on reverse transcription) inserts the mRNA’s poly-A tail into the DNA. The poly-A tail is simply a long run of ‘A’s, a characteristic of mature mRNA that is usually not found in our DNA. One of the reasons these insertions are classified as pseudogenes is that the inserted mRNA copy lacks a promoter sequence, the flag that marks where transcription should start. The steps in making a processed pseudogene are shown in the bottom section of Figure 1.
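As a toy illustration of that signature, this sketch (mine, with an arbitrary length threshold; real annotation pipelines do far more) flags a candidate sequence that ends in a poly-A run:

```python
import re

def has_poly_a_tail(seq, min_len=10):
    # True if the sequence ends in a run of at least min_len adenines
    return re.search(r"A{%d,}$" % min_len, seq.upper()) is not None

candidate = "ATGGCCATTGTAATGGGCCGC" + "A" * 15
print(has_poly_a_tail(candidate))                # True: consistent with retrotransposition
print(has_poly_a_tail("ATGGCCATTGTAATGGGCCGC"))  # False: no poly-A signature
```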

The second type, the unprocessed duplicated pseudogene, is created during the copying of genes in the DNA (gene duplication). Once a gene has been duplicated, if one of the copies incurs a mutation, such as a nucleotide change that creates an early stop codon in the middle of the gene, it loses its ability to code for a protein and can be thought of as “junk DNA”. Normally there would be huge selection pressure against such a mutation if the only copy of a gene no longer functioned, but because the mutation happened to a duplicated gene, at least one functioning copy remains in the genome. Thus this type of pseudogene can undergo genetic drift and acquire more mutations with no direct effect on the fitness of the individuals carrying it. The duplication and mutation steps are shown in the upper section of Figure 1.
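Here is what such an early stop codon looks like at the sequence level, in a small sketch with invented sequences:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop(seq):
    # return the codon index of the first in-frame stop codon, or None
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

gene = "ATG" + "GCT" * 20 + "TAA"     # intact copy: stop codon only at the very end
copy = gene[:15] + "TAA" + gene[18:]  # duplicated copy with a nonsense mutation
print(first_stop(gene))  # 21 -> the normal terminator
print(first_stop(copy))  # 5  -> premature stop: the protein would be truncated
```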

The last type, the unitary pseudogene, also arises through mutation, but instead of striking a gene that has undergone duplication, the mutation hits a gene that is the only copy of itself in the genome. The argument for why there is no selection pressure on individuals carrying a duplicated pseudogene no longer holds here: if the mutated gene has no duplicates, its function is completely lost.

As mentioned earlier, scientists used to place pseudogenes in the category of “junk DNA”, nonfunctional gene lookalikes, and knew them mostly as sequences that caused problems in their studies (e.g., PCR experiments). Over time and with further study, however, a number of surprisingly interesting findings have emerged. One specific example, published in Nature in 2010, showed that a sequence previously considered a pseudogene, PTENP1, was in fact helping suppress tumor growth in many colon cancer cell lines (Poliseno et al. 2010). The basic idea is that although PTENP1 cannot be translated into a functional protein, its transcript acts as a decoy: microRNAs bind to the PTENP1 mRNA instead of the mRNA transcribed from the PTEN gene. This allows more PTEN protein to be made, since a microRNA attached to the PTEN mRNA would have prevented its translation into protein. The process is shown below in Figure 2. The authors proposed that PTENP1 no longer be considered a pseudogene but instead a “bona fide tumor suppressor gene.”

Figure 2: Visual explanation of how the pseudogene PTENP1 can help suppress tumor growth. Image credit: hms.harvard.edu

Even though we have just discussed an example in which a pseudogene performs an immediate function for an organism, pseudogenes can also matter beyond the scope of an individual organism’s lifetime, on an evolutionary scale. As mentioned earlier, a duplicated pseudogene can undergo genetic drift and acquire multiple mutations with no detrimental effect on the fitness of the individual. Such a sequence can later be resurrected into a gene, resulting in a new functional protein (Zhang 2003). Simple (e.g., single-nucleotide) mutations can sometimes produce huge jumps in functional space, even to the point where a gene is turned off. It might come as a surprise, but multiple mutations (a comparatively larger step in DNA sequence space) accumulated on a duplicated pseudogene can actually amount to a smaller step in functional space, and can ultimately yield a fitness improvement. Pseudogenes thus provide an avenue for genetic drift to explore sequence space, and through the resurrection of a mutated duplicate, an individual can gain a function that improves its fitness.

We have discussed the three main types of pseudogenes, introduced an example showing that some pseudogenes are important in their current state, and discussed why they are valuable from an evolutionary standpoint. Although not every pseudogene has a unique and important function to fulfill today, those that currently lack a purpose are still undergoing genetic drift and could possibly, one day, come to serve a purpose for our descendants many years from now.

References:

Elgar, G. and Vavouri, T. (2008). “Tuning in to the signals: noncoding sequence conservation in vertebrate genomes”. Trends in genetics, 24(7), 344-352.

Poliseno, L., et al. (2010). “A coding-independent function of gene and pseudogene mRNAs regulates tumour biology”. Nature, 465(7301), 1033-1038.

Zhang, J. (2003). “Evolution by gene duplication: an update”. Trends in Ecology and Evolution, 18(6), 292-298.

 


Selfish Genes and the Resulting Gene Conflict

This Evolution 101 post is by MSU grad student Alex Lalejini

The above comic strip might lead one to believe that the phrase ‘selfish genes’ describes genes that make individuals act selfishly; however, this is not at all what the phrase means. This post gives a brief introduction to selfish genes, a story rich in greed, replication, and conflict. Beyond the greed, replication, and conflict, the effects and implications of selfish genes are far-reaching, which makes them incredibly interesting.

The Gene Perspective

We are accustomed to thinking about evolution from the perspective of whole organisms: Individual organisms in a population have varying observable characteristics, or phenotypes, as a result of inherited genetic variations. Some variations increase an individual’s ability to compete for resources and reproduce, and through natural selection over many generations, the beneficial genetic variations are propagated through the population of organisms.

However, it is interesting to consider evolution from a more gene-centered perspective. With this perspective in mind, Richard Dawkins has described living organisms as “throwaway survival machines for genes”. While individual organisms inevitably die, the information encoded by their genes has the chance to continue from generation to generation. Genes whose phenotypic effects benefit the host organism’s chances of survival and reproduction have a better chance of persisting over many generations than genes with less beneficial or harmful effects. But this is not always the case: some genetic elements achieve persistence and propagation in a population without any consideration for their host organism.

Selfishness

Genetic elements may exploit alternative methods of persistence and propagation without contributing to their host organism’s fitness. Because these elements are not invested in the host’s fitness, their alternative methods of propagation often harm it. Such genes are selfish: their expression advances their own interests at the expense of other genes and of the host organism as a whole. There are many types of gene selfishness; here I provide an overview of one, transposable elements.

In Genes in Conflict, Burt and Trivers (2009) write that transposable elements “accumulate by copying themselves into new locations of the genome”; they are often referred to as selfish DNA parasites. A transposable element is analogous to a self-replicating computer virus that copies itself many times when it lands on a computer in order to avoid removal. Just as such a virus can significantly degrade a computer’s performance, transposable elements can have a strongly negative effect on the host organism’s fitness. This form of selfish expression is surprisingly widespread: at least 45% of the human genome is derived from transposable elements (Lander et al., 2001)!
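To see how an element that hurts its host can still spread, consider a deliberately crude simulation (my own; the transposition rate and per-copy fitness cost are invented parameters, not measured values):

```python
import random

def one_lineage(gens=50, u=0.1, cost=0.01):
    # u: per-copy chance of transposing each generation; cost: fitness loss per copy
    copies = 1
    for _ in range(gens):
        copies += sum(random.random() < u for _ in range(copies))  # elements self-copy
        if random.random() > max(0.0, 1 - cost * copies):          # host may die of the burden
            return copies, "host lineage lost"
    return copies, "host lineage survived"

random.seed(1)
print([one_lineage() for _ in range(3)])
# Copy number tends to climb even though every extra copy makes the host worse off:
# the element's interests and the host's interests diverge.
```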

On a historical note, we must thank Barbara McClintock and her work with a familiar Thanksgiving holiday staple, multi-colored corn, for the discovery of transposable elements. As McClintock discovered, transposable elements are responsible for the vivid mosaics of color seen in Indian corn. For those with further interest in transposable elements, her work is a great place to start.

Conflict

We have now seen that the expression of selfish genes can increase their own fitness at the cost of other genes and the host organism. As a result, selfish genes are often at odds with the ‘unselfish’ genes that rely on the reproductive success of the host organism to increase in frequency. This tension between opposing interests creates a form of gene conflict: if a selfish gene that reduces organismal fitness is present in a population, there is selection pressure for other genes that suppress the selfish element’s expression. As a result, we see genes with contradictory or conflicting effects evolve.

Evolutionary Implications

While the immediate effects of selfish genetic elements on host organisms are often negative, there is evidence to suggest that selfish genetic elements and the resulting gene conflict help drive evolutionary change and innovation (Werren, 2011). As some genetic elements evolve to get ahead at the expense of the rest of the organism, other genetic elements arise to minimize the negative effects of the selfish ones. In the case of transposable elements, the rest of the genome is sometimes able to recruit a transposable element for new cellular functions (Werren, 2011). In this way, selfish genetic elements can be instrumental in pushing organisms toward greater genetic robustness. Selfish genetic elements, however, do not always bestow long-term benefits on organisms; they can also lead to species extinction, which, perhaps ironically, also means the selfish genetic element’s extinction.

References

Burt, A. & Trivers, R. (2009). Genes in conflict: the biology of selfish genetic elements. Harvard University Press.

Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., … & Grafham, D. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921.

Werren, J. H. (2011). Selfish genetic elements, genetic conflict, and evolutionary innovation. Proceedings of the National Academy of Sciences, 108(Supplement 2), 10863-10870.


Evolution 101 – Mutations: From the X-Men to the X-Chromosome

This Evolution 101 post is by MSU grad student Douglas Kirkpatrick

Fig 1: The X-Men

Everyone knows what mutation is, right? It’s that magical scientific hand-wave that gives the X-Men their powers. Almost certainly the result of interaction with gamma radiation or toxic substances, mutation always has the most drastic results on the people that it affects. Right?

Unfortunately for us would-be superheroes, these results are neither likely nor commonplace; the effects of mutation are almost always exaggerated, while its everyday occurrence is typically ignored. The best example of how mutation affects our everyday lives is, ironically enough, found in a brief scene from X-Men: First Class. The young Professor Xavier mentions, while wooing a young woman, that based on her blue eyes alone she is a mutant, as blue eyes are the result of a specific mutation to DNA. This observation is as close to science fact as the movie comes.

The word mutation comes from the Latin “mutare,” which means “to change” [2]. Thus, by its roots, the word mutation indicates a change or modification to something. In the context of biology, this is specifically a change to the genetic structure of an organism, to the organism’s DNA. Within that broad definition fall many forms of mutation, each with varying results and effects. An overarching classification for mutations is the scale at which they operate: small-scale mutations affect only a single gene, or perhaps only a few DNA base pairs, whereas large-scale mutations affect an entire chromosome or chromosomes.

Small-scale mutations fall into one of three main types. First are point mutations, where a specific DNA base is changed to a different base, from adenine to cytosine perhaps. These can be caused by radiation, but are often prevented thanks to the matched base pairs of DNA and the assistance of corrective proteins. The second small-scale mutation is insertion (addition), where a sequence of DNA is added at a given location in the chromosome. The third, deletion, is the opposite: a certain sequence of DNA is removed from a given location. Deletion is visually demonstrated in figure 2. Despite their size, these mutations can still have a large effect on the organism. A single point mutation in hemoglobin, causing the replacement of one amino acid, is the source of sickle cell disease [1]. A toy sketch of all three operations is shown below.
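Here is that toy sketch: the three small-scale mutations written as simple string operations on an invented sequence (illustrative only; real mutation is biochemistry, not string editing):

```python
def point_mutation(seq, pos, base):
    return seq[:pos] + base + seq[pos + 1:]  # substitute one base

def insertion(seq, pos, fragment):
    return seq[:pos] + fragment + seq[pos:]  # add a fragment at pos

def deletion(seq, pos, length):
    return seq[:pos] + seq[pos + length:]    # remove length bases at pos

dna = "ATGGAGGTG"
print(point_mutation(dna, 5, "T"))  # ATGGATGTG: a G-to-T substitution
print(insertion(dna, 3, "CCC"))     # ATGCCCGAGGTG: three bases inserted
print(deletion(dna, 3, 3))          # ATGGTG: three bases lost
```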

Large-scale mutations can be similarly subdivided. Duplications and deletions occur when regions of the genome are repeated or removed, respectively, from a chromosome. These mutations can cause a certain protein to be overproduced or not produced at all. Underproduction of a protein might leave toxic waste the cell cannot clean up, while overproduction might drain the cell of raw materials; neither is beneficial to cell behavior. A visualization of these mutations can be found in figure 2. Translocation occurs when portions of the genome move from one location to another, either between different locations on the same chromosome or from one chromosome to an entirely new one; the latter is shown in figure 2. Inversion occurs when the direction of a region of the genome is reversed, rendering it effectively unusable: just as long passages of text are difficult to read backwards, inverted genes can no longer be read and used to produce proteins. At the extreme end of this classification is the addition or removal of entire chromosomes, known as aneuploidy. A good example is the extra copy of chromosome 21 that leads to Down syndrome.
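The large-scale counterparts can be sketched the same way on an invented toy "chromosome"; note that inversion uses the reverse complement, because the flipped region is now read from the opposite strand:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def duplication(chrom, start, end):
    return chrom[:end] + chrom[start:end] + chrom[end:]  # region repeated in tandem

def large_deletion(chrom, start, end):
    return chrom[:start] + chrom[end:]                   # region removed entirely

def inversion(chrom, start, end):
    flipped = chrom[start:end][::-1].translate(COMPLEMENT)  # reverse complement
    return chrom[:start] + flipped + chrom[end:]

chrom = "AAAATTTTCCCCGGGG"
print(duplication(chrom, 4, 8))     # AAAATTTTTTTTCCCCGGGG
print(large_deletion(chrom, 4, 8))  # AAAACCCCGGGG
print(inversion(chrom, 4, 8))       # AAAAAAAACCCCGGGG (TTTT reversed and complemented)
```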

Further classification of mutations is possible, for example by effect on the organism or on cellular function. A mutation’s effect on an organism can be beneficial, aiding the organism’s fitness; detrimental, reducing its fitness or capabilities; or lethal, killing the cell or organism. Beneficial mutations are rare; much more common are detrimental mutations such as those that cause cancer. Where a mutation occurs also matters: a mutation in a region of the genome that codes for a given protein will have a major impact on cell or organism behavior, while mutations in other regions may have only an indirect impact.

The most important alternative classification, however, distinguishes whether or not a mutation can be handed down to the organism’s offspring. If the mutation occurs in somatic, or general body, cells (e.g., skin cells in humans), its effects are limited to the original organism. However, if the mutation occurs in the genetic material that is passed on to offspring, known as the germ line (e.g., egg or sperm cells in humans), its effects propagate to new generations. These propagating changes cause populations to change and evolve. We are the result of the compounding effects of thousands of different mutations to our ancestors’ DNA, determining features from our eye color to our height to our mental capabilities. While we may not possess superpowers, we are all X-Men (or Women) in our own way.

  1. Gabriel, A. & Przybylski, J. (2010) Sickle-cell anemia: A Look at Global Haplotype Distribution. Nature Education 3(3):2
  2. http://dictionary.reference.com/browse/mutation

4th Annual Big Data in Biology Symposium at the University of Texas in Austin

This post is by University of Texas at Austin grad student Rayna Harris

On Wednesday, May 11, 2016, the Center for Computational Biology and Bioinformatics hosted the 4th Annual Big Data in Biology Symposium at the University of Texas at Austin.


SCIENTIFIC THEMES

A major theme this year centered on the intersection of synthetic biology and big data in biology. In the same way that sequencing genomes is becoming easier and cheaper, so too is our ability to modify and manipulate genomes. Talks by Pam Silver from Harvard University, Jon Laurent from Ed Marcotte’s lab, and Ross Thyer from Andy Ellington’s lab shed light on how synthetic biology can help us better understand evolutionary processes and build new tools for solving real-world problems.

A number of hands go up after Pam Silver’s excellent talk called “Building Biological Control: Living Therapeutics to Cell Factories”. Check out our Facebook page for more photos.

The other major theme (which is highlighted on our T-shirt and fliers) was the juxtaposition of beauty in nature and in computation. I think it’s awesome that we can use similar computational tools to study diverse biological processes. This was showcased by Lucia Carbone from Oregon Health & Science University (OHSU) and Rachel Wright from Misha Matz’s lab, who use genomics and bioinformatics to study phenotypic diversity in primates and coral reefs, respectively. Ila Fiete and James Howison also presented great research on how they use computation to study behavior across scales, from Fiete’s dissection of neural circuits involved in memory to Howison’s analysis of how communities of people develop and maintain open source software.

POST-SYMPOSIUM SURVEY RESULTS

A) Survey responses suggest that we offered the right amount of talks and the diversity of speakers was appropriate. B) Overall, the variety of social activities promoted networking, although there is room for improvement.

WHAT WORKED WELL

We try to make each event better and more impactful than the last, and based on the feedback we have received we seem to be successful. Here’s my reflection on what went well, what could have gone better, and how I think the event could be improved next year.

The diversity of speakers. This year, 4 out of 7 speakers were female (greater than 50%)! We invited three trainees from UT and two PIs to share their research with the community. Even though historically the event is meant to showcase research done here at UT, we had enough funding to invite two outside speakers (Pam Silver from Harvard Medical School and Lucia Carbone from OHSU). As you can see from the survey, pretty much everyone thought that the diversity of speakers was just right!

The poster session. This year, 28 scientists presented posters. The double-sided poster stands were arranged nicely in a big space, so there was plenty of room to move around. Even before the session began, people were constantly hanging out by posters and sharing ideas. We gave out awards for Best Big Data in Biology poster to one undergraduate (Abdurrahman Kharbat, David Stein Lab) and one graduate student (Laura Ferguson, Adron Harris Lab).

The Student-Industry Dinner. For the second year in a row, we solicited and received corporate sponsorship to host an industry mixer. This invitation-only event at the Clay Pit Indian restaurant was designed to promote meaningful exchange between industry and academia. My favorite thing about this event is seeing people who would never otherwise have met engaged in animated conversations over delicious food.

WHAT COULD HAVE GONE BETTER

The food. I used to strive for perfect marks on food from even the pickiest of eaters, but I’ve realized that catering is one of those things that will never go as well as I plan.

The length of lunch. In the past, lunch was 1.5 hours long because we conducted a panel discussion on topics related to big data. This year, we kept the 1.5-hour lunch but dropped the panel discussion. As a result, some momentum was lost, and attendance dropped a little for the afternoon session.

The length of the poster session. I thought 1.5 hours would be long enough for the poster session, but I was wrong. One judge didn’t have enough time to complete all the evaluations, and one presenter was mid-sentence when the Facilities clean-up crew arrived at her poster to break down the poster stand. Maybe next year we can increase the session to two or more hours.

The industry exhibits. We invited representatives from six local companies to host tables or “exhibits” during the poster session. I had never arranged anything like this before, so I didn’t really know what to ask of the representatives. It could have been a little more exciting or vibrant, because the exhibits didn’t attract much of a crowd. I would argue, however, that they were useful for promoting networking, if not during the poster session then later in the evening during the dinner.

WHAT TO EXPECT IN 2017

Overall, I was really happy with the symposium and mixer, and so were the attendees! I think we have a good system for promoting the spread of knowledge in our community. Next year, we will bring back a more interactive session, like the breakout sessions from previous years, to provide a space for group discussions on specific challenges and opportunities for research. We also hope to extend our reach to industry beyond local companies to bring in representatives from national companies with mutual interest in Big Data in Biology. We look forward to seeing you there!

MANY THANKS!

This event would not be possible without the help of many people!

People
Director: Hans Hofmann
Coordinator: Rayna Harris
Graphic Design: Nicole Elmer
Administration: Laurie Alvarez
Dinner Coordinators: Sean Leonard, Rebecca Tarvin, Rayna Harris
Corporate Relations: Kristine Haskett, Sumaya Saati
Symposium helpers: Laurie Alvarez, Nicole Elmer, Dennis Wylie, Benni Goetz, Dhivya Arrassappan

Event Sponsors
Mirna Therapeutics
IBM
The University of Texas at Austin Graduate School
The Graduate Student Assembly

 


Evolving antimutator microbial machines


This post is by University of Texas at Austin grad student Dacia Leon (Twitter: @leondacia)

Fluorescence microplate readers are really exciting. These instruments are a staple in any synthetic biology lab, given that they allow high-throughput quantification of microbial growth and fluorescence over time: so many experiments, so much data. My lab recently purchased one of these microplate readers, and it rarely experiences the “OFF” state. A typical experimental setup consists of up to ninety-six engineered strains, each containing a fluorescent reporter protein that is commonly used as a proxy for expression of a synthetic device. As synthetic biologists, we use microbes as host organisms to “design-test-build” devices born of the researcher’s imagination. These synthetic devices vary greatly in nature: some address a pressing societal need, while others aim to explore the limits of biology. Our microplate reader is one way we can assay and troubleshoot our devices, and it has therefore earned a valued position in our lab.

Our beloved microplate reader machine doing what it does best – taking time points every 15 minutes.

Reflecting on our enthusiasm for the microplate reader reminded me of a review, written over ten years ago, discussing the fundamental principles of synthetic biology [1]. Its primary author, Drew Endy, states that one of the greatest limitations in engineering is that machines are built to be single-use: once a machine is no longer functional, or simply antiquated, it is discarded or recycled. Our microplate reader, for example, is unfortunately not designed to self-replicate and produce new generations of microplate readers. But if it were possible, would it not be incredible? Directions: incubating at 37°C overnight will yield hundreds of healthy microplate reader colonies. The idea of self-replicating machines may seem absurd until we regard microbes as machines.

Genetic engineering of microbes has existed for decades, and there have been many successes in fields such as drug development, bioenergy, production of industrial chemicals, and agriculture. One of my favorite success stories is the biosynthesis of the antimalarial drug artemisinin in an engineered yeast host strain [2]. Artemisinin is naturally a plant-derived compound, and its traditional method of isolation results in drastic fluctuations in drug price and availability. Plant-based production of artemisinin consists of harvesting biomass from full-grown plants, which takes roughly 8 months, and then treating it with a solvent to extract the artemisinin. Unfortunately, this method is neither cheap nor stable, and given the current death toll (over a million annually), most malaria patients are unable to access the treatment. To address this problem, a semi-synthetic artemisinin pathway was constructed in yeast to produce an artemisinin precursor, artemisinic acid, which can be chemically converted into artemisinin. The engineered yeast strain contained multiple modifications, including three heterologous genes from the native artemisinin synthesis pathway and a series of alterations refining metabolic flux in the yeast host. Once synthesized, artemisinic acid is transported to the cell’s outer membrane, allowing for rapid purification and subsequent conversion to artemisinin. Compared to the plant-based method, the engineered yeast host produces comparable levels of artemisinic acid over a markedly shorter time period (4-5 days), and using a yeast host removes the erratic nature of plant-based isolation from the process. Currently, research is focused on optimizing industrial-scale production of yeast artemisinin in order to advance it as a viable strategy against malaria.

There is one salient factor that I have excluded from this story so far. Microbes are replicating machines, but these machines are highly error-prone in their replication. Errors can stem from various sources, such as environmental stress, the nature of the enzymes involved in DNA replication, and toxic byproducts generated by a cell’s own metabolism. In any case, these replication errors lead to fixed mutations. Engineered microbes contain heterologous synthetic devices that consume a heavy proportion of a cell’s resources for expression. These devices cause cellular stress and decrease fitness, resulting in strong selection for mutations that eliminate expression of any synthetic part of the device. Over time, random mutations that naturally occur in a living host organism accumulate and render a synthetic device inactive. Microbes are limited by their inherent unpredictability, and this poses a challenging problem for engineers. A toy model of this decay is sketched below.
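In the sketch below (my own toy model; the mutation rate and fitness burden are invented parameters), broken-device cells both arise by mutation and outgrow the intact ones, so the functional fraction of the culture collapses:

```python
def fraction_intact(gens=60, mu=1e-3, burden=0.10):
    # mu: per-generation chance a device-inactivating mutation occurs
    # burden: growth disadvantage of cells still expressing the device
    intact, broken = 1.0, 0.0
    for _ in range(gens):
        newly_broken = intact * mu
        intact = (intact - newly_broken) * (1 - burden)  # intact cells grow slower
        broken = broken + newly_broken                   # broken cells grow at full rate
        total = intact + broken
        intact, broken = intact / total, broken / total
    return intact

for g in (20, 40, 60):
    print(g, "generations:", round(fraction_intact(gens=g), 3))
# Lowering mu, the host's own mutation rate, is exactly the lever PResERV targets.
```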

PResERV method workflow

To address this issue, one could engineer genetic stability on the side of the host organism, the synthetic device, or both. Synthetic devices can be re-designed by applying strategies such as removing repeat regions, optimizing codon usage, and altering gene expression. These strategies have been applied to device design and are shown to increase the lifetime of the device encoded in the host. One area that has been largely unexplored is on the side of the host organism. My work hypothesizes that the stability of the host genome can be significantly improved by decreasing the natural mutation rate of the host organism, resulting in a lower probability that the encoded synthetic device will mutate and become inactive. By lowering the baseline mutation rate, the host organism can be altered to accommodate any synthetic device, ensuring its long-term stability in the host. My goal is to engineer genetically stable host organisms and to understand the cellular mechanisms required for genetic stability. Our lab has developed an iterative, universal method called Periodic Reselection for Evolutionary Reliable Variants (PResERV), which enriches for genetically stable strains in a population using a fluorescent reporter gene. Mutants that maintain long-term expression of the synthetic fluorescent reporter gene are putative candidates with reduced mutation rates. These mutants are then isolated and sequenced to determine the causative mutations. PResERV will be used to identify genetically stable mutants in the two most commonly utilized host organisms, E. coli and yeast. By applying PResERV in both organisms, I will engineer reduced mutation variants and learn about relevant mechanisms in each organism.

Here are some of the questions I hope to answer with my research:

1.) How conserved are the cellular mechanisms that reduce mutation rates across species?
2.) Can I develop general design principles for lowering mutation rates in any organism?
3.) What is the limit to reducing mutation rates? Does this depend on the host organism?
4.) How is the robustness of a reduced-mutation host affected by increasingly complex synthetic devices?

Additionally, I think my work can provide knowledge and resources for the synthetic biology community. A collection of genetically stable host organisms will allow engineers to tackle more challenging problems without being limited by inactivating mutations.

References:
[1] Endy, D. Foundations for engineering biology. Nature 438, 449-453 (2005).
[2] Ro, D.-K. et al. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature 440, 940-943 (2006).


Anonymity. Does anyone have it?

This post is by North Carolina A&T grad student Siobahn Day

Greetings! My name is Siobahn Day. I’m currently a PhD student in Computer Science at North Carolina A&T State University, where I work as a graduate researcher in the Center for Advanced Studies in Identity Science. I have developed the concept of Adversarial Authorship as a means of preserving author anonymity, and I’m currently developing and evaluating an Interactive Evolutionary Computation for Adversarial Authorship that allows users to conceal their writing style.

This research is particularly important to me because as technology has advanced over the years, our laws (in the US) have not. Due to the rapid growth of the internet and social networks, it is very hard for anyone to have anonymity. As a result, many Anonymous Social Networks (ASNs) have arisen. Some believe that privacy is dead, and I’d like to see what can be done to change that outlook. I’m excited to share with you some of my current research and snippets of a publication that will appear in the proceedings of The 25th International Conference on Computer Communication and Networks (ICCCN 2016) later this year. BEACON has given me new and innovative ways of looking at my problem in order to find an effective solution.

Over the last few years, we have seen an increase in the number of Anonymous Social Networks (ASNs). What many internet users may not know is that their writing style can be tracked across the internet, even through an ASN. The good news is that by using a technique referred to as Adversarial Stylometry, one can effectively imitate the writing style of another, or obfuscate one’s own writing style, and thereby conceal one’s true writing style in the short term. The bad news is that recent research has shown Adversarial Stylometry is not effective at concealing one’s writing style over the long term. We introduce a number of underlying concepts that allow users to conceal their writing style over the long term; one such concept we refer to as Adversarial Authorship.

In Adversarial Authorship, authors are provided an AuthorWeb, which lets them see graphically how their writing style compares with others in the AuthorWeb. The AuthorWeb presented uses Entropy-Based Evolutionary Clustering (EBEC) to cluster writing styles; our results show that EBEC outperforms a number of other machine learning techniques for author recognition. Users of an AuthorWeb can then write toward user-specified clusters in an effort to conceal their writing style.
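To give a flavor of the kind of representation such a system might use (EBEC itself is not spelled out in this post, so the features below are a generic, hypothetical stylometry sketch), a document’s style can be summarized as function-word frequencies and compared to a cluster centroid:

```python
from collections import Counter
import math

FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "was", "it"]

def style_vector(text):
    # frequency of each function word: a crude but classic stylometric feature
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

doc = style_vector("the cat sat on the mat and the dog saw that it was asleep")
centroid = style_vector("a bird flew to a tree in a storm and a fox hid in a den")
print(round(cosine(doc, centroid), 3))  # similarity of this document to one cluster
# An author concealing their style would revise the text until it sits
# closest to a cluster other than their own.
```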

If this research interests you in any way, feel free to read my previous publication: Towards the Development of a Cyber Analysis & Advisement Tool (CAAT) for Mitigating De-Anonymization Attacks. You can also visit my research team at The Center for Advanced Studies in Identity Science. I look forward to continuing to share with BEACON much more of my research as it evolves.


Evolving Deep Neural Networks

This post is by UT Austin grad student Jason Liang

Deep learning has revolutionized the field of machine learning in many ways. From achieving state-of-the-art results in many benchmarks and competitions to effectively exploiting the computational power of the cloud, deep learning has received widespread attention not just in academia but also in industry. It has helped researchers and scientists obtain state-of-the-art results in speech recognition, object detection, time-series prediction, reinforcement learning, sequential decision-making, video/image processing, and many other supervised and unsupervised learning tasks. One of the leaders in this field is Sentient Technologies, an AI startup based in San Francisco that specializes in financial trading, e-commerce, and healthcare applications using deep learning, evolutionary computation, and other machine learning and data science approaches. I am currently working as an intern at Sentient, developing ways to make deep learning not only easier to implement but also applicable to more general problem domains. This internship lets me transfer my dissertation research to industry, and it gives me access to the computational resources that make such work possible.

Deep learning, despite its newfound popularity in the machine learning and artificial intelligence community, is actually an extension of decades-old neural network research; the major difference is that the size of datasets and the available computing power have both increased exponentially. One problem with deep learning is that architecture design has a large impact on performance, and some problems require specialized architectures. For example, the GoogLeNet architecture (shown below), which won the 2014 ImageNet competition for image classification, contains specialized submodules that are themselves deep networks. Also, as networks become more complex, the number of parameters and configurations that need to be optimized increases as well. At Sentient, my advisor Risto Miikkulainen and I are developing evolutionary algorithms to automatically discover and train the best deep neural networks for a particular problem. Our vision is to eventually create a general framework that is applicable to any problem and uses machines to automate AI and machine learning research.
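To give a flavor of what “evolving an architecture” can mean, here is a toy (1+1) evolutionary loop over a genome of layer sizes (my own sketch, not Sentient’s system; the fitness function is a stand-in for actually training a network and measuring validation accuracy):

```python
import random

def fitness(genome):
    # Stand-in for "train the network, return validation accuracy".
    # This toy landscape prefers three layers of about 64 units each.
    return -sum((size - 64) ** 2 for size in genome) - 1000 * abs(len(genome) - 3)

def mutate(genome):
    g = list(genome)
    op = random.choice(["grow", "shrink", "perturb"])
    if op == "grow":
        g.insert(random.randrange(len(g) + 1), random.randint(8, 256))  # add a layer
    elif op == "shrink" and len(g) > 1:
        g.pop(random.randrange(len(g)))                                 # drop a layer
    else:
        i = random.randrange(len(g))
        g[i] = max(8, g[i] + random.randint(-16, 16))                   # resize a layer
    return g

genome = [32]  # start with a single small hidden layer
for _ in range(500):
    child = mutate(genome)
    if fitness(child) >= fitness(genome):  # keep the better of parent and child
        genome = child

print(genome, fitness(genome))  # drifts toward three layers near 64 units
```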

GoogLeNet architecture

One of the downsides of deep learning is that training a neural network is very computationally intensive. Most networks of moderate or greater complexity take hours, if not days, to train on machines with powerful GPUs. This compute cost is even worse for evolution of deep networks, since a whole population of networks must be trained and evaluated every generation. Due to these immense computational requirements, evolutionary deep learning has been considered impractical until now. Fortunately, Sentient has developed a massively scalable evolutionary algorithm that runs on millions of CPUs all over the world to evolve stock trading agents. We are currently extending it to utilize GPUs as well, to train each deep neural network in parallel. This framework will eventually be scalable to hundreds of thousands of GPUs. Since GPUs are expensive and relatively rare, we are also looking at ways of utilizing CPUs for training deep neural networks. If the training of a single network model can be parallelized across many CPU machines, then it is truly possible to scale up the evolution of neural nets to millions of machines.

As computing power becomes faster and cheaper, I believe there is going to be a lot of newfound interest in applying evolutionary algorithms to deep networks. This approach should be particularly useful for automatically discovering new architectures for new problem domains, such as understanding cluttered images, video, and natural language, as well as reinforcement learning and sequential decision-making. This process will depend on extreme computational resources, making it productive to combine the resources of academia and industry.


Mass Extinctions, Evolution, and…. Robots?

Check out this great video produced by the UT Alumni Association talking about research by BEACONites Joel Lehman and Risto Miikkulainen at UT Austin.

Lehman and Miikkulainen published an awesome paper in PLOS ONE looking at evolution after a mass extinction.  I, for one, welcome our new robot overlords.

Here’s their abstract:

Extinction events impact the trajectory of biological evolution significantly. They are often viewed as upheavals to the evolutionary process. In contrast, this paper supports the hypothesis that although they are unpredictably destructive, extinction events may in the long term accelerate evolution by increasing evolvability. In particular, if extinction events extinguish indiscriminately many ways of life, indirectly they may select for the ability to expand rapidly through vacated niches. Lineages with such an ability are more likely to persist through multiple extinctions. Lending computational support for this hypothesis, this paper shows how increased evolvability will result from simulated extinction events in two computational models of evolved behavior. The conclusion is that although they are destructive in the short term, extinction events may make evolution more prolific in the long term.
