BEACON Researchers at Work: Evolving Genome Libraries

This week’s BEACON Researchers at Work post is by University of Texas at Austin graduate student Peter Enyeart.

Peter Enyeart June 2011 (photo by Mario Gallucci)

Photo by Mario Gallucci, http://www.galluccidesign.com/

I love bacteria. That may seem like a strange thing to say, but I really do. When most people think of bacteria, they think of lurking dangers waiting to make us sick. But in reality only a very small fraction of bacteria cause disease, and more than a few do a great deal to help us. In fact, there are more cells of bacteria in your body than there are cells of you. I’ve always loved the idea of discovering more about this unseen world that permeates us.

Bacteria can also do all sorts of amazing things. They can produce electricity and clean up heavy-metal waste (including uranium), survive radiation thousands of times stronger than a human could withstand, build their own magnets to use as compasses, eat oil, and make fuels, to name just a few examples. We’re actually in the midst of something of a golden age in expanding our knowledge of just what is out there in the microbial world. Bacteria never cease to amaze me with the inventive things they can do and the difficult environments they can survive in.

Additionally, as a biologist trying to understand and control the molecular processes of living things, I like bacteria because they represent a sweet spot in complexity. You don’t go into a field like this if you’re not interested in figuring out complicated systems, but some systems are more complicated than others. For instance, E. coli, the most commonly studied bacterium, has a genome size of about 4.6 million base pairs of DNA, containing about 4000 genes, all on one chromosome. That’s pretty complicated. But compare that to the human genome, which has two versions of 23 different chromosomes, the smallest of which is ten times bigger than the E. coli genome, for a total of 3 billion base pairs of DNA and about 20,000 genes (which can be spliced in many different ways) per set of 23. That’s starting to get crazy. The cellular structure of eukaryotes (which include us) is also much more complicated than that of bacteria; in fact, some of the structures in eukaryotic cells seem to have originally been bacteria. It’s like cells within cells in there.

So I like bacteria because they’re complicated enough to be interesting, but simple enough that we can understand them much better and reprogram them much more easily than we can our own cells. For example, my research focuses on bacterial genome engineering. I want to rewrite the bacterial code to get them to do new things, and hopefully gain a better understanding of how they work in the process. One of my projects involves making libraries of bacteria with different genomic rearrangements, and then competing them against each other to see which rearrangements are most beneficial. Evolution is a powerful engineering tool. (See here for a video of me explaining another of my projects to middle school students.)

The motivations for this come from both basic and applied science. One interesting thing about bacterial genomes is that, while they vary a great deal in gene content, their overall structure tends to be very similar. This is in contrast to eukaryotes like fungi, plants, and animals, which have all kinds of different genome arrangements but relatively little variation in the types of genes found in them. So the question arises: has the structure of the bacterial genome evolved to the “best” possible structure? Or is it just one of many possibilities that might work as well or better, but are difficult to evolve to from the current state? Most of the efforts to address this question have looked at only a small number of rearrangements, and there are a lot more possibilities out there to examine before we can call the case closed. And if in the process of answering this question we learn how to adapt genomes for specific purposes, all the better.

Figure 1. Making genomic libraries. Lox sites are represented by arrows. IRs (inverted repeats) are DNA sequences that mark off a transposon (which requires a “transposase” to move around). The marker is an antibiotic resistance gene that allows us to kill any cells that don’t insert the transposon into their genomes. When the Cre protein comes in, it deletes the marker between the lox sites and causes a rearrangement with a lox site elsewhere in the genome (represented by grey color-coded DNA being replaced by black color-coded DNA on one side).

So how do we look at a huge number of rearrangements at once? We need two components: one is a DNA element that allows rearrangements to be made, and the other is a mechanism for randomly placing those elements throughout the genome. For the former, we use a 34-base sequence of DNA called a lox site. A protein called Cre will bring lox sites together and recombine them. This results in the DNA between the lox sites being either deleted or inverted. A chunk of DNA deleted in this way can also reinsert at the same or another lox site, allowing for cut-and-paste operations. To place the lox sites, we put them in transposons, sometimes called “jumping genes,” which are pieces of DNA that randomly insert themselves into other pieces of DNA (like genomes). See Figure 1 for a visual depiction of how this works.
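The deletion and inversion outcomes described above can be illustrated with a toy Python sketch. This is not the code used in this project, just a minimal model. It assumes the usual behavior of Cre-lox recombination, in which two lox sites pointing the same way give a deletion and two pointing in opposite directions give an inversion. The function name `cre_recombine` and the string encoding (letters for genes, `>` and `<` for lox sites) are hypothetical choices made for illustration.

```python
def cre_recombine(genome: str) -> str:
    """Toy model of one Cre-mediated event between the first two lox sites.

    Letters stand for genes; '>' and '<' stand for lox sites and their
    orientation. This encoding is an illustrative assumption.
    """
    # Find the positions of all lox sites in the genome string.
    sites = [k for k, ch in enumerate(genome) if ch in "<>"]
    if len(sites) < 2:
        return genome  # nothing to recombine with

    i, j = sites[0], sites[1]
    between = genome[i + 1:j]

    if genome[i] == genome[j]:
        # Same orientation: the segment between the sites is deleted,
        # leaving a single lox site behind.
        return genome[:i] + genome[i] + genome[j + 1:]

    # Opposite orientation: the segment between the sites is inverted.
    # Reversing gene order stands in for the reverse complement of real DNA.
    return genome[:i + 1] + between[::-1] + genome[j:]


print(cre_recombine("AB>CDE>FG"))  # deletion
print(cre_recombine("AB>CDE<FG"))  # inversion
```

In the first call the genes C, D, and E are lost; in the second they stay in the genome but in reversed order. The cut-and-paste case, where a deleted piece reinserts at another lox site, is left out to keep the sketch short.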

Using this method we will be able to build a library that represents all the possible rearrangements between all the genes in the E. coli genome. We can then compete them, sequence their genomes, and program computers to tell us what came out the other end. (This points up one of the advantages of working in bacteria; obtaining and analyzing a similarly complete set of rearrangements for the human genome would be extremely difficult.) Figure 2 shows some actual data from an initial experiment on a library of about 40,000 different genome rearrangements. So far it looks like the cells do like the original structure, but I’m excited to see if we find anything the next time we do this on a library of at least one million genomes. Stay tuned!
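To get a rough feel for the library sizes involved, here is a back-of-the-envelope Python sketch. It makes a simplifying assumption that is not from the experiment itself: each rearrangement is defined by an unordered pair of genes, using the figure of about 4000 genes quoted earlier, and ignoring orientation and insertions between genes. The function name `pairwise_rearrangements` is hypothetical.

```python
from math import comb


def pairwise_rearrangements(n_genes: int) -> int:
    """Number of unordered gene pairs, a crude proxy for possible rearrangements."""
    return comb(n_genes, 2)


n_genes = 4000           # approximate E. coli gene count quoted earlier in the post
initial_library = 40000  # approximate size of the initial library

total_pairs = pairwise_rearrangements(n_genes)
print(total_pairs)                    # 7998000
print(initial_library / total_pairs)  # fraction of gene pairs a 40,000-member library could cover at most
```

Under this crude assumption there are about 8 million gene pairs, so a library of 40,000 rearrangements could cover at most about 0.5% of them, which gives some context for wanting a library of at least one million genomes.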

Figure 2. Visualized data for genomic rearrangements. The data on the left was obtained several hours after introducing Cre into cells containing three lox sites per genome, and that on the right was obtained 100 generations later. The outer blue ring indicates the frequency of unrecombined transposon insertions, the red ring represents how frequently we see that piece of the genome deleted, the brown ring represents the different structural domains of the genome (with “oriC” and “dif” being the sites of initiation and termination of replication), and the grey lines in the center represent the inversions we saw, with brighter lines representing more frequent inversions.

For more information about Peter’s work, you can contact him at peter dot enyeart at utexas dot edu.