This week’s BEACON Researchers at Work blog post is by University of Texas at Austin graduate student Emily Jane McTavish.
In early June, several BEACONites participated in a hackathon at the National Evolutionary Synthesis Center (NESCent) in Durham, NC. I was there in person, and Luke Harmon’s lab at the University of Idaho, including Jon Eastman, Joseph Brown, and Matt Pennell, telecommuted in. The whole Phylotastic team included 31 people from more than 10 institutions. Everyone involved is listed here.
Re-use of phylogenies is at the heart of the Phylotastic mission. While phylogeneticists are busy creating bigger and bigger trees (muahaha!), the information in these trees isn’t currently readily accessible to others. For example, an ecologist might use a phylogeny of the grasses found in a study area, or a high school teacher might like their students to explore the relationships among mammal groups.
Hackathon? Is that like Reggaeton? (An actual text I got while at Phylotastic.)
Yes! Great for dance parties, and… wait. No. They are not very similar.
This was my first hackathon, so it was all new to me. The hackathon concept is to get a bunch of people who are interested in the same software end product together, spend a bit of time talking about how it should work, and then spend a concentrated period of time programming and making it happen!
The concept of the Phylotastic hackathon was developed by the NESCent Hackathons, Interoperability, and Phylogenies group. Although interoperability is a bit jargony, it is a concept that any scientist who has wasted huge chunks of time formatting data for the nit-picky specifications of various analysis software can appreciate: data and analyses should be housed in such a way that the information can be easily transferred. One example of this is the NeXML format for storing phylogenetic information. Unlike the standard NEXUS file, which requires a lot of other information not included in that file to be understood, the NeXML format holds all the information about the phylogenetic analysis together in one place, where it is easily archived and, most importantly, easily re-used.
Phylotastic will be implemented as a web application. The concept is simple: a user types in the names of species they are investigating, and Phylotastic spits out a tree of those species. Phylotastic has three main components: the tree store, the taxonomic name resolution service, and the topology server. In addition, several neat extensions have already been developed.
Components of Phylotastic:
Tree store: A back-end data store that includes large trees that can be pruned to make the output tree. Phylotastic will be able to output phylogenies based on either these trees or a user-supplied source tree. My role in Phylotastic is working on how to include metadata in association with these trees. A set of standards for the appropriate Minimum Information About a Phylogenetic Analysis (MIAPA) was developed at NESCent last year. MIAPA is designed to standardize the data associated with trees. Sharing trees in simple parenthetical (Newick) format divorces phylogenies from essential information about how they were constructed: for example, the reference for the publication, molecular vs. morphological data, Bayesian vs. ML vs. parsimony analyses, the units of branch lengths, and much more.
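To make the metadata problem concrete, here is a minimal Python sketch of a tree that travels together with the kind of information listed above. The class name, the field names, and the toy tree are all made up for illustration; they are not the MIAPA standard itself or the Phylotastic tree store format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AnnotatedTree:
    """A Newick string bundled with descriptive metadata.

    All field names are hypothetical, chosen to mirror the examples
    in the text; they are not taken from the MIAPA checklist.
    """
    newick: str
    publication: str
    data_type: str            # e.g. "molecular" or "morphological"
    inference_method: str     # e.g. "Bayesian", "ML", "parsimony"
    branch_length_units: str  # e.g. "substitutions per site"

    def to_json(self) -> str:
        # Serialize tree and metadata together so they cannot be separated.
        return json.dumps(asdict(self), indent=2)


# Toy tree with placeholder taxa and arbitrary branch lengths.
tree = AnnotatedTree(
    newick="((A:1.0,B:1.0):0.5,C:1.5);",
    publication="(placeholder citation)",
    data_type="molecular",
    inference_method="ML",
    branch_length_units="substitutions per site",
)

print(tree.to_json())
```

A bare Newick string would carry only the first field; everything else would have to be tracked down from the original paper.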
Taxonomic Name Resolution Service: This service translates user-input names into those used on trees in the tree store. Name resolution is a trickier problem than it might seem, and it has been very impressively implemented by the TNRS group. One of the most pervasive problems is the interpretation of different taxon names (sometimes for the same group) and of spelling errors. Most biologists are familiar with how heated discussions of naming can be (for an example, see the Sperm Whale Wikipedia talk page. Warning: strong language!).
See the demo Python module, which translates common names to species names.
(FYI: TNRS suggests Physeter catodon for the sperm whale)
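For a feel of what name resolution involves, here is a toy Python sketch that handles the two problems mentioned above: synonyms and misspellings. It is not the TNRS implementation or its API. The name lists, the `resolve_name` function, and the similarity cutoff are assumptions for illustration, and the fuzzy matching comes from Python's standard `difflib` module.

```python
import difflib

# Hypothetical toy data: names as they might appear on trees in a tree store.
ACCEPTED_NAMES = ["Physeter catodon", "Balaenoptera musculus", "Orcinus orca"]

# Hypothetical synonym table (lower-cased keys) mapping to accepted names.
SYNONYMS = {
    "physeter macrocephalus": "Physeter catodon",
    "sperm whale": "Physeter catodon",
}


def resolve_name(raw, cutoff=0.8):
    """Return the accepted name for a user-supplied name, or None."""
    # Normalize underscores, extra whitespace, and case.
    cleaned = " ".join(raw.replace("_", " ").split()).lower()

    # 1. Known synonyms and common names.
    if cleaned in SYNONYMS:
        return SYNONYMS[cleaned]

    # 2. Exact or approximate match, which tolerates small spelling errors.
    lookup = {name.lower(): name for name in ACCEPTED_NAMES}
    matches = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


print(resolve_name("sperm whale"))
print(resolve_name("Physeter catadon"))  # misspelled
print(resolve_name("Xyzzy"))             # no match
```

A real service has to cope with far messier input, such as authorities, subspecies, and names that legitimately refer to different groups, which is part of why the problem is trickier than it looks.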
Phylotastic Topology Demo: Get a tree for your taxa of interest at http://phylotastic-wg.nescent.org/script/phylotastic.cgi. This demo doesn’t have the TNRS implemented yet, so it doesn’t support common names, but if you enter names in the format genus_species, you can get a pruned tree.
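The pruning idea itself can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the code behind the demo: it handles topology-only Newick strings (no branch lengths, node labels, or quoted names), and the function names and the small source tree are mine.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested lists of leaf names."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return children
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]  # a leaf name

    return parse_node()


def prune(node, keep):
    """Drop leaves not in `keep`; collapse nodes left with a single child."""
    if isinstance(node, str):
        return node if node in keep else None
    kept = [c for c in (prune(child, keep) for child in node) if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else kept


def to_newick(node):
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def prune_newick(source, names):
    """Prune `source` to the given names, given as 'Genus species' or 'Genus_species'."""
    keep = {"_".join(name.split()) for name in names}  # -> genus_species format
    pruned = prune(parse_newick(source), keep)
    if pruned is None:
        raise ValueError("none of the requested names are in the source tree")
    return to_newick(pruned) + ";"


# Small illustrative source tree of great apes.
source = "(((Homo_sapiens,Pan_troglodytes),Gorilla_gorilla),Pongo_abelii);"
print(prune_newick(source, ["Homo sapiens", "Gorilla gorilla", "Pongo abelii"]))
# prints: ((Homo_sapiens,Gorilla_gorilla),Pongo_abelii);
```

With branch lengths, collapsing a single-child node would also mean adding the lengths of the merged branches together, which this sketch leaves out.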
Extensions of Phylotastic
Datelife: Many researchers’ questions require dates associated with divergences in trees. The Phylotastic project Datelife returns times of divergence for pairs of species. Take a look at the website for a neat demo, and check out the divergence time of your favorite species pair! http://datelife.org/
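One way such a pairwise lookup could work, given a single dated tree, is to find the most recent common ancestor of the two species and report its age. The Python sketch below shows that idea only; it is not how Datelife is implemented, and the tree representation, the function names, the placeholder taxa, and the node ages (in arbitrary time units) are all invented for illustration.

```python
def leaves(node):
    """Return the set of leaf names under a node."""
    if isinstance(node, str):
        return {node}
    _age, children = node
    return set().union(*(leaves(child) for child in children))


def divergence_time(node, a, b):
    """Age of the most recent common ancestor of a and b, or None if not found.

    Internal nodes are (age, [children]) tuples; leaves are name strings.
    """
    if isinstance(node, str):
        return None
    age, children = node
    # If one child contains both taxa, their common ancestor lies deeper.
    for child in children:
        tips = leaves(child)
        if a in tips and b in tips:
            return divergence_time(child, a, b)
    tips = leaves(node)
    return age if a in tips and b in tips else None


# Toy dated tree: A and B split 4 units ago; their ancestor split from C 10 units ago.
dated_tree = (10.0, [(4.0, ["A", "B"]), "C"])

print(divergence_time(dated_tree, "A", "B"))  # 4.0
print(divergence_time(dated_tree, "A", "C"))  # 10.0
```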
Reconciliotastic: Treats the trees in the Phylotastic tree store as species trees and reconciles gene tree relationships with these species trees. http://phylotastic.nescent.org/shiny/reconciliotastic/
Phylotastic playground: Integrates geographic information about species locations and ranges with Phylotastic output. http://phylotastic.org/demos/phylotastic-viz/index.html
Mesquite-otastic: Phylotastic is implemented as a module in the phylogenetics software environment Mesquite. This screencast shows how a researcher could perform ancestral state reconstruction using a tree pulled from Phylotastic. See a screencast!
The Phylotastic project is still in progress, and all the components still need to be fully integrated. Take a look at these demonstrations, though, and see if it might be of interest to you or your collaborators.
For more information, you can contact Emily Jane at ejmctavish at utexas dot edu.