An Instinct for Truth: a new book by BEACON co-founder Robert T. Pennock

Robert T. Pennock, a BEACON co-founder and co-PI, has just published a new book. An Instinct for Truth: Curiosity and the Moral Character of Science explores the scientific mindset (character virtues such as curiosity, veracity, attentiveness, and humility to evidence) and its importance for science, democracy, and human flourishing.

The title comes from a letter in which Charles Darwin wrote to a scientist colleague: “I believe there exists, & I feel within me, an instinct for truth, or knowledge or discovery, of something of the same nature as the instinct of virtue…”

Some of the research for the book was supported by BEACON. An Instinct for Truth provides the philosophical basis for the Scientific Virtues Toolbox responsible conduct of research (RCR) workshops that Pennock and colleagues developed for BEACON. They are now running virtue-based RCR workshops for various departments across campus and plan to expand nationally.

Pennock is also applying this scientific virtue-based approach to improving STEM education, arguing for the importance of teaching the values that make up the scientific mindset. Software such as Avida-ED and Salmon Run helps teach evolution, but these programs are also evolutionary playgrounds where students can exercise their scientific curiosity.

Robert T. Pennock is a Distinguished Professor at Lyman Briggs College at Michigan State University, with appointments in the departments of Philosophy and Computer Science & Engineering. In addition to BEACON, Pennock is affiliated with MSU’s Ecology, Evolutionary Biology, and Behavior (EEBB) program and Socially Engaged Philosophy of Science (SEPOS).

The book is published by MIT Press. Below is the book overview from the publisher’s website:

Exemplary scientists have a characteristic way of viewing the world and their work: their mindset and methods all aim at discovering truths about nature. In An Instinct for Truth, Robert Pennock explores this scientific mindset and argues that what Charles Darwin called “an instinct for truth, knowledge, and discovery” has a tacit moral structure—that it is important not only for scientific excellence and integrity but also for democracy and human flourishing. In an era of “post-truth,” the scientific drive to discover empirical truths has a special value.

Taking a virtue-theoretic perspective, Pennock explores curiosity, veracity, skepticism, humility to evidence, and other scientific virtues and vices. He explains that curiosity is the most distinctive element of the scientific character, by which other norms are shaped; discusses the passionate nature of scientific attentiveness; and calls for science education not only to teach scientific findings and methods but also to nurture the scientific mindset and its core values.

Drawing on historical sources as well as a sociological study of more than a thousand scientists, Pennock’s philosophical account is grounded in values that scientists themselves recognize they should aspire to. Pennock argues that epistemic and ethical values are normatively interconnected, and that for science and society to flourish, we need not just a philosophy of science, but a philosophy of the scientist.

“In An Instinct for Truth, a wide-ranging volume on philosophical, historical, religious and sociological aspects of the scientific vocation, Robert T. Pennock shows that not only is curiosity a powerful motivator in the drive for reliable knowledge, it also, if guided by a virtuous scientist, leads to socially beneficial outcomes. Any practicing scientist or student of science can benefit from Pennock’s observations about why we do science, or more, how to do science right.” — Rush D. Holt, CEO and Executive Publisher, American Association for the Advancement of Science

Posted in BEACON in the News

BEACON alum Wendy Smythe receives AISES Professional of the Year award

Dr. Wendy Smythe, a former BEACON Postdoctoral Research Fellow (2016-2018), received the American Indian Science and Engineering Society (AISES) Professional of the Year Award.

Wendy Smythe, now a tenure-track assistant professor at the University of Minnesota Duluth (UMD), received the AISES award in recognition of her overall leadership and technical achievements.

She is an Alaska Native Haida from Hydaburg, Alaska. Her Haida name is K’ah Skaahluwaa (laughing lady), from the Xáadas (Haida) clan of Sdast’ aas (Fish egg house). She works with Indigenous communities to couple STEM disciplines with Traditional Ecological Knowledge (TEK) in K-12 education. Through this work, she seeks to increase the number of Indigenous people in STEM disciplines, increase diversity and innovation, and teach the next generation of Indigenous leaders.

Wendy holds a dual Ph.D. in environmental science and engineering and in estuary and ocean systems from Oregon Health and Science University. Two years ago, she founded the Geoscience Foundation Education program in her tribal community of Hydaburg, Alaska, in collaboration with the tribe (Hydaburg Cooperative Association) and the Hydaburg School District.

In fall 2019, she joined UMD’s American Indian Studies Department as a tenure-track assistant professor. She is the first scientist to join the department and leads its new Master of Tribal Resource and Environmental Stewardship program. She recently completed an AAAS Science and Technology Policy Fellowship at the National Science Foundation.

Posted in BEACON in the News, BEACONites

The evolution of academic posters: from Poster 1.0 to Better Poster 2.0 to Hybrid Poster 1.5

By: Natalie Vande Pol (PhD Candidate, Michigan State University)

This week marks the start of my sixth year as a PhD student in the Microbiology and Molecular Genetics program at Michigan State University. I have been extremely fortunate to attend a professional conference in my research field every summer since I began my graduate program, and at four of those five conferences I have presented a poster describing my research (Figure 1). Until this year, there was a standard procedure for writing and designing a scientific poster. This year, that all changed…

Figure 1: IMC11 poster session, July 2018 (Lodge, D.J., Cantrell, S.A., Luangsa-ard, J. et al. IMA Fungus (2018) 9: 52. https://doi.org/10.1007/BF03449438)

It all started with a video produced by Mike Morrison (@mikemorrison), an Industrial/Organizational Psychology PhD student at Michigan State University. In the video, Mr. Morrison argues that the standard poster format used by most academics is overly technical and usually obscures the main finding(s) of the science being presented (Figures 2 & 3). Moreover, because parsing a poster takes so long, most people attending a poster session can really engage with only 3 to 6 posters in an hour, which severely limits how much potentially useful knowledge spreads through the scientific community. Mr. Morrison proposes an alternative design, which he calls “Poster 2.0” (Figure 4, Video). The biggest changes to the layout are 1) a large, central, simple takeaway message that summarizes the point of the poster in accessible language; 2) a “standalone” bar on the left with a very basic introduction, methods, and discussion; and 3) an “ammo” bar on the right with anything the presenter might want to have handy when talking about the poster. The standalone bar lets someone read about your research in more detail without needing to engage with the presenter. Mr. Morrison also recommends including a QR code, ideally linking to the paper associated with the poster, so that readers can access any additional detail they want.

Figure 2: “Poster 1.0” by Mike Morrison

Figure 3: My “Poster 1.0” at IMC11, July 2018.

Figure 4: “Poster 2.0” by Mike Morrison

“How to create a better research poster in less time (including templates)” by Mike Morrison

The big advantage of the Poster 2.0 format is that the takeaway message is highly accessible: a short, prominently displayed message in plain language and large font that can be read and understood in the time it takes to walk past the poster. The poster is also supposed to be quick and easy to write, since the language is simple and the “ammo bar” is unformatted. The disadvantage of this format is that the very low level of detail in the introduction, methods, and intermediate results makes it somewhat difficult for a reader to learn more about the project when the presenter is absent. Since most posters are hung and available all day before the actual poster session, this can be a real drawback. Some would say this simply means that, during the poster session, the reader will come and discuss the poster with the presenter, or read the paper via the QR code (if there is a paper, and if the reader has a QR scanner on their phone). However, I have also overheard some more “old-fashioned” academics who regard this lack of immediately available detail, and the new format itself, as “gimmicky” and faintly unacceptable or unprofessional.

Being a rebellious, tech-loving millennial, I decided to give the Poster 2.0 format a shot when writing a poster for a conference earlier this month. The first thing I learned was that it’s actually really hard to distill the main takeaway message, especially from preliminary and incomplete results, which is what most posters describe. It’s also really hard to distill an introduction, methods, results, and discussion into less than a quarter of the poster space. Ultimately, writing this poster was no faster than writing any of my previous four. In the end, I created a hybrid version, which I jokingly called “Poster 1.5.” Poster 1.5 keeps the large, simple, prominent takeaway message and slightly abbreviated text, but has significantly more text than 2.0 and lacks the ammo bar. Finally, since I was presenting preliminary data, I had no paper for a QR code to point to, so I eliminated that element as well. I will point out, though, that a QR code doesn’t need to point to a paper; it could link to any form of supplementary multimedia (videos, audio, etc.), the presenter’s website, and much more.

It turns out I’m not the only one to have had the idea for a hybrid. There is an active academic Twitter community around Poster 2.0, with followers posting pictures of their implementations and adaptations. Dr. Andrew R. Smith (@AndrewRSmith), an Associate Professor of Psychology at Appalachian State University, posted a template for his own rendition of a Poster 1.5 (Figure 5).

Figure 5: “Poster 1.5” by Dr. Andrew R. Smith

When I presented my Poster 1.5, it drew more “traffic” than any of my previous posters, especially from a more general audience. In the past, most visitors to my posters were specialists who had picked out keywords from the title and worked with the same organism. With the main takeaway front and center, I also met people who were interested in my methods and intermediate findings and in their applications and implications for broader research. Discussion was more animated, and since my entire poster was already written in plainer language, it was much easier to develop a general “spiel” on the spot, rather than sifting through all the details and trying to adapt a spiel for each listener. It was much easier to add detail than to subtract it.

In conclusion, I think the conversation and experimentation that Mr. Morrison instigated have been invaluable to academia. The same-old Poster 1.0 format has been so standard that nobody (except Mr. Morrison) even questioned whether there might be a better way. Simply challenging the status quo has radically shaken up poster design, and new adaptations are being explored all the time. I’m very excited to see how poster design continues to evolve, and I expect we will see the rise of many new poster “species” tailored to the needs of different fields and content types.

Posted in Uncategorized

BEACON Team wins Best Paper Award in Evolutionary Machine Learning Track at GECCO 2019

Zhichao Lu and colleagues accepting the Best Paper Award at GECCO 2019

Congratulations to BEACONites Zhichao Lu, Ian Whalen, Vishnu Boddeti, Yashesh Dhebar, Kalyanmoy Deb, Erik Goodman, and Wolfgang Banzhaf! Their paper “NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm” won the Best Paper Award in the Evolutionary Machine Learning track at GECCO 2019 in Prague.

In total, 64 papers were submitted to the Evolutionary Machine Learning (EML) track, and only 16 were accepted as full papers. Two papers were nominated for the Best Paper Award, and Zhichao Lu and colleagues won based on on-site voting by conference attendees.

Here is the abstract of the paper, which can be accessed from arXiv: https://arxiv.org/abs/1810.03522

This paper introduces NSGA-Net – an evolutionary approach for neural architecture search (NAS). NSGA-Net is designed with three goals in mind: (1) a procedure considering multiple and conflicting objectives, (2) an efficient procedure balancing exploration and exploitation of the space of potential neural network architectures, and (3) a procedure finding a diverse set of trade-off network architectures achieved in a single run. NSGA-Net is a population-based search algorithm that explores a space of potential neural network architectures in three steps, namely, a population initialization step that is based on prior-knowledge from hand-crafted architectures, an exploration step comprising crossover and mutation of architectures, and finally an exploitation step that utilizes the hidden useful knowledge stored in the entire history of evaluated neural architectures in the form of a Bayesian Network. Experimental results suggest that combining the dual objectives of minimizing an error metric and computational complexity, as measured by FLOPs, allows NSGA-Net to find competitive neural architectures. Moreover, NSGA-Net achieves a comparable error rate on the CIFAR-10 dataset when compared to other state-of-the-art NAS methods while using orders of magnitude less computational resources. These results are encouraging and show the promise to further use of EC methods in various deep-learning paradigms.

The source code for the paper can be accessed from GitHub: https://github.com/ianwhale/nsga-net
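NSGA-Net’s goal of finding “a diverse set of trade-off network architectures” rests on Pareto dominance between its objectives (here, error and FLOPs). As a minimal sketch, not code from the NSGA-Net repository above, the following Python filters a population of candidate architectures down to its first non-dominated front. The architecture names and numbers are invented for illustration, and the full NSGA-II style selection also ranks later fronts and uses crowding distance, which this sketch omits.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str     # hypothetical architecture label
    error: float  # classification error rate (lower is better)
    flops: float  # computational cost, e.g. in MFLOPs (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a.error <= b.error and a.flops <= b.flops
    strictly_better = a.error < b.error or a.flops < b.flops
    return no_worse and strictly_better


def first_front(population: list[Candidate]) -> list[Candidate]:
    """Candidates that no other candidate dominates: the best available trade-offs.
    (A candidate never dominates itself, so no self-check is needed.)"""
    return [c for c in population if not any(dominates(o, c) for o in population)]


if __name__ == "__main__":
    population = [
        Candidate("A", 0.040, 900.0),
        Candidate("B", 0.035, 1500.0),
        Candidate("C", 0.050, 300.0),
        Candidate("D", 0.045, 1000.0),  # dominated by A: more error AND more FLOPs
        Candidate("E", 0.060, 350.0),   # dominated by C
    ]
    print([c.name for c in first_front(population)])  # ['A', 'B', 'C']
```

None of A, B, or C is best on both objectives at once; each is the best available error for its compute budget, which is exactly the kind of trade-off set the abstract describes.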

Posted in BEACON in the News

Genome Hackers – a near-peer, interdisciplinary summer program for high school girls

By: Cindy Yeh, Graduate Student (Dunham Lab, Genome Sciences), University of Washington

Women make up only 26% of the professional computing workforce, and less than 10% of these women are women of color (ncwit.org). By contrast, the gender distribution in the life sciences is much closer to 50%. As technology plays an increasingly important role in our lives, it is imperative to address this gender disparity by giving young women early access to and exposure to computational thinking.

I was introduced to programming as a high schooler, but never really learned how to code until I started my PhD in the Genome Sciences Department at the University of Washington. Programming felt more intuitive when I was implementing a biological concept, such as finding the longest matching pattern in a DNA sequence using a suffix array or extracting information from FASTA files. Learning computer science can be intimidating, but I figured that if this approach helped me understand its logic, it could be a great way to introduce young women to coding and make technical fields more accessible. Indeed, many research studies have found that integrated approaches are much more effective than traditional, non-interdisciplinary curricula. However, developing integrated lessons requires many hours of professional development, which many teachers may not have time for (Lin et al., 2018; Struyf et al., 2019; Salami et al., 2015; Stohlmann et al., 2012; Thibaut et al., 2018).
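The post mentions finding the longest matching pattern in a DNA sequence with a suffix array. Reading that as the longest repeated substring (our interpretation), a minimal Python sketch looks like this. It builds the suffix array naively by sorting all suffixes, which is fine for teaching-sized sequences but is not how genome-scale tools build one; the example sequence is made up.

```python
def suffix_array(s: str) -> list[int]:
    """Start positions of all suffixes of s, sorted lexicographically (naive construction)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_repeat(s: str) -> str:
    """Longest substring that occurs at least twice in s (occurrences may overlap).

    Key idea: once suffixes are sorted, any repeated substring is a shared prefix
    of two suffixes, and the longest one is shared by some ADJACENT pair.
    """
    sa = suffix_array(s)
    best_len, best_start = 0, 0
    for i, j in zip(sa, sa[1:]):
        k = common_prefix_length(s[i:], s[j:])
        if k > best_len:
            best_len, best_start = k, i
    return s[best_start:best_start + best_len]


if __name__ == "__main__":
    dna = "GATTACAGATTACCA"
    print(longest_repeat(dna))  # GATTAC
```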

In 2017, as first-year graduate students, my colleague Andria Ellis and I received a small grant from the National Center for Women & Information Technology (NCWIT) to run a one-week, half-day summer camp for high school girls called Genome Hackers. Our goal was not for participants to walk away as experts in computer science or genomics, but to introduce them to concepts they would otherwise have no opportunity to learn before college. The idea was that if they encountered these topics again in the future, the topics would seem less abstract or intimidating. We also wanted to teach real-world applications of computer science, specifically how it is used in genomics. With a team of graduate student instructors, participants learned how to perform PCR to isolate and amplify a particular gene and then Sanger sequence the PCR product to retrieve raw sequences. At the same time, we taught them the basics of programming in Python. By the end of the camp, participants had written transcription and translation scripts that take their Sanger sequencing results and determine the amino acid sequences of their gene. They also shared their sequencing results with other students and built a phylogenetic tree to investigate the relatedness of the same gene across species, and they used their final amino acid sequences to generate predicted protein structures, which they compared across species as well.
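The campers’ own scripts are not reproduced in the post. As a minimal sketch of the same two steps, assuming the input is the coding (sense) strand read 5′ to 3′ from its first base and using the standard genetic code, transcription and translation could look like this in Python. The function names and example sequence are illustrative only.

```python
from itertools import product

# Standard genetic code. Codons are enumerated in UCAG order (first base varies
# slowest, third base fastest), matching the 64-letter string below; '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA: the same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str, to_stop: bool = True) -> str:
    """Translate mRNA codon by codon, starting at the first base.

    Stops at the first stop codon when to_stop is True; codons containing
    ambiguous bases (e.g. N from a noisy sequencing read) become 'X'.
    A trailing partial codon is ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE.get(mrna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    rna = transcribe(dna)
    print(translate(rna))                 # MAIVMGR
    print(translate(rna, to_stop=False))  # MAIVMGR*KGAR*
```

If a read comes from the template strand instead, it would need to be reverse-complemented before transcription.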

Figure 1: 2017 participants learning how to pipette

Genome Hackers culminates in a poster session where the students share their many accomplishments from the week with scientists in the department (and with their family and friends!). This really helps tie the week together, and participants walk away with something concrete to show off. Our camp is also affordable ($50/week, with scholarships available), in contrast to many other biotechnology camps, which usually cost at least $300 per week (sometimes upwards of $500 per participant!) and where fees can be a deciding factor for many applicants.

Figure 2: Participants working hard on their transcription and translation scripts

After receiving overwhelmingly positive feedback from graduate students, faculty, teachers, and parents, we will be running Genome Hackers in 2019 for its third year in a row. We are also running iterations of this camp at two other sites (SoundBio Labs and the University of Chicago), where we will determine which aspects of our current curriculum are easy to implement and which need improvement. Our ultimate goal is to package our program into something any high school biology teacher or graduate student can pick up and implement on their own, without Andria or me present.

Figure 3: 2017 participants presenting their findings to teachers, parents, and scientists at the University of Washington

Figure 4: 2018 participants presenting their findings to teachers, parents, and scientists at the University of Washington

Several of our former participants have now also participated in Girls Who Code at Fred Hutch or gone on to pursue technical degrees. One former participant has even returned to Genome Hackers as a near-peer mentor and may lead her own session this year. I never would have guessed that this was something I would accomplish (or even want to accomplish) as a graduate student. While I did put a lot of energy towards outreach and service as an undergraduate, being able to take what I have learned in the lab as a graduate student and materialize it into teaching high school students has been one of the most rewarding activities I’ve ever pursued in and outside of my scientific career. Andria and I are also both very lucky that our PIs (Cole Trapnell and Maitreya Dunham, respectively) appreciate outreach activities and continue to encourage us to pursue them.

Figure 5: Group photo from 2018. Cindy and Andria are at the ends of the front row.

We are always looking for new ideas and for collaborators who may be interested in running their own version of Genome Hackers. We have a website (genomehackers.org) and an e-mail address (genomehackersuw@gmail.com), and we would love to hear your comments.

Figure 6: Students’ confidence and interest levels before and after Genome Hackers

Participant Testimonials:

“I have been taught coding before, but I feel like […this program] introduced a new coding language very well.”

“I liked how I got to see how programming aided genome scientists.”

“My favorite part was getting to learn a new coding language, and combining two of my passions.”

“I wasn’t very interested in coding, but after actually doing some coding I now really like it and I might look into doing coding for a career with biology.”

“I will remember creating my first science poster. It felt amazing learning how to reach a conclusion and finally getting to have something to show for it.”

“I was really proud of myself for figuring out how to code a DNA strand into RNA.”

References

Lin, Y.-T., Wang, M.-T., Wu, C.-C., 2018. Design and Implementation of Interdisciplinary STEM Instruction: Teaching Programming by Computational Physics. The Asia-Pacific Education Researcher 28, 77–91. doi:10.1007/s40299-018-0415-0

Salami, M.K.A., Makela, C.J., Miranda, M.A.D., 2015. Assessing changes in teachers’ attitudes toward interdisciplinary STEM teaching. International Journal of Technology and Design Education 27, 63–88. doi:10.1007/s10798-015-9341-0

Stohlmann, M., Moore, T., Roehrig, G., 2012. Considerations for Teaching Integrated STEM Education. Journal of Pre-College Engineering Education Research 2, 28–34. doi:10.5703/1288284314653

Struyf, A., Loof, H.D., Pauw, J.B.-D., Petegem, P.V., 2019. Students’ engagement in different STEM learning environments: integrated STEM education as promising practice? International Journal of Science Education 41, 1387–1407. doi:10.1080/09500693.2019.1607983

Thibaut, L., Knipprath, H., Dehaene, W., Depaepe, F., 2018. The influence of teachers’ attitudes and school context on instructional practices in integrated STEM education. Teaching and Teacher Education 71, 190–205. doi:10.1016/j.tate.2017.12.014

Posted in Diversity in STEM, Education

The devil in the closet

By: Dr. Wenying Shou – Fred Hutchinson Cancer Research Center

Sometimes in science, a seemingly straightforward journey can take an enormous amount of time. Our paper in PLoS Biology (Hart et al., 2019) was one such journey. The question seemed easy enough: for a highly simplified microbial community (two yeast strains engineered to help each other, or “cooperate”), could we predict how fast the community would grow?

If you think that this question is esoteric, it is not. Cooperation is surprisingly common in biology: pathogenic bacteria cooperate with each other to launch infections; microbes in sewage treatment sludge cooperate to break down wastes. The faster a community can grow, the more likely it will survive perturbations or advance to new territories. Ultimately, a quantitative understanding of microbial communities will empower us to control and use communities as, for example, probiotics.

One remarkable aspect of this work, its very long and turbulent gestation, is invisible in the data themselves. While responding to journal reviewers’ critiques, I felt the urge to write down this untold story of scientific discovery.

A humble dream

The project started when I was a postdoc about 17 years ago. To tie biology with mathematics, I joined a physicist’s lab at the Rockefeller University in New York City. I wanted to see how interacting “parts” of a biological system might generate quantitative properties of the system as a “whole”.

All biological systems consist of parts. For example, an ecological community consists of interacting species, and the human body consists of different cell types. A quantitative understanding of how a biological system works can be very powerful. For example, it could help us predict what would happen if we were to perturb a part.

A mathematical model consists of one or more equations. An equation describes how different quantities are linked to each other. For example, how fast population size changes equals how fast new members are added through birth, minus how fast the existing members die. The growth and death rates are examples of model parameters.
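Written as an equation, the sentence above is dN/dt = bN - dN, where N is population size and b and d are per-capita birth and death rates (the model parameters). A minimal numerical sketch in Python, with arbitrary illustrative parameter values rather than anything measured in this work, steps the equation forward in small time increments and compares the result with the closed-form solution N(t) = N0·exp((b - d)t).

```python
import math


def simulate_population(n0: float, birth_rate: float, death_rate: float,
                        t_end: float, dt: float = 0.001) -> float:
    """Forward-Euler integration of dN/dt = birth_rate*N - death_rate*N.

    birth_rate and death_rate are per-capita rates (per unit time);
    returns the population size N at time t_end.
    """
    n = n0
    for _ in range(round(t_end / dt)):
        births = birth_rate * n * dt  # members added during this small step
        deaths = death_rate * n * dt  # members lost during this small step
        n += births - deaths
    return n


if __name__ == "__main__":
    n0, b, d, t = 1000.0, 0.30, 0.05, 10.0  # illustrative values only
    numeric = simulate_population(n0, b, d, t)
    exact = n0 * math.exp((b - d) * t)     # closed form: N(t) = N0 * exp((b - d) * t)
    print(f"Euler: {numeric:.0f}, exact: {exact:.0f}")
```

With a small enough time step the two agree closely; the model’s behavior is fully determined once b and d are known, which is why measuring parameters matters so much in the rest of this story.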

Back then, people had been modeling biological systems such as ecological communities, gene regulatory networks, and the cell division cycle. Some models matched data beautifully. However, the renowned mathematician and computer scientist John von Neumann once said, “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” In other words, given enough “free parameters” (parameters one can choose freely, rather than having their values constrained by experimental measurements), a model can be made to fit any data. A model that fits the data can explain them, but that does not mean the model is correct or can predict new data.
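Von Neumann’s quip can be made concrete: a model with as many free parameters as data points can fit any data exactly. In this minimal Python sketch (our illustration, not part of the original work), the “model” is a polynomial whose five coefficients are the free parameters, built by Lagrange interpolation, and the “data” are random numbers with no underlying law at all.

```python
import random
from typing import Callable


def lagrange_fit(xs: list[float], ys: list[float]) -> Callable[[float], float]:
    """Return the unique polynomial of degree len(xs) - 1 passing through every point.

    Its len(xs) coefficients act as free parameters, so it fits ANY data exactly
    (the x values must be distinct).
    """
    def model(x: float) -> float:
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model


if __name__ == "__main__":
    random.seed(1)
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [random.uniform(0.0, 10.0) for _ in xs]  # pure noise: no law to discover
    model = lagrange_fit(xs, ys)                  # five "free parameters"
    worst = max(abs(model(x) - y) for x, y in zip(xs, ys))
    print(f"largest fitting error: {worst:.2e}")       # essentially zero
    print(f"'prediction' at x = 8: {model(8.0):.1f}")  # an arbitrary extrapolation
```

The fit is perfect, yet the “prediction” outside the data means nothing, because the model was never constrained by anything except the points it was forced through.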

To avoid the “free parameter” problem, I decided to start with a very simple system, one in which I would know exactly how the parts interact with each other. I could then write down the equations, know which parameters need to be measured, and measure all of them. After much deliberation, I decided to engineer a highly simplified cooperative yeast community consisting of two strains, each supplying the other with an essential metabolite. My colleagues and I came up with a lovely name for it: CoSMO, for Cooperation that is Synthetic and Mutually Obligatory. Unlike real-life communities, where scientists often have trouble counting the number of species, CoSMO has two and only two strains. Unlike real-life communities, where species influence each other by releasing many uncharacterized chemicals, in CoSMO each strain releases only one metabolite, which is required and consumed by its partner. Moreover, in CoSMO the two strains coexist because of their interdependence, so I do not need to worry about losing either of them.

Since I know exactly how the two strains interact with each other, it should in principle be easy to predict community properties, such as how fast the community grows (the community growth rate). Community growth rate depends primarily on two traits of each strain: its metabolite release rate and the amount of metabolite it consumes per birth. Thus, modeling community growth rate boils down to measuring four parameters. This is by no means ambitious!
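To see why four parameters should suffice, here is a back-of-envelope sketch under our own simplifying assumptions (no cell death, released metabolite consumed as fast as it is made, both strains growing at the same steady rate g); it is not necessarily the exact model in Hart et al. (2019). Strain B’s births are fueled by A’s metabolite, so g = r_A·N_A/(c_B·N_B); symmetrically, g = r_B·N_B/(c_A·N_A), where r is a strain’s metabolite release rate per cell and c is the metabolite consumed per birth. Multiplying the two cancels the unknown population ratio and leaves g = √(r_A·r_B/(c_A·c_B)). The numbers below are placeholders, not measurements.

```python
import math


def community_growth_rate(r_a: float, c_a: float, r_b: float, c_b: float) -> float:
    """Steady-state growth rate g of a two-strain mutualism (back-of-envelope sketch).

    r_a, r_b: metabolite release rate per cell of strains A and B (e.g. fmole/cell/hour)
    c_a, c_b: metabolite consumed per birth by strains A and B (e.g. fmole/cell)

    Simplifying assumptions (ours): no death, released metabolite is consumed as
    fast as it is made, and both strains grow at the same rate g. Then
        g = r_a*N_a / (c_b*N_b)   (B's births are fueled by A's metabolite)
        g = r_b*N_b / (c_a*N_a)   (A's births are fueled by B's metabolite)
    and multiplying the two cancels N_a/N_b: g**2 = r_a*r_b / (c_a*c_b).
    """
    return math.sqrt((r_a * r_b) / (c_a * c_b))


if __name__ == "__main__":
    # Placeholder values, NOT measurements from CoSMO.
    g = community_growth_rate(r_a=0.4, c_a=4.0, r_b=0.4, c_b=4.0)
    print(f"g = {g:.2f} per hour, doubling time = {math.log(2) / g:.1f} hours")
```

The catch, as the rest of this story shows, is that r and c measured in batch culture need not equal their values inside the community.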

Mission aborted

I measured all four parameters. I measured the metabolite release and consumption traits of each strain in the absence of its partner. I got rid of the partner so that the released metabolites would accumulate in the test tube for me to measure, instead of being immediately consumed by the partner. However, since the partner was not present, the measurement environment (called the “batch culture” environment) differed from the community environment. For example, to measure metabolite consumption in batch cultures, I would add a high dose of metabolite at the beginning of an experiment. In contrast, in communities, the consumed metabolite is constantly supplied by the partner at a low level. Measuring strain traits in a community-like environment would require a special experimental setup. So, I had to hope that the batch culture environment could approximate the community environment.

After measuring the four parameters, I predicted community growth rate. However, my prediction was way off from the experimental results. I was disappointed. In theory, a mathematical model is useful when it fails, because failure suggests that we are still missing important pieces. In reality, I was far from thrilled by the failure, because too many pieces could be missing, even in a system as simple as CoSMO. For example, the batch culture environment might not approximate the community environment; cells could be evolving… The problem immediately became monstrously messy and inelegant: a devil.

Eventually, I aborted the mission. I was forced to ask whether I could explain some other community property instead: the minimal total cell density required for the community to start growing. That calculation required measuring more parameters, such as cell growth rates at various metabolite concentrations. I did not have the experimental setup for such measurements, so I gave in to free parameters. I did what was, and still is, commonly done: I looked up literature values. The literature values varied over an order of magnitude, so I naturally chose the value that could explain my data. I felt guilty, but comforted myself by noting that at least the free parameter I chose was not outrageous, and that at least the fraction of free parameters in my model was far lower than in most other models. This reasoning did not exonerate me, but it helped bring closure to my postdoc project (Shou et al., 2007).

Haunted by the devil

When I started my lab at the Hutch, I promptly locked the devil up in a closet. I felt caught in between: on the one hand, grant reviewers kept punishing me because “CoSMO is too simple”; on the other hand, I could not even understand a very basic property of the community. My modeling failure was truly humiliating. There was no way I could rephrase the question in an exciting fashion to attract anyone, possibly even myself.

My group started working on other, sexier problems, such as how the two cooperating strains might fend off cheaters that consume but do not contribute metabolites (Momeni et al., 2013b; Waite and Shou, 2012).

Despite my group members’ successes, the devil kept haunting me. Babak Momeni, then a postdoctoral fellow in my lab, examined spatial patterning in CoSMO growing on an agarose pad. When we compared the patterns predicted by our model with those observed in experiments, they looked similar qualitatively. However, the timing looked very different. This was not surprising, given that we did not understand how fast the community grows. Fortunately, dynamics were not the focus of that paper, so we erased all time stamps from our simulations (Momeni et al., 2013a).

Figure 1. CoSMO patterning. The two cooperating strains were engineered to express green or red fluorescent proteins and can thus be distinguished under a microscope. Time stamps are shown for the experiment (right) but not for the simulation (left). This figure is from (Momeni et al., 2013a).

Devil breaking loose

Eventually, the devil of my past failure would not allow me to ignore it any further.

Chi-Chun Chen and Jose Pineda, two talented group members, were quantifying metabolite release rates of evolved cells. They wanted to see whether cells could evolve to be more “generous” by releasing more. However, Chi-Chun and Jose were getting highly variable results despite their superb experimental skills. It seemed that we got stuck when the question turned quantitative.

We suspected that the variable measurement results could be due to cell traits being highly sensitive to the measurement environment. To enable measurements in a community-like environment, David Skelding, a physicist in the lab, started to build devices called “chemostats”. In these chemostats, nutrients were supplied in small doses (small drops) but frequently (every tens of seconds), mimicking the partner strain’s slow but constant metabolite release. It took David a good year or more to make the chemostats work reliably and precisely (Skelding et al., 2018).

 

Figure 2. Chemostats. This homemade multiplexed chemostat has eight culturing chambers (tubes with yellow stoppers). The syringe pump on the left pushes fresh medium into the chambers through tubing. Sterile humidified air was also introduced into the chambers to push out excess waste. This figure is from (Skelding et al., 2018).

Taming the devil

When Sam Hart joined my lab as a research technician, he inherited the problems I, Chi-Chun, and Jose had left behind.

Initially, Sam continued to quantify strain traits in batch cultures because, after all, chemostat measurements are much harder and are limited by the number of chambers. However, at some point we realized that without putting our fundamentals on a solid footing, we would be chasing our tails: if we did not understand the two ancestral strains (i.e. why the ancestral strains’ traits could not explain the ancestral community’s growth rate), there was no point trying to understand evolved strains.

By that time, Sam had already invested a year or two. But Sam was unflustered because he understood the importance of asking the right, albeit inconvenient, question. Sam re-measured the ancestral strains’ metabolite release and consumption traits in David’s chemostats. By controlling how slowly metabolites were supplied, Sam could force cells to grow at the various slow rates observed in CoSMO. However, chemostats introduced their own devil: under metabolite limitation, cells from both strains quickly adapted and evolved away from their original states. Sam then figured out ways to deal with this new problem.
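The chemostat principle behind this step can be put in numbers. In an idealized chemostat at steady state, the cells’ specific growth rate equals the dilution rate D = F/V (inflow rate divided by culture volume), so supplying medium more slowly forces slower growth. The short Python sketch below computes D and the implied doubling time for drop-wise feeding. The drop volume, interval, and chamber volume are made-up illustrative values, not the settings from Skelding et al. (2018), and averaging the drops into a continuous flow is a simplifying assumption.

```python
import math

def dilution_rate_per_hour(drop_volume_ml: float, drop_interval_s: float,
                           culture_volume_ml: float) -> float:
    """Time-averaged dilution rate D = F / V for drop-wise (pulsed) feeding.

    F is the mean inflow in ml per hour. Treating the pulses as a continuous
    flow is only reasonable when each drop is tiny relative to the culture.
    """
    flow_ml_per_hour = drop_volume_ml * (3600.0 / drop_interval_s)
    return flow_ml_per_hour / culture_volume_ml

def steady_state_doubling_time_hours(dilution_rate_per_h: float) -> float:
    """In an idealized steady-state chemostat, growth rate equals D, so T_d = ln(2) / D."""
    return math.log(2) / dilution_rate_per_h

# Hypothetical settings (not the published ones): 10 µl drops every 36 s into 20 ml.
D = dilution_rate_per_hour(0.01, 36.0, 20.0)
Td = steady_state_doubling_time_hours(D)
print(f"D = {D:.3f} per hour, doubling time = {Td:.1f} h")
```

With these hypothetical numbers the sketch gives D = 0.05 per hour, i.e. a doubling time of roughly 14 hours; halving the drop frequency would halve D and double the doubling time.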

Figure 3. Ancestral versus evolved clones. On agarose with low metabolite levels, ancestral cells failed to divide (arrows). Cells from a mildly adapted evolved clone (center) showed mixed phenomena: some cells remained undivided (arrow), while others formed microcolonies of various sizes. Cells from a strongly adapted evolved clone formed uniformly large microcolonies. These images were taken with a cell phone camera and thus do not have a scale bar. For reference, an average yeast cell (e.g. black dots in “anc”) has a diameter of ~5 µm. Image from (Hart et al., 2019).

Eventually, Sam discovered that, indeed, measurements of metabolite release and consumption traits could differ significantly in chemostats versus batch cultures. Hanbing Mi, an undergraduate visiting student from China, figured out how to properly measure community growth rate when cells could evolve quickly. Once we took all of these factors into consideration, we solved the puzzle (Hart et al., 2019). But only partially: we still do not understand CoSMO’s initial phase of slower growth.

Figure 4. Model can explain experimental observations of CoSMO long-term growth rate. Model predictions matched experimental observations (purple) when parameters were measured in community-like chemostat environments (green), but not when parameters were measured in batch culture environments (blue). Error bars mark 95% confidence intervals. This figure is from (Hart et al., 2019).

Using the same quantification methodology that we have found to be trustworthy, Sam figured out what it means to be “generous” (Hart & Pineda et al., 2019), and which mutants evolved to be more generous (manuscript in preparation). Sam is now a graduate student at the University of Washington.

Summary

It takes a lot to do careful science. For science to advance, it must stand on a solid foundation. By demonstrating how to properly model a very simple living system, we have helped set the standard for future modeling of more complex systems, such as probiotic communities or infectious diseases.

Acknowledgements

I am very grateful to my lab members, particularly Sam Hart, Hanbing Mi, Jose Pineda, Chi-Chun Chen, and David Skelding, for doing high-quality work.

Hart SFM, Pineda JMB, Chen C-C, Green R, Shou W. 2019. Disentangling strictly self-serving mutations from win-win mutations in a mutualistic microbial community. eLife (accepted).

Hart SFM, Mi H, Green R, Xie L, Pineda JMB, Momeni B, Shou W. 2019. Uncovering and resolving challenges of quantitative modeling in a simplified community of interacting cells. PLOS Biol 17:e3000135. doi:10.1371/journal.pbio.3000135

Momeni B, Brileya KA, Fields MW, Shou W. 2013a. Strong inter-population cooperation leads to partner intermixing in microbial communities. eLife 2:e00230. doi:10.7554/eLife.00230

Momeni B, Waite AJ, Shou W. 2013b. Spatial self-organization favors heterotypic cooperation over cheating. eLife 2:e00960. doi:10.7554/eLife.00960

Shou W, Ram S, Vilar JM. 2007. Synthetic cooperation in engineered yeast populations. Proc Natl Acad Sci USA 104:1877–1882. doi:10.1073/pnas.0610575104

Skelding D, Hart SF, Vidyasagar T, Pozhitkov AE, Shou W. 2018. Developing a low-cost milliliter-scale chemostat array for precise control of cellular growth. Quant Biol 6:129–141.

Waite AJ, Shou W. 2012. Adaptation to a new environment allows cooperators to purge cheaters stochastically. Proc Natl Acad Sci USA 109:19079–19086. doi:10.1073/pnas.1210190109

 

Posted in Uncategorized | Comments Off on The devil in the closet

Paul Turner elected to National Academy of Sciences

Professor Paul Turner was elected to the National Academy of Sciences earlier this week (following his election to the American Academy of Arts & Sciences two weeks ago).

Paul Turner is a professor of Ecology and Evolutionary Biology at Yale University, and he joined BEACON as a Faculty Affiliate in 2013. Since then, he has been involved in many BEACON projects and has mentored several BEACON trainees. He spoke about his fascinating work on using viruses to control antibiotic-resistant bacteria at last summer’s BEACON Congress.

When announcing this exciting news to BEACON, Rich Lenski wrote, “Paul has done beautiful work on the evolution of viruses, including intriguing issues that arise when multiple virions infect the same host cell. In the last few years, Paul has also been performing clever, life-saving (literally) experiments in which phages (viruses that infect bacteria) are chosen that target bacterial pathogens that are resistant to every available antibiotic.”

You can read more about Paul and his work here: https://turnerlab.yale.edu/

Here’s a write up of one of the cases of using viruses to fight antibiotic-resistant bacteria:
https://www.statnews.com/2016/12/07/virus-bacteria-phage-therapy/

Congratulations, Paul!

Posted in BEACON in the News | Tagged | Comments Off on Paul Turner elected to National Academy of Sciences

Using a course-based undergraduate research experience to increase leadership opportunities for students

By: Katie Dickinson, research scientist, Kerr Lab (Department of Biology), University of Washington

Katie Dickinson is a research scientist based out of the Kerr Lab (Department of Biology) at the University of Washington

Course-based Undergraduate Research Experiences (CUREs) are becoming increasingly popular, as they enable all students to gain the positive outcomes associated with undergraduate research. In a CURE, students investigate real-world research questions without predefined outcomes.

With support from BEACON and the Howard Hughes Medical Institute, our team has developed a CURE on experimental evolution of antibiotic resistance in Escherichia coli for the introductory biology sequence at the University of Washington. In our CURE, students isolate bacterial strains that are sensitive and resistant to rifampicin and streptomycin, do daily transfers to conduct experimental evolution, and gather and analyze data on variation in level of resistance, the fitness effects of resistance, and collateral effects. In addition, students analyze the products of their own evolution experiments: they sequence the relevant gene(s) of their sensitive and resistant bacterial isolates, look for mutations, and explore how those mutations change protein structure and cellular processes. In this way, students gain an understanding of the genetic and phenotypic basis of drug resistance.

Currently, our CURE is being scaled so that several thousand students per year can participate. The goals of our new curriculum include improving undergraduate students’ understanding of key evolutionary concepts and their ability to design experiments, while also increasing their emotional engagement with their learning, academic performance, confidence, resiliency, and professional identity. One of our CURE’s keys to success: peer facilitators.

Peer facilitators (PFs) work with graduate teaching assistants (TAs) to run each session in the CURE sequence. In lab, PFs assist the TA by 1) demonstrating lab techniques, 2) answering student questions, and 3) facilitating active learning activities designed to increase understanding of evolutionary theory and experimental results. Their help is crucial because in many cases the PFs, having previously completed the CURE as students, have a deeper understanding of the protocols and underlying biology than the TAs, who are often new to the CURE. In addition, PFs play a key mentoring role for their younger peers, offering support, advice, and encouragement.

Peer Facilitator Margaux is assisting with lab preparations

Past PFs have said this experience helped them improve their communication and teaching skills, develop leadership qualities, reinforce their own study skills and science knowledge, and increase their confidence and motivation, in addition to enhancing their CVs. I asked current PFs for their thoughts on the program, and this is what a few of them had to say.

Bao N., a PF since autumn 2017.
“During my freshman year, I enrolled in the Biology CURE. While I never imagined taking the lead role in group projects, I worked diligently and did not hesitate to ask questions. To my surprise, at the end of the quarter, I was chosen among several students to become a peer facilitator, a mentor for students in the course’s next offering. I jumped at this opportunity, as it was my first leadership position in college. It remains one of the most meaningful experiences I have had at the UW. I learned to appreciate the rigorous scientific research happening during and after each class session. I learned to communicate effectively with students as well as other members of the teaching team. I learned to take responsibility for the knowledge and skills students receive, knowing that they will carry these skills into real-world settings, such as a clinic or a research lab. Being a PF is especially meaningful because I was able to support students more inclusively, especially when I can relate to the academic challenges a student can face in this class. Whereas the TA alone would have limited time to help individual students, my role allows me to spend a little extra time with each student. I was also able to incorporate my own experience as an alumnus of the same class to help the course developers build lesson plans. I gained many resources from my own peer facilitators and looked up to them as role models. In return, I strive to be very open with my students if they have questions or concerns about how to succeed in class, how to get involved with research, or how to apply to certain scholarships.”

Khoi H., a PF since winter 2019.
“Personally, I really enjoy the idea of the CURE lab. The lab itself is refreshing: in chemistry or physics labs, you come into lab having read a manual, and you can always Google what’s about to happen beforehand. That makes the lab just a boring contest of who can repeat what they found online, whereas the CURE lab is an immersive, collaborative effort by students and TAs to attempt to understand a subject. I immediately signed up to PF for CURE labs because I think this is a great addition to the curriculum. The CURE lab allows me to support students and encourage science in them, whilst maintaining the fun and educational environment. Having a PF is helpful because students are able to relate to the PF, since they are both undergrads, so students may be more comfortable asking PFs for help. This is beneficial to both TAs and students because we act as a communication bridge between the two. Although we only formally meet in the classroom, a PF can still assist students outside of class, whether that is in other courses, socially, or emotionally. Additionally, being a PF taught me to interpret material in various ways, making me more comfortable finding another way to explain it. Overall, having a PF is beneficial to the students, especially because it improves their understanding and allows them to be more engaged in the course.”

Grace D., a PF since winter 2017.
“Serving as a PF has furthered my love of teaching science in ways that are inclusive to all, as well as fostering a personal curiosity in research. Without this program, I would never have had the experience or confidence to pursue other research opportunities. It was also through meeting fellow undergrads interested in STEM education that I came to truly appreciate how extraordinary the CURE PF experience is. While the rest of my peers had similar experiences of tutoring and assisting students with worksheets during lecture, there was a unique difference in how we were able to take ownership of the course material and lab techniques, and also collaborate with, advise, and support both the students and other PFs. As a result, my career goals have shifted more toward research and academia, something I previously didn’t know anything about, never thought I would be interested in, nor believed that I was capable of. It has been an honor to be a part of this incredible CURE family, and I am deeply grateful for the ways it has pushed me to become a better scientist, teacher, and friend.”

Cindy T., a PF since autumn 2017.
“I never expected to find myself being a part of something like the CURE lab. During my freshman year, I came off as extremely quiet and shy around people – talking to classmates was something I did not voluntarily engage in. After going through the CURE program, I was surprised to be one of the many students that were eligible and selected to be a PF. At first, I doubted myself; would I be able to guide the undergrads in the “right” direction? However, my fears gradually subsided. The community members within the CURE program were so welcoming and accommodating. This small but growing community of TAs and PFs felt like a small family to me. Throughout my time as a PF in this program, I slowly gained the confidence to communicate more clearly and confidently. The concept of the CURE program also appealed to me. Giving undergrads the opportunity to gain lab experience while performing an actual experiment was something unheard of. Instead of doing a textbook lab experiment where there should be expected results, the data students obtain from this experiment do contribute to a greater cause – so there is some amount of real life application.”

Winter 2019 Peer Facilitator team. Back row, left to right: Bao, Grace, Khoi, Margaux, Deja, Julianna, and Angie. Front row, left to right: Sammi, Yuri, Richard, and Shannon. Not pictured: Ariel, Alena, Cindy, Lindsey, Rachael, Reilly, Tibebu, and Veronica.

LOOKING AHEAD:

One of our goals is to help low-income and underrepresented students build the skills and confidence needed to complete a STEM major. We aim to recruit PFs from diverse backgrounds to serve as role models in the classroom. In addition, we would like to create a PF mentoring ladder in which experienced PFs are partnered with newer PFs to encourage and train each other. At the core of the PF program are mentoring, research, and education. To support the PFs, we are developing additional resources and training modules that will cover topics such as active learning, mentoring, diversity and equity, career support, general team building, and undergraduate research. We hope that as PFs engage in peer mentoring and support activities, they will pay it forward and become leaders who teach others what they have learned.

Posted in BEACONites, Education | Comments Off on Using a course-based undergraduate research experience to increase leadership opportunities for students

Fish, You are the Father!

By: Isaac Miller-Crews, PhD Candidate, University of Texas at Austin

My job would be much easier if CVS sold paternity testing kits for fish instead of humans! I am interested in the evolution of the neural regulation of reproduction, which requires knowing whether an animal reproduced. Genetic testing, such as parentage analysis, allows us to figure out relationships among individuals without direct historical knowledge. This testing has generally relied on looking in the DNA for microsatellites, but we’re discovering new, more powerful, and cheaper ways to conduct these tests in the ‘Age of Big Data’ (Flanagan, 2018; Hodel, 2016). This is especially true if your fish population stubbornly refuses to have variable microsatellites!

Yet common standards or guidelines for dealing with next-generation sequencing data still need to be worked out (Flanagan, 2018). Importantly, few bioinformatic tools exist that can differentiate well between closely related individuals or deal with DNA mixtures. Looking at single nucleotide polymorphisms (SNPs) across thousands of genomic sites gives researchers significantly more information on variability among samples than standard microsatellite approaches (Hodel, 2016). A new technique called restriction site-associated DNA sequencing (RAD-seq) helps us narrow down which places on the DNA to look at, because it only sequences certain fragments, and which fragments you get depends on which restriction endonucleases you use to cut up the DNA. 2bRAD sequencing uses a type IIB endonuclease, which yields consistent fragments across your samples; not to mention, it’s very cost-effective (Wang, 2012).
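To illustrate why a type IIB enzyme gives consistent fragments, here is a minimal Python sketch of an in-silico digest. Because these enzymes cut a fixed distance on both sides of their recognition site, every fragment they release has the same length. The motif (CGA, six arbitrary bases, TGC) and the 10-base flanks are assumptions chosen for illustration; a real analysis would use the chosen enzyme’s documented site and cut positions and would scan both strands.

```python
import re

# Assumed recognition motif: CGA, any six bases, then TGC. It only stands in for
# "a type IIB site"; check the real enzyme's site and cut offsets before reuse.
SITE = re.compile(r"(?=(CGA[ACGT]{6}TGC))")  # lookahead so overlapping sites are found

def type2b_fragments(seq: str, left_flank: int = 10, right_flank: int = 10) -> list:
    """Excise a fixed-length fragment around every recognition site.

    A type IIB enzyme cuts on both sides of its site, so every released fragment
    spans left_flank + 12 + right_flank bases. Flank lengths are illustrative,
    only the forward strand is scanned, and sites too close to a sequence end
    to be cut on both sides are skipped.
    """
    seq = seq.upper()
    fragments = []
    for match in SITE.finditer(seq):
        start = match.start(1) - left_flank
        end = match.end(1) + right_flank
        if start >= 0 and end <= len(seq):
            fragments.append(seq[start:end])
    return fragments

# Toy sequence with two sites separated by filler.
genome = "G" * 12 + "CGATTTTTTTGC" + "A" * 15 + "CGAGGGGGGTGC" + "C" * 12
frags = type2b_fragments(genome)
print([len(f) for f in frags])
```

Both sites in the toy sequence yield 32-base fragments, which is the property that makes loci comparable across samples.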

The simplest form of paternity testing is exclusion, in which paternity is ruled out if a single site disagrees between the alleged father and the offspring-mother pair (Marshall, 1998). However, this approach is prone to errors, because a single genotyping mistake can wrongly exclude the true father (Wang, 2010). Parental and sibship reconstruction can generate full sets of possible parental genotype profiles but cannot be used with pooled offspring samples (Wang, 2004). The most common paternity testing technique uses a likelihood model to categorically assign paternity among individuals (Meagher, 1986). Not only does this approach require setting a threshold to call genotypes, but it also limits paternity to a comparison of only two alleged fathers (Marshall, 1998). Furthermore, this type of technique cannot deal with mixed or pooled samples, since it can only categorically assign paternity to one putative father.
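The exclusion logic described above is simple enough to sketch directly. This minimal Python version assumes each genotype is recorded as a pair of alleles per SNP locus and that the mother is known; the function names and the optional mismatch tolerance are illustrative, not taken from any specific package.

```python
from typing import List, Sequence, Tuple

Genotype = Tuple[str, str]  # the two alleles carried at one SNP locus

def compatible_at_locus(offspring: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the offspring can carry one maternal and one paternal allele."""
    a, b = offspring
    return (a in mother and b in father) or (b in mother and a in father)

def mismatching_loci(offspring: Sequence[Genotype], mother: Sequence[Genotype],
                     father: Sequence[Genotype]) -> List[int]:
    """Indices of loci at which the alleged father cannot be the sire."""
    return [i for i, (o, m, f) in enumerate(zip(offspring, mother, father))
            if not compatible_at_locus(o, m, f)]

def is_excluded(offspring, mother, father, max_mismatches: int = 0) -> bool:
    """Strict exclusion: one incompatible locus rules the male out.

    Raising max_mismatches (an assumption, not part of the classical test)
    tolerates a few genotyping errors at the cost of weaker exclusion.
    """
    return len(mismatching_loci(offspring, mother, father)) > max_mismatches

mother = [("A", "A"), ("C", "T")]
offspring = [("A", "G"), ("C", "C")]
male_1 = [("G", "G"), ("C", "T")]
male_2 = [("A", "A"), ("T", "T")]
print(is_excluded(offspring, mother, male_1), is_excluded(offspring, mother, male_2))
```

Here male_1 is compatible at both loci and is not excluded, while male_2 fails at both. The strict rule is also what makes exclusion fragile: one miscalled genotype in the true father is enough to rule him out.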

Luckily, there is always a Bayesian approach! Partial paternity testing assigns fractions of the offspring to candidate parents based on their Bayesian posterior probabilities (Hadfield, 2006) and outperforms categorical likelihood models, especially in its ability to circumvent systematic biases, such as over-assigning paternity to males with relatively more homozygous loci (Devlin, 1988). Assigning partial paternity is thus perfect if you want to assess an entire brood, clutch, or litter at once!
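A stripped-down version of fractional assignment can be sketched as follows. For each offspring, it computes the Mendelian likelihood of its genotype under each candidate father (given the known mother), converts these to posterior probabilities under a uniform prior, and then averages over the brood to give each male’s share. This is a simplified illustration under stated assumptions (biallelic loci, a crude genotyping-error term, no behavioral or spatial priors); it is not the model of Hadfield et al. (2006) and it does not handle pooled DNA.

```python
import math
from typing import Dict, List, Sequence, Tuple

Genotype = Tuple[str, str]

def transmit(parent: Genotype, allele: str) -> float:
    """Mendelian probability that a parent passes on `allele`."""
    return parent.count(allele) / 2.0

def mendelian_prob(offspring: Genotype, mother: Genotype, father: Genotype) -> float:
    """P(offspring genotype | mother, father) at one locus."""
    a, b = offspring
    if a == b:
        return transmit(mother, a) * transmit(father, a)
    return transmit(mother, a) * transmit(father, b) + transmit(mother, b) * transmit(father, a)

def paternity_posteriors(offspring: Sequence[Genotype], mother: Sequence[Genotype],
                         fathers: Dict[str, Sequence[Genotype]],
                         error: float = 0.01) -> Dict[str, float]:
    """Posterior over candidate fathers for one offspring.

    Assumes a uniform prior, biallelic loci, and a crude error model that mixes
    the Mendelian probability with a uniform 1/3 over the three genotypes.
    """
    log_lik = {}
    for name, father in fathers.items():
        log_lik[name] = sum(
            math.log((1.0 - error) * mendelian_prob(o, m, f) + error / 3.0)
            for o, m, f in zip(offspring, mother, father))
    best = max(log_lik.values())  # subtract the max before exponentiating, for stability
    weights = {name: math.exp(ll - best) for name, ll in log_lik.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def brood_paternity_shares(brood: List[Sequence[Genotype]], mother, fathers,
                           error: float = 0.01) -> Dict[str, float]:
    """Fractional paternity: average the per-offspring posteriors over a brood."""
    shares = {name: 0.0 for name in fathers}
    for offspring in brood:
        for name, p in paternity_posteriors(offspring, mother, fathers, error).items():
            shares[name] += p / len(brood)
    return shares

mother = [("A", "A"), ("C", "C")]
fathers = {"M1": [("G", "G"), ("T", "T")], "M2": [("A", "A"), ("C", "C")]}
brood = [[("A", "G"), ("C", "T")], [("A", "A"), ("C", "C")]]
print(brood_paternity_shares(brood, mother, fathers))
```

In this toy brood each male clearly sired one of the two fry, so the shares come out at about one half each. Because shares are fractions rather than a single winner, a mixed-paternity brood is described directly instead of being forced onto one male.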

Most parentage testing techniques assume that parents are unrelated and that the pool of putative parents contains no close relatives, which can lead to troubling situations where full siblings are assigned parentage over the actual parents (Thompson, 1976). Populations with many closely related individuals pose a problem for both microsatellite and SNP assays due to the lower variation among samples. In these cases, only 100 SNPs are required to outperform microsatellites (Flanagan, 2018). If close relatives are suspected to be in the sample, broader pedigree analysis is often required, such as identity-by-state (IBS) matrix clustering. Yet, to date, only one study has attempted to combine IBS clustering with a paternity testing method (categorical assignment) or to apply it to genotyping-by-sequencing RAD-seq data (Gutierrez, 2017). If only someone could combine the awesome power of IBS matrix clustering with the staggering potential of partial paternity testing!
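As a rough sketch of the IBS side, the Python below computes pairwise identity-by-state from SNP genotypes coded as allele counts and flags unusually similar pairs. The 0/1/2 coding, the handling of missing calls, and the similarity threshold are assumptions for illustration; a full pipeline would go on to cluster the resulting matrix (for example, hierarchically on 1 - IBS) rather than stop at a cutoff.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

def ibs_similarity(g1: Sequence[Optional[int]], g2: Sequence[Optional[int]]) -> float:
    """Mean identity-by-state between two individuals.

    Genotypes are alternate-allele counts (0, 1, 2). At each locus the pair shares
    2 - |g1 - g2| alleles; dividing by 2 scales this to [0, 1]. Missing calls
    (None) are skipped.
    """
    shared, n = 0.0, 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += (2 - abs(a - b)) / 2.0
        n += 1
    return shared / n if n else float("nan")

def ibs_matrix(genotypes: Dict[str, Sequence[Optional[int]]]) -> Dict[Tuple[str, str], float]:
    """Pairwise IBS for every pair of individuals (upper triangle only)."""
    return {(x, y): ibs_similarity(genotypes[x], genotypes[y])
            for x, y in combinations(sorted(genotypes), 2)}

def flag_close_relatives(genotypes, threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Pairs at or above a hypothetical IBS threshold for 'possible close relatives'."""
    return [pair for pair, s in ibs_matrix(genotypes).items() if s >= threshold]

fish = {"a": [0, 1, 2, 2], "b": [0, 1, 2, 1], "c": [2, 1, 0, None]}
print(ibs_matrix(fish))
print(flag_close_relatives(fish, threshold=0.8))
```

Pairs flagged this way are the ones most likely to trigger the sibling-versus-parent confusion described above, so they are the natural candidates to examine before trusting a paternity call.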

The African cichlid fish Burton’s mouthbrooder, Astatotilapia burtoni, is a model system in social neuroscience that forms highly complex and dynamic social communities. Adult male A. burtoni are considered either territorial or non-territorial (Fernald, 1977). Males’ positions within the social dominance hierarchy are dynamic, as possession of territories is transient (Hofmann, 1999). A. burtoni spawn within territorial bowers, after which the female mouthbroods for around two weeks, during which time fry can be removed directly from the mother’s buccal cavity. Current estimates of male reproductive success usually integrate some combination of female behaviors (proximity, duration or frequency in shelter, or number of eggs laid in a territory), with variation in female preference inferred from this proxy of male reproductive success (Kidd, 2006). However, a female associating with a male does not necessarily mean that the two mated, so behavioral scoring alone is not enough to assign paternity (Theis, 2012).

My research aims to do just that by developing an NGS-based parentage analysis bioinformatics pipeline that integrates partial paternity assignment and IBS matrix clustering. The powerful pairing of these two parentage assignment methods allows detection of biases that might arise from closely related individuals in the alleged parent population, and it can handle pooled samples of multiple offspring. This is great, since our laboratory population of A. burtoni is quite inbred and produces fairly large broods (imagine mouthbrooding anywhere from 10 to 60 fry). Using paternity testing to measure reproductive outcomes can help us understand the interaction between dynamic systems such as the female reproductive cycle and male social dynamics (Fig. 1).

Figure 1. Research overview of how female internal reproductive state (blue) and male external social structure (red) interact and integrate to produce reproduction (purple). Measuring reproductive output requires the development of paternity testing methods.

Integrating a bioinformatics pipeline with the unique advantages of 2bRAD sequencing will allow relatively easy expansion both to alternative DNA sequencing approaches and to any species, regardless of available genomic resources. I plan to integrate paternity testing, as a measure of Darwinian fitness, into analyses of mate preference and reproductive success in naturalistic communities of A. burtoni. While we use a lot of behavioral proxies of reproduction, such as social interactions or association time, nothing lets you know that the deed was done like genetically testing everyone. On top of these models of reproductive success within a social hierarchy, I want to layer neuromolecular techniques, ranging from the spatial resolution of single genes up to transcriptomic networks. This means I will know an individual’s behavior, reproductive success, and neural profile, all within the context of an actual social community. Talk about truly integrative!

Isaac Miller-Crews is a PhD candidate in the Hofmann Lab (Department of Integrative Biology) at the University of Texas at Austin.

References:

Devlin, B., Roeder, K., & Ellstrand, N. C. (1988). Fractional paternity assignment: theoretical development and comparison to other methods. Theoretical and Applied Genetics, 76(3), 369–380. https://doi.org/10.1007/BF00265336
Fernald, R. D., & Hirata, N. R. (1977). Field study of Haplochromis burtoni: Quantitative behavioral observations. Animal Behaviour, 25, 964–975.
Flanagan, S. P., & Jones, A. G. (2018). The future of parentage analysis: From microsatellites to SNPs and beyond. Molecular Ecology, mec.14988. https://doi.org/10.1111/mec.14988
Gutierrez, A. P., Turner, F., Gharbi, K., Talbot, R., Lowe, N. R., Peñaloza, C., … Houston, R. D. (2017). Development of a Medium Density Combined-Species SNP Array for Pacific and European Oysters (Crassostrea gigas and Ostrea edulis). G3 (Bethesda, Md.), 7(7), 2209–2218. https://doi.org/10.1534/g3.117.041780
Hadfield, J. D., Richardson, D. S., & Burke, T. (2006). Towards unbiased parentage assignment: Combining genetic, behavioural and spatial data in a Bayesian framework. Molecular Ecology, 15(12), 3715–3730. https://doi.org/10.1111/j.1365-294X.2006.03050.x
Hodel, R. G. J., Segovia-Salcedo, M. C., Landis, J. B., Crowl, A. A., Sun, M., Liu, X., … Soltis, P. S. (2016). The Report of My Death was an Exaggeration: A Review for Researchers Using Microsatellites in the 21st Century. Applications in Plant Sciences, 4(6), 1600025. https://doi.org/10.3732/apps.1600025
Hofmann, H. A., Benson, M. E., & Fernald, R. D. (1999). Social status regulates growth rate: consequences for life-history strategies. Proceedings of the National Academy of Sciences of the United States of America, 96(24), 14171–14176. https://doi.org/10.1073/pnas.96.24.14171
Kidd, M. R., Danley, P. D., & Kocher, T. D. (2006). A direct assay of female choice in cichlids: all the eggs in one basket. Journal of Fish Biology, 68(2), 373–384. https://doi.org/10.1111/j.0022-1112.2006.00896.x
Marshall, T. C., Slate, J., Kruuk, L. E. B., & Pemberton, J. M. (1998). Statistical confidence for likelihood-based paternity inference in natural populations. Molecular Ecology, 7(5), 639–655. https://doi.org/10.1046/j.1365-294x.1998.00374.x
Meagher, T. R., & Thompson, E. (1986). The relationship between single parent and parent pair genetic likelihoods in genealogy reconstruction. Theoretical Population Biology, 29(1), 87–106. https://doi.org/10.1016/0040-5809(86)90006-7
Thompson, E. A. (1976). A paradox of genealogical inference. Advances in Applied Probability, 8(4), 648–650. https://doi.org/10.2307/1425927
Wang, J. (2010). Effects of genotyping errors on parentage exclusion analysis. Molecular Ecology, 19(22), 5061–5078. https://doi.org/10.1111/j.1365-294X.2010.04865.x
Wang, J. (2004). Sibship Reconstruction from Genetic Data with Typing Errors. Genetics, 166(4), 1963–1979. https://doi.org/10.1534/genetics.166.4.1963
Wang, S., Meyer, E., McKay, J. K., & Matz, M. V. (2012). 2b-RAD: a simple and flexible method for genome-wide genotyping. Nature Methods, 9(8), 808–810. https://doi.org/10.1038/nmeth.2023
Posted in BEACON Researchers at Work | Comments Off on Fish, You are the Father!

200 Years of Developmental Hourglass: Using Big Data to Increase Our Understanding of Vertebrate Embryogenesis from a Trickle to a Flood

By: Megan Chan, Undergraduate Student, University of Texas – Austin

When I started college at The University of Texas at Austin a couple of years ago, I enrolled as a biochemistry/pre-pharmacy major. I didn’t know anything about computational biology back then but have since had the opportunity to participate in computational biology research under the guidance of Dr. Rebecca Young and Dr. Hans Hofmann in the Department of Integrative Biology at UT Austin. Over the last couple of years, I have grown more and more interested in the realm of data analytics, and my experience in hands-on research has completely changed my goals for the future. Because of this, I finally switched my major to computational biology last year.

Megan Chan

At the University of Texas, we have a program called the Freshman Research Initiative (FRI) that helps new students get experience in research labs. Although I originally applied just to get something interesting on my resume, I ended up gaining much more. As part of FRI, I joined a research stream called Big Data in Biology, led by Dhivya Arasappan. The goal of this stream was to introduce freshmen to concepts in genetics and to how statistics and computer science are being used to study biological systems. I chose this stream over others I was interested in (like streams working on genetically engineering bacteria or on chemical analysis of wine tannins) because I had really enjoyed a year of programming when I was in high school. I had never considered myself very knowledgeable about computers and often felt overwhelmed around guys who had been writing code since middle school, but I found the challenge of solving problems and discovering something new exciting. In my sophomore year, I realized that I wanted to continue exploring this field and completely changed my career focus from pharmacy to computational biology.

As part of FRI, I had the opportunity to join Dr. Young and Dr. Hofmann in an independent project adding evidence to a long-standing debate over the validity of what is commonly known as the hourglass model of vertebrate development. The hourglass model hypothesizes that the vertebrate body plan constrains diversification of mid-embryonic development across vertebrate species. Early evidence for this theory was based on qualitative analysis of anatomical developmental variation, but in recent years gene expression data have been used as evidence both for and against the hourglass model. The part of this overall project that I have been working on focuses on describing patterns of similarity in developmental gene expression through embryogenesis among several vertebrate species. This has involved processing and analyzing over 150 open-source gene expression datasets representing developmental stages for six species. By comparing the similarity of gene expression between each pair of species at each time point in development, I can ask whether mid-embryonic stages are the most similar in gene expression across species.
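The post does not say which similarity measure is used, so the following is only one plausible sketch: a Spearman rank correlation between two species’ expression profiles at every pair of stages, computed over orthologous genes measured in both. The data layout (stage label mapped to {ortholog_id: expression}) and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks (NaN if undefined)."""
    if len(x) < 2:
        return float("nan")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / var ** 0.5 if var > 0 else float("nan")

def stage_similarity(species_a: Dict[str, Dict[str, float]],
                     species_b: Dict[str, Dict[str, float]]) -> Dict[Tuple[str, str], float]:
    """Rho for every (stage in A, stage in B) pair, over orthologs measured in both."""
    result = {}
    for stage_a, expr_a in species_a.items():
        for stage_b, expr_b in species_b.items():
            shared = sorted(set(expr_a) & set(expr_b))
            result[(stage_a, stage_b)] = spearman([expr_a[g] for g in shared],
                                                  [expr_b[g] for g in shared])
    return result

# Toy example: one stage per species, four orthologs in B but only three shared.
sim = stage_similarity({"s1": {"g1": 1, "g2": 2, "g3": 3}},
                       {"t1": {"g1": 2, "g2": 4, "g3": 6, "g4": 1}})
print(sim)
```

Under the hourglass hypothesis, one would look for the highest correlations falling at mid-embryonic stage pairs; a rank correlation is used here only because it is insensitive to differences in expression scale between datasets.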

A major challenge in achieving this goal has been the lack of consistency in staging for different species. There is no common quantitative way to equate a particular stage of development in one species with that in another. To add to this problem, most of the species we have data for are represented by only a select set of stages, and the number of stages sequenced also differs among species. For example, 8 out of 46 stages are represented for chicken embryos, and 24 out of a possible 44 stages for a species of frog (not including free-swimming tadpoles). To overcome this fundamental problem, I’ve turned to machine learning, and to comparing qualitative descriptions of stages, to group developmental time points within each species into comparable sets.

Of the various methods I integrated into my approach, the first was K-means clustering. K-means is an unsupervised machine learning algorithm that iteratively computes the distance between each data point and a set of k centroids to determine which points cluster together around a mean, with k being the number of clusters to find. I tried this method first because it is a fairly common way of classifying data without predetermining classes. To find an appropriate k, I generated an elbow plot visualizing the amount of variation accounted for by several possible numbers of clusters, and chose a k that accounted for a reasonable amount of variation without dividing the data into clusters that were too small. A known feature of K-means, however, is that it randomizes the initial centroids, which can cause some variation in cluster membership when the clusters are not robust. To strengthen this method, I used partitioned hierarchical clustering, another form of unsupervised machine learning. Like K-means, this algorithm groups the data points into a predetermined number of clusters with similar values, but it starts by treating the entire dataset as one cluster and then partitions it into smaller pieces until it reaches the appropriate number of clusters (a divisive, or top-down, approach). Hierarchical clustering, unlike K-means, tends to give consistent results, and our results showed that, at an appropriate number of clusters found with the method described above, it also conserved the order of the developmental stages. Further analysis showed that these clusters had at least some biological significance. We are now confronted with the challenge of aligning these clusters across species.
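The K-means and elbow steps can be illustrated with a small, dependency-free Python sketch. It runs Lloyd’s algorithm from randomly chosen starting centroids, keeps the best within-cluster sum of squares (WCSS) over several restarts for each k, and returns the values one would plot in an elbow plot. The toy one-dimensional “stage profiles” are invented; real inputs would be multi-dimensional expression vectors, and this is not the code used in the project.

```python
import random
from typing import List, Sequence, Tuple

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[Sequence[float]], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[int], float]:
    """Lloyd's K-means with k randomly chosen data points as initial centroids.

    Returns (labels, within-cluster sum of squares). Because the start is random,
    different seeds can yield different memberships when clusters are not robust.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wcss

def elbow_curve(points, k_values, n_restarts: int = 10) -> dict:
    """Best (lowest) WCSS over several random starts for each candidate k."""
    return {k: min(kmeans(points, k, seed=s)[1] for s in range(n_restarts))
            for k in k_values}

# Hypothetical 1-D "stage profiles" forming three tight groups.
stages = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (10.0,), (10.1,), (10.2,)]
for k, wcss in elbow_curve(stages, range(1, 6), n_restarts=30).items():
    print(k, round(wcss, 3))
```

On toy data with three tight groups, the WCSS drops steeply up to k = 3 and barely changes after that; that bend is the elbow. Restarting from several seeds is one common way to reduce the initialization sensitivity mentioned above; the hierarchical step is not reproduced here.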

Now, my work has turned from heavy computation to intense reading. I’ve made it this far without having to know too much about the details of what all these stages mean, but I’ve come to face the fact that I will need some biological knowledge of vertebrate development in order to compare these stages in any reasonable way. The beauty of being in an interdisciplinary field.

The knowledge that I’ve gained while working on this project is invaluable to me as I start to pursue my own projects and begin exploring my future options as graduation slowly approaches. I’ve enjoyed the work I’ve done in this lab so much that last year I started analyzing data for fun; in one instance looking for patterns in word choice in a dataset of Russian disinformation tweets, and in another instance predicting the length of time a dog will stay in the local shelter based on its age. This research experience has also opened many doors for me, allowing me the opportunity to pursue positions analyzing data for other labs on campus and jobs mentoring new students in research, and giving me the tools I needed to land a software internship in biotech this summer. In my last year, I hope to publish results for this project and leave an impact on future research.

Posted in Uncategorized | Comments Off on 200 Years of Developmental Hourglass: Using Big Data to Increase Our Understanding of Vertebrate Embryogenesis from a Trickle to a Flood