BEACON Researchers at Work: The Encoding Scheme based on Anchors for Scalable Multi-Objective Clustering Algorithms

This BEACON Researchers at Work blog post is by Dr. Shuwei Zhu, Prof. Lihong Xu and Prof. Erik D Goodman.

Fig. 1. The correct clustering solution often corresponds to a tradeoff between two or more clustering objectives

BEACON’s Greenhouse research team, led by Prof. Lihong Xu (from Tongji University, China) and Prof. Erik D Goodman, has been working on BEACON’s international collaboration projects to model, optimize, and control the microclimate inside greenhouses with the aid of evolutionary techniques. We explore the capabilities of Multi-Objective Evolutionary Algorithms (MOEAs) for solving problems with multiple conflicting objectives, in order to explicitly trade off greenhouse production against the associated energy cost. Recently, we have been working on data-driven MOEAs that use surrogate models for this task, since the collected data are very helpful in guiding the search toward the true Pareto front (PF) of the problem. However, the huge amount of data makes building data-driven models time-consuming, so we turn to data mining techniques for preprocessing.

Clustering is a widely researched topic in data mining, pattern recognition, and machine learning (ML); it aims to partition a given dataset into clusters (groups or categories). Data clustering can generally be regarded as an optimization problem whose criterion is tailored to particular types of data. Moreover, the correct clustering solution, and with it the number of clusters (k), often corresponds to a tradeoff between two or more clustering objectives. For example, as shown in Fig. 1, minimizing the overall deviation and minimizing connectivity are conflicting goals, yet each contributes to identifying the correct clustering solution and, in this case, the true k. Evolutionary multiobjective clustering (MOC) algorithms have shown promising potential to outperform conventional single-objective clustering algorithms, especially when k is not fixed before clustering. However, their computational burden becomes a serious problem because of the large search space and the time needed to evaluate the fitness of the evolving population, especially when the dataset is large.
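To make the two objectives concrete, here is a minimal Python sketch of overall deviation and connectivity in the form commonly used in the MOC literature (e.g., the MOCK algorithm). The function names, the neighbourhood size `L`, and the toy data are illustrative assumptions, not details taken from our paper.

```python
import math

def overall_deviation(points, labels):
    """Sum of distances from each point to its cluster centroid (to be minimized)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(math.dist(m, centroid) for m in members)
    return total

def connectivity(points, labels, L=2):
    """Penalty for splitting neighbours: the j-th nearest neighbour in another cluster costs 1/j."""
    total = 0.0
    for i, p in enumerate(points):
        # Other points sorted by distance from point i; only the L nearest are checked.
        others = sorted((k for k in range(len(points)) if k != i),
                        key=lambda k: math.dist(p, points[k]))
        for j, nb in enumerate(others[:L], start=1):
            if labels[nb] != labels[i]:
                total += 1.0 / j
    return total

# Hypothetical toy data: two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
two_groups = [0, 0, 0, 1, 1, 1]
one_cluster = [0] * 6
singletons = list(range(6))
for name, lab in [("two groups", two_groups), ("one cluster", one_cluster), ("singletons", singletons)]:
    print(name, round(overall_deviation(pts, lab), 3), connectivity(pts, lab))
```

Singletons drive overall deviation to zero but maximize connectivity, while a single cluster does the opposite; the two-group labeling does well on both, which is why the tradeoff between the objectives can point to the true k.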

To define the model structure, most existing MOC methods focus primarily on choosing suitable clustering criteria (objectives) for specific types of data. In fact, the choice of cluster representation (encoding scheme) arguably plays the more fundamental role, because it determines the search space and the reproduction operators. For MOC methods based on evolutionary computation (EC), three cluster representations (encoding schemes) are prevalent in the literature: 1) the graph-based scheme (sometimes called the locus-based adjacency representation) represents clusters by links between genes; 2) the prototype-based scheme encodes cluster representatives (e.g., centroids, medoids, or modes of a cluster) in the individuals; and 3) the label-based scheme directly encodes each gene as a class label. The first two schemes, which are more common in the literature, are illustrated in Fig. 2.
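As an illustration of the graph-based scheme, the sketch below decodes a locus-based adjacency genome: gene i holds an index j, meaning points i and j are linked, and clusters are the connected components of the resulting graph. The function name and example genome are hypothetical; union-find is just one convenient way to compute the components.

```python
def decode_locus_based(genome):
    """Decode a locus-based adjacency genome (gene i = j links points i and j) into cluster labels."""
    n = len(genome)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(genome):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    # Relabel component roots as consecutive cluster ids 0, 1, 2, ...
    ids, labels = {}, []
    for i in range(n):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels

# Points 0-1-2 are chained together, 3 and 4 link to each other, and 5 links to itself.
print(decode_locus_based([1, 2, 0, 4, 3, 5]))
```

Note that the genome has one gene per data point and k is not fixed in advance; it emerges from the number of connected components.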

(a) Graph-based encoding (b) Cluster center-based encoding
Fig. 2. Encoding schemes of MOC

Each existing encoding scheme has, to a greater or lesser degree, limited scalability within an EC-based framework. The graph-based and label-based representations are point-based approaches: the class assignments of data points (objects) are encoded explicitly or implicitly in the individuals, so the genome length equals the number of data points n, which makes large datasets difficult to handle. Prototype-based methods, on the other hand, are limited to hyper-spherical clusters, which can significantly degrade their performance on data that are not linearly separable; graph-based and label-based representations avoid this problem because they do not depend on cluster shape.
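The contrast can be sketched in a few lines of Python. Prototype-based decoding assigns each point to its nearest encoded center, so the genome length depends only on k and the dimension d rather than on n. The helper names and numbers below are illustrative assumptions.

```python
import math

def decode_prototypes(centers, points):
    """Prototype-based decoding: each point joins its nearest encoded center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def genome_length(n_points, dim, k, scheme):
    """Genes per individual: one per data point ("point"), or dim coordinates per prototype."""
    return n_points if scheme == "point" else k * dim

print(decode_prototypes([(0, 0), (10, 0)], [(1, 1), (9, -1), (4, 0), (6, 0)]))
print(genome_length(1_000_000, 2, 10, "point"), genome_length(1_000_000, 2, 10, "prototype"))
```

Because every cluster is the convex Voronoi cell around its center, shapes such as two concentric rings cannot be separated this way, which is the shape sensitivity described above.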

Fig. 3. Hierarchical structure of training seed nodes

Besides the encoding scheme, we also develop a new reproduction operation that exploits clustering ensemble techniques to produce high-quality solutions more efficiently. All previous MOC approaches use genetic-operator-based reproduction (e.g., crossover and mutation) inherited from traditional MOEAs, which ignores many properties of clustering tasks. To address this, we design an ensemble-based reproduction operator. However, most clustering ensemble methods cannot serve as reproduction operators in the proposed framework, because such an operator must: 1) flexibly output a solution with a specified k; and 2) be time-efficient. Fortunately, we find that the bipartite graph partitioning strategy is well suited to both purposes. We also show how the bipartite-graph-based cluster ensemble strategy can be used for the final decision making from the set of non-dominated solutions obtained by MOC, a step that is underexplored in existing methods, whether or not k is provided. Specifically, the bipartite graph, which integrates information from multiple solutions, is used to generate a clustering solution with a specific k. If k is not provided, a seed-point-based cluster validity measure can be used to select, effectively and efficiently, the appropriate result among solutions with different k generated from the obtained bipartite graph.
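To show what "the bipartite graph integrating information of multiple solutions" can look like, here is a minimal sketch that builds a point-cluster bipartite graph from an ensemble of labelings, in the spirit of hybrid bipartite graph formulations from the ensemble literature. It covers only graph construction; the partitioning step (e.g., a spectral cut into k parts) and the exact graph used in our paper are not shown, and the function name is hypothetical.

```python
def build_bipartite_graph(clusterings):
    """Point-cluster bipartite graph of a clustering ensemble.

    clusterings: list of label lists, one per base clustering of the same points.
    Returns (cluster_nodes, edges): cluster nodes are (clustering index, label) pairs,
    and edges[i] lists the cluster nodes that point i is linked to.
    """
    n = len(clusterings[0])
    cluster_nodes, node_index = [], {}
    for m, labels in enumerate(clusterings):
        for lab in labels:
            key = (m, lab)
            if key not in node_index:
                node_index[key] = len(cluster_nodes)
                cluster_nodes.append(key)
    # Each point gets exactly one edge per base clustering.
    edges = [[node_index[(m, labels[i])] for m, labels in enumerate(clusterings)]
             for i in range(n)]
    return cluster_nodes, edges

nodes, edges = build_bipartite_graph([[0, 0, 1, 1], [0, 1, 1, 1], [2, 2, 2, 5]])
print(len(nodes), edges)
```

The graph has n point nodes plus one node per base cluster and n × (number of base clusterings) edges, so it grows linearly in n, unlike an n × n co-association matrix; this is one reason such formulations can be time-efficient.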

From the above analysis, we see that most existing cluster representations either scale poorly, producing huge search spaces on large datasets, or are sensitive to cluster shape. In view of this, we develop a novel cluster representation for scalable MOC models that simplifies the search procedure and reduces computational overhead. A set of seed nodes (anchors) is first found to represent the local structure of small subgroups, and these anchors are then used to encode the individuals of the proposed MOC method. Instead of directly generating enough seed points uniformly over the entire dataset, we locate them within a hierarchical topology (a topological structure trained from coarse to fine), so that denser regions receive more seed points and sparser regions fewer. The hierarchical training of seed nodes is shown in Fig. 3: the seed nodes (red points) approximately represent the dataset according to its density distribution, which benefits performance when clustering is performed on these seed nodes. A one-layer version, by contrast, can only generate uniformly distributed seed points. A graph over the seed points can then be built (e.g., using a minimum spanning tree) to form the proposed hierarchical topology-based cluster representation, which significantly reduces both time and space complexity.
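A simple way to see why working on anchors helps is the classic minimum-spanning-tree clustering sketch below: build an MST over the s seed points, cut its k-1 longest edges, and let each data point inherit the label of its nearest seed. This is an illustrative assumption, not necessarily how the paper encodes individuals; the point is that the graph work costs O(s^2) on the seeds rather than O(n^2) on the data.

```python
import math

def prim_mst(seeds):
    """MST edges (weight, u, v) over the seed points, via Prim's algorithm in O(s^2)."""
    s = len(seeds)
    in_tree, best, link = [False] * s, [math.inf] * s, [-1] * s
    best[0] = 0.0
    edges = []
    for _ in range(s):
        u = min((v for v in range(s) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(s):  # relax edges from the newly added vertex
            if not in_tree[v]:
                d = math.dist(seeds[u], seeds[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges

def seed_clusters(seeds, k):
    """Keep the s-k shortest MST edges (i.e., cut the k-1 longest); components label the seeds."""
    kept = sorted(prim_mst(seeds))[: len(seeds) - k]
    parent = list(range(len(seeds)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in kept:
        parent[find(u)] = find(v)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(seeds))]

def label_points(points, seeds, seed_labels):
    """Each data point inherits the label of its nearest seed (anchor)."""
    return [seed_labels[min(range(len(seeds)), key=lambda j: math.dist(p, seeds[j]))]
            for p in points]

seeds = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]  # hypothetical anchors
seed_labels = seed_clusters(seeds, k=2)
print(seed_labels, label_points([(0.4, 0.2), (10.6, -0.1), (2.2, 0.0)], seeds, seed_labels))
```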

Comparison experiments were conducted on a series of different data distributions, demonstrating the superiority of the proposed MOC algorithm in terms of both clustering performance and computational efficiency.

Fig. 4. Different types of combinations of clusterings

Fig. 4 illustrates three types of combinations of clusterings: (a) sequential clustering, in which each method may in turn use information provided by the previous clustering system while possibly exploiting additional data; (b) cooperative clustering, in which each clustering algorithm produces its result independently, the final clustering is computed in a post-processing step, and the only information exchanged is when the individual processes have completed; and (c) collaborative clustering, in which the group jointly solves problems defined and imposed by a central controller that assigns an individual task to each learner. In collaborative clustering, team members interact repeatedly, responsibility is collective, and each teammate's action contributes to the clustering performance at every iteration. Most classical clustering methods that work sequentially belong to the first type, while ensemble clustering techniques, which run in parallel and combine their results at the end, are cooperative. MOC is actually a kind of collaborative clustering, since individuals exchange information in every generation to evolve a set of high-quality, diverse clustering solutions that support the final decision making. Because our new cluster representation dramatically alleviates the scalability issue of MOC models, there is considerable room to apply MOC methods to more complicated datasets, whether very large or with irregular cluster structures.

Notes: The main result above was published in IEEE Transactions on Cybernetics, 2021, doi: 10.1109/TCYB.2021.3081988

References:

[1] S. Zhu, L. Xu, and E. D. Goodman. Hierarchical Topology-Based Cluster Representation for Scalable Evolutionary Multiobjective Clustering. IEEE Transactions on Cybernetics, 2021, doi: 10.1109/TCYB.2021.3081988

For more information about this work, you can contact Prof. Lihong Xu at email: xulihong@msu.edu

Posted in BEACON Researchers at Work

BEACON Researchers at Work: The “(M-1)+1” Framework of Generalized Pareto Dominance for Evolutionary Many-objective Optimization

This BEACON Researchers at Work blog post is by Dr. Shuwei Zhu, Prof. Lihong Xu, Prof. Erik D Goodman and Dr. Zhichao Lu.


Prof. Lihong Xu and Prof. Erik D. Goodman

BEACON’s Greenhouse research team, led by Prof. Lihong Xu (from Tongji University, China) and Prof. Erik D Goodman, has been working on BEACON’s international collaboration projects to model, optimize, and control the microclimate inside greenhouses with the aid of evolutionary techniques. We explore the capabilities of evolutionary algorithms for solving problems with multiple conflicting objectives, and we started by adapting some Multi-Objective Evolutionary Algorithms (MOEAs), including NSGA-II, developed by a senior BEACONite, Prof. Kalyanmoy Deb, to explicitly trade off greenhouse production against the associated energy cost. This strategy worked well until we tried to take more objectives (such as control precision) into consideration, because the effectiveness of Pareto-dominance-based MOEAs deteriorates progressively as the number of objectives in the problem, M, grows.

It is widely believed that this issue stems mainly from the poor discriminability of Pareto optimality in many-objective spaces (typically M≥4). Specifically, most solutions become mutually incomparable (non-dominated), so Pareto-based selection loses its pressure toward the true Pareto front (PF). As a consequence, research efforts have generally been directed toward ranking methods that do not rely on Pareto dominance (especially decomposition-based techniques) and can provide sufficient selection pressure. However, handling unknown, irregular PF shapes remains a nontrivial issue for many existing non-Pareto-dominance-based evolutionary algorithms. For example, the performance of decomposition-based algorithms is known to depend strongly on the PF shape, since the predefined set of weight directions plays a primary role in their performance. Compared with decomposition-based methods, Pareto-dominance-based MOEAs are less sensitive to the PF shape.
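The loss of selection pressure is easy to reproduce numerically. For independent, uniformly distributed objectives, the probability that one given solution dominates another is (1/2)^M, so as M grows almost every solution becomes non-dominated. The sketch below, with arbitrary population size and seed, counts the non-dominated fraction of random populations.

```python
import random

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fraction(n, m, rng):
    """Fraction of n uniformly random points in [0, 1]^m that no other point dominates."""
    pts = [[rng.random() for _ in range(m)] for _ in range(n)]
    nd = sum(1 for p in pts if not any(dominates(q, p) for q in pts))
    return nd / n

rng = random.Random(1)
for m in (2, 3, 5, 10):
    print(m, nondominated_fraction(100, m, rng))
```

With M=2 only a few percent of a random population is non-dominated, while with M=10 the large majority is, leaving Pareto ranking little to discriminate.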

To tackle the scalability problem of Pareto-based MOEAs, our previous work [1] proposed the generalization of Pareto optimality (GPO) to better discriminate among solutions. GPO expands the dominance area of each solution to increase selection pressure, but at the cost of making diversity harder to maintain. In general, excessive selection pressure tends to harm diversity maintenance, for example by driving the population to converge into one or a few small subregions of the PF, while excessive diversity pressure may degrade convergence. GPO therefore still has difficulty maintaining the delicate balance between convergence and diversity.
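The post does not spell out the GPO transformation, so this sketch substitutes a well-known stand-in to illustrate "expanding the dominance area": an alpha-domination-style linear expansion, f_i' = f_i + alpha * sum over j != i of f_j, followed by an ordinary Pareto comparison. `alpha` plays a role loosely similar to the expanding angle φ, but the exact relationship to GPO is not assumed here.

```python
def expand(f, alpha):
    """Stand-in expansion (alpha-domination style, not the exact GPO formula)."""
    return [fi + alpha * sum(fj for j, fj in enumerate(f) if j != i) for i, fi in enumerate(f)]

def dominates(a, b):
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def expanded_dominates(a, b, alpha):
    """Dominance after expanding every solution's dominance area."""
    return dominates(expand(a, alpha), expand(b, alpha))

a, b = (1.0, 2.0), (2.0, 1.5)  # hypothetical solutions, incomparable under plain Pareto dominance
print(dominates(a, b), dominates(b, a), expanded_dominates(a, b, 0.5))
```

With alpha = 0.5, a becomes comparable to b and dominates it, which is the extra selection pressure; the same mechanism, pushed too far, lets a few solutions dominate most of the front, which is the diversity problem described above.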

Hence, we propose a new many-objective evolutionary algorithm based on GPO that is simple yet effective for many-objective optimization problems. The proposed algorithm [2] uses an “(M-1)+1” framework of GPO dominance, (M-1)-GPD for short, to rank solutions in the environmental selection step so as to promote convergence and diversity simultaneously. Specifically, we apply M symmetrical cases of (M-1)-GPD; each case enhances the selection pressure on M-1 objectives by expanding the dominance area of solutions, while leaving the remaining objective unchanged. Fig. 1 gives a graphical explanation of (M-1)-GPD in the original f1-f2 space and in the indirect objective spaces f1-Ω2 (blue) and Ω1-f2 (green).
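The "(M-1)+1" idea can be sketched as M symmetric cases, each keeping one objective unchanged and expanding the other M-1. Since the exact GPO transformation is not given here, the sketch again uses a hypothetical alpha-domination-style expansion as a stand-in, and it only reports each case's rank-1 set; how the algorithm combines the cases is not assumed.

```python
def partial_expand(f, alpha, keep):
    """Keep objective `keep` unchanged; expand the other M-1 objectives (stand-in formula)."""
    return [fi if i == keep else fi + alpha * sum(fj for j, fj in enumerate(f) if j != i)
            for i, fi in enumerate(f)]

def dominates(a, b):
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def first_fronts(pop, alpha):
    """For each of the M symmetric cases, the indices of solutions ranked first."""
    fronts = []
    for keep in range(len(pop[0])):
        t = [partial_expand(f, alpha, keep) for f in pop]
        fronts.append({i for i in range(len(pop))
                       if not any(dominates(t[j], t[i]) for j in range(len(pop)) if j != i)})
    return fronts

# Hypothetical bi-objective population; all five solutions are mutually non-dominated.
pop = [(0.0, 1.0), (0.1, 0.5), (0.3, 0.35), (0.6, 0.3), (1.0, 0.1)]
print(first_fronts(pop, 0.5))
```

Each case alone discards part of the front (case 0 keeps the solutions with small f1, case 1 those with small f2), but together they cover it, which mirrors the complementarity the post relies on for diversity.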

Fig. 1. Pictorial illustration of a two-dimensional objective space and the shrunken space after performing two (M-1)-GPD cases. (a) The original f1-f2 objective space; (b) The two indirect f1-Ω2 (blue) and Ω1-f2 (green) objective spaces (also the two contracted spaces inside) after generalization.

Given an expanding angle φ, for solution P the f1 objective remains unchanged while f2 expands by φ degrees; i.e., “M-1” refers to f2 and “1” to f1, and $\Lambda_P^{\varphi}(1)$ depicts the coverage of P's dominance envelope. Correspondingly, the case with f1 as “M-1” and f2 as “1” is shown for solution Q, and $\Lambda_Q^{\varphi}(2)$ shows its dominance envelope. For $\Lambda_P^{\varphi}(1)$, we obtain the contracted blue space (inside axes f1 and Ω2), while for $\Lambda_Q^{\varphi}(2)$ we obtain the contracted green space (inside axes Ω1 and f2), as shown in Fig. 1b. The difference, marked in light gray, is clearly visible between the coverage of the true PF (bold PQ curve in Fig. 1a) in the original f1-f2 space and that of the two partial generalized Pareto fronts (the green arc of P and the blue arc of Q, respectively, in Fig. 1b). In fact, a larger φ tends to reduce the coverage of the true PF. When φ reaches its maximum, the two transformed objective spaces approach hyper-lines, so that only the two extreme solutions remain, one in each extreme case.
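The shrinking coverage can be checked numerically with the same hypothetical stand-in expansion (not the exact GPO formula, with `alpha` loosely playing the role of φ). The sketch samples a concave bi-objective front on the unit quarter circle and counts how many points remain rank 1 as the expansion grows.

```python
import math

def partial_expand(f, alpha, keep):
    """Keep objective `keep` unchanged; expand the others (stand-in, not the exact GPO formula)."""
    return [fi if i == keep else fi + alpha * sum(fj for j, fj in enumerate(f) if j != i)
            for i, fi in enumerate(f)]

def dominates(a, b):
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def rank_one(pop, alpha, keep):
    """Indices of solutions that are non-dominated in the partially expanded space."""
    t = [partial_expand(f, alpha, keep) for f in pop]
    return {i for i in range(len(pop))
            if not any(dominates(t[j], t[i]) for j in range(len(pop)) if j != i)}

# A concave front: 19 points on the unit quarter circle, every 5 degrees (minimization).
front = [(math.cos(math.radians(d)), math.sin(math.radians(d))) for d in range(0, 91, 5)]
sizes = [len(rank_one(front, a, keep=0)) for a in (0.0, 0.25, 0.5, 1.0)]
print(sizes)
print(rank_one(front, 1.0, keep=0), rank_one(front, 1.0, keep=1))
```

As the expansion grows, the rank-1 set of each case shrinks, and at alpha = 1 each case keeps only one extreme point (index 18 for case 0, index 0 for case 1).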

The proposed (M-1)-GPD scheme is nearly parameterless and is used in a novel many-objective evolutionary algorithm (MaOEA), multiple (M-1)-GPD-based optimization (MultiGPO for short), which shows competitive performance compared with several state-of-the-art MaOEAs. Fig. 2 compares conventional Pareto dominance, GPO, and MultiGPO. Under conventional Pareto dominance, the better (dominating) region is only one of the 2^M orthants around a solution, i.e., a fraction 1/2^M, so selection pressure becomes insufficient and convergence deteriorates as M grows. GPO enlarges the dominance area and thus improves convergence, but it loses diversity: the population diversity is poor in this case, and the parameter φ has to be tuned dynamically generation by generation, which makes GPO impractical in applications. In MultiGPO, convergence and diversity are well balanced, and φ is set according to M, which stays fixed during optimization. Moreover, since no reference vectors are employed, MultiGPO depends less on the PF shape than methods that use reference vectors, and it is robust, especially on problems with irregular PFs. The scheme was validated on benchmark functions with different types of known PFs (e.g., concave, convex, and disconnected) and on real-world problems with irregular PFs. For most problems, MultiGPO obtained a relatively complete set of solutions, and it achieved better overall performance than several other state-of-the-art methods.

Fig. 2. Comparison of the conventional Pareto dominance, GPO and MultiGPO

To aid understanding, we illustrate the (M-1)-GPD ranking method on a set of candidate solutions of a bi-objective minimization problem in Fig. 3, where (a) depicts the case GPOf1 and (b) the case GPOf2. The ten circles (labeled A to J) denote non-dominated solutions of this problem, analogous to those obtained on a many-objective problem (MaOP), which are usually mutually incomparable. In Fig. 3a, solutions B, C, D, I, and J (highlighted in blue) are assigned rank 1 by GPOf1, while Fig. 3b shows that solutions A, B, F, G, H, and I (also in blue) are assigned rank 1 by GPOf2. The two cases are clearly largely complementary: employing either GPOf1 or GPOf2 enhances selection pressure by improving discriminability, and adopting both simultaneously preserves diversity. Beyond the complementarity of the symmetrical (M-1)-GPD cases, max-min distance-based selection also helps maintain diversity. For example, solution E receives a rank greater than 1 under both GPOf1 and GPOf2, but it lies in a less crowded region than the others in terms of max-min distance, so it is given priority for survival during environmental selection. Conversely, solutions C and D (and likewise F and G) are close to each other in terms of angular distance, so with max-min distance-based selection, once C (or F) is selected, its nearest neighbor D (or G) is unlikely to survive to the next generation.
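The max-min idea can be sketched as greedy farthest-point selection: repeatedly pick the candidate whose smallest angle to the already-selected solutions is largest. This sketch assumes objective vectors are already normalized relative to the ideal point; the function names and example vectors are hypothetical, not taken from the paper.

```python
import math

def angle(u, v):
    """Angle (radians) between two objective vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def max_min_select(candidates, selected, count):
    """Greedily pick `count` candidates, each maximizing its minimum angle to the chosen set."""
    chosen, pool, picked = list(selected), list(candidates), []
    for _ in range(min(count, len(pool))):
        best = max(pool, key=lambda c: min((angle(c, s) for s in chosen), default=math.inf))
        pool.remove(best)
        chosen.append(best)
        picked.append(best)
    return picked

# Extremes already survive; C and D are near-duplicates in angle, E lies in a sparse region.
extremes = [(1.0, 0.0), (0.0, 1.0)]
C, D, E = (0.6, 0.45), (0.58, 0.47), (0.2, 0.9)
print(max_min_select([C, D, E], extremes, count=2))
```

Once D is picked, C's minimum angle to the chosen set drops to about 2 degrees, so the isolated E is preferred and C is left out, which is the behavior described for C/D and E above.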

Fig. 3. An illustration of the (M-1)-GPD ranking method. (a) GPOf1: Objective f1 remains unchanged and f2 expands dominance region; (b) GPOf2: Objective f2 remains unchanged and f1 expands dominance region.

Notes: The main result above has been published in IEEE Transactions on Cybernetics, 2021, doi: 10.1109/TCYB.2021.3051078

References:

[2] S. Zhu, L. Xu, E. D. Goodman, and Z. Lu. A New Many-Objective Evolutionary Algorithm Based on Generalized Pareto Dominance. IEEE Transactions on Cybernetics, 2021, doi: 10.1109/TCYB.2021.3051078

For more information about this work, you can contact Prof. Lihong Xu at email: xulihong@msu.edu

 

 

Posted in BEACON Researchers at Work

BEACONites to Compete in 2021 Reach Out Science Slam Communication Challenge

BEACON graduate students Joelyn de Lima, Anna Raschke, Miles Roberts, and Katherine Skocelas have been named semifinalists in the 2021 Reach Out Science Slam Communication Challenge, jointly sponsored by the National Science Foundation and the Museum of Science, Boston. Each will present an original, three-minute science story about their research.

The Reach Out Science Slam is a nationwide effort to boost the communication skills of students and early-career researchers affiliated with the National Science Foundation’s 12 flagship Science and Technology Centers (STCs). These Centers tackle the frontiers of science and technology, foster discovery and innovation, and train next-generation scientists and engineers.

Competitors of the Reach Out Science Slam are required to make their science stories suitable for family audiences and include a live presentation component. Entrants are encouraged to make presentations engaging by incorporating demonstrations, animation, props, music, and more.

The Slam Semifinals will be held on April 6, 13, and 20, and the Finals on May 4, 2021. All Reach Out events will be presented live on YouTube at 7:00 p.m. EDT before a national audience and a panel of expert judges. The Judges’ Pick and the Audience Choice Winner of the Science Slam Finals will receive a $1,000 VISA gift card. All finalists will receive professionally packaged videos of their presentations, which will also be distributed through Museum of Science and National Science Foundation social media channels.

For more information on the Reach Out Science Slam Communication Challenge, please visit mos.org/reach-out-challenge or contact reachout@mos.org.

About the Museum of Science

Among the world’s largest science centers, and New England’s most attended cultural institution, the Museum of Science engages 1.4 million visitors a year in science, technology, engineering, and math (STEM) through interactive exhibits and programs. Nearly 2 million additional people experience the Museum annually through touring exhibitions, traveling programs, planetarium productions, and preK-8 EiE® STEM curricula through the William and Charlotte Bloomberg Science Education Center. Established in 1830, the Museum is home to such iconic exhibits as the Thomson Theater of Electricity, the Charles Hayden Planetarium, and the Mugar Omni Theater. The Museum influences formal and informal STEM education through research and national advocacy, as a strong community partner and loyal educator resource, and as a leader in universal design, developing exhibits and programming accessible to all. Learn more at https://mos.org/.

Posted in BEACON in the News

BEACON wins MSU Excellence in Diversity Award

Last month, Michigan State University announced this year’s winners of the annual Excellence in Diversity Awards. BEACON is very proud to receive the Team Award for Sustained Efforts towards Excellence in Diversity!


 

We owe much of our success to the efforts of our original Diversity Director, Dr. Judi Brown Clarke. Judi’s vision and expertise shaped our work not only in recruitment, retention, and support of underrepresented scientists, but also in cultivating an inclusive and aware climate for all of our students, postdocs, staff, and faculty. Judi is now the Vice President for Equity & Inclusion and Chief Diversity Officer at Stony Brook University.

 

We are also very grateful to Connie James, who took over the Diversity Director role at BEACON after Judi moved on. Since the beginning of BEACON, Connie has played an important role in ensuring a supportive and inclusive environment for our members, and now she is ensuring that we not only continue to live up to the high standards we set in our early years but also keep pushing ever higher!

 

Posted in BEACON in the News

Extending Genetic Programming for use in Big Data Analytics: BEACON alum Amir Gandomi

Former BEACON Distinguished Postdoc Amir H. Gandomi has received a 2021 Discovery Early Career Researcher Award from the Australian Research Council. Amir was a BEACON postdoc from 2015-2017, and is now a Professor of Data Science at University of Technology Sydney in Sydney, Australia. He will use this award to develop genetic programming for big data analytics.

The objective of this project is to enhance a powerful machine learning method, Genetic Programming (GP), so that it can be used for big data analytics. GP is a robust machine learning method because it not only searches for an optimal model but also discovers the structure of the model. Therefore, unlike many other machine learning methods, GP does not require a predefined model structure. Discovering the structure is what makes GP powerful, but it is also what makes it difficult, because the search space of a typical run can be extremely large, particularly for very large datasets. Amir proposes to extend GP for big data analytics, which are increasingly in demand in the modern world.

Compared with other machine learning methods such as neural networks, GP is transparent: it produces an explicit model (e.g., a mathematical equation) that is readable and interpretable (Gandomi and Roke, 2015). Most other machine learning models, by contrast, are complex and do not yield an explicit model; they are better suited to being embedded as part of a computer program, which limits their applicability.
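To illustrate what "an explicit model" means here, the sketch below represents a GP individual as a small expression tree that can be both evaluated and printed as a formula. The tuple encoding and operator set are illustrative assumptions; the evolutionary operators (crossover, mutation) and the new methodology described below are not shown.

```python
import operator

# Hypothetical encoding: a GP individual is (op, left, right), a variable name, or a constant.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, env):
    """Evaluate an expression tree with variable values taken from `env`."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree

def to_formula(tree):
    """Render the tree as a human-readable equation."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return f"({to_formula(left)} {op} {to_formula(right)})"
    return str(tree)

model = ("+", ("*", 2.0, "x"), ("*", "x", "y"))
print(to_formula(model), "=", evaluate(model, {"x": 3, "y": 4}))
```

The same structure that GP searches over is the model a user reads, which is the transparency advantage; it is also why the space of possible trees grows so quickly.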

In this work, to reduce the search space and model complexity, Amir will develop a new GP methodology for determining the initial expression tree structure. This methodology combines cutting-edge GP systems with an information-theoretic approach. The proposal also introduces a new concept, the Alpha program, which makes maximal use of the information in the population. To extend GP for big data analytics, the following objectives will be investigated:

Objective 1: develop a divide-and-conquer framework that can efficiently decompose a large dataset.

Objective 2: extend GP by means of an intelligent strategy.

Objective 3: develop the Alpha program to facilitate maximum usage of gene information.

Objective 4: tune the algorithm for use in classification and regression problems and apply it to big data analytics.

Based on these objectives, this project is organized into four phases, as follows:

Figure 1. Four Phases of Extending GP for use in Big Data Analytics

The full list of awardees can be viewed here. Congratulations, Amir!

Posted in Uncategorized

Evolutionary AI site with expert podcasts and a COVID-19 intervention demo

The Evolutionary AI research group at Sentient has moved to Cognizant Technology Solutions. The group includes several current and past BEACONites, including Risto Miikkulainen, Elliot Meyerson, Jason Liang, and Santiago Gonzalez, and past interns Aditya Rawal and Khaled Talukder. The group has a new website, https://evolution.ml/; the earlier content (announced previously in this blog) is there, including video interviews with 17 academic and industry leaders on “The Future of AI”, as well as the “Evolution is the New Deep Learning” microsite.

Following the idea of expert interviews, the site showcases five new podcasts in the Pulse of AI series (https://evolution.ml/podcasts). In these podcasts, Jason Stoughton discusses topics such as biological vs. computational evolution, trustworthy AI, AutoML, demystifying AI, and open-endedness with Stephanie Forrest, Joydeep Ghosh, Babak Hodjat, Quoc Le, Risto Miikkulainen, Jordan Pollack, and Ken Stanley.

There is also a new site on decision making (https://evolution.ml/esp), featuring research on “Evolutionary Surrogate-assisted Prescription.” The goal is to extend AI from predicting what will happen to prescribing what we should do about it. The idea is to first train a predictor neural network through supervised learning, and then use it as a surrogate to evolve a prescriptor neural network to make good decisions. The site features papers, visualizations, and demos on various game domains (including FlappyBird!) showing how this approach can be sample-efficient, reliable, and safe in sequential decision tasks.
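The two-step idea (train a predictor on historical data, then evolve a prescriptor against that predictor) can be sketched on a toy problem. This is not the ESP implementation: a k-nearest-neighbour regressor stands in for the predictor network, a linear rule stands in for the prescriptor network, a (1+1) evolution strategy stands in for the evolutionary method, and the environment is hypothetical.

```python
import random

def true_outcome(context, action):
    # Hypothetical environment, unknown to the optimizer: the best action is 2 * context.
    return -(action - 2.0 * context) ** 2

def collect_history(n, rng):
    """Historical (context, action, outcome) records, e.g. from past decisions."""
    rows = []
    for _ in range(n):
        c, a = rng.uniform(0.0, 1.0), rng.uniform(-1.0, 3.0)
        rows.append((c, a, true_outcome(c, a)))
    return rows

def make_predictor(history, k=5):
    """Step 1: a k-nearest-neighbour regressor stands in for the supervised predictor network."""
    def predict(c, a):
        nearest = sorted(history, key=lambda r: (r[0] - c) ** 2 + (r[1] - a) ** 2)[:k]
        return sum(r[2] for r in nearest) / len(nearest)
    return predict

def evolve_prescriptor(predict, contexts, generations, rng, sigma=0.3):
    """Step 2: evolve a linear prescriptor a = w*c + b against the predictor (the surrogate) only."""
    def surrogate_fitness(params):
        w, b = params
        return sum(predict(c, w * c + b) for c in contexts) / len(contexts)
    best = (0.0, 0.0)
    best_fit = surrogate_fitness(best)
    for _ in range(generations):  # a simple (1+1) evolution strategy
        child = (best[0] + rng.gauss(0.0, sigma), best[1] + rng.gauss(0.0, sigma))
        fit = surrogate_fitness(child)
        if fit > best_fit:
            best, best_fit = child, fit
    return best

def mean_true_outcome(w, b, contexts):
    """Evaluate a prescriptor in the (normally unavailable) real environment."""
    return sum(true_outcome(c, w * c + b) for c in contexts) / len(contexts)

rng = random.Random(0)
history = collect_history(300, rng)
predictor = make_predictor(history)
contexts = [i / 9 for i in range(10)]
w, b = evolve_prescriptor(predictor, contexts, 150, rng)
print(round(w, 2), round(b, 2), round(mean_true_outcome(w, b, contexts), 3))
```

Only the surrogate is queried during evolution; the real environment is consulted once at the end to check the result, which is where the sample efficiency comes from.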

A major new part of the site focuses on COVID-19 (https://evolution.ml/esp/npi): It demonstrates how the same technology can be used to model the potential effects of non-pharmaceutical intervention (NPI) strategies to contain and mitigate the pandemic. The predictor is trained with historical data on the number of cases and the NPIs over time in various countries, i.e. restrictions on schools and workplaces, public events and gatherings, and transportation. A Pareto front of prescriptors is then evolved to discover the best tradeoffs between minimizing cases and restrictions. To illustrate this principle, the site includes an interactive demo: you can explore how, given your preferred tradeoff, the pandemic could be contained and mitigated in different countries.

We invite you to explore the “evolution.ml” site—and perhaps also bring your own expertise in AI to help deal with COVID-19!

Posted in BEACON Researchers at Work

Engaging Galápagos Students and Educators in Evolutionary Activities

This blog post is by Madison Bovee, Alexa Warwick, John G. Phillips, Brant G. Miller, and Christine Parent.

Evolutionary research is conducted across the globe, yet no location may be as emblematic as the Galápagos Islands (Figure 1). Made famous by Charles Darwin’s visit in 1835, the islands remain a site of long-term research that continues to advance the field of evolution. Even though international scientists frequently conduct research in locations like the Galápagos, all too often they collect data and leave without significant engagement with local communities and stakeholders. Given that evolution is a critical yet challenging topic to learn, efforts to engage these communities could help highlight how important the ecosystem they live in has been for advances in evolutionary biology.

Figure 1. A map of the Galápagos Islands denoting the geological age of each island and the sampling locations used by the Parent Lab in their evolutionary research on snails. Figure courtesy of Phillips et al. 2020. ES = Española, FA = Fernandina, FL = Floreana, PA = Pinta, RA = Rábida, SC = Santa Cruz, SA = Santiago, SL = San Cristóbal. Isabela samples are partitioned by volcano: AL = Alcedo, DA = Darwin, CA = Cerro Azul, SN = Sierra Negra, WF = Wolf.

In Ecuador specifically, previous work has shown that only 50% of the population accepts evolution, placing the country 14th of the 19 Latin American countries surveyed and below most European countries (Pew Research Center 2014, Miller et al. 2006). Thus, the evolutionary lessons from iconic organisms like the finches and tortoises seem to have had a much greater impact outside Ecuador than within its borders. Dr. Christine Parent’s lab at the University of Idaho (UI) has a history of public engagement in the Galápagos Islands based on her research on snails (Parent and Crespi 2006, 2009; Parent et al. 2008, Kraemer et al. 2019, Phillips et al. 2020). Last year we began collaborating with one of the local Galápagos schools, Tomas de Berlanga School in Bellavista (island of Santa Cruz), thanks to funding from BEACON (DBI-0939454) and the National Science Foundation (#1751157 to Dr. Parent).

The team consisted of Dr. Parent, Dr. John Phillips (Parent Lab postdoc), Dr. Brant G. Miller (UI education faculty), Madison Bovee (UI pre-service elementary teacher), and Dr. Alexa Warwick (MSU faculty). We had the opportunity to work with students aged 9–11. Our goal was to expose them to the fundamental importance of evolution in biology and in their everyday lives. We used inquiry-based learning strategies to enhance their understanding of basic evolutionary and ecological principles by using the ecosystems in their own backyard. We had three full days to help students become more familiar with evolutionary research within the Galápagos Islands.

Day one consisted of evaluating the students’ familiarity with the theory of evolution and practicing data collection and inference. The students were split into three smaller groups, each led by one of our team members and assisted by a local teacher. First, we discussed how to make thoughtful observations and then asked the students to observe their surroundings on a diversity walk in a park right next to their school. Students recorded their observations in their journals by completing the prompts “I notice”, “I wonder”, and “It reminds me of”. This activity got the students thinking like scientists. Next, they worked in pairs to analyze authentic science data as part of a Data Nugget, followed by a group discussion about what they had learned.

A student’s notebook observation about the finch (‘el pinzon’), wondering how it can sit on a cactus without hurting itself.

A pair of students working on a Data Nugget.

With the students now prepared to think more abstractly, day two focused on evolutionary learning and on preparing the students to create their own inquiry projects. To start the day, two scientists who have been conducting snail research on the Galápagos and other oceanic islands, Dr. Christine Parent and Dr. Satoshi Chiba (Tohoku University), gave presentations. These presentations showed the students exciting examples of evolutionary research and highlighted the value and unique opportunities that come from research in the Galápagos Islands. Students then explored evolutionary ideas through the bird beak activity, inspired by the beak evolution of Galápagos finches. Students used different tools to represent a bird’s beak and considered the relationship between the beak and a bird’s ability to find food and survive in a given environment. Finally, we ended the day with each of the three groups of students coming up with 50 questions about snails. We then discussed what makes a question testable and whether we could answer it given our time frame and other restrictions (such as collecting only snail shells rather than live snails). Once each group had selected a question, the students discussed their methodology and materials to prepare for day three.

Dr. Parent presenting to all the students.

Exploring evolution with the bird beak activity.

Day three consisted entirely of the students investigating their inquiry questions about snails, collecting and analyzing data to make claims that answered their selected question. The three groups all chose similar but distinct questions that they felt were testable and interesting. We had the opportunity to conduct this exercise on a local coffee plantation, where the students collected data by looking for snail shells. Each group then prepared and presented its results.

Looking for snail shells for their inquiry projects.

Holding snail shells found on the diversity walk.

A group of students presenting their project results.

The students grew tremendously in these three days, and so did the team. We had to overcome barriers we didn’t expect: the language barrier (it was a bilingual school, but the younger students were not as comfortable with English), our lack of knowledge of the students’ science background, and how active and energetic the students were. However, we adapted within the first day and turned these barriers to our advantage. For example, we were able to strengthen their background in evolution on the first day, which helped the other two days run more smoothly since we knew what knowledge they could build on. We also created lessons that were culturally relevant and used hands-on activities, which helped the students relate to our lessons.

Our efforts were successful, as evaluated by the students’ pre- and post-assessments. Overall, 88% of the 42 students who completed the post-assessment agreed or strongly agreed that they liked science, and 56% wanted to be a scientist. About half of the students (52%) agreed or strongly agreed that different organisms can have the same common ancestor, and just under half (45%) said they could explain how natural selection works. Thus, we think their knowledge about evolution grew and their curiosity was sparked. In addition, they learned more about the importance of the Galápagos for scientific research: 71% responded to the post-assessment question “Why do you think the Galápagos attracts so many scientists?” by mentioning the unique plants and animals found there. For example, one student wrote “Hay especies endemicas, cual significa, que están ubicadas en un solo lugar en el mundo, en este caso Galápagos” (There are endemic species, which means that they are located in only one place in the world, in this case Galápagos). We also taught them how to do inquiry-based research and helped them to “think like a scientist”. When asked what the most interesting thing they learned was, 76% mentioned the snails, suggesting that investigating their own snail questions was impactful. For example, one student wrote “De trabajar en grupo y sobre los caracoles” (working in groups and about the snails). Inquiry is a tool they will be able to use for the rest of their lives, and now they also know how to use it right in their own backyard.

Finally, we greatly appreciate the assistance of the local teachers with classroom management and translating instructions during the three days of working with the students, as well as the school administrators for logistics and planning, especially Michelle Rothenbach and Justin Scoggin. We look forward to continuing to collaborate with teachers and students from the Tomas de Berlanga School on evolutionary education in the future.

An overview of NSF-REU funded undergraduate research conducted in conjunction with our BEACON outreach can also be found here: https://www.uidaho.edu/sci/biology/news/features/2019/galapagos

Citations

Kraemer, A. C., Philip, C. W., Rankin, A. M., & Parent, C. E. (2019). Trade-offs direct the evolution of coloration in Galápagos land snails. Proceedings of the Royal Society of London. Series B. Biological Sciences, 286(1894), 1–9. doi: 10.1098/rspb.2018.2278

Miller, J. D., Scott, E. C., & Okamoto, S. (2006). Public Acceptance of Evolution. Science, 313(5788), 765–766. doi: 10.1126/science.1126746

Parent, C. E., & Crespi, B. J. (2009). Ecological opportunity in adaptive radiation of Galápagos endemic land snails. American Naturalist, 174(6), 898–905. doi: 10.1086/646604

Parent, C. E., & Crespi, B. J. (2006). Sequential Colonization and Diversification of Galápagos Endemic Land Snail Genus Bulimulus (Gastropoda, Stylommatophora). Evolution, 60(11), 2311. doi: 10.1554/06-366.1

Pew Research Center. (2014). Religion in Latin America: Widespread change in a historically Catholic region.

Phillips, J. G., Linscott, T. M., Rankin, A. M., Kraemer, A. C., Shoobs, N. F., & Parent, C. E. (2020). Archipelago-wide patterns of colonization and speciation among an endemic radiation of Galápagos land snails. Journal of Heredity, 92–102. doi: 10.1093/jhered/esz068

How Claire from the BA test kitchen made me rethink our scientific role models

This blog post is by MSU faculty member Arend Hintze.

I love making stuff, whether it is woodworking or building cosplay Halloween costumes for my kids. However, I also like to do things the right way. Consequently, I have to learn new skills all the time, and to that end, I watch a lot of making-of videos and tutorials. Over time I realized that I spend a lot of time watching experts on YouTube doing things. At first, I thought I was just a sucker for infotainment, but then I took a closer look at my YouTube history. I found confirmation of the infotainment preference: I watch a lot of Physics Girl, Computerphile, Numberphile, The Backyard Scientist, Captain Disillusion, Today I Found Out, Scott Manley, The Slow Mo Guys, SciManDan, and Veritasium. These all fall into the category of science or technology dissemination. But I also saw that I follow Claire from the Bon Appétit test kitchen, Adam from Tested, Peter Brown, and Odin Makes. All are YouTubers who are experts in what they do, but instead of disseminating scientific content or technological advancements, they usually build or create things.

Claire Saffitz from Bon Appétit’s BA Test Kitchen

Here comes the strange observation: even though I am a scientist, I feel much more connected with the makers. I have a much deeper emotional connection with Claire and Adam than with Bill Nye or Neil deGrasse Tyson. But why is that?

Most science YouTubers talk about scientific facts and how they can be understood. They debunk false claims and fake news. Or they show advancements, and how sophisticated detectors allow us to understand the very stuff reality is made of. While I love all of that, I don’t see the way I actually do science represented there. Yes, accomplishments are great, but 99.9% of the time, I don’t feel like I have accomplished anything. Scientific discoveries are rare, most experiments fail, and results keep contradicting each other until, much later, suddenly everything makes sense. No, I don’t have imposter syndrome; that is an entirely different thing. Almost by definition, we scientists work at the edge of the known. If we didn’t try to push this boundary, we wouldn’t be doing our job right. If our experiments worked out every single time, we could have known the answers beforehand. It is “being wrong” that is informative. It is “not knowing” that drives the quest for knowledge, and it is a long, cumbersome, and often frustrating path.

However, typical US education does not prepare students for such challenges. STEM education makes science “fun”: everything has an answer, and tests only require you to regurgitate these answers. Our kids experience immediate rewards not only in their learning environments but also in how they play. Digital games are optimized for instant rewards, which is what makes them so addictive. Critical thinking is nice, but you also need to come up with new and creative ways to solve problems.

We need to show our students and children that failures are an integral part of learning. We need to show them how to deal with setbacks. I think students learn more from watching others fail and deal with failure than from simply being dazzled by others’ accomplishments. The first lets you empathize; the second makes you depressed.

Adam Savage from Tested and MythBusters

This is the reason why I watch MythBusters with my kids, where “failure is always an option.”

This is also why I try to use questions to get my kids, as quickly as possible, to the point where they don’t know anymore. I want them to be comfortable with not knowing. I need them to enjoy this state, as it is the engine of curiosity and creative exploration. I habitually respond to “I don’t know” with “take a guess!”

This is also the reason why I emotionally bond with Adam and Claire. Both explore, both fail often, and both “do not know” in front of the camera. The only difference between Adam and Claire is in how they cope. Adam has more than ten years of experience from MythBusters in not getting the expected results, which is probably why he can enjoy what he does so much more. He lets you feel how little failure bothers him. Claire, on the other hand, shows how frustrated she is when something doesn’t go according to plan. She also lets us experience how she deals with that frustration: a sigh, a comment, and then she goes on. No regrets! Now she knows more, and now she can try something new, which ultimately leads to the answer.

I don’t want the other science YouTubers and science advocates to change what they do, please keep up the great work. I enjoy every bit of what you are doing. The reason why I think Adam and Claire are also such great science role models is their ability to struggle publicly. They show how failing is an integral part of finding the solution, and their ability to cope with that frustration is exemplary.

Thank you for that, and thank you for leading by example. I promise I will keep failing.

Goodman Extends BEACON’s Collaborations in China

This post is by BEACON’s Executive Director Erik Goodman.

On Oct. 19, 2019, I left East Lansing for China, with stops in Shantou, Guangzhou and Shanghai. I received a warm and wonderful reception everywhere I went, in spite of tariffs, trade wars, and all the political difficulties that fill the news.

My first stop was Shantou, on the southern coast of China, a few hours from Guangzhou, Shenzhen and Hong Kong. At Shantou University, I was warmly welcomed by Prof. Zhun Fan (on the left, with his wife, her parents, and son; and yes, the Chaoshan-style food was yummy!). Zhun is a former doctoral advisee of mine, and he now leads the Laboratory for Robotics and Intelligent Manufacturing and also the Provincial Key Laboratory for Digital Signal and Information Processing (DSIP). BEACON is a co-founder of a joint center, the Center for Evolutionary Intelligence and Robotics, established between Prof. Fan’s laboratory and BEACON, with additional partners at Guangdong University of Technology and Nanjing University of Aeronautics and Astronautics. During four days in Shantou, I gave three lectures and heard reports from students and faculty about their research progress, and our discussions led to many new ideas to explore. I met with Shantou University Provost Wang to review our past collaboration and to investigate broadening it next year to include more work applying evolutionary computation to civil engineering. Prof. Wang is an expert in structural health monitoring and in energy capture to power sensors, areas in which MSU CEE also has expertise. During the visit, the provincial government announced continuing support for the DSIP Key Laboratory, as well as 1M RMB (about US$140,000) to support the activities of the joint center with BEACON. The joint center received an excellent rating from the government on its operations to date, which counted my annual visits and the two-year visit to BEACON of Chaoda Peng, a graduate student at Guangdong University of Technology, working with me. Peng is advised by Prof. Hailin Liu, who has also been a long-term visitor to BEACON.

The next stop was Guangdong University of Technology, in Guangzhou, a 3-hour train ride from Shantou (on a “slow” train, only 120 mph). I was hosted by Prof. Hailin Liu, gave a presentation on recent work I am involved in at BEACON, and met with students and faculty involved in evolutionary computation. In the evening, they took me on a river tour of downtown Guangzhou, and the picture shows how much more of a light show there is in Guangzhou than in Times Square, New York! Coordinated moving images spanned a series of a dozen or more buildings. Absolutely spectacular!

Goodman with Prof. Lihong Xu, BEACON Advisory Professor, at Tongji University, with advertisement of Goodman’s lecture.

I then returned to Shanghai and met for two days with the greenhouse control team at Tongji University, in a collaboration extending more than ten years, resulting in dozens of joint papers and in control systems being tested in commercial-sized greenhouses. Yuanping Su and Chunteng Bao are BEACON visitors working with me this year, and I was delighted to participate in the doctoral defense of Leilei Cao, who was another two-year visitor in BEACON. His work was nominated for an outstanding dissertation award. I lectured at Tongji University to about two hundred graduate students about our recent work on evolutionary deep learning and about my solid fuel rocket optimization work using a heterogeneous parallel genetic algorithm. I also gave a similar talk at East China Normal University the next day, for about 50 graduate students.

Then it was off to India for a month of collaboration on a short course, industry workshop, and book, with BEACON’s Prof. Kalyanmoy Deb and two distinguished Indian scholars. But that’s for another blog!

Using lessons from Facebook and fence-building to understand the evolution of deadly bacteria

This blog post is by University of Idaho graduate student Clinton Elg.

Evolution of a Deadly Bacterium

Vibrio cholerae is a bacterium that lives in water and causes cholera, a deadly disease. While areas of the world with functional sewage systems and potable water are largely unaffected, there is still no definitive cure for the disease. It remains rampant in less developed regions and often acts as a deadly second act after natural disasters and wars destroy infrastructure.

Bacteria are constantly evolving, and this includes illness-causing bacteria like V. cholerae. Evolution drives new major outbreaks called pandemics, with each new pandemic strain acting like an updated software version that outcompetes and outperforms older versions. V. cholerae is now in its seventh pandemic, and the latest strain, named “El Tor V. cholerae”, contains two unique pieces of DNA not found in the strains of earlier pandemics. Scientists have named these unique pieces of DNA Vibrio Seventh Pandemic Islands (VSP) I and II. Together the VSPs contain around 33 genes; each gene is a DNA “blueprint” that the bacterium converts into a protein “machine”. What kind of machinery do these VSP genes encode? How does this new cellular machinery help El Tor V. cholerae outcompete the strains of older pandemics?

Breakthrough at Michigan State University and Tufts University

In 2018, PhD candidate Geoffrey Severin in the Chris Waters lab at Michigan State University (MSU) and Miriam Ramliden in the Wai-Leung Ng lab at Tufts University made a remarkable discovery. It had been known that a VSP gene named dncV was important for El Tor V. cholerae to cause disease, but the reason remained elusive. The team discovered that increasing the number of “blueprint” copies of dncV within El Tor V. cholerae produced smaller populations of bacteria when grown on solid surfaces. These bacterial populations are called “colonies”, and scientists call colonies that shrink “small colony variants”. The team later discovered that dncV encodes a molecular “switch” that activates the cell-shrinking machinery of another VSP gene, capV. In the larger picture, this small colony variation may help explain why El Tor is the leading strain sickening people around the world, and it is an early clue to unraveling the novel functions of the VSPs in El Tor V. cholerae.

Figure 1. Representative images of a typical El Tor Vibrio cholerae colony (left) and a small colony variant of El Tor V. cholerae engineered to express excess dncV (right).

Figure 2. Dr. Chris Waters (L) and Geoffrey Severin (R) from Michigan State University.

Building a Fence (Around Gene Networks)

Despite this scientific home run, much work remains. Consider a human analogy: the tools required to build a fence include a hammer, nails, wood, a level, a chalk line, a shovel, and concrete. From looking at these tools piled on the ground, one might reasonably predict that somebody is planning to build a fence. In the VSPs of El Tor V. cholerae, we have a pile of 33 new and strange tools, and we know what only 2 of them do! Now imagine the fence-building tools were jumbled in a pile of other random tools. Without knowing what the tools are or what tasks they might accomplish, how would you pick out the seven tools that belong to fence building? In biological terms, the VSPs of El Tor V. cholerae contain 31 genes of unknown function that group into an unknown number of “gene networks”. Each gene network is a group of genes that work together to accomplish a specific cellular task.

Geoff and Chris decided that they needed a way to predict the number of gene networks and the genes that make up each network. They reached out to Dr. Eva Top at the University of Idaho and began a collaboration with her PhD student Clint Elg from the Bioinformatics and Computational Biology (BCB) program.

Figure 3. Dr. Eva Top (L) and Clint Elg (R) at the University of Idaho.

Using The Math Behind Facebook to Predict Gene Networks

To predict the gene networks in the VSP islands of El Tor V. cholerae, Clint turned to what may seem an unlikely place: Facebook. Have you ever been on Facebook and seen a new friend recommended to you? Underlying this are complex mathematical models that predict social circles like your family or your co-workers. The predictions are made by measuring how mutually related (or “correlated”) you are to another user profile: the more you post to or respond to another person, the higher your mathematical correlation.

In a similar fashion, we can use the thousands of bacterial genomes available on the internet to see how often certain genes occur together. Instead of predicting social networks, we are predicting gene networks! For example, consider vc0178 and vc0179, the two genes that Geoff, Wai-Leung, and Chris found to produce the small colony morphology. Neither gene does this on its own, but together they change the size of the bacteria. Since evolution selects for DNA that provides some sort of advantage, we should expect these two genes to co-occur in bacterial genomes at a much higher rate than two randomly chosen genes.
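A minimal sketch of the co-occurrence idea, assuming each gene is scored by the Pearson correlation of its presence/absence pattern across genomes (for 0/1 data this is the phi coefficient). This is not correlogy’s actual method, which the post does not describe; the presence table, the gene `geneX`, and the functions `phi` and `rank_pairs` are hypothetical, and the numbers are made up for illustration.

```python
import math
from itertools import combinations

# Hypothetical presence/absence table: 1 = gene found in that genome, 0 = absent.
# Each column is one made-up genome; these are illustrative numbers, not real data.
presence = {
    "vc0178": [1, 1, 0, 1, 0, 1, 0, 0],
    "vc0179": [1, 1, 0, 1, 0, 1, 0, 0],
    "geneX":  [0, 1, 1, 0, 1, 0, 1, 0],
}


def phi(a, b):
    """Pearson correlation of two 0/1 vectors (the phi coefficient)."""
    n = len(a)
    both = sum(1 for x, y in zip(a, b) if x and y)  # genomes carrying both genes
    has_a, has_b = sum(a), sum(b)                   # genomes carrying each gene
    denom = math.sqrt(has_a * (n - has_a) * has_b * (n - has_b))
    if denom == 0:
        return 0.0  # a gene present (or absent) in every genome gives no signal
    return (n * both - has_a * has_b) / denom


def rank_pairs(table):
    """Score every gene pair and sort from most to least correlated."""
    scored = [((g1, g2), phi(table[g1], table[g2]))
              for g1, g2 in combinations(table, 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for (g1, g2), r in rank_pairs(presence):
    print(f"{g1} - {g2}: {r:+.2f}")
# vc0178 - vc0179: +1.00
# vc0178 - geneX: -0.50
# vc0179 - geneX: -0.50
```

Real genomes are not independent samples: closely related strains share many genes simply through common ancestry, so a serious analysis would need to account for that relatedness, which this sketch does not.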

The result is correlogy, software currently in an alpha version, built with the help of mathematician Ben Ridenhour from the Institute for Modeling Collaboration and Innovation (IMCI). Using data from thousands of bacterial genomes, correlogy predicted vc0178 and vc0179 to be highly correlated, matching what Geoff and Chris had demonstrated biologically in the lab! More importantly, the software has predicted gene networks for the remaining 31 VSP genes of unknown function and interaction. These predictions give protein specialists like Geoff and Chris a place to start investigating the VSP genes that fuel the modern Seventh Cholera Pandemic.
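One simple way, assumed here purely for illustration, to turn pairwise scores into “gene networks” is to link every pair of genes whose correlation passes a threshold and read off the connected groups. The scores, the genes `geneA` to `geneD`, the function `gene_networks`, and the 0.5 threshold are all hypothetical; this is not how correlogy is known to work.

```python
# Hypothetical pairwise correlation scores (made-up numbers, not correlogy output).
scores = {
    ("vc0178", "vc0179"): 0.95,
    ("geneA", "geneB"): 0.80,
    ("geneB", "geneC"): 0.72,
    ("vc0179", "geneA"): 0.10,
    ("geneC", "geneD"): 0.05,
}


def gene_networks(scores, threshold=0.5):
    """Group genes into hypothetical networks: the connected components of a
    graph whose edges are gene pairs scoring at or above `threshold`."""
    parent = {}  # union-find forest: gene -> parent gene

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving keeps trees shallow
            gene = parent[gene]
        return gene

    for (g1, g2), r in scores.items():
        root1, root2 = find(g1), find(g2)  # also registers genes with no strong links
        if r >= threshold and root1 != root2:
            parent[root1] = root2  # merge the two groups

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())


print(gene_networks(scores))
# [['geneA', 'geneB', 'geneC'], ['geneD'], ['vc0178', 'vc0179']]
```

Raising or lowering the threshold changes how many networks come out, so the threshold is a modeling choice rather than something the data dictate on their own.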

Figure 4. VSP gene networks as predicted by correlogy. Gene vc0179 encodes protein DncV, and gene vc0178 encodes protein CapV. These two genes work together as a gene network to allow the small colony morphology found in the modern El Tor Pandemic strain.

BEACON and Collaborative Science

Our research into the deadly disease cholera is making important discoveries. We would like to express our gratitude to the taxpayer-funded National Science Foundation (NSF) and particularly to the BEACON program. The NSF BEACON program enabled important insight into the evolution of a lethal bacterium by encouraging and funding a meaningful collaboration among biologists, protein specialists, computer scientists, and mathematicians.
