Former BEACON Distinguished Postdoc Amir H. Gandomi has received a 2021 Discovery Early Career Researcher Award from the Australian Research Council. Amir was a BEACON postdoc from 2015 to 2017 and is now a Professor of Data Science at the University of Technology Sydney, Australia. He will use the award to develop genetic programming for big data analytics.
The objective of this project is to enhance a powerful machine learning method called Genetic Programming (GP) so that it can be used for big data analytics. GP is a robust machine learning method because it not only searches for an optimal model but also discovers the model's structure. As a result, unlike many other machine learning methods, GP does not require a predefined model structure. Discovering the structure is what makes GP powerful, but it is also what makes it difficult: the search space of a typical run can be extremely large, particularly when dealing with very large datasets. Amir proposes to extend GP to meet the growing demand for big data analytics in the modern world.
Compared with other machine learning methods such as neural networks, GP is transparent: it produces an explicit model (e.g., a mathematical equation) that can be read and interpreted directly (Gandomi and Roke, 2015). Most other machine learning models, by contrast, are complex and do not yield an explicit model. They are better suited to being embedded as part of a computer program, which limits their applicability.
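To make the idea of GP searching over model structure concrete, here is a minimal sketch of classic tree-based GP for symbolic regression. It is a generic illustration, not Amir's proposed methodology: the function and terminal sets, population size, tournament selection, subtree crossover and mutation, the depth limit, and every name in the code (`run_gp`, `to_str`, `mse`, and so on) are our own assumptions. The evolved model is printed as an explicit equation, which is the transparency property described above.

```python
import random

# Hypothetical function set: name -> (binary function, infix symbol for printing)
FUNCS = {
    "add": (lambda a, b: a + b, "+"),
    "sub": (lambda a, b: a - b, "-"),
    "mul": (lambda a, b: a * b, "*"),
}
TERMINALS = ["x", 1.0, 2.0]  # assumed terminal set: one input variable plus constants


def random_tree(rng, depth):
    """Grow a random expression tree; internal nodes are (op, left, right) tuples."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Evaluate a tree at input x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return FUNCS[op][0](evaluate(left, x), evaluate(right, x))


def to_str(tree):
    """Render a tree as an explicit, human-readable equation."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return f"({to_str(left)} {FUNCS[op][1]} {to_str(right)})"
    return str(tree)


def mse(tree, data):
    """Fitness: mean squared error over (x, y) pairs (lower is better)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, a, b):
    """Subtree crossover: swap a random subtree of a for a random subtree of b."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)


def mutate(rng, tree):
    """Subtree mutation: replace a random node with a fresh random subtree."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(rng, 2))


def tournament(rng, scored, k=3):
    """Pick the fittest of k randomly sampled (fitness, tree) pairs."""
    return min(rng.sample(scored, k), key=lambda s: s[0])[1]


def run_gp(data, pop_size=60, generations=30, max_depth=6, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []  # best MSE in each generation
    for _ in range(generations):
        scored = sorted(((mse(t, data), t) for t in pop), key=lambda s: s[0])
        history.append(scored[0][0])
        next_pop = [scored[0][1]]  # elitism: keep the best tree unchanged
        while len(next_pop) < pop_size:
            parent = tournament(rng, scored)
            if rng.random() < 0.7:
                child = crossover(rng, parent, tournament(rng, scored))
            else:
                child = mutate(rng, parent)
            # Depth limit curbs bloat: oversized offspring fall back to the parent
            next_pop.append(child if depth(child) <= max_depth else parent)
        pop = next_pop
    best = min(pop, key=lambda t: mse(t, data))
    return best, history


if __name__ == "__main__":
    # Toy target y = x^2 + x; GP must discover the equation's structure on its own.
    data = [(x / 5, (x / 5) ** 2 + x / 5) for x in range(-10, 11)]
    best, history = run_gp(data)
    print("best model:", to_str(best))
    print("MSE: %.4f -> %.4f" % (history[0], mse(best, data)))
```

No model structure is fixed in advance here; the shape of the equation itself is what evolves. The same property is why the search space grows so quickly, since every additional node multiplies the number of candidate trees.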
In this work, to reduce the search space and model complexity, Amir will develop a new GP methodology for determining the initial expression tree structure. This methodology combines state-of-the-art GP systems with an information-theoretic approach. The proposal also introduces a new concept, the Alpha program, which makes maximal use of the information in the population. To extend GP for big data analytics, the project will pursue the following objectives:
Objective 1: develop a divide-and-conquer framework that can efficiently decompose a large dataset.
Objective 2: extend GP by means of an intelligent strategy.
Objective 3: develop the Alpha program to make maximal use of gene information.
Objective 4: tune the algorithm for use in classification and regression problems and apply it to big data analytics.
Based on these objectives, the project is organized into four phases.
The full list of awardees can be viewed here. Congratulations, Amir!