BEACON Researchers at Work: Bidding Strategy in Learning Classifier Systems Using Loan and Niching GA

This week’s BEACON Researchers at Work post is by North Carolina A&T State University graduate student Abrham Workineh.

Nature has given some degree of inherent intelligence to living things.  One definition of  intelligence is the ability to learn from experience, to adapt to new situations and make proper decisions. Intuitively, a machine is said to be learning whenever it responds to its external environment or an input by adjusting its state to improve its future performance. Humans and living things learn heuristics through experience and use them to solve various problems. But machines have no intuition and hence fail to understand commonsense knowledge. For instance, animals are much better than robots or computers when it comes to recognition and vision. It may be easy for humans to remember a person they saw only once many years ago. But computers and robots, though damn fast in computational stuffs, need thousands of hours of intensive training before they recognize the face of a single person. In fact, several manifestations of this natural intelligence have contributed to the advancement of technology, especially in the areas of machine learning and robotics.  To mention a few: task coordination and collaboration techniques in honey bees, flock of birds, and food foraging in ant colonies are some of the sources of intelligence that have been deeply investigated and customized to machines. The essence behind machine learning is to introduce natural intelligence to machines like robots by designing computer algorithms that evolve through experience to adapt to a new environment. It enables machines to plan, learn from experience and integrate these two aspects with the rest of their behavior. Beyond its engineering relevance however, achievement of learning in machines will also have a biological significance as it helps to understand how animals and humans learn.

When I think of my inclination in this area, I always go back in memory to those old times. Having grown up in a remote area, I did not have even an iota of information about modern technology in my early childhood. I had only nature to ponder and appreciate. My day-to-day encounters included seeing a colony of ants crawling on a straight path, birds coming back to their nest after traveling away for miles, birds forming a “V-shape” while flocking… these were some of my fascinations about nature and somehow they shaped my research inclination towards machine learning. My undergrad senior design project was to develop a fingerprint recognition module using back propagation neural network. I started exploring more about Learning Classifier Systems (LCS) and neural networks during my MS study.  Currently, I am researching on improving the convergence rate in LCS using a modified bidding strategy and niching genetic algorithms.

An LCS is a machine learning system based on reinforcement learning and genetic algorithms (GAs).  It is one type of evolutionary algorithm that utilizes a knowledge base of syntactically simple production rules which can be manipulated by a GA. LCS uses GA as a search engine to discover new rules among a population of candidate rules based on the experience of existing rules. However, the GA used in LCS is different from the one in standalone GA. Consider for instance the problem of function optimization. In a standalone GA, the intention is to find parameter settings which correspond to the extremum of the fitness function.  There is no notion of environmental state and hence the GA structure lacks the condition part of the classifier.  Instead, it only manipulates a set of parameters corresponding to the action part of a classifier.  Thus in the standalone form, GA is a function optimizer that seeks for points of maximum functional value in the search space.  In LCS, it can be perceived as a function approximator.  The reinforcement learning technique determines the rule fitness and enables the system to learn from its environment based on a reward signal that implies the quality of its action.

In a strength-based LCS, auctioning among classifiers that matches to an environmental message has been used as a means to identify winner classifiers. All classifiers participating in an auction issue a bid proportional to their strength, and a winner classifier is allowed to fire and receive a reward or punishment from its environment as a consequence of its action. In this kind of bidding strategy, good classifiers with low strength and little experience have to wait until the strength of low-performing classifiers has come down through continuous taxation.  This slows down the rate of convergence to optimal solution sets. In addition, offspring classifiers that come from weak parents as a result of randomness in the selection process may inherit a small strength as compared to experienced classifiers in the population. A mutation occurring at a point may however make them better match to more environmental inputs. But due to the initial low strength they have to wait for some time till they mature and try their action. To mitigate these shortcomings of the bidding strategy in traditional LCS, loaning and bid history are introduced.  In direct analogy with real auctions, all classifiers matching the current input compare the average bid history with their potential bid based on their current strength.  The average bid history parameter gives general information about the bid market (potential of competent classifiers) and determines the amount of loan a classifier should ask.  The following figure gives a bigger picture of the implemented centralized loaning system.

Learning in an LCS is an ongoing adaption to a partially known environment and not an optimization problem as in most reinforcement learning systems. The learning system consists of six components: the auction, clearing house (CH), reinforcement program (RP), the (GA), the reservoir and the bank.

The Auction is part of the learning system where all classifiers in the match set bid a fixed amount proportion of their strength to win an auction. Here, we have introduced a new modified bidding strategy to the existing LCS implementation where classifiers in the match set can request a loan from a bank based on loan grant criteria.  Once the loan is granted, it can be added to their strength value and used during auctions to issue a bid. This improves a classifier’s potential to win an auction and try its action which may intern result in receiving a reward earlier. In other terms, a classifier gets a chance to take action without waiting until bad classifiers weaken due to continuous taxation. The CH is the part where all classifiers clear out their taxes. The GA discovers new rules among a population of candidate rules based on the experience of existing rules. Each GA operation brings two new classifiers to the existing population classifiers. GA diversifies the population by adding some degree of randomness in picking parents for reproduction. The RP determines the rule’s fitness by generating a signal in the form of a reward or punishment. It basically guides the search to evolve to optimal solutions at last.

The Bank is the loaning agent in the system analogous to auctions in real life. We have introduced a loan and bid history concepts to the traditional LCS implementation. Classifiers in the match set make a decision on whether to take a loan or go ahead by their own. The bid history is a global variable that keeps track of the average bid in the previous auctions.  It serves as a bench mark for classifiers to decide on how much loan to take or bid without any loan.  The requested loan amount is granted when the loan criteria are satisfied.  Finally, the reservoir plays a dual role: avoids financial bankruptcy at times when a classifier’s debt exceeds its strength and subsidizes new classifiers coming from GA.

To further expedite the rate of convergence to optimal solution set, a more compact and robust system representation using a decentralized loaning is also implemented. In this scenario, loan exchange is allowed among classifiers in the population by removing the reservoir and bank components in the centralized approach.  Preliminary results proved the effectiveness of a decentralized loaning technique. Investigation of niching GA and the feasibility of integrating it with a decentralized loaning in LCS is also part of our ongoing research under BEACON’s support.

For more information about Abrham’s work, you can contact him at atworkin at ncat dot edu.

This entry was posted in BEACON Researchers at Work and tagged , , , . Bookmark the permalink.

Comments are closed.