This week’s BEACON Researchers at Work blog post is by MSU graduate student Erik Fredericks. His first BEACON blog post is here.
Engineers in industry are often given the somewhat daunting task of updating legacy software systems. Consider, for a moment, a piece of software used on a daily basis at a particular company, say, Initech. This software was programmed roughly twenty years ago by an engineer that has since left the company, leaving only cryptic comments and mostly unreadable software requirements. However, the importance of this software is paramount to Initech’s success, given its near-ubiquitous use throughout the corporation.
Times have changed over the past 20 years, Initech has gone through some changes, and our software package is quietly reaching the end of its usefulness. However, everybody still continues to use it, even though it is no longer as useful as it once was. Conveniently, a new engineer has been hired and is given his first task: update the software system, thereby extending its effective lifetime. Everybody is already used to it, the TPS reports it generates are spot-on, re-training all employees would be cost-prohibitive, and really, how hard can it be to update software?
After spending weeks gathering new requirements, analyzing the source code, and evaluating the overall state of the framework, our engineer has determined that the easiest way to fix the software system is to use software composition. Composition is a reuse technique for introducing new functionality into existing systems by reusing existing modules from other software systems. For instance, security can be added to a web application by composing in a security module that has been written for another application, where the composition is enabled by transforming the existing data into the format required by the new security module. Moreover, the data being returned must also be transformed into the format required by the legacy application.
Even though the code for the new module has already been written, there still remains many different ways in which to compose the module. For instance, how exactly should the legacy software system send data to the module? How should the data be returned? Is it even written in the same programming language? There are innumerable approaches for transforming data, and our software engineer would rather not spend the extra time and effort figuring out how to accomplish this task. Our engineer decides that an automated approach is best. Enter, context-free grammar-based genetic programming (CFG-GP).
CFG-GP is a search-based technique for automatically generating programs, similar in fashion to the standard genetic program. It comprises a population of candidate solutions that are evolved over time towards a particular goal or set of goals. CFG-GP differs from standard genetic programming in the solution representation. Particularly, each possible solution defined for the genetic program is specified by a context-free grammar. The grammar imposes a structure on generated solutions, effectively reducing the amount of invalid programs typically generated by a normal genetic program and also reducing wasted computing time. For composition, the grammar provides a structure for the many different ways in which data can be transformed for a particular programming language.
As with a typical genetic program, CFG-GP must also specify a fitness function to guide the search process towards an optimal solution. In our case, we leverage software preconditions and postconditions, along with common software engineering metrics, to guide the search process. Preconditions and postconditions define the conditions necessary for a module to be invoked within a program. For example, preconditions can define the data type and order of method parameters, and postconditions can define a method return value or the state of the program following invocation of the method. Together, preconditions and postconditions define what exactly is required for a method to be used by a program, enabling us to use it as part of the fitness function. In our case, candidate solutions must satisfy all preconditions and postconditions.
For example, we may wish to define preconditions and postconditions for a new security module, particularly for encrypting data to be sent to a server. A sample encryption module we wish to compose is defined as follows, in which the encryption function accepts two strings and a string pointer, and returns a boolean value:
<bool> openssl_private_encrypt(String data,
String &encrypted,
String private_key)
We now define the preconditions and postconditions necessary to compose this module. Particularly, the module preconditions represent the format the data is required to be in prior to module invocation. Essentially, we are interested in the parameters accepted by the module:
- STRING: data_in
- POINTER: data_out
- STRING: key
Next, the postconditions for the module must be defined to represent the state of the data following invocation of the module. Here, we are interested in what the module returns back to the legacy software system:
- BOOLEAN: return_value
Using these preconditions and postconditions, coupled with further definition of data available and state of the legacy software system prior to and following module invocation, we can successfully compose the security module.
Common software engineering techniques also help to guide the search process at a high level, enabling an engineer to fine-tune the set of optimal solutions. For instance, minimizing the amount of extra function calls, data dependencies, or amount of generated code may be required on systems that have limited memory available. Obfuscation of generated code may also be of interest for high-security systems. Satisfaction of preconditions and postconditions ensures that generated composition strategies are successful, and tailoring of solutions with software engineering techniques can tailor composition strategies for the needs of the engineer.
Thus far, our engineer has created a grammar detailing possible composition strategies, specified the preconditions and postconditions necessary to compose the new module, and executed SAGE to generate a set of optimal composition strategies. It seems that far more work has been incurred by setting up and running SAGE than would have been required to simply hand-code the composition. The benefit, however, is that SAGE can now be run for nearly any composition that the engineer may need to perform in the future. The defined grammar now forms a basis for any future compositions and can be easily extended to other domains that the engineer may encounter. Furthermore, SAGE can also automatically generate valid code in whichever language is required, enabling the engineer to simply copy and paste the generated composition strategy into the legacy framework.
For more information about Erik’s work, you can contact him at freder99 at msu dot edu.