Refactoring Tools
Refactoring is the process of modifying programs to improve their structure without changing their functionality. A standard guide to refactorings and methods for carrying them out is Refactoring: Improving the Design of Existing Code (Fowler et al.), but its list of refactorings is by no means exhaustive. The ideas apply to all scripting and programming languages, and to any other formal computer documents such as web pages and database schemas.
Refactoring can be done manually on programs of modest size. However, it is easier and far more reliable if refactoring is implemented using robust tools that can automate the desired changes. The ideal refactoring tool can:
- detect a wide variety of code patterns representing opportunities for basic refactorings
- apply refactorings without breaking the program
- be extended to handle arbitrary refactorings
- be applied to any computer language or formal document
Refactoring tools have two principal use cases. The first is refactoring in the small, e.g. interactive application by a programmer working in an IDE ("refactoring browser"). The second is refactoring in the large, e.g. off-line application of single or multiple sets of refactorings by batch tools. The first is a useful convenience. The second can substitute entirely for the first, and can additionally carry out code modifications that are simply not practical with manual or even interactive refactoring.
There are some existing refactoring tools for the most widely used modern languages, such as Java, C# and C++, mostly as refactoring-browser plug-ins for mainstream IDEs (e.g., Microsoft Visual Studio or Eclipse). But these tools are generally not reliable in applying refactorings correctly; the most reliable are for Java, followed by C#, with C++ a distant third. They also tend to offer no means of extending the set of refactorings, and none known to us works on anything other than one specific dialect of one computer language. Refactoring tools for widely used languages such as COBOL or PHP, let alone more exotic languages such as ColdFusion, APL or FORTRAN, basically don't exist.
To be able to do refactoring well, a tool must be able to:
- parse a desired dialect of a specific language of interest at the same level of detail as its compiler
- build a compiler data structure to model the parsed program
- process the multiple files that comprise most applications at the same time
- carry out compiler-level control and data flow analysis across multiple files
- use pattern matching and flow analysis results to recognize refactoring opportunities
- make changes to the compiler data structures that represent the refactoring result
- regenerate valid source code that includes the modifications while retaining comments and code layout
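The steps above can be illustrated in miniature. The following Python sketch uses the standard-library ast module to parse one source file, locate a function, check that a rename would not collide with an existing name, apply the rename to the tree, and regenerate source text. The function name rename_local and the sample program are hypothetical, chosen only for illustration; this is not a description of how DMS or any other tool works. The sketch deliberately falls short of the list above: it handles a single file, performs no flow analysis, ignores keyword arguments at call sites and global/nonlocal declarations, and ast.unparse (Python 3.9 or later) discards comments and layout.

```python
import ast


def rename_local(source: str, func_name: str, old: str, new: str) -> str:
    """Rename identifier `old` to `new` inside one function (hypothetical helper)."""
    # Parse: build a tree that models the program.
    tree = ast.parse(source)

    # Locate the scope in which the rename should happen.
    target = next(
        (n for n in ast.walk(tree)
         if isinstance(n, ast.FunctionDef) and n.name == func_name),
        None,
    )
    if target is None:
        raise ValueError(f"no function named {func_name!r}")

    # Analyze: collect every variable and parameter name used in that function.
    used = {n.id for n in ast.walk(target) if isinstance(n, ast.Name)}
    used |= {n.arg for n in ast.walk(target) if isinstance(n, ast.arg)}
    if old not in used:
        raise ValueError(f"{old!r} does not occur in {func_name!r}")
    if new in used:
        # Applying the rename would merge two distinct variables.
        raise ValueError(f"{new!r} is already used in {func_name!r}")

    # Transform: change the tree, not the text.
    for n in ast.walk(target):
        if isinstance(n, ast.Name) and n.id == old:
            n.id = new
        elif isinstance(n, ast.arg) and n.arg == old:
            n.arg = new

    # Regenerate: ast.unparse emits valid code but drops comments and layout.
    return ast.unparse(tree)


if __name__ == "__main__":
    sample = (
        "def area(w, h):\n"
        "    # compute the area\n"
        "    r = w * h\n"
        "    return r\n"
        "\n"
        "def other(r):\n"
        "    return r\n"
    )
    print(rename_local(sample, "area", "r", "result"))
```

Running the sketch on the sample renames r only inside area and leaves other untouched, but the comment in area disappears from the output. Closing the gaps in even this small case (comment retention, cross-file references, full scope rules) is where most of the engineering effort in a real refactoring tool goes.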
Doing all of this is a complex task even for a single language, and is the principal reason refactoring tools are so rare. Adding the requirement that the refactoring tool be highly interactive and integrate into a major IDE (or worse, a minor IDE for which there is no economic justification) makes such tools that much harder to build.
One way to avoid building limited-capability refactoring tools is to use generalized compiler technology that can be parameterized by language syntax/semantics and analysis/change details, such as the DMS Software Reengineering Toolkit. Being a general purpose program transformation system, DMS can be configured to implement an unbounded variety of refactorings for a wide variety of languages. Because DMS was designed to scale, it can also make such changes across very big systems reliably.
Refactorings that have been or can be implemented with DMS for various languages include:
- Finding and removing duplicated code (Martin Fowler's #1 "code smell")
- Reformatting code to enhance readability
- Splitting God classes across large C++ code bases
- Renaming identifiers within a scope
- Replacing spaghetti-tangles of gotos with structured code
- Using CASE statements instead of nested IFs on the same variable
- Inserting function/method documentation by automated extraction of facts
- Repairing style-check violations
- Converting blocks of code into functions/subroutines/methods with appropriate parameters
- Removing dead code
- Restructuring APIs to support a different OS
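As a rough illustration of the first item, the Python sketch below flags functions in a single file whose parameters and bodies are structurally identical, comparing syntax trees rather than text so that differences in comments and spacing do not matter. It is a minimal sketch under stated assumptions, not the technique DMS uses: the function name find_duplicate_functions is hypothetical, only exact whole-function duplicates are found, and near-duplicates with renamed variables or small edits are missed.

```python
import ast
from collections import defaultdict


def find_duplicate_functions(source: str) -> list[list[str]]:
    """Group functions whose parameters and bodies have identical syntax trees."""
    tree = ast.parse(source)
    groups = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # ast.dump omits line numbers by default, so the fingerprint
            # ignores layout and comments; the function's own name is excluded.
            fingerprint = ast.dump(node.args) + "|" + "".join(
                ast.dump(stmt) for stmt in node.body
            )
            groups[fingerprint].append(node.name)
    # Report only fingerprints shared by more than one function.
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    sample = (
        "def f(a, b):\n"
        "    t = a + b\n"
        "    return t * 2\n"
        "\n"
        "def g(a, b):\n"
        "    # same logic, different layout\n"
        "    t = a+b\n"
        "    return t*2\n"
        "\n"
        "def h(a, b):\n"
        "    return a - b\n"
    )
    print(find_duplicate_functions(sample))
```

Detection is only the first half of the refactoring. Removing the duplication means replacing the copies with a single shared definition and updating every caller, which requires the multi-file analysis and source regeneration described earlier.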
DMS can be configured to carry out arbitrary refactorings for your language, whether it is widely used or not. Semantic Designs can accomplish this as a service, or can provide your organization with tools and training on how to use DMS to implement such refactorings.