Source Code Formatters
Concept
A source code formatter accepts a program source file and generates an equivalent source file that is formatted consistently according to the source language's syntax, including indentation, normalized case for identifiers, and similar conventions.
Using a source formatter has several advantages:
- Properly nested indentation makes it much easier to identify matching program blocks
- No more manual re-formatting of programs for readability
- No more religious wars about the "right format"; automatic formatting kills the argument
- A standard style used across an organization enables faster work because programmers know what to expect
- Code from other organizations can be easily standardized
- Smaller DIFFs when doing code merges
- Eventually your customer will see some of your company's code, and presentation matters.
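The point about smaller diffs can be illustrated with a short sketch. It uses Python as a stand-in source language and Python's standard `ast` module (Python 3.9 or later) as a stand-in formatter; neither is an SD product, and `normalize` and `diff_size` are hypothetical helper names. Two versions of a file differ in one constant but also in personal layout. After both are normalized, only the real change remains in the diff.

```python
import ast
import difflib


def normalize(source: str) -> str:
    """Reformat Python source to one canonical layout (comments are dropped)."""
    return ast.unparse(ast.parse(source)) + "\n"


def diff_size(old: str, new: str) -> int:
    """Count the added and removed lines in a unified diff of two texts."""
    lines = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return sum(
        1
        for line in lines
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# Same logic apart from one changed constant, but two personal layouts.
alice = "def area(w,h):\n    return w*h\n\nscale=2\n"
bob = "def area( w, h ):\n        return w * h\nscale = 3\n"

raw = diff_size(alice, bob)                               # layout noise included
formatted = diff_size(normalize(alice), normalize(bob))   # only the real change

print(f"changed lines before formatting: {raw}, after: {formatted}")
```

With both files run through the same formatter, the diff shrinks to the one line that actually changed (one removal plus one addition).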
Programmers spend 50% of their time just looking at source code. Formatting source code can make them more productive in this task, saving a significant amount of an IT department's time and budget.
A related topic is obfuscation. Obfuscators are "formatters" designed to make code very difficult to read and comprehend, to discourage reverse engineering.
Technology
Many conventional formatting tools use ad hoc string processing methods to implement the formatting process. This can work well for many sample files, but it often fails on multiple statements per line, nested comments, comments around incomplete blocks of code or keywords, and obscure language features such as escapes in quoted strings, all of which inevitably occur in large software systems. The result of such a failure is badly formatted source text or, worse, program source that is no longer acceptable to the source language compiler.
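As an illustration of this failure mode, here is a deliberately naive indenter for C-like text, sketched in Python. It is not taken from any real tool; `naive_indent` is a hypothetical function that counts braces as plain characters, with no knowledge of string literals or comments.

```python
def naive_indent(source: str) -> str:
    """Re-indent C-like text by counting brace characters on each line.

    Deliberately naive: braces inside strings or comments are counted too.
    """
    depth = 0
    out = []
    for raw_line in source.splitlines():
        line = raw_line.strip()
        leading_close = line.startswith("}")
        # A line that starts with '}' closes a block before it is printed.
        if leading_close:
            depth = max(depth - 1, 0)
        out.append("    " * depth + line)
        # Count every remaining brace on the line, wherever it appears.
        opens = line.count("{")
        closes = line.count("}") - (1 if leading_close else 0)
        depth = max(depth + opens - closes, 0)
    return "\n".join(out)


simple = "int main() {\nint x = 1;\nreturn x;\n}"
tricky = 'int main() {\nputs("{");\nreturn 0;\n}'

print(naive_indent(simple))  # indented as expected
print(naive_indent(tricky))  # the "{" inside the string corrupts the nesting depth
```

The first input is indented correctly. In the second, the brace inside the string literal is counted as a block opener, so `return 0;` and the closing brace are each indented one level too deep. Handling this properly requires knowing the language's lexical rules, which is the motivation for parser-based formatting.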
The most reliable way to build a formatter is to parse and unparse the source language according to the source language's lexical and syntax rules. This ensures that the syntax structures found match those of the language. All formatters from Semantic Designs (SD) work this way: they are built on DMS's ability to parse and prettyprint source files, and they use the same language definition modules that drive DMS for large-scale software reengineering tasks.
Another useful feature of SD's formatters is their availability for many languages (and the practical possibility of obtaining them for custom languages or dialects). All of SD's formatters are designed to operate as command-line programs, so that they can be included in scripts. Consistent handling of formatting switches and I/O conventions across formatters aids software engineering staff when handling the multiple languages typically used by an organization.
Files can be formatted one at a time or, using a project file that lists the files, in entire batches. Project files can be built with a text editor or with the provided GUI file selector.
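The parse-and-unparse idea can be sketched in a few lines. This is only an illustration in Python using the standard `ast` module (Python 3.9 or later), not DMS or any SD formatter; `format_source` is a hypothetical name. The source is parsed into a syntax tree, and the tree is then printed back in one canonical layout. One limitation of this stand-in: `ast` discards comments, which a production formatter must preserve.

```python
import ast


def format_source(source: str) -> str:
    """Format Python source by parsing it and printing the tree back out."""
    tree = ast.parse(source)          # raises SyntaxError if the input is not valid
    return ast.unparse(tree) + "\n"   # canonical spacing, indentation, line breaks


# Multiple statements per line, odd spacing, and a list split across lines.
messy = "x=1;y  =  [1,2,\n 3]\nif x:y=y+[x]\n"
formatted = format_source(messy)
print(formatted)
# x = 1
# y = [1, 2, 3]
# if x:
#     y = y + [x]

# The formatted text parses to the same tree, so the program is unchanged.
same_tree = ast.dump(ast.parse(messy)) == ast.dump(ast.parse(formatted))
```

Because the layout is generated from the tree rather than patched into the original text, invalid input is rejected up front instead of being silently mangled, and formatting an already formatted file changes nothing.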
Source files can be formatted to US-ASCII (ISO 646), Western European (ISO 8859-1, also known as Latin-1), or Unicode representations. This is convenient for applications using non-English text strings or comments.
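To show why the choice of output representation matters, the following Python sketch encodes one line of source text that contains non-English characters. The helper name `write_encoded` is hypothetical, and UTF-8 and UTF-16 are used here as assumed examples of "Unicode representations"; the text above does not say which Unicode encodings the formatters support.

```python
def write_encoded(text: str, encoding: str) -> bytes:
    """Encode source text, failing loudly if a character does not fit."""
    return text.encode(encoding)  # strict error handling is the default


source = 'char *greeting = "Grüße";  /* naïve café */\n'

latin1 = write_encoded(source, "iso-8859-1")  # one byte per character
utf8 = write_encoded(source, "utf-8")         # two bytes for each accented letter
utf16 = write_encoded(source, "utf-16")       # byte-order mark plus 16-bit units

try:
    write_encoded(source, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # US-ASCII cannot represent the accented characters at all.
    ascii_ok = False

print(len(latin1), len(utf8), ascii_ok)
```

The string literal and the comment survive in ISO 8859-1 and in the Unicode encodings but cannot be written as US-ASCII, so the target representation has to match the characters actually used in the code base.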
SD's formatters are presently available on Windows 2003, XP, and later operating systems.
Available Formatters
SD offers a family of formatters based on DMS. Presently available are:
- Ada83/95
- C (GCC2, GCC3, GCC4, ISO/IEC 9899 "ANSI", and Microsoft Visual C6 versions with optional obfuscation capability)
- C++ (GCC3, GCC4, ISO/IEC 14882 "ANSI", Microsoft Visual C++6 and VisualStudio2005/2010 versions with optional obfuscation capability)
- COBOL (IBMEnterprise and COBOL85/IBM VS COBOL II)
- C# (1.2, 2.0, 4.0, 5 with optional obfuscation capability)
- ECMAScript (JavaScript) (with optional obfuscation capability)
- Java (1.4, 1.5, 1.6, 1.7 with optional obfuscation capability)
- JCL
- Pascal (ISO 7185)
- PHP (4 and 5 with optional obfuscation capability)
- PL/SQL (with optional obfuscation capability)
- SystemC (with optional obfuscation capability)
- Verilog (SystemVerilog, formatter only; IEEE1364-2001 with optional obfuscation capability)
- VHDL (with optional obfuscation capability)
- VisualBasic (VBScript with optional obfuscation facility)
- XML
The following formatters are in Beta. Early adopters, please inquire:
- ActionScript
- Matlab M language
Custom Formatting Options
Semantic Designs can build custom formatters with special features:
- Unusual languages or dialects
- HTMLized output with colorized keywords
- Source Browsers: HTMLized output with clickable cross-references. Examples:
  - COBOL Source File Browser: Full cross-reference according to COBOL scoping rules
  - Java Source Browser: Full software system sources with clickable cross-references embedded in JavaDoc
- Automatic insertion of structured comments for data declarations, functions, etc.
- Anti-formatters or code obfuscators (rename variables, drop comments, remove indentation and line breaks)
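The obfuscation steps named in the last item (rename variables, drop comments) can be sketched with the same parse-and-unparse approach. This is a minimal Python illustration using the standard `ast` module (Python 3.9 or later), not SD's obfuscator; `Renamer`, `defined_names`, and `obfuscate` are hypothetical names. It assumes a single self-contained file with no keyword arguments, attributes, or imports that refer to the renamed identifiers. Because Python's block structure depends on indentation, this sketch cannot remove indentation and line breaks the way an obfuscator for C or Java could.

```python
import ast


class Renamer(ast.NodeTransformer):
    """Replace identifiers defined in the source with meaningless names."""

    def __init__(self, names):
        # Sorted order makes the renaming repeatable from run to run.
        self.mapping = {name: f"v{i}" for i, name in enumerate(sorted(names))}

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # also rename arguments and the body
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def defined_names(tree):
    """Collect function names, parameter names, and assigned variable names."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names


def obfuscate(source: str) -> str:
    tree = ast.parse(source)
    tree = Renamer(defined_names(tree)).visit(tree)
    return ast.unparse(tree)  # comments are not in the tree, so they vanish


readable = (
    "# compute the area of a rectangle\n"
    "def area(width, height):\n"
    "    total = width * height  # multiply the sides\n"
    "    return total\n"
    "\n"
    "result = area(3, 4)\n"
)
hidden = obfuscate(readable)
print(hidden)
```

Names that the file does not define itself, such as built-in functions, are left alone, so the obfuscated program still runs and computes the same value under a different variable name.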