HTML Parser (Front End)
The HTML parser (front end) enables the construction of custom analysis tools or source transformation tools for HTML. It is a member of SD's family of language front ends, based on first-class infrastructure (DMS) for implementing such custom tools. The HTML front end includes:
- Lexical analysis, including reading source text in ASCII, ISO-8859-1, and Unicode
- Conversion of literal values (numbers, escaped strings) into native values to enable easy computation over literal values
- String literals represented internally in Unicode to support 16-bit characters
- Explicit grammar that directly implements HTML4
- Full HTML parser
- Option for XHTML dialect
- Option for "dirty" HTML (accepted by most browsers)
- Automatic construction of complete abstract syntax tree
- Capture of comments and formats (shape) of literal values
- Ability to parse large systems of files into the same workspace, enabling interprocedural and cross-file analysis/transformation
- Ability to parse different languages into the same workspace, enabling analysis/transformation of web software in multiple languages
- Facilities to process syntax trees
- Complete procedural API to visit/query/update/construct/print syntax trees
- Source regeneration by prettyprinting and/or fidelity printing of syntax trees with comments and lexical formats
- Automatically generated source-to-source transformation system
- Ability to define custom attribute-grammar-based analyzers
- Available as source code to enable complete customization
- Means to manage multiple language dialects with highly shared common core
- Robustness due to careful testing and application across many customers
Many of these facilities are a direct consequence of the front end being built on top of DMS.
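To make a few of the listed facilities concrete (a syntax tree that retains comments, tolerance of "dirty" HTML, conversion of escaped text into native values, and a procedural visit/query interface), here is a minimal sketch. It does not use DMS or the HTML front end; it is written against Python's standard `html.parser` module, and the names `Node`, `TreeBuilder`, `parse`, and `visit` are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


# Hypothetical node type for illustration only; not part of DMS.
@dataclass
class Node:
    kind: str                                   # "element", "text", or "comment"
    value: str                                  # tag name, text, or comment body
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Elements that never have a closing tag in HTML4 (partial list).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TreeBuilder(HTMLParser):
    """Builds a simple tree, keeping comments as ordinary nodes."""

    def __init__(self):
        # convert_charrefs=True turns escapes such as &amp; into native characters.
        super().__init__(convert_charrefs=True)
        self.root = Node("element", "#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node("element", tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Tolerate "dirty" input: close back to the nearest matching open
        # element and silently ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].value == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("text", data))

    def handle_comment(self, data):
        self.stack[-1].children.append(Node("comment", data))


def parse(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def visit(node):
    """Yield every node in document (pre-order) order."""
    yield node
    for child in node.children:
        yield from visit(child)


if __name__ == "__main__":
    tree = parse('<p>Tom &amp; Jerry<!-- note --><br><a href="x.html">link</a></p>')
    for n in visit(tree):
        print(n.kind, repr(n.value), n.attrs)
```

A query such as "all link targets" then becomes a short traversal, for example `[n.attrs["href"] for n in visit(tree) if n.value == "a"]`. A production front end would additionally record source positions and literal formats so that the tree can be printed back faithfully.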
Your organization may use DMS with the HTML front end to implement and deploy your own custom tools. Sample tools are included in source form with the HTML front end and can be customized. Semantic Designs is also willing to build custom tools under contract.