July 12, 2013

Parsers in modern compilers

Modern compilers and interpreters take different approaches to implementing their parsers.


Brief overview of the current state

  • GCC 4.8.1 - hand-written top-down recursive descent parser
  • Clang 3.3 - hand-written top-down recursive descent parser
  • PathScale EkoPath 4 - hand-written top-down recursive descent parser (based on GCC frontend)
  • Open64 5.0 - hand-written top-down recursive descent parser (based on GCC frontend)
  • Cray Chapel 1.7.0 - bottom-up LALR(1) parser (generated by Bison), left-recursive grammar
  • IBM X10 2.3.1 - bottom-up LALR(1) parser (generated by LPG)
  • Ruby 2.0.0 - bottom-up LALR(1) parser (generated by Bison), left-recursive grammar
  • PHP 5.5.0 - bottom-up LALR(1) parser (generated by Bison)
  • OpenJDK 7 - default: hand-written LALR parser; subproject: ANTLR-based LL(*) parser
  • Python 2.7.5 & 3.3.2 - hand-written top-down recursive descent parser
  • Groovy 2.1.6 - LL(*) parser (generated by ANTLR)
  • Go 1.1.1 - bottom-up LALR(1) parser (generated by Bison)
  • Google V8 JavaScript Engine - hand-written top-down recursive descent parser
  • Apple JavaScriptCore - hand-written top-down recursive descent parser
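
To make the most common entry in the list above concrete, here is a minimal sketch of a hand-written top-down recursive descent parser. It is not taken from any of the compilers listed; the toy arithmetic grammar and the names `tokenize`, `Parser` and `evaluate` are assumptions made up for illustration. Each grammar rule becomes one method, and the repetition in `expr` and `term` is written as a loop, because a directly left-recursive rule (which LALR(1) generators such as Bison accept) would make a recursive descent method call itself forever.

```python
import re


def tokenize(text):
    # Simplification for this sketch: characters outside the toy grammar
    # are silently skipped instead of being reported as errors.
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    """Recursive descent parser for a hypothetical toy grammar:

        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | '(' expr ')'

    Each method parses one rule and returns the evaluated integer value.
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the current token without consuming it.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        # Consume and return the current token (None at end of input).
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        # The loop replaces the left-recursive rule expr -> expr '+' term
        # and keeps '+' and '-' left-associative.
        while self.peek() in ("+", "-"):
            op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            right = self.factor()
            # Integer division keeps the sketch limited to integers.
            value = value * right if op == "*" else value // right
        return value

    def factor(self):
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token: {token!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token: {parser.peek()!r}")
    return value
```

Calling `evaluate("(1 + 2) * 3")` walks `expr`, `term` and `factor` in the same order in which the grammar rules nest. Production front ends build a syntax tree instead of evaluating on the fly, but the one-method-per-rule structure is the same idea.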






July 5, 2013

Lexers in modern compilers


Modern compilers implement the lexer (also called a scanner or tokenizer) in one of two ways: it is either written by hand or produced by an automatic lexer generator.



Compiler              Lexer/scanner/tokenizer implementation   Source
Cray Chapel 1.7.0     Flex                                     chapel/compiler/parser/chapel.lex
IBM X10 2.3.1         LPG (LALR parser generator)              x10/x10.compiler/src/x10/parser/X10Lexer.java
GCC 4.8.1             handwritten                              gcc/gcc/c-family/c-lex.c
Clang 3.3             handwritten                              cfe/lib/Lex/Lexer.cpp
PathScale EkoPath 4   handwritten                              - (based on GCC frontend)
Open64 5.0            handwritten                              - (based on GCC frontend)
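
As an illustration of the hand-written approach, here is a minimal sketch of a lexer that scans the input one character at a time. It is not code from GCC or Clang; the `Token` type, the `lex` function, the keyword set and the operator list are all assumptions chosen for a small C-like toy language.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical keyword set for the toy language.
KEYWORDS = {"if", "else", "while", "return"}


def lex(source: str) -> Iterator[Token]:
    """Yield tokens from source, scanning one character at a time."""
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            # Whitespace only separates tokens; it produces none.
            i += 1
        elif ch.isalpha() or ch == "_":
            # Identifier or keyword: consume letters, digits, underscores.
            start = i
            while i < n and (source[i].isalnum() or source[i] == "_"):
                i += 1
            text = source[start:i]
            yield Token("KEYWORD" if text in KEYWORDS else "IDENT", text)
        elif ch.isdigit():
            # Integer literal: consume a run of digits.
            start = i
            while i < n and source[i].isdigit():
                i += 1
            yield Token("NUMBER", source[start:i])
        elif source.startswith(("==", "!=", "<=", ">="), i):
            # Two-character operators are tried before single characters,
            # so ">=" is not split into ">" and "=".
            yield Token("OP", source[i:i + 2])
            i += 2
        elif ch in "+-*/=<>(){};":
            yield Token("OP", ch)
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
```

For example, `list(lex("x1 >= 10"))` produces an `IDENT`, an `OP` and a `NUMBER` token. A generator such as Flex starts from regular-expression rules instead and emits the scanning loop automatically, which is the trade-off the table above reflects: less code to maintain versus full manual control over the scanner.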