Parsing with Zippers (Functional Pearl)

Pierce Darragh and Michael D. Adams

Status: Published at ICFP 2020

Abstract

Parsing with Derivatives (PwD) is an elegant approach to parsing context-free grammars (CFGs). It takes the equational theory behind Brzozowski’s derivative for regular expressions and augments that theory with laziness, memoization, and fixed points. The result is a simple parser for arbitrary CFGs. Although recent work improved the performance of PwD, it remains inefficient due to the algorithm repeatedly traversing some parts of the grammar.
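For readers unfamiliar with the starting point, the following is a minimal sketch of Brzozowski's derivative for plain regular expressions, not the paper's code and not PwD itself. The type `re` and the functions `nullable`, `derive`, and `matches` are illustrative names assumed here. PwD extends this idea to CFGs with laziness, memoization, and fixed points, none of which appear in this sketch.

```ocaml
(* Regular expressions over characters (illustrative type, not from the paper). *)
type re =
  | Empty               (* matches no string *)
  | Eps                 (* matches only the empty string *)
  | Chr of char         (* matches exactly one given character *)
  | Alt of re * re      (* choice *)
  | Seq of re * re      (* concatenation *)
  | Star of re          (* zero or more repetitions *)

(* [nullable r] is true when [r] accepts the empty string. *)
let rec nullable = function
  | Empty | Chr _ -> false
  | Eps | Star _ -> true
  | Alt (a, b) -> nullable a || nullable b
  | Seq (a, b) -> nullable a && nullable b

(* [derive c r] accepts exactly the strings [w] such that [r] accepts
   the character [c] followed by [w]. *)
let rec derive c = function
  | Empty | Eps -> Empty
  | Chr d -> if c = d then Eps else Empty
  | Alt (a, b) -> Alt (derive c a, derive c b)
  | Seq (a, b) ->
    let left = Seq (derive c a, b) in
    (* If [a] can be empty, [c] may also be consumed by [b]. *)
    if nullable a then Alt (left, derive c b) else left
  | Star a as r -> Seq (derive c a, r)

(* Match by taking one derivative per input character, then checking
   whether the residual expression accepts the empty string. *)
let matches r s =
  let current = ref r in
  String.iter (fun c -> current := derive c !current) s;
  nullable !current

(* Example expression: (ab)* *)
let ab_star = Star (Seq (Chr 'a', Chr 'b'))
```

Each call to `derive` walks the expression from the root, which is the kind of repeated traversal that the zipper-based approach described next is meant to avoid.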

In this functional pearl, we show how to avoid this inefficiency by suspending the state of the traversal in a zipper. When subsequent derivatives are taken, we can resume the traversal from where we left off without retraversing already traversed parts of the grammar.

However, the original zipper is designed for use with trees, and we want to parse CFGs. CFGs can include shared regions, cycles, and choices between alternates, which makes them incompatible with the traditional tree model for zippers. This paper develops a generalization of zippers to properly handle these additional features. Just as PwD generalized Brzozowski’s derivatives from regular expressions to CFGs, we generalize Huet’s zippers from trees to CFGs.
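As a point of reference, here is a minimal sketch of a traditional Huet-style zipper over binary trees, the structure that the paper generalizes. It is not the paper's generalized zipper: it handles neither sharing, cycles, nor alternates. All names (`tree`, `crumb`, `zipper`, `down_left`, `down_right`, `up`, `replace`, `to_tree`) are assumptions chosen for illustration.

```ocaml
(* A plain binary tree with integer leaves (illustrative type). *)
type tree = Leaf of int | Node of tree * tree

(* One step on the path from the focus back toward the root: which way
   we descended, together with the sibling that was left behind. *)
type crumb =
  | WentLeft of tree    (* we went left; payload is the right sibling *)
  | WentRight of tree   (* we went right; payload is the left sibling *)

(* A zipper is the subtree in focus plus the path back to the root. *)
type zipper = { focus : tree; path : crumb list }

let of_tree t = { focus = t; path = [] }

(* Move the focus to the left child, if there is one. *)
let down_left z =
  match z.focus with
  | Node (l, r) -> Some { focus = l; path = WentLeft r :: z.path }
  | Leaf _ -> None

(* Move the focus to the right child, if there is one. *)
let down_right z =
  match z.focus with
  | Node (l, r) -> Some { focus = r; path = WentRight l :: z.path }
  | Leaf _ -> None

(* Move the focus to the parent, rebuilding it from the saved sibling. *)
let up z =
  match z.path with
  | WentLeft r :: rest -> Some { focus = Node (z.focus, r); path = rest }
  | WentRight l :: rest -> Some { focus = Node (l, z.focus); path = rest }
  | [] -> None

(* Replace the focused subtree without touching the rest of the tree. *)
let replace t z = { z with focus = t }

(* Walk back up to the root and return the whole tree. *)
let rec to_tree z =
  match up z with
  | Some parent -> to_tree parent
  | None -> z.focus

(* Usage: descend left then right, edit in place, and rebuild. *)
let example =
  let t = Node (Node (Leaf 1, Leaf 2), Leaf 3) in
  match Option.bind (down_left (of_tree t)) down_right with
  | Some z -> to_tree (replace (Leaf 9) z)
  | None -> t
```

The saved `path` is what lets a traversal be suspended and later resumed at the focus. Because each crumb assumes a single parent, this representation breaks down once nodes are shared or cyclic, which is the gap the paper's generalization addresses.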

The resulting parsing algorithm is concise and efficient: it takes only 31 lines of OCaml code to implement the derivative function but performs 6,500 times faster than the original PwD and 3.24 times faster than the optimized implementation of PwD.

Keywords

Parsing; Derivatives; Zippers; Parsing with Derivatives

Citation

Pierce Darragh and Michael D. Adams. Parsing with zippers (functional pearl). Proceedings of the ACM on Programming Languages, 4(ICFP):108:1–108:30, August 2020. ISSN 2475-1421. doi: 10.1145/3408990.

BibTeX Entry

@article{darragh2020parsing,
  author = {Darragh, Pierce and Adams, Michael D.},
  title = {Parsing with Zippers (Functional Pearl)},
  journal = {Proceedings of the ACM on Programming Languages},
  pages = {108:1--108:30},
  year = {2020},
  volume = {4},
  number = {ICFP},
  address = {New York, NY, USA},
  month = aug,
  publisher = {ACM},
  issn = {2475-1421},
  doi = {10.1145/3408990},
}