Parsing with Zippers

Pierce Darragh and Michael D. Adams

Status: To appear at ICFP 2020

Abstract

Parsing with Derivatives (PwD) is an elegant approach to parsing context-free grammars (CFGs). It takes the equational theory behind Brzozowski’s derivative for regular expressions and augments that theory with laziness, memoization, and fixed points. The result is a simple parser for arbitrary CFGs. Although recent work improved the performance of PwD, it remains inefficient due to the parser repeatedly re-traversing parts of the grammar.

In this functional pearl, we show how to avoid this inefficiency by suspending the state of the traversal in a zipper. When subsequent derivatives are taken, we can resume the traversal from where we left off without re-traversing the grammar from the start.

The original zipper is designed for use with trees, but CFGs can include shared regions, cycles, and choices between alternates, which makes them incompatible with this model. This paper develops a generalization of zippers to properly handle these added features.

The resulting parsing algorithm is concise and efficient: it takes only 36 lines of OCaml code to implement the derivative function but performs 15,600 times faster than the original PwD and 3.43 times faster than the optimized implementation of PwD.

Keywords

Ambiguous grammars; Tree automata

Citation

Pierce Darragh and Michael D. Adams. Parsing with zippers. Proceedings of the ACM on Programming Languages, 4(ICFP), August 2020.

BibTeX Entry

@article{darragh2020parsing,
  author = {Darragh, Pierce and Adams, Michael D.},
  title = {Parsing with Zippers},
  journal = {Proceedings of the ACM on Programming Languages},
  year = {2020},
  volume = {4},
  number = {ICFP},
  address = {New York, NY, USA},
  month = aug,
  publisher = {ACM},
}