Principled Parsing for Indentation-Sensitive Languages: Revisiting Landin's Offside Rule

Michael D. Adams

Status: Accepted at POPL 2013

Abstract

Several popular languages, such as Haskell, Python, and F#, use the indentation and layout of code as part of their syntax. Because context-free grammars cannot express the rules of indentation, parsers for these languages currently use ad hoc techniques to handle layout. These techniques tend to be low-level and operational in nature and forgo the advantages of more declarative specifications like context-free grammars. For example, they are often coded by hand instead of being generated by a parser generator.

This paper presents a simple extension to context-free grammars that can express these layout rules, and derives GLR and LR(k) algorithms for parsing these grammars. These grammars are easy to write and can be parsed efficiently. Examples for several languages are presented, as are benchmarks showing the practical efficiency of these algorithms.
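As an illustration of the ad hoc, operational style of layout handling mentioned above, the following is a minimal sketch of the indentation-stack approach commonly used in hand-written lexers: it turns leading whitespace into explicit INDENT and DEDENT tokens before any grammar is applied. It is not the technique proposed in this paper, and the function name `layout_tokens` and the token names are hypothetical, chosen only for this example.

```python
def layout_tokens(lines):
    """Turn source lines into LINE/INDENT/DEDENT tokens using an indentation stack.

    Hypothetical helper for illustration only; assumes indentation uses spaces.
    """
    stack = [0]   # indentation columns of the currently open blocks
    tokens = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines do not affect layout
        col = len(line) - len(line.lstrip(" "))
        if col > stack[-1]:
            # deeper indentation opens a new block
            stack.append(col)
            tokens.append(("INDENT", col))
        while col < stack[-1]:
            # shallower indentation closes one block per stack entry
            stack.pop()
            tokens.append(("DEDENT", stack[-1]))
        if col != stack[-1]:
            # dedented to a column that never opened a block
            raise SyntaxError(f"inconsistent dedent to column {col}")
        tokens.append(("LINE", text))
    # close any blocks still open at end of input
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", stack[-1]))
    return tokens


if __name__ == "__main__":
    for token in layout_tokens(["if x:", "    y", "z"]):
        print(token)
```

The layout logic here lives in mutable state (the stack) outside any grammar, which is the kind of low-level, hand-coded handling that a declarative specification aims to replace.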

Keywords

Parsing, Indentation, Offside rule

Citation

Michael D. Adams. Principled parsing for indentation-sensitive languages: Revisiting Landin’s offside rule. In Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL ’13, pages 511–522. ACM, New York, NY, USA, 2013. ISBN 978-1-4503-1832-7. doi: 10.1145/2429069.2429129.

BibTeX Entry

@inproceedings{adams2012layout,
  author = {Adams, Michael D.},
  title = {Principled Parsing for Indentation-Sensitive Languages: Revisiting {L}andin's Offside Rule},
  booktitle = {Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages},
  pages = {511--522},
  year = {2013},
  series = {POPL~'13},
  address = {New York, NY, USA},
  publisher = {ACM},
  isbn = {978-1-4503-1832-7},
  doi = {10.1145/2429069.2429129},
}

Copyright Notice

© ACM, 2013. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages (2013). http://doi.acm.org/10.1145/2429069.2429129.