Several popular languages, such as Haskell, Python, and F#, use the indentation and layout of code as part of their syntax. Because context-free grammars cannot express the rules of indentation, parsers for these languages currently use ad hoc techniques to handle layout. These techniques tend to be low-level and operational in nature and forgo the advantages of more declarative specifications like context-free grammars. For example, they are often coded by hand instead of being generated by a parser generator.
This paper presents a simple extension to context-free grammars that can express these layout rules, and derives GLR and LR(k) algorithms for parsing these grammars. These grammars are easy to write and can be parsed efficiently. Examples for several languages are presented, as are benchmarks showing the practical efficiency of these algorithms.
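As an illustration of the ad hoc, operational style of layout handling described above, the following is a minimal sketch of a lexer pass that tracks a stack of indentation columns and emits `INDENT` and `DEDENT` tokens, in the manner of Python's tokenizer. It is not the grammar extension this paper proposes. The function name `layout_tokens` and the `LINE` token kind are hypothetical, and the sketch assumes space-only indentation with no tabs, comments, or line continuations.

```python
def layout_tokens(lines):
    """Yield (kind, value) tokens, deriving INDENT/DEDENT from leading spaces.

    Illustrative sketch only: assumes indentation uses spaces, not tabs.
    """
    stack = [0]  # indentation columns of the currently open blocks
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines do not affect layout
        col = len(line) - len(line.lstrip(" "))
        if col > stack[-1]:
            # Deeper than the enclosing block: open a new block.
            stack.append(col)
            yield ("INDENT", col)
        else:
            # Shallower: close every block indented further than this line.
            while col < stack[-1]:
                stack.pop()
                yield ("DEDENT", col)
            if col != stack[-1]:
                raise SyntaxError(f"inconsistent dedent to column {col}")
        yield ("LINE", text)
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", 0)


if __name__ == "__main__":
    source = ["if x:", "    y", "    if z:", "        w", "v"]
    for token in layout_tokens(source):
        print(token)
```

The indentation rule here lives in mutable lexer state (the `stack`) rather than in the grammar itself, which is the kind of low-level, hand-coded handling that a declarative specification aims to replace.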
Keywords: Parsing, Indentation, Offside rule
Michael D. Adams. Principled parsing for indentation-sensitive languages: Revisiting Landin’s offside rule. In Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL ’13, pages 511–522. ACM, New York, NY, USA, 2013. ISBN 978-1-4503-1832-7. doi: 10.1145/2429069.2429129.
@inproceedings{adams2012layout,
  author = {Adams, Michael D.},
  title = {Principled Parsing for Indentation-Sensitive Languages: Revisiting {L}andin's Offside Rule},
  booktitle = {Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages},
  pages = {511--522},
  year = {2013},
  series = {POPL~'13},
  address = {New York, NY, USA},
  publisher = {ACM},
  isbn = {978-1-4503-1832-7},
  doi = {10.1145/2429069.2429129},
}