Note: This paper has an erratum under “Annotation” below.
Several popular languages including Haskell and Python use the indentation and layout of code as an essential part of their syntax. In the past, implementations of these languages used ad hoc techniques to implement layout. Recent work has shown that a simple extension to context-free grammars can replace these ad hoc techniques and provide both formal foundations and efficient parsing algorithms for indentation sensitivity.
However, that previous work is limited to bottom-up, LR(k) parsing, and many combinator-based parsing frameworks including Parsec use top-down algorithms that are outside its scope. This paper remedies this by showing how to add indentation sensitivity to parsing frameworks like Parsec. It explores both the formal semantics of and efficient algorithms for indentation sensitivity. It derives a Parsec-based library for indentation-sensitive parsing and presents benchmarks on a real-world language that show its efficiency and practicality.
indentation sensitivity; layout; offside rule; parsec; parsing
Erratum: In the “Non-Terminal” clause of Figure 6, there is a typo. The “m” in the result should be replaced with “f”.
Indentation-sensitive parsing for Parsec. In Proceedings of the 2014 ACM SIGPLAN Symposium on Haskell, Haskell ’14, pages 121–132. ACM, New York, NY, USA, September 2014. ISBN 978-1-4503-3041-1. doi: 10.1145/2633357.2633369.
and .@inproceedings{adams2014layout2, author = {Adams, Michael D. and A{\u{g}}acan, {\"O}mer S.}, title = {Indentation-sensitive parsing for {P}arsec}, booktitle = {Proceedings of the 2014 ACM SIGPLAN Symposium on Haskell}, pages = {121--132}, year = {2014}, series = {Haskell '14}, address = {New York, NY, USA}, month = sep, publisher = {ACM}, isbn = {978-1-4503-3041-1}, doi = {10.1145/2633357.2633369}, annote = {**Erratum:** In the "Non-Terminal" clause of Figure 6, there is a typo. The "m" in the result should be replaced with "f".}, }