Talk:LL parser

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.Computer scienceWikipedia:WikiProject Computer scienceTemplate:WikiProject Computer scienceComputer science articles

High

This article has been rated as High-importance on the project's importance scale.

Things you can help WikiProject Computer science with:

Here are some tasks awaiting attention:

Article requests :
- Requested articles/Applied arts and sciences/Computer science, computing, and Internet
Cleanup :
- Computer science articles needing attention
- Computer science articles needing expert attention
Copyedit :
- Computing
Expand :
- Computer science
Infobox :
- Computer science articles without infoboxes
Maintain :
- Timeline of computing 2020–present
Photo :
- Find pictures for the biographies of computer scientists (see List of computer scientists)
- Computing articles needing images
Stubs :
- Computer science stubs
Unreferenced :
- WikiProject Computer science/Unreferenced BLPs
Project-related :
- Tag all relevant articles in Category:Computer science and sub-categories with {{WikiProject Computer science}}

Is LL(*) == CFG?[edit]

Is LL(*) equivalent to any CFG?98.169.86.44 (talk) 03:58, 10 April 2012 (UTC)[reply]

Must an LL parser be table based?[edit]

(Topic heading added for clarity.) yoyo (talk) 17:12, 30 April 2009 (UTC)[reply]

A LL parser doesn't really need to be table based. Correct?

That's not an easy question. In the dragon book, for example, the term "LL parser" is nowhere used or defined. They use terms such as "top-down parser", "recursive-descent parser", "predictive parser", "nonrecursive predictive parser" et cetera, but never "LL parser". Since LR parsers are always table based (they are defined in the book as such) I assumed that LL parsers would be defined in a similar way. But if you have an important reference work that says otherwise then we should change that, of course. I will see if I can get some more information on news:comp.compilers . -- Jan Hidders 03:21 Aug 19, 2002 (PDT)

SID is an example of a parser which generates LL(1) and does not use a table. You can see an example of its generated code. Instead it calls functions for each state. Of course it could just as well have used a table to express a DAG and transition through states that way. I'm not sure if this is what you meant by "not table based". I hope that helps!

(Incidentally, to the person mentioning Des Watson's book further down this page - I had the pleasure of taking two of his courses when I was at university. They've been the most useful (and interesting) from my whole degree.) 85.158.45.41 (talk) 23:26, 9 January 2008 (UTC)[reply]

The output line of applied rules should be [2,1,3,3], right?

First rule number 2 is applied for the parentheses, then as explained in the article. Am I correct here?

Correct. I'll change it. -- Jan Hidders 13:38 Oct 22, 2002 (UTC)

AFAIK, neither LR or LL parsers need to be table-based, although in practice they pretty well always are. Any pushdown automata implemented with a table can be implemented without the table also, if the table is replaced by a giant nested if-then or switch statement. If some book defines LR parsers as being table-based, I would not put too much stock in that definition.

The "Dragon book" by Aho, Sethi and Ullman is hardly just "some book" and the one that is probably the most widely used to teach compiler theory and compiler design. But if you have another equally authoritative work that says otherwise, be my guest and change it. -- Jan Hidders 10:48 Oct 28, 2002 (UTC)

Thanks for giving the authors of the mysteriously-named "Dragon book"! Any other popular texts? yoyo (talk) 17:12, 30 April 2009 (UTC)[reply]

Mysteriously named? ANYONE who knows ANYTHING about parsing and compiler design knows the colloquial name for "Compilers: Principles, Techniques, and Tools" by Aho, et. al. I would never listen to a professed expert on the subject that did not know this. I certainly wouldn't lend credence to anyone who called it "some book."

Obviously the Dragon Book is seminal, but also dated - they're not exactly right here. LR parsers can and have been implemented without tables, but this requires nonlocal flow control and so it can't really be generated as C code. For this reason most LR parsers are in practice table-based, but this is by no means the best or most efficient approach. Dcoetzee 21:15, 17 August 2009 (UTC)[reply]

Relevant for being a [XY]-parser is that the program computes certain derivations and cannot compute other ones for certain grammars etc., terms like LL(k) or LR(k) do not require some kind of a special code structure, they simply describe what can be computed and all those different implementations can be transformed into each other. If you refer to implementation-techniques, use terms like “table based LR(1) parser”, “recursive (a|de)scent parser” etc. --Chricho (talk) 15:05, 9 November 2010 (UTC)[reply]

Correction for "rightmost derivation"[edit]

(Topic heading added for clarity.) yoyo (talk) 17:12, 30 April 2009 (UTC)[reply]

In the LL Parser page it says: "...which is indeed a rightmost derivation if the input string". Should'nt if->of and rightmost->leftmost?

The necessary correction was made before today. yoyo (talk) 17:12, 30 April 2009 (UTC)[reply]

LL1+ parser[edit]

(Topic heading added for clarity.) yoyo (talk) 17:12, 30 April 2009 (UTC)[reply]

Is a "LL1+ parser" (like http://www.antlr.org/) a synonym for "multi-token lookahead parser"? Or what does "LL1+" mean? Christoph

Interesting question, but doesn't relate to actual (present) content of this article. yoyo (talk) 17:12, 30 April 2009 (UTC)[reply]

Algorithm of constructing parser table requires clarification, imo[edit]

Fi(w) is introduced, but Fi(A) - is not. What is the difference between them? The passage "add Fi(wi) to Fi(wi)" is not clear for me. I'd like to see more detailed and corrected description of the algorithm.

there should be an example -- as usually in those math related topics the description suffers from the lack of comprehensible language (what is it with mathematicians and computer scientist that makes them unable to speak other than in some gibberish? are the afraid common people will see, there's no magic?). i cam here because my book uses incomprehensible lingo, only to see wikipedia does, too. —Preceding unsigned comment added by 217.84.33.129 (talk) 11:34, 16 June 2008 (UTC)[reply]

The section Constructing an LL(k) parsing table is horrendously-difficult for a non-specialist to follow. As a pure mathematician by training, let me tell you that even mathematicians can't read this kind of notation well, if the symbols used are unmotivated and undefined - as they are here. Therefore, I will try to find a way to explain all this more clearly; but it will take some time and effort.yoyo (talk) 16:51, 30 April 2009 (UTC)[reply]

"Widely Believed"[edit]

I think this needs a reference (to the "widely believed" part -- not to the PCCTS paper). My understanding is that it was known long before 1990 that LL(k) parsers have exponential worst-case complexity, and this is still true. We have also known for a very, very long time that many O(2^n) algorithms "sometimes" run in less-than-worst-case-time.

That said, I don't disupte that PCCTS and ANTLR are cool... but the article makse it sound like the world suddenly woke up and abandoned LR parsing in 1990, which is nowhere close to the truth, especially with the new, efficient implementations of GLR parsers that have come out in the last few years.

OrangUtanUK: The following is my understanding, and may be complete rubbish. :o)

My key reference books - the Dragon Book for historical stuff, and Niklaus Wirth's "Compiler Construction" (aka "Compilerbau") - state that for LR(1) grammars you need a "shift-reduce" compiler, and that for LL(1) grammars you need a "recursive descent" compiler. The former recognises patterns as they emerge (ooh, that's an identifier, ooh, an equals, ooh, a numeric constant, aha, that's an assignment statement), whereas the latter works by prediction (if it starts with an identifier, it must be an assignment; now we expect an equals - yes, got it; now we expect an expression - call subroutine 'expression()'). Hence bottom-up or top-down parsing.

I was confused about the relative advantages of each until I read "high level languages and their compilers" by Des Watson. An RD compiler with an LL(1) grammar is O(n), very efficient, but quite demanding on the grammar designer. An SR compiler with an LR(1) grammar is O(n^3), less efficient, but capable of dealing with some very wild grammars, like those of COBOL or Java. Some languages can't easily be converted into an LL(1) grammar, so they end up as either LR(1) (hopefully) or as LL(k>1) (O(>n)). The thing about using tables is a generalisation, I think. There are some compilers that break the usual rule.

I think your understanding is incorrect. LR(1) grammars are also parseable in O(n) time; see: Knuth, D. (1965) "On the translation of languages from left to right." Information and Control, 8, 607--639. LR languages are strictly more powerful: every LL(1) grammar is LR(1). The converse is not true: grammars containing left recursion may be LR(1) but not LL(1).

In short, LR(1) parsers are strictly more powerful, but more difficult to write. That's why parser-generators for LR(1) and related classes, are popular.

I think you might be confusing LR(1) with "generalized LR" or "GLR". This algorithm can parse any context free grammar in worst-case O(n^3), but is vastly more powerful than either LR(1) or LL(1) -- in fact, you can reduce any instance of binary matrix multiplication to an instance of context-free (and hence GLR) parsing, so we'll unlikely to ever find an algorithm that runs faster than the fastest matrix-multiplication algorithm.

Megacz 18:15, 22 May 2006 (UTC)[reply]

Maybe adding a complexity section would be beneficial, or a whole article about comparative complexity of parsing methods (but creating a whole article would only make sense if someone has a thorough understanding of ANTLR's LL(k>1) optimizations.) By the way, even more generally than just LR(1), Knuth, D. (1965) "On the translation of languages from left to right." Information and Control, 8, 607--639 states, "LR(k) grammars can be efficiently parsed with an execution time essentially proportional to the length of string". He indeed proves in this paper that "A language can be generated by an LR(k) grammar [...] if and only if it can be generated by an LR(1) grammar". Espadrine (talk) 11:55, 29 May 2011 (UTC)[reply]

Need replacement diagram[edit]

It is requested that a computing diagram or diagrams be included in this article to improve its quality. Specific illustrations, plots or diagrams can be requested at the Graphic Lab.
For more information, refer to discussion on this page and/or the listing at Wikipedia:Requested images.

"Figure 1" has been removed because it was unsourced, but the text still refers to it. It needs replacing. Hairy Dude 11:20, 11 February 2007 (UTC)[reply]

There is presently no reference to "Figure 1", so there is no dangling reference in the article. However, it is still possible that a suitable diagram might improve the article. Yet I couldn't say what kind of diagram would be most suitable here. Perhaps the explanation of the parser procedure would be clearer with a well-crafted animation of the example. Thoughts, anyone? yoyo (talk) 17:20, 30 April 2009 (UTC)[reply]

Can't understand step with the first '1'[edit]

I do not understand the step which says: "Now the parser sees a '1' on its input stream so it knows that it has to apply rule (1) and then rule (3)". How does it know? —Preceding unsigned comment added by Damien Cassou (talk • contribs) 09:24, 2 October 2007 (UTC)[reply]

Maybe because of the rule S → F 193.50.213.15 07:30, 15 October 2007 (UTC)[reply]

I agree. The document makes quite a leap there, and I'm not sure how the algorithm makes this determination. Something is unexplained. Sure, a human can immediately discover which rule applies, but a compiler doesn't have the same intelligence. What mechanics enable a parser to come up with rule 1 at this point? —Preceding unsigned comment added by 153.98.68.197 (talk) 11:55, 7 October 2008 (UTC)[reply]

I've clarified (I hope!) the Parsing procedure section, by

being pedantic about the actual steps taken (one more than the earlier description implies);
replacing vague references to 'it' by 'the parser';
changing passive sentences into active ones, which requires naming the agent of change (usually the parser) - this adds definiteness and clarity;
spelling out how the parser gets instructions from the parsing table;
removing anthropomorphisms, such as the parser "sees" or "knows".

Please let me know directly if the explanation remains unclear - thanks! yoyo (talk) 16:34, 30 April 2009 (UTC)[reply]

LL(*) is undefined[edit]

There are several mentions of LL(*) in the article, but nowhere a definition or explanation. No wonder some people think computing and maths is black magic! ;-)

Could someone please supply a comprehensible definition of LL(*)? (Not me - I'm in the dark too.) Thanks! yoyo (talk) 16:56, 30 April 2009 (UTC)[reply]

Practically the only comprehensive (and pretty comprehensible) description of this is chapter 11 of The Definitive ANTLR Reference. Essentially it describes grammars for which any given production can be parsed by using a DFA to look ahead to decide which alternative to follow. There is no finite limit on the number of tokens the DFA can look ahead, but the lookahead languages for each alternative of a production have to be disjoint and regular (if I understand correctly). --David-Sarah Hopwood (talk) 05:11, 1 June 2009 (UTC)[reply]

Definition added. Someone please check that I've got it right. --David-Sarah Hopwood ⚥ (talk) 00:20, 11 August 2009 (UTC)[reply]

I think the definition currently in this article may be incorrect. The definition as it currently reads says "An LL parser is called an LL(*) parser if it is not restricted to a finite k tokens of lookahead, but can make parsing decisions by recognizing whether the following tokens belong to a regular language (for example by use of a Deterministic Finite Automaton)." It seems to me that that definition is really describing, not LL(*) grammars, but rather LL-regular grammars ( http://www.tandfonline.com/doi/abs/10.1080/00207168008803216?journalCode=gcom20#preview ). According to http://doc.utwente.nl/66955/ , every LL-regular grammar can be transformed into an equivalent LL(1) grammar. LL(1) grammars can be parsed in linear time by a predictive parser. By contrast, LL(*) are defined in http://www.antlr.org/papers/LL-star-PLDI11.pdf as a "parsing strategy" and are said to admit not just LL-grammars but predicated LL-grammars (section 4), and to have worst-case time complexity of O(n^2) (section 4). Bayle Shanks (talk) 03:39, 25 September 2013 (UTC)[reply]

Does each context-free language have a corresponding LL(1) grammar?[edit]

Can an LL(1) grammar be constructed for each and every context-free language, or are "LL(1) languages" a subclass on its own? --95.35.202.174 (talk) 08:15, 4 July 2009 (UTC)[reply]

No, LL(1) languages are a strict subset of context-free languages (as are LL(k) languages for each k). See Rosenkrantz and Stearns, Properties of deterministic top-down grammars, Information and Control 17:226–256, 1970 [1] (the answer to your question is stated in the paper's abstract). --David-Sarah Hopwood ⚥ (talk) 02:22, 11 August 2009 (UTC)[reply]

Thanks, David-Sarah. I've added the answer to this question to the article's introduction. Danielx (talk) 23:56, 11 May 2010 (UTC)[reply]

LL(1) Conflict resolution needs more detail[edit]

using table LL parsers to build a AST[edit]

Although this article is rather good, it only mentions how to use a table-based LL parser to do nothing more than verify if a set of tokens belongs to a certain grammar. It would be great if the article went a bit further than that and mentioned how to use a table-based LL parser to act on the grammar, either to build a abstract syntax tree or even how to extract information from a given language. -- Mecanismo | Talk 17:25, 27 March 2010 (UTC)[reply]

Probable mistake in LL_parser#Constructing_an_LL.281.29_parsing_table[edit]

In the algorithm of construction Fi() set, there is "add Fi(wi) to Fi(Ai)" both in the beginning of the 2nd step and in the 3rd step. If I get it right, they are about the same thing (which needs to be done only once), so the one of them should be removed. Roman Cheplyaka (talk) 11:16, 31 March 2010 (UTC)[reply]

LL(1) conflict[edit]

Shouldn't the definition for LL(1) FIRST/FIRST conflict be: If the first set for two different productions of the same non-terminal overlap? —Preceding unsigned comment added by 128.32.35.64 (talk) 01:52, 8 March 2011 (UTC)[reply]

I agree. The rule "* FIRST/FIRST conflict: The FIRST sets of two different non-terminals intersect;" is completely wrong. For the example grammar: S → F, S → ( S + F ), F → a; we would get a conflict because FIRST(S) = {'(', 'a'} and FIRST(F) = {'a'} so they would overlap resulting in a LL1 conflict. Metalbeppi (talk) 20:21, 10 May 2011 (UTC)[reply]

Please clarify the power of LL compared to LR[edit]

A section should be added that gives more info on the parsing power of LL parsers. How does LL(k) compare to LR(0)? LR(1)? What is the power of LL(*)? 83.226.178.77 (talk) 21:47, 7 January 2012 (UTC)[reply]

Is the second def. of Follow correct?[edit]

I wonder if the def. of Follow(A) as given in the section on Conflicts correct. What if there is no production such that a non-terminal follows A? If there is a production C -> zAb, where b is a terminal, then shouldn't b be included in Follow(A)? Anyway, I haven't thought about this so much. Maybe it's already correct. — Preceding unsigned comment added by Vkantabu (talk • contribs) 20:19, 18 October 2012 (UTC)[reply]

no epsilon rules?[edit]

In the second paragraph:

"LL parsers can only parse languages that have LL(k) grammars without ε-rules. "

I don't think is correct. Later in this article, in the subsection "Left factoring", there is an example of transforming grammars to better fit LL parsers - the outcome is this gammar:

S -> 'b' E | E
E -> 'a' | ε

which has an epsilon rule.

Was it meant "LL(0) parsers can only parse languages that have LL(k) grammars without ε-rules." maybe? Not sure...

Effbiae (talk) 03:17, 5 May 2015 (UTC)effbiae[reply]

is there an example of n^4 grammar for LL(1)?[edit]

According the scientific community (http://richard.myweb.cs.uwindsor.ca/PUBLICATIONS/PADL_08.pdf), the worst case scenario for any LL parser is O(n^4). However I cannot find any example, nor can I think of any, grammar that reproduces n^4 results for LL(1) parsing.

The example grammars given in the papers (S -> ‘s’ S S | S -> ‘s’ S | .. ) do not seem to hold. This is because all the rules start with 's', and therefore any non-s is immediately rejected.

Furthermore, if an 's' *is* found, parsing does not evaluate the less complex alternatives. It will always match the first rule, 's' S S. It will recursively loop through just that rule , and successfully find all the s's until the very last few characters. Processing never evaluates the S of 's' S until the last two characters.

I've walked through this on paper. The time seems linear. Is LL(1) grammar somehow a special case? Or have I missed something? — Preceding unsigned comment added by Jktejik (talk • contribs) 04:36, 15 February 2018 (UTC)[reply]

LL(*)[edit]

Am I right in thinking that $\mathrm {LL} (*)=\bigcup _{k\in \mathbb {N} }\mathrm {LL} (k)$ ? Hairy Dude (talk) 18:05, 11 May 2019 (UTC)[reply]

I guess,

\supseteq

is easy to obtain, but

\subseteq

does not hold, since there may be a language L in LL(*) where the length of the "following tokens" (whatever that is supposed to mean in the lead) needed for parsing decisions may grow arbitrary long. This L couldn't be member of any class LL(k), and hence not of their union. - Jochen Burghardt (talk) 19:54, 12 May 2019 (UTC)[reply]

Set difference and union have equal operator precedence, needs parentheses[edit]

In the section about constructing an LL(1) parser table, the set notation for step 2 is ambiguous. Set difference and union have equal precedence. Unlike arithmetic, set theory does not define a "left-" or "right associative" rule, so parentheses are always necessary.^[1]

Done Different authors of set theory books may have different precedence conventions; however, if in doubt, parantheses should always be used. - Jochen Burghardt (talk) 07:39, 28 August 2020 (UTC)[reply]

References

^ Ashlock, Daniel; Lee, Colin (24 June 2020). An Introduction to Proofs with Set Theory. p. 59. Retrieved 27 August 2020.

C# code sample[edit]

(Copied from User talk:Jochen Burghardt#LL parser C# code sample)

Jochen, I believe I 'reasonably addresses some aspect of those concerns' that the code sample is too complex. As opposed to the C++ code, the C# is about entirely readable as proper English. Instead of programming constructs like switch statements, the lookup table is a verbatim copy of that table from the text and the rules and symbols are strings and characters and readily recognizable.

In the C++ code otoh there are needles everywhere such as the stupid enum prefixes, shouting uppercase identifiers, abbreviations like ss, pointers with stars and whatnot, etc etc etc. For non-programmers especially, that C++ is a huge hurdle because you have to parse every little few characters carefully.

The reason I wrote this C# is that the text surrounding the C++ sample and the C++ code itself are not as digestible as you apparently assume and I believe my C# has none of these problems. 2A02:A45E:1569:1:30EB:ABCC:7DF4:2369 (talk) 15:38, 15 January 2023 (UTC)[reply]

I think we should not have 3 parser implementations in the article. From your above post, it seems you'd agree with that, but suggest to remove the C++ implementation. In that case, I leave the decision to editors who know both C# and C++. I only know some C++, but no C#, so I'd be biased. The C# implementation can be found at https://en.wikipedia.org/w/index.php?title=LL_parser&oldid=1133716445 . - Jochen Burghardt (talk) 16:52, 15 January 2023 (UTC)[reply]

[1] Ashlock, Daniel; Lee, Colin (24 June 2020). An Introduction to Proofs with Set Theory. p. 59. Retrieved 27 August 2020.

[1]