Perl 6 - the future is here, just unevenly distributed

IRC log for #marpa, 2015-01-31

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
00:48 jeffreykegler joined #marpa
00:55 jeffreykegler Jean-Damien's M4 clone looks very interesting: https://github.com/jddurand/MarpaX-Languages-M4
00:55 jeffreykegler A CPAN module in the making, I take it?
02:52 flaviu joined #marpa
04:19 jeffreykegler joined #marpa
05:19 sadmac jeffreykegler: suppose you constructed a regular approximation of your grammar using Mohri-Nederhoff, parsed with that using a simple DFA, then constructed Earley states from the results and continued parsing with the Earley algorithm.
05:21 jeffreykegler Recall the idea of the Ruby Slippers, left-eideticism, etc.?
05:22 jeffreykegler That is, Marpa being able to do procedural parsing because you can stop and start anywhere with full knowledge of the parse?
05:22 jeffreykegler I don't think approaches like the one you describe can preserve this property.
05:23 sadmac fair enough
05:23 jeffreykegler Grune and Jacobs describe several of these highly mathematical optimizations, and they are enticing.
05:24 jeffreykegler It's hard to tell when they are not counter-productive -- that is the extra effort costs more time & trouble than it saves.
05:25 jeffreykegler But all of them tend to put you back into LALR-land -- algorithms which might be fast, but are stupid.
05:25 jeffreykegler Stupid in the sense they don't know what they are doing at any point, or why.
05:26 jeffreykegler This is *NOT* to say I see no hope in any of these approaches.
05:26 jeffreykegler And I do glance at these articles, because they do dangle enticing possibilties.
05:27 jeffreykegler But I think I've explained why I'm not pursuing these approaches myself.
05:28 sadmac fair enough
05:32 sadmac jeffreykegler: supposing you used the Harspool-style earley states, and back pointers, could you not view the states as nodes in a set of converging linked lists with the back pointer being the list pointer? And are these lists not lists of LR0 states? And does earley not modify them in a manner consistent with converging stacks? Thus making Earley roughly equivalent to GLR with a fast stack-popping mechanis
05:32 sadmac m?
05:33 jeffreykegler Again, you lose left-eideticism AFAICT
05:34 sadmac Well, other than bringing back A&H states, I haven't changed much, so I shouldn't necessarily be losing much
05:35 jeffreykegler And (forgive me) but was it you who was telling me that you've experimented the the approach of caching stacks, etc., that GLR-like processing involves.
05:35 jeffreykegler ?
05:35 sadmac jeffreykegler: I've contemplated this equivalence previously and may have hinted at it, but beyond that no.
05:36 jeffreykegler OK, because I have, and I can tell you it's an awful mess.
05:36 jeffreykegler Now to be clear ...
05:36 jeffreykegler what I just told you ...
05:37 jeffreykegler is what others might have told me about going back to Earley's, and had I been told that, and listened, there would be no Marpa.
05:38 jeffreykegler It's a matter of guessing where complexity pays off, and where you are hitting the Law of Diminishing Returns.
05:39 sadmac right
05:40 sadmac jeffreykegler: unrelated note, does Marpa do any early ambiguity detection?
05:40 jeffreykegler Anyway, you asked for my guesses, and now you have them!
05:40 sadmac I've been exploring that as well
05:40 jeffreykegler No.
05:40 jeffreykegler No early ambiguity detection.
05:41 jeffreykegler Remind me, which language were you targeting?
05:41 sadmac That's a requirement I've been given. At least during compile phase there should be a way to get the test suite to point out ambiguities to you
05:41 sadmac Ceylon
05:41 jeffreykegler "been given"?  by whom, if I can ask.
05:41 sadmac jeffreykegler: Gavin King
05:41 sadmac the language designer
05:42 jeffreykegler of Ceylon?
05:42 sadmac he's had a few opinions on what he wants this parser to do
05:42 sadmac yes
05:42 ronsavage joined #marpa
05:43 jeffreykegler Early ambiguity detection is something I have not investigated.
05:43 jeffreykegler Work on and with Marpa does not suggest any use for it, but that does not necessary mean there is no use,
05:44 Aria I think that might be useful when dealing with the union of two languages.
05:44 Aria Just a passing thought.
05:44 sadmac You could argue for just picking a parse tree if you could define a sensible way of choosing.
05:44 jeffreykegler Aria: why?
05:44 sadmac ceylon.parse has precedence and associativity extensions, so we can resolve ambiguities manually pretty easily
05:45 Aria Trying to decide where one language ends and another begins -- for recognition, not useful, but I think the semantics and processing might need to be aware of it.
05:45 jeffreykegler Why not just rewrite to use disjoint symbol sets?
05:46 jeffreykegler Or is the point that you want to know if something might be valid in both languages?
05:46 Aria Yeah, that point. And so the parser can have procedural bits sort out where the parse of one should end, prune off the ambiguity correctly.
05:47 jeffreykegler sadmac: what kind of parser in ceylon.parse?  Top-down.
05:47 jeffreykegler ?
05:47 sadmac jeffreykegler: ...I might be revealing some profound ignorance here, but aren't all earley parsers top-down?
05:48 jeffreykegler Some people call them that but IMHO that is more misleading than helpful.
05:48 jeffreykegler sadmac: are you familiar with the basics of Earley items, etc.?
05:49 sadmac jeffreykegler: I should hope. Ignorant or not I did write one of these things :)
05:49 jeffreykegler Btw, other people call Earley's bottom-up.  Grune & Jacobs discuss this in their section on Earley's.
05:49 jeffreykegler I forget which way they decided to class it.
05:50 sadmac it definitely feels fuzzier when you look at earley than when you look at some other types of parsers.
05:51 jeffreykegler So, look at how Earley's work and ask yourself, "Is that top-down?"
05:51 Aria Especially since parsing starts at the top, but tree construction...
05:51 jeffreykegler Top-down is a very useful term for understanding recursive descent ...
05:51 jeffreykegler and bottom-up is very useful for understanding LR and LALR
05:52 jeffreykegler But the terms are not helpful in understanding what Earley's is and what it can do.
05:55 jeffreykegler It may be useful, also, to understand that the term "ambiguity" gets used in different senses, which can confuse things.
05:55 sadmac Earley, if not adjusted, will give me all possible parse trees. How do you pick one?
05:55 jeffreykegler In many cases users of parsing algorithm X, call "ambiguous" something that X cannot resolve.
05:56 * jeffreykegler decides to finish his answer on ambiguity first
05:56 sadmac certainly
05:57 jeffreykegler Which if your parser is top-down is just about anything not a Dyck language or a regular expression.
05:58 jeffreykegler sadmac: I say this because you associated precedence/associativity with ambiguity -- these are (in practical use) not ambiguous in the full Theory of Parsing sense ...
05:58 jeffreykegler it is just that top-down is too limited to parse them.
05:59 sadmac jeffreykegler: oh certainly, you can encode them in the grammar.
05:59 jeffreykegler In the Marpa context, ambiguity is true ambiguity in the Theory of Parsing sense, and associativity/precedence are not ordinarily ambiguous.
06:00 jeffreykegler "encode them in the grammar"?
06:01 sadmac jeffreykegler: choosing Expr -> Mul + Mul / Mul -> Num * Num  rather than Expr -> Expr + Expr   / Expr -> Expr * Expr
06:01 sadmac the latter case requires precedence annotation to be unambiguously parsed
06:01 jeffreykegler Ah, yes.  Rewriting.
06:02 sadmac I don't use rewriting to solve it, but you can rewrite. Or more importantly, you can write it that way in the first place, which many parsers make you do
06:02 jeffreykegler It is unambiguous and should be unambiguously parsed.
06:02 sadmac There's a UI argument that the second way is clearer though.
06:02 sadmac even if we need some non-EBNF constructs to make it unambiguous
06:02 jeffreykegler Have you looked at Marpa's prioritized statements?
06:03 sadmac not directly. I can imagine how they'd work
06:03 sadmac if I'm guessing at the meaning correctly
06:03 jeffreykegler These allow writing as "Expr -> Expr + Expr   / Expr -> Expr * Expr", but it is not ambiguous for Marpa.
06:04 jeffreykegler Perhaps I'm a bit obscure so far?
06:05 * jeffreykegler was asking sadmac if he's being clear
06:07 sadmac I follow
06:08 jeffreykegler OK
06:08 sadmac jeffreykegler: how do prioritized rules encode associativity in that case?
06:08 jeffreykegler You can specify it.  I think the default is left, but you get to pick.
06:09 jeffreykegler Now, follow this, it may be helpful.
06:09 sadmac ok
06:09 jeffreykegler Looking at a prioritized statment as in one of my tutorials -- can you bring up one?
06:10 jeffreykegler From it, you can see what is meant ... that is rewriting E ::= E * E || E + E
06:10 jeffreykegler is a rote process, if you do it by hand.
06:11 jeffreykegler So far, so good?
06:11 sadmac trying to find one...
06:12 jeffreykegler https://metacpan.org/pod/distribution/Marpa-R2/pod/Scanless.pod
06:14 jeffreykegler You see in the synopsis the big statement recursively defining <Expression>?
06:15 sadmac yes
06:16 jeffreykegler OK, you could rewrite that by hand into something that followed the precedence and associativity if you had to, right?
06:17 jeffreykegler || means lower the priority of the next statment
06:17 jeffreykegler while a single bar '|' separates choices at the same precedence.
06:17 jeffreykegler and I have abverbs for changing associativity.
06:17 jeffreykegler * abverbs -> adverbs
06:18 sadmac ok
06:18 jeffreykegler So (asking rhetorically) if it is possible to do it by hand, by rote, why cannot a computer do it, also by rote?
06:19 jeffreykegler And the answer in the past was, yes you could, but there was no way of being sure the result was anything your top-down parser could handle.
06:20 jeffreykegler But Marpa can easily handle the rewritten form.
06:21 jeffreykegler And a lot of the question you were asking about the "ambiguity" of expressions with precedence/associativity just disappear.
06:21 sadmac jeffreykegler: the purpose of the ambiguity checker is to check whether the user entered | when they meant ||
06:21 jeffreykegler Huh?
06:22 jeffreykegler OK, I get it.
06:23 jeffreykegler Right, check that they did not specify enough precedence.
06:23 jeffreykegler But what about "E ::= E * E || E + E | E - E"?
06:24 jeffreykegler How do you know that the single bar should not be a || instead?
06:24 jeffreykegler I mean we know how it is traditionally done, but doesn't that come down to reading intent?
06:26 jeffreykegler sadmac: btw have you looked at http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2014/11/ll.html ?
06:26 jeffreykegler The technical part of it will already be familiar to you ...
06:26 jeffreykegler the description of the algorithms, but ...
06:27 sadmac jeffreykegler: determinism is a useful error check. and ambiguities mean a program that expects to always get one parse tree suddenly gets two on a weird day
06:27 jeffreykegler what I say about their capabilities, etc., might help explain some of this.
06:27 sadmac jeffreykegler: I'll look
06:28 jeffreykegler Btw, you do know that ambiguity detection for a grammar is undecidable in general?
06:28 sadmac jeffreykegler: naturally
06:28 jeffreykegler Ok, given that,
06:28 sadmac jeffreykegler: I was hoping it was decidable in the bounded case, but nobody seems to have found a way
06:29 sadmac at least not a fast way
06:29 jeffreykegler Everything is decidable in the bounded case. :-)
06:29 jeffreykegler Marpa actually may be helpful to experiment with ...
06:29 sadmac yeah, but generating all possible texts of size N is not a good time
06:29 jeffreykegler because you can efficiently experiment with a selection of inputs.
06:30 jeffreykegler Marpa *can* tell is a parse (grammar+input) is ambiguous, and can do so quickly.
06:30 jeffreykegler * is a parse -> if a parse\
06:30 sadmac jeffreykegler: a goal of ceylon in general is to make as many errors as possible be compile-time errors
06:30 sadmac jeffreykegler: so we don't want to wait for input if we can help it
06:31 jeffreykegler AFAIK there isn't a better tool than Marpa for this.
06:31 jeffreykegler So if you are attacking the bounded case ...
06:31 sadmac jeffreykegler: btw, you don't need to rewrite the grammar to handle precedence and priority
06:32 jeffreykegler I know top-down parsers do it in various ways, yes.
06:33 sadmac jeffreykegler: Earley itself makes it quite easy to do inline as well
06:33 jeffreykegler any, attacking the bounded case ...
06:33 sadmac jeffreykegler: just don't predict a lower priority rule from a higher priority one
06:34 jeffreykegler you could experiment with ways of generating the "right" set of inputs that would give enough coverage to detect most ambiguities.
06:34 jeffreykegler I've only skimmed the Ceylon docs, so perhaps this is uninformed ...
06:35 jeffreykegler but Ceylon appears to be very focused on language oriented programming (LOP)
06:35 jeffreykegler Is that right?
06:35 sadmac I'm not familiar with the term
06:36 jeffreykegler You could call it programming via small languages or DSL's.
06:36 jeffreykegler Larry in the early days of Perl 6 talked a lot about this.
06:37 jeffreykegler and Steve Yegge has been a booster.
06:37 sadmac not really, no. Though my efforts and others hope to make that a very easy thing to do in Ceylon one day. I think what you're noticing is just the type system, which is very very powerful and has some constructs that sort of look like parsing.
06:37 sadmac (in fact we map many of those constructs on to parsing in this parser)
06:37 jeffreykegler OK, I see.
06:38 jeffreykegler Forgive, then.  I tend to see parsing everywhere. :-)
06:38 jeffreykegler Occupational hazard. :-)
06:38 sadmac In ceylon, you can have variables of type String|Integer, or of type Callable&Hashable
06:39 jeffreykegler Going back to the question you first started with, then.
06:39 jeffreykegler The sort of "alphabet soup" enhancement of Earley's with a lot of other stuff. :-)
06:40 jeffreykegler If you're tackling a very specific grammar/problem -- that is looking at a special single case ...
06:40 jeffreykegler that kind of approach becomes more promising.
06:42 sadmac The precedence/associativity stuff was cheap
06:42 sadmac The other additives are all UI considerations
06:42 sadmac https://github.com/sadmac7000/ceylon-parse/blob/master/source/ceylon/parse/ceylon/ceylonGrammar.ceylon
06:42 sadmac This is the first few bits of Ceylon's own grammar implemented in ceylon.parse
06:42 sadmac not sure how much you'll grok if you don't know ceylon but you might pick up on the big idea
06:44 jeffreykegler Hmnmmm
06:45 jeffreykegler I think I get the idea -- it's recursive descent, re-packaged.
06:46 jeffreykegler I'm reminded a lot of the Perl 6 self-parser.
06:46 sadmac It's the UI of recursive descent, with earley underneath
06:47 jeffreykegler There is Earley underneath?
06:47 sadmac jeffreykegler: all the methods annotated "rule" are turned into EBNF rules and put into an earley parser
06:47 sadmac jeffreykegler: we use metaprogramming to turn the structure of the code into the grammar
06:47 jeffreykegler Which is currently of the traditional pre-Leo sort?
06:48 sadmac indeed
06:48 jeffreykegler If I may be so bold, I think you guys are going to like my work.
06:48 sadmac Though once you have EBNF on top, I don't know how much value Leo adds. Why would you /use/ right recursion if you have a kleene star?
06:49 sadmac jeffreykegler: so far I've been pleased
06:49 jeffreykegler associativity
06:49 sadmac jeffreykegler: we solve that elsewhere
06:49 sadmac jeffreykegler: rule(0, rassoc)
06:49 jeffreykegler Well now you don't have to.
06:49 sadmac I tend to prefer to
06:50 sadmac I mean, I can turn right recursion into kleene stars, which is Leo optimization anyway
06:50 sadmac and I will eventually
06:50 sadmac but right now, not much use
06:50 sadmac the only right-recursive rule that comes to mind is bounded to one layer of recursion.
06:50 jeffreykegler Does your handling of RR require it being bounded?
06:52 sadmac jeffreykegler: plan is to just turn S->aA into S->a+S , forbid predicting that rule from itself, and fix up the AST afterward.
06:53 sadmac S->aS rather
06:53 jeffreykegler a+S?
06:54 jeffreykegler Sorry, got it.
06:54 sadmac a repeated 1 or more times
06:54 sadmac yeah
06:54 jeffreykegler OK, so you just eliminate the right recursions and post-process.
06:54 sadmac yep
06:55 sadmac the post processing fits into the existing framework for generating AST nodes anyway
06:55 jeffreykegler Which does look *very* like recursion descent
06:56 sadmac it's cheap in both speed and complexity
06:56 sadmac ceylon's grammar itself doesn't use a lot of right recursion otherwise I'd do it sooner.
06:57 jeffreykegler That is, you think you might move to Leo?
06:57 jeffreykegler "do it"?
06:57 sadmac implement the rewrite I just described
06:58 sadmac Ceylon's grammar itself doesn't use a lot of right recursion, so we don't need it right away
06:59 jeffreykegler Is that framework only intended to support Ceylon itself?  Or were you thinking of it as a grammar-writing mechanism for Ceylon users?
07:00 sadmac a grammar-writing mechanism
07:00 sadmac but self-hosting ceylon is the first use case
07:00 jeffreykegler I'm reminded of Perl 6 again and again.
07:00 jeffreykegler Except of course Larry did not use an Earley's engine.
07:04 jeffreykegler Which makes a big difference.
07:04 jeffreykegler It's late here in CA.  Good night!
07:05 sadmac good night
08:59 lucs joined #marpa
09:39 koo5 joined #marpa
13:57 lwa joined #marpa
16:45 jeffreykegler joined #marpa
16:55 btyler joined #marpa
18:29 flaviu joined #marpa
19:30 koo5 joined #marpa
19:54 koo5 joined #marpa
20:13 jeffreykegler joined #marpa
22:15 koo5 joined #marpa

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary