Perl 6 - the future is here, just unevenly distributed

IRC log for #marpa, 2015-01-29

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
00:05 idiosyncrat AFK
00:25 jeffreykegler joined #marpa
00:34 ronsavage lwa: Hero! Take the rest of the day off. Tell your boss I said it was OK :-)
02:45 jeffreykegler joined #marpa
02:48 ilbot3 joined #marpa
02:48 Topic for #marpa is now Start here: http://savage.net.au/Marpa.html - Pastebin: http://scsys.co.uk:8002/marpa - Jeffrey's Marpa site: http://jeffreykegler.github.io/Marpa-web-site/ - IRC log: http://irclog.perlgeek.de/marpa/today
03:38 jeffreykegler http://www.newyorker.com/magazine/2015/02/02/pursuit-beauty
03:39 jeffreykegler Reading this about Yitang Zhang -- it makes an interesting point relevant to Marpa.
03:40 jeffreykegler It points out that the tenure track requires you to publish frequently.  It does not allow you the luxury of working exclusively on something like Earley's algorithm for 7 years.
03:54 RandalSchwartz I think that's the thing that I hope came across in our podcast... you were able to syhnthesize the multiple approaches because you weren't under "publish or perish"
03:54 jeffreykegler Yes.
03:55 RandalSchwartz actually, I'd say you had at least two "aha!" moments
03:55 RandalSchwartz putting both of the optimizations in
03:55 RandalSchwartz and figuring out how to write the L0 and G1 in the same file.
03:55 jeffreykegler And rather than have to do a quick paper and move on, I could solve all the implementation problems, and create a working useable version.
03:55 RandalSchwartz although the latter was someone else, no?
03:56 jeffreykegler The pioneering implementation of a 2-grammar parser was Andrew's
03:56 jeffreykegler You remember -- the guy who claimed he only moved a few semi-colons. :-)
03:56 RandalSchwartz would it ever make sense to make that 3 levels?
03:57 jeffreykegler Why not?
03:57 RandalSchwartz well - what problem would it solve?
03:57 jeffreykegler But it's more efficient to milk the most out of each parser ...
03:57 RandalSchwartz yeah
03:57 RandalSchwartz really - humans think of "things" and "arrangement of things"
03:58 jeffreykegler and Marpa is powerful, so I haven't really come across any uses for more than 2 grammars yet.
03:59 jeffreykegler Yes, there does seem to be something about the lexical/structural distinction built into the way people want languages to look.
03:59 RandalSchwartz Yeah, we're not wired for much more.
03:59 RandalSchwartz except maybe politicians :)
04:00 RandalSchwartz ok - I'm heading for bed, waking at 3am west coast.  as always, good chat. we must do more of this.
04:00 jeffreykegler Good night!
05:14 ronsavage Just wrapping up the first version of my new module, Text :: Table :: Manifold, which supports, err, manifold output formats. More soon.
08:34 ronsavage joined #marpa
08:43 ronsavage Text :: Table :: Manifold supports output as (1) boxed with ASCII chars (rendered via internal code), (2) CSV via Text::CSV, (3) Github (internal), (4) HTML (internal) and (5) HTML via HTML::Table. I will probably add support for PDF::Table too. And all this thru a single, unified, interface.
09:49 lwa joined #marpa
12:51 koo5 joined #marpa
17:31 koo5 joined #marpa
20:05 jeffreykegler joined #marpa
20:35 koo5 joined #marpa
20:37 sadmac jeffreykegler: Hi. I've been writing a parser generator for Ceylon (ceylon-lang.org). I hadn't heard of marpa when I started but I also decided on Earley. I'd love to get your thoughts.
20:40 jeffreykegler Ceylon is new to me.  It looks interesting.
20:43 jeffreykegler At a glance it looks like a typical block structured language, similar to those for which Marpa parsers exist.
20:44 jeffreykegler Of course, the real test in this case is *not* the big picture, but dealing with a few hard-to-handle corner cases.
20:46 jeffreykegler But Marpa can parse anything that's currently being parsed -- it's as general as traditional Earley's, but it also allows you to mix in procedural logic.
20:47 sadmac jeffreykegler: it's not just parsing ceylon, but writing the parser /in/ ceylon. This is part of the self-hosting effort.
20:48 sadmac jeffreykegler: I'm writing a general purpose parsing library in ceylon, with parsing ceylon itself as the first task.
20:48 jeffreykegler Marpa is not a simple parser to implement.
20:48 Aria It really isn't.
20:49 sadmac well, I have general Earley parsing done
20:49 jeffreykegler There are several other efforts to re-implement it underway.
20:49 jeffreykegler That's a good start.
20:49 sadmac Leo optimization is done as well.
20:49 jeffreykegler That's a *very* good start.
20:49 sadmac s/done/coming/
20:49 sadmac sorry
20:50 Aria What kind of tokenizing are you doing?
20:50 jeffreykegler Have you looked at the pseudocode in my paper?
20:50 sadmac Aria: it's sort of a non-deterministic demand-based tokenizer.
20:50 Aria Nice. Sounds like a good start for marpa too.
20:51 sadmac Aria: each token has a separate method in ceylon that detects if it is at the beginning of a string.
20:51 sadmac Aria: and we call those methods when we know we want to consume one of those tokens.
20:51 sadmac I also handle zero-length rules, and we have error recovery (the "ruby slippers" system implemented with a running levenshtein distance stored at each injected token)
20:52 Aria Ooo. Interesting.
20:52 jeffreykegler https://www.academia.edu/10341474/Marpa_A_practical_general_parser_the_recognizer is my paper
20:52 sadmac yeah I read it
20:53 jeffreykegler Obviously you've done your homework then.
20:53 Aria Yeah, wow.
20:53 sadmac I read elsewhere that you'd gotten away from using the LR0 states and went back to regular earley states
20:53 jeffreykegler YEs.
20:53 jeffreykegler I was about to mention that.
20:54 jeffreykegler I found LR(0) state-based items counter-productive.
20:54 jeffreykegler At this point I believe I have as much experience with their practical use as anyone -- including Aycock & Horspool.
20:54 sadmac I actually started to run with that idea and went in some radical directions. When I came out the other side, I'd managed to derive GLR parsing from earley parsing XD
20:56 jeffreykegler So you know about the problems with GLR. :-)
20:56 sadmac in general yes
20:57 sadmac My derivation kind of simplified it, which was nice
20:59 jeffreykegler Anyway, bottom line, Marpa has totally moved away from the Aycock&Horspool LR(0) states, or anything derived from it.
20:59 sadmac alright
20:59 jeffreykegler But it does preserve an A&H influence in its rewriting.
20:59 sadmac that just leaves me with Leo optimization to implement
20:59 jeffreykegler Specifically, it's handling of nullable rules and symbols.
21:00 jeffreykegler Also my rearrangement of the parse engine -- the one that makes Ruby Slippers possible.
21:00 jeffreykegler Or have you already done that?
21:00 sadmac I haven't. I'd have to think about where that plugs in to our API
21:00 sadmac https://github.com/sadmac7000/ceylon-parse
21:00 sadmac we have ruby slippers-style error recovery but it's all automated.
21:01 sadmac and doesn't require that restructuring.
21:02 sadmac but then since tokenizers are manually written code, you can still inject false tokens. I'd just have to expose more information to them I suppose.
21:03 jeffreykegler Ruby Slippers is about knowing what the parser wants, so it requires a special order of operations ...
21:03 jeffreykegler though you could imitate it,I suppose, with backtracking.
21:03 jeffreykegler but my guess is that that would be costly.
21:04 jeffreykegler Specifically, in the traditional implementation, an Earley parser is adding items to 2 Earley sets at one time.
21:04 jeffreykegler In Marpa it adds items to only one -- every Earley set is complete before the next one is started.
21:04 ernimril how much of a difference does Leo optimization do for grammars for computer languages? I am building a parser in java for java and for me I did not really see any performance difference. The theoretical difference is big, sure, but seems like the java grammar at least does not hit the worst cases in practice
21:05 jeffreykegler The thing about using existing languages like Java is that they were designed to be parsed with the traditional parsers.
21:06 jeffreykegler So they don't necessarily exercise all Marpa's features.
21:07 jeffreykegler Leo optimization is only an improvement if you're using right recursion.
21:07 ernimril Currently I keep the leo optimizations in a separate branch, I want to investigate it more. I probably want to make my code into a more generic parsing library later on and then it will probably be a nice addition
21:08 ernimril correct, I have not really read through the grammar and thought about how much right recursion I have, quite little I think :-)
21:08 jeffreykegler Where the Leo optimization shines is with techniques like *real* higher order langauges -- languages which write languages.
21:08 jeffreykegler Parsing a very broad set of grammars in linear time, means you can automatically generate languages, and expect them to parse in linear time.
21:09 jeffreykegler ernimril: Have you noticed Marpa's precedence rules -- the ones with all the '||' and '|' operators?
21:10 ernimril jeffreykegler, I am building my own parser in java, so not really using marpa, only reading up on marpa to find nice things to do :-)
21:10 jeffreykegler Well Marpa is an algorithm, so if you find enough nice things, what you are doing becomes Marpa. :-)
21:11 ernimril jeffreykegler, after working with lr(1) parsing and having to change the grammar too much I got fed up and searched for alternatives. Marpa seemed interesting and my current code seems pretty fast
21:12 jeffreykegler Yes, I think my work has revived interest even in the traditional Earley's parser ...
21:12 sadmac jeffreykegler: I'd like to know more about the nullable rule issues. I think I've worked around it already (in nearly the same way A&H did) but I can't be certain and I haven't found an in depth description of the problem.
21:12 jeffreykegler which *is* seriously underrated.
21:12 jeffreykegler sadmac: the one in the A&H paper and mine are about it.
21:13 sadmac jeffreykegler: might have to look at it in yours again. Don't know if I have found a free copy of A&H
21:13 jeffreykegler Aria: you've got a copy on your site don't you?
21:15 jeffreykegler http://dinhe.net/~aredridel/.notmine/PDFs/Practical%20Earley%20Parsing.pdf
21:15 sadmac woot
21:15 sadmac jeffreykegler: one thing that sort of arose out of our particular interface that's proved quite useful is generic rules
21:15 jeffreykegler generic rules?
21:16 sadmac in ceylon.parse, you can do CommaSepList<K> -> K ',' CommaSepList<K> | K
21:16 sadmac and then later FunctionParameters -> '(' CommaSepList<Parameter> ')'
21:16 Aria Yep!
21:17 jeffreykegler Cool.  A true higher order language
21:17 sadmac or, better still: forall(Q): Q -> WhiteSpace+ Q
21:17 sadmac makes the tokenizers much more straightforward
21:18 jeffreykegler sadmac: The SLIF to accomplish its trickery rewrites it's grammar, which is the first thing I think of when I see your kind of generic rules.
21:19 sadmac jeffreykegler: yeah, we do it by rewriting
21:19 jeffreykegler This is where the Leo stuff comes in -- you can rewrite with confidence that the result will parse in linear time.
21:20 sadmac jeffreykegler: all our nonterminals (and terminals) are actually Types in ceylon, so we also handle inheritance (and some of the bigger strangeness of Ceylon's particular type system, like union and intersection types)
21:21 jeffreykegler Cool
21:22 sadmac Hmm. I wonder if instead of Leo optimization I can rewrite right recursion as kleene star rules...
21:22 sadmac we have a short-circuited optimization for those already
21:23 jeffreykegler Why not?
21:23 jeffreykegler Marpa writes its sequences rules into left recursions, but it could re-write them into right recursions.
21:23 Aria That sounds like the same thing.
21:23 Aria That optimization sounds like the leo optimization, at least from here.
21:24 jeffreykegler Aria: depending, the precedence might be difference -- different trees.
21:25 jeffreykegler * difference -> different
21:25 Aria Ah, yeah.
21:27 jeffreykegler A lot of parsers "deal" with right recursion by just parsing it as a list, and leaving it to a back end to figure out the actual structure.
21:27 jeffreykegler Which is leaving much of the job undone, and does not scale well.
21:30 jeffreykegler sadmac, ernimril: If you backlog, you'll notice that the Marpa algorithm is undergoing a further evolution ...
21:30 jeffreykegler A simple core parse engine, enabled by doing more in the grammar rewrite.
21:31 ernimril jeffreykegler, yes, already noticed that
21:31 jeffreykegler * simple -> simpler
21:31 jeffreykegler It still really won't be anything you would call "simple"
21:32 jeffreykegler I've found that doing extensive grammar rewriting not only has minimal cost at evaluation time ...
21:33 jeffreykegler but that it also does not interface with efficient "on the fly" hacks -- mixing procedural logic with the core parse engine ...
21:33 jeffreykegler you can quickly translate from an external to internal rules and symbols, as long as you stay within a certain framework.
21:35 jeffreykegler Some problems that solve with rewrites, require rewrites that scramble the semantics, but I've found a highly useful subset which does not.
21:39 pczarn joined #marpa
21:43 ronsavage joined #marpa
23:00 ronsavage Re http://irclog.perlgeek.de/marpa/2015-01-29#i_10027868. Rejecting the received wisdom of the ancients because it's just not appropriate, is the sign of a great programmer.
23:01 ronsavage Of course that leaves me free to reject good advice and simultaneously pretend I know what I'm doing!
23:01 jeffreykegler Actually I think Aycock at least is younger than I am.
23:19 ronsavage Ahh. Even so, I used 'ancients' as meaning 'experts in the field', meaning 'they got there first'...............

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary