
IRC log for #marpa, 2015-08-22


All times shown according to UTC.

Time Nick Message
00:20 ronsavage joined #marpa
00:21 ronsavage I've had an idea for parsing POD. Define all POD as discard (or comment-like) lexemes + events, and process those, and actually discard the Perl code. Make sense?
00:35 idiosyncrat_ joined #marpa
00:35 idiosyncrat_ ronsavage: sure
00:36 idiosyncrat_ But if I understand it, you're basically using Marpa as a lexer ...
00:36 idiosyncrat_ which of course you can do, and you get a more powerful lexer whose rules you can define in BNF.
00:37 idiosyncrat_ But it probably won't win any lexer races. :-)
00:44 KotH i guess nobody has a vhdl parser at hand?
01:03 KotH is there a global setting to make all strings case insensitive? or do i really have to add a :i everywhere?
01:03 * KotH can either not see it or is too tired
01:04 KotH s/see/find/
01:04 KotH i guess i'm too tired ^^'
01:19 idiosyncrat_ KotH: nope -- no global case-insensitive option IIRC
01:19 KotH well.. a bit more to type
01:47 ilbot3 joined #marpa
01:47 Topic for #marpa is now Start here: http://savage.net.au/Marpa.html - Pastebin: http://scsys.co.uk:8002/marpa - Jeffrey's Marpa site: http://jeffreykegler.github.io/Marpa-web-site/ - IRC log: http://irclog.perlgeek.de/marpa/today
03:06 mauke_ joined #marpa
03:11 CQ_ joined #marpa
04:34 dvxd we could apply for a darpa grant: expand libmarpa to be a super fast lexer, like ragel is.
04:35 dvxd I love having parsing and lexing all in one, instead of separate.  The separation never made sense to me; just more cognitive load.
05:50 CQ don't know, I always liked the separation because after lexing I knew that I at least had _something_ that I could rely on being correct to some degree, and then in parsing I knew I didn't have to worry about potential lower level problems
05:51 CQ I like abstraction layers
05:51 CQ if something wants to do both, fine, but the layers are separate levels
05:53 CQ btw, why isn't marpa listed in https://en.wikipedia.org/wiki/Parsing or in https://en.wikipedia.org/wiki/Lexical_analysis ?
06:57 lwa joined #marpa
08:12 pczarn joined #marpa
08:12 koo7 joined #marpa
08:14 pczarn By unifying parsing and lexing, you lose LATM.
08:31 koo7 joined #marpa
08:49 rns left #marpa
09:38 ceridwen joined #marpa
10:48 lwa pczarn: What? No. LATM is just a workaround needed because parsing and lexing are separated in the first place. LATM is needed to correctly parse languages designed for LTM, if they have another language with different lexemes embedded in the same grammar (e.g. a programming language with a different language for strings, but in the same grammar).
10:48 lwa Of course, L(A)TM is just a greedy heuristic to limit the memory use of parsers, but this implies that L(A)TM-using parsers cannot recognize all context free languages. Consider the input "aab" and the grammar `S ::= A A B; A ::= 'a' | 'aa'; B ::= 'b'` – Marpa's SLIF chokes on this since it cannot backtrack.
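A minimal sketch of lwa's point, in Python rather than Marpa itself (the function names and structure are illustrative assumptions, not Marpa's API): a greedy longest-token matcher commits to `A = 'aa'` on input "aab" and gets stuck, while a full CFG search with backtracking finds the `A = 'a', A = 'a', B = 'b'` factoring.

```python
# Illustrative sketch (not Marpa): greedy longest-token matching vs. a
# full CFG search, for the grammar  S ::= A A B ; A ::= 'a' | 'aa' ; B ::= 'b'

A_ALTS = ['a', 'aa']   # alternatives for A
B_ALTS = ['b']         # alternatives for B

def greedy_parse(s):
    """Commit to the longest matching alternative at each step, no backtracking."""
    pos = 0
    for alts in (A_ALTS, A_ALTS, B_ALTS):   # S ::= A A B
        # pick the longest alternative that matches at the current position
        match = max((t for t in alts if s.startswith(t, pos)),
                    key=len, default=None)
        if match is None:
            return False
        pos += len(match)
    return pos == len(s)

def backtracking_parse(s):
    """Try every factoring of S ::= A A B, as a general CFG parser would."""
    def helper(pos, seq):
        if not seq:
            return pos == len(s)
        alts, rest = seq[0], seq[1:]
        return any(helper(pos + len(t), rest)
                   for t in alts if s.startswith(t, pos))
    return helper(0, (A_ALTS, A_ALTS, B_ALTS))

print(greedy_parse("aab"))        # False: greedy takes A='aa', then sees 'b' where A is expected
print(backtracking_parse("aab"))  # True: A='a', A='a', B='b'
```

On "aaab" both strategies succeed, since the greedy choice `A = 'aa'` happens to lead to a valid factoring there; "aab" is the case where the heuristic and the language diverge.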
10:53 nox left #marpa
10:56 pczarn joined #marpa
12:41 koo7 joined #marpa
12:43 pczarn lwa: Are you saying that LATM is undesirable, in general?
12:53 pczarn Consider input "aaab" to your example. Such ambiguous parses have factorings
12:55 pczarn Without greedy matching, parsers would have to make sure that e.g. parsing identifiers isn't ambiguous.
13:04 lwa pczarn: LATM is great, especially for practical use cases. At the time, I lobbied for its inclusion in Marpa's SLIF. Separating lexing from parsing makes parsing cheaper, but we trade expressiveness for performance. LATM bridges over most but not all of the resulting impedance between the two levels, and my example shows such a case – just try it out with Marpa::R2 :) After it has recognized "aa", it sees the "b" but expects
13:04 lwa another A production, so it will fail with the "no lexeme found" error.
13:04 lwa Marpa can parse any CFG, but this assumes we have a single grammar that works at the per-character level (which in practice would require excessive amounts of memory). Btw, Marpa can handle ambiguity even at the per-token level.
13:08 lwa Your example "aaab" would be parsed by that grammar, but Marpa's SLIF would only recognize the (S = (A = "aa"),(A = "a"),(B = "b")) parse, not (S = (A = "a"),(A = "aa"),(B = "b")), due to L(A)TM
13:09 lwa Obviously, the grammar could be written “correctly” as `S ::= A A B; A ::= a | a a; B ::= b; a ~ 'a', b ~ 'b'` which would fix all these problems – because it works at a one-token-per-character level.
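The ambiguity pczarn and lwa are discussing can be enumerated directly. This is an illustrative sketch (again plain Python, not Marpa), listing every (A, A, B) factoring of "aaab" that the grammar admits; longest-token matching would commit to `A = 'aa'` first and keep only one of them.

```python
# Illustrative sketch: all factorings of an input under
# S ::= A A B ; A ::= 'a' | 'aa' ; B ::= 'b'

A_ALTS = ['a', 'aa']
B_ALTS = ['b']

def all_factorings(s):
    """Enumerate every (A, A, B) split of s allowed by the grammar."""
    results = []
    for a1 in A_ALTS:           # first A
        for a2 in A_ALTS:       # second A
            for b in B_ALTS:    # B
                if a1 + a2 + b == s:
                    results.append((a1, a2, b))
    return results

print(all_factorings("aaab"))
# [('a', 'aa', 'b'), ('aa', 'a', 'b')] -- an ambiguous parse; greedy
# longest-match commits to A='aa' first and keeps only the second factoring.
```

With one-token-per-character lexemes, a general parser such as Marpa's G1 layer sees both factorings; it is the lexer-level greedy choice that discards one.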
13:13 koo7 joined #marpa
13:16 pczarn On the other hand, parses could be ordered by lengths of their tokens
13:18 pczarn in an unified parser
13:19 pczarn a unified parser.
13:24 CQ Low Overhead Audio Transport Multiplex (LATM) ?? ...LATM must be something else...
13:24 pczarn Longest Acceptable Token Matching :)
13:25 pczarn s/Token/Tokens/
13:25 CQ thanks :)
13:48 lwa pczarn: regarding different-length tokens: Marpa can actually do that, but it's forbidden in the SLIF, and it leads to all kinds of complications: tokens will start to overlap, and the concept of the current “input location” is terribly muddled (especially when we also have nullable rules).
13:48 lwa https://metacpan.org/pod/distribution/Marpa-R2/pod/Advanced/Models.pod#The-character-per-earleme-model briefly talks about variable-length tokens. L(A)TM neatly avoids these issues, since it will only pick the tokens with maximum length, not tokens of different lengths.
13:55 koo7 joined #marpa
14:13 pczarn Ok, I think LATM can be implemented at per-character grammar level, with token completion events and progress reports.
14:17 pczarn Strand parsing will reduce memory requirements
15:17 pczarn joined #marpa
15:19 idiosyncrat_ joined #marpa
15:19 idiosyncrat_ Good morning!
15:20 idiosyncrat_ Kollos will allow both LATM and "seamless" (each character is a lexeme) parsing.  The initial version will be seamless, but there will be no official release until it does both.
15:21 idiosyncrat_ The lexer/parser split does seem "natural" in some ways.  Note that in this IRC conversation, we go to the trouble of inserting spaces ...
15:21 idiosyncrat_ breaking the text into lines ...
15:21 idiosyncrat_ and sometimes even punctuating it!
15:22 idiosyncrat_ You can find ancient Greek papyri without spacing, but there the motive seems to be to save cost.
15:23 idiosyncrat_ On the other hand, "seamless" parsing raises new possibilities.
15:25 idiosyncrat_ Much of my recent slowness with Kollos had been plotting the recce/lexer/grammar interfaces so that they allow both, the ability to configure lexers, the ability to change character reading (ASCII-8 vs. UTF-8), etc.
15:25 idiosyncrat_ I'm now past that and working on semantics.
15:26 idiosyncrat_ pczarn recalls a good point -- seamless parsing, with its one Earley set per character is a real hog on space ...
15:26 pczarn lwa recalls that point, too
15:27 idiosyncrat_ seamless parsing will allow the parsing to "summarize" and throw away the Earley sets as it proceeds.
15:28 idiosyncrat_ CQ: Lots of Wikipedia reflects a pre-Marpa point of view on parsing.
15:28 lwa idiosyncrat_: so seamless parsing works like the "reduce" operation in LR parsers, yes?
15:29 idiosyncrat_ I put the bare minimum in, but don't think I will do more because of WP:OWN and because I wouldn't have time for any edit war which might break out.
15:29 idiosyncrat_ lwa: an interesting comparison -- I never really thought of it that way
15:29 lwa (err, strand parsing)
15:30 idiosyncrat_ lwa: I actually misread your mistyping, so I caught your meaning. :-)
15:30 idiosyncrat_ Someday I hope folks will catch Wikipedia up with Marpa.
15:31 idiosyncrat_ If there's a complicated edit that needs my help, let me know, but otherwise I'll spend my time on Kollos.
15:34 idiosyncrat_ AFK for a few minutes
16:03 idiosyncrat_ AFK -- errands
17:40 koo7 joined #marpa
18:00 koo7 joined #marpa
18:54 lwa My SLIF preprocessor is now available on GitHub: < https://github.com/latk/p5-MarpaX-Grammar-Preprocessor >. Any feedback, feature suggestions, or documentation proofreading is welcome. During the next couple of days I'll iron out a few features before making a CPAN release.
19:37 purmou joined #marpa
19:51 idiosyncrat_ joined #marpa
19:52 idiosyncrat_ re http://irclog.perlgeek.de/marpa/2015-08-22#i_11099869 -- lwa's https://github.com/latk/p5-MarpaX-Grammar-Preprocessor ...
19:52 idiosyncrat_ this will be receiving careful attention from me.
19:52 idiosyncrat_ lwa has a deep understanding of Marpa, and has done excellent work in the past.
19:53 idiosyncrat_ I've been looking forward to this for a long time.
20:07 lwa idiosyncrat_: You once mentioned that your reason for not allowing anonymous rules in the SLIF was debuggability and quality of error messages. At the time, I didn't really “get it”.
20:07 lwa This module grew from me trying to debug a large-ish grammar I was developing, so I wanted to simplify the grammar and create error messages that allowed me to really pinpoint the issues. While this module does not include a full documentation system, the t/json.t test includes a simplified version of what I'm actually using.
20:08 lwa This might be the first experiment trying to tap into the full power of Marpa for error messages, but I'm not sure I've managed to use Marpa::R2 ideally. I'd be really glad if you could look over the error handling function `pinpoint_marpa_error` in that test and see if it could be improved, especially regarding getting the rules that are currently being parsed – `$recce->progress` does not quite work how I expect it.
20:30 idiosyncrat_ Yes.  Did you find a way around the problem?
20:31 idiosyncrat_ Oops! scrolling error -- missed most of your message
20:31 idiosyncrat_ lwa: OK.  I'll try to look at that issue.
21:55 idiosyncrat_ After a quick skim of lwa's https://github.com/latk/p5-MarpaX-Grammar-Preprocessor ...
21:55 idiosyncrat_ One good idea here is that of an error-checking Marpa-powered JSON parser ...
21:56 idiosyncrat_ we have a bunch of Marpa-powered JSON parsers at this point, including at least one by me, but all of them beg the question, "Why use Marpa?"
21:57 idiosyncrat_ That is, for pure JSON, with all the other excellent fast JSON parsers out there, it's not clear where Marpa adds value.
21:59 idiosyncrat_ And lwa's parser answers that question -- if you want an extensible error-checking JSON parser, Marpa is clearly IMHO the best basis for it.
22:53 idiosyncrat_ joined #marpa
22:54 idiosyncrat_ lwa: re http://irclog.perlgeek.de/marpa/2015-08-22#i_11100106
22:54 idiosyncrat_ I tried to describe a strategy for describing JSON errors in a Github issue that I just filed
22:55 idiosyncrat_ https://github.com/latk/p5-MarpaX-Grammar-Preprocessor/issues/2
22:57 lwa Thank you. For now I'll have to go to bed, but I'm eager to look into these ideas during the course of the next week.
23:14 idiosyncrat_ joined #marpa
23:37 ronsavage joined #marpa
