IRC log for #marpa, 2014-02-12


All times shown according to UTC.

Time Nick Message
00:08 jeffreykegler joined #marpa
00:53 lucs jeffreykegler: Hi. I just got back, and I'll try out what you mentioned earlier.
01:15 shadowpaste "lucs" at 70.81.138.180 pasted "'forgiving'" (84 lines) at http://scsys.co.uk:8002/303945
01:16 lucs (There's a question in there at the end.)
01:23 jeffreykegler lucs: re http://irclog.perlgeek.de/marpa/2014-02-12#i_8270295
01:24 jeffreykegler The parser catches all ambiguities, but the SLIF lexer does *not*, whether it is LTM or LATM.  The SLIF lexer does a longest tokenS match (note "tokens" plural).  So it does recognize ambiguous tokens if they are longest tokens.
01:25 jeffreykegler Ambiguous tokens which are shorter than the matched one (whether the discipline is "forgiving" or not) are ignored by the lexer.
01:27 jeffreykegler While inconvenient in this particular example, before wishing it were otherwise, you might want to think out the case of variable names.  Suppose I recognize all ambiguous lexings of the string "myvar".  To keep things simple let's assume the only lexemes are variable names.
01:28 jeffreykegler One lexing is to find 5 variable names: 'm', 'y', 'v', 'a' and 'r'
01:30 jeffreykegler There are 4 lexings which find two variable names.  The first is 'm', 'yvar', the second is 'my', 'var'.
01:30 jeffreykegler If my combinatorics are right there are 16 possible lexings in all.
01:30 lucs Yeah, I see the combinatorial explosion there.
01:31 jeffreykegler So I think you can see why whether it is LATM or LTM, it is always a *longest* tokens discipline.
01:31 jeffreykegler Note that regexes also default to longest matches, for similar reasons.
01:32 jeffreykegler I actually have toyed with alternatives to longest token disciplines, but they are tricky, and I think this explains why that particular effort is on a back burner.
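The counts quoted for "myvar" above can be checked by brute force. The following is a minimal Python sketch, not anything from Marpa itself: it assumes, as in the discussion, that every non-empty substring is a valid variable name, and the function name `all_lexings` is made up for illustration. Each lexing corresponds to choosing a subset of the 4 gaps between the 5 characters as token boundaries.

```python
from itertools import combinations

def all_lexings(text):
    """Return every way to split text into consecutive non-empty tokens.

    Hypothetical helper: assumes any non-empty substring is a legal lexeme.
    """
    n = len(text)
    lexings = []
    # Pick k of the n-1 gaps between characters to act as token boundaries.
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            lexings.append([text[a:b] for a, b in zip(bounds, bounds[1:])])
    return lexings

lexings = all_lexings("myvar")
two_token = [lx for lx in lexings if len(lx) == 2]
print(len(lexings))    # 16 lexings in all
print(len(two_token))  # 4 lexings with exactly two variable names
print(two_token[:2])   # [['m', 'yvar'], ['my', 'var']]
```

The total doubles with every extra character (2 to the power of length minus 1), which is the combinatorial explosion mentioned in the conversation.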
01:33 lucs I must admit it's not completely clear in my head; how does the line "hello world MEEP MOOP" manage to be seen as four 'word' then?
01:33 lucs (I believe I'm confused and I thand you for your patience!)
01:33 lucs *thank
01:35 jeffreykegler According to your grammar, an <Elem> can contain only one <MeepMoop>
01:36 lucs Right.
01:36 jeffreykegler and if it does contain a <MeepMoop> it must be the only thing in the <Elem>
01:36 lucs That, I believe I understand.
01:37 lucs (hmm... I think I see where you're going...)
01:37 jeffreykegler So there is no legal parse of an <Elem> that contains a <MeepMoop> unless it is the only thing in the <Elem>
01:38 lucs Okay.
01:39 jeffreykegler So do you see why the one parse of the first <Elem> which is reported is in fact the only legal parse?
01:40 lucs Yes.
01:40 jeffreykegler By the way, I don't know if you've tried the progress reports?
01:41 jeffreykegler This grammar is small enough that it'd be a good one to practice on.
01:41 lucs Oh, no, I haven't -- I saw them mentioned in the docs, but didn't get around to using them.
01:41 lucs Okay. Tell you what, let me play with that, and maybe it'll clear things up for me.
01:41 jeffreykegler And you can see step by step the choices the parser sees, as it sees them.
01:41 lucs And if it doesn't, I'll ask more questions :)
01:42 jeffreykegler I mention it especially in this context, because (frankly) the progress reports for a whole parse can get very big.
01:42 jeffreykegler But in this case, I don't think they will be, and the progress reports will be an ideal way to follow what is going on ...
01:43 jeffreykegler and good practice for using them in other situations.
01:43 lucs Yeah, I saw the example, and I was a bit overwhelmed -- now is a better time to try it out, now that I've played more with Marpa.
01:43 jeffreykegler Have you studied Earley parsing before?
01:43 lucs Nope :/
01:44 jeffreykegler OK.  The basic ideas are simple, just follow the doc carefully.  There's also a blog post, from which the doc was taken.
01:44 jeffreykegler Further, the Wikipedia page on Earley's, while problematic in other aspects, has a very good Earley tutorial.
01:45 ronsavage joined #marpa
01:45 lucs Okay, thanks. I'll find that blog post and study it.
01:46 jeffreykegler By the way, re not having studied Earley parsing ...
01:46 jeffreykegler For many years you could take the graduate level parsing course at a top-level school ...
01:46 jeffreykegler and not even hear Earley's or general parsing mentioned, never mind explained.
01:47 ronsavage jeffreykegler: BTW: We haven't /all/ studied Earley parsing... ROFL
01:48 lucs Heh. That would explain it, but what also explains it is that I've never taken such graduate courses (nor undergraduate for that matter) -- I'm just a programmer, and Marpa appears to be a pretty good tool.
01:48 jeffreykegler Similarly, with textbooks on parsing.  Many full-length authoritative textbooks either do not mention Earley's, or if they do, do exactly that -- have a sentence mentioning it, with no explanation.
01:49 lucs jeffreykegler: I did read your rationale on why Earley parsers are feasable today though.
01:50 jeffreykegler As time went on, and yacc became less popular, the main change was to de-emphasize the teaching of parsing entirely.
01:51 jeffreykegler They tell me that today, you can get a Ph.D from one of the very best schools, and never have taken a compiler or parsing course.
01:51 ronsavage jeffreykegler: When you say "So it does recognize ambiguous tokens if they are longest tokens", does that imply all such tokens are of equal length? (I'm thinking of additions to the docs)
01:51 jeffreykegler That's why when someone says they have a newbie Marpa question, my response is that there is no such thing ...
01:52 jeffreykegler all Marpa questions are advanced questions, presupposing a considerable amount of programming knowledge.
01:52 jeffreykegler ronsavage: Yes!  Very good observation.
01:53 jeffreykegler A consequence of my rules is that all of the recognized tokens, even in the case of ambiguous tokens, will be of equal length.
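The equal-length property follows directly from a "longest tokens" rule, as this minimal Python sketch suggests. It is an illustration only, not the SLIF lexer: the lexeme table, its names and patterns, and the function `longest_tokens` are all hypothetical, and the sketch ignores what the parser is prepared to accept at a given position.

```python
import re

# Hypothetical lexeme table; not taken from the grammar pasted in the log.
LEXEMES = {
    "word": re.compile(r"[A-Za-z]+"),
    "meep": re.compile(r"MEEP"),
}

def longest_tokens(text, pos):
    """Return every (name, lexeme) pair tied for the longest match at pos."""
    candidates = []
    for name, pattern in LEXEMES.items():
        m = pattern.match(text, pos)
        if m and m.end() > pos:
            candidates.append((name, m.group()))
    if not candidates:
        return []
    best = max(len(lexeme) for _, lexeme in candidates)
    # Shorter matches are dropped, so all survivors have the same length.
    return [(name, lexeme) for name, lexeme in candidates if len(lexeme) == best]

print(longest_tokens("MEEP MOOP", 0))  # both 'word' and 'meep' match "MEEP"
print(longest_tokens("MEEPS", 0))      # only 'word' survives, matching "MEEPS"
```

In the first call the two lexemes tie, so both are reported as ambiguous tokens; in the second, the shorter "MEEP" match is ignored because a longer token exists at the same position.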
01:55 lucs jeffreykegler: I thought I would be able to easily spot your blog post explaining Earley parsing from here: http://jeffreykegler.github.io/Ocean-of-Awareness-blog/metapages/chronological.html , but maybe you can point it out?
01:55 lucs Or maybe tell me which one(s) I should read first.
01:56 jeffreykegler lucs: http://jeffreykegler.github.com/Ocean-of-Awareness-blog/individual/2010/06/jay-earleys-idea.html
01:56 lucs Okay, thanks.
01:56 lucs *feasible  # I knew it didn't look right.
01:57 jeffreykegler By the way, a consequence of Phases 1 and 2 is that Marpa's inner workings are now much closer to Jay Earley's original algorithm.
02:05 ronsavage I'm wondering if we need a glossary of acronyms...
02:11 jeffreykegler ronsavage: The #perl6 channel survives without one -- though I sometimes wonder how
02:47 ronsavage It's more your and my docs which I worry about, and how they're seen by newbies...
02:48 jeffreykegler ronsavage: In the docs, with a few exceptions, I spell an acronym out the first time it is used ...
02:48 jeffreykegler exceptions are things like LHS and RHS -- if they don't know those they need to start at the beginning, or with the Wikipedia article on BNF.
02:49 jeffreykegler "the first time it is used" -> "the first time it is used in that document"
03:02 ronsavage Yes, I try to always spell them out at first usage, per document, too. But even so, e.g. LTM and LATM. I suppose we could have scanned the Marpa docs to find those, but - probably wouldn't...
03:02 ronsavage OK. I'll resist the urge to make work for myself. But I'll keep it in mind
03:03 ronsavage Off swimming. We all, not just me, need an exercise program.
03:05 jeffreykegler ronsavage: LTM and LATM I use currently as "local" to this IRC discussion.  Any use of the acronyms "LTM" or "LATM" in one of my docs, where it is not spelled out nearby, or at the very least, earlier in the doc, is a documentation bug.
04:14 lucs Um, trying out the example in the Progress POD, I get the expected "The error message", but how do I get "The value of the parse", the program just died?
04:35 lucs Bah, I'm an idiot, never mind :)
05:17 ronsavage joined #marpa
06:45 ronsavage left #marpa
17:03 jeffreykegler joined #marpa
23:44 jeffreykegler joined #marpa