Perl 6 - the future is here, just unevenly distributed

IRC log for #marpa, 2014-12-29

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
00:46 flaviu joined #marpa
01:10 ronsavage joined #marpa
02:02 jeffreykegler joined #marpa
02:04 jeffreykegler ronsavage: re http://irclog.perlgeek.de/marpa/2014-12-28#i_9861368 -- looks like it would be fun for Marpa
02:05 jeffreykegler Aria: once there's the ability to take pieces of parses and mix 'em and match 'em all sorts of tricks open up.  Multi-tier, parse trees on errors, and backtracking ...
02:06 Aria Definitely.
02:06 jeffreykegler Backtracking is just taking the subparse so far, truncating it, and starting again with a new parse.
02:07 Aria Yup.
02:09 jeffreykegler Also, *real* combinator parsing, where the subparse has a well-defined interface with its parent ...
02:10 Aria And THAT is what I really, really, really want.
02:10 Aria Not just the union of two grammars with some weird intertwingling rule.
02:12 jeffreykegler The theory works out.  Implementation won't be trivial, but won't be extremely difficult either.
02:14 jeffreykegler Someone wanting to get a head start on the theory can looks at Grune & Jacobs, 2nd ed. pp 401-402
02:14 Aria I so need to buy that.
02:14 jeffreykegler Not really.
02:14 jeffreykegler I'm going to heavily modify their approach anyway.
02:15 jeffreykegler But those who *do* already have a copy and want to "peek ahead" will get a head start on the concepts I'll use.
02:15 Aria That's not the only reason ;-)
02:16 jeffreykegler By the way, on p. 548, they suggest as a potential future practical parser, the Leo algorithm.
02:16 Aria Heh.
02:17 jeffreykegler When I say Leo was *almost* totally ignored, Grune and Jacobs are the reason for the "almost".
02:18 jeffreykegler Even they didn't follow up -- their text basically treats LALR as the state of the art theoretical, then goes on to suggest that you really use left parsing.
02:18 jeffreykegler :-)
02:19 jeffreykegler So that parsing comes out like the Stevenson story, with recursive descent as Mr. Jekylll and LALR as Dr. Hyde.
02:19 Aria Heh.
02:21 jeffreykegler But G&J did not entirely miss the signifigance of Leo 1991 -- but neither they or anyone else followed up for over a decade.
02:22 jeffreykegler Correction -- actually G&J 2nd ed is 2007, the year I started on Marpa.
02:24 jeffreykegler Anyway, I'll get back to writing up the details of piece-wise parsing.  Once that's done, and I've convinced some folks to look at it, we can plot next steps.
02:25 jeffreykegler This puts Kollos on hold a bit, but that will happen a lot -- Kollos is a long term thing.
04:40 sirdancealot joined #marpa
05:34 ronsavage joined #marpa
05:42 ronsavage There's currently a discussion on the dbi-users email list re a new '?' escaping mechanism, and a delimited '?' suggestion has been made(placeholder_escape_delimiters => [ '{{{','}}}' ], among others), using - coincidentally - essentially the same syntax as I've just used in Text :: Balanced :: Marpa. So, I pointed them to the release on MetaCPAN,and we'll see what eventuates.
05:51 ronsavage References for G&J: The book http://dickgrune.com/Books/PTAPG_2nd_Edition/ and the later web-only additions http://dickgrune.com/Books/PTAPG_2nd_Edition/LowAvailability/. No metion of Kegler or Marpa yet.
07:16 jluis joined #marpa
08:00 btyler joined #marpa
09:29 lwa joined #marpa
15:27 rns joined #marpa
15:33 rns re http://irclog.perlgeek.de/marpa/2014-12-27#i_9859598 — perhaps abstract parse forests (ASF's) can be used in partial parse construction?
15:36 rns G&J says "the suffix grammar is ambiguous", and Marpa can handle ambiguity space-efficiently with parse forests.
15:41 rns As an aside, can't escape a feeling that partial (piece) parse results represented with a parse-forest grammar (G&J, 3.7.4, p. 91) is at least related to "left rules" grammar.
15:43 rns Those are just random thoughts as I'm reading constant_space.md and re-reading Grune, out here as a braindump.
15:45 rns Overall the subject is huge, and piece-wise parsing shows great promise, in constant-space parsing, threading, and composabilty and I'm still wrapping my head around it, I must confess.
15:47 rns re http://irclog.perlgeek.de/marpa/2014-12-28#i_9861043 — Marpa::R2 can show "parse-so-far" — https://metacpan.org/pod/distribution/Marpa-R2/pod/Tracing.pod#Dump-the-parse-value — and this is going to be even ore useful.
15:47 rns s/ore/more
15:48 rns left #marpa
16:29 jeffreykegler joined #marpa
16:31 jeffreykegler rns: re Marpa's current "value so far" capability -- http://irclog.perlgeek.de/marpa/2014-12-29#i_9863977 -- the key word in the doc is "sometimes" -- that is, there is *sometimes* a value, even on error.
16:32 jeffreykegler With the "tree so far" capability I am talking about, there will *always* be a tree.
16:34 jeffreykegler re http://irclog.perlgeek.de/marpa/2014-12-29#i_9863929 -- I'm using a variation of suffix grammars, which I call a "split grammar".  This breaks all non-nulling rule into "split pairs" or rules -- one in each pair left, the other right.
16:35 jeffreykegler When two subtrees are to be joined there is a left and a right -- the left one has left split rules, the right one has right split rules.
16:37 jeffreykegler All rule in the right subtree begin with a special pseudo-lexeme -- a right connector lexeme, and before resuming the actual input, I read the appropriate pseudo-lexemes at the right subtree's first position --
16:38 jeffreykegler this means that *not all suffixes* are parsed -- just those predicted by the left subtree ...
16:38 jeffreykegler so the right subtree will only be ambiguous if the parse is.
16:39 jeffreykegler rns: re http://irclog.perlgeek.de/marpa/2014-12-29#i_9863918 -- yes, I think I will use the ASF's and they do handle ambiguity --
16:41 jeffreykegler while subtree parsing does not *introduce* ambiguity into the parse, it does *handle* it.
16:42 jeffreykegler rns: I'm hoping for your help with some of this code, because at this point you understand Marpa and SLIF internals and near-internals better than anyone except me.
16:44 jeffreykegler And, yes, I agree, re http://irclog.perlgeek.de/marpa/2014-12-29#i_9863972, it looks like this may be huge, opening the door to many new techniques.
16:45 jeffreykegler ... it's not just the space hack that was my original motivation for looking into it.
16:49 rns joined #marpa
16:50 rns re http://irclog.perlgeek.de/marpa/2014-12-29#i_9864210 — sure, hope I have some free time in Jan and Feb next year and will be in touch.
16:51 rns I'm trying to visualize the piece-wise parsing: so the recognizer starts with g1 (in constant_space.md terms ) and reads lexemes until
16:52 rns (1) memory limit is reached or (2) there is parse error (lexeme is rejected, no more lexemes to read).
16:54 rns At that moment, there are "parse-so-far" data, which serve as the basis for split grammar building -- the split is done based on dotted rules, which show the parsed/unparsed border.
16:56 jeffreykegler Yes, basically.
16:56 rns The split grammar produces left and right grammars and connecting lexemes, so the next piece parsing starts with the new (right) rules and (connector) lexemes.
16:56 jeffreykegler Yes.
16:57 rns Ok, glad I got it right.
16:57 jeffreykegler Connector lexemes ensure that right split rules will only be in the right parse if they'd be in a standard "unified" parse.
16:58 rns Ok.
16:58 jeffreykegler So the split parse has all the same properties as the unified parse would -- it's exactly the unified parse cut in two.
16:59 jeffreykegler And connector lexemes make it straight-forward to put it together -- you just "connect" matching connector lexemes.
16:59 jeffreykegler Ambiguity is not introduced, and is not particularly tricky where it exists.
17:00 jeffreykegler Leo items are a bit more tricky, but the Leo optimization can also be preserved.
17:00 rns Ok and then, once the right piece is parsed, its data can be joined to the left parse data (the previous piece) to form the unified "parse-so-far", which can be final if there is no more lexeme to read or there is a parse error.
17:00 jeffreykegler Yes.
17:01 jeffreykegler Note that the "split point" need *not* be a place where there's an error or you are out of memory.
17:01 jeffreykegler For stuff like combinator parsing, you can put the split anywhere.
17:02 jeffreykegler And you can also put the split *before* where you've reached in the parse -- doing this you can implement backtracking.
17:02 rns So, basically, the application can raise the split point at any time, e.g. with events.
17:03 jeffreykegler Right.  Which really opens up the possibility for using this.
17:04 jeffreykegler * possibility -> possibilities
17:04 rns Sure.
17:04 rns re ambiguity: it isn't introduced, it's just that if the parse is ambiguous at the split point, it can be handled with e.g. parse forests, just Marpa does currently.
17:05 rns s/just Marpa does/just like Marpa does/
17:05 jeffreykegler Yes, correct.
17:05 jeffreykegler If you think about it, ...
17:06 jeffreykegler the actual mapping from left-right split rules to original rule is very straightforward.
17:06 jeffreykegler Once you get used to the idea, it seems almost obvious.
17:08 rns Well, my thinking from the table constant_space.md is that if the left rule and the connector are known, they produce the right rule and thus the original rule.
17:09 rns And the left rule (and thus the original rule) is defined by the right rule and the right connector, basically.
17:09 jeffreykegler Yes, in fact, you can deduce all of left rule, right rule, and the connector lexemes from the dotted rule.
17:09 jeffreykegler Right, which makes them easy to stitch back together.
17:10 rns And, e.g. c1L and c1R in
17:10 rns X-L ::= c1L            X-R ::= c1R A B C
17:11 rns are just syntax for
17:11 rns X ::= . A B C -- dotted rule
17:13 rns BTW, looks like there is a problem with this sentences in constant_space.md
17:13 jeffreykegler The <c1L> and <c1R> symbols will be used by the algorithm which joins the two trees up.  That number inside the two needs to be unique.  That way matching up right and left is very straighforward.
17:13 rns Ok.
17:13 rns The problematic (I think) sentences:
17:13 rns Pairs 1, 3 and 5 represent splits between symbols -- these will be called "inter-split pairs".
17:13 rns and
17:13 rns Pairs 2, 4 and 6 represent splits within symbols -- these will be called "inter-split pairs".
17:14 rns The latter must be "intra-split"?
17:14 jeffreykegler Yes, definitely a problem.  And your suggested fix is correct.
17:16 rns Ok, I'll file a PR.
17:16 jeffreykegler Thanks!  The PR's are a big time-saver!
17:17 rns Sure and history is preserved.
17:22 rns One thing I'm having a hard time understanding is
17:23 rns why a dotted rule, e.g. X ::= . A B C needs 4 rules (2 pairs: 1 & 2) to represent?
17:23 rns i.e.
17:23 rns 1: X-L ::= c1L            X-R ::= c1R A B C
17:23 rns and
17:24 rns 2: X-L ::= A-L c2L        X-R ::= c2R A-R B C
17:24 jeffreykegler This is one place where I had to improve on G&J's suffix grammars.
17:25 jeffreykegler The difference between 1 and 2 is that in 1 the split occurs *between* symbols (in this case actually, only before one, but you get the idea).
17:25 jeffreykegler In 2 the split occurs in the middle of the symbol <A>, so there has to be an <A-L> and an <A-R>
17:26 jeffreykegler The G&J grammars are simpler, but they don't allow me to accurately control *which* suffixes match the prefix (that is, the left subtree).
17:27 jeffreykegler To make sure the suffix parse perfectly matches a prefix I need more rules.
17:27 jeffreykegler The G&J grammar finds *any suffix*, which is also why it introduces ambiguity
17:28 rns Yes.
17:30 rns "in the middle of the symbol <A>" meaning that there is, e.g. rule A ::= a b c? and dotted rule A ::= a . b c?
17:31 rns or dotted rule A ::= a b . c
17:31 jeffreykegler There will also be a set of split rules for <A>
17:31 jeffreykegler **BUT**
17:32 jeffreykegler note that you can use the left subtree to determine exactly which right rules you will need, and in that way limit what goes into the right grammar "on the fly" ...
17:32 jeffreykegler however, putting *all* the right rules into the right grammar does no harm -- they won't be used if their connector lexemes don't appear at right position 0.
17:35 rns "right position 0" meaning "left position -1" (the last left lexeme)?
17:36 jeffreykegler I was trying to say "position 0 of the right parse"
17:36 rns Oh yes, got that.
17:36 jeffreykegler The connector pseudo-lexemes (and only those lexemes) go there.
17:37 jeffreykegler You use these to match up the left and right "edges", and then you throw the pseudo-lexemes away.
17:38 rns "connector lexemes" are pseudo-lexemes (generated on the fly), ok.
17:39 rns and once the pseudo lexemes are thromn away, we arrive at the original rule, so to say.
17:39 rns s/thromn/thrown/
17:39 jeffreykegler Yes.
17:40 rns I take it you're going to implement piece-wise parsing as part of libmarpa, like, e.g., events?
17:40 jeffreykegler One mildly tricky aspect is ambiguity, where the left and/or right side has the same connector pseudo-lexeme more than once.
17:41 jeffreykegler No, not in libmarpa.
17:41 jeffreykegler This is upper layer stuff.
17:41 rns Ok, so as part of Kollos, in lua perhaps? or a separate c lib?
17:41 jeffreykegler Back to ambiguous pseudo-lexemes -- in that case you generate rules for *all* matchups.
17:42 jeffreykegler Originally in Lua.
17:42 rns ok.
17:42 jeffreykegler But I'm also hoping to get someone to prototype this trick in a dedicated JSON parser -- I don't think I'll have time for it myself.
17:43 jeffreykegler That could be mixed C/Lua -- Lua so you don't have to reinvent a lot of data structures.
17:44 rns ok.
17:45 jeffreykegler For the JSON, the split rules could be written by hand.
17:45 rns I think a JSON parser you were going to build in C/Lua based on libmarpa and json.c can be a good start.
17:46 jeffreykegler I'm still at work on that :-)
17:46 jeffreykegler I needed to work out the layout of the Kollos modules -- namespace stuff.
17:47 rns Ok. :) re http://irclog.perlgeek.de/marpa/2014-12-29#i_9864562 — yes, the JSON grammar isn't that large.
17:47 rns re http://irclog.perlgeek.de/marpa/2014-12-29#i_9864534 this is where I can't help thinking about parse forest(s) (grammars).
17:47 jeffreykegler Also, I've been working (for Kollos) on a seamless error handling system -- one that both the C code and Lua code can share, with the same set of error codes, etc.
17:48 rns Ok. Sorry, have to go AFK, will backlog tomorrow.
17:48 jeffreykegler Bye!
17:48 rns AFK
17:48 rns left #marpa
18:27 sirdancealot joined #marpa
20:31 ronsavage joined #marpa

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary