Perl 6 - the future is here, just unevenly distributed

IRC log for #marpa, 2015-09-18

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
00:02 idiosyncrat joined #marpa
12:10 koo5 joined #marpa
14:22 Aria I am excited about this paper.
15:00 rns joined #marpa
15:02 rns Aria++.
15:03 rns Same thing here -- it's interesting to see how it's taking shape; going to be fun to translate.
15:03 rns And \hyperref’s are way cool, yes. :)
15:45 ceridwen idiosyncrat, Aria: I'm working on a Python implementation of the GLL algorithm (though at the moment I have some other non-parsing-related things I've been putting work into), so I've studied the papers.  GLL works on general CFGs and produces an SPPF.  Like GLR, it doesn't backtrack but rather has a mechanism for following all possible outcomes when it hits a non-deterministic rule.  Unlike GLR, there's no LR automaton involved, and its code looks
15:45 ceridwen a lot more like recursive descent.  GLR, Earley, and GLL are all closely related because they're all trying to solve the same problems.  A binarized SPPF is the only way to handle ambiguity in O(n^3) space.  They all have mechanisms for figuring out which parse paths have already been followed and which still need to be followed.  (Earley sets vs. graph-structured stacks.)  GLL traverses the parse rules depth-first pre-order, GLR depth-first post-or
15:45 ceridwen der, and Earley breadth-first.
15:47 Aria Yup.
15:48 djns joined #marpa
15:49 ceridwen The published versions of GLL are linear on LL(1) grammars, AFAICT---they use the standard recursive-descent approach of using the next character to decide which parse rules to follow and which to reject, so for non-LL(1) rules, they will take multiple paths and should thus be O(n^2).  I'm exploring the possibility of combining GLL with the techniques that ANTLR4 uses, using something like ATNs to predict which parse paths to follow, to make it line
15:49 ceridwen ar on all LL-regular grammars, while avoiding ANTLR4's fall-back to ordered choice.  There hasn't been much done on comparing GLL, GLR, and Earley, I would love to see more research on that.
15:50 ceridwen idiosyncrat, Can you expand on, "
15:50 ceridwen I *do* notice some nice touches -- they know enough *not* to claim constant time for hashing.
15:50 ceridwen which means their *implementation* is not constant time.
15:50 ceridwen (If anybody wonders why I spend the time describing PSL's in my paper, that's why)"
15:50 ceridwen I'm not sure I understand what you mean here.
15:55 ceridwen I think SPPF is more descriptive than bocage, via the tree-forest analogy.  Irrespective of that, I think SPPF is too long-established in the literature to change now---it was in use in the computational linguistics literature at least by the early 90s.  I don't think Scott has precedence on the idea of binarizing forests, wasn't that idea first presented in Billot and Lang's 1989 paper?
15:56 ceridwen Scott certainly gets credit for popularizing it, though, and bringing it to Earley's algorithm.
17:16 pczarn joined #marpa
17:21 idiosyncrat rns: re http://irclog.perlgeek.de/marpa/2015-09-18#i_11240066 -- translation into Russian?  That would be really nice.
17:24 idiosyncrat ceridwen: http://irclog.perlgeek.de/marpa/2015-09-18#i_11240428 -- a lot of people believe hash searches are constant time, but they are not
17:25 idiosyncrat see https://en.wikipedia.org/wiki/Hash_table
17:25 pczarn ceridwen: If you use a hash set for preventing duplicate Earley items, processing each item is something like O(log log n).
17:26 idiosyncrat They can be said to be O(1) "average" time, but that depends on how you average.
17:27 idiosyncrat The paradox of saying that a constant time algorithm is also an opening for a denial-of-service attack should strike people.
17:27 pczarn Only static hash map searches can be O(1).
17:28 idiosyncrat Many writers claim linear time for theory purposes, based on a possible but hard-to-implement linear search, and then simply throw in a hash in the actual implementation.
17:29 idiosyncrat ... which is an entirely fair thing to do.
17:29 idiosyncrat In the Libmarpa implementation, however, my searches *are* linear -- none of them are hashes.
17:29 idiosyncrat And PSLs, described in my original paper are one way I do it ...
17:30 idiosyncrat PSLs are my name, but the technique is referred to as earley as Jay Earley's paper, so it apparently was well known by then.
17:31 idiosyncrat Even so, AFAIK, my Marpa paper is the first description of the details of a PSL implementation.
17:37 idiosyncrat ceridwen: re http://irclog.perlgeek.de/marpa/2015-09-18#i_11240477 -- thanks!  Good catch
17:37 idiosyncrat ... I'd somehow come away the notion that Scott invented SPPF's -- no fault of hers, just my over-hasty reading.
17:38 idiosyncrat s/come away the notion/come away with the notion/
17:39 rns idiosyncrat: re http://irclog.perlgeek.de/marpa/2015-09-18#i_11240921 -- yes, and perhaps Ukrainian. Now I'm reading it to get comfortable with the terminology, so feel free to ping me when you feel that it's good enough to start.
17:40 idiosyncrat rns: I was thinking of releasing it in pieces, as I "finish" the pieces.
17:40 idiosyncrat I was wondering if there was enough interest to merit that, but there is, for which I am grateful.
17:41 idiosyncrat My opinion is best *not* to start actually translating until you have the entire thing in finished form.
17:41 rns My thoughts exactly.
17:42 idiosyncrat It is quite possible that changes I need in the later proofs can force a redo of earlier sections ...
17:42 idiosyncrat and this redo's, alas, can be major.
17:44 rns Ok. it'll take some time to match the terminology that can be done while you're writing as I'm going to, so it's worth to wait for the entire thing to be ready, e.g. 'at beta'.
17:46 idiosyncrat Once I'm more settled I may have suggestions -- not of course as to Russian/Ukrainian vocabulary, but my intent behind my choice of English words.
17:47 idiosyncrat In particular I teased apart some confusingly similar notions lumped together with the terms null and nulling.
17:47 rns Sure, some kind of glossary seems to be in order anyway, like in the beginning of marpa.w
17:48 rns re null and nulling -- in Russian/Ukrainian it's a bit easier as they have inflections.
17:48 idiosyncrat I think overloading of the terms null/nulling has caused trouble in the past, because the great Jay Earley's original paper has serious bug around the handling of nulling symbols.
17:49 rns It surely could.
17:49 idiosyncrat Actually right now I use null/nulling/nullable in their original sense, but only for narrow technical purposes in describing symbols.
17:49 rns yes.
17:50 idiosyncrat When an Earley item has nulls postdot, I call it fleeting, otherwise it is *lasting*.  These terms refer only to postdot behavior.
17:50 rns Ok.
17:50 rns Also, telluric/ethereal can be, well, interesting to render.
17:51 idiosyncrat When an Earley has nulls predot, I call it ethereral.  Otherwise it's telluric.  This is to emphasize that their *source* is in the behavior of null symbols.
17:52 idiosyncrat Ethereal/telluric are terms from archaic philosophy, and so suggest a pre-occupation with the nature/source/essence of things.
17:53 idiosyncrat Finally, telluric is also used instead of non-nullable, because, while I don't know about anybody else, the term "non-nullable" seems awkward and confusing to me.
17:53 idiosyncrat Telluric suggests "grounded", "of the earth" and "really there"
17:54 rns Ok. I also thought about elements, like earth and air, or rather ether.
17:54 idiosyncrat Ethereal suggests "coming out of nowhere" and being in some sense there but not really.
17:55 rns Yes. They are the terms you've introduced, aren't they?
17:55 idiosyncrat Yes, they're all new.
17:56 rns Ok.
17:56 idiosyncrat IMHO the literature tends to be over-conservative with confusing terminology.
17:58 rns Yes, Occam's razor can get overused at times.
17:59 idiosyncrat Another reason for different terms, btw, is that a symbol in Marpa is either nulling or non-nullable -- there is no such thing as an internal Marpa symbol which is nullable but not nulling.
18:00 idiosyncrat So non-nullable and nulling are in the Marpa context opposites, though they are not opposites in other contexts.
18:01 idiosyncrat IMHO "tellluric" versus "nulling/ethereal" makes this easier.
18:01 idiosyncrat The new terms highlight a number of distinctions obscured by the old ones.
18:02 idiosyncrat When you add an nulling symbol to a sentence (string of terminals) you get the identical sentence.
18:03 idiosyncrat When you add what was called a nulling symbol to a RHS, or a sentential form, the result is *not* the same -- it's a list of symbols with an additional symbol.
18:04 idiosyncrat They're actually very different concepts.
18:05 rns Ok.
18:12 rns I'm thinking about a good pair of synonyms to telluric/ethereal -- real/imaginable, material/immaterial, tangible/intangible?
18:12 rns of known origin/of unknown origin?
18:14 rns Sorry, have to go, will backlog. AFK
18:14 rns left #marpa
19:02 CQ joined #marpa
19:45 idiosyncrat rns: re http://irclog.perlgeek.de/marpa/2015-09-18#i_11241346 -- I rejected all of these in English because of overloading and confusion, and they don't really capture the idea -- ether was thought to be real and material and if not tangible, no less so than a lot of stuff that's important in physics.
19:46 idiosyncrat I'd think the language of Mendeleev would have a term for ether.
19:47 idiosyncrat As for telluric, "mundane" was an alternative, but I preferred telluric because it is part of the set:
19:47 idiosyncrat chthonic, telluric, ethereal
19:48 idiosyncrat Nothing about Marpa is especially chthonic, so I only use two of the triple. :-)
19:49 idiosyncrat The terms come from alchemy, I believe.
19:55 rns joined #marpa
20:03 rns idiosyncrat: re http://irclog.perlgeek.de/marpa/2015-09-18#i_11241989 -- :)) good catch.
20:04 rns And yes, AFAICT, Marpa is rather mercurial. :)
20:04 rns left #marpa
22:26 ceridwen idiosyncrat, In which paper do you discuss PLS, and/or what does the abbreviation stand for?
23:30 idiosyncrat_ joined #marpa
23:30 idiosyncrat_ ceridwen: https://www.academia.edu/10341474/Marpa_A_practical_general_parser_the_recognizer
23:31 idiosyncrat_ Whenever I say "the Marpa theory paper" that'
23:31 idiosyncrat_ s what is being referred to.
23:32 idiosyncrat_ PSL means "per-set list" -- they are probably not helpful in a context where there are no Earley sets.

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary