
IRC log for #marpa, 2015-05-19


All times shown according to UTC.

Time Nick Message
02:51 ronsavage joined #marpa
03:10 koo6 joined #marpa
03:18 rns idiosyncrat: re location objects -- thanks, note taken.
03:33 ceridwen idiosyncrat: I didn't mean to be rude or pick nits---I was obliquely wondering aloud if you saw something that I didn't about the memory implications for Marpa.  I should have been clearer.  And yes, time complexity is a completely different matter.  Speaking of which, I kept thinking about this question of binarizing grammars.
03:41 ceridwen The obvious case of relevance to parsing is binarizing SPPFs to reduce their complexity from unbounded polynomial to O(n^3).  Outside of that and CYK, I can't think of anything else in the CFG world that's concerned with binarization---I know there are some techniques for parsing subsets of the CSGs that use binarization.  I'd expect grammar binarization to affect the size of the grammar more than the speed of the parser.
03:43 ceridwen Functional languages traditionally compile patterns to automata.
03:47 ceridwen And there are two standard approaches, one which aims to prevent blowup of code size, the other of which aims to ensure linear time dependence on the input size (the object to be matched against).  Binarization can hurt performance there, it turns out.
03:47 ceridwen Anyway, I'd like to hear more about whatever you come up with.
03:48 ceridwen I've been trying to figure out how best to implement parser combinators lately.
03:51 ceridwen And traditionally, sequence and alternation combinators are binarized.  For the algorithm I'm looking at (GLL), there's no good reason to binarize the alternation combinator and a very good reason not to.  For the sequence combinator, it's a lot less clear---I've been trying to figure out if binarizing sequence will make it easier to construct the SPPF (which does have to be binarized) and if so what the performance implications of doing that are.
04:31 idiosyncrat joined #marpa
04:32 idiosyncrat ceridwen: I will admit to being slightly sensitive on the matter of optimization
04:32 idiosyncrat The 3 temptations of Marpa are ignoring speed, multi-threading, and object-orientation
04:33 idiosyncrat All good things in themselves, but they are taken way too far.
04:34 idiosyncrat If allowed to, they engulf new projects (not just Marpa) and cause hours of talented programmer time to go completely down the drain.
04:34 ceridwen I agree on all counts.
04:34 ceridwen I think that the Python community often has a too-cavalier attitude towards speed.
04:35 ceridwen Though that seems to be turning around some.
04:35 idiosyncrat And if you look at application software, it's now very slow to do simple things even on powerful chips.
04:35 idiosyncrat Layer after layer of software written by folks who were drilled into ignoring speed.
04:36 idiosyncrat Though I will reiterate that I can't remember the last time I rewrote code for speed, as opposed to time complexity, without profiling.
04:37 idiosyncrat I've always profiled, had another reason for the change, or actually been after reducing time complexity.
04:38 idiosyncrat Anyway, your other questions ...
04:39 idiosyncrat Binarization -- Marpa uses SPPFs -- I invented them independently of, but after, E. Scott.
04:39 ceridwen I'm so annoyed by the slowness of existing Python parser libraries I decided I needed to write my own, and parsing isn't exactly rocket science...
04:39 idiosyncrat I call them bocage, but they're almost exactly the same thing.
04:40 idiosyncrat Often ignored is that your parser's output has to go through your evaluation logic
04:40 ceridwen Parse forests go back a ways, but I haven't been able to trace their history very well.
04:40 idiosyncrat So if the evaluator is binarized and the parser is not, there's a conversion.
04:41 idiosyncrat Was it you who pointed me to a paper on Earley's with regular expressions the other day?
04:42 ceridwen Yes, I think so.
04:42 ceridwen I think Scott was the first one to propose binarization in the literature to solve the unbounded-polynomial time problem.
04:42 ceridwen Or at least the first I've found.
04:42 idiosyncrat Who were the authors?
04:43 ceridwen Jim and Mandelbaum, I think.
04:43 idiosyncrat Right, Ok
04:43 idiosyncrat I read Jim & Mandelbaum carefully some time ago.
04:43 ceridwen By "evaluator" what do you mean?
04:43 idiosyncrat And there's a lot in it about reducing the number of states.
04:43 ceridwen Yes.
04:44 idiosyncrat Evaluator: Earley's does not actually produce a parse tree, just Earley sets -- a parse tree has to be extracted from those.
04:44 ceridwen Right.
04:45 idiosyncrat That, and all the stuff to iterate over and traverse the parse trees, is what I am calling the "evaluator" in this context
04:45 idiosyncrat Anyway, Jim&Mandelbaum do not address the issue of evaluation --
04:46 idiosyncrat and I cannot imagine any way of using their method without having to do as much or more work unraveling their optimization in the evaluator than it saves in the recognizer.
04:47 idiosyncrat Because in the papers on Earley's they often talk about the recognizer in isolation, but in real life (with important exceptions) nobody parses ....
04:47 ceridwen Yes.
04:47 idiosyncrat unless they plan to construct a parse tree and evaluate it in some way.
04:48 idiosyncrat That is, there is (almost) always a semantics of interest.
04:49 idiosyncrat Much of my work with Leo's modification was figuring out how to undo his optimization in the evaluator without losing its benefits.
04:50 idiosyncrat All of which is to try to make the point that binarizing Libmarpa leads to complicated tradeoffs, which I have yet to assess ...
04:50 idiosyncrat A binarized grammar will be great (I hope, anyway) when it gets to the SPPF/bocage in Marpa.
04:51 idiosyncrat But it does mean the recognizer is dealing with lots of extra rules and symbols.
04:51 idiosyncrat Which on the other hand now fall into a very limited set of patterns.
04:52 idiosyncrat My plan for deciding -- I'm going to have to get into the Libmarpa code again for other reasons, and at that point I'll be able to get a feel for the payoffs of binarization.
04:53 ceridwen Yes.
04:54 idiosyncrat ceridwen: I think that answers your questions (and perhaps even goes on and on a bit)
04:55 ceridwen Yes, it does.  Thanks.
04:57 ceridwen As I mentioned, I'm sorting out similar issues myself, so when you do start working on it, I'm interested in what you decide to do.
04:57 idiosyncrat It'll be a while ...
04:57 idiosyncrat right now Kollos will have a strange setup ... I've changed Libmarpa to limit rules to 7 RHS symbols.
04:58 idiosyncrat But the layer above it in Lua (the KLOL) will actually binarize.
04:58 idiosyncrat This way I'll be able to assess some of the trouble I might be getting into ...
04:59 idiosyncrat without the commitment of changing Libmarpa's C code.
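
The KLOL's Lua code does not appear in this log, so the following is a rough illustration only, sketched in Perl with made-up names: binarizing a rule rewrites a long right-hand side into a chain of rules with at most two RHS symbols each, introducing new internal symbols along the way.

    use strict;
    use warnings;

    # Illustration only (not Kollos/KLOL code): rewrite A ::= X1 X2 ... Xn
    # into a chain of rules with at most two RHS symbols each.
    sub binarize_rule {
        my ( $lhs, @rhs ) = @_;
        my @rules;
        my $current_lhs = $lhs;
        my $gensym      = 0;
        while ( @rhs > 2 ) {
            my $first    = shift @rhs;
            my $internal = sprintf '%s_bin%d', $lhs, ++$gensym;
            push @rules, [ $current_lhs, $first, $internal ];
            $current_lhs = $internal;
        }
        push @rules, [ $current_lhs, @rhs ];
        return @rules;
    }

    for my $rule ( binarize_rule( 'A', qw(B C D E F) ) ) {
        my ( $lhs, @rhs ) = @{$rule};
        print "$lhs ::= @rhs\n";
    }
    # Prints:
    # A ::= B A_bin1
    # A_bin1 ::= C A_bin2
    # A_bin2 ::= D A_bin3
    # A_bin3 ::= E F

Even in this toy, one five-symbol rule becomes four rules and three new internal symbols, which is the "lots of extra rules and symbols" the recognizer would have to carry.
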
05:00 idiosyncrat Anyway it's 10PM California time, and I should wind down.
05:00 idiosyncrat My apologies if I get into a bit of a rant, ...
05:00 ceridwen I understand.
05:01 idiosyncrat but anytime one of the Three Temptations of Marpa arises, I get very worried. :-)
05:02 ceridwen Yes.  I briefly thought about parallelization, then I realized it's simply not a very good fit for parsing.
05:03 ceridwen Neither is object-orientation, for that matter.
08:34 KotH joined #marpa
08:34 KotH a wonderful good morning!
08:45 DrForr Indeed.
08:50 * KotH tries to wrap his head around how to build a line-based grammar
08:51 KotH is there a canonical way to separate G1 bits by ^ and $ ?
09:08 koo6 joined #marpa
09:42 rns KotH: perhaps you can use \n-separated sequence, e.g. https://gist.github.com/rns/f10b43c1953112e6adad
09:44 rns Not sure I got the question right though.
09:47 KotH rns: hmm.. that might work
09:47 KotH rns: thanks
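
The gist itself is not reproduced in the log. Below is a minimal sketch of the same idea, a newline-separated sequence in the Marpa::R2 SLIF; the symbol names, adverbs, and sample input are illustrative and not taken from the gist.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Marpa::R2;

    # A file modeled as a newline-separated sequence of Line lexemes.
    my $dsl = q{
        :default ::= action => [values]
        lexeme default = latm => 1

        Lines   ::= Line+ separator => newline proper => 0
        Line    ~ [^\n]+
        newline ~ [\n]
    };

    my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
    my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

    my $input = "first line\nsecond line\nthird line\n";
    $recce->read( \$input );

    my $value_ref = $recce->value();
    die "No parse" if not defined $value_ref;

    # With the [values] default action, the parse value is an array ref
    # holding each Line's literal text.
    print join( ' | ', @{ ${$value_ref} } ), "\n";

Here proper => 0 allows a trailing newline at the end of the input; each Line's value defaults to its literal text, so the top-level value is simply the list of lines.
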
09:47 * KotH still has not understood marpa at all
09:48 * rns been there too :)
09:49 rns Feel free to ask questions, I think.
09:49 KotH well, i started yesterday, after i failed to parse a file with regexp :)
09:50 KotH hmm.. why does
09:50 KotH ObjectiveSection ::= ObjectiveName Line
09:50 KotH work, but
09:50 KotH ObjectiveSection ::= ObjectiveName Line+
09:50 KotH throws an error?
09:51 rns because Line+ is a sequence (single alternative) rule.
09:52 rns try
09:52 rns ObjectiveSection ::= ObjectiveName Lines
09:52 rns Lines ::= Line+
09:52 KotH so i can use + only on single rules?
09:52 KotH might make sense...
09:52 KotH thanks
09:54 rns Quantifiers, like + and *, form a special kind of rule -- https://metacpan.org/pod/distribution/Marpa-R2/pod/Scanless/DSL.pod#Quantified-rule
09:54 rns Sequences need their own symbol to refer to, so to speak.
09:56 rns their own left-hand side (LHS) symbol, as Lines above, to be more precise.
10:39 KotH ah.. ok
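
A short sketch of the fix rns describes, reusing the rule names from the conversation; the lexeme definitions and the :discard rule are placeholders added only so that the grammar compiles.

    use strict;
    use warnings;
    use Marpa::R2;

    # The SLIF rejects a quantifier attached to one symbol of a longer RHS:
    #
    #   ObjectiveSection ::= ObjectiveName Line+    # DSL error
    #
    # Giving the sequence its own LHS is accepted:
    my $dsl = q{
        :default ::= action => [values]
        lexeme default = latm => 1

        ObjectiveSection ::= ObjectiveName Lines
        Lines            ::= Line+

        # Placeholder lexeme definitions:
        ObjectiveName    ~ name_chars [\:]
        name_chars       ~ [A-Za-z]+
        Line             ~ [^\n]+
        :discard         ~ newline
        newline          ~ [\n]
    };

    # Constructing the grammar is enough to confirm the DSL compiles.
    my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
    print "grammar accepted\n" if $grammar;
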
11:48 KotH rns: btw: why did you add a "use 5.010" in the example you gave me?
12:09 rns Just boilerplate -- the minimum Perl version required for Marpa::R2 to work.
12:12 KotH ah
12:12 * KotH ignores that kind of stuff mostly if debian/stale has a recent enough version of perl :)
12:13 KotH i think the last time i actually cared about the perl version was with 5.6 or 5.8 when i needed some unicode stuff
13:56 idiosyncrat joined #marpa
14:15 idiosyncrat rns: re http://irclog.perlgeek.de/marpa/2015-05-19#i_10625468 -- thanks for answering.
14:15 idiosyncrat KotH: you may wonder why the SLIF requires quantifiers to be separate statements.
14:16 idiosyncrat It's different from regular expressions, which everyone is accustomed to, and seems awkward.
14:17 idiosyncrat The reason is that, with a parser, you expect to attach a semantics to all the rules -- with regular expressions there is no such expectation.
14:18 idiosyncrat So if Marpa allowed "A ::= B+ (C | D) exp*", how would you specify how the arguments are passed to the semantics?
14:18 idiosyncrat If just as a list, they have to be re-parsed all over again.
14:19 idiosyncrat This problem can be solved, but the easiest way around it is to write it out as separate statements, with the quantifiers by themselves, and the semantics can then be attached to the statements.
14:21 KotH ah
14:21 KotH that makes so much sense, it should be obvious ^^'
14:22 idiosyncrat Btw, in Kollos, the next interface to Marpa, we *will* allow statements like "A ::= B* X Y* Z"
14:23 idiosyncrat But for the reasons I've explained, users will typically find that they just get themselves into trouble that way, and will find it best to have quantified rules be separate rules, all by themselves
14:24 idiosyncrat with separate semantics
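
A small sketch of the point about semantics: once the quantified rule is a separate statement, an action can be attached directly to it, and that action receives the sequence items as its arguments. The action names and the sample input below are illustrative.

    use strict;
    use warnings;
    use Marpa::R2;

    my $dsl = q{
        lexeme default = latm => 1

        Doc   ::= Lines                       action => main::do_doc
        Lines ::= Line+ separator => newline  action => main::do_lines
        Line    ~ [^\n]+
        newline ~ [\n]
    };

    # Because the sequence is its own rule, its semantics are unambiguous:
    # do_lines receives exactly the Line values, one argument per item.
    sub do_lines { my ( undef, @lines ) = @_; return scalar @lines }
    sub do_doc   { my ( undef, $count ) = @_; return "doc with $count line(s)" }

    my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
    my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    my $input   = "alpha\nbeta\ngamma";
    $recce->read( \$input );
    my $value_ref = $recce->value();
    die "No parse" if not defined $value_ref;
    print ${$value_ref}, "\n";    # prints "doc with 3 line(s)"
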
15:28 rns idiosyncrat: re http://irclog.perlgeek.de/marpa/2015-05-19#i_10626588 -- totally welcome.
18:06 lwa joined #marpa
19:51 idiosyncrat joined #marpa
22:54 ronsavage joined #marpa
23:03 ronsavage I've added 6 lines from http://irclog.perlgeek.de/marpa/2015-05-19#i_10626590 to FAQ # 15. I'll fix # 23 (Moo/new/hashes) too. Just tell me where in the docs to cross-check what I should say.
23:04 ronsavage KotH: Have you read the FAQ :-)? It's at http://savage.net.au/Perl-modules/html/marpa.papers/.
23:06 KotH ronsavage: no, i haven't stumbled over that yet
23:06 KotH ronsavage: thanks!
