Perl 6 - the future is here, just unevenly distributed

IRC log for #marpa, 2015-12-22


All times shown according to UTC.

Time Nick Message
00:51 Idiosyncrat joined #marpa
03:01 VsyachePuz joined #marpa
04:19 ronsavage joined #marpa
05:15 Idiosyncrat joined #marpa
06:17 ronsavage Hey! I've just signed off $work for the year. Now, .....
06:19 ronsavage And no, I'm /not/ full of Xmas cheer (yet).
07:19 koo7 joined #marpa
16:00 JPGainsborough joined #marpa
17:08 Idiosyncrat joined #marpa
17:11 Idiosyncrat http://www.antlr.org/papers/LL-star-PLDI11.pdf
17:11 Idiosyncrat I went back and looked at the ANTLR article for complexity claims.
17:12 Idiosyncrat It's a little hard to unpack, but I think Parr&Fisher do not claim better than exponential.
17:13 Idiosyncrat They do claim linear if you turn on memoization and change to a PEG strategy.
17:14 Idiosyncrat What they try to do is to use the best available top-down strategy for each grammar, ramping up gradually.
17:16 Idiosyncrat Left recursion is not handled.
17:17 Idiosyncrat The claim of "linear in practice" is a bit circular -- it's likely that if your grammar is linear for any *top-down* parser,
17:17 Idiosyncrat it is linear for ANTLR.
17:18 Idiosyncrat So if parsing "practice" to you means top-down (which it does for Parr&Fisher and much of the profession these days),
17:19 Idiosyncrat then that claim is viable.
17:20 Idiosyncrat But certainly in Marpa practice there are lots of grammars people can and do use which are not linear for ANTLR.
17:20 ceridwen Idiosyncrat, Any specific examples you can share?
17:20 Idiosyncrat Examples of what?
17:22 ceridwen Marpa grammars that are not linear for ANTLR.
17:23 Idiosyncrat The one at the start of section 2 in Parr&Fisher, which they report goes exponential in certain cases for ANTLR.
17:23 ceridwen A simple top-down parser with memoization is worst-case polynomial.
17:24 ceridwen So I'd be surprised if ANTLR is worst-case exponential?
17:24 Idiosyncrat I believe it is linear for Marpa.
17:25 Idiosyncrat The SLIF's grammar is ambiguous, so I doubt it is linear for ANTLR
17:25 Idiosyncrat Also, anything left recursive.
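
(A minimal sketch, in Python with invented rule names, of why naive top-down parsing cannot handle a left-recursive rule: the parser calls itself at the same input position before consuming anything.)

    def expr(text, pos):
        # expr : expr '+' NUM | NUM   -- left recursive
        # Naive top-down tries the left-recursive alternative first:
        end = expr(text, pos)   # same rule, same position: infinite regress
        if end is not None and end < len(text) and text[end] == '+':
            return num(text, end + 1)
        return num(text, pos)

    def num(text, pos):
        return pos + 1 if pos < len(text) and text[pos].isdigit() else None

    # expr('1+2', 0) raises RecursionError: no input is consumed before the
    # recursive call.  An Earley-family parser such as Marpa accepts this
    # grammar directly and parses it in linear time.
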
17:27 Idiosyncrat Where's the "worst-case polynomial" result from? It's not in Parr&Fisher.
17:28 Idiosyncrat Perhaps it's not for general top-down with backtracking, but for PEG?
17:33 ceridwen http://www.aclweb.org/anthology/J91-1004
17:34 ceridwen This is where I've seen the result cited, I haven't gone over it in detail myself.
17:34 ceridwen PEGs of course are linear with memoization.
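
(A minimal sketch of that claim, using an invented one-rule grammar: memoizing results per (rule, position) means each rule body runs at most once per input position, so for a fixed PEG the total work is linear in the input length.)

    memo = {}

    def rule_a(text, pos):
        # A : 'a' A | 'a'   (ordered choice, as in a PEG)
        key = ('A', pos)
        if key in memo:                      # each position computed once
            return memo[key]
        result = None
        if pos < len(text) and text[pos] == 'a':
            rest = rule_a(text, pos + 1)     # first alternative: 'a' A
            result = rest if rest is not None else pos + 1   # else just 'a'
        memo[key] = result
        return result

    assert rule_a('aaaa', 0) == 4    # one memo entry per position: linear
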
17:42 Idiosyncrat The Norvig claim is for a special technique, one not used in Parr&Fisher
17:43 Idiosyncrat And not a general claim for simple top-down parsers w/ memoization.
17:43 ceridwen I agree---I should have said, "Can be worst-case polynomial."  I'm not sure what ANTLR 3's actual worst-case is.
17:45 ceridwen In the later paper http://www.antlr.org/papers/allstar-techreport.pdf, they prove an O(n^4) bound for ANTLR 4.
17:45 Idiosyncrat "We  should  point  out  that  without  memoization,  back-
17:45 Idiosyncrat tracking parsers are exponentially complex in the worst case"
17:45 ceridwen They have optional memoization, but they don't clarify how exactly they're memoizing.
17:46 Idiosyncrat From Parr&Fisher, last graf in section 6
17:46 ceridwen So it could be polynomial-bounded, but we simply don't know without digging through the code.
17:47 ceridwen "Backtracking in
17:47 ceridwen versions of ANTLR prior to 3.x suffered from exponential
17:47 ceridwen time complexity without memoization. Ford also solved this
17:47 ceridwen problem by introducing packrat parsers. ANTLR 3.x users
17:47 ceridwen can turn on memoization with an option."
17:49 ceridwen That's from the second-to-last paragraph in section 7, same paper.
17:49 Idiosyncrat Right, ANTLR 3 was linear if you switched to PEG mode; otherwise exponential in the worst case.
17:50 ceridwen Okay, yes, I think we agree.  (I found some of these papers a bit confusing too.)
17:50 Idiosyncrat Not exactly the clearest way of stating it, but I think that's what they are saying.
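
(For concreteness, a sketch of the blow-up being quoted, with an invented expression grammar: under ordered choice with no memo table, each failed alternative re-parses the same subterm from scratch, so deeply nested input costs on the order of 3^depth rule invocations. Memoizing T by position restores linearity, which is exactly the packrat option.)

    calls = 0

    def E(text, pos):
        # E : T '+' E | T '-' E | T   (ordered choice, no memoization)
        for op in ('+', '-'):
            end = T(text, pos)               # re-parsed for every alternative
            if end is not None and end < len(text) and text[end] == op:
                tail = E(text, end + 1)
                if tail is not None:
                    return tail
        return T(text, pos)                  # third full parse of T

    def T(text, pos):
        # T : '(' E ')' | 'n'
        global calls
        calls += 1
        if text.startswith('(', pos):
            end = E(text, pos + 1)
            if end is not None and text.startswith(')', end):
                return end + 1
            return None
        return pos + 1 if text.startswith('n', pos) else None

    depth = 10
    E('(' * depth + 'n' + ')' * depth, 0)
    print(calls)                             # grows like 3**depth
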
17:50 Idiosyncrat Thanks for pointing out the ANTLR 4 paper ....
17:51 Idiosyncrat this does seem to have a non-exponential worst case
17:52 ceridwen They definitely changed approaches for ANTLR 4.
17:52 ceridwen Yes.
17:54 Idiosyncrat If I were convinced that top-down was as good as you could do ...
17:54 Idiosyncrat I would either study ANTLR very closely, or ...
17:55 Idiosyncrat just give up, and refer everybody to Parr's work. :-)
18:01 Idiosyncrat Also, the layout of the ANTLR 4 paper is better -- you can spot his claims w/ a quick scan -- in Parr&Fisher it's an exercise in decipherment.
18:03 ceridwen I'm choosing the former, at the moment :).  I don't actually believe top-down is as good as you can do, for some values of "as good."  I think there are still questions I haven't seen answered to my satisfaction, such as what the constant factors are for the various algorithms compared to each other.  (I mean, we know that Valiant's algorithm with Coppersmith-Winograd has the best asymptotic time for some highly-ambiguous inputs, but the constant factors are so bad...)
18:06 Idiosyncrat Actually, do we really *know* that about Valiant ...
18:06 Idiosyncrat I mean that is what everybody says, but I don't think there has been much in the way of trying to make Valiant practical.
18:07 Idiosyncrat Because of course, just quickly writing the algorithm off because the constant factors are hopeless is exactly what was done with Earley's.
18:07 ceridwen I've wondered about that myself.
18:08 Idiosyncrat And they stuck to that "bad constant factor" line as CPU/$$ improved by a factor of a billion or so.
18:08 ceridwen At least for Coppersmith-Winograd, I think the arguments still hold (it only wins for matrices larger than current hardware can hold).
18:09 Idiosyncrat I've never really studied Valiant/Coppersmith/Winograd, but I would not take the textbook assertions about bad constant factors overly seriously.
18:10 Idiosyncrat I did see a paper about Valiant's algorithm some months ago, where they did try to implement, and found the situation not to be as bad as feared.
18:11 ceridwen Oh, which paper?
18:11 Idiosyncrat I'll try to find it.
18:11 ceridwen The only discussion I've found is in https://www.reddit.com/r/haskell/comments/3b4ztr/a_neat_way_of_doing_nongreedy_parsing_with_parsec/csjgfnb , and they cite this paper: http://www.cse.chalmers.se/~bernardy/PP.pdf
18:15 Idiosyncrat http://www.cse.chalmers.se/~bernardy/PP.pdf
18:15 Idiosyncrat Ah, you beat me to it.
18:18 Idiosyncrat I had grave doubts about one aspect of the paper, which talked about the complexity bounds on the assumption that there were limits on recursive depth ...
18:19 Idiosyncrat which I cannot convince myself is a legitimate approach ...
18:19 Idiosyncrat it seems to me a bit like doing number theory on the assumption there is a maximum integer.
18:20 Idiosyncrat That aside, I found their stuff about Valiant's intriguing and thought they were very much to be praised for going beyond the quick "bad constant factor" dismissal
18:24 ceridwen Thanks :).  I haven't gotten a chance to read it yet, but I will have to do so.  My personal suspicion is that if there's something to be gained from Valiant's algorithm, it's going to be from applying changes in hardware capabilities like parallelization or vectorization.
18:25 ceridwen I haven't had much time to put into it, but I do wonder.
18:26 ceridwen Of course, other algorithms like GLL or GLR could also potentially benefit from parallelization.
18:26 Idiosyncrat In the reddit discussion that you cite, edwardkmett seems to suggest that the parallelization is a distraction.
18:26 ceridwen Yes.
18:27 Idiosyncrat And reading the paper, I even wondered if that aspect had been played up to get the paper accepted.
18:27 ceridwen It's quite possible there's nothing there.
18:27 Idiosyncrat Parallelization is fashionable, and Valiant's is not.
18:29 ceridwen I agree that in general the importance of parallelization has been overstated---I haven't studied this particular problem enough to say one way or the other here.  For parsing in general, my take is similar to Grune and Jacobs': parsing is not really a naturally parallel problem, and the many attempts so far haven't come up with anything amazing.
18:30 ceridwen There's been a truly massive amount of work in creating software and hardware for fast matrix multiplication, though, which does make me wonder.
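
(A toy illustration of that connection, assuming nothing about the papers' actual code: in CYK recognition, the combination step for a rule A -> B C is exactly a Boolean matrix product, which is the bridge Valiant uses to inherit the speed of fast matrix multiplication.)

    import numpy as np

    def cyk_recognize(tokens, terminal_rules, binary_rules, start):
        # terminal_rules: {terminal: set of nonterminals}
        # binary_rules:   (A, B, C) triples for CNF rules A -> B C
        n = len(tokens)
        nts = {s for r in binary_rules for s in r} | \
              {a for vs in terminal_rules.values() for a in vs}
        # chart[A][i, j] is True iff nonterminal A derives tokens[i:j]
        chart = {a: np.zeros((n + 1, n + 1), dtype=bool) for a in nts}
        for i, tok in enumerate(tokens):
            for a in terminal_rules.get(tok, ()):
                chart[a][i, i + 1] = True
        for _ in range(n):                   # naive fixpoint; Valiant's trick is
            for a, b, c in binary_rules:     # one divide-and-conquer closure here
                prod = chart[b].astype(np.uint8) @ chart[c].astype(np.uint8)
                chart[a] |= prod.astype(bool)   # Boolean matrix multiplication
        return chart[start][0, n]

    # Balanced parentheses in CNF:  S -> L R | L X ;  X -> S R ;  L -> '(' ;  R -> ')'
    terms = {'(': {'L'}, ')': {'R'}}
    rules = [('S', 'L', 'R'), ('S', 'L', 'X'), ('X', 'S', 'R')]
    print(cyk_recognize(list('(())'), terms, rules, 'S'))   # True
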
18:38 ceridwen It's interesting to me that the old algorithms for parsing LL-regular grammars use two passes, the first reading from right to left.
18:45 ceridwen That supports your theory that you have to use the right context :).
20:28 ernimril I think it is very important to be able to parse several sources in parallel, but that is not the same as making the parser parallel. It may require some things (like the grammar) to be thread safe, but I think that is how it mostly ends up anyway.
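
(A sketch of ernimril's point, with invented stand-ins: many sources parsed concurrently against one shared, read-only grammar. Only per-parse state is mutated, so the grammar itself needs no locking.)

    from concurrent.futures import ThreadPoolExecutor

    # Stand-in for a compiled grammar: built once, then treated as immutable,
    # which is what makes it safe to share across parser threads.
    GRAMMAR = {'digit': frozenset('0123456789')}

    def parse_source(text):
        # Toy per-source "parse": all mutable state is local to this call.
        return sum(ch in GRAMMAR['digit'] for ch in text)

    sources = ['a1b2', '345', 'no digits here']
    with ThreadPoolExecutor() as pool:
        print(list(pool.map(parse_source, sources)))   # [2, 3, 0]
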
21:43 ronsavage joined #marpa
