Perl 6 - the future is here, just unevenly distributed

IRC log for #marpa, 2015-05-21

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
00:13 idiosyncrat joined #marpa
00:15 idiosyncrat ceridwen: I don't recall specifics, but maybe in Scott's paper: http://dinhe.net/~aredridel/.notmine/PDFs/SPPF-Style%20Parsing%20From%20Earley%20Recognizers.pdf
00:15 idiosyncrat Note that it is an aspect of the traditions/rhetorical of the theoretical literature that they are not a lot of result for post-parsing processing, period.
00:17 idiosyncrat Sometimes the assertion is made that the post-processing does not add to the time complexity -- sometimes even that is skipped.
00:20 idiosyncrat Messed up the sentence before last badly, I'll repeat it:
00:22 idiosyncrat The tradition (= rhetorical convention) of the literature on theory of parsing is to not spend a lot of effort on time/space complexity results for the post-processing after the parsing phase.
00:23 idiosyncrat The reason for this, I think, is that the pioneers put the initial emphasis where it belongs -- you have to get the parsing algorithm and its complexity down, or you really cannot meaningfully
00:23 idiosyncrat ... look at anything else.
00:25 idiosyncrat But in the decades afterwards, due to loss of interest and ossification of the methodology, the widening of the inquiry did not happen.
00:36 ronsavage joined #marpa
02:11 ceridwen joined #marpa
03:43 rns ceridwen: re http://irclog.perlgeek.de/marpa/2015-05-20#i_10633906 -- interesting question, a couple of quotes I've seen:
03:43 rns "it can be done in a space whose size is at most proportional to the third power of the length of the input string" -- Grune & Jacobs 2nd ed., 3.7.2, p.87
03:43 rns "there is a shared forest structure with at most cubic size for any CF grammar and any sentence in its language" -- http://bat8.inria.fr/~lang/papers/acl89/RR-1038.pdf
03:44 rns Couldn't find a formal proof in either (the latter contains what I'd call "proof by construction")
03:48 ceridwen Yeah, I was looking at the later paper today, which everyone seems to cite, but like you say, it doesn't have a formal proof in it.
04:15 idiosyncrat I'm skimming, so forgive me if I'm glossing over difficulties.
04:17 idiosyncrat I'd regard the O(n**3) result as pretty obvious, because the space in Earley's is O(n**3) and the Earley sets are a kind of parse forest --
04:21 idiosyncrat joined #marpa
04:21 idiosyncrat I am having Internet troubles
04:22 idiosyncrat and the rest of my reply disappeared
04:23 idiosyncrat Having written the code to turn Earley sets into parse forests, I can tell  you that the main difference is that information must be thrown away.
04:23 idiosyncrat The Earley sets may include many things which do not wind up in the final parse
04:23 idiosyncrat and do not belong in the parse forest.
04:26 idiosyncrat Skimming the Billot & Lang paper, perhaps that's what they are doing -- they generalize parsing as a PDT
04:26 idiosyncrat and then simulate the PDT with Earley's
04:27 idiosyncrat This would be why no direct proof of the result -- if you show what you've produced are essentially Earley sets you can use Earley's result.
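idiosyncrat's intuition above -- that the Earley sets are already a kind of cubic-size forest -- can be sketched with a minimal Earley recognizer in Python. There are n+1 sets; each set holds O(n) distinct dotted items for a fixed grammar; and completion can link an item back to O(n) parents, which is where the O(n**3) bound lives. The grammar, the tuple encoding of items, and the input are illustrative only (not Marpa's code), and nullable rules are ignored to keep the sketch short:

```python
# Minimal Earley recognizer: an illustrative sketch, not Marpa's code.
# An item is (lhs, rhs, dot, origin).  There are n+1 sets, each with
# O(n) distinct items for a fixed grammar, and completion can link an
# item back to O(n) parent items -- the source of the O(n**3) bound.
# Nullable rules are not handled, to keep the sketch short.

GRAMMAR = {  # S -> S '+' S | 'a'  (ambiguous on purpose)
    "S": [["S", "+", "S"], ["a"]],
}

def earley_sets(start, words):
    sets = [set() for _ in range(len(words) + 1)]
    for rhs in GRAMMAR[start]:
        sets[0].add((start, tuple(rhs), 0, 0))
    for i in range(len(words) + 1):
        work = list(sets[i])
        while work:
            lhs, rhs, dot, origin = work.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in GRAMMAR:  # predict
                    for prod in GRAMMAR[sym]:
                        item = (sym, tuple(prod), 0, i)
                        if item not in sets[i]:
                            sets[i].add(item)
                            work.append(item)
                elif i < len(words) and words[i] == sym:  # scan
                    sets[i + 1].add((lhs, rhs, dot + 1, origin))
            else:  # complete: link back to every waiting parent item
                for p_lhs, p_rhs, p_dot, p_origin in list(sets[origin]):
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        item = (p_lhs, p_rhs, p_dot + 1, p_origin)
                        if item not in sets[i]:
                            sets[i].add(item)
                            work.append(item)
    return sets

sets = earley_sets("S", list("a+a+a"))
# Recognized iff a completed start rule spans the whole input.
done = any(lhs == "S" and dot == len(rhs) and origin == 0
           for lhs, rhs, dot, origin in sets[-1])
print(done)  # True
```

Throwing away the items that never feed a completed start rule is, as described above, the main extra step in turning these sets into a proper parse forest.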
04:27 ceridwen Do you have an intuition why going from a general tree to a binary tree reduces the complexity from unbounded polynomial to cubic?
04:27 ceridwen Right.
04:27 idiosyncrat Doesn't the article say that is a fallacy?
04:28 ceridwen That's not how I read it.
04:29 ceridwen On p. 2: "A drawback is that this structure may be cubic in the length of the parsed sentence, and more generally polynomial for some proposed algorithms."  Here they're talking about Tomita's original GLR paper.
04:31 idiosyncrat_ joined #marpa
04:31 idiosyncrat_ Page 2 footnote 4
04:32 * idiosyncrat_ is still having Internet troubles
04:35 ceridwen They prove that all CFGs have a worst-case-cubic parse forest, but not that all parse forests have worst-case-cubic space complexity (because they don't).  What do binary forests have that general parse forests don't?
04:36 jeffreykegler joined #marpa
04:41 ceridwen Related question, maybe for when your Internet problems resolve: for worst-case-cubic-complexity, do all nodes have to have only two children, only "or" nodes, or only "and" nodes?  (In their terminology.)
04:43 rns ceridwen: to be precise, they prove that all CFGs have worst-case-cubic _shared_ parse forests.
04:43 ceridwen Yes.
04:45 rns And *all* _shared_ parse forests do have cubic space complexity -- if by 'all' we mean 'of all sentences in a CFG'.
04:46 jeffreykegler joined #marpa
04:46 jeffreykegler Yes OK.  The footnote I cited says that the *grammar* does not have to be binarized.  It does not say anything about whether the tree must be binarized.
04:47 ronsavage joined #marpa
04:47 jeffreykegler But it makes sense to me that it helps to binarize the tree -- that it saves space.
04:47 jeffreykegler If you don't binarize, then you group several choice points together, and they go exponential.
04:48 jeffreykegler Binarization separates each choice point.
04:49 jeffreykegler I hope that conveys the intuition -- if you've fooled with these trees long enough the difference between binarization and not becomes pretty clear.
04:51 jeffreykegler As for a proof that you *have* to binarize the tree to obtain O(n**3), I leave that as an exercise to the reader. :-)
04:52 jeffreykegler Actually, I wouldn't know how to prove that -- I've got an O(n**3) algorithm, and hand-wavy objections to the non-binarized approach, and for my purposes,
04:52 jeffreykegler that suffices.
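The binarization being described -- splitting every long right-hand side so that each node in the shared forest records exactly one choice point -- can be sketched like this; the `#`-numbered intermediate symbols are invented for the example:

```python
# Sketch of rule binarization: rewrite A -> X1 X2 ... Xn into a chain
# of rules with at most two symbols on each right-hand side.  The
# "base#n" naming of synthetic symbols is an assumption for this sketch.

def binarize(rules):
    out = []
    for lhs, rhs in rules:
        base, n = lhs, 0
        while len(rhs) > 2:
            n += 1
            aux = f"{base}#{n}"  # synthetic intermediate symbol
            out.append((lhs, [rhs[0], aux]))
            lhs, rhs = aux, rhs[1:]
        out.append((lhs, list(rhs)))
    return out

print(binarize([("A", ["B", "C", "D", "E"])]))
# [('A', ['B', 'A#1']), ('A#1', ['C', 'A#2']), ('A#2', ['D', 'E'])]
```

Each synthetic rule now carries at most one choice point, which is what keeps ambiguity from multiplying within a single forest node.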
04:53 ceridwen Thanks for the insight.  I'm getting closer to feeling like I understand (rather than just knowing it's true).
04:55 jeffreykegler ceridwen: re http://irclog.perlgeek.de/marpa/2015-05-21#i_10635468
04:55 jeffreykegler IIRC the or-nodes do *not* have to be binarized.
04:55 jeffreykegler left #marpa
04:55 jeffreykegler joined #marpa
04:57 jeffreykegler I think I'd better call it a night.
04:57 jeffreykegler My Internet connection is not being co-operative anyway.
04:57 ceridwen Good night.
05:26 rns Interestingly, binarization of parse forests is used in machine translation [1], [2].
05:26 rns [1] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37011.pdf
05:26 rns [2] http://www.cs.cmu.edu/afs/cs/project/cmt-55/lti/Courses/734/Spring-12/binarized_forests.pdf
07:52 lwa joined #marpa
11:49 hobbs joined #marpa
11:49 hobbs joined #marpa
13:18 rns Lua globals are truly global -- they are accessible beyond the scope of the file where they are created. Makes sense, but unexpected.
13:19 rns luacheck (Lua's perltidy) can help.
13:55 KotH btw: marpa is awsome!
13:56 KotH less than 72h from "never heard of it" to "full fledged parser that does a couple of non-trivial operations on the parse tree"
13:56 Aria \o/
13:57 KotH i don't think i've ever used another parser generator with such short learning time
13:58 Aria The lack of "what is a shift / reduce conflict" helps a ton.
13:58 KotH jupp
13:58 Aria And "how do I get it to match _____. It keeps picking the first thing!"
13:58 Aria And "I have to write a tokenizer first?!"
13:58 KotH i must admit, the grammar i have is pretty simple
13:59 KotH it's barely more complex than the infamous calculator example
13:59 KotH but still
13:59 KotH the speed with which i got stuff done is impressive
16:03 idiosyncrat joined #marpa
16:03 idiosyncrat KotH: re http://irclog.perlgeek.de/marpa/2015-05-21#i_10637503 -- Thanks!
16:05 KotH idiosyncrat: heh..that's the other thing... this channel is way too friendly
16:05 KotH idiosyncrat: OSS doesnt work like that
16:05 KotH i need people shouting at me... insulting me... ignoring me!
16:05 Aria I take some pride in being friendly, and everyone here has been pretty great.
16:06 idiosyncrat KotH: the civility is important to me and, ironically, I can be a bit strict about it
16:07 idiosyncrat Fortunately, lucs is the actual moderator and, I don't know about anyone else, but he exercises a moderating influence on me
16:07 KotH heh
16:08 KotH dont worry. i like it here :)
16:08 idiosyncrat Another positive influence is Larry Wall's example, over in #perl6
16:09 KotH after 15y of being in various channels, some better, some worse, this one feels an oasis in a desert.... or a zen garden in japan
16:10 idiosyncrat I think the snarkiness in other forums is a real obstacle to getting things done -- the problem is not just one of manners, but of productivity
16:10 Aria Absolutely.
16:10 Aria Little is so disempowering as an argument every time you ask a question
16:11 idiosyncrat Stackoverflow is good in that the *querent* rates the answers ...
16:11 KotH idiosyncrat: i dont want to argue that, but if you have been in a channel where most people who drop in are absolutely clueless, expect everything on a silver plate and blame you if things do not work, it becomes pretty hard not to become snarky and cynical
16:12 idiosyncrat Well, here, I cannot expect people to know basic parsing theory well, never mind Marpa
16:13 idiosyncrat It is just the state of the industry that you can be a highly educated and brilliant programmer these days ...
16:13 KotH yes, but you can expect people to at least read. most people dont
16:13 Aria Heh. We solve that in #node.js by asking those people to leave. They're actually rare. They just get more airtime when you've a snarky culture.
16:13 idiosyncrat and not have a clue about parsing
16:13 KotH idiosyncrat: but i guess only those who are able to read and think for themselves are using perl these days
16:13 * KotH has no clue about parsing
16:13 idiosyncrat Aria: right.  We've not had to ask anybody to leave yet, but I am quite willing to see that happen.
16:14 KotH interestingly, i didnt need much clue to write a parser with marpa :)
16:14 Aria Yeah. Marpa has fewer pitfalls. So you end up less in the weeds with it.
16:14 idiosyncrat KotH: As I said, lots of excellent and even well-educated programmers these days do not have any background in parsing.
16:15 KotH and also a more "natural" interface ... you need a token in the middle of this syntactic rule? just add it! no need to define a special token just for that
16:15 idiosyncrat Also, I never expect folks to have read the relevant docs -- these are long and complex, ever-changing, and have no change log ...
16:16 idiosyncrat *I* have trouble keeping up with them, so I cannot expect anyone else to do so.
16:16 KotH eh.. i know that feeling
16:17 KotH though, i must say, that Grune/Jacobs gives a nice overview of the basics in a very concise form
16:17 idiosyncrat So I never say "read the docs", and I try to be careful in giving a doc ref not to seem to be inferring that I am saying "Read the docs!"
16:17 KotH *nod*
16:18 idiosyncrat s/inferring/implying/
16:18 idiosyncrat Useful in leading a community is to study Larry Wall -- reading how he interacts on #perl6
16:20 * KotH takes notes and joins #perl6
16:20 idiosyncrat Also, though unfortunately inactive these days, Audrey Tang was an important influence, even on Larry.
16:21 idiosyncrat Linus also makes a good example, if you are selective .. his rants and nastygrams are *not* to be imitated
16:22 idiosyncrat But, modulo those, he's an excellent lead for the Linux community.
16:22 idiosyncrat ceridwen: I printed off the Billot & Lang and am reading it -- thanks for pointing me at it.
16:26 idiosyncrat I was referring to http://irclog.perlgeek.de/marpa/2015-05-21#i_10635387 , but ...
16:27 idiosyncrat now that I backlog I see the actual link to Billot & Lang is due, like so much else these days, to rns
16:27 KotH hehe
* KotH has a huge pile of papers to read on his desk
16:27 * KotH blames his curiosity
16:35 ceridwen KotH, I'm glad to hear that.  It's a real vindication of Marpa's approach to making parsers easier to use.
16:36 KotH jupp, definitely. i like what i saw
16:36 ceridwen idiosyncrat: For what it's worth, I'd found it too, I just didn't link it.
16:36 ceridwen So I'd already read it when rns linked it.
16:36 KotH had i heard about marpa earlier, i would have done quite a few apps differently
16:37 KotH heck, i once wrote a vhdl parser completely in regexp, because all alternatives would have taken me much longer
16:37 ceridwen Ow.
16:38 KotH it was fun though. and it was also the moment when i learned that perl regexp are turing complete :)
16:39 ceridwen I personally think three cornerstones of what a good parsing library should have are a top-down algorithm like Earley or GLL (no shift-reduce conflicts, they're hard to debug for experts and brick walls for newbies), no prioritized choice, and a scanner-less/tokenizer-less/lexer-less interface.
16:39 Aria Yeah. Even if "scanless" in marpa's interpretation means an included lexer.
16:39 Aria And LATM is so much easier to get right than LTM
* KotH understands only half of those words ^^'
16:41 KotH but i find it interesting that top-down parsers are coming back. i remember when everyone laughed at wirth and his demand for top-down parsers
16:43 ceridwen I dislike the terminology "lexer" anyways.  It obscures the central point, which is that a lexer *is* a parser.  A parser with a lexer is to a parser as a two-stage compiler is to a one-stage compiler.  You're just feeding the results from one parser to another parser.
16:44 rns Totally.
16:45 KotH ceridwen: that's one of the main points of Grune/Jacobs in the first or second chapter :)
16:46 rns Lexer seems to be just semantics -- take a grammar, make its parser's semantics produce a token stream and feed it to another parser -- and it is a lexer.
16:46 ceridwen Now, there may be ways in which it's easier to break the parsing into two stages, but conceptually it's clearer to realize that when you do that, you're doing grammar composition.  I've heard people say things like, "Python is LL(1)," which it's not, not even remotely.  What you can say is that Python can be divided into a context-sensitive grammar and an LL(1) grammar.
16:47 ceridwen You can use an LL(1) parser for the second grammar, but the first grammar requires a context-sensitive parser, which is almost always in my experience a hand-written recursive-descent parser.
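The grammar-composition point can be sketched in Python: stage one is itself a parser for a regular grammar whose semantics produce tokens, and stage two parses the token stream. The toy arithmetic grammar below is an illustration, not Python's actual grammar:

```python
# Two-stage parsing as grammar composition: parser #1 handles a regular
# grammar (the "lexer"), parser #2 an LL(1) grammar over its tokens.
# The grammar here -- Sum -> NUM ('+' NUM)* -- is a toy example.
import re

TOKEN = re.compile(r"\s*(?:(\d+)|(\S))")  # number | single-char operator

def tokenize(text):  # parser #1: regular grammar, token-stream semantics
    return [("NUM", int(num)) if num else ("OP", op)
            for num, op in TOKEN.findall(text)]

def parse_sum(tokens):  # parser #2: LL(1) recursive descent over tokens
    kind, value = tokens[0]
    assert kind == "NUM"
    total, i = value, 1
    while i < len(tokens) and tokens[i] == ("OP", "+"):
        total += tokens[i + 1][1]  # next token must be a NUM
        i += 2
    return total

print(parse_sum(tokenize("1 + 2 + 39")))  # 42
```

Composing the two yields one parser for the composed grammar, which is the sense in which "Python is LL(1)" only describes the second stage.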
16:48 * KotH is pretty sure that none of the modern scripting languages is LL(1)
16:48 KotH not everything is as bad as perl, but still not as simple as LL(1)
16:49 ceridwen Python's second grammar is close to LL(1) if it's not actually LL(1).
16:50 ceridwen After you handle context-sensitive white space and break it into tokens.
16:50 KotH lol
16:50 ceridwen I know there's at least one custom ambiguity resolution that's done after parsing.
16:50 ceridwen Yes, exactly :).
16:50 KotH that sounds like "building a rocket is easy. you just need to take this rocket motor and you are done"
16:51 KotH anyways.. time to go home and sleep
16:51 * KotH waves
16:55 lwa joined #marpa
20:18 jeffreykegler joined #marpa
20:19 * jeffreykegler continues to have trouble with Internet connectivity *and* power outages
20:42 jeffreykegler I would note that I find the lexer/parser distinction important, even if optional.
20:43 jeffreykegler Even in reading text in natural languages, humans naturally seem to want to 2-phase language ...
20:44 jeffreykegler words and punctuation with discardable spacing, in a layer largely w/o semantics
20:44 jeffreykegler and then, phase 2, the lexeme stream which needs to be parsed and whose parse *does* have a semantics.
20:45 jeffreykegler I've experimented with a merger of the two phases, and will continue to do so ...
20:47 jeffreykegler It's useful in some contexts to think of lexing as a kind of parsing, but it'd be very counter-productive ...
20:47 jeffreykegler at least in the current state of practice ...
20:47 jeffreykegler to not keep the distinction in mind.
20:48 jeffreykegler In particular, many programmers do not learn the lexer/parsing distinction and without it, they will have the tendency to solve all parsing problems by cobbling together regular expressions ...
20:48 jeffreykegler because they won't have the alternative theoretical model.
20:50 jeffreykegler Aria: for the collection of parsing papers -- http://bat8.inria.fr/~lang/papers/icalp74/icalp74.pdf
20:51 Aria Oh excellent.
20:51 jeffreykegler This is where Lang proves the complexity and correctness results that he uses in Billot & Lang: http://bat8.inria.fr/~lang/papers/acl89/RR-1038.pdf
20:52 jeffreykegler It's significant for more than that -- I'm halfway through what I am ashamed to say is my 1st reading of it.
20:53 jeffreykegler I expect to have more to say later, but for now I will say that it is *very, very elegant* work.
20:54 Aria Yeah, wow.
20:55 Aria Added to my collection!
21:18 * jeffreykegler goes back to the parsing/lexing distinction
21:19 jeffreykegler The fact that most programmers saw lexing as parsing, and did not see a parsing vs. lexing distinction, had real consequences early in Marpa's history.
21:19 jeffreykegler To describe what I was doing with Marpa, I'd say that I was trying to "solve the parsing problem"
21:20 jeffreykegler But if, for you, parsing == lexing, then this statement is meaningless -- the "parsing problem" has been defined out of existence.
21:21 jeffreykegler So I was left with no way to tell 90% plus of the community what I was trying to do and my early efforts struck many people as pointless theorizing.
21:22 jeffreykegler As for the other <10%, who did know some parsing theory, so that my claim to be "working on the parsing problem" actually had a meaning ...
21:23 jeffreykegler most of them would have followed the standard expert opinion that, yes, there *had been* a parsing problem, but that LALR and yacc solved it.
21:24 jeffreykegler So the field divided unevenly into those who thought the problem Marpa was solving did not exist, and those who thought it was already solved.
21:26 Aria Oh geez
22:54 ronsavage joined #marpa
