Perl 6 - the future is here, just unevenly distributed

IRC log for #marpa, 2015-02-16

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
00:46 ronsavage I did not know you went to such lengths to avoid recursion. And processing trees can easily lead people to assume a recursive algorithm is best, without them examining that assumption.
00:46 ronsavage Indeed, it may often be best, but we should still consider the consequences of choosing recursion.
00:49 ronsavage The problem with calling Marpa "a hyper-optimized packrat parser" is that people will immediately criticize it as having therefore all the worse aspect of PEGs, and is so doing they will deliberately degrade its reputation. Don't give them that opportunity!
01:29 pczarn joined #marpa
02:12 jeffreykegler rns: I "fixed" the marpa_g_symbol_is_counted() test failure in the documentation -- I think the actual behavior has now been documented as correct.
02:22 jeffreykegler ronsavage: I guess my envy of PEG is driven by its reputation-to-effectiveness ratio :-)
02:53 ronsavage jk: Understand. But the time to worry will be when Marpa has had as long to gain traction as PEG, but has still not done so. So, how long has PEG had? I have no idea.
02:54 jeffreykegler Actually, a history of PEG is something I'd thought of doing, because most folks don't know it.
03:00 jeffreykegler PEG is actually a way of specifying a parser, the actual algorithm had been around for a while when Aho&Ullman describe it as TDPL in 1972
03:01 ronsavage So, 42 years 'v' 7 (approx). Phew - I can relax.
03:03 jeffreykegler Aho&Ullman in turn describe their formalism as an abstraction of McClure's TMG compiler-compiler, which is 1965
03:05 jeffreykegler The basic idea is to try to take recursive descent and *somehow*, at least to some degree, make it syntax-driven
03:06 jeffreykegler When yacc & LALR came along, everybody though they had *real* syntax driven parsing, so the these PEG-precursors, which had always been beasts to work with, disappeared ...
03:06 jeffreykegler until ...
03:07 ronsavage Ah, yes. The famous *somehow*. And how does that differ from wishful thinking?
03:08 jeffreykegler To be fair, I also dreamed of somehow's and the difference between Marpa's somehow and others is as much a matter of luck as anything else.
03:09 jeffreykegler That is, I looked at Earleys' and said "*somehow* this can be made to work".
03:09 jeffreykegler so, anyway ...
03:09 jeffreykegler until ...
03:10 jeffreykegler yacc and LALR turn out to be almost as hard to work with as TMG, Atlas and that pre-historic generation of compiler-compilers.
03:11 jeffreykegler I say "pre-historic", not flippantly -- much of the early work on recursive descent compiler-compilers was *not* written up, so it is very hard to know exactly what was going on.
03:12 ronsavage But you and Peter made it work, whereas others in the field dreamed for twice Rip Van Winkle's siesta.
03:12 jeffreykegler Speaking for myself, it was a matter of good luck, and of being too stupid to be scared by the odds against me.
03:13 ronsavage So I don't accept the use of the word luck. Hard work, perspiration, inspiration, sure, but luck? Hmm. I don't think so, except perhaps to a v. small extend.
03:13 ronsavage 'extend' => 'extent'.
03:14 ronsavage And you must have had great determination, knowing the history in the field.
03:14 jeffreykegler I know a lot of friends who disagree with me with my emphasis on luck, but we'll have to agree to disagree.
03:15 jeffreykegler So, anyway, the story ... everybody's now given up on yacc, right.
03:15 jeffreykegler So recursive descent comes back big time.
03:15 jeffreykegler But it's not syntax-driven, right -- you gotta write the whole thing by hand.
03:16 ronsavage Sounds like change without progress, even if defined to be progress!
03:16 jeffreykegler And at this point TDPL (the algorithm behind PEG) get revived, with a new name and a new formalism.
03:16 jeffreykegler and the new name and formalism is PEG.
03:17 ronsavage I have often imagined that generations of computer science students were asked to investigate this issue, and many broken hearts resulted :-(
03:18 jeffreykegler ronsavage: the actual fact is that they were told that the problem was solved, and therefore not to be worked on.
03:18 jeffreykegler yacc is often still being taught as the solution, in the few cases when parsing is taught at all
03:19 jeffreykegler PEG comes out in 2002.
03:19 jeffreykegler I was going to work all this up for a new version of my chronology, but it takes time.
03:20 jeffreykegler Because to state these things sharply and pithily, but in a way that will stand up on examination, I have to weigh every word.
03:21 jeffreykegler PEG is a nice formalism, but it's misleading, because unlike Marpa or even yacc, the PEG formalism only looks syntax-driven.
03:22 jeffreykegler It's actually a description of a parsing algorithm, a subset of recursive descent, which uses a BNF-ish notation.
03:22 jeffreykegler That is, Marpa is BNF-driven, and yacc/bison (if you ignore precedence) is BNF driven,
03:23 jeffreykegler but PEG is *not* BNF driven, though the notation certainly makes it look that way at first glance.
03:25 jeffreykegler What happens with PEG is you work up your BNF,
03:26 jeffreykegler and PEG says, "Oh great!  Now tell me how to parse it using recursive descent ...
03:26 ronsavage 'they were told that the problem was solved'. But surely there were people who knew enough to be able to say that the problem was not really solved?
03:26 jeffreykegler and actually not recursive descent but a subset of it."
03:27 ronsavage Doesn't you description reinforce my suspicion that someone somewhere smelt a rat about 'it has been solved'?
03:27 jeffreykegler ronsavage: Back in one of my blog posts, I described this situation as the field being divided between theoreticians and practitioners ...
03:28 Aria That rings true.
03:28 jeffreykegler Each of whom take the others work in their field.
03:28 ronsavage And the practitioners were like ducks, paddling madly but not getting where they wanted to go.
03:28 jeffreykegler So the practitioners treat the problem as solved, because it's math*theory, and who do you go to is you don't get to the theoreticians and mathematicians for that?
03:29 ronsavage 'Each of whom take the others work in their field' Que?
03:29 jeffreykegler And the theoreticians treat the problem as solved, since the practitioners are trying their best to push that rope, and who do you trust about good practice, if not the practitioners.
03:30 jeffreykegler This goes on for decades, until the practitioners cry, "No mas" and quietly go back to recursive descent.
03:30 ronsavage As I see it, we need both types of thinkers, at various times. It's the theoreticians who explicitly cut themselves off the praxis that are the worry.
03:31 jeffreykegler The MIT compiler class is on-line and I've actually watched its two parsing lectures.
03:31 jeffreykegler One teaches LALR/yacc and the class is told that this is the solution to the parsing problem.
03:32 ronsavage Yes, they revert to who works, no matter how much pain is involved, because the theory offers promises which can't be put into practice. Sigh. Hence the pain of the practitioners.
03:32 Aria Eugh.
03:32 jeffreykegler The second lecture teaches recursive descent and .....
03:32 Aria MIT is poisoned by LISP.
03:32 Aria A lot of PLT is as well.
03:32 ronsavage PLT?
03:32 Aria "parsing! Yeah, that's been solved. Just match the braces"
03:32 Aria Programming language theory.
03:32 jeffreykegler at the end there is a list of criteria about whether you use "the solution" or its alternative and ...
03:32 ronsavage Ah. Thanx.
03:33 jeffreykegler if that list of criteria is parsed carefully :-) ...
03:33 jeffreykegler you'll find the class is basically being told:
03:33 jeffreykegler "Teach LALR, but use recursive descent"
03:33 ronsavage So they think inside the self-defined box. Luckily for everyone, you didn't.
03:33 Aria (Turns out that lots of the people who might ordinarily care about parsing are also the people who like tiny little languages, so they speak scheme or haskell. Which are both trivial to parse.)
03:34 jeffreykegler I think I said before that Marpa is out of the box thinking, but that is only by accident ...
03:34 jeffreykegler since the 1970's the box has shrunk.
03:34 ronsavage Exactly. Define a language which can be parsed, rather than force themselves to confront the issue of their beliefs crippling their thinking.
03:35 ronsavage No, it's can't be accidental. It's a reaction to the size of the box, and it's driven you to a solution. Take the credit. Others must have been driven, but gave up, no matter how much progress they made (which we probably will never hear about).
03:35 * jeffreykegler characterized the MIT lectures from memory, and hopes he was not unfair
03:36 ronsavage AFK
03:37 jeffreykegler because MIT in general, and those professors in particular, are being leaders by putting their lectures online, and deserve credit for that.
03:38 jeffreykegler MIT is also unusual, I think, in teaching parsing at all.
03:42 jeffreykegler Aria: you still there
03:42 jeffreykegler Anyway, I assume you'll backlog
03:43 jeffreykegler Would you be interested in adding a select collection of papers on PEG -- perhaps a subdirectory.
03:43 jeffreykegler Because there are some very revealing ones out there, but they are not widely known.
03:44 jeffreykegler If so, I'll put a list together.
03:50 Aria Back now. Sorry.
03:50 Aria I would love PEG papers.
03:51 jeffreykegler No problem, real life happens :-)
03:51 jeffreykegler There's one by Ierusalischy, who did a PEG-ish implementation, ...
03:52 jeffreykegler but his description of its capabilities is far more frank and realistic than most.
03:52 jeffreykegler And some others -- I'll get the list and put the links on this channel.
03:53 jeffreykegler Maybe we want to add Ford's 2002 paper(s) is it(they) are online.
03:53 jeffreykegler s/ is / if /
03:56 Aria Oh, wow. Ierusalimschy has done a lot.
03:56 jeffreykegler He did the PEG-ish parser as a kind of enhanced lexer, as a Lua module.
03:58 jeffreykegler So Roberto presents PEG as a spruced-up lexer, which is not over-selling it.
04:00 Aria Was this as late as 2008? "A Text Pattern-Matching Tool based on Parsing Expression Grammars" ?
04:00 jeffreykegler http://www.inf.puc-rio.br/~roberto/docs/peg.pdf
04:01 Aria That's the one.
04:05 Aria Oh this is delightful.
04:05 Aria The abstract on one of his later papers has much more hubris about what PEGs can do.
04:06 jeffreykegler http://www.romanredz.se/papers/FI2008.pdf
04:06 jeffreykegler Redziejowski's "Some Aspects of PEG"
04:07 jeffreykegler It describes the problem of figuring out what language your PEG specification actually parses
04:08 jeffreykegler The 1st few pages give examples which make it clear the kind of trouble you're getting into
04:09 jeffreykegler The rest of the paper gives some math trying to solve the problem of figuring out what strings a PEG does not and not accept.
04:09 jeffreykegler The point of the math to me being that nobody who hopes for a quick-n-dirty PEG parser is going to do even one-tenth of it.
04:11 Aria Heh, it's true.
04:11 jeffreykegler This makes it really ironic when Marpa is contrasted to PEG, on the ground that using PEG avoids ambiguity -- which is true, except not in the sense that is meant.
04:12 Aria Heh, yeah.
04:12 Aria Unambiguous like a bull in a china shop.
04:12 jeffreykegler PEG is unambigious in that it will only parse a language one way, but in practice you will never know what that language is.
04:13 jeffreykegler Marpa parses ambiguous languages, exactly as specified by their BNF, in the way specified by the BNF.
04:14 jeffreykegler There's a confusion between "ambiguous" and "unintended".  Ambiguity is not a problem when intended.  Unintended is always a problem, even if unambiguous.
04:14 jeffreykegler Anyway, those two are our start.
04:14 jeffreykegler I think there may be one or two more.
04:15 Aria Excellent. I'll keep an eye out -- I've skimmed both already. There's some good nuggets in both!
04:15 jeffreykegler Aria: thanks!
04:16 jeffreykegler Yes, PEG is very different from what most of its users expect that it is.
04:17 Aria Quite.
04:17 jeffreykegler Let me throw in a bit of "oral history" re TDPL
04:17 Aria I really should put together a long blog post on how to break your parser.
04:17 jeffreykegler "break my parser"?
04:17 Aria Like, how to specify a language that various algorithms can't parse.
04:17 Aria Or will go exponential on.
04:18 jeffreykegler Kind of, you can go exponential, but you really gotta work at it??
04:18 jeffreykegler Is that the idea?
04:18 Aria From exponential backtracking in DFA-based regex parsers to languages PEGs can't see, to right recursion in LR and left rercursion in LL..
04:19 Aria It'd just be nice to see examples collected of what breaks when you're writing parsers, what becomes impossibly hard with various approaches.
04:19 jeffreykegler Why not start with a post on PEG?
04:19 Aria I may have to, if only because it's the current golden child that needs to be taken down a ... er... peg or two.
04:19 jeffreykegler Using some of Roman R.'s examples, which will be news to a lot of folks.
04:20 Aria I'd also love to write a post about unions of languages.
04:20 Aria PEGs hit edge cases around that particularly easily.
04:21 jeffreykegler Roman's examples are all in his first 4 pages, I believe, and you can pull them out by skimming.
04:22 jeffreykegler My idea in taking on PEG is kinda of like, if you're out to take on a whole room full of people, you don't necessarily have to take on them all ...
04:23 jeffreykegler just take down the biggest, fittest-looking guy there, and the rest will probably get the idea.
04:24 * Aria laughs.
04:24 jeffreykegler So currently PEG is the toughest guy in the room.
04:24 Aria Very much.
04:25 Aria I want to develop that blog post that helps people get an intuition about the parsing algorithms capabilities. It would probably be well received among security folks too.
04:25 Aria Nothing like a DoS attack via someone's parser by making it go nonlinear.
04:25 jeffreykegler "an intuition"?
04:25 Aria s/an//
04:26 Aria Develop their own intuition about what languages work in various places.
04:26 Aria If anything, a breadth-first overview of parsing, rather than the depth-first into $algorithmofthemoment that classes have historically done.
04:26 jeffreykegler You don't have to be a security guy to let even quadratic get you seriously down.
04:26 Aria Indeed. Just thinking of the people I know who'd get a kick out of such a post.
04:27 jeffreykegler Yeah, I've tried to do that in my writings.
04:27 jeffreykegler Actually from the security point of view ...
04:27 jeffreykegler that unintended language thing can be even bigger than the DoS.
04:28 Aria Absolutely.
04:28 jeffreykegler There are a lot worse things that can happen to a site than a DoS, if it's holding any important data.
04:29 jeffreykegler Like, with your PEG algorithm should come a description of the language it parses -- and no the PEG formalism is not sufficient ...
04:30 jeffreykegler because in practice there is no easy way to go from an arbitrary string to saying "yes I accept it" or "no I don't".
04:30 jeffreykegler with a regex there is
04:30 jeffreykegler and with Marpa there is.
04:31 jeffreykegler with yacc there sort of is, modulo precedence and the exact boundary of LALR
04:31 jeffreykegler with recursive descent there conceivably could be, with the right discipline
04:32 jeffreykegler with PEG, no, and this is one of the things that Aho&Ullmann pointed out back in 1972.
04:37 Aria Yup.
04:37 jeffreykegler One reason I've never worked up my PEG stuff, if that I think I'm better employed doing other stuff.
04:38 jeffreykegler Like fixing Libmarpa nits
04:38 jeffreykegler s/ if that I think / is that I think /
04:41 Aria :)
04:42 jeffreykegler Oh, yes, that old-timer story I threatened to tell --
04:44 jeffreykegler Anyway, PEG comes from TDPL which emerges from a "prehistory" of compilers-compilers that were used before the first parsing paper was actually published, in 1961
04:45 jeffreykegler These compiler-compilers, I gather, were sort of like hand-assembled versions of PEG -- PEG with all its problems, but without its nifty-looking formalism.
04:45 jeffreykegler One of these technologies was called Atlas, and in the 1980's I knew Atlas programmers.
04:46 jeffreykegler Now at this point I already had my degrees, and knew a lot of parsing theory, but when I asked these guys what they did ...
04:46 jeffreykegler they described it in such arcane and round-about terms, that it didn't connect with anything I knew.
04:47 jeffreykegler Atlas was a craft, Atlas consulting (once you'd broken into it) was a nice steady living, because writing an Atlas compiler took a nice long billable time.
04:48 jeffreykegler The last thing Atlas programs wanted out of their compiler-compiler was to generate a parser quickly.
04:50 jeffreykegler Nor did they care to have outsiders know how they created compilers.
04:54 jeffreykegler My best guess is that Atlas compiler writing was writing of recursive descent compilers, but using a standardized toolkit and discipline which had been built steadily from perhaps 1951, by the programmers of the very first computers.
05:10 Aria Oh wow.
05:14 jeffreykegler My feeling on thinking back to those folks, is like encountering folks who can reliably start a fire using only dry leaves, vine ropes, wooden sticks and stones ...
05:14 jeffreykegler really an impressive, solemn, spectacle, ...
05:14 jeffreykegler but I'm still gonna use a match.
05:24 * Aria laughs.
05:26 ronsavage joined #marpa
05:28 jeffreykegler ronsavage: I hope you weren't an Atlas programmer -- no offense meant. :-)
05:30 Aria https://en.wikipedia.org/wiki/Atlas_Autocode ... wow.
05:31 Aria Huh. I wonder if my grandfather ever worked on this. Timeframe fits.
05:32 ronsavage No, I've never programmed an Atlas. I learned briefly on a CDC 3200  (Fortran), but mostly on B5500/B6700s (Fortran and Color, a little, and Algol for many years).
05:33 jeffreykegler I think the Atlas I was talking about was an compiler-compiler which was developed for the machine of the same name, and which was the basis of Atlas Autocode, but is different from both.
05:34 ronsavage Aria: Likewise, wow.
05:37 Aria Oh interesting. A precursor.
05:37 Aria That fits, time-wise. Aye.
05:38 jeffreykegler https://en.wikipedia.org/wiki/Atlas_Supervisor
05:39 jeffreykegler There's also an "Atlas" which is, arguably, the first operating system.
05:39 jeffreykegler Again, so-called because it was developed for the Atlas machine.
05:40 Aria Aye. So many pieces I've never heard of that didn't fit into the MIT / Harvard / Berkeley / Burroughs / DEC history that's always around.
05:41 jeffreykegler This old stuff is most useful for understanding what Thompson&Ritchie and the PDP emancipated us from.
05:42 ronsavage WTF: 'Color' => 'Cobol'.
05:42 Aria Still parsing trivial grammars, but I got something a hair less trivial working today!
05:43 Aria https://github.com/aredridel/lo​tsawa/blob/master/json-like.js
05:44 jeffreykegler Great!
05:44 Aria Yeah! And now, I must sleep.
05:48 jeffreykegler Bye!  And for myself also, good night.
05:48 jeffreykegler AFK.
06:05 rns jeffreykegler: re http://irclog.perlgeek.de/m​arpa/2015-02-16#i_10121168 - all tests pass now!
06:11 ronsavage I'm thinking of implementing https://www.ietf.org/rfc/rfc3966.txt. For a module name, one containing RFC3966 would make searching easier, but it not necessary. And what to put between MarpaX and RFC3966? Or call it something else? I can't decide.
06:12 rns Aria: re http://irclog.perlgeek.de/m​arpa/2015-02-16#i_10121601 -- as I have known recently, a grammar is trivial if it produces only a null parse, and yours is very close to the original json, so it's not exactly trivial and the whole lotsawa thing looks very promising.
06:12 rns Looks like you'd need a good lexer any time soon.
06:14 rns And that Atlas thing was actually huge -- http://elearn.cs.man.ac.uk/~atlas/ -- has a nice writeup and search for 'compiler compiler' produces interesting entries.
06:18 rns <ronsavage> perhaps MarpaX::URI::tel, akin to, e.g., URI::file and URI::data in https://metacpan.org/release/URI?
06:20 rns ronsavage: http://irclog.perlgeek.de/m​arpa/2015-02-16#i_10121675
06:26 ronsavage rns: Yes, plausible. A small problem is that the RFC covers 'tel', 'fax', and 'modem', and I had considered using 'tel' in the name, as in *::TelFaxModem. Still thinking.....
06:27 ronsavage Or I could ship *::Tel, *::Fax and *::Modem in 1 distro!
06:27 ronsavage I might even consider lower-case :-)
06:30 ronsavage Aria: Way back then, ICL (IIRC an English chemical company) had a famous computer. Do you remember the name?
06:31 rns Well, https://metacpan.org/release/URI seems to go that way with URI::URL and lower cased URI::http and URI::nntp.
06:48 ronsavage rns: Yep. Must go - the puppies are disassembling the sofa.
07:00 rns Take care!
07:01 rns jeffreykegler: that SO answer http://stackoverflow.com/a/28494196/4007818 describes parallel parsing efforts with YACC, cites difficulties with combining parses and an approach joining chunking with forward and backward parsing.
07:01 rns Difficult and not seems to be the motif. Looks like what would be made much easier with strand parsing.
07:26 rns s/Difficult and not seems/"Difficult" and "not easy" seem/
08:29 rns jeffreykegler: the PR for rank methods is filed, I'll be busy at $work for 2-3 days, will backlog -- before continuing with events, recce, and valuator methods.
09:56 pczarn joined #marpa
12:55 CQ joined #marpa
15:57 lwa joined #marpa
15:58 pczarn joined #marpa
17:31 sivoais joined #marpa
17:31 shadowpaste joined #marpa
17:31 ernimril joined #marpa
17:32 sivoais joined #marpa
18:58 jeffreykegler joined #marpa
19:10 Aria ronsavage: Re: http://irclog.perlgeek.de/m​arpa/2015-02-16#i_10121653 -- Not offhand! Things on the right side of the pond are always more distant in my mind.
19:11 jeffreykegler Aria, rns: re http://irclog.perlgeek.de/m​arpa/2015-02-16#i_10121653
19:12 jeffreykegler re "trivial grammars" -- it's very much OK to use that phrase in a loose sense.
19:13 jeffreykegler I needed a term for grammars which accept an input if and only if it is zero-length, and since "trivial" is in standard mathematical use for that sort of thing, I call them "trivial grammars"
19:13 jeffreykegler AFAIK, no other textbook or paper uses that term in that way, so it's not standard.
19:14 Aria Makes sense to me.
19:14 Aria Where as a programmer, I think of trivial as "doesn't exercise any of the hard parts"
19:14 jeffreykegler Another sense of the term, which in context, you should be able  to use.
19:15 jeffreykegler This sort of thing arises all the time with mathematical terminology.
19:15 Aria Yup.
19:15 jeffreykegler For example, when you say "category", you don't necessarily mean category as in category theory.
19:16 jeffreykegler There's potential for confusion but, hey, that's language.
19:17 jeffreykegler I'll note that in my language, I tend to use the more precise terms, but that's because somebody could extract it and say later, "Jeffrey, didn't you state X Y Z".
19:17 jeffreykegler And I can say, "Well yeah, I said X Y Z, but that's X and Z in the loose sense", but I prefer to avoid the possibility.
19:18 jeffreykegler Another example is "regular expression", which commonly is used to mean a regex in some package, for example a Perl regex.
19:19 jeffreykegler But "regular expression" has a mathematical definition, which is very important in my stuff, and Perl regexes go far, far beyond it.
19:19 jeffreykegler But if you advise someone in Perl-land to "use a regular expression", do you mean stick to the mathematical definition?  Of course not.
19:20 jeffreykegler At least, not usually.
20:10 rns re http://irclog.perlgeek.de/m​arpa/2015-02-16#i_10125207 -- actually, yes; e.g. in Grune & Jacobs 2nd ed. p.296 "trivial grammar" is apparently used to mean just "simple grammar".
20:13 jeffreykegler rns: good catch.
20:14 jeffreykegler The G&J use is informal, not mathematical, but clearly anyone is entitled to use the term "trivial grammar" the way G&J use it, instead of the way I use it.
21:21 ronsavage joined #marpa
21:37 ronsavage I suppose a grammar accepting null input could be called a null grammar, but to me null grammar implies the grammar itself has no content. That's language for you. Whoever invented 'context' (for human languages), and no matter how many millions of years ago, was a genius.
23:38 flaviu joined #marpa
23:55 ronsavage joined #marpa
23:58 jeffreykegler joined #marpa

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary