
IRC log for #marpa, 2014-05-25


All times shown according to UTC.

Time Nick Message
01:55 jeffreykegler joined #marpa
01:56 jeffreykegler Some nice news -- someone is starting a Marpa re-implementation in Javascript: https://github.com/Hardmath123/nearley
01:57 jeffreykegler It's perhaps useful to make clear -- Marpa, at its core, is a new *algorithm*.  To explain, here is the history of yacc ---
01:59 jeffreykegler Don Knuth invents LR parsing, as a mathematical exercise.  He gives an algorithm, but it wasn't practical -- that was not his main objective.
02:00 jeffreykegler Frank DeRemer invents LALR -- this is a subset of LR(1) that is thought to be practical and in fact ...
02:00 jeffreykegler Stephen Johnson writes it up as yacc
02:01 jeffreykegler it also becomes the core of the flagship C compiler until 2006 and in fact ...
02:01 jeffreykegler dominates the academic theory of parsing for decades.  This is the Golden Age of Right Parsing.
02:03 jeffreykegler Anyway, the point is at its basis, the Marpa project is about doing for Jay Earley's algorithm what Frank DeRemer did for Knuth's LR parsing, but ...
02:04 jeffreykegler I've wound up having to try to be Stephen Johnson as well.
02:05 jeffreykegler Anyway, that was an attempt to explain why I welcome alternative implementations of Marpa.  Marpa is an algorithm, and attracting multiple implementations is what you want to do if you're an algorithm.
02:51 kartikchandra joined #marpa
02:52 Hardmath123 Hardmath123, checking in.
02:52 jeffreykegler Hello
02:52 jeffreykegler Hardmath123: do you know if Aria is joining us here
02:52 Hardmath123 No, I don't.
02:53 jeffreykegler That's OK.
02:53 Hardmath123 I don't know him personally or anything, he just happened to find nearley and rewrite it to be infinitely better.
02:53 jeffreykegler Was my last answer helpful to you?
02:53 jeffreykegler Aria has been collaborating with you on nearley?
02:53 Hardmath123 Yeah
02:53 Hardmath123 Yeah -> collaborating
02:54 Hardmath123 I'm still trying to grasp the precomputation.
02:54 Hardmath123 nearley started as a proof-of-concept for someone at Berkeley.
02:54 jeffreykegler Were you able to understand my current paper?  It is unfortunately not easy.
02:54 jeffreykegler The new one will be easier.
02:55 Hardmath123 The one about Marpa? No. There's a lot of new notation, which takes a while to become familiar with.
02:55 Hardmath123 I was hoping to pass this off as my CS project...
02:55 jeffreykegler You may want to do just a subset.
02:55 Hardmath123 Subset?
02:56 jeffreykegler Only parts of the algorithm -- for example just a classic Earley.
02:56 Hardmath123 I've already done that (nearley).
02:56 jeffreykegler Great.  Project complete! :-)
02:56 Hardmath123 :P
02:56 Hardmath123 Except, I feel I should do something new to turn it in.
02:57 Hardmath123 Somehow turning in an old project feels wrong.
02:57 jeffreykegler Something new is required for a Ph.D.
02:57 Hardmath123 True.
02:57 Hardmath123 You know that's not what I meant, though.
02:57 jeffreykegler For a high school CS project, I'd hope a re-implementation of leading-edge technology would be more than adequate.
02:58 Hardmath123 Technically, I don't need to submit anything.
02:58 Hardmath123 Now that I think about it.
02:59 jeffreykegler If it's possible to wait for my new paper, I think you want to do that.
02:59 Hardmath123 It's due Thursday, fwiw.
03:00 jeffreykegler Just for perspective, I am in my 7th year of full-time work on Marpa.
03:00 Hardmath123 Full-time? Wow.
03:00 jeffreykegler Originally, I had hoped to have it done by Thursday too. :-)
03:01 jeffreykegler Didn't work out like that. :-)
03:01 Hardmath123 I can imagine.
03:01 Hardmath123 Ok, so Aycock-Horspool
03:02 Hardmath123 what it's saying is that when you consume a symbol, you somehow know ahead of time who expects that symbol?
03:03 jeffreykegler Are you familiar with NFA/DFA automata?
03:04 Hardmath123 Umm, let's see, it's like state machines that you use to do regexes, right?
03:04 jeffreykegler Yes
03:05 jeffreykegler How about LR parsing?  Have you looked at that?  (It'd be very advanced for high school.)
03:05 Hardmath123 Nope. Heard of it
03:06 Hardmath123 As well as LLR, LLK(n), or whatever
03:06 jeffreykegler OK.  Let me give some perspective.
03:06 Hardmath123 lots of those
03:07 jeffreykegler A&H learned from a colleague that Earley parsing and LR parsing were similar in their use of automata.
03:07 jeffreykegler So they combined the two.
03:07 jeffreykegler You've got to understand both algorithms to see what they were trying to do.
03:08 Hardmath123 I see.
03:08 jeffreykegler A&H is a hybrid of LR(0) parsing and Earley's, and if you don't know about LR(0), it is very hard to see what A&H are up to.
03:08 Aria joined #marpa
03:08 Aria Hello then!
03:09 Hardmath123 Aria!
03:09 jeffreykegler Hello Aria
03:09 jeffreykegler So you'd have to read up on LR(0) parsing and how it works.
03:10 jeffreykegler LR(0) parsing is something I found hard to warm up to -- it took me decades, literally.
03:10 Hardmath123 Yeah. Is it vaguely related to recursive descent parsers?
03:10 Aria Indeed. I'm familiar.
03:10 jeffreykegler The simplistic answer to the RD question is no, they're not related.
03:10 Hardmath123 'k
03:11 jeffreykegler There are texts on LR parsing, because it was once very popular.
03:12 * jeffreykegler is answering Hardmath123's question
03:12 Hardmath123 I never thought parsing was so... hard.
03:12 jeffreykegler and to understand A&H you really should read those -- and they're not light reading.
03:13 Hardmath123 I can imagine.
03:13 jeffreykegler But anyway get to understand handles, how you find "handles", and what you do with them once you've found them.
03:13 Aria Yeah, they're really not. A&H is a painful, painful read.
03:14 Hardmath123 I feel most formal papers make things much harder than they are.
03:14 jeffreykegler Aria: It was.  They contributed crucial ideas to Marpa.
03:14 Hardmath123 I grokked Lambert/Phong really quick once I found blog posts written by, y'know, humans.
03:14 Aria Heh, yeah.
03:15 Aria And I didn't get the A&H paper until I read the Marpa paper.
03:15 Hardmath123 Notation :-(
03:15 Aria Ugh, yes.
03:15 Aria Awkward notation at best. Misleading and frustrating and underspecified at worst.
03:15 jeffreykegler Aria: I'm very flattered by your comments about the current Marpa paper.
03:15 Aria Hehe. It's well written!
03:15 Hardmath123 So
03:15 Hardmath123 the reason nearley happened
03:16 jeffreykegler I tried hard, but worried about the difficulty of my Marpa paper.
03:16 Hardmath123 is because I'm working on improving a CS course
03:16 jeffreykegler The new one will be easier.
03:16 Hardmath123 and that involved me knowing the material that was being taught
03:17 Aria Oh interesting!
03:17 Hardmath123 And our idea is to use some form of animation to make it make more sense.
03:17 jeffreykegler So you two are collaborating on nearley?
03:17 Hardmath123 (By the way, that Lua parser I built with nearley found a pretty interesting inconsistency with the Lua grammar. Dunno where to report that.)
03:19 Aria (Oh interesting. I'd love to confirm that and see what it is!)
03:19 Aria (I bet I could scare up someone who'd care.)
03:19 Hardmath123 (function (a) return function(b) return a + b end end)(a)(b)
03:19 Aria Yeah, we've been collaborating. I'd love to have a more elegant parser for, ultimately, SQL. I can do it with the tools available in Javascript, but don't particularly care for them.
03:20 Hardmath123 the lack of semicolons makes that ambiguous.
03:20 Aria And my coworker on the project is a Perl lover, and has used Marpa for a long time. She got me looking at Nearley that way.
03:20 Hardmath123 Two different versions of Lua react differently.
03:20 jeffreykegler Jean-Damien on this IRC channel has a start on an SQL parser
03:20 Aria Oh nice, Hardmath123.
03:20 Aria The kind of thing a PEG parser makes clear, but an Earley parser will show off the ambiguity of.
03:21 Hardmath123 Exactly.
03:21 Hardmath123 PEG also dies on left recursion.
03:21 Hardmath123 Which is why nearley > peg.js
03:21 Aria Yep.
03:21 Hardmath123 the other reason is that nearley doesn't generate reams of code.
03:21 Aria (We're porting an SQL migration tool -- Modyllic -- from PHP to Javascript. The PHP parser is a slow hand-coded LL parser)
03:22 Aria Yeah. Though I don't mind the reams of code if they're direct, easy to map back to the original constructs.
03:23 Hardmath123 Well, PEG.js has generated like 2,000 lines, which is a bit much.
03:23 Aria Yeah. Ometa's a bit tighter, I think. But still verbose.
03:23 Hardmath123 How about Jison?
03:23 Aria And I honestly can't understand the implementation, often.
03:23 Hardmath123 I never understood Jison's format.
03:23 Aria I've not tried jison.
03:23 Hardmath123 Because it wants a lexer, too.
03:24 Aria Yeah. I'd love to stay entirely scannerless. Even if my coworker can produce a really nice lexer.
03:24 Hardmath123 by the way, I plan on adding some form of error-checking to nearley.
03:24 Hardmath123 If you .feed something and the last row of the table is empty, throw an error.
03:24 Aria Oh, good. Because that's where Marpa really excels, and what we want to keep from our LL implementation.
03:24 Aria That makes sense.
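[Editor's note: Hardmath123's error check -- after each .feed, if the newest Earley set came up empty, nothing could consume the token, so fail with the position. A minimal sketch of that check inside a toy Earley recognizer; the grammar and all function names here are illustrative, not nearley's actual API:]

```javascript
// Toy Earley recognizer for the grammar  S -> S "+" N | N ;  N -> "1".
// The point of interest is the emptiness check after scanning each token.
const grammar = {
  S: [["S", "+", "N"], ["N"]],
  N: [["1"]],
};
const START = "S";

function earleyRecognize(tokens) {
  // Each item: {lhs, rhs, dot, origin}; rhs arrays are shared with the
  // grammar, so reference equality suffices for duplicate detection.
  const sets = [[]];
  const add = (set, item) => {
    if (!set.some(it => it.lhs === item.lhs && it.rhs === item.rhs &&
                        it.dot === item.dot && it.origin === item.origin))
      set.push(item);
  };
  for (const rhs of grammar[START]) add(sets[0], { lhs: START, rhs, dot: 0, origin: 0 });

  for (let i = 0; i <= tokens.length; i++) {
    const set = sets[i];
    for (let j = 0; j < set.length; j++) {   // set.length may grow as we go
      const item = set[j];
      const next = item.rhs[item.dot];
      if (next === undefined) {
        // COMPLETE: advance items in the origin set waiting on item.lhs
        for (const parent of sets[item.origin])
          if (parent.rhs[parent.dot] === item.lhs)
            add(set, { ...parent, dot: parent.dot + 1 });
      } else if (grammar[next]) {
        // PREDICT
        for (const rhs of grammar[next]) add(set, { lhs: next, rhs, dot: 0, origin: i });
      } else if (tokens[i] === next) {
        // SCAN into the next set
        if (!sets[i + 1]) sets[i + 1] = [];
        add(sets[i + 1], { ...item, dot: item.dot + 1 });
      }
    }
    // The error check Hardmath123 describes: we fed a token, but the next
    // Earley set is empty, so no parse can continue past this point.
    if (i < tokens.length && (!sets[i + 1] || sets[i + 1].length === 0))
      throw new Error(`Parse error at token ${i}: unexpected ${JSON.stringify(tokens[i])}`);
  }
  return sets[tokens.length].some(
    it => it.lhs === START && it.dot === it.rhs.length && it.origin === 0);
}

console.log(earleyRecognize(["1", "+", "1"])); // true
```

Feeding `["1", "+", "+"]` instead throws at token 2, which is the behavior being proposed for nearley's `.feed`.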
03:25 jeffreykegler I'm curious.  Why a Javascript clone?
03:25 Aria Almost all our current tools are Javascript-based.
03:25 jeffreykegler They need to run on the client side?
03:25 Aria No, node.js.
03:26 Aria But we'll be using it in some perhaps clever ways in other parts of the codebase.
03:26 jeffreykegler Excuse my ignorance, that means they're not on the client side?
03:26 Hardmath123 Yeah. Node.js is on the server.
03:26 Aria Yeah, not the client side. Javascript divorced from the DOM.
03:26 Aria node.js is just an evented IO runtime for javascript.
03:27 jeffreykegler Is there a problem with a C library?
03:27 Aria Node.js actually makes calls into C very slow.
03:27 Hardmath123 Well
03:27 Aria And dealing with C-based dependencies is just a lesson in frustration in several ways.
03:27 Hardmath123 yeah
03:27 Hardmath123 Python's C bindings are much better
03:27 Hardmath123 afaik
03:27 Aria I did look into using libmarpa. It's intriguing, though I'm always tempted to dive in.
03:28 Aria Python's got a global interpreter lock, and a relatively simple foreign function interface. It's easy to interface with, though hard to optimize across.
03:28 jeffreykegler Because parsing is bit-twiddling, and doing that in a higher level language really slows things down.
03:28 Aria Node's at the other end of the spectrum -- a garbage-collection-heavy, heavily JIT-ed runtime written in C++.
03:29 Hardmath123 yay v8
03:29 Hardmath123 :)
03:29 Aria Yeah. Though one can be surprised by the V8 javascript engine. If you write code that's essentially statically typed, it does an amazing job at optimizing it.
03:29 jeffreykegler All the optimizations you get from going higher-level -- none of them help with the parse engine.
03:29 jeffreykegler It's all about how fast you can flip bits.
03:29 Aria Yup.
03:29 Aria V8 imposes < 30% overhead in the best cases for that.
03:30 Aria (It's scarily fast. It competes with C for things.)
03:30 jeffreykegler That little??
03:30 Aria Yeah. I was floored.
03:30 Aria I mean, it doesn't all the time. But it's certainly possible to write tight stuff using it.
03:30 Aria Of course, the main author of V8 has basically been doing the same compiler over and over for decades.
03:31 Aria He understands optimizing a dynamic runtime better than just about everyone.
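[Editor's note: Aria's claim that V8 rewards code that is "essentially statically typed" comes down to keeping object shapes and value types stable so the JIT can specialize. A hedged illustration of the style, not a benchmark -- the actual speedup depends on the engine:]

```javascript
// V8 specializes on "hidden classes": objects created with the same fields
// in the same order share a shape, so a property read at a call site that
// only ever sees one shape compiles down to a fixed-offset load.

// Monomorphic: every Point has the same shape and holds only numbers.
function makePoint(x, y) {
  return { x: x, y: y };   // same field order every time -> one hidden class
}

function sumX(points) {
  let total = 0;           // stays numeric throughout the loop
  for (let i = 0; i < points.length; i++) {
    total += points[i].x;  // monomorphic load: one shape ever seen here
  }
  return total;
}

const pts = [];
for (let i = 0; i < 1000; i++) pts.push(makePoint(i, -i));
console.log(sumX(pts)); // 0 + 1 + ... + 999 = 499500

// The anti-pattern the JIT punishes: the same site seeing many shapes,
// or `x` sometimes being a string or missing -- the load degrades to a
// megamorphic dictionary lookup.
```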
03:31 jeffreykegler Aria: oh yes your question about predictions ...
03:31 Hardmath123 Are there any efficient JS -> low-level compilers out there?
03:31 Hardmath123 I feel there should be.
03:32 Aria Hardmath123: Not divorced from V8, since there's a great many constructs that you have to do stupid things to support.
03:32 Hardmath123 Aria: Oh. Yeah, JS has a lot of stupidity built-in.
03:32 Hardmath123 []+[] === ''
03:32 Hardmath123 among other things
03:32 Aria Heh, yeah. Though that's not what's hard to optimize.
03:33 Hardmath123 eval's what's hard to optimize.
03:33 Aria Just type-changing, highly dynamic structures are just ... slow. And then there's eval.
03:33 Hardmath123 Yeah
03:34 Hardmath123 I should go. I've been at this since 2, and it's 8:30
03:34 Aria Ta, Hardmath123!
03:34 Hardmath123 Hope I get to talk to you guys later!
03:34 jeffreykegler Bye
03:34 jeffreykegler Aria: still there?
03:35 Hardmath123 c
03:35 Aria I am indeed.
03:35 Hardmath123 ^c
03:35 Hardmath123 well, that didn't work
03:35 Aria "/quit"
03:35 jeffreykegler your question about predictions
03:35 jeffreykegler Do you use transitive closures?
03:37 Aria Not explicitly; why?
03:38 jeffreykegler It can help to have a subroutine that pounds them out.  They occur *a lot*.
03:38 Aria Oh, true.
03:38 jeffreykegler And prediction precomputation is a transitive closure.
03:39 jeffreykegler Libmarpa uses Warshall's algorithm.
03:39 Aria Oh, interesting!
03:39 jeffreykegler And there's an implementation in the code.
03:39 Aria Heh, that's exactly what I was set to tackle next.
03:40 Aria Like EXACTLY where I left off.
03:40 jeffreykegler So you just give it a matrix of A's that connect to B's, and tell Warshall's, by the way, this whole mess is transitive ...
03:40 jeffreykegler And poof, you've saved a lot of difficult code.
03:41 Aria Oh that's a fantastic tip.
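[Editor's note: the tip above -- hand Warshall's algorithm a boolean "A connects to B" matrix and get the full transitive closure back -- looks roughly like this in plain JavaScript arrays. Libmarpa's version runs over packed bit matrices and bit vectors instead; this is just the shape of the algorithm:]

```javascript
// Warshall's algorithm: given an n-by-n boolean matrix m where m[a][b]
// means "a connects directly to b", fill in every indirect connection.
// Afterwards m[a][b] is true iff there is ANY path from a to b.
// This is Floyd-Warshall specialized to reachability: no distances, just
// connected-or-not, so the infinite->0 / finite->1 translation that the
// distance-oriented write-ups require is already baked in.
function transitiveClosure(m) {
  const n = m.length;
  for (let k = 0; k < n; k++)             // allow k as an intermediate node
    for (let a = 0; a < n; a++)
      if (m[a][k])                        // a reaches k ...
        for (let b = 0; b < n; b++)
          if (m[k][b]) m[a][b] = true;    // ... and k reaches b => a reaches b
  return m;
}

// Prediction precomputation as a transitive closure of "directly predicts".
// Symbols: 0 = S, 1 = A, 2 = B.  Rules: S -> A ..., A -> B ...
const predicts = [
  [false, true,  false],  // S directly predicts A
  [false, false, true ],  // A directly predicts B
  [false, false, false],
];
transitiveClosure(predicts);
console.log(predicts[0][2]); // true: S predicts B through A
```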
03:41 Aria And useful even if you're not using Aycock and Horspool's automaton?
03:42 jeffreykegler What's V8, by the way?
03:42 Aria V8 is the "core" of node.js.
03:42 Aria Google's javascript engine.
03:42 jeffreykegler Yes, useful even if you're not doing A&H.
03:42 Aria node.js is that, packed up as an executable, with an IO engine.
03:42 jeffreykegler Warshall's is a good example of what I mean by bit-twiddling.
03:43 jeffreykegler If V8 can do it with only 30% overhead over C, I will be stunned.
03:43 Aria I shall have to find out!
03:43 jeffreykegler While you're looking up Warshall's ...
03:44 Aria As a side note, a small map of Marpa::R2's directory layout would be super useful in examining it.
03:44 jeffreykegler The Libmarpa one uses Libmarpa's bit matrices and bit vectors which are fast, but may not port well ...
03:45 jeffreykegler If you want a good write-up, the best is in the 2-volume Aho & Ullman, if you have it.
03:45 jeffreykegler Warshall's is used a lot to compute distances rather than connections, so many write-ups are too complicated for our purposes ...
03:46 Aria I don't have it, but I bet I can scare it up at MIT ;-)
03:46 jeffreykegler since we are only interested in connections.
03:46 Aria Yeah.
03:46 jeffreykegler Distances translate to connections, but the translation ...
03:47 jeffreykegler is from infinite to 0, and from any other value to 1, which is a tough translation to do for a complicated algorithm in your head ...
03:47 jeffreykegler at least I found it so.
03:47 Aria Ayuh.
03:49 jeffreykegler Are you near MIT?
03:50 Aria Yeah. I live near Tufts, work near MIT.
03:50 Aria Have to go past Harvard on the way. Cambridge is crazy.
03:50 jeffreykegler Yes, I've driven there.
03:52 jeffreykegler Anyway, if you're really getting into this, you want a transitive closure subroutine ...
03:52 jeffreykegler and you want it to be Warshall's, as Jean-Damien helped me find out the hard way ...
03:52 Aria Alright then!
03:53 jeffreykegler and if you've trouble figuring it out from the code in Libmarpa, which is quite possible ...
03:53 jeffreykegler just come back on the channel.
03:54 Aria Alrighty. I'm gonna give this a deeper read. I think I'm actually ready to read libmarpa now. My first attempt didn't go so well, but now I've a lot more background than I did months ago.
03:55 jeffreykegler Apologies in advance for the state of the Libmarpa code ...
03:55 jeffreykegler I tried to keep it readable, and will clean it up someday ...
03:55 Aria Heh, it happens. It sure beats reading the corporate pile I deal with for work.
03:55 jeffreykegler but you managed to make it through my paper ...
03:55 jeffreykegler so you obviously have both talent and patience.
03:55 Aria Hehe. I'm not sure if that was brains on my part or persistence.
03:56 jeffreykegler I believe both were required.
04:01 Aria Oh literate programming environments. So good for documenting, but ... always a step to actually understand how the build system works.
04:02 Aria I think I see what you've done for Warshall's algorithm though. Simple enough.
04:02 Aria I'm going to have to get comfortable with just doing a lot of this as bit math and not trying to use any higher-level constructs to keep it fast.
04:02 Aria And this is why people think parsers are hard.
04:02 jeffreykegler My particular matrix and bit vector arrangement combines well with it.
04:03 jeffreykegler If you do them the way I do it, they are hard.
04:03 jeffreykegler That's why most folks prefer recursive descent, which makes the theory easy, but the practice hard.
04:03 Aria Yup.
04:03 jeffreykegler I do it the opposite way -- make the theory hard and the practice easy.
04:03 Aria Yup. Which is ultimately what's needed.
04:04 jeffreykegler And which was the hope for LALR.  Hard theory, easy practice.
04:04 jeffreykegler But LALR was a disappointment, and that has made the profession dubious of my kind of approach.
04:05 jeffreykegler They see all this math, and remember last time around it was all basically a lot of trouble for nothing ...
04:05 jeffreykegler and say "Hey, why not just call subroutines?" -- which is what recursive descent is a fancy name for.
04:08 Aria Yeah.
04:08 Aria Ugh.
04:08 Aria Recursive descent is so elegant for some things. A small LL language, so direct.
04:08 Aria But get any size? Combinators? Screw that.
04:09 jeffreykegler A second advantage I think Marpa might have over RD ...
04:09 jeffreykegler besides power ...
04:10 jeffreykegler is that it separates the grammar and the procedural stuff.  Time will tell ...
04:11 jeffreykegler but I think that will produce a much more maintainable system than RD, which spreads all the special processing out
04:11 jeffreykegler throughout the grammar specification.  Marpa forces you to specify a nice, clean grammar,
04:11 jeffreykegler which you can then mess up as much as you want, separately.
04:12 jeffreykegler In a small, clean example, the difference will not show, but I think in real life, it will be a big deal.
04:17 Aria I agree.
04:17 Aria Rather a lot. Especially when you want to combine grammars.
04:17 Aria (Embedding languages is a particular interest of mine)
04:18 jeffreykegler Embedding? how?
04:42 Aria Parsing SQL inside Javascript; parsing HTML literals inside a host language.
04:42 Aria Language combination without delimiters.
04:42 jeffreykegler And you mentioned scannerless parsing?
04:43 jeffreykegler I assume they're connected because otherwise you have the problem of switching lexers.
04:44 Aria Indeed.
04:45 Aria Lexers are great and all, but ultimately frustrating.
04:45 jeffreykegler Have you read my older blog posts on scannerless parsing?  I originally intended the SLIF as scannerless in the sense we are now discussing.
04:45 jeffreykegler It turned out to be scannerless only in the lesser sense of not requiring an external scanner.
04:47 jeffreykegler Anyway my investigations led me in two opposite directions -- one seeing ways to do fully scannerless parsing via Marpa, but also ...
04:48 Aria I read a few -- haven't combed the backlog though.
04:48 jeffreykegler A feeling that for human-readable languages the two layers may be part of the terrain, and a two layer approach required for efficiency.
04:48 Aria I was thinking about that.
04:48 jeffreykegler Perhaps you'd want to take a hack at a fully scannerless interface for Marpa?
04:49 Aria I'd certainly play with it. I'm experimenting with ideas like that.
04:49 jeffreykegler I started one and described it in an old blog post.
04:49 jeffreykegler But eventually chickened out and went with the approach that is now in the SLIF.
04:50 Aria I keep wondering if there's a way to specify a single grammar down to the character, but add a process to identify layers -- and generate separate rulesets that consume each other.
04:50 jeffreykegler "consume each other"?
04:52 jeffreykegler OK. You kind of mean a front-end to a SLIF-like interface, which identifies the lexer for you?
04:53 Aria Yeah.
04:54 Aria Sorry. Not consume each other: each consumes the next.
04:54 jdurand joined #marpa
04:54 Aria (no reason it has to be two levels)
04:54 jeffreykegler Here's my worry about that idea --
04:55 jeffreykegler Consider dates like 10-24-1953 and their ilk, in some sort of date calculator
04:55 jeffreykegler (this example recently came up)
04:55 jeffreykegler Is the whole thing lexical?  Or are the "10", "24" and "1953" lexemes?
04:56 jeffreykegler I think the answer depends on what you're doing with the semantics, and that would require knowing the intent of the language designer.
04:56 Aria I wonder if it actually matters in practice.
04:57 jeffreykegler Good point.
04:57 jeffreykegler It matters in practice in that particular real-life example, but are there enough examples without that kind of corner case?
04:57 Aria Or if, given a grammar that has numbers and bare dates like that, one could deduce that there's "some run of digits" lexemes, and then a "numbers separated by dashes" lexeme.
04:58 Aria I think so. Especially since the ambiguity is local.
04:58 Aria (or at least in a lot of practical systems it would be)
04:58 jdurand Re http://irclog.perlgeek.de/marpa/2014-05-25#i_8771760 - do you write a user-friendly wrapper on top of the Earley algorithm so that grammars can be written in BNF-like notation?
04:58 jeffreykegler OK.  Just throwing out the issue.
04:59 Aria Indeed. I haven't explored it fully. And I haven't thought through the full issues with local ambiguity.
05:00 Aria jdurand: Nearley does indeed have a BNF-styled input that we generate rules from.
05:00 jdurand Jeffrey, is there a gain from using marpa's proper flag when defining a rule, compared to the usual notation a ::= b | c a
05:00 jdurand Aria: very good - thx
05:01 Aria jdurand: Of course! hardmath123 did all that work. It's delightfully simple. Nearley's design is pretty nice. Not fast, currently, but nice.
05:01 jeffreykegler Aria: please don't read my point as discouraging the idea.
05:01 Aria jeffreykegler: Not at all! I'm going to explore it more.
05:01 jeffreykegler Aria: Good!!
05:02 jeffreykegler jdurand: Is there a typo in the question?
05:02 jeffreykegler Did you mean a ::= b | b a ?
05:05 jeffreykegler jdurand: I think the answer is yes, there's a gain at evaluation time.
05:05 jeffreykegler If Marpa knows it's a sequence rule, it optimizes.
05:05 jeffreykegler But only when evaluating the stack -- otherwise they're the same.
05:06 jeffreykegler jdurand: Apologies for my confusion.  But also ...
05:06 jdurand I meant the equivalent of Names ::= Name (#x20 Name)* that I'd rewrite as Names ::= Name proper => X20
05:06 jeffreykegler If you do write the recursion out manually, prefer left recursion -- it is somewhat faster.
05:07 jdurand jeffreykegler: ok thanks - it appears that the XML grammar is designed in such a way that it is not ambiguous - this means that stack generation can be done at the lexing phase - and this is exactly why SAX is possible
05:09 jdurand I am currently rewriting the XML 1.0 grammar with my marpa wrapper - this is enjoyable stuff - I like the fact that my marpa wrapper forces the programmer to divide functionalities in terms of grammar and semantics
05:10 jeffreykegler jdurand: I was just telling Aria that this was a property of Marpa that I have some hope will be helpful.
05:10 jeffreykegler Aria: since you're new to the channel, let me introduce Jean-Damien as one of Marpa's power users ...
05:10 Aria Oh, hello!
05:11 jeffreykegler someone who's contributed a fair amount of code to Marpa::R2 and probably knows as much about its build mechanism as I do.
05:11 Aria Oh excellent.
05:11 jeffreykegler And he's written or started some very large projects, including a C parser.
05:11 jdurand jeffreykegler: lol thanks -; Marpa is great, and as I predicted, this year will see Marpa growing very fast
05:13 jeffreykegler jdurand: Aria has read and understood my Marpa paper, and is a very serious student of parsing.
05:14 Aria Indeed. My interest in parsers goes way back.
05:14 jdurand Yep, I've read the backlog - it is very good that other theorists join us - on-the-edge parsing, i.e. the Marpa family, is a remarkable subject for a PhD
05:15 Aria Quite.
05:15 jeffreykegler Yeah, maybe I should get one :-)
05:15 Aria I'd go do a PhD like that if it wouldn't require a slog through undergrad or clever bureaucracy hacks first.
05:16 jeffreykegler Actually, I'm not sure I could have done Marpa via a PhD program -- parsing has become very unfashionable in academia
05:16 Aria Yeah.
05:16 jeffreykegler and Earley parsing seems to have nearly disappeared from the face of the earth.
05:16 Aria Maybe UCLA, if one could convince Alan Kay that your approach was interesting.
05:17 jdurand Aria: that's a PhD's constraint - my personal PhD in physics contains 50% stuff that is a repetition of previous research or just presentation -;
05:17 Aria Yeah.
05:17 Aria (I've the added constraint of no formal education on the books.)
05:18 jdurand but I believe the fact that Earley parsing has nearly disappeared could be a chance to jump onto it - you would look like a very innovative student, even if the subject is old -;
05:18 jeffreykegler Aria: when you first surfaced on Google searches re Marpa, I looked up your Web page and read that.  You were running a CO ISP?
05:18 jeffreykegler jdurand: I did have the luxury of working on Marpa without a lot of other researchers breathing down my neck. :-)
05:21 Aria I was.
05:21 Aria That failed, so I moved to Boston. I work for a gay dating site now.
05:21 Aria Though I've an offer coming from PayPal, so we'll see where that takes me.
05:22 jeffreykegler Aria: Good luck!
05:22 Aria Thanks!
05:23 jeffreykegler Btw, my clue about the speed of the academic uptake on Marpa came from the reception of Joop Leo's 1991 paper ...
05:23 jeffreykegler which dropped into a black hole.
05:24 jeffreykegler From my point of view it was obviously revolutionary, but apparently nobody else was reading papers the way I was
05:24 jdurand jeffreykegler: that's academic style, remarkable things can drop down and we do not know why - a question of politics and influence perhaps - anyway you might notice that Marpa is ramping up these days -;
05:25 jeffreykegler And I expected that on the academic side I'd face the same uphill battle that Joop had.  So I figured I'd do what Joop had not done, and actually implement it.
05:25 jeffreykegler I've been reading Ada Lovelace recently by the way ...
05:26 jeffreykegler and in one of her letters she remarks that she does not really count on being recognized until after she's dead.
05:27 jeffreykegler Her view of things is a good one to emulate [ though one can hope for better luck :-) ]
05:27 jeffreykegler jdurand: But I do notice Marpa ramping up.
05:29 * Aria nods.
05:35 jdurand Jeffrey, given how the marpa code is generated, there are two major ways to use it: a native interface proxy, like R2.xs, or upgrading marpa.w to have a target language - do you think the latter is worth it?
05:36 jeffreykegler Not sure I understand the 2nd alternative: "upgrade marpa.w"?
05:36 jdurand Suppose there were a target language. Then processing marpa.w could generate Marpa code directly in a language other than C
05:37 jeffreykegler Do you have a specific language in mind?
05:37 jdurand I am not sure it is worth doing, still. All major languages have bindings to C libraries - that was just an idea - I had Java in mind
05:38 jdurand I ask the question because marpa's API is well separated in terms of "objects" - nevermind, this is not important
05:39 jeffreykegler Libmarpa is 100% bit and pointer twiddling.  The slowdown in Java would be horrific.
05:40 jeffreykegler There are some applications where higher-level languages offer trade-offs which make them competitive with C ...
05:40 jeffreykegler bit- and pointer-twiddling applications are not among those.
05:40 jeffreykegler The only other language I think might make sense for Libmarpa is golang.
05:41 jdurand ok
05:41 jeffreykegler Now, for higher-level wrappers, that's a whole other question, with a whole lot more possibilities.
05:42 jeffreykegler The cool thing about C is what you write in C stays written.
05:43 jeffreykegler In the sense that everything I write in Perl can be endlessly twiddled and has to be to protect itself from the environment.
05:43 jeffreykegler But if I have to rewrite C code it is for one reason and one reason only -- because I didn't write it as well as it could be written the first time.
05:45 jeffreykegler Btw, my Internet connection is beginning to flake out.
05:48 jeffreykegler1 joined #marpa
05:53 Aria Hm. go, rust would both be good languages for libmarpa
05:54 Aria But I'd be surprised if Java was actually that slow.
05:54 Aria Its arrays, I think, match the uses of pointers I'm reading. And its integer math is fast.
05:55 jeffreykegler1 My prototype for Libmarpa was in Perl and the speedup in converting to C was a factor of 10.
05:55 Aria That doesn't surprise me.
05:55 jeffreykegler1 Libmarpa is worst case for a higher level language.
05:56 jeffreykegler1 Also, Libmarpa does its own memory management.
05:56 Aria Quite. It's the kind of thing where a compiler has to recognize these are bit ops.
05:56 Aria Oh fun!
05:56 jeffreykegler1 That's one reason it's so fast ...
05:56 Aria Yup.
05:56 Aria I'm trying to implement transitive_closure in javascript now. We'll see if I can make it fast.
05:57 jeffreykegler1 because one way of describing Earley processing is allocate memory, twiddle a bit or two, allocate more memory ...
05:57 jeffreykegler1 In terms of speed, an Earley parser is a memory allocator -- period.
05:58 jeffreykegler1 It's a large number of small memory allocations.
05:58 Aria Yeah. I saw that in Nearley. That's one of its bottlenecks.
05:59 jeffreykegler1 I solved it in Libmarpa with a custom memory allocator based on GNU's obstacks.
05:59 Aria Ah, excellent. Regional management.
06:00 jeffreykegler1 So, for example, if I used go, I would *not* use its memory allocator ...
06:00 jeffreykegler1 I'd grab large blocks in the most efficient way and allocate within them.
06:02 jeffreykegler1 From that point of view, C is ideal, because its major defect -- no memory management -- is, for Libmarpa, almost a plus.  It's certainly not a problem.
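[Editor's note: the obstack-style strategy described above -- grab large blocks up front, then serve a flood of tiny allocations by bumping a pointer, and free everything at once -- can even be mimicked in JavaScript with a preallocated typed array. A sketch of the idea only, not libmarpa's allocator; the record layout is invented for illustration:]

```javascript
// Bump-pointer arena in the spirit of GNU obstacks: one big slab, each
// "allocation" is a single index increment, and the whole arena is freed
// by resetting the pointer. Earley items -- many small fixed-size records
// -- fit this pattern well.
const FIELDS = 4; // hypothetical per-item record: [rule, dot, origin, next]

function makeArena(capacityItems) {
  return {
    slab: new Int32Array(capacityItems * FIELDS), // one large upfront block
    top: 0,                                       // bump pointer, in items
  };
}

function allocItem(arena, rule, dot, origin, next) {
  const handle = arena.top++;          // the allocation is just an increment
  const base = handle * FIELDS;
  arena.slab[base] = rule;
  arena.slab[base + 1] = dot;
  arena.slab[base + 2] = origin;
  arena.slab[base + 3] = next;
  return handle;                       // integer handle instead of a pointer
}

function itemDot(arena, handle) {
  return arena.slab[handle * FIELDS + 1];
}

function resetArena(arena) {
  arena.top = 0;                       // free every item at once
}

const arena = makeArena(1 << 16);
const it = allocItem(arena, /*rule*/ 2, /*dot*/ 1, /*origin*/ 0, /*next*/ -1);
console.log(itemDot(arena, it)); // 1
```

Storing records in a flat Int32Array also keeps the data monomorphic and garbage-collector-free, which matters under a GC-heavy runtime like Node.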
06:03 jeffreykegler1 I notice from Wikipedia that rust does "safe" memory management -- does it allow an application to work around it?
06:03 Aria Absolutely.
06:03 Aria You can implement your own allocator with no trouble. People are writing OS kernels using Rust.
06:04 Aria Their notion of safety is different than most languages.
06:04 Aria (mostly having to do with pointer and memory ownership)
06:05 Aria As far as I can tell, Rust is C done right for the modern era.
06:05 jeffreykegler1 Wikipedia says no null or dangling pointers allowed in rust.
06:05 Aria Correct.
06:05 jeffreykegler1 Not sure how I could write a memory allocator under those restrictions.
06:06 Aria https://github.com/huonw/rust-malloc/blob/master/malloc.rs
06:06 Aria unsafe { do magic }
06:07 jeffreykegler1 There's an "unsafe" keyword?
06:07 jeffreykegler1 And it does what it sounds like it does?
06:07 Aria Yep.
06:07 Aria Lets you munge into memory space a little bit.
06:07 jeffreykegler1 And have the best of both worlds.
06:07 Aria Rust isn't designed to tie your hands, just make it clear when you're doing something dicey.
06:07 Aria Yep.
06:08 jeffreykegler1 So, OK, rust would be a 3rd alternative to golang and C.
06:09 * jeffreykegler1 doesn't mention C++ because he's trying to be polite
06:10 Aria Oh, good. I'm happy to be in polite company.
06:10 Aria ;-)
06:11 jeffreykegler1 Actually, Peter Stuifzand is working on some kind of Libmarpa interface using C++
06:11 jeffreykegler1 And it does look nice.  Maybe he's going to surprise us with something really cool.
06:11 Aria Excellent.
06:12 Aria I find that C programmers and Lisp programmers can write excellent C++
06:12 Aria Often not so much the C++ programmers.
06:12 jeffreykegler1 Peter is in a way responsible for Marpa's current interface -- I had a non-DSL interface ...
06:12 jeffreykegler1 and he wrote a mini-DSL, that convinced me that a DSL was the way to go as a Marpa interface.
06:13 Aria Oh interesting.
06:13 Aria I keep hearing from Rebecca that she's not used the DSL much.
06:13 Aria Not sure if that's inertia or some dislike.
06:13 jeffreykegler1 Since Marpa is all about DSLs, you'd think I could have made the move myself, but it took Peter's prodding.
06:14 Aria Heh, yeah.
06:14 jeffreykegler1 Ron Savage, another very, very early user also got to like the old NAIF.
06:14 Aria At least you didn't end up like ometa, where the entire thing is written in ometa. You have to have the precompiled version to precompile it.
06:15 Aria As much as I adore metacircular interpreters, it ends up being crazy if you actually want to integrate it.
06:15 jeffreykegler1 Actually the SLIF is written in the SLIF.
06:15 Aria Oh is it!
06:16 jeffreykegler1 The SLIF is self-compiling.
06:16 jeffreykegler1 The circular dependency is a bit daunting, but I've learned how to tame it.
06:19 Aria Yeah. That's the tough bit, keeping that under control.
06:20 Aria Without the bootstrapping or the main code rotting.
06:22 Aria Woot! Almost have transitive_closure working.
06:27 jeffreykegler joined #marpa
06:27 jeffreykegler https://metacpan.org/source/JKEGL/Marpa-R2-2.084000/lib/Marpa/R2/meta/metag.bnf
06:28 jeffreykegler Aria: I look forward to seeing the Javascript vs. C numbers on Warshall's
06:28 Aria Likewise! I'm not trying particularly hard right now, so I doubt it'll get there as I've written it.
06:28 Aria But I bet I can
06:29 jeffreykegler Gonna have to sign off, my Internet connection is really acting up.
06:30 jeffreykegler Above was the link for the SLIF grammar written in SLIF.  A grammar that describes itself in its own language.  Douglas Hofstadter, eat your heart out!
06:30 jeffreykegler Good night!
06:30 Aria Good night!
06:32 jdurand Jeffrey, suppose I have 50 lexemes, and instead of using them directly like rule ::= LEXEME, I proxy all of them like: lexeme ::= LEXEME followed by rule ::= lexeme - will that be a big performance penalty?
06:35 jeffreykegler joined #marpa
06:36 jeffreykegler jdurand: No, I don't think it will be.
07:11 Aria Naive transitive_closure in javascript, 4x4 matrix? 100,000 iterations takes 15ms.
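[Editor's note: for comparison with the JavaScript timing above, the transitive closure under discussion can be sketched in C via Warshall's algorithm, with each matrix row packed into a 32-bit word so the inner column loop collapses to a single bitwise OR -- the "bit ops" mentioned earlier. This is an illustrative sketch, not nearley's or Libmarpa's code.]

```c
#include <stdint.h>

/* Warshall's transitive closure on an adjacency matrix of up to 32
   vertices.  Each row is a uint32_t bitset: bit j of row[i] means
   there is an edge i -> j.  "row[i] |= row[k]" replaces the usual
   inner loop over columns with one bitwise OR. */
void transitive_closure(uint32_t *row, int n) {
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            if (row[i] & (UINT32_C(1) << k))  /* i reaches k ... */
                row[i] |= row[k];             /* ... so i reaches all k reaches */
}
```

For a 4x4 chain 0->1->2->3, `uint32_t m[4] = { 1u<<1, 1u<<2, 1u<<3, 0 };` followed by `transitive_closure(m, 4);` leaves row 0 with bits 1, 2, and 3 set.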
07:12 Aria And on that note, I go to bed.
08:50 ronsavage joined #marpa
09:11 ronsavage joined #marpa
16:10 sivoais joined #marpa
16:49 jeffreykegler joined #marpa
16:50 sivoais joined #marpa
17:24 jeffreykegler1 joined #marpa
17:27 jeffreykegler joined #marpa
17:32 jeffreykegler joined #marpa
17:37 jeffreykegler1 joined #marpa
18:36 jdurand joined #marpa
21:22 jdurand Jeffrey, in marpa_g_sequence_new(), it seems the separator Id must be the symbol Id of a rule, is that correct?
21:41 jdurand forget about it, I may have an error on my side - sorry -;
22:19 ronsavage joined #marpa
22:28 jdurand FYI bug on my side confirmed - Hello and good bye at the same time Ron, I'm going to bed -;
22:46 ronsavage jeffreykegler: I've just finished reading 'A Female Genius', a biography of Ada Lovelace. It's very informative.
22:47 ronsavage Another reason Joop didn't gain traction was - probably - jealousy.
22:53 ronsavage I "got to like the old NAIF"????? Better to say I got to the point of pretending I could understand it. Haha.
23:04 Aria Joop's not too hard to understand. Easier than A&H anyway!
23:04 Aria Harder to get ahold of the paper though
23:55 ronsavage I haven't read those papers. My BSc in Maths didn't prepare me for that. I studied Special Relativity and waves on the surfaces of stars. No - I didn't understand it.
23:55 ronsavage It was so difficult, the uni (Monash) set the pass mark for the degree at 14% (sic), so I'm told.
23:58 Aria Yow.
23:58 Aria Joop and A&H are both more set math than anything. But you kind of have to go looking for the set aspects to see what they're talking about.
