Perl 6 - the future is here, just unevenly distributed

IRC log for #marpa, 2014-12-25


All times shown according to UTC.

Time Nick Message
00:04 ronsavage If you're playing with Text::Balanced::Marpa, the test script is now run as: perl -Ilib scripts/samples.pl info. The previous name was test.pl, which was confusing.
00:05 ronsavage I tested asymmetric delims such as q|xxx| and it worked, but I felt uneasy about the code, so I deleted that part of it. Still, it's good to know it works.
00:26 jeffreykegler joined #marpa
00:27 jeffreykegler Actually, I don't think q|xxx| presents any more difficulty than more standard pairs like (), [], {}
01:06 ronsavage True, but there's the problem of which chars to escape when using multi-char delims. For <: and [:, I escape the 1st, but for q| I escaped the 2nd. I will revisit this issue. It's in the TODO in the docs.
01:06 ronsavage "[:" => "[%".
01:10 jeffreykegler Oh, I see
01:10 ronsavage When my gut feel tells me to stop, I stop. It may be that there is no problem really. I'm being cautious since I'm currently extending the code to allow the user to specify delims, and that affects various char classes. And I've made dying optional when a closing delim does not match the last open delim.
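
A minimal sketch of the general approach under discussion, assuming Marpa::R2 is installed. This is not Text::Balanced::Marpa's actual grammar -- the rules and delimiter set below are invented for illustration -- but it shows how a small SLIF grammar handles nested, balanced delimiters:

#!/usr/bin/env perl
# Illustrative only: a tiny SLIF grammar for nested, balanced delimiters.
# This is NOT the grammar Text::Balanced::Marpa actually uses; it assumes
# only that Marpa::R2 is installed.
use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_OF_DSL';
:default ::= action => [values]
lexeme default = latm => 1

content ::= item*
item    ::= text | group
group   ::= ('(') content (')')
          | ('[') content (']')
          | ('{') content ('}')

text    ~ [^()\[\]{}]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
my $input   = 'a [b {c} d] (e)';
$recce->read( \$input );
defined $recce->value() or die 'No parse';
print "Balanced\n";
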
01:25 ronsavage World Domination Next: Here's the plan. (1) Finish this module. This will allow the user to define all HTML tags as delims. (2) Run any web page thru Marpa::R2::HTML to get perfect HTML (is that true?). (3) Run that output thru this module to generate a DOM. Implications: Not only are all compilers going to be rewritten to use Marpa, but all browsers as well!
01:26 jeffreykegler Pitiful earthlings!  Resistance is useless!!
01:27 jeffreykegler Actually some other things are at work in HTML (escapes, etc.) besides the tags.  But those could be added as options.
01:28 jeffreykegler Or if they are left out, the resulting tool would still have some uses.
01:51 ronsavage Hehehe.
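
A hedged sketch of step (2) of the plan above, assuming Marpa::R2 is installed (Marpa::R2::HTML ships with it). The html() call, the ':TOP' handler key, and Marpa::R2::HTML::values() follow my reading of the Marpa::R2::HTML documentation and may need adjusting; the point is that html() takes a reference to possibly sloppy HTML plus per-element handlers, repairing the markup as it parses:

#!/usr/bin/env perl
# A hedged sketch, assuming Marpa::R2 is installed.  The handler keys and
# the Marpa::R2::HTML::values() call below follow my reading of the
# Marpa::R2::HTML docs and may need adjusting.
use strict;
use warnings;
use Marpa::R2::HTML qw(html);

# Deliberately sloppy HTML: the table cell and row are never closed.
my $sloppy_html = 'Text<table><tr><td>I am a cell</table> More Text';

# Per-element handlers: delete <table> elements entirely, and have the
# top-level handler reassemble the (repaired) document from its parts.
my $result_ref = html(
    \$sloppy_html,
    {
        table  => sub { return q{} },
        ':TOP' => sub { return \( join q{}, @{ Marpa::R2::HTML::values() } ) },
    }
);
print ${$result_ref}, "\n";    # expected: 'Text More Text'
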
04:37 ronsavage Sample output from Text::Balanced::Marpa: https://gist.github.com/ronsavage/24f2eb7fed8d4afd4821
05:41 ronsavage joined #marpa
07:52 jdurand joined #marpa
07:55 jdurand FYI https://news.ycombinator.com/item?id=8782716
10:06 lwa joined #marpa
14:59 koo6 joined #marpa
15:00 koo5 joined #marpa
15:11 jeffreykegler joined #marpa
15:12 jeffreykegler re http://irclog.perlgeek.de/marpa/2014-12-25#i_9851204 -- actually, as most of you know, the news about Earley parsing is even better than this.
15:14 jeffreykegler With Leo's speedup, Earley parsing is linear for every grammar that yacc can handle
15:16 jeffreykegler Loup is worried about implementation-complexity issues, but the difference between Earley-with-Leo and Earley-without is, in some practical cases, the difference between quadratic and linear, and that's significant.
15:18 jeffreykegler Practical speed consists of two things -- time complexity, which shows up in big-O terms, and the so-called "constant", which is what's left over, and which for Earley's was significant in the 1970s, when the opinion about it that everybody has repeated for decades was formulated.
15:19 jeffreykegler Joop Leo took care of the time complexity issue for Earley's in 1991 and ...
15:19 jeffreykegler the hardware engineers have gradually consigned any issues with the "constant" to total oblivion.
15:21 jeffreykegler http://jeffreykegler.github.io/Ocean-of-Awareness-blog/individual/2013/04/fast_enough.html
15:22 jeffreykegler In the above post I look at the issue of the "constant" in great detail.
15:23 jeffreykegler To use Earley's algorithm in its first decades, yes, you had to be prepared to pay a price in speed.
15:23 jeffreykegler No more.
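
A minimal sketch of the claim above, assuming Marpa::R2 is installed: a right-recursive grammar, the textbook case where Earley parsing without Joop Leo's 1991 improvement goes quadratic. With Marpa's implementation of the Leo speedup, doubling the length of $input should roughly double, not quadruple, the parse time:

#!/usr/bin/env perl
# A minimal sketch, assuming Marpa::R2 is installed: right recursion is
# the classic case where Earley parsing without Leo's improvement goes
# quadratic; with the Leo speedup it stays linear.
use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_OF_DSL';
:default ::= action => ::undef
top ::= 'a' top
top ::= 'a'
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
my $input   = 'a' x 20_000;    # one long right-recursive chain
$recce->read( \$input );
print defined $recce->value() ? "Parse succeeded\n" : "No parse\n";
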
15:36 jeffreykegler I also note in that exchange the confusion between "ambiguous" and "unintended" -- the idea that by being powerful enough to parse classes of grammar which include ambiguous grammars, Earley's is producing unintended parses.
15:36 jeffreykegler Actually the opposite is the case.  PEG and yacc will not parse ambiguous grammars, but are excellent at producing unintended parses.
15:37 jeffreykegler Earley's handles ambiguous grammars, but every parse will be one specified by your BNF.
15:52 Aria This really bears repeating.
15:52 Aria Over and over and over.
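
A small sketch of the distinction above, assuming Marpa::R2 is installed: the grammar below is deliberately ambiguous, and repeated calls to the SLIF recognizer's value() method iterate over the parse trees. Every tree returned is a derivation licensed by the BNF -- ambiguous, but never unintended:

#!/usr/bin/env perl
# A small sketch, assuming Marpa::R2 is installed.  The grammar is
# deliberately ambiguous: '8-4-2' can group as (8-4)-2 or 8-(4-2).
# Repeated value() calls walk the parse trees; each is a derivation
# licensed by this BNF.
use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_OF_DSL';
:default ::= action => [values]
lexeme default = latm => 1

expr  ::= expr ('-') expr | digit
digit ~ [0-9]
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
my $input   = '8-4-2';
$recce->read( \$input );

my $parse_count = 0;
$parse_count++ while defined $recce->value();
print "Found $parse_count parses\n";    # expect 2: (8-4)-2 and 8-(4-2)
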
18:00 flaviu Has anyone tried to use Boehm GC with marpa?
18:02 jeffreykegler joined #marpa
18:04 jeffreykegler flaviu: Libmarpa has its own custom memory allocator, based on GNU obstacks, which takes advantage of the patterns of allocation in Marpa -- no resizes, and large objects which accumulate subobjects and are released all at once.  GC of any kind would slow it down.
18:05 flaviu That makes sense. I'd still like to try it to see what happens, but thanks.
18:06 jeffreykegler A typical allocation in Marpa is a comparison, a pointer increment, and an assignment.
18:07 jeffreykegler Deallocation does not even have to look at each object -- things allocated are on obstacks which are a few, large linked blocks -- deallocation consists of deallocating those blocks, so it's literally less than one instruction per allocated object.
18:09 flaviu I'm looking to see if the lifespan of the objects can be reduced
18:10 jeffreykegler To ease up on the space requirements?
18:10 flaviu Yes.
18:11 flaviu I'd like to benchmark the json parser in kollos so that I can put a number on the words "fast enough", but it runs out of memory on files larger than 10MB
18:12 jeffreykegler Is this json.c you mentioned the other day?
18:12 flaviu Yep
18:18 jeffreykegler GC might not help -- obstacks are cheap in terms of space -- the overhead is per large block, not per object, so most allocations have zero overhead.
18:19 jeffreykegler It's just that Marpa keeps a lot of information.
18:19 jeffreykegler You might want to experiment with subparses -- that is 3 layers --
18:19 jeffreykegler 1.) a lexer
18:19 jeffreykegler 2.) a fragment parser, which parses large fragments of JSON and produces a tree.
18:20 jeffreykegler 3.) a top level parser, whose pieces are fragments, and which gathers subtrees into a tree.
18:21 jeffreykegler This requires some strategy for dividing the grammar into fragments -- for efficiency, you want them large ...
18:21 jeffreykegler For space, you don't want to let them get too large.
18:23 jeffreykegler The reason that I don't think GC would help (it might be counter-productive) is that right now there's nothing that can be thrown away -- so you need to come up with a strategy where stuff can periodically be thrown away.
18:24 flaviu Adding in a GC is nearly trivial, so I'll start with seeing if that helps.
18:25 jeffreykegler By the way, a Marpa parser within a Marpa parser is a strategy pioneered by Andrew Rodland (hobbs) and it is the way that the SLIF does its lexing -- the SLIF lexes by repeatedly creating Marpa subgrammars, getting the lexeme, and throwing away the subgrammar.
18:26 jeffreykegler And space is one of the motives for throwing the subgrammars away.
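
A rough Perl sketch of that layering, assuming Marpa::R2 is installed. This is not the json.c parser being benchmarked, and the comma-separated record format is invented for the example; it only illustrates the memory pattern of giving each fragment its own short-lived recognizer and discarding it, Earley sets and all, once its value has been extracted:

#!/usr/bin/env perl
# A rough sketch of the 3-layer strategy, assuming Marpa::R2 is installed.
# NOT the json.c code under discussion; the record format is invented.
use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_OF_DSL';
:default ::= action => [values]
lexeme default = latm => 1

record ::= field+ separator => comma
comma  ~ ','
field  ~ [^,]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Layer 1: a cheap splitter that hands out one fragment per line.
my @fragments = split /\n/, "a,b,c\nd,e\nf,g,h,i\n";

# Layer 2: a fragment parser, created and thrown away per fragment.
my @subtrees;
for my $fragment (@fragments) {
    my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    $recce->read( \$fragment );
    my $value_ref = $recce->value();
    die "No parse for: $fragment" if not defined $value_ref;
    push @subtrees, ${$value_ref};
    # $recce goes out of scope here, so its memory is released before
    # the next fragment is parsed.
}

# Layer 3: the top level just gathers the subtrees into one structure.
printf "Gathered %d subtrees\n", scalar @subtrees;
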
18:26 jeffreykegler flaviu: re the GC.  Let us know!  I will be curious.
18:56 flaviu Well, it's certainly a lot slower, but that's expected
18:59 jeffreykegler Slower with the GC, you mean?
19:08 flaviu Yes, and it doesn't seem to decrease memory usage either.
19:09 jeffreykegler My guess was that it would increase it.
19:10 flaviu Possible, I haven't set up anything to measure memory usage besides waiting for the OOM killer.
19:10 jeffreykegler flaviu: I do have ideas for reducing Marpa's memory use for large files, if you (or anyone else) are interested in taking this on as a research project.
19:11 jeffreykegler I'll type them up if anyone is interested in pursuing this -- realistically, other things will probably keep me from tackling this.
19:11 flaviu I'm afraid I'm not familiar enough with computer science to take on any research project. Although if you had time, I would like to read them.
19:12 jeffreykegler AFK: I'm going out now.  Merry Christmas!
19:12 flaviu Merry Christmas!
21:48 ronsavage joined #marpa
22:27 ronsavage How to avoid learning XS: https://metacpan.org/pod/Inline::Module::Tutorial
22:43 jeffreykegler joined #marpa
22:46 ronsavage Hint: Swim (https://metacpan.org/release/Swim), the replacement for Markdown, calls for someone to fork MarpaX::Languages::CommonMark::AST (https://github.com/rns/MarpaX-Languages-CommonMark-AST).
22:48 Aria Heh. "the replacement for Markdown"
23:00 lwa ingy does some interesting things, but not everything he does should be considered “important” or “standard”. He has an interesting toolchain including Pegex, Swim, and Zilla::Dist, but all of those lack adoption outside his immediate projects.
23:19 jeffreykegler lwa: good points.
23:20 jeffreykegler Something I've considered important is precise characterization of the capabilities of the tools I create.
23:20 jeffreykegler What grammars do I parse?
23:20 jeffreykegler Which ones do I parse in linear time?
23:21 jeffreykegler Examples are important, but so are specifications, especially for people taking on big projects.
23:22 jeffreykegler This would be one of the (very few, IMHO) rational reasons that new projects continue to use recursive descent.
23:23 jeffreykegler Because, painful and costly as recursive descent is, if you're starting a large project, there are ways to assure yourself that, 15 hacker-years later, you won't discover that recursive descent just can't do the job.
23:25 jeffreykegler So this would be why when I announce "Hey everybody!  Great new parser!", folks continue to stick to the old tried-and-if-not-true-not-entirely-false-either methods.
23:33 ronsavage joined #marpa
23:34 ronsavage Of course, I almost wrote /a/ replacement, but then I thought I'd give everyone a tiny, git-wrapped (sic) provocation for Xmas. Hahaha.
23:37 jeffreykegler I like to follow the example of Jacques Cousteau, who said about Greenpeace: "I find that if I don't say nasty things about them, they don't say nasty things about me."
23:39 ronsavage And as for ingy's modules, I have never used any of them, except that many, many years ago I played with IO::All for a couple of days. And the fact that I am not at all attracted to his type of modules makes me wonder what's going on (in my mind, and in his code). His new inline stuff may well be a big hit, though.
23:41 jeffreykegler I only looked at it quickly, but I could not find any characterization of what C subset/variant it parses, and of precisely what it turns it into.
23:41 ronsavage In this case, the effort he's poured into Pegex would make abandoning it a huge decision, as it would be for any of us to abandon Marpa and adopt, say, Pegex.
23:43 ronsavage This can be said of more-or-less anyone, about anything they are committed to. Certainly it does not apply just to him or just to programmers. It is, after all, effectively the definition of conservatism.
23:43 jeffreykegler One way or another, abandoning Pegex is inevitable -- it does not scale.
23:44 ronsavage jeffreykegler: They target C++ too. See http://inline.ouistreet.com/node/y9ga.html
23:45 ronsavage One way to avoid abandoning $favourite_parser is to simply move on to another project where it is, or seems to be, applicable.
23:46 ronsavage AFK. But before I go: Anyone played with Const::Fast or Attribute::Constant? I'm about to choose 1.......
