
IRC log for #marpa, 2014-03-30


All times shown according to UTC.

Time Nick Message
01:00 ronsavage Marpa::R2 V 2.083001
01:00 ronsavage Counts: Tests: 542. Modules: 8. Passes: 8. Fails: 0
01:00 ronsavage Duration: 1 minute and 40 seconds
01:05 jeffreykegler ronsavage: Thanks!
01:44 jeffreykegler1 joined #marpa
01:59 LLamaRider joined #marpa
03:09 jeffreykegler1 jdurand: re "Earley item warning threshold exceeded" -- that means your grammar is very, very ambiguous.  I put a limit on the ambiguity of the parse, set to be so high that it usually will not trigger unless it seems clear that the parse is on its way to overflowing memory.  You can remove the limit, in which case Marpa will add Earley items until it gets an "out of memory" error.
03:13 jeffreykegler1 Anyway, for XML I don't think a very ambiguous grammar will be necessary.
04:02 jdurand joined #marpa
04:05 jdurand Re http://irclog.perlgeek.de/marpa/2014-03-29#i_8511945 - yes, I was using events - I can try your method -; but this means at least two passes over the same input, doesn't it?
04:07 jeffreykegler1 jdurand: yes at least two passes
04:07 jeffreykegler1 Several fast passes are better than one slow one
04:07 jeffreykegler1 To put that last statement in context ....
04:08 jdurand like in a chemical chain reaction, for sure -;
04:08 jeffreykegler1 we have to understand that in the Marpa context we're committed to a DOM-ish way of doing things.
04:09 jeffreykegler1 Anyway, I'm guessing that if you first do a census of tags
04:09 jeffreykegler1 ... and build a custom grammar with only those tags
04:09 jeffreykegler1 ... you'll eliminate much (most?) of the need for events
04:10 jeffreykegler1 This is how Marpa::R2::HTML works, so this is something I've tried ...
04:10 jeffreykegler1 and it is part of the Marpa::R2 test suite
04:10 jdurand ok, I'll see - I have another idea also - will look to Marpa::R2::HTML code - many thanks!
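The "census of tags, then a custom grammar" idea from 04:09 can be sketched roughly as follows. This is a minimal illustration in Python using the standard library's `html.parser`, not Marpa::R2::HTML's actual code; the names `TagCensus`, `census`, `grammar_rules` and the rule syntax they emit are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCensus(HTMLParser):
    """First pass: count the start tags in the input; no tree is built."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this handler.
        self.counts[tag] += 1


def census(document):
    """Run the fast first pass and return a Counter of tag names."""
    parser = TagCensus()
    parser.feed(document)
    parser.close()
    return parser.counts


def grammar_rules(counts):
    """Emit one rule per tag actually seen (hypothetical rule syntax)."""
    return [
        f"element_{tag} ::= start_{tag} contents end_{tag}"
        for tag in sorted(counts)
    ]
```

The second pass would then parse with a grammar that mentions only the tags the census found, which is the part that is supposed to reduce the need for events.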
04:11 jeffreykegler1 Btw, re HTML::Parser
04:11 jeffreykegler1 it's not strictly "Pure Perl" because the core lexing is done in C ...
04:11 jeffreykegler1 which does mean it is very fast ...
04:11 jeffreykegler1 and I am sure that it ports everywhere Marpa does ...
04:11 jeffreykegler1 because it is part of the test suite.
04:12 jeffreykegler1 HTML::Parser is not really a parser in the sense that it does not find the structure of the HTML, but it is a very high-quality lexer
04:14 jdurand Yep -; my writing "pure-perl" in quotes meant I knew it was not a correct statement! HTML::Parser is interesting because it also seems to be a way to identify missing tags, doesn't it?
04:14 jeffreykegler1 No
04:14 jeffreykegler1 HTML::Parser basically lexes tag by tag
04:15 jeffreykegler1 there may be some tricks HTML::Parser can do ...
04:15 jeffreykegler1 but in Marpa::R2::HTML finding missing tags is done at the Marpa level
04:15 jdurand Ah, ok - never mind - I did not plan to do that anyway
04:16 jeffreykegler1 The Ruby Slippers were invented for HTML/XML and the missing tags issue
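As a rough analogue of the missing-tags problem mentioned at 04:16, the toy below supplies end tags that the input omitted, based on what the "parser" currently expects. It is only a stack-based sketch in Python under simplifying assumptions, not Marpa's Ruby Slippers mechanism; the function name and the token format are hypothetical.

```python
def supply_missing_end_tags(tokens):
    """Return the token list with omitted end tags filled in.

    tokens: iterable of ("start", name), ("end", name) or ("text", value).
    """
    open_elements = []  # what the "parser" still expects to see closed
    out = []
    for kind, value in tokens:
        if kind == "start":
            open_elements.append(value)
        elif kind == "end":
            if value not in open_elements:
                continue  # stray end tag: dropped (toy policy)
            # The expected token is the end tag of the innermost open
            # element; if the input offers something else, invent it.
            while open_elements[-1] != value:
                out.append(("end", open_elements.pop()))
            open_elements.pop()
        out.append((kind, value))
    # At end of input, close whatever is still open.
    while open_elements:
        out.append(("end", open_elements.pop()))
    return out
```

For example, a `ul` end tag arriving while an `li` is still open causes a virtual `li` end tag to be emitted first.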
04:16 jdurand Ok. I was thinking of a third way, also. Keep events to a minimum and only on tags that are greedy
04:16 jeffreykegler1 Greedy?
04:17 jdurand the XML grammar is a pain: it contains tags that are basically equivalent to char*
04:17 jeffreykegler1 Can you give an example?
04:17 jdurand while others are fixed strings
04:19 jdurand Yes, one of them is CDATA, cf. rule [20] in http://www.w3.org/TR/2008/REC-xml-20081126/#dt-cdsection
04:19 jdurand a CDATA is a char* minus ']]>'
04:19 jdurand i.e. it is easy to get a CDATA lexeme that eats 100% of the input when that is not expected!
04:19 jeffreykegler1 I think that is one thing that HTML::Parser *does* do for you
04:20 jeffreykegler1 It is very good at the hard lexical problems.
04:20 jeffreykegler1 IIRC Marpa::R2::HTML handles CDATA by just trusting HTML::Parser to do the job
04:20 jeffreykegler1 .... which it does very well
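The "char* minus ']]>'" rule from 04:19 can be handled at the lexical level by searching for the first terminator instead of matching arbitrary characters greedily. The following Python sketch illustrates that idea only; it is not how HTML::Parser or Marpa::R2::HTML implement it, and `scan_cdata` is a hypothetical helper.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def scan_cdata(text, pos):
    """Scan a CDATA section that starts at text[pos].

    Returns (content, next_pos), or None if there is no complete
    CDATA section at that position.
    """
    if not text.startswith(CDATA_OPEN, pos):
        return None
    start = pos + len(CDATA_OPEN)
    # The content is everything up to the FIRST "]]>", so the lexeme
    # can never run past its own terminator.
    end = text.find(CDATA_CLOSE, start)
    if end == -1:
        return None  # unterminated: refuse rather than eat the rest
    return text[start:end], end + len(CDATA_CLOSE)
```

Because the scan stops at the first `]]>`, the lexeme is bounded by its terminator and cannot consume the rest of the input.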
04:21 jeffreykegler1 Note that HTML::Parser has a special XML mode
04:21 jdurand Ah, ok - so a closer look at HTML::Parser is needed - many thanks! Otherwise, Marpa performs very, very well, modulo the events processing time!
04:22 jeffreykegler1 As I've said, DOM will never beat SAX, and Marpa will never beat a custom-coded XML parser, but I believe it can come close
04:23 jeffreykegler1 and in the process offer a lot more flexibility and be a much smaller, more maintainable program.
04:24 jdurand I believe that as well. In fact the only thing I missed at the very beginning was a callback to get the next character in Marpa. Getting the next character in Marpa is, let's say, hardcoded in slr_g1_read
04:25 jdurand so the alternative is to pause before; when I say "before", I do not mean a "lexeme pause before" but really a pause at the "lexeme prediction" level
04:26 jeffreykegler1 If HTML::Parser is doing the lexing, this should no longer be an issue, if I understand correctly
04:29 jdurand Ok. Another question: if I remember correctly, the call to $slr->progress() is claimed to be very fast with its default parameters
04:29 jdurand is that (still) correct?
04:30 jeffreykegler1 jdurand: Did I say that? :-)
04:30 jdurand hmmm I believe so, when it is about the current location, and only the current location (!?) - let me google to confirm or refute!
04:30 jeffreykegler1 progress is one of the slower things you can do in Marpa
04:31 jdurand I am talking about current location only
04:31 jeffreykegler1 (I take it we are talking about progress reports.)
04:32 jdurand yes, progress reports. Ok, you say it is one of the slowest things - so I give up on this
04:32 jeffreykegler1 I try to use progress only in exceptional or rare situations
04:32 jeffreykegler1 It's OK for things that don't happen a lot in a parse, or for error reporting, or for tracing ...
04:33 jeffreykegler1 but in particular if you're chasing libXML, it's something to be used very lightly and not at all if you can find another way
04:35 jeffreykegler1 In particular, use of progress reports implies you're going to have a lot of logic in Perl, and in the race against libXML, that pits your Perl against its C ...
04:35 jeffreykegler1 and that's a hard race to win.
04:36 jeffreykegler1 Do you remember my benchmarks against Parse::RecDescent?
04:38 * jdurand is (re)reading http://blogs.perl.org/users/jeffrey_kegler/2013/06/marpa-v-parserecdescent-a-rematch.html
04:39 jeffreykegler1 Anyway, those turn out to be all about who is doing how much in C and how much in Perl.
04:40 jeffreykegler1 Marpa's big advantage is that it's based on a mathematical parse engine, while recursive descent is custom code ...
04:40 jdurand Yes - understood
04:40 jeffreykegler1 so Marpa's parse engine could be recoded in C (which I did) and the advantage drops into every parse, whereas ...
04:41 jeffreykegler1 if you recode a recursive descent parser in C, you've optimized just that one parser.
04:41 jeffreykegler1 Anyway, in the benchmarks ...
04:41 jeffreykegler1 most of it is about who is using how much Perl ... the more Perl you use the worse off you are
04:43 jeffreykegler1 So I did two benchmarks against Parse::RecDescent, and in the first one Marpa was 10 times faster, because it used a lot of C where PRD used Perl.
04:44 jeffreykegler1 Then I redid Marpa as Marpa::R2 and used even more C and reran the benchmark ...
04:44 jeffreykegler1 and clocked in at 100 to 1.
04:46 jdurand These are interesting and impressive numbers, proving Marpa's approach is great since a generic approach beats a dedicated one -;
04:47 jeffreykegler1 Right, because generic means you can recode in C and reap the benefits everywhere, but note ...
04:48 jeffreykegler1 this advantage is neutralized in a situation like libXML, where they've paid the price of coding in C for just that one application, so at best it's C code against C code.
04:50 jeffreykegler1 Another advantage of generic that remains, however, is flexibility -- custom code is by nature hack-ish, hard to enhance and hard to maintain.
04:51 jdurand Indeed, when I look at the libxml2 bug list, I am (badly) impressed, given this library has existed for years...
04:52 jeffreykegler1 I've never looked at the libxml code, but I've looked at a lot of code for big left parsers, and it gets very, very ugly ...
04:52 jeffreykegler1 and it becomes very easy to understand why bugs are slow to get fixed.
04:58 jeffreykegler1 jdurand: Anyway, hope this has helped
04:59 jdurand Yes! Many thanks again, and as usual!
20:02 LLamaRider joined #marpa
21:39 LLamaRider joined #marpa
21:50 ronsavage joined #marpa
