Perl 6 - the future is here, just unevenly distributed

IRC log for #marpa, 2014-03-13


All times shown according to UTC.

Time Nick Message
00:21 jeffreykegler joined #marpa
00:52 jeffreykegler jdurand: if my arithmetic is right, each character read is consuming slightly more than 270 bytes
00:54 jeffreykegler In this case, it's one Earley set per token, where 1 token == 1 character
00:57 jeffreykegler Under this model, a character is a fairly heavy-weight thing for Marpa -- each Earley set tracks all possible parses, a token is expanded into Unicode, data structures are created to memoize which regexes it matches, ...
00:57 jeffreykegler the Earley set has words for possible events, the memoized token value, etc., etc.
00:58 jeffreykegler plus overhead at the Perl level, all squeezed into less space than it would take to store 68 int's
00:59 jeffreykegler So if my arithmetic is right, this is Marpa doing what it does at a reasonable cost.
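The arithmetic above can be spelled out as a back-of-the-envelope sketch (the 270-byte figure is the one quoted in the log; the 4-byte size of a C int is an assumption about the platform):

```perl
use strict;
use warnings;

my $bytes_per_char = 270;    # observed cost per input character, as quoted above
my $int_size       = 4;      # assumed size of a C int, in bytes
printf "%.1f ints per character\n", $bytes_per_char / $int_size;    # 67.5, i.e. under 68 ints
```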
01:01 jeffreykegler All the behaviors you checked looked fine, and I checked how memory grew with increasing input lengths -- the growth is smoothly linear.
01:02 jeffreykegler Bottom line, structural (DOM-ish) approaches are much harder on the memory than the SAX-ish (lexical/regex) approaches ...
01:03 jeffreykegler and Marpa is firmly committed to the DOM-ish (structural) style, where the user wants to know the structure and is willing to pay the cost.
01:20 jeffreykegler On a related topic, Robin Smidsrød asked about the speed of Jean-Damien's new Marpa-powered XML parser, as compared to libxml.
01:21 jeffreykegler Robin asked on Google+, but I hope he won't mind if I initially reply here -- the topic is likely to come up again and I find the back logs here more convenient than finding old G+ exchanges.
01:21 jeffreykegler Anyway ....
01:22 jeffreykegler XML was designed down to the now-standard left parsers, and spec'ed down to them as well --
01:23 jeffreykegler Left parsers are bad at error reporting, for example, and reporting errors is, IIRC, literally in violation of the XML spec -- you're not allowed to do it if you are fully conforming.
01:24 jeffreykegler Having had the spec designed for them, the left parsers were then carefully and expertly crafted in highly optimized C code tailored to that spec.
01:25 jeffreykegler Playing this game, Marpa is never going to put better numbers up on the board.  But ...
01:25 jeffreykegler ... on the other hand ...
01:26 jeffreykegler that does not mean it makes no sense to play.  Marpa has a lot to offer.  For example, spec or no spec, XML authors will appreciate good error recovery and reporting,
01:26 jeffreykegler and there are extensibility and customization, in all of which Marpa has the advantage.
01:27 jeffreykegler So it becomes a matter of what price Marpa is making users pay for the extra power and flexibility.  Reasonable users will want that to be as low as possible ...
01:28 jeffreykegler and it's worth looking at where Marpa could improve its numbers.
03:36 jeffreykegler joined #marpa
06:31 jdurand Jeffrey, would it be possible to accept a lexeme_complete that goes out of the end of the input scalar?
06:32 jdurand I mean: beyond the end
06:34 jeffreykegler jdurand: yes, I think
06:35 jeffreykegler That is, yes, in the sense I think you mean the question
06:35 jeffreykegler If you look at the test suite ...
06:35 jeffreykegler I rescan several of the input strings
06:36 jeffreykegler because it's easier to do that than to construct the input I want.
06:36 jdurand Strange, because I am hitting "Bad length in slr->g1_lexeme_complete(): 5 at /usr/local/lib/perl/5.18.2/Marpa/R2/SLR.pm line 1596, <DATA> line 1."
06:37 jdurand What I do is the following:
06:37 jdurand I have my input string, and do "my $fake_input = ' '; my $pos = $recce->read(\$fake_input);"
06:38 jdurand and then I lexeme_alternative()/lexeme_complete() myself for the entire stream
06:38 jdurand For that I have defined prediction events on all G1 rules that are of the form G1 ::= lexeme
06:39 jdurand so that 100% of the lexemes are predicted, not by using a "pause before" on them, but by using a prediction event on their associated G1 rule
06:39 jdurand i.e. I prevent Marpa from reading anything. Instead it is my application that is reading and feeding.
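[A sketch of the kind of grammar setup jdurand describes -- each lexeme wrapped in its own G1 rule carrying a prediction event -- might look like the following SLIF fragment. The rule and event names are invented for illustration; the event syntax is Marpa::R2's named-event form:

```
top    ::= word+
word   ::= WORD
event 'word_expected' = predicted <word>
WORD ~ [\w]+
```

With an event like 'word_expected' firing whenever <word> is predicted, the application can take over lexing at exactly those points, rather than pausing on the lexeme itself.]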
06:41 jeffreykegler OK, now I see.
06:41 jeffreykegler The string for lexeme complete has to be completely inside the input string.
06:41 jeffreykegler Because it is potentially used for error messages.
06:42 jeffreykegler And this is the case even if the relationship of your tokens to the input string is very abstract ...
06:43 jdurand Hmmm...
06:43 jeffreykegler I didn't really follow your example
06:44 jdurand Ok. I have another question.
06:44 jeffreykegler Did I answer the first one?
06:45 jdurand Yes - I cannot use a fake input. This seems to be related only to error messages, which is an unfortunate constraint.
06:46 jeffreykegler Or rather it can be fake ...
06:46 jeffreykegler but part of the fakery has to make the lengths work out.
06:46 jdurand You could remove this constraint by saying "Oops, cannot find where the problem is" if the position of the problem is not reachable using the input scalar given to read()
06:47 jdurand I have another question, though.
06:47 jeffreykegler OK, we can move on if you'd like
06:47 jdurand Is it legal to do numerous lexeme_read() at the same position?
06:47 jdurand I mean lexeme_complete, really, i.e.:
06:48 jeffreykegler Yes
06:48 jdurand Ah ah...
06:48 jdurand So this means that a possible bypass to my first problem is:
06:48 jeffreykegler And that was the question I thought I was answering initially
06:48 jdurand Ah ok, ....
06:48 * jeffreykegler thinks jdurand has guessed the answer
06:49 jdurand This means that I can use a fake input provided that:
06:49 jdurand 1. at the position where I do the lexeme_complete
06:49 jdurand 2. the fake input contains the value of the lexeme I am putting
06:51 jeffreykegler The "string value" or at least something you would not mind seeing echoed back in an error message
06:51 jeffreykegler Which if you don't use it in the semantics can be anything at all
06:51 jeffreykegler so long as it is long enough
06:52 jeffreykegler which you can make work out by repeatedly re-starting at location 0
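[Putting the pieces of this exchange together, a minimal Perl sketch of the fake-input workaround might look like the following. This assumes Marpa::R2 is installed; the grammar and token names are invented for illustration. The key point is that every lexeme_complete() span stays inside the fake input, here by repeatedly re-starting at location 0:

```perl
use strict;
use warnings;
use Marpa::R2;    # assumed installed

# Toy grammar; names are illustrative only.
my $dsl = <<'DSL';
:start ::= pairs
pairs  ::= pair+
pair   ::= K V
K ~ 'k'
V ~ 'v'
DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

# Fake input, long enough that every lexeme_complete() span fits inside it.
my $fake_input = ' ' x 1024;
$recce->read( \$fake_input, 0, 0 );    # read zero characters: external lexing only

# The application does its own reading; Marpa sees only the tokens we feed it.
for my $token_name (qw(K V K V)) {
    $recce->lexeme_alternative( $token_name, $token_name );
    $recce->lexeme_complete( 0, 1 );    # spans always lie inside $fake_input
}
```

As noted above, line_column() will be meaningless under this scheme, since positions refer to the fake input rather than the application's real one.]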
06:52 jdurand Ok, will try. I do not mind if Marpa's input scalar is not the reality, because in my implementation I do all the read()s myself, and always know where I am.
06:53 jeffreykegler I think I talk in the doc about fake inputs
06:53 jeffreykegler somewhere ...
06:54 jdurand I will check. One last question though: how reliable is line_column()? Do you use the input scalar given to read(), or the values given to lexeme_complete()?
06:54 jeffreykegler The only reason I require an input at all is that I want to avoid a lot of special-case-ing internally.
06:55 jeffreykegler IIRC the input scalar
06:55 jdurand I suppose this is the first answer, i.e. using a fake input also makes line_column() wrong (still, fine with me). No problem that you require an input, in fact. I just would like my application to work with a fake input to Marpa.
06:56 jeffreykegler It should work fine, and I do some fakery in the test suite.
06:57 jeffreykegler But line columns for error messages will be meaningless.
06:57 jdurand Ok. No problem with that. Many thanks. Now time for me to drive to my office for business work.
06:58 jeffreykegler Midnight my time, so good night!
06:58 jdurand Thx ;-)
13:10 LLamaRider joined #marpa
17:06 bohemias joined #marpa
17:59 jeffreykegler joined #marpa
23:23 yxhuvud joined #marpa
