IRC log for #marpa, 2014-09-17

All times shown according to UTC.

Time Nick Message
01:47 daxim__ joined #marpa
01:51 ilbot3 joined #marpa
01:51 Topic for #marpa is now Start here: http://savage.net.au/Marpa.html - Pastebin: http://scsys.co.uk:8002/marpa - Jeffrey's Marpa site: http://jeffreykegler.github.io/Marpa-web-site/ - IRC log: http://irclog.perlgeek.de/marpa/today
01:52 jeffreykegler joined #marpa
01:54 sivoais joined #marpa
02:07 kkrev Is '\0' the recommended way to match the end of file in a grammar? seems to work.
02:08 kkrev btw, the memory blow up I was inquiring about went away after a system update and reboot. Maybe cosmic rays or something.
02:09 jeffreykegler kkrev: gremlins.  I knew it.
02:10 jeffreykegler Re EOF: since Marpa only directly knows about strings, there is no recommended way to deal with EOF.
02:11 jeffreykegler If you want an explicit EOF, it's up to you to make sure one is in the string, and up to you to make sure your grammar's idea of EOF is consistent.
02:11 jeffreykegler Here I'm talking about Marpa::R2.
02:11 jeffreykegler ... and Perl strings.
04:23 jeffreykegler joined #marpa
04:38 jeffreykegler I have just uploaded Marpa-R2 2.094000 to CPAN -- an indexed, stable release.
04:40 jeffreykegler Most important feature is a new very high-level interface, intended for beginners.
04:40 jeffreykegler Testing appreciated!
05:19 ronsavage joined #marpa
05:27 ronsavage Marpa::R2 V 2.094. Test statistics:
05:27 ronsavage Fails: 0. Files: 342. Modules: 7. Passes: 7. Tests: 342.
05:27 ronsavage Duration: 49 seconds
07:26 lwa joined #marpa
08:05 rns Found this one lurking about Lua — http://blog.reverberate.org/2014/06/beware-of-lua-finalizers-in-c-modules.html — hope it'll be useful.
08:58 ronsavage I've released Graph::Easy::Marpa V 2.04, MarpaX::Demo::StringParser V 1.08 and MarpaX::Languages::SVG::Parser V 1.05, to switch from action_object to semantics_package, and they're now all on github for the 1st time.
09:04 pczarn joined #marpa
13:53 rns Marpa-R2 2.094000 built, installed and runs ok for me under winxp and cygwin (5.18.1/cl/nmake and 5.14.2/gcc/make).
14:05 jeffreykegler joined #marpa
14:05 jeffreykegler ronsavage: re http://irclog.perlgeek.de/marpa/2014-09-17#i_9368665 -- Thanks!
14:27 rns left #marpa
14:52 rns joined #marpa
15:55 jeffreykegler joined #marpa
16:41 jeffreykegler rns: re http://irclog.perlgeek.de/marpa/2014-09-17#i_9368966 -- Thanks!
17:02 rns joined #marpa
17:03 rns joined #marpa
17:03 lucs joined #marpa
17:03 shadowpaste joined #marpa
17:03 hobbs joined #marpa
17:03 Aria joined #marpa
17:03 kkrev joined #marpa
17:03 sivoais joined #marpa
17:03 lwa joined #marpa
17:03 pczarn joined #marpa
17:03 jeffreykegler joined #marpa
17:48 kkrev Is there a way to modify http://scsys.co.uk:8002/423380 so that unstructured_text_body does not result in a list of chars? Or is it necessary to resort to pause/resume stuff?
18:05 rns kkrev: do you need unstructured_text_body as a string rather than the list of chars?
18:05 kkrev yes.
18:09 rns Can I ask why? <unstructured_text> is omitted (in parens) so <unstructured_text_body> won't be seen in the output anyway. Or is it just to avoid seeing that long list of chars?
18:12 kkrev The long list of chars is actually a problem, as some of these blocks can get quite big. The list seems to use a lot of RAM.
18:12 kkrev I need the data as a string, not as a list of chars.
18:14 shadowpaste "kkrev" at 50.190.12.218 pasted "confusion over resume lexeme_read stuff" (60 lines) at http://scsys.co.uk:8002/424130
18:15 kkrev I'm trying to handle the situation with the pause/resume stuff at the moment and failing to understand from the docs how the resume/lexeme_read stuff fits together.
18:15 jeffreykegler kkrev: For efficiency, you can define sequences of non-terminator chars -- something like <not end chars> ~ [^~*\n]
18:16 jeffreykegler These will be slurped up as strings, and won't interfere with finding the terminator.
18:19 kkrev I don't understand how <not end chars> can work. Won't it get tripped up on embedded '~' and '*'?
18:20 jeffreykegler The initial caret means "anything but"
18:20 jeffreykegler And I meant something like <not end chars> ~ [^~*\n]+
18:21 kkrev Yes, but as a character class it only matches one character.
18:21 kkrev Not the combination of them together that marks the terminal lexeme.
18:21 jeffreykegler <not end chars> ~ [^~*\n]+ would match a sequence
18:22 kkrev But "\n*~" would match and that's bad.
18:22 kkrev *wouldn't match
18:23 jeffreykegler rns: Did you get where I'm going with this?
18:25 jeffreykegler In addition to "unstructured_text_body ::= unstructured_text_char+", I'm suggesting another rule --
18:26 rns jeffreykegler: yes, I'll try it now.
18:26 jeffreykegler kkrev: Sorry -- cancel that last and start over :-)
18:27 jeffreykegler unstructured_text_body ::= unstructured_text_stuff+
18:28 jeffreykegler unstructured_text_stuff ::= unstructured_text_char | <not end string>
18:29 jeffreykegler So that the <not end string>'s are slurping up those characters which cannot possibly be part of terminators.
18:30 kkrev OK, thanks, I'll try that approach. <not end string> can be a string, not a character class, right?
18:30 jeffreykegler That is my idea, yes
18:32 kkrev I'll play around with that approach but regardless I am confused about how the resume stuff is supposed to work. I could just pause and use a regex to grab what I want -- that much is easy. I thought calling resume with the position at which to resume scanning was all that was necessary, but I don't even understand what it's doing.
18:36 rns kkrev: I think you need to call something like $re->lexeme_read( 'cwm_binary_data_terminator', $start, $span_length, $value ) // die; like in https://github.com/jeffreykegler/Marpa--R2/blob/master/cpan/t/sl_json.t#L360
18:36 jeffreykegler kkrev: yes, rns said what I was just about to.
18:37 rns and save $binary_data for later processing, probably with its span in the input.
18:38 kkrev thanks.
18:40 jeffreykegler kkrev: Also, if you can do trickery in slurping up whole strings, that probably will be faster than resume()/lexeme_read(), but either should work.
18:41 kkrev So if you use lexeme_read the $pos parameter to resume must be omitted?
18:49 jeffreykegler IIRC, the defaults make omitting $pos convenient.   But you can also specify it explicitly.
18:50 jeffreykegler To be 100%, the lexeme_read description is not exactly as clear as an Alpine lake on this topic.  I'll have to look at that.
18:51 jeffreykegler rns: I'm talking about this sentence in the lexeme_read description:
18:51 jeffreykegler "Current location in the input stream is moved to the place where read() paused or, if it never pauses, to $start+$length."
18:52 jeffreykegler I wonder if that's a cut and paste error.  If it's not, at a minimum I have to make it clearer -- if I can't understand the doc I can't expect anyone else to.
19:01 rns Well, I admit to not reading the doc much, just working from code examples.
19:03 jeffreykegler I think what that should be is:
19:05 jeffreykegler "If lexeme_read() pauses due to an event, current location in the input stream is moved to the place where lexeme_read() paused.  If lexeme_read() does not trigger an event, current location is moved to $start+$length."
19:07 jeffreykegler If you're doing lexeme_read(), events are probably a nuisance, since you're doing everything by hand anyway, but they occur, just the same as for read() and resume()
19:08 rns The doc says above "Completion named events can occur during the lexeme_read() method. Named events can be queried using the Scanless recognizer's events() method."
19:09 rns So yes, http://irclog.perlgeek.de/marpa/2014-09-17#i_9373957 makes more sense to me
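
The pause / lexeme_read() / resume() cycle discussed above can be sketched roughly as follows. This is a minimal illustration, not kkrev's actual grammar (the pastes are not reproduced in this log): the `record`, `<open tag>`, `<close tag>` and `blob` symbols, the `BEGIN`/`END` markers, the event name and the sample input are all hypothetical. It assumes a Marpa::R2 release that supports `latm` and named lexeme events, and it follows the loop shape of the sl_json.t test linked above.

```perl
use strict;
use warnings;
use Marpa::R2;

# Hypothetical grammar: a record is BEGIN, an externally scanned blob, END.
# "pause => before" makes read()/resume() stop at the start of <blob>.
my $dsl = <<'END_OF_DSL';
lexeme default = latm => 1
:default ::= action => ::array
record ::= <open tag> blob <close tag>
<open tag>  ~ 'BEGIN'
<close tag> ~ 'END'
:lexeme ~ blob pause => before event => 'before blob'
blob ~ [\s\S]
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source  => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

my $input  = "BEGIN" . "raw ~*\n payload" . "END";    # made-up sample data
my $length = length $input;

# read() returns when the pause triggers; resume() with no argument
# continues from the current location, which lexeme_read() has advanced.
for (
    my $pos = $recce->read( \$input );
    $pos < $length;
    $pos = $recce->resume()
    )
{
    my ( $start, $span_length ) = $recce->pause_span();

    # Take over scanning by hand: grab everything up to the final END.
    pos($input) = $start;
    $input =~ /\G (.*?) (?= END \z)/xsgc or die "no terminator found";
    my $blob = $1;

    # Hand the whole span to Marpa as one <blob> lexeme with a string value.
    $recce->lexeme_read( 'blob', $start, length $blob, $blob )
        // die "lexeme_read() rejected the blob";
}

my $value_ref  = $recce->value() // die "no parse";
my $blob_value = ${$value_ref}->[1];    # second child of <record>
print "blob is one string of ", length($blob_value), " characters\n";
```

The point of the sketch is that the value passed to lexeme_read() arrives in the parse result as a single string, and that resume() is called without a position because lexeme_read() has already moved the current location past the span.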
19:10 jeffreykegler AFK.  Looks like I'll have some doc touch-ups to work on when I get back. :-)
19:12 rns BTW, the code underlying the doc (or vice versa) seems to be this — https://github.com/jeffreykegler/Marpa--R2/blob/master/cpan/t/sl_dyck.t#L70
19:22 rns kkrev: re http://irclog.perlgeek.de/marpa/2014-09-17#i_9373523 — if you s/unstructured_text_char ::= [\s\S]/unstructured_text_char ~ [\s\S]/, the list of lists becomes the list of chars that must use less RAM and perhaps will suffice for you needs.
19:23 rns s/you needs/your needs/
19:40 rns The difference in maxresident (/usr/bin/time) can be 2162688 (list of chars) to 3145728 (list of lists) for roughly 32k <unstructured_text_body> contents — seems to be an easy fix to save ~30% RAM for 32k strings.
20:18 shadowpaste "rns" at 77.120.243.111 pasted "kkrev's script based on jeffrey's <not end string> idea" (58 lines) at http://scsys.co.uk:8002/424162
20:19 rns This shaves off another 30% to 1062912 maxresident for 32k <unstructured_text_body> contents.
21:11 kkrev I hate to keep bugging you guys over and over, but I really don't get the lexeme_read thing from the docs or examples. I can't make the most trivial usage of my own work. I really need the binary data stuff I've been talking about as one string, not a list, and I assume the way to do that is the pause stuff. But I can't figure out even the basics of how that's supposed to work.
21:11 shadowpaste "kkrev" at 50.190.12.218 pasted "how the heck is lexeme_read supposed to be used?" (63 lines) at http://scsys.co.uk:8002/424175
21:13 kkrev I get the data I need just fine. That wasn't hard at all. I cannot figure out for the life of me how one properly tells marpa to go back to eating the stream.
21:20 rns kkrev: re http://irclog.perlgeek.de/marpa/2014-09-17#i_9374639 — with http://irclog.perlgeek.de/marpa/2014-09-17#i_9374432 you can get the string by join("",...)'ing the list of strings in the tree.
21:24 kkrev I am interested in how the pause stuff should work anyway. I don't see how joining the list in a sort of postprocessing step is superior. Regardless, I would like to understand how resume is supposed to work.
21:36 shadowpaste "rns" at 77.120.243.111 pasted "kkrev's http://irclog.perlgeek.de/marpa/2014-09-17#i_9374640 revised" (81 lines) at http://scsys.co.uk:8002/424176
21:38 rns kkrev: You can look into http://irclog.perlgeek.de/marpa/2014-09-17#i_9374797 — hope this helps.
21:39 rns kkrev: it works as far as lexeme_read() is concerned.
21:42 lwa Random style advice: don't do "my $data = substr($input, $pos)". Instead put the current regex match position where you want it "pos($input) = $pos", then anchor the regex match at that position: "$input =~ /\G .../". Why? "substr" returns a view on the whole rest of the string, and assignment actually creates a copy.
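
lwa's advice can be illustrated with plain Perl, no Marpa involved. This is a small sketch under made-up assumptions: the `$input` string, the `header:` prefix and the `;` terminator are hypothetical stand-ins for whatever a real parser would be scanning.

```perl
use strict;
use warnings;

my $input = "header:" . ( "x" x 20 ) . ";rest";
my $pos   = length "header:";    # hypothetical offset where scanning resumes

# Copying approach: assigning the substr() result copies the rest of the string.
my $data = substr $input, $pos;
my ($copied) = $data =~ /\A ([^;]*) ;/x;

# In-place approach: set the match position, then anchor the regex with \G.
pos($input) = $pos;
my $in_place;
if ( $input =~ /\G ([^;]*) ;/xgc ) {
    $in_place = $1;
}
my $next_pos = pos $input;       # offset just past the ';'

print "same result\n" if $copied eq $in_place;
print "next position: $next_pos\n";    # prints "next position: 28"
```

Both approaches extract the same text; the second avoids copying the tail of `$input` and leaves pos() pointing at the place where scanning should continue.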
21:44 rns It's getting late here, AFK.
21:44 rns left #marpa
21:46 kkrev much thanks to both of you.
21:47 jeffreykegler joined #marpa
22:10 shadowpaste Someone at 124.170.35.156 pasted "Using substr() and wanting someone to re-write to use pos($s) = $pos" (135 lines) at http://scsys.co.uk:8002/424179
22:11 ronsavage I'm intrigued by lwa's suggestion to use "pos($input) = $pos". I've never tried to assign to pos. I've posted a substr() gist http://scsys.co.uk:8002/424179. Can someone please re-write to assign to pos()? TIA.
22:19 lwa ronsavage: The pos() technique is not applicable here, as you're using an explicit length parameter for "substr". Therefore, no performance concerns arise. Using pos() is advisable whenever you're using *regexes* to take over parsing.
22:30 jeffreykegler I'm reading my docs re events, and realize they need improvement.  I'm moving that toward the top of my priorities.
23:00 ronsavage lwa: OK.
23:56 ronsavage kkrev: There is an article on using pause here: http://savage.net.au/Ron/html/A.New.Marpa-based.Parser.for.GraphViz.html
23:58 kkrev thanks