
IRC log for #marpa, 2014-04-11


All times shown according to UTC.

Time Nick Message
06:49 ronsavage joined #marpa
09:42 seki joined #marpa
09:42 seki hi
09:43 seki in case I need to parse some data in a loop against the same grammar
09:44 seki what is the canonical way to do it?
09:44 seki instantiate a new Marpa::R2::Scanless::R each time?
09:45 seki as it seems that there is no means to perform a new read() on the same recognizer
09:48 seki btw, thanks Ron for all the valuable info that you published (along with Jeffrey's work) to help with mastering Marpa, especially the article on conditional preservation of whitespace with a custom lexer
15:30 jeffreykegler joined #marpa
15:32 jeffreykegler seki: re http://irclog.perlgeek.de/marpa/2014-04-11#i_8572208 -- what you're looking for is series_restart(): https://metacpan.org/pod/distribution/Marpa-R2/pod/Scanless/R.pod#series_restart
15:40 seki jeffreykegler: thanks. I have only been discovering Marpa for a few days and I assumed that series_restart() was for getting an alternate parse from the same data in the case of an ambiguous grammar
15:41 seki or maybe I have missed a point in the doc
15:42 seki I am in the process of parsing a set of several files against the same grammar
15:42 jeffreykegler Repeated calls to value() produce the ambiguous parses of a "parse series"
15:44 jeffreykegler A parse series is another series of (possibly ambiguous) parses for the same input stream -- you can change the end point, though.
15:45 seki but is it possible to provide a new input stream? the doc states that read() is only allowed once per recognizer
15:46 seki or do you suggest that new data can be appended at the end of the current input stream?
15:46 jeffreykegler No -- for a new input stream you need to create a new recognizer object
15:46 jeffreykegler You can reuse the grammar object, however.
15:46 jeffreykegler IF all you're doing is appending data -- there's a trick ...
15:46 seki ok, that was the point I wanted to clarify. Thanks
15:47 jeffreykegler First read the entire input stream into a recognizer ...
15:47 jeffreykegler Then create a recognizer, setting the end point to the end of the shortest input stream
15:48 jeffreykegler Then call series_restart() with a different "end", pointing after the first section of "appended" data
15:49 jeffreykegler And so on, until you've finished all the "append" sections.
15:50 jeffreykegler I use this trick in the test suite to easily test inputs of various lengths -- I just $slr->read() the longest one, and then change the end points.
15:51 seki actually, I do not have an absolute need to keep the same recognizer,
15:51 seki I was just wondering how to do it when parsing multiple files with the same grammar
15:52 jeffreykegler If each of the different files is a prefix of any longer one, you can use series_restart()
15:52 jeffreykegler Keeping the same recognizer is more efficient, when it's possible
15:52 seki and creating one recognizer per file to process seems fine and surely less tricky
15:53 jeffreykegler If you're new, you might want your first attempt to create a new recognizer each time ...
15:53 jeffreykegler and then convert to series_restart(), if appropriate
15:54 seki in my case, each file is another instance of the same grammar, but they are unrelated to each other
15:55 jeffreykegler The term "recognizer" is traditional in the parsing literature -- it means "to recognizer an input" -- so basically new input means new recognizer
15:56 jeffreykegler * "to recognizer an input" -> "to recognize an input"
15:57 seki also, I have seen some example code, e.g. on Stack Overflow, related to the pre-R2 Marpa
15:57 seki (what is now the NAIF interface, if I understood correctly) mentioning grammar precompilation,
15:58 seki is there anything particular to do in a loop process, or are G->new(), R->new(), R->read() and R->value() necessary and sufficient?
15:59 jeffreykegler seki: grammar precompilation still has to be done at the lower level, but in the SLIF that's one of the details that is hidden from the user.
16:00 jeffreykegler The idea in the NAIF was that you add rules, symbols, etc., etc., but you needed some way to tell Marpa you were done adding things, and it could do what was needed to turn all the rules and symbols into a grammar in useful form.
16:01 jeffreykegler In the SLIF, you specify everything all at once in a file, so the SLIF just silently makes sure precompilation happens.
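
The pattern discussed above (compile the grammar once, then create a fresh recognizer for each unrelated input) can be sketched roughly as follows. This is a minimal illustration rather than code from the conversation: the toy "key = number" grammar, the sample inputs, and the helper name parse_each are all assumptions made for the example.

```perl
use strict;
use warnings;
use Marpa::R2;

# Toy SLIF grammar, assumed for illustration: one "key = number" record.
my $dsl = <<'END_OF_DSL';
:default ::= action => [values]
lexeme default = latm => 1
record ::= key '=' number
key ~ [a-z]+
number ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

# Build the grammar object once; the SLIF handles precompilation itself.
my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Hypothetical helper: parse each input with its own recognizer,
# all of them sharing the same grammar object.
sub parse_each {
    my ( $shared_grammar, @inputs ) = @_;
    my @results;
    for my $input (@inputs) {
        my $recce =
            Marpa::R2::Scanless::R->new( { grammar => $shared_grammar } );
        $recce->read( \$input );
        my $value_ref = $recce->value();
        die "No parse for input: $input\n" if not defined $value_ref;
        push @results, ${$value_ref};
    }
    return @results;
}

for my $result ( parse_each( $grammar, 'width = 42', 'height = 7' ) ) {
    print join( ' ', @{$result} ), "\n";
}
```

For real files, each file's contents would be read into a string first and passed in the same way; only the recognizer is recreated per input.
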
16:01 seki yes I am kinda new to Marpa (and recently initiated to Perl too :), but I have been "playing" with parsers for some time,
16:02 seki being more accustomed to plain old lex/yacc or antlr but always with a separate lexer / parser generally speaking
16:02 jeffreykegler Interesting.  Most folks are the opposite these days -- you can become very much an expert programmer, and not know any parsing theory.
16:02 seki so I am used to talking in terms of "tokens"
16:03 jeffreykegler I also was accustomed to a separate lexer/parser, having learned the craft in the 1970's
16:04 jeffreykegler The SLIF abstracts from that because modern programmers are not used to two-phase parsing ...
16:04 jeffreykegler and I got a lot of snark about it, as if it were some perversity I had invented.
16:05 jeffreykegler People gave me lines like: "You mean I need to parse in order to parse?" ...
16:06 jeffreykegler Because these days lexing is often called parsing ...
16:06 jeffreykegler so much so that many programmers are unaware that the parsing problem, in the traditional sense, even exists.
16:07 jeffreykegler It's kind of like in Orwell's 1984, where Big Brother removes words from the dictionary, knowing it will stop people from thinking in certain ways.
16:09 jeffreykegler For much of the profession (even, I emphasize, some quite expert members of it), the word "parsing" was, not actually removed, but redefined to mean the same thing as lexing ...
16:09 jeffreykegler So there was no longer a word that meant "find the *structure* of a stream of symbols"
16:10 jeffreykegler And I found in the initial reception of Marpa, that my problem was not so much people were rejecting my solution, as that they did not even realize that the problem I was trying to solve existed.
16:11 jeffreykegler I was going to have to raise awareness of the existence of the problem, if I was going to encounter actual rejection of my solution to it.
16:13 seki I have been using programming languages for decades (*sigh*) but only recently dived deeply into the dark side of compilation
16:13 seki out of personal interest but also with the idea of improving some data processing at work
16:13 seki so I met Aho the dragon keeper and also Dick Grune, Appel and several others like Chuck Moore or McCarthy to get some
16:13 seki alternative points of view on parsing and compilation
16:13 seki I feel ok with separate lexing/parsing and btw, I have just one worry concerning the lexeme processing by Marpa
16:14 seki I find the syntax quite unfamiliar, like the way of expressing a <lexeme>?
16:14 jeffreykegler You've actually met Aho and Grune in person?
16:14 seki (no, no, only through their publications :)
16:15 seki or the way that a lexeme with + or * needs to be alone in its own grammar rule
16:16 jeffreykegler Some of the reason for the SLIF's syntax limits is attaching the semantics.
16:17 seki I find it less expressive than many other parsers I have experimented with recently
16:17 jeffreykegler If you have xyz ::= abc* def ghi+, where do you put the actions?
16:18 jeffreykegler In the lexer, actions are not an issue, but I keep lexer rules and G1 rule syntax similar for reasons of orthogonality.
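
One way to read the question about xyz ::= abc* def ghi+ is that giving each quantified sequence its own rule leaves an unambiguous place to attach an action. The sketch below illustrates that reading under stated assumptions: the single-character lexemes, the action names count_items and make_xyz, the My_Actions package, and the helper parse_xyz are all invented for the example and are not taken from the conversation.

```perl
use strict;
use warnings;
use Marpa::R2;

# Each sequence sits alone in its own rule, so each has its own action.
my $dsl = <<'END_OF_DSL';
lexeme default = latm => 1
xyz     ::= abc_seq def ghi_seq action => make_xyz
abc_seq ::= abc*                action => count_items
ghi_seq ::= ghi+                action => count_items
abc ~ 'a'
def ~ 'd'
ghi ~ 'g'
END_OF_DSL

{
    package My_Actions;

    # Sequence action: receives the per-parse object, then one value per item.
    sub count_items {
        my ( undef, @items ) = @_;
        return scalar @items;
    }

    # Top-level action: combines the results of the two sequence rules.
    sub make_xyz {
        my ( undef, $abc_count, $def, $ghi_count ) = @_;
        return { abc => $abc_count, def => $def, ghi => $ghi_count };
    }
}

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Hypothetical helper: parse one string and return the value built by make_xyz.
sub parse_xyz {
    my ($input) = @_;
    my $recce = Marpa::R2::Scanless::R->new(
        { grammar => $grammar, semantics_package => 'My_Actions' } );
    $recce->read( \$input );
    my $value_ref = $recce->value();
    die "No parse for input: $input\n" if not defined $value_ref;
    return ${$value_ref};
}

my $result = parse_xyz('aadgg');
printf "abc items: %d, def: %s, ghi items: %d\n",
    $result->{abc}, $result->{def}, $result->{ghi};
```

The sample input deliberately contains at least one abc item, so the sketch does not depend on how an empty abc_seq would be evaluated.
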
16:18 jeffreykegler Is there another parser generator you'd recomment I look at?
16:18 jeffreykegler * recomment -> recommend
16:19 jeffreykegler Also, I very much encourage people to take me out of the interface business, by writing their own SLIF ...
16:19 seki I agree about the actions, but for lexemes, it feels limited (with all respect),
16:19 seki for example I have just written this rule today to get a date literal:
16:19 seki <format date>~ [\d][\d][\d][\d][\d][\d][\d][\d]
16:19 jeffreykegler either as a layer on top, or more efficiently ...
16:20 seki while i am used to \d{8}
16:20 jeffreykegler using the "thin" interface of Marpa
16:20 jeffreykegler Oh, yes, numerical ranges
16:21 jeffreykegler That's something it would be useful to add -- the holdup with that ...
16:22 jeffreykegler Is that I need to plot how to best support it at the low C-language level.  Now you may ask, why is that an issue?  Why don't I just hack it into the SLIF at a high level and solve the low-level problem later?
16:22 seki well, i guess that I must not have much to teach you, but if you could allow a bit more of usual regular expressions syntax in the lexer rules i guess it would be even more convenient
16:23 jeffreykegler And the problem is that potentially I am painting the SLIF into a corner -- any quick first implementation now has to be supported and a later better one may have to work around it for backward compatibility, or not be possible at all.
16:25 jeffreykegler You'll note here this is a problem only I have -- someone else writing an alternative interface to Marpa can add all this stuff, and I very much wish interfaces like that will emerge
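
Until counted repetition is available, one possible workaround for the date-literal rule quoted earlier is to generate the repeated character class in Perl before the grammar source is handed to Marpa. The sketch below only builds the rule text with Perl's x repetition operator; the helper name repeat_class is hypothetical, and this is a layer on top of the SLIF, not a Marpa feature.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a counted repetition such as \d{8}
# into the explicit form [\d][\d]...[\d] that the rule above spells out.
sub repeat_class {
    my ( $class, $count ) = @_;
    return $class x $count;
}

# Rule text that could be interpolated into a SLIF grammar source string.
my $rule = '<format date> ~ ' . repeat_class( '[\d]', 8 );
print "$rule\n";
```

The generated line would then be spliced into the grammar source string before it is passed to Marpa::R2::Scanless::G->new().
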
16:30 seki I will suggest that to the Perl nerd I am working with, who recently taught me Perl and with whom I am discovering Marpa
16:31 seki Marpa let me write a parser for some simple source files today in only 2 hours
16:32 seki looking at all I still have to do, I won't be finished with Marpa for some time
16:33 seki thanks for publishing and maintaining something so useful, and thanks for your time
16:34 seki i need to quit for now, à bientôt
16:34 jeffreykegler seki: You are quite welcome.  bye!
16:36 jeffreykegler Btw, I've recently noticed that some people are learning Perl in order to use Marpa -- this seems to be a bit of a trend
16:37 seki i must admit it is quite scary at first, especially the perl part ;)
23:14 ronsavage seki: Thanx for the compliment
23:27 jeffreykegler joined #marpa
