
IRC log for #marpa, 2014-10-03


All times shown according to UTC.

Time Nick Message
03:43 CQ_ joined #marpa
07:01 ronsavage joined #marpa
08:04 lwa joined #marpa
14:42 rns left #marpa
16:24 jeffreykegler joined #marpa
16:30 lucs joined #marpa
17:51 jdurand_ joined #marpa
17:52 jeffreykegler FYI: rns has been posting a lot of Marpa solutions as answers on Stackoverflow
17:53 jdurand_ Hello Jeffrey - I was wondering why you chose to put the lexeme length into lexeme_complete() in the SLIF interface, while in the API it is in marpa_r_alternative()
17:55 jeffreykegler The SLIF does not allow variable length and overlapping lexemes, only ambiguous ones, so lexeme span is always the same at any G1 location.
17:56 jeffreykegler Libmarpa allows much more freedom, and the alternative lexemes in an Earley can have different lengths, which is not allowed in the SLIF.
17:57 jeffreykegler A lot of the problem in developing the SLIF was deciding what Libmarpa features *not* to include.
17:58 jdurand_ Fine with me. Still, in the SLIF, do you use the character-per-earleme model or the token one?
17:58 jeffreykegler Token.
17:59 jdurand_ The fastest one -;
17:59 jeffreykegler Wrt overlapping lexemes, a possible use case is sandhi, as in Sanskrit, but nobody has explored this yet.
18:01 jdurand_ Btw, recently my generated code successfully parsed an XML document - I decided to write a very generic routine; its abstraction is different from the SLIF's read() and resume() - cf. https://github.com/jddurand/marpaXml/blob/master/src/internal/marpaWrapper.c#L1625
18:02 jdurand_ the idea was to abstract completely the reader, and the producer of lexeme value and length
18:03 jeffreykegler This XML thing has been a big project -- when do you think you might "unveil" it?
18:05 jdurand_ it is very big, much more than I thought. Parsing is now, a priori, nearly complete. What remains is the DOM interface. I could have stopped at this stage if I wanted to do only an AST. But for sure nobody will use it if there is no DOM/SAX/STAX stuff. Which is a pain.
18:05 jdurand_ I have the storage mechanism. In the end I abandoned AVL in favour of an embedded SQLite.
18:05 jdurand_ I have the parsing nearly 100% satisfactory
18:06 jdurand_ What remains is the DOM stuff, which is, a priori, less hard than the rest.
18:06 jeffreykegler I have just uploaded Marpa-R2 2.096000 to CPAN
18:07 jeffreykegler Its major feature is to move the improved SLIF parse event documentation into a stable, indexed release.
18:07 jeffreykegler Testing is appreciated!
18:08 jeffreykegler jdurand: so it still has a ways to go?
18:09 jdurand_ you mean
18:11 jdurand_ Sorry, I did not understand. If you mean this will be unveiled OOTD, I want it and this will happen: I am working on it every day
18:12 jdurand_ in my spare time, which has decreased to weekends and nights -;
18:14 jeffreykegler I was just wondering, but I expect it will be something of significance for us.
18:14 jeffreykegler Us == the Marpa community
18:16 jdurand_ I hope so - to make it very popular, I know what the last step will be: the JNI (Java Native Interface). This will be the last thing to do.
18:16 jdurand_ Ok, AFK - just follow my commits on GitHub if you like -; you will see that activity on it never stops...
18:17 jeffreykegler jdurand: yes, I have been following your commits.  Happy hacking!
18:44 jdurand_ joined #marpa
18:44 jdurand_ I remember you were once looking for long long support in libmarpa
18:44 jdurand_ was it for lexeme value and lexeme length ?
18:59 jeffreykegler jdurand: re http://irclog.perlgeek.de/marpa/2014-10-03#i_9452843 -- neither actually -- it was for support of Perl's version of Unicode, which allows 64-bit numbers to be encoded in a variant of UTF-8
19:00 jeffreykegler Since then I've decided to follow Lua's example, and leave Unicode to the layers above Marpa (and above Kollos when it happens)
19:01 jeffreykegler That is, they will stay 8-bit clean, but as far as including the tables and other apparatus needed for full Unicode support, that will not be done.
19:01 jeffreykegler Marpa::R2 will, of course, stay fully backward compatible.
19:02 jdurand_ joined #marpa
19:02 jdurand_ Re http://irclog.perlgeek.de/marpa/2014-10-03#i_9452907 - ok
19:03 jdurand_ would it be a major change to allow long long for lexemes? At least from API point of view this would be backward compatible.
19:03 jeffreykegler The 64-bit question should solve itself -- it's already getting hard to find a 32-bit machine.
19:04 jeffreykegler I discovered the hard way that "long long" is not widely enough supported, and had to yank it out when it caused failures -- I forget the specific platforms involved
19:04 jeffreykegler Are we talking about C code here?
19:05 jeffreykegler If so, you can simply have your own array, indexed by location, and include in it whatever per-lexeme data you want.
19:05 jdurand_ yes
19:05 jeffreykegler You can in Perl too, but it's a bit expensive.
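The side-array idea suggested above can be sketched roughly as follows. This is only an illustration, not code from Libmarpa or marpaXml: the names `lexeme_info`, `lexeme_table`, and the three helper functions are hypothetical, and `uint64_t` assumes a C99 `<stdint.h>`, which may not be available on every platform the discussion has in mind. The table is indexed by parse location, so the parser itself only ever sees an `int`, while the application keeps whatever wider per-lexeme data it needs.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-lexeme record kept by the application, not by the parser. */
typedef struct {
    uint64_t rowid;   /* e.g. a 64-bit ROWID from external storage */
    int      length;  /* lexeme length as the application sees it */
} lexeme_info;

/* Growable array indexed by parse location. Initialize with {0}. */
typedef struct {
    lexeme_info *items;
    size_t       capacity;
} lexeme_table;

/* Record `info` for the lexeme read at `location`.
   Returns 0 on success, -1 on a negative location or allocation failure. */
static int lexeme_table_set(lexeme_table *t, int location, lexeme_info info)
{
    if (location < 0) return -1;
    size_t idx = (size_t)location;
    if (idx >= t->capacity) {
        size_t new_cap = t->capacity ? t->capacity : 16;
        while (new_cap <= idx) new_cap *= 2;
        if (new_cap > SIZE_MAX / sizeof(lexeme_info)) return -1;
        lexeme_info *p = realloc(t->items, new_cap * sizeof *p);
        if (p == NULL) return -1;
        /* Zero the newly added tail so unset slots read as rowid 0. */
        memset(p + t->capacity, 0, (new_cap - t->capacity) * sizeof *p);
        t->items = p;
        t->capacity = new_cap;
    }
    t->items[idx] = info;
    return 0;
}

/* Look up the record for `location`; NULL if it is out of range. */
static const lexeme_info *lexeme_table_get(const lexeme_table *t, int location)
{
    if (location < 0 || (size_t)location >= t->capacity) return NULL;
    return &t->items[location];
}

/* Release the table's storage. */
static void lexeme_table_free(lexeme_table *t)
{
    free(t->items);
    t->items = NULL;
    t->capacity = 0;
}
```

A caller would typically call `lexeme_table_set()` right after handing a token to the recognizer, using the location at which that token was read as the index.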
19:06 jdurand_ Yes, I know - but it seemed so natural to me to use the ROWID of an external storage - and ROWID is usually mapped to an unsigned long long
19:07 jdurand_ and even when you say indexed by location, this is limiting the location to INT_MAX. Which is rather huge I know, but not so much
19:08 jdurand_ This was just a wish, not a real problem, please note
19:08 jeffreykegler My guess is that on a 32-bit machine, you'll run out of memory long before you worry about INT_MAX ...
19:08 jdurand_ Yes for memory. But if the storage is external, the answer is no.
19:09 jeffreykegler These ROWID's are for Marpa lexemes, right?
19:09 jdurand_ Yes -;
19:10 jeffreykegler A parse that big will almost certainly run out of memory long before there are that many lexemes.
19:11 jeffreykegler And any machine with enough memory is very likely to be 64-bit
19:11 jdurand_ Yes, if all lexeme numbers are consecutive. No if you use an external storage again. Typical use is to delete rows/update rows. Then the ROWID keeps incrementing even though the number of rows may barely grow.
19:12 jdurand_ Anyway, this is not a problem for me at the moment, and it is very likely that in the real world you are right. My point of view was theoretical.
19:12 jeffreykegler We're really talking Earley set ID's here, since lexemes don't really have numbers.
19:13 jdurand_ I implemented a check on that not-so-much-a-limitation at https://github.com/jddurand/marpaXml/blob/master/src/internal/grammar/xml_common.c#L71
19:13 jeffreykegler And the Earley set ID's are consecutive in the token model.
19:13 jdurand_ Ok, fine! Many thanks. I am sure this will be ok in real life -;
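The kind of range check discussed above might look something like this. It is a guess at the general shape of such a guard, not the code at the linked xml_common.c line; the function name `rowid_fits_int` is hypothetical, and `uint64_t` again assumes a C99 `<stdint.h>`. The idea is simply to refuse a 64-bit identifier that cannot be represented in an `int`-typed slot instead of silently truncating it.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: narrow a 64-bit ROWID to int.
   Returns 1 and stores the value in *out if it fits in an int,
   returns 0 and leaves *out untouched otherwise. */
static int rowid_fits_int(uint64_t rowid, int *out)
{
    if (rowid > (uint64_t)INT_MAX) return 0;
    *out = (int)rowid;
    return 1;
}
```

When the check fails, a caller could fall back to a separate lookup table keyed by location rather than passing the identifier through the parser.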
19:14 jeffreykegler Lunchtime! AFK
19:15 jdurand_ For Earley set IDs, I already have in mind a way to deal with enormous input data: one of the inner rules, the central one of XML, is (of course) a sequence, and I loop on recognizing it with successive recognizers, instead of using a single recognizer
19:15 jdurand_ Have a nice lunch
19:15 jdurand_ AFK as well
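The successive-recognizer idea can be illustrated with a toy model. Nothing here calls Libmarpa: `mock_recognizer` is a hypothetical stand-in that only counts locations, and `parse_in_chunks` is an invented name. The sketch assumes the token model, where each token read advances the location by one, and shows that starting a fresh recognizer per sequence item keeps every per-recognizer location small even when the total token count is large.

```c
#include <stdint.h>

/* Hypothetical stand-in for a recognizer: it only tracks its current location. */
typedef struct {
    int location;
} mock_recognizer;

/* Feed `n_items` sequence items of `tokens_per_item` tokens each,
   using a fresh mock recognizer for every item.
   Returns the total number of tokens read and stores the largest
   location reached by any single recognizer in *max_location. */
static uint64_t parse_in_chunks(uint64_t n_items, int tokens_per_item,
                                int *max_location)
{
    uint64_t total = 0;
    *max_location = 0;
    for (uint64_t i = 0; i < n_items; i++) {
        mock_recognizer r = { 0 };   /* new recognizer: location restarts at 0 */
        for (int t = 0; t < tokens_per_item; t++) {
            r.location++;            /* one token advances the location by one */
            total++;
        }
        if (r.location > *max_location) *max_location = r.location;
    }
    return total;
}
```

The 64-bit `total` plays the role of an application-side position in the whole input, while each recognizer's own counter never exceeds the size of one item.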
19:39 flaviu1 joined #marpa
20:31 flaviu11 joined #marpa
20:53 idiosyncrat joined #marpa
21:05 ronsavage joined #marpa
21:16 ronsavage Marpa::R2 V 2.096. Test statistics:
21:16 ronsavage Fails: 0. Files: 296. Modules: 6. Passes: 6. Tests: 296.
21:16 ronsavage Duration: 43 seconds
21:51 idiosyncrat ronsavage: Thanks!
