
IRC log for #marpa, 2016-06-19


All times shown according to UTC.

Time Nick Message
00:59 rgrinberg joined #marpa
01:27 ronsavage joined #marpa
01:48 ilbot3 joined #marpa
01:48 Topic for #marpa is now Start here: http://savage.net.au/Marpa.html - Pastebin: http://scsys.co.uk:8002/marpa - Jeffrey's Marpa site: http://jeffreykegler.github.io/Marpa-web-site/ - IRC log: http://irclog.perlgeek.de/marpa/today
02:55 dvxd PING
04:11 ronsavage DVXD: Ping
04:12 jdurand rns: thx
04:13 ronsavage dvxd: Re http://irclog.perlgeek.de/marpa/2016-06-18#i_12688145. I think of '::=' rules as grammar structure rules, where the RHS is composed of 2 types of tokens.
04:14 ronsavage Either the tokens are defined elsewhere on the LHS of a '::=' rule, or on the LHS of a '~' rule. But the RHSs of the '~' rules are different. They must be found in the input stream.
04:16 ronsavage So the LHS of a '~' rule must be found in the BNF on the RHS of a '::=' rule, and the RHS of a '~' rule must be a string, which includes charsets of course.
04:17 idiosyncrat_ Re ::= vs. ~
04:18 idiosyncrat_ This differentiates G1 rules (::=) from L0 rules (~), which terms I will use so that you don't think a chicken just walked across your screen.
04:18 idiosyncrat_ :-)
04:19 idiosyncrat_ Marpa uses a two-layered grammar (a scheme pioneered by Andrew Rodland) where G1 is the upper layer and L0 is the lower one.
04:19 idiosyncrat_ The G1 rules have a semantics.
04:19 idiosyncrat_ The L0 rules do *not* have a semantics.
04:21 idiosyncrat_ That's the Marpa technicalities.  Now, in fact, G1 and L0 often reflect the parsing/lexing distinction, which was well known in the 1970's and which users of yacc will know, because yacc was invented in the 1970's.
04:21 idiosyncrat_ When parsing theory was invented, everybody thought of it as occurring in those two phases.
04:23 idiosyncrat_ This distinction seems also basic to natural written languages -- that dividing a line up into sentences, words and punctuation, is a first phase ...
04:23 idiosyncrat_ and deciding the structure of the sentence is a 2nd phase.
04:25 idiosyncrat_ I deal with this more in https://metacpan.org/pod/distribution/Marpa-R2/pod/Scanless.pod
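The split between lexical rules and structural rules discussed above can be illustrated with a minimal Python sketch. It is not Marpa's SLIF and uses no Marpa API; the names `lex` and `parse_sum` and the toy grammar are assumptions made purely for illustration. The lexer plays the role of the lower (L0-like) layer and attaches no meaning to what it finds, while the parser plays the role of the upper (G1-like) layer and is the only place where a value is computed.

```python
import re

# Lower layer (stands in for L0): character-level patterns, no semantics.
# Hypothetical toy lexemes: integers and the '+' sign.
TOKEN_RE = re.compile(r"\s*(?:(?P<number>\d+)|(?P<plus>\+))")

def lex(text):
    """Split raw input into (symbol, literal) pairs."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"no lexeme at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

# Upper layer (stands in for G1): structure plus semantics.
# Toy structural rule:  sum ::= number | sum '+' number
def parse_sum(tokens):
    """Check the token sequence against the rule and compute its value."""
    if not tokens or tokens[0][0] != "number":
        raise ValueError("expected a number")
    value = int(tokens[0][1])
    rest = tokens[1:]
    while rest:
        if len(rest) < 2 or rest[0][0] != "plus" or rest[1][0] != "number":
            raise ValueError("expected '+' followed by a number")
        value += int(rest[1][1])
        rest = rest[2:]
    return value
```

For example, `parse_sum(lex("1 + 2+39"))` evaluates to 42: the lexer only reports which symbols it saw, and the addition happens entirely in the upper layer.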
04:28 kaare_ joined #marpa
04:57 idiosyncrat_ Good night!
09:10 dvxd idiosyncrat_: did you see the recent article about how human brains parse language at THREE levels... phoneme, phrase, and sentence level?  Perhaps there is some analogy to parsing?  Are we missing a third layer?
09:10 dvxd and the parsing is simultaneous
09:17 dvxd https://www.sciencedaily.com/releases/2016/06/160610140815.htm
12:35 jdurand rns: FYI I am finally implementing my own - almost finished already c.f. https://github.com/jddurand/c-genericHash - it is generic in the sense that it is not limited to a single datatype, everything can be hashed, hashing function and comparison functions are left to the user-space -;
12:35 jdurand AFK
14:08 kaare_ joined #marpa
14:55 rgrinberg joined #marpa
16:23 idiosyncrat_ joined #marpa
16:25 idiosyncrat_ The "phoneme layer" corresponds to the character set / codepoint issue in Marpa.
16:26 idiosyncrat_ A problem I have set myself to solve, perhaps in Marpa::R3, is how to deal with arbitrary codepoint systems.
16:26 idiosyncrat_ Marpa::R2 handles Perl's UTF-8 and ASCII.
16:27 idiosyncrat_ To work smoothly with Perl 6, Marpa::R3 will have to handle Perl 6's NFG as well.
16:34 idiosyncrat_ The above link was to an abstract -- here's one to the text http://www.honeylab.org/wp-content/uploads/christiansen_chater_BBS_2016.pdf
16:34 idiosyncrat_ It looks interesting but I'm afraid I won't be getting to it soon.
16:43 jdurand you might be interested in my tconv library at https://github.com/jddurand/c-tconv - it is an iconv-like API without the need to know in advance the charset -;
17:02 idiosyncrat_ The approach I have to take is to make both Kollos (formerly Perl/XS layer) agnostic wrt charset.
17:03 idiosyncrat_ That is, it will not only not know in advance, like Libmarpa it will *never* know what charset it is dealing with.
17:04 idiosyncrat_ The basic idea is very simple -- push all of that to a higher level.
17:05 idiosyncrat_ The only complication is how to do that efficiently.
17:06 jdurand I was thinking of the same thing, and imagined simple callbacks having upper layer context in an opaque pointer, and the symbol ID as parameter, something like isLexeme(const void *opaqueContext, int symbolId)
17:07 jdurand so that the position in the real input stream, and how to interpret it, is left to the callback
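The callback idea in the two messages above could look roughly like the following Python sketch. It mirrors the shape of the `isLexeme(opaqueContext, symbolId)` signature quoted in the log, but everything else is an assumption for illustration: the `InputContext` class, the literal table, and the `acceptable_symbols` helper are hypothetical and are not part of Libmarpa, Kollos, or any existing library.

```python
class InputContext:
    """Upper-layer state. The lower layer carries it around but never looks inside."""
    def __init__(self, text, literals):
        self.text = text          # the real input stream, in whatever form the upper layer likes
        self.pos = 0              # current position, owned by the upper layer
        self.literals = literals  # hypothetical table: symbol id -> literal string

def is_lexeme(opaque_context, symbol_id):
    """Upper-layer callback: does this symbol match at the current position?"""
    ctx = opaque_context
    return ctx.text.startswith(ctx.literals[symbol_id], ctx.pos)

def acceptable_symbols(expected_ids, callback, opaque_context):
    """Lower layer: knows only symbol ids, and asks the callback about each one."""
    return [sid for sid in expected_ids if callback(opaque_context, sid)]
```

With `ctx = InputContext("if x", {1: "if", 2: "else"})`, calling `acceptable_symbols([1, 2], is_lexeme, ctx)` returns `[1]`. The lower layer never sees a character or an encoding, only the yes/no answers.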
17:18 idiosyncrat_ Right.  The trick is to minimize the number of callbacks.  Lua uses a scheme where the lower layer asks for a size, call it n, and then the upper layer replies with anywhere from 1 to n codepoints.
17:18 idiosyncrat_ That allows the lower layer to pick an efficient n, and the upper layer then makes things as efficient as possible.
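The size-request scheme described in the two messages above can be sketched in Python as follows. This is a sketch of the idea as stated in the log, not of Lua's actual reader API; the names `make_reader` and `consume_all`, and the convention that an empty reply means end of input, are assumptions added for illustration.

```python
def make_reader(text):
    """Upper layer: owns the input and hands out at most n codepoints per call."""
    pos = 0
    def read(n):
        nonlocal pos
        chunk = [ord(ch) for ch in text[pos:pos + n]]
        pos += len(chunk)
        return chunk  # assumption: an empty list signals end of input
    return read

def consume_all(read, n=4):
    """Lower layer: picks the request size n and counts how many callbacks it made."""
    codepoints, calls = [], 0
    while True:
        chunk = read(n)
        calls += 1
        if not chunk:
            return codepoints, calls
        codepoints.extend(chunk)
```

With an 11-codepoint input and `n=4`, the lower layer makes four calls (three that return data and one that returns the empty end marker) instead of one call per codepoint.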
17:20 idiosyncrat_ A question is how to implement things like boundary checks -- that is, am I at a word boundary?
17:20 idiosyncrat_ Only the upper layer can know the answer to this.
17:21 idiosyncrat_ But you don't necessarily want this "side channel" information for every codepoint.
17:22 idiosyncrat_ It may have a significant cost to compute, there may be many such checks, and they may be rarely used.
17:22 idiosyncrat_ On the other hand, a special callback for it could also be costly.
18:05 jdurand Why would you want to implement boundary checks, if only the upper layer knows? If the layer in the middle is to be able to talk about code points, IMHO a safe thing is to impose UTF-8 encoding on this middle layer. It is then up to the upper layer to answer you in terms of the encoding that you impose.
18:06 jdurand I.e. Lua layer (any charset) <-> your wrapper on marpa (UTF-8) <-> libmarpa (earlemes)
18:06 jdurand Just a thought -;
19:40 JPGainsborough joined #marpa
19:51 idiosyncrat_ I want someday to be able to trigger events based on whether or not they occur on a word boundary.
19:52 idiosyncrat_ This would allow alternative approaches to lexing
19:52 idiosyncrat_ AFK
20:30 idiosyncrat_ joined #marpa
20:30 idiosyncrat_ Just found this: http://act.yapc.eu/ye2016/talk/6752
20:31 idiosyncrat_ Cluj-Napoca 24-26 Aug, specific date and time to be announced.
20:31 idiosyncrat_ Be there or be square!
22:08 ronsavage joined #marpa
22:09 dvxd joined #marpa
23:17 idiosyncrat_ joined #marpa
