
IRC log for #marpa, 2014-12-07


All times shown according to UTC.

Time Nick Message
03:00 jeffreykegler joined #marpa
03:50 jeffreykegler FYI: In putting Kollos together, I've put it under the "MIT License" -- the one which Lua uses.
03:51 jeffreykegler I will be the copyright holder.
03:52 jeffreykegler Also, I am considering switching Libmarpa to a more liberal license, perhaps the MIT License.
03:53 jeffreykegler Note that some sublibraries in Libmarpa are derived from LGPL'ed libraries, and must remain under the LGPL -- this is true of the obstack code and of the AVL code.
05:23 jdurand joined #marpa
05:24 jdurand Jeffrey, aside using a pause predicted, is there an elegant way with the grammar to force a lexeme to have word boundaries ? Thx.
05:30 jeffreykegler joined #marpa
05:31 jeffreykegler jdurand: Usually the longest match discipline forces a word boundary.
05:31 jeffreykegler That is, if "week" and "weeknight" are both lexemes, "weeknight" wins because it goes all the way to a word boundary.
05:32 jdurand I'd say: xxx_this, and after the '_' there are two possibilities: 't' or 'this'
05:33 jeffreykegler jdurand: btw did you notice the SQL question on the G+ group?  Something about the output of your parser IIRC.
05:33 jdurand I'd like to not have 'this' matched because it is not at a word boundary
05:33 jeffreykegler And the longest match does not take care of that?
05:34 jdurand In latm mode, 'this' will take over but the grammar fails - I know grammar will be ok if 't' would have matched
05:34 jdurand (yep, will answer on G+ group)
05:35 jeffreykegler But you want word boundaries?  So you don't want 't' to match, do you?
05:36 jeffreykegler Because in the example, 'xxx_t' does not end at a word boundary
05:37 jdurand I want 't', 'h', 'i' and 's' to match, not 'this'
05:39 jeffreykegler OK, so you do *not* want word boundaries -- kind of the opposite.
05:40 jeffreykegler You want words which end at places which are not "word boundaries" in the usual sense -- that is, not the boundary between word and non-word characters
05:40 jdurand The subtlety is here: if a lexeme is defined as a string that has length > 1, I want it to match word boundaries. If its predicted length is 1, I do not mind
05:41 jdurand The difficulty is that it applies only to strings, not character classes. I.e. THIS ~ 'this' and T ~ 't'
05:41 jeffreykegler Sounds like command line options
05:42 jdurand Yep -; I'll do that programmatically with predicted events, nevermind - inspecting the grammar, is it possible to know if a lexeme is defined as a string and only a string ?
05:42 jeffreykegler Could you re-phrase that last question?
05:44 jdurand If a lexeme is defined as THIS ~ 'this', I want word boundaries. If a lexeme is defined as THIS ~ [tT] anything, I do not want word boundaries. If a lexeme is defined as T ~ 't' I do not mind about word boundaries
05:45 jdurand ... Do not headache with it - I'll solve that!
05:45 jeffreykegler I think that should be do-able with LATM, though I may not understand the question fully.
05:46 jeffreykegler That is, if the input is 'this', you should be able to match all 4 characters if and only if they are a lexeme acceptable by the grammar.
05:46 jeffreykegler If it is not acceptable, then you'll fall back onto shorter lexemes -- presumably your single characters.
05:46 jeffreykegler I may be overlooking a difficulty, however, my apologies if so.
05:47 jdurand No pb. Let's go back to LATM: Suppose that 'this' is matched. Lexer continues. But later on it is not ok. LATM will not roll back to try previous alternatives that are short than 'this' isn't it?
05:48 jdurand "shorter"
05:48 jeffreykegler "later on" in the sense it does not result in a good parse after more lexemes are read?
05:48 jdurand Exactly.
05:49 jeffreykegler Inside, Libmarpa allows variable length lexemes, so Marpa's ambiguity handling would solve this problem ...
05:49 jeffreykegler but that ability is not exposed in the SLIF.
05:50 jdurand Ok, no pb.
05:51 jeffreykegler You can also move the problem up out of the lexeme layer, and solve it in G1, which can handle ambiguities OK.
05:52 jdurand The fact that I am used with the direct C stuff sometimes confuses me - yes, I was thinking to move that into G1 as well. Good idea. I have two solutions from SLIF point of view, so. Paused before lexemes, or everything in G1. Thx -; !
05:52 jeffreykegler That is -- S ::= LONG A B C | SHORTS X Y Z ; LONG ::= T H I S ; SHORTS ::= T H I S ; T ~ 't' ; H ~ 'h' ; I ~ 'i' ; S ~ 's'
05:53 jeffreykegler <LONG> is no longer a lexeme, which may not be convenient, but I think it will work.
05:54 jeffreykegler You'll get a <LONG> or <SHORTS> depending on whether you see "A B C" or "X Y Z" afterwards
05:54 jdurand Yes, this will do it. And for convenience, nothing prevents me to have an action on LONG that return "this", an action on SHORTS that return "this" - faking the lexeme!
05:54 jeffreykegler Exactly!
05:55 jdurand He he great great - many thanks - this will do it. Since I have a grammar generated programmatically, this will be fun (and short -;) thing to do
05:55 jeffreykegler Great!  It's late here, so I'm going AFK.
05:56 jdurand ok - thx & cheers
05:56 jeffreykegler jdurand: btw I'm glad to see your impressive projects are now getting more attention.
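The difference between committing to the longest lexeme and keeping every alternative open, which is what this exchange turns on, can be sketched in a few lines of Python. This is not Marpa or SLIF code: `longest_match`, `all_tokenizations` and `LEXEMES` are hypothetical names for illustration, and the second function only mimics, by brute force, the choices an ambiguity-tolerant G1 grammar would be free to explore.

```python
def longest_match(text, lexemes):
    """Greedy lexer: at each position, commit to the longest matching lexeme."""
    tokens, pos = [], 0
    while pos < len(text):
        candidates = [lx for lx in lexemes if text.startswith(lx, pos)]
        if not candidates:
            raise ValueError(f"no lexeme matches at position {pos}")
        best = max(candidates, key=len)
        tokens.append(best)
        pos += len(best)
    return tokens


def all_tokenizations(text, lexemes):
    """Enumerate every way to split text into lexemes, long and short alike."""
    if not text:
        return [[]]
    results = []
    for lx in lexemes:
        if text.startswith(lx):
            for rest in all_tokenizations(text[len(lx):], lexemes):
                results.append([lx] + rest)
    return results


LEXEMES = ["this", "t", "h", "i", "s"]
print(longest_match("this", LEXEMES))      # ['this']
print(all_tokenizations("this", LEXEMES))  # [['this'], ['t', 'h', 'i', 's']]
```

The greedy lexer never reconsiders 'this' once it has matched, while the enumeration keeps both readings available for whatever follows to decide between.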
07:09 flaviu joined #marpa
08:52 lwa joined #marpa
10:04 pczarn joined #marpa
10:20 koo5 joined #marpa
11:14 lwa Does anyone know how to implement non-associative binary operators in the SLIF? Left associativity is the default, but there's no "assoc => none" adverb. I'd rather not encode all precedence levels manually.
11:15 lwa Motivation: Some operators have no agreed associativity, and I want to force the usage of parens in these cases.
11:16 lwa Example: fractions / division. "1 / 2 / 3" could mean "(1 / 2) / 3" (most languages do that) or "1 / (2 / 3)". How must the grammar "E ::= Number | '(' E ')' assoc => group || E '*' E | E '/' E || E '+' E | E '-' E" be modified to disable associativity for division, and have all "E"s refer to the next-tightest precedence level?
12:54 pczarn does marpa_g_new mutate configuration?
12:59 pczarn ah, I guess it could be mutated at any time to store the error code
13:02 koo6 joined #marpa
13:04 pczarn what happens when I unref the base grammar and keep using time objects?
13:20 koo6 segmentation fault?
13:30 pczarn not after a single unref, apparently. "Child objects “own” their parents, and when a child object is successfully created, the reference count of its parent object is automatically incremented to reflect this."
13:32 jdurand joined #marpa
13:35 shadowpaste "jdurand" at 217.168.150.38 pasted "typical marpa C calls" (881 lines) at http://fpaste.scsys.co.uk/451285
13:36 jdurand pczarn: if that can help, the pastebin upper show a typical working sequence of calls to marpa C library. Please note how the _new() and _unref() calls are ordered
13:39 jdurand Also: marpa library is thread-safe but not reentrant - this mean you can use any marpa object in any thread created by any thread, provided that the calls are to it sequential (i.e. as if running in a single thread)
13:39 pczarn thanks, I see, I'm testing my bindings with that "2 - 0 * 3 + 1" example
13:39 jdurand "the calls to it are"
13:40 jdurand Ok, this example is very good - and if you have a trace compatible with mine, your logic will be ok
13:40 jdurand will backtrack the IRC - AFK
13:41 pczarn I'm more interested in what could possibly happen and how it works internally
13:43 koo6 jdurand, about a week ago i spent about an hour unsuccessfully trying to pose a question that would make jeffrey say this, the "created by any thread, use in any thread"
13:44 pczarn what's the motivation for including reference counting in libmarpa?
13:46 koo6 actually, the answer was negative, but without any details. The doc is clear: "While Libmarpa can be used safely across multiple threads, a Libmarpa grammar cannot be. Further, a Libmarpa time object can only be used safely in the same thread as its base grammar. "
13:46 koo6 " This is because all time objects with the same base grammar share data from that base grammar. " doesnt make a lot of sense to me
13:47 koo6 as a reason
13:48 koo6 pczarn, for what language are you creating bindings?
13:48 pczarn currently Rust
13:50 pczarn are there more extensive examples that use the low-level interface?
13:50 koo6 not sure, i reimplemented the example in the docs for R2:Thin
13:51 koo6 as a first experiment
13:51 koo6 and thats the one you have, apparently
13:51 koo6 2 - 0 * 3 + 1
14:01 pczarn it would be very helpful to have a listing of all functions grouped by the way they report errors
14:06 koo6 cant argue with that
14:07 pczarn how does SLIF work? Do I need to understand and use L0?
14:08 koo6 i think the only offender is marpa_r_alternative, the rest signal error consistently with either -2 or null
14:09 koo6 or rather the only offender ive found so far
15:04 pczarn what's the lower bound on stack size, approximately? I'd like to preallocate it
17:27 jdurand joined #marpa
17:27 jeffreykegler joined #marpa
17:29 jeffreykegler lwa: re http://irclog.perlgeek.de/marpa/2014-12-07#i_9770150 -- I'll think about this a bit.
17:29 jeffreykegler One way is to not use the '||' priorities, and code them by hand.
17:30 jdurand Re http://irclog.perlgeek.de/marpa/2014-12-07 - My understanding of SLIF is that it is a grammar (named G1) that has natively embedded a subgrammar (named L0) - i.e. one can very well write a full working grammar with G1 only. L0 is a very convenient way to express tokens (aka lexemes) that obey a subgrammar. The value of a L0 is injected as a whole token within G1.
17:31 jeffreykegler If you don't force associativity, the grammar goes ambiguous, which you can catch and fail on.
17:32 jeffreykegler lwa: I think you can also write the BNF so that, if operators of a certain kind come more than two in a row, they must be parenthesized.
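The "code them by hand" suggestion can be illustrated with a small recursive-descent parser in Python. This is only a sketch of the idea, not SLIF: it assumes the hand-written levels Sum ::= Prod (('+'|'-') Prod)*, Prod ::= Quot ('*' Quot)*, Quot ::= Atom | Atom '/' Atom, Atom ::= Number | '(' Sum ')', which puts '/' on its own level (a simplification of lwa's grammar, where '*' and '/' share one). All function names are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(src):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"bad input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(src):
    """Parse into nested tuples; a second '/' on the same level is rejected."""
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "(":
            node = sum_()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def quot():
        # Quot ::= Atom | Atom '/' Atom  -- at most one '/' per level
        left = atom()
        if peek() == "/":
            take()
            left = ("/", left, atom())
            if peek() == "/":
                raise SyntaxError("chained '/' must be parenthesized")
        return left

    def prod():
        node = quot()
        while peek() == "*":  # '*' stays left-associative
            take()
            node = ("*", node, quot())
        return node

    def sum_():
        node = prod()
        while peek() in ("+", "-"):  # '+' and '-' stay left-associative
            op = take()
            node = (op, node, prod())
        return node

    tree = sum_()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("(1 / 2) / 3"))
```

With these rules "1 / 2 / 3" fails to parse, while "(1 / 2) / 3" and "1 / (2 / 3)" are both accepted.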
17:34 jeffreykegler pczarn: you're right.  The ref-counting scheme Libmarpa uses is very common -- it's exactly the same as Perl, Perl/XS use internally, for example.
17:35 jdurand Re http://irclog.perlgeek.de/marpa/2014-12-07#i_9770402 - the current documentation is organized in terms of functionality - this is fine with me - like the C man pages, getting an exhaustive list of errors requires looking at the documentation of each related function explicitly - I am personally used to this way of working -;
17:36 jeffreykegler One advantage of the scheme, is that the programmer does not need to keep track of the interdependencies of Marpa objects.  They increment the counts they need when they are created, and decrement them when they are destroyed.
17:37 jeffreykegler jdurand: re http://irclog.perlgeek.de/marpa/2014-12-07#i_9770336 and thread safety -- actually Marpa is *NOT* thread-safe in that sense.
17:38 jdurand Re http://irclog.perlgeek.de/marpa/2014-12-07#i_9770553 - you might want to take advantage of my generic stack approach - I use a very small value of 4 as initial stack size
17:38 jdurand jeffreykegler - ah, ok - please correct me then - thx & apologies
17:38 jeffreykegler For thread purposes, every Marpa object has a base grammar -- itself if it is a grammar, or the grammar from which it was created, directly or indirectly.  Call these objects the grammar's family.
17:39 jeffreykegler Every Marpa object in the same grammar family must be used in the same thread.  The reason is that they all mutate data in their base grammar.
17:39 jdurand Re http://irclog.perlgeek.de/marpa/2014-12-07#i_9770553 I meant https://github.com/jddurand/marpaWrapper/blob/master/src/genericStack.c
17:40 pczarn I have to check the stack size always before accessing it at arg_0 or arg_n, right? if so, my code is technically wrong
17:41 jeffreykegler koo6: re http://irclog.perlgeek.de/marpa/2014-12-07#i_9770347 -- yes I remember that conversation re threads.  There are major limits to Marpa's threading ability.
17:42 jeffreykegler Note that I don't think, so far, those limits have been obstacles to any application -- you can work around them by creating multiple grammar objects from the same grammar.
17:43 jdurand pczarn: Marpa will not require you to get something from the stack that it has not already told you to push in a prior step
17:43 jeffreykegler Kollos, by the way, will allow cloning grammars.
17:43 * jeffreykegler jeffrey is answering question from last night FIFO, by the way, for those trying to follow all of this. :-)
17:44 * jeffreykegler , in other words, is answering questions in the order posed.
17:45 jdurand jeffreykegler: indeed I was thinking that using multiple Marpa_Recognizer's in multiple threads based on a same Marpa_Grammar should not cause any problem - only the creation of a Marpa_Recognizer has to be synchronized with respect to this famous reference count isn't it
17:46 jeffreykegler pczarn: re http://irclog.perlgeek.de/marpa/2014-12-07#i_9770402 -- the closest thing to a "census" of how Libmarpa functions report errors is in Marpa::R2's THIF document: https://metacpan.org/pod/distribution/Marpa-R2/pod/Advanced/Thin.pod
17:47 jeffreykegler It describes a "general pattern", and then lists exceptions.  However, the THIF introduces some regularities, so this would not 100% reflect Libmarpa.
17:48 jeffreykegler If a census of Libmarpa's error reporting is contributed, I'll include it in the docs, and update it.
17:49 jeffreykegler jdurand: re http://irclog.perlgeek.de/marpa/2014-12-07#i_9771504 -- multiple recce's mutate the same data in their base grammars, and reference counting offers no protection against this whatsoever.
17:51 jeffreykegler Reference counting's only purpose is to keep objects from being destroyed while you still need them (and make sure they are destroyed when you don't).  Ref counting does nothing to enforce data integrity.
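The ownership rule quoted earlier from the Libmarpa docs (a child increments its parent's count on creation) and the lifetime-only role described here can be sketched as follows. This is a toy model in Python, not Libmarpa's implementation; the `RefCounted` class and its attributes are hypothetical, and the final decrement of the parent when a child is destroyed is an assumption of the sketch.

```python
class RefCounted:
    """Toy reference-counted object whose children 'own' their parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.refcount = 1       # the creator holds the first reference
        self.destroyed = False
        if parent is not None:
            parent.ref()        # creating a child bumps the parent's count

    def ref(self):
        self.refcount += 1
        return self

    def unref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True
            if self.parent is not None:
                self.parent.unref()   # release the reference the child held


grammar = RefCounted("grammar")
recce = RefCounted("recognizer", parent=grammar)

grammar.unref()                 # the application drops its own handle
print(grammar.destroyed)        # False: the recognizer still owns it
recce.unref()
print(grammar.destroyed)        # True: the last owner is gone
```

Nothing in this scheme stops two holders from mutating the shared object at the same time, which is the distinction being drawn above: the counts govern lifetime, not data integrity.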
17:53 jeffreykegler jdurand: re http://irclog.perlgeek.de/marpa/2014-12-07#i_9770323 -- thanks for pasting this, good idea.  A very useful way of illustrating some of the points being discussed.
17:54 jeffreykegler pczarn: re http://irclog.perlgeek.de/marpa/2014-12-07#i_9770553 -- don't know the lower limit on stack -- if you get numbers, let us know!
17:56 jeffreykegler By the way, Libmarpa is very careful about stack usage -- in particular it makes *no* use of function recursion, which when you think about what the application is, is pretty remarkable.
17:56 jeffreykegler When Libmarpa has to do the equivalent of a recursion, it creates its own stacks, so that the function call stack stays small.
17:57 jeffreykegler The motivation for this was threading -- at the time I started, some threading implementations were reported to restrict each thread's stack to a maximum, and *not* a very big one.
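The general technique of replacing function recursion with an explicitly managed stack looks roughly like this in Python. It is a generic illustration, not code from Libmarpa; `tree_sum` and the nested-list input are hypothetical.

```python
def tree_sum(tree):
    """Sum every integer in a nested list without any function recursion."""
    total = 0
    stack = [tree]              # heap-allocated stack instead of the call stack
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)  # defer the children instead of recursing
        else:
            total += node
    return total


# Nesting far deeper than Python's default recursion limit would allow
# for a recursive version of the same function.
deep = 1
for _ in range(100_000):
    deep = [deep, 1]
print(tree_sum(deep))           # 100001
```

The call stack depth stays constant however deep the input nests; only the explicit stack grows, and it lives on the heap.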
17:57 jdurand rns: Re http://irclog.perlgeek.de/marpa/2014-12-06#i_9767559 - implemented as per your link, without the 30 chars limitation - thx again
17:58 pczarn interesting
17:59 jeffreykegler Libmarpa is unlikely to "blow stack", even in highly restrictive environments -- so that's one way in which it *is* very thread-safe.
17:59 pczarn what kinds of tokens are accepted by L0? Any characters?
17:59 jeffreykegler Re limiting Libmarpa's stack usage -- I don't know if those restrictive threading environments are still in use, btw -- so all of that may be unnecessary at this point.
18:02 jeffreykegler pczarn: re http://irclog.perlgeek.de/marpa/2014-12-07#i_9771474 -- OK, I think I see the context -- we're not talking about the function call stack, but the stack you have to manage for evaluation.
18:02 pczarn we can talk about both
18:03 jeffreykegler pczarn: not at once! :-)  I'm too old to multi-process. :-)
18:03 jeffreykegler pczarn: re the Marpa evaluation stack -- that does not even exist unless you create it, and keeping it appropriately sized is 100% up to you.
18:04 jeffreykegler Btw, that's one Libmarpa feature I might do over if I were rewriting -- at the time I had in mind allowing users to do their own tricky implementations -- alternatives to stacks, etc., etc.
18:05 jeffreykegler There's such a thing as allowing too much flexibility, and I often do it -- I think having the application manage its own evaluation stack is perhaps one example of that fault.
18:07 jeffreykegler pczarn: re http://irclog.perlgeek.de/marpa/2014-12-07#i_9771556 -- the SLIF uses Perl strings, so it's Unicode.
18:09 jeffreykegler rns: I hope you'll backlog through this jungle. :-)  Thanks for the answer on Stackoverflow ...
18:09 jeffreykegler I think it best not to mention Kollos on Stackoverflow, or in dealing with new user questions ...
18:09 pczarn re flexibility: I can't claim I could replace AVL with anything better, but... I had in mind doing the ref-counting myself, if at all :)
18:11 jeffreykegler we're of course excited about Kollos, but it doesn't really exist at this point, and when it does it will be very, very alpha.  Kollos is not even bleeding-edge at this point.
18:12 pczarn and the Marpa evaluation stack might be fine
18:13 * jeffreykegler wonders if he caught up with the questions
18:13 jeffreykegler Am I caught up?
18:13 pczarn Does the SLIF have multiple symbols? how?
18:14 pczarn yes.
18:14 jeffreykegler "multiple symbols"?
18:15 pczarn hard for me to imagine parsing individual characters
18:16 jeffreykegler pczarn: Still not sure what you mean, but there are character classes.
18:17 pczarn ah, how much does it differ from regex?
18:17 jeffreykegler In fact, they're Perl's character classes.
18:18 jeffreykegler They are exactly the character classes in Perl's regexes -- I pass them up to Perl for matching ...
18:18 jeffreykegler that's expensive, but I deal with that by memoizing the answers.
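Memoizing an expensive "does this character belong to this class?" test can be sketched as below. This uses Python's `re` character classes and `functools.lru_cache` purely as stand-ins; it is not how the SLIF is implemented, Python's classes are not identical to Perl's, and `in_class` is a hypothetical name.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def in_class(char_class, ch):
    """Return True if the single character ch matches the character class.

    The regex engine is consulted once per (class, character) pair;
    repeated questions are answered from the cache.
    """
    return re.fullmatch(char_class, ch) is not None


print(in_class("[tT]", "t"))    # True, computed by the regex engine
print(in_class("[tT]", "t"))    # True, served from the cache
print(in_class("[tT]", "x"))    # False
print(in_class.cache_info())
```

Because the answer depends only on the class and the character, caching is safe, and the set of distinct pairs seen in a typical input is small.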
18:20 jeffreykegler Btw, it's nice to see all this interest.  The first years of my work on Marpa were very lonely.
18:26 jdurand jeffreykegler: glad you kept up - I sort of predicted interest growing for sure this year - I am happy to see this happening
19:10 koo6 pczarn, about the return value list, i might go and try, if youre not on it already
19:14 pczarn I'm not on it
19:17 pczarn when creating numbered objects, generally any negative return value is a failure (not just -2?)
19:18 koo6 For methods that return an integer, Libmarpa usually reserves −1 for special purposes, such as indicating loop termination in an iterator. In Libmarpa, methods usually indicate failure by returning −2. Any result greater than −2 usually indicates success.
19:18 koo6 :)
19:19 pczarn what about results less than -2?
19:19 koo6 oh
19:22 jeffreykegler For results less than -2, if the documentation does not say anything, it's best to treat them as fatal internal errors.
19:23 jeffreykegler If results less than -2 are not documented, however, an application should also be able to safely ignore the possibility they will happen.
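For someone writing bindings, the general pattern described here (non-negative results are values, -1 is reserved for special purposes, -2 is failure, anything lower is treated as a fatal internal error unless documented) might be wrapped like this. This is a hedged sketch: `check` and both exception classes are hypothetical, mapping -1 to `None` is a choice of this sketch, and individual functions that deviate from the pattern (marpa_r_alternative was mentioned above) would need their own handling.

```python
class LibraryFailure(Exception):
    """Hypothetical exception for the usual -2 failure return."""


class LibraryInternalError(Exception):
    """Hypothetical exception for an undocumented result below -2."""


def check(rc):
    """Translate a C-style integer return code following the general pattern.

    Non-negative values pass through, -1 becomes None ("special, see the
    docs for this call"), -2 raises, and anything lower is treated as fatal.
    """
    if rc < -2:
        raise LibraryInternalError(f"unexpected return value {rc}")
    if rc == -2:
        raise LibraryFailure("call reported failure")
    if rc == -1:
        return None
    return rc


print(check(7))     # 7
print(check(-1))    # None
```

Keeping -1 out of the exception path leaves each caller free to decide whether it means "nothing there", "end of list", or something worth a warning.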
19:26 jeffreykegler AFK
19:46 koo6 in marpa_version (int* version), the reason it is an int* and not a version[3] is for compatibility?
19:53 koo6 jeffrey, thanks for the answer on the mailing list, although we seem to be talking right past each other:)
19:54 jdurand koo6: re marpa_version - c.f. http://irclog.perlgeek.de/marpa/2014-10-12#i_9497134 - basically it was done like that at the beginning in the answer -;
19:54 jdurand "is the answer"
20:02 koo6 thx, got it
20:11 jeffreykegler joined #marpa
20:13 jeffreykegler re http://irclog.perlgeek.de/marpa/2014-12-07#i_9772165 -- jdurand is right :-)
20:15 jeffreykegler I've been doing "int*" since 1971.  I suppose version[3] might give the optimizer some extra chances, but I doubt it -- in C you can't assume someone won't try to go beyond the array's declared size -- it might be variable length.
20:16 koo6 cool cool
20:16 koo6 not about optimizing, about typechecking
20:17 jeffreykegler Hmmmm.  might be a good idea for the future.
20:17 koo6 although, pythons cffi didnt catch me passing in a single int pointer anyway:)
20:19 jeffreykegler koo6: re http://irclog.perlgeek.de/marpa/2014-12-07#i_9772228 -- another way to say it, if you seem to believe that the problem is global mutated data ...
20:20 jeffreykegler but it is *shared* mutated data.  Globals are just one way of sharing, and they are all thread-unsafe without the kind of special precautions that in the case of grammar data ...
20:20 jeffreykegler I have not taken.
20:23 jeffreykegler So absence of global sharing is evidence for thread-safety, but not proof of it -- there are endless ways of making threads unsafe. :-)
20:25 koo6 pczarn, https://gist.github.com/anonymous/0bb908a79c17b874b6d3
20:25 koo6 jeffreykegler, i will try writing an example
20:25 pczarn thanks
20:26 koo6 unfinished
20:27 jeffreykegler koo6: OK, if you want to, but I suspect it won't advance the discussion ....
20:28 jeffreykegler I'm likely to glance at it, decide I can't certify as safe, and therefore announce it is unsafe.
20:29 jeffreykegler Which may seem unfair, but if you think about it, I have to be that way.
20:30 pczarn absence of shared data mutation is a proof of thread-safety, FWIW
20:40 koo6 thats perfectly understandable, i just wish to *what* causes libmarpa to be unsafe in that regard, or if it is future-proofing
20:40 koo6 *know
20:40 koo6 please check it out: https://gist.github.com/anonymous/759478b7109988b2e7b7
20:43 koo6 pczarn, im omitting the -1's like with "If rule rule_id is valid, but is not the rule ID of sequence rule, −1."
20:43 koo6 just looking for the return values that should cause Exceptions
20:45 koo6 although on a second thought...
20:50 jeffreykegler koo6: I just sent an email to the G+ group, which more or less repeats your example as pseudo-code, and explains why I don't consider it safe ...
20:50 jeffreykegler bearing in mind I don't know the guarantees for Python, so it might well be safe in Python's threading model.
20:51 jeffreykegler For me to share data in Libmarpa, and announce that it is thread-safe, it has to be thread-safe for *every* threading scheme, now or in the future.
20:51 jeffreykegler That's restrictive.
21:01 jeffreykegler Re -1 and -2 error returns, it might [or might not :-) ] be useful to know my "concept" behind the distinction.
21:02 jeffreykegler It is, frankly, the one bit of Perl-interface type thinking that I allowed to sneak into Libmarpa.
21:03 jeffreykegler -1 might be considered as being intuitively, like a Perl 'undef' -- (for those who don't know Perl, it has an explicit undefined value).
21:04 jeffreykegler -2 is strictly for fatal errors.
21:04 jeffreykegler Perl's use of 'undef'
21:04 jeffreykegler returns is not a well defined thing.
21:04 jeffreykegler So -1 can be, in fact, any value it is convenient to single out as special --
21:05 jeffreykegler that one 'error' which is usually ignored, for example ...
21:05 jeffreykegler or that one 'error' which is usually recoverable ...
21:07 koo6 things that you *probably* dont want to see happen unless you know what youre doing, so, i imagine logging a warning would be appropriate?
21:07 jeffreykegler anything more general -- any result that is in some way special.
21:08 jeffreykegler For example, you might use a call to test for the existence of a symbol, *looking* for the -1
21:08 jeffreykegler Conceivably, that might be "success" and you would want to warn if anything else was returned.
21:08 koo6 :)
21:09 ronsavage joined #marpa
21:11 koo6 which function would that be?
21:12 jeffreykegler Grep'ing my Libmarpa API, it seems my only current uses are "no such X" conditions and possibly some values which legitimately can be -1, though I can't find any right now.
21:14 jeffreykegler koo6: which function to test for existence of a symbol.  A bunch work for that, I never really settled on any one.
21:14 jeffreykegler Here's an example: marpa_g_sequence_separator()
21:15 jeffreykegler It returns -1 if you are testing a rule which *is* a sequence rule, but does *not* have a separator symbol.
21:16 koo6 understood
21:17 jeffreykegler In that case, -1 would often be considered an indication of success.
21:20 koo5 thaks for the threading answer again, im finally quite happy with it:)
21:20 koo5 *thanks
21:21 jeffreykegler koo5: Great!
21:21 koo5 :)
21:23 jeffreykegler Another -1 return which always means success, by any reasonable definition: marpa_r_progress_item()
21:24 jeffreykegler It's the progress report iterator, and -1 means there are no more items.  Termination of a finite list is, of course, a normal success condition, one that should occur on any full traversal of a list.
21:26 jeffreykegler One way to summarize -1 returns is that they are returns that are "special" in some way, but whose meaning otherwise varies greatly, depending on the method.
21:27 jeffreykegler They may be successes, soft errors or hard errors.
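The iterator case, where -1 simply marks the normal end of a finite list, can be sketched with a stand-in for a C-style "next item" call. Nothing here is the Libmarpa API: `make_item_source` and `drain` are hypothetical, and the integers are arbitrary placeholder IDs.

```python
def make_item_source(items):
    """Hypothetical stand-in for a C-style iterator call.

    Each call returns the next non-negative ID, then -1 once exhausted.
    """
    it = iter(items)

    def next_item():
        return next(it, -1)

    return next_item


def drain(next_item):
    """Collect items until the -1 sentinel; lower values are real errors."""
    out = []
    while True:
        rc = next_item()
        if rc == -1:
            break               # normal termination, not a failure
        if rc < -1:
            raise RuntimeError(f"iterator reported error {rc}")
        out.append(rc)
    return out


print(drain(make_item_source([3, 0, 7])))   # [3, 0, 7]
```

A full traversal always ends by seeing -1, so treating it as an exception here would turn every successful loop into an error.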
22:06 koo6 continuing, https://gist.github.com/anonymous/7e10fe6cc37a42ab3eb5
22:23 koo6 https://gist.github.com/anonymous/a336a87f3b826e252330
