IRC log for #marpa, 2015-08-27

All times shown according to UTC.

Time Nick Message
00:04 idiosyncrat_ joined #marpa
01:19 djns hi
02:36 ronsavage joined #marpa
02:41 rns joined #marpa
03:06 CQ_ joined #marpa
03:10 rns idiosyncrat_: re http://irclog.perlgeek.de/marpa/2015-08-24#i_11104079 -- from https://github.com/latk/p5-MarpaX-Grammar-Preprocessor#COMMANDS, \adverb’s are mostly used as shortcuts for SLIF’s adverb => value pairs, e.g. \array expands to action => ::array.
03:10 rns Looks ok to me, even if arguably less readable.
03:12 rns lwa: perhaps \include "filespec" could be a nice companion to \namespace? There was a talk about possibility of such statement in SLIF.
03:14 * rns changing topic
03:14 rns Another potential marpa app: "Prune lets programmers edit the tree structure directly using commands invoked by typing and rendering the resulting code in a familiar textual format." --https://www.facebook.com/notes/kent-beck/prune-a-code-editor-that-is-not-a-text-editor/1012061842160013
03:16 * rns changing topic
03:17 rns "a very rough speed ranking of Lua->C interfacing methods, from fastest to slowest" -- http://lua-users.org/lists/lua-l/2015-07/msg00189.html
03:25 idiosyncrat_ rns: right
03:27 idiosyncrat_ "right" re \adverb's that is
03:27 idiosyncrat_ I wondered if anybody else found the syntax attractive
03:28 idiosyncrat_ I think I was influenced toward adverb => value *because* of its potential for ambiguity -- basically I was kind of showing off. :-)
03:32 rns idiosyncrat_: :-)) FWIW, ANTLR uses #alternative_name (labelling) -- which is a directive, sort of -- where SLIF uses name => 'alternative name'
03:33 rns adverb => value syntax looks more self-explanatory to me, arguable verbosity being the price paid.
03:34 idiosyncrat_ Right.  It needs the extra punctuation, whereas Marpa could get by on even less -- Marpa can parse stuff that's too hard to eyeball
03:36 idiosyncrat_ Since we'll be doing new interfaces for Kollos, I'm interested in folks' take on the SLIF's syntax --
03:36 idiosyncrat_ At this point, the SLIF itself is frozen and will never change.
03:37 idiosyncrat_ Re the Lua C interfacing methods, I've never actually even considered using the FFI -- it's all Lua C API.
03:37 idiosyncrat_ Disappointing if the Lua
03:38 idiosyncrat_ if the LuaJIT is just 10% faster.
03:41 rns FWIW, the author notes "LuaJIT, fully compiled FFI" is "orders of magnitude" faster. Not sure what is meant by
03:41 rns "fully compiled FFI" --
03:43 idiosyncrat_ Well, I'm using 5.1 so we can find out someday.
03:43 rns -- luajit has ffi compiled in so ffi = require("ffi") can be it.
03:43 idiosyncrat_ In Kollos progress, I've been going from grammar downwards.
03:44 rns Yes.
03:44 idiosyncrat_ Currently grammars are in the Pure Lua InterFace PLIF, which implements all the cool features like sequences, precedenced rules, etc., but does not have a fancy SLIF-like syntax.
03:45 idiosyncrat_ I've now parsed input with a PLIF grammar, and created a bocage.  Next comes evaluation.
03:46 idiosyncrat_ After that comes doing example by example and test by test until all the features of the SLIF are in the PLIF.
03:46 idiosyncrat_ My hope is that the PLIF is straightforward enough that users will write their own interfaces which compile to it.
03:46 rns BTW, "don't use C. it's faster to do tight loops in fully compiled Lua code" looks interesting in terms of moving parts of libmarpa to lua.
03:48 idiosyncrat_ Yes, reading that I felt reassured -- I think it says if it's not a big piece, you're better off in pure Lua.
03:49 idiosyncrat_ Reassured because I'm not converting tiny pieces of Lua to C anyway -- the result would be a maintenance nightmare.
03:49 rns Yes.
03:49 idiosyncrat_ The LUIF document, by the way, is now very out of date, but I'm not doing anything with it.
03:50 idiosyncrat_ I'll wait until I've done evaluation, so we no longer have a moving target.
03:53 rns Yes, it's ok for now and can be synced later, based on the code.
03:54 idiosyncrat_ Re speed, I honestly think that some day we'll have Earley/Leo items done in silicon, the same way you no longer have to write your own floating point logic in terms of binary, but floating point logic is built into floating point instructions
03:58 mauke_ joined #marpa
04:03 rns FWIW, Marpa is a proven algorithm, so it sounds very plausible.
04:08 idiosyncrat_ And Marpa's success seems to be being taken as a "proof of concept" for Earley parsing.
04:10 rns Yes. BTW, along with silicon, compiling Marpa to some VM -- JVM, C#, LLVM -- can be an option -- using luajit in Kollos is in this department, I think.
04:13 rns idiosyncrat_: re the LUIF doc -- once you think something can be documented/updated, you can file an issue, marked suitably, e.g., well "Sync", against https://github.com/rns/kollos-luif-doc so that we'll be able to work it out.
04:16 * rns changing topic
04:19 rns re Kollos -- Seamless (character-based) input model, with strand parsing to reduce memory intensity will deliver grammar composability out of the box and that is gonna be a very cool thing.
04:20 idiosyncrat_ joined #marpa
04:21 idiosyncrat_ The first version of the PLIF will be seamless only, so we'll get a chance to play with it.
04:21 rns Great.
04:22 idiosyncrat_ I'll have to add 2-layer parsing quickly, though, because we want to leverage the Marpa::R2 test suite, and its tests assume the parser/lexer divide
04:29 rns Yes, literals and charclasses would be easy to add once seamless is here - literals as literal ::= 'l' 'i' 't' 'e' 'r' 'a' 'l' and charclasses as charclass ::= 'c' | 'h' | 'a' | 'r' ... -- essentially 'and' and 'or' rules.
04:33 idiosyncrat_ joined #marpa
04:42 ronsavage Re http://irclog.perlgeek.de/marpa/2015-08-27#i_11121601. I'm with rns on this. The macro expansions offered seem minor.
04:47 rns ronsavage: yes, adverb-wise -- there are also other commands, e.g. \namespace, \optional, \doc look.
04:47 rns And \include seems to be a welcome addition.
04:55 idiosyncrat_ Good night!
05:16 rns "Pure BNF view of grammar syntax" -- Other grammar systems use extended BNF (EBNF), with all kinds of additional alternation, sequencing and Kleene plus/star operators, with embedded actions for building tree nodes or checking semantics. (Ira Baxter, the architect of DMS, did that for 30 years. Then he got smart.) -- http://www.semanticdesigns.com/Products/DMS/DMSParsers.html
05:27 dvxd rns, what smarter alternative is there to EBNF?  Is it that DMS link you pasted?
05:28 rns Yes, "It is easy to code the other EBNF ideas with this notation." and one, the next paragraph.
05:29 rns s/and one/and on/
05:30 dvxd rns thank you, sorry I didn't read the link first, very constrained for time.
05:30 rns No problem. Not sure about "smarter", though, but easiest on syntax: only one op — ::=
05:37 rns with attributes, where needed.
06:10 pczarn joined #marpa
06:37 ronsavage joined #marpa
06:49 CQ2 joined #marpa
06:52 CQ2 tag_body ~ [^\s+#@&!]+
06:52 CQ2 tagstart ~ [+#@&!]
06:53 CQ2 is there a way to use tagstart in the tag_body declaration? I find myself defining something, and then duplicating a good part of it
07:04 rns CQ2: tag ~ tagstart tag_body, perhaps? or tag_body ~ tagstart tag_contents   tag_contents ~ [^\s+#@&!]+
07:20 pczarn CQ2 probably wants to declare tag_body with negation of tagstart's character class.
07:23 rns tag_body ~ [^+#@&!] tag_contents
07:24 rns tag_contents ~ [^\s+#@&!]+
07:24 rns is what I used once. pczarn: thanks!
07:25 pczarn I mean, tag_body's character class in terms of tagstart's character class
07:27 pczarn or not, I misread
07:29 rns tag_start_char ~ [^+#@&!]
07:29 rns tag_body_char ~ [^\s] | tag_start_char
07:29 rns tag_body ~ tag_body_char+
07:29 rns perhaps then.
07:38 pczarn Two kinds of duplication are possible: first, rns explained how to deduplicate tag_body ~ [+#@&!] [^\s+#@&!]+, tagstart ~ [+#@&!]
07:39 pczarn Second, tag_body ~ [^\svery long character class], tagstart ~[very long character class] can be deduplicated with interpolation of the source string, I guess
07:41 CQ2 but rns's last example defines tag_start_char as the inverse of what it should be... I want to be able to parse a sentence like "This contains a #tag !" so I need to grab the tag restrictively (the ! without following characters is not a tag, and !!!! isn't a tag either)
07:41 pczarn sorry
07:42 rns CQ2: you're right s/tag_start_char ~ [^+#@&!]/tag_start_char ~ [+#@&!]/
07:42 CQ2 rns ok, makes more sense... then the negation from [^\s] | tag_start_char applies agter the | as well?
07:42 CQ2 after
07:43 pczarn no, it won't work
07:44 pczarn CQ2: `tag_body ~ tagstart tag_contents`   `tag_contents ~ [^\s+#@&!]+` is ok
07:46 rns tag_start_char ~ [+#@&!]
07:47 rns tag_body_char ~ [^\s] | tag_start_char
07:47 rns tag_body ~ tag_start_char tag_body_contents
07:47 rns tag_body_contents ~ tag_body_char+
07:49 rns correction (sorry): s/tag_body_char ~ [^\s] | tag_start_char/tag_body_char ~ [^\s] | not_tag_start_char/ not_tag_start_char ~ [^+#@&!]
07:50 rns so it boils down to defining tag_start_char ~ [+#@&!] and not_tag_start_char ~ [^+#@&!]
07:50 CQ2 rns but there is no way to do a negation like that in marpa using something like ^<<tagstart>> or something?
07:52 rns you can use events -- define tag_body ~ tag_start_char tag_body_contents
07:54 rns :lexeme ~ <tag_start_char> pause => after event => 'tag_start_char' and check if the input after tag start char matches what you need.
07:55 rns and lexeme_read('tag_body_contents') if it is or throw an exception
07:56 rns example -- https://gist.github.com/rns/d19b40ffc5523659dec9 distinguishes between AND as a keyword and identifier.
07:57 rns or :lexeme ~ <tag_body_contents> pause => before event => 'tag_body_contents'
07:57 rns events take some time and effort to get used to, but they are very powerful.
08:00 CQ2 rns: I'd prefer to stay away from them for now, still having enough issues getting used to marpa without them :=)
08:00 pczarn alternatively: my $tag_start_class = "+#@&!"; my $slif_rules = "tag_body ~ [^\s$tag_start_class]+  \n  tag_start ~ [$tag_start_class]";
08:00 rns pczarn: good point!
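The interpolation idea above can be sketched outside Marpa. This is a Python analogue using the standard `re` module, not SLIF: the tag-start characters are defined once and both the positive and the negated character class are built from that one string. The names `TAG_START`, `TAG_RE` and `find_tags` are hypothetical, and regex matching only approximates the L0 rules under discussion.

```python
import re

# Single source of truth for the tag-start characters (taken from the discussion).
TAG_START = "+#@&!"

# Escape once, then reuse the result in both the positive and the negated class.
_start = re.escape(TAG_START)
TAG_RE = re.compile(rf"[{_start}][^\s{_start}]+")


def find_tags(text):
    """Return every tag: one start character followed by one or more body characters."""
    return TAG_RE.findall(text)


print(find_tags("This contains a #tag !"))
print(find_tags("!!!! is not a tag, but !text is"))
```

Because the body class excludes the start characters, a lone `!` and a run such as `!!!!` never satisfy the "start character plus at least one body character" shape, which matches the restriction CQ2 asked for.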
08:01 CQ2 afk for a while
08:01 rns CQ2: ok
08:09 ronsavage joined #marpa
08:21 ronsavage The discussion of tags sounds to me very much like recognizing C/C++/hash style comments. See https://metacpan.org/source/RSAVAGE/GraphViz2-Marpa-2.03/lib/GraphViz2/Marpa.pm#L385
08:23 rns ronsavage: it does, yes.
08:25 CQ2 wait, rns != ronsavage? I thought you were the same person :)
08:27 rns :)) No, we weren't.
08:36 CQ2 can someone explain to me how precedences work in the parsing? is the order in the grammar important?
08:48 rns it's rather the order in which subexpressions' symbols occur on the RHSes, e.g. https://metacpan.org/pod/distribution/Marpa-R2/pod/Marpa_R2.pod#Synopsis -- note how Term (higher precedence) is on RHS of Expression and Factor (lower precedence) is on RHS of Term.
08:52 rns The same thing using SLIF precedenced rules:
08:52 rns Expression ::=   Number
08:53 rns || Expression '*' Expression
08:53 rns || Expression '+' Expression
08:53 rns || operator meaning 'lower precedence than above'
08:53 rns some intro (not TL;DR though) https://en.wikibooks.org/wiki/Introduction_to_Programming_Languages/Precedence_and_Associativity
08:55 CQ2 ok
08:56 rns The recursion -- Expression ::= ... || Expression '*' Expression -- in precedenced rules -- is required for precedence to work.
09:00 CQ2 what's the diff between | and || there again? one is precedence, one is not?
09:09 CQ2 problem is that kind of stuff with symbols is hard to google
09:18 rns | means 'same precedence as above', || means 'lower precedence than above'
09:18 CQ2 thanks
09:19 rns e.g. e ::= e '**' || e '*' e | e '/' e - pow takes precedence over mul and div
09:19 rns s/e '**' ||/e '**' e ||
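The `|` versus `||` distinction can be illustrated with a small hand-written parser. This is a Python sketch of precedence levels in general, not Marpa code: operators listed together share a level (like `|`), and each later list binds more loosely (like `||`). The names `LEVELS`, `tokenize`, `parse_level` and `parse` are hypothetical, and left-associative grouping is an assumption of the sketch.

```python
import re

# Tightest level first, mirroring  e ::= e '**' e || e '*' e | e '/' e
# Operators in the same inner list share a level; each new list binds more loosely.
LEVELS = [["**"], ["*", "/"]]


def tokenize(src):
    """Split the input into numbers and operators; whitespace is skipped."""
    return re.findall(r"\d+|\*\*|[*/]", src)


def parse_level(tokens, pos, lvl):
    """Parse one precedence level and return (tree, next_position)."""
    if lvl < 0:  # below the tightest level there are only numbers
        return int(tokens[pos]), pos + 1
    left, pos = parse_level(tokens, pos, lvl - 1)
    while pos < len(tokens) and tokens[pos] in LEVELS[lvl]:
        op = tokens[pos]
        right, pos = parse_level(tokens, pos + 1, lvl - 1)
        left = (op, left, right)  # group to the left within a level
    return left, pos


def parse(src):
    """Parse a whole expression into nested (operator, left, right) tuples."""
    tokens = tokenize(src)
    tree, pos = parse_level(tokens, 0, len(LEVELS) - 1)
    if pos != len(tokens):
        raise ValueError("unexpected token at position %d" % pos)
    return tree


print(parse("2 * 3 ** 2"))
print(parse("8 / 2 * 3"))
```

In the first expression the `**` subtree ends up nested inside the `*` node, which is what "pow takes precedence over mul and div" means; in the second, `/` and `*` sit on one level and simply group left to right.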
09:20 CQ2 so that would work if a lexer can find two different matching lexemes for a string?
09:24 CQ2 i.e. I have word ~ [^s] and tag ~ [#] [^s]  I can then define that the tag wins over the word at G1 using e ::= tag || word  ?
09:29 rns not exactly -- for precedence you need operators and operands -- e ::= tag || word has no e on RHS, so precedence takes no effect.
09:31 rns for these, you can use ranks and lexeme priorities -- e.g. https://github.com/ronsavage/MarpaX-Languages-Lua-Parser/blob/master/lib/MarpaX/Languages/Lua/Parser.pm#L771
09:31 CQ2 rns do lexemes have to be distinct, or is the word and tag definition above allowed?
09:33 rns CQ2: Sorry, but I'm a bit busy at work now. Can you post a script with your use case -- I'll be able to take a look in 6-7 hours?
09:34 CQ2 rns no problem, let me chip away at it, I'm getting somewhere, but "solving" it won't help, I need some explanations, but that can wait.
09:34 rns CQ2: ok
09:34 CQ2 I just got over one problem, the rest should be doable
09:34 CQ2 I'm still chipping away at the task manager I started 2 years ago
09:34 CQ2 the parsing is almost complete, next is the processing of the content
09:35 rns ok, I remember that thing.
09:35 rns looking at the code would be good though, if possible.
09:35 CQ2 sure... it's fairly clean and commented since I forgot what I did 2 years ago :)
09:36 CQ2 https://www.roaringpenguin.com/products/remind ... this is a little similar, has some nice features which I might pick up later
09:38 rns ok I'll take a look in the evening.
09:41 rns https://www.roaringpenguin.com/products/remind -- where is marpa-related code in this?
09:41 rns CQ2: have a question, see above.
09:42 CQ2 there isn't, I'll pastebin my code later. That has some nice features and some similar functionality though, and could be fun to parse as well
09:45 rns CQ2: ok, will take a look.
09:49 lwa joined #marpa
10:22 pczarn joined #marpa
11:35 pczarn joined #marpa
15:27 idiosyncrat_ joined #marpa
15:32 CQ2 rns no code for you tonight, I rewrote the grammar, am rewriting the back end now, and cleaning up and simplifying and understanding what's going on. One critical passage in the docs that I missed before:
15:34 CQ2 Tokens at the boundary between L0 and G1 have special significance. The top-level undiscarded symbols in L0, which will be called "L0 lexemes", go on to become the terminals in G1. G1's terminals are called "G1 lexemes". To find the "L0 lexemes", Marpa looks for symbols which are on the LHS of a L0 rule, but not on the RHS of any L0 rule. To find the "G1 lexemes", Marpa looks for symbols on...
15:34 CQ2 ...the RHS of at least one G1 rule, but not on the LHS of any G1 rule.
15:35 CQ2 that kept getting me messed up because I didn't have that clear distinction in mind
15:39 rns CQ2: Been there too; glad you nailed it.
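The lexeme rule quoted above is easy to check by hand on a toy grammar. This is a Python sketch with rules modeled as `(lhs, [rhs symbols])` pairs; the grammar and the names `lhs_only` and `rhs_only` are made up for illustration, character classes are treated as plain RHS symbols, and discarded symbols are ignored.

```python
def lhs_only(rules):
    """Symbols that appear on the LHS of some rule but on the RHS of none."""
    lhs = {left for left, _ in rules}
    rhs = {sym for _, right in rules for sym in right}
    return lhs - rhs


def rhs_only(rules):
    """Symbols that appear on the RHS of some rule but on the LHS of none."""
    lhs = {left for left, _ in rules}
    rhs = {sym for _, right in rules for sym in right}
    return rhs - lhs


# Toy structural (G1) rules; hypothetical, not CQ2's grammar.
G1_RULES = [
    ("line", ["items"]),
    ("items", ["item"]),
    ("items", ["items", "item"]),
    ("item", ["tag"]),
    ("item", ["word"]),
]

# Toy lexical (L0) rules; character classes appear as opaque symbols.
L0_RULES = [
    ("tag", ["tag_start", "tag_body"]),
    ("tag_start", ["[+#@&!]"]),
    ("tag_body", ["[^\\s+#@&!]+"]),
    ("word", ["[^\\s]+"]),
]

l0_lexemes = lhs_only(L0_RULES)   # top-level L0 symbols
g1_lexemes = rhs_only(G1_RULES)   # G1 terminals
print(sorted(l0_lexemes), sorted(g1_lexemes))
```

In this toy case the two sets coincide, which is the situation a working grammar needs: `tag_start` and `tag_body` are used on an L0 RHS, so they are internal to the lexer and are not lexemes.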
15:41 CQ2 well, still fighting one thing: !text is recognized as a tag, but I can't get it to recognise ! as plain text (no tag) ... everything I think of in the grammar would make the second one also swallow the text of the first.
15:48 rns Well, not seeing the grammar, I'd say you can try moving L0 rules to G1 and see if Marpa can do the right thing (G1 level is much better at ambiguity).
15:53 rns Also, trace_terminals => 1 or even 99 and show_progress() can help you see what's going on.
15:53 CQ2 rns: I have it at L0 and got it working, should be OK.
15:53 rns Great.
15:54 CQ2 stop telling me all this useful stuff, my comments in the grammar file are getting too long :)
15:56 rns :)) Also familiar. :) BTW, you can put this trace_terminals/show_progress stuff into a sub, e.g. parse_debug() to be called when a parse fails.
16:06 CQ2 what's wrong with this?
16:06 CQ2 tag_body ~ [a-zA-Z0-9_] [\w]*
16:10 lwa CQ2: sequence rules (those with "+" or "*" quantifiers) must be rules of their own. Try this: tag_body ~ tag_body_start tag_body_rest; tag_body_start~ ...; tag_body_rest ~ [\w]*;
16:13 CQ2 lwa thanks... why?
16:15 lwa Marpa's SLIF will never autogenerate a rule ID for you, in order to make debugging easier. And sequences must be a grammar rule of their own: In plain BNF without "*", you'd have to write "A ::= ; A ::= x A" instead of "A ::= x*", but Marpa optimizes sequence rules so you'd better use them :)
16:17 CQ2 I have one issue left I think, but that's more my fault because I haven't decided exactly what I want yet :)
16:19 CQ2 thanks for the help.... later!
16:21 idiosyncrat_ Re "optimization" of sequences -- that takes two aspects.
16:21 idiosyncrat_ First, it's rewritten as a left recursion, which is slightly faster in some circumstances than a right recursion.
16:22 idiosyncrat_ Second, it recognizes in the evaluator that it's a sequence, and so builds the sequence all at once.
16:22 idiosyncrat_ If you write sequences in BNF, you should get into the habit of writing them as left recursions unless you have some reason not to.
16:23 idiosyncrat_ And in that case, if your sequence does not have much in the way of semantics, the optimization may come down to nothing.
16:23 idiosyncrat_ My point ...
16:23 idiosyncrat_ and I was getting around to one :-) ...
16:24 idiosyncrat_ is that if a SLIF sequence is inconvenient, you may well want to just do it in BNF.
16:24 idiosyncrat_ By the way, I've never really tested that my optimization of the evaluation of long sequences does, in fact, speed things up.
16:25 idiosyncrat_ If anyone has the time to test it, I'd be curious about the results.
16:32 pczarn Before the evaluator prepares values for a rule's semantics, it must store the stack's size, is that right?
16:33 pczarn The number of arguments is known for all rules except sequences.
16:36 pczarn The evaluator must know how many arguments to take off the stack.
17:06 idiosyncrat_ pczarn: right
17:07 idiosyncrat_ and if you don't explicitly use a sequence, but write the sequence in BNF, that happens via a series of evaluations which do something like build an array, depending on what the application decides
17:08 idiosyncrat_ In the SLIF if you use a sequence, the regular stack is used, which I hoped would be faster.
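The difference between the two evaluation styles can be pictured with a toy value stack. This is a Python illustration only and is not Marpa's actual evaluator: the function names are hypothetical, the BNF version assumes a non-empty sequence written as `A ::= x` and `A ::= A x`, and "building an array" is just one thing an application might choose to do.

```python
def eval_left_recursive(values):
    """Evaluate A ::= x ; A ::= A x one reduction at a time (fixed arity each)."""
    stack, reductions = [], 0
    for i, value in enumerate(values):
        stack.append(value)
        if i == 0:
            item = stack.pop()          # A ::= x takes exactly one argument
            stack.append([item])
        else:
            item = stack.pop()          # A ::= A x takes exactly two arguments
            seq = stack.pop()
            stack.append(seq + [item])
        reductions += 1
    return stack.pop(), reductions


def eval_sequence_rule(values):
    """Evaluate A ::= x+ in one step; the argument count varies per parse."""
    stack = list(values)                # child values are already on the stack
    count = len(values)                 # so the evaluator must record how many
    start = len(stack) - count
    args = stack[start:]
    del stack[start:]
    return args, 1


print(eval_left_recursive(["a", "b", "c"]))
print(eval_sequence_rule(["a", "b", "c"]))
```

Both routes produce the same list; the sketch only shows why the sequence form needs a recorded count while the BNF form pays for one reduction per item. Whether that translates into a measurable speedup is exactly the open question raised above.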
17:16 idiosyncrat_ By the way, I am thinking of crowdsourcing Marpa
17:17 idiosyncrat_ In a sense, I've already started -- Ron Savage was the first contributor.
17:18 idiosyncrat_ We did that via Paypal but Paypal is not entirely friendly to crowdsourcing -- they seem to want you to be registered as a non-profit.
17:19 idiosyncrat_ Does anyone know about and/or suggest sites, etc., to handle crowdsourcing?
17:19 idiosyncrat_ I've been happy to self-fund my own full-time work on Marpa in the past, but I am reaching the point where that's no longer practical.
17:33 roxfan patreon? though that one is more for artists/writers etc
17:43 idiosyncrat_ roxfan: I'll wear a beret :-)
17:43 idiosyncrat_ Actually, there are open source projects on patreon.
17:47 pczarn https://bountysource.com/, https://freedomsponsors.org/ and https://gratipay.com/
17:55 koo7 joined #marpa
18:35 koo7 joined #marpa
18:47 idiosyncrat_ AFK
18:48 pczarn joined #marpa
19:18 koo8 joined #marpa
21:02 rns idiosyncrat: re http://irclog.perlgeek.de/marpa/2015-08-27#i_11126126 -- here is my take (hope I got it right) -- https://gist.github.com/rns/84605eb1551d45918e2b
21:17 koo7 joined #marpa
22:31 ceridwen_ joined #marpa
22:35 ronsavage joined #marpa
23:43 djns joined #marpa