
IRC log for #marpa, 2014-09-13


All times shown according to UTC.

Time Nick Message
00:02 jeffreykegler "DSL to recognize numeric constants" (72 lines) at http://scsys.co.uk:8002/423149
00:03 jeffreykegler kkrev: did you see my paste?  It doesn't seem to show in my backlog.
00:04 jeffreykegler I took Jean-Damien's code as a basis, worked it up and tested it.
00:14 kkrev got it thanks.
00:15 hobbified joined #marpa
00:21 Aria joined #marpa
00:21 jeffreykegler joined #marpa
00:43 shadowpaste "kkrev" at 50.190.12.218 pasted "How to parse quoted strings?" (40 lines) at http://scsys.co.uk:8002/423166
00:47 jeffreykegler kkrev: Look this over -- https://metacpan.org/source/JKEGL/Marpa-R2-2.092000/lib/Marpa/R2/meta/metag.bnf
00:47 jeffreykegler It's Marpa's own self-grammar, with a variety of comments, strings, etc.
00:48 jeffreykegler The format it parses is therefore described in Marpa's description of its own DSL: https://metacpan.org/pod/distribution/Marpa-R2/pod/Scanless/DSL.pod
00:55 jeffreykegler It probably is not exactly what you want but should give you a good idea of how things work.
00:55 jeffreykegler And it is the grammar actually used to parse Marpa's DSL, including itself.
00:55 kkrev I don't think it has to handle empty quotes anywhere?
00:55 ronsavage idiosyncrat: It's /already/ in the FAQ: http://savage.net.au/Perl-modules/html/marpa.papers/chapter3.html.
00:55 ronsavage kkrev: You're homework is the read that doc!
01:07 daxim joined #marpa
01:07 ronsavage kkrev: MarpaX::Languages::SVG::Parser::SAXHandler has various grammars at the end of the code (after __DATA__), including number as float | integer, copied from Jean-Damien's code.
01:07 ronsavage Agggg. That's * :: SVG :: Parser.
01:07 ronsavage kkrev: Sorry - A grammar mistake discussing grammar. "You're homework" => "Your homework"
01:07 ronsavage kkrev: Sorry - Another mistake. "is the read" => "is to read". Sigh. I'd better go back to sleep, it's 11::00 am (sic) already :-).
01:07 jeffreykegler_ joined #marpa
01:07 jeffreykegler_ ronsavage: Ooops!
01:07 hobbs joined #marpa
01:07 jeffreykegler_ Ooops
01:08 kkrev thanks, that's a good doc.
01:08 hobbs joined #marpa
01:28 ilbot3 joined #marpa
01:28 Topic for #marpa is now Start here: http://savage.net.au/Marpa.html - Pastebin: http://scsys.co.uk:8002/marpa - Jeffrey's Marpa site: http://jeffreykegler.github.io/Marpa-web-site/ - IRC log: http://irclog.perlgeek.de/marpa/today
01:31 daxim_ joined #marpa
01:32 jeffreykegler joined #marpa
01:32 * jeffreykegler has been having trouble connecting.
01:33 shadowpaste "jeffreykegler" at 162.232.214.245 pasted "kkrev's string example, with tests working" (42 lines) at http://scsys.co.uk:8002/423180
01:34 jeffreykegler Here's kkrev's string example -- look it over to see what's changed but it's basically that semantics has to go at the G1 level.
02:12 kkrev jeffreykegler: your number parsing dsl earlier blows up on -0.1. Easy to fix as I don't care about octal.
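[Editor's sketch: the pasted DSL is not reproduced in this log, so the following is not that grammar. It is a minimal Python illustration, under the assumption that only signed decimal integers and floats are wanted and octal/hex can be ignored, of a recognizer that accepts a leading sign such as in -0.1. The names NUMBER and parse_number are hypothetical.]

```python
import re

# Hypothetical pattern, not the grammar from the paste: optional sign,
# then an integer or decimal, then an optional exponent. No octal or hex.
NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def parse_number(text):
    """Return an int or float if the whole string is a numeric constant."""
    if NUMBER.fullmatch(text) is None:
        raise ValueError(f"not a numeric constant: {text!r}")
    try:
        return int(text)      # plain integers, with or without a sign
    except ValueError:
        return float(text)    # anything with a fraction or exponent
```

[In a Marpa grammar the same idea would presumably be expressed as an optional sign in front of the integer and float lexemes; that mapping is an assumption, not something shown in the log.]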
06:25 jdurand joined #marpa
06:27 jdurand Re http://irclog.perlgeek.de/marpa/2014-09-12#i_9345216 - yes, the ECMAScript grammar handles all utf8 stuff correctly, but is more difficult to follow than the C grammar, because it is modified on-the-fly for injection into the main "Program" grammar - nevertheless it is "utf8" compliant
06:30 jdurand A simpler view is the I_CONSTANT and F_CONSTANT lexemes from the C grammar - Ron's SVG stuff as well, I suppose. I am starting to wonder whether we should have a collaborative module containing common patterns, a bit like Regexp::Common, but for Marpa -;
08:01 lwa joined #marpa
10:17 ilbot3 joined #marpa
10:17 Topic for #marpa is now Start here: http://savage.net.au/Marpa.html - Pastebin: http://scsys.co.uk:8002/marpa - Jeffrey's Marpa site: http://jeffreykegler.github.io/Marpa-web-site/ - IRC log: http://irclog.perlgeek.de/marpa/today
11:56 lwa joined #marpa
12:38 ronsavage joined #marpa
12:40 ronsavage jdurand: I'm with you re common patterns. Basically plug-in sub-grammars would save a lot of people from having to reinvent the wheel.
13:55 sivoais joined #marpa
13:56 daxim_ joined #marpa
14:26 ronsavage joined #marpa
14:29 jeffreykegler joined #marpa
16:04 kkrev joined #marpa
16:52 shadowpaste "kkrev" at 50.190.12.218 pasted "parse interspersed unstructured text" (51 lines) at http://scsys.co.uk:8002/423368
16:52 kkrev I have a format that intersperses unstructured text between two delimiters. After the delimiter it resumes a regular structure. I'm at a loss how to match the unstructured text. Maybe I need to suspend the latm setting? Somehow use the "pause" stuff and then work with the reader? It's going over my head.
16:57 jeffreykegler Don't suspend LATM -- you'll want it.
16:58 jeffreykegler It's a matter of coming up with a pattern / BNF for unstructured text that does not slurp stuff up past the terminator.
17:09 jeffreykegler Looking it over, it's actually an interesting grammar / application.
17:14 jeffreykegler I'll need to think on it a bit.  Ideas, anyone?
17:32 jeffreykegler For those of you interested in exploring parsing, this can be seen as an example of a very interesting problem I explored in some long-buried blog posts:
17:33 jeffreykegler How to spot pieces of a structured language (or more generally, some pattern) floating in a sea of random stuff.
17:34 jeffreykegler Applications include incremental development of parsers/compilers for existing languages -- that is, just write pieces of the parser, and have the parser treat the rest as noise or cruft.
17:34 jeffreykegler That way you can use an existing codebase without having to finish the whole compiler.
17:35 jeffreykegler Another application is error reporting -- that is, what appears to be unstructured text is, in this case, where your errors are.
17:36 jeffreykegler This can be a powerful, general approach to what is now called "error recovery".
17:37 Aria I like that idea a lot
17:39 jeffreykegler I've talked about this stuff a lot in the past, back when my blog's audience was a lot smaller ...
17:39 jeffreykegler or at least a lot quieter. :-)
17:55 jdurand joined #marpa
17:58 jdurand Re http://irclog.perlgeek.de/marpa/2014-09-13#i_9352643 - this case is almost identical to what the C grammar calls an opaque ASM statement, cf. https://github.com/jddurand/MarpaX-Languages-C-AST/blob/master/lib/MarpaX/Languages/C/AST/Grammar/ISO_ANSI_C_2011.pm#L1352 - In this case I have a pause after the begin delimiter, and scan the text character by character, inserting one whole big lexeme that I call ASM_OPAQUE
18:21 lwa joined #marpa
18:37 shadowpaste "jeffreykegler" at 162.232.214.245 pasted "Answer to unstructured text problem" (54 lines) at http://scsys.co.uk:8002/423380
18:38 jeffreykegler kkrev: I looked back at your problem and realized that your issue has a much simpler solution, which I've just pasted
18:38 jeffreykegler But now I'm going to think about the more general problem.
18:45 jeffreykegler The solution relies heavily on LATM.  If you try to modify it, be sure to think out which lexemes are expected when, and which will be longest.
18:46 jeffreykegler In particular, the solution relies on the terminator being longer than any token inside the unstructured token, and this is guaranteed by having them all be length 1.
18:46 kkrev jeffreykegler, thanks. pretty obvious. dunno why I was failing to try the simpler thing first.
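[Editor's sketch: the pasted solution is not reproduced in this log. The Python below is only a toy model of the longest-acceptable-token idea described at 18:45-18:46, not Marpa's implementation or API. All lexeme names and the << >> delimiters are hypothetical. At each position only the lexemes the current state expects are tried, and the longest match wins, so a two-character terminator beats the one-character CHAR lexeme.]

```python
import re

# Hypothetical lexemes for illustration; none of these come from the paste.
LEXEMES = {
    "KEYWORD": re.compile(r"[A-Z]+"),
    "START": re.compile(r"<<"),
    "END": re.compile(r">>"),
    "CHAR": re.compile(r".", re.DOTALL),  # any single character
    "WS": re.compile(r"\s+"),
}

# Which lexemes the toy "grammar" accepts in each state.
EXPECTED = {
    "structured": ("KEYWORD", "START", "WS"),
    "unstructured": ("END", "CHAR"),
}


def tokenize(text):
    """Split text into (name, value) tokens using only expected lexemes."""
    state, pos, tokens = "structured", 0, []
    while pos < len(text):
        best = None
        for name in EXPECTED[state]:
            m = LEXEMES[name].match(text, pos)
            # Keep the longest match among the acceptable lexemes.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no acceptable lexeme at offset {pos}")
        name, m = best
        pos = m.end()
        if name == "START":
            state = "unstructured"
        elif name == "END":
            state = "structured"
        if name != "WS":  # whitespace is discarded between structured tokens
            tokens.append((name, m.group()))
    return tokens
```

[With this model, tokenize("ITEM <<ab END>> NEXT") reads the inner "END" as three CHAR tokens, because KEYWORD is not acceptable inside the unstructured region, and only ">>" closes it.]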
18:51 jeffreykegler kkrev: do you use trace_terminals, by the way?
18:52 jeffreykegler It tells you what the parser *is* seeing, at least at the lexeme level, and is very helpful in seeing at what point things go wrong.
18:55 kkrev I have not looked into that setting. I will.
19:13 jdurand kkrev, indeed jeffrey's answer is correct because you're "lucky", your unstructured text ends with something longer than 1 - the general solution might go by another sub-grammar that explicitly puts the lexeme ending the unstructured text with a higher priority
19:14 kkrev Is there any way to match the text not as a list of chars but as a proper block of text with the posted approach, or do I need to resort to 'pause' stuff or sub-grammars?
19:15 jdurand but what if the unstructured text also contains the lexeme that marks the beginning of it, e.g. EDITEDBINARYDATAITEM xxx EDITEDBINARYDATAITEM? I.e., can it be nested?
19:16 jdurand This was the difficulty with the C grammar vs. ASM. The general asm statement is something like __asm { <everything> }. But the inside could also contain {}, i.e. __asm { xxx { yyy } zzz } - anyway, this was just a remark
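[Editor's sketch: this is not jdurand's code from MarpaX::Languages::C::AST. It is a minimal Python illustration, with the hypothetical name scan_opaque, of the character-by-character scan described at 17:58 and 19:16: start on the opening delimiter, track nesting depth, and return the whole body as one opaque chunk. It assumes delimiters inside string literals or comments do not need special handling.]

```python
def scan_opaque(text, start, open_ch="{", close_ch="}"):
    """Scan from text[start], which must be open_ch, to its matching close_ch.

    Returns (body, end): body is the text between the outer delimiters and
    end is the offset just past the closing delimiter.
    """
    if text[start:start + 1] != open_ch:
        raise ValueError("scan must start on the opening delimiter")
    depth = 0
    for pos in range(start, len(text)):
        ch = text[pos]
        if ch == open_ch:
            depth += 1          # entering a nested block
        elif ch == close_ch:
            depth -= 1          # leaving a block
            if depth == 0:      # the outermost block just closed
                return text[start + 1:pos], pos + 1
    raise ValueError("unterminated opaque block")
```

[A caller would locate the opening brace after __asm, call scan_opaque there, hand the returned body to the parser as a single lexeme, and resume normal lexing at the returned offset.]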
19:16 kkrev It could happen in theory but in practice I can ignore the risks of lexemes showing up in the unstructured text for this application.
19:20 jdurand Ok, anyhow the general solution, Jeffrey will probably with (-;) - I am thinking of a sub-grammar that has the "start" and "end" lexemes with priority 1, anything else being a lexeme of length 1 with priority 0, this sub-grammar being recursive, and the "start" lexeme having a pause after that triggers the sub-grammar
19:20 jdurand "will probably come with"
19:22 jeffreykegler kkrev: I don't think other lexemes inside the unstructured text would be a problem.
19:24 jeffreykegler kkrev: In fact, I just tested and they are not.  The reason is LATM -- nothing is expected inside unstructured text, once it has started, except more unstructured text or the terminator --
19:25 jeffreykegler other lexemes are not acceptable and therefore will not be recognized.  Which is pretty cool, if I say so myself. :-)
19:26 jeffreykegler jdurand: I'm going AFK soon, and I'm going to mull over the general problem, with applications to error reporting, incremental grammar writing, etc.
19:48 jdurand Re http://irclog.perlgeek.de/marpa/2014-09-13#i_9351381 - thnx!
19:50 jdurand Re http://irclog.perlgeek.de/marpa/2014-09-13#i_9353012 - ok, only 10mns to 22 today and I am already tired - AFK and off to bed... my working hours during the last weeks are very intensive and I have little energy to do something else when at home -; see you
21:25 jeffreykegler joined #marpa
21:38 kkrev_ joined #marpa
21:38 daxim__ joined #marpa
23:20 jeffreykegler joined #marpa
