Perl 6 - the future is here, just unevenly distributed

IRC log for #marpa, 2015-08-17

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
01:00 idiosyncrat_ joined #marpa
01:00 idiosyncrat_ https://www.reddit.com/r/programming/comm​ents/3h8dsn/marpa_is_a_parsing_algorithm/
01:00 idiosyncrat_ Apparently a discussion of Marpa has started over at reddit.com
01:54 idiosyncrat_ joined #marpa
06:08 ronsavage joined #marpa
10:31 nox joined #marpa
10:31 nox Hello.
10:31 nox How can Marpa handle all grammars covered by recursive descent parsers while it mentions restrictions with ambiguous  grammars?
10:32 nox Recursive descent parsers correspond to context-free grammars, and context-free grammars can be ambiguous. The claims on Marpa's web page sound contradictory. Am I missing something?
11:31 rns Marpa's web page says "Ambiguous grammars that are _unions of a finite set of any of the above grammars_." and, above "All grammar classes that recursive descent parses"
11:32 rns which I can't read other than "Marpa parses all ambiguous grammars that recursive descent parses"
11:33 rns in linear time, that is.
11:34 rns Oh and hello and welcome, of course. -- politeness is better late than never. :)
12:47 idiosyncrat_ joined #marpa
12:47 idiosyncrat_ nox: Hi!
12:48 idiosyncrat_ First off, recursive descent is a strictly speaking not an algorithm, but a technique ...
12:48 idiosyncrat_ this makes it hard to describe which class of grammars it handles ...
12:49 idiosyncrat_ for example, you could have a recursive descent implementation that calls Marpa at points,
12:50 idiosyncrat_ In practice it seems fair to treat recursive descent as LL(1) -- top-down with one character as look-ahead.
12:51 idiosyncrat_ No LL-parser handles ambiguous grammars -- it's a deterministic method.
12:51 idiosyncrat_ Marpa *does* handle all context-free grammars, even the ambiguous ones.
12:52 idiosyncrat_ The restrictions you mention are on grammars that are parsed in *linear* time.
12:54 idiosyncrat_ You can be far more generous about recursive descent and the same holds -- if you call recursive descent LL(k), for any k, it is still deterministic, and Marpa still handles all the grammars it handles in linear time.
13:10 nox I see.
13:10 nox Why Marpa::HTML gives 404 everywhere?
13:54 rns nox: Marpa::HTML is backpan-only; the replacement is Marpa::R2::HTML -- https://metacpan.org/pod/distrib​ution/Marpa-R2/html/pod/HTML.pod
13:55 nox Ok.
13:56 nox rns: I guess it's just a showcase of what Marpa can do, right?
13:58 rns Not quite -- it was used in production by Jeffrey and others, IIRC.
13:58 nox But it does things that are wrong.
13:58 nox Like the part "The same logic works even if a table is very defective:"
14:01 rns This is by design -- "Marpa::R2::HTML is an extremely liberal HTML parser. Marpa::R2::HTML does not reject any documents, no mater how poorly they fit the HTML standards."
14:02 nox There is a difference between rejecting documents, and doing the thing in the HTML spec.
14:02 nox Which now completely specify how trees should be built, and which tags can be omitted, and which tags must be ignored in which context.
14:03 rns Yes, but one of the goals was to demonstrate Marpa parser's error recovery capability and configurability -- BTW, it  has fully confgurable http://blogs.perl.org/users/jeffrey_kegle​r/2012/10/a-configurable-html-parser.html
14:04 rns Standard compliance was not the goal, as cited above.
14:04 rns s/it  has/it is/
14:51 djns joined #marpa
14:53 djns hi
15:07 rns djns: hello
15:13 idiosyncrat_ nox: which HTML spec are you referring to?
15:13 nox HTML 5.
15:14 nox (I don't think trying to abstract over HTML 4.01 and HTML 5 is a good idea at all.)
15:15 idiosyncrat_ In writing Marpa::R2::HTML, I based some of its behavior on an earlier CPAN module ...
15:16 idiosyncrat_ in particular its test suite repaired tables, and I copied that behavior
15:16 nox I see.
15:18 idiosyncrat_ At the time Marpa::R2::HTML was designed, HTML 5 was not as dominant, and in any case its design criteria was to turn *anything* into HTML.
15:18 nox For the curious, HTML 5 says you must clear the stack back to a table context if you encounter a <tr> start tag in text insertion mode. https://html.spec.whatwg.org/multipage​/syntax.html#parsing-main-intable:clea​r-the-stack-back-to-a-table-context-5
15:18 idiosyncrat_ You can give it a Perl script, and it will turn it into HTML.
15:19 nox Err, misread spec. Anyway I'm pretty sure you just don't infer new <table> start tags. But with your explanation it makes sense.
15:21 idiosyncrat_ The CPAN module (whose name I forget right now) was pre-HTML 5 and I used the table repair strategy in its test suite.
15:22 nox Got it.
15:23 idiosyncrat_ IIRC Marpa::R2::HTML's preference is to keep items on the stack and put them into some sort of context, making up as much stuff as needed.
15:23 idiosyncrat_ Apparently HTML 5 mandates things be thrown away if they don't make sense.
15:25 idiosyncrat_ Also I don't know how recent that HTML 5 language is -- the page you cite is July 2015, though its contents may be older.
15:25 idiosyncrat_ I did study HTML 5 at the time for language as to what to do in this case, though it is not impossible I missed some that already existed.
15:26 nox idiosyncrat_: HTML 5 is a living standard now.
15:26 nox So you implement whatever it says that moment, and then you keep up and cry;
15:26 nox s/;/./
15:27 idiosyncrat_ I suspect most HTML renderer teams do not catch up instantly with each change.
15:27 idiosyncrat_ Marpa::R2::HTML sure won't.
15:27 nox idiosyncrat_: But i'm pretty sure no version of HTML mandated the introduction of a missing <table> start tag.
15:28 nox Modern browsers all have quite complete HTML 5 parsers.
15:28 idiosyncrat_ IIRC correctly at the time I read the standard it was silent.
15:29 idiosyncrat_ That is, silent as to what to do with an "orphan" <tr> tag
15:30 idiosyncrat_ Note also that if you are using Marpa::R2::HTML as a utility (its more typical use) and not as a renderer ...
15:30 idiosyncrat_ throwing things away is not usually what you want.
15:31 idiosyncrat_ If it builds a table, you'll say "where'd that come from?" and fix the table.
15:31 idiosyncrat_ And while I've not read the HTML 5 language, I suspect it is a mandate for *renderers*
15:33 idiosyncrat_ And it makes sense in the context of a rendering engine, which has next to no probability of turning the lone '<tr>' into a useful table, and whose user is *not* going to change the original HTML ...
15:33 idiosyncrat_ to simply throw that information away.
15:35 idiosyncrat_ In the context of a user who'd want to find and fix the tag, it's annoying to throw it away silently.
15:35 idiosyncrat_ Another point:
15:36 idiosyncrat_ I'd hoped to use Marpa::R2::HTML to illustrate Marpa's power and flexibility for HTML parser -- its behavior is table-driven, and could be made configurable.
15:37 idiosyncrat_ ... and to inspire others to build on it.
15:38 idiosyncrat_ Unfortunately, I wrote it before Marpa's latest interface, the SLIF, and so its actual implementation is quite obsolete in Marpa terms.
15:47 idiosyncrat_ By the way, I'd *love* to see somebody fork Marpa:;R2::HTML, in which case they'd be able to make these choices.
15:49 idiosyncrat_ My guess is that they fork-er will see its primary use as a utility, and decide that simply throwing away tags silently is, in that context, counter-productive.
15:49 idiosyncrat_ s/they fork-er/the fork-er/
15:50 idiosyncrat_ But I'd hope they also made behavior configurable, and a strict HTML 5 mode would certainly be nice to have.
16:05 djns HTML is fairly easy to parse, I wrote an HTML parser once just with a loop and state variable
16:06 Aria I suspect you may not have encountered some HTML that's out there in the wild.
16:50 djns yeah lol
16:50 idiosyncrat_ HTML is fairly easy to *tokenize*.  Parsing in the strict sense is creating the tree hierarchy formed by its elements.
16:51 idiosyncrat_ And strict HTML parsing is not that hard either.
16:51 djns Yes. I did work once with the SAX API in Perl. SAX is generally a good idea
16:51 djns yes
16:52 idiosyncrat_ But Marpa::R2::HTML parses "liberal" HTML, including highly defective HTML.
16:52 djns SAX is for XML, but also can made to work on HTML
16:52 djns yes
16:52 idiosyncrat_ SAX-ish approached often beat Marpa, as long as the problem is one that you can actually solve with a SAX-ish approach
16:52 idiosyncrat_ s/approached/approaches/
16:53 djns I am sure you could make a SAX front end for Marpa-based HTML parser
16:53 djns for the SAX API
16:54 idiosyncrat_ djns: true, but you'd wind up with the overhead of Marpa -- a great benefit of SAX is that it is low overhead
16:55 idiosyncrat_ djns: that's actually a nice formulation of it -- if you problem can be parsed with "a loop and state variable", Marpa is probably overkill for your problem.
16:59 djns right
18:14 roxfan joined #marpa
18:40 mauke joined #marpa
20:12 koo7 joined #marpa
21:15 purmou joined #marpa
21:15 purmou hi all. my parser generator is pretty much complete!
21:15 purmou noticed a ton of interest in the recent marpa post on r/programming, so I posted my parser generator there too
21:15 purmou https://www.reddit.com/r/programmi​ng/comments/3hd1xx/pearleyparser_j​avascript_parser_generator_using/
21:16 purmou feel free to keep lookout there and answer any questions -- you all here are the experts after all :)
22:30 ronsavage joined #marpa
23:31 idiosyncrat_ joined #marpa

| Channels | #marpa index | Today | | Search | Google Search | Plain-Text | summary