IRC log for #marpa, 2014-09-22

All times shown according to UTC.

Time Nick Message
00:00 idiosyncrat joined #marpa
02:30 idiosyncrat1 joined #marpa
05:56 ronsavage joined #marpa
06:40 shadowpaste "ronsavage" at 124.148.159.94 pasted "A grammar for quoted strings with escaped chars" (264 lines) at http://scsys.co.uk:8002/424923
06:58 ronsavage Ignore that 1st pasted grammar. I've extended the code slightly.
07:07 shadowpaste "ronsavage" at 124.148.159.94 pasted "Another grammar for quoted strings with escaped chars" (268 lines) at http://scsys.co.uk:8002/424926
08:17 pczarn joined #marpa
08:43 lwa joined #marpa
11:34 LLamaRider joined #marpa
15:32 daxim__ I'm hitting the Earley item warning threshold at ~860 items maximum.  setting Scanless::R->new(too_many_earley_items…) silences the warning, and otherwise the parsing works as expected.
15:32 daxim__ the docs say about this situation: "Large Earley sets are something most applications can, and will wish to, avoid." -- but how? I want to see that set of Earley items - how?
15:32 daxim__ trace_values shows just 62 G1-R and 88 G1-S values, am I looking at the right thing?
15:45 lwa daxim__: Large Earley sets are an indication that the grammar may be more ambiguous than you think. Rewriting the grammar to reduce the ambiguity can be a solution. The "$recce->show_progress" output might be enlightening.
16:10 jeffreykegler joined #marpa
16:11 jeffreykegler daxim_: Congratulations :-)  You're the first person I've heard of to hit the Earley set limit without having a pathological application.
16:12 jeffreykegler Usually, if you hit the Earley set limit, you're going to run out of memory.
16:13 jeffreykegler Typically, you've found one of the grammars that are non-linear, and your inputs are long enough to make this a problem.
16:13 jeffreykegler I'm curious about what your grammar is, if you are able to share.
16:14 jeffreykegler Details: grammars can be non-linear if they 1) have unbounded ambiguity
16:14 jeffreykegler 2) have an unmarked middle recursion
16:14 jeffreykegler 3) have an ambiguous right recursion
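To make "Earley set size" concrete, here is a minimal sketch of a textbook Earley recognizer that reports how many items land in each Earley set. It is an illustration only, not Marpa's engine (Marpa's items and optimizations differ), and the names `Item` and `earley_set_sizes` and both toy grammars are assumptions made up for this example.

```python
from collections import namedtuple

# An Earley item: a rule, how far the dot has advanced in it, and the
# input position (origin) where recognition of the rule started.
Item = namedtuple("Item", "lhs rhs dot origin")


def earley_set_sizes(grammar, start, tokens):
    """Return the number of Earley items in each Earley set.

    grammar maps a nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of grammar is treated as a terminal.
    Empty right-hand sides are not supported in this sketch.
    """
    sets = [[] for _ in range(len(tokens) + 1)]
    seen = [set() for _ in range(len(tokens) + 1)]

    def add(pos, item):
        if item not in seen[pos]:
            seen[pos].add(item)
            sets[pos].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for pos in range(len(tokens) + 1):
        i = 0
        while i < len(sets[pos]):  # the list grows while we walk it
            item = sets[pos][i]
            i += 1
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:  # predict
                    for rhs in grammar[symbol]:
                        add(pos, Item(symbol, rhs, 0, pos))
                elif pos < len(tokens) and tokens[pos] == symbol:  # scan
                    add(pos + 1, item._replace(dot=item.dot + 1))
            else:  # complete
                for parent in sets[item.origin]:
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(pos, parent._replace(dot=parent.dot + 1))
    return [len(s) for s in sets]


# Hypothetical toy grammars: a left-recursive list and an ambiguous one.
linear = {"S": [("S", "a"), ("a",)]}
ambiguous = {"S": [("S", "S"), ("a",)]}

print(earley_set_sizes(linear, "S", list("aaaa")))
print(earley_set_sizes(ambiguous, "S", list("aaaa")))
```

With the left-recursive grammar every set stays the same size, while with the ambiguous grammar each set is larger than the one before it, so a warning threshold on set size is eventually crossed once the input is long enough.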
16:17 jeffreykegler daxim: lwa's suggestion to check things out with $slr->show_progress() is a good one.
16:26 daxim__ I've read Progress.pod now and don't know how to apply this to my situation.  I found out that the threshold goes up with the size of the parser input, roughly linearly.
16:26 daxim__ I cannot decide whether my grammar is non-linear, it has all grown together piecemeal with experimentation and I know nothing about parsing theory.
16:27 daxim__ I can share it via email, if that's okay.
16:27 jeffreykegler daxim_: Sorry -- at this point I don't usually look at grammars unless they are open source.
16:28 jeffreykegler I have nothing against non-open-source projects, I want to emphasize, ...
16:28 jeffreykegler it is just a matter of priorities.
16:30 daxim__ I can publicly show the grammar, but not the input without asking for permission
16:30 daxim__ http://paste.scsys.co.uk/424972  #  it's a programming language
16:32 jeffreykegler daxim_ -- have you ever read a tutorial on the traditional Earley algorithm? I ask because it would help you read show_progress() output.
16:33 daxim__ no, I haven't
16:34 jeffreykegler Loup Vaillant just started one that's getting rave reviews -- http://loup-vaillant.fr/tutorials/earley-parsing/
16:35 jeffreykegler Long ago, I tried to explain the idea in a very basic way: http://blogs.perl.org/users/jeffrey_kegler/2010/06/jay-earleys-idea.html
16:36 daxim__ wow, lots of stuff.  I can't read it all today, will be back tomorrow.  (I'm on European time)
16:36 jeffreykegler I mention this because show_progress() would really help, and the hurdle would be being able to understand Earley items -- which are actually simple and make sense.
16:37 jeffreykegler daxim_:  I will print out and read the grammar.
16:38 daxim__ the grammar is complete now, and I only get those warnings for the largest two files (about 800 lines of input, or about 29000 octets)
16:39 jeffreykegler But what if yours becomes the hot new language and people will want to run files many megabytes long? :-)
16:39 jeffreykegler I've found some issues, by the way.
16:40 jeffreykegler Statements like "assignment  ::= expression assignmentoperator expression" ...
16:41 jeffreykegler where expression can be an <assignment>
16:41 jeffreykegler That stuff is highly ambiguous.
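How ambiguous such a rule gets can be estimated with a short sketch. It models the rule in a simplified form, E ::= E op E | atom, and counts the distinct parse trees for a chain of operands joined by one operator. The simplification and the name `count_parses` are assumptions for illustration; the pasted grammar itself is not reproduced here.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parses(operands):
    """Count parse trees for `operands` operands chained by one binary
    operator under the ambiguous rule  E ::= E op E | atom."""
    if operands < 1:
        raise ValueError("need at least one operand")
    if operands == 1:
        return 1  # a lone atom has exactly one parse
    # Choose how many operands go to the left of the top-level operator.
    return sum(count_parses(left) * count_parses(operands - left)
               for left in range(1, operands))


for n in range(1, 7):
    print(n, count_parses(n))
```

The counts grow very quickly with the length of the chain, which is why a grammar of this shape needs precedence and associativity to pin down a single parse.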
16:42 jeffreykegler Marpa's prioritized rules are one way to tame these -- http://search.cpan.org/dist/Marpa-R2/pod/Scanless/DSL.pod#Prioritized_rules
16:43 jeffreykegler Assuming this is a major interest, one you have invested and/or will invest some time in ...
16:44 jeffreykegler I think you'd also benefit from reading the classic parsing theory way of dealing with very simple operator expressions --
16:45 jeffreykegler that "E ::= E + T  E ::= T  T ::= T * F  T ::= F  F ::= <digit>+" example you see in 1000's of places ...
16:45 jeffreykegler what precedence is, etc.
16:46 jeffreykegler Marpa will just bang it out for you, but if you're doing an ambitious project, you want to understand a bit what's under the hood.
16:47 jeffreykegler Btw, the classic way to tame these kinds of expressions is to rewrite them to obey priority levels ....
16:47 jeffreykegler which is basically what Marpa's prioritized rules do -- they take your expressions and rewrite them in prioritized form before passing them on to the parse engine.
16:48 jeffreykegler For other IRC readers: -- this brings me to a topic I've thought of blogging ...
16:49 jeffreykegler Marpa's prioritized rules are a language that writes another language -- that is, a prioritized rule is turned into normal BNF, which is the language that the parse engine actually parses ...
16:50 jeffreykegler This is new with Marpa, not in the sense that nobody's ever done it before, but in the sense that before, no parser generator could take that wide a variety of BNF and reliably produce a reasonable parser.
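The general idea of turning priority levels into ordinary BNF can be sketched as follows. This shows the classic textbook rewrite (one nonterminal per precedence level), not Marpa's actual internal transformation; the function name `rewrite_prioritized` and the generated symbol names such as `E_0` are hypothetical.

```python
def rewrite_prioritized(symbol, atom, levels):
    """Expand operator levels into plain BNF rules, one nonterminal per level.

    levels is ordered tightest-binding first; each level is a pair
    (associativity, operators) with associativity "left" or "right".
    Returns a list of (lhs, rhs) pairs, where rhs is a list of symbols.
    """
    names = [f"{symbol}_{i}" for i in range(len(levels) + 1)]
    rules = [(names[0], [atom])]  # level 0 holds plain operands only
    for i, (assoc, operators) in enumerate(levels, start=1):
        this, tighter = names[i], names[i - 1]
        rules.append((this, [tighter]))  # fall through to the tighter level
        for op in operators:
            if assoc == "left":
                rules.append((this, [this, op, tighter]))
            elif assoc == "right":
                rules.append((this, [tighter, op, this]))
            else:
                raise ValueError(f"unknown associativity: {assoc!r}")
    rules.append((symbol, [names[-1]]))  # entry point is the loosest level
    return rules


# Multiplication binds tighter than addition; both are left-associative.
for lhs, rhs in rewrite_prioritized(
        "E", "digits", [("left", ["'*'"]), ("left", ["'+'"])]):
    print(lhs, "::=", " ".join(rhs))
```

Because each operator rule can only recurse into its own level on one side and into the tighter level on the other, the generated grammar admits a single parse for any chain of these operators.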
17:04 Aria Yeh, I was really interested to see how you'd done that.
17:05 Aria Because it IS the best way to cut a lot of languages down to non-ambiguous.
18:13 lwa joined #marpa
18:59 jeffreykegler joined #marpa
22:11 ronsavage joined #marpa