Perl 6 - the future is here, just unevenly distributed

IRC log for #moarvm, 2015-11-04

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
01:20 tokuhiro_ joined #moarvm
02:00 camelia joined #moarvm
02:03 camelia joined #moarvm
02:06 camelia joined #moarvm
02:15 tokuhiro_ joined #moarvm
02:21 camelia joined #moarvm
02:48 ilbot3 joined #moarvm
02:48 Topic for #moarvm is now https://github.com/moarvm/moarvm | IRC logs at  http://irclog.perlgeek.de/moarvm/today
02:54 camelia joined #moarvm
03:05 jnthn joined #moarvm
03:07 camelia joined #moarvm
03:18 vendethiel joined #moarvm
04:35 dalek joined #moarvm
06:12 kjs_ joined #moarvm
07:25 harrow` joined #moarvm
07:38 FROGGS joined #moarvm
07:40 vendethiel joined #moarvm
08:45 nwc10 good UGT, #moarvm
08:45 nwc10 almost typoed that as UFT
08:45 nwc10 I wonder what UFT would be
08:46 timotimo Universal Food Time?
08:46 nwc10 so, dammit, there goes the answer to the security question "qwerty or dvorak?" :-)
08:46 nwc10 that sounds very fattening
09:58 lizmat joined #moarvm
10:49 zakharyas joined #moarvm
12:12 dalek MoarVM: 2b9c98a | jnthn++ | src/strings/ (3 files):
12:12 dalek MoarVM: Fix encoding \r\n grapheme in non-utf8 encodings.
12:12 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/2b9c98adb0
12:51 dalek MoarVM: fdd5bd5 | jnthn++ | src/strings/ (5 files):
12:51 dalek MoarVM: Fix non-streaming decoders for \r\n as 1 grapheme.
12:51 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/fdd5bd50c7
12:52 ilmari jnthn++ # I was about to have to add realloc to the encoders to handle multi-character replacements, now I don't have to :)
12:53 jnthn ;)
12:53 jnthn ilmari: I need to update the streaming decoders too; writing tests for that at the moment.
12:53 ilmari btw, what's the best way to pass them to the low-level function if we want to provide them as Bufs at the perl6 level?
12:54 jnthn Did we want to provide them as Bufs at the Perl 6 level, or Str?
12:54 jnthn If it's a Buf, will be re-validate that the Buf in question actually decodes OK?
12:54 ilmari Str avoids the risk the replacement being invalid
12:54 ilmari yeah
12:54 ilmari so just taking it as an MVMString in _encode_substr and recursing to encode it?
12:54 jnthn OTOH, Str needs care too
12:55 ilmari throwing an exception if it self isn't encodable
12:55 jnthn Because you really need to feed its graphemes through the normalization process too
12:56 jnthn Oh wait, I guess that's overkill
12:56 jnthn Yeah, that's kinda silly
12:56 jnthn So ignore :)
12:56 jnthn And yeah, you'd just recurse to encode the replacement string, or just do it once.
12:57 jnthn Also there's no streaming encode, so you can ignore my "streaming decode not updated" comment too :)
12:57 jnthn (Just do it once at the start, I mean)
12:58 ilmari yeah
12:59 jnthn Doing it once at the start would also mean you get to check if it will/won't work out up front
12:59 ilmari yeah
12:59 jnthn And so ease memory management a little
13:00 ilmari if (replacement) repl_bytes = MVM_string_ascii_encode_substr(tc, replacement, &repl_length, 0, -1, NULL)
13:00 jnthn That works
13:00 ilmari NULL being the new «MVMString *replacement» paramter
13:01 ilmari I see the code mixes size_t/MVMuint64 and char*/MVMuint8* a bit
13:01 ilmari e.g. the function returns char*, but the result var is MVMuint8*
13:02 jnthn Which encoding for, ooc?
13:02 jnthn Oh, even the ASCII one I'm looking at casts at the end
13:02 ilmari I'm doing ascii first
13:03 ilmari ah, it does cast at the end
13:14 domidumont joined #moarvm
13:46 ilmari bah, I keep typoing MVMString as MVString
14:24 dalek MoarVM: a362d21 | jnthn++ | src/strings/ (3 files):
14:24 dalek MoarVM: Fix various streaming decoders for \r\n grapheme.
14:24 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/a362d213cd
14:30 ilmari I've done as much as I can easily do wrt. encoding replacment chars: https://github.com/ilmari/MoarVM/commits/encode-exceptions
14:30 ilmari the next bit requires changing the bytecode format to allow specifying the replacement string for unencodable characters
14:32 ilmari if someone can point me in the right direction I could have a go later
14:32 ilmari now I need to get back to work
14:37 jnthn ilmari++
14:38 jnthn ilmari: The overall process is: update src/core/oplist, and run tools/update_ops.p6, then edit src/core/interp.c.
14:39 jnthn ilmari: But we have to deal with back-compat of bytecode, so I'd add a new op (encoderep or so)
14:40 jnthn ilmari: There are instructions on where to add ops at the top of src/core/oplist (simple answer: after all normal ops, before any special ops)
14:51 ilmari are fallthroughs in the interp.c switch allowed?
14:51 ilmari OP(encode): {
14:51 ilmari MVMString *replacement = NULL;
14:51 ilmari OP(encoderep):
14:51 ilmari replacement = GET_REG(cur_op, 6).s;
14:54 ilmari then I guess I need to wire it up in nqp and finally rakudo?
14:58 jnthn ilmari: Yeah but I worry a bit about whether compilers are smart enough to turn the switch into a jump table if we do tricks like that.
14:58 jnthn ilmari: Which is quite important for the interp.
14:58 jnthn And the op needs to come at the end
14:58 jnthn Well, the end of the normal ops
15:11 jnthn m: say +"42\n"
15:11 camelia rakudo-moar 36a351: OUTPUT«42␤»
15:11 jnthn m: say +"42\r\n"
15:11 camelia rakudo-moar 36a351: OUTPUT«should eventually be unreachable␤  in block <unit> at /tmp/ZU7uQLeOwj:1␤␤»
15:13 [Coke] O_o
15:14 jnthn :)
15:14 * jnthn makes things more eventual :)
15:15 jnthn \r\n being a grapheme is being really good at shaking out various things.
15:16 dalek MoarVM: 4b46f83 | jnthn++ | src/ (2 files):
15:16 dalek MoarVM: Make radix ops not blow up over synthetics.
15:16 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/4b46f83cfa
15:33 tokuhiro_ joined #moarvm
15:40 jnthn m: say uniprop "\r", 'Decomp_Type'
15:40 camelia rakudo-moar 36a351: OUTPUT«0␤»
15:40 jnthn m: say uniprop "\r", 'Decomposition_Type'
15:40 camelia rakudo-moar 36a351: OUTPUT«None␤»
15:40 jnthn m: say uniprop "\r", 'WhatNoThisIsNoProp'
15:40 camelia rakudo-moar 36a351: OUTPUT«0␤»
15:44 ilmari $ ./perl6 -e 'say "skjærgårdsøl".encode("ascii", :replacement("?")).decode("ascii")'
15:44 ilmari skj?rg?rds?l
15:44 ilmari \o/
15:47 jnthn \o/
15:47 jnthn What on earth...
15:47 jnthn My debugger claims that we're passing a 13 to a function
15:47 jnthn And it receives -1
15:48 jnthn (Yes, same type in both cases: MVMCodepoint)
15:52 TimToady it's a good thing we included NFG as one of the Big Three
15:54 jnthn Indeed
15:54 jnthn Though the S of NSA is...uh...in need of some love..
15:54 TimToady one hopes that won't be quite so upsetting as \r\n
15:55 nwc10 the Three Wise Ones to do before Christmas
15:56 jnthn m: say uniname(214)
15:56 camelia rakudo-moar 36a351: OUTPUT«LATIN CAPITAL LETTER O WITH DIAERESIS␤»
15:57 nwc10 j: say uniname(214)
15:57 camelia rakudo-jvm 273e89: OUTPUT«LATIN CAPITAL LETTER O WITH DIAERESIS␤»
15:57 TimToady you'll note those are not the same version
15:57 nwc10 I failed to note that. Thanks.
15:58 TimToady we decoupled them last night
15:58 TimToady and jvm build has been failing for a couple days
15:59 nwc10 it needs a champion/hero/superhero?
16:00 TimToady jvm has periodic champions
16:00 nwc10 jnthn: this might be a daft thought experiment, but as the grapheme for \r\n is likely to crop up quite a bit, and is the only one in ASCII, might it be useful to "hard code" it as ord -1?
16:01 nwc10 it means that "is this string ASCII" becomes the range (-1, 127), and ISO-8859-1 (-1, 255)
16:01 nwc10 but there might be downsides to this
16:01 jnthn It gets -1 anyway since it's the only synthetic we encounter during VM initialization
16:01 nwc10 ah OK
16:01 jnthn Though yeah, I didn't code anything hard against that assumption
16:01 TimToady well, on windows
16:01 nwc10 I wondered if it's bad to make that assumption.
16:02 nwc10 Or a saving.
16:02 nwc10 I guess it's too early to know
16:02 TimToady is it -1 on *nix?
16:02 nwc10 I don't know. And I don't know how to answer that
16:02 jnthn TimToady: Yeah. It's 'cus stdin/stdout/stderr are initialized at startup, and their separators are set up to include \r\n whatever the platform
16:03 jnthn nwc10: At a guess it'd be a very modest saving
16:03 TimToady in a sense, having -1..127|255 makes ascii and latin-1 into variable-width encodings
16:05 jnthn oh wtf
16:05 nwc10 I used to like text better than numbers. Maybe I should give up and decide that booleans are the only fun type.
16:05 jnthn Seems some buffer mis-management
16:05 nwc10 jnthn: ASAN doesn
16:05 nwc10 t
16:05 nwc10 barf.
16:06 jnthn Not an overrun
16:06 jnthn Don't think the code can overrun
16:06 jnthn Just re-consider something it already processed
16:08 [Coke] TimToady++ hacking on camelia.
16:09 ilmari $ ./perl6 -e 'say "\x[ffffff]".encode(:replacement).decode("utf8")'
16:09 ilmari
16:09 * ilmari does a little dance
16:13 lucasb joined #moarvm
16:19 lucasb iiuc, Configure.pl clones the 3rdparty submodules everytime it is run in a "clean" tree. Is it worth to have the full clone? Would a git --depth option speed the clonning?
16:19 lucasb *cloning
16:20 lucasb but I understand these repos may be small, so I wouldn't matter much. maybe except for libuv
16:21 lucasb *so it wouldn't matter much
16:27 JimmyZ ilmari++ # another RT down.
16:30 * jnthn has figured out what's going on
16:40 jnthn And again, was something that was bust before, but \r\n as a single grapheme showed it up
16:40 * jnthn wonders how many of the regressions from making n magical will be solved by his fix
16:44 timotimo jnthn: did you see the one where ords( ) on that string containing \r\n gives you a super high number (didn't check, but probably -1)
16:45 timotimo but not just any string with \r\n
16:45 jnthn That's hopefully covered by the local patch I have
16:45 timotimo neat
16:48 timotimo there was such a lot of backlog today that i couldn't keep up yet; is there a fix for socket's .get giving you more than you've wished for?
16:48 jnthn timotimo: I *think* so
16:49 timotimo it's a good start
16:49 jnthn At lesat, RabidGravy++ seemed to think I fixed all the things, and I think that was included
16:49 * timotimo grabs fresh sources
16:49 timotimo do you have the file for ord() -> -1? if not, i can link it or test it locally after your latest patch lands
16:54 timotimo ords leaking synthetics seems fixed already, but it could just be that the reproduction isn't doing it in this case any more
16:56 timotimo it was not reliable in the first place
17:00 timotimo ah, "special handles"
17:01 jnthn Oh, that wasn't The Change
17:02 timotimo it's not interesting for the .get on sockets thing?
17:02 * timotimo looks
17:02 jnthn No. It'd be worth checking if the sockets thing is still busted
17:03 jnthn These line endings things have been a lot mroe time-consuming than I expected...
17:03 jnthn Including making \n virtual on Windows
17:03 jnthn Well, everywhere, but it means it's \r\n on Windows
17:04 timotimo i'm trying out panda without the hotfix right now
17:05 timotimo seems still broken, let me double-check if i've got latest everything forever
17:05 dalek MoarVM: 62fbd37 | jnthn++ | src/strings/normalize.c:
17:05 dalek MoarVM: Never re-normalize what we already considered.
17:05 dalek MoarVM:
17:05 dalek MoarVM: In some cases, we get into a situation where already normalized chars
17:05 dalek MoarVM: are left sitting in the normalizer's buffer. We could then end up
17:05 dalek MoarVM: trying to renormalize them, and any synthetics would cause immense
17:05 dalek MoarVM: upset to things expecting codepoints.
17:05 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/62fbd3755c
17:05 jnthn ^ caused some nasty issues too
17:06 timotimo ok, i'll grab & build that, too
17:06 timotimo i wish there was a way to tell configure "no, it's all right, just accept the version i have. i just didn't re-configure everything"
17:08 domidumont joined #moarvm
17:12 timotimo Use of Nil in string context  in block  at /home/timo/perl6/install/share/perl6/site/lib/Panda/Ecosystem.pm:106
17:12 timotimo this is the problem we've had
17:13 timotimo about .get giving us more than just one line
17:13 * timotimo instruments with debug output
17:15 timotimo "HTTP/1.1 200 OK\r\nDate: Wed, 04 Nov 2015 17:15:21 GMT\r\nServer: Apache/2.4.10 (Debian)\r\n(and so on)…"
17:16 timotimo that's what .get gives us
17:18 [Coke] timotimo: I think we could make our makefiles smart enough to include the config step to avoid that issue.
17:18 timotimo well. that's what you get!
17:18 [Coke] but it's so much pain.
17:19 timotimo i hear ya
17:19 timotimo seriously, please don't talk so loud! i hear you from all the way over there! ;)
17:21 jnthn break &
17:34 tokuhiro_ joined #moarvm
18:00 camelia joined #moarvm
18:32 tokuhiro_ joined #moarvm
19:01 nwc10 jnthn: much test fail. Tried one - ASAN barfage: http://paste.scsys.co.uk/500963
19:02 vendethiel joined #moarvm
19:12 leont joined #moarvm
19:38 synbot6 joined #moarvm
19:38 tokuhiro_ joined #moarvm
20:23 zakharyas joined #moarvm
21:40 tokuhiro_ joined #moarvm
21:44 Peter_R joined #moarvm
21:58 Ven joined #moarvm
22:03 ShimmerFairy joined #moarvm
22:18 Ven joined #moarvm
23:57 flussence I've cobbled some code together over the last 2 days that *looks* pretty reasonable but breaks in various ways. It's reached the point where it's throwing mvm-internal errors, so I give up :)  Here it is if someone else wants a fun time: https://gist.github.com/flussence/a27ca3f5632476e80019

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary