Perl 6 - the future is here, just unevenly distributed

IRC log for #moarvm, 2015-10-29

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
00:25 tokuhirom joined #moarvm
01:09 tokuhirom joined #moarvm
01:20 TimToady joined #moarvm
02:37 vendethiel joined #moarvm
02:47 ilbot3 joined #moarvm
02:47 Topic for #moarvm is now https://github.com/moarvm/moarvm | IRC logs at  http://irclog.perlgeek.de/moarvm/today
04:08 TimToady joined #moarvm
04:13 KDr2 joined #moarvm
05:24 ingy joined #moarvm
05:36 FROGGS joined #moarvm
06:41 FROGGS_ joined #moarvm
06:47 FROGGS__ joined #moarvm
06:57 nwc10 joined #moarvm
07:44 FROGGS joined #moarvm
08:38 kjs_ joined #moarvm
09:24 zakharyas joined #moarvm
10:12 kjs_ joined #moarvm
10:12 jnthn .tell Hotkeys I didn't actually change how \r behaves yet, just did the groundwork. Also, unless you're building a MoarVM at HEAD, rather than the version Rakudo will pick by default, you'll not have any of my changes yet anyway.
10:12 jnthn oh, no bot
10:12 jnthn Hotkeys: I didn't actually change how \r behaves yet, just did the groundwork. Also, unless you're building a MoarVM at HEAD, rather than the version Rakudo will pick by default, you'll not have any of my changes yet anyway.
10:13 jnthn psch: \r\n will become a synthetic, *not* identical to \n, just to be clear.
10:48 brrt joined #moarvm
10:49 brrt \o
10:50 brrt guess who had to fix... yet another compilation bug this morning?
10:52 jnthn brrt? :)
10:52 brrt jnthn++ :-P
10:53 brrt apparantly, accessing the lower bytes of rsp-rdi (registers 4-7) dynamically in x64 means you *have* to add a REX byte
10:54 jnthn This REX byte seems to cause plenty of fun...
10:54 brrt because otherwise the address is understood as the second byte of the lower 4 (rax-rbx) registers
10:54 brrt even if you don't actually address any of the regular extended registers
10:54 brrt 'fun'
10:55 brrt why is this? GOD ONLY KNOWS
10:58 brrt i'm also seeing that my approach to register-allocation-over-conditional-and-call-boundaries was oversimplisitc
10:58 brrt as it happens, *online algorithms suck*
10:58 brrt whereby 'suck' has a technical definition meaning 'make things much more complicated and difficult to analyse'
11:03 brrt anyway... i was hoping i could get the compiler to run with just bugfixing the current setup. but the complexity of it feels like it's spiralling
11:03 brrt so i'm wondering if i should move the register allocator to an offline phase, as well as having the tiles in linear memory order
11:05 brrt one of the things that bothered me slightly is that if i linearize the tiles to an array, then i can't splice in spills and loads easily.
11:06 brrt which means one of three things: a): i linearize to something else than an array, like a linked list, which is kind of not-so-fun with regards to allocation (or i should use the spesh allocator, but I don't like that much);
11:07 brrt b): i add a second array which is walked next to the first array that holds spills and stores (i kind of like that solution, but it is complex)
11:07 brrt c): i do spills and stores inline
11:08 brrt but i'm not sure how c) interoperates with the idea of making the register allocation step an offline step
11:08 jnthn Hmmm
11:08 brrt (having register alloc as a offline step is basically compiler best practice, or so i've heard)\
11:08 jnthn For (a) what's wrong with using the spesh allocator?
11:09 brrt consistency. i use the spesh allocator for nothing, and then bam!, it reappears
11:09 brrt although......
11:09 brrt hmmm
11:09 brrt i could actually use the spesh allocator to hold the info nodes
11:10 brrt the info node array is quite redundant as all the tree 'pointers' and constants also get a info node
11:10 timotimo well, the spesh allocator is good for things that are unevenly sized and that you're going to throw away completely at the end
11:10 brrt aye
11:10 timotimo so it seems like a good fit
11:10 brrt it is a good fit
11:10 * brrt wonders if i want random access to the tiles
11:14 brrt ok, that is a good idea, i think
11:14 timotimo i've heard a thousand times that linked lists never outperform arrays even if you do inserts in the middle ... or something like that
11:14 timotimo because ... CACHES!
11:14 jnthn yeah but the spesh allocator sticks the nodes in order anyway :P
11:15 timotimo L1 cache can give you multiple gigs per second throughput! doesn't matter that it's still small and has to grab data from RAM at a much, much slower rate all the time
11:15 jnthn That's why I picked that kind of design
11:15 jnthn Thus a bunch of MVMSpeshIns will be continguous, unless you hit a point something was spliced in.
11:15 jnthn So it's relatively cache friendly
11:16 brrt i think knuth's old quote applies really well here
11:17 brrt timotimo: just do multiple gigs of computation on 8k of memory or so :-P
11:17 * brrt wonders if that is actually done anywhere
11:18 timotimo i'm still wondering if my idea for a Big Data product that handles "Big Data data sets as large as 100x your L1 cache effectively" will be seen as revolutionary and awesome
11:21 brrt lol
11:22 brrt 'big' data like ... a gigabyte :-o
11:22 brrt fwiw, did anybody see the 'go is a poorly designed language' article anywhere?
11:22 brrt it's funny because all the things the author considers as 'poor design' are quite logical imho
11:23 timotimo oh, does it say "its types are spelled totally differently from C"?
11:24 brrt no, actually, it doesn't
11:24 brrt it says 'i want my negative array indexing to work like python and it doesn't :-('
11:24 brrt while negative array indices are a huge cost in a potential fast path
11:25 * jnthn heard Go had attracted a bunch of Python folks, but isn't sure how accurate that is
11:25 brrt perhaps not when they are constant (you can constant-fold it away), but definitely variable indices
11:25 brrt yeah, well, if i were to go and blog about how python doesn't support sigils, i'd look ridiculous, no?
11:26 timotimo wouldn't that be fun?
11:26 brrt yes. yes it would
11:27 brrt 'splicing stuff out doesn't look easy' - that's because it *isn't* easy
11:27 brrt and cheap
11:28 brrt 'declaring a variable in a new scope using shorthand notation shadows my outer scope variable' - i'm not even sure what anybody should expect
11:29 brrt i'm going to write an article about how go doesn't have a whateverstar and how that makes it a sucky language
11:31 timotimo A/B-test it against an article about python not having a whateverstar
11:32 brrt that...
11:32 brrt is an excellent idea
12:13 dalek MoarVM: 385e498 | jnthn++ | src/strings/normalize.c:
12:13 dalek MoarVM: Make NFG algorithm use Unicode Grapheme Clusters.
12:13 dalek MoarVM:
12:13 dalek MoarVM: As described in Annex #29. We do all of it except the CRLF case, as
12:13 dalek MoarVM: enabling that even breaks our ability to parse Perl 6 code (will need
12:13 dalek MoarVM: to figure out why). Aside from the CRLF case, though, we now pass all
12:13 dalek MoarVM: the Unicode grapheme boundary tests (that is, we get the .chars that
12:13 dalek MoarVM: are expected).
12:13 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/385e498d88
12:14 dalek MoarVM: 82f93f7 | jnthn++ | src/strings/normalize.h:
12:14 dalek MoarVM: Toss #define we ended up not needing.
12:14 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/82f93f796e
12:14 dalek MoarVM: 3519077 | jnthn++ | src/strings/normalize.h:
12:14 dalek MoarVM: Slightly simplify a conditional.
12:14 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/3519077425
12:15 jnthn Time to see what the spectest fallout of the NFG algorithm change will be... :)
12:19 nwc10 spectests weren't clean before
12:19 lizmat yeah, were 4 files failing for me yesterday eve
12:19 nwc10 they are in a state of sin a bit too often for my liking
12:20 jnthn With MOAR_REVISION or with master?
12:23 jnthn Seems 7 test files are vicitms of the change
12:24 jnthn Well, have tests that are vicitms
12:27 jnthn lunch &
12:38 nwc10 jnthn: "nom", I believe is the culprit
13:01 nwc10 more spectests fail, but eyeballing the summary, doesn't look like anything surprising
13:06 * jnthn back
13:09 brrt \o jnthn
13:19 jnthn Turns out 1 failing file was 'cus I needed to patch Str.perl (now done)
13:19 jnthn Next 2 were tests that don't make sense under NFG semantics.
13:19 jnthn And that we didn't spot last time around 'cus the NFG algo was insufficient.
13:30 jnthn m: say uniprop("\x1B3D", 'General_Category')
13:30 camelia rakudo-moar da8881: OUTPUT«Mc␤»
13:31 jnthn m: say "\x1B3D".NFD
13:31 camelia rakudo-moar da8881: OUTPUT«NFD:0x<1b3c 1b35>␤»
13:31 jnthn m: say "\x1B3D".chars
13:31 camelia rakudo-moar da8881: OUTPUT«1␤»
13:31 jnthn m: say "\x1B3D".ord
13:31 camelia rakudo-moar da8881: OUTPUT«6973␤»
13:31 jnthn wtf
13:32 jnthn m: say 0x1B3D
13:32 camelia rakudo-moar da8881: OUTPUT«6973␤»
13:32 jnthn Locally
13:32 jnthn > say "\x1B3D".ord
13:32 jnthn 6972
13:32 jnthn o.O
13:34 jnthn m: say Uni.new(0x1B3D).Str.ord.base(16)
13:34 camelia rakudo-moar da8881: OUTPUT«1B3D␤»
13:35 jnthn > say Uni.new(0x1B3D).NFC
13:35 jnthn NFC:0x<1b3d>
13:35 jnthn Not NFC that got busted
13:35 jnthn > say Uni.new(0x1B3D).Str.ord.base(16)
13:35 jnthn 1B3C
13:36 jnthn I don't even...
13:45 jnthn m: say Uni.new(0x1B3D).NFD.NFC
13:45 camelia rakudo-moar da8881: OUTPUT«NFC:0x<1b3c 1b35>␤»
13:46 jnthn ah
13:46 brrt what, how
13:46 jnthn m: say uniname(0x1B3D)
13:46 camelia rakudo-moar da8881: OUTPUT«BALINESE VOWEL SIGN LA LENGA TEDUNG␤»
13:46 jnthn m: say uniname(0x1B3C)
13:46 camelia rakudo-moar da8881: OUTPUT«BALINESE VOWEL SIGN LA LENGA␤»
13:47 jnthn m: say uniname(0x1B35)
13:47 camelia rakudo-moar da8881: OUTPUT«BALINESE VOWEL SIGN TEDUNG␤»
13:47 jnthn m: say uniprop(0x1B3C, 'General_Category')
13:47 camelia rakudo-moar da8881: OUTPUT«Mn␤»
13:47 jnthn m: say uniprop(0x1B3D, 'General_Category')
13:47 camelia rakudo-moar da8881: OUTPUT«Mc␤»
13:47 jnthn m: say uniprop(0x1B35, 'General_Category')
13:47 camelia rakudo-moar da8881: OUTPUT«Mc␤»
13:48 jnthn Yowser. That's a bit of an interesting problem.
13:49 nwc10 It's not at all obvious to me why (or what's wrong)
13:49 nwc10 I have not read this yet: http://morepypy.blogspot.co.at/2015/10/pypy-400-released-jit-with-simd.html?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed:+PyPyStatusBlog+%28PyPy+Status+Blog%29
13:49 jnthn nwc10: Well, it came from a test regression
13:50 * brrt should probably subscribe to their blog since it is interesting
13:50 jnthn m: say uniprop(0x1B3D, 'NFC_QC')
13:50 camelia rakudo-moar da8881: OUTPUT«Y␤»
13:50 nwc10 I'm just looking at what's on http://planetpython.org/
13:50 brrt although i recently lost all my subscriptions when i forgot to copy a opml file
13:50 jnthn So, that character passes the NFC quick-check
13:51 jnthn Which implies "we're already in NFC"
13:51 brrt uhuh
13:51 jnthn But NFC should afaiu be stable
13:51 jnthn Such that if you compute NFD and then again compute NFC, you get the same thing back
13:51 jnthn That's not happenign here
13:51 jnthn *happening
13:52 nwc10 we're into "#11907 Looking for a compiler bug is the strategy of LAST resort. LAST resort." ?
13:52 jnthn Not yet
13:53 jnthn I need to go look at our NFD -> NFC
13:53 jnthn But NFC is *defined* in terms of NFD
13:57 rarara_ joined #moarvm
13:58 * brrt wonders what the current state of the art is in rubyland, since ruby may be even more comparable to perl6 in terms of indirections
14:07 jnthn Well, I can't find a way we're inconsistent with the actual Unicode data files
14:16 jnthn So yeah, it's very odd. I have a case where a string passes the NFC QuickCheck, but actually computing NFC on that string doesn't give identity
14:17 nwc10 use more 'coffee'; ?
14:17 jnthn And it's a one-char string so I don't think I could be getting the use of NFC_QC wrong.
14:37 tokuhiro_ joined #moarvm
14:40 jnthn OK, seems we have something wrong in our canonical composition
14:56 jnthn Yeah, nailed it I think
14:57 brrt that was fast
14:59 jnthn The number of characters added by patch to time spent ratio is pretty awful :P
15:00 brrt aw, there goes your enterprise points
15:01 jnthn :P
15:03 * brrt has seen a lot of articles lately about how COBOL was making a comeback
15:03 brrt maybe we should have an Inline::COBOL
15:03 brrt or just port moar to  COBOL for enterprise points
15:05 jnthn If your goal is -Osalary, COBOL may well be one of the best languages to learn :)
15:09 dalek MoarVM: 5ff3001 | jnthn++ | src/strings/normalize.c:
15:09 dalek MoarVM: Fix a canonical composition bug.
15:09 dalek MoarVM:
15:09 dalek MoarVM: We didn't admit various starter/starter composition cases. This bug
15:09 dalek MoarVM: actually managed to survive despite us passing the complete Unicode
15:09 dalek MoarVM: normalization test suite, because we never hit this code path before
15:09 dalek MoarVM: thanks to the NFC_QC property. Now, thanks to NFG_QC, we can hit it
15:09 dalek MoarVM: in some more cases (this does perhaps point to a future optimization).
15:09 dalek MoarVM: Fixes a spectest regression.
15:09 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/5ff3001a04
15:11 brrt i don't even...
15:13 jnthn :)
15:14 brrt understand how or why that makes a difference :-)
15:15 jnthn brrt: Sometimes, two characters that are *not* combining chars are composed into a single char when canonicalizing.
15:16 jnthn brrt: And the bug I just fixed meant we didn't let that happen.
15:17 jnthn Now I'm down to one affected test file
15:17 jnthn Bizzarely, S32-io/IO-Socket-INET.t
15:21 jnthn m: say uniname(0xbeef)
15:21 camelia rakudo-moar 3cc195: OUTPUT«<Hangul Syllable>␤»
15:21 jnthn haha
15:21 jnthn m: say uniname(0xbabe)
15:21 camelia rakudo-moar 3cc195: OUTPUT«<Hangul Syllable>␤»
15:22 jnthn Pro tip: when writing tests and wanting some random "Unicode character", don't just spell cute words :)
15:22 jnthn Otherwise you might (or in this case, will!) end up with something that will, under NFG, end up combining with the previous grapheme.
15:37 brrt no, i stil don't get it
15:38 brrt but i'll accept that for now
15:38 jnthn brrt: I only get it in so far as "I read the Unicode spec and know what the terms mean"
15:39 jnthn I don't know anything about the Balianese language and the specific thing that's going on with these chars.
15:39 brrt the NFG business seems similar in a way to the x86 instruction encoding business
15:40 brrt you think you fixed it, but no, something funny happens
15:40 brrt only in specific magical cases
15:40 jnthn Well, this wsan't even an NFG bug, just an NFC one :)
15:40 brrt fair enough :-)
15:40 jnthn It's not *that* bad, tbh. It's just that humans are darn creative about their writing systems.
15:41 jnthn Hangul is probably the biggest offender in terms of amount of code we have to write just for it.
15:50 nwc10 t/spec/S15-nfg/cgj.rakudo.moar
15:50 nwc10 TODO passed:   5-8
15:50 jnthn Yup :)
16:12 jnthn Oh man. The \r\n => 1 grapheme thing may be a bit fraught
16:12 nwc10 why so?
16:14 jnthn Well, NQP can't even parse a simple test file with a \r\n in it any more
16:15 jnthn And then the error handling code it uses to try and report that fails too
16:17 jnthn And the REPL hangs
16:18 nwc10 step away from the keyboard, and make a curry?
16:18 nwc10 the "error reporting" thing sounds like a bug that needs fixing whatever else happens next.
16:18 jnthn aye
16:21 jnthn Ah
16:21 jnthn One possible problem is that concatenation of a \r and \n doesn't produce the grapheme
16:28 dalek MoarVM: f1a216d | jnthn++ | src/strings/nfg.c:
16:28 dalek MoarVM: Update concat code in prep for \r\n as grapheme.
16:28 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/f1a216d29e
16:35 jnthn Aha
16:36 jnthn say(?("\r\n" ~~ /\v/))
16:36 [Coke] oho?
16:36 jnthn That doesn't match
16:36 jnthn So, that's almost certainly the source of the \r\n -> 1 grapheme bustage
16:37 jnthn I suspect fixing this is going to need an NQP bootstrap updage
16:37 jnthn *update
16:37 jnthn But worse, it probably also brings up the issues with NFG and the NFA engine
16:45 TimToady m: say "\x037e".ord.base(16)
16:45 camelia rakudo-moar 3cc195: OUTPUT«3B␤»
16:46 jnthn m: say uniname(0x037E)
16:46 camelia rakudo-moar 3cc195: OUTPUT«GREEK QUESTION MARK␤»
16:46 TimToady is there an explanation for why we lose track of GREEK QUESTION MARK?
16:46 jnthn Almost certainly :)
16:46 jnthn m: say uniprop(0x037E, 'Decomp_Spec')
16:46 camelia rakudo-moar 3cc195: OUTPUT«003B␤»
16:46 jnthn ^^
16:46 jnthn 'cus Unicode says we should as part of normalization
16:47 jnthn See singleton equivalence in http://unicode.org/reports/tr15/ for more info
16:47 TimToady k
16:47 jnthn (There's a handful of 'em)
16:49 [Coke] TimToady: I thought we already said that on #perl6. my bad.
16:49 [Coke] jnthn: it makes us impervious to the "mess with your friend's code" meme that was going around recently.
16:51 jnthn [Coke]: Nah, there's still plenty of other ways to do that :)
16:51 TimToady jnthn: btw
16:51 TimToady m: say uniprop(0x1B3D)  # don't need to type General_Category every time
16:51 camelia rakudo-moar 3cc195: OUTPUT«Mc␤»
16:51 TimToady that's the default
16:51 jnthn Darn, wish I'd know that earlier :P
16:51 jnthn After today I probably don't need to look up gen cats again for another few months :P
16:53 jnthn TimToady: I assume you want us to end up with \r\n as a synthetic so we totally follow the Unicode grapheme cluster rules? :)
16:53 TimToady unless there's some showstopper reason we can't
16:54 jnthn Not that I see so far, it's just a bit of hunting down the badass umptions in the code... :)
16:54 TimToady at least it makes it easier to match \n in regex :)
16:54 jnthn Yeah
16:54 TimToady and certainly \v should match it too
16:55 jnthn For sure; just patched that locally
16:55 jnthn Though it's a hack :/
16:55 jnthn And will break LTM of \V until I more generally fix the NFG/NFA interaction
16:55 jnthn The NFA design we inherited is rather into doing nqp::ord
16:55 jnthn And so loses synthetics
16:56 jnthn Not a huge engineering problem to fix, I don't think. Just another thing to do.
16:57 TimToady though even if we have an nqp::gord or so, we're still in trouble if we go with per-string or per-domain tables
16:57 jnthn Indeed. I don't want to go that way.
16:57 jnthn Better to actually pass one-char strings.
16:57 jnthn At the "API"
16:58 jnthn (We can still keep it all integers on the inside at actual matching time)
16:58 TimToady let's keep doing it right, and then think about optimization
16:58 jnthn Well, we're not doing it right yet...but yeah.
16:58 TimToady *righter
16:59 jnthn Otherwise, I think the NFG algo tweaks have turned out OK.
16:59 TimToady any feel on input performance degradation?
17:00 tokuhiro_ joined #moarvm
17:00 TimToady presumably shouldn't be much if most of the file is quick-reject ASCII
17:01 jnthn Yeah, it's a little slowdown for ASCII 'cus we have to care about \r now
17:01 TimToady well, except insofar as \r\n is ... yeah
17:01 jnthn And we're more careful over controls
17:01 jnthn When we have to do the full analysis it's more costly
17:01 jnthn But I computed us an NFG quickcheck property
17:02 jnthn So we should only be doing the hard work when we really need to
17:02 jnthn m: say uniprop('x', 'NFG_QC')
17:02 camelia rakudo-moar 3cc195: OUTPUT«0␤»
17:02 jnthn Ah, build is behind
17:02 jnthn But yeah, we leak that property to userspace as if it was a normal Unicode property, when it's in fact one we've made up
17:03 jnthn Dunno if that bothers you.
17:03 TimToady not much
17:03 jnthn k
17:04 jnthn Ended up with all the control chars being NFG terminators, btw.
17:04 TimToady though if the UC adopts the "NFG" term we could eventually get a name collision, but so far they seem to prefer Normalization_Form_Grapheme and such
17:04 jnthn Which was something you suggested before.
17:05 jnthn Well, they do call their quickcheck properties NFC_QC for example
17:05 TimToady well, hopefully then they won't adopt it and flip the sense :)
17:05 jnthn If they did, they'd make it inconsistent with how the other quickcheck properties work
17:05 jnthn And they don't seem that crazy. :)
17:06 jnthn Or at least, no more crazy than you have to be to try and bring some order to the world's writing systems...
17:08 TimToady in the backlog: "never hit this codepath"
17:08 TimToady do we have any plans for a code coverage tool?
17:09 TimToady so we can tell if there are glaring blind spots in roast?
17:09 jnthn Well, a user-level thing "not yet"
17:10 jnthn Do we have the tech to hack something up to tell us where our roast blind spots are? That's easier.
17:10 jnthn The cross-thread write logging and the profiler use bytecode instrumentation, and it's easy enough to write extra ones of those
17:10 TimToady well, thinking setting-level mostly there
17:11 jnthn My Big Plan is to turn the instrumentation stuff into a kind of meta-interpreter framework so you can write stuff like profilers and coverage tools and debuggers in NQP or Perl 6 code
17:12 jnthn But that's not going to happen this side of 6.c
17:15 jnthn Anyway, I can do a couple-of-hours hack solution to get an approximate answer to "what is roast not covering"
17:15 jnthn And maybe it'll be inspiration for somebody to go and make a good one :)
17:17 jnthn OK, I've got NQP patches that get all but one of the NQP tests passing
17:18 jnthn (With \r\n as a grapheme)
17:19 jnthn It'll need a rebootstrap, alas
17:20 jnthn Hm, and it doesn't make it all the way through the Perl 6 build either.
17:22 TimToady we have 5 comments in nqp that mention things to change after a rebootstrap :)
17:22 TimToady which I'm sure were put there at least one reboot ago...
17:23 jnthn :)
17:23 jnthn Time for me to go cook us something tasty :)
17:26 * jnthn is happy to have this bit of the NFG work nearly done
17:26 jnthn Guess I need to worry about the threads/IO things next week
17:27 jnthn Well, various I/O things...
17:27 jnthn Anyway, away for a bit
18:06 kjs_ joined #moarvm
18:17 tokuhiro_ joined #moarvm
18:18 zakharyas joined #moarvm
18:28 leont joined #moarvm
18:40 FROGGS joined #moarvm
18:42 vendethiel joined #moarvm
19:24 kjs_ joined #moarvm
19:54 tokuhiro_ joined #moarvm
19:55 rarara_ joined #moarvm
20:10 kjs_ joined #moarvm
20:37 kjs_ joined #moarvm
21:19 kjs_ joined #moarvm
21:33 zakharyas joined #moarvm
22:20 tokuhiro_ joined #moarvm
22:20 tokuhiro_ joined #moarvm

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary