Perl 6 - the future is here, just unevenly distributed

IRC log for #moarvm, 2015-11-21


All times shown according to UTC.

Time Nick Message
01:20 tokuhiro_ joined #moarvm
02:12 colomon joined #moarvm
02:18 BinGOs_ joined #moarvm
02:18 btyler_ joined #moarvm
02:18 leedo_ joined #moarvm
02:18 [Coke]_ joined #moarvm
02:19 jnthn_ joined #moarvm
03:19 timotimo i think we threw the frame pool out because it no longer helped performance
03:19 timotimo but it wasn't replaced by malloc; it was replaced by the Fixed Size Allocator
03:22 tokuhiro_ joined #moarvm
05:24 tokuhiro_ joined #moarvm
06:20 diakopter yah, but from the fixed sized allocator, it spends 10% of time in malloc
07:26 tokuhiro_ joined #moarvm
07:47 domidumont joined #moarvm
07:52 domidumont joined #moarvm
08:24 arnsholt_ joined #moarvm
09:27 tokuhiro_ joined #moarvm
09:30 domidumont joined #moarvm
09:45 Peter_R joined #moarvm
10:10 kjs_ joined #moarvm
10:14 BinGOs joined #moarvm
10:35 vendethiel joined #moarvm
11:17 tokuhiro_ joined #moarvm
11:25 domidumont joined #moarvm
12:03 FROGGS joined #moarvm
12:23 tokuhiro_ joined #moarvm
12:24 TimToady joined #moarvm
13:28 leont joined #moarvm
13:36 timotimo oh
14:24 tokuhiro_ joined #moarvm
14:53 vendethiel joined #moarvm
16:11 kjs_ joined #moarvm
16:18 colomon joined #moarvm
16:26 tokuhiro_ joined #moarvm
16:37 colomon joined #moarvm
17:08 zakharyas joined #moarvm
17:32 hoelzro o/ #moarvm
17:32 japhb o/
18:02 colomon joined #moarvm
18:05 hoelzro o/ japhb
18:06 hoelzro I've been digging around in MVM_string_utf16_encode_substr, and afaict, it doesn't specify if the output is UTF-16BE or UTF-16LE
18:06 hoelzro it seems to just use native encoding
18:06 hoelzro er, endianness
18:06 hoelzro is that something that should be well defined for that function?
18:13 timotimo needs a BOM :P
18:14 hoelzro should we set us up the bom in that function, then?
18:14 hoelzro that feels like it belongs at a higher level, maybe
18:14 timotimo dunno, what does the spec say about its power level?
18:15 colomon joined #moarvm
18:15 hoelzro > 9000
18:15 timotimo WHAT NINE THOUSAND
18:15 timotimo well, the BMP is a bit more than nine thousand, isn't it?
18:16 hoelzro 16K, right?
18:16 hoelzro er, no
18:16 hoelzro 64K
18:16 * hoelzro can't math today
18:17 timotimo math! what is it good for?
18:18 dalek MoarVM: 47ab6f3 | hoelzro++ | src/strings/utf16.c:
18:18 dalek MoarVM: Resize buffers as needed when taking a UTF-16 substring
18:18 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/47ab6f3183
18:18 dalek MoarVM: 05ad276 | hoelzro++ | src/strings/utf16.c:
18:18 dalek MoarVM: Initialize repl_length to 0
18:18 dalek MoarVM:
18:18 dalek MoarVM: Otherwise we depend on uninitialized values for growing the buffer
18:18 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/05ad276431
18:22 hoelzro timotimo: do you think the endianness thing is RT worthy?
18:24 timotimo no clue
18:24 timotimo i'ven't seen an UTF-16 thing in a long time
18:25 timotimo isn't it quite common in asian parts of the world?
18:25 hoelzro I thought it was just MS stuff
18:25 hoelzro and Java, but Java uses UCS-2
18:25 leont I think so does Oracle
18:25 hoelzro I'm not sure about asian countries, but I thought that Japan, for example, has stuck with Shift-JIS
18:26 hoelzro oh, I didn't know that
18:26 timotimo what is Shift-JIS?
18:26 hoelzro it's an encoding that was (is?) popular in Japan
18:27 tokuhiro_ joined #moarvm
18:27 timotimo let's see ...
18:28 leont Or at least it's producing CESU-8, which is an eldritch horror
18:29 leont (UTF-8, but with surrogate pairs…)
18:29 hoelzro wtf
18:30 timotimo so just like json?
18:30 leont Almost
18:31 leont AFAIK JSON is Modified UTF-8, which is the same except that a null character is encoded as 0xC0,0x80…
18:32 leont Which is a Java thing
18:33 leont Don't see it mentioned in the JSON RFC, I may be mistaken there
18:39 colomon joined #moarvm
18:41 jnthn hoelzro: I think we should probably have UTF-16 write a BOM and mean native, and add UTF-16-LE and UTF-16-BE
18:42 jnthn Which can re-use the same code near enough
18:42 jnthn And just twiddle the endianness on the way out
18:42 jnthn Or in
18:43 leont "twiddle the endianness on the way out"
18:43 arnsholt leont: What on Earth is the rationale for something like CESU-8? If you're restricted to bytes, wouldn't UTF-8 be simpler?
18:43 leont ?
18:43 leont arnsholt: it's cheaper to convert UTF-16 to CESU-8 than to UTF-8, I guess
18:43 jnthn leont: As in, after grabbing codepoints, doing the surrogate pair split, and so forth
18:43 arnsholt True, I guess
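
To make the CESU-8 exchange above concrete, here is a minimal sketch contrasting the two encodings for a code point outside the BMP. The function names are hypothetical and are not part of MoarVM. It assumes the usual definition of CESU-8: split the code point into a UTF-16 surrogate pair, then write each surrogate as a 3-byte UTF-8-style sequence, which is why a converter starting from UTF-16 can skip recombining the pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit value (0x0800..0xFFFF) using the 3-byte UTF-8 layout. */
static size_t put3(uint8_t *out, uint32_t v) {
    out[0] = (uint8_t)(0xE0 | (v >> 12));
    out[1] = (uint8_t)(0x80 | ((v >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (v & 0x3F));
    return 3;
}

/* Hypothetical helper: standard UTF-8 for U+10000..U+10FFFF (4 bytes). */
size_t utf8_encode_supplementary(uint32_t cp, uint8_t *out) {
    assert(out != NULL && cp >= 0x10000 && cp <= 0x10FFFF);
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: CESU-8 for the same range (6 bytes).
 * Each surrogate is encoded on its own; the pair is never recombined. */
size_t cesu8_encode_supplementary(uint32_t cp, uint8_t *out) {
    assert(out != NULL && cp >= 0x10000 && cp <= 0x10FFFF);
    uint32_t v = cp - 0x10000;
    size_t n = put3(out, 0xD800 | (v >> 10));     /* high surrogate */
    n += put3(out + n, 0xDC00 | (v & 0x3FF));     /* low surrogate  */
    return n;
}
```

For U+1F600 the first function yields F0 9F 98 80, the second ED A0 BD ED B8 80.
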
18:43 hoelzro jnthn: should I make a ticket for that?
18:43 jnthn Heck, can even pass in a function pointer
18:44 jnthn hoelzro: Yeah, can do
18:44 leont endianness and surrogates have a clear order in my head
18:45 hoelzro https://rt.perl.org/Ticket/Display.html?id=126704
18:45 leont (possibly I'm misunderstanding what you just said and we're in agreement)
18:45 jnthn leont: You write the surrogates in a different order too?
18:45 jnthn I thought you just wrote the 16-bit values in a different order...
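
A rough sketch of the idea jnthn describes: do the surrogate pair split first, then choose the byte order only when each 16-bit value is written out. All names here are hypothetical and this is not the code in src/strings/utf16.c. The high surrogate always comes first; only the two bytes inside each unit are swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-order selector; not a MoarVM type. */
typedef enum { UTF16_LE, UTF16_BE } utf16_order;

/* Write one 16-bit code unit in the requested byte order. */
static size_t put_unit(uint8_t *out, uint16_t unit, utf16_order order) {
    if (order == UTF16_BE) {
        out[0] = (uint8_t)(unit >> 8);
        out[1] = (uint8_t)(unit & 0xFF);
    } else {
        out[0] = (uint8_t)(unit & 0xFF);
        out[1] = (uint8_t)(unit >> 8);
    }
    return 2;
}

/* Encode one code point into out (room for 4 bytes required).
 * Returns bytes written (2 or 4), or 0 for an unencodable value. */
size_t utf16_encode_cp(uint32_t cp, utf16_order order, uint8_t *out) {
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                       /* out of range or lone surrogate */
    if (cp < 0x10000)
        return put_unit(out, (uint16_t)cp, order);
    cp -= 0x10000;
    /* Surrogate order is fixed: high first, then low. */
    size_t n = put_unit(out, (uint16_t)(0xD800 | (cp >> 10)), order);
    n += put_unit(out + n, (uint16_t)(0xDC00 | (cp & 0x3FF)), order);
    return n;
}
```

With this shape, a BOM-writing "UTF-16" variant could call the same function with the native order after emitting U+FEFF, while UTF-16LE and UTF-16BE variants would pass a fixed order and write no BOM. That split is an assumption about how the proposal might be laid out, not something settled in the discussion.
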
18:45 hoelzro https://rt.perl.org/Ticket/Display.html?id=126705
18:45 leont No, I don't think so
18:46 leont We're probably talking past each other, just ignore what I said :-)
18:46 hoelzro jnthn: re: a BOM, though; I would think that would be the responsibility of a higher layer? ex. what if a protocol *always* uses UTF-16BE; does it make sense to throw a BOM on?
18:47 leont Depends on the protocol
18:47 jnthn leont: https://en.wikipedia.org/wiki/UTF-16#U.2BD800_to_U.2BDFFF seem to agree with what I mean... :)
18:47 hoelzro leont: right, so why force the BOM if the programmer doesn't need it?
18:49 leont jnthn: indeed that's the obvious thing
18:51 jnthn bah
18:51 jnthn "If the BOM is missing, RFC 2781 says that big-endian encoding should be assumed. (In practice, due to Windows using little-endian order by default, many applications similarly assume little-endian encoding by default.)"
18:51 jnthn Standards... :/
18:52 leont Little-Endian is a bit silly, but given that's how all architectures work nowadays (even ARM switched) it seems a fait accompli
18:52 ilmari however, because the first character of JSON must be < 127, you can tell by the pattern of nulls
18:54 ilmari RFC 7159 says «Implementations MUST NOT add a byte order mark to the beginning of a
18:54 ilmari JSON text.»
18:56 leont UTF-16 has all the disadvantages of UCS-2 with all the disadvantages of UTF-8, and adds one of its own: it isn't binary sortable (even UTF-16BE) due to surrogate pairs. It's a mess really.
18:58 ilmari 00 00 00 xx: UTF-32BE, xx 00 00 00: UTF-32LE, 00 xx: UTF-16BE, xx 00: UTF-16LE, xx: UTF-8
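
The pattern-of-nulls table ilmari gives can be sketched as a small function. The names are hypothetical. It assumes the input really is a JSON text with no BOM, so that the first character is ASCII and therefore non-zero; on arbitrary data the result is only a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for the guess. */
typedef enum {
    ENC_UTF8, ENC_UTF16BE, ENC_UTF16LE, ENC_UTF32BE, ENC_UTF32LE
} json_encoding;

/* Guess the encoding of a BOM-less JSON text from its leading bytes.
 * The 4-byte patterns are tested first, because "xx 00 00 00" would
 * otherwise also match the 2-byte UTF-16LE pattern "xx 00". */
json_encoding guess_json_encoding(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len >= 4) {
        if (buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] != 0)
            return ENC_UTF32BE;         /* 00 00 00 xx */
        if (buf[0] != 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 0)
            return ENC_UTF32LE;         /* xx 00 00 00 */
    }
    if (len >= 2) {
        if (buf[0] == 0 && buf[1] != 0)
            return ENC_UTF16BE;         /* 00 xx */
        if (buf[0] != 0 && buf[1] == 0)
            return ENC_UTF16LE;         /* xx 00 */
    }
    return ENC_UTF8;                    /* xx (no leading nulls) */
}
```
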
19:07 hoelzro 9^/win3
19:07 hoelzro oops
19:29 tokuhiro_ joined #moarvm
19:36 kjs_ joined #moarvm
19:44 domidumont joined #moarvm
19:45 vendethiel- joined #moarvm
19:57 kjs_ joined #moarvm
20:07 kjs_ joined #moarvm
20:22 tokuhiro_ joined #moarvm
20:27 lizmat joined #moarvm
21:34 vendethiel joined #moarvm
21:43 diakopter here's a CORE.setting compilation profile output using XCode Instruments: http://imgur.com/5LJOuf7
21:45 diakopter in case anyone wants to find some low-hanging fruitzies
21:48 diakopter that's at a 40-microsecond sample rate
21:54 diakopter and sorted by Self (ms) if you're interested: http://i.imgur.com/uW1KBZ6.png
21:58 jnthn Nice
21:58 * jnthn drops them in browser tabs for when he's not tired :)
21:58 jnthn Rest time for now... o/
21:59 diakopter o/
22:02 Ven joined #moarvm
22:24 tokuhiro_ joined #moarvm
22:31 Ven_ joined #moarvm
23:28 diakopter in core setting compilation, MVM_sc_find_object_idx hits its cache 127855 times, but misses the cache 667817 times. Each time it misses the cache, it does a linear search through possibly thousands of objects to find the match..
23:28 diakopter 667817 linear searches is not good
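
One common way to avoid a linear scan on every cache miss is to keep a pointer-to-index hash table beside the object list. The sketch below is a generic illustration of that idea, with hypothetical names throughout. It is not how MVM_sc_find_object_idx works and not a proposed patch. To stay short it uses a fixed capacity and refuses inserts past half full; a real table would grow instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical open-addressing table mapping object pointer -> index. */
typedef struct {
    const void **keys;   /* NULL marks an empty slot */
    size_t      *vals;
    size_t       cap;    /* power of two */
    size_t       used;
} ptr_index;

static size_t hash_ptr(const void *p, size_t cap) {
    uintptr_t x = (uintptr_t)p >> 4;    /* drop alignment bits */
    x *= (uintptr_t)2654435761u;        /* multiplicative mixing */
    return (size_t)x & (cap - 1);
}

int ptr_index_init(ptr_index *t, size_t cap_pow2) {
    assert(cap_pow2 >= 2 && (cap_pow2 & (cap_pow2 - 1)) == 0);
    t->keys = malloc(cap_pow2 * sizeof *t->keys);
    t->vals = malloc(cap_pow2 * sizeof *t->vals);
    t->cap = cap_pow2;
    t->used = 0;
    if (!t->keys || !t->vals) { free(t->keys); free(t->vals); return -1; }
    for (size_t i = 0; i < cap_pow2; i++) t->keys[i] = NULL;
    return 0;
}

void ptr_index_free(ptr_index *t) {
    free(t->keys);
    free(t->vals);
}

/* Returns 0 on success, -1 if the fixed-size table is too full. */
int ptr_index_put(ptr_index *t, const void *key, size_t idx) {
    assert(key != NULL);
    size_t i = hash_ptr(key, t->cap);
    while (t->keys[i] != NULL && t->keys[i] != key)
        i = (i + 1) & (t->cap - 1);     /* linear probing */
    if (t->keys[i] == NULL) {
        if ((t->used + 1) * 2 > t->cap) return -1;
        t->keys[i] = key;
        t->used++;
    }
    t->vals[i] = idx;
    return 0;
}

/* Returns 1 and stores the index if found, 0 otherwise. */
int ptr_index_get(const ptr_index *t, const void *key, size_t *idx_out) {
    size_t i = hash_ptr(key, t->cap);
    while (t->keys[i] != NULL) {
        if (t->keys[i] == key) { *idx_out = t->vals[i]; return 1; }
        i = (i + 1) & (t->cap - 1);
    }
    return 0;
}
```

Because the table never goes past half full, every probe sequence reaches an empty slot, so a lookup that misses stops after a few steps instead of walking thousands of objects.
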
23:32 timotimo that seems like a good catch
23:50 timotimo i'm definitely looking forward to when moar's jit builds a /tmp/perf-PID.map
