Perl 6 - the future is here, just unevenly distributed

IRC log for #moarvm, 2014-02-16

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
00:23 dalek MoarVM: 0017869 | jnthn++ | src/io/syncpipe.c:
00:23 dalek MoarVM: Avoid an assertion fail on Windows.
00:23 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/0017869c47
00:23 dalek MoarVM: d364f5e | jnthn++ | src/io/syncstream.c:
00:23 dalek MoarVM: Correct a thinko.
00:23 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/d364f5e7da
01:05 dalek MoarVM: 705d815 | jnthn++ | / (12 files):
01:05 dalek MoarVM: Support a custom separator in sync streams.
01:05 dalek MoarVM:
01:05 dalek MoarVM: Well, at least a 1-char one. Enough for the tests.
01:05 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/705d815735
01:12 jnthn The only thing left behind in MVMOSHandle from the previous design now is directory handles.
01:19 jnthn I'll leave those for tomorrow. Once that's done, then IO refactor is done.
01:44 woolfy joined #moarvm
02:46 FROGGS_ joined #moarvm
03:03 cognominal joined #moarvm
03:38 timotimo hmm. when perl6-m calls MVM_repr_box_num for the very first time - which seems to come from the datetime setup stuff it does when loading the core.setting? - it already uses 70 megabytes of resident ram
03:39 timotimo the very first call to box_int however is at only 3 megabytes of resident ram
04:06 colomon joined #moarvm
04:13 dalek MoarVM: 5f86fb0 | jimmy++ | src/ (6 files):
04:13 dalek MoarVM: Add const keyword
04:13 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/5f86fb037a
04:21 timotimo https://gist.github.com/timo/9ec85de3f926f3f299dd
04:23 timotimo i wonder if these share the pointer to string memory or if they all have their own copy of the actual string itself
04:26 dalek MoarVM: bf9c708 | jimmy++ | tools/ (2 files):
04:26 dalek MoarVM: updated tools/windows1252_cp_gen.p6
04:26 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/bf9c708f6d
04:35 timotimo strings:
04:35 timotimo 0  [================================================== 7941
04:35 timotimo 1  [                                                   13
04:35 timotimo MVMStringBody.flags of the strings in the gen2
04:35 timotimo seems like there's a decent memory win to be had by investing a little more time into trying to build strings with 8 bits instead of 32 bits per character
04:36 timotimo but first i'll try to get some rest before it becomes day again
07:30 nwc10 jnthn++ # Moar Pandas
07:38 FROGGS_ good morning
09:00 FROGGS_ joined #moarvm
10:03 lizmat joined #moarvm
10:05 lizmat joined #moarvm
10:32 crab2313 joined #moarvm
11:03 jnthn timotimo: At the moment, our hash key computation works only on 32-bit wide, so it forces things to that before doing it.
11:04 jnthn timotimo: So that piece needs a tweak first.
12:56 crab2314 joined #moarvm
12:58 crab2313 joined #moarvm
13:47 timotimo ah, hm.
13:48 timotimo where do i find sthe hash function then?
13:49 jnthn 3rdparty/uthash.h or so
13:49 jnthn But it only cares about blobs of memory so far.
13:50 timotimo ah, mhm.
13:51 timotimo did you see how often some of the strings get duplicated?
14:10 timotimo there's a whole lot of code in there :|
14:25 timotimo i don't feel up to that task :(
14:28 timotimo maybe there's some way to get a bit of dedup for the strings, though? perhaps we're accidentally copying in some place instead of re-using the string objects or something?
15:05 dalek Heuristic branch merge: pushed 43 commits to MoarVM/gdb-support by timo
15:05 dalek MoarVM/gdb-support: b62c422 | (Timo Paulssen)++ | / (2 files):
15:05 dalek MoarVM/gdb-support: the gdb thing ought to live in tools/.
15:05 dalek MoarVM/gdb-support:
15:05 dalek MoarVM/gdb-support: it needs to be symlinked to where the moar binary lives
15:05 dalek MoarVM/gdb-support: anyway, so why not move it out of the way a little bit.
15:06 dalek MoarVM/gdb-support: review: https://github.com/MoarVM/MoarVM/commit/b62c4224f7
15:06 dalek Heuristic branch merge: pushed 25 commits to MoarVM by timo
15:25 crab2313 joined #moarvm
16:36 d4l3k_ joined #moarvm
17:04 rblackwe_ joined #moarvm
17:04 dagurval_ joined #moarvm
17:24 * timotimo is recording how much extra space utf8_decode wastes by not shrinking the buffer after decoding
17:24 timotimo though it doesn't count that strings may get cleaned up, it's at about 64 bytes per string on average that we allocate too much
17:25 timotimo but that's in compiling the setting
17:25 timotimo otherwise it's at usually 33 bytes per string wasted
17:25 timotimo for -e 'say 1' it outputs: 32000: 1083448 wasted (33.857750 per string)
17:25 timotimo so 32k strings allocated, 1 megabyte wasted on that
17:25 timotimo not terribly worrying, it seems to me
17:26 jnthn True, though a meg win is kinda nice too
17:27 timotimo i'll put in a realloc to see if it's only a meg or if the many copied strings make a bigger difference, or if the cleanups of the strings makes the difference smaller in practice
17:29 jnthn Might also be checking what our first guess at the size is.
17:30 timotimo 16, eh?
17:30 timotimo maybe.
17:31 timotimo could even run-time tune that
17:31 timotimo 0.20user 0.02system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 89432maxresident)k
17:31 timotimo that's -e 'say 1' with shrunk strings
17:31 timotimo 0.18user 0.04system 0:00.22elapsed 99%CPU (0avgtext+0avgdata 90388maxresident)k
17:32 timotimo that's without shrunk strings
17:32 jnthn Oh...I think a better guess is the number of codepoints will be the number of bytes we're decoding?
17:32 jnthn As that covers the ASCII case correctly
17:32 timotimo right
17:33 dalek MoarVM: 1c64183 | jnthn++ | src/6model/reprs/MVMOSHandle.h:
17:33 dalek MoarVM: Remove unused symbol.
17:33 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/1c64183c00
17:33 dalek MoarVM: 198c5d7 | jnthn++ | src/io/dirops.c:
17:33 dalek MoarVM: Move directory listing to new IO scheme.
17:33 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/198c5d700d
17:33 jnthn Anyway, a one line (or so) addition to save a meg seems worth it to me.
17:34 timotimo i only shrink if it would save more than 4 32-bit integers in the buffer
17:34 timotimo i should measure how often the shrink is actually needed
17:34 timotimo (crazy strings, they are!)
17:34 dalek MoarVM: 40029f3 | jnthn++ | src/6model/reprs/MVMOSHandle.h:
17:34 dalek MoarVM: Finish cleanup of OSHandle.
17:34 dalek MoarVM:
17:34 dalek MoarVM: Now everything is switched over to the new IO model.
17:34 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/40029f376d
17:35 timotimo also, *=2ing the buffer all the time may not be a good idea if we're in the last 5% of the source string ;)
17:35 timotimo actually, wouldn't the utf8 source string *always* be bigger than the number of codepoints?
17:35 timotimo so starting at 16 in any case would be unwise either way?
17:35 jnthn Well, identical for ASCII.
17:35 jnthn And an over-estimate for other things
17:36 jnthn So yeah you could turn the resize into an "just make sure we don't overflow it" sanity check.
17:36 timotimo i've done that now, the amount of wastage has decreased sharply
17:37 jnthn it's more efficient too since we're not realloc'ing, which may have to copy
17:39 timotimo 32000: 1412 wasted (0.044125 per string, shrunk 54 times)
17:39 timotimo this is the core.setting compilation
17:40 timotimo 0.20user 0.03system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 89252maxresident)k
17:40 timotimo another 200kbytes less
17:40 timotimo i'll clean up the patch and commit it
17:41 timotimo do we actually decode any utf16 in a regular program?
17:41 jnthn No
17:41 timotimo should i just copy the logic to the other encodings? latin1?
17:41 timotimo actually, latin1 wouldn't ever need to do anything
17:42 jnthn I'd hope latin1 just allocates the right size straight away
17:42 jnthn So, another megabyte off with this?
17:42 dalek MoarVM: aac8e0a | (Timo Paulssen)++ | src/strings/utf8.c:
17:42 dalek MoarVM: better estimate and perhaps shrink utf8 decoded buffers
17:42 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/aac8e0a14c
17:42 timotimo yes, another megabyte
17:43 timotimo how do you feel about "smallstring"? if the string is so short it fits into a pointer, set a flag and use the pointer instead of a buffer?
17:44 timotimo oh, wait, we have 32bit big codepoints there
17:44 timotimo not 8 bit codepoints
17:44 jnthn Yeah...
17:44 jnthn Maybe some day it's worth it.
17:44 timotimo yup
17:44 timotimo until then, some other way of deduplicating very short strings may be in order
17:45 jnthn I think we should try and clean up/fix what we have first...
17:45 timotimo https://gist.github.com/timo/9ec85de3f926f3f299dd <- did you see this? this is just from looking at the nursery at some random point.
17:45 jnthn How many times is the empty string duplicated?
17:45 timotimo if i'm looking at it correctly, it seems like: often.
17:45 jnthn Well, de-duplicating in the nursery isn't so worth it.
17:45 jnthn I mean, those are most likely going to get collected
17:46 timotimo right, but a whole bunch of strings ought to have found their way into the gen 2, too.
17:46 timotimo i should actually properly investigate that
17:46 jnthn Not if they're being produced afresh.
17:46 timotimo mhm
17:47 jnthn When was that nursery snapshot taken?
17:47 timotimo i don't remember :(
17:47 timotimo pretty early i guess
17:47 timotimo in the empty program
17:49 jnthn I wonder if all the :, <, prec, etc. are from <O(...)>
17:49 timotimo well, the empty program doesn't call the allocater at all :)
17:50 * timotimo has a bigger nonsense-program
17:51 jnthn gonna do dinner and stuff...will look at the STable repos issue afterwards.
17:51 timotimo hm. string histogram seems b0rked ATM
17:52 timotimo oh, yes, i only collect the flags, not the values right now
17:55 timotimo i was editing the file pre-mv m)
17:57 timotimo oh, i forgot to set the sampling rate up
18:00 timotimo and i should also grab the MVMString from inside P6str.
18:01 timotimo oh, i don't even have to, those are put into the memory separately and are just pointered at
18:02 jnthn right
18:02 benabik Compile errors on OS X with the new directory I/O bits.  Most are fixed by adding an #include <dirent.h>, but one isn't...
18:02 benabik src/io/dirops.c:343:79: error: no member named 'encoding_type' in 'MVMIODirIter'
18:02 benabik return MVM_string_decode(tc, tc->instance->VMString, "", 0, data->encoding_type);
18:03 jnthn whoa...just discovered I've been handed so many round tuits over the last few year's conferences, if I pile them all up I get a tower a foot high!
18:04 timotimo :D
18:05 timotimo gen 2 looks quite like this:
18:05 timotimo 'ctxsave'             [================================================== 9
18:05 timotimo '__6MODEL_CORE__'     [============================================       8
18:05 timotimo 'Regex'               [============================================       8
18:05 timotimo the majority of strings appear twice or once
18:05 timotimo before the very first nursery run, the nursery looks like this:
18:05 timotimo ''                    [================================================== 102
18:05 timotimo ':'                   [==============================================     94
18:05 tadzik :D
18:05 timotimo '<'                   [========================================           83
18:06 tadzik why do we need such strings btw?
18:06 timotimo for sad emoticons :<
18:06 tadzik :<
18:06 timotimo exactly!
18:06 tadzik <:
18:06 tadzik always look on the right side of life
18:07 tadzik timotimo++ # awesome work
18:07 timotimo thank you :3
18:09 timotimo tadzik: it'll take a whole lot of time to nibble away the startup memory usage if it takes me like 1 week for each megabyte
18:09 timotimo it's likely going to get harder over time, though
18:09 benabik Ah.  If I s/_type//, it seems to work.
18:10 benabik And dirent.h seems to exist on Linux, so I'll just #include it in a #ifndef _WIN32
18:19 dalek MoarVM: da1ae55 | benabik++ | src/io/dirops.c:
18:19 dalek MoarVM: Fix directory I/O compilation on OS X
18:19 dalek MoarVM:
18:19 dalek MoarVM: Possibly fixes errors on Linux as well.
18:19 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/da1ae55580
18:22 timotimo jnthn: i've run through a whole bunch of collections now and the gen2 seems to contain the same string at most like 10 times (though i think i'm not sampling all of the objects)
18:22 timotimo (and 596 completely filled pages) (and 2 empty pages) (freelist with 25 entries)
18:22 timotimo VMArray    [================================================== 26254
18:22 timotimo MVMString  [===============                                    8221
18:22 timotimo P6opaque   [=======                                            3940
18:23 timotimo m: 256 * 596
18:23 camelia rakudo-moar 230a54: ( no output )
18:23 timotimo m: say 256 * 596
18:23 camelia rakudo-moar 230a54: OUTPUT«152576␤»
18:23 timotimo that's about how many objects are in that size class
18:23 timotimo m: say (26254 + 8221 + 3940) / 152576
18:23 camelia rakudo-moar 230a54: OUTPUT«0.2517762␤»
18:23 timotimo so we sample about 1/4th of the objects
18:24 timotimo so, it's likely that the strings that appear most often only appear about 40 times
18:28 * timotimo turns the sampling up to 100%
18:54 timotimo with a 100% sample rate looking at a random point in time i get:
18:55 timotimo 'dba'                 [================================================== 31
18:55 timotimo 'prec'                [=============================================      28
18:55 timotimo 'assoc'               [=============================================      28
18:55 timotimo ''                    [========================================           25
18:56 timotimo so string duplication may not be a terribly big deal
19:05 timotimo https://gist.github.com/timo/41d9eb48aea43a48d35a
19:05 timotimo i suppose it may add up
19:05 timotimo given the "long tail" etc
19:09 timotimo MVMCode    [================================================== 15827 ← i wonder if this is right?
19:21 jnthn timotimo: Is that in gen2?
19:22 jnthn timotimo: It's something the lexicals => locals work should help with, anyways.
19:22 jnthn timotimo: The flattening part, anyway.
19:22 timotimo gen2, aye.
19:25 jnthn Well, the flattening work will help more with making less of 'em in the nursery, I suspect.
19:29 jnthn timotimo: "at about 97 megabytes of ram usage all in all" in https://gist.github.com/timo/41d9eb48aea43a48d35a
19:29 jnthn huh? :) The whole process is < 90MB of RAM now?
19:30 timotimo huh huh?
19:31 timotimo that's a random program that does stuff with a list over and over and throws the list away and makes a new one
19:31 timotimo so it's more than 90 megabytes for that reason
19:36 jnthn Oh!
19:36 jnthn That explains it :)
21:50 benabik joined #moarvm
21:59 tgt joined #moarvm
22:38 tgt joined #moarvm
23:28 crab2313 joined #moarvm

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary