
IRC log for #moarvm, 2014-01-25


All times shown according to UTC.

Time Nick Message
00:09 dalek MoarVM: a4a5e60 | jnthn++ | src/ (3 files):
00:09 dalek MoarVM: Mechanism for HLL handler of 'method not found'.
00:09 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/a4a5e608f3
00:13 FROGGS_ joined #moarvm
00:35 timotimo I'm no longer so hopeful for my seccomp approach tbh
00:36 timotimo I would have to fork libuv or trap and emulate system calls
00:36 timotimo it may actually be more sensible to try to make the white list compiler thing work instead
00:37 timotimo though I think I will keep the fork and communicate operations I made so far
00:40 diakopter timotimo: what is the goal of the effort?
00:41 timotimo running user supplied code without fear
00:42 timotimo and getting data out of the process easily
00:57 timotimo the whitelisting compiler thing would be extra beneficial in that it could work with all backends
00:59 jnthn I think we probably want to do it at VM level...
00:59 timotimo yeah :\
01:00 timotimo i could also combine a really strict selinux policy with a pretty lenient seccomp whitelist
01:00 timotimo though of course capsicum would be much more ideal
01:06 jnthn Time for some sleep here
01:06 jnthn 'night o/
01:10 diakopter o/
01:43 jnap joined #moarvm
02:44 jnap joined #moarvm
02:59 jnap joined #moarvm
04:35 jnap joined #moarvm
04:44 FROGGS joined #moarvm
05:36 jnap joined #moarvm
06:37 jnap joined #moarvm
07:37 jnap joined #moarvm
08:29 FROGGS joined #moarvm
08:38 jnap joined #moarvm
08:46 FROGGS joined #moarvm
09:04 crab2313 joined #moarvm
10:33 Mouq joined #moarvm
10:40 jnap joined #moarvm
11:40 jnap joined #moarvm
12:08 FROGGS joined #moarvm
12:26 odc joined #moarvm
12:29 dalek MoarVM: dd7ddb4 | jnthn++ | src/core/ (3 files):
12:29 dalek MoarVM: Factor out HLL symbol lookup, add to public API.
12:29 dalek MoarVM: review: https://github.com/MoarVM/MoarVM/commit/dd7ddb43bd
12:41 jnap joined #moarvm
12:48 Mouq joined #moarvm
13:38 timotimo do we already have a proper datastructure in place to handle freeing pointers of things that were allocated using malloc?
13:40 jnthn timotimo: Well, many things are attached to a garbage collectable object (so its gc_free or gc_cleanup does it)
13:40 timotimo er, i missed the key point :P
13:40 timotimo "in a separate thread"
13:41 timotimo what percentage of the whole run time is spent inside free()? you seemed to have a profile some time recently
13:42 jnap joined #moarvm
13:42 jnthn Depends enormously on what you're profiling
13:42 timotimo well, that makes sense.
13:43 jnthn When I do something that creates loads of Ints, it's malloc/free of mp_int that dominates.
13:43 jnthn I've been pondering that one a bit.
13:43 timotimo dominates?
13:43 timotimo as in, on the top of the leaderboards?
13:44 jnthn The dominator is usually on top...
13:44 timotimo wow.
13:44 jnthn It's the biggest source of free/malloc, I mean. :)
13:44 timotimo oh
13:44 timotimo yeah, well ... :)
13:44 jnthn I had an idea on it though
13:45 jnthn We could define it as a union of mp_int and two int32s.
13:45 jnthn And set one of those int32s to MAX_VAL to indicate "this is not a bigint"
13:45 jnthn And use the other one to store the value
13:46 jnthn So, small numbers are just stored
13:46 timotimo that could conceivably help
13:46 timotimo how likely is it that MAX_VAL ends up there in a legit value?
13:46 jnthn Thing is, if we do math on them as 64-bit numbers, you can always check the result doesn't overflow.
13:46 jnthn Very, very unlikely given it'd need the pointer to have those bits set
13:46 timotimo right.
13:47 jnthn If we are careful and define the struct so that the MAX_VAL flag is overlapping the LSB of the pointer, then "no chance" in so far as it'd mean we got back non-aligned memory.
13:47 timotimo ah, yes, that'd be helpful
13:48 timotimo checking for a 64 bit overflow should be relatively cheap compared to even the tiniest amount of data retrieval from ram, no?
13:48 jnthn Well, thing is, what you're actually looking for is "is this bigger/smaller than a 32-bit number could hold"?
13:48 jnthn If so then promote to big int.
13:49 jnthn Well, it's a heck of a lot cheaper than a malloc
13:49 jnthn And yeah, those sorts of things pipeline pretty well
13:49 jnthn And probably we get a decent hit rate on branch prediction.
13:49 jnthn So the check may well be almost "free", like our write barriers hopefully almost are.
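[Editor's sketch, not from the channel] One way the union jnthn describes might look in C. The names `int_body`, `SMALLINT_FLAG`, `fake_mp_int` and the helper functions are hypothetical, and `fake_mp_int` only stands in for libtommath's `mp_int`. The flag field overlaps the low bits of the pointer only on little-endian machines, so a real layout would need care there.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for libtommath's mp_int so the sketch compiles on its own. */
typedef struct { int used, alloc, sign; void *dp; } fake_mp_int;

/* Hypothetical marker: all bits set, which the low word of an
 * aligned pointer cannot be. */
#define SMALLINT_FLAG UINT32_C(0xFFFFFFFF)

typedef union {
    fake_mp_int *big;       /* used when the value needs a real bigint */
    struct {
        uint32_t flag;      /* SMALLINT_FLAG means "this is not a bigint" */
        int32_t  value;     /* the small value itself */
    } small;
} int_body;

static int body_is_small(const int_body *b) {
    return b->small.flag == SMALLINT_FLAG;
}

static void body_set_small(int_body *b, int32_t v) {
    b->small.flag  = SMALLINT_FLAG;
    b->small.value = v;
}

/* Adds two small values using 64-bit math. Returns 1 and stores the
 * result in *out if it still fits in 32 bits; returns 0 if the caller
 * would have to promote to a bigint instead. */
static int body_add_small(const int_body *a, const int_body *b, int_body *out) {
    int64_t r;
    assert(body_is_small(a) && body_is_small(b));
    r = (int64_t)a->small.value + (int64_t)b->small.value;
    if (r < INT32_MIN || r > INT32_MAX)
        return 0;
    body_set_small(out, (int32_t)r);
    return 1;
}
```

The common path is one compare on the flag plus one range check on the sum, with no malloc unless the result really is big.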
13:52 timotimo good point about the predictor
13:56 timotimo ttyl
13:58 jnthn o/
13:58 krunen joined #moarvm
14:20 FROGGS joined #moarvm
14:43 jnap joined #moarvm
15:06 V_S_C joined #moarvm
15:10 V_S_C just noticed hoelzro++ fixed dir() on moarvm, so tried Panda bootstrap afresh
15:10 * jnthn wonders what the next failure is :)
15:12 V_S_C after ==> Fetching File::Find
15:13 V_S_C moar.exe just keeps consuming CPU & getting more memory (gets memory very slowly, much unlike other processes)
15:13 jnthn hm
15:14 * jnthn has no idea what that could be
15:14 jnthn I suspect somebody will debug it sooner or later and get it down to a smaller test case, though.
15:17 V_S_C @jnthn, after your last commit to nqp, nmake rakudo ended with unexpected version of QRegex.nqp, so I used the Jan release of rakudo instead of bleeding edge from git
15:17 V_S_C I'll try again with updated rakudo
15:18 diakopter jnthn: why not always put the whole mp_int inline the bigint body?
15:18 jnthn diakopter: Hm, only immediate worry is how well it'd cope with moving...probably ok
15:19 diakopter [then make a new one to store mutations to]
15:19 jnthn diakopter: That sounds very do-able though...
15:19 diakopter HOW BIG IS IT
15:19 diakopter er
15:19 diakopter phone
15:19 diakopter how big is it
15:20 jnthn Not sure right off...
15:20 * diakopter guesses less than 100 bytes. I'm sure it mallocs on its own for its data array
15:20 jnthn typedef struct  { int used, alloc, sign; mp_digit *dp;
15:20 jnthn } mp_int;
15:22 diakopter probably worth making a few sized pools for those dp arrays
15:22 jnthn Well, maybe, but I think the other plan I mentioned of not even doing bigints at all for things that fit into 32-bit may be the bigger win.
15:23 jnthn We may be able to do both of these, of course...
15:23 jnthn All we need is a non-ambiguous way to know what we have
15:23 diakopter lots of checking, but probably yeah. it's what JS JITs do anyway
15:24 diakopter except they are tagged, so it only uses 4 bytes total
15:24 jnthn The common case of Perl 6 Int is not small
15:24 jnthn uh
15:24 jnthn The common case of Perl 6 Int is not big
15:24 diakopter (and 31 bits of int)
15:24 timotimo jnthn: i have an idea that may help a tiny bit
15:24 timotimo do we know in advance how much memory an mp_int is going to take when we build our P6Int object?
15:25 timotimo because then we could allocate the P6Int storage + the mp_int storage and only have to free 1 instead of 2 pointers
15:25 timotimo also, the values would always be close together which may help caching stuff
15:25 jnthn Well, but there's no separate malloc for the Int itself
15:26 timotimo there is not?
15:26 timotimo well, that's fine then
15:26 diakopter well theoretically you could even know how big mp_digit *dp will be too
15:26 jnthn True
15:27 jnthn I think the union of mp_int * and two int32s I mentioned before may work out best overall, though.
15:27 jnthn Sure, we malloc an mp_int, BUT only when people have numbers that are actually big
15:27 jnthn If we union an mp_int and the two int32s, we can still make it work, but we make very Int pay the size cost
15:27 jnthn *every
15:28 diakopter so Int
15:28 timotimo how big is the size cost of using the union approach?
15:28 jnthn I pasted the struct above
15:28 jnthn It's 3 * 32-bit integers + 1 pointer
15:29 timotimo ah, there
15:29 jnthn So on 64-bit that's 3 * 4 + 4 (padding) + 8
15:29 diakopter jnthn: are you sure the bigint lib isn't already doing that optimization?
15:29 timotimo isn't using ints for used and alloc a bit excessive?
15:29 jnthn So 3 times the size if we assume "fits in int32" is the overwhelmingly common case.
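[Editor's sketch, not from the channel] The size arithmetic above can be checked with `sizeof` and `offsetof`. The type names below are hypothetical, and `mp_digit_standin` merely stands in for libtommath's `mp_digit`. The figures in the comments assume a typical 64-bit platform with 4-byte `int` and 8-byte pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t mp_digit_standin;  /* placeholder for libtommath's mp_digit */

/* Same field order as the mp_int struct pasted in the log. */
typedef struct {
    int used, alloc, sign;          /* 3 * 4 = 12 bytes */
    mp_digit_standin *dp;           /* 8 bytes, must start on an 8-byte boundary */
} mp_int_layout;                    /* 12 + 4 (padding) + 8 = 24 bytes */

/* The two int32s of the small-value representation. */
typedef struct {
    int32_t flag;
    int32_t value;
} small_int_layout;                 /* 8 bytes */

/* Number of padding bytes the compiler inserts before the pointer. */
static size_t padding_before_dp(void) {
    assert(offsetof(mp_int_layout, dp) >= 3 * sizeof(int));
    return offsetof(mp_int_layout, dp) - 3 * sizeof(int);
}
```

On such a platform `sizeof(mp_int_layout)` is 24 against 8 for the two int32s, which is where the factor of three comes from.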
15:30 jnthn timotimo: Not if you want to get really big. :)
15:30 timotimo who has that amount of memory?
15:30 jnthn NSA? :P
15:30 timotimo right.
15:30 timotimo wait, those ints are only 32 bits anyway?
15:31 jnthn .oO( well, now there's this channel being monitored... )
15:31 jnthn timotimo: Yes
15:31 diakopter jnthn: are you sure the bigint lib isn't already doing that optimization?
15:31 timotimo and we pad to 64 bit boundaries?
15:31 timotimo on a 64 bit system, that is
15:31 jnthn diakopter: yes, trivially because it can't save us the cost of allocating its mp_int struct...
15:32 jnthn timotimo: Well, a pointer needs to be on an 8-byte boundary.
15:32 diakopter no I mean using an int32 only instead of alloc a mp_digit*dp array
15:32 timotimo ah. okay. since the two ints are followed by a pointer, we wouldn't lose anything to padding in that case
15:32 jnthn diakopter: If it is, then the data structure here sure isn't making it convenient for them to be...
15:33 diakopter why not
15:33 timotimo how does libtommath decide how much to allocate? is it double-the-amount-of-memory-each-time? in that case we could store the log2 of the allocated size and an int8 would suffice. the used amount wouldn't get better, though
15:33 jnthn diakopter: I'd expect to see a union somewhere in mp_int
15:33 diakopter maybe it's cheaper not to have a union and have a couple tracking fields
15:34 diakopter need one for array size after all
15:35 diakopter in fact it does need all three of those
15:35 diakopter used alloc sign
15:35 tadzik I think I'll rewrite panda fetcher to use system's "cp" and "cp -r" where available
15:35 tadzik fetcher is so much pita
15:36 jnthn diakopter: I just read through the add operation and I see no evidence there of it doing the opt
15:36 diakopter but yeah maybe there would be a union instead of just reusing another field without another name
15:36 diakopter hm
15:37 jnthn tadzik: Won't that mean maintaining two codebases where one would do?
15:37 jnthn tadzik: two codepaths, I mean...
15:37 diakopter BuildUtils
15:37 diakopter er
15:37 diakopter UnixUtils
15:37 diakopter (haha)
15:39 tadzik jnthn: yep ;/
15:40 V_S_C jnthn: I'll be gr8ful to tadzik++, coz I got started with PERL from Rakudo PERL6 onward
15:40 V_S_C & module installer will give me focus
15:40 V_S_C otherwise it's more about staying out of way
15:41 V_S_C I mean, there's nqp, then the build employs perl5
15:41 V_S_C & more than 1 VMs
16:29 krunen joined #moarvm
16:44 jnap joined #moarvm
16:46 japhb Just backlogged -- given the struct, why can't you union the entire struct with 2-3 64 bit ints, and overlay e.g. the sign field with a flag to indicate one of the 64-bit ints can be used instead of the mp_digit*?  That would avoid being limited to 32-bit small ints, greatly increasing the cases when the full beast can be avoided.  Or am I misunderstanding the discussion?
16:49 jnthn japhb: The point of using 32-bit ints was you could do the math in 64-bit and have easy, portable overflow detection.
16:50 japhb OIC
16:50 japhb That's the point I missed.
16:51 * timotimo is going to tackle varint encoding now
16:55 jnthn varint?
17:01 timotimo variable-sized integers
17:01 timotimo for our serialization blob
17:01 jnthn oh!
17:01 jnthn yeah
17:02 timotimo :)
17:09 jnthn curry and beer and beer &
17:29 FROGGS_ joined #moarvm
17:31 timotimo this varint type could even encode inf and -inf and NaN :P
17:31 timotimo because there's 9 different representations for the 0
17:32 timotimo and only one would ever be chosen
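[Editor's sketch, not from the channel] The log does not say which encoding timotimo chose, so the block below is plain unsigned LEB128, swapped in as a familiar example of variable-sized integers and not the MoarVM serialization format. All names are hypothetical. Each byte carries 7 bits of the value, and the top bit says whether another byte follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VARINT_MAX_BYTES 10  /* ceil(64 / 7) bytes cover any uint64_t */

/* Writes value into out (at least VARINT_MAX_BYTES long), least
 * significant 7 bits first. Returns the number of bytes written. */
static size_t varint_encode(uint64_t value, uint8_t *out) {
    size_t n = 0;
    assert(out != NULL);
    do {
        uint8_t byte = (uint8_t)(value & 0x7F);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;    /* continuation bit: more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Reads one varint from in, stores it in *value and returns the
 * number of bytes consumed. Stops after 10 bytes at the latest. */
static size_t varint_decode(const uint8_t *in, uint64_t *value) {
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;
    assert(in != NULL && value != NULL);
    do {
        byte = in[n++];
        result |= (uint64_t)(byte & 0x7F) << shift;
        shift += 7;
    } while ((byte & 0x80) != 0 && shift < 64);
    *value = result;
    return n;
}
```

Small values, which dominate a typical serialization blob, take a single byte, while a full 64-bit value takes ten.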
17:36 ggoebel1113 joined #moarvm
17:45 jnap joined #moarvm
17:47 timotimo wow, that was surprisingly easy to do in the end
17:51 japhb (surprisingly easy)++
17:58 timotimo i still have to hook it up properly
18:35 timotimo it seems like we have some endianness trouble on moarvm
18:35 timotimo and the tarball is missing dyncall and libuv
18:43 krunen joined #moarvm
18:46 jnap joined #moarvm
19:21 raiph joined #moarvm
19:23 raiph A redditor has two questions about moarvm performance:
19:23 raiph Q1: Is the fast spectest time mainly a function of the fast startup time?
19:23 raiph Q2: How does it currently compare to other VMs once the startup cost is amortized?
19:24 raiph http://www.reddit.com/r/perl/comments/1w0e3i/p6_january_rakudo_compiler_release_moarvm_support/cey6l13
19:30 timotimo raiph: there's still the worst-possible-implementation for concatenation of strings, but i have some benchmarks that show moarvm beating parrot almost all the time and jvm sometimes; these benchmarks have startup time removed completely.
19:30 timotimo let me upload them somewhere
19:31 raiph awesome, thx (the ones that are currently at an ipv6 address only, right?)
19:33 timotimo yeah, but i'm going to remove the older moar from it
19:33 timotimo the one from before putting in inlining
19:33 timotimo when you post the benchmarks, make sure to point out this is on a -O1 -g3 moar vs a -O3 parrot
19:35 timotimo also, someone said that panda requires nativecall, that's not true
19:36 timotimo .o(building a new rakudo-parrot right now to make the benchmark results)
19:37 FROGGS we need nativecall for rakudo*, that is all
19:43 Mouq joined #moarvm
19:43 raiph i'm wondering if I should share the link or tell folk to come to #moarvm (or #perl6)
19:44 timotimo ho-hum.
19:44 timotimo raiph: do you have a webspace where you could upload it?
19:44 raiph feather?
19:44 timotimo that'd be excellent
19:44 timotimo https://www.dropbox.com/s/r4lodw7eto6nker/all_three_backends.tar.bz2
19:45 raiph cool
19:46 timotimo i should probably add perl5 to those to make our accomplishments look much less impressive
19:46 timotimo and also the NQPs
19:48 timotimo for that, let me get off my desktop and spend the rest of my day on my laptop while my desktop crunches benchmarks
20:37 krunen joined #moarvm
20:47 jnap joined #moarvm
21:00 flussence joined #moarvm
21:31 [Coke] moar is at 28709 pass today
21:31 timotimo about 90 away from parrot, seems like
21:32 timotimo okay, i got the numbers finally :)
21:32 [Coke] parrot is at 28807 today
21:33 [Coke] r: say 28807-28709
21:33 camelia rakudo-parrot e51b6c, rakudo-jvm e51b6c, rakudo-moar e51b6c: OUTPUT«98␤»
21:38 timotimo raiph: are you still there?
21:38 timotimo can you upload another benchmark to the same directory and give us the links?
21:38 raiph sure
21:38 timotimo https://www.dropbox.com/s/py4ssvp41232beh/everything_ever.html
21:39 timotimo i think you can just wget that directly onto the server the way it is
21:45 timotimo raiph: so, can has links?
21:47 raiph http://feather.perl6.nl/~raiph/25jan2014-benchmark-p5-nqps-rakudos.html
21:48 raiph timotimo++
21:48 jnap joined #moarvm
21:48 timotimo thanks for uploading these
21:50 ozmq joined #moarvm
22:07 ozmq Benchmarks good ... my Int $i = 0; while ($i++ < 1000000) {};
22:08 ozmq A million empty blocks does 170 million opcode dispatches in moar's interp.c loop. 170 ops per loop seems like quite a lot.
22:09 timotimo i'm the first one to admit that the microbenchmarks are practically meaningless
22:14 ozmq joined #moarvm
22:16 japhb timotimo: Well, kinda.  The microbenchmarks are like golfing a bug -- they help find how one small change (s:g/Int/int/, for instance, or using a different loop construct) can make a big performance difference.
22:17 japhb Which often means the loser in that comparison has an implementation issue that is slowing it down more than it should be.
22:17 timotimo right
22:17 timotimo for example: why are the rakudos so much slower at doing empty loops than the nqps? :)
22:17 japhb And in that sense magnify problems that added together make a real benchmark like rc-forest-fire mysteriously slower or faster.
22:17 timotimo well, i suppose for a regular "for" loop, they have to go via the *Iter
22:18 timotimo is there a --tests-tagged for "only minibenchmarks" btw?
22:18 timotimo if i'm to do regular benchmarks, i may not want to do all the microbenchmarks
22:19 japhb Well, there's a limited set of non-micro-benchmarks, so I would probably just specify the explicit list of ones I want.  But yes, micros and non-micros should be tagged.  In fact, we should add a LOT of tags.
22:19 timotimo agreed.
22:20 japhb timotimo: Thanks for all your work on perl6-bench stuff, BTW.
22:22 timotimo sure
22:22 timotimo it's not been that much :P
22:24 japhb Converting Richards to idiomatic Perl 6 ought to be pretty easy.  For the Perl 5 version, we need to decide whether to use "native" Perl 5 OO, or something like Moose or Moops.
22:25 japhb I haven't kept up with the relative performance there.
22:25 ozmq joined #moarvm
22:26 japhb nwc10: As the resident perl5 expert, do you happen to have any thoughts on making that test fair and useful for Perl 5 comparison?
22:32 ozmq joined #moarvm
22:32 timotimo i must have done something terribly wrong
22:33 timotimo ah, i see
22:41 jnthn For those benchmarking or discussing performance, it might be worthwhile remembering that Moar doesn't do a shred of optimization of the bytecode it's fed yet, let alone any kind of JIT compilation.
22:42 timotimo it seems like i didn't see correctly
22:42 timotimo i still get only one datapoint for nqp-parrot for parse-json
22:43 japhb jnthn: Do we even *want* it to make opt passes over the bytecode when in interpreted mode?  That would kinda kill the point of mmap'ing the bytecode, and it's easier to write the optimizations in the code generators anyway, isn't it?
22:43 japhb When going to JIT, I totally see optimizing the hell out of it.
22:43 jnthn japhb: It should certainly make opt passes on hot things; that's how runtime specialization works.
22:43 japhb jnthn: Oh, I think we were thinking in different directions.
22:44 japhb I was thinking e.g. peephole optimizations.
22:44 jnthn japhb: Ah, those should be done earlier
22:44 japhb Agreed.
22:44 timotimo japhb: my feeble attempt to just put in || +@all_times < 3 into the timing loop failed to produce proper results :\
22:44 japhb timotimo: paste?
22:45 japhb Or push to a branch?
22:47 timotimo pushed
22:48 timotimo does parrot have a peephole optimizer?
22:49 jnap joined #moarvm
22:51 timotimo we would be implementing the peephole optimizer in nqp, right? not in c
22:54 jnthn timotimo: Given it's VM-specific, I could easily imagine it being done in src/mast/compiler.c or so
23:04 timotimo but then we have to implement it in c! :P
23:11 japhb timotimo: I think you wanted ... < 3 * $runs since each call to time_command performs $runs timings, and these get flattened into @all_times.
23:13 timotimo ooooh
23:13 timotimo that's a good point, thanks!
23:14 * timotimo fixes
23:38 timotimo jnthn: i would have to give every operation on big integers a check if it's a mp_int or stored 64bit int, right?
23:38 timotimo and do manual overflow checking, too?
23:38 timotimo i'm not sure how to do the latter in C, actually
23:38 jnthn timotimo: Stored 32-bit int.
23:39 timotimo ah
23:39 jnthn timotimo: The point is to store 32-bit things specially, but do the math at 64-bit size. Then see if you can store the result in 32 bits or not - which is just a > and < check :)
23:41 timotimo ah
23:42 timotimo that seems easy enough
23:42 timotimo is that guaranteed to work?
23:42 timotimo i suppose for exponentiation i could still overflow it :)
23:44 jnthn Yes, you'll have to go on a case-by-case basis.
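[Editor's sketch, not from the channel] A rough illustration of the case-by-case point. The product of two 32-bit values always fits in 64 bits, so multiplication needs only the range check afterwards. Exponentiation can overflow 64 bits, so it has to check after every step. The function names are hypothetical, and a return value of 0 means "promote to a bigint instead".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The "> and <" check: does a 64-bit result fit back into 32 bits? */
static int fits_int32(int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* |a * b| is at most 2^62, so the 64-bit product itself cannot overflow. */
static int small_mul(int32_t a, int32_t b, int32_t *out) {
    int64_t r = (int64_t)a * (int64_t)b;
    assert(out != NULL);
    if (!fits_int32(r))
        return 0;
    *out = (int32_t)r;
    return 1;
}

/* Square-and-multiply with a range check after every multiplication.
 * Both factors fit in 32 bits before each step, so no 64-bit product
 * can overflow before it is checked. */
static int small_pow(int32_t base, uint32_t exp, int32_t *out) {
    int64_t result = 1;
    int64_t b = base;
    assert(out != NULL);
    while (exp != 0) {
        if (exp & 1) {
            result *= b;
            if (!fits_int32(result))
                return 0;
        }
        exp >>= 1;
        if (exp != 0) {
            b *= b;
            if (!fits_int32(b))
                return 0;
        }
    }
    *out = (int32_t)result;
    return 1;
}
```

For example, `small_pow(2, 30, &r)` succeeds, while `small_pow(2, 31, &r)` reports that the result needs the bigint path.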
23:46 flussence joined #moarvm
23:50 jnap joined #moarvm
