Perl 6 - the future is here, just unevenly distributed

IRC log for #moarvm, 2017-07-28

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
00:12 timotimo heh, the moarvm.org repo already has the fixes, but the website itself is still on an older revision
01:24 samcv hmm i'm getting a crash when i call MVM_free on something i've MVM_realloc'd
01:24 samcv anybody know of any gotcha's that could maybe cause that? it only crashes on the free
01:37 samcv An interesting article on use of conditionals http://llewellynfalco.blogspot.com/2016/02/dont-use-greater-than-sign-in.html
01:37 samcv i've since taken its advice and now only use the less than or less than or equal to sign when programming. it's resulted in much clearer and consistently typed conditionals
01:42 samcv ok so i used realloc(ptr, 0) to free it and that probbaly works from what the documentation on realloc says
01:52 ilbot3 joined #moarvm
01:52 Topic for #moarvm is now https://github.com/moarvm/moarvm | IRC logs at  http://irclog.perlgeek.de/moarvm/today
03:17 solarbunny joined #moarvm
04:01 MasterDuke joined #moarvm
06:13 brrt joined #moarvm
06:26 brrt good * #moarvm
06:34 samcv brrt, you should check out that article i posted
06:34 brrt i should first backlog
06:41 brrt that's an interesting idea, yes
06:43 samcv i've been loving it since i started doing it several weeks ago
06:43 samcv makes everything more readable and the code and conditionals just end up looking nicer
06:44 nwc10 good_ *, jnthn_
06:44 brrt ohai nwc10
06:44 nwc10 jnthn: that panic seems to also crop up in some spectests. It's always the same negative value in the message, and ASAN makes no comment
07:05 domidumont joined #moarvm
07:12 domidumont joined #moarvm
07:15 brrt joined #moarvm
07:36 geekosaur joined #moarvm
07:42 geekosaur joined #moarvm
07:43 geekosaur joined #moarvm
07:44 geekosaur joined #moarvm
07:58 zakharyas joined #moarvm
08:06 zakharyas joined #moarvm
08:18 TimToady joined #moarvm
08:53 robertle joined #moarvm
08:53 jnthn_ moarning o/
08:59 samcv morning jnthn_ :)
09:04 brrt moarning jnthn, samcv
09:22 jnthn Grr, two mornings in a row the internet connection here breaks for about 30 minutes :/
09:25 jnthn Bemusingly, I don't see the crash people have been reporting even in stresstest
09:27 brrt i'm not seeing a crash either, fwiw
09:27 timotimo i only get it with some debug stuff turned up
09:28 timotimo and it's only a guaranteed on-startup-crash when i change the is_on_stack
09:28 timotimo maybe i did the is_on_stack function wrong?
09:32 jnthn Maybe :)
09:55 jnthn Hm, no geth?
10:06 nine Geth_: status
10:07 Geth joined #moarvm
10:07 nine Geth_: status
10:07 Geth ¦ MoarVM: 8d7f9d404c | (Jonathan Worthington)++ | 3 files
10:07 Geth ¦ MoarVM: Remove now-unused pool_index in MVMStaticFrame.
10:07 Geth ¦ MoarVM:
10:07 Geth ¦ MoarVM: Plus code to increment it.
10:07 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/8d7f9d404c
10:07 Geth ¦ MoarVM: 5a5b1be977 | (Jonathan Worthington)++ | 12 files
10:07 Geth ¦ MoarVM: Eliminate now-unused Lexotic REPR, lexotic cache.
10:07 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/5a5b1be977
10:07 Geth ¦ MoarVM: f302bbed14 | (Jonathan Worthington)++ | src/core/instance.h
10:07 Geth ¦ MoarVM: Remove a dead field.
10:07 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/f302bbed14
10:08 timotimo moarvm's shrinking ... and i like it
10:08 jnthn :)
10:11 * jnthn was doing the trans-siberian when that song was at the height of its popularity
10:12 jnthn It followed me across the whole of darn Siberia. Every. Single. City.
10:13 Geth ¦ MoarVM: fdd04a6a99 | (Jonathan Worthington)++ | 2 files
10:13 Geth ¦ MoarVM: Remove unused gotolexotic function.
10:13 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/fdd04a6a99
10:13 timotimo what song was that? "i'm so excited"?
10:14 jnthn hah, no, that Katy Perry one...I thought you were referencing it :P :P
10:15 jnthn Semes I've killed off all the lexotic leftovers
10:16 jnthn The thing that motivated me to do that was pondering cleaning up invocation a bit
10:16 jnthn Now there's only 3 things that can hang off the invoke pointer on an STable
10:17 jnthn Which means it is somewhat endangered :)
10:17 travis-ci joined #moarvm
10:17 travis-ci MoarVM build errored. Jonathan Worthington 'Remove now-dead newlexotic functions.'
10:17 travis-ci https://travis-ci.org/MoarVM/MoarVM/builds/258490308 https://github.com/MoarVM/MoarVM/compare/08d58699beba...aeb7fb2f4c78
10:17 travis-ci left #moarvm
10:17 jnthn pull fail
10:19 jnthn oh hah, after that change memory moved just enough that now I do get a spectest fail too
10:19 jnthn t/spec/integration/deep-recursion-initing-native-array.t
10:20 jnthn Oh no, until I do a debug build
10:20 nine Of course. That would have been too easy.
10:22 timotimo i'm not sure which katy perry song we're talking about %)
10:25 nine Probably "I kissed a girl" as jnthn's trans-siberian trip was some time ago
10:27 jnthn Yeah, that one :)
10:32 timotimo ah, i see
10:32 timotimo yeah, it has "and i liked it", here i said "and i like it" :P
10:32 timotimo "i shrunk moarvm and i liked it"
10:33 timotimo "the taste of the git commit ..." what exactly?
10:33 Geth ¦ MoarVM: 82c282efda | (Jonathan Worthington)++ | src/core/frame.c
10:33 Geth ¦ MoarVM: Zero owner of a stack frame also.
10:33 Geth ¦ MoarVM:
10:33 Geth ¦ MoarVM: The flags being zero is a sufficient test for whether the frame is a
10:33 Geth ¦ MoarVM: stack frame or not when we know it's certainly a frame. However, when
10:33 Geth ¦ MoarVM: we have an unknown collectable, the fully zeroed flags would also be
10:33 Geth ¦ MoarVM: a normal object. The temp roots mechanism thus uses a zero owner to
10:33 Geth ¦ MoarVM: identify this situation. Ensure the owner is zeroed.
10:33 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/82c282efda
10:34 timotimo oh i thought zeroed flags would be "not a normal object"
10:34 jnthn No, I forgot that we do that :)
10:34 jnthn We do it 'cus it means object allocation never has to set any flags :)
10:34 jnthn So the common case is cheap
10:34 timotimo ah
10:35 jnthn An alternative to save, like, 1 instruction, would be to introduce a flag set on stack frames on the call stack
10:35 jnthn Instead of 0
10:38 jnthn spectest looks happy with GC stressing
10:39 jnthn So hopefully that nails it
10:42 jnthn One upshot of MVMStaticFrame having spesh stats hung off of it is that it often ends up needing to be marked every single collection
10:43 jnthn Well, that's not quite true
10:43 jnthn But since spesh stats sample values, it means a lot more frames end up in the inter-gen roots
10:44 jnthn We thus spend 2.32% of CORE.setting compilation cycles in gc_mark of a static frame
10:45 jnthn Since there's a load of stuff to mark
10:45 timotimo oh, hmm
10:45 timotimo right
10:46 jnthn Static frame's gc_mark is thus called 35 million times
10:49 timotimo the sampled values never get stale, do they? like, we don't throw out stuff that never gets used any more in the future?
10:49 jnthn There's a throwing out mechanism
10:49 jnthn If the stats weren't updated for N spesh runs then they are thrown away
10:49 timotimo that's good
10:49 samcv is create in array that has as many elements as the longest sequence of codepoints. i'll put codepoints into this array. 0, 1, 2 then wrap 0, 1, 2. so it just continuously puts codepoits in there
10:50 samcv always storing only 3 to prevent having to store too much for really long strings
10:50 timotimo oh so like a ring buffer?
10:50 jnthn I guess we could have an MVMSpeshStats repr
10:50 samcv then when it finds a mismatch it puts the three codepoits which might be 2, 0, 1 for example since it wraps continuously
10:50 samcv yeah
10:50 samcv and then pass that through to the collation function
10:50 jnthn Just to hang the stats off
10:51 jnthn And then that'd end up in the inter-gen set
10:51 brrt joined #moarvm
10:51 samcv right now i do tricks so i don't have to go the whole length of the strig before starting collation. since how the UCA works: primary_1, primary_2 .... primary_n, secondary_1 etc
10:51 jnthn Alternatively, we could just keep an array of MVMSpeshStats ** in the instance and just store an index in the MVMStaticFrame
10:51 samcv is how the ordering of them go. so if you just niavely did that you'd have to go the whole length of the string and that'd be no fun
10:52 samcv so need to now solve how to not have to push them starting from the very beginning of the string
10:52 jnthn And have the throwing away done by index
10:52 jnthn And re-use indices
10:53 timotimo that'll naturally throw out old stuff
10:55 jnthn It's a bit tricky to reason about compared to just having a spesh stats repr though
10:56 jnthn And we'd still need to make sure that if a static frame gets collected then we throw out the stats
10:58 jnthn And it gets tricky if two threads are doing GC and want to toss stuff out and add the indices to an "unused" array
10:58 jnthn Then we need a lock on taht
10:58 jnthn And urgh
10:58 jnthn So I suspect the stats holder object is the way to go
11:00 jnthn I wonder if we should hang *all* spesh stuff off such an object
11:01 jnthn Tempting.
11:02 jnthn Then for frames that are never called at all (which in a typical program is most of them) we save 24 bytes (64-bit)
11:02 jnthn And the MVMStaticFrame itself will never be forced into the inter-gen roots list
11:03 jnthn (The most of them meaning all of the CORE.setting frames we don't call)
11:05 timotimo that doesn't sound bad; is it problematic to install the spesh object in a threaded environment?
11:06 jnthn I was figuring we'd allocate it on first call
11:06 jnthn When we do all the other static frame setup work
11:06 jnthn Which we do under lock
11:06 jnthn So it's safe
11:07 jnthn It's an extra indirection to reach the arg guards is all
11:07 jnthn But when we're paying so much in the static frame gc_mark...
11:07 timotimo OK
11:10 jnthn Yeah, I think that's a good bet
11:10 jnthn And I can move things bit by bit
11:10 jnthn So, will do that. But...after lunch :)
11:13 timotimo in come lots of "body."
11:15 travis-ci joined #moarvm
11:15 travis-ci MoarVM build passed. Jonathan Worthington 'Zero owner of a stack frame also.
11:15 travis-ci https://travis-ci.org/MoarVM/MoarVM/builds/258502483 https://github.com/MoarVM/MoarVM/compare/fdd04a6a993a...82c282efdae8
11:15 travis-ci left #moarvm
11:59 committable6 joined #moarvm
11:59 quotable6 joined #moarvm
11:59 bloatable6 joined #moarvm
11:59 bisectable6 joined #moarvm
11:59 greppable6 joined #moarvm
11:59 evalable6 joined #moarvm
11:59 coverable6 joined #moarvm
11:59 unicodable6 joined #moarvm
11:59 benchable6 joined #moarvm
11:59 statisfiable6 joined #moarvm
12:39 zakharyas joined #moarvm
12:49 Geth ¦ MoarVM: 3f26aa9a6e | (Jonathan Worthington)++ | 11 files
12:49 Geth ¦ MoarVM: Add spesh data REPR; hang off static frame.
12:49 Geth ¦ MoarVM:
12:49 Geth ¦ MoarVM: All of the spesh-related fields in MVMStaticFrame will move into this
12:49 Geth ¦ MoarVM: in order to get better generational GC behavior (by cutting down how
12:49 Geth ¦ MoarVM: many times we need to mark static frames) and making static frames a
12:49 Geth ¦ MoarVM: bit smaller in the case they aren't used (which, for something like
12:49 Geth ¦ MoarVM: CORE.setting, is a lot of the time for typical programs).
12:49 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/3f26aa9a6e
13:09 Geth ¦ MoarVM: 838f7cf70c | (Jonathan Worthington)++ | 13 files
13:09 Geth ¦ MoarVM: Move candidates and entries count to spesh object.
13:09 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/838f7cf70c
13:13 nine I think unused code is actually quite common even outside the setting. Think about all those times you use some module just for one of its gazillion functions. Or about loading a full DBIx::Class schema with all table definitions (and custom methods) just for inserting some entries in a single table in a script.
13:19 jnthn Yeah, good point :)
13:19 jnthn Saving 24 bytes off each of those could quickly add up :)
13:20 nine I guess that's also true for all the generated accessor methods?
13:21 jnthn Yup
13:24 brrt jnthn; the spesh guard tree thingy… do we ever update that as more specializations come in?
13:24 timotimo yeah
13:25 timotimo we use the "free at safepoint" mechanism to make replacing it safe
13:25 brrt so, you mentioned at some point that we should JIT compile that
13:25 jnthn We update it by copying it and tweaking and then installing the new one as just a pointer update
13:25 brrt i'm guessing that at the point we are compiling a JIT frame, we at least have the current version of that tree
13:26 jnthn And yeah, safepoint thing as timo mentioned it
13:26 brrt so we can compile a new tree into the latest JIT frame
13:26 jnthn At the moment we delete the tree after
13:26 jnthn oops
13:26 jnthn update the tree after
13:26 brrt hmmm
13:26 jnthn But there's no need for that
13:26 jnthn It's just the way it is
13:26 jnthn I can't think of a reason why we can't re-order
13:26 jnthn I mean, we can't *publish* the new tree until the compilation is done
13:27 jnthn But we can produce it and have it avaialble earlier
13:27 brrt i'm asking mostly because if we'd allocate it separately, we need at least a whole page for it
13:27 brrt whereas if we do it in the compiled frame, we can append it after
13:27 jnthn ah
13:27 jnthn True
13:28 jnthn Though we lose the ability to free older ones
13:28 jnthn But still, it's probably very little code, so...
13:28 brrt why would we lose that?
13:29 jnthn Because we still need the JITted candidate
13:29 jnthn I mean, most of the time it's a non-issue as probably there's space left over in the page we put JIT output on anyway
13:36 Geth ¦ MoarVM: 5fdfddf300 | (Jonathan Worthington)++ | 15 files
13:36 Geth ¦ MoarVM: Move spesh arg guards into spesh object.
13:36 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/5fdfddf300
13:41 Geth ¦ MoarVM: 7fae17c96e | (Jonathan Worthington)++ | 8 files
13:41 Geth ¦ MoarVM: Move spesh stats to spesh object.
13:41 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/7fae17c96e
13:41 Geth ¦ MoarVM: eff1b1f35d | (Jonathan Worthington)++ | src/core/frame.c
13:41 Geth ¦ MoarVM: Fix typo; MasterDuke17++.
13:41 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/eff1b1f35d
13:41 lizmat joined #moarvm
13:47 Geth ¦ MoarVM: 1db5c5020a | (Jonathan Worthington)++ | src/spesh/candidate.c
13:47 Geth ¦ MoarVM: No longer force static frame into gen2 after spesh
13:47 Geth ¦ MoarVM:
13:47 Geth ¦ MoarVM: All of the spesh-related data is on the static frame spesh object
13:47 Geth ¦ MoarVM: instead now, so just barrier that.
13:47 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/1db5c5020a
13:48 jnthn Now for a callgrind run to see if it helped :)
14:10 Voldenet joined #moarvm
14:10 Voldenet joined #moarvm
14:20 zakharyas joined #moarvm
14:32 jnthn Yeah, it's pushed static frame marking way down the profile
14:43 Geth ¦ MoarVM: 2bfee71331 | (Jonathan Worthington)++ | 2 files
14:43 Geth ¦ MoarVM: Break MVM_sc_get_sc into inline and slow path.
14:43 Geth ¦ MoarVM:
14:43 Geth ¦ MoarVM: This gets called incredibly often, but in the common case does very
14:43 Geth ¦ MoarVM: little work. Split it up into a static inline and a normal function
14:43 Geth ¦ MoarVM: for the slow path part, since the fast part will most probably be
14:43 Geth ¦ MoarVM: less insturctions than making the call.
14:43 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/2bfee71331
14:48 timotimo yes, yes, we must get rid of these terrible insturctions :)
14:51 jnthn :P
15:03 domidumont joined #moarvm
15:09 brrt joined #moarvm
15:30 Geth ¦ MoarVM: 5c67d732b1 | (Jonathan Worthington)++ | src/gc/collect.c
15:30 Geth ¦ MoarVM: calloc a tospace instead of memset old fromspace.
15:30 Geth ¦ MoarVM:
15:30 Geth ¦ MoarVM: Callgrind had the memset as a huge cost, and believes this to be far
15:30 Geth ¦ MoarVM: cheaper. Wallclock timings show it seems to be slightly faster, though
15:30 Geth ¦ MoarVM: not by ~5%; not all instructions are created equal, and the callgrind
15:30 Geth ¦ MoarVM: counts are in terms of instructions, and the outcome does show a ~5%
15:30 Geth ¦ MoarVM: reduction there.
15:30 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/5c67d732b1
15:30 Geth ¦ MoarVM: 1d3a139bf2 | (Jonathan Worthington)++ | 4 files
15:30 Geth ¦ MoarVM: Avoid range check on every SC object access.
15:30 Geth ¦ MoarVM:
15:30 Geth ¦ MoarVM: By validating them as part of bytecode validation to ensure they
15:30 Geth ¦ MoarVM: aren't too big, and using unsigned instead of signed there and in
15:30 Geth ¦ MoarVM: the access to avoid the zero check.
15:30 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/1d3a139bf2
15:46 Geth ¦ MoarVM: MasterDuke17++ created pull request #619: Remove unnecessary variable in MVM_string_(ascii|latin1)_encode_substr
15:46 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/pull/619
16:38 robertle joined #moarvm
18:41 * jnthn went home but his brane kept working on how we'll take spesh forward anyway...
18:43 jnthn Basically, we cheated really hard on aliasing so far. We presumed that just because we guarded a Scalar and it contained something then it would continue to contain that. This didn't really work so we ended up with a "block it if we see a call", which is both too strong (most calls immediately decont their args) and too weak (we don't know the Scalar isn't aliased elsewhere, and being touched by another thread)
18:43 lizmat jnthn: a datapoint: "my @a = (^10000)>>.Str" with --profile reliably segfaults for me, with ^1000 it doesn't
18:43 jnthn lizmat: That's not really related to what I'm thinking about now :)
18:44 jnthn But maybe you meant a general thing that wants a look at :)
18:44 lizmat true, it wasn't related to it in any way...
18:44 jnthn Ah :)
18:44 lizmat sorry to have interrupted your train of thought
18:44 jnthn OK, not sure why that'd be; could you RT it so it doesn't get lost?
18:45 lizmat will do
18:45 jnthn Thanks
18:46 jnthn Anyway, doing a more proper alias analysis is useful for cutting down on our GC. But we kinda have a problem in that almost everything in Perl 6 is a call. All operators are multi-dispatch. And we pass them scalar containers.
18:46 jnthn Anyway, I had the idea a while back that spesh could calculate frame-level facts, and attach them to the spesh candidate
18:47 jnthn Included "we always decont this straight away" for each object arg, and later richer escape info. Plus "we always return type T"
18:48 jnthn The other thing I just realized is that our facts need to become lattices like:
18:49 jnthn Morphic (?)
18:49 jnthn /    .... \
18:49 jnthn Int        Str
18:49 jnthn \         /
18:50 jnthn Unknown (?)
18:50 jnthn Uh, crappy IRC drawing :)
18:51 zakharyas joined #moarvm
18:51 jnthn So in a case like my $i = 0; while $i < 100 { $i++ }
18:51 jnthn We can do an abstract interpretation
18:52 jnthn Actually that's an awkward example, let's do
18:52 jnthn So in a case like my $i = 0; while $i < 100 { $i = $i + 1 }
18:52 jnthn So here the PHI at the entrance to the loop body has $i as containing an Int
18:53 jnthn We then say "OK, if we presume that we call the $i + 1 candidate we'd be allowed to, what'd it give us?" and we know from the frame facts it's an Int
18:53 jnthn Which then gets stored into $i
18:54 jnthn Then we have to join our content model and at the join point both say Int and Int so we stay on Int
18:54 jnthn If they instead could differ then we'd end up at Morphic
18:54 jnthn Which may have impacts on other things
18:54 jnthn So we'd iterate to a fixed point
18:55 jnthn Which the lattices promise we'll reach
18:55 jnthn And only then would be go and do the various transformations
18:58 * lizmat is afk again&
19:01 jnthn Anyways, I think that gives me enough of an idea of how to move forward that I can relax and enjoy my weekend now :P
19:08 [Coke] \o/
19:10 Zoffix \o/
19:27 timotimo that's good
21:07 brrt joined #moarvm
21:15 praisethemoon joined #moarvm
21:15 praisethemoon joined #moarvm
21:16 brrt good * #moarvm
21:17 timotimo yo brrt
21:17 timotimo how's you?
21:17 brrt happy that it's weekend
21:17 brrt how are you?
21:17 timotimo i'm okay
21:18 timotimo a bit frustrated by my continuing lack of productivity
21:22 praisethemoon joined #moarvm
21:22 praisethemoon joined #moarvm
21:22 [Coke] I feel your pain.
21:23 [Coke] (about, to be clear, my own productivity)
21:28 mst similar
21:28 mst too much staring at an empty editor
22:02 dogbert2 joined #moarvm
22:02 dogbert2 there's at least one bug still lurking in the shadows
22:04 timotimo crashbug?
22:04 dogbert2 yes
22:04 dogbert2 read all about it :) https://gist.github.com/dogbert17/3cc123e33a87a66984ae8752eee4fb3c
22:12 samcv woo my ring buffer at least seems to work. now to run tests on it
22:12 samcv though most of the unicode tests are all really short so won't see any difference there. but it shouldn't matter if i chop off the front
22:15 samcv and also having the ringbuffer allows me to break any resulting collation ties easily. i already know which string would be greater or less than the other if broken by a codepoint tie
22:15 samcv atm i have it go back through the string again from start to finish in case the collation values are the same. now i can just store that value and return it if the collation values end up the same
22:17 timotimo that sounds like good news
22:18 samcv yep :)
22:19 timotimo dogbert2: does it trigger reliably? did you set anything up like a smaller nursery?
22:21 timotimo i think i see what's wrong
22:21 dogbert2 timotimo: not reliable, I had to run with a small nursery but it still doesn't trigger every time
22:23 dogbert2 I ran with '#define MVM_NURSERY_SIZE (32768 * 8)'
22:23 timotimo right
22:24 timotimo the spesh thread doesn't know to wait while the threadcontext gets destroyed while it's joined
22:24 timotimo so it blindly reaches for where there used to be a thread to fetch some log data
22:24 timotimo and bam
22:26 samcv anybody know if i'm wrong that when you MVM_malloc something, and then use MVM_realloc, that you can't just call MVM_free on it?
22:26 samcv because if i had some allocated memory that had been resized with MVM_realloc, mvm would crash when i ran MVM_free
22:27 timotimo didn't you say you realloced it to 0?
22:27 samcv i did
22:27 timotimo i'd say that the implementation treats that as if you had used free on it
22:27 timotimo and another free would be explosive
22:27 samcv well not quite
22:27 Geth ¦ MoarVM/collation-arrays: 8 commits pushed by (Samantha McVey)++
22:27 Geth ¦ MoarVM/collation-arrays: c8321299e6 | Rework the generated collation values into functions and cleanup
22:27 Geth ¦ MoarVM/collation-arrays: ff2c339e48 | Update to use the newly generated normalize.h code
22:27 Geth ¦ MoarVM/collation-arrays: 44d56aec84 | Update to decompose codepoints if it can't find a match
22:27 Geth ¦ MoarVM/collation-arrays: f92db8da27 | separate out NFD code into its own function and remove unneeded hacks
22:27 Geth ¦ MoarVM/collation-arrays: fe5b26b377 | Make sure we fall back to the first node if the last node doesn't have collation elements
22:27 Geth ¦ MoarVM/collation-arrays: f0b753a1be | More cleanup and resize the stack if it gets too big
22:27 Geth ¦ MoarVM/collation-arrays: 9150c4df1e | Add MVM types to more variables
22:27 Geth ¦ MoarVM/collation-arrays: b145128c80 | Add some code to a macro to deduplicate it
22:27 Geth ¦ MoarVM/collation-arrays: review: https://github.com/MoarVM/MoarVM/compare/8dd36c2ad5...b145128c80
22:28 samcv timotimo, https://github.com/MoarVM/MoarVM/blob/collation-arrays/src/strings/unicode_ops.c#L62-L68 this is what i do now
22:28 samcv after reading the documentation for realloc more closely
22:28 timotimo huh, that's strange?
22:29 samcv it says it may return NULL or a pointer suitable to be freed
22:29 samcv wondering if we do this other places in moarvm where we use realloc?
22:29 timotimo but why do you realloc to 0 in the first place?
22:29 samcv because otherwise it crashes moarvm
22:30 timotimo o_O
22:30 samcv yep
22:30 timotimo valgrind should tell you why exactly, i'd expect
22:30 samcv it prints out a big stacktrace
22:31 timotimo like it'll tell you if it had already been freed, or if it's not a pointer that was returned by any *alloc
22:39 timotimo it looks like we never actually root an MVMThreadContext
22:39 timotimo we clearly just use the status to tell when it may be destroyed
22:46 dogbert2 aha
22:48 timotimo i'm not sure what the right step forwards is
22:49 timotimo but jnthn will know :)
22:50 Geth ¦ MoarVM: b161c85143 | (Timo Paulssen)++ | src/profiler/instrument.c
22:50 Geth ¦ MoarVM: be safe about marking NULL call graph nodes
22:50 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/b161c85143
22:55 praisethemoon joined #moarvm
22:56 timotimo dogbert2: want to open a RT about the join issue?
22:57 timotimo something along the lines of "the spesh worker doesn't cope with a tc being freed while it's working on data contributed by it"
23:10 jnthn timotimo: Is that profiler patch for the profiler bug lizmat mentioned earlier?
23:21 samcv jnthn, can i make a proposal for MVM_string_compare?
23:22 jnthn samcv: sure
23:22 samcv currently it isn't predicable when synthetics are involved. we don't need to expand most synthetics, but only one. i.e. if the different grapheme between the two is a synthetic, we should return based on codepoint instead of the synthetic number
23:22 samcv all other synthetics we encounter just get left aloe
23:22 samcv *alone
23:23 samcv so shouldn't slow it down except a very tiny amount
23:25 jnthn Well, it's predictable if you consider it a codepoint level operation... :)
23:25 jnthn oh wait
23:25 jnthn It ain't
23:25 samcv well. if synthetics get added in a different order then it will have a different response
23:25 samcv yep
23:25 jnthn It's scanning graphemes
23:25 jnthn d'oh
23:25 jnthn Yeah, it isn't doing what I thought it was doing
23:26 jnthn I dunno, maybe the easiest way to do it is with codepoint iters...
23:26 samcv yeah. we save time not doing a codepoint iterator, which is good. it's not needed except if the deciding codepoint is a synthetic
23:26 samcv then it matters
23:26 jnthn Or the spiritual equivalent
23:26 jnthn Yeah
23:26 samcv codepoint iterator won't be as fast and isn't needed
23:26 jnthn Well, it depends :)
23:26 samcv we can do a graphemecodepoint iterator the thing i just created a few days back
23:26 samcv but only if one of the deciding codepoints is a synthetic
23:26 jnthn Codepoint iter is faster if you've strandy stuff going on
23:27 samcv wait why?
23:27 jnthn 'cus it doesn't have to re-scan the strands to figure out which one has the grapheme you want
23:27 samcv codepoint iterators are more work
23:27 samcv since it has to iterate each grapheme
23:27 samcv and we *don't* want to iterate by codepoint
23:28 samcv we want to iterate by grapheme and oly compare one that differs
23:28 samcv otherwise the strings will become unaligned
23:28 samcv if one grapheme expands more than another
23:28 samcv oh wait
23:28 samcv that makes no sense
23:28 samcv hah
23:28 jnthn I don't think that could ever cause a different result
23:28 jnthn :)
23:28 samcv well it won't become misaliged
23:28 jnthn 'cus we return as soon as we see the diference
23:28 samcv but it will have to expand graphemes for no reason
23:29 jnthn But yeah, agree
23:29 samcv and codepoint iterator is just a grapheme + codhepoint iterator. it iterates both
23:29 jnthn My point wasn't so much about codepoint iter so much as about grapheme iterator and codepoint iterator in general
23:29 timotimo jnthn: i stumbled upon a null deref and fixed it, but forgot to push for hours. i don't know if it also fixes what liz found
23:29 jnthn Which is that MVM_string_get_grapheme_at_nocheck is cheap when it is just a buffer string and you can fixed index.
23:30 timotimo damn, without the patch it doesn't crash either
23:30 samcv oh
23:30 samcv ah
23:30 jnthn But when you've a string of strands it might have to look through 10 strands to see their lengths to know that the grapheme it wants is actually in the 11th one, for example
23:30 samcv yeah
23:30 samcv good point. so we should use a grapheme iterator probably
23:30 jnthn Yeah
23:31 jnthn I think what I wanted to say is that we probably want the semantics we'd get if we used codepoint iter
23:31 jnthn Not that using codepoint iter was the best way to implement it
23:31 samcv oh. wait how does codepoint iter have different semantics to grapheme iterator?
23:31 jnthn Though maybe what you're suggesting is subtly different
23:32 samcv the codepoint iterator is just a grapheme iterator but if it's synthetic iterates by codepoint then once done with that synthetic grabs next from the grapheme iterator
23:32 jnthn iiuc, you're saying that when we see a synthetic, than we'll look at the codepoints it would conssit of
23:32 samcv yeah
23:32 jnthn Which is I think going to give about the same results, but faster, as if we just used the codepoint iter
23:32 timotimo it was a bit strange to me that the title said "crash with --profile" but the examples in the body didn't have --profile
23:32 samcv so we don't waste time expanding synthetics that are going to be equal
23:32 samcv yeah
23:33 jnthn Yeah, I think we're agreeing :)
23:33 samcv cool :-)
23:33 jnthn I fully agree the semantics today are weird. I didn't realize it did that.
23:36 Geth ¦ MoarVM/collation-arrays: 5fc8dce423 | (Samantha McVey)++ | src/strings/unicode_ops.c
23:36 Geth ¦ MoarVM/collation-arrays: Implement a ring buffer so only different codepoints need processing
23:36 Geth ¦ MoarVM/collation-arrays:
23:36 Geth ¦ MoarVM/collation-arrays: The ring buffers hold the exact number of codepoints which comprise the longest
23:36 Geth ¦ MoarVM/collation-arrays: sequence of codepoints which map to its own collation keys in the Unicode
23:36 Geth ¦ MoarVM/collation-arrays: Collation Algorithm. As of Unicode 9.0 this number was 3.
23:36 Geth ¦ MoarVM/collation-arrays:
23:36 Geth ¦ MoarVM/collation-arrays: This also allows us to return based on codepoint based on the determination
23:36 Geth ¦ MoarVM/collation-arrays: we found using the ring buffer
23:36 Geth ¦ MoarVM/collation-arrays: review: https://github.com/MoarVM/MoarVM/commit/5fc8dce423
23:38 samcv also we should make sure to use MVMuint64 or MVMint64 for every "i" we use iterating a string. since if the string is the 2**32 long, then i will roll back to 0. at least i assume
23:38 samcv or maybe i'm wrong
23:39 timotimo what the hell, it crashes with that exact number, but not with a higher number
23:39 timotimo but with the patch cherry-picked it doesn't crash immediately
23:40 jnthn samcv: MVMuint32 would be save since the grapheme count is currently a 32-bit number
23:41 jnthn samcv: We could decide we want to support strings of more than 4GB I guess... :)
23:41 samcv 32bit signed?
23:41 timotimo didn't someone just upgrade that for us?
23:41 jnthn I thought it was unsinged
23:41 samcv so it can be max 2**31 ?
23:41 samcv yeah it is unsigned
23:42 jnthn But we index from 0 so a for loop up to the int max value is safe?
23:42 samcv for (i = 0; i < str_len; i++) { } ah you're right ok so
23:42 samcv i think that sholud be fine
23:42 samcv since i would == str_len and then it'd end the loop
23:42 jnthn We could go 64-bit
23:42 jnthn Yeah
23:42 samcv but str_len = 2**32 -1  then?
23:42 jnthn Don't know anybody has hit the limit yet though :)
23:43 jnthn Yeah
23:43 samcv i can do 'a' x 2 ** 32 - 2
23:43 samcv but not - 1
23:43 jnthn But when doing things like finding the lenght of a join/concat we should really use 64-bit
23:44 jnthn *length
23:44 samcv yeah
23:44 jnthn Though if we used 64-bit everywhere we're fine. It's just such a ridiculously big number ;)
23:50 samcv jnthn, i'm also going to make it so if all the codepoints in the synthetic match, it goes based on which has the most codepoints
23:51 samcv (in that synthetic)
23:51 timotimo jnthn: got a hot take for "tc gets freed because thread_join but spesh is still working on stuff submitted from it"?
23:51 samcv since that could totally happen
23:53 samcv will come in useful that you can use MVM_string_grapheme_ci on non-synthetics
23:54 samcv i think i used that as well to greatly simplify the code so it wouldn't have to have a huge number of branches

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary