Perl 6 - the future is here, just unevenly distributed

IRC log for #moarvm, 2017-06-25

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
01:48 ilbot3 joined #moarvm
01:48 Topic for #moarvm is now https://github.com/moarvm/moarvm | IRC logs at  http://irclog.perlgeek.de/moarvm/today
04:11 samcv good *
04:18 samcv so it looks like the max number of codepoints => 1 or more collation keys is 3
04:19 samcv so we are dealing 3 max cp there. and the max number of collation keys which match a certain codepoint or sequence is 18
04:20 samcv though the very vast majority only have 1-3
04:20 samcv array => [(9060 32 26) (9116 32 26) (9157 32 26) (521 32 26) (8971 32 26) (9116 32 26) (9116 32 26) (9137 32 26) (521 32 26) (9070 32 26) (9116 32 26) (9158 32 26) (9137 32 26) (521 32 26) (9143 32 26) (9049 32 26) (9116 32 26) (9123 32 26)], codepoints => [65018], comment => ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
04:20 samcv one codepoint.... to 18 collation keys...
04:20 samcv the array up there is what i'm calling collation keys (primary, secondary, tertiary factors for each key)
04:21 samcv so we will need to check for the next codepoint ahead in only certain cases. and rare cases 2 ahead. and i think that should be reasonable from a speed standpoint
04:22 samcv and having a ton of collation keys for a single codepoint is not really going to impact any of the other simple codepoints
04:23 samcv that sigature must be pretty long to need 18 collation keys though heh
04:24 samcv u: ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
04:24 unicodable6 samcv, U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM [Lo] (ﷺ)
04:24 samcv totally unreadable with monospace :P
04:25 samcv it's a ligature for three words. wow
04:30 geekosaur not much better with a proportioinal font. I have a dreadful suspicion someone tried to do calligraphy in a computer font...
04:31 samcv well. it's three words
04:32 samcv it means 'may peace be upon him'
04:32 samcv well it's three words in arabic. in english it's 5 :)
04:32 samcv and the words are on top of each other
04:33 samcv now with the UCA you can collate an equivilant of 18 characters in arabic together with the actual 18 characters :)
04:33 samcv hahaha
04:37 samcv oh and importantly. there's only 62 "special" starter codepoints. these are codepoints that have a possiblily of having to look at the next codepoint
04:37 samcv to see if it matches one of the special collation keys for series of cp's
04:37 samcv so that's not too many
04:41 geekosaur joined #moarvm
04:42 samcv though certain codepoints have 48 possibilities. so have 48 possiblities going off the first codepoint. eek
06:13 domidumont joined #moarvm
06:18 domidumont joined #moarvm
07:04 domidumont joined #moarvm
07:20 domidumont joined #moarvm
08:54 domidumont joined #moarvm
08:59 nine Trying to profile zef, I get a segfault caused by MVM_repr_get_int being called with a NULL object from JITed code: https://gist.github.com/niner/d5d2e6cf104ab1b592835826be972d2d
09:16 timotimo nine: it probably uses multiple threads? that makes the profiler tend to go boom
09:18 timotimo i mean if it uses a proc it now turns on the event loop thread, which makes the profiler unhappy
09:21 nine timotimo: no, there's only a single thread running
09:23 timotimo hmm
09:23 timotimo yeah, that's bad then :)
09:23 timotimo and disabling the jit makes it no longer asplode?
09:34 nine At least it works on 2017.06
09:36 nine Bisecting this sucks as I have to recompile moar+nqp+rakudo, otherwise I get errors like "Unhandled exception: Hash keys must be concrete strings"
09:43 nine timotimo: I should have checked if the JIT is to blame. Reverting your recent "JIT some more ops" commit gives me a shorter backtrace but still leaves the segfault
09:43 nine https://gist.github.com/niner/75f4f03fcdf50aedaf946b634ee4ba3f
09:45 nine So MVM_nfa_run_alt get's called with a NULL labels parameter and passes it on to MVM_repr_at_pos_i
09:56 timotimo i'll be afk for much of the day again
10:39 domidumont joined #moarvm
11:17 samcv ok so i visualized the data i have here with max depth 3 would a trie still be a good plan? idk. https://gist.github.com/0239ee41677dbc064d6d2692af3f01ef here it is. obviously the terminal hashes are the terminal points
11:17 samcv as most cases it's only two codeponitns in a row. the max is 3
11:17 samcv 1st -> 2nd -> 3rd codepoints or whatever
11:19 samcv since i have only 60 starters or so. maybe i should store indices of an array in the unicode data
11:20 samcv check the codepoint, gives me a point in an array which from there hmm. though also makes me think that some of these may be faster to have code to check if the next cp is between a certain range of values. and a switch or something
11:21 samcv since the highest depth is almost always 2. i get pointed to this array number, from 0 to 60 and then have a switch beneith that if the codepoint is in the proper range
11:22 samcv anybody got any suggestions?
12:13 moritz use a linked list for the second (or maybe third) character, and all that follow?
12:22 timotimo if the numbers we have are low enough, encode a "contination bit" in the numers?
12:41 AlexDaniel joined #moarvm
13:19 domidumont joined #moarvm
13:41 brrt joined #moarvm
13:43 brrt (my infinite loop is OSR)
13:43 brrt osr disabled, it finishes
13:51 brrt nine: i'll investigate at some point. i want to have a jit-bisect for specific ops...
13:52 timotimo oh, interesting
13:55 brrt well,
13:55 brrt it's virtually impossible, really
13:56 timotimo that's what bugs are :P
14:49 zakharyas joined #moarvm
14:51 ggoebel joined #moarvm
17:27 zakharyas joined #moarvm
17:32 MasterDuke joined #moarvm
17:48 domidumont joined #moarvm
19:12 zakharyas joined #moarvm
19:17 domidumont joined #moarvm
20:09 AlexDaniel joined #moarvm
22:08 lizmat joined #moarvm

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary