Perl 6 - the future is here, just unevenly distributed

IRC log for #moarvm, 2017-09-10

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
01:16 geekosaur joined #moarvm
01:55 ilbot3 joined #moarvm
01:55 Topic for #moarvm is now https://github.com/moarvm/moarvm | IRC logs at  http://irclog.perlgeek.de/moarvm/today
05:02 Geth ¦ MoarVM: samcv++ created pull request #685: Merge collation-arrays branch
05:02 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/pull/685
07:30 Geth ¦ MoarVM: 088aa0a022 | Skarsnik++ (committed using GitHub Web editor) | src/6model/reprs/CArray.c
07:30 Geth ¦ MoarVM: Fix a leak in CArray repr.
07:30 Geth ¦ MoarVM:
07:30 Geth ¦ MoarVM: When expand is called (via at-pos) it make room for more perl6 object in the body, but this does not get free if the array is not managed by MoarVM.
07:30 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/088aa0a022
07:30 Geth ¦ MoarVM: 1059eed1cc | niner++ (committed using GitHub Web editor) | src/6model/reprs/CArray.c
07:30 Geth ¦ MoarVM: Merge pull request #684 from Skarsnik/patch-1
07:30 Geth ¦ MoarVM:
07:30 Geth ¦ MoarVM: Fix a leak in CArray repr.
07:30 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/1059eed1cc
07:58 domidumont joined #moarvm
08:03 domidumont joined #moarvm
08:32 nine ugexe: it's taken me years to get this into my head: MVM_ROOT is less about the value than it is about the variable. It's for having the GC update your local pointer to the object after it moved the object to the second generation as the object's address will change.
08:32 yoleaux 8 Sep 2017 15:53Z <tbrowder> nine: I just filed issue #101 with Inline::Perl5; failure using Expect::Simple
08:32 yoleaux 8 Sep 2017 19:19Z <tbrowder> nine: my version was too old, closing issue 101
08:52 leont joined #moarvm
10:18 Geth ¦ MoarVM: 866623d933 | (Samantha McVey)++ (committed using GitHub Web editor) | 8 files
10:18 Geth ¦ MoarVM: Merge Full Unicode Collation Algorithm Implementation
10:18 Geth ¦ MoarVM:
10:18 Geth ¦ MoarVM: This is a full implementation of the Unicode Collation Algorithm. We iterate
10:18 Geth ¦ MoarVM: by codepoint and put this into a ring buffer. The ring buffers hold the exact
10:18 Geth ¦ MoarVM: number of codepoints which comprise the longest sequence of codepoints which
10:18 Geth ¦ MoarVM: map to its own collation keys in the Unicode Collation Algorithm. As of Unicode
10:18 Geth ¦ MoarVM: 9.0 this number was 3. When Generate-Collation-Data.p6 is run, this number will
10:18 Geth ¦ MoarVM: <…commit message has 42 more lines…>
10:18 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/866623d933
10:21 Geth ¦ MoarVM: 0b81969db2 | (Samantha McVey)++ | 2 files
10:21 Geth ¦ MoarVM: Remove unneeded file from UCA implementation
10:21 Geth ¦ MoarVM:
10:21 Geth ¦ MoarVM: This file isn't needed for the generation script.
10:21 Geth ¦ MoarVM: review: https://github.com/MoarVM/MoarVM/commit/0b81969db2
10:45 timotimo nine: not only will the address change when the object is moved to the second gen, but we also move it from one half of the nursery to the other half at least once
10:57 nine Oh, I didn't know that!
10:58 timotimo yeah, we have "semispace copying nurseries" or what it's called
11:00 jnthn Yeah, the nursery is a semispace; we allocate in half of it, then evacuate the objects still alive, but never spotted by GC before, into the other half, and move the ones we did see once before into gen2
11:00 jnthn So each thread switches between the spaces per GC
11:02 jnthn This is a really nice scheme in that 1) you get cheap bump-the-pointer allocation, 2) for objects that die right away you do close to zero work
11:02 jnthn And 1 implies good cache locality also
11:03 jnthn Though you can't do it for the entire heap, otherwise your memory overhead becomes immediately at least 2 * what the program actually needs
11:09 Skarsnik joined #moarvm
11:44 vendethiel- joined #moarvm
11:46 MasterDuke samcv: your latest commit or two broke the moarvm build for me
11:48 MasterDuke dir
11:48 MasterDuke wrong window
11:49 MasterDuke src/strings/unicode.c:78279:43: error: unknown type name ‘sub_node’
11:49 MasterDuke src/strings/unicode.c:78324:24: error: ‘codepoint_sequence_no_max’ undeclared here (not in a function); did you mean ‘codepoints_by_name’?
11:59 MasterDuke anyone else getting those?
12:13 brrt joined #moarvm
12:14 timotimo let me look
12:15 MasterDuke i did make realclean, but still seeing it
12:15 timotimo yeah i get these errors too
12:16 timotimo this starter_main_elems is a #define in unicode_uca.c
12:16 MasterDuke huh, it builds fine on my laptop
12:17 timotimo easy fix
12:17 timotimo rm unicode.c; make
12:39 Skarsnik what MVMROOT does?
12:40 timotimo what about it?
12:41 Skarsnik Dunno I am reading CArray.c at_pos function, since I get 2 crash pointing at this
12:42 Skarsnik well pointing at the NC.pm operator [] for array
12:42 timotimo oh you mean "what does it do"
12:42 Skarsnik Yeah sorry x)
12:42 timotimo it tells the GC "this local variable points at a GC-managed object. i want this object to stay alive and i want you to update the pointer when the object gets moved"
12:45 Skarsnik hm I am not sure understand. This mark the whole child_objs array and not the new object created in it ? https://github.com/MoarVM/MoarVM/blob/master/src/6model/reprs/CArray.c#L296
12:47 timotimo the storage array is likely handled by gc_mark
12:47 timotimo the object we're calling at_pos on is "root", that gets pointed out to the GC and the GC will then do the rest
12:48 timotimo but it doesn't look like the storage array is gc-relevant
12:51 Skarsnik storage is handled by C in this part. What I get is the chlild_objs array is what Moar add to track the perl6 objects it return
12:51 timotimo mhm
12:55 timotimo oh
12:55 timotimo the reason why the crash points at the MVMROOT
12:55 timotimo is because gdb is stupid
12:55 timotimo it treats the whole macro as a single line
12:56 timotimo so where the crash happens exactly is hidden
12:57 Skarsnik I was not clear, it does not crash on MVMROOT but the backtrace pointed me at this function x)
12:57 Skarsnik So I try to understand it x)à
13:00 timotimo oh ok
13:02 dogbert2 joined #moarvm
14:08 dogbert2 c: MVM_SPESH_NODELAY=1 HEAD use Test; sub try_eval($str) { try EVAL $str }; is(try_eval('my @x = <a b c>; sub y (@z) { @z[0] }; y(@x)'), "a b c", "NO-BREAK SPACE") for ^10;
14:08 committable6 dogbert2, ¦HEAD(9b42484): «ok 1 - NO-BREAK SPACE␤ok 2 - NO-BREAK SPACE␤ok 3 - NO-BREAK SPACE␤ok 4 - NO-BREAK SPACE␤ok 5 - NO-BREAK SPACE␤ok 6 - NO-BREAK SPACE␤ok 7 - NO-BREAK SPACE␤ok 8 - NO-BREAK SPACE␤ok 9 - NO-BREAK SPACE␤ok 10 - NO-BREAK SPACE»
14:08 dogbert2 hmm
14:09 dogbert2 c: MVM_SPESH_NODELAY=1 HEAD use Test; sub try_eval($str) { try EVAL $str }; is(try_eval('my @x = <a b c>; sub y (@z) { @z[0] }; y(@x)'), "a b c", "NO-BREAK SPACE") for ^10;
14:09 committable6 dogbert2, ¦HEAD(9b42484): «ok 1 - NO-BREAK SPACE␤ok 2 - NO-BREAK SPACE␤ok 3 - NO-BREAK SPACE␤ok 4 - NO-BREAK SPACE␤ok 5 - NO-BREAK SPACE␤ok 6 - NO-BREAK SPACE␤ok 7 - NO-BREAK SPACE␤ok 8 - NO-BREAK SPACE␤ok 9 - NO-BREAK SPACE␤ok 10 - NO-BREAK SPACE»
14:10 dogbert2 this code is part of t/spec/S02-lexical-conventions/unicode-whitespace.t and on my system it behaves very badly when MVM_SPESH_NODELAY=1
14:12 dogbert2 lots of test failures, although the first few works. Interestingly, if I add MVM_SPESH_BLOCKING=1 the problem vanishes
14:21 dogbert2 for me the output from running the above mentioned file looks like this: https://gist.github.com/dogbert17/cb708cb3a3872b94ded3e00ff0a668e5
14:21 robertle joined #moarvm
15:06 zakharyas joined #moarvm
15:56 zakharyas joined #moarvm
16:28 dogbert2 should it be possible to decode any Buf, containing bytes, to a Str with utf8-c8 ?
16:31 timotimo should be, yeah, but currently we'll bail if there's "almost valid" utf8
16:33 dogbert2 ah, I was looking at one of [Tux]'s test files and it generates random buffers and try to decode them to Str with utf8-c8
16:34 dogbert2 so if the buffer contains "almost valid" utf-8 then the utf8-c8 decoder will fail?
16:36 timotimo right
16:36 timotimo for you see
16:36 timotimo there's encoding-wise valid utf8 that's still invalid because the codepoints encoded aren't proper
16:36 timotimo we'd still want to encode these with synthetics, though
16:36 timotimo nobody implemented that yet, though
16:37 dogbert2 aha, that explains it, many thanks
16:39 domidumont joined #moarvm
16:39 timotimo sure
16:43 timotimo m: say chr(0x10ffffff + 1)
16:43 camelia rakudo-moar 9b4248: OUTPUT: «Error encoding UTF-8 string: could not encode codepoint 285212672 (0x11000000), codepoint out of bounds. Cannot encode higher than 1114111 (0x10FFFF)␤  in block <unit> at <tmp> line 1␤␤»
16:43 timotimo m: say chr(0x10ffff + 1)
16:43 camelia rakudo-moar 9b4248: OUTPUT: «Error encoding UTF-8 string: could not encode codepoint 1114112 (0x110000), codepoint out of bounds. Cannot encode higher than 1114111 (0x10FFFF)␤  in block <unit> at <tmp> line 1␤␤»
16:47 timotimo if i'm not mistaken, utf8 can no-problem represent codepoints higher than this, but they are "forbidden"
16:47 dogbert2 that's mean :)
16:47 timotimo so if you encounter a properly encoded value above 0x10ffff we won't create a synthetic because the encoding is correct, but it's still explosive in a "later stage"
16:47 timotimo with the cold i'm having i don't have the necessary brain grease to step in and fix it
16:48 dogbert2 do you use any house remedies to get well, e.g. c-vitamins and such
16:49 timotimo i got aspirin complex which is painkiller + cough suppressant + something else
16:50 timotimo something that prevents the nose from running as much
16:50 dogbert2 cool, I believe that these combo meds are forbidden where I live, don't know why though
16:51 timotimo huh? weird.
16:51 dogbert2 indeed, so you have to get several meds instead of one
16:52 timotimo i wonder if the combo meds are noticably more expensive than getting the individual parts and mixing them by yourself
16:53 dogbert2 good question
17:04 leont utf-8 isn't really well-defined beyond that point
17:15 timotimo oh
17:18 pmurias joined #moarvm
17:19 pmurias the MoarVM wants to have all not static function MVM_ prefixed?
17:27 leont Start bytes F0-4 are defined as encoding for 4 byte codepoints, and logically F5-F8 would do the same (IMHO), but what should F9 do? 5 bytes?
17:27 leont Erm, that's a one off, F5-F7, and F8
17:29 leont (in the C0-F4 range, the number of initial 1s is the number of bytes for the character)
17:32 leont The reason for the 0x10ffff limit is that UTF-16 can't express any higher codepoints, because it's a idiotic encoding
20:19 robertle joined #moarvm
20:27 robertle_ joined #moarvm
20:31 zakharyas joined #moarvm
21:58 dogbert2 joined #moarvm
23:02 bisectable6 joined #moarvm
23:02 benchable6 joined #moarvm
23:17 MasterDuke joined #moarvm

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary