Perl 6 - the future is here, just unevenly distributed

IRC log for #moarvm, 2014-04-09

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
00:33 eternaleye joined #moarvm
00:47 TimToady I'd think hash functions would be a good use for polymorphism based on the storage type
00:48 TimToady if you're using conditionals for that, it's kinda smelly
00:49 TimToady but maybe the rules are different for in-lined stuff...
00:50 jnap joined #moarvm
00:50 TimToady (just spouting off, haven't actually looked at what you're doing :)
01:30 lizmat_ joined #moarvm
04:45 lizmat joined #moarvm
05:50 woolfy joined #moarvm
06:07 woolfy joined #moarvm
06:07 woolfy left #moarvm
06:07 colomon joined #moarvm
06:58 zakharyas joined #moarvm
07:07 timotimo it's kind of hard to do since the uthash is implemented with c preprocessor macros, so if the hash function to implement is conditional upon the type, some kind of conditional needs to remain in the code
07:08 timotimo we could of course put a function pointer into the struct that you have to put into the hashable entries anyway, but that is quite a lot of overhead - or at least it seems to me.
07:10 brrt joined #moarvm
07:11 lizmat joined #moarvm
07:20 brrt \o #moarvm
07:21 jnthn o/ brrt
07:21 nwc10 \o/
07:22 timotimo o
07:22 moritz \|o|/
07:22 brrt wow, such happiness :-)
07:23 * brrt reading backlog
07:24 jnthn so joy
07:27 brrt i'm at a loss what the goal of timotimo's last commit is tbh
07:28 timotimo the one from uthash_padding?
07:28 brrt yes
07:28 brrt doesn't seem to be used anyway?
07:28 brrt anywhere
07:28 timotimo not yet
07:28 timotimo we are currently forced to turn all our strings into the 32bit representation so that two equal strings will hash to the same value
07:29 timotimo otherwise we wouldn't be able to use hashes any more unless we force every string that can be in 8 bits to always be in 8 bits
07:29 brrt oh….
07:29 timotimo that's also why you see the MVM_string_flatten call everywhere
07:29 timotimo that gets rid of ropes and forces the 32bit thing
07:30 brrt hmmm
07:30 jnthn also 'cus the ropes code is...uh...ropey
07:30 brrt but then you have 32 bit strings … everywhere
07:31 jnthn brrt: Right
07:31 timotimo that's true
07:31 timotimo uthash doesn't make it very easy to swap out the hash function for different things
07:31 jnthn brrt: Which in terms of "get the right answers up to codepoint level" and "constant time indexing" is a fine choice.
07:31 brrt i guess thats somewhere between acceptable and annoying
07:31 jnthn Just not for efficiency. But optimization always comes after wroking. :)
07:32 brrt true enough
07:32 timotimo one way to "defeat" this problem is to write a "key extractor" function that turns the string into the 32bit representation for hashing, but that's an immense amount of overhead
07:32 jnthn uthash.h is like, epic macros
07:32 timotimo maybe i can factor out the "calculate the hash bucket" part of all the hash functions and allow the user to calculate the hash bucket directly
07:32 jnthn yeah
07:32 timotimo that would make it much less hacky i think
07:32 jnthn ok, I should teach stuff
07:33 timotimo except then you can do even more terrible things :)
07:33 jnthn bbiab
07:33 brrt what today?
07:33 brrt you want the bucket rather than the hash to be computed?
07:33 jnthn brrt: MVC web dev stuff
07:33 jnthn Nothing too thrilling ;)
07:33 brrt well, stay strong
07:33 brrt :-p
08:39 jnthn yeah, it's going fine :)
08:39 jnthn Well, only web-y course I gotta do this month
09:03 timotimo jnthn: putting nameds into callsites ... won't that make our callsite interning strategies much less successful?
09:03 jnthn timotimo: ONLY IF WE DON'T UPDAET THE INTERN CODE
09:03 jnthn uh, ooops]
09:04 timotimo was scared there for a little moment
09:04 timotimo the intern code can still only intern callsites if they have the same exact set of nameds, no?
09:04 jnthn hit caps lock instead of tab, then didn't look at what I'd typed :)
09:05 jnthn not only exact same name, but also exact same order
09:05 timotimo right
09:05 jnthn That's what allows us to do the names => positions opt
09:05 timotimo mhm
09:05 timotimo but at least the callsite doesn't contain a reference to what is being called, right?
09:06 timotimo (except a little cache for invocants)
09:06 jnthn Right. :)
09:06 timotimo what file do i look at to find the callsite writing code?
09:06 jnthn src/mast/compiler.c
09:06 jnthn I'd start by updating docs/bytecode.markdown or so
09:06 jnthn Just to get the format clear.
09:07 timotimo oh, get_callsite_id also writes the callsite to the bytecode file
09:07 brrt that would be awesome jnthn :-)
09:07 timotimo that explains why i overlooked it
09:07 timotimo what exactly?
09:07 brrt updating the docs ;_)
09:07 timotimo oh, now i get it!
09:07 brrt ;-)
09:07 jnthn bytecode.markdown is actually quite up to date, afaik.
09:08 timotimo we're actually turning nameds into something that looks exactly like a positional (because really it is)
09:08 jnthn brrt: Do you want a spesh doc?
09:08 jnthn brrt: If so, what parts do you most want a document on?
09:08 jnthn Same question to timotimo I guess :)
09:08 jnthn I'm happy to write something up, but it's good to know what the goal is :)
09:09 jnthn (Which should be "make things clear that the code doesn't make clear".)
09:09 * brrt is thinking about how i'd attack it
09:11 timotimo an index to the string heap is a 16 bit integer, isn't it?
09:17 jnthn 32 now
09:17 dalek MoarVM/named_to_positional: aa55bbc | (Timo Paulssen)++ | docs/bytecode.markdown:
09:17 dalek MoarVM/named_to_positional: fix highlighting in vim ~_~
09:17 dalek MoarVM/named_to_positional: review: https://github.com/MoarVM/MoarVM/commit/aa55bbc452
09:17 dalek MoarVM/named_to_positional: 738a4cd | (Timo Paulssen)++ | docs/bytecode.markdown:
09:17 dalek MoarVM/named_to_positional: initial spec attempt for callsites storing named arg names.
09:17 dalek MoarVM/named_to_positional: review: https://github.com/MoarVM/MoarVM/commit/738a4cdd20
09:17 timotimo OK
09:17 jnthn after doing r-j where JVM has 16-bit ones I...learned ;)
09:18 dalek MoarVM/named_to_positional: 5eedf49 | (Timo Paulssen)++ | docs/bytecode.markdown:
09:18 dalek MoarVM/named_to_positional: index to string heap is 32bit big.
09:18 dalek MoarVM/named_to_positional: review: https://github.com/MoarVM/MoarVM/commit/5eedf495f6
09:18 timotimo should i store the names so that they line up with argument numbers or should i "compact" them?
09:19 timotimo (in the actual in-memory callsite, not the bytecode format)
09:19 timotimo i.E. will the MVMString **named_names; begin with one NULL per positional?
09:19 jnthn Hmmm
09:20 timotimo could potentially set the whole thing to NULL and not allocate at all if there's only positionals as a slight optimization
09:20 jnthn well, yeah, you only need it if there are names, for sure
09:20 jnthn I think it'll be a question of looking at what args.c needs to be fast/easily done, tbh.
09:21 timotimo oh, actually, how about this crazy idea:
09:21 timotimo ... yeah, actually a crazy idea, not a good one.
09:21 jnthn Data structures need designing around use cases :)
09:21 timotimo forget about it :)
09:22 jnthn I *think* that args.c may be efficiencly implementable with them compacted.
09:22 timotimo i also just realized that it'd always be a number of NULLs, then a number of names
09:22 timotimo rather than any kind of mixture
09:23 timotimo so just knowing the index of the first named will be sufficient to calculate every named index
09:23 jnthn and you do know that 'cus we cache num_pos
09:23 timotimo sounds great tehn
09:23 timotimo then
09:24 jnthn We don't build a hash table for looking things up 'cus there's not enough names to be worth it almost all the time
09:25 brrt can we do symbolic lookups?
09:26 timotimo huh? where would that hash table live/what would it be used by?
09:27 jnthn well, potentially you could have a name => index hash on a callsite
09:27 jnthn but it's not gonna be worth it
09:27 jnthn lunch time; bbiab
09:27 timotimo ah, aye
09:27 timotimo a linear scan would probably always be better
09:27 timotimo except if you have something like "Hash.new(:foo<bar>, [ and 1000 others ])"
09:28 timotimo that could potentially wreak some havoc
09:29 brrt you know what might be worth it?
09:29 timotimo do tell :)
09:29 brrt sorting the symbols by (symbol pointer value or something like that)
09:30 brrt so that you could - if callsite name lists get very large - always resort to binary search
09:30 timotimo you'll have to give me a bit more context
09:30 timotimo symbol means name of named parameter here?
09:30 brrt yes
09:31 brrt with the added notion that the named parameter should / could be 'normalized' - i.e. all instances refer to the same in-memory-object, so that comparison is pointer-comparison
09:31 brrt (that is to say, equality is pointer-equality, not what i said)
09:32 brrt binary search is really cheap and memory-efficient
09:32 timotimo i'm not entirely sure how exactly that plays together with all other pieces of the puzzle
09:33 brrt neither do i
09:34 * timotimo has much code to read
09:34 brrt :-)
09:34 timotimo hey, if you want to do something string-related that'll probably pay off in memory usage:
09:35 timotimo there's strings that show up incredibly often, over and over again
09:35 timotimo i've tried to add a string interning step to the minor collection of the nursery before, that didn't work out well
09:35 brrt such as?
09:35 brrt hmmm
09:35 timotimo "dotty"
09:35 timotimo (this is just from settings compilation, though)
09:36 brrt as part of collection that wouldn't be ideal i imagine
09:36 brrt you'd ideally want the compiler / mast to fix that
09:36 timotimo not possible
09:36 timotimo "foobar".substr(1, 2) ← how to? :)
09:36 brrt … possible for spesh?
09:36 timotimo the compiler/mast already has a string heap
09:37 timotimo https://gist.github.com/timo/e1af6d5c10a4e34d6cb0 ← check it.
09:37 timotimo that is a random sub-sample of the gen2
10:28 jnthn I think we need to keep GC simple, fwiw.
10:28 jnthn Otherwise we make pause times worse.
10:45 lizmat joined #moarvm
10:50 lizmat_ joined #moarvm
11:11 brrt joined #moarvm
11:13 * brrt is checking
11:13 brrt wow
11:13 brrt these are all different strings?
11:14 tadzik so "OPER" is there 876 times?
11:22 brrt as in, different sections of memory
11:22 tadzik that's my understanding
11:22 brrt wow
11:22 brrt (again)
11:25 brrt ok, and name-lookups are string comparisons?
12:22 lizmat joined #moarvm
12:38 jnthn I suspect the OPER thing gets fixed with the arnsholt++ work on O
12:38 jnthn mmmm...cheesecake
13:52 vendethiel joined #moarvm
14:04 jnap joined #moarvm
14:31 dalek joined #moarvm
14:31 btyler joined #moarvm
14:36 jnap1 joined #moarvm
14:38 synopsebot joined #moarvm
14:40 masak joined #moarvm
15:12 timotimo and the lowered_param_N thing i've fixed manually by stashing these strings in the Actions (i think) where they were previously generated with string concat
15:30 timotimo jnthn: we don't inline method calls yet; is that something the specializer will do at some point? or will we leave that for the jit?
15:32 timotimo hmm. maybe the callsite used in a specialized piece of bytecode could have extra information attached to it that the specializer could then use to improve calls that come from the specialized bytecode
15:36 timotimo oh, i think i understand why it's hard' in many cases we just put a slot for a little cache into the specialized bytecode to be filled later
15:36 timotimo so at specialize time we may not even know a single likely candidate
15:41 lizmat joined #moarvm
16:01 cognominal joined #moarvm
16:54 jnthn timotimo: Well, the specializer and the JIT are very related
16:55 jnthn Such that if the specializer learns to inline then that's done the work for the JIT to also, I expect
16:55 jnthn Or most of the work at least
16:55 jnthn The way we de-opt in the two cases may be different.
16:56 timotimo mhh, okay
16:57 jnthn On "don't know what's likely", one of the later things we'll do is 2-stage spesh
16:58 timotimo jnthn: how do you recommend i tackle the kind of daunting task to change argument handling from "two arguments in order" to "names in the callsite"?
16:58 jnthn The first will introduce various "recording" instructions; if the spesh remains hot enough then we'll use them to emit an even specialer version with guard clauses.
16:59 timotimo ah, these recording instructions will be using the spesh slot mechanism?
16:59 jnthn timotimo: Well, my plan was to get the bytecode writer to emit the right thing first, and then update the bytecode reader to read them in.
16:59 jnthn yeah, will use spesh slots for it.
17:00 timotimo i am not quite convinced that this will work fine without a new stage0
17:00 jnthn And then rebootstrap so we always have them.
17:00 jnthn And then switch args.c over
17:00 timotimo oh, that makes sense
17:00 jnthn oh, it'll want a new stage0
17:00 jnthn Maybe twice over.
17:01 timotimo even if i have to make two, should i commit both to the repository?
17:01 jnthn Anyway, my idea was to ween us off needing to put the names in.
17:01 jnthn Sure
17:01 jnthn You'll be doing it in a branch
17:01 timotimo i already am :)
17:01 jnthn They should be the only NQP changes needed for this.
17:02 jnthn So then we just merge --squash the branch.
17:02 timotimo ah, fair enough
17:02 timotimo what does "ween us off needing to put the names in" mean? o_O
17:03 jnthn Hack until deleting the argconst_s instructions for names doesn't break anything:)
17:03 jnthn Can leave the holes
17:04 jnthn And deal with them once everything works.
17:05 timotimo so 1) write the names into the callsites and serialize that, 2) build a new stage0 with the names in the callsites, 3) remove the argconst_s bytecodes that put the names in the even slots
17:05 timotimo and then build an even newer stage0 that doesn't have the argconst_s bytecodes at all any more
17:07 jnthn uh, 1 is really "and deserialize too"
17:07 timotimo ah, yeah
17:07 jnthn And 3 is a lot of work to mkake it possible :)
17:07 jnthn *make
17:07 timotimo i should have put a "just" in there for good measure :D
17:10 timotimo run-time named arguments and |%foo are handled by creating a new callsite on the fly, right?
17:10 jnthn yeaah
17:10 jnthn We ignore any flattening callsites in spesh
17:10 timotimo that'll want fixed, too somewhere between 1 and 3
17:10 jnthn And will do for the future
17:10 timotimo oh, hold on; i thought i was supposed to change named argument passing for everything ever
17:11 jnthn you still need to update it :)
17:11 jnthn Just saying that spesh isn't going to try and deal with |
17:11 jnthn Not for the time being anyway.
17:12 timotimo update what now?
17:12 timotimo sorry, i seem to be having a brainfart or something
17:12 timotimo .o( brainfort, much cooler than a blanketfort )
17:12 timotimo spesh isn't, but the regular code is
17:14 jnthn update the flattener
17:14 timotimo yea. i was going to do that
17:14 jnthn Note you'll need to do GC updates also.
17:14 timotimo oh? how so?
17:15 jnthn 'cus callsites now point to MVMString which is collectable
17:16 timotimo ah, that makes sense
17:44 lizmat joined #moarvm
17:45 lizmat joined #moarvm
17:55 benabik joined #moarvm
18:47 woolfy joined #moarvm
20:07 zakharyas joined #moarvm
20:18 brrt joined #moarvm
20:29 timotimo the interning mechanism doesn't count nameds?
20:29 timotimo the arity it considers seems to be only the number of positionals
20:30 jnthn we don't intern named things, right.
20:31 timotimo oh, i see that now
20:31 timotimo arg_count != num_pos
20:31 timotimo that line, i overlooked
20:31 timotimo so i won't need to teach the interner about nameds yet
20:49 jnthn no, that can come later
20:50 timotimo is the "names_used" mechanism still needed the way it is right now after the refactor?
20:54 brrt (reading backlog) yes, its deopt that will differ rather substantially
20:55 jnthn timotimo: Well, we need it to behave the same. Doesn't have to work exactly the same.
20:55 timotimo mhm
20:56 timotimo args_proc_init would introspect the args MVMRegister assuming it's an array-like and get the strings from the args list, yes?
20:56 timotimo or am i looking at the wrong thing?
20:58 jnthn sounds like
20:58 jnthn I mean, it doesn't introspect args today
20:58 timotimo i feel kinda dumb. maybe today isn't a good day
20:58 jnthn It knows what's there from arg_flags
20:58 timotimo i thought that thing is what creates the callsite and the callsite is supposed to have the nameds set
21:00 lizmat joined #moarvm
21:07 brrt joined #moarvm
21:12 timotimo huh. so MASTNode *args is supposed to be some kind of array of MVMObjects
21:12 timotimo among them should be strings for the nameds, right?
21:14 timotimo when i use ATPOS_S_C, the get_str gets passed a null pointer
21:14 timotimo oh, huh.
21:15 timotimo yeah, i confused flags and args
21:15 timotimo there's one flag for two args
21:19 woolfy joined #moarvm
21:22 timotimo i'm not doing very well right now >_<
21:22 jnthn Well, it's probably a little tricky...
21:22 timotimo what i'm trying to do here should be simple, though :)
21:22 jnthn timotimo: Did you give up on the NQP regex opt thingy?
21:23 timotimo for now, yes
21:23 jnthn OK, I may task steal that.
21:23 timotimo right now i want to teach the mast compiler to write out the nameds into the callsite
21:23 timotimo thus, when looping through the flags, i num_nameds++ if i see MVM_CALLSITE_ARG_NAMED | _FLAT_NAMED
21:23 timotimo then, i go from 0 to num_nameds and get args[(elems - num_nameds) + i * 2]
21:24 timotimo i tried that with ATPOS_S_C, but that didn't work, neither did ATPOS_S
21:24 timotimo and (MVMString *)ATPOS(...) didn't work either
21:25 timotimo what i get from there is a p6opaque at least, so it *could* be a P6str
21:26 brrt joined #moarvm
21:26 timotimo huh, the object i get there has no unbox_str_slot, though
21:27 timotimo say, could we perhaps put the name of the class we've created into the MVMP6opaqueREPRdata?
21:29 jnthn Oh
21:29 jnthn It's perhaps a MAST::SVal
21:29 jnthn Which contains the string
21:29 timotimo ah, that would be helpful
21:30 jnthn For "what class is this" I think it's more "what type is this" and belongs on the STable.
21:30 timotimo hm. yeah probably
21:31 timotimo value = <error reading variable: Cannot access memory at address 0x3f>
21:31 timotimo that doesn't seem right
21:33 timotimo AFK for a bit
21:52 timotimo oh btw
21:52 timotimo i've looked and it *is* a p6opaque
21:55 timotimo interesting. a few times the function i'm hacking in actually succeeds, even with nameds
22:19 brrt left #moarvm
22:42 timotimo well, i'm certainly at a loss here.
22:43 timotimo ah, ok
22:43 timotimo it's due to the fact that it's FLAT_NAMED
22:43 timotimo or rather: the first one that is FLAT_NAMED breaks my code
22:43 jnthn ah
22:43 timotimo how are those treated?
22:43 jnthn A flat named is really a positional...
22:43 jnthn That will be flattened.
22:44 timotimo so i shouldn't be looking at the object that's put there at all to extract a name from it
22:44 timotimo in what combinations do those flattened nameds appear?
22:44 timotimo always in the last slot? always at most one?
22:47 jnthn Not sure off hand...
22:47 jnthn See QASTOperationsMAST.nqp
22:47 timotimo thanks
22:47 jnthn Near call and callmethod there's some code that sorts things out
22:47 jnthn I know it makes sure nameds come later.
22:48 timotimo ah, so maybe all i'll need to do is just ignore the NAMED_FLAT flag
22:50 timotimo wow, that way i actually get one file to compile and the next error i'll have to hunt is All positional args must appear first
22:51 jnthn That's in bytecode.c iirc
22:51 timotimo the verifyer, aye
22:53 jnthn Hmmm
22:53 jnthn grr
22:54 timotimo ah, that's probably because it's trying to read the callsites
22:54 timotimo and it's not reading the named args
22:54 timotimo it just treats them as if they were the next callsite :)
22:55 jnthn Oh!
22:55 jnthn Meanwhile, I got a version of the regex thingy that calls back into the main optimizer
22:55 timotimo also: i said "the verifyer, aye" which was untrue
22:55 timotimo i had one, too. it was just that it horribly broke compilation :)
22:55 jnthn Well, mine does pass the NQP tests and build Rakudo.
22:56 jnthn It for some reason doesn't ever lower self.
22:57 timotimo well, that's much better than what i had already!
22:57 timotimo cool :)
23:00 jnthn ohh
23:00 jnthn tssk
23:02 timotimo ah dangit
23:02 timotimo now i'm trying to read in the stage0 and of course i can't
23:02 timotimo i'll need a bytecode version incraese
23:03 jnthn Yes, you need to bump the bytecode version and maintain backcompat.
23:03 timotimo yup
23:03 timotimo very obvious in retrospect :)
23:03 jnthn .oO( A description of much of computer science... )
23:04 jnthn Figured out why it didn't lower self.
23:04 jnthn However, the cursor match variable is gonna be a stiffer challenge.
23:04 timotimo mhm :|
23:04 timotimo but it'll probably really be worth it
23:06 jnthn yeah, 'cus it's the only remaining lexical now.
23:06 jnthn In most rules
23:08 FROGGS joined #moarvm
23:10 * jnthn is pondering an asserttype op for conveying stuff that should always be true.
23:12 jnthn spesh can leave it in place (unless it can prove it's not needed), and then copy the type info.
23:13 jnthn Essentially a way for code-gens and optimizers to convey to the VM, "I'm really really sure this is true even if you can't prove it; blow up if it ever ain't, and optimize assuming it is"
23:13 timotimo right
23:14 jnthn Immediate use case is that !cursor_start always returns something of the same type as self.
23:14 jnthn Granted some day inlining might get it.
23:15 jnthn But !cursor_start is perhaps a little big to be an obvious inline.
23:17 jnthn Well, patch passes spectest too
23:17 timotimo now i'm getting bogus data in my arg_name array :\
23:17 timotimo i'm setting it with get_heap_string, that seems correct, aye?
23:18 timotimo (gdb) print frame->cur_args_callsite->arg_name[0]
23:18 timotimo $4 = <error reading variable: Cannot access memory at address 0x34>
23:18 timotimo just ... huh?
23:20 timotimo i must be creating a callsite somewhere and forgetting to null that out or something
23:20 jnthn that looks...odd, yeah.
23:21 jnthn 0x34 is clearly not a pointer.
23:21 jnthn I just pushed my NQP patches
23:22 woolfy left #moarvm
23:23 jnthn No significant effect yet.
23:24 timotimo is it aborting for some other reason?
23:25 jnthn Well, it can't lower $¢ yet
23:25 jnthn But also it could benefit from the asserttype thing I mentioned
23:27 timotimo i'm segfaulting and i have no idea where this data could possibly come from
23:27 jnthn Anyway, need sleep...still exhausted from last day's teaching/bad sleep...
23:27 jnthn 'night
23:27 jnthn and good luck ;)
23:28 timotimo oh, huh
23:28 timotimo it's actually a straight-up null pointer
23:28 timotimo good night and rest well!
23:31 timotimo (gdb) print num_nameds
23:31 timotimo $8 = 32783
23:31 timotimo yeah, that's not so probable
23:32 timotimo ah, yeah, C doesn't null out variables on the stack for you
23:34 timotimo what in the ...
23:36 timotimo i'm getting a segfault in MVM_gc_worklist_add
23:36 timotimo and, this happens: (gdb) print /r **(MVMCollectable **)(frame->cur_args_callsite->arg_name[0])
23:36 timotimo Cannot access memory at address 0x38000800000001
23:42 timotimo other places use MVM_ASSIGN_REF for the result of get_heap_string, but I can't do it here, because the root, in this case the callsite, isn't an MVMCollectable
23:42 timotimo does everything that has MVMCollectables inside them have to be an MVMCollectable?
23:55 timotimo i guess i'll go to sleep now

| Channels | #moarvm index | Today | | Search | Google Search | Plain-Text | summary