
IRC log for #opentreeoflife, 2015-04-28


All times shown according to UTC.

Time Nick Message
00:36 kcranstn joined #opentreeoflife
00:56 jar286 joined #opentreeoflife
01:16 jar286 joined #opentreeoflife
01:56 ilbot3 joined #opentreeoflife
01:56 Topic for #opentreeoflife is now Open Tree Of Life | opentreeoflife.org | github.com/opentreeoflife | http://irclog.perlgeek.de/opentreeoflife/today
09:32 mtholder joined #opentreeoflife
10:55 mtholder joined #opentreeoflife
11:24 josephwb joined #opentreeoflife
11:24 josephwb you there jimallman
11:28 josephwb you there mtholder?
11:40 josephwb joined #opentreeoflife
11:49 jar286 joined #opentreeoflife
12:16 josephwb joined #opentreeoflife
12:18 mtholder hi, josephwb. sorry I missed you pinging me earlier.
12:22 josephwb hey
12:22 josephwb you are writing up the subprob stuff, right?
12:23 josephwb the earlier stuff, where contested taxa were pruned, caused unsupported nodes (taxa absent from subprob newicks, but present in full testing newicks)
12:24 josephwb the new stuff (where contested taxa are replaced by polytomies) doesn't work, as treemachine treats them as hard polytomies
12:24 josephwb mtholder^
12:24 josephwb mtholder ^
12:24 mtholder I see what you wrote, but I'm confused.
12:25 josephwb ok, which part?
12:25 mtholder I wrote about the decomp. So yes to that.
12:25 mtholder the "earlier stuff" means the bug in otcetera?
12:25 josephwb no
12:25 josephwb well, not really
12:26 mtholder What did you mean by "earlier stuff"?
12:26 josephwb you used to drop contested taxa from the subproblem newicks, right?
12:27 mtholder no, they are in the taxonomy.
12:27 mtholder they just don't generate a subproblem
12:27 josephwb oh, we drop the taxonomy bit
12:27 josephwb sorry
12:27 josephwb i mean: a contested taxon is a tip in an input tree
12:27 mtholder it is just a pruned version of the OTT
12:27 mtholder so if you have OTT in the db, then it is the same info
12:27 mtholder (just pruned)
12:28 mtholder Oh. gotcha
12:28 josephwb yeah, sorry i wasn't clear
12:28 josephwb say a tree has (A,(B,(C,D)));
12:28 mtholder no, the tips mapped to nonterminal taxa are now expanded earlier in the pipeline
12:28 mtholder so there are no contested tips
12:28 josephwb if C is contested, it was pruned, and added by taxonomy
12:28 mtholder in the phylo inputs.
12:28 josephwb ok, yes. now they are expanded, but as a polytomy
12:29 mtholder Now, C would be expanded to a set of exemplars. (From that taxon)
12:29 mtholder https://docs.google.com/document/d/1qq9VZccfPMG9Xic0wmp5BXMur98KrjXOY3-ZVuKzz1U/edit#heading=h.w83wi7lqiwyn
12:29 mtholder is the doc on that section.
12:29 josephwb ok, i am doing this poorly ;)
12:29 josephwb i understand things
12:30 josephwb just saying that earlier version of pruning didn't work
12:30 josephwb new version of expanding doesn't work
12:30 josephwb for treemachine
12:30 mtholder what doesn't work about the new system?
12:30 josephwb the expanded taxa are interpreted as a hard polytomy
12:30 josephwb not soft
12:31 josephwb does that make sense?
12:31 josephwb maybe we should video chat this
12:31 mtholder I understand what you are saying
12:31 mtholder just not why they are treated as hard polytomy.
12:32 mtholder fwiw the doc section for the supp mat that I wrote was at https://docs.google.com/document/d/1qq9VZccfPMG9Xic0wmp5BXMur98KrjXOY3-ZVuKzz1U/edit#heading=h.59cgjj64dpw4
12:32 josephwb treemachine has no concept of a soft polytomy: trees that come in are taken as given
12:33 josephwb anywho, we are using a hybrid approach: using the old subproblems (where contested were pruned), but adding the pruned back in
12:33 mtholder that is unfortunate. I knew that was the case of earlier versions.
12:33 josephwb it gets rid of *almost* all of the unsupported nodes
12:35 josephwb because now subproblem newicks correspond with the testing (full) newicks
12:35 josephwb if that makes sense
12:36 mtholder but the tip that is mapped to a contested taxon is interpreted as supporting the contested taxon?
12:37 josephwb the contested taxon may be supported by other trees
12:37 josephwb maybe all but 1 tree
12:37 mtholder (also note that the bug fixed in the new decomposition had to do with resolving ancestral polytomies - not tips mapped to higher taxa)
12:37 mtholder so I don't think that the old subproblems should be used.
12:39 josephwb i did notice weird things with polytomies
12:40 josephwb like ((A1, A2, B1, B2),C) would be ((A,B),C)
12:42 josephwb the concern with the contested stuff is that if _just_ one (maybe not-so-great) study finds a taxon to be non-monophyletic, but all others (even if they are all higher ranked) find it to be monophyletic, then it is "contested"
12:42 josephwb treemachine is happy to have such conflict in the graph, but cannot handle it via soft polytomies
12:42 mtholder yes. that is an issue as we get more trees (the contested bit).
12:43 josephwb btw the new db is up on devapi if you want to test things
12:43 mtholder it would be trivial to add an arg for "ott IDs to generate subproblem regardless of whether or not they are contested"
12:43 mtholder from the old, buggy subproblems?
12:43 jimallman hi folks, here now… catching up on above.
12:43 josephwb that would work for us
12:43 josephwb jimallman we have a distinct issue
12:43 josephwb sent via email
12:44 mtholder I mean "what is on devapi, was created from the old, buggy subproblems?"
12:44 josephwb the bits above need not worry you jimallman
12:44 josephwb old "fixed" subproblems ;)
12:45 josephwb it is the version that everyone signed off on, despite having many unsupported nodes
12:45 mtholder "fixed" how? by just adding the pruned taxa?
12:45 * jimallman is checking email...
12:45 josephwb basically, the version right before the contested-expansion
12:45 josephwb mtholder: yes
12:46 josephwb there were a (not-so-small) number of trees involved in the contested nodes. it was possible to find out what was missing (pruned) and add it back it
12:46 josephwb back in
12:48 josephwb regarding your idea of: "ott IDs to generate subproblem regardless of whether or not they are contested" that would work (i think)
12:49 mtholder it is an easy way to generate a constraint. but would need some tweaks in otcetera.
12:49 josephwb alternatively: "for contested taxa that are slated to be expanded, don't. just leave'em"
12:50 josephwb we don't need to have a separate subproblem for those contested taxa
12:50 josephwb probably don't want them
12:51 josephwb how i see things: the number of subproblems will remain the same, just the contents of some are different (those that have contested taxa as tips)
12:51 josephwb yeah, we would want the second option. don't know how hard that would be
12:53 josephwb sorry to bring this up out of the blue. was experimenting with weird unsupported nodes, found i could get rid of just about all of them
12:53 mtholder I don't know what to say.
12:53 josephwb jimallman do you understand my problem with the tree browser?
12:54 jimallman josephwb: yes, arguson is missing lots of info (email on the way)
12:54 josephwb it sees that there are supporting sources, but they are blank
12:54 josephwb oh, okay
12:55 josephwb curl calls work; i don't really understand the voodoo that arguson does
12:55 josephwb mtholder maybe we should video chat instead? this (meaning i) is not efficient
12:55 jimallman treemachine assembles the data, so my guess is that supporting information is missing from its graph db.
12:56 josephwb no it is there jimallman
12:56 mtholder I'm not in a spot where talking out loud works.
12:56 mtholder but I can chat before the call later, if you like.
12:56 mtholder I'll be home then.
12:56 josephwb ok. i need to get to the office. anytime that works for you will be fine
12:57 josephwb maybe between 11:00 and 12:00 my time?
12:57 josephwb so, starting 2 hours from now
12:57 mtholder how about 11:45 your time
12:57 mtholder my bus should have me home by then.
12:57 josephwb jimallman some nodes will say "supported by"
12:58 josephwb mtholder ok, see you then
12:58 josephwb jimallman the "supported by" is blank
12:58 josephwb some have several entries
12:58 josephwb let me see if i can find one
12:58 jimallman yes, there’s just enough information in sourceToMetaMap to indicate support, but no further details to show
12:59 josephwb ok, i will see wots wot
13:00 josephwb figured if curl calls worked, browser would. but arguson uses something else that i am not touching in testing
13:02 jimallman what kind of curl calls are you talking about? to treemachine / getSyntheticTree?
13:02 josephwb jimallman: https://devtree.opentreeoflife.org/opentree/otol.draft.22@3875492/Melanitta--Anas-bernieri
13:02 josephwb 2 sources there, just blank
13:02 josephwb is this a format thing?
13:03 jimallman it’s because sourceToMetaMap says there are two supporting studies, but then it has no more information to show
13:03 josephwb hmm
13:03 jimallman missing data (see screenshots in previous email)
13:03 josephwb ok, i think we are almost there, then
13:03 josephwb will do. have to catch bus. l8r
13:26 kcranstn joined #opentreeoflife
13:39 mtholder left #opentreeoflife
14:03 scrollback3 joined #opentreeoflife
14:03 kcranstn joined #opentreeoflife
14:13 5EXAA3F3S joined #opentreeoflife
14:16 64MACPRHN joined #opentreeoflife
14:18 josephwb joined #opentreeoflife
14:18 josephwb hey jimallman
14:18 josephwb i think i see what the biz is with supported sources
14:19 josephwb what information do you use from the sourceToMetaMap?
14:19 josephwb maybe just "ot:studyId"?
14:22 jimallman all the information that’s normally displayed comes from here
14:22 jimallman publication reference is the most obvious
14:23 josephwb you need it all?
14:23 jimallman also link to study in opentree, supporting tree id, curator name
14:23 jimallman i think we want it all. :)
14:23 josephwb ok
14:23 josephwb it came from nexsons. we don't use nexsons anymore
14:23 josephwb you cannot grab the remainder from "ot:studyId"?
14:24 jimallman see tree.opentreeoflife.org for example: click the ‘i’ next to Bacteria
14:25 jimallman if the information is not readily available in treemachine, we can probably add a quick fetch to get it on-demand. i’d need to review all the fields and see how much we need.
14:25 josephwb ok
14:25 josephwb it is possible to add it to the db
14:26 kcranstn let’s talk about the details in the latter half of today’s meeting
14:26 jimallman sounds good
14:26 josephwb you are currently getting:
14:26 josephwb "source" : "ot_104_1_a2c48df995ddc9fd208986c3d4225112550c8452"
14:26 jimallman yes, i noticed that the single ‘source’ value has key ids in it
14:26 josephwb could parse that: studyid_treeid
14:26 jimallman right, and SHA
14:26 jimallman not bad
14:27 josephwb yes
14:27 josephwb just an alternative
14:27 josephwb (work for you, not for me ;) )
14:27 jimallman jar286 has a long-standing request for this kind of support information on-demand, so maybe we can kill two birds with one stone.
14:27 josephwb but yeah, i can fix it on our end
14:27 josephwb whatever people want
14:28 kcranstn can the two of you come up with some (rough) estimates on time for the two approaches?
14:28 kcranstn deadline for resubmission looming
14:29 josephwb for me: need to 1) code (maybe a few hours, max), 2) rebuild the db (~6-8 hours until it is back up)
14:29 jar286 jar286’s request IIRC was for a better division of labor. stephen said it’s not treemachine’s job to provide this info and I agree. but whatever’s most expeditious at this point
14:29 jar286 some support info has to come from treemachine, but that could be keys to be used with oti for drilldown
14:30 josephwb at the moment we provide supporting sources as: studyid_treeid_gitsha only
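The studyid_treeid_gitsha key discussed above could be split on the client side roughly as follows. This is a minimal sketch, not code from treemachine or the tree browser: the `SourceKey` type and `parse_source` function are hypothetical names, and it assumes the study id may itself contain an underscore (as in "ot_104") while the tree id and the SHA do not.

```python
from typing import NamedTuple


class SourceKey(NamedTuple):
    """Hypothetical container for the three parts of a 'source' value."""
    study_id: str
    tree_id: str
    git_sha: str


def parse_source(source: str) -> SourceKey:
    # Split from the right: the study id is assumed to be the only part
    # that can contain '_' (e.g. "ot_104"). Raises ValueError if the
    # string has fewer than three '_'-separated parts.
    study_id, tree_id, git_sha = source.rsplit("_", 2)
    return SourceKey(study_id, tree_id, git_sha)


key = parse_source("ot_104_1_a2c48df995ddc9fd208986c3d4225112550c8452")
print(key.study_id, key.tree_id, key.git_sha)
```

The parsed study id and tree id could then serve as lookup keys for whatever service holds the bibliographic details, which is the drilldown idea raised above.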
14:30 jar286 right. so how hard would it be to get that from oti?
14:30 jar286 i mean, the additional info
14:39 josephwb joined #opentreeoflife
14:43 josephwb disconnected. that question was for jimallman, right?
14:44 josephwb the question "how hard would it be to get that from oti?"
14:44 jimallman i’ll need to check oti to see what’s available. i know we have some outstanding issues with single-vs-multiple values (oti will only give us one curator name, tag, etc).
14:46 josephwb erg. gotta go again. bbl8r
14:48 jimallman jar286: i’ll have more information on oti fields (vs. what we show for support in tree view) by today’s meeting.
15:07 jar286 ok…
15:08 jar286 it’s mainly bibliographic info
15:28 josephwb joined #opentreeoflife
15:28 josephwb joined #opentreeoflife
15:44 mtholder joined #opentreeoflife
15:45 mtholder josephwb: skyp or G+?
15:45 mtholder + an e in there somewhere.
15:46 josephwb either
15:46 kcranstn should we just have this conversation at the end of the call?
15:47 mtholder OK
15:47 josephwb fine by me
15:49 josephwb jimallman: any developments on how to get the metadata in the tree browser?
15:50 jimallman josephwb: i’m chasing the data to make sure we can get it on-the-fly from oti, almost ready
15:57 jar286 if getting it on the fly is too slow we can optimize by creating a special oti service specializing in what the browser needs… but that should be done only if necessary, the simpler solution is better
15:57 jar286 that is, reduce the number of api calls by bundling
15:58 jar286 my guess is that doing them one node-view at a time might be fast enough
16:01 josephwb sweet
16:14 jar286 josephwb, do you have a way to do diffs on versions of synthetic trees?
16:15 josephwb no
16:16 jar286 dommage.
16:25 josephwb what are you looking for?
16:26 pmidford2 joined #opentreeoflife
16:26 jar286 wondering how you do regression testing. for ott there is a diff file I can look at, which is nice
16:37 josephwb we haven't set anything up
16:39 josephwb suggestions welcome
17:49 jar286 josephwb, at the very least I think you’d want to be able to answer the question, which components have changed since the last run?
17:51 jar286 then you might want, how many (or which) nodes (ingroups) have disappeared?
17:51 jar286 this goes to the question, did the fix to Mark’s code have any real effect on the result?
17:55 josephwb right, i get the purpose, but not how to pull it off.
17:56 josephwb we've had changes to taxonomy, tip labels, and trees
17:56 jar286 you mean algorithmically?
17:56 josephwb yes
17:58 jar286 maybe you could use bit sets, or something like https://github.com/OpenTreeOfLife/reference-taxonomy/wiki/Mutual-MRCA-method-for-finding-merge-compatible-pairs-without-set-operations
17:59 jar286 i see, you’re worried about isolating the cause of a change?
17:59 josephwb yeah
17:59 jimallman joined #opentreeoflife
18:00 jar286 hmm. i was thinking that a bug fix would mean small changes.
18:00 jar286 if a single bug fix leads to changes in taxonomy and tip labels and trees, that makes things much harder.
18:01 josephwb within a single taxonomy and labelling approach, bit sets sounds good
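One way to picture the bit-set idea for comparing two synthetic trees is to reduce each tree to its set of clades (each clade being the set of tips below an internal node) and take set differences. The sketch below is an illustration only, under the assumption stated above that both trees share one taxonomy and one tip-labelling scheme. Trees are represented as nested tuples of tip labels, frozensets stand in for bit sets, and `clades` and `diff_trees` are hypothetical names.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip labels) in a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            # A tip contributes only itself to its ancestors' clades.
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def diff_trees(old, new):
    """Return (clades lost since the old run, clades gained in the new run)."""
    old_clades, new_clades = clades(old), clades(new)
    return old_clades - new_clades, new_clades - old_clades


old = ("A", ("B", ("C", "D")))
new = ("A", (("B", "C"), "D"))
lost, gained = diff_trees(old, new)
print(sorted(sorted(c) for c in lost))
print(sorted(sorted(c) for c in gained))
```

A report of lost and gained clades answers "which nodes have disappeared?" but, as noted above, it does not by itself isolate whether a taxonomy, label, or input-tree change caused the difference.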
18:35 jar286 josephwb, is it fair to say that we first preprocess each tree, then divide the tree set into subproblems, then for each subproblem {construct TAG, do synthesis}, and finally combine the subproblem-trees to make a whole tree?
18:36 jar286 the document is rather confusing, saying different things in different places
18:37 kcranstn probably better to put questions and comments in the doc than here
18:37 kcranstn cody and stephen rarely here
18:37 jar286 yeah…
18:38 jar286 I want to make a whole bunch of changes
18:38 kcranstn better to add a re-written section than be deleting and heavily revising, I think
18:39 kcranstn at this point
18:39 mtholder I don't think that we do synth on each subproblem.
18:39 kcranstn that’s what cody confirmed
18:39 mtholder I think that all of the subproblems are loaded
18:40 jar286 oh. that’s weird. ok.
18:40 mtholder they don't overlap in a meaningful way, so they don't lead to a combinatorial explosion of nodes and edges.
18:41 jar286 right, but if that’s the case, why keep them all in memory at the same time. that invites worry about whether there is information leakage.
18:41 mtholder works like doing them separately and attaching them later, for all intents and purposes.
18:41 mtholder they're in a db not RAM.
18:41 mtholder (I think)
18:42 jar286 not sure that matters… I don’t think the db is used for persistence during the tag/synthesis process, and everything that’s active is cached in memory, so db is sort of an irrelevant detail
18:44 jar286 I’m mulling a reorg with top level sections of ott, treestore, preprocessing, decomposition, TAG, synthesis
18:45 josephwb hey
18:45 josephwb all subproblems are loaded
18:45 kcranstn that’s basically what we have now, but synthesis includes preprocessing, decomp, TAG + synthesis
18:45 kcranstn and is a mess of stuff written by different people at different times
18:46 josephwb there is no individual synth + combining of subproblems
18:46 jar286 currently decomp follows synthesis. this is not the logical order
18:46 josephwb all subproblems are loaded, then we do synth as before
18:46 jar286 ok
18:47 josephwb to cut down on memory, only bits are done at a time
18:47 josephwb these do correspond to subproblems
18:47 josephwb but it is a single synthesis
18:47 josephwb if that makes sense?
18:47 kcranstn let me make some heading changes to reflect authors
18:48 jar286 I made the ToC more real
18:48 jar286 if synthesis is driven off of the TAG, and the TAG covers the whole tree, how has any memory been saved?
18:48 josephwb so, to be clear: single TAG, single synthesis (although synthesis does a bit at a time to save on memory)
18:49 jar286 if there is a single TAG, how have the subproblems helped you?
18:49 josephwb loading order problems
18:49 josephwb identifies where we can do the subsynth problems to save on memory
18:50 jar286 what memory is saved? bit set formation? what else?
18:51 jar286 is the final TAG you get any different?
18:51 josephwb keeping all synthesis decisions in memory
18:51 kcranstn jar286 - refresh the TOC to get a better sense of history of doc
18:52 jar286 not sure how much I care about how it got into its current state…
18:52 kcranstn helps with why sections ordered the way they are
18:52 josephwb jar286: all conflicting paths are kept in memory until a decision is made
18:53 kcranstn (not that ordering will stay this way)
18:53 jar286 ok, so paths, not just bit sets…
18:54 josephwb yes
18:54 jar286 ‘decision’ re TAG creation or re synthesis?
18:54 kcranstn that’s why there is a decomp section after synthesis
18:54 josephwb synthesis
18:54 josephwb the TAG is done at that point
18:54 josephwb done at loading
18:54 jar286 I thought the TAG was the only input to synthesis. you’re saying the paths are also an input?
18:55 josephwb the paths are the traversals during synthesis
18:55 josephwb some paths conflict
18:55 josephwb keep all of them in memory until we have to choose
18:55 josephwb i am not being clear
18:56 josephwb every edge is traversed
18:56 josephwb in the TAG
18:56 jar286 I was asking about the TAG creation phase, why it uses less memory when done divide-and-conquer rather than all at once, and whether the final TAG is the same in both cases
18:57 josephwb from amongst conflicting edges
18:57 jar286 you said synthesis is not done divide and conquer, so it can only be affected if the TAG is different with and without
18:57 jar286 divide and conquer
18:57 mtholder my interpretation: no, not the same TAG.
18:57 josephwb subsproblems on loading solves ordering issues
18:57 mtholder subp not subsp
18:57 josephwb subproblems on synthesis solves memory issues
18:59 josephwb mtholder ?
18:59 mtholder yes
18:59 jar286 but i thought you just said you don’t do subproblems for synthesis.
18:59 josephwb there is only ever a single TAG
18:59 mtholder the findSums and walkPaths don't scale well when the input trees overlap a lot.
18:59 mtholder the subproblems break things up.
19:00 mtholder 1 input tree becomes a forest of tree fragments.
19:00 jar286 if you do a run without divide-and-conquer, and one with, then there are two TAGs, one for each run. my question is whether they’re the same TAG, or different ones.
19:00 mtholder fewer of the fragments overlap, so the scaling is a lot better.
19:00 josephwb this is how synthesis proceeds: start at root, determine the topological ordering, when subproblems are encountered do them, release memory, go on to next
19:00 mtholder Different
19:01 josephwb but a single traversal
19:01 jar286 i’m really puzzled now.
19:01 jar286 what is the input to synthesis?
19:01 jar286 or, what are the inputs?
19:01 josephwb the TAG with all subproblems loaded
19:02 josephwb every edge/node is traversed
19:02 jar286 is that TAG the same as what you’d get if there were a different division into subproblems?
19:02 josephwb should be
19:02 kcranstn really? doesn’t seem like it would
19:02 josephwb subproblems do not conflict with each other
19:03 jar286 ok, so then synthesis cannot know about the subproblem structure, unless it’s a second input to synthesis
19:03 jar286 since the TAG does not encode the subproblem breakdown
19:03 josephwb well, the subproblems are loaded, which treemachine does, so it knows about them
19:03 jar286 as you just said
19:03 jar286 ok, so the subproblem structure is a second input to synthesis. that’s what i’ve been asking
19:04 josephwb i guess, yeah
19:04 josephwb but that is built into the db
19:04 josephwb TAG
19:04 josephwb there is an index, in the TAG, of the roots of the subproblems
19:05 jar286 I’m sort of tearing my hair out here
19:05 josephwb they are all connected
19:05 josephwb i think video would be more efficient here
19:06 codiferous joined #opentreeoflife
19:07 kcranstn hey codiferous - thanks for joining
19:07 jar286 well maybe your last comment answers the question. I think this is important for the method description. I’ll ask Cody about it
19:07 codiferous hey kcranstn. just looking through the history now
19:09 josephwb in case it was not obvious: the TAG contains the taxonomy prior to loading of the subproblems
19:11 josephwb because subproblems are all ott taxa, they are all connected
19:11 josephwb so: single synthesis
19:14 codiferous just going to address things i see in the log as i see them.
19:14 codiferous jar286: "I don’t think the db is used for persistence during the tag/synthesis process, and everything that’s active is cached in memory, so db is sort of an irrelevant detail"
19:14 codiferous the db is used for persistence
19:14 jar286 oh, how?
19:14 jar286 I mean beyond making the webapp work
19:14 codiferous each subproblem is loaded independently into the db
19:15 codiferous once they are all done, then synthesis is performed over the entire db (i.e. tag)
19:16 codiferous jar286: "if synthesis is driven off of the TAG, and the TAG covers the whole tree, how has any memory been saved?"
19:17 codiferous the synth procedure is an iteration over all subproblems, in topological order wrt the taxonomy
19:17 jar286 sounds like divide-and-conquer is mainly for synthesis, not TAG construction?
19:18 josephwb divide-and-conquer avoids the ordering problems
19:18 josephwb on loading
19:18 codiferous well, the loading procedure does each subproblem independently
19:18 codiferous so i would say that is pretty divided
19:18 codiferous and the division greatly reduces the complexity of loading
19:18 jar286 i’m sorry i’m not being clear.
19:21 jar286 yes, that’s what i’m trying to figure out, how does d&c reduce complexity of loading? does it only affect running time of loading, or does it also affect the TAG that results?
19:22 jar286 i.e. does the TAG have fewer nodes and/or edges as a result of d&c?
19:22 kcranstn from mth: “This decision can reduce the total number groups from input trees which are displayed by the supertree because some cases of conflict with the taxonomy only arise through the interaction of multiple input trees. In other words, the decision to constrain uncontested taxa may affect the output supertree (not just the running time)"
19:22 jar286 I was asking about TAG creation, not synthesis
19:22 kcranstn that is from the "Decomposition into subproblems by uncontested taxa" section
19:23 codiferous yes, the tag will have fewer nodes/edges when divided into subproblems, because the input tree nodes/taxa that go into each subproblem are only considered against one another
19:25 codiferous without subproblems, all input tree nodes are compared against one another
19:25 jar286 ok. so, in effect, logically, you have lots of little TAGs, which just happen to live in the same database, right?
19:25 codiferous you could think of it like that
19:26 codiferous but they have overlap at common nodes
19:26 jar286 yes, but that’s easy since the common nodes are all labeled
19:26 kcranstn there are edges between the little TAGs
19:26 codiferous no edges between the little tags
19:26 jar286 not edges, nodes
19:26 codiferous but nodes in common
19:27 codiferous each subproblem root node is also a tip node for another subproblem (except the deepest subproblem)
19:27 kcranstn right
19:27 jar286 so my next question is, is synthesis of subproblems mutually independent? that is, even though it happens that it’s all done in one address space and in one fell swoop, would it be possible in principle to do a bunch of little separate syntheses, and then stitch them together?
19:28 josephwb subproblems would need to know boundaries of subproblems; we only have roots
19:28 codiferous in essence that is what it does. but they are not actually independent: each subproblem synth depends on the result of synth for its child subproblems
19:29 jar286 how is that possible?
19:29 codiferous how is what possible? that it depends on the results for its children?
19:29 jar286 yes
19:29 codiferous hm, because it is evaluating sets of paths
19:29 codiferous and it needs to know what paths to compare
19:30 josephwb we only store subproblem roots in the index, not the full subproblem breadths
19:30 josephwb need to do children before parents
19:31 josephwb e.g. subprobs might be: Birds, Waterfowl, Ducks, some genus of duck
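The "children before parents" requirement amounts to a topological order over the nesting of subproblem roots. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9+), not treemachine's implementation. The nesting map is a hypothetical example built from the names above, with "Anas" assumed as the unnamed duck genus.

```python
from graphlib import TopologicalSorter

# Hypothetical nesting of subproblem roots: each key maps to the
# subproblem roots nested directly inside it, which must be solved first.
nested_in = {
    "Birds": {"Waterfowl"},
    "Waterfowl": {"Ducks"},
    "Ducks": {"Anas"},
}

# TopologicalSorter treats the mapped values as predecessors, so
# static_order() yields every child subproblem before its parent.
order = list(TopologicalSorter(nested_in).static_order())
print(order)
```

With a branching nesting there are many valid orders, and any of them satisfies the constraint.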
19:31 jar286 let me try again.  suppose you did synthesis on each subproblem separately, on its own little mini-TAG. then, you stitch all those trees together to get a big tree. would that big tree be different from the big tree you get with the current method?
19:32 jar286 I’m trying to think about this in terms of input/output flow, not tied to the accidents of the implementation.
19:32 codiferous i think it could be different, yes
19:32 codiferous let me try to break it down a little
19:33 jar286 ‘could be’ might mean because of nondeterminism, and that’s not what I’m talking about
19:33 josephwb i don't think each subproblem is guaranteed to be in the synthetic tree
19:33 codiferous i think they are
19:33 josephwb er, maybe
19:33 josephwb trying to think of an example
19:33 josephwb i think you are right, tho
19:34 codiferous synth is optimizing based on information about represented trees
19:34 codiferous so consider some subproblem A that has children X and Y. without solving synth on X and Y, it's not known what tree info is going to be represented in those synth results
19:35 codiferous but the decision whether to include some other edge q in A might depend on that information
19:35 jar286 why would the X or Y synth be relevant to A? we already know there’s no conflict across subproblem boundaries.
19:36 codiferous ah, yes that is true
19:36 codiferous but that is just taxonomic overlap that we know does not exist
19:38 jar286 none of the source trees conflict with any boundary node.
19:38 codiferous the synth also uses information about edges in source trees, but it may be true there as well that one source tree edge cannot be represented in multiple subproblems
19:40 codiferous currently, synth uses topo order because the implementation requires it, otherwise it would crash from lack of available data it expects
19:40 codiferous but i see why it seems like synth may be possible on each subproblem independently
19:40 jar286 there are many topo orders, and at least one is consistent with the subproblem structure
19:41 codiferous any topo order wrt the taxonomy is fine, there are many
19:42 codiferous i suppose the difference in implementation would just be that within some subproblem, it would treat the root nodes of other subproblems as tips
19:43 jar286 and that would yield the same overall result, I believe
19:43 codiferous i do not see any reason why that should not work. that said, i have not tried that at all
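The alternative floated here (solve each subproblem with the roots of other subproblems treated as tips, then join the results) can be pictured with the sketch below. It is purely illustrative and, as stated above, untried in practice: `stitch` and the `solved` mapping are hypothetical, trees are nested tuples of labels, and internal node labels are dropped for brevity.

```python
def stitch(root, solved):
    """Expand every tip that is a subproblem root into that subproblem's tree.

    solved maps a subproblem root name to its solved tree, whose tips may
    themselves be the roots of other subproblems.
    """
    def expand(node):
        if isinstance(node, str):
            # A tip naming another subproblem is replaced by its solution.
            return expand(solved[node]) if node in solved else node
        return tuple(expand(child) for child in node)

    return expand(solved[root])


# Hypothetical per-subproblem results; lowercase labels are ordinary tips.
solved = {
    "Birds": ("Waterfowl", "x"),
    "Waterfowl": ("Ducks", "y"),
    "Ducks": ("d1", "d2"),
}
print(stitch("Birds", solved))
```

Because each subproblem root is a tip of exactly one parent subproblem, every solved tree is slotted in exactly once.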
19:44 jar286 if we can say in the writeup that the treatment of the subproblems is independent, that will be a huge relief for readers
19:44 codiferous i agree
19:44 jar286 it doesn’t matter to me if it’s not implemented that way, so long as it’s true
19:45 josephwb when ranking is the criterion, it should be the same. but not if, say, all else being equal, decisions were made based on number of descendant taxa.
19:45 josephwb or resolution
19:45 codiferous well, the reasoning is:
19:46 codiferous 1. all root-tip paths within a given subproblem must pass through that subproblem root, so all subproblem roots must be in the synthetic tree
19:47 jar286 check
19:48 codiferous 2. within a given subproblem, each other subproblem root node must therefore be present in the synthetic tree for that subproblem
19:48 jar286 I’m not worried about non-priority syntheses for purposes of the supplementary materials, since we don’t report on them
19:49 codiferous assuming that any correct synth includes all possible non-conflicting nodes, there should not be any correct synth result that does not include the subproblem roots
19:49 jar286 check
19:50 jar286 won’t every synth include all tips?
19:51 kcranstn joined #opentreeoflife
19:51 codiferous yes. however, sometimes we add some at the end
19:52 codiferous any tips that are only accessible via paths that are in conflict with a higher ranked path
19:52 jar286 so if you did subproblem-wise synthesis, you’d never drop a subproblem root (appearing as a tip in its parent)
19:53 codiferous well, i'm not sure what reasoning i'd give for why we would *never* do that, but i don't think currently we are
19:54 codiferous it would be pretty obvious: that subproblem would be replaced entirely by taxonomy
19:55 jar286 that doesn’t make sense, if the root of the subproblem isn’t in the synthetic tree, how can anything at all be below it, taxonomy or not?
19:55 codiferous it would be replaced
19:55 codiferous but i agree it doesn't make sense
19:56 jar286 … this is not too important, I’m satisfied that there is independence of synthesis for subproblems
19:56 jar286 at least for priority order
19:56 codiferous i just don't know how to explain why we would never drop a subproblem root in the same way we sometimes drop tips
19:57 codiferous but it seems like there should be a reason why this would not happen
19:57 jar286 you said above that every synth will include every tip.
19:57 codiferous ah, sorry for not being clear
19:57 codiferous the result will have all tips
19:57 codiferous the final result, that is
19:57 codiferous because we do something called "adding missing children" to replace tips that are not included in the raw result from synth itself
19:58 codiferous it is possible to lose a tip, if the only paths that go to it are all in conflict with higher ranked paths
19:58 jar286 ah, and *this* operation might not commute with d&c …
19:59 josephwb for non-monophyletic "taxa", unsampled tips have only a taxonomy relationship (to the non-monophyletic node), so have to be added in later
19:59 josephwb jar286 - yes, mark throws out unsampled taxa when doing the decomposition
19:59 jar286 ok, but a boundary node is never one of those
19:59 josephwb yeah
20:00 codiferous yes, it seems like it could be tricky to "add missing subproblems" instead of tips if for some reason we could fail to include a subproblem root in the synth result for a parent subprob
20:00 jar286 but there still might be a difference in what big tree you got, depending on when you did this adding-back (componentwise or final)
20:01 jar286 my guess is that you’ll get the same answer, or pretty darn close
20:01 jar286 in any case this is not fatal to the idea of d&c/synth commutativity
20:01 codiferous hm, it depends on the level of non-monophyly in the parent taxon of the missing tip i think
20:02 codiferous ah, but no taxon can ever be divided among two different subproblems correct?
20:03 jar286 I would assume so, otherwise there’d be some kind of conflict
20:04 codiferous and all subproblem roots have to be monophyletic taxa. so i think the add missing children operation should be contained within the appropriate subproblem
20:05 codiferous that procedure just visits each taxon node and attaches any missing children of that taxon to the mrca of the non-missing children in the synth tree
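The "adding missing children" step as described in the line above might look roughly like this. It is a sketch under stated assumptions, not treemachine's code: the synthetic tree is a child-to-parent dict, the taxonomy is a taxon-to-children dict, all function names are hypothetical, and the handling of a taxon with only one child present (attach to that child's parent) is an assumption that the discussion does not cover.

```python
def ancestors(parent, node):
    """Path from node up to the root of the synth tree, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(parent, nodes):
    """Deepest node that lies on every node's path to the root."""
    common = ancestors(parent, nodes[0])
    for other in nodes[1:]:
        seen = set(ancestors(parent, other))
        common = [n for n in common if n in seen]
    return common[0]


def add_missing_children(taxonomy, parent):
    """Attach taxonomy children absent from the synth tree; mutates parent."""
    in_tree = set(parent) | set(parent.values())
    for taxon, children in taxonomy.items():
        present = [c for c in children if c in in_tree]
        missing = [c for c in children if c not in in_tree]
        if not present or not missing:
            continue
        if len(present) == 1:
            # Assumption: with one child present, use that child's parent.
            anchor = parent.get(present[0], present[0])
        else:
            anchor = mrca(parent, present)
        for child in missing:
            parent[child] = anchor
    return parent


taxonomy = {"T": ["a", "b", "c"]}                    # hypothetical taxon
synth = {"a": "n1", "b": "n1", "n1": "root"}         # "c" was dropped
print(add_missing_children(taxonomy, synth)["c"])
```

Restricting the `taxonomy` argument to the taxa inside one subproblem would give the per-subproblem variant discussed in the following lines.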
20:07 codiferous but i'm not sure how you know which taxa to visit if you do it independently for each subproblem
20:08 jar286 each subproblem defines a subtree of the taxonomy.  wouldn’t it be those?
20:09 codiferous yes, that should be it
20:13 josephwb but subprobs may not include all taxa
20:14 jar286 if you take the parts of the taxonomy delimited by the boundary nodes of subproblems, and union them all together, you get the whole taxonomy
20:14 josephwb nevermind, i missed some stuff
20:15 josephwb subproblem boundary nodes do not include unsampled taxa
20:15 jar286 ok, thanks codiferous, you’ve confirmed my conjectures
20:16 josephwb unsampled taxa are not their own subproblems
20:19 codiferous np, glad i can help. thanks for helping improve the presentation. fwiw, stephen also agrees
20:20 codiferous i'm going to go now, but will check here/the docs later this evening
21:07 pmidford2 joined #opentreeoflife
23:26 kcranstn joined #opentreeoflife
23:55 kcranstn joined #opentreeoflife
