IRC log for #opentreeoflife, 2015-05-14

All times shown according to UTC.

Time Nick Message
00:06 jar286 joined #opentreeoflife
01:03 kcranstn joined #opentreeoflife
02:26 jar286 joined #opentreeoflife
04:16 kcranstn joined #opentreeoflife
04:55 kcranstn joined #opentreeoflife
11:35 josephwb joined #opentreeoflife
11:52 kcranstn joined #opentreeoflife
12:21 blackrim joined #opentreeoflife
12:39 kcranstn joined #opentreeoflife
13:48 josephwb joined #opentreeoflife
14:04 jar286 kcranstn, josephwb, blackrim, I think we said we’d touch base via hangout today at 10, is that your recollection?
14:04 codiferous joined #opentreeoflife
14:04 kcranstn I can lurk on the doc / via irc but can’t join a hangout (in midst of workshop at Duke)
14:05 jar286 unfortunately I’m forgetting the agenda. was it just to see what’s needed in the home stretch?
14:05 jar286 I think it’s pretty close
14:06 kcranstn we were hoping that we would now understand the TAG creation and could look at synthesis
14:07 jar286 I’m hoping for a first-principles discussion at some point, I just don’t know if this is that point
14:09 codiferous right. i'm here, and can be on a hangout if that helps
14:10 codiferous will be working on the doc as well
14:13 blackrim I am here and working on things but get pulled away to deal with some SSB planning things
14:13 josephwb do we know yet where the processed newicks are going to live?
14:14 jar286 any reason not to put them on files.opentreeoflife.org ?
14:15 kcranstn joined #opentreeoflife
14:15 josephwb fine with me, but no decision has been made.
14:15 kcranstn that works for me
14:16 jar286 you can upload there using scp or (better) rsync
14:16 josephwb it's been a standing question for a while: https://github.com/OpenTreeOfLife/treemachine/issues/170
14:16 jar286 create a new top level directory, sibling of ott/ and trees/
14:16 josephwb ok, will do
14:16 jar286 they’ll get copied into the dryad deposit
14:17 josephwb will do it shortly
14:17 jar286 make a little index.html file so someone stumbling on it will know what it is (with a date and so on)
14:18 jar286 is the supplement going to be published as it, or does the publisher do copy editing and formatting?
14:18 jar286 s/as it/as it is/
14:19 codiferous typically it is as-is
14:19 jar286 are we planning on converting to another format?  there’s a fair amount of formatting infelicity
14:26 jar286 looks like the testsums part is not done. that looks to me like the biggest hole
14:27 codiferous yes, i agree. i'll see what i can do there now
14:28 kcranstn I’ll pull it off google docs and do the final format offline
14:28 kcranstn so don’t worry about formatting right now
14:29 jar286 terrific, thanks
14:29 kcranstn I also want to re-post a revised biorxiv preprint that includes the supplement
14:31 josephwb jar286 i thought of just putting up a gzipped archive of the newicks. do you instead want individual files? maybe both?
14:31 josephwb individual files == not gzipped
14:31 jar286 gzipped is fine.  but it should still be in a subdir of the root
14:31 jar286 with an index.html
14:32 josephwb understood
14:32 josephwb but do we want people to be able to download just individual files?
14:32 mtholder-kiwi joined #opentreeoflife
14:32 jar286 we can do that later if there’s demand.
14:32 josephwb e.g. the treemachine service could link directly to those
14:32 jar286 oh…
14:32 josephwb instead of "not available"
14:32 jar286 foo.
14:33 jar286 well why not.  sure.  I had forgotten about the service
14:33 josephwb [didn't mean to complicate things]
14:33 josephwb ok
14:33 josephwb thanks
14:36 codiferous josephwb, if you are going to update the treemachine service messages, would you also add 'http://files.opentreeoflife.com' to the error messages for the too-many-nodes-in-your-request case?
14:37 josephwb sure thing.
14:37 josephwb good idea
14:38 josephwb new tree is not up there yet
14:38 codiferous yeah, i have had a couple of emails asking where people can download the trees when they want more than the web service allows, guessing that may not be an uncommon thing
14:38 josephwb i will put that up too
14:38 jar286 there is a trees/ directory but that’s for whole synthetic trees
14:39 josephwb just let me know how you want things named/organized, and i will do it
14:41 jar286 we have synthesis version numbers now, right?
14:41 josephwb hmm
14:41 josephwb the current one is 3.0
14:41 jar286 so we could have /preprocessed/v3.0/index.html, all.tgz, trees/studyid_treeid.tre
14:42 jar286 or something like that
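A minimal Python sketch of the layout jar286 proposes at 14:41: a versioned directory holding an index.html, an all.tgz archive, and one trees/studyid_treeid.tre file per source tree. The function name build_preprocessed_release, the dict-of-Newick-strings input, and the HTML contents are hypothetical; this is not the procedure actually used to populate files.opentreeoflife.org.

```python
import datetime
import tarfile
from pathlib import Path


def build_preprocessed_release(root: Path, version: str, trees: dict[str, str]) -> Path:
    """Write <root>/preprocessed/<version>/ with trees/*.tre, all.tgz and index.html.

    Hypothetical helper: `trees` maps a "studyid_treeid" name to a Newick string.
    """
    release = root / "preprocessed" / version
    tree_dir = release / "trees"
    tree_dir.mkdir(parents=True, exist_ok=True)

    # One uncompressed Newick file per source tree, so individual files can be linked directly.
    for name, newick in sorted(trees.items()):
        (tree_dir / f"{name}.tre").write_text(newick + "\n")

    # One gzipped tar archive of the whole trees/ directory for bulk download.
    with tarfile.open(release / "all.tgz", "w:gz") as archive:
        archive.add(tree_dir, arcname="trees")

    # A small index page saying what the directory is and when it was made.
    today = datetime.date.today().isoformat()
    links = "\n".join(f'<li><a href="trees/{n}.tre">{n}.tre</a></li>' for n in sorted(trees))
    (release / "index.html").write_text(
        "<html><body>\n"
        f"<h1>Preprocessed source trees for synthesis {version}</h1>\n"
        f'<p>Generated {today}. <a href="all.tgz">all.tgz</a> holds every tree.</p>\n'
        f"<ul>\n{links}\n</ul>\n"
        "</body></html>\n"
    )
    return release
```

Keeping both the per-tree files and the archive covers the two uses discussed above: bulk download, and direct links from the treemachine service to individual trees.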
14:42 josephwb alright, i will get something up, and you can alter to your liking
14:45 codiferous jar286 in regard to first principles, the idea is to identify overlap and conflict among trees
14:46 codiferous that enables the synthesis procedure
14:46 jar286 that’s too vague.
14:46 jar286 what I’m asking is, how do I know when there are enough/too many nodes or edges? how do I tell whether one TAG is better than another?
14:47 jar286 or, whether a TAG is right or wrong given a set of input trees?
14:47 codiferous i think "too many" would mean nodes that are not supported by the testsum procedure
14:47 jar286 which is not documented (in the doc)
14:47 codiferous please feel free to add it wherever you think it would be helpful
14:48 codiferous i am just trying to help think through the answer to the question here
14:48 jar286 there is a place for it. I thought someone else was working on this, you or Mark or Stephen
14:48 codiferous i have not been working on it
14:49 jar286 and what was written there previously was not at all helpful - seemed complicated and arbitrary
14:49 codiferous one tag is better than another if it identifies more overlap/conflict than the other
14:49 codiferous the question of rightness seems a little trickier
14:49 markholder joined #opentreeoflife
14:50 codiferous i think it might make more sense as a question of quality
14:50 jar286 how are overlap and conflict identified? by presence of nodes in the TAG? by edges? still pretty vague
14:50 jar286 quality was one of the questions I asked
14:50 codiferous overlap is: paths in the tag from different trees passing through the same nodes
14:51 jar286 what I hear is: you want to maximize the number of supported nodes (although I don’t know what that means), and you want to maximize the number of edges, subject to the NCO invariant
14:52 jar286 another way to say it: if you had unlimited, infinitely fast computing resources, what would be the best TAG you could make?
14:52 codiferous well, we don't want to maximize the number of edges, we want to maximize the number of edges that identify overlap and conflict
14:52 codiferous hence not adding extra edges among nodes within a tree
14:53 codiferous hm. that tag would contain all possible nodes that pass testsum
14:54 codiferous and all nco edges among them that correspond to edges in the input trees
14:54 codiferous i think
14:54 jar286 I don’t get that about edges. (a) why is it incorrect to have too many nco edges, (b) given an edge how do you tell whether it ‘identifies overlap and conflict’
14:54 codiferous ('correspond to' = map trees conditions)
14:55 codiferous i suppose "identifies overlap and conflict" may be too vague/strict of a condition
14:55 codiferous we want all the edges that can be associated with an edge in a tree
14:56 codiferous identification of overlap and conflict should result from that
14:56 jar286 you mean, you want an edge for every nco relationship that would yield an edge associated with an edge in a tree?
14:56 codiferous not sure what you mean by yield
14:57 jar286 x nco y yields the edge (x,y)
14:57 jar286 (or (y,x) I can never remember the polarity)
14:58 codiferous ok, yes, i think that is correct
14:59 jar286 so the idealized algorithm is, create all possible supported nodes, then create all possible associated edges (which necessarily satisfy nco)
14:59 jar286 so supported is at the heart of TAG creation, and does not occur in the writeup.
15:01 jar286 the definition in 2.2.3 is completely opaque.
15:07 jar286 are you saying that if I’m to understand node support, I am going to have to reverse engineer it?
15:09 codiferous hm, i hope not
15:11 jar286 well, that’s what I’m starting to do. what is the alternative?
15:12 codiferous are you looking at the code then? i have yet to make it through the last bit with comments
15:13 jar286 I’m looking at section 2.2.3 from ‘Cody’s notes’
15:13 jar286 or text
15:15 jar286 I can’t even figure out the quantifiers
15:15 codiferous don't look at that
15:16 jar286 shall I go do something else and come back to this later?  for me this is really the only major piece of unfinished business
15:16 codiferous look at the code, i've simplified it and made a lot of comments: https://github.com/OpenTreeOfLife/treemachine/blob/master/src/main/java/org/opentree/tag/treeimport/BipartOracle.java#L737
15:16 codiferous that old 2.2.3 text should just be removed
15:16 jar286 why did you say ‘don’t look at that’? who wrote that text, you or Mark
15:17 jar286 I don’t really want to look at the code
15:18 jar286 and it looks like some of the reverse engineering has already been done, by way of 2.2.3, so that seemed a reasonable starting point
15:19 josephwb markholder did you figure out why our breakdowns differ?
15:22 josephwb codiferous: jar286 noted a problem with the resolved comment involving the correspondence of tree and scaffold edges. i think things might be flipped in direction, and 1 line is just wrong. i put what i think might be correct in a comment.
15:23 markholder no
15:23 codiferous i see. well the code is a lot easier to understand than the text in 2.2.3
15:23 markholder but I didn't really understand your last email to me on that thread.
15:24 codiferous jar286 you don't really even need to look at the code. just read the comments in the code
15:27 josephwb markholder: ignore the "Sum_input_IDs.txt" bit; that was just describing an alternate approach (not using unions)
15:27 josephwb we get the same number of unique source tip IDs
15:28 josephwb [in R] I ask: "how many of those tip IDs are terminal?" the code just sums the values of "TRUE"
15:29 jar286 by ‘reverse engineering’ I mean figuring out what the code accomplishes, as opposed to how it does it.  something as central as this has got to have a simple (nonexecutable) specification.
15:29 codiferous "This test determines if, given a potential merger node x, there exists some set D(x) of nodes in input trees, of which each node in D(x) is merge-compatible with x, whose cumulative phylogenetic information is sufficient to imply the separation of all the taxa in x’s ingroup from all the taxa in x’s outgroup."
15:31 josephwb markholder: strike that. [in R] I ask: "for each of those tip IDs, is it terminal?" the code just sums the values of "TRUE"
15:32 josephwb ah, i think i might know what is going on.
15:32 jar286 that’s not worded in a way that I can understand
15:33 josephwb you are comparing against a pruned taxonomy. i am comparing against the whole thing. do you throw out terminal taxa if only higher taxa are sampled? that would explain why your terminal numbers are higher.
15:33 jar286 by ‘cumulative phylogenetic information’ you just mean the merger of all the D(x), I presume
15:34 jar286 and by ‘sufficient to imply’ you just mean that x’s ingroup is a subset of the merger’s ingroup, and similarly outgroup
15:39 codiferous cumulative phylogenetic information: yes
15:40 codiferous sufficient to imply: i don't think it's as simple as x's ingroup is a subset of the merger's ingroup, although that is a requirement (because each element of D(x) must be merge-compatible with x)
15:43 jar286 I had read that sentence before, and it sounded like handwaving as a substitute for 2.2.3, which was suppressed due to complexity and dryness
15:43 jar286 so what does sufficient to imply mean?
15:44 jar286 the only kind of implication between nodes is ingroup/outgroup subsetting
15:45 jar286 clade hypothesis A|B implies c.h. C|D if C is a subset of A and D is a subset of B
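The implication rule stated at 15:45 reduces to two subset tests. In this sketch the names CladeHypothesis and implies are hypothetical, a clade hypothesis is a pair of frozensets, and single letters stand for taxa, as in codiferous's examples later in the log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CladeHypothesis:
    """Hypothetical representation of a clade hypothesis ingroup|outgroup."""
    ingroup: frozenset
    outgroup: frozenset

    @classmethod
    def parse(cls, text: str) -> "CladeHypothesis":
        # "ACD|EFG" -> ingroup {A, C, D}, outgroup {E, F, G}; one letter per taxon.
        left, right = text.split("|")
        return cls(frozenset(left), frozenset(right))


def implies(h: CladeHypothesis, k: CladeHypothesis) -> bool:
    """A|B implies C|D when C is a subset of A and D is a subset of B."""
    return k.ingroup <= h.ingroup and k.outgroup <= h.outgroup


print(implies(CladeHypothesis.parse("ACD|EFG"), CladeHypothesis.parse("AC|G")))  # True
print(implies(CladeHypothesis.parse("AC|G"), CladeHypothesis.parse("ACD|EFG")))  # False
```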
15:46 josephwb markholder: i sent an email summing up my "understanding" so I don't pollute this thread further on this topic.
15:46 jar286 it’s enrichment, not pollution
15:48 markholder OK will look when I get a chance. My numbers are coming from the post-treemachine pruning of taxa based on flags
15:48 codiferous i think joseph means he and mark are talking about something else
15:48 jar286 yes, I know, I hate emojis so didn’t write one
15:48 jar286 or is it emoticons
15:49 codiferous but how can you assess sarcasm without them? :p
15:50 josephwb thank you jar286
15:51 jar286 anyhow ‘cumulative information’ and ‘imply separation’ sounds like an inference system to me
15:52 jar286 there’s a logic here that tells you how to combine ch’s to get new ch’s
15:53 mtholder joined #opentreeoflife
15:58 codiferous that statement makes sense to me, that seems like what we are doing
16:00 jar286 but I still want to understand that sentence
16:03 codiferous we are looking for phylogenetic statements that do not contain information that is not present in the input trees
16:03 codiferous in other words, there is no new information (which would be "unsupported")
16:06 jar286 but the word “information”, however appealing, is not actionable
16:06 jar286 you have to talk about how claims imply one another
16:06 jar286 or derive from
16:08 jar286 I’m thinking you want to say a ch is derivable (proof theory) or valid (model theory) based on inputs (trees)
16:08 jar286 but in order to say either of those you need to know what the logic is
16:09 jar286 it doesn’t have to be as formal as that, but whatever’s said needs to be actionable (you need to be left knowing what you would do to get the answer)
16:11 jar286 I haven’t gone over ‘Merger nodes from the root of one tree to nodes in other trees’ carefully yet.  Interested to know how it could fail to lead to a supported node
16:14 jar286 oh wait, testSum applies to both root and non-root mergers?  there has to be a forward reference in the earlier section to the later ‘node support’ section
16:14 jar286 otherwise there are categorical statements made that just aren’t true ‘We will create a merger node…’
16:15 jar286 getting lunch now
16:17 codiferous no, we don't apply testsum to the root mergers
16:17 codiferous not for any particular reason, it just never happened
16:45 mtholder joined #opentreeoflife
16:55 jar286 wondering if ‘corresponding’ would be better than ‘associated’
16:59 codiferous probably, it seems to imply more about logical consistency rather than just some undefined grouping mechanism
17:00 jar286 I’m happy to change it, it’s been bothering me for a while
17:00 jar286 shall I?
17:02 codiferous go ahead
17:03 jar286 codiferous, is the following right:
17:03 jar286 A|B is supported if
17:03 jar286 for all x in A, y in B, there exists an input node C|D
17:03 jar286 with A|B merge-compatible with C|D
17:03 jar286 such that x is in C and y is in D?
17:04 codiferous i think that is too strict. not completely sure. thinking about it
17:06 kcranstn joined #opentreeoflife
17:17 codiferous ok, so one correction is that each C|D need only be compatible with A|B
17:17 codiferous but there is some missing condition
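A sketch of jar286's 17:03 candidate definition of support, with codiferous's 17:17 correction applied (each input node C|D need only be compatible with A|B, not merge-compatible). The compatibility test is an assumption, not taken from treemachine: A|B and C|D are treated as conflicting exactly when A∩C, A∩D and C∩B are all non-empty, i.e. when there are taxa a in A∩C, d in A∩D and c in C∩B, so that A|B asserts ad|c while C|D asserts ac|d. The helper names h, compatible and pairwise_supported are hypothetical.

```python
from itertools import product

# Hypothetical representation: (ingroup, outgroup) as frozensets of taxon names.
Hypothesis = tuple[frozenset, frozenset]


def h(text: str) -> Hypothesis:
    """Parse "ACD|EFG" into ({A, C, D}, {E, F, G}); one letter per taxon."""
    left, right = text.split("|")
    return frozenset(left), frozenset(right)


def compatible(p: Hypothesis, q: Hypothesis) -> bool:
    """Assumed rule: A|B and C|D conflict iff A&C, A&D and C&B are all non-empty."""
    a, b = p
    c, d = q
    return not (a & c and a & d and c & b)


def pairwise_supported(target: Hypothesis, inputs: list[Hypothesis]) -> bool:
    """Every x in A, y in B is separated by some compatible input C|D with x in C, y in D."""
    a, b = target
    usable = [q for q in inputs if compatible(target, q)]
    return all(
        any(x in c and y in d for c, d in usable)
        for x, y in product(a, b)
    )
```

Under this test, ACD|EFG with inputs AC|G and D|EF is unsupported (no input separates A from E), matching the 17:18 analysis. But ACD|EFG with inputs AD|C and AC|EFG is also rejected (no input separates D from E), although codiferous argues at 17:43 that those inputs do support it; that fits the 17:04 suspicion that the pairwise condition is too strict.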
17:17 jar286 ah
17:17 codiferous here is an example. consider the sum ACD|EFG
17:18 codiferous given input tree nodes AC|G and D|EF, that node is not supported
17:18 codiferous because there is no overlap between those input nodes, so they cannot be combined
17:18 codiferous there is no information to claim the splits A|EF, C|EF, or D|G
17:19 jar286 sure… (that situation would never come up in the search for mergers, but maybe that’s beside the point)
17:19 codiferous in other words, there is some overlap condition that must also hold, but i'm not exactly sure what it is
17:20 codiferous obviously, not all "supporting nodes" (e.g. C|D from your original example) must overlap with all others
17:21 jar286 (noticing that all clade hypotheses with a taxon set size of 2 are true)
17:22 codiferous yes, the minimal case for phylogenetic meaning is a rooted triple
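To make this exchange concrete, a hypothetical helper can list the rooted triples ab|c that a clade hypothesis ingroup|outgroup asserts: any two ingroup taxa together with any one outgroup taxon. A hypothesis over only two taxa yields no triples at all, which is one way to read jar286's remark that such hypotheses are trivially true.

```python
from itertools import combinations


def rooted_triples(ingroup, outgroup):
    """Rooted triples ab|c asserted by ingroup|outgroup (hypothetical helper).

    Each triple is returned as (frozenset({a, b}), c): a and b come from the
    ingroup, c from the outgroup.
    """
    return {
        (frozenset(pair), c)
        for pair in combinations(sorted(ingroup), 2)
        for c in sorted(outgroup)
    }
```

For example, rooted_triples("AC", "G") gives the single triple AC|G, rooted_triples("A", "B") gives none, and ACD|EFG asserts 3 × 3 = 9 triples.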
17:23 jar286 my A and B above were sets of taxa… in case that wasn’t clear
17:23 codiferous right, that was clear
17:23 codiferous in my example the letters were individual taxa
17:24 jar286 yes
17:28 jar286 (drawing pictures)
17:30 codiferous i think the condition is that from some merger node A|B there must exist in the input trees a set of *compatible* rooted triples X such that some tree displaying all X displays A|B
17:31 codiferous just fyi, i'm going to sign off here in 30 mins or so
17:33 jar286 ‘displays’? (looking it up)
17:33 codiferous contains one or more nodes in which that split is present
17:34 codiferous not sure if there is a more standard term
17:34 jar286 looking at ‘Definition: Sum of displayed groups’ weights’
17:35 jar286 what you said sounds right, working on it
17:37 jar286 now it doesn’t sound right, but it’s something to work with
17:40 codiferous what seems wrong about it?
17:41 jar286 quantifier problem.  a single tree that displays all X?
17:41 codiferous that is the definition of compatible
17:42 codiferous they must be able to be displayed in the same tree
17:42 jar286 oh I think I see.  some hypothetical tree
17:43 codiferous in the example, ACD|EFG is supported if we have two input nodes AD|C and AC|EFG
17:44 codiferous those input nodes are compatible and can be displayed in the tree (((A,D),C),(E,F,G))
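A sketch of "displays" as codiferous defines it at 17:33: a tree displays A|B if some node's descendant leaves include all of A and none of B. The tiny Newick reader is hypothetical and handles only bare leaf labels (no branch lengths, quoting, or internal-node labels). Applied to the tree from the 17:44 message, written with balanced parentheses as (((A,D),C),(E,F,G)), it confirms that the tree displays AD|C, AC|EFG and ACD|EFG.

```python
def clusters(newick: str) -> list[frozenset]:
    """Leaf sets below each internal node of a bare-label Newick string (hypothetical reader)."""
    open_nodes: list[set] = []  # leaf sets of internal nodes not yet closed
    found: list[frozenset] = []
    label = ""

    def flush() -> None:
        # A finished leaf label belongs to every internal node currently open.
        nonlocal label
        if label:
            for node in open_nodes:
                node.add(label)
            label = ""

    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            open_nodes.append(set())
        elif ch == ",":
            flush()
        elif ch == ")":
            flush()
            found.append(frozenset(open_nodes.pop()))
        elif not ch.isspace():
            label += ch
    flush()
    return found


def displays(newick: str, ingroup: str, outgroup: str) -> bool:
    """True if some node's leaves include all of the ingroup and none of the outgroup."""
    a, b = set(ingroup), set(outgroup)
    return any(a <= k and not (k & b) for k in clusters(newick))
```

The same tree does not display AC|D, since every node containing both A and C also contains D.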
17:45 jar286 that’s compatible with A|B
17:46 codiferous whereas the input nodes AC|G and D|EF are not sufficient to describe a tree containing ACD|EFG
17:46 jar286 an example where the ingroups overlapped would be more compelling, but I’m starting to get it
17:46 codiferous so that indicates a correction to my first attempt. the condition is not that there exists some tree displaying all X that displays A|B, it is that X is *sufficient* to describe a tree that contains A|B
17:51 jar286 (i.e. the tree so described would have A|B’s taxon set as its tips, right?)
17:52 codiferous yes, i would think so
17:53 codiferous have to run now, but i will try to catch up with emails and the doc later tonight
17:53 jar286 ok
18:00 jar286 PR review time.  are there any?
18:01 jar286 no.
18:07 jimallman jar286: agreed, nothing there is fully baked
18:07 jar286 I think I’ll do the production deployment this afternoon
18:08 jimallman ok. i should be around if you want a second pair of eyes
18:09 jar286 plan, roughly speaking:
18:09 jar286 api:
18:09 jar286 Upgrade to jessie.
18:10 jar286 Install java 8.
18:10 jar286 Remove venv.
18:10 jar286 Push master branches.
18:10 jar286 Delete files if necessary to make space.
18:10 jar286 Install treemachine db (already copied).
18:10 jar286 Test web api.
18:10 jar286 Reset OTI database to taxonomy.
18:10 jar286 Rebuild OTI index.
18:10 jar286 Smoke test.
18:10 jar286 tree:
18:10 jar286 Push master branches.
18:10 jar286 Smoke test.
18:11 jimallman why is “Remove venv” required? and is it restored later? (i suppose yes, by deployment)
18:11 jar286 cool, I wonder who Saulo is (committed to phylesystem)
18:12 kcranstn I am also around
18:12 jar286 has to be removed because of python libraries upgrade, and yes, it’s restored
18:12 * jimallman nods
18:12 jimallman hi kcranstn!
18:13 kcranstn *waves*
18:19 jar286 going through https://github.com/OpenTreeOfLife/germinator/wiki/Debian-upgrade-notes:-jessie-and-openjdk-8  on api.opentreeoflife.org
18:22 jar286 (apache will soon be temporarily broken)
18:50 jar286 jessie installed, now installing java 8 & pushing out new treemachine, oti, etc….
18:51 kcranstn oh, the suspense
18:54 josephwb are we using "paperpile" for references?
18:54 kcranstn yes
18:54 kcranstn or, I am using paperpile for references
18:54 josephwb i don't have it
18:54 kcranstn you can leave a reference string in the doc and I can add it
18:54 josephwb ok
18:54 josephwb references are a mess at the moment; merging of two versions
18:54 kcranstn in what doc?
18:54 josephwb supp
18:55 josephwb i guess Smith 2013 just needs to be added then?
18:55 josephwb i also updated the phylesystem reference to bioinformatics
18:55 kcranstn I already did that
18:55 josephwb was bioarxiv
18:56 josephwb oh, i did that this morning; it was the old reference then
18:56 kcranstn don’t edit the references list - that will get overwritten when I re-run paperpile
18:56 josephwb ok, i get that now
18:56 josephwb but phylesystem is not updated
18:57 jar286 "message" : "No such ServerPlugin: \"tree_of_life\"",
18:57 kcranstn I haven’t been re-processing the references while we’ve been editing
18:57 josephwb ok
18:57 josephwb i thought it was plain text
18:57 kcranstn so the list at the end is certainly out of date
18:57 kcranstn let me re-run now
18:57 josephwb ok
19:01 josephwb Phylesystem is old copy (and from the future!)
19:01 jar286 4 out of 10 api tests failed (but only 1 failed on devapi)
19:01 jar286 maybe it will do better after i’ve installed the new synthetic tree
19:03 jar286 rebuilt oti, now only 3 of 10 fail
19:05 kcranstn references should look better now
19:05 kcranstn no more future phylesystem
19:06 josephwb looks good
19:14 jar286 going to copy the treemachine db over to production now… this will take a long time
19:14 jar286 should have done that earlier, oh well
19:23 kcranstn joined #opentreeoflife
22:13 josephwb are you there jar286?
22:20 jar286 not really, what’s up
22:21 jar286 api.opentreeoflife.org passes basic api tests
22:22 jar286 still need to rebuild the study index
22:22 jar286 josephwb, what’s up
22:22 josephwb for the source_tree service, do we want to return the raw newicks, or offer the possibility of replacing ottids with names?
22:22 jar286 you mean in an ideal world?
22:22 josephwb of course
22:23 josephwb the best of all possible worlds, if possible
22:24 josephwb i've updated the synth tree page: http://files.opentreeoflife.org/trees/
22:24 josephwb and made the source tree page: http://files.opentreeoflife.org/preprocessed/v3.0/
22:24 josephwb plugin works on my machine; will try it on dev soonish
22:25 josephwb jimallman is it ok to push stuff to devapi?
22:25 josephwb treemachine plugin
22:25 jimallman josephwb: fine by me
22:25 josephwb thanks
22:26 jimallman i’m working elsewhere. jar286 might want to compare behavior as he’s working on a production deployment, but as long as dev is stable i think we’ll be fine.
22:26 jar286 ideally, give user a choice between ids only, names only, and ids+names
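A sketch of the three labelling options suggested here for source-tree Newick output. The tip-label format (ottNNN), the mode strings, the combined Name_ottNNN format, and the fallback to the bare id when no name is known are all assumptions for illustration, not the treemachine plugin's actual behavior; names are assumed to be Newick-safe (underscores instead of spaces).

```python
import re

# Assumption: raw tip labels in the stored Newick strings look like "ott12345".
OTT_LABEL = re.compile(r"\bott(\d+)\b")


def relabel(newick: str, names: dict[int, str], mode: str = "ids") -> str:
    """Rewrite ott-id tip labels as 'ids', 'names' or 'ids+names' (hypothetical modes)."""
    if mode not in ("ids", "names", "ids+names"):
        raise ValueError(f"unknown label mode: {mode!r}")

    def replace(match: re.Match) -> str:
        name = names.get(int(match.group(1)))
        if mode == "ids" or name is None:
            return match.group(0)  # keep the bare id (also the fallback for unknown ids)
        if mode == "names":
            return name
        return f"{name}_{match.group(0)}"  # e.g. Homo_sapiens_ott1

    return OTT_LABEL.sub(replace, newick)
```

Falling back to the id when a name is missing means every tip keeps some identifier, which is in the spirit of josephwb's point that the ottid has to be available.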
22:27 josephwb i was thinking only 1 or 3
22:27 josephwb we gots to include the ottid
22:27 josephwb got to
22:28 jimallman what if someone is interested in high-quality Newick, but not planning to interoperate?
22:28 jar286 ok…  well in general we want the options to be the same as what they are in other api calls
22:28 josephwb they never have an option from treemachine
22:29 josephwb this is not really important; just wondering what you thought
22:31 jar286 gotta run, will be back on later tonight
