IRC log for #opentreeoflife, 2015-01-20

All times shown according to UTC.

Time Nick Message
02:47 ilbot3 joined #opentreeoflife
02:47 Topic for #opentreeoflife is now Open Tree Of Life | opentreeoflife.org | github.com/opentreeoflife | http://irclog.perlgeek.de/opentreeoflife/today
06:05 mtholder joined #opentreeoflife
06:20 mtholder joined #opentreeoflife
06:28 mtholder joined #opentreeoflife
07:49 mtholder joined #opentreeoflife
12:08 mtholder joined #opentreeoflife
12:14 jar286 joined #opentreeoflife
16:02 mtholder joined #opentreeoflife
16:23 mtholder joined #opentreeoflife
18:30 mtholder joined #opentreeoflife
19:05 josephwb hey mtholder and jimallman.
19:05 mtholder yes...
19:06 mtholder hi josephwb
19:07 josephwb it seems that the property "ot:nearestTaxonMRCAOttId" only appears if a curator has manually clicked the "test" button
19:07 josephwb in the nexson, i mean
19:07 josephwb appear in the nexson, click in the curator
19:08 josephwb if i gave one of you a list of study and tree ids, could this be automated?
19:08 josephwb probably for mtholder ^
19:09 mtholder yes. we can add something with a script. Is that a property that we use?
19:09 josephwb it is used as a check (e.g. that focal clade ~ mrca)
19:11 mtholder I see it in 120 studies.
19:11 mtholder fwiw
19:13 josephwb yeah, the only ones where it was checked
19:13 josephwb but couldn't we just turn it on for everything?
19:14 josephwb maybe part of the save tree process
19:14 mtholder I suspect that it is coming from rick's viz tool
19:14 mtholder so it would be cases in which that tool was tried. Maybe?
19:14 josephwb no, this is new to the curator
19:15 josephwb inferred against our taxonomy
19:15 mtholder If it is in the NexSONs, I'd prefer that it be something that gets recalculated whenever there is an OTU change.
19:15 mtholder I meant rick's new viz tool
19:15 jimallman hi, here now
19:17 mtholder we're discussing nearestTaxonMRCAOttId
19:17 mtholder it's only in a few studies.
19:17 mtholder looks like it takes some special clicks to get it there.
19:18 josephwb actually would like "ot:nearestTaxonMRCAName", but they go together
19:18 mtholder we should probably be removing it (or recalculating it automatically) whenever OTUs change.
19:18 josephwb or trees
19:18 josephwb e.g. rerooting
19:18 mtholder if it is fast to calculate, we may not need to store it in the nexson at all.
19:18 josephwb SUPER fast
19:19 josephwb well, i was thinking of finding errors from the nexson itself
19:19 mtholder rerooting wouldn't change the taxonomic MRCA for the tree, though.
19:19 jimallman fyi, the curation UI just calls {treemachine}/getMRCA from both Test buttons
19:20 josephwb is this not a valuable piece of information? "focalClade" is specified by the curator, and may be wrong.
19:20 jimallman i’ve thought of this as an on-demand integrity check, but subject to curator’s judgment
19:20 mtholder I don't mind plowing through the nexsons to check for something. but if this property is going to be in the nexsons, then we need to make sure it is recalculated when inputs change...
19:21 mtholder it will confuse people using the nexsons if it is stale.
19:21 jimallman hm, i see your point. someone using the nexson is probably going to trust these values
19:21 mtholder If it is just a check, it could be client side (not stored in the nexsons at all).
19:21 josephwb but trusting focalClade is far more likely to be wrong.
19:22 mtholder true, but focalClade is curation data, so it has to be in the NexSON. the MRCA is something we can calculate whenever we need it (and it changes w/ OTT)
19:23 josephwb here is what i am thinking: i download a "bird" tree with "focalClade" "Aves"; if the inferred "MRCA" is, "Eukaryotes", then I won't trust the study.
19:23 josephwb i.e. poorly curated
19:23 mtholder sure. the check makes sense.
19:23 mtholder just a question of whether we do that check on the fly or cache the results in the nexson...
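The check josephwb describes (does the MRCA inferred from a tree's mapped OTUs sit inside the curator's focalClade?) needs only the taxonomy, so it can be computed on the fly. Below is a minimal Python sketch under stated assumptions: the taxonomy is a hypothetical toy child-to-parent table keyed by names instead of real OTT ids, and `lineage`, `taxonomic_mrca`, and `focal_clade_consistent` are illustrative names, not part of any Open Tree service.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy taxonomy (child -> parent, None marks the root).
# Names stand in for OTT ids purely for readability.
TOY_PARENT: Dict[str, Optional[str]] = {
    "Eukaryota": None,
    "Fungi": "Eukaryota",
    "Metazoa": "Eukaryota",
    "Chordata": "Metazoa",
    "Mammalia": "Chordata",
    "Primates": "Mammalia",
    "Aves": "Chordata",
    "Passeriformes": "Aves",
    "Accipitriformes": "Aves",
}


def lineage(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return [taxon, parent, grandparent, ..., root]."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def taxonomic_mrca(taxa: Iterable[str], parent: Dict[str, Optional[str]]) -> str:
    """Deepest taxon that is an ancestor-or-self of every mapped taxon."""
    mapped = list(taxa)
    if not mapped:
        raise ValueError("tree has no mapped OTUs")
    shared = lineage(mapped[0], parent)  # ordered tip -> root
    for taxon in mapped[1:]:
        ancestors = set(lineage(taxon, parent))
        shared = [a for a in shared if a in ancestors]
    return shared[0]  # first surviving entry is the deepest common ancestor


def focal_clade_consistent(focal: str, mrca: str, parent: Dict[str, Optional[str]]) -> bool:
    """True if the inferred MRCA equals, or lies inside, the focal clade."""
    return focal in lineage(mrca, parent)


# A well-mapped bird tree versus one with a stray fungal OTU.
good = taxonomic_mrca(["Passeriformes", "Accipitriformes"], TOY_PARENT)
bad = taxonomic_mrca(["Passeriformes", "Fungi"], TOY_PARENT)
print(good, focal_clade_consistent("Aves", good, TOY_PARENT))  # Aves True
print(bad, focal_clade_consistent("Aves", bad, TOY_PARENT))    # Eukaryota False
```

A tree that legitimately includes outgroups will have an MRCA above its focal clade, so a real check would have to tolerate that case rather than flag every mismatch as poor curation.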
19:24 josephwb i guess i am in the minority here.
19:24 mtholder I don't feel too strongly. I can write a recalculate MRCA script so it is easy to calculate.
19:25 josephwb in fact, i would rather search for trees that have the MRCA property than the focalClade property; i would trust the former.
19:25 mtholder or we can add an issue to the curator to check it each time something changes.
19:26 mtholder we could add a hook on the phylesystem side to calculate it on the fly and add it. - but that would be the first time that we add substantive content to the NexSON
19:26 mtholder from the server app.
19:26 mtholder seems like a big change of mindset from a datastore to a calculate + store app.
19:26 mtholder why don't we create an issue in opentree repo
19:27 jimallman i believe we store these MRCA test results so that reloading the study shows sensible status for all trees. (this affects their “quality” scores in the header.)
19:27 josephwb if you don't think it would be useful, we can just forget it.
19:27 jimallman otherwise we’d need to fire these tests automatically when the study is finished loading. or when building the curation page.
19:28 jimallman perhaps these tests run server-side when the study is saved? that way they’re always up to date (in case the curator didn’t retry them).
19:28 jimallman re: a big change of mindset, isn’t this similar to our validate-on-save, which adds annotations to the nexson?
19:31 mtholder true it is like the validation annotations. those actually don't get stored. they get cached (or regenerated on GET)
19:31 mtholder we can do it on the server side.
19:32 mtholder need to decide what to do if the property is present in the nexson.
19:32 mtholder seems odd to override what is there.
19:34 mtholder just need to decide what the behavior should be. (1) server recalculates and rejects studies with bogus values
19:34 mtholder (2) server recalculates and overwrites any bogus values
19:35 mtholder (3) server recalculates only when it is absent
19:35 mtholder or some other behavior
19:35 mtholder I guess it is the idea of replacing an input (as opposed to adding an annotation) that seems like a big change to me.
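The three server behaviors mtholder lists differ only in what happens after the server recalculates. A minimal Python sketch of them as one switchable policy follows. It assumes a tree is a flat dict carrying the `ot:nearestTaxonMRCAOttId` key from the discussion (real NexSON nests trees more deeply and may prefix metadata keys), and `Policy`, `apply_policy`, and `compute_mrca` are hypothetical names.

```python
from enum import Enum
from typing import Any, Callable, Dict

# Property name taken from the discussion; a tree is modelled as a flat
# dict purely for illustration (an assumption, not the NexSON layout).
MRCA_KEY = "ot:nearestTaxonMRCAOttId"


class Policy(Enum):
    REJECT = 1          # (1) recalculate and reject studies with bogus values
    OVERWRITE = 2       # (2) recalculate and overwrite bogus values
    FILL_IF_ABSENT = 3  # (3) recalculate only when the property is absent


class StaleMRCAError(ValueError):
    """Raised under Policy.REJECT when the stored value disagrees."""


def apply_policy(tree: Dict[str, Any],
                 compute_mrca: Callable[[Dict[str, Any]], int],
                 policy: Policy) -> Dict[str, Any]:
    """Return a copy of `tree` handled according to `policy`."""
    stored = tree.get(MRCA_KEY)
    if policy is Policy.FILL_IF_ABSENT and stored is not None:
        return dict(tree)  # keep whatever value is already stored
    fresh = compute_mrca(tree)
    if policy is Policy.REJECT:
        if stored is not None and stored != fresh:
            raise StaleMRCAError(f"stored {stored!r} != recalculated {fresh!r}")
        return dict(tree)  # validate only; never rewrite the input
    updated = dict(tree)
    updated[MRCA_KEY] = fresh  # OVERWRITE always, FILL_IF_ABSENT when missing
    return updated


def fake_mrca(tree: Dict[str, Any]) -> int:
    return 42  # hypothetical stand-in for a real taxonomy lookup


stale = {"@id": "tree1", MRCA_KEY: 7}
print(apply_policy(stale, fake_mrca, Policy.OVERWRITE)[MRCA_KEY])       # 42
print(apply_policy(stale, fake_mrca, Policy.FILL_IF_ABSENT)[MRCA_KEY])  # 7
```

In this sketch REJECT leaves a missing value missing; whether it should also fill the gap is exactly the kind of behavior that would need documenting, since option (2) and option (3) both replace what would otherwise be an input field.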
19:37 mtholder gotta step away for a minute...
19:41 jimallman i see.. i guess “(2) server recalculates and overwrites any bogus values” seems reasonable to me. if we treat this as validation, it’s kind of not the curator’s “property” to protect. maybe we move these into annotations? the curation app could still read and update them, but this might set expectations correctly.
19:42 mtholder I think that they are all pretty reasonable as long as we document the fact that we'll be blowing away an input field.
19:42 mtholder I'm confident that jar286 will have an opinion ^
19:43 jar286 falling behind
19:43 mtholder I suppose that we should note the version of OTT that the field pertains to, as well.
19:43 mtholder basically we have nearestTaxonMRCAName and nearestTaxonMRCAOttID in some (but few) nexsons
19:43 jar286 hmm… should always note the version of OTT, otherwise reproducibility is threatened
19:44 jar286 trying to catch up on conversation
19:44 mtholder we're discussing whether this should be the frontend's job, the backend's job, or an on-the-fly property
19:47 jimallman i misspoke above. it seems we don’t use these values in the client-side “quality” score. they’re strictly for the curator’s reference. so maybe we shouldn’t save them with the nexson after all.
19:49 mtholder I think josephwb wants to use them
19:49 mtholder on the gcmdr side of things
19:49 mtholder (and other folks who use the nexsons might want the property)
19:53 jar286 so what is the cost of recalculating?  OTI already has the tree, so you could just give it study+tree id, and the turnaround would be fast, yes?
19:53 jar286 storing redundant info is an invitation to confusion
19:53 jar286 as google has said, ‘metadata is just another way to lie’
19:55 jar286 I guess my vote is on the fly.
19:55 jar286 gcmdr can call OTI as easily as anyone else.
19:56 jar286 or is it taxomachine. whatever.
19:56 mtholder taxomachine, i think
19:57 mtholder on the fly is fine with me. josephwb how much of a pain is it for you?
19:58 jar286 jimallman, the curator gets these mrcas from taxomachine? or where?
19:59 jimallman treemachine/getMRCA
19:59 jimallman (service)
19:59 jar286 that should be an issue.  getting the info from treemachine gives the appearance of circularity (data depending on results)
20:00 jar286 i’ll add that to treemachine issues
20:00 josephwb not a pain.
20:01 jimallman https://github.com/OpenTreeOfLife/treemachine/blob/master/src/main/java/jade/tree/JadeTree.java#L299
20:03 jar286 https://github.com/OpenTreeOfLife/taxomachine/issues/90
20:04 jar286 jimallman, taxomachine doesn’t use the jade representation internally, it reads trees that way then immediately converts to neo4j
20:05 jar286 jade predates open tree of life.
20:06 jimallman gotcha. i see your point about reckoning this from taxomachine instead.
20:06 jimallman FYI, here’s a brief past conversation about whether to store these values in Nexson: https://github.com/OpenTreeOfLife/opentree/issues/365#issuecomment-49912971
20:06 jimallman looks like I stashed them Just In Case, and there’s no (apparent) downside to stripping them out before we save a study.
20:07 jar286 good
20:07 jimallman hm, josephwb had an objection to taxomachine here?  https://github.com/OpenTreeOfLife/treemachine/issues/111#issuecomment-53113496
20:09 josephwb what does that guy know?
20:09 jar286 thanks for hunting that down, jimallman - added to new issue discussion
20:10 josephwb the point was that querying treemachine, we are guaranteed that the tree was built with the same taxonomy; if things get out of sync, taxomachine could be using a different taxonomy.
20:10 josephwb make sense?
20:11 jar286 the purpose of the curator is to create the input to synthesis, right?
20:11 jar286 ‘same taxonomy’ should be same taxonomy as that used for OTU mapping
20:11 jimallman fwiw, the resulting taxon appears (in curation UI) as a link to the synthetic-tree browser. so a taxon that didn’t make it to synthesis would (i suppose) not become a link. a minor issue, i suppose.
20:12 josephwb well, the idea is that we would check against the current synthetic tree AND the taxonomy; any difference would give an idea of how informative (or wrong!) a tree may be.
20:13 jar286 ok, that might be useful, but information coming from treemachine should never be stored in the nexson, otherwise we have an input depending on an output.
20:13 jar286 curator could still call treemachine for advice I suppose, on the fly
20:13 jimallman yes, less of an issue if we don’t save the results.
20:13 jar286 but I’d prefer that to be a separate tool, other things being equal…
20:14 mtholder If either of these calls (treemachine or tacomachine) is expensive, we can add a service on the phylesystem-api that caches the result (but keeps the result out of the nexson)
20:14 josephwb "tacomachine". yuck.
20:14 jimallman mmm
20:14 mtholder yummy.
20:14 mtholder it is hard to find good mexican food here...
20:15 jimallman so much for the Zimmerman telegram
20:15 mtholder ha!
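mtholder's caching idea combines naturally with jar286's point that the OTT version should always be recorded. A minimal Python sketch, assuming a hypothetical `MRCACache` class and a `compute(study_id, tree_id, ott_version)` callable that stands in for the actual taxomachine or treemachine request (whose API is not shown in this log): the version is part of the cache key, and every answer is returned together with the version it was computed against, while nothing is written into the NexSON.

```python
from typing import Callable, Dict, Tuple

# (study_id, tree_id, ott_version): the OTT version is part of the key so
# that a new taxonomy release never serves answers from an older one.
CacheKey = Tuple[str, str, str]


class MRCACache:
    """Hypothetical server-side cache; results never go into the NexSON."""

    def __init__(self, compute: Callable[[str, str, str], int], ott_version: str):
        self._compute = compute
        self._ott_version = ott_version
        self._store: Dict[CacheKey, int] = {}
        self.misses = 0

    def set_ott_version(self, version: str) -> None:
        # Entries for the old version are never looked up again
        # (a real service would also evict them).
        self._ott_version = version

    def invalidate(self, study_id: str) -> None:
        # Call when a study's OTU mappings or trees change.
        for key in [k for k in self._store if k[0] == study_id]:
            del self._store[key]

    def get(self, study_id: str, tree_id: str) -> Tuple[int, str]:
        """Return (MRCA OTT id, OTT version it was computed against)."""
        key = (study_id, tree_id, self._ott_version)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(study_id, tree_id, self._ott_version)
        return self._store[key], self._ott_version
```

Returning the version alongside the id means a client such as gcmdr can tell which taxonomy an MRCA came from without the value ever being stored as study data.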
20:19 mtholder-nick-re joined #opentreeoflife
20:20 mholder joined #opentreeoflife
20:22 mtholder joined #opentreeoflife
20:23 jimallman for now (since there seem to be no bad side effects), i’m going to strip these values before saving Nexson.
20:23 mtholder I'll make an issue to strip them from the studies that have them.
20:26 mtholder https://github.com/OpenTreeOfLife/phylesystem-1/issues/3
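Stripping the two cached properties, whether in the client before save or in a one-off pass over existing studies, amounts to a recursive filter over the JSON. A minimal Python sketch, assuming the study has already been parsed with `json.load`; the helper name `strip_mrca` is hypothetical, and because NexSON metadata keys may carry a leading `^` (an assumption here), both spellings are matched.

```python
from typing import Any

# The two properties discussed above; a leading "^" is tolerated because
# NexSON metadata keys may be written that way (an assumption).
MRCA_PROPS = {"ot:nearestTaxonMRCAOttId", "ot:nearestTaxonMRCAName"}


def strip_mrca(node: Any) -> Any:
    """Return a copy of a JSON-like value with the MRCA properties removed."""
    if isinstance(node, dict):
        return {k: strip_mrca(v) for k, v in node.items()
                if k.lstrip("^") not in MRCA_PROPS}
    if isinstance(node, list):
        return [strip_mrca(v) for v in node]
    return node


# Placeholder values; curator-entered fields such as focalClade survive.
tree = {"@id": "tree1", "^ot:focalClade": 1,
        "^ot:nearestTaxonMRCAOttId": 2, "^ot:nearestTaxonMRCAName": "Aves"}
print(strip_mrca({"trees": [tree]}))
# {'trees': [{'@id': 'tree1', '^ot:focalClade': 1}]}
```

The function builds a new structure rather than deleting in place, so the original study object is left untouched if the caller still needs it.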
20:34 josephwb well, i'm glad i brought that up; will end up getting the exact opposite of what i was looking for!
20:34 josephwb not upset, BTW.
20:43 mtholder yes. it is good to get it sorted out. let me know if you want help with a command line tool to calculate this on the fly for a nexson.
20:46 josephwb another issue: different trees in a single study may have different taxonomic foci; say, one bird tree, one mammal tree. focalClade will not work here.
20:46 josephwb BTW we have such studies.
20:47 josephwb not common, but they are there.
20:47 mtholder yeah. I never saw the point of focalclade.
20:47 mtholder we do have the ingroup marked
20:47 mtholder (hopefully)
20:47 josephwb well...
20:47 josephwb don't count on that.
20:48 josephwb more often than not, we don't
20:48 josephwb synth sources are an exception.
20:48 josephwb focalclade can be searched for in the curator, or elsewhere.
20:48 josephwb how else would they be found?
20:49 mtholder from the mapped otus (and an indexer)
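Indexing from the mapped OTUs works per tree rather than per study, which also copes with josephwb's case of one study holding, say, a bird tree and a mammal tree. A minimal Python sketch, assuming each tree's MRCA has already been computed from its mapped OTUs; the toy taxonomy, study and tree ids, and `build_index` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

TreeRef = Tuple[str, str]  # (study_id, tree_id)

# Hypothetical toy taxonomy (child -> parent, None marks the root).
TOY_PARENT: Dict[str, Optional[str]] = {
    "Eukaryota": None,
    "Metazoa": "Eukaryota",
    "Chordata": "Metazoa",
    "Aves": "Chordata",
    "Mammalia": "Chordata",
}


def build_index(tree_mrcas: Dict[TreeRef, str],
                parent: Dict[str, Optional[str]]) -> Dict[str, Set[TreeRef]]:
    """Map each taxon to the trees whose mapped-OTU MRCA falls inside it."""
    index: Dict[str, Set[TreeRef]] = defaultdict(set)
    for ref, mrca in tree_mrcas.items():
        node: Optional[str] = mrca
        while node is not None:  # register the tree under every ancestor
            index[node].add(ref)
            node = parent[node]
    return dict(index)


# One study, two trees with different foci (a bird tree and a mammal tree).
idx = build_index({("study_A", "tree1"): "Aves",
                   ("study_A", "tree2"): "Mammalia"}, TOY_PARENT)
print(sorted(idx["Aves"]))      # [('study_A', 'tree1')]
print(sorted(idx["Chordata"]))  # [('study_A', 'tree1'), ('study_A', 'tree2')]
```

A search for "Aves" then finds the bird tree without relying on a study-level focalClade, and the index can be rebuilt whenever OTU mappings or the taxonomy change.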
20:52 jar286 josephwb, can you give an example of a study with multiple would-be focal clades? we should record that somewhere for testing
20:53 josephwb just a sec
20:54 josephwb https://tree.opentreeoflife.org/curator/study/view/pg_2419/?tab=trees
20:56 jar286 ok thanks. I will squirrel it away
20:57 josephwb there are others dealing with comparative geography, although I do not know them off the top of my head.
20:58 jimallman https://github.com/OpenTreeOfLife/opentree/pull/556
20:58 jimallman just FYI (stripping MRCA results from client-side Nexson)
20:58 mtholder https://tree.opentreeoflife.org/curator/study/view/pg_2607/?tab=trees might be another. (it has way too many trees - ugh).
20:59 josephwb ugh.
20:59 jar286 gotta go do an errand. c u later & thanks
20:59 josephwb give trees informative labels!

| Channels | #opentreeoflife index | Today | | Search | Google Search | Plain-Text | summary