IRC log for #opentreeoflife, 2014-05-24

All times shown according to UTC.

Time Nick Message
00:12 towodo joined #opentreeoflife
00:36 kcranstn joined #opentreeoflife
00:37 kcranstn you there, @towodo?
00:38 towodo I here.
00:38 mtholder joined #opentreeoflife
00:38 kcranstn "The taxonomy (v 2.6) consists of 3,134,507 named entities, 1,200,645 synonyms, and 2,348,919"
00:38 kcranstn correct?
00:38 kcranstn "The taxonomy (v 2.6) consists of 3,134,507 named entities, 1,200,645 synonyms, and 2,348,919 entities when only phylogenetic lineages are considered"
00:40 towodo I would have to check. And I'm not sure how Stephen calculated the last number in detail. Want me to check the 1st two numbers? It's just a wc command
00:40 towodo I thought Stephen was thinking of upgrading to 2.8 (major fungi fixes) and doing some more synthesis? Any word on this?
00:40 kcranstn not for the paper
00:41 towodo ah...
00:41 kcranstn I was assuming that last number was #otus - hidden
00:42 towodo I don't know. Extinct are 'phylogenetic lineages' maybe but they're hidden … really don't know how he defines the term
00:43 towodo wc tax/2.6/taxonomy.tsv tax/2.6/synonyms.tsv
00:43 towodo 3135021  42728557 270924972 tax/2.6/taxonomy.tsv
00:43 towodo 1204926  16104571 113916268 tax/2.6/synonyms.tsv
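The first column of the `wc` output is a raw line count, so it includes any header row in the file. A minimal sketch of the same count in Python follows, assuming each TSV has exactly one header line (an assumption, though it would be consistent with the 3135020 figure used later in this log). The function names are hypothetical and not part of any Open Tree tool.

```python
def count_records(lines, header_lines=1):
    """Count data rows in an iterable of lines, skipping header lines."""
    total = sum(1 for _ in lines)
    # Assumption: the file starts with `header_lines` non-data rows.
    return max(total - header_lines, 0)


def count_records_in_file(path, header_lines=1):
    """Open a TSV file and count its data rows."""
    with open(path, encoding="utf-8") as handle:
        return count_records(handle, header_lines)
```

For example, `count_records_in_file("tax/2.6/taxonomy.tsv")` would report one less than the `wc` line count if the one-header assumption holds.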
00:44 towodo damn… maybe there's version skew and he's using some draft version of 2.6
00:44 kcranstn no, I added the specific version #
00:44 kcranstn and there is a note in the doc to get the latest numbers
00:44 kcranstn so expected that my quoted string doesn't match your numbers
00:45 josephwb joined #opentreeoflife
00:45 towodo ok… but it's going to be quite important to know exactly which version of ott he used...
00:45 towodo I have saved copies of all 2.6 drafts
00:46 kcranstn how hard to get the list of non-hidden entities?
00:46 towodo and I checked for reproducibility of 2.6 final (but not the drafts)
00:46 towodo umm.  like I say 'hidden' is a bit of a moving target. i'd have to read the treemachine source code to know exactly what he's hiding
00:47 kcranstn damn
00:47 mtholder joined #opentreeoflife
00:47 towodo i can do that and then do a count fairly easily, if you want… maybe 20 minutes of work
00:48 kcranstn let me check with stephen first
00:48 towodo i mean, the list of flags he excludes is all in one place in treemachine, it's not that hard to find...
00:48 kcranstn ah, I see
00:48 kcranstn ok, then it would be great to have that number
00:48 towodo and then I would plug that flag list in to smasher, which already has a way to filter and then count
00:49 towodo treemachine/src/main/java/opentree/GraphInitializer.java
00:51 jimallman joined #opentreeoflife
00:52 towodo I'm getting more and more angry at firefox, it gets mired in the mud, slow to respond, slows down my machine… just clobbered it again
00:53 towodo chrome doesn't do that nearly as much
00:57 towodo kcranstn, my hidden test seems to be the same as Stephen's, and the number of hidden taxa was already recorded in the log file for 2.6
00:57 towodo the answer would be: (- 3135020 695386) = 2439634
00:58 towodo which is rather different from Stephen's number, 2,348,919.
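The subtraction above, and the size of the gap to the figure quoted for the paper, can be checked directly. All numbers are taken from this log; the variable names are only illustrative.

```python
total_taxa = 3135020     # taxa in OTT 2.6, as used in the log
hidden_taxa = 695386     # hidden taxa recorded in the 2.6 log file
paper_figure = 2348919   # number quoted from the paper draft

visible_taxa = total_taxa - hidden_taxa
gap = visible_taxa - paper_figure

print(visible_taxa)  # 2439634
print(gap)           # 90715
```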
00:58 towodo maybe something about excluding viruses?… let me re-run in a more modern version of smasher
01:00 towodo cel = ott.select(ott.taxon("cellular organisms"))
01:00 towodo | Selection has 2998906 taxa
01:00 towodo | Added 1186198 synonyms
01:01 kcranstn define that 2998906 number?
01:01 towodo that's total number of nodes (internal + external) in the 'cellular organisms' subtree (no viruses)
01:01 towodo i'm looking at version 2.6
01:02 towodo I don't have a "count hidden" primitive… will write the loop in jython
01:03 towodo I'm trying to see if stephen is counting viruses as hidden, in addition to things flagged hidden.  could account for some of the difference
01:07 towodo yes, much closer.
01:07 towodo >>> cel.propagateFlags()
01:07 towodo >>> counthidden(cel)
01:07 towodo 2385328
01:07 towodo so that last number is: Start with 2.6, extract the cellular organisms branch (i.e. no viruses), count visible (non-hidden) taxa
01:07 towodo (function has wrong name, sorry)
01:10 towodo In other words, there are about 90,000 non-hidden viruses in 2.6.  In later versions I think I just hide all viruses. And I'm wondering if they should just be excluded completely
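The counting step described here (take a subtree, push flags down, count the taxa that are not hidden) can be sketched roughly as follows. This is not smasher code: the `Node` class and both functions are hypothetical, and the sketch assumes that a hidden flag on a taxon is inherited by all of its descendants, which may not match exactly what `propagateFlags()` does.

```python
class Node:
    """Hypothetical taxon node; not the smasher data structure."""

    def __init__(self, name, hidden=False, children=None):
        self.name = name
        self.hidden = hidden
        self.children = children or []


def propagate_hidden(root):
    """Mark every descendant of a hidden node as hidden (assumed rule)."""
    stack = [(root, False)]
    while stack:
        node, inherited = stack.pop()
        node.hidden = node.hidden or inherited
        for child in node.children:
            stack.append((child, node.hidden))


def count_visible(root):
    """Count non-hidden nodes, internal and external, in the subtree."""
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.hidden:
            count += 1
        stack.extend(node.children)
    return count
```

Both traversals use an explicit stack rather than recursion, since a taxonomy with millions of nodes can be deep enough to hit Python's recursion limit.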
01:20 josephwb joined #opentreeoflife
01:25 towodo kcranstn, email sent
01:26 kcranstn thanks, jonathan!
02:11 jar joined #opentreeoflife
02:14 jimallman joined #opentreeoflife
02:35 kcranstn joined #opentreeoflife
02:57 kcranstn joined #opentreeoflife
03:06 josephwb joined #opentreeoflife
12:18 josephwb joined #opentreeoflife
12:45 mtholder joined #opentreeoflife
13:01 mtholder joined #opentreeoflife
13:25 mtholder joined #opentreeoflife
13:34 towodo joined #opentreeoflife
17:46 mtholder towodo and jimallman: a draft of a config-file-generator is implemented and described at https://github.com/OpenTreeOfLife/deployed-systems#experimental
17:46 mtholder not tested yet.
17:46 mtholder (by which i mean that I have not tried to deploy any of the generated config files)
17:47 mtholder thoughts are welcome, of course. perhaps our decision to set up production and dev domain names obviates the need for a terse configuration system.
17:57 kcranstn joined #opentreeoflife
18:05 towodo mtholder, I want to see if we can keep everything to one command (push.sh -c foo.config). ergo, a single command to make the config file and run it.
18:06 towodo the spontaneity is important IMO… separate compile/run steps are annoying
18:11 mtholder agreed. I want to be able to write methods into peyotl that are smart enough to detect the endpoints of services in dev and production mode.
18:11 mtholder currently the config files get a lot of old cruft
18:12 mtholder stuff that is only relevant to previously deployed services.
18:12 mtholder If we are consistent about the newly agreed upon domain names, then peyotl won't need to read a config file to determine the endpoints.
18:13 mtholder towodo^
18:15 mtholder to be clear, though... the terse.conf system does not entail rerunning generate_config.py every time you want to tweak a config, just remembering to update terse.conf every time you change the endpoints for a service...
19:44 kcranstn joined #opentreeoflife
20:44 kcranstn joined #opentreeoflife
20:52 kcranstn joined #opentreeoflife
21:18 kcranstn joined #opentreeoflife
21:52 kcranstn joined #opentreeoflife
