IRC log for #opentreeoflife, 2014-10-23

All times shown according to UTC.

Time Nick Message
00:58 kcranstn joined #opentreeoflife
01:48 ilbot3 joined #opentreeoflife
01:48 Topic for #opentreeoflife is now Open Tree Of Life | opentreeoflife.org | github.com/opentreeoflife | http://irclog.perlgeek.de/opentreeoflife/today
01:55 kcranstn joined #opentreeoflife
03:06 kcranstn joined #opentreeoflife
04:01 scrollback joined #opentreeoflife
09:30 mtholder joined #opentreeoflife
11:01 mtholder joined #opentreeoflife
12:55 towodo joined #opentreeoflife
13:55 pmidford2 joined #opentreeoflife
13:57 kcranstn joined #opentreeoflife
14:40 kcranstn joined #opentreeoflife
14:48 scrollback joined #opentreeoflife
14:52 21WAAA3ZL joined #opentreeoflife
16:38 kcranstn joined #opentreeoflife
18:13 jimallman pmidford2: hi! this channel has grown sleepy
18:13 pmidford2 yes
18:13 * jimallman has been focused on Tree Illustrator work lately, which is nice
18:14 pmidford2 I'm still working on the summary statistics
18:15 pmidford2 do you know when Karen is leaving for sweden?
18:15 jimallman cool.
18:15 jimallman ah, i was not aware (Sweden trip)
18:15 pmidford2 I think she's going to TDWG
18:16 jimallman yes, that does ring a bell
18:17 pmidford2 Otherwise, I'm still trying to get another update on the Index Fungorum taxonomy (it's been a couple of months now)
18:18 jimallman is there lots of curation for this? or do you mean you’re trying to get new data from IF?
18:25 pmidford2 Trying to get new data; grind it through, fix the problems and pass on to JAR
18:56 towodo joined #opentreeoflife
19:29 towodo jimallman, can we talk about the interface between the statistics-generating script and the web app?
19:50 mtholder joined #opentreeoflife
19:52 jimallman towodo: back now. sure, let’s talk about stats
19:52 towodo ok
19:52 pmidford2 I'm here too
19:53 towodo good. current plan is that there will be 2 scripts each generating a json file
19:53 towodo this will lead to a series of ‘profiles’ over time…
19:53 towodo not sure whether that’s all profiles in one file, or a growing directory
19:54 towodo regardless, the files need to be put somewhere for the web app to pick up.  the question is where
19:54 jimallman either way, we can show the current/latest stats by default, with an option (in the page) to show older profiles
19:54 jimallman or if it’s very compact, maybe new and old (reverse chron. order) in a table
19:54 towodo maybe eventually show comparative data or plots
19:55 pmidford2 Or select one field from a profile to plot over time
19:55 towodo I don’t want to get too ambitious right away though.
19:55 jimallman plotting is easy, esp. if we’re building up a single JSON file.
19:55 towodo there will be a study corpus profile, and a synthetic tree corpus
19:56 towodo the studies profile will be rerun frequently and reflect curators’ progress
19:56 towodo like maybe daily
19:56 towodo the synthetic tree profile only changes when the synthetic tree changes, which currently is every 5 months (hoping for acceleration, that’s another story)
19:57 jimallman understood.
19:57 pmidford2 yes
19:57 towodo that affects what ‘before’ and ‘after’ means in the two cases
19:57 towodo if we are to do comparisons that is…
19:57 towodo but I would like to start with a version 1 that doesn’t do comparisons, just has pages made from single profiles
19:58 towodo the script can push the profile out to any location on the server.  what is most natural for web2py to pick up?
19:58 towodo the static directory?...
19:58 towodo private?
19:59 jimallman static/ is certainly easy, but it has other uses (might be clobbered in deployment)… uploads/ doesn’t sound right either...
19:59 jimallman maybe a new folder for this purpose would be safest:  stats/  ?
20:00 towodo where would it live? ~opentree/ ? or in the web2py tree?
20:00 jimallman or somewhere in the web-space of treemachine
20:00 jimallman i’d put it in the app-folder’s space, like opentree/stats/
20:01 towodo hmm.  it’s not really part of treemachine, the synthetic tree profile is a client of treemachine.  so doesn’t belong there
20:01 jimallman your “it” = synth-tree stats?
20:01 jimallman or source-tree stats? or both?
20:02 towodo you said treemachine, I was responding to that.  neither set of stats belongs to treemachine.
20:02 towodo you mean:  repo/opentree/stats ==  web2py/applications/opentree/stats ?
20:02 jimallman gotcha. i suppose the main webapp (opentree) is where it all comes together.
20:02 towodo yes, main webapp
20:02 jimallman yes, those two paths would be equivalent (via symlink into web2py/applications/)
20:03 towodo peter, the scripts are time consuming - I’d like to not run them on the production server.
20:03 pmidford2 yes
20:03 pmidford2 so run them somewhere else and copy to production?
20:04 * jimallman is checking other directories in the application space, just in case there’s a better option…
20:04 towodo yes, we can run them on varela.csail.mit.edu, or on the dev system
20:05 towodo probably dev (ot10)
20:05 towodo I guess we should also run the scripts on the dev phylesystem and synthetic tree
20:07 pmidford2 so the synthetic tree and phylesystem locations need to be specifiable to the scripts
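Making the input locations specifiable could be as simple as a few command-line options. This is a minimal sketch, assuming Python 3 and the standard argparse module; the option names `--phylesystem`, `--synthetic-tree`, and `--output` are hypothetical and not taken from the actual scripts.

```python
import argparse


def build_parser():
    """Command-line options for a statistics script (all option names hypothetical)."""
    parser = argparse.ArgumentParser(description="Generate a statistics profile.")
    # Location of the phylesystem data to profile (dev or production copy).
    parser.add_argument("--phylesystem", help="path or URL of the phylesystem to profile")
    # Location of the synthetic tree service to profile.
    parser.add_argument("--synthetic-tree", help="path or URL of the synthetic tree to profile")
    # Where the resulting JSON profile should be written.
    parser.add_argument("--output", required=True, help="JSON file to write")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With options like these, the same script can be pointed at the dev system first and at production later without code changes.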
20:09 jimallman towodo: do you want these stats to be available as direct downloads, too? this might affect the placement of these files.
20:11 towodo just a sec
20:12 towodo sorry, interrupted
20:12 towodo hmm.
20:13 towodo sure, direct downloads would be good.  we can put them under static/
20:13 jimallman it’s easy enough to expose a new directory, just another block like this:  https://github.com/OpenTreeOfLife/opentree/blob/master/deploy/setup/apache-config-shared#L34-L38
20:13 jimallman (maybe i’m wrong, and extra stuff in static/ won’t be clobbered by deployment)
20:14 towodo up to you.  I like the answer that minimizes configuration, that’s why I like static/
20:15 towodo I guess there’s a question as to what we want the URLs to look like. if advertised we might have to live with the URLs for a while
20:15 jimallman right
20:16 jimallman i agree that the configuration is already complicated, but i like the visibility of this, the sensible URLs, and an easy precedent if we decide to offer statistics for other OpenTree sites.
20:18 jimallman there’s also the minor pain of carrying these files forward, similar to our uploaded data files.
20:18 towodo don’t want to get bogged down with questions that aren’t very important right now.  we can do static for now, and then when we go to advertise, we can set up a redirect
20:18 towodo yes, they can be backed up just like the uploads, using the same script.
20:19 jimallman sounds good
20:20 towodo ok.  then, two directories under static/, one for phylesystem stats and one for tree stats?  with one profile per file? and maybe the date in the file name?
20:21 towodo static/phylesystem-stats/2014-10-23.json
20:21 pmidford2 ok
20:21 towodo static/synthesis-stats/2014-10-23.json
20:21 towodo or something of that ilk?
20:22 jimallman not bad, unless someone downloads both types and gets confused… maybe static/stats/phylesystem-2014-10-23.json   ?
20:22 jimallman nah, i like yours better
20:23 towodo there is a problem with ssh credentials copying these files from ot10 to production.  I will figure that out (there’s a way with ssh to do privilege attenuation)
20:23 towodo for now we can debug on ot10
20:23 jimallman using multiple files (one per dated profile) will slow down plotting eventually. how much data are we talking about in a profile? i guess quite a lot for phylesystem data…
20:23 pmidford2 ok
20:24 towodo I don’t think it’s a lot
20:24 pmidford2 not much yet
20:24 towodo we’re not talking about measurements for every tree, only overall measurements
20:24 towodo right now each profile has 5-10 numbers in it, so tiny
20:25 jimallman ah, ok. then i might ask for one cumulative JSON for phylesystem, another for synthesis
20:25 towodo I’m not sure how to architect study- or tree-specific analyses
20:25 towodo cumulative is harder for the script to manage.
20:26 towodo it would have to read, then augment, then write
20:26 towodo as opposed to simply generate and write
20:26 jimallman understood. i assume we’re using a smart JSON parser in any case, for safety’s sake.
20:26 towodo ?
20:26 towodo ‘smart’?
20:26 towodo safety?
20:26 jimallman as opposed to trying to “roll your own” JSON using string concatenation, etc.
20:27 jimallman it never ends well.
20:27 towodo it’s a python script, it would never do that
20:27 pmidford2 No, using the python json library
20:27 towodo I can’t think of any situation that would justify rolling one’s own json i/o
20:28 jimallman ok, great. so hopefully parsing, modifying, and writing out JSON would not be a big deal. but i’ll defer to your judgment on this.
20:28 towodo hmm.  I go  back and forth on this
20:28 towodo so you’re worried about the overhead of reading a few hundred tiny json files
20:29 towodo i see. that will happen on every hit for that page
20:29 jimallman yes, lots of traffic if using AJAX. if we load+parse on the server side, no biggie i guess.
20:30 jimallman the Bibliographic References page works that way, with server-side fetches and rendering. so we have a precedent either way.
20:30 towodo ok, we can do it as a single file.  if the script can do scp in one direction it can do it in the other
20:30 pmidford2 Otherwise, I could serialize json to a string and open the file for append, or am I missing something
20:30 towodo that works too
20:31 jimallman pmidford2: just the likely need to insert (vs. append) to get inside the topmost list, or object, or whatever.
20:31 towodo you should be able to open for append and write json directly
20:31 towodo can javascript read multiple json forms from a single file?
20:31 pmidford2 But that gives a file of multiple json expressions, not one big one
20:31 towodo and check for EOF?
20:32 jimallman with JS, i can fetch the raw file contents, split on newlines, and parse the guts. or wrap with the outermost list/object notation and parse the result.
20:32 jimallman so yes, i can work with that. we’d just need to explain that quirk to anyone else using the files.
20:32 towodo weird, you mean the library will read a single json thing from a file, but not multiple json things?
20:33 jimallman it would expect an outer list or object, i think. extra stuff would likely be ignored or declared invalid.
20:33 jimallman but i can confirm, as i’m not 100% sure
20:33 towodo ouch. ok, single thing then.
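The behavior in question can be checked in Python, whose standard json module rejects a file holding several top-level values with an "Extra data" error. This sketch only illustrates that point and the two workarounds mentioned above (split on newlines, or wrap in an outer list); it assumes one JSON value per line, and the sample field names `date` and `n` are made up.

```python
import json

# Two profiles appended one per line: each line is valid JSON,
# but the file as a whole is not a single JSON document.
appended = '{"date": "2014-09-05", "n": 1}\n{"date": "2014-10-23", "n": 2}\n'


def parse_whole(text):
    """Try to parse the whole text as one JSON value; return None on failure."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


def parse_lines(text):
    """Parse each non-empty line as its own JSON value."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_wrapped(text):
    """Join the lines with commas and wrap them in an outer list before parsing."""
    lines = [line for line in text.splitlines() if line.strip()]
    return json.loads("[" + ",".join(lines) + "]")


if __name__ == "__main__":
    print(parse_whole(appended))
    print(parse_lines(appended))
    print(parse_wrapped(appended))
```

Both workarounds depend on every consumer knowing the one-value-per-line convention, which is the quirk a single top-level object avoids.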
20:34 towodo static/stats/phylesystem.json  =>  file containing single json thing containing a bunch of dated entries
20:34 pmidford2 That should work on my end as well
20:34 towodo static/stats/synthesis.json similarly
20:35 jimallman { “2014-10-23” : { … }, “2014-09-05”: { … }, … }
20:35 jimallman that’s what i’d expect (dates as keys)
20:35 towodo that makes sense
20:35 pmidford2 ok
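The read, augment, write cycle for a single cumulative file keyed by date might look roughly like this. It is a minimal sketch, assuming Python 3 and the standard json library; the function name `add_profile`, the sample field `study_count`, and the write-to-temp-file-then-rename step are illustrative assumptions, not the project's actual script.

```python
import json
import os
import tempfile


def add_profile(path, key, profile):
    """Read the cumulative file, add one dated profile, and write it back."""
    # Start from an empty object if the file does not exist yet.
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            profiles = json.load(f)
    else:
        profiles = {}

    # A profile with the same key replaces the earlier one.
    profiles[key] = profile

    # Write to a temporary file in the same directory, then rename it over
    # the target, so a reader never sees a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(profiles, f, indent=2, sort_keys=True)
    os.replace(tmp_path, path)
    return profiles


if __name__ == "__main__":
    # Hypothetical usage: the field name is a placeholder.
    add_profile("phylesystem.json", "2014-10-23", {"study_count": 0})
```

The file stays a single JSON object, so any standard parser can load it in one call.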
20:35 jimallman and leading zeroes for easy sorting… any chance of intra-day profiles?
20:36 jimallman multiple in one day, i mean?
20:36 jimallman for synthesis, i suppose
20:36 towodo umm… synthesis takes 23 hours to run. chance of two in a day is vanishingly small
20:36 jimallman :D
20:37 jimallman i’m an optimist
20:37 towodo but we might have intra-day during testing, or reboot, or something like that
20:37 towodo the key should just be arbitrary ISO 8601
20:38 towodo it could give the hour only, or it could change over time as needs change
20:38 towodo 2014-10-23T21
20:38 towodo but let the server decide that.
20:38 towodo (that should be 2014-10-23T21Z)
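A key of that shape can be produced with the standard datetime module. This is a sketch assuming Python 3 and hour precision; the helper name `profile_key` is hypothetical.

```python
from datetime import datetime, timezone


def profile_key(moment=None):
    """Return an hour-precision UTC key such as 2014-10-23T21Z."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    # Convert to UTC first so the trailing Z is truthful.
    # (A naive datetime is treated as local time by astimezone.)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%HZ")


if __name__ == "__main__":
    print(profile_key())
```

Because every field is zero-padded and ordered from year down to hour, keys of the same precision sort chronologically as plain strings.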
20:38 jimallman sounds good
20:39 pmidford2 works
20:39 jimallman zulu time, yes
20:39 towodo ok.  so Peter will design what goes inside the {…}, then deposit a sample file in the static directory on ot10
20:40 pmidford2 Yes
20:40 towodo then email Jim. then Jim will have a sample to work from..
20:40 jimallman i should be able to cook up a page for this pretty quickly.
20:40 pmidford2 Do I have access to ot10?
20:40 jimallman might need some clarification regarding the meaning of some stats. we’ll want such docs anyway, of course, for other consumers
20:41 jimallman pmidford2: looks like you just need opentree.pem
20:41 jimallman (vs production.pem)
20:41 towodo yes, to write to ot10
20:42 towodo I think I will make a fresh key for production and restrict it to writes to the stats/ directory
20:42 towodo do you have the key?
20:43 pmidford2 I don't think so
20:43 towodo do you have pgp email?
20:43 pmidford2 I don't think so, if I did I haven't configured it
20:43 towodo you would know.
20:44 towodo I’ve been using norbert as a secure channel, but it’s dead in the water… two bad disks
20:45 towodo I guess I can send it in gmail.  do you use the gmail email client?
20:46 pmidford2 I use either the web version imap to thunderbird
20:46 pmidford2 or imap
20:46 towodo imap over ssl? …
20:48 pmidford2 it says ssl/tls
20:49 towodo ok, I will email it but then save it in ~/.ssh/opentree.pem and delete it from gmail
20:49 pmidford2 ok
20:49 towodo sent
20:49 jimallman hm, maybe a different filename vs opentree.pem? we’ve already got one of those..
20:50 towodo what I sent is that file, so that’s the right name
20:51 towodo the new key I will deal with later, no rush to get this out to production
20:51 pmidford2 Got it and deleted
20:51 jimallman oops. thought it was the new (stats-only) key. my bad.
20:51 towodo ok, so we’re all set on this project for a while.
20:51 towodo back in 3 mins
20:51 jimallman i think so.
20:52 pmidford2 yes
20:55 towodo pmidford2, when this is working, there’s more taxonomy work to be done
20:55 towodo I’ve got WoRMS integrated into OTT so that’s looking good
20:55 pmidford2 OK
20:55 towodo but I’ve been talking to Dail and would like to get some motion on microbes
20:56 pmidford2 Ok
20:56 towodo when you’re ready and if you’re willing we can make a plan… I think it could be pretty interesting
20:56 pmidford2 I poked Paul Kirk again this week, but nothing
20:56 pmidford2 Ok, I'll finish getting this to Jim and let you know.
20:57 towodo I’m looking for the thing I wrote about SILVA to help you get excited about it…
20:58 pmidford2 Great
20:59 towodo here it is, maybe you saw it before: https://docs.google.com/document/d/1XjEOONt2HItYFvfHn2kXveWpyvpwFG8bpJSbo4alvYE/edit#heading=h.1ushca77zbqp
21:00 towodo the difference this time around is that we’re going to put all 500,000 SILVA clusters into OTT as tips
21:01 towodo and (jimallman, this is you) I’m hoping to support OTT id lookup from genbank id - SILVA was kind enough to give us (earlier this week) their mapping from genbank ids into clusters
21:01 jimallman cool!
21:01 pmidford2 Nice
21:01 jimallman is this described in more detail in the google doc above?
21:02 towodo no, the google doc is old, it’s how we did it before
21:02 jimallman gotcha.
21:02 towodo I plan to write up a plan (so to speak)
21:02 jimallman pmidford2: just in case the big JSON files slow things down in python, here’s a good thread that talks about ijson (a Sax-style JSON parser): http://stackoverflow.com/questions/2400643/is-there-a-memory-efficient-and-fast-way-to-load-big-json-files-in-python
21:03 towodo I’m going to bookmark that one...
21:03 jimallman yes, looks very handy
21:04 pmidford2 yes, though we won't be at that scale for a while.
21:11 pmidford2 Got to go.  Jim I'll let you know as soon as I have the files in place.
21:11 pmidford2 left #opentreeoflife