
IRC log for #opentreeoflife, 2015-06-01


All times shown according to UTC.

Time Nick Message
00:20 jar286 joined #opentreeoflife
01:14 jar286 joined #opentreeoflife
01:48 ilbot3 joined #opentreeoflife
01:48 Topic for #opentreeoflife is now Open Tree Of Life | opentreeoflife.org | github.com/opentreeoflife | http://irclog.perlgeek.de/opentreeoflife/today
01:59 jar286 joined #opentreeoflife
04:01 jar286 joined #opentreeoflife
04:59 jar286 joined #opentreeoflife
06:00 jar286 joined #opentreeoflife
07:04 jar286 joined #opentreeoflife
09:05 jar286 joined #opentreeoflife
10:06 jar286 joined #opentreeoflife
12:07 jar286 joined #opentreeoflife
12:48 jar286 joined #opentreeoflife
13:40 kcranstn joined #opentreeoflife
14:06 josephwb joined #opentreeoflife
14:07 josephwb kcranstn is the paper submitted? did peter get added to the author line?
14:08 kcranstn no (doug wanted to review, and now I need to incorporate his changes). Peter - haven’t asked him yet.
14:09 kcranstn emailed Peter
14:11 josephwb hmm. don't think i like doug's title. too grand. tree is not good enough to warrant that title.
14:11 kcranstn I honestly haven’t looked at it yet
14:11 kcranstn nescent deadline today
14:12 josephwb it looks like he maybe only changed the title
14:22 kcranstn I have two Word docs and I assume there are other changes
14:22 josephwb oh, ok
14:26 josephwb should get peter on the biorxiv preprint too
14:27 kcranstn i’ll be uploading a new version that matches the resubmissin
14:27 josephwb http://biorxiv.org/content/early/2014/12/15/012260
14:27 kcranstn resubmisson
14:27 kcranstn can’t type today
14:27 josephwb great
14:27 josephwb poor quiet peter
14:28 josephwb super smart
14:29 josephwb super quiet
16:31 kcranstn joined #opentreeoflife
16:57 jar286 jimallman, there?
16:58 jimallman yes, hi
16:58 jar286 sometimes I see @url = /curator/default/to_nexson?output=input&uploadid=u226f995a-efbc-448b-973f-a2c29867bc39
16:58 jar286 but sometimes I see @url = /curator/download/supporting_files.doc.b41809b118fd4b62.5041535445442e747265.tre
16:59 jar286 the latter matches the file names in the uploads directory
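[Editor's sketch] The two @url shapes quoted above can be told apart mechanically. The following is a minimal illustration, not the script jar286 was running: the function name `classify_url` and the returned labels are hypothetical, and it assumes only the two URL forms pasted in the log.

```python
from urllib.parse import urlparse, parse_qs

# Path prefix taken from the second example URL in the log.
DOWNLOAD_PREFIX = "/curator/download/"


def classify_url(url):
    """Return (kind, key) for an @url value.

    kind is "to_nexson", "download", or "other" (hypothetical labels).
    key is the uploadid for to_nexson URLs, the stored file name for
    download URLs, and None otherwise.
    """
    parsed = urlparse(url)
    if parsed.path.endswith("/to_nexson"):
        # The upload id travels in the query string, e.g. uploadid=u226f...
        ids = parse_qs(parsed.query).get("uploadid", [])
        return ("to_nexson", ids[0] if ids else None)
    if parsed.path.startswith(DOWNLOAD_PREFIX):
        # Everything after the prefix is the name used in the uploads directory.
        return ("download", parsed.path[len(DOWNLOAD_PREFIX):])
    return ("other", None)
```

Run over every @url in a study, this would split the values into the group whose file names can be checked against the uploads directory and the group that only carries an upload id.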
16:59 jimallman i seem to recall something about stale/bad file download URLs, but i thought this was restricted to dev (phylesystem-0). are you seeing this on production?
16:59 jar286 yes
16:59 jar286 I can get some study ids, hang on
17:00 jimallman hm. ok, i’ll review the email trail on this and see if I can suggest a way to convert these en masse.
17:04 jar286 well the good news is that of the ones that aren’t of the /to_nexson form, we are missing none
17:04 jar286 the bad news is that there are 34 of the bad ones and I don’t know how to check to see if they exist
17:04 jar286 not sure where to file the issue…
17:05 jar286 I’ll email you the list
17:07 jar286 they are all ot_ studies
17:07 jar286 almost all are ‘Source data for tree …’ which means no proper file name
17:07 jar286 but a few look like files we should recover if possible
17:08 jar286 maybe we already have them, I don’t know.
17:08 jar286 getting lunch now, back in a bit
17:09 jimallman agreed. it looks like i’m using separate code paths to save supporting files from tree import, vs. uploads in the Files tab. both should now be correct. (i’ll test a new tree-import operation just to be sure.)
17:09 jimallman i’m guessing we have old tree-import URLs to fix.
17:27 jar286 jimallman, my priority now is not fixing the @urls, but rather making sure that we have the files.  I think fixing the @urls is relatively low priority.
17:27 jimallman gotcha
17:29 jar286 it looks like there’s significant overlap with the list in the gist
17:29 jar286 the gist list is longer, 43 instead of 34
17:30 jar286 I disregarded any with name “content provided as a string”
17:32 jar286 rerunning to include those…
17:33 jimallman i’m digging around on ot14 (tree.opentreeoflife.org), and it looks like some/all of this data can be found in /home/opentree/repo/opentree/curator/private/scratch
17:33 jar286 there are 164 @url properties with “to_nexson?…” values
17:34 jimallman for example, here’s a quick search based on the guid for ot_209….
17:34 jimallman grep -ri -ls u9f43f05f-a09e-4043-bba9-0a917404bf54 ./repo/opentree/curator/private/scratch
17:34 jimallman ./repo/opentree/curator/private/scratch/2nexml/u9f43f05f-a09e-4043-bba9-0a917404bf54/bundle_properties.json
17:35 jar286 mm, should have thought of that.  I’ll copy the nexml directory locally and tweak the script to look there
17:35 jimallman 259 subdirectories there, so it’s possible we have all..
17:35 jimallman nicely organized, each subdir is named with a corresponding guid (thanks mtholder!)
17:37 jimallman fyi, i do not see a subdir for the oldest(?) ot_8 file, guid = u93af4391-e18f-47f2-a1e4-917bc3d9e60a
17:40 jimallman jar286: it looks like in.nex is always the raw input (whether pasted or uploaded?), probably named .nex regardless of its real format…
17:41 jimallman most useful metadata (original format and filename) are found in bundle_properties.json
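[Editor's sketch] The lookup described above (one subdirectory per guid under scratch/2nexml, each holding a bundle_properties.json) could be scripted roughly as follows. This is an assumption-laden illustration rather than the script used in the channel: the function names are hypothetical, and the JSON is returned as-is because the log does not show which keys the file contains.

```python
import json
from pathlib import Path


def load_bundle_properties(scratch_root, guid):
    """Return the parsed bundle_properties.json for one upload guid.

    Assumes the layout seen in the grep output above:
    <scratch_root>/2nexml/<guid>/bundle_properties.json
    Returns None when no such file exists.
    """
    path = Path(scratch_root) / "2nexml" / guid / "bundle_properties.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def missing_guids(scratch_root, guids):
    """List the guids that have no bundle_properties.json under scratch_root."""
    return [g for g in guids if load_bundle_properties(scratch_root, g) is None]
```

Feeding the upload ids taken from the to_nexson URLs into `missing_guids` would give a list of uploads with no recoverable metadata in a local copy of the scratch directory.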
17:42 jar286 wonder why this scp -r command is taking so long… probably poor network capacity on ot14’s class of ec2 server
17:42 jimallman regarding the older uploads, i believe we’ve been backing these up for awhile, so perhaps they’re still around on another box?
17:42 jimallman hm. josephwb has also reported very slow transfers of large db.tgz files the API servers
17:43 jar286 not sure. once I get a new list I’ll look for files on the server I was backing up to.  then I will ask Mark, since he was supposed to be doing backups too
17:46 jimallman relevant discussion of mirror_supporting_files.sh, just as a reminder:  https://github.com/OpenTreeOfLife/opentree/issues/280#issuecomment-57383995
17:52 jar286 very nice, have found a lot of what’s needed in 2nexml … (all results not yet in)
18:00 jar286 excellent, only 27 missing, of which only 11 have file names (not “content provided as a string”)
18:16 jar286 going to put spreadsheet in google docs now, and ask curators to provide the files if they still have them
18:21 jimallman any point in asking mtholder for older backups first?
18:23 jar286 right. will do before going to curators.
18:23 jar286 https://docs.google.com/spreadsheets/d/1HNg-w1B67R-1vUdIdVHd_FjT8rnrjKldaXscqItzPUA/edit#gid=0
18:42 jimallman joined #opentreeoflife
19:40 josephwb joined #opentreeoflife
20:06 jimallman joined #opentreeoflife
21:06 jimallman_ joined #opentreeoflife
21:57 jar286 jimallman, how are you identifying duplicate studies? maybe my script could do that. Or it could look for ‘deleted’ etc.
21:57 jar286 or maybe it’s not worth it since you’re almost done…
21:57 jimallman just using the curation app (which tries to flag duplicate DOIs).
21:58 jar286 oh, right, you said that in the comment
21:58 jimallman so far no luck finding intact files. the “sister” studies have no files at all (missing or otherwise)
22:00 jimallman as an aside, user lzogbaum has created a LOT of duplicates. i wonder why...
22:01 jimallman drat. i found no supporting files (salvageable or otherwise) in any of the duplicate studies.
22:01 jar286 I have no idea when Mark will be back. I think I will write to josephwb and chris owen at least
22:01 jimallman jar286: should i leave these row markings in the spreadsheet? as long as we’re talking to curators, maybe we could ask them to clean up unwanted dupes..
22:02 jar286 ‘back’ meaning ‘responding to email’
22:02 * jimallman nods
22:03 jar286 is cleaning up the dups something I could do? I want to reduce curator load after asking them a favor like this
22:04 jar286 we should separately make a list of dups, not confuse it with this task, if possible
22:04 jimallman certainly, if it’s clear that no useful information is being lost.
22:05 jimallman Agreed, I can add a small spreadsheet for this purpose (though I thought we had a nice way to search for these in curation app…)
22:17 jar286 low priority
22:17 jar286 email sent to joseph and chris
23:52 kcranstn joined #opentreeoflife
