
IRC log for #opentreeoflife, 2014-07-22


All times shown according to UTC.

Time Nick Message
00:34 kcranstn joined #opentreeoflife
00:39 kcranstn I am back. What’s the status?
00:49 jimallman kcranstn: i found the problem. taxomachine isn’t responding to API calls.
00:49 kcranstn step 1 - finding the problem. Success!
00:49 kcranstn now on to step 2.. fixing the problem
00:49 jimallman this affects all calls to the main webapp, which wants to fetch a list of search contexts for the header’s taxon search.
00:50 jimallman right. i’m digging into neo4j tools now
00:50 kcranstn what machine?
00:50 jimallman ot15
00:50 jimallman api.opentreeoflife.org
00:51 kcranstn so rebooting didn’t fix it?
00:51 towodo joined #opentreeoflife
00:51 kcranstn @towodo: @jimallman says “i found the problem. taxomachine isn’t responding to API calls.”
00:52 towodo hum.
00:52 towodo stopping and restarting …?
00:53 kcranstn nope
00:53 jimallman reboot + redeploy didn’t fix it before. is it possible we need to reload or re-init the database?
00:53 jimallman (i wouldn’t think so…)
00:54 towodo I wouldn’t think so… watching neo4j log …
00:54 jimallman note that the problem could just be apache itself, which proxies the requests. like i said, i’m looking up how to test for neo4j itself
00:54 towodo right.
00:54 jimallman i’m trying to hit the api with curl:
00:54 jimallman $ curl http://api.opentreeoflife.org/taxomachine/v1/
00:54 jimallman this stalls for me, never really even times out. compare to the snappy response here:
00:54 jimallman $ curl http://devapi.opentreeoflife.org/taxomachine/v1/
00:55 jimallman (it’s a complaint, but that means it’s responding)
00:55 towodo right
00:56 jimallman i’m getting a 503 if we call an actual method:
00:56 jimallman $ curl http://api.opentreeoflife.org/taxomachine/v1/getContextsJSON
00:56 towodo now this seems to be ok:  http://api.opentreeoflife.org/taxomachine/v1/
00:57 jimallman hm, you’re right
00:57 towodo main site responding
00:57 jimallman whoa, the curl call as well!
00:58 kcranstn I don’t like these kinds of solutions
00:58 jimallman yes, all pages are snappy now. !?
00:58 jimallman drat kcranstn, i was hoping you were working some magic
00:58 towodo neither do I. what’s odd is that ‘neo4j status’ said taxomachine was just fine.
00:58 jimallman something in its connection w/ apache?
00:59 towodo all I did was stop and restart neo4j.
00:59 jimallman hm, just now?
00:59 towodo yes, see my irc entries above.
00:59 towodo I stopped using ‘neo4j stop’, then started with ‘neo4j start’
00:59 jimallman i don’t see that in the irc log, but OK.
01:00 towodo hmm, the ‘?’ was misleading.
01:00 towodo I guess I was cryptic
01:00 jimallman oh, stopping and restarting. gotcha
01:00 jimallman i feel somewhat better knowing there’s a proximate cause… :)
01:00 towodo you said taxomachine didn’t respond, so I stopped & restarted it. etc.
01:01 towodo nothing funny in the neo4j log
01:01 jimallman so either neo4j was running, but not responding to proxied requests from apache… or apache couldn’t reach the prior running neo4j..? weird.
01:02 towodo but we didn’t do anything that would change apache.
01:02 towodo looks to me like taxomachine decided to go south.  we have no idea why.
01:02 jimallman agreed. i’ll check the apache logs to see if it was proxying the calls as expected.. did it see these as timeouts?
01:02 towodo unfortunately i don’t think timeouts show up in the logs.
01:03 towodo but it’s worth a look.
01:04 jimallman here’s an interesting item:
01:04 jimallman https://gist.github.com/jimallman/6a2d11be94eae57bddca
01:05 jimallman from about 6 minutes ago
01:05 towodo 7476 is taxomachine
01:06 towodo “error reading status line” means no response (status line is 1st line)
01:06 towodo so doesn’t look like apache’s fault to me.
01:07 towodo When you (or KC) rebooted before did you manually restart the neo4js? Or did it happen by magic? Or did you redeploy?
01:07 towodo jimallman ^
01:08 towodo by the way I stand corrected. you’re right, timeouts do show up.
01:08 kcranstn I just rebooted
01:08 kcranstn but not sure what @jimallman did
01:10 * jimallman is back now, reading...
01:11 jimallman i redeployed, figuring that would be the most thorough (and by-the-book) method
01:11 jimallman wait, i’m pretty sure i did. but not 100% sure.
01:12 towodo somehow it got started - I think - because: ./bin/neo4j status  /  Neo4j Server is running at pid 28884
01:12 jimallman shoot, i might have expected neo4j to just restart (which it should, on reboot).
01:12 towodo now maybe neo4j lies.  could have just been checking for the presence of a pid file
01:13 towodo it did time out when I tried to shut down… but that says nothing, compatible with either hypothesis
01:14 jimallman sorry i can’t be more certain. i definitely re-deployed to both ot14 and (previously) ot15 this afternoon, but i don’t have timestamps locally.  wait, lemme check the deployment logs on ot15
01:14 towodo has been up for 1:14  (uptime)
01:15 towodo no that’s not right.
01:15 towodo up 13 days. it didn’t reboot.
01:15 towodo !
01:15 towodo redeploying doesn’t necessarily shut down neo4j.  maybe it should
01:16 towodo so taxomachine never got shut down. that’s a problem…
01:16 towodo not sure of the fix here.
01:16 towodo the deployment system is fragile.
01:17 towodo we need that nagios setup
01:17 jimallman https://gist.github.com/jimallman/5cef7a2ece7789aba035
01:17 jimallman that’s all of today’s setup activity on ot15 (joseph, then me)
01:18 towodo right. still no idea why taxo got hosed, but it’s clear it never got stopped and restarted
01:18 jimallman fwiw, i was pretty sure that deployment *does* try to shut down and restart neo4j… checking now…
01:18 towodo no. i checked
01:18 towodo only stops it if the repo has changed.  premature optimization ?…
01:19 towodo can change it to always shut down.
01:20 jimallman hm, we’re careful to stop+start in index-doc-store.sh and install-db.sh...
01:20 towodo the idea is you only stop if there’s a reason.  if the repo hasn’t changed no need to disturb the baby.
01:21 jimallman ah, so this isn’t doing the job: https://github.com/OpenTreeOfLife/opentree/blob/8a75b89d43d8f64db4d69fc077e513a37c8eb82e/deploy/setup/install-neo4j-app.sh#L70-L74
01:22 jimallman yes, looks like premature optimization (though i’m not sure why we’d need the restart here)
01:23 towodo restart isn’t needed but it doesn’t hurt much (1 minute delay maybe)
01:23 towodo no, look at the outer if.
01:23 towodo if git_refresh …
01:24 jimallman Looks like (based on comments) we intended to restart here, but pulled the punch: https://github.com/OpenTreeOfLife/opentree/blob/8a75b89d43d8f64db4d69fc077e513a37c8eb82e/deploy/setup/install-neo4j-app.sh#L102-L107
01:24 towodo that’s not a restart.
01:24 towodo that’s just a start.
01:24 jimallman right, but the comment suggests we were going to restart if neo4j is already running
01:25 towodo the code is working as intended. it just didn’t anticipate this particular failure mode.
01:25 jimallman # Start or restart the server
01:25 jimallman yep
01:25 towodo comment is misleading I guess. starting a running server is a no-op
01:25 * jimallman nods
01:25 towodo wait, the if … status rules that out.
01:26 jimallman i can move the stop+start block from above down here, so it happens regardless..
01:26 towodo there’s no start & stop block. there’s a conditional stop at the top, and a conditional start at the bottom.
01:26 towodo the thing to do would be to make the stop unconditional.
01:26 jimallman re: if … status, i assumed that checks to see if neo4j is *not* already running
01:27 towodo right. if it’s running, that code is not activated
01:27 towodo status returns 0 (success) if running, 1 (failure) if not
01:27 jimallman i see, you’re right in this script. stop and start are separate. i believe they’re inline in the other scripts i mentioned
01:29 jimallman here’s a much less cautious approach: https://github.com/OpenTreeOfLife/opentree/blob/e8fb44b23f008e0db9b5423ef054b8c8e93582a6/deploy/setup/install-db.sh#L35-L43
01:29 towodo but it’s known to be necessary in this case. install-db *always* makes a change. not comparable
01:29 towodo push often does *not* make a change.
01:31 towodo the idea was that it’s like ‘make’ - if nothing changed since last push, do nothing.
01:31 jimallman fyi, this was another version (currently disabled), based on the finding that neo4j sometimes returns a message for a stopped server: https://github.com/OpenTreeOfLife/opentree/blob/e8fb44b23f008e0db9b5423ef054b8c8e93582a6/deploy/setup/install-db.sh#L35-L43
01:31 jimallman wrong link! sorry: https://github.com/OpenTreeOfLife/opentree/blob/8a75b89d43d8f64db4d69fc077e513a37c8eb82e/deploy/setup/install-neo4j-app.sh#L76-L85
01:32 towodo (I assigned a deployed-systems pull request to you)
01:33 towodo that last code only runs (or would run) if there were a change, see the git_refresh above.
01:33 jimallman thanks, reviewing now..
01:33 towodo git_refresh is true if there was a change, false if no change
01:33 jimallman understood re: the git_refresh test.
01:35 towodo the fix is to move the if neo4j status/stop fi business up above the git_refresh.  I don’t like it because it’s slow, and we’re fighting the last battle, but…
01:35 towodo (whole thing should be rewritten to use make)
01:35 jimallman agreed. it looks like the safest approach is to always stop+start. i can make that change.
01:36 towodo ok, can you let me review the pull request?
01:37 towodo except i need to sign off for tonight, so tomorrow or whenever.
01:37 jimallman sure, will do.
01:38 towodo ok. thanks for hanging in there & good night
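A minimal sketch of the "always stop, then start" approach agreed on above. The function name `restart_neo4j` and its argument (the path to a neo4j control script) are hypothetical and not taken from the deployment scripts; the sketch only relies on the `status`, `stop` and `start` subcommands quoted in the log, and on `status` exiting 0 when the server is running.

```shell
# Hypothetical helper, not part of the actual deploy scripts.
# $1 is the neo4j control script (for example ./bin/neo4j).
restart_neo4j() {
    neo4j_bin="$1"
    # 'status' exits 0 when a server is running; stop it whether or not
    # the repo changed, so a hung server never survives a redeploy.
    if "$neo4j_bin" status >/dev/null 2>&1; then
        # A failed or timed-out stop is reported but does not block the start.
        "$neo4j_bin" stop || echo "warning: neo4j stop reported failure" >&2
    fi
    "$neo4j_bin" start
}
```

Called as `restart_neo4j ./bin/neo4j`, this trades roughly a minute of downtime per push for not depending on whether the repo changed. It still trusts `status`, which may only be checking for a pid file.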
02:59 josephwb joined #opentreeoflife
10:51 josephwb joined #opentreeoflife
11:11 josephwb joined #opentreeoflife
11:32 towodo joined #opentreeoflife
12:03 kcranstn joined #opentreeoflife
12:21 josephwb joined #opentreeoflife
12:26 kcranstn joined #opentreeoflife
12:47 kcranstn @towodo: should we have a separate hangout to discuss what happened yesterday?
12:48 towodo hmm.
12:48 towodo we could talk about why it happened technically, or about prevention?…
12:48 kcranstn yes and yes
12:49 towodo I don’t know. I was thinking of sending an email postmortem, thought that might be enough
12:49 towodo see https://github.com/OpenTreeOfLife/opentree/pull/380
12:50 kcranstn are there tests / monitoring that we should set up to at least detect similar problems?
12:50 towodo I don’t understand why your restart didn’t restart though.
12:50 towodo we should at least nagios. (your name is on the card)
12:50 towodo that would have detected the problem.
12:51 kcranstn the nescent nagios installation won’t be around after spring 2015
12:51 towodo then if we had api nagios checks that would have localized it.
12:51 towodo we can use MIT, or even AWS
12:51 towodo spring 1025 is a long way away
12:51 towodo 2014
12:51 towodo 2015
12:51 kcranstn good point. I should get it set up on nescent and we can move later
12:51 towodo yes
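A rough sketch of what an API check in the Nagios plugin style might look like. The function names are hypothetical, and treating a 4xx reply as WARNING rather than OK is an assumption. The exit codes follow the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL), and the hard timeout is there because the stalled curl call last night never returned on its own.

```shell
# Map an HTTP status code to a Nagios plugin exit code:
# 0 = OK, 1 = WARNING, 2 = CRITICAL.
nagios_state_for_http_code() {
    case "$1" in
        2??|3??) return 0 ;;
        4??)     return 1 ;;   # the server answered, but with a complaint
        *)       return 2 ;;   # 5xx, or 000 when curl got no response at all
    esac
}

# Probe a URL with a hard timeout so a stalled server cannot hang the check.
# $1 = URL, $2 = timeout in seconds (assumed default: 10).
check_api() {
    url="$1"
    timeout_secs="${2:-10}"
    code=$(curl --silent --output /dev/null --max-time "$timeout_secs" \
                --write-out '%{http_code}' "$url") || true
    state=0
    nagios_state_for_http_code "$code" || state=$?
    echo "HTTP $code from $url"
    return "$state"
}
```

For example, `check_api http://api.opentreeoflife.org/taxomachine/v1/getContextsJSON 10` would have returned 2 (CRITICAL) for both the 503 and the stalled request.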
12:53 towodo yesterday did you do reboot at the aws console?
12:54 kcranstn yes
12:54 towodo the machine didn’t reboot.
12:54 towodo that caused a lot of confusion...
12:54 towodo now we know.
12:54 kcranstn I didn’t get much feedback in the UI, but the checks both went to fail, then to ok
12:54 kcranstn it was confusing to me
12:55 towodo well according to ‘uptime’ there was no reboot.
12:55 towodo ah… hm.. ot14 did reboot
12:55 towodo it was ot15 that needed the reboot.
12:55 towodo maybe we do need a hangout.
12:56 jimallman my bad, i thought we had rebooted both.
12:57 towodo so the only unexplained bit was how joseph’s action (which shouldn’t have touched taxomachine because he was deploying opentree) could possibly have made taxomachine go south.
12:58 towodo maybe that was disk space… ? but that seems unlikely, we looked into it.
12:58 kcranstn just to be clear, no one is pointing any fingers at @josephwb
12:58 jimallman just fyi, joseph did push to ot15: https://gist.github.com/jimallman/5cef7a2ece7789aba035
12:59 towodo yes, that’s one reason I don’t want to pay too much attention to this.
12:59 jimallman as did i
12:59 kcranstn I do think we should have some guidelines about who can touch production and when
12:59 towodo he didn’t do anything that should have caused a failure. the bug isn’t his.
13:00 towodo I had been thinking for a while maybe separate ssh keys
13:00 josephwb just to be clear, I am pointing all fingers at @josephwb
13:00 towodo but that’s pretty unfriendly
13:00 jimallman towodo: the deployment was interrupted due to lack of a secure file. this may have abandoned things in a non-working state.
13:00 towodo I don’t see how that could possible have affected taxomachine.
13:01 towodo s/le/ly/
13:02 towodo opentree and taxomachine just don’t interact at all - only over http, not through shared state
13:02 jimallman see my comment above, there was a push to ot15
13:02 jimallman (ot15 == api.opentreeoflife.org)
13:03 towodo I stand by what I say.  touching peyotl and certs has nothing to do with taxomachine.  It makes no sense to me
13:04 jimallman ah, i see what you mean.
13:04 towodo now the scripts could probably be improved in how they handle missing certs and so on.  that should never be an error.
13:04 towodo that joseph’s action could have led to any kind of a problem, is a bug.
13:05 towodo not his action, but in the defenses.
13:05 jimallman and yet, in this situation that would mean restarting *production* without certs or other requirements, in a “fault-tolerant” way.
13:05 towodo we need to get the asterales system going, it will test this
13:05 jimallman good for test servers, bad for production
13:06 towodo production would have been unaffected.  it doesn’t matter if there’s an extra dysfunctional webapp running on the api server
13:06 towodo the scripts should be robust against this ‘attack’ and I thought they were
13:08 jimallman sorry if i’m not being clear. we’ve talked about making the deployment tools more tolerant of missing certs, credentials, etc. I’ve understood this to mean that we’d just relax security or surrender features. This works well for someone’s “private” deployed test system, but not if we’re updating our own production system.
13:08 jimallman apologies if this is a derail.
13:08 towodo updating opentree on api is not updating our production system. it’s just benign
13:09 jimallman understood
13:09 towodo and if the cert is already there it’s not deleted, right?
13:09 jimallman i would hope not. good point.
13:10 towodo so I still see no way that what he did could/should have affected anything.
13:11 towodo so there’s a bug out there that we haven’t found.
13:13 kcranstn good thing you have all of that entomology training
13:14 towodo comes in handy.
13:17 kcranstn separate thread: @jimallman - can I do a simple copy of that ot_29 study from phylesystem-0 to phylesystem-1 (updating the id)?
13:17 kcranstn or is that dangerous?
13:17 jimallman sorry, i’ve been trying to get back to that situation.
13:17 kcranstn no prob. I am happy to do it
13:18 kcranstn should I put it on a branch?
13:18 jimallman the only danger i can think of would be re-assigning the same id, which shouldn’t happen once you’ve assigned it. believe it or not, i’d do this on master to “reserve” the id.
13:18 kcranstn ok
13:19 kcranstn I’ll do it and create a PR
13:19 jimallman ok. we’ll just need to check the id just before merging the PR
13:19 jimallman and match any study id within the NexSON itself (this can be the final step)
13:20 jimallman towodo: i see the error message that jwb reported, here: https://github.com/OpenTreeOfLife/opentree/blob/6061045865a22970ed7a4bcf08415b60ad06cc23/deploy/push.sh#L214-L225
13:20 jimallman note that it does *not* stop the deployment, so my earlier concerns were moot.
13:22 towodo yes, that’s what I thought. the mystery remains
13:22 jimallman agreed, just wanted to tie up that line of questioning
13:25 jimallman kcranstn: the only instance of this id i see in the NexSON is the obvious:    "^ot:studyId": "ot_29"
13:25 kcranstn yup. already checked that
13:25 jimallman kcranstn: …so the tricky part may just be to emulate the folder structure used in the phylesystem
13:29 jimallman you probably already know this, but here’s the intent as i understand it:
13:29 jimallman study/{SOURCE_PREFIX}_{LAST TWO DIGITS}/      is intended to gather all studies with a given prefix *and* the specified final digits.
13:29 jimallman it’s easier to understand using a well-populated example like this: https://github.com/OpenTreeOfLife/phylesystem-1/tree/master/study/pg_01
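The folder rule described above can be sketched as a small helper. The name `study_dir` is hypothetical, and zero-padding ids that have only one digit is an assumption that the log does not confirm; only the study/{SOURCE_PREFIX}_{LAST TWO DIGITS} pattern comes from the discussion.

```shell
# Hypothetical helper: print the shard folder for a study id such as pg_101.
study_dir() {
    study_id="$1"
    prefix="${study_id%%_*}"     # "pg" from "pg_101"
    number="${study_id#*_}"      # "101" from "pg_101"
    case "$number" in
        # Assumption: a single-digit id is zero-padded to two digits.
        ?) digits="0$number" ;;
        # Otherwise keep only the last two characters of the number.
        *) digits=$(printf '%s' "$number" | sed 's/.*\(..\)$/\1/') ;;
    esac
    printf 'study/%s_%s\n' "$prefix" "$digits"
}
```

With this rule, `study_dir pg_101` prints `study/pg_01` and `study_dir ot_29` prints `study/ot_29`.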
13:36 kcranstn we’ve skipped a few numbers, e.g. http://tree.opentreeoflife.org/curator/study/view/ot_10
13:39 kcranstn and I see things in the UI that I don’t see in the github repo
13:39 kcranstn e.g. http://tree.opentreeoflife.org/curator/study/view/ot_90
13:41 kcranstn so it is hard to tell what the next number is
13:42 jimallman kcranstn: eek, good catch! that’s the lag between phylesystem-api’s “local” repo and the remote on GitHub.
13:42 kcranstn can we shorten that lag?
13:42 jimallman it has become substantial (it’s on my to-do list for today, already have a thread open with mtholder)
13:42 jimallman it might be misconfigured webhooks, which will be easy to clean up.
13:44 jimallman in the meantime, we’ll need to grab the next available id in the phylesystem-api repo, and either add the study there or pull quickly from github to reserve its id
13:44 jimallman (adding the study there would be preferred. we can do this using ssh on ot15.)
13:45 towodo could we be slow due to oti indexing?  I don’t know how that’s initiated
13:46 towodo should be backgrounded
13:46 jimallman the lag in question is hours/days
13:47 jimallman i don’t see conflicts in git, so the repo is just not sync’ing with github for some reason
13:49 jimallman towodo: just to beat the dead horse a little more, i reviewed the deployment scripts for the ‘opentree’ component (updates python virtualenv, installs web2py and the main webapp)… like you, i don’t see any way this could/should affect neo4j on the system. the only overlap i can see is a shared apache instance, which is of course restarted.
13:50 towodo and the apache isn’t really shared. it is just a client of taxomachine. no shared state.
13:51 * jimallman nods
13:52 jimallman i thought perhaps there was some conflict in how we set apache directives for virtualhosts or proxy’ing… but we’ve always mixed web2py and neo4j in our typical “api servers”, because phylesystem-api is a web2py app.
13:55 kcranstn where is the code that determines “what is the next available id?"
13:55 kcranstn I don’t see a method in the docs for phylesystem-api
13:59 josephwb any idea when ott will be updated in the name mapping?
13:59 jimallman i’m looking for it now (in the create-new-study code)
13:59 jimallman kcranstn: ^
14:00 josephwb we turned off TNRS in treemachine, figuring curator will do better job
14:00 josephwb but curator is ott2.6, and treemachine is using ott2.8draft5
14:00 josephwb i.e. some new names
14:01 towodo curator is ott2.8
14:01 towodo or it should be. on both prod & dev
14:01 towodo curator talks to treemachine and treemachine should be ott2.8.  not sure how to confirm
14:02 towodo could look for new or deprecated ids
14:03 towodo oops taxomachine!
14:03 towodo delete delete delete
14:03 towodo : curator talks to taxomachine and taxomachine should be ott2.8.  not sure how to confirm
14:04 josephwb don't think that is right
14:05 jimallman kcranstn: i’ve tracked down the new ID “minting” code in peyotl: https://github.com/OpenTreeOfLife/peyotl/blob/8dcf3bcd72d9c8a88d46a8bd518e6f9475125141/peyotl/phylesystem/__init__.py
14:06 josephwb towodo: genus "Polyascus" is major_rank_conflict_inherited in ott2.6, but good in ott2.8
14:07 josephwb i have a tree with Polyascus taxa, but the curator cannot map them
14:08 towodo this is in 2.8 but not 2.6 5307743|3839809|Tolyposporella puccinioides|species|if:547190|||
14:09 towodo darnit.
14:10 towodo no, seems ok.
14:10 towodo Tolyposporella puccinioides  shows up in the ‘search for taxon’ box, so taxomachine must be 2.8
14:10 towodo what’s evidence against?
14:10 towodo oh.
14:10 towodo you just gave it.
14:13 towodo ok, your evidence is better than mine.
14:19 blackrim joined #opentreeoflife
14:20 jimallman kcranstn: to get the next available study ID, we can use the ‘phylesystem_config’ service like so:
14:20 jimallman http://api.opentreeoflife.org/phylesystem/v1/phylesystem_config
14:21 jimallman This returns lots of JSON, including shards[0][‘_next_study_id’], which is currently 99
14:21 kcranstn cool, thanks
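One way to pull that value out on the command line, as a crude sketch. The function name `next_study_id` is hypothetical, and it assumes the value is a bare number as reported above; it uses plain text matching and takes the first match (the first shard), so a real JSON parser would be more robust.

```shell
# Hypothetical helper: read phylesystem_config JSON on stdin and print the
# first "_next_study_id" value found.
next_study_id() {
    grep -o '"_next_study_id"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
        | head -n 1 \
        | grep -o '[0-9][0-9]*$'
}
```

Usage would be along the lines of `curl -s http://api.opentreeoflife.org/phylesystem/v1/phylesystem_config | next_study_id`.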
14:22 towodo josephwb, I checked Aphanius lunatus which is in 2.8 but not 2.7 or 2.6.  Looks like taxomachine mistakenly got set up with the database from ot10 instead of the one from ot12.  we should update this.
14:23 josephwb ok
14:23 josephwb should help with microbe studies
14:23 towodo it’s fine on devtree.
14:23 josephwb but animals too
14:23 jimallman any chance the new db will explain our missing-infraspecies puzzle?
14:24 towodo don’t know, will look
14:25 josephwb towodo: who should update the taxomachine db?
14:25 josephwb (not em)
14:25 josephwb (not me)
14:25 towodo I should do it.
14:25 towodo it will mean a short downtime
14:26 towodo no, a long downtime, since oti will need rebuilding. maybe 1-2 hours
14:27 josephwb we need it
14:27 towodo ok, we shouldn’t wait until 5pm? I guess I can do taxomachine right away, it would be fairly fast (just copy db from devapi to api & restart)
14:27 towodo shouldn’t matter if oti is behind
14:28 josephwb right
14:28 towodo looking into strains now
14:30 kcranstn joined #opentreeoflife
14:34 jimallman kcranstn: i’ve started a HOWTO to capture the steps in recovering a study from the wrong repo: https://docs.google.com/a/ibang.com/document/d/12ZPhIo5NEGmwhG8n9_mwOBlGi5yf1c8GmbecgpVVHYE/edit
14:34 jimallman we can move it to a wiki, but i’m not sure of the right place.  i just wanted to get the ball rolling.
14:35 kcranstn thanks! I’d like to figure out how to script this using peyotl so that the files go through the validator (not necessary for moving between opentree dev and opentree prod but thinking about other users)
14:36 towodo josephwb, both 2.8 and 2.6 have Chlorobium luteolum DSM 273. In 2.8 it’s marked infraspecific, in 2.6 it’s not.
14:36 jimallman agreed. it’s tempting to support importing (study “creation”) from NexSON
14:36 kcranstn it was one of the big motivations for using github
14:37 jimallman yes. i’m chasing the lag situation now, which makes everything more confusing. mark says the lag should be seconds, not hours.
14:42 josephwb towodo: I don't know anything with the infraspecific problems.
14:43 towodo right, I have to believe Cody that it’s not a taxomachine issue (can look at the source again…)
14:54 towodo kcranstn, are we talking today at 11 or are you doing the hackday?
14:54 kcranstn let’s chat
16:12 jimallman kcranstn: i think i found the source of the lag (oti and GitHub repo). it was a configuration error in the production setup. I have a PR request for Jonathan to review, since this will require another push to production.
16:12 kcranstn cool!
16:12 towodo ok, ready to deploy a new taxomachine database, at joseph’s request. this would mean an outage, I don’t know how long, less than an hour I think. thoughts?
16:12 towodo on production
16:13 towodo this is with taxonomy 2.8 which might fix some problems
16:13 jimallman towodo: while you’re at it, please try this update as well (should fix our repo-lag problem): https://github.com/OpenTreeOfLife/deployed-systems/compare/fix-phylesystem-1-access
16:15 towodo the trouble is that devapi had 2.8 so curators got used to it. but when we went to production api kept 2.6
16:15 jimallman wow
16:16 towodo kcranstn, how do you feel about short-notice production downtime of about an hour in the middle of the day?
16:17 towodo I’m hesitant
16:17 kcranstn um… not thrilled
16:17 kcranstn can we send out a note and do this at the end of the day instead?
16:17 towodo we can wait until 5 like we did before.  sure
16:17 kcranstn does that work for you and jim?
16:18 towodo works for me, it was josephwb who asked for it asap
16:18 towodo and i understand why
16:19 towodo jimallman, will you be around at 5?
16:19 jimallman after 5pm seems best, unless it’s leading to bad data + later cleanup
16:19 jimallman sure, i’ll be here
16:19 towodo ok. let’s plan on that
16:19 jimallman (though maybe we can try my PR above before 5pm? it should have no appreciable downtime)
16:20 towodo fine with me. although I don’t understand it
16:20 towodo oh… dawning light…
16:21 jimallman see the linked GH issue. pushes from phylesystem-api to GitHub have been failing. looks like the wrong .pem file was specified
16:22 josephwb please let me know when ott is updated on prod
16:23 josephwb i can't really do much until that is done
16:23 josephwb towodo ^
16:24 towodo jimallman, go ahead
16:24 jimallman ok… pls stand by...
16:24 towodo josephwb, it will be today at 5
16:25 josephwb started at 17:00, right. will it take long? when will it be working?
16:25 towodo sending email now
16:26 towodo it should take about 15 minutes.
16:26 josephwb got it
16:29 jimallman for the record, that was:  $ ./push.sh -c ../../deployed-systems/Bos/api.config api
16:29 jimallman and we have a successful push!   $ curl -vv -X PUT http://api.opentreeoflife.org/phylesystem/push/v1
16:30 jimallman sorry, that link is only good from cURL
16:32 jimallman github repo (phylesystem-1) is looking up to date now:  https://github.com/OpenTreeOfLife/phylesystem-1/tree/master/study
16:32 jimallman checking oti (and re-indexing webhook) now…
16:33 jimallman josephwb: please take a look here for some of your more recent studies: http://tree.opentreeoflife.org/curator
16:37 jimallman i see that Lützen, 2003 is there, but it seems to be an old/deleted version (ot_58, which is no longer in the repo). looks like oti is either still re-indexing, or the webhooks need fixing. checking this now...
16:41 josephwb jimallman: ot_58 wasn't working for some reason (could not view the tree); i started a fresh new study (ot_98)
16:41 josephwb that should be ot_98, not ot_9sunglasses
16:42 josephwb i deleted ot_58
16:42 jimallman right :)   apparently someone deleted it (Bryan, perhaps?) and oti never recognized the change.
16:42 jimallman oh, OK. yeah, it looks like the webhook that updates oti had an old API method URL in it, so the index wouldn’t know this yet. testing a fix now…
16:44 josephwb when I search for Lützen in the curator I only get ot_58 (though ot_98 is good). makes sense now with what you say about oti
16:45 jar joined #opentreeoflife
16:46 josephwb we are almost up to 100 studies using the curator (less the few that were deleted). Yay!
16:51 josephwb kcranstn: regarding downtime, is anyone else using this other than me?
16:51 josephwb seems like a taxon db could be built locally, copied to the server. downtime should not be long (?)
16:52 kcranstn we have been promoting the tools more widely lately, and it looks bad to have the main website down in the middle of the day without any warning
16:54 josephwb alright
17:11 kcranstn @jimallman: given the unexpected work yesterday / today, do we have an ETA on the deployment (I assume on dev) of the new curator features (testing inferred vs defined ingroup)
17:11 jimallman kcranstn: i should have that done tonight, just need to match the latest API changes from Cody
17:28 jimallman josephwb: please check oti again, it should be up to date now: http://tree.opentreeoflife.org/curator
17:29 josephwb jimallman: not sure what you mean
17:29 josephwb check for my study? oh.
17:30 jimallman i mean the old Lutzen should be gone (since it was deleted), and your new one should be OK
17:30 josephwb yes, looks good
17:31 jimallman cool, thanks
17:31 josephwb fyi only hits with "Lützen", not "Lutzen". something should be changed there
17:31 josephwb oe weird-lettered people will never be found
17:32 josephwb "oe" == "or"
17:40 josephwb jimallman ^
17:40 josephwb this is an oti issue, right?
17:50 jimallman in this case, the filtering is done in client-side JavaScript. so I’d need to add a series of diacritical substitutions wherever a “questionable” letter is found.
17:51 jimallman not trivial, but not a big deal. i’ll make an issue for this
17:55 jimallman https://github.com/OpenTreeOfLife/opentree/issues/381
17:57 kcranstn I think I successfully POSTed chris owen’s dev study to production
17:58 kcranstn $  data  curl -X POST "http://api.opentreeoflife.org/phylesystem/v1/study/?auth_token=$GITHUB_OAUTH_TOKEN" --data-urlencode nexson@ot_99.json
17:58 kcranstn {"description": "Updated study #ot_99", "branch_name": "master", "resource_id": "ot_99", "sha": "65c57781ab656b889841568195ad4716583c90d2", "merge_needed": false, "error": 0}
17:59 kcranstn hmmm: http://tree.opentreeoflife.org/curator/study/view/ot_99
17:59 kcranstn damn
18:00 kcranstn looks like it created a new (empty) study rather than uploading the json
18:01 kcranstn following instructions here: https://github.com/OpenTreeOfLife/phylesystem-api/tree/master/docs#creating-a-new-study
18:08 kcranstn a subsequent PUT following instructions here: https://github.com/OpenTreeOfLife/phylesystem-api/tree/master/docs#updating-a-study
18:08 travis-ci joined #opentreeoflife
18:08 travis-ci [travis-ci] OpenTreeOfLife/phylesystem-api#571 (master - 7a45cf6 : Jim Allman): The build passed.
18:08 travis-ci [travis-ci] Change view : https://github.com/OpenTreeOfLife/phylesystem-api/compare/97ddd2461930...7a45cf62eb8c
18:08 travis-ci [travis-ci] Build details : http://travis-ci.org/OpenTreeOfLife/phylesystem-api/builds/30578543
18:08 travis-ci left #opentreeoflife
18:08 kcranstn gives me an “invalid arguments” error
18:08 kcranstn argh
18:11 kcranstn ^jimallman - any ideas?
18:13 * jimallman is catching up now...
18:14 kcranstn (I am putting notes in your google doc)
18:14 jimallman thanks, reviewing the docs
18:18 kcranstn I am going to do a simple copy and update through git push
18:21 kcranstn unless you think that is a bad idea
18:21 jimallman or try the alternate curl method using --data-binary ..?
18:22 Jar286 joined #opentreeoflife
18:22 jimallman any chance you were not in the same directory as ot_99.json?
18:22 kcranstn nope
18:22 jimallman i haven’t used these methods with curl in a long time, looks like they’ve changed :-/
18:23 Jar286 (trying new irc client)
18:23 kcranstn aren’t these the methods we are using in curator?
18:24 jimallman good point. i’ll take a look at how we construct the calls there.
18:31 jimallman ah, this method has become complicated by the various import and creation methods. if the phylesystem-api can’t figure it out, it assumes “manual entry” and creates an empty study.
18:34 jimallman kcranstn: i’ve added a GH issue to restore these ways of creating/updating a study: https://github.com/OpenTreeOfLife/phylesystem-api/issues/97
18:35 jimallman meanwhile, we don’t seem to have a way of creating a study from existing NexSON, so your approach of using git makes sense.
18:35 jimallman do you want to ssh to ot15, or push to a local clone and get them in sync?
18:35 kcranstn but the PUT should still work, right?
18:36 kcranstn I think I’ve just got problems with quotes and ampersands in the curl call
18:36 jimallman hm, true. let me check, that would be simpler.
18:45 jimallman kcranstn: here’s a working API call to update a study (on devtree): https://gist.github.com/jimallman/b3f668a3640d50ece3d0
18:46 jimallman i’m looking at this now, paring down unneeded headers...
18:48 kcranstn are all of those options needed? trying to figure out why my much simpler example does not work
18:50 jimallman certainly most of them are not. it’s possible we’re expecting commit-msg and others in the query string...
18:50 jimallman but first, let’s remove the leading ‘nexson’ from nexson@blah
18:50 kcranstn I have to say, our documentation is a mess
18:50 jimallman i don’t see that kind of prefix or name in any example, just --data @myfile.txt
18:51 jimallman this might be from Duke’s original implementation… cobwebs
18:52 jimallman ah, i see why we have nexson@filename, it adds this var-name and url-encodes the nexson.
18:53 jimallman please try the alternate form (--data-binary @10-modified.json --compressed) shown here: https://github.com/OpenTreeOfLife/phylesystem-api/tree/master/docs#updating-a-study
18:54 kcranstn nope
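(A note on the two body forms discussed at 18:52 and 18:53: with curl, `--data-urlencode nexson@file` sends a form field named `nexson` whose value is the URL-encoded file content, while `--data-binary @file` sends the file bytes unchanged as the request body. The sketch below imitates both forms so the difference is visible. The helper names and the tiny NexSON-like document are made up for illustration, and the escaping may differ in detail from what curl emits.)

```python
import json
from urllib.parse import parse_qs, quote


def form_encoded_body(var_name, raw_text):
    """Imitate --data-urlencode name@file: 'name=' plus the URL-encoded content."""
    return f"{var_name}={quote(raw_text, safe='')}"


def raw_body(raw_text):
    """Imitate --data-binary @file: the content is sent exactly as stored."""
    return raw_text


# Hypothetical stand-in for the contents of a NexSON file.
nexson_text = json.dumps({"nexml": {"@id": "ot_99", "note": "a & b"}})

encoded = form_encoded_body("nexson", nexson_text)
plain = raw_body(nexson_text)

print(encoded)
print(plain)

# A server that expects the form field recovers the JSON by decoding it;
# a server that expects a raw JSON body can parse the body directly.
recovered = parse_qs(encoded)["nexson"][0]
```

A server written for one form will not find the NexSON where it expects it if the client sends the other.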
19:00 jimallman here’s a suggested call (needs a couple of substitutions, see heading): https://gist.github.com/jimallman/b3f668a3640d50ece3d0#comment-1267986
19:02 jimallman ah, this should be the proper starting_commit_SHA (from creating the empty study), unless your tests are generating new SHAs: 65c57781ab656b889841568195ad4716583c90d2
19:06 kcranstn http://tree.opentreeoflife.org/curator/study/view/ot_99/?tab=metadata
19:06 kcranstn OMG, that was painful
19:08 kcranstn so, the issue is that there are more necessary options than what is in the current documentation?
19:13 jimallman maybe so, one or more of the query-string options. i’m testing a similar call here, will see how much this can be simplified
19:14 kcranstn I am worried about this sort of thing happening at the hackathon and causing many wasted hours
19:16 jimallman agreed. we’ll need to test the calls we’ve promised, and tweak the code (or docs).
19:16 jimallman would you mind sending along the call that worked for you? in a gist, i guess.
19:18 kcranstn I put it in that google doc
19:43 jimallman kcranstn: i’ve checked for required and optional arguments, and the docs may be right after all.
19:43 jimallman for PUT, that is. POST (creation) is just wrong
19:43 jimallman i’m adding comments to the google doc
19:45 kcranstn now I am getting a “You have provided an invalid or expired authentication token” error. Same oauth_token
19:59 kcranstn wow, almost none of the calls on that page work
20:15 * jimallman is checking the google doc for the failed calls..
20:19 jimallman kcranstn: can you add the failed calls to our google doc for a closer look?
20:19 kcranstn annotating the readme in the docs folder instead
20:19 jimallman ah, OK
20:19 kcranstn but off to a reception at the moment
20:20 kcranstn most of the fails are fairly straightforward base URL fixes
20:20 jimallman ok, thanks for doing that. i’ll work on cryptic error messages...
20:20 kcranstn @josephwb - nice summary on the thread about metadata
20:20 kcranstn and +1 for the use of “controlled vocabulary”
20:37 * jimallman will be back ~5pm to monitor the push to production
21:07 towodo joined #opentreeoflife
21:07 * towodo ?
21:07 * towodo am I connected?
21:08 * towodo yes.
21:09 towodo blimey. it’s a cascade of dependencies
21:14 jimallman towodo: hi, i’m here if you need a second pair of eyes/hands on production
21:14 jimallman and yes, you are connected
21:14 towodo hi. I’m stressed by all the taxomachine changes, and by Cody’s silence.
21:14 towodo he claims the new taxomachine db needs the new plugin features. sounds like a big risk to me
21:15 towodo thinking of setting up everything on ot18 and then doing switcheroo
21:16 towodo wish I had known. maybe I should have. we could have used the db on ot12 but it’s been deleted
21:17 jimallman i had no idea so much work was underway there.
21:17 towodo what I hadn’t counted on was that the taxomachine db had been rebuilt on devapi since we moved off of ot12.
21:17 towodo is that stuff deployed on ot15 ?
21:18 towodo that’s the whole reason we have the incompatibility, right? (new db won’t run with old plugin?)
21:18 towodo I have until about 9pm… but hate surprises so may reschedule
21:18 towodo going to start in on ot18 now.
21:18 towodo https://docs.google.com/document/d/1JU3kXu7zQHG0ZriwkUIbZuR9QhTitykRKeV8L1491IU/edit
21:19 jimallman i’ll check ot15 and see what code is there..
21:20 towodo thanks
21:21 jimallman ot15:taxomachine is branch Bos, latest commit from June 17: https://github.com/OpenTreeOfLife/taxomachine/commit/e8d6ce264f0a00a6755c9e8ec2905f1f009e303c
21:21 towodo ok, so what about on devapi? (ot10?) have we been testing it?
21:24 jimallman looks pretty recent, but latest commit was July 17 (5 days ago).
21:24 jimallman since then, the only substantive change is a merge of branch new_features (oh, that’s the 40 files changed). so no, it seems to be untested on devapi.
21:25 towodo it has the commit id… compare to taxo commit log ?
21:25 jimallman right, latest commit on ot10 is https://github.com/OpenTreeOfLife/taxomachine/commit/31e66351c58c09a4acfe45d297e76a90f7d7b75b
21:25 towodo (oh by the way i’m going to change the ssh keys for the production machines)
21:25 towodo i say we abort.
21:26 kcranstn joined #opentreeoflife
21:26 towodo joseph is out of luck for a while
21:26 jimallman yeah, it’s a biggie. he does say these features are for the hackathon. kcranstn, do you know about this work on taxomachine?
21:27 kcranstn no, I don’t
21:27 jimallman i’m reviewing commit logs now, and nothing here implies that it will fix broken API stuff.. just general improvements, i think.
21:28 josephwb can you guys email me if this gets up?
21:28 towodo it won’t get up today.
21:28 josephwb ok, good to know
21:28 towodo we can’t proceed without some testing of cody’s changes
21:29 josephwb i could always work my deployment magic ;-)
21:29 towodo and without knowing what the condition of the database on devapi is
21:29 towodo too much danger of breaking everything.
21:29 kcranstn @towodo - can you send a summary to the opentree software list?
21:30 towodo yes
21:30 kcranstn thanks
21:30 towodo so… let’s plan on downtime maybe Friday?
21:30 josephwb really?
21:30 josephwb i am trying to push through on synthesis
21:31 josephwb never mind, i can set up something local
21:31 jimallman looks like we should deploy new taxo code and db to devapi, and also newest oti (for compatibility)
21:31 towodo so where do we get a new db?
21:31 towodo there’s no reason oti has to be compatible with taxomachine - there’s no interaction
21:32 towodo ok. preparing email
21:32 jimallman i only know what i read in the papers: https://github.com/OpenTreeOfLife/oti/pull/24
21:34 jimallman ah, it’s just a couple of changes to Java package names and import paths:
21:34 jimallman -import org.opentree.taxonomy.TaxonomyRelType;
21:34 jimallman +import org.opentree.taxonomy.constants.TaxonomyRelType;
21:41 josephwb towodo: you said that the devapi was using the most recent ott?
21:42 towodo yes, I think so - but I have no way of knowing, since our lines of communication are poor.
21:42 towodo Let me get the test taxon
21:42 josephwb if so, can we not just point the prod api to that taxonomy db?
21:43 towodo it’s Aphanius lunatus
21:43 towodo many interactions.  interfaces to other components have changed.
21:43 towodo curator especially
21:44 towodo too risky.
21:44 josephwb ok
21:44 towodo needs to be tested first.
21:53 towodo jimallman, it may be time to go to C (Cavia ?)
21:54 jimallman where C is our next “release candidate” for production, yes?
21:54 towodo yes
21:54 jimallman sounds good
21:54 towodo you have all your notes, right? are we closer to scripting?
21:55 jimallman notes, yes. scripts, no.
21:55 towodo I am using https://github.com/OpenTreeOfLife/otb/wiki/Setting-up-the-asterales-system
21:55 towodo as a guide
21:55 towodo it’s almost a script
21:56 kcranstn joined #opentreeoflife
21:57 jimallman yes, nice. i believe the missing piece (in my notes) is mainly a short sequence of operations involving SSL certs: https://gist.github.com/jimallman/2c60be5eb49e7a443133
21:59 jimallman ah, and i also have notes on copying neo4j databases  (incl. “lateral moves”) and some other bits
21:59 towodo I’m thinking we can just recycle ot14 for the front end
21:59 jimallman sounds reasonable.
22:00 jimallman shall i make Cavia branches from master, in all the usual repos?
22:01 jimallman or i can start setting up SSL keys, if ot17 is ready. or modifying deployed-systems, if you prefer.
22:02 towodo don’t do anything until after we get devtree / devapi up and running.
22:02 towodo it’s not ot17, it’s ot18.  ot17 is asterales
22:02 jimallman with latest taxomachine and oti, you mean?
22:03 towodo we may need to fix bugs.  need 1-2 days of testing
22:03 jimallman ah, ok. i got a little frisky there.
22:03 towodo email sent
22:08 jimallman fyi, i believe i’ve found the cause of missing strains and infraspecies critters, at least in the taxon search (header bar)
22:08 kcranstn exciting! what was the issue
22:08 towodo cool!
22:08 kcranstn ?
22:09 jimallman as i suspected, these show up via ‘contextQueryForNames’, but not in the older ‘autocompleteBoxQuery’.
22:09 jimallman …which is what i’m still using for search (there’s an active issue to move to contextQueryForNames)
22:10 jimallman i’ll check OTU mapping next. i wouldn’t be surprised if it’s the same situation there.
22:10 kcranstn does this explain the OTU mapping as well, or just the search box?
22:10 kcranstn ah, ok
22:11 towodo here is sketch of upgrade plan (similar to before) https://docs.google.com/document/d/1JU3kXu7zQHG0ZriwkUIbZuR9QhTitykRKeV8L1491IU/edit
22:13 jimallman kcranstn: confirmed, i’m still using the old method for OTU mapping as well. i’ll move this up in the to-do list, since it’s obviously a pain point for curators.
22:40 kcranstn joined #opentreeoflife
22:50 towodo jimallman, you there?
22:51 towodo guess not.
22:51 towodo sending email
22:54 jimallman towodo: sorry, here now
22:55 towodo see email.  I’m going to go eat now
22:56 kcranstn thanks for updating the github issue, jimallman
22:57 jimallman sure thing! lots of loose ends at the moment, but i’m hoping to wrap up a few tonight :)
