IRC log for #opentreeoflife, 2015-08-24

All times shown according to UTC.

Time Nick Message
12:24 jar286 joined #opentreeoflife
14:53 kcranstn joined #opentreeoflife
16:58 jimallman fyi - looks like we have a general failure on api.opentreeoflife.org (unable to view or edit studies)… i’m chasing this down now
17:09 jimallman argh, and now the synth-tree viewer is down. not sure what’s up yet...
17:12 jimallman ah, it seems treemachine is down
17:13 snacktavish joined #opentreeoflife
17:13 jimallman snacktavish: any idea how to gently restart treemachine?
17:13 snacktavish none :(
17:13 jimallman (hi, by the way)
17:13 snacktavish hi!
17:13 jimallman ok, i’ll keep digging through my notes
17:14 snacktavish ya, I was just looking at the issue but am a bit mystified thus far...
17:19 jimallman i just restarted treemachine, let’s see if this helps.
17:21 snacktavish Hope so!
17:21 snacktavish I just realized that a script I'm running hits tree machine to get an mrca, and that it was working an hour ago, and then stopped (I'm having other issues and didn't immediately realize that was the problem)
17:22 jimallman yeah, it’s a sudden failure with a familiar symptom (one of our plugins is “missing”)
17:22 snacktavish can I have overwhelmed treemachine with too many mrca queries? It wasn't very many...
17:22 snacktavish ahhhh
17:22 jimallman try ‘curl https://api.opentreeoflife.org/treemachine/v1/getNodeIDForottId’
17:22 jimallman No such ServerPlugin: "GoLS"
17:23 jimallman re: overwhelming treemachine, i’m not sure how robust it is. but it seems unlikely that it could cause this kind of error, right?
17:23 snacktavish yep
17:23 snacktavish I agree
17:24 jimallman historically, we’ve been able to “restore” the “missing” plugins by re-deploying treemachine… but i’d like to review recently merged PRs to make sure it’s compatible with the other already-deployed stuff.
17:24 jimallman see also general treemachine failures here: http://phylo.bio.ku.edu/status/status.html
17:24 snacktavish but where did the plugin go? it doesn't seem that anything should have changed in the last hour or so?
17:26 jimallman agreed, it’s peculiar. we’ve seen this error message before, but one would expect it to appear after a deployment…
17:26 snacktavish also, I didn't realize that the curation editor relied on tree machine, I thought it was just OTI and phylesystem
17:27 snacktavish what is treemachine's role?
17:27 snacktavish (I should know this)
17:27 jimallman synth-tree relies on treemachine.. i’m not sure yet why curation is down, esp since the status page (above) shows working phylesystem API.
17:27 snacktavish ah right.
17:28 jimallman i can’t think of any treemachine dependencies in curation, offhand.
17:28 snacktavish https://github.com/OpenTreeOfLife/opentree/blob/51a96f3885c5eecc3a14547d6c664a0e714d49b8/curator/controllers/study.py#L83
17:28 snacktavish I guess it's here.
17:28 jimallman nothing that should block loading study nexson, in any case
17:28 snacktavish yeah, I agree
17:29 jimallman ! good catch! that does seem to be the specific error on api server (latest SHA not found)
17:29 snacktavish ya, seems like that is a reasonable place, but shouldn't block curation
17:30 jimallman au contraire.. curation uses an “expanded” response with nexson, plus supplemental stuff…
17:30 jimallman including the latest synthesis SHA: https://github.com/OpenTreeOfLife/opentree/blob/51a96f3885c5eecc3a14547d6c664a0e714d49b8/curator/controllers/study.py#L48
17:31 snacktavish ahhhh yiss
17:31 jimallman i believe we use this to notify the curator whether a study has changed since last synthesis
17:31 snacktavish ah, makes sense
17:31 jimallman so i’m going on the hunch (for now) that this is all due to treemachine… which is running, but not responding?
17:32 snacktavish and re-start had no effect?
17:33 snacktavish hmmmm. The rain stopped briefly, so I will contemplate while walking home and then hop back on irc and see if you've solved it!
17:33 snacktavish :)
17:43 jimallman a more thorough stop+start of treemachine has restored both webapps to normal behavior, and cleared all treemachine errors on the API status page (http://phylo.bio.ku.edu/status/status.html)
17:44 jimallman We’ll need more investigation to find the cause. In some weird way, the “missing plugin” message is a red herring, i think. (Very interesting that this can “just happen” without a restart or re-deployment!)
17:45 jimallman It’s also annoying that I didn’t see any Nagios alerts for this! Just a bug report from Romina..
18:03 kcranstn joined #opentreeoflife
18:36 snacktavish joined #opentreeoflife
18:37 snacktavish yep, glad it is solved, but that certainly seems concerning!
18:40 snacktavish @jimallman I wonder if it would be worth changing that treemachine call for if the study changed since synthesis so that curation doesn't go down when treemachine does...
18:41 jimallman yes, it seems sensible to opt for looser coupling in cases like this.
18:43 snacktavish I didn't see anywhere else obvious that it depends on treemachine, but there may be more!
18:43 snacktavish but worth considering
18:44 jimallman yeah, mtholder has recently listed all the API URLs from each app’s config file. some of those are unused, but it’s a good place to start looking.
23:38 guest|51159 joined #opentreeoflife
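
The looser coupling suggested at 18:40 and 18:41 could look roughly like the sketch below. It is not the code in curator/controllers/study.py. The function name, the two injected fetchers, and the response keys are all hypothetical, chosen only to illustrate the idea: the study NexSON is treated as essential, while the latest synthesis SHA is treated as supplemental, so a treemachine failure degrades the response instead of blocking it.

```python
from typing import Callable


def load_study_for_curation(
    study_id: str,
    fetch_nexson: Callable[[str], dict],
    fetch_synth_sha: Callable[[str], str],
) -> dict:
    """Build an expanded study response that survives a treemachine outage.

    All names here are hypothetical; the two fetchers stand in for the
    real phylesystem and treemachine calls.
    """
    # The study itself is essential, so errors here are allowed to propagate.
    response = {"nexson": fetch_nexson(study_id)}

    # The synthesis SHA is supplemental: on any failure, record the error
    # and carry on, so the curator can still view and edit the study.
    try:
        response["latest_synth_sha"] = fetch_synth_sha(study_id)
        response["synth_sha_error"] = None
    except Exception as exc:
        response["latest_synth_sha"] = None
        response["synth_sha_error"] = str(exc)
    return response


# Stand-in fetchers that simulate the outage described in the log.
def fake_nexson(study_id: str) -> dict:
    return {"study_id": study_id}


def broken_treemachine(study_id: str) -> str:
    raise RuntimeError('No such ServerPlugin: "GoLS"')


result = load_study_for_curation("pg_1", fake_nexson, broken_treemachine)
print(result)
```

With this shape, the curation app would show the study and could display a "synthesis status unavailable" notice when latest_synth_sha is None, rather than failing to load.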
