IRC log for #opentreeoflife, 2014-07-21

All times shown according to UTC.

Time Nick Message
00:58 kcranstn joined #opentreeoflife
01:15 scrollback joined #opentreeoflife
02:08 jimallman joined #opentreeoflife
02:12 towodo joined #opentreeoflife
03:19 towodo joined #opentreeoflife
10:59 towodo_ joined #opentreeoflife
13:18 kcranstn joined #opentreeoflife
14:07 josephwb joined #opentreeoflife
14:20 josephwb kcranstn: have you looked at this? https://github.com/OpenTreeOfLife/opentree/issues/345
14:21 josephwb i found the code, and can change it myself. go ahead?
14:22 kcranstn I applaud the inclusion of ‘u’
14:22 josephwb ;-)
14:22 josephwb is there a controlled vocab for this property somewhere?
14:22 kcranstn I can see people being confused about species tree vs bayesian vs ML (“my study is a bayesian species tree method”)
14:22 josephwb right
14:23 kcranstn might be worth some discussion on the main list
14:23 josephwb ok, I'll put it up
14:23 kcranstn thanks!
14:57 josephwb kcranstn: which otX is the current production machine?
14:57 josephwb ot10?
14:58 kcranstn https://github.com/OpenTreeOfLife/deployed-systems/blob/master/opentree-servers.txt
14:58 kcranstn looks like that hasn’t been updated
14:59 kcranstn should bring that up at the call today
15:00 josephwb sounds good
15:59 pmidford joined #opentreeoflife
16:49 jimallman joined #opentreeoflife
16:50 kcranstn joined #opentreeoflife
16:58 josephwb1 joined #opentreeoflife
17:02 kcranstn https://plus.google.com/u/0/events/cvh3d2bf4o5s7ursug1v7a9v8ao
17:58 kcranstn @jimallman - can you look into this asap: https://github.com/OpenTreeOfLife/opentree/issues/378
17:59 jimallman yes, will do
18:01 josephwb joined #opentreeoflife
18:03 kcranstn_ joined #opentreeoflife
18:18 towodo joined #opentreeoflife
18:23 josephwb jimallman: why are there no commits more recent than 11 days ago on phylesystem-1?
18:23 josephwb https://github.com/OpenTreeOfLife/phylesystem-1/commits/master
18:25 kcranstn_ clearly need bigger warning text on dev site, particularly in curator app
18:25 josephwb i am not on dev site
18:25 josephwb but, yes
18:25 kcranstn_ yes, but chris was
18:26 josephwb yes, i know
18:26 kcranstn_ your problem is unique ;)
18:27 kcranstn_ @jimallman - perhaps we need a warning popup when you’re creating / editing studies on dev
18:28 jimallman i suppose so, yes. i had a super-obnoxious DEVELOPMENT header at one time, and it just seemed like too much. maybe there’s no such thing as too much. :)
18:29 kcranstn_ can we move chris’ commits over to phylesystem-1?
18:29 jimallman re: painful lag between “local” phylesystem-api repo and GitHub… acknowledged, sending an email now.
18:30 jimallman kcranstn_: i believe so, researching this now. obviously, i’d like to maintain history if possible, and make sure that “inner” and “outer” study ids are consistent.
18:30 kcranstn_ explain inner and outer?
18:33 josephwb kcranstn_ did we ever figure out which machine was the prod server? this is out of date, yes? https://github.com/OpenTreeOfLife/deployed-systems/blob/master/opentree-servers.txt
18:33 jimallman i suspect that the history (and the NexSON doc) might refer to this study’s own ID. if i’m going to move a study from phylesystem-0 to -1, i’m inclined to change its ID to the next available one in phylesystem-1.
18:33 josephwb yes, b/c there is already a study ot_29
18:33 kcranstn_ yes
18:33 jimallman of course, it needs to change.
18:34 jimallman re: which is the production server? try the dig tool:  $ dig tree.opentreeoflife.org
18:34 kcranstn_ @towodo - see @josephwb question re: opentree-servers.txt
18:34 jimallman kcranstn_: josephwb ^^ see my answer above
18:35 towodo must have been updated on a different branch. will check
18:35 jimallman jonathan has set up server names so that dig calls will answer this question. currently tree.opentreeoflife.org is ot14, and api.opentreeoflife.org is ot15
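
A minimal sketch of the lookup jimallman describes, for anyone who wants the short machine name rather than raw dig output. It assumes the service names are aliases (CNAMEs) for the otNN hosts, so that `dig +short` prints the target host name on its first line. `short_host` is a hypothetical helper, not part of the project's tooling.

```shell
#!/bin/sh
# Hypothetical helper: reduce a fully qualified name such as
# "ot15.opentreeoflife.org." to its first label ("ot15").
short_host() {
    # Read names on stdin; print the first label of the first line only.
    # Prints nothing if the first line contains no dot.
    sed -n '1s/\..*$//p'
}

# Usage (needs network access, so it is left commented out):
#   dig +short api.opentreeoflife.org | short_host
#   dig +short tree.opentreeoflife.org | short_host
```

If a name resolves straight to an address with no alias, the first line of output is an IP address and this helper would not give a useful answer.
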
18:36 josephwb got it
18:37 towodo ok, pushed updated deployed-systems to master. sorry about that
18:38 josephwb there are no config files in the deployed-systems/development repo for ot14 and ot15
18:38 towodo that’s correct.
18:38 josephwb should there be?
18:38 towodo no.
18:39 towodo production is Bos
18:39 towodo = ot14 + ot15
18:39 josephwb "Bos"?
18:39 towodo starts with the letter B and is short.
18:39 towodo https://github.com/OpenTreeOfLife/deployed-systems/blob/master/genera.txt
18:40 josephwb how to update (e.g. with the push.sh -c …)?
18:40 towodo what do you want to update?
18:40 josephwb curator, but just generally asking
18:40 towodo do you want to update production or development?
18:40 josephwb prod
18:42 towodo you would say something like, ./push.sh -c ../../deployed-systems/Bos/ot15.config opentree
18:42 josephwb oh, i see now. thanks.
18:46 jimallman josephwb: important: in order to do the above, you’d first need to move your (cloned) deployed-systems repo to branch Bos:
18:46 jimallman $ git checkout Bos
18:47 josephwb erg.
18:47 josephwb already started
18:47 josephwb did i break everything?
18:47 jimallman ah, i might have steered you wrong there.
18:49 jimallman i suspect you’re fine. we started making release-specific branches (Atta, Bos) but realized that wasn’t in the spirit of this particular repo.
18:49 jimallman it was set up to stay on master, with release-specific folders (Atta, Bos, …) and an ever-present development folder
18:49 josephwb ok
18:49 jimallman josephwb: ^ so it’s all good
18:50 jimallman i’m going to delete Bos branch to avoid (my) confusion in the future   :-/
18:50 josephwb it didn't work for me anyway.
18:50 josephwb Cannot find GITHUB_CLIENT_SECRET file ..
18:51 jimallman Yes, there’s a hefty family of secure files that don’t get committed to the repository.
18:52 jimallman API client secrets, private keys, etc.
18:52 jimallman if you’re seriously interested in using the deployment tools, we’ll need to get these to you in a secure way (probably upload to one of the servers, then you’d use scp to pull them down).
18:53 josephwb it might come in handy
18:53 jimallman Agreed. I’ll put that on my list.
18:54 josephwb no rush
18:54 jimallman FYI, here’s a (pretty) complete list of the secure files. There are a few more now…
18:54 jimallman https://github.com/OpenTreeOfLife/deployed-systems/blob/master/README.md#sensitive-information
18:54 jimallman (I should clean up that list and purge some of the old key files.)
19:00 blackrim joined #opentreeoflife
19:00 blackrim josephwb: i am getting a failure to download past 1776. are you getting this?
19:00 josephwb let me try
19:01 josephwb nope, works for me
19:01 josephwb which file are you using?
19:01 blackrim I will try again
19:03 josephwb are you using one of my files? note that I have set up my conf differently
19:04 blackrim I am using your metazoa (copied it)
19:06 blackrim still failing but trying a couple other things
19:19 josephwb jimallman: is the server down?
19:20 jimallman which?
19:20 jimallman josephwb: ^
19:20 josephwb tree.opentreeoflife.org
19:20 josephwb won't load
19:24 jimallman i see. i’ll re-do the push that failed for you (it probably just stopped apache and wouldn’t restart)
19:24 josephwb right. of course i broke it.
19:34 towodo josephwb, it pushes the Bos branch of all repos that get pushed, had you modified a Bos branch?
19:34 josephwb not to my knowledge
19:34 josephwb just used the config file you specified
19:34 towodo so then the push would have been a no-op. what were you trying to do?
19:35 josephwb update the curator
19:35 towodo the one I specified gives the Bos branch.
19:36 josephwb i didn't check out Bos
19:37 towodo curator would have been ot14 anyhow
19:37 josephwb i tried that too
19:38 towodo we need to be careful about updating production.. the setup is designed to protect it, so that makes it sort of complicated
19:38 towodo I didn’t know you were going to just do it, thought you were asking for general knowledge
19:39 josephwb whoops
19:39 josephwb sorry about that
19:39 josephwb should have been clearer
19:39 towodo it’s ok (assuming jimallman is going to succeed in repairs, which I assume he will)
19:40 jimallman towodo: something’s funny on ot14 (tree.opentreeoflife.org)… i’m trying to restart apache, but port 80 is tied up.. working on it now.
19:41 towodo try shutting it down, then wait a few secs, then start
19:41 kcranstn joined #opentreeoflife
19:42 towodo apache2ctl stop; sleep 5; apache2ctl start
20:00 jimallman towodo: yes, i’ve restarted a few different ways. perhaps the problem is elsewhere. it’s weird, the apache logs show successful requests (response=200), but my requests never return.
20:00 jimallman josephwb: can you reach tree.opentreeoflife.org now?
20:00 towodo hanging
20:01 jimallman UPDATE: curator is working, but the synth-tree viewer on tree.opentreeoflife.org is down
20:01 jimallman probably an “upstream” problem on api.opentreeoflife.org
20:01 kcranstn ok, we need to make sure this doesn’t happen again
20:01 kcranstn what made that possible?
20:02 towodo I gave Joseph a little knowledge & didn’t know he’d use it
20:02 kcranstn lessons learned ;)
20:03 josephwb Lesson: never give anything to Joseph.
20:04 towodo there’s no favicon.ico.
20:04 towodo coming through that is.
20:05 towodo do jimallman, you did a push and it succeeded, and now apache is broken? that’s very odd
20:06 towodo s/do/so/
20:06 jimallman i’m tracing the problem now. it’s not apache on the webapp server (tree.opentreeoflife.org), because the curation tool works there.
20:07 josephwb sorry jimallman
20:07 towodo http://tree.opentreeoflife.org/curator fails for me
20:07 jimallman i suspect the real problem is on api.opentreeoflife.org, a failed request from the tree-browser webapp… but it’s not leaving any obvious clues
20:07 towodo as does http://tree.opentreeoflife.org/contact
20:07 towodo and http://tree.opentreeoflife.org/favicon.ico
20:08 jimallman hm, these are failing for me now too. but i was just using that curator (still open in a browser tab)
20:08 josephwb seems to be working now
20:08 jimallman is it possible we’re tying up apache workers with synth-tree requests that take a looong time to fail?
20:09 josephwb er, it was working for a second
20:09 jimallman josephwb: same here, my requests are eventually returning (curator)
20:09 josephwb and it is back
20:09 towodo favicon.ico working
20:10 towodo well that was odd.  theories?
20:10 josephwb i cannot edit a study
20:11 jimallman again, my hunch is we’re somehow tying up apache workers (or some other resource), so we get intermittent breakage
20:11 towodo hmm.
20:11 towodo I just tried the home page, that fails.
20:12 towodo did you try to redeploy ot15?
20:12 towodo i.e. api?
20:12 jimallman yes
20:12 jimallman looked like a clean/good deployment
20:12 towodo we have 6 apache workers
20:12 jimallman FYI - GitHub is up and running fine: https://status.github.com/
20:13 towodo I see 3 javas running on ot15
20:14 towodo maybe reboot??? we would need to manually restart the three neo4js maybe …
20:17 jimallman agreed, i’ll reboot ot15. i don’t see any good reason why this should have failed.
20:17 jimallman towodo: ^
20:18 towodo (I see lots of debugging info in the apache error log. may be time to flush this)
20:18 towodo reboot… hmm, need to think this through, hang on…
20:18 towodo I’ll check amazon to make sure it won’t kill the server
20:19 jimallman towodo: i was just thinking the same. looks like the safer route would be through the EC2 web dashboard…
20:19 jimallman http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-reboot.html
20:20 towodo yes, i’m there
20:20 towodo yep, I see the control.
20:20 towodo so, I’m to reboot ot15, then we manually restart the three neo4js? double checking
20:21 towodo jimallman ^
20:21 jimallman yes, or we could just push/deploy to ot15 again (i’m inclined to try this)
20:21 jimallman towodo: ^
20:22 towodo ok, let’s try that.  there may be a bug in the neo4j stuff, looking at a hunch now
20:23 towodo nope, the bug I imagined is not there.  re-pushing will definitely restart the servers (but won’t stop them unless the repos have changed)
20:23 towodo looking at push log on ot15
20:24 * jimallman is pushing now
20:24 towodo joseph attempting an opentree install on ot15, which doesn’t want it.  that led to an ncl install…
20:24 towodo and a peyotl install…
20:25 towodo s/ting/ted/
20:25 towodo I need to step out for about 1/2 hour, sorry
20:25 jimallman ok, i’ll keep at it (push complete, testing now)
20:25 towodo jimallman ^
20:25 jimallman thanks for the assist
20:25 towodo np, thanks for taking initiative
20:26 jimallman towodo: would you mind rebooting ot14? this is not working (yet)
20:38 jimallman kcranstn: do you have access to our AWS web dashboard, to restart ot14?
20:39 kcranstn I do have access
20:40 kcranstn you want me to reboot ot14?
20:41 kcranstn jimallman ^
20:41 jimallman kcranstn: please do
20:43 kcranstn ok
20:43 kcranstn done
20:43 jimallman thanks
20:44 kcranstn let me know if that worked - UI doesn’t give much feedback
20:51 jimallman kcranstn: it didn’t hurt, but didn’t help much. something is holding up most of our web requests, but it’s not clear what. sometimes a complex page comes right back, then another request will stall for minutes.
20:52 jimallman (we’ve restarted both production servers and i’ve re-deployed everything to both.)
20:53 jimallman i need to step away for ~30 minutes, then i’ll get right back to the hunt. jonathan will be back before then and might have more insights.
20:54 josephwb sorry jimallman for the extra work
20:55 jimallman it’s ok, we’ll sort it out. this is actually useful information, since i (for one) didn’t think a failed deployment could cause such a failure.
20:55 josephwb You're right: I'm sort of a debugging hero.
20:56 josephwb jk
20:56 josephwb I feel horrible about this
21:07 towodo josephwb, don’t.
21:13 towodo I’m thinking: halt all services on ot15, flush the repo/ dirs except phylesystem, then retry
21:14 towodo oops got the numbering backward!
21:14 towodo I meant api
21:20 towodo except that we haven’t localized the problem between ot14 (api) and ot15 (tree)
21:20 towodo thinking aloud
21:22 towodo disk 88% full.
21:22 towodo not good.
21:26 jimallman towodo: good thought about disk space. it’s one thing that would persist across reboots
21:26 towodo sorting things out, hang on…
21:27 towodo ot14 = tree, medium.  ot15 = api, large.
21:28 towodo ot15 disk = 150G, 88% used
21:28 towodo that doesn’t make any sense. shouldn’t need that much space. looking
21:29 towodo all of it in ~opentree/
21:30 towodo but numbers in ‘du -s -m *’  don’t add up… dotfiles?
21:32 towodo I can’t add. it does add up.
21:34 towodo maybe the swap files don’t get included? seems unlikely
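
The dotfiles question above comes from the shell glob: `*` skips names that begin with a dot, so `du -s -m *` leaves them out of the listing. A rough sketch of a check that includes them, alongside a one-line disk-usage figure, might look like this. Both function names are hypothetical, and `-mindepth`/`-maxdepth` assume a GNU or BSD style `find`.

```shell
#!/bin/sh
# Hypothetical helper: percentage used on the filesystem holding a path.
disk_use_pct() {
    # -P forces the portable one-line-per-filesystem layout;
    # field 5 of the second line is the "Capacity" column.
    df -P "${1:-.}" | awk 'NR == 2 { print $5 }'
}

# Hypothetical helper: size in MB of every top-level entry in a
# directory, hidden entries included, smallest first.
du_all() {
    # find lists dotfiles too, unlike the shell glob "*".
    find "${1:-.}" -mindepth 1 -maxdepth 1 -exec du -s -m {} + | sort -n
}

# Usage:
#   disk_use_pct /home/opentree
#   du_all /home/opentree
```
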
21:35 jimallman the bulk of the stuff is in phylesystem-1_par
21:35 jimallman (two copies of the docstore in there)
21:36 towodo no, i think the bulk is the oti and taxo dbs (49G + 44G)
21:36 towodo nothing to be done about that
21:36 towodo we could provision a new machine.
21:36 towodo ‘machine’ that is
21:37 towodo but I’m not convinced disk space is the problem
21:37 jimallman hm, that’s not what i’m seeing at all (bulk = oti and taxo)..
21:37 towodo du -s -m *
21:39 jimallman ah, i’m with you now. i was comparing stuff under ~/repo
21:39 jimallman of course it’s the neo4j installations
21:39 towodo not sure what to do. disk space is our only suspect.
21:40 towodo will start provisioning process while we think & look
21:40 jimallman ~/downloads is a fairly sizable chunk (13G).. can we toss that and see if it helps?
21:41 josephwb joined #opentreeoflife
21:41 towodo I wondered about that.  Those files all exist elsewhere, can you get the dbs easily enough?
21:41 jimallman i’ll check to see if there’s a standard command for “deflating” or recovering disk space from a neo4j instance
21:42 jimallman re: downloads, i seem to recall this is just temporary storage used during deployment
21:42 jimallman ah, and a few manual scp tasks as well
21:43 towodo it is temporary, it’s just annoying to have to reload it if it’s needed.  but not a major worry
21:43 towodo you have all your notes…?
21:44 towodo launching instance… we can throw it away if we don’t need it
21:47 towodo ot18, m3.large 7.5G, 250G ‘disk’
21:48 towodo going to edit the Bos files…
21:49 towodo jimallman, considering renaming ot14.config to tree.config, ot15 to api, OK? to help prevent mishaps
21:50 jimallman hm, i like it
21:51 jimallman no luck so far on compacting tools for neo4j… it seems to be fairly lax about db size, log file lengths, etc.
21:52 towodo hmm. did you find stuff that looks deletable and big?
21:52 josephwb i like the proposed renaming (not that I'll ever touch them again :-$)
21:53 jimallman towodo: yes, each neo4j app contains the main db and a graph.db.previous that’s equally huge
21:53 jimallman these are obvious candidates, assuming they’re here Just In Case
21:54 towodo ah… the .previous ones… we don’t need those
21:54 jimallman clobbering all 3 would recover about half the disk space (55GB or so)
21:54 towodo never used them before, they would be for disaster recovery.. but not helpful unless you know what they are
21:55 towodo which I don’t. wonder why they’re there
21:55 towodo look at mod times ...
21:55 jimallman perhaps a safety measure during deployment..? something to fall back on. checking times now..
21:55 towodo very suspicious
21:56 towodo yes it’s a safety measure but I don’t see why we would have ever copied two dbs to ot15
21:56 towodo could be a bug
21:56 jimallman all unchanged since July 10 (time of last db update, i’m sure)
21:57 jimallman towodo: do you mean because ot15 should only have had dbs installed once?
21:57 towodo yes
21:58 jimallman looking at install-db.sh now
21:58 towodo good catch by the way
21:58 towodo setup/install-db.sh  says it all
21:58 jimallman right
21:58 josephwb joined #opentreeoflife
21:58 towodo but did you do install-db.sh twice when you set the server up?
21:59 towodo seems odd.
21:59 towodo I say blow them away.
21:59 towodo the .previous ones that is
21:59 jimallman it’s possible we did something twice.. don’t recall
21:59 jimallman agreed, clobbering now.
22:00 jimallman (hm, this suggests that we’ve been carrying double db’s for the last couple of weeks. why would it go blooey today?)
22:00 towodo right.
22:01 jimallman (i thought josephwb’s attempted push might have created the second dbs, but that doesn’t seem likely)
22:01 jimallman i’ll stash the mod-dates and sizes for later pondering
22:01 towodo thanks
22:02 towodo the script should be changed to preserve two .tgz files, not two databases.
22:03 towodo josephwb, i’ve determined your punishment
22:03 towodo you’re to make the asterales server work
22:03 josephwb what is it?
22:03 josephwb ha!
22:03 josephwb ok
22:03 towodo see my email on the subject
22:03 josephwb will do
22:04 jimallman $ ls -alF neo4j-*/data/graph.db.previous > db-previous.out
22:04 jimallman sizes, mod dates, permissions stored in ot15:/home/opentree/db-previous.out
22:05 towodo good. the databases don’t change much (if at all) so I don’t expect much information, but …
22:06 jimallman new web requests to tree. are still crawling.
22:07 towodo yes I saw…
22:07 towodo foo.
22:07 towodo can we get a single POST executable from shell that hangs?
22:08 towodo that is, can we test the failure in isolation, to be completely sure it’s api and not tree?
22:08 jimallman agreed, i need to review what server-to-server AJAX calls are done in a typical request, then try each.
22:09 jimallman or we can reboot ot15 again and cross our fingers..
22:09 jimallman but i’ll get started on the AJAX thing and try to isolate using cURL
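
One way to test a single request in isolation, as discussed above, is a small cURL wrapper with a hard time limit, so a hang shows up as a timeout instead of blocking the shell. This is only a sketch: `probe` is a hypothetical name, the 30-second default is an assumption, and the URLs and request bodies of the real server-to-server calls are not given in this log.

```shell
#!/bin/sh
# Hypothetical helper: time one request and give up after a limit.
# Arguments: URL [LIMIT_SECONDS] [POST_BODY]
probe() {
    url=$1
    limit=${2:-30}      # assumed default, not taken from the log
    body=${3:-}
    # -s: quiet, -o /dev/null: discard the body,
    # -w: print the status code and total time when the transfer ends.
    set -- -s -o /dev/null --max-time "$limit" \
           -w '%{http_code} %{time_total}s\n'
    if [ -n "$body" ]; then
        # Supplying --data makes curl send a POST.
        set -- "$@" --data "$body"
    fi
    curl "$@" "$url"
}

# Usage:
#   probe http://tree.opentreeoflife.org/contact 60
```

When the limit is reached curl exits with status 28, so a run from the webapp host and another from a machine outside can be compared to see which side stalls.
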
22:09 towodo hmm…
22:09 towodo that’s something we need to have in any case
22:13 josephwb joined #opentreeoflife
22:13 towodo doing push.sh to ot18
22:15 towodo api.config now points to ot18, not ot15. for now I won’t touch ot14 or ot15
22:26 josephwb joined #opentreeoflife
22:27 jimallman towodo: just FYI - ot14 still thinks api.opentreeoflife.org is ot15
22:27 jimallman $ ping api.opentreeoflife.org
22:27 jimallman PING ot15.opentreeoflife.org (50.112.237.122) 56(84) bytes of data.
22:28 jimallman my bad, i misconstrued your “points to” above to think you had changed the DNS
22:33 towodo ok, finished; no ssl cert
22:34 towodo I can fetch it from ot15
22:37 josephwb joined #opentreeoflife
22:43 jimallman FYI - we get slow response even on the Contacts page, which does VERY little. It does ping the GitHub Issues API, but that seems pretty snappy: https://api.github.com/repos/OpenTreeOfLife/feedback/issues
22:43 jimallman …unless we’re being throttled by GitHub (i’ll check for their special headers)
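
The "special headers" are presumably GitHub's rate-limit response headers, whose names begin with `X-RateLimit-`. A minimal sketch for pulling them out of a response follows; `rate_limit_headers` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: keep only the rate-limit headers from a raw
# HTTP header dump read on stdin.
rate_limit_headers() {
    # Header names are case-insensitive; HTTP header lines end in CR LF,
    # so strip the carriage returns for clean output.
    grep -i '^x-ratelimit-' | tr -d '\r'
}

# Usage (needs network access, so it is left commented out):
#   curl -sI https://api.github.com/repos/OpenTreeOfLife/feedback/issues | rate_limit_headers
```

A remaining count of zero in that output would point at throttling. A healthy count would push suspicion back onto the webapp server itself.
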
22:48 towodo hmm.
22:49 jimallman just checked the basic login request to webapp. this attempts a login via GitHub user api, but skips the rest.
22:50 jimallman this is also super-slow, which makes no sense.
22:50 jimallman i now suspect the problem is more fundamental, either apache or web2py in the webapp server is messed up…
22:50 jimallman trying to troubleshoot this now.
22:50 towodo I think you may be on to something re github.
22:51 towodo can comments be turned off temporarily?
22:51 pmidford left #opentreeoflife
22:51 towodo in general I think comments traffic may need special handling - it would be unfortunate if github nailed us because the comments repo got hammered because we got hammered.
22:52 towodo jimallman, i’m at MIT and need to go home and eat.
22:53 jimallman no problem, i’ll keep digging here. see my comment above about the login method.
22:55 towodo I’ll leave this in your hands, but check in later tonight.
22:55 towodo ot18 has a completed deployment but no databases. but something tells me we won’t need it.
22:56 towodo talk to you in about 2 hours.
23:54 josephwb joined #opentreeoflife
