
IRC log for #opentreeoflife, 2015-09-20


All times shown according to UTC.

Time Nick Message
00:28 kcranstn joined #opentreeoflife
00:49 guest|31282 joined #opentreeoflife
01:03 jar286 cpu & ram are fine… maybe try increasing the number of threads next
01:12 kcranstn joined #opentreeoflife
01:14 jar286 latency test 1:20
01:31 jar286 increased number of threads, no difference
01:36 jar286 maybe decrease KeepAliveTimeout
02:18 mtholder joined #opentreeoflife
02:28 mtholder surely, the reddit traffic has subsided somewhat at this point, but the latency is still really high. Are we in some messed up state that could be helped by a restart?
02:53 guest|59639 joined #opentreeoflife
05:08 guest|90439 joined #opentreeoflife
06:39 guest|89290 joined #opentreeoflife
08:06 guest|83401 joined #opentreeoflife
11:29 mtholder joined #opentreeoflife
11:57 jar286 joined #opentreeoflife
11:58 guest|23802 joined #opentreeoflife
12:05 jar286 mtholder, we are configured to handle 300 connections. these are all busy right now, so there is no practical sense in which traffic has subsided. since going from 150 to 300 led to immediate saturation, my guess is that going from 300 to 1000 would also lead to immediate saturation. what effect that would have on latency is not clear to me. going from 150 to 300 increased latency, so I would expect that going from 300 to 1000 would also increase latency.
12:07 kcranstn joined #opentreeoflife
12:07 kcranstn morning, all!
12:07 mtholder can we try going to 1000?
12:07 jar286 I did a restart going from 150 to 300 and that didn’t help in any sense. so I doubt a restart now will make a difference. apache is extremely robust and I don’t expect it to get into broken states
12:07 mtholder mornin'
12:07 jar286 Yes, we can try that
12:09 jar286 how about after breakfast
12:09 kcranstn that’s not the only problem, though, right? We were slow even before the paper published and we hit reddit
12:09 jar286 correct, but the latency swamps all other problems right now
12:09 kcranstn agree
12:09 mtholder once the page loads, you seem to be able to click around with few problems
12:09 kcranstn that’s my experience as well
12:10 jar286 that surprises me since the keepalive timeout is only 5 seconds
12:10 mtholder but it takes 80 seconds for apache to give you a connection.
12:10 mtholder i may not have waited 5 seconds
12:10 mtholder does the move from http to https mean a new connection?
12:11 jar286 yes, it’s a different port
12:11 jar286 but the latency test I’m doing is strictly http
12:11 jar286 and it’s still 80 seconds
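[ed. note: a latency test like the one described here can be scripted with curl; a minimal sketch, assuming a plain-HTTP request to the production front page:]
    # time a fresh plain-HTTP request to the front page (new connection each run)
    curl -s -o /dev/null \
         -w 'connect: %{time_connect}s  first byte: %{time_starttransfer}s  total: %{time_total}s\n' \
         http://tree.opentreeoflife.org/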
12:12 mtholder so I think that any user using the http port will have to wait for connections twice.
12:12 mtholder so it is a 3 minute load
12:12 kcranstn all of the links in the pnas article are https
12:12 kcranstn and in other news, it seems we are being spammed in the comments
12:13 jar286 that’s just now
12:13 mtholder classic TreX
12:16 jar286 I’ll be back in 1/2 hour
12:16 kcranstn I just got coffee delivered!
12:20 kcranstn media interview with BBC World Service!
12:20 kcranstn nope, that’s spam
12:23 mtholder please don't tell the bbc we have a website
12:23 mtholder ;-)
12:24 kcranstn I still don’t have a good sense of what our problem(s) is / are
12:25 mtholder I think we aren't creating nearly enough connections. Though I agree with @jar286 that it is odd that doubling them did not improve latency at least somewhat
12:25 mtholder I don't have a good sense of the variance
12:25 mtholder of our # of requests
12:26 mtholder if it is high, a single test would not be reliable
12:26 kcranstn that’s problem #1, but why is a single connection taking so long?
12:27 mtholder because you have to wait 80 seconds for a connection to become available. I don't know if we know the lag, once you get a connection.
12:28 mtholder see jar's comment this morning (before you joined) about the fact that all 300 connections are used.
12:29 kcranstn yeah, I browsed through the history already
12:29 mtholder apache is using between 5 and 14% cpu if you watch top
12:29 mtholder and only 4% of memory
12:29 mtholder so, it is not clear to me why we can't bump the  # of connections to 1500 or so
12:30 kcranstn is our KeepAlive low?
12:31 mtholder MaxKeepAliveRequests 100
12:32 mtholder timeout 5
12:32 mtholder last I checked
12:32 kcranstn ok
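[ed. note: the directives quoted above correspond to a stanza roughly like this in the Apache config; the two values are the ones reported here, and KeepAlive On is inferred:]
    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 5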
12:38 mtholder we have a lot of variance when you grep a timestamp in access.log
12:39 mtholder that is requests, not connections and may be more than one thing per request
12:39 mtholder so I'm not exactly sure how to use it.
12:42 mtholder 538 log lines per minute with a standard deviation of 274
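[ed. note: a sketch of how a per-minute count like this could be computed; the access.log path and the standard combined log format are assumptions:]
    # count access.log lines per minute and report mean and standard deviation
    from collections import Counter

    per_minute = Counter()
    with open("/var/log/apache2/access.log") as log:            # path is an assumption
        for line in log:
            try:
                stamp = line.split("[", 1)[1].split("]", 1)[0]  # e.g. 20/Sep/2015:15:48:29 +0000
            except IndexError:
                continue
            per_minute[stamp[:17]] += 1                         # truncate key to the minute

    counts = list(per_minute.values())
    avg = sum(counts) / float(len(counts))
    sd = (sum((c - avg) ** 2 for c in counts) / len(counts)) ** 0.5
    print("lines/min: mean %.0f, sd %.0f" % (avg, sd))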
12:53 kcranstn hmm
12:56 jar286 we’ve used up something like 8% of disk space on apache logs
13:00 jar286 I think we’re facing a steady, relentless pressure for connections.  I predict that increasing MaxClients to 1000 will allow many more to be serviced, but that it won’t help with latency, and won’t relieve the pressure, and could even make latency 3x worse.  But I don’t know, so it’s worth doing the experiment
13:01 mtholder so do you have any idea why the latency is high if it is not waiting for a free connection?
13:01 kcranstn time for replication?
13:02 mtholder we might get some benefit from concatenating all of the stylesheets and javascript that is in the staticpage
13:02 mtholder I know that'll all be the same connection, but should reduce the # of requests
13:02 jar286 no, I don’t have a good understanding of the latency, it seems paradoxical
13:03 mtholder I agree with karen, that if we can spin up 1 (or even 4) new machines, we might be OK.
13:03 jar286 why do you think that?
13:03 mtholder because we weren't seeing 80 sec. latency under low load
13:04 jar286 so, your hypothesis is more servers -> lower latency?  could be true but I don’t see why it would be, unless we’re talking about enough servers to satisfy the demand.  and we have no data on what the actual demand is
13:05 jar286 enough servers might be 10 or 20 right now
13:07 mtholder 13K of our last 18K requests were from the staticpage
13:08 jar286 I understand many web sites put static content on s3 these days
13:08 kcranstn it’s embarrassing that we were basically down for the whole day yesterday and we still don’t have a solution...
13:08 mtholder what have we tried? we tweaked the # of connections (150 -> 300) and the time out. Anything else?
13:09 jar286 well it could be my fault, but I think this is very hard stuff and you need people with the right experience to make high volume sites work
13:09 jar286 the only experiment so far has been 150->300
13:10 jar286 I also played with ThreadsPerProcess (? I don’t think that’s right) and it made no difference
13:10 jar286 going ahead with 300->1000 experiment, ok? any other thoughts?
13:10 kcranstn yes, let’s try that
13:10 mtholder sounds good.
13:11 kcranstn and start planning for replication
13:11 jar286 ok, proceeding… this requires an apache restart so the site will be down for a few seconds
13:11 mtholder ha ha
13:11 kcranstn we may get another wave on Monday
13:12 jar286 WARNING: MaxClients of 1000 would require 40 servers,
13:12 jar286 and would exceed the ServerLimit value of 16.
13:12 jar286 Automatically lowering MaxClients to 400.  To increase,
13:12 jar286 please see the ServerLimit directive.
13:14 jar286 https://httpd.apache.org/docs/2.2/mod/mpm_common.html#serverlimit
13:14 mtholder I guess we can bump up ServerLimit, too. noting "Special care must be taken when using this directive. If ServerLimit is set to a value much higher than necessary, extra, unused shared memory will be allocated. If both ServerLimit and MaxClients are set to values higher than the system can handle, Apache may not start or the system may become unstable."
13:15 mtholder that is the page that I quoted from btw
13:15 kcranstn got it
13:15 kcranstn are we using worker or prefork?
13:16 jar286 worker
13:16 kcranstn ok
13:17 kcranstn http://sowingseasons.com/blog/surviving-the-reddit-hug-of-death.html
13:18 mtholder we're getting about 9 requests a sec fwiw
13:18 jar286 yes, that says dozens of servers
13:19 jar286 we’re *processing* 9 requests a sec, I think
13:19 jar286 would be worth getting measurements now given that we increased from 300 to 400…
13:20 kcranstn ok, I have to leave for a bit. Be back in a couple hours
13:20 kcranstn FYI, reddit is pushing up our AMA (which may also bring a spike in traffic)
13:21 jar286 processing 13 requests/sec
13:22 jar286 restarting apache again with larger ServerLimit (40) , ok?
13:22 mtholder 40?
13:22 mtholder what was it before?
13:22 jar286 16
13:22 mtholder ok cool.
13:23 jar286 maxclients(1000) = serverlimit(40) * threadsperchild(25)
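[ed. note: that arithmetic maps onto a worker-MPM stanza roughly like the following; only ServerLimit, ThreadsPerChild and MaxClients are taken from the log, the other values are illustrative defaults:]
    <IfModule mpm_worker_module>
        ServerLimit          40
        StartServers          4
        MinSpareThreads      25
        MaxSpareThreads      75
        ThreadsPerChild      25
        MaxClients         1000     # = ServerLimit * ThreadsPerChild
        MaxRequestsPerChild   0
    </IfModule>
    # note: Apache only picks up a changed ServerLimit on a full stop/start,
    # not on a restart, which becomes relevant later in this log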
13:24 jar286 hmm… failed to set serverlimit… maybe it needs to go elsewhere in the conf file
13:24 kcranstn joined #opentreeoflife
13:26 jar286 worked that time.  apache restarted.
13:26 mtholder cool
13:28 jar286 actually the status tool tells us at what rate connection requests are coming in… I did a series of queries, so can make a plot of connections being processed vs. time
13:29 jar286 damn terminal app.  scrolled away & lost
13:30 jar286 oh no, it’s there.  will record the numbers
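[ed. note: a sketch of polling mod_status for these numbers; it assumes mod_status is reachable at /server-status on localhost, and ReqPerSec only appears when ExtendedStatus is On:]
    # sample the scoreboard once a minute
    while true; do
        printf '%s  ' "$(date -u '+%H:%M:%S')"
        curl -s 'http://localhost/server-status?auto' \
            | egrep 'ReqPerSec|BusyWorkers|IdleWorkers' | tr '\n' ' '
        echo
        sleep 60
    done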
13:31 mtholder top shows not much free memory
13:32 jar286 it’s topped out at 400.  I need to change some other parameter in tandem
13:32 mtholder KiB Mem:   3853840 total,  3471092 used,   382748 free
13:32 jar286 if we’re maxed out i’d expect no free memory
13:32 jar286 it was about twice that much free memory before
13:33 mtholder still getting 90 sec. latency
13:33 mtholder but now that we are closer to max memory, it makes me feel like we are closer to using what the machine can do.
13:35 mtholder I've got to step out for a few hours, too...
13:35 mtholder left #opentreeoflife
13:43 guest|20801 joined #opentreeoflife
13:54 mtholder joined #opentreeoflife
13:55 mtholder jar286 and jimallman http://devtree.opentreeoflife.org/static/statfat.html now has a version of the top static page that has the CSS and JS inlined
13:55 mtholder that should cut down on the # of requests
13:56 mtholder that the server has to deal with.
13:57 mtholder doesn't really help with the # of connections.
14:10 guest|6803 joined #opentreeoflife
14:11 guest|45295 joined #opentreeoflife
14:13 guest|49949mark joined #opentreeoflife
14:18 jar286 connection demand appears to be about 1.4 per second, FWIW
14:29 jar286 well I’m baffled. MaxClients is 1000, but there are only 400 active requests.
14:41 jar286 “There seem to be some changes after version 2.3.13. For example MaxClients is now MaxRequestWorkers. “  ??
14:43 jar286 “MaxRequestWorkers was called MaxClients before version 2.3.13. The old name is still supported.”
14:48 jimallman i’m quite certain we’re running apache 2.2 (good morning, all)
14:48 * jimallman will be offline for an hour or so soon, then available to help
14:51 jar286 right, apparently it doesn’t matter, it’s ok to use the old name even in 2.4
14:52 jar286 I’m really stuck.  the only thing I can think of now is that ‘apache2ctl restart’ wasn’t adequate to make apache pick up the new value of MaxClients
14:52 jar286 and I need to do a full stop / restart
14:52 jar286 I’ve read the documentation about 10 times, and read about 10 how-to articles
14:55 jimallman here’s a detailed breakdown of stop vs. restart vs. graceful, etc.  https://httpd.apache.org/docs/2.2/stopping.html
14:56 guest|93790 joined #opentreeoflife
15:01 jimallman no obvious answers, but i agree that our current status (400 available connections) doesn’t seem to reflect the new settings.
15:04 jimallman i’ll keep looking on the web2py front for relief, since it might be exacerbating the problem
15:04 jimallman meanwhile, perhaps a full apache stop+start might make a difference..?
15:05 jimallman jar286: ^
15:19 jar286 I’m wondering about caching the phylopics…
15:23 jar286 jimallman, re the home page, can’t we just put it in static/ and do something like Alias / …/static/… ?
15:25 jimallman it seems like that should work, but AliasMatch i think (to avoid capturing /foo, etc). we’ll need to test on devtree to make sure it behaves as expected
15:29 jimallman meanwhile, this page suggests ‘apache2ctl graceful-stop && apache2ctl start’ to see the new MaxClients in action:  https://fuscata.com/kb/set-maxclients-apache-prefork
15:32 jar286 ah right, I had seen & lost that & couldn’t find it again
15:32 jar286 will try that right now
15:33 jimallman apache2ctl status says currently unavailable (still servicing old requests, apparently)...
15:34 jimallman and we’re back!
15:34 jar286 and with >400 connections
15:34 jar286 602 …
15:34 jimallman and idle connections :)
15:35 jimallman this looks healthier, but we’ll see how quickly the queue fills up
15:35 jar286 free RAM is dropping… 20k now
15:36 jimallman yeah, ‘apache2ctl status’ is very slow to return now… maybe we’ve started swapping?
15:36 jar286 no, free ram is increasing
15:37 jimallman wow, 1000 requests, all ‘W’ in the scoreboard
15:37 jar286 yes, I have a funny feeling that most of those W’s are threads waiting for phylopics
15:37 jar286 maybe we could consider turning phylopics off until this blows over
15:38 jar286 I could be wrong of course, that was just the impression I got watching the apache log go by
15:39 jimallman ok, checking this out (it’s certainly a likely bottleneck, i don’t think they’re built for this scale either)
15:39 jar286 ok, time for latency test
15:41 jar286 I think it’s just as I predicted… latency scales with MaxClients
15:43 jar286 yup.  3 minutes 58 seconds
15:44 jimallman shall i do a hot fix in treeview.js, to stop phylopic fetches?
15:44 jimallman i can stop this block by changing the initial if-test to if(false) { …
15:44 jimallman https://github.com/OpenTreeOfLife/opentree/blob/26aa74928e2c5d38c2ae397fc63c3d1c72164410/webapp/static/js/treeview.js#L863-L884
15:44 jar286 umm… I guess why not.. but did you do it on dev first to make sure no surprises?  … and it’s going to be hard to test.  I guess I can look at the logs
15:45 jimallman yes, it should show as a gradual improvement for *new* visitors; others will have cached the JS file
15:45 jimallman or i can try to immediately return something harmless from our phylopic_proxy
15:46 jimallman more requests, but they should return instantly
15:46 jar286 not sure how this works.  there’s no phylopic on the home page, is there?
15:46 jar286 but the request will be made anyhow?
15:46 jimallman there should be a phylopic for cellular organisms; we’ll try in any case
15:46 jar286 oh i see there is one.
15:47 jimallman the first solution (JS hot-fix) will stop the requests for new users only
15:47 jar286 fine.
15:47 jimallman the second solution (immediate/empty return from phylopic_proxy) should help with “old” users
15:47 jimallman why not both?
15:49 jar286 one at a time.  I suspect our heaviest load is from the home page (new users)
15:49 jar286 I don’t know
15:50 jar286 can watch the logs
15:50 jar286 it might be a good experiment
15:53 jar286 yes, there’s a lot of GET / in the log
15:53 jar286 certainly the majority of the non-static requests
15:55 jar286 GET / vastly outnumbers GET of phylopics
15:55 jar286 so I’d look into installing Mark’s static home page first
15:56 mtholder joined #opentreeoflife
15:56 jar286 92.196.3.99 - - [20/Sep/2015:15:48:29 +0000] "GET / HTTP/1.1" 303 708 "http://www.spiegel.de/wissenschaft/natur/evolution-des-lebens-stammbaum-mit-2-3-millionen-arten-a-1053850.html" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36"
16:01 jimallman quick exit from phylopic_proxy on devtree (server-side) looks good. applying on production…
16:02 jar286 ok. looks like we should be able to do AliasMatch ^/$ …/static/…
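[ed. note: the AliasMatch being discussed would look roughly like this; the filesystem path is an assumption about where the webapp's static files live:]
    # serve the bare root from the prebuilt static page; all other paths are untouched
    AliasMatch ^/$ /home/opentree/opentree/webapp/static/statfat.html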
16:06 mtholder whoa we are maxing out the CPU on tree now. That is good, I guess
16:07 jar286 reminder: http://devtree.opentreeoflife.org/static/statfat.html
16:07 jimallman skipping phylopic fetch from treeview.js is working well on devtree, applying to production now
16:07 jar286 ok I will monitor log
16:08 jimallman done
16:09 jimallman testing this on production page (slow, of course), then moving on to statfat…
16:09 jimallman ok, looks good on production (no phylopic appears, no side effects that i can see)
16:10 jar286 yes. but you’re right about caching - lots of non-home-page requests still coming in
16:11 jar286 still a bunch for cellular organisms… wonder what that’s about
16:12 jimallman going to the home page means a quick redirect to the root node of synth-tree (cellular organisms)
16:12 jimallman maybe not a redirect, exactly. would need to review for details, but we manipulate history, build a “proper” synth-tree-view URL with the node id, and refresh.
16:13 jar286 I meant, why would people want a cellular organisms phylopic right now. they would be visiting the home page for the 2nd time
16:14 jimallman oh, you meant phylopic requests.
16:15 jar286 still a steady stream of phylopic requests, but not an outrageous number. maybe 1/second
16:16 jimallman these should taper off (and return “immediately” in the meantime)
16:16 jar286 so are you looking into the static home page? looks straightforward
16:16 jimallman i see idle connections in the apache2ctl scoreboard
16:16 jimallman yes, testing this on devtree now…
16:18 jar286 well that’s amazing.  maybe my hunch was right
16:18 jar286 and latency is down to 25 seconds
16:19 jar286 and throughput is 53 requests per second!
16:22 jimallman progress!
16:23 jimallman we’ll need to follow up with phylopic folk later
16:25 jar286 yes.
16:25 jar286 I think the phylopics should be cached in the webapp…
16:26 jar286 don’t forget that mark’s combined home page will have phylopics enabled…
16:26 jimallman or we can help them to beef up their specs for heavy traffic, and support HTTPS (the only reason we currently proxy their images)
16:26 jimallman good point! i’ll tweak that.
16:27 jimallman still having trouble getting Alias or AliasMatch to work X(
16:27 kcranstn joined #opentreeoflife
16:27 jimallman (on devtree, that is)
16:27 jar286 damn.
16:27 jar286 phylopic could have brilliant heavy traffic support, without that making any difference to our performance; don’t think that’s an issue
16:28 jar286 just the fact that we go out to the web for the pics is going to slow us down a lot
16:28 jar286 well… maybe I’m overstating this
16:29 jar286 well there are other approaches
16:29 jar286 we can change the WSGIScriptAlias or whatever it is
16:30 jar286 WSGIScriptAlias /opentree /home/opentree/web2py/wsgihandler.py
16:30 jar286 WSGIScriptAlias /curator /home/opentree/web2py/wsgihandler.py
16:30 jar286 or even
16:30 jar286 WSGIScriptAlias /..* /home/opentree/web2py/wsgihandler.py
16:30 kcranstn I am back! and I see progress! yay!
16:30 jar286 yes, progress
16:31 jar286 latency 26 seconds
16:33 jar286 or have we already been down the WSGIScriptAlias path?
16:33 jar286 oh that would be WSGIScriptAliasMatch
16:34 jar286 I know you (jimallman) were working on this but not sure if this is the way you were going.  I.e. try to divert everything except root, or try to divert a specific list of prefixes
16:35 jar286 and then just put the static home page in DocumentRoot
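[ed. note: the alternative sketched here (divert the known prefixes explicitly, serve the static home page from DocumentRoot) might look like this; the WSGIScriptAlias lines are copied from the config quoted above, while the DocumentRoot and DirectoryIndex values are assumptions:]
    DocumentRoot /home/opentree/opentree/webapp/static
    DirectoryIndex statfat.html
    WSGIScriptAlias /opentree /home/opentree/web2py/wsgihandler.py
    WSGIScriptAlias /curator  /home/opentree/web2py/wsgihandler.py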
16:35 mtholder joined #opentreeoflife
16:35 jimallman yes, i spent a lot of time trying to “skip” / in WSGIScriptAliasMatch, but no luck
16:36 jimallman maybe a fresh pair of eyes will get it right
16:36 jimallman i think i’m getting the AliasMatch (to static homepage) now… trying to confirm with JS disabled
16:36 jimallman (on devtree)
16:36 jar286 is making an explicit list (not a wildcard) an option? or are there too many things like /opentree /curator /favico etc?
16:38 mtholder jimallman, as luck would have it treeview.js is one of the only js files that I did not inline in statfat.html
16:38 mtholder so if you've only tweaked that js file, then the statfat should be OK
16:38 jimallman i see that :)
16:39 jimallman right. it’s working well on devtree, will copy statfat to production and tweak our apache config
16:39 jar286 server is happy.  load average below 8, some free RAM, 74 requests / second
16:39 jimallman idle connections! woot!
16:40 jimallman i suspect phylopic latency was tying up lots of connections
16:40 jar286 yes
16:40 jimallman continue with statfat? or wait and see?
16:41 jar286 well, 26 second latency is still quite high, so I say continue.  my prediction is that it will make a mild difference but we’ll see
16:42 jimallman back to testing statfat on devtree… its AliasMatch seems to be capturing *all* synth-tree view URLs, not just /...
16:43 mtholder is there anyway to get timing on the individual requests? (to see if there are other sluggish ones like the phylopic)?
16:43 mtholder perhaps we could replace our favicon with https://raw.githubusercontent.com/OpenTreeOfLife/opentree/master/webapp/static/favicon.ico
16:43 jar286 does github do the cors thing?
16:43 mtholder dunno
16:43 jimallman false alarm on statfat.. i think i tried a production nodeid that doesn’t exist in dev
16:44 jar286 wait a minute that’s not right
16:44 jar286 it’s not done through cors.  you don’t get to choose the url of the favico
16:44 jimallman re: favicon, isn’t that what we currently show? (i see it in the tabs on Chrome, but not in the address bar since we show the padlock for HTTPS)
16:45 jar286 we could do a 30x redirect, but that’s probably no faster than a 200 with the image
16:45 jar286 I think mtholder is brainstorming on how to offload work from the server
16:45 jimallman gotcha
16:48 jar286 we’re still getting a phylopic request about every 2 seconds… requests are all from a set of 3-4 IPs… I wonder if someone is running a crawler.  but no matter
16:50 jar286 mtholder, re timing individual requests, I don’t think apache has any support for that. you’d have to do it client side
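[ed. note: for what it's worth, Apache's mod_log_config can record per-request service time via the %D format code (microseconds); a sketch of an extended log format, with the format nickname and log path chosen arbitrarily:]
    LogFormat "%h %l %u %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\"" combined_timed
    CustomLog /var/log/apache2/access.log combined_timed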
16:50 kcranstn I finally got an OK from nagios re: tree.opentreeoflife.org :)
16:50 kcranstn after 35 critical messages
16:51 jimallman i’m bouncing / requests on production to statfat.html (neat!)
16:51 jimallman oddly, this means we stay in HTTP (vs HTTPS) since we don’t pass through web2py
16:52 jimallman and boom, apache is once again swamped with 1000 busy connections :-/
16:52 jimallman this is my confused face
16:53 kcranstn one machine just not enough to deal?
16:54 jimallman this may not be a problem, if we’re turning over connections promptly
16:55 jar286 that is certainly a surprising outcome.  let me think about this
16:56 jar286 server is still happy.  cpu, ram being well used but under control
16:56 jimallman i’m waiting quite a while to get my new HTTPS connection for login
16:56 jimallman done now… after 1.3 minutes (!)
16:56 jimallman hmmmmm
16:57 jar286 yes, I see high latency
16:57 jimallman i’m reviewing now, to see if using statfat might have removed any of our fixes…
16:59 jar286 api server is showing somewhat higher load, but load ave is still below 1.  not worried about it
17:01 jar286 I don’t get how nagios was able to get a response in under 3 seconds
17:02 jimallman how often does nagios poll, vs our keep-alive time?
17:03 jar286 about 10 minutes
17:03 jar286 our keepalive timeout is 5 seconds
17:03 jar286 and nagios does the http connection from a shell script, so that’s a separate connection each time it checks
17:04 jar286 I think nagios waits 10 seconds for a reply… it just failed
17:09 jar286 so my current idea on why active threads and latency went up is that decreasing static page loads led to higher proportional use of services…
17:09 jar286 especially /opentree/plugin_localcomments
17:09 jar286 that is, the same situation that held for phylopics is now holding for /opentree/plugin_localcomments
17:10 kcranstn can we start a hangout to chat for a few minutes?
17:10 jar286 jimallman ^
17:11 jimallman yes, just a sec..
17:11 jar286 sure
17:12 kcranstn let me know when you are ready, jimallman
17:12 jimallman now’s good
17:47 kcranstn “Sorry - our comment feature isn’t working right now due to high server load. You can leave an issue in our [feedback repository on GitHub] or contact us via one of the methods on [Contact page]."
17:47 jimallman cool, thanks!
17:47 kcranstn chickens here are getting leftover waffles. There is much happy.
17:50 jimallman no doubt!
17:51 jar286 nagios is happy again… 9.5 seconds
17:58 jimallman quick fix (static HTML message in place of localcomments plugin) is looking good on devtree. applying this to production now...
17:59 mtholder very cool
18:01 jimallman …and now i’m waiting for connections… sigh
18:01 jimallman (and yet optimistic!)
18:02 mtholder an optimistic sigh
18:03 jimallman ok, static HTML message is working, albeit slowly. hopefully this congestion will subside
18:05 mtholder I'm getting a reasonably fast load of the top page (which unfortunately has the 61 comments hard coded), but slow loads if I search for a taxon.
18:05 jimallman :D i’ll remove the hard-coded comments and replace with our static message.
18:08 jar286 still seeing localcomments requests.
18:09 jar286 referrer = https://tree.opentreeoflife.org/opentree/argus/opentree3.0@1
18:10 jimallman ok, will chase these down
18:13 jar286 could be the cached page.
18:13 jar286 looks like frequency is not all that high… about 1/second
18:14 jar286 similar to phylopic situation.  so maybe nothing to chase down
18:14 kcranstn joined #opentreeoflife
18:14 jar286 didn’t make the big difference I expected.
18:15 mtholder On chrome, I'm seeing dollar signs: .$ You can leave an issue in our$ feedback repository on GitHub$ or contact us via one of the methods on the$ Contact page.
18:15 mtholder but the links work, so it is not a huge deal
18:15 jimallman yeah, i’m still waiting a long time for secure connections
18:15 jimallman thanks mtholder ! they should be gone now
18:16 jimallman once i get a connection, it flows very quickly
18:17 kcranstn but initial connection still taking way too long :(
18:17 jimallman i see a problem getting the Bootstrap icons (bad URL?)… chasing this now.
18:18 mtholder I think that my statfat broke some link
18:18 mtholder ../img should be ./img
18:18 jar286 this took 2 minutes 50 seconds: wget --no-check-certificate "https://tree.opentreeoflife.org/opentree/argus/opentree3.0@1"
18:18 mtholder I guess when those css files are plugged in, there is some interpretation of relative paths?
18:19 mtholder I basically just cut and pasted the content (with a script)
18:19 mtholder jimallman^
18:19 mtholder those are the bootstrap icons, right?
18:19 jimallman ah yes, i’m sure that’s it (relative to HTML vs. CSS file)
18:19 jimallman and yes, note the lack of icons on the ‘Zoom tree view’ control, for example
18:19 mtholder hmmm maybe the statfat was a bad idea...
18:20 jimallman time will tell :)
18:21 mtholder do you want me to fix them, or did you get it?
18:22 jimallman got it, testing now
18:23 jimallman better!
18:24 jimallman and now, a snack break (back in a few)
18:27 mtholder I fear that the "argus" in the url that jar286 pasted in is another artifact of the statfat.html.  I think that we need to back out of that...
18:28 jar286 if you got it from the original page, how could it be wrong?
18:29 kcranstn joined #opentreeoflife
18:29 mtholder statictop.html was from the original page
18:29 mtholder statfat was pasting in the JS and CSS
18:29 mtholder but I wasn't thinking about relative paths when doing that.
18:30 mtholder It was just for making it one request rather than 13 - but it is probably not worth it.
18:35 mtholder when jimallman returns we can talk about changes that he might have made to statfat.html
18:35 jar286 right
18:41 jimallman back now. in statfat, i just changed (relative) image URLs in CSS, and replaced the stale comments with our static HTML message
18:42 jimallman argus/ in the URL is optional, should have no effect
18:42 kcranstn joined #opentreeoflife
18:42 jimallman (this was optimistic routing, with the notion that we’d have alternate tree renderers, eg. onezoom)
18:44 mtholder OK. I wasn't sure if that was a side-effect of my copy and paste.  FWIW, here is my (too naive) script https://gist.github.com/mtholder/b0296d7402b65fc51b6d
18:46 mtholder 8 idle workers
18:48 jimallman i believe jar286 ’s URL must have come from a TNRS search result. each result gets a hyperlink whose href includes argus/, but this is often omitted (as a default value) in our URLs.
18:49 mtholder I think that OTI may be down
18:49 jimallman hm, many idle connections now
18:49 jimallman (which seems good)
18:49 mtholder or at least I'm getting lots of XMLHttpRequest cannot load https://api.opentreeoflife.org/oti/v1/singlePropertySearchForStudies. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'https://tree.opentreeoflife.org' is therefore not allowed access. The response had HTTP status code 503.
18:50 kcranstn yup getting nagios warnings on oti
18:50 jimallman that’s a CORS error, which is odd
18:55 mtholder jar286, how do we get OTI going again? just redeploy?
18:56 jimallman we can do neo4j-treemachine/bin/neo4j restart, heading there now..
18:56 mtholder I agree, jimallman, that it looks like CORS, but peyotl tests are failing.
18:56 mtholder oti restart not treemachine!!! jimallman
18:56 jimallman right
18:57 jimallman restarting oti now
18:57 kcranstn breathe, everyone. breathe...
18:58 kcranstn replaying this as some sort of disaster movie: “NO, JIMALLMAN! NOT TREEMACHINE…..!”
18:58 * jimallman runs away from a fireball
18:58 * jimallman … but bursts into flame!
18:59 jimallman well, oti is taking quite a while to shut down.
19:00 jimallman starting now…
19:01 jimallman hm, i’m getting a 503 (Service Unavailable). anyone else?
19:01 kcranstn from what?
19:02 jimallman wget https://api.opentreeoflife.org/oti/v1/singlePropertySearchForStudies
19:02 jimallman curl gets a more informative (and so familiar) message
19:02 jimallman curl https://api.opentreeoflife.org/oti/v1/singlePropertySearchForStudies
19:02 jimallman No such ServerPlugin: "QueryServices"
19:02 kcranstn the front page is much snappier now, though
19:03 kcranstn although I am still seeing a POST to localcomments
19:03 kcranstn or has that fix not been deployed yet?
19:03 jimallman hm, should already be in place!
19:04 kcranstn hmmm
19:05 jimallman you’re seeing POST requests, you mean? perhaps a user with a stale view and comment UI in place..
19:05 kcranstn yes, POST requests. Let me clear history and re-try
19:06 kcranstn second clear history got rid of it
19:06 jimallman restarting oti, fingers crossed…
19:07 jimallman and now it’s back to 503 Service Unavailable
19:07 jimallman curl https://api.opentreeoflife.org/oti/v1/findAllStudies
19:08 jimallman and now (no action on my part) back to “No such ServerPlugin “QueryServices””. ugh.
19:08 jar286 this happened last week. fix was to recompile (push.sh -f)
19:08 kcranstn joined #opentreeoflife
19:09 jimallman jar286: would you mind doing this, if you have the incantation handy?
19:09 jar286 ./push.sh -c ../../deployed-systems/production/api.config -f oti
19:09 jar286 I can do it
19:10 jimallman thanks, stashing that in my notes.. (i forgot about the -f option)
19:10 jar286 this failure makes absolutely no sense
19:11 jar286 how can a Java class just spontaneously decide to go away?
19:11 jar286 note that nagios picked up on it
19:11 jar286 I suppose it could be a resource exhaustion problem (e.g. memory)
19:12 jar286 but if so that’s certainly an odd way to express it.
19:12 jimallman yeah, it’s weird alright.
19:12 mtholder If we can get it running again, then I think we should grab the study info for the 484 studies that are in synthesis and hard code that into treeview.js (so that we stop calling oti from the tree browser)
19:12 jar286 it claims to be finished. try again
19:12 jimallman ok, looks better
19:12 jimallman from a curl call, at least
19:12 mtholder passes peyotl tests
19:12 jimallman study list loads nicely, too
19:13 jar286 and of course a resource problem means there’s a leak, which obviously there shouldn’t be
19:13 mtholder do we have the list of the 484 studies handy...?... I must have it somewhere
19:13 jar286 thanks to josephwb for adding -f
19:13 kcranstn http://datadryad.org/resource/doi:10.5061/dryad.8j60q
19:13 jimallman on the bright side, apache status is looking really nice. lots of idle connections.
19:13 kcranstn list is there
19:14 jar286 the argus URL I showed came from the referrer field in the apache log
19:14 kcranstn cautiously suggests beer time?
19:14 mtholder I've already had one
19:14 jar286 I need to take my daughter to soccer practice, so I’ll be out for a couple of hours
19:15 mtholder well, at least 1
19:15 jar286 you can reach me on slack if necessary. haven’t had success with irc clients on iphone
19:16 kcranstn thanks, all!
19:28 mtholder and it is down, again...
19:30 mtholder I'll restart it...
19:34 jimallman oti is down? or something else?
19:35 mtholder it is back
19:35 mtholder i just had to restart it
19:35 jimallman it=?
19:35 mtholder oti
19:35 jimallman gotcha
19:36 mtholder it won't stay up for long, now that the browser loads and people can call it
19:36 mtholder granted I'm hammering it to get the 484 studies
19:36 jimallman hmm
19:36 mtholder I just need 10 more...
19:36 mtholder I think I have them..
19:37 jimallman we could cache this pretty easily i’d think. just add /cached/ to the URL and web2py will fetch just once, then serve it from cache.
19:43 mtholder doh. A lot of what I got was error messages...
19:54 mtholder jimallman http://tree.opentreeoflife.org/static/otiresults.json is a JSON file where the keys are the study IDs and the values are OTI's response when each of these is used as the study ID
19:54 mtholder I think that this covers all of the study IDs in the synth tree.
19:54 jimallman kewl, thanks
20:00 mtholder we could: call a GET to that in place of the
20:00 mtholder $.post(
20:00 mtholder singlePropertySearchForStudies_url
20:01 mtholder in treeview.js
20:01 mtholder (line 828)
20:01 mtholder and then look at the correct key to get the "data" field for the success callback
20:01 mtholder or I could try to write a controller that when you pass in the study id you get just that study's info.
20:02 jimallman sure, or just load the whole file on the client once
20:02 jimallman JSON file, i mean
20:06 jimallman hm, i see that it’s 300KB
20:06 jimallman then again, it’s a one-time fetch and no more requests. thoughts?
20:06 guest|23436 joined #opentreeoflife
20:06 jimallman mtholder: ^
20:07 mtholder I don't think it'll be too bad...
20:08 jimallman which?
20:09 guest|23436 everything given by taxonomy only right?
20:09 mtholder sorry, I meant loading everything in JS
20:09 jimallman hi guest|23436, where possible we support relationships with phylogeny
20:10 jimallman (solid lines in the tree, vs. dotted lines to children)
20:10 jimallman mtholder: ok, that should be easy to try.
20:17 guest|23436 Does anyone know the youngest derivation in the tree?
20:18 mtholder sorry, I don't know what you mean by "derivation"
20:18 guest|23436 Something like genesis of a new species
20:18 jimallman most recent divergence
20:19 mtholder the tree does not have time estimates on it, unfortunately.
20:20 mtholder If you are looking for a recent speciation, the http://entnemdept.ufl.edu/creatures/fruit/tropical/apple_maggot_fly.htm may be worth a look
20:30 mtholder jimallman, would it mess you up if I deployed something to devtree to test (opentree webapp new controller for the oti info)?
20:31 jimallman i’m doing some manual edits there, hold on.
20:31 jimallman i can do ‘git stash’ to capture all that
20:31 mtholder no rush.
20:33 jimallman i’ve stashed the local changes as HOTFIXES-ON-DEV, but i’m concerned that they might be discarded in deployment. let me save a patch file, just in case…
20:33 mtholder Actually, I'll just tell you. If you want to deploy the hackoticache branch, you should find 1 new controller that would take an URL like hack/oticache/ot_104 and give you just the oti result for that study
20:34 mtholder I just think that I need to restart web2py to be honest
20:34 mtholder i scp'ed the controller, but it does not seem to find it.
20:34 mtholder works for me locally, though.
20:36 jimallman ok, i’ll restart apache
20:38 mtholder cool. seems to work: https://devtree.opentreeoflife.org/hack/oticache/ot_104
20:38 mtholder just copying the hack.py controller to tree should work
20:39 jimallman fwiw, this doesn’t deal gracefully with malformed ids: https://devtree.opentreeoflife.org/hack/oticache/ot104
20:39 jimallman but i don’t suppose that’s likely
20:40 jimallman thanks, i’ll work on this instead of giant-JSON-on-client
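[ed. note: the real hack.py is not shown in this log; a minimal sketch of what such a web2py controller could look like, assuming default routing maps /hack/oticache/<study_id> to controllers/hack.py :: oticache() and that the cached responses live in static/otiresults.json as described above:]
    # controllers/hack.py  (sketch)
    # request, response and HTTP are injected into controller scope by web2py
    import json
    import os

    def oticache():
        "Serve the cached OTI result for one study, e.g. /hack/oticache/ot_104"
        if not request.args:
            raise HTTP(400, 'expecting a study ID, e.g. /hack/oticache/ot_104')
        study_id = request.args[0]
        cache_path = os.path.join(request.folder, 'static', 'otiresults.json')
        with open(cache_path) as f:        # re-read per request; could be cached in RAM
            cached = json.load(f)
        if study_id not in cached:
            raise HTTP(400, 'unknown study ID: %s' % study_id)
        response.headers['Content-Type'] = 'application/json'
        return json.dumps(cached[study_id])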
20:40 kcranstn joined #opentreeoflife
20:41 mtholder I just changed it to a 403 rather than 503
20:41 mtholder maybe 400
20:41 guest|23436 Thank you, mtholder. This is interesting! Even the apple maggot is related to maggots by taxonomy only.
20:45 guest|23436 If you'd hybridize apple maggot races (hawthorn feeding and apple feeding), would we get anything different?
20:46 kcranstn I am back. OTI still melting?
20:47 mtholder It seems to be working now
20:47 mtholder it did need a restart after you went off line
20:48 mtholder I grabbed the results we need and wrote a silly controller that would serve up the content that jim's code needs w/o calling OTI
20:49 kcranstn w00t
20:50 mtholder guest|23436 short answer: I don't know. But you might want to look at the research done by http://federlab.nd.edu/
20:54 jimallman mtholder’s hacky oti-cache is working (very quickly) now on devtree, copying this to production…
20:54 kcranstn we shall refer to this as the weekend of hacky caching
20:54 * jimallman nods
20:56 mtholder gotta run...
20:56 mtholder left #opentreeoflife
21:03 jimallman so that’s now working on production, hopefully takes most/all of the burden off of oti
21:03 jimallman except for TNRS search
21:05 kcranstn whew
21:07 kcranstn we need to do a summary of the slack / irc logs after this week and decide how to implement much of this stuff in a less hacky way
21:16 guest|23436 Thanks! Bye!
21:18 mtholder joined #opentreeoflife
21:19 jimallman mtholder: hacky-cache is in place on production. very snappy!
21:20 jimallman kcranstn: agreed. i’ve been stashing changes to source code, and of course we’ll need a separate effort to document and push changes to the main apache config.
21:20 mtholder good.
21:20 mtholder wrt TNRS. that is taxomachine, which seems much more stable than oti
21:58 kcranstn joined #opentreeoflife
22:08 kcranstn joined #opentreeoflife
22:38 jimallman general status report: things remain snappy on production, and lots of idle connections. hopefully we’re ready for Monday traffic.
23:05 jar286 things look good
23:06 jar286 scanning irc log
23:41 kcranstn joined #opentreeoflife
