IRC log for #metacpan, 2013-11-25


All times shown according to UTC.

Time Nick Message
02:34 klapperl_ joined #metacpan
02:50 Kovensky joined #metacpan
04:04 preflex_ joined #metacpan
05:00 preflex_ joined #metacpan
05:38 Farow joined #metacpan
06:32 Farow|2 joined #metacpan
07:08 dpetrov_ joined #metacpan
08:53 [Sno] joined #metacpan
08:59 dolmen joined #metacpan
09:49 daxim joined #metacpan
11:48 kentaro joined #metacpan
12:52 ilmari_ joined #metacpan
14:22 Farow joined #metacpan
15:15 Farow|2 joined #metacpan
15:17 Topic for #metacpan is now Have you installed your MetaCPAN VM? https://github.com/CPAN-API/metacpan-developer | Chat logs available at http://irclog.perlgeek.de/metacpan/ | Can't find your module on MetaCPAN? https://metacpan.org/about/missing_modules
16:10 jpn joined #metacpan
16:21 Farow|2 joined #metacpan
16:21 chmrr joined #metacpan
16:24 chmrr Did something change in the production metacpan deployment between ~1615 and ~1815 US/Eastern on Saturday?  The rt.cpan.org cron jobs which search metacpan started flipping out around then.
16:34 oalders chmrr: https://twitter.com/metacpan/status/404354954590052355
16:34 dipsy [ Twitter / metacpan: Many thanks to .@fastly for ... ]
16:34 oalders what's the problem?
16:35 alh Eek, Time::Cubic is still available for view on metacpan
16:35 alh I thought that was scrubbed
16:35 ether alh: is it? I cannot find it
16:35 alh https://metacpan.org/release/BANTOWN/Time-Cubic-1.0
16:35 dipsy [ Time-Cubic-1.0 - IMPLEMENTATION OF NATURE'S HARMONIC SIMULTANEOUS 4-DAY TIME CUBE - metacpan.org - Perl programming language ]
16:35 alh Simple google turned it right up
16:36 ether use constant { JEWS => 911, TRUTH => 4 };   # omg
16:36 chmrr oalders: http://chmrr.net/nopaste/2013-11-25Y5sW1YvD
16:36 ether is this module destructive, or just dumb and offensive?
16:37 oalders chmrr: what's the query that returns the error?
16:38 oalders alh: that module is removed from BackPAN?
16:38 oalders i don't think we got an email about it
16:38 BinGOs view the source and decide for yourself.
16:39 chmrr oalders: https://github.com/bestpractical/cpan2rt/blob/deploy/lib/CPAN2RT.pm#L333
16:39 dipsy [ cpan2rt/lib/CPAN2RT.pm at deploy · bestpractical/cpan2rt · GitHub ]
16:39 chmrr (trs may have greater context for what's going on here)
16:40 chmrr http://chmrr.net/nopaste/2013-11-25O_L6bva9-if_eth1-week.png is also ... notable
16:41 oalders chmrr: not sure i understand the graph
16:41 chmrr That's data that rt.cpan.org is getting _in_.  starting on Saturday, it's a sustained 50M/s at times: http://chmrr.net/nopaste/2013-11-256RiSj8lf-if_eth1-day.png
16:42 oalders and is metacpan related to the inbound traffic?
16:43 chmrr Since the uptick coincides with when the cron job started to die, that's my presumption.  Do you have logging on your side you can easily compare it to?
16:45 oalders chmrr: http://munin.bm-n2.metacpan.org/metacpan.org/bm-n2.metacpan.org/
16:45 dipsy [ Munin :: metacpan.org :: bm-n2.metacpan.org ]
16:47 chmrr *nod*  OK, so nothing glaringly obvious there.  Is fastly acting as a caching proxy for you now?  What is it that they're doing?
16:47 chmrr (it's entirely possible that we're hosing _their_ servers instead of yours)
16:48 oalders chmrr: yeah, just a caching layer between the api and the world at this point
16:48 oalders ranguard did the setup
16:48 oalders no other complaints so far
16:48 oalders but maybe this is just the first :)
16:49 chmrr Hm.  I know ~0 about ElasticSearch.  Anything jump out as Interesting about https://github.com/bestpractical/cpan2rt/blob/deploy/lib/CPAN2RT.pm#L333 ?
16:49 dipsy [ cpan2rt/lib/CPAN2RT.pm at deploy · bestpractical/cpan2rt · GitHub ]
16:50 oalders chmrr: no, but it looks like the error is in the scrolling rather than in the initial request.  you could run that with logging enabled so that we could see the problem query
16:51 chmrr My brief attempt at that earlier failed to actually error when run from not-cron.  I'll give it another shot
16:51 oalders chmrr: see trace_calls in https://metacpan.org/pod/ElasticSearch
16:51 dipsy [ ElasticSearch - metacpan.org - Perl programming language ]
16:52 chmrr I assume you don't have any pretty munin graphs from the fastly side of things?
16:57 chmrr OK, I've kicked off the cpan2rt script for just the metacpan query, with $es->trace_calls(1), and we'll see what it turns up
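
(For reference, a minimal sketch of turning on request tracing with the old ElasticSearch.pm client, assuming its documented constructor options; the actual cpan2rt setup may differ:)

    use ElasticSearch;

    # Build a client against the live API and log every request/response.
    # trace_calls(1) prints each call as a copy/paste-friendly curl command
    # aimed at localhost:9200, which explains the odd IP in the logs below.
    my $es = ElasticSearch->new(
        servers   => 'api.metacpan.org:80',
        transport => 'http',
    );
    $es->trace_calls(1);
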
16:58 oalders fastly graphs look pretty good
17:01 chmrr Hm.  Nothing that looks similarly-shaped to http://chmrr.net/nopaste/2013-11-256RiSj8lf-if_eth1-day.png ?
17:02 chmrr Because the traffic just hit 24M/s from ~0, and the timing seems non-coincidental with the metacpan query I just started.
17:04 oalders weird. how often does this script run?
17:05 chmrr Every two hours.
17:06 oalders chmrr: did you get any logs on this run?
17:06 chmrr It's still spewing.  I was assuming I should run it until it errored.
17:07 oalders if it doesn't cause too many problems
17:08 chmrr No, it's fine to run.  From quickly watching it scroll by, it looks like it's spewing the same modules over and over again, though.
17:08 chmrr I have ~no knowledge of how scrolled searches in ES work, so I don't know if that's expected or not.  The code was written by trs
17:09 oalders that would depend on the query
17:18 chmrr If my read of the munin graphs is right, it generally runs for just about half an hour before exploding.  It really just does seem to be the same results over and over again, though, so the query may be wrong
17:18 chmrr And fastly may be giving us the finger for running too long, which is the cause of the 500s.
17:38 trs chmrr, oalders: around now
17:39 trs so, the scrolled search sets a time limit of 5m
17:39 trs which ostensibly tells the ES server, "keep this result set around for at least 5m while I scroll through it"
17:39 chmrr http://chmrr.net/nopaste/2013-11-25IslCb38d is the result that 500'd
17:39 trs chmrr: I believe that was sufficient at the time, but perhaps it's not anymore and fastly is saying "buh-bye results"?
17:40 trs I wonder what magic fastly is working in front of ES
17:40 chmrr ==trs.  How ES-aware is the caching?
17:41 trs chmrr: what are the headers on a successful ES response?
17:41 trs i.e. are the caching headers set correctly for fastly to know what to do with it?
17:42 chmrr trace_calls doesn't log the headers
17:42 trs tcpdump? :)
17:43 trs or an LWP hook if ElasticSearch.pm is using the LWP backend?
17:43 chmrr Yeah, looks like there's no debug flag, and I'm _assuming_ from the fact that "curl" shows up in those logs that it's not using LWP
17:43 trs oh, hah
17:43 trs wrap curl! ;)
17:44 chmrr Imma gunna go wit the tcpdump, I think
17:44 chmrr Don't know what's up with that IP address, though
17:44 chmrr http://127.0.0.1:9200/ is a thing
17:45 trs yeah, I don't know enough about trace_calls to know if it's spewing things for easy copy&paste into terminal while devving
17:49 trs ES scrolling appears to be broken.
17:49 trs it's returning the same page each time
17:50 chmrr Joy!
17:50 chmrr But it's returning that page very fast! :)
17:51 trs see the output of http://zulutango.net/~tom/paste/2013-11-25wtUeTLNh-scroll-test (when it doesn't 500)
17:51 trs it sees the same releases over and over
17:52 chmrr Yeah, that's what I was observing.  I didn't know if that was expected or not, hence my 12:08
17:52 trs ahh, missed that.
17:53 chmrr I also don't know how related https://github.com/bestpractical/cpan2rt/commit/b0e6e6250 is
17:53 dipsy [ api.metacpan looks to no return "field" but rather "_source" not · b0e6e62 · bestpractical/cpan2rt · GitHub ]
17:53 trs chmrr: so the "size => 100" dictates 100 results per fetch
17:53 trs chmrr: and scroll => 5m says, let me scroll through each page for at least 5m
17:54 chmrr Cache the full results server-side for 5m, and I'll ask for them 100 at a time?
17:54 trs yes
17:54 trs that's a better way of putting it. I'm still drinking my coffee. :)
17:54 chmrr Understood. :)
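
(A minimal sketch of the scrolled search being discussed, assuming ElasticSearch.pm's documented scrolled_search interface; the query here is hypothetical, the real one lives in the CPAN2RT.pm link above:)

    # scroll => '5m' asks the server to keep the full result set alive for
    # at least five minutes; size => 100 pulls it back 100 hits per fetch.
    my $scroller = $es->scrolled_search(
        index  => 'v0',
        type   => 'release',
        query  => { match_all => {} },   # hypothetical query
        scroll => '5m',
        size   => 100,
    );
    while ( my $hit = $scroller->next ) {
        # one release document per iteration
    }
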
17:54 trs that fields -> _source is strange
17:55 chmrr That was my 5-minute "wut" on Saturday.
17:55 trs but I don't know enough about ES to know if that was an ES upgrade or something else weird. oalders?
17:57 trs chmrr: btw, transport => http should mean "not curl", since "curl" is another transport option. but, dunno for sure.
17:57 chmrr Hm, odd.
17:57 trs I suspect it's just a logging convention
17:57 trs but <- guessing
17:57 oalders no changes on our end other than fastly
17:58 oalders i did upgrade some API code, but it was mostly tests
17:58 oalders the localhost bit in the logs is so people can copy/paste to test the queries locally
17:58 oalders that's something that clintongormley set up.  using the actual host name would be so much easier
18:00 chmrr I suspect that the fastly API needs to be told to not cache scrolled query requests
18:03 trs oalders, chmrr: if I put 127.0.0.1 api.metacpan.org in /etc/hosts and set up a proxy to the real api.metacpan.org (46.43.35.68), I get the correct behaviour.
18:04 trs the fields -> _source change is significant, because the fastly api responses are including the full results, instead of only the fields we ask for like the real api server.
18:04 trs so much more data is transferred too.
18:04 oalders ah, that's actually really helpful to know
18:08 trs hahaha
18:08 oalders i want to wait until ranguard is around before changing anything, but this will have to get sorted
18:12 trs so, ElasticSearch.pm is sending a GET request with JSON in the body
18:13 trs and I'd bet money that fastly is ignoring GET bodies.
18:13 trs among the problems here
18:14 trs ^ explains ignoring the request for only certain fields
18:15 oalders that sounds plausible
18:15 oalders that accounts for the extra inbound bandwidth?
18:16 trs it also probably explains why scrolling "isn't working" because the actual search terms are in the JSON body, not the URL.
18:16 trs so the search is everything in metacpan
18:16 oalders :D
18:16 trs all releases.
18:16 trs the actual URL for scrolling should change for each fetch, looking at logs now
18:17 oalders right
18:17 trs yay! :)
18:17 oalders and since metacpan.org is using the local ES rather than the cached version, we haven't noticed a problem with the search results
18:17 trs nod
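
(A hedged illustration of the failure mode just described: because the search terms travel only in the body of a GET, the URL is the same for every query, so a cache that ignores GET bodies both collapses distinct searches into one cache entry and effectively forwards an unconstrained match-all. The request body below is hypothetical:)

    GET /v0/release/_search?scroll=5m&size=100 HTTP/1.1
    Host: api.metacpan.org
    Content-Type: application/json

    {"query":{"terms":{"distribution":["Foo-Bar"]}},"fields":["distribution"]}
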
18:18 trs the proxy that's useful for comparing responses, btw:
18:18 trs plackup -p 9200 -MPlack::App::Proxy -e 'Plack::App::Proxy->new(remote => "http://46.43.35.68", preserve_host_header => 1)->to_app;'
18:18 dipsy [ Search the CPAN - metacpan.org - Perl programming language ]
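
(In trs's original setup the /etc/hosts entry pointed api.metacpan.org at 127.0.0.1, so the proxy, listening on the ES default port 9200, forwarded requests to the origin with the right Host header; his updated wiki version, mentioned below, drops the /etc/hosts step. A hypothetical client pointed straight at the proxy:)

    my $es = ElasticSearch->new( servers => '127.0.0.1:9200', transport => 'http' );
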
18:18 oalders trs: could you add that to the wiki?
18:19 oalders this may come up again
18:23 trs sure, cpan-api or metacpan-web?
18:25 oalders trs: https://github.com/CPAN-API/cpan-api/wiki/API-docs
18:25 dipsy [ API docs · CPAN-API/cpan-api Wiki · GitHub ]
18:25 oalders we actually just have the one wiki.  having two would be even more confusing, i think :)
18:27 trs fair :)
18:27 trs oalders: just added it here: https://github.com/CPAN-API/cpan-api/wiki/Fastly-CDN
18:27 dipsy [ Fastly CDN · CPAN-API/cpan-api Wiki · GitHub ]
18:27 trs I can move it to API-docs if you want
18:27 oalders trs++
18:27 oalders i actually didn't know about that page.  that's a good place for it
18:27 * ranguard is around for a few mins... catching up
18:28 trs which has my new updated code that doesn't need /etc/hosts, yay psgi
18:30 * trs needs to re-focus on his day job now, but I'm on irc all day and can chime in asynchronously
18:31 chmrr trs: Thanks!
18:32 ether irc \o/
18:32 [Sno] joined #metacpan
18:37 oalders ranguard: quick summary is that fastly may be ignoring the body sent in a GET
18:37 oalders which is problematic for people using ElasticSearch.pm
18:37 ranguard back in about 30 mins - if someone can write a quick test case for me that would be amazing
18:46 trs oalders: does Elasticsearch.pm (sigh) do a POST instead?
18:46 trs chmrr: no problem, I like it when the CPAN ecosystem continues to work :)
19:07 trs fastly is probably in the right (as far as RFCs go) about not looking at the body of a GET
19:08 trs the RFC doesn't prohibit a body, but does say that a server must not use the GET body to vary the response.
19:11 Mike-PerlRecruiter_ joined #metacpan
19:12 * ranguard is asking on #fastly (freenode)
19:19 ranguard http://stackoverflow.com/questions/978061/http-get-with-request-body/15656884#15656884 <- looks like it is against the RFC
19:19 dipsy urgh. long url. Try http://tinyurl.com/lgvn2ad
19:19 dipsy [ rest - HTTP GET with request body - Stack Overflow ]
19:20 oalders trs: i don't know of a way to force a POST with ElasticSearch.pm
19:20 trs ranguard: so what I said above :)
19:20 ranguard trs: yea, indeed - just repeating what they said ;)
19:20 trs oalders: it's in ElasticSearch/Transport.pm, under _tidy_args().  it should default method to POST instead of GET if there's data.
19:21 oalders ranguard: is there any way to get fastly to ignore a GET with a body?
19:21 trs and I'm assuming the underlying call doesn't specify a method and it's defaulting to GET there.
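
(A hypothetical one-line sketch of the fix being described, assuming the method really does default inside _tidy_args(); the actual change is the pull request trs opens later in this log:)

    # Default to POST whenever the call carries a request body, so that
    # caches and CDNs never see query JSON attached to a GET.
    $args->{method} ||= defined $args->{data} ? 'POST' : 'GET';
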
19:21 ranguard does ANYTHING set Last-Modified in api?
19:22 oalders ranguard: i don't think so
19:25 ranguard oalders: do you have an example script I can use to test this?
19:26 trs ranguard: http://zulutango.net/~tom/paste/2013-11-25wtUeTLNh-scroll-test
19:26 trs it's a start
19:26 trs but the query that was failing. you can use my proxy trick on the wiki to try it against both fastly and not fastly.
19:26 metacpan joined #metacpan
19:26 metacpan [cpan-api01] ranguard pushed 1 new commit to leo/fastly: http://git.io/DH9eag
19:26 metacpan cpan-api/leo/fastly e22231d Leo Lapworth: Set surrogate control for everything, no matter what
19:26 metacpan left #metacpan
19:26 dipsy [ Set surrogate control for everything, no matter what · e22231d · CPAN-API/cpan-api · GitHub ]
19:27 ranguard trs: cheers
19:29 trs ranguard: so the Surrogate-Control commit you just pushed, does that mean that only stuff which explicitly sets Surrogate-Control: 1 now gets cached?
19:29 * trs knows ~not much about fastly's caching
19:29 metacpan joined #metacpan
19:29 metacpan [cpan-api01] ranguard pushed 1 new commit to master: http://git.io/yCK6hg
19:29 metacpan cpan-api/master a03bac9 Leo Lapworth: Merge branch 'leo/fastly'
19:29 metacpan left #metacpan
19:29 dipsy [ Merge branch 'leo/fastly' · a03bac9 · CPAN-API/cpan-api · GitHub ]
19:29 ranguard Surrogate-Control should be a ttl for JUST fastly to look at
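
(For reference, Surrogate-Control is an edge-cache header from the Edge Architecture Specification; fastly honours it as a cache TTL and strips it before the response reaches end users, so browsers never see it. The value below is purely illustrative:)

    Surrogate-Control: max-age=86400
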
19:33 * ranguard restarts api
19:34 ranguard trs: please test now - think I fixed it
19:34 oalders it looks fixed to me
19:36 ranguard ok, it was either the api hadn't been restarted with the surrogate-control stuff, or the default ttl in fastly was kicking in - I fixed both :)
19:36 ranguard chmrr: thanks for reporting - please test again
21:37 ranguard oalders: separately we should look at getting ES.pm to POST instead of GET if there is a body
19:37 oalders ranguard: i could have sworn i restarted the api after deploying on saturday
19:38 oalders chmrr++ trs++ ranguard++
19:38 trs ranguard, oalders: I just tested it again.  It appeared fixed, but subsequent runs are still broken.
19:38 ranguard eek
19:39 trs i.e. sounds like something is still cached somewhere, or it was all properly expired and then cached incorrectly again
19:40 ranguard all traffic seems to be coming through as 'pass' - which means fastly isn't caching
19:40 ranguard (which is what we want)
19:40 oalders it's working on and off for me. i wonder if the successes are cache misses
19:40 * ranguard runs a full purge of fastly
19:41 oalders ranguard: you can use these queries to test https://gist.github.com/oalders/7647464
19:41 dipsy [ gist:7647464 ]
19:41 trs "There are only two hard things in Computer Science: cache invalidation and naming things."
19:41 ranguard done
19:41 chmrr trs: ITYM "two hard problems: caching, naming, and off-by-one errors"
19:42 trs chmrr: :D
19:42 ranguard ok, 100% pass through on fastly - but seeing the same error
19:43 ranguard oalders: those are all returning results
19:43 ranguard it's trs's script that sometimes errors
19:43 oalders ranguard: they always return results, but the one that calls the api directly sometimes returns all of the fields
19:44 chmrr ranguard: But note that the results are _different_ on those two.  For instance, "fields" is wrong on the fastly one
19:44 oalders i.e. ignores the body
19:44 chmrr ==oalders
19:51 ranguard asking on #fastly for help
20:49 ranguard did someone post a link to where ES does the body thing with GET ?
20:49 ranguard oh, found it
20:55 ether the new release of Pod::Markdown now generates links to metacpan instead of s.c.o.  Thank you rwstauner!!!!
20:56 ether (this is used, among other things, by Dist::Zilla::Plugin::ReadmeAnyFromPod)
20:56 ether I'm going to refresh all my README.md files in my github repos tonight
20:56 ranguard sweet
20:59 oalders rwstauner ftw!
21:00 ether \o/
21:01 jayallen joined #metacpan
21:06 ranguard trs chmrr - we've changed the DNS for api.metacpan.org back to the origin, it's the GET + Body thing, fastly are going to look at it, I've also reported a bug to ElasticSearch.pm
21:07 ranguard but fastly have rightly said they and other CDNs aren't likely to support it but they're looking at options
21:07 chmrr Yeah, totally legit.  IMHO ElasticSearch switching to POST is the most right fix
21:09 chmrr Thanks for looking into it
21:09 ranguard np, thanks for reporting and helping identify it
21:10 chmrr If we pass method => 'POST' to ->scrolled_search, will it just start doing the right thing?  Seems a pity to lose the fastly speedup just because of one bad client
21:11 ranguard one bad client that's the main one we recommend :)
21:11 chmrr Fair. :)
21:11 ranguard we'll go back to fastly when we can :)
21:13 ranguard chmrr: you can always ping global.prod.fastly.net - then set your local machine to that as the IP address for api.metacpan.org if you want to go via fastly :)
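
(i.e. something like the following in /etc/hosts, with 203.0.113.10 standing in for whatever IP global.prod.fastly.net actually resolves to for you:)

    203.0.113.10    api.metacpan.org
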
21:13 chmrr "want" is not really the correct word.  I think I prefer correct answers to very fast wrong ones. :)
21:13 ranguard :)
21:37 * mst giggles
21:37 mst ranguard: I told clinton that'd be a problem while mo was still working on the original version of the web UI
21:38 mst http://trout.me.uk/tys.txt applies, I think.
21:40 ranguard mst: heh
21:40 ranguard mst: if you want to give me IRC logs I'll add them to the ticket :)
21:42 ranguard https://github.com/clintongormley/ElasticSearch.pm/issues/47 <- or add yourself :)
21:42 dipsy [ URGENT: GET + Body is BAD and breaks CDNs · Issue #47 · clintongormley/ElasticSearch.pm · GitHub ]
21:44 mst that would involve finding them; I think the additional amusement to additional effort ratio involved isn't sufficient to bother
21:44 mst I'll just keep giggling quietly to myself instead ;)
21:45 ranguard fair enough :)
21:45 trs oalders: just added a pr
21:45 oalders trs: that was quick :)
21:46 ranguard trs++
21:46 ranguard though.. might just be better to POST everything?
21:46 trs ranguard: no, when a GET is fine, that's what you want. and there are other methods used too (HEAD, OPTIONS, PUT, ...)
21:47 ranguard ahh, cool
21:47 trs also, I'm trying to make the change conservative and the most specific fix.  especially since ES.pm is considered deprecated by clinton in favor of Es.pm
21:49 ranguard oh!
21:50 ranguard trs: is Es.pm going to need patching as well?
21:50 trs ranguard: no idea.
21:50 ranguard https://metacpan.org/source/DRTECH/Elasticsearch-0.75/lib/Elasticsearch/Transport.pm#L65 defaults to GET
21:50 dipsy [ lib/Elasticsearch/Transport.pm - metacpan.org - Perl programming language ]
21:51 oalders you have to e-sign a contributor agreement before a pull request to Es.pm can get merged
21:53 ranguard I'll email clinton about that
22:09 mst I really despise what they did with the naming
22:09 mst way to fuck up windows and OS X users
22:09 mst but I'm assuming clinton didn't get a choice
22:17 oalders you won't find too many people fond of the naming
22:23 ranguard yea - and the old version needs 'DEPRECATED' in the NAME and DESCRIPTION sections so it's clear in search results
22:23 * ranguard mentioned that in his email to clinton
22:23 oalders :)
22:24 oalders he'll be so happy to hear from us
22:24 ranguard yep :)
22:44 klapperl joined #metacpan
23:04 ether mst: I wouldn't assume that. when I informed him of the issue, he said "oh, I hadn't thought about that".
23:04 ether it could have been averted if he'd done a trial release first and called for testers
23:04 ether or even if he'd just yanked the new name from PAUSE the day it went up, as I saw the problem immediately
23:05 * ether made a point to mail him ASAP about it in order to give him that chance, so it wouldn't be a matter of "oh well, too late to fix it now"
23:24 klapperl_ joined #metacpan
