IRC log for #metacpan, 2013-05-25

All times shown according to UTC.

Time Nick Message
02:33 SineSwiper heh, hundreds of thousands of CPAN modules, and it can still fit on a thumb drive
03:54 preflex_ joined #metacpan
03:55 oalders SineSwiper++ # doc patches
05:28 ether_ joined #metacpan
06:39 thaljef joined #metacpan
10:15 dpetrov_ joined #metacpan
10:52 SineSwiper oalders: thanks, though I'm still getting 0 repos from some of the other API scripts
10:53 SineSwiper trying to figure out which script needs to do the final indexing
10:53 SineSwiper do you have a prod copy of API's crontab?
10:53 SineSwiper the git repo needs updating, anyway
11:34 mo SineSwiper: https://gist.github.com/metacpan/29add0da741e9cbba306
11:34 dipsy [ gist:29add0da741e9cbba306 ]
11:36 mo actually, we should delete that config stuff from the api
11:37 mo it is quite misleading
12:36 ranguard SineSwiper: https://github.com/CPAN-API/metacpan-puppet/blob/master/modules/metacpan/manifests/cron.pp is what generates the crontab (should you want to reference it in docs so it's always current)
12:36 dipsy [ metacpan-puppet/modules/metacpan/manifests/cron.pp at master · CPAN-API/metacpan-puppet · GitHub ]
13:23 SineSwiper I guess I should use /home/metacpan/CPAN as the CPAN directory
13:27 ranguard that's what live does
13:28 ranguard FYI eventually we'd like to have a CPAN::Faker that creates a minimal (and known) /home/metacpan/CPAN
13:28 ranguard with a config file so we can easily add new modules as we find edge cases
13:28 SineSwiper I keep adding wget commands, since minicpan doesn't seem to grab them
13:29 SineSwiper I wonder if a wget mirror command would be better
13:29 SineSwiper what do you guys use for CPAN sync ups?
13:29 ranguard then that data would be part of the VM we ship for developers
13:29 ranguard rrr
13:29 ranguard https://github.com/CPAN-API/metacpan-puppet/tree/master/modules/rrrclient
13:29 dipsy [ metacpan-puppet/modules/rrrclient at master · CPAN-API/metacpan-puppet · GitHub ]
13:30 ranguard but that's overkill for just developing
13:30 SineSwiper well, I don't mind the 4GB extra
13:31 SineSwiper it's really quite shocking just how small a footprint CPAN is
13:31 ranguard heh, well, it's just txt :)
13:31 ranguard (mostly)
13:31 SineSwiper compressed txt at that
13:31 ranguard throw in backpan if you want to use up more disk space
13:31 SineSwiper heh, yeah
13:32 ranguard our live uncompressed dir is using 130G :)
13:33 ranguard yea, with backpan you are at 32G (if I'm looking at this right)
13:37 SineSwiper ranguard: you uncompress them to save on CPU, but MetaCPAN can index them either way, right?
13:38 * ranguard believes so but doesn't really know the api/ES stuff
13:38 ranguard I think the uncompress is actually more for the pod and source for the web - instead of the index - but could be wrong
13:39 ranguard also think it only uncompresses those that are used
13:40 * ranguard has to go - back later
14:00 SineSwiper 28100 tarballs found... I thought CPAN had more than that...
14:35 SineSwiper 4 minutes between indexing a few modules and looking at the next one?
14:36 SineSwiper this is going to take forever
15:30 przemek joined #metacpan
15:38 oalders SineSwiper: yeah, it takes a long time. so, for dev purposes, you could just index an author directory to get started
15:39 przemek joined #metacpan
17:31 przemek_ joined #metacpan
18:25 thaljef joined #metacpan
19:36 SineSwiper I'd expect it to take a while, but it almost sounds like there's something wrong if it's taking around 7-8 minutes for every two modules (with two child processes)
19:38 SineSwiper odd, it's not a CPU bottleneck, but an I/O one
19:39 ranguard ES doing $stuff?
19:40 SineSwiper ranguard: I think so; does it spit out logs somewhere?
19:40 ranguard yea /opt/elasticsearch/ something
19:41 ranguard /opt/elasticsearch/logs/ in fact
19:41 SineSwiper [2013-05-25 15:33:28,090][INFO ][index.search.slowlog.query] [Valentina Allegra de Fontaine] [cpan_v1][2] took[9.9s], took_millis[9908], types[release], stats[], search_type[QUERY_THEN_FETCH], total_shards[5]
19:42 SineSwiper [2013-05-25 15:41:37,366][WARN ][monitor.jvm              ] [Valentina Allegra de Fontaine] [gc][Copy][147400][484] duration [10.4s], collections [1]/[12.1s], total [10.4s]/[14m], memory [36.3mb]->[19.5mb]/[61.9mb]
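The two log lines pasted above carry the numbers that matter here: how long the slow query took and how small the JVM heap is. A minimal sketch of pulling those out with regular expressions follows. The function names are hypothetical, the patterns are fitted only to these two lines, and reading the three `memory` figures as heap before GC, heap after GC, and heap ceiling is an assumption about the log format rather than something stated in the channel.

```python
import re

# Log lines copied verbatim from the transcript above.
SLOWLOG = ("[2013-05-25 15:33:28,090][INFO ][index.search.slowlog.query] "
           "[Valentina Allegra de Fontaine] [cpan_v1][2] took[9.9s], "
           "took_millis[9908], types[release], stats[], "
           "search_type[QUERY_THEN_FETCH], total_shards[5]")
GC_WARN = ("[2013-05-25 15:41:37,366][WARN ][monitor.jvm              ] "
           "[Valentina Allegra de Fontaine] [gc][Copy][147400][484] "
           "duration [10.4s], collections [1]/[12.1s], total [10.4s]/[14m], "
           "memory [36.3mb]->[19.5mb]/[61.9mb]")


def slow_query_millis(line):
    """Return the took_millis value of a slowlog line, or None if absent."""
    match = re.search(r"took_millis\[(\d+)\]", line)
    return int(match.group(1)) if match else None


def gc_memory_mb(line):
    """Return the three 'memory' figures (in MB) of a GC warning, or None.

    Assumption: the figures are (before GC, after GC, ceiling).
    """
    match = re.search(
        r"memory \[([\d.]+)mb\]->\[([\d.]+)mb\]/\[([\d.]+)mb\]", line)
    return tuple(float(x) for x in match.groups()) if match else None


if __name__ == "__main__":
    print("slow query (ms):", slow_query_millis(SLOWLOG))
    print("memory (MB):", gc_memory_mb(GC_WARN))
```

If that reading of the `memory` field holds, the last figure being only about 62MB fits the memory-bottleneck diagnosis that follows.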
19:43 SineSwiper I wonder how much memory/CPU this VBox is set up for
19:44 ranguard 384MB RAM
19:45 ranguard you can change it through VirtualBox
19:45 SineSwiper yeah, looks like a memory bottleneck, as kswapd0 is sucking down the usage
19:45 ranguard and 1 CPU
19:47 ranguard cool, please add to the docs (I think it's still correct as the default, as we should ship a VM that runs on a minimum of machines)
19:47 SineSwiper ranguard: yeah, I'm keeping notes for a second PR
19:47 SineSwiper this is more of an "advanced working-on-the-API" build
19:47 ranguard but I'd be easily convinced to increase it with a good reason :)
19:48 ranguard yea, btw what you are doing is really valuable - seeing as I've never even looked at the API side :)
19:49 thaljef joined #metacpan
19:50 SineSwiper yeah, no problem, having fun kicking the tires and seeing how far I can break this thing
19:53 SineSwiper well, we can either do this with 7-8 minutes per module, or we can do it with 4 seconds :)
19:54 SineSwiper much faster
19:54 SineSwiper currently working with 1536MB of memory and 2 CPUs
19:54 SineSwiper though, limitations may vary per PC
19:54 SineSwiper this Oracle VBox doesn't recommend going past half of your resources
19:55 ranguard :)
19:55 ranguard our live box has  32G of memory and 4 CPUs :)
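To put the before and after timings in perspective, here is a rough back-of-the-envelope sketch. It takes the 28100 tarballs reported at 14:00 and the "7-8 minutes per module" versus "4 seconds" figures quoted at 19:53. Treating every tarball as one module, taking 7.5 minutes as the midpoint, and assuming a constant per-module cost with no parallelism are all simplifying assumptions made here, not measurements from the channel.

```python
# Figures quoted in the transcript; everything else is an assumption.
TARBALLS = 28100              # "28100 tarballs found"
SLOW_SECONDS = 7.5 * 60       # midpoint of "7-8 minutes per module" (assumed)
FAST_SECONDS = 4.0            # "4 seconds" after raising the VM's memory

SECONDS_PER_DAY = 24 * 60 * 60


def total_days(count, seconds_each):
    """Total wall-clock days for `count` items at a fixed cost per item."""
    return count * seconds_each / SECONDS_PER_DAY


slow_days = total_days(TARBALLS, SLOW_SECONDS)
fast_days = total_days(TARBALLS, FAST_SECONDS)
speedup = SLOW_SECONDS / FAST_SECONDS

if __name__ == "__main__":
    print(f"384MB VM:  about {slow_days:.0f} days")
    print(f"1536MB VM: about {fast_days:.1f} days")
    print(f"speedup:   about {speedup:.0f}x")
```

Under those assumptions a full index run drops from months to a little over a day, which is why the memory setting is worth a note in the docs.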
19:57 SineSwiper you know, all of this setup made me think that there could be some potential of crowdsourcing your web site
19:58 ranguard ?
19:58 ranguard the hosting?
19:58 dipsy the hosting is fine - just am paying too much for what i use
19:58 SineSwiper I mean, you'd have to make sure the people are trusted not to mess with the site
19:59 SineSwiper it's just that I've got this neat MCPAN box and shortly I'll have it working almost like a mirror
19:59 SineSwiper one sec, door
20:00 ranguard yea, but you won't have an ES running in 32G of ram, so won't match our speed (or so I've been told)
20:01 ranguard it's actually the user (not CPAN) data that's the pain as well, because ES clustering assumes low latency
20:02 ranguard that aside, we don't need to offload at the moment, and we have some options if we need to (http://www.fastly.com/ have given us a free account for a start, I just haven't gotten around to setting it up)
20:02 dipsy [ Fastly ]
20:03 ranguard but the concept is interesting in general :)
20:07 SineSwiper yeah, mostly just bantering about a half-baked idea, but it would be interesting to have something set up that says "hey, run these few commands, and bam, a mirror site"
20:08 SineSwiper after all, CPAN has many, many different mirror sites that they seem able to trust
20:23 oalders the other thing you could do is set it up at $work for your darkpan/current version of cpan and you'd have searchable docs for exactly the versions you're using in production
20:36 SineSwiper oalders: yeah, that's a good idea, though you'd have to make sure your darkpan is going to have a similar file configuration as CPAN
21:06 SineSwiper ES is a Java application; does it have a fixed memory footprint?
21:06 SineSwiper I wonder if the base 384MB memory allocation wasn't enough for ES in general
21:09 SineSwiper oh, neat, you have munin on this thing somewhere
23:00 thaljef joined #metacpan