
IRC log for #rosettacode, 2011-01-23


All times shown according to UTC.

Time Nick Message
00:11 mwn3d_phone left #rosettacode
00:12 mwn3d_phone joined #rosettacode
00:12 parsleyfirefly joined #rosettacode
00:23 SoniaKeys joined #rosettacode
01:01 SoniaKeys left #rosettacode
01:18 Mathnerd314 joined #rosettacode
01:37 Mathnerd314 left #rosettacode
02:19 mwn3d_phone left #rosettacode
02:33 mwn3d_phone joined #rosettacode
02:57 BenBE3 left #rosettacode
04:06 BenBE2 joined #rosettacode
04:23 BenBE2 The GeSHi Server (and all the other pages it runs) is up and RUNNING again ;-)
04:36 Mathnerd314 joined #rosettacode
04:42 shortcircuit BenBE2: ?
04:42 shortcircuit Is that related to the XSS alert?
05:30 Mathnerd314 left #rosettacode
06:18 parsleyfirefly left #rosettacode
08:27 mwn3d_phone left #rosettacode
08:36 mwn3d_phone joined #rosettacode
08:42 mwn3d_phone left #rosettacode
08:46 mwn3d_phone joined #rosettacode
09:41 FireFly left #rosettacode
11:51 BenBE2 shortcircuit: The XSS alert was from NoScript in my browser when opening the shortlink posted by fedaykin
11:52 BenBE2 shortcircuit: The "server is up and RUNNING" message refers to my server having been moved from a temporary VPS to a dual-Opteron system with 4 GB RAM
13:15 MigoMipo joined #rosettacode
14:15 parsleyfirefly joined #rosettacode
14:15 FireFly joined #rosettacode
14:35 shortcircuit Ah. :)
14:57 kpreid left #rosettacode
14:58 kpreid joined #rosettacode
15:50 FireFly left #rosettacode
16:02 FireFly joined #rosettacode
18:04 Mathnerd314 joined #rosettacode
18:36 BenBE2 is now known as BenBE
18:36 BenBE left #rosettacode
18:36 BenBE joined #rosettacode
18:59 mwn3d_phone left #rosettacode
19:00 mwn3d_phone joined #rosettacode
19:56 belugs joined #rosettacode
19:56 belugs left #rosettacode
20:37 shortcircuit Ok, wow.
20:37 shortcircuit Something's hammering the server.
20:41 BenBE shortcircuit How's your server surviving the hammering?
20:41 shortcircuit Something's wrong with /wiki/Walk_A-directory/Non-recursively, and it keeps returning HTTP 500.
20:41 shortcircuit And Yahoo! Slurp keeps retrying.
20:43 shortcircuit BenBE: Short answer, it's surviving, CPU's at a solid 100%, and the load average is... *drumroll*
20:43 BenBE Redirect Slurp to a Temp Redirect to some static image ;-)
20:43 shortcircuit 20:43:29 up 98 days,  6:00,  3 users,  load average: 2.50, 5.65, 5.57
20:44 shortcircuit I just got an HTTP 500 trying to look at the diff for the last edit to that page.
20:44 BenBE Nice ;-) I wish I had 98 days uptime; but well; just rebooted today in the morning to get it up and running ;-)
20:44 BenBE First step: Get the load away by serving static content instead ;-)
20:45 BenBE And if Yahoo sees a nice picture with cat content for a few hours ... can't hurt :P
20:45 shortcircuit It _should_ be serving static content, once a version of the page has been generated. Yahoo's not asking for anything that shouldn't be cacheable by squid.
20:45 shortcircuit Ah. CPU's going down.
20:46 BenBE Well, I'm thinking of using Nginx in front of my Apache once I get the system settled and all the remaining work done.
20:47 shortcircuit O.o
20:48 shortcircuit I just saw "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" as the user agent on one of the requests that got an HTTP 500 response.
20:48 shortcircuit And a second, from the same IP.
20:48 BenBE What's HTTrack?
20:49 shortcircuit Apparently, a crawler and offline browser.
20:49 shortcircuit http://en.wikipedia.org/wiki/HTTrack
20:49 fedaykin "HTTrack - Wikipedia, the free encyclopedia"
20:51 shortcircuit Ok, I know more or less what's happening.
20:51 shortcircuit Someone using HTTrack (and I even know from which IP) is mirroring the site, and not at a pleasant rate.
20:51 * shortcircuit checks to see if he still has Crawl-Delay in robots.txt
20:52 BenBE Some parts of the website may not be downloaded by default due to the robots exclusion protocol unless disabled during the program.
20:53 shortcircuit I'm not seeing it hitting any of the prohibited URIs, so I don't think it's ignoring robots.txt. I think it's just hitting the site too fast.
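[Editor's note] The two checks discussed here (is a Crawl-Delay set, and is a given path off limits to a crawler) can be sketched with Python's standard `urllib.robotparser`. The robots.txt content below is hypothetical: the log never shows the real rosettacode.org file, so the `/mw/` rule and the 10-second delay are assumptions made purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; NOT the actual rosettacode.org file.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /mw/
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Crawl-Delay that applies to the HTTrack user agent (falls back to the "*" entry).
delay = rules.crawl_delay("HTTrack")

# Whether HTTrack may fetch an ordinary page and a (hypothetically) prohibited one.
allowed = rules.can_fetch("HTTrack", "/wiki/Some_Page")
blocked = rules.can_fetch("HTTrack", "/mw/index.php")

print(delay, allowed, blocked)
```

Note that this only reports what robots.txt asks for. As the rest of the conversation shows, a crawler may respect Disallow rules while ignoring Crawl-Delay entirely.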
20:53 Mathnerd314 shortcircuit: can you add a rate-limit?
20:54 shortcircuit Mathnerd314: I certainly could, but it'd require actually having firewall rules on the box; at this point, I just don't have any services listening on interfaces that I don't want them to be accessible from.
20:54 shortcircuit BenBE: Any idea if the new GeSHi core might be more CPU-intensive than the previous one?
20:55 BenBE You mean because of the changes? No actual code changes. And the patch I asked about in my blog didn't get integrated because of that reason.
20:56 shortcircuit Grr. You know, most crawler websites have big, visible links for webmasters to fine-tune their robots.txt. I'm having difficulty finding httrack's.
20:56 BenBE There was no feedback either though :(
20:57 Mathnerd314 shortcircuit: http://www.httrack.com/html/abuse.html
20:57 fedaykin "HTTrack Website Copier - Offline Browser"
20:57 Mathnerd314 though it doesn't say much
20:58 shortcircuit I've been noticing this about the attitude in their site's documentation.
21:03 shortcircuit Mathnerd314: The other problem with rate limiting is that it'd seriously slow down first-time visits; there are five or six pulls from rosettacode.org for each cache-free client.
21:03 fedaykin "Welcome to Rosetta Code - Rosetta Code"
21:04 shortcircuit Coming up with a burstable setup is possible--and I know enough about iptables to figure it out--but ridiculously complicated.
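[Editor's note] A "burstable" limit of the kind described here is commonly modeled as a token bucket: a client may make a short burst of requests (for example the five or six pulls of a first-time visit), but sustained traffic is held to a fixed rate. The sketch below is only an illustration of that logic in Python, not an iptables ruleset and not anything the channel actually deployed; the class name, the rate, and the burst size are all assumptions.

```python
import time


class TokenBucket:
    """Minimal token bucket: `burst` requests at once, then `rate` per second."""

    def __init__(self, rate, burst, now=None):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.tokens = float(burst)  # start full so a first visit is not slowed
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True if a request arriving at time `now` may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill according to elapsed time, never beyond the burst capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical policy: a burst of 6 requests, then 1 request per second.
bucket = TokenBucket(rate=1, burst=6, now=0.0)
first_visit = [bucket.allow(now=0.0) for _ in range(6)]  # the initial burst
seventh = bucket.allow(now=0.0)                          # burst exhausted
after_wait = bucket.allow(now=1.0)                       # one token refilled
```

In practice one bucket would be kept per client IP. The explicit `now` argument exists only so the behavior can be checked deterministically.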
21:05 shortcircuit HAHAHA!
21:05 shortcircuit Go figure I'd have encountered this before.
21:05 shortcircuit http://forum.httrack.com/readmsg/23408/index.html
21:05 fedaykin "Crawl-Delay and Honored robots.txt lines - HTTrack Website Copier Forum"
21:06 shortcircuit First duckduckgo search result for 'httrack crawl-delay'
21:07 shortcircuit So, yeah, the general gist of it appears to be: "We already have a rate-limiter. Do more work on your end."
21:07 shortcircuit Time for another mirror poison-pill. I don't like doing those.
21:18 BenBE Do you see the characteristics of that crawling?
21:18 BenBE Like X connections with delay X in between?
21:20 shortcircuit Not easily. Squid's still not configured for logging, so I only see what gets past it.
21:21 shortcircuit I can see that it's not hitting any pages I forbade in robots.txt, though.
21:21 BenBE hmmm.
21:23 BenBE Even though I'm only allowed limited logging in Germany, it's one of the first things I set up. (Logging IPs falls under privacy laws and usually isn't allowed unless explicitly stated on the website. But I won't give those logs out to anyone unless there's a strong legal requirement AND I know WHY I should give them out.)
21:23 BenBE OT: shortcircuit Does RC have IPv6?
21:24 shortcircuit BenBE: Link-local address only, and the server isn't configured to respond to it.
21:24 shortcircuit afk for a bit.
21:24 shortcircuit I'm going to ride out this one
21:28 BenBE left #rosettacode
21:28 BenBE joined #rosettacode
21:29 BenBE left #rosettacode
21:29 BenBE joined #rosettacode
22:38 MigoMipo left #rosettacode
23:12 shortcircuit BenBE: I might enable 6to4 at some point, for IPv6. The data center I'm at doesn't know what it wants to do with IPv6 yet, and offers 'experimental' support, with the caveat that the behavior may change...I figure 6to4 is the only way I'll have a stable IPv6 address there, for now.
23:15 BenBE Avoid 6to4 LIKE HELL; that won't do you any good. Believe me. The only time I got a stable IPv6 connection was when it was offered natively or I set up my own tunnel.
23:16 BenBE Maybe check with some tunnel providers/brokers like SixXS or the like: Running aiccu (AYIYA or Heartbeat tunnels) works well and without any major downtime.
23:17 BenBE And it's not too hard to set up (on Debian there's a package for aiccu you just install and answer 4 questions from the installer --> done).
23:17 BenBE Other distros shouldn't be that different.
23:17 shortcircuit BenBE: I've got a SixXS account, so I can get a tunnel if I need it. Still, re 6to4; stability will be dependent on your nearest anycast respondent to the 6to4 tunnel endpoint address. Maybe you just had a bad location on the network?
23:18 shortcircuit I've (sorta) got IPv6 at home
23:18 shortcircuit 6to4; it's disabled until I finish setting up the firewall properly.
23:19 BenBE Tried 6to4 with T-Online (German provider) and DFN (German Research Network); neither gave me a reliable connection, if any at all.
23:19 BenBE On TO the connection is flaky, on DFN proto 41 is usually blocked.
23:20 BenBE Tried with other providers too, but IPv6 still is a sad story in Germany ...
23:21 shortcircuit From my tests at home, 6to4 looks like it'll work fine.
23:22 BenBE So the IP will be something like 2002:42dc:00e7::1?
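[Editor's note] A 6to4 address like the one guessed here is derived mechanically: the 16-bit prefix `2002` is followed by the 32 bits of the host's public IPv4 address, giving a /48. The sketch below uses Python's standard `ipaddress` module to go in both directions. The helper names are made up for illustration, and `192.0.2.1` is a documentation address, not anyone's real server.

```python
import ipaddress


def sixto4_prefix(ipv4):
    """Return the 6to4 /48 prefix (2002:V4ADDR::/48) for a public IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    # 2002 occupies the top 16 bits; the IPv4 address fills the next 32 bits.
    prefix_int = (0x2002 << 112) | (v4 << 80)
    return ipaddress.IPv6Network((prefix_int, 48))


def embedded_ipv4(ipv6):
    """Return the IPv4 address embedded in a 6to4 address, or None if it is not 6to4."""
    return ipaddress.IPv6Address(ipv6).sixtofour


# Documentation address used as a stand-in for a real public IPv4 address.
prefix = sixto4_prefix("192.0.2.1")

# Decode the address quoted in the log back into the IPv4 address it embeds.
decoded = embedded_ipv4("2002:42dc:00e7::1")

print(prefix, decoded)  # 2002:c000:201::/48 66.220.0.231
```

Because the prefix is tied to the IPv4 address, a 6to4 address is only as stable as that IPv4 address, which is the sense in which it was called the one stable option in the discussion above.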
23:22 shortcircuit Though I didn't try any long-sustained connections, or bulk transfers
23:23 FireFly left #rosettacode
23:23 shortcircuit Can't verify atm.
23:23 BenBE I didn't even get a proper IP with 6to4, so I couldn't do any reliability tests. I only ever had one by chance, and by the time I got around to testing it, it was gone again.
23:24 BenBE BTW: http://test-ipv6.com/ - What's your test-score there?
23:24 fedaykin "Test your IPv6."
23:25 BenBE Mine is 9/10 + 7/10 - and I'm only missing the last points because my DNS server currently doesn't support IPv6 due to a missing IPv6 subnet for it.
23:41 shortcircuit I believe my router still has the 6to4 tunnel enabled, I'm just not throwing out RAs on my internal network.
23:43 BenBE I send out RAs for my own network, which I get routed through my tunnel from SixXS
23:49 shortcircuit Spun up radvd briefly, so I can test IPv6 from inside my network
