IRC log for #rosettacode, 2014-01-03

All times shown according to UTC.

Time Nick Message
01:47 okcoker joined #rosettacode
03:13 mwn3d joined #rosettacode
04:11 sirdancealo2 joined #rosettacode
04:12 sirdancealo2 how do i keep falling out of here
04:14 sirdancealo2 <Hypftier> sirdancealot: Wasn't there an attempt with Semantic MediaWiki somewhen in the past to get more queryable pages as well? ...any idea if it got anywhere?
04:15 sirdancealo2 also, if you once scraped it, is there any code i could reuse?
04:15 sirdancealo2 maybe you guys should just make semanticizing rosettacode one of the tasks
04:23 sirdancealo2 semantic media wiki..would that mean a migration?
04:38 sirdancealo2 hm hm https://github.com/acmeism/RosettaCodeData
07:06 kpreid joined #rosettacode
09:25 Hypftier sirdancealo2: It's a set of semantic hints embedded in the normal page source; it's just an MW extension as far as I know.
09:38 sirdancealo2 ah
09:51 Hypftier Scraping is made slightly hard because there seems to exist exactly one parser for MediaWiki, which is itself (and no published or formal grammar exists either ... reminds one of PHP in that respect ;))
10:27 Hypftier I seem to no longer have the code I used to scrape RC ... at least it's nowhere to be found on my machine or any backups or repositories
13:16 BenBE_ joined #rosettacode
15:08 okcoker joined #rosettacode
15:46 mikemol|zoe joined #rosettacode
16:30 ivanshmakov Hypftier: Fortunately, there are /many/ parsers for the (X)HTML made from the MediaWiki markup. Starting there is likely to be easier, and even more so if the pages happen to rely on templates and built-in MediaWiki features that adorn the HTML with ‘class’ attributes and similar magic.
16:31 Hypftier ivanshmakov: I'd guess to extract more or less semantic data from a MediaWiki page it's easier to look at the templates and not the generated markup
16:33 ivanshmakov Hypftier: Depending on the data itself, but it’s going to be even easier if the templates produce RDFa or some other “microformat.”
16:34 ivanshmakov But given that (as was noted) there’s /no/ formal specification for the MediaWiki markup, I’d rather update the few templates to give me reasonable (X)HTML on the output, and try to avoid dealing with the markup at all.
16:36 ivanshmakov The other way to think of it is that the templates themselves are a kind of yet another markup-within-a-markup, and there does not seem to be any “formal specification” for them, either.
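
A minimal sketch of the approach ivanshmakov describes above: skip the wiki markup entirely and pull data out of the rendered (X)HTML by looking at ‘class’ attributes. It uses only Python's standard html.parser module. The sample markup and the class names ("lang-name", "highlighted_source") are assumptions made up for illustration, not the classes Rosetta Code's templates actually emit, and extract_by_class is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0       # nesting depth inside a matching element
        self.current = []    # text fragments of the element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            # Already inside a match: just track nesting.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html, wanted_class):
    """Hypothetical helper: text of all elements with the given class."""
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up sample of template-generated HTML (class names are assumptions).
SAMPLE = """
<div class="task-entry">
  <h2><span class="lang-name">Python</span></h2>
  <pre class="python highlighted_source">print("hi")</pre>
  <h2><span class="lang-name">Perl 6</span></h2>
  <pre class="perl6 highlighted_source">say "hi"</pre>
</div>
"""

if __name__ == "__main__":
    print(extract_by_class(SAMPLE, "lang-name"))  # ['Python', 'Perl 6']
```

The same extractor works for any class the templates are updated to emit, which is why adjusting a few templates may be less work than parsing the markup itself.
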
16:41 mwn3d1 joined #rosettacode
16:52 FireFly Hypftier: I have a somewhat-working MW wikimarkup parser in JavaScript somewhere, I could dig it up if you'd like to give it a try
17:37 mwn3d joined #rosettacode
19:38 ivanshmakov joined #rosettacode
21:49 skinkitten joined #rosettacode
22:14 mwn3d1 joined #rosettacode
