
IRC log for #rosettacode, 2011-07-21


All times shown according to UTC.

Time Nick Message
00:56 kpreid left #rosettacode
01:41 FireFly left #rosettacode
02:36 trolzies left #rosettacode
02:38 realazthat joined #rosettacode
02:57 kpreid joined #rosettacode
03:49 BenBE left #rosettacode
04:19 mwn3d_phone left #rosettacode
04:24 mwn3d_phone joined #rosettacode
10:54 FireFly joined #rosettacode
12:27 BenBE joined #rosettacode
13:19 dagnyscott joined #rosettacode
14:08 dagnyscott left #rosettacode
14:08 rodt joined #rosettacode
14:10 rodt :S dunno what to call my language :/
14:23 rodt theres something magical about self hosting booting strapping compilers... oh whatever its called... i cant stop dreaming about it :S
14:24 rodt i spose its one of the most fun things to write ? :O
14:35 rodt thing is as with most langs you have this tiny core, and massive standard library, with just the tiny core even the simplest of things would be pages of fairly obscure code
14:36 Hypftier Perl? C#? :P
14:37 rodt yeah im not sure how the core/rest works with them, but the cores seem pretty big
14:37 rodt you can do alot without libs
14:38 rodt like lisp, but most of that is written in a lot less etc...
14:38 rodt either way it seems ive tons of work to do - standard lib -- before i could tackle much rosettacode in a reasonable way
14:39 Hypftier Just wanted to point out that there are indeed languages that provide lots of complexity in the compiler/interpreter alone, instead of having a tiny core :)
14:40 rodt yeah i added a random generator and synchroniser yesterday
14:40 rodt i want a tiny core tho obviously - its considered more mathematically pleasing or something
14:41 rodt the more i add to teh compiler/interpreter - the harder to write them, and less nice the language seems lol
14:42 rodt im into having a massive set of standard libs, like java
14:42 rodt gunna take ages tho :S
14:43 rodt part of the main idea / philo behind the language is to squash users and programmers up ;)
14:44 rodt with a decent stdlib, there is no code
14:44 rodt cause mine is graphical :D
14:45 rodt so code for a irc client is just the irc client
14:45 rodt you can pull it apart :D
14:45 rodt and look in stuff to see how its made
14:47 rodt ok i spose you hide the engineering lines is all
14:47 rodt i sort of got rid of that as well tho
14:47 rodt user and programmer = no difference
14:52 rodt so yeah at the lvl of say a basic irc client , and with a good base of stdlibs, there would be no need to look inside : standard.lib.button(connect) :D
14:53 rodt so most of the rosettacode would just look like someone jokingly took a screenshot of a running app lol
15:03 rodt trying to think how long its gunna take
15:03 rodt took 4 days to write the textbox... just XXXXXXX many things to go ?
15:11 rodt lolz
15:13 rodt we are experienced coders having developed the various concepts to easily translate algorithms in our heads into linear sequential text
15:15 rodt i am getting a flow for it tho now ;)
15:15 rodt my brain must have half a translator for real* algorithms* now
15:16 rodt ... as in the real world implements algorithms like that
15:25 kpreid left #rosettacode
15:37 mwn3d_phone left #rosettacode
15:59 TimToady Where do people get this notion that forcing humans to use languages is cruel and unusual?
16:09 mwn3d_phone joined #rosettacode
16:12 rodt cause of teh word force i suspect :P
16:18 rodt its just representation anyway... i mean i dont care how famous a creative writer you are, id prefer you to show me the picture, than describe... if you have a pic that is... etc...
16:19 rodt or just let me see what your eyes saw
16:19 kpreid joined #rosettacode
16:21 rodt normally language is only used to do with a bad situation - its all we got
16:22 rodt to make do with a bad situation*
16:23 rodt we cant transfer experiences, but we can try to resonate transfer using a sequence of sounds
16:23 rodt basically you dont learn from books...
16:24 rodt they just resonate into your the kinds of thigns you would need to do learn
16:24 rodt to do to learn* ...
16:49 kpreid left #rosettacode
16:50 kpreid joined #rosettacode
16:55 MigoMipo joined #rosettacode
17:24 TimToady I still think you're view is way out of balance.  Pictures have great difficulty representing generic abstractions.  For languages this is really easy.  Every common noun is a generic.
17:24 TimToady *your
17:24 mikemol oi
17:24 TimToady howdy
17:24 mikemol lo
17:24 mikemol A/C at work is out. A/C at home has a hard time coping.
17:27 TimToady supposed to go up to 33 here today, but our home AC is actually working, for the first time in ten years
17:27 mikemol We've got a 'window unit' in the wall in the living room. Computer room is at the other end of the apartment, down a hall.
17:28 mikemol I guess I didn't think that one through.
17:28 TimToady sometimes a well-placed fan can keep the heat-pipe running :)
17:29 TimToady but you have to work with the natural convection, not against it
17:30 * mikemol nods
17:32 mikemol Before it got this incredibly hot, it'd be enough to not use the A/C. I'd open a window at either end of the apartment, set a couple box fans to circulate air in on the cool side, out on the warm side.
17:33 TimToady rodt: a picture can be worth a thousand words, but only if it's the right picture.  Most pictures are crap, and your brain throws away 99% of the pictures it's even interested in.
17:33 TimToady pictures don't communicate well either when my brain throws away a different 99% than yours does
17:36 TimToady well chosen language will do most of the filtering for the listener, and build the desired picture in the listeners head more efficiently and more accurately, if you're a good writer
17:36 TimToady or speaker
17:37 TimToady if you're trying to communicate ideas rather than bitmaps, language wins
17:38 Hypftier communicating data or facts can be done quite well in images, though. Especially when I look at what some people come up with visualizing multi-dimensional data.
17:39 TimToady sure, I'm not meaning to oversimplify in the other direction
17:40 mikemol TimToady: 'bokeh' might be a useful concept when considering images and information.
17:41 mikemol With bokeh, you get people to pay attention to what's important.
17:41 mikemol Hm. I can probably draw good and bad examples from my flickr set.
17:42 TimToady yes, that's a bit like good writing
17:44 TimToady much like the concept of inverted syntax draws your attention to one part of the program or another
17:44 TimToady shoving things off to the right, or down later, is a form of defocusing
17:45 mikemol Good: http://www.flickr.com/photos/28208534@N07/5949545252/
17:45 fedaykin "Dandelion seed head | Flickr - Photo Sharing!" http://rldn.net/iDcu
17:45 TimToady and certainly the visual metaphors of programming have always been important in the design of Perl
17:45 mikemol bad: http://www.flickr.com/photos/28208534@N07/5875326637/
17:45 fedaykin "JAFAX 16 - Cosplay - Dark Link | Flickr - Photo Sharing!" http://rldn.net/sX
17:45 mikemol Oh god it's horrible: http://www.flickr.com/photos/28208534@N07/4700803466/
17:45 fedaykin "012_pregamma_1_fattal_alpha_0.1_beta_0.8_saturation_1_noiseredux_0 | Flickr - Photo Sharing!" http://rldn.net/4bD
17:46 mikemol Hm. "well chosen language will do most of the filtering for the listener" ... Now I understand why people say C++ has terrible syntax.
17:47 TimToady I actually like the last one :)
17:48 TimToady but I like it *for* its artifacts, not in spite of them
17:49 TimToady and the second one is as much a problem of composition as it is a problem of insufficient bokeh
17:50 * mikemol nods
17:50 TimToady but you don't always get a choice when shooting quick photos
17:50 mikemol The *because* its artifacts was why I originally liked the photo. Were I to take it again, though, I'd drop the DoF through the floor and get the sky as a backdrop.
17:50 TimToady most of what "good photographers" do is just not show you their bad pix
17:51 mikemol IIRC, though, I had to compose to just miss the bushes between me and the statue. Not sure I could get the angle I'd need.
17:56 rodt theres prolly somekind of variable good balance depending on domain... i like narrators :D
17:58 rodt but yeah somethings seem more suited to one or the other mainly... luckily they arent mutually exclusive
19:03 mwn3d_phone left #rosettacode
21:41 BenBE Only a few new files since my first snapshot. Seems to stabilize at roughly 14 MB.
21:42 BenBE Now for the hard part: Find a way to detect the various languages.
21:43 BenBE (Only given this set of code samples AND the information from GeSHi language files).
21:43 BenBE Who else is on to the challenge?
21:46 Hypftier but didn't you know the languages when you formatted the snippets, already?
21:47 BenBE Yes. That's what I used when doing the code snippet dump from RC.
21:47 Hypftier Oh, wait, so you just have a bunch of code snippets, not formatted code snippets?
21:48 BenBE The question I want to solve is: Given an /unknown/ piece of source identify the programming language it most likely was written in.
21:48 BenBE yes. I only have a set of ~30k textfiles sorted into ~100 folders marked with a language name. That's my "training data"
21:49 Hypftier hm, there was a question on SO on that a long time ago.
21:49 BenBE SO?
21:49 Hypftier Stack Overflow
21:49 Hypftier there are some programs that already do that with varying degrees of success. Ohcount does a pretty good job (the one that Ohloh is using for statistics), Prettify does a very lousy job.
21:49 BenBE Ah, IDK if I read it, but yeah, I seem to recall something like there was ...
21:50 Hypftier Well, I answered that one back then, but there was no really useful answer, I think.
21:50 BenBE I stumbled across Ohcount too. GeSHi's on Ohloh too ;-)
21:51 Hypftier I think the most reliable way would be to actually have grammars for all the languages and try parsing the snippets. Wherever it fails it cannot be that language (or the grammar is wrong) ;)
21:51 Hypftier Not very fast approach, or easy to implement, though :)
21:51 BenBE Well, not all snippets are fully compilable programs all the time.
21:51 Hypftier From geshi's perspective ... trying to figure out keywords, quotes and other literals
21:52 BenBE Yes. Basically that's what I'm going to try.
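The keyword idea discussed above could be sketched roughly as follows. This is only an illustration: the keyword sets are hypothetical and heavily truncated, they are not taken from the GeSHi language files, and the helper names keyword_scores and guess_by_keywords are made up for this sketch.

```python
import re

# Hypothetical, heavily truncated keyword sets; real GeSHi language files
# list far more keywords and are not reproduced here.
KEYWORDS = {
    "c": {"int", "char", "void", "return", "struct", "include", "printf"},
    "python": {"def", "import", "return", "elif", "lambda", "print", "None"},
    "pascal": {"begin", "end", "procedure", "var", "program", "writeln"},
}

def keyword_scores(snippet):
    """Return {language: fraction of word tokens that are keywords}."""
    tokens = re.findall(r"[A-Za-z_]\w*", snippet)
    if not tokens:
        return {lang: 0.0 for lang in KEYWORDS}
    return {
        lang: sum(tok in words for tok in tokens) / len(tokens)
        for lang, words in KEYWORDS.items()
    }

def guess_by_keywords(snippet):
    """Pick the language whose keywords cover the largest share of tokens."""
    scores = keyword_scores(snippet)
    return max(scores, key=scores.get)
```

Note that this naive version also counts words inside strings and comments, so it would need quote and comment handling before it could be trusted on real snippets.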
21:52 Hypftier Yeah, that's a major problem; although you could probably adapt the grammars so that "likely" snippets are possible, such as functions or lines
21:52 Hypftier *nods* it might work better with geshi than with prettify, since geshi has a much larger library of languages already.
21:52 BenBE If you like: the compressed data (tar.gz) is roughly 6.1 MiB; and because of GFDL 1.2 I have to publish it anyways.
21:53 Hypftier And hopefully doesn't try classifying VB code as C which leads to everything since the first comment being formatted as a string ;)
21:53 * Hypftier has to write a thesis currently, so I probably won't have time :/
21:54 BenBE Just change your thesis to this new problem then :P
21:54 Hypftier only two months left ... this could be a bit challenging. And I don't have that much of a clue of languages ;). I took HCI and UX, not theoretical CS :)
21:55 BenBE HCI? UX?
21:55 Hypftier Human-computer interaction and user experience. Rare topics for CS students to even touch, I guess :)
21:56 rodt computers shouldnt exist :P
21:57 BenBE Ah, well, I know some handicapped people that use computers so interface design is always a concern.
21:58 BenBE Usually if there's a new release with a /new/ design that interface designers built ... the program /IS/ unusable ;-)
21:58 rodt i tried to think a lot about hci and ux with my lang design
21:58 rodt yeah lols
21:59 Hypftier I think it's a sad state with many open-source tools that developers, not UX professionals built the UI :P
21:59 BenBE rodt Unfortunately most programmers might mistake your language for either Piet or the latest draft standard in doing UML diagrams.
21:59 rodt problem - need code to make ui / users cant code
22:00 rodt solution = my lang ;)
22:00 rodt yeah at the moment it looks like that
22:00 rodt its ui extendable
22:01 BenBE I'll stop with the jokes here or it would get unfair ;-)
22:02 BenBE But actually: It's just the same with API design.
22:02 BenBE Most programmers just can't do it right.
22:02 rodt yeah so force component design / input / output black boxes
22:04 rodt a lot of programs with concurrency - so force a safe scalable concurrent system
22:04 rodt my lang solves all sorts of problems it seems
22:05 rodt programs/problems lol
22:06 BenBE Well, let's head back to the language detection problem we started out with.
22:06 rodt i think Hypftier's grammar checker sounds best ;)
22:06 Hypftier That requires things we don't have ;)
22:07 rodt time ?
22:07 BenBE What I already tried was an approach using Markov chains to test probabilities of how likely it is that in a given code language some char is followed by another.
22:07 Hypftier Hm, that could trip over strings and identifiers
22:08 Hypftier but might work for very keyword-heavy languages
22:08 BenBE This surprisingly works for a small number of languages to be detected, but horribly fails if a) the languages are similar or b) you have too many languages to detect.
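For readers following along, a minimal sketch of the character-level Markov chain idea might look like this. It is not BenBE's code: the add-one smoothing, the assumed alphabet size of 256 and the function names are all choices made for this illustration.

```python
import math
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each character is followed by each other character."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def log_likelihood(model, text, alphabet_size=256):
    """Average log-probability of text under the model, add-one smoothed.

    alphabet_size is an assumption (one slot per byte value); it only
    controls how strongly unseen character pairs are penalised.
    """
    total = 0.0
    pairs = 0
    for a, b in zip(text, text[1:]):
        row = model.get(a, Counter())
        total += math.log((row[b] + 1) / (sum(row.values()) + alphabet_size))
        pairs += 1
    return total / pairs if pairs else float("-inf")

def guess_by_markov(models, snippet):
    """models maps a language name to the result of train_bigrams()."""
    return max(models, key=lambda lang: log_likelihood(models[lang], snippet))
```

Each language gets one model trained on its folder of samples, and an unknown snippet is assigned to the model under which it is most probable. Strings, comments and identifiers are not filtered, so they influence the statistics as much as the syntax does.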
22:09 BenBE Hypftier Surprisingly: Algol68 would be a candidate with a good detection then? But it isn't ;-)
22:11 rodt what about comments ?
22:11 rodt i dont think you'll do too well with stats tests on something like this
22:12 BenBE Not filtered in that approach, because to filter comments you would have to check for some known patterns to detect comments beforehand.
22:12 Hypftier BenBE: I thought Cobol as well ;)
22:12 BenBE I wanted to keep it as general as possible.
22:12 BenBE Have a look here: http://benbe.home.omorphia.de:43815/geshi-misc/langdetect/detect.php
22:12 sorear BenBE: about ten years ago someone did something like that for human languages
22:13 BenBE That's a live detection run on a snapshot from Tue only detecting languages abap through c
22:13 Hypftier One could build several classifiers, each with a different approach and either rank them, based on how well they perform, or constrain them to certain languages or pick the language most classifiers agree with
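The last option Hypftier mentions, picking the language most classifiers agree with, can be sketched as a plain majority vote. The three toy classifiers below are hypothetical stand-ins invented for this example; real ones would be something like a keyword matcher or a Markov model.

```python
from collections import Counter

def majority_vote(classifiers, snippet):
    """Ask every classifier for a language and return the most common answer.

    Ties go to the answer that was seen first, so the list order acts as a
    crude ranking of the classifiers.
    """
    votes = Counter(classify(snippet) for classify in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical toy classifiers, far too crude for real use.
def by_braces(snippet):
    return "c" if "{" in snippet else "python"

def by_def(snippet):
    return "python" if "def " in snippet else "c"

def by_semicolon(snippet):
    return "c" if ";" in snippet else "python"
```

Ranking classifiers by measured accuracy, or restricting each one to the languages it handles well, would replace the simple count with weighted votes.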
22:13 sorear their system was very simple
22:13 Hypftier digraphs and trigraphs, I guess
22:13 Hypftier + scripts
22:13 sorear take a known file K and an unknown file U
22:14 BenBE sorear Yes. and for human languages Markov chains usually give quite good results even if only typing a few characters of a text in any given language.
22:14 sorear length(gzip(K+U)) / (length(gzip(K)) + length(gzip(U)))
22:14 sorear no custom markov chains needed
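sorear's formula can be tried directly with Python's gzip module. This is a sketch under assumptions: the function names are invented here, and reading a lower ratio as "U looks more like K" is an interpretation of the formula rather than something stated in the channel.

```python
import gzip

def gz_len(data: bytes) -> int:
    """Length in bytes of the gzip-compressed data."""
    return len(gzip.compress(data))

def compression_ratio(known: bytes, unknown: bytes) -> float:
    """length(gzip(K+U)) / (length(gzip(K)) + length(gzip(U)))."""
    return gz_len(known + unknown) / (gz_len(known) + gz_len(unknown))

def guess_by_compression(corpora, snippet: bytes):
    """corpora maps a language name to a bytes sample of known code.

    The language whose sample compresses best together with the snippet
    (lowest ratio) is returned.
    """
    return min(corpora, key=lambda lang: compression_ratio(corpora[lang], snippet))
```

The known samples should be of comparable size, since the fixed gzip header and the size of K both shift the ratio. gzip also only looks back 32 KiB, so very large samples add nothing beyond their last window.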
22:14 Hypftier so, solely based on information density
22:15 BenBE That's basically a result from the distributions you get from the markov chains.
22:15 rodt you need a grammar checker of some sort basically...
22:15 sorear not information density
22:15 Hypftier entropy, sorry
22:15 sorear not that either
22:15 sorear gzip is adaptive
22:15 rodt but it wont work for computer languages
22:15 Hypftier ah, right
22:16 sorear it can compress data with uniform statistics better than data with strong block boundaries
22:16 Hypftier Yes, the problem with source code is the unpredictable things like identifiers and literals :)
22:16 sorear even if the entropy is the same across the gap
22:16 BenBE Actually most code should compress better than natural languages.
22:16 BenBE Especially with the identifiers.
22:17 BenBE Since they repeat so often across the code you have a lot of backreferences you can utilize for compression.
22:18 Hypftier Hm, I remember the PNG+HTML+JS+WebGL hack ...
22:18 BenBE ?
22:18 Hypftier the resulting size was a very delicate thing where even small source changes could blow it up again
22:19 BenBE Any more details?
22:19 Hypftier http://demoseen.com/windowpane/magister.html
22:19 Hypftier that one :)
22:19 Hypftier should work with Chrome and FF5
22:20 Hypftier That's basically a PNG that parses as HTML as well and contains the remaining source code in the image (which is compressed)
22:22 fedaykin http://rldn.net/EDR
22:24 rodt lol nice :)
22:24 BenBE FF4 (Palemoon on Windows) does too ...
22:24 BenBE (Even if it's kinda slow)
22:24 Hypftier those were two evil evenings of collaborative golfing. Not my code, though I think 30 bytes were my doing
22:24 Hypftier (30 bytes shortening, that is)
22:25 BenBE hehe ^^
22:25 rodt :D
22:26 BenBE Well, that demo gives a fascinating 4-8 fps in Full-HD on no HW accel ;-)
22:26 Hypftier it's horribly slow on my other machine with an Intel GMA 4500 MHD ... and doesn't work at all here on an GMA 950
22:33 kpreid left #rosettacode
22:34 MigoMipo left #rosettacode
22:45 mwn3d_phone joined #rosettacode
23:27 kpreid joined #rosettacode
23:45 kpreid left #rosettacode
23:45 kpreid joined #rosettacode
