IRC log for #pdl, 2013-09-23

All times shown according to UTC.

Time Nick Message
13:51 Su-Shee joined #pdl
13:52 Su-Shee hi. ;)
14:03 Mithaldu heya
14:06 sivoais activity! good localtime()
14:11 Su-Shee I just watched david mertens' talk from 2012 about PDL.
14:12 Su-Shee in an attempt to whip something up in PDL to get to a comparison to R and python's scipy/numpy/pandas..
14:16 sivoais it would be nice to have data frames in Perl
14:16 Mithaldu numpy and pdl are pretty much the same thing
14:16 Su-Shee sivoais: oh. it doesn't have something equivalent?
14:18 sivoais Su-Shee: no, you'd probably have to do something like a hash and keep a PDL in each
14:18 sivoais I would write one but I lack the tuits :-/
14:18 Su-Shee Mithaldu: ok, I'm mostly interested in the part beyond that. both R and Pandas allow me to not care for the structure behind it, I can just create one and apply functions on it very easily.
14:19 Mithaldu i've no idea what you mean with that
14:20 Su-Shee Mithaldu: R and Pandas have data frames and something which in the end translates into arrays and multidimensional arrays. I just load my data into one of them and then apply whatever function I'd like to perform on the entire thing, on slices of it or rows of it.
14:20 sivoais Mithaldu: the method Su-Shee is talking about lets you deal with tabular data of different types easily
14:21 Su-Shee among other things, exactly.
14:21 Mithaldu "something which in the end translates into arrays and multidimensional arrays" <- that's pdl and numpy afaik
14:21 Mithaldu "I just load my data into one of them and then apply whatever function I'd like to perform on the entire thing, on slices of it or rows of it." <- that too
14:22 sivoais yeah, PDL gives the tools to do something like that, but the interface is not quite the same
14:22 Su-Shee sivoais: in what way?
14:22 Su-Shee sivoais: I have to admit that R is _extremely_ convenient from a user's point of view.
14:23 sivoais Su-Shee: PDL's multidimensional arrays are of a single type for one. And labelling each column isn't builtin
14:24 sivoais Yeah, that's R's strength
14:24 Su-Shee sivoais: oh, I see.
14:24 Su-Shee R is horrible from a developer's point of view, but the level of whipuptitude and productivity is amazing.
14:24 sivoais So if I were designing data frames for Perl, I'd wrap around PDL
14:25 sivoais In fact, with some of the work on getting PDL Moo-ifiable, it might be possible to get data frames by subclassing PDL
14:25 Su-Shee sivoais: how data-frame-ish could I treat a multidimensional array if I put only one type in it? after the talk, I at least have slices and ranges and such, but you're saying I can't grab a column by name?
14:25 sivoais no, just by index
14:26 Su-Shee hm, ok.
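
[Note on the exchange above: sivoais suggests approximating a data frame with a hash that keeps one single-typed array per column. The sketch below illustrates that idea only. It is written in plain Python rather than PDL, with a dict standing in for the Perl hash and `array.array` standing in for a single-typed PDL array. The helper names `make_frame`, `column_mean` and `rows`, and the sample values, are invented for illustration and are not part of PDL, R or pandas.]

```python
from array import array


def make_frame(**columns):
    """Build a 'frame': a dict mapping column name -> array of doubles.

    Hypothetical helper; each column is single-typed, like the arrays
    discussed in the log.
    """
    frame = {name: array("d", values) for name, values in columns.items()}
    lengths = {len(col) for col in frame.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    return frame


def column_mean(frame, name):
    """Grab a column by name (the part a bare array lacks) and average it."""
    col = frame[name]
    return sum(col) / len(col)


def rows(frame, start, stop):
    """Return a new frame holding rows start..stop-1 of every column."""
    return {name: col[start:stop] for name, col in frame.items()}


# Invented sample values, purely for demonstration.
frame = make_frame(year=[2010, 2011, 2012], moves_in=[100.0, 120.0, 140.0])
print(column_mean(frame, "moves_in"))    # 120.0
print(list(rows(frame, 1, 3)["year"]))   # [2011.0, 2012.0]
```

[Every column here holds doubles, so mixed-type tabular data (strings next to numbers) would still need a different container per column, which is the gap the participants describe.]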
14:27 Su-Shee what about the range of functions.. I've seen in the talk the usual stuff like min, max, sin, foo, bar - how much statistics and family is there and how simple is it compared to R? (I mostly don't care for the whole physics-astronomy shebang)
14:29 sivoais there's the normal stuff like avg() and you can get prob. and regression from <http://pdl-stats.sourceforge.net/>
14:29 sivoais there's another talk on that at YAPC::NA 2012
14:30 sivoais "Maggie Xiong Statistics and data mining with Perl Data Language" <http://www.youtube.com/watch?v=DFX_cNB97yQ>
14:30 Su-Shee yeah, I couldn't load it :(
14:30 Su-Shee yes that one :/
14:30 Mithaldu Su-Shee: http://search.cpan.org/~chm/PDL-2.006/Basic/Primitive/primitive.pd
14:30 sivoais hmm
14:30 Su-Shee sivoais: I'll try again later.
14:30 sivoais I can download it and share a link
14:31 Su-Shee sivoais: don't bother, I'll get to it somehow :)
14:31 sivoais OK :-)
14:31 Su-Shee I also already cpan'd App::Prima::REPL
14:33 Su-Shee sivoais: how hm. how do I say this. how "consistent" is PDL? I mean R is a horrible kludge of weirdly named "stuff" thrown together and wrapped into each other which kills me every day from a developer's point of view?
14:34 Mithaldu can you give examples of what annoys you?
14:35 sivoais Su-Shee: PDL definitely has a unique feel to it. The other array processing languages I use are R and MATLAB. PDL's slicing syntax is incredibly powerful compared to those.
14:35 Su-Shee Mithaldu: do you know R at least a little?
14:35 Mithaldu no
14:35 sivoais but PDL needs more functions
14:35 sivoais for all kinds of domains
14:36 Su-Shee sivoais: well R doesn't even have a simple shebang.. ;)
14:37 sivoais hehe
14:37 Su-Shee Mithaldu: it's difficult to explain without having used a couple of R packages.
14:37 Mithaldu aight
14:37 Su-Shee sivoais: I've recently looked for a testing module.. dear god.. horrible.
14:37 sivoais oh, but I have to say, R has *excellent* plotting
14:38 sivoais lol, yes... testing
14:38 sivoais Su-Shee: have you joined <https://groups.google.com/forum/#!forum/the-quantified-onion>?
14:38 sivoais they had an interesting testing discussion a while back
14:39 Su-Shee Mithaldu: imagine something weirdly OO-and-functional mixed and totally inconsistent naming everywhere. if one function or module uses an attribute e.g. "color", the other uses "col", the next "aes", the next "PlotSetting" and so on.
14:40 sivoais MATLAB 2013 just got a unit testing package built-in. About 30 years later. It's unlikely to be used anyway. :-P
14:40 Mithaldu Su-Shee: not sure about the names, but pdl has the oo/functional thing too in that most functions are functions AND methods
14:40 Su-Shee sivoais: no, I'm writing a 10 part series about handling open data for a german tech magazine and basically went through all the tools this year and whipped stuff up from R to Python to SQL to GNUPlot, D3...
14:41 Su-Shee Mithaldu: I was trying to say that it's completely garbled and horrible in R, sivoais seemed immediately to know what I mean :)
14:41 sivoais ooh, I would like to read that! Except I'd have to read a translation :-P
14:41 Su-Shee sivoais: the code is in english, too. just not published yet.
14:41 sivoais ah
14:41 Mithaldu Su-Shee: i'm getting the idea :)
14:42 Su-Shee sivoais: here's two examples: https://github.com/Su-Shee/open-data-berlin-moves https://github.com/Su-Shee/open-data-berlin-inhabitants
14:42 Mithaldu also, are those articles online?
14:43 Su-Shee not yet, editor is a little slow.
14:43 Mithaldu aight
14:43 Su-Shee hopefully end of the year (heise open)
14:43 Mithaldu fwiw: i'm mostly here because chm also does OpenGL.pm and i intend to do opengl stuff with pdl
14:43 Su-Shee anyways. i'm in the section "machine learning" and I was hoping to add a PDL example
14:44 sivoais cool! I tried dealing with open city data before, but every data set had its own quirks.
14:44 Su-Shee sivoais: HA. HA.
14:45 sivoais yes, I know. ;_;
14:45 Su-Shee sivoais: it's horrible. open data sucks big time. I've literally had half a dozen problems on ONE excel file. encoding. weird cell count/format, german number notation...
14:45 Mithaldu hahaha, german numbers
14:46 Su-Shee sivoais: I started to shove it all in Makefiles and just throw shell at it. works exceptionally well
14:46 Mithaldu did you yell at them?
14:46 sivoais Makefile-based ETL pipeline? :-)
14:46 Su-Shee Mithaldu: no, because it's not their fault for starters and second why on earth shouldn't GERMAN economic or city planning data be in GERMAN numbers?
14:47 Su-Shee sivoais: yeah -> https://github.com/Su-Shee/open-data-berlin-moves/blob/master/Makefile
14:47 Mithaldu it's on a computer in a data collection file meant for machine processing
14:47 Mithaldu it should be in a format that machines can easily process
14:48 Mithaldu also, german numbers suck
14:48 Su-Shee Mithaldu: yes, that's totally what someone working in some administrative department of the senat of berlin a) actually knows and b) has in mind ;)
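
[Note on the "German number notation" problem mentioned above: German-formatted figures use "." to group thousands and "," as the decimal mark, so "1.234,56" means 1234.56. A minimal sketch of normalising such a cell before numeric processing follows. It is written in Python as an illustration only; the function name `parse_german_number` is hypothetical and this is not taken from Su-Shee's Makefile pipeline.]

```python
def parse_german_number(text):
    """Convert a German-formatted number string to a float.

    Assumption: '.' is only ever a thousands separator and ',' is the
    decimal mark, e.g. '1.234,56' means 1234.56.
    """
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return float(cleaned)


print(parse_german_number("1.234,56"))  # 1234.56
print(parse_german_number("3,5"))       # 3.5
```

[The assumption matters: a value already in English notation, such as "1.5", would be misread as 15.0, so a conversion like this should only be applied to columns known to use German formatting.]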
14:49 Su-Shee sivoais: if you really like good plotting: D3.js a-ma-zing.
14:49 Mithaldu never too late to educate! :D
14:49 sivoais for machine learning, there's stuff like <https://metacpan.org/module/Algorithm::SVM>
14:49 Mithaldu anyhow, i'm just bitter because dealing with that shit in localization drives me up trees
14:49 sivoais but that's not PDL specific
14:50 Su-Shee sivoais: I don't know what I'm actually going to illustrate yet, I'm still in the reading up on the subject phase.
14:50 sivoais ah
14:50 Su-Shee sivoais: I did find a really nice weather data set as open data dating back to the 1880s I'm thinking of using to whip up some prediction :)
14:51 Su-Shee sivoais: I went through "sql and basic min/max", "geo stuff", "natural language processing" etc.
14:51 sivoais It's a large subject! I use machine learning for my image analysis research
14:51 Su-Shee sivoais: yeah, that's why I have to decide on some nice basic example usage.
14:52 Su-Shee sivoais: e.g. I have in "natural language processing/linguistics" an "automated summary" script (perl) and a "gender guessing" (python) script and a word cloud visualization in R.
14:53 Su-Shee sivoais: so I want to find nice examples more related to machine learning..
14:53 Su-Shee sivoais: three examples of combining all this stuff with three different web frameworks, two different databases..
14:54 sivoais perhaps you might want to look at this real-world Perl + NLP + machine learning code
14:54 sivoais <http://wing.comp.nus.edu.sg/parsCit/>
14:54 sivoais it's one of the better citation extraction algorithms out there
14:55 sivoais used in CiteSeer
14:55 Su-Shee _now_ you're telling me ;)
14:56 Su-Shee sivoais: but interesting to know.
14:56 Su-Shee DID I JUST SEE WSDL? oh lord..
15:11 sivoais just for good measure, <http://lush.sourceforge.net/> and <http://julialang.org/> are also worth looking into
15:17 Su-Shee sivoais: na, too exotic. this is my last article which I have to finish before my new job starts...
15:18 Su-Shee sivoais: I have a short ebook on doing data stuff with clojure which uses weka and some other classes behind the scenes..
15:18 Su-Shee sivoais: doing data in what.. perl, r, python, javascript and clojure is enough :)
15:18 sivoais hehe
15:21 Su-Shee plus sql, gnuplot, shell..
15:22 Su-Shee but it's going to be interesting, "data" is going more mainstream and there's already a couple of books for non-scientific folks.
17:00 drrho joined #pdl
17:02 drrho joined #pdl
