
IRC log for #bioruby, 2012-04-27


All times shown according to UTC.

Time Nick Message
01:00 shevy joined #bioruby
01:14 wrk__ joined #bioruby
01:16 pjotrp joined #bioruby
03:05 wrk__ joined #bioruby
06:04 marjan joined #bioruby
06:59 pjotrp hi marjan
06:59 marjan hi
07:00 pjotrp ASN.1 was actually used for some microarray data formats
07:00 pjotrp so, not everyone does 'not invented here'
07:00 marjan :)
07:00 pjotrp ;)
07:01 marjan I'm not a big fan of binary formats :)
07:01 pjotrp me neither
07:01 pjotrp and I am no fan of SQL databases
07:01 marjan But this standard is heavily used in telecommunications software, so there should be a fast parser for that standard available.
07:01 pjotrp yes, libraries exist
07:02 marjan But then again, not very interesting for us at this point :)
07:02 pjotrp quite right, that would have been interesting
07:03 pjotrp I am also no fan of XML
07:03 pjotrp and I am no fan of pure OOP languages
07:03 marjan pjotrp, while experimenting with D, did you use multiple threads?
07:03 pjotrp that kinda defines my environment ;)
07:03 pjotrp yes we experimented with actors, but not bound against Ruby
07:04 pjotrp why?
07:04 pjotrp Artem also experimented
07:04 marjan I'm discussing with Artem what kind of signaling between threads we should use.
07:05 pjotrp use actors and immutable data structures
07:05 pjotrp it is maybe not the most light-weight
07:05 pjotrp but far easiest to get right
07:05 marjan And what did you use for stalling the actors that were working too fast?
07:06 pjotrp uhm. That may be a problem - you could idle them until the main thread gets ready
07:06 marjan did the actors ask for another job or did you use inboxes with limited size and blocking or something else?
07:07 marjan Or did your programs use a lot of memory? :)
07:07 pjotrp I think the main thread has to distribute work
07:07 pjotrp perhaps limiting mail box size is the best option
07:07 marjan (for the big amounts of data in the inboxes)
07:07 pjotrp and have many more actors than CPUs
07:08 pjotrp the main thread can just wait until a mail box has space
07:08 pjotrp but maybe the actors are too fine-grained in this case
07:08 marjan well, I am thinking about that, because that way the actors would simply receive a chunk, process it, and send it. D would take care of the rest.
07:08 pjotrp there are also overheads in handling messages
07:09 pjotrp yeah
07:09 pjotrp Let's assume that the chunks are large enough to keep the mail box empty
07:10 pjotrp with GFF3 we can take an arbitrary number of blocks at a time
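A minimal Ruby sketch of the scheme pjotrp describes: the main thread does the IO and distributes chunks of GFF3 lines through a mailbox of limited size, and worker "actors" parse each chunk and send the result back. Ruby's `SizedQueue` stands in for a bounded actor mailbox (in D, the comparable setting would be `std.concurrency`'s `setMaxMailboxSize` with `OnCrowding.block`). `CHUNK_SIZE`, `WORKERS`, `parse_chunk` and `parallel_parse` are hypothetical names chosen for illustration.

```ruby
require "stringio"

# Hypothetical tuning knobs, not values from the discussion.
CHUNK_SIZE = 2   # lines per message ("blocks" of GFF3 lines)
WORKERS    = 4   # workers ("actors"); more workers than CPUs is fine here

# Parse one chunk: drop comment/directive lines, split the rest into columns.
def parse_chunk(lines)
  lines.reject { |l| l.start_with?("#") }.map { |l| l.chomp.split("\t") }
end

def parallel_parse(io)
  jobs    = SizedQueue.new(2)   # bounded "mailbox": push blocks while full
  results = Queue.new           # workers send parsed chunks back here

  workers = Array.new(WORKERS) do
    Thread.new do
      # Each worker takes chunks until it receives the :done stop message.
      until (chunk = jobs.pop) == :done
        results << parse_chunk(chunk)
      end
    end
  end

  # The main thread does the IO and distributes work. When the mailbox is
  # full it simply waits, so reading cannot run far ahead of the workers.
  io.each_line.each_slice(CHUNK_SIZE) { |chunk| jobs << chunk }
  WORKERS.times { jobs << :done }   # one stop message per worker
  workers.each(&:join)

  # Gather the replies; chunk order is not preserved across workers.
  records = []
  records.concat(results.pop) until results.empty?
  records
end
```

Under MRI's global VM lock these threads interleave rather than parse in parallel, so the sketch shows the control flow (bounded mailbox, main thread waiting for space, replies gathered at the end), not a speedup.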
07:10 marjan I still haven't investigated how to stop the threads if the user wants to stop parsing midway, but I don't expect problems there.
07:10 pjotrp actors can handle that, it is in the book
07:11 marjan Out-of-band messages, I know, but I couldn't find in the book whether they work with the blocking action when the inbox is full.
07:12 marjan Anyway, I just wanted to ask whether you had run into this kind of problem.
07:12 pjotrp OK, time for an experiment ;)
07:13 pjotrp be good to test for edge cases anyway
07:13 marjan definitely :)
07:13 pjotrp sounds like a cucumber feature
07:13 pjotrp when I load the mailbox so it is full
07:14 pjotrp and I kill the actor
07:14 pjotrp it should terminate gracefully
07:14 pjotrp there you are...
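The edge case pjotrp phrases as a cucumber feature ("load the mailbox so it is full, kill the actor, it should terminate gracefully") can be sketched in plain Ruby, assuming Ruby 2.3 or later, where `SizedQueue#close` wakes any thread blocked in `push` by raising `ClosedQueueError`. This is an illustration of the scenario, not a cucumber step definition, and `full_mailbox_shutdown` is a hypothetical name.

```ruby
# Scenario: "when I load the mailbox so it is full, and I kill the actor,
# it should terminate gracefully".
def full_mailbox_shutdown
  mailbox = SizedQueue.new(1)
  mailbox << :chunk0                # the mailbox is now full

  producer = Thread.new do
    begin
      mailbox << :chunk1            # blocks: there is no free slot
      :delivered
    rescue ClosedQueueError
      :stopped_gracefully           # woken by close instead of hanging
    end
  end

  actor = Thread.new { sleep }      # stands in for the consuming actor
  actor.kill                        # "and I kill the actor"
  actor.join

  # Wait until the producer is really blocked in push, then close the
  # mailbox so the blocked push raises ClosedQueueError.
  sleep 0.01 until mailbox.num_waiting > 0
  mailbox.close
  producer.value                    # joins the producer and returns its result
end
```

Without the `close`, the producer would stay blocked forever once the only consumer is gone, which is exactly the hang the scenario is meant to catch.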
07:14 marjan Well, the inboxes would be empty only if everything, including the Ruby code, works faster than reading the file from disk.
07:14 marjan So I don't think this is an edge case.
07:14 pjotrp it is the other way round
07:15 marjan But that is, terminating gracefully :)
07:15 pjotrp the inboxes will be empty when the actor is fast enough
07:15 pjotrp the inbox sits with the actor
07:16 marjan There will be an inbox at the end of the process too.
07:16 pjotrp if you have access to Joe Armstrong's Erlang book, it may be worth a read
07:16 pjotrp to notify actor completion?
07:17 marjan No, to control throughput.
07:18 pjotrp That is not clear to me - you send a message to a thread to do a job, when the actor is ready it sends a message back (ready!). That allows you to gather results. What is there about throughput?
07:18 marjan The main thread would take data from this last inbox when it needs it; then the inbox would have a vacant slot, and the other threads would start working again.
07:19 marjan I'm thinking more in the way that there is no explicit backward signaling.
07:19 marjan And no control thread.
07:19 pjotrp hmm. That is not standard practice
07:20 pjotrp you need to keep it simple
07:20 marjan Well, it is.
07:20 pjotrp explain it to me: I am standing in a bar with people. What happens?
07:21 pjotrp The people are actors
07:21 pjotrp what do we communicate?
07:21 marjan At the table between every person is a box, with limited size :)
07:21 pjotrp skip the box
07:22 marjan The first reads the file, and puts the data in the first box.
07:22 marjan When there is no more place in the box, he's idling, waiting for the next person to take the next piece.
07:23 marjan The next person processes the data, and puts it in the next box.
07:23 pjotrp you mean chaining tasks?
07:23 marjan Same here, if the box is full, he has to wait.
07:23 marjan Yeah
07:23 pjotrp they are different jobs?
07:23 marjan yes.
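marjan's table analogy describes a chained pipeline: each person is a stage thread, and between every two stages sits a box of limited size, so a stage idles whenever the next box is full and no explicit backward signalling is needed. A minimal Ruby sketch, assuming `SizedQueue` for the boxes and a sentinel object to end the stream; the stage names (`reader`, `splitter`, `typer`) are hypothetical.

```ruby
# End-of-stream marker handed from person to person down the chain.
STOP = Object.new

# One "person at the table": take from the box on the left, work, and put the
# result in the box on the right. Blocks whenever the right-hand box is full.
def stage(input, output, &work)
  Thread.new do
    loop do
      item = input.pop
      if item.equal?(STOP)
        output << STOP              # pass the marker on, then stop
        break
      end
      output << work.call(item)
    end
  end
end

def run_pipeline(lines, box_size: 2)
  box1 = SizedQueue.new(box_size)   # reader   -> splitter
  box2 = SizedQueue.new(box_size)   # splitter -> typer
  box3 = SizedQueue.new(box_size)   # typer    -> consumer (main thread)

  reader = Thread.new do
    lines.each { |line| box1 << line }   # idles while box1 is full
    box1 << STOP
  end
  splitter = stage(box1, box2) { |line| line.chomp.split("\t") }
  typer    = stage(box2, box3) { |cols| cols[2] }   # keep the feature type

  # The main thread takes from the last box only when it needs data; each
  # take frees a slot, which lets the upstream stages continue.
  types = []
  until (t = box3.pop).equal?(STOP)
    types << t
  end
  [reader, splitter, typer].each(&:join)
  types
end
```

pjotrp's objection is visible in the sketch: every extra stage adds a thread, a queue and a stop protocol that all have to be right for the pipeline to drain cleanly.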
07:24 pjotrp OK. It is not impossible, but it will quickly get complicated
07:24 pjotrp The main file is linear, and IO bound, right?
07:24 marjan yes
07:24 pjotrp So, the file reading thread will set the pace, right?
07:25 marjan Depends how fast it gets :)
07:25 marjan Current implementations are much slower than the IO bound.
07:25 pjotrp absolutely, but let's stick to this reasoning
07:26 pjotrp the CPUs will take on load for parsing, right?
07:26 marjan Like I said, I expect the inboxes to control the threads.
07:26 marjan yes
07:26 pjotrp based on this, I would say the main thread does the IO
07:26 marjan no.
07:26 pjotrp you only need an actor to parse a block
07:26 marjan why should the main thread wait for IO?
07:27 pjotrp it is the limiting factor
07:27 marjan why not use a separate thread, so that the main thread can do what it needs to do?
07:27 pjotrp don't go there
07:27 pjotrp ;)
07:27 pjotrp you are talking parallel Ruby
07:27 marjan I want to offload the main thread as much as possible, because that's where Ruby is running :)
07:27 pjotrp that is crazy and dangerous
07:27 pjotrp :)
07:28 pjotrp When Ruby is ready for parallelism, it will create a thread for parsing
07:28 pjotrp you don't need to do that
07:29 pjotrp KISS
07:29 pjotrp I think we gain enough by parallelizing the parsing only
07:29 marjan Well, my approach looks very simple in my mind ;)
07:29 pjotrp I know, we are not all geniuses
07:30 pjotrp one thing to consider is that your library is a building block
07:30 pjotrp for others to build on
07:30 marjan Let me try an experiment :)
07:30 pjotrp if it is simple, it will be easy on others
07:30 pjotrp sure
07:31 marjan ok, I will try, and will show you the results.
07:31 pjotrp but if you try to return to Ruby and keep your library running as a background process, I will send you to an asylum
07:31 pjotrp for sure ;)
07:31 pjotrp it will also complicate bindings to other languages
07:31 marjan thread, not process :)
07:32 pjotrp in a manner of speaking, but yes
07:32 marjan well, that was my idea actually.
07:32 pjotrp I got it
07:33 marjan Let me try it. If it gets complicated, you will send me your example code and I will use your approach, ok?
07:33 pjotrp it is the same as in the book
07:34 pjotrp I am worried about complexity, and about the nature of Ruby's interpreter
07:34 marjan there is also the file copying example, with two threads.
07:34 pjotrp yes, with a very small gain
07:34 marjan I was thinking more along those lines.
07:34 pjotrp I am more in line with Knuth
07:35 pjotrp Premature optimization is the root of all evil
07:36 pjotrp talk to him ;)
07:36 pjotrp but yes, please experiment
07:36 pjotrp but you'll need to do it against Ruby
07:36 marjan sure
07:36 pjotrp to convince me
07:36 marjan ok :)
07:37 marjan I will do the work for the first week of my proposal: gff_file.lines.each {}
07:37 marjan Would that be ok?
07:40 pjotrp sure
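A minimal sketch of what the single-threaded first-week baseline, `gff_file.lines.each {}`, could look like once each line is turned into a record. It uses `IO#each_line` (the idiomatic spelling; `IO#lines` has been deprecated since Ruby 2.0) and follows the GFF3 layout of nine tab-separated columns with `tag=value` pairs in column 9 and a `##FASTA` directive after which only sequence data follows. `GffRecord`, `each_gff_record` and the skip-malformed-lines policy are assumptions for illustration.

```ruby
require "stringio"

# One GFF3 feature line has nine tab-separated columns. The member for the
# fifth column is called `stop` here to avoid clashing with the keyword `end`.
GffRecord = Struct.new(:seqid, :source, :type, :start, :stop, :score,
                       :strand, :phase, :attributes)

# Yield one GffRecord per feature line; returns an Enumerator without a block.
def each_gff_record(io)
  return enum_for(:each_gff_record, io) unless block_given?

  io.each_line do |raw|
    line = raw.chomp
    break if line.start_with?("##FASTA")           # sequences follow; stop here
    next if line.empty? || line.start_with?("#")   # comments and directives
    cols = line.split("\t", -1)
    next unless cols.size == 9                     # assumption: skip malformed lines

    # Column 9 holds tag=value pairs separated by ';' (values left URL-escaped).
    attrs = cols[8].split(";").each_with_object({}) do |pair, h|
      key, value = pair.split("=", 2)
      h[key] = value
    end
    yield GffRecord.new(cols[0], cols[1], cols[2], cols[3].to_i, cols[4].to_i,
                        cols[5], cols[6], cols[7], attrs)
  end
end
```

A sequential version like this is also the yardstick pjotrp asks for: any threaded variant has to beat it when measured from Ruby.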
07:41 fstrozzi joined #bioruby
07:42 fstrozzi left #bioruby
07:42 fstrozzi joined #bioruby
09:26 ilpuccio joined #bioruby
09:26 ilpuccio hi guys
09:29 DannyArends hey there
09:32 matteop joined #bioruby
09:33 fstrozzi hello
09:39 Helius did you read http://blog.railsware.com/2012/03/13/ruby-2-0-enumerablelazy/ and http://patshaughnessy.net/2012/3/23/why-you-should-be-excited-about-garbage-collection-in-ruby-2-0 ? Pjotr posted them on the ML
09:49 fstrozzi I haven't seen any post on the ML on this, looks interesting
10:12 matteop left #bioruby
10:22 Helius maybe it was private; anyway, they are good posts
10:30 valeria joined #bioruby
12:20 marjan_ joined #bioruby
12:22 bioruby0 joined #bioruby
12:48 wrk__ i like the lazy yiels
12:48 wrk__ yields
12:49 wrk__ joined #bioruby
12:50 Helius :)
12:50 Helius pjotrp: hello
12:51 pjotrparrot joined #bioruby
12:52 pjotrparrot makes for concise code - and fast parsers
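The "lazy yields" refer to `Enumerable#lazy`, new in Ruby 2.0 and the subject of the railsware post linked above. A minimal sketch of why it suits parsers: the chain below reads, filters and splits lines one at a time and stops reading as soon as enough matches are found. `first_features_of_type` is a hypothetical helper, not an existing BioRuby method.

```ruby
require "stringio"

# Return the first n features of the given type. Because the chain is lazy,
# lines are read, filtered and split one at a time, and reading stops as soon
# as n matches have been found.
def first_features_of_type(io, type, n)
  io.each_line
    .lazy
    .reject { |line| line.start_with?("#") }
    .map    { |line| line.chomp.split("\t") }
    .select { |cols| cols[2] == type }
    .first(n)
end
```

With an eager chain (no `.lazy`), every line of the file would be split before `select` and `first` ever ran; here the IO position stays right after the first match.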
13:39 fstrozzi I'm about to release a new version of bio-faster for FastQ parsing. The new tests I've done confirm that the read_array_from_int method in FFI has better performance.
14:31 Helius fstrozzi: good
14:47 fstrozzi done :-)
15:27 pjotrparrot http://www.slideshare.net/tenderlove/hidden-gems-of-ruby-19
15:27 pjotrparrot I had to laugh
15:31 pjotrparrot but looking at the fiddle module for bindings
15:32 pjotrparrot very dynamic
15:37 pjotrparrot joined #bioruby
16:00 pjotrp_t42 joined #bioruby
16:44 shevy2 joined #bioruby
