
IRC log for #bioperl, 2010-02-17


All times shown according to UTC.

Time Nick Message
00:43 brunov joined #bioperl
06:18 bag_ joined #bioperl
13:50 brunov joined #bioperl
15:13 brandi joined #bioperl
15:19 brunov joined #bioperl
15:23 * faceface summons pyrimidine
15:51 brandi left #bioperl
16:28 driveby_bot joined #bioperl
16:28 driveby_bot /home/svn-repositories/bioperl: r16845 (dave_messina) : Correcting failures in tests 15 & 36 due to incorrect use of Bio::Species->species. All tests in seqxml.t now pass.
16:28 driveby_bot Diff: http://tinyurl.com/y8hjblo
19:49 driveby_bot joined #bioperl
19:49 driveby_bot /home/svn-repositories/bioperl: r16846 (jason) : get the test count right
19:49 driveby_bot Diff: http://tinyurl.com/yjcywaq
19:51 driveby_bot joined #bioperl
19:51 driveby_bot /home/svn-repositories/bioperl: r16847 (jason) : rollback Florent's changes that defaulted Bio::PrimarySeq instead of Bio::Seq creation of objects
19:51 driveby_bot Diff: http://tinyurl.com/yzn7l7o
19:55 * deafferret summons pyrimidine!
20:00 * brunov summons purine!
20:04 * perl_splut breaks them all down to just a bunch of atoms *
20:07 perl_splut http://news.bbc.co.uk/2/hi/science/nature/8516319.stm
20:08 rbuels by the power of uracil, i have the powerrrr!!!
20:08 * perl_splut steals uracil and replaces it with thymine *
20:11 deafferret perl_splut: don't do it. your process will swap out at 4GB and crash and burn, forcing you to rewrite it.
20:11 deafferret at least mine did on Monday.
20:11 deafferret sudo rbuels fix that crap
20:13 * perl_splut is running at 1024-bit with 50PB of RAM *
20:14 deafferret really?
20:14 deafferret that. is. impressive.
20:14 deafferret GPUs I assume
20:15 perl_splut nope. that's just my cpu. waiting on the upgrades which will push me into quantum land...
20:16 deafferret perl_splut: can you run this R crap for me? "failed to allocate 3.2GB vector" would be no problem for you!
20:19 perl_splut afraid R doesn't exist for my system... the joy of having such a unique piece of hardware :)
20:53 dnewkirk joined #bioperl
20:58 dnewkirk joined #bioperl
21:11 bag joined #bioperl
21:16 * rbuels dials deafferret's same-host-parallel-processing helpline
21:17 rbuels boopboopboopboopboopboopboop
21:17 * rbuels lets it ring, waiting for an answer
21:22 deafferret hello?
21:22 rbuels oh hello mr. ferret
21:22 deafferret can I help you?
21:23 rbuels so i have this script that is doing rather slow, intensive crap with a bunch of xml files
21:23 rbuels http://gist.github.com/307019
21:23 * deafferret starts the meter
21:23 rbuels and i have a single big machine with lots of procs and ram
21:23 rbuels the find command you see there runs for hours
21:23 rbuels there are many thousands of files it is finding
21:23 rbuels and the file server is slow
21:24 rbuels i would like to apportion the files to a number of 'workers'
21:24 rbuels processes on the same machine
21:24 rbuels which will all log to that donefile
21:24 rbuels so i'll need to do some locking on that donefile ...
21:25 rbuels and share the find() command's output among multiple processes, if that's possible ...
21:25 rbuels what tools would you recommend for this task?
21:25 * deafferret rubs rbuels' chin
21:26 deafferret print while <$fh>;   ? what's that doing?
21:27 rbuels just blatting the contents of that tempfile to stdout
21:27 rbuels i'm running this like foo.pl > results.gff3
21:27 rbuels well, >> results.gff3
21:27 rbuels since it has some error tolerance built in
21:28 rbuels so that it doesn't necessarily have to be totally rerun if something screws up
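[The gist isn't reproduced in the log, but the rerun-tolerant donefile pattern being described looks roughly like the sketch below. The file names, directory, and the xml_to_gff3() converter are hypothetical stand-ins, not the actual gist contents.]

    #!/usr/bin/env perl
    # Rough serial shape of the script under discussion: walk the XML
    # files, skip anything already recorded in the donefile, and append
    # each success so a rerun can pick up where it left off.
    use strict;
    use warnings;
    use File::Find;

    my $donefile = 'done.list';            # hypothetical name

    # Load the set of already-processed files so a rerun can skip them.
    my %done;
    if ( open my $d, '<', $donefile ) {
        chomp( my @seen = <$d> );
        @done{@seen} = ();
    }

    find(
        sub {
            return unless /\.xml$/;
            my $path = $File::Find::name;
            return if exists $done{$path};

            print xml_to_gff3($path);      # run as: foo.pl >> results.gff3

            # Record success only after the output has been written.
            open my $d, '>>', $donefile or die "append $donefile: $!";
            print {$d} "$path\n";
        },
        '/slow/file/server/xml',           # placeholder directory
    );

    sub xml_to_gff3 { my $f = shift; return '' }   # stand-in converter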
21:29 deafferret so you want to converts thousands of .xml into a single gff3 ?
21:29 rbuels yep
21:29 deafferret s/s//
21:29 deafferret hm. is thousands of .xml files into thousands of .gff3 files, which you can later cat together, acceptable?
21:29 rbuels yeah probably
21:29 rbuels that would simplify the locking
21:30 rbuels i could use find -exec then
21:30 rbuels or something
21:30 rbuels well still need some kind of apportionment/job control
21:30 deafferret i'd do that then. MooseX::Workers. spool up the work list from find, prune it with $donefile, then spew out workers 100 files at a time or something
21:30 deafferret how long does each file take?
21:31 deafferret you have Moose and POE?
21:31 rbuels a few seconds
21:31 rbuels yeah prereqs are no problem
21:31 rbuels hmmm, i guess now's the time to pop my moo-sexworker cherry
21:32 deafferret hmm... i vote for launching a worker for every 20 input files
21:32 deafferret you can do anything with the right bovine sex workers
21:32 rbuels Mr. Synopsis is a little opaque
21:32 deafferret NO U
21:32 * deafferret looks
21:33 * rbuels cogitates and pokes at this hacked-up thing
21:33 rbuels (not the -sex::workers, this script)
21:33 deafferret hmm? there's some good t/ examples
21:34 rbuels yeah i'm workin on it
21:34 rbuels you're a mensch
21:37 deafferret ya, this is your ticket
21:37 deafferret jhannah@klab:~/src/moosex-workers$ perl -Ilib t/10.worker.enqueue.t
21:37 deafferret all those callbacks are optional
21:38 deafferret me, I like Log::Log4perl callbacks all over the place
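[For reference, a minimal sketch of the enqueue-a-coderef pattern those t/ examples show, assuming the MooseX::Workers interface of the era: optional callback methods, a max_workers attribute, and coderef jobs run in child processes. Treat the callback signatures as assumptions; xml_to_gff3() is a hypothetical converter.]

    package GFF3Manager;
    use Moose;
    with 'MooseX::Workers';

    # Callbacks are optional, as noted above; these two just log.
    sub worker_stdout { my ( $self, $out ) = @_; print "$out\n" }
    sub worker_stderr { my ( $self, $err ) = @_; warn "worker: $err\n" }

    no Moose;

    package main;
    use POE;

    sub xml_to_gff3 { print "would convert $_[0]\n" }   # hypothetical stand-in

    my $mgr = GFF3Manager->new( max_workers => 10 );
    for my $file (@ARGV) {
        # Each enqueued coderef runs in its own child process.
        $mgr->enqueue( sub { xml_to_gff3($file) } );
    }
    POE::Kernel->run;    # blocks until the queue drains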
21:38 rbuels i could avoid having to cat if each worker locked the outfile and appended to it
21:38 deafferret k. If you know how to do that cleanly you'll have to edumacate me
21:38 rbuels heh i'm about to figger it oot
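[A sketch of the lock-and-append idea, using flock so concurrent workers can't interleave their writes; the function name and arguments are illustrative.]

    use strict;
    use warnings;
    use Fcntl qw(:flock SEEK_END);

    # Called from inside each worker with a chunk of finished GFF3.
    sub append_results {
        my ( $outfile, $gff3_text ) = @_;
        open my $fh, '>>', $outfile or die "open $outfile: $!";
        flock $fh, LOCK_EX          or die "lock $outfile: $!";
        seek $fh, 0, SEEK_END;      # another worker may have appended meanwhile
        print {$fh} $gff3_text;
        close $fh;                  # closing releases the lock
    }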
21:39 * deafferret streams his tuits to #bioperl
21:39 deafferret all done
21:40 rbuels wow that's amazing
21:41 rbuels food &
21:42 deafferret so if I were you, I'd build my 10000 @todo list, then enqueue() them out 50 at a time or whatever
21:42 deafferret http://gist.github.com/307036
21:42 deafferret that's my running-shit-at-$job[0] class
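[The 50-at-a-time idea above, sketched with splice; the batch size, $mgr, and the todo-loading helper are illustrative and lean on the earlier sketches.]

    # Spool the whole work list first, then hand out fixed-size batches
    # so each child amortizes its startup cost across many files.
    my @todo = load_todo_list();    # hypothetical: find output minus donefile entries
    while ( my @batch = splice @todo, 0, 50 ) {
        $mgr->enqueue( sub { xml_to_gff3($_) for @batch } );
    }
    POE::Kernel->run;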
21:43 brunov 'j + POE + Moose r0ck5'; # nide
21:43 brunov *nice
21:45 deafferret here's the stack from my most recent use case... http://gist.github.com/307036
21:46 rbuels deafferret: well, i kind of need to stream my todo list
21:46 rbuels deafferret: the find takes a long time
21:46 deafferret k. so enqueue() each as it happens
21:46 rbuels deafferret: yeah
21:47 deafferret so POE oscillates between launching children and waiting for find
21:47 * rbuels runs back to the computer while the soup is in the microwave
21:47 rbuels lol
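[One way to get that oscillation is to run find itself under POE::Wheel::Run and enqueue a job per path as it streams in. The session wiring below is a sketch, reusing the hypothetical GFF3Manager and xml_to_gff3() from the earlier sketch.]

    use POE qw(Wheel::Run);

    my $mgr = GFF3Manager->new( max_workers => 10 );

    POE::Session->create(
        inline_states => {
            _start => sub {
                # Stream find's output instead of waiting hours for all of it.
                $_[HEAP]{find} = POE::Wheel::Run->new(
                    Program     => [ 'find', '/slow/file/server/xml', '-name', '*.xml' ],
                    StdoutEvent => 'got_path',
                );
                $_[KERNEL]->sig_child( $_[HEAP]{find}->PID, 'find_done' );
            },
            got_path => sub {
                my $path = $_[ARG0];    # one path per line of find output
                $mgr->enqueue( sub { xml_to_gff3($path) } );
            },
            find_done => sub { delete $_[HEAP]{find} },
        },
    );
    POE::Kernel->run;    # services find's I/O and the worker pool together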
22:02 deafferret power outage. that was fun
22:04 deafferret I was suggesting "50 at a time" so you don't lose 0.5 seconds for perl compile time on every job
22:05 deafferret but if your find is that slow, maybe you don't care and just enqueue each job immediately
22:08 rbuels wai-hait?  perl compilation?
22:08 rbuels it's not forking under the hood?
22:08 rbuels deafferret: ^^^
22:08 deafferret um, no. In my case it's running a new perl command
22:08 deafferret maybe there's a forky version
22:09 deafferret runner.pl launches lots of child.pl instances
22:09 deafferret http://gist.github.com/307036  refreshed
22:09 rbuels oh, i get it
22:10 rbuels mine isn't calling anything with system
22:10 rbuels it'll just be doing the work in the worker itself
22:10 deafferret MooseX::Workers is a big POE::Wheel::Run fellator, not a fork() fellator
22:10 deafferret oh. hmm.
22:11 rbuels i'm sure POE::Wheel::Run must be forking under the hood
22:11 deafferret right, but the child it then launches is a seperate process
22:11 rbuels ya, cause it forked.
22:11 deafferret separate?
22:11 rbuels it's not the same as threading
22:12 brunov of course not
22:12 deafferret so there's actually 3:  (1) the runner.pl that POE::Wheel::Run'd, (2) the fork that will call system(), (3) the thing that gets called with system()
22:13 deafferret but maybe if you don't want....    lemme look
22:14 rbuels fork() is not a heavy thing.  having 3 processes is not a big deal
22:15 deafferret ya. looks like I was right. and you were too, about fork.    In MY case, I want to run "perl fatchild.pl", and I don't care about the 0.5 seconds it takes to compile that. BUT, if you need millions of tiny children you probably don't want to wait for perl compile-time 1M times
22:16 deafferret so, um. hmm. probably fine, and you should just do it  :)
22:16 * rbuels nods vigorously
22:16 deafferret or are you saying you only want 1 program, total, getting the whole job done?
22:16 rbuels still grokking the -sex::workers interface
22:18 deafferret hmm... you probably want to just sub { xml_to_gff3() } so you pay no penalty
22:18 deafferret like the t/ examples
22:19 deafferret whereas I am sub { system("fatchild.pl blah") }
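[The two worker-body styles side by side: fatchild.pl is deafferret's example name, $file and $mgr come from the earlier sketches, and only the in-process version avoids a per-job perl startup.]

    # In-process coderef: the child is just a fork, no fresh perl compile.
    $mgr->enqueue( sub { xml_to_gff3($file) } );

    # External command: each job also pays script compile/startup time.
    $mgr->enqueue( sub { system( 'fatchild.pl', $file ) } );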
22:19 rbuels oh this is nice, i could use MooseX::App::Cmd with this too
22:19 * rbuels nods even more vigorously
22:20 deafferret max_children(50) # or however much pain you can handle
22:20 deafferret enqueue() # as fast as find finds shit
22:20 deafferret without all the swearing
22:20 deafferret <-- dork
22:21 * rbuels will post the parallelized script shortly
22:21 rbuels after it actually works
23:31 rbuels sco-ho-ho-hore!
23:31 brunov joined #bioperl
23:32 * rbuels refers to jason's latest email
23:33 deafferret bah. you're just impeding progress!
23:33 * deafferret was looking forward to rewriting everything
23:34 deafferret whoah. he's also on board with dynamiting the monolith now
23:35 deafferret dy-no-mite!
