00:43 *** brunov joined #bioperl
06:18 *** bag_ joined #bioperl
13:50 *** brunov joined #bioperl
15:13 *** brandi joined #bioperl
15:19 *** brunov joined #bioperl
15:23 * faceface summons pyrimidine
15:51 *** brandi left #bioperl
16:28 *** driveby_bot joined #bioperl
16:28 <driveby_bot> /home/svn-repositories/bioperl: r16845 (dave_messina) : Correcting failures in tests 15 & 36 due to incorrect use of Bio::Species->species. All tests in seqxml.t now pass.
16:28 <driveby_bot> Diff: http://tinyurl.com/y8hjblo
19:49 *** driveby_bot joined #bioperl
19:49 <driveby_bot> /home/svn-repositories/bioperl: r16846 (jason) : get the test count right
19:49 <driveby_bot> Diff: http://tinyurl.com/yjcywaq
19:51 *** driveby_bot joined #bioperl
19:51 <driveby_bot> /home/svn-repositories/bioperl: r16847 (jason) : rollback Florent's changes that defaulted Bio::PrimarySeq instead of Bio::Seq creation of objects
19:51 <driveby_bot> Diff: http://tinyurl.com/yzn7l7o
19:55 * deafferret summons pyrimidine!
20:00 * brunov summons purine!
20:04 * perl_splut breaks them all down to just a bunch of atoms *
20:07 <perl_splut> http://news.bbc.co.uk/2/hi/science/nature/8516319.stm
20:08 <rbuels> by the power of uracil, i have the powerrrr!!!
20:08 * perl_splut steals uracil and replaces it with thymine *
20:11 <deafferret> perl_splut: don't do it. your process will swap out at 4GB and crash and burn, forcing you to rewrite it.
20:11 <deafferret> at least mine did on Monday.
20:11 <deafferret> sudo rbuels fix thatcrap
20:13 * perl_splut is running at 1024-bit with 50PB of RAM *
20:14 <deafferret> really?
20:14 <deafferret> that. is. impressive.
20:14 <deafferret> GPUs I assume
20:15 <perl_splut> nope. that's just my cpu. waiting on the upgrades which will push me into quantum land...
20:16 <deafferret> perl_splut: can you run this R crap for me? "failed to allocate 3.2GB vector" would be no problem for you!
20:19 <perl_splut> afraid R doesn't exist for my system... the joy of having such a unique piece of hardware :)
20:53 *** dnewkirk joined #bioperl
20:58 *** dnewkirk joined #bioperl
21:11 *** bag joined #bioperl
21:16 * rbuels dials deafferret's same-host-parallel-processing helpline
21:17 <rbuels> boopboopboopboopboopboopboop
21:17 * rbuels lets it ring, waiting for an answer
21:22 <deafferret> hello?
21:22 <rbuels> oh hello mr. ferret
21:22 <deafferret> can I help you?
21:23 <rbuels> so i have this script that is doing rather slow, intensive crap with a bunch of xml files
21:23 <rbuels> http://gist.github.com/307019
21:23 * deafferret starts the meter
21:23 <rbuels> and i have a single big machine with lots of procs and ram
21:23 <rbuels> the find command you see there runs for hours
21:23 <rbuels> there are many thousands of files it is finding
21:23 <rbuels> and the file server is slow
21:24 <rbuels> i would like to apportion the files to a number of 'workers'
21:24 <rbuels> processes on the same machine
21:24 <rbuels> which will all log to that donefile
21:24 <rbuels> so i'll need to do some locking on that donefile ...
21:25 <rbuels> and share the find() command's output among multiple processes, if that's possible ...
21:25 <rbuels> what tools would you recommend for this task?
21:25 * deafferret rubs rbuels' chin
21:26 <deafferret> print while <$fh>; ? what's that doing?
21:27 <rbuels> just blatting the contents of that tempfile to stdout
21:27 <rbuels> i'm running this like foo.pl > results.gff3
21:27 <rbuels> well, >> results.gff3
21:27 <rbuels> since it has some error tolerance built in
21:28 <rbuels> so that it doesn't necessarily have to be totally rerun if something screws up
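
(rbuels' gist, http://gist.github.com/307019, isn't reproduced in the log. From his description above (a find that runs for hours, a few seconds of XML crunching per file, a donefile for error tolerance, a tempfile blatted to stdout) the serial script is shaped roughly like this sketch; process_xml() and the paths are hypothetical stand-ins, not his actual code:)

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use File::Temp ();

    sub process_xml { die 'stand-in: real XML -> GFF3 conversion goes here' }

    my %done;                                  # donefile lists inputs already converted
    if ( open my $d, '<', 'donefile' ) {
        chomp( my @seen = <$d> );
        @done{@seen} = ();
    }

    open my $find, '-|', 'find', '/slow/fileserver', '-name', '*.xml'
        or die "cannot run find: $!";          # this is the part that runs for hours

    open my $donelog, '>>', 'donefile' or die "cannot open donefile: $!";
    while ( my $xml = <$find> ) {
        chomp $xml;
        next if exists $done{$xml};
        my $tmp = File::Temp->new;
        process_xml( $xml, $tmp );             # a few seconds of work per file
        seek $tmp, 0, 0;
        print while <$tmp>;                    # blat the tempfile to stdout (>> results.gff3)
        print {$donelog} "$xml\n";             # record success, so reruns can skip it
    }
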
21:29 <deafferret> so you want to converts thousands of .xml into a single gff3 ?
21:29 <rbuels> yep
21:29 <deafferret> s/s//
21:29 <deafferret> hm. is thousands of .xml files into thousands of .gff3 files, which you can later cat together, acceptable?
21:29 <rbuels> yeah probably
21:29 <rbuels> that would simplify the locking
21:30 <rbuels> i could use find -exec then
21:30 <rbuels> or something
21:30 <rbuels> well still need some kind of apportionment/job control
21:30 <deafferret> i'd do that then. MooseX::Workers. spool up the work list from find, prune it with $donefile, then spew out workers 100 files at a time or something
21:30 <deafferret> how long does each file take?
21:31 <deafferret> you have Moose and POE?
21:31 <rbuels> a few seconds
21:31 <rbuels> yeah prereqs are no problem
21:31 <rbuels> hmmm, i guess now's the time to pop my moo-sexworker cherry
21:32 <deafferret> hmm... i vote for launching a worker for each 20 input files
21:32 <deafferret> you can do anything with the right bovine sex workers
21:32 <rbuels> Mr. Synopsis is a little opaque
21:32 <deafferret> NO U
21:32 * deafferret looks
21:33 * rbuels cogitates and pokes at this hacked-up thing
21:33 <rbuels> (not the -sex::workers, this script
21:33 <deafferret> hmm? there's some good t/ examples
21:34 <rbuels> yeah i'm workin on it
21:34 <rbuels> you're a mensch
21:37 <deafferret> ya, this is your ticket
21:37 <deafferret> jhannah klab:~/src/moosex-workers$ perl -Ilib t/10.worker.enqueue.t
21:37 <deafferret> all those callbacks are optional
21:38 <deafferret> me, I like Log::Log4perl callbacks all over the place
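
(For reference, a minimal MooseX::Workers consumer along the lines of its synopsis and the t/ examples deafferret points at looks roughly like the sketch below. xml_to_gff3() is a hypothetical stand-in for the real per-file conversion, and, as he says, the callbacks are optional; define only the ones you need:)

    package XmlToGff3Manager;
    use Moose;
    with 'MooseX::Workers';

    # Hypothetical driver: enqueue one coderef per input file; MooseX::Workers
    # runs each in its own child process, throttled by max_workers.
    sub run {
        my ( $self, @xml_files ) = @_;
        $self->max_workers(10);                  # how many children at once
        for my $xml (@xml_files) {
            $self->enqueue( sub { xml_to_gff3($xml) } );
        }
        POE::Kernel->run;                        # the POE loop drives the workers
    }

    # Optional callbacks; child output and exits are reported here.
    sub worker_stdout { my ( $self, $out ) = @_; print "$out\n" }
    sub worker_error  { my ( $self, $err ) = @_; warn "$err\n" }
    sub worker_done   { }                        # e.g. record the file as done

    no Moose;
    1;

(Usage would be something like XmlToGff3Manager->new->run(@xml_files).)
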
21:38 <rbuels> i could avoid having to cat if each worker locked the outfile and appended to it
21:38 <deafferret> k. If you know how to do that cleanly you'll have to edumacate me
21:38 <rbuels> heh i'm about to figger it oot
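
(The usual clean recipe for many writers appending to one file, per perlfaq5, is flock plus a seek to end-of-file once the lock is granted; a sketch, not from rbuels' actual script:)

    use Fcntl qw(:flock SEEK_END);

    # Append one worker's finished GFF3 chunk to the shared results file.
    # flock serializes the writers; the seek after locking matters because
    # another worker may have appended while we were waiting for the lock.
    sub append_to_results {
        my ( $file, $gff3_text ) = @_;
        open my $out, '>>', $file or die "open $file: $!";
        flock $out, LOCK_EX       or die "flock $file: $!";
        seek $out, 0, SEEK_END    or die "seek $file: $!";
        print {$out} $gff3_text;
        close $out or die "close $file: $!";   # flushes, then drops the lock
    }

(One write per lock, with the handle closed before the lock is released, keeps the workers' chunks from interleaving in results.gff3.)
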
21:39 * deafferret streams his tuits to #bioperl
21:39 <deafferret> all done
21:40 <rbuels> wow that's amazing
21:41 <rbuels> food &
21:42 <deafferret> so if I were you, I'd build my 10000 @todo list, then enqueue() them out 50 at a time or whatever
21:42 <deafferret> http://gist.github.com/307036
21:42 <deafferret> that's my running-shit-at-$job[0] class
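
(deafferret's gist at http://gist.github.com/307036 isn't preserved either; his "50 at a time" idea in sketch form, with $manager and xml_to_gff3() the same hypothetical pieces as above:)

    # Carve the big @todo list into batches so each child pays its startup
    # cost once per 50 files rather than once per file.
    while ( my @batch = splice @todo, 0, 50 ) {
        my @files = @batch;                    # fresh copy for the closure
        $manager->enqueue( sub { xml_to_gff3($_) for @files } );
    }
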
21:43 <brunov> 'j + POE + Moose r0ck5'; # nide
21:43 <brunov> *nice
21:45 <deafferret> here's the stack from my most recent use case... http://gist.github.com/307036
21:46 <rbuels> deafferret: well, i kind of need to stream my todo list
21:46 <rbuels> deafferret: the find takes a long time
21:46 <deafferret> k. so enqueue() each as it happens
21:46 <rbuels> deafferret: yeah
21:47 <deafferret> so POE oscilates between launching children and waiting for find
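
(One way to keep the hours-long find inside the event loop, so the kernel really can oscillate between the two, is to run find itself under POE::Wheel::Run and enqueue each path as it arrives; a hand-wired sketch, with $manager and xml_to_gff3() again the hypothetical pieces from above:)

    use POE qw(Wheel::Run);

    # Run `find` as a wheel; every path it emits becomes a queued job, so
    # POE alternates between reading find's stdout and reaping children.
    POE::Session->create(
        inline_states => {
            _start => sub {
                $_[HEAP]{find} = POE::Wheel::Run->new(
                    Program     => [ 'find', '/slow/fileserver', '-name', '*.xml' ],
                    StdoutEvent => 'got_path',
                );
                $_[KERNEL]->sig_child( $_[HEAP]{find}->PID, 'find_done' );
            },
            got_path  => sub {
                my $xml = $_[ARG0];
                $manager->enqueue( sub { xml_to_gff3($xml) } );
            },
            find_done => sub { delete $_[HEAP]{find} },
        },
    );
    POE::Kernel->run;
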
21:47 * rbuels runs back to the computer while the soup is in the microwave
21:47 <rbuels> lol
22:02 <deafferret> power outage. that was fun
22:04 <deafferret> I was suggesting "50 at a time" so you don't lose 0.5 seconds for perl compile time on every job
22:05 <deafferret> but if your find is that slow, maybe you don't care and just enqueue each job immediately
22:08 <rbuels> wai-hait? perl compilation?
22:08 <rbuels> it's not forking under the hood?
22:08 <rbuels> deafferret: ^^^
22:08 <deafferret> um, no. In my case it's running a new perl command
22:08 <deafferret> maybe there's a forky version
22:09 <deafferret> runner.pl launches lots of child.pl instances
22:09 <deafferret> http://gist.github.com/307036 refreshed
22:09 <rbuels> oh, i get it
22:10 <rbuels> mine isn't calling anything with system
22:10 <rbuels> it'll just be doing the work in the worker itself
22:10 <deafferret> MooseX::Workers is a big POE::Wheel::Run fellator, not a fork() fellator
22:10 <deafferret> oh. hmm.
22:11 <rbuels> i'm sure POE::Wheel::Run must be forking under the hood
22:11 <deafferret> right, but the child it then launches is a seperate process
22:11 <rbuels> ya, cause it forked.
22:11 <deafferret> separate?
22:11 <rbuels> no es the same que threading ["it's not the same as threading"]
22:12 <brunov> claro que no ["of course not"]
22:12 <deafferret> so there's actually 3: (1) the runner.pl that POE::Wheel::Run'd, (2) the fork that will call system(), (3) the thing that gets called with system()
22:13 <deafferret> but maybe if you don't want.... lemme look
22:14 <rbuels> fork() is not a heavy thing. having 3 processes is not a big deal
22:15 <deafferret> ya. looks like I was right. and you were too, about fork. In MY case, I want to run "perl fatchild.pl", and I don't care about the 0.5 seconds it takes to compile that. BUT, if you need millions of tiny children you probably don't want to wait for perl compile-time 1M times
22:16 <deafferret> so, um. hmm. probably fine, and you should just do it :)
22:16 * rbuels nods vigorously
22:16 <deafferret> are are you saying you only want 1 program, total, getting the whole job done?
22:16 <rbuels> still grokking the -sex::workers interface
22:18 <deafferret> hmm... you probably want to just sub { xml_to_gff3() } so you pay no penalty
22:18 <deafferret> like the t/ examples
22:19 <deafferret> whereas I am sub { system("fatchild.pl blah") }
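
(The two styles side by side, as a sketch; $manager, $xml, and xml_to_gff3() are the hypothetical pieces from above, fatchild.pl is deafferret's:)

    # rbuels' case: the forked worker does the work in-process, so there
    # is no extra perl to compile per job. Good for many small jobs.
    $manager->enqueue( sub { xml_to_gff3($xml) } );

    # deafferret's case: the forked worker launches a whole new perl via
    # system(), paying ~0.5s of compile time per job. Fine for fat jobs.
    $manager->enqueue( sub { system( 'perl', 'fatchild.pl', $xml ) } );
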
22:19 <rbuels> oh this is nice, i could use MooseX::App::Cmd with this too
22:19 * rbuels nods even more vigorously
22:20 <deafferret> max_children(50) # or however much pain you can handle
22:20 <deafferret> enqueue() # as fast as find finds shit
22:20 <deafferret> without all the swearing
22:20 <deafferret> <-- dork
22:21 * rbuels will post the parallelized script shortly
22:21 <rbuels> after it actually works
23:31 <rbuels> sco-ho-ho-hore!
23:31 *** brunov joined #bioperl
23:32 * rbuels refers to jason's latest email
23:33 <deafferret> bah. you're just impeding progress!
23:33 * deafferret was looking forward to rewriting everything
23:34 <deafferret> whoah. he's also on board with dynamiting the monolith now
23:35 <deafferret> dy-no-mite!