IRC log for #bioperl, 2010-01-14

All times shown according to UTC.

Time Nick Message
02:10 Rick__ joined #bioperl
02:42 brandi joined #bioperl
02:42 brandi left #bioperl
04:20 driveby_bot joined #bioperl
04:20 driveby_bot /home/svn-repositories/bioperl: r16688 (kortsch) : Bug fix
05:03 driveby_bot joined #bioperl
05:03 driveby_bot /home/svn-repositories/bioperl: r16689 (kortsch) : Bug fix
07:27 driveby_bot joined #bioperl
07:27 driveby_bot /home/svn-repositories/bioperl: r16690 (kortsch) : Added Bio::Assembly::IO::bowtie module.
09:09 dmb_ joined #bioperl
09:10 dmb_ joined #bioperl
12:50 brandi joined #bioperl
12:50 brandi left #bioperl
14:02 brandi joined #bioperl
14:03 brandi left #bioperl
14:22 brandi joined #bioperl
15:12 splut joined #bioperl
15:38 driveby_bot joined #bioperl
15:38 driveby_bot /home/svn-repositories/bioperl: r16691 (maj) : revert to WrapperBase namespace (bug #2991)
15:47 brandi left #bioperl
16:10 deafferret @o.  ..  .o  o
17:36 deafferret adding Bio::BroodComb::PCR
17:52 balin joined #bioperl
18:24 driveby_bot joined #bioperl
18:24 driveby_bot /home/svn-repositories/bioperl: r16692 (dave_messina) : Updated header tag writing code such that only one of source or sourceVersion need be set, not both (to comply with spec). Added checks that sequence actually exists and has nonzero length before attempting to write (to comply with spec).
18:34 jhannah joined #bioperl
18:41 jhannah joined #bioperl
19:16 siddbasu joined #bioperl
19:16 siddbasu hi
19:17 siddbasu could anybody clarify if reading of searchio object from a blast file loads the entire file in the memory
19:17 siddbasu something like this while(my $result = $searchio->next_result)
19:18 siddbasu does the $searchio object read a huge chunk of file in the memory
19:25 jhannah checking...
19:28 jhannah if you search 1 query sequence against a jillion database sequences, then you have 1 result, many hits, many hsps
19:28 jhannah so next_result is slurping the entire file
19:28 jhannah how big is your file? do you have little memory?
19:28 siddbasu 298 MB
19:29 jhannah so if you've got a gig or more of RAM you should be ok
19:29 jhannah not sure if there's a chunky version of SearchIO
19:29 siddbasu hmm so actually next_result is eating the whole file till the stats part
19:29 siddbasu wow
19:30 jhannah yup. if you have many query sequences and many database sequences, then next_result() is one piece at a time
19:30 siddbasu I do have a gig of RAM but it is also a app server so it needs it for other process
19:31 siddbasu maybe i am missing something, i have searched 13000 protein sequences against a genome with 800 scaffolds
19:31 siddbasu and then output that in a single file
19:32 siddbasu it reads the whole in one shot
19:32 siddbasu and i thought it goes like Bio::SeqIO
19:32 jhannah hmm... I must be wrong then
19:32 jhannah in the scenario you described I would expect many small next_result() calls
19:32 siddbasu like for fasta one header to other header setting $/="\n>"
19:33 siddbasu yup that's what i though and got into an argument with my co-worker
19:33 siddbasu thought
19:33 siddbasu he thinks i have to index the whole file and then read one by one
19:33 jhannah "The Result is the entire analysis for a single query sequence, and multiple Results can be concatenated together into a single file"   http://bioperl.open-bio.org/wiki/HOWTO:SearchIO
19:33 jhannah so either you're wrong, or both I and the document are wrong  :)
19:34 siddbasu probably i am
19:34 jhannah -ponder-   dunno  :)
19:35 jhannah Paste to http://codepad.org/ ?
19:36 siddbasu my code
19:36 siddbasu sure
19:38 siddbasu http://codepad.org/fq2gSXPV
19:38 siddbasu It is a long script and uses Bio::Chado::Schema in between
19:39 jhannah maybe just   print "result found\n"; next   on line 184 and see how many happen?
19:40 jhannah should be lots  :)
19:40 jhannah I don't know anything about Chado...
19:40 jhannah rbuels: ^^^
19:40 siddbasu yup it is his module
19:40 jhannah rbuels wrote Chado?
19:40 siddbasu He wrote Bio::Chado::Schema
19:41 jhannah ahh. ya, he was in DBIx::Class::Schema fixing some stuff last week
19:41 jhannah ...Loader
19:41 jhannah rbuels: FIX YUR CHADO!  ;)
19:57 siddbasu well when i just read the blast file without any database thing it just ran file
19:57 siddbasu fine
19:57 siddbasu and so memory overhead
20:04 siddbasu Hi rbuels
20:06 jhannah it ran with 1 result or thousands?
20:08 siddbasu The entire file
20:08 siddbasu thousands
20:09 jhannah ah, so there are thousands of small results then?
20:10 siddbasu yup
20:10 jhannah hmm... so something is going wrong in the Chado layer? what are your symptoms?
20:10 siddbasu okay i see in the mailing list that there is a blast_pull parser which does it faster
20:10 jhannah k
20:10 siddbasu well the machine just ran out of memory
20:12 jhannah let me know if blast_pull doesn't solve your problem
20:13 siddbasu jason stajich also told me to use -m8 output if i don't need the alignment part
20:13 siddbasu okay thanks jhannah
20:14 siddbasu need to talk with rbuels anyway
20:14 jhannah :)
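
[Editor's sketch] The -m8 suggestion at 20:13 refers to BLAST's tabular output, which has one hit per line and no alignment text, so it can be read line by line with only the current line held in memory. The following is a minimal sketch in core Perl, not BioPerl's own parser. The sub name each_tabular_hit, the hash keys, and the sample rows are hypothetical and made up for illustration; the column order assumed is the usual 12-column -m 8 layout.

```perl
use strict;
use warnings;

# Assumed column order of BLAST tabular (-m 8) output; key names are illustrative.
my @COLUMNS = qw(query subject pct_identity length mismatches gap_opens
                 q_start q_end s_start s_end evalue bit_score);

# Read tabular hits one line at a time and call $callback with a hash ref
# for each hit. Returns the number of hits passed to the callback.
sub each_tabular_hit {
    my ($fh, $callback) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        next if $line =~ /^\s*#/;          # skip comment lines
        my @fields = split /\t/, $line;
        next unless @fields == @COLUMNS;   # ignore malformed lines
        my %hit;
        @hit{@COLUMNS} = @fields;
        $callback->(\%hit);
        $count++;
    }
    return $count;
}

# Demo on an in-memory filehandle with made-up rows.
my $sample = join "\n",
    join("\t", qw(prot1 scaffold_7 98.50 200 3 0 1 200 1001 1600 1e-100 380)),
    join("\t", qw(prot2 scaffold_3 75.00 120 30 2 5 124 50 409 2e-20 95.1)),
    '';
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my %hits_per_query;
my $n = each_tabular_hit($fh, sub { $hits_per_query{ $_[0]{query} }++ });
close $fh;
print "$n hits for ", scalar(keys %hits_per_query), " queries\n";
```

Passing a callback keeps the reader from accumulating hits itself, so memory use depends only on what the caller chooses to keep (here, one counter per query).
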
21:41 siddbasu hi jhannah, searchio loads one result per query not the entire file
21:41 siddbasu just got confirmed by jason
21:41 siddbasu so in my code searchio not to blame
21:42 deafferret right, you proved that to yourself earlier
21:43 deafferret so now the question is whether rbuels' bio::chado::whatever is designed in such a way to slurp everything
21:43 deafferret I know nothing about chado...
21:43 deafferret rbuels: wakey, wakey!
21:45 rbuels siddbasu, deafferret: i'm sitting in the GMOD meeting right now
21:45 rbuels siddbasu, deafferret: why aren't you guys here?
21:45 rbuels siddbasu, deafferret: huh????
21:45 deafferret what's gmod?
21:46 rbuels deafferret: gmod.org, confederation of open-source components for model organism databases
21:47 deafferret rbuels: siddbasu is spreading nasty rumors that Bio::Chado::Schema is loading his/her entire 450MB blast result file in at once, ENOMEMORY
21:47 deafferret rbuels: I know, silly
21:47 rbuels dude Bio::Chado::Schema does not read blast reports
21:47 deafferret a-ha!
21:47 rbuels it's just a bunch of DBIx::Class classes
21:47 * rbuels slaps deafferret
21:47 deafferret A-HA!!
21:47 deafferret ow
21:48 deafferret sudo rbuels /me rubs lstein's chin
21:49 rbuels Sorry, try again.
21:49 * deafferret stares at http://codepad.org/fq2gSXPV some more
21:49 deafferret are lstein, pyrmawhever and jstajich there?
21:52 deafferret siddbasu: is this the current load_alignment.pl, straight out of Chado? Or did you modify it? When you run this you run out of RAM?
21:52 rbuels pyrmafrost and jstajnijich are here
21:52 * deafferret swoons
21:53 deafferret I'm adding Bio::BroodComb::PCR now. Prepare for global domination.
22:01 rbuels i'm betting the culprit is the SearchIO.
22:02 rbuels siddbasu: does the script actually run through the loop for a while before going into orbit?
22:02 rbuels siddbasu: or does it do it on the first loop iteration?
22:07 deafferret mmm... San Diego. http://www.omnihotels.com/FindAHotel/SanDiego.aspx
22:14 driveby_bot joined #bioperl
22:14 driveby_bot /home/svn-repositories/bioperl: r16693 (kortsch) : Revert WrapperBase include to follow its definition change.
22:14 driveby_bot joined #bioperl
22:14 driveby_bot /home/svn-repositories/bioperl: r16694 (kortsch) : INIT block definition removed as unnecessary.
22:51 siddbasu nope i am not
22:51 siddbasu spreading any rumours
22:54 deafferret :)
22:54 deafferret sudo rbuels /me rubs lstein's chin
23:06 siddbasu rbuels: it does go through most of the records
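
[Editor's sketch] The record-at-a-time reading that siddbasu describes at 19:32 (going from one FASTA header to the next by setting $/="\n>") can be illustrated in core Perl as below. This is only a sketch of the idea under simple assumptions (Unix line endings, well-formed input); it is not how Bio::SeqIO or Bio::SearchIO are implemented. The sub name fasta_iterator and the sample sequences are hypothetical.

```perl
use strict;
use warnings;

# Return an iterator that yields one [header, sequence] pair per call,
# reading a single FASTA record at a time from $fh.
sub fasta_iterator {
    my ($fh) = @_;
    return sub {
        local $/ = "\n>";            # a record ends where the next header begins
        while (defined(my $record = <$fh>)) {
            chomp $record;           # strip the trailing "\n>" separator, if present
            $record =~ s/^>//;       # the first record still carries its own '>'
            my ($header, @seq_lines) = split /\n/, $record;
            next unless defined $header && length $header;
            return [ $header, join('', @seq_lines) ];
        }
        return;                      # end of input
    };
}

# Demo on an in-memory filehandle with made-up sequences.
my $fasta = ">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n";
open my $in, '<', \$fasta or die "cannot open in-memory handle: $!";
my $next_record = fasta_iterator($in);
while (my $rec = $next_record->()) {
    printf "%s\t%d\n", $rec->[0], length $rec->[1];
}
close $in;
```

Because $/ is localized inside the closure, the custom record separator applies only while a record is being read and does not leak into other code that reads lines.
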
