IRC log for #bioperl, 2010-02-02

All times shown according to UTC.

Time Nick Message
00:19 driveby_bot joined #bioperl
00:19 driveby_bot /home/svn-repositories/bioperl: r16805 (maj) : File::Temp::filename is full path--fix
00:19 driveby_bot Diff: http://tinyurl.com/yalc75p
00:43 ptl joined #bioperl
03:26 dnewkirk left #bioperl
07:19 kyanardag joined #bioperl
15:57 derele joined #bioperl
16:12 ende joined #bioperl
16:12 ende left #bioperl
17:06 deafferret .
17:09 rbuels O RLY?
17:10 * deafferret fears the coming torrent
17:26 kishore joined #bioperl
17:26 kishore Hi
17:27 kishore how can we parse a blast output file which is in new format
17:27 kishore for example...
17:27 kishore >pdb|1TSR|A Chain A, P53 Core Domain In Complex With Dna  pdb|1TSR|B Chain B, P53 Core Domain In Complex With Dna  pdb|1TSR|C Chain C, P53 Core Domain In Complex With Dna  pdb|1TUP|A Chain A, Tumor Suppressor P53 Complexed With Dna  pdb|1TUP|B Chain B, Tumor Suppressor P53 Complexed With Dna           Length = 219   Score =  464 bits (1194), Expect = e-131  Identities = 219/219 (100%), Positives = 219/219 (100%)
17:27 kishore _________________
17:28 kishore ok that looks a bit messy
17:28 kishore i'll enter the output line by line
17:28 kishore >pdb|1TSR|A Chain A, P53 Core Domain In Complex With Dna
17:28 kishore pdb|1TSR|B Chain B, P53 Core Domain In Complex With Dna
17:28 kishore pdb|1TSR|C Chain C, P53 Core Domain In Complex With Dna
17:29 kishore pdb|1TUP|A Chain A, Tumor Suppressor P53 Complexed With Dna
17:29 kishore pdb|1TUP|B Chain B, Tumor Suppressor P53 Complexed With Dna
17:29 rbuels nono, don't enter it line by line
17:29 kishore Length = 219
17:29 kishore Score =  464 bits (1194), Expect = e-131
17:29 rbuels we know what the new format means
17:29 kishore Identities = 219/219 (100%), Positives = 219/219 (100%)
17:29 kishore ooh that's great
17:29 kishore thanks
17:29 kishore just thought of giving some info abt it
17:30 kishore like here i would like to get all the hits
17:30 kishore is it possible with bioperl?
17:31 kishore i'm sure it is :-)
17:31 rbuels you are talking about blastplus?
17:31 kishore just the blast
17:32 kishore blastall -p blastp  -i  inputProtein.fasta  -d pdbaa  -o output.blast
17:32 rbuels hmmm
17:33 rbuels it's in here, i'm looking for it...
17:33 kishore ok
17:34 deafferret kishore: can you paste an example here please   http://codepad.org/
17:35 kishore sure
17:35 rbuels kishore: have you tried parsing it with something like the example code here?  http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
17:35 * deafferret doesn't know what "the new format" means
17:35 kishore yes that's right
17:35 kishore i used the same code
17:36 kishore from  http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO and it parses the older formats
17:36 kishore easily
17:39 kishore @deafferret: you can access my parser code at http://codepad.org/kOLTaKaR
17:40 deafferret kishore: can you also paste an example input file please? preferably the one you're trying to parse?
17:40 kishore ok
17:40 deafferret kishore: add    use Bio::SearchIO;    to the top of your program
17:42 kishore yes i did
17:42 kishore that code parses older blast output
17:42 kishore i just copied part of my code which does parsing
17:42 kishore >t120407068
17:43 kishore MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
17:43 kishore those two lines are my input.fasta
17:43 deafferret the error   "Can't locate object method "new" via package "Bio::SearchIO""    means you didn't
17:43 deafferret paste your whole program
17:43 kishore ok
17:43 deafferret please don't paste in channel
17:43 kishore ok
17:47 kishore just give me few minutes
18:31 kishore ok done finally
18:31 kishore i modified the code which only parses the file  and you can access it from http://codepad.org/k0LLpmak
18:32 kishore to run that code ./blastParser.pl t120407068.blast  177 > out.txt
18:33 deafferret "Can't locate Bio/SeqIO.pm in @INC"  means you haven't installed BioPerl, or have installed it incorrectly
18:33 deafferret this (so far) has nothing to do with actually parsing anything
18:34 deafferret how did you install BioPerl?
18:35 kishore actually bioperl was installed by someone else
18:36 deafferret apparently not.  :)
18:36 kishore the perl is compiled with bioperl long back
18:36 deafferret what does perldoc -l Bio::SearchIO say?
18:36 deafferret no, bioperl is never compiled
18:36 kishore that script runs fine for me here
18:36 kishore at my linux machine
18:37 deafferret your last paste had the error "Can't locate Bio/SeqIO.pm in @INC" in it
18:37 deafferret you're saying you're NOT getting that error?
18:38 kishore that script ran fine and i uploaded the out.txt to codepad http://codepad.org/Wfu197zx
18:39 deafferret ... oh, ok. so you're all set now? or you have questions?
18:40 kishore that script gets only first hit
18:40 kishore if frac_identical is 1
18:40 kishore leaving the rest
18:40 kishore i want to get all the hits from new format
18:41 deafferret ok, so I need a "new format" file so I can try to reproduce the problem on my side
18:42 kishore blast output in new format uploaded http://codepad.org/jut1JCZh
18:42 rbuels blast 2.2.12 is "new format"?
18:43 kishore i get the same format in both 2.2.18 as well
18:43 kishore i so no change
18:43 kishore if you see in line 88 of the blast output file
18:43 kishore my blast parser parses and gets only line 88
18:43 deafferret your code says  if ($hsp->frac_identical==1.0){
18:44 kishore leaving other hits from 89 to 97
18:44 kishore and i want to get those hits as well
18:44 kishore yes
18:44 deafferret so if you don't want that test, remove it
18:45 kishore i want it to frac_identical==1.0
18:45 kishore but don't you think all lines 88 to 97 have frac_identical as 1.0
18:45 kishore in the blast output file
18:46 deafferret http://codepad.org/jut1JCZh  lines 88-98 look malformed to me...
18:47 deafferret 120-124 looks healthy to me
18:48 kishore why do you think they are malformed
18:48 deafferret huh... again at 312-318 ... I'm not even sure what this means
18:48 deafferret because I'm stupid   :)
18:48 kishore :-)
18:48 deafferret I'm used to HSPs being 1 query, 1 db. what does 88-98 mean?
18:49 kishore they are the pdb structures
18:49 kishore for the position 177 in protein sequence
18:49 kishore ok since you say 120 to 124 are healthy
18:49 deafferret pdb is Protein Data Bank ?
18:50 kishore how can I parse those lines
18:50 kishore yes you are right PDB=Protein Data bank
18:53 deafferret running on my side so I can try to get a clue  :)
18:53 kishore great thanks
18:53 kishore all the best!
18:54 deafferret is it saying line 88 hit ALL of the sequences in lines 89-97 -- because ... why? 89-97 are all identical to each other?
18:54 deafferret is this NCBI blast or wu-blast?
18:54 kishore NCBI blast
18:56 kishore a protein sequence can have multiple pdb structures
18:56 * deafferret has almost zero protein experience
18:56 kishore :-)
18:57 kishore and lines 88 to 97 are the pdb structures
18:57 deafferret of the *same* protein sequence?
18:57 kishore 1TSR is the pdb id
18:57 kishore etc...
19:00 deafferret sigh... github.com failures
19:01 deafferret so this code   http://www.bioperl.org/wiki/HOWTO:SearchIO   seems to work on your input file
19:01 deafferret i'll show you as soon as github.com starts working again
19:02 deafferret so I need to find where yours is different
19:03 kishore ok
19:08 deafferret http://github.com/jhannah/sandbox/tree/master/kishore/
19:15 kishore ok tested your code just now
19:15 kishore but looks like it gets only 1 HSP per hit
19:15 deafferret just pushed a new version. look at out.txt, or run the new version -- frac_identical is rarely 1
19:16 deafferret brb
19:16 kishore ok
19:20 deafferret agreed, it's seeing 1 HSP per hit. is that wrong?
19:20 kishore that's right
19:20 deafferret I'm right, it's wrong?  :)
19:20 kishore :-)
19:20 deafferret <-- troublemaker
19:20 deafferret so 1 HSP per hit is correct?
19:20 kishore no not at all
19:21 kishore 1 HSP per hit is correct
19:21 kishore but i want other HSPs in that HIT
19:21 deafferret ok. so as you can see, BioPerl thinks frac_identical=0.981735159817352, etc
19:22 deafferret what was the blastall command you ran?
19:22 kishore blastall -p blastp -d pdbaa -i t120407068.fasta -o out.blast
19:23 kishore i get almost similar output both with blast_2.2.12 and blast_2.2.18
19:23 kishore I'm looking at your profile in bioperl website ;-)
19:25 kishore looks like seqlab.net is down
19:27 deafferret http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_(Perl)
19:27 deafferret http://www.bioperl.org/wiki/User:Jhannah#Hype updated
19:28 deafferret I'm staring at blastall switches, but I don't see a HSP cound cutoff
19:28 deafferret count
19:28 kishore hmm i too tried different options
19:28 deafferret at this point we're thinking BioPerl is parsing correctly, but blastall isn't giving you what you wanted, right?
19:29 kishore yes and no
19:29 kishore i submitted the same fasta sequence
19:29 kishore in the ncbi blast website
19:29 kishore and i get the blast output as the one I gave you
19:30 deafferret ... and the ncbi website shows more than 1 HSP per hit?
19:31 kishore yes that's right
19:31 deafferret uh... is there a URL where I can see that?
19:32 kishore yes got it http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&VIEW_RESULTS=FromRes&RID=PGZBP69X01S&UNIQ_OBJ_NAME=A_SearchResults_1NcOSo_1dEb_1Z5H2O1Yp_GTR6V_1xE0NX&QUERY_INDEX=0
19:33 kishore you can see multiple HSPs per hit in "Alignments" section
19:34 driveby_bot joined #bioperl
19:34 driveby_bot /home/svn-repositories/bioperl: r16806 (rbuels) : tweaked and tidied Build.PL, adding pointer to repository, create_license
19:34 driveby_bot Diff: http://tinyurl.com/yl82z8n
19:35 deafferret you can? I don't see any. give me an example ID ("pdb|1TSR|A") where there is more than 1 HSP
19:35 deafferret > signifies a new hit
19:35 deafferret each alignment block is a new HSP
19:36 * deafferret blames rbuels
19:36 kishore :-)
19:37 kishore ok in that blast file we do see  pdb|1TSR|B  ,  pdb|1TSR|C after pdb|1TSR|A
19:37 kishore and i want those as well
19:38 kishore since they have different 'chains'
19:38 kishore i might be wrong in my understanding of HSP
19:38 deafferret oh, so while the actual HSP is identical (the sequence) , you want to see all those descriptions in your output?
19:38 kishore correct me if i'm wrong, don't we call all those lines with 1TSR|A 1TSR|B,... as HSPs?
19:39 deafferret no, the    Score =  464 bits (1194)   and alignment itself is an HSP
19:39 deafferret you taught me that in protein land there can be lots of chains with the same sequence
19:39 deafferret but an HSP is sequence-centric
19:39 deafferret afaik
19:40 deafferret checking for how to get your labels out
19:40 kishore i think i need to know what i'm talking about first ;-)
19:40 kishore so what do you call all those lines in alignment with 1TSR|A 1TSR|B...
19:40 deafferret no clue... chains?  lemme debug it
19:41 kishore ok
19:43 kishore brb
19:48 rbuels deafferret++ # great patience
19:48 rbuels was kicked by deafferret: rbuels
19:48 rbuels joined #bioperl
19:48 * rbuels cringes
19:49 deafferret :)
19:49 deafferret kishore: looks like hit_description() is what you're looking for  http://github.com/jhannah/sandbox/tree/master/kishore/
19:50 deafferret needs a "chains" for dummies wikipedia page or something
19:50 deafferret i have no idea what's going on here, bio* wise
19:51 deafferret looks like the ncbi website calls these "sequence titles" and hides them with those "7 more sequence titles" links
19:51 kishore hmm
19:52 kishore is there a way to parse those alignments
19:52 kishore like to get those lines 88 to 97
19:53 deafferret well, the alignment is the same 15 times, right?
19:53 deafferret each HSP is one alignment, with 15 alternate descriptions/titles  ?
19:53 deafferret lost your paste URL
19:53 kishore i don't know :-)
19:54 deafferret $hsp->query_string()  homology_string()  hit_string()
19:54 deafferret is that what you meant?
19:56 deafferret I pushed a demo of that
19:57 deafferret hmm... I've been clocked out for 64 minutes. gonna eat lunch and earn a salary  :)
19:57 kishore ok will take a look
19:57 kishore :-)
19:57 kishore ok
19:57 kishore thank you!
19:58 deafferret you're welcome.   :)
19:58 deafferret you're in good hands -- rbuels is way smarter than me
19:59 rbuels but also busier!
19:59 rbuels (possibly?)
19:59 rbuels or maybe i'm just not as nice
19:59 rbuels actually that's probably it
20:00 kishore thank you jhannah!
20:00 kishore i think hit_description is what i wanted
20:00 kishore thanks a lot!
20:00 rbuels who was that masked man?
20:01 kishore ooh i thought deafferret is jhannah
20:01 kishore isn't he?
20:02 rbuels well, yes
20:02 kishore :-)
20:02 rbuels but on freenode he's deafferret!
20:02 rbuels and a mensch!
20:02 rbuels deafferret++ # A MENSCH I TELL YOU
20:03 kishore hmm
20:03 kishore and has lot of patience as well
20:08 kishore thank you once again jhannah
20:08 kishore you made my day!
20:08 kishore have a wonderful day ahead!
20:15 deafferret mensch?
20:15 deafferret tee-hee
20:33 kishore left #bioperl
21:06 ptl joined #bioperl
22:36 driveby_bot joined #bioperl
22:36 driveby_bot /home/svn-repositories/bioperl: r16807 (maj) : dev changes assoc w/WrapperMaker
22:36 driveby_bot Diff: http://tinyurl.com/ybxhx2z
22:37 driveby_bot joined #bioperl
22:37 driveby_bot /home/svn-repositories/bioperl: r16808 (maj) : exporting wrapper object
22:37 driveby_bot Diff: http://tinyurl.com/ydbgaq4
22:38 driveby_bot joined #bioperl
22:38 driveby_bot /home/svn-repositories/bioperl: r16809 (maj) : schema tweaks and additions
22:38 driveby_bot Diff: http://tinyurl.com/yhxh2vd
22:40 * deafferret gasps
23:05 rbuels what are you gasping about?
23:06 rbuels or are you just feeling a little bit piqued?
23:06 rbuels did you forget your parasol
23:06 rbuels ?
23:07 deafferret have you seen my parasol pic?
23:07 * deafferret was gasping at the audacity of the maj
23:08 deafferret http://jays.net/images/logos/jays.net.pretty.jpg
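
The 19:35-19:49 exchange settles on this: a ">" line starts a hit, each Score/alignment block is an HSP, and the extra "pdb|1TSR|B ..." lines are alternate titles for the same hit, reachable through the hit description. A minimal sketch of splitting such a merged title into one entry per chain follows. It is plain Perl with no BioPerl dependency; split_titles is a hypothetical helper, and the assumption that every extra title begins with a "pdb|ID|CHAIN" token comes only from the output pasted at 17:27.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a merged BLAST hit title
# into one record per PDB chain.
# Assumption: each title starts with a "pdb|ID|CHAIN" token, as in the
# output pasted in the log; other databases will need a different pattern.
sub split_titles {
    my ($name, $description) = @_;
    my $merged = "$name $description";
    my @titles;
    # Capture ID, chain, and the text up to the next "pdb|" token or the end.
    while ($merged =~ /pdb\|(\w+)\|(\w*)\s+(.*?)\s*(?=pdb\||\z)/gs) {
        push @titles, { id => $1, chain => $2, title => $3 };
    }
    return @titles;
}

# Sample data taken from the titles pasted at 17:28-17:29.
my $name        = 'pdb|1TSR|A';
my $description = 'Chain A, P53 Core Domain In Complex With Dna '
                . 'pdb|1TSR|B Chain B, P53 Core Domain In Complex With Dna '
                . 'pdb|1TUP|A Chain A, Tumor Suppressor P53 Complexed With Dna';

for my $t (split_titles($name, $description)) {
    print join("\t", $t->{id}, $t->{chain}, $t->{title}), "\n";
}
```

Inside a Bio::SearchIO loop the two arguments would presumably be $hit->name and $hit->description. Whether the description holds the merged titles in exactly this form depends on the BLAST version and the parser, so check it against a real report before relying on the pattern.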
