IRC log for #bioperl, 2010-05-19

All times shown according to UTC.

Time Nick Message
02:38 deafferret lol    I suck at branching  10845 merged 10828:10844 into unstable
03:42 cawiss joined #bioperl
05:44 cawiss joined #bioperl
05:58 bag_ joined #bioperl
05:59 bag__ joined #bioperl
08:35 Lynx_ Hi all! Is there a way to use bioperl modules to represent an alignment in a picture? Something like NCBI BLAST does on the website for all the hits. Just colored bars, so you can get an overview of where the gaps are in your (long) alignment.
10:41 JunY Hi, Lynx_. Try Bio::Graphics
10:42 JunY You can always make beautiful alignment pictures with Bio::Graphics
12:55 Lynx_ JunY: Thanks, I'll look into it!
12:57 Lynx_ Another question, though not really bioperl related: I just formatted a blast database, and when I blast against it now the subject seq ID is shown as gnl|BL_ORD_ID|5034 or similar when using -outfmt 6. However, there is nothing of that sort in the fasta file from which the db was made, the deflines are different.
12:57 Lynx_ Where does that name come from?
12:58 Lynx_ I get the normal subject name as expected if I don't use the tabular output option.
13:13 vinnana joined #bioperl
13:57 JunY Lynx_, where do you set the -outfmt parameter? In formatdb or in blastall?
14:12 flu damn you perl and your new release cycle.  I just compiled 5.12.0 3 days ago!
14:12 JunY Anyway, it seems something is wrong in your fasta file.
14:13 JunY Make sure your fasta file title starts with ">", and no ">" in your sequence file
14:19 Lynx_ JunY: It's not the fasta file, this must come from formatdb somehow. If I grep 'gnl' in the fasta file, I get no matches. And without the -outfmt option all names are as expected from the fasta.
14:32 rbuels Lynx_: yes, formatdb puts things like lcl| and gnl| on your idents
14:32 rbuels Lynx_: internally
14:32 rbuels i'm not a fan of that behavior either.
14:58 deafferret ....__..
15:00 rbuels aw, phant, what did you get into now
15:05 deafferret ew, he stepped in it! gross
15:07 * deafferret calls 1-800-mani-pedi
15:43 vinnana left #bioperl
16:19 kyanardag_ joined #bioperl
16:27 JunY Yep, gnl|BL_ORD_ID|5034  reads like "gnl" "BLAST_ORDER_ID" 5034 ....
16:28 JunY It could be something wrong with the 5034th sequence, e.g. the 5034th ">"
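A minimal sketch of how an identifier like gnl|BL_ORD_ID|5034 could be mapped back to the original defline. It assumes the number is the zero-based position of the sequence in the FASTA file that formatdb was given, which is worth verifying against your own database. The FASTA text and the helper name defline_for_subject_id are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical FASTA content standing in for the file given to formatdb.
my $fasta = <<'FASTA';
>contig_A first example sequence
ACGTACGT
>contig_B second example sequence
GGCCTTAA
>contig_C third example sequence
TTTTAAAA
FASTA

# Collect the deflines in file order; the list index serves as the ordinal.
my @deflines = map { substr($_, 1) } grep { /^>/ } split /\n/, $fasta;

# Translate a subject ID such as "gnl|BL_ORD_ID|1" back to its defline.
# Assumption: the ordinal is zero-based. Returns undef for other ID forms.
sub defline_for_subject_id {
    my ($id) = @_;
    return undef unless $id =~ /^gnl\|BL_ORD_ID\|(\d+)$/;
    return $deflines[$1];
}

print defline_for_subject_id('gnl|BL_ORD_ID|1'), "\n";
```

With a lookup like this, the tabular hits can be relabelled after the search without touching the database.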
16:35 rbuels Lynx_: also, if there are spaces after your fasta's >, formatdb might be confused
16:35 rbuels Lynx_: or spaces before the > also
16:36 rbuels Lynx_: formatdb is C code that was written in 1998.  always remember that.
16:36 rbuels actually maybe 1996.
16:38 deafferret 'cause they did it right the first time
16:38 rbuels deafferret: so that's why they are throwing it all out and doing a ground-up rewrite with blastplus?
16:39 rbuels ;-)
16:39 deafferret blastplus is a fad
16:40 deafferret 17.9M man-hours of published research based on 1996 BLAST can't be wrong
16:40 deafferret person-hours. pardon me
16:41 rbuels insensitive clod.
16:41 * deafferret looks around
16:41 * deafferret loolishly
16:46 pyrimidine joined #bioperl
16:58 JunY Hi, pyrimidine
16:58 JunY Are you there?
16:58 pyrimidine JunY: yes
16:58 JunY great
16:59 JunY I am thinking of re-organizing the whole Bio::Align package
16:59 pyrimidine good!
16:59 JunY Now it consists of six modules in Bio::Align
17:00 JunY And a few modules in Bio::AlignI
17:00 pyrimidine do you have a wiki or blog post detailing this?
17:00 JunY And a few modules in Bio::AlignIO
17:00 pyrimidine okay
17:00 JunY And another two of Bio::AlignI and Bio::SimpleAlign
17:01 JunY Actually everything seems to be connected, and this will help to get things organized, and remove redundancy
17:01 JunY I have created the wiki/blog post. But the current google doc link gives a summary of all Bio::SimpleAlign methods
17:02 pyrimidine do you have the link?
17:02 JunY http://spreadsheets.google.com/ccc?key=0AssLTcJFJMbXdFp3Smg1S3JaYzBKNUcxTmQ0STBNTXc&hl=en
17:03 pyrimidine you need to grant me access to that one
17:03 JunY ok, I will edit it
17:06 JunY Can you see it now?
17:09 pyrimidine yes
17:10 * pyrimidine reading
17:10 JunY good good
17:16 pyrimidine you should probably cap any class name in a method (add_Seq instead of add_seq).  The main reason to do this is to indicate it is not a simple string, but a first-class object
17:17 pyrimidine this, btw, is where there is a terrible lack of consistency in bioperl
17:17 pyrimidine next_seq should be next_Seq
17:17 * pyrimidine sighs
17:18 pyrimidine gap_char is tricky
17:19 pyrimidine what if you have a mix of Seqs with different gap characters?
17:19 pyrimidine edge case, sure, but I'm sure it will pop up
17:19 JunY yep, someone says the gap chars are really messy
17:19 pyrimidine yes
17:20 JunY there was a bug report on that, saying they may be unwantedly changed
17:20 pyrimidine that was from me
17:20 pyrimidine :)
17:20 JunY cool :D
17:20 spekki01 off topic but anyone ever use HMMER3 or stockholm file format?
17:20 pyrimidine yes, yes
17:20 pyrimidine that's directed at spekki01
17:21 spekki01 cool heres my question.
17:21 * pyrimidine runs the other direction
17:21 JunY Anyway, apparently there are at least two or three major authors of Bio::SimpleAlign.. So some of the methods were so different from the others
17:21 pyrimidine :)
17:22 pyrimidine JunY: yes, which is one reason why it needs a cleanup
17:22 JunY yep
17:22 JunY And some method names are not very obvious... e.g. purge
17:23 pyrimidine yes; purge what?
17:23 JunY Purge is a very useful function to remove sequences above a certain similarity threshold...
17:23 spekki01 I have a bunch of subfamilies and each subfamily has multiple sequences, so I want to create a stockholm file that hmmer can read and create multiple profiles from, i.e. one per subfamily type. How do I separate the alignments in the stockholm file format so that hmmer picks them up as separate profiles?
17:23 JunY but as you can see, no one understands it with the bare word "purge"....
17:25 pyrimidine spekki01: read in the alns via AlignIO
17:25 pyrimidine create a new output stream with AlignIO for each aln,
17:25 pyrimidine write to output
17:25 spekki01 cool tx
17:25 pyrimidine wash, rinse, repeat
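For a sense of what a multi-alignment Stockholm file looks like, here is a rough sketch in plain Perl. It deliberately does not use Bio::AlignIO as suggested above; it only relies on the Stockholm convention that each alignment starts with a "# STOCKHOLM 1.0" header and ends with a "//" line. The sequence names, sequences, subfamily IDs, and the helper split_stockholm are all invented for the example.

```perl
use strict;
use warnings;

# Hypothetical two-subfamily Stockholm text; names and sequences are made up.
my $stockholm = <<'STO';
# STOCKHOLM 1.0
#=GF ID subfamily_1
seq1  ACGT-ACGT
seq2  ACGTTACGT
//
# STOCKHOLM 1.0
#=GF ID subfamily_2
seq3  GGCC--AA
seq4  GGCCTTAA
//
STO

# Split the text into one record per alignment, using the "//" terminator line.
sub split_stockholm {
    my ($text) = @_;
    my @records;
    my $current = '';
    for my $line (split /\n/, $text) {
        $current .= "$line\n";
        if ($line =~ m{^//\s*$}) {
            push @records, $current;
            $current = '';
        }
    }
    return @records;
}

my @alignments = split_stockholm($stockholm);
printf "%d alignments found\n", scalar @alignments;
```

Each record could then be written to its own file, or the whole file kept as is if the downstream tool accepts several alignments in one input.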
17:26 pyrimidine JunY: so, back to my thoughts on seqs in the aln
17:27 pyrimidine each instance should have an instance-based setting for gaps, but maybe returning a generic global one
17:27 pyrimidine same for other symbols
17:28 pyrimidine so, if nothing is specified, we have a fallback
17:28 JunY the instance-based setting may be $self->{_gap_char}, which is currently used in the package
17:29 JunY but some of the methods don't read that, they just change it randomly whenever they want
17:29 pyrimidine right
17:29 JunY That is part of the plan... clean up the redundant methods
17:30 JunY and rewrite a few of them, let them read parameters from these internal settings, not just from random setting of their own methods
17:30 pyrimidine ok
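The "instance setting with a global fallback" idea might look roughly like the sketch below. My::Aln, $DEFAULT_GAP_CHAR, and count_gaps are hypothetical names chosen for the example; only the _gap_char hash key comes from the discussion. This is not how Bio::SimpleAlign is actually written.

```perl
use strict;
use warnings;

package My::Aln;   # hypothetical class, not part of BioPerl

our $DEFAULT_GAP_CHAR = '-';   # package-wide fallback

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->{_gap_char} = $args{gap_char} if defined $args{gap_char};
    return $self;
}

# Getter/setter: per-instance value if set, otherwise the global default.
sub gap_char {
    my ($self, $char) = @_;
    $self->{_gap_char} = $char if defined $char;
    return defined $self->{_gap_char} ? $self->{_gap_char} : $DEFAULT_GAP_CHAR;
}

# Methods read the setting through the accessor instead of hard-coding one.
sub count_gaps {
    my ($self, $seq) = @_;
    my $gap = quotemeta $self->gap_char;
    my $n = () = $seq =~ /$gap/g;
    return $n;
}

package main;

my $default = My::Aln->new;
my $dotted  = My::Aln->new(gap_char => '.');
print $default->count_gaps('AC--GT'), " ", $dotted->count_gaps('AC..G-'), "\n";
```

Routing every method through one accessor means a caller's gap character is never silently replaced by a method-local default.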
17:31 JunY This part is ok. The programming part is not hard. There are just a few new methods that need to be written. I am not worried about that.
17:31 JunY After the cleanup, we just need to rewrite the document a little bit. And make a good HOWTO on the wiki.
17:32 pyrimidine a howto would be really nice
17:32 JunY It is really necessary ... I guess no one will want to spend one week like me reading the code to understand what is really going on....
17:33 JunY What I am really worried about is the alignment oriented way of Bio::AlignIO...
17:33 pyrimidine vs assembly?
17:34 JunY As you know, we have so many alignment formats, which means we have to rewrite the read-in methods for all of them....
17:34 pyrimidine yes, but I would start very simply
17:35 JunY Yep, I would love to work on assembly after the cleanup of Bio::Align
17:35 pyrimidine let's say, for instance, we are just changing the class interface a little
17:35 pyrimidine class name changes, for instance
17:35 pyrimidine that's a simple fix
17:35 JunY yep
17:35 pyrimidine all the parsers do is spit back SimpleAligns
17:36 pyrimidine if they are doing much more than that, they're broken
17:36 pyrimidine (well, not broken, but doing too much)
17:37 JunY yep, that will be ok. We can just retire the inappropriate methods, and re-direct them to the new ones.
17:37 pyrimidine if you add deprecation warnings to those, they'll be easy enough to pick out
17:38 pyrimidine via tests
17:38 JunY e.g. $self->deprecated(); $self->newmethod(@arg);
17:38 JunY yes, that is what I mean
17:39 JunY This part is not a problem.
17:39 pyrimidine the Bio::Root::RootI deprecated method allows for version-specific 'blowups'
17:40 pyrimidine but defaults to immediate deprecation if used
17:40 pyrimidine $obj->deprecated(-message => "Method X is deprecated", -version => 1.007);
17:40 pyrimidine or simpler:
17:40 pyrimidine $obj->deprecated("Method X is deprecated",1.007);
17:41 JunY ok
17:41 pyrimidine allows for a deprecation cycle if needed
17:42 pyrimidine also has a -warn_version and -throw_version for better control
17:42 JunY yep, I have seen them. I will definitely check them later
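The "retire the old name and redirect it" pattern could be sketched as follows. The deprecated() method here is a simplified stand-in written for this example, with assumed semantics (warn before the given version, die once the running version reaches it); it is not the Bio::Root::RootI implementation. The class name, the $VERSION value, and remove_similar_seqs are likewise hypothetical.

```perl
use strict;
use warnings;

package My::Aln::Legacy;   # hypothetical class for illustration

our $VERSION = 1.006;      # assumed current release number

sub new { return bless {}, shift }

# Simplified stand-in for a RootI-style deprecated(): warn until the given
# version, then die once the running version has reached it.
sub deprecated {
    my ($self, $message, $version) = @_;
    die "$message\n" if defined $version && $VERSION >= $version;
    warn "$message\n";
    return;
}

# New, more descriptive method (hypothetical name and placeholder body).
sub remove_similar_seqs {
    my ($self, $threshold) = @_;
    return "removing sequences above $threshold similarity";
}

# Old name kept as a thin wrapper: flag it, then forward the arguments.
sub purge {
    my ($self, @args) = @_;
    $self->deprecated('purge() is deprecated, use remove_similar_seqs()', 1.007);
    return $self->remove_similar_seqs(@args);
}

package main;

my $aln = My::Aln::Legacy->new;
print $aln->purge(0.9), "\n";
```

Because the wrapper emits a warning on every call, a test suite that treats warnings as failures will point out each remaining caller of the old name.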
17:44 JunY So, if I am right, the plan is not to change the Bio::SimpleAlign too much. Everything will be based on the current methods, we just remove a few bad ones, and add a few new ones.
17:45 pyrimidine and any method that is utility-like go into another module
17:45 JunY Actually, I would like to retire the Bio::SimpleAlign, and move the methods to Bio::AlignI and Bio::Align::Utilities
17:45 JunY I mean Bio::Align::AlignI
17:45 pyrimidine any reason?
17:46 JunY No big reason. It is just Bio::Align::AlignI is a more decent name
17:46 pyrimidine ah, yes, but it's an interface
17:46 pyrimidine it's supposed to be abstract
17:46 pyrimidine (methods unimplemented)
17:47 pyrimidine so, what if we want to create a lightweight alignment class?
17:47 pyrimidine for instance, something attached to a database? Like a BAM file?
17:48 pyrimidine One could create a Bio::Align::AlignI that has the same methods but different implementation
17:48 pyrimidine and, if everything is the same, if I did
17:49 pyrimidine if ($aln->isa('Bio::Align::AlignI')) { ###} else { die }
17:49 pyrimidine I should be guaranteed that the ### would work
17:50 JunY ok
17:50 pyrimidine not as nice as checking a Role, but it'll do
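A small sketch of the interface idea: one abstract class that only names the methods, and two implementations with different storage that both pass the same isa check. All package names, num_seqs as the sole method, and the report helper are hypothetical; the real Bio::Align::AlignI defines many more methods.

```perl
use strict;
use warnings;

package My::AlignI;   # hypothetical interface: names the method, no behavior
sub new      { my $class = shift; return bless {@_}, $class }
sub num_seqs { die ref(shift) . " must implement num_seqs\n" }

package My::InMemoryAln;   # holds every sequence in memory
our @ISA = ('My::AlignI');
sub num_seqs { my $self = shift; return scalar @{ $self->{seqs} || [] } }

package My::IndexedAln;    # pretends to read the count from an on-disk index
our @ISA = ('My::AlignI');
sub num_seqs { my $self = shift; return $self->{indexed_count} }

package main;
use Scalar::Util qw(blessed);

# Accept anything that claims the interface, regardless of implementation.
sub report {
    my ($aln) = @_;
    die "not an alignment\n" unless blessed($aln) && $aln->isa('My::AlignI');
    return ref($aln) . ' holds ' . $aln->num_seqs . ' sequences';
}

print report(My::InMemoryAln->new(seqs => [qw(ACGT AC-T)])), "\n";
print report(My::IndexedAln->new(indexed_count => 1_000_000)), "\n";
```

The caller never learns whether the sequences live in memory or behind an index, which is what would make a lightweight database-backed class possible.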
17:50 pyrimidine The other thing we're somewhat hindered by
17:51 pyrimidine is object creation in Perl is expensive
17:52 pyrimidine mainly (in bioperl) this is b/c we are crawling up the inheritance hierarchy
17:52 pyrimidine via SUPER::new()
17:52 JunY yep, I saw that...
17:52 pyrimidine there are ways around that
17:53 pyrimidine (easiest is don't call SUPER::new)
17:53 pyrimidine but you have to account for lack of specific attributes needed further up the inheritance tree
17:54 pyrimidine *possible lack
17:54 pyrimidine i.e. verbosity settings, for instance
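Skipping SUPER::new might look like the sketch below: the subclass blesses its own hash and sets by hand the attribute the parent constructor would normally have supplied. My::Root, My::LightSeq, and the verbose attribute are stand-ins for illustration, not the actual BioPerl root classes.

```perl
use strict;
use warnings;

package My::Root;   # hypothetical base class
sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->{verbose} = defined $args{verbose} ? $args{verbose} : 0;
    return $self;
}
sub verbose { return $_[0]{verbose} }

package My::LightSeq;   # hypothetical lightweight subclass
our @ISA = ('My::Root');
sub new {
    my ($class, %args) = @_;
    # Bless directly instead of calling SUPER::new, and set by hand the
    # attribute the parent constructor would otherwise have provided.
    return bless { seq => $args{seq}, verbose => 0 }, $class;
}

package main;

my $seq = My::LightSeq->new(seq => 'ACGT');
print $seq->verbose, "\n";
```

The trade-off is that any attribute later added to the parent constructor has to be mirrored in the subclass, or inherited methods may see it as undefined.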
17:55 JunY yep
17:55 cawiss joined #bioperl
17:57 JunY I think I have roughly got the idea...
17:58 JunY The other thing is, shall we create some methods for accessing database/online alignment files?
17:58 JunY These can be new modules in Bio::AlignIO
17:59 JunY and, is there any module in Bio::AlignIO that needs special attention?
17:59 JunY For example, stockholm format?
18:00 pyrimidine yes
18:01 pyrimidine well...
18:01 pyrimidine it makes more sense to me to have a Bio::DB::* interface for that
18:04 JunY Ok, I see
18:04 spekki01 pyrimidine: you wouldnt happen to have a file in stockholm format that has multiple alignments in it so i can get an idea of how it should look to make sure the one im trying to create makes sense.
18:08 JunY Ok, I will think about the new structure of the Bio::SimpleAlign. There are just a few things left.
18:09 JunY Do I need to make an official report? Or write a report on my blog?
18:11 pyrimidine the blog report should be official enough for now.
18:17 JunY Ok, I will try. Hopefully, you will see the report next week.
18:18 JunY And the first milestone can be to implement the new methods I proposed. And move some of them to Bio::Align::Utilities
18:19 JunY Hopefully, we can finish the work on Bio::Align before the midterm assessment. And continue with Bio::Assembly after early July...
18:19 pyrimidine sounds like a good plan
18:20 JunY I cannot be too ambitious, since it is the first time I work on this community coding work. But we will see whether my plan works or not~
18:20 pyrimidine right
18:20 pyrimidine re: very large alignments, try to keep in mind how these might be implemented
18:21 pyrimidine as I mentioned before, some of the methods require all seqs in memory, which isn't efficient
18:21 JunY Yep, I will. I am thinking about making some methods that can deal with alignments of up to 2Gb
18:21 pyrimidine ok
18:21 JunY Definitely
18:22 pyrimidine gotta go, $job calls
18:22 JunY We can implement the new method, maybe next_locatable_aln and next_locatable_seq on a few alignment formats
18:22 pyrimidine yes
18:22 JunY Ok, see u later~
18:22 JunY good talk to u today :D
18:23 pyrimidine will be on here, may not immediately respond, but post and I can read the backlog
18:23 pyrimidine yes, good talk
18:23 pyrimidine hopefully mark will join in once things calm down at his new $job
18:23 JunY ok i need to go home for dinner... so hungry...
18:24 JunY That will be great ... welcome on board :D
18:24 pyrimidine bye!
18:24 JunY Bye~
19:12 spekki01 Off topic but another question about alignments, do the multiple sequences that make up an alignment have to be in some special format or can they just be any sequences?
19:12 spekki01 like the same length or something?
19:15 rbuels spekki01: nope, any sequences
19:15 rbuels spekki01: that's one of the things that makes alignment algorithms Hard
19:15 spekki01 ah ok
19:16 rbuels spekki01: for sequences with different lengths, an alignment is usually expected to say what parts are inserted and deleted relative to the others
19:17 rbuels spekki01: or another way to say that is that it makes 'gaps'
19:17 rbuels spekki01: or that it gaps the sequences
19:18 spekki01 alright, that makes sense now: whenever i look at an alignment there's always ....... between certain parts of the sequences that make up the alignment
19:18 spekki01 im assuming those are the gaps it makes
19:18 rbuels probably
20:20 vinanna joined #bioperl
20:41 bag_ joined #bioperl
21:04 bag_ joined #bioperl
21:13 pyrimidine hmm?
21:13 pyrimidine oops, IRCFAIL
22:22 deafferret ?
22:24 rbuels deafferret: his irc failed, obviously.
22:24 * rbuels snorts
22:25 was kicked by deafferret: like this?
22:32 deafferret hmm, nope.   "has quit" ne "was kicked"   curious.
22:33 rbuels joined #bioperl
22:33 rbuels you sure are asking for it aren't you
22:33 * deafferret bends over to pick up the ice cream cone
22:34 rbuels no time for your shenanigans, i am helping reproduce bugs in #git
22:37 deafferret i just reproduced a bug in #catalyst   :p
22:37 deafferret Catalyst-Runtime/5.80/trunk/t/aggregate/unit_core_ctx_attr.t   says :p  too