IRC log for #bioperl, 2010-10-21

All times shown according to UTC.

Time Nick Message
02:56 dnewkirk joined #bioperl
03:37 mzgrideng left #bioperl
07:07 vtpoe joined #bioperl
07:46 vtpoe left #bioperl
07:53 arcsine left #bioperl
07:53 svaksha left #bioperl
07:53 vinnana left #bioperl
07:55 svaksha joined #bioperl
07:55 vinnana joined #bioperl
07:56 arcsine joined #bioperl
10:34 philsf joined #bioperl
10:37 svaksha left #bioperl
10:38 svaksha joined #bioperl
11:19 arcsine left #bioperl
11:31 carandraug joined #bioperl
11:57 brandi1 joined #bioperl
12:48 brandi1 left #bioperl
13:20 arcsine joined #bioperl
13:39 flu left #bioperl
14:24 mzgrideng joined #bioperl
14:50 philsf left #bioperl
15:20 flu joined #bioperl
15:26 mzgrideng left #bioperl
15:47 mzgrideng_ joined #bioperl
17:04 jearl joined #bioperl
17:05 jearl Hello all, please bear with me, I'm pretty new to irc (I'm actually going through webchat.freenode.net)
17:06 jearl I have a question though, I'm working on a project where we have sequenced several genomes of the same species
17:06 jearl for two of the strains we have closed the genome (1 contig), however the others are still split into many contigs, to varying degrees
17:07 jearl I'd like to use the closed genomes as reference for aligning the contigs for the rest of the strains, and I can do that pretty easily using Mauve (with varying levels of success)
17:10 jearl however, while Mauve can read genbank files, after you use the utility in the program to rearrange a contig file to a reference, all the annotations are lost.  It will output an aligned file, and a file which lists which contigs were rearranged to where, and if they were reverse complemented, but not the proteins
17:11 jearl So, what I really need is a program that will take in a list of contig names, a series of genbank records, and re-make the genbank records to be reverse complemented or simply change the order of where it appears in the genbank file.
17:13 jearl Actually, I can program all that stuff, what I really want to know is if there is a program out there that will take a genbank record (i.e. of a contig with several annotated proteins) and reverse comp not just the contig sequence, but also the protein annotations
17:13 jearl Or a quick way to do that in bioperl ;-)
17:16 deafferret jearl: howdy   :)
17:18 deafferret i'm not sure what you mean by "reverse comp". What format are your annotations in?
17:19 deafferret sounds like a chunk of work
17:20 deafferret if it's possible to map it manually then you can write BioPerl to do it for you. Have you figured out (manually) how the first one maps?
17:23 jearl sorry, reverse complement
17:24 jearl The format I'm most interested in changing is a single genbank file, which consists of several records.  Each record is a contig, and each contig can have hundreds (or even thousands) of protein annotations
17:26 jearl each annotation also has a location on the contig associated with it... so while I could go through and change each protein annotation to a reverse complement of itself, the location on the contig would no longer be correct
17:26 jearl all the programs I've run into that will create a reverse complement of a genbank file only do it for the contig part of the record, not the annotations.
17:28 jearl I have actually done this kind of "by hand" to a fasta file, just to be able to rearrange the contigs themselves. Although, I did use a bioperl script to reverse complement the contigs from the fasta file
17:29 jearl I think it's definitely possible in bioperl (though, I'm pretty new to bioperl, I write quite a bit of my own parsers etc in perl, and only recently decided that it was time to stop re-inventing the wheel on a bunch of stuff)
17:30 jearl my basic idea of how to do this is to:
17:31 jearl Read in orientation, reverse complement information from Mauve text output file.
17:31 ank joined #bioperl
17:31 jearl Read in genbank file with all contigs annotations.
17:32 jearl Parse genbank file, and split into separate records (hopefully with a bioperl object that holds both the contig sequence, and its annotations)
17:34 jearl For each genbank record that needs to be reverse complemented, do that and keep track of the locations of each protein annotation
17:35 jearl reverse complement each protein annotation, and change the locations to accommodate the reverse complement of the contig sequence.
17:36 jearl But that would take some time, and it would be way cooler if someone had just already written something that could reverse complement an entire genbank record, contig protein annotations and all.
17:39 deafferret well, bioperl does all of that for you except flopping the annotation location (i don't know that it does that anyway)
17:40 deafferret are the annotations at specific 1-base locations? or are they complex positions   (100..159,180..220)?
17:41 deafferret btw, in BioPerl lingo these are features not annotations   http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#The_Basics
17:41 deafferret is it just new_location =
17:42 deafferret new_location = length - old_location  ?
17:42 deafferret and flop the strand
17:49 jearl Well, the short answer about the complex position question is I'm not sure.  Probably not for this project.  In general I'm only looking at bacterial sequences, so in general I don't have to worry about intron-exons for the most part.
17:51 jearl of course, as it usually is in biology, that's just generally true, there are definitely exceptions.  I suppose I could just keep track of the number that have complex locations, and if it's reasonably small, just do those by hand, or remove them or something.
17:53 deafferret k. then quite doable unless I'm missing things? sounds like you've already got the steps 95% figured out?
17:54 jearl I believe that in general the location will follow your above formula.  for a toy example a protein on a contig of size 10, location (2..5)  would actually be from (8..5).  Although off the top of my head I can't remember if genbank reverses the numbers when a gene is reverse complement to the arrangement of the contig
17:55 jearl Oh yes, I'm just really lazy, and wanted to see if anyone else already had a function that could just reverse complement a genbank record ;-)
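The toy example above (a contig of length 10 with a feature at 2..5) can be checked in plain Perl. This is only a sketch under stated assumptions: the contig string and the helper names `revcom` and `flip_location` are made up for illustration and are not BioPerl calls. With 1-based, inclusive coordinates the flip works out to new_start = length - old_end + 1 and new_end = length - old_start + 1, so 2..5 lands at 6..9 on the opposite strand rather than 8..5.

```perl
use strict;
use warnings;

# Reverse complement a DNA string (plain Perl, no BioPerl needed).
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
    return $rc;
}

# Flip a 1-based, inclusive location onto the reverse-complemented contig.
# Start stays <= end; the strand sign carries the orientation.
sub flip_location {
    my ($start, $end, $strand, $length) = @_;
    return ($length - $end + 1, $length - $start + 1, -$strand);
}

my $contig = 'AACCGGTTAC';            # hypothetical 10 bp contig
my ($start, $end, $strand) = (2, 5, 1);

my $flipped = revcom($contig);
my ($new_start, $new_end, $new_strand) =
    flip_location($start, $end, $strand, length $contig);

# The feature's bases on the original contig ...
my $before = substr($contig, $start - 1, $end - $start + 1);
# ... and the same stretch on the flipped contig, read from the minus strand.
my $after = revcom(substr($flipped, $new_start - 1, $new_end - $new_start + 1));

print "old: $start..$end ($strand)  new: $new_start..$new_end ($new_strand)\n";
print "feature sequence preserved: ", ($before eq $after ? 'yes' : 'no'), "\n";
```

Comparing `$before` and `$after` is a cheap sanity check that the new coordinates still point at the same bases.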
18:02 pyrimidine joined #bioperl
18:02 jearl oooohhhh yes, that feature-annotation link is very helpful.
18:02 jearl I think I was messing up sequence/feature... I need to take a closer look at that
18:06 pyrimidine jearl: locations in GenBank and in BioPerl always have start <= end.  strand is key in this case.
18:16 jearl yeah... I just have to figure what in bioperl refers to the strand, and what refers to the protein
18:17 jearl ah, which it appears you can get at via features a la my @features=$seqobj->all_SeqFeatures();
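For the original question (reverse complementing a record together with its features), BioPerl's Bio::SeqUtils module documents a `revcom_with_features` method that may already do most of the work. The sketch below assumes a BioPerl release that ships that method and uses a made-up 10 bp contig with a single CDS feature; real records would be read with Bio::SeqIO in genbank format instead of being built by hand.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::SeqUtils;

# Hypothetical contig with one CDS feature at 2..5 on the plus strand.
my $contig = Bio::Seq->new(
    -id       => 'contig1',
    -seq      => 'AACCGGTTAC',
    -alphabet => 'dna',
);
$contig->add_SeqFeature(
    Bio::SeqFeature::Generic->new(
        -primary_tag => 'CDS',
        -start       => 2,
        -end         => 5,
        -strand      => 1,
    )
);

# Reverse complement the sequence and carry the features along.
my $flipped = Bio::SeqUtils->revcom_with_features($contig);

# Report where each feature ended up on the flipped contig.
for my $feat ($flipped->get_SeqFeatures) {
    printf "%s %d..%d strand %d\n",
        $feat->primary_tag, $feat->start, $feat->end, $feat->strand;
}
```

Features with split locations (joins) are worth spot-checking by hand after the flip, since this sketch only exercises a simple start..end location.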
18:37 jearl left #bioperl
18:38 deafferret another satisfied customer   :)
18:39 perl_splut we can't have that... they must leave here hungry and unsatisfied...
18:39 deafferret perl_splut: are you hungry? are you satisfied?
18:39 deafferret /kick perl_splut lemme help!
18:44 jearl joined #bioperl
18:47 perl_splut :)
18:53 deafferret jearl: welcome back! we thought we lost you
18:55 jearl oh no, just had to restart the browser for an update
18:55 jearl I am getting some strange errors using bioperl though.
18:55 jearl I tried to follow the example here: http://www.bioperl.org/wiki/Features_vs._Annotations
18:56 jearl the very first one, and it runs, but I get errors like this:
18:56 jearl Subroutine new redefined at C:/strawberry/perl/site/lib/Bio\Location\Simple.pm line 93, <GEN0> line 12.
18:56 deafferret thats a warning, not an error. are you on bioperl-live?
18:58 jearl right, the program will still run, but warns about several .pm files in Bio\Location
18:58 jearl I'm using eh... I'm not sure what version of bioperl this is...
18:59 pyrimidine jearl: if you are on Windows, don't use the -w flag in the shebang line
18:59 deafferret I always use bioperl-live out of github  -shrug-
18:59 pyrimidine that seems to be an odd persistent problem only on Windows perl builds
19:04 jearl Hmm, I often have trouble getting bioperl to install, and it usually takes me a couple of tries.  I am currently running this on a windows machine, but I'll run it in linux and see if I get similar errors.
19:04 jearl I actually always remove the '-w' from the shebang... I'm a big proponent of 'use warnings;' instead
19:08 deafferret I rarely "install" BioPerl. I tend to git clone into my @INC
19:08 deafferret done and done   :)
19:10 jearl hmm, well if you don't have to worry about all the nonsense that I usually go through to get it, that sounds way better ;-)
19:11 jearl ok, all the warnings don't show up if I run it in linux.  But there still are a couple
19:12 jearl Use of uninitialized value in concatenation (.) or string at rearrange_gbk_from_Mauve.pl line 19, <GEN0> line 211.
19:13 jearl and in that example it's referring to the "$temp2[0]" item (when I remove that from the print statement, it works)
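The script behind this warning is not shown in the log, so the cause can only be guessed. One common way to get "Use of uninitialized value in concatenation" is interpolating an array element that was never filled, for example when a split produces no fields. The sketch below uses a hypothetical empty input line and borrows the `@temp2` name from the message above; it guards the element with the defined-or operator (Perl 5.10 or later).

```perl
use strict;
use warnings;

# Hypothetical stand-in for an input line that yields no fields.
my $line  = '';
my @temp2 = split /\t/, $line;

# Interpolating $temp2[0] directly would warn here because it is undef.
# Defined-or substitutes a visible placeholder instead.
my $first = $temp2[0] // 'NA';
print "first field: $first\n";
```

Printing a placeholder rather than silently dropping the value makes it easier to spot which records lack the expected field.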
19:17 deafferret http://bioperl.org/wiki/IRC#Getting_help | http://www.bioperl.org/wiki/Using_Git | nopaste to gist.github.com
19:18 Topic for #bioperl is now http://bioperl.org/wiki/IRC#Getting_help | http://www.bioperl.org/wiki/Using_Git | nopaste to gist.github.com
19:18 deafferret jearl: if you show us your code we might have a fighting chance of knowing what you're talking about.  :)   gist.github.com
19:21 jearl I'm just running the example code from the earlier link I sent, at the top of the page (the first example). It is here: http://www.bioperl.org/wiki/Features_vs._Annotations.  However, I need to read the rest of the page, since the post is about the problems associated with this code...
19:26 jearl Ah.  So that code seems more like a way of *not* doing what I want.
19:51 carandraug left #bioperl
21:00 mzgrideng_ left #bioperl
21:17 pyrimidine left #bioperl
21:40 mzgrideng joined #bioperl
21:40 mzgrideng left #bioperl
22:00 deafferret ?  :)
22:26 * dnewkirk falls asleep at his desk
22:44 rbuels dnewkirk: if you need some work to do, i can give you some
22:44 * rbuels chuckles
23:05 dnewkirk Oh believe me, I have work to do. I just haven't slept well in weeks.
23:07 dnewkirk Debugging code isn't always invigorating
23:08 perl_splut yep, it isn't