
IRC log for #bioperl, 2010-11-04

All times shown according to UTC.

Time Nick Message
00:25 kyanardag_ joined #bioperl
00:56 philsf joined #bioperl
00:56 philsf left #bioperl
00:56 philsf joined #bioperl
01:57 philsf left #bioperl
03:36 dukeleto left #bioperl
03:36 dukeleto joined #bioperl
04:28 dukeleto left #bioperl
04:28 dukeleto joined #bioperl
04:41 svaksha left #bioperl
04:41 svaksha joined #bioperl
06:26 bag__ joined #bioperl
07:45 bag__ left #bioperl
09:30 BeyBar joined #bioperl
10:11 kai hi folks
10:13 kai can I call _pushback twice?
10:29 BeyBar left #bioperl
11:23 kai now for the more interesting question... how do I get the first two lines of the input before Bio::SearchIO->_initialize_io has run
11:37 carandraug joined #bioperl
12:13 brandi1 joined #bioperl
12:13 brandi1 left #bioperl
12:23 carandraug left #bioperl
12:25 carandraug joined #bioperl
12:47 raj joined #bioperl
12:49 raj I have to write an ORF finder (ie re-invent the wheel) to parse a 1.2 million nuc. sequence FASTA file
12:50 raj and I'm getting several thousand 'hits' back
12:50 raj is there any online resource that can tell me how many I *should* be getting?
12:54 philsf joined #bioperl
12:54 philsf left #bioperl
12:54 philsf joined #bioperl
13:05 philsf anyone know how I can set Bioperl as a dependency for a module in ExtUtils::MakeMaker? Using Bio::Perl, or the modules I actually use (Bio::Seq and SeqIO), or something else?
13:24 kai raj: er, been there, but on the phone right now, will get back to you in a bit
13:41 raj kai: ok, on standby :)
13:48 kai raj: ok, so basically the number of ORFs depends a bit on the organism you have
13:49 raj it's a theoretical bacterial DNA seq, assume start with ATG & standard stop TAA, TGA or TAG
13:49 kai ok, so it's not a real sequence? :)
13:49 raj I'm just interested to know if there is a *definitive* answer to this?
13:49 raj no I don't think it's real
13:50 raj it's an exercise and they didn't say where it came from
13:50 kai you can only play the probabilities
13:51 kai you could grab a couple of bacterial genomes from NCBI and see what the average length of the ORFs is there
13:51 raj ok, so no definitive answer then, just a probable range ?
13:51 kai that'd allow you to guesstimate the number of ORFs you can fit on a 1.2Mbp genome
13:52 raj right, good idea - I'm getting about 7000 unique seqs
13:52 raj ie ORFS
13:52 kai that sounds a bit high
13:52 kai I'm working with actinomycetes, which have ~7k-8k ORFs in their ~8-9 Mbp genome
13:52 raj ooh, right i see
13:52 kai but that's about eight times the genome size you have
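
The guesstimate kai describes amounts to scaling gene density from a reference genome. A minimal sketch in Perl, assuming midpoints of the ranges quoted above (7,500 ORFs in 8.5 Mbp); the function name `estimate_orf_count` is hypothetical and the figures are illustrative, not measured values:

```perl
use strict;
use warnings;

# Rough ORF-count estimate: take the ORF density of a reference genome
# and apply it to the target genome size. Assumes density is comparable
# between the two organisms, which is only a first approximation.
sub estimate_orf_count {
    my ($ref_orfs, $ref_genome_bp, $target_genome_bp) = @_;
    my $orfs_per_bp = $ref_orfs / $ref_genome_bp;
    return $orfs_per_bp * $target_genome_bp;
}

# Midpoints of "~7k-8k ORFs in ~8-9 Mbp", applied to a 1.2 Mbp sequence.
my $estimate = estimate_orf_count(7_500, 8_500_000, 1_200_000);
printf "Expected ORFs in 1.2 Mbp: roughly %.0f\n", $estimate;
```

With these inputs the estimate lands near a thousand ORFs, which is why a count of several thousand looks high.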
13:52 raj yep
13:53 raj mind you, i've not applied a min length yet
13:53 kai ah, ok
13:53 raj well, actually 6
13:53 raj which is too low for 'sensible' peptides
13:53 kai 6 bp? i.e. start+stop? :)
13:54 raj no, i'm just capturing nuc's between atg & stop
13:54 kai well, still a bit on the short side :)
13:55 raj i notice other online ORF finders always return M as the start amino-acid, so
13:55 kai that's because ATG is M
13:55 raj I assume the correct thing is to return the starting ATG ?
13:55 kai sure
13:55 raj yeah, but is it not only the start, but also translated into M ?
13:55 raj in the real world?
13:55 kai ys
13:56 kai yes..
13:56 raj so (almost) all bacterial peptides begin with Methionine ?
13:56 kai in the real world, there's alternative start codons as well
13:57 raj oh, yes of course - the exercise said to *assume* all start with ATG for simplicity
13:57 kai and you can have posttranslational modifications that cut off the M
13:57 raj right
13:58 kai but in general, yes, most proteins start with Methionine
13:58 raj ok, just to extend my exercise, what nucleotide cut-off would you recommend and I'll re-run it ?
13:59 raj for comparison with your actinomycetes genome
13:59 kai the ORF finder I wrote used 40 AAs as the lower limit
14:00 raj ok, thanks
14:00 kai that makes me miss some small hypothetical proteins, but not very many
14:00 raj 3851 unique sequences
14:00 kai also, what you'll want to do is to run your orf finder on some organisms that have a similar GC content as your target sequence and see how you compare :)
14:01 kai not sure if that's really needed if you just need the ORF finder for a class exercise
14:02 kai there's a couple of corner cases you'll want to make sure to handle
14:02 kai like overlapping ORFs
14:02 raj right
14:02 kai usually an overlap (especially not in-frame) of start/stop codons is ok
14:02 kai you'll see that in many bacterial operons
14:05 kai what's the design goal of your ORF finder? how accurate do you need it to be? how fast does it need to be?
14:06 kai if you need accurate, you might want to overpredict ORFs and then use extra steps to discard unlikely ORFs
14:07 raj predict likely peptide seq's to find best digestive enzyme for analysis in mass spec
14:08 raj probably doesn't need to be highly accurate, or fast
14:08 raj but it is coz I've written it in perl :)
14:08 kai for GC-rich organisms, you can make use of the GC content of the third codon position
14:09 kai but that's probably unreliable if your GC content is below ~65%
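
The third-codon-position GC content kai mentions (often called GC3) can be computed directly from an in-frame candidate ORF. A minimal sketch, assuming the sequence starts at the first base of a codon; the function name `gc3_fraction` is hypothetical, and how the value would be thresholded to accept or reject an ORF is not specified in the discussion:

```perl
use strict;
use warnings;

# Fraction of G or C at the third position of each complete codon.
# Returns undef if the sequence is shorter than one codon.
sub gc3_fraction {
    my ($cds) = @_;
    $cds = uc $cds;
    my ($gc, $codons) = (0, 0);
    for (my $i = 0; $i + 3 <= length($cds); $i += 3) {
        my $third = substr($cds, $i + 2, 1);
        $gc++ if $third eq 'G' || $third eq 'C';
        $codons++;
    }
    return $codons ? $gc / $codons : undef;
}

# Toy coding sequence: ATG GCC GAG CTG TAA
printf "GC3 = %.2f\n", gc3_fraction('ATGGCCGAGCTGTAA');
```

In a GC-rich genome, real coding frames tend to show a markedly higher GC3 than the other two frames, which is what makes the measure usable as a filter.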
14:09 raj oh, just noticed you said 40 aa's not nuc's, hold on ....
14:10 raj 120 nucleotide cut-off gives me 1048 ORFs
14:10 raj which is roughly in line with your actinomycetes figure
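
The kind of ORF finder discussed here can be sketched in a few lines of Perl. This is not raj's or kai's code; it is a minimal illustration under the exercise's stated simplifications (ATG as the only start, TAA/TAG/TGA as stops) plus some assumptions of its own: both strands are scanned, the first ATG after the previous stop is used, length is counted in nucleotides excluding the stop codon, and ORFs that run off the end of the sequence are ignored. All function names are hypothetical.

```perl
use strict;
use warnings;

my %STOP = map { $_ => 1 } qw(TAA TAG TGA);

# Reverse-complement a DNA string (unambiguous bases only).
sub revcomp {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Scan the three reading frames of one strand. An ORF opens at the first
# ATG after the previous stop and closes at the next in-frame stop.
sub find_orfs_one_strand {
    my ($seq, $min_nt) = @_;
    $seq = uc $seq;
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;    # 0-based position of the currently open ATG, if any
        for (my $i = $frame; $i + 3 <= length($seq); $i += 3) {
            my $codon = substr($seq, $i, 3);
            if (!defined $start) {
                $start = $i if $codon eq 'ATG';
            }
            elsif ($STOP{$codon}) {
                my $len = $i - $start;    # nucleotides, stop excluded
                push @orfs, { start => $start, frame => $frame,
                              seq   => substr($seq, $start, $len) }
                    if $len >= $min_nt;
                undef $start;
            }
        }
    }
    return @orfs;
}

# Both strands; coordinates on '-' refer to the reverse-complemented string.
sub find_orfs {
    my ($seq, $min_nt) = @_;
    my @all = (
        (map { +{ %$_, strand => '+' } } find_orfs_one_strand($seq, $min_nt)),
        (map { +{ %$_, strand => '-' } } find_orfs_one_strand(revcomp($seq), $min_nt)),
    );
    return @all;
}

# Toy input with a low cut-off; a realistic cut-off would be 120 nt (40 aa).
my @orfs = find_orfs('CCATGAAATTTGGGTAACC', 6);
printf "%s strand, frame %d: %s\n", @{$_}{qw(strand frame seq)} for @orfs;
```

Raising `$min_nt` from 6 to 120 is the step that took the count in the discussion from several thousand down to about a thousand. Overlapping ORFs in different frames are all reported; discarding unlikely ones would be a separate filtering step.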
14:13 kai yeah, not too bad
14:13 kai and it's still plenty above the number of "minimal required genes" of e.g. Bacillus subtilis
14:14 kai which seems to be roughly 300
14:14 raj ok, thanks for all that - very helpful
14:15 kai no problem :)
14:16 kai it's more fun than trying to merge the hmmer2 and hmmer3 parsers ;)
14:16 deejoe this has been a fun read, thanks
14:18 raj deejoe: heh - probably very basic biology, but that's my level at the moment :)
14:18 kai philsf: go for the modules you need
14:20 philsf kai, wouldn't it make automated systems look for those modules instead of the bioperl package?
14:21 kai philsf: depends on the automated system
14:22 kai philsf: e.g. the packaging scripts for fedora and opensuse have scripts that check which RPM provides a specific module. if you depend on a module, the corresponding RPM will be installed
14:22 kai debian has a similar tool that packagers can run
14:23 kai also, BioPerl has plans to split up into smaller parts that are useful standalone
14:23 kai depending on the specific modules future-proofs your checks
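
Following kai's advice, a Makefile.PL would list the individual modules rather than the BioPerl distribution. A minimal sketch using the standard `PREREQ_PM` key of ExtUtils::MakeMaker; the distribution name `My::BioTool` and its version are hypothetical placeholders:

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'My::BioTool',    # hypothetical distribution name
    VERSION   => '0.01',           # placeholder version
    PREREQ_PM => {
        # Depend on the modules the code actually loads.
        # A value of 0 means "any version".
        'Bio::Seq'   => 0,
        'Bio::SeqIO' => 0,
    },
);
```

CPAN clients resolve each listed module to whichever distribution provides it, so this keeps working if BioPerl is later split into smaller distributions.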
14:38 kai pyrimidine: when you wake up, I've got a question about this hmmer parser merge :)
14:39 kai pyrimidine: my OO-Perl-fu is too weak to figure out how to correctly initialize the "merged parser" that I then use to decide which real parser to call
14:42 dnewkirk joined #bioperl
15:06 kyanardag_ left #bioperl
15:17 philsf kai, I intend to use dh-make-perl to create deb packages, but it can't (AFAICT) detect that it needs Bioperl - instead it requires nonexistent packages that follow the normal naming convention (like libbio-seq-perl, for example)
15:24 philsf in other words, it thinks Bio::Seq is available as a standalone module, as a debian package
15:27 kai it does?
15:28 kai kai@mikropc7:~$ dh-make-perl --locate Bio::SearchIO::hmmer
15:28 kai Using cached Contents from Wed Oct 13 09:43:19 2010
15:28 kai Bio::SearchIO::hmmer is in science/bioperl package
15:29 kai doesn't look like it to me
15:30 philsf kai, hmmm, that's what happened in my tests
15:31 kyanardag joined #bioperl
15:31 philsf but I still didn't finish the pre-reqs definitions for MakeMaker, this is probably why
15:40 philsf left #bioperl
15:41 philsf joined #bioperl
16:07 dbolser raj: what about information content on 'genome' ?
16:07 dbolser coding regions = less random
16:07 dbolser also phylogeny
16:09 raj dbolser: no idea - it's just a file full of a,t,c & g to me
16:11 dbolser raj: google sequence complexity ;-)
16:11 dbolser also, coding sequence prediction
16:12 dbolser also, blast it all ;-)
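
One common way to put a number on "sequence complexity" is the Shannon entropy of the k-mer distribution in a window. dbolser does not name a specific measure, so this is only one possible reading; the function name `kmer_entropy` is hypothetical:

```perl
use strict;
use warnings;

# Shannon entropy (in bits) of the overlapping k-mer distribution.
# Returns 0 for sequences shorter than k.
sub kmer_entropy {
    my ($seq, $k) = @_;
    $seq = uc $seq;
    my %count;
    my $total = 0;
    for my $i (0 .. length($seq) - $k) {
        $count{ substr($seq, $i, $k) }++;
        $total++;
    }
    my $h = 0;
    for my $n (values %count) {
        my $p = $n / $total;
        $h -= $p * log($p) / log(2);
    }
    return $h;
}

printf "homopolymer: %.2f bits\n", kmer_entropy('AAAAAAAA', 1);
printf "even mix:    %.2f bits\n", kmer_entropy('ACGTACGT', 1);
```

For single bases the value ranges from 0 bits (one base only) to 2 bits (all four bases equally frequent). Computing it over sliding windows and comparing regions would be the next step.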
16:12 raj ok, on today's todo list :)
16:13 dbolser where are you at?
16:13 dbolser also see #bioinformatics
16:17 * deafferret phylogenizes dbolser
16:34 raj dbolser: where am I at ?
16:36 raj left #bioperl
16:36 raj_ joined #bioperl
16:40 philsf left #bioperl
17:06 dnewkirk Today is turning out really well. Something must be wrong :)
17:07 deafferret dnewkirk: you must not be fighting javascript today  :)
17:12 dnewkirk Indeed. I never need javascript, only C/Python/Perl/CUDA
17:13 deafferret what's CUDA?
17:13 dnewkirk GPU compute framework for Nvidia
18:04 pyrimidine kai: awake :)
18:27 dbolser raj_: I don't understand the question
18:27 dbolser also, look for long regions of peptide with no stops
18:27 dbolser you may identify alternative stop codons that code for selenocysteine
18:42 raj_ dbolser: it was just in response to your "where are you at?" - was it for me ?
18:43 dbolser raj_: oh right, yeah
18:43 dbolser you seem to be doing bifx homework ;-)
18:43 dbolser I wondered at what school
18:46 raj_ bifx ??
18:47 raj_ last time I did homework in the formal sense of the word was not long after Watson & Crick cracked the code :)
18:53 raj_ ahh, bifx == bioinformatics :)
19:17 kblin pyrimidine: hey there
19:18 kblin pyrimidine: so, my idea of merging the hmmer2 and hmmer3 parsers was the following:
19:18 kblin pyrimidine: I moved the hmmer2 parser to Bio::SearchIO::hmmer2
19:18 kblin then
19:21 kblin then I added a new Bio::SearchIO::hmmer class that inherits from Bio::SearchIO as well, overriding new(), looking at the input file's first two lines. If the second line matches m/^HMMER 2/ then it's hmmer2, so return Bio::SearchIO::hmmer2->new(), if it matches m/^# HMMER 3/ then return Bio::SearchIO::hmmer3->new()
19:22 kblin the plan worked up to the moment I realized that handling the IO setup happens in Bio::SearchIO::_initialize, which won't be called for Bio::SearchIO::hmmer or it'd end up being called twice..
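
The format sniffing kblin describes can be done without consuming input if the handle is seekable. A minimal sketch in plain Perl, independent of BioPerl's IO layer: the function name `detect_hmmer_version` is hypothetical, the two regexes are the ones quoted above, and the sample headers are abbreviated stand-ins rather than real program output.

```perl
use strict;
use warnings;

# Guess the HMMER major version from the first lines of a report, then
# rewind so a real parser still sees the whole input.
# Assumes a seekable handle (a file, not a pipe or socket).
sub detect_hmmer_version {
    my ($fh, $max_lines) = @_;
    $max_lines //= 2;
    my $pos = tell($fh);
    my $version;
    for (1 .. $max_lines) {
        my $line = <$fh>;
        last unless defined $line;
        if    ($line =~ /^HMMER 2/)   { $version = 2; last }
        elsif ($line =~ /^# HMMER 3/) { $version = 3; last }
    }
    seek($fh, $pos, 0) or die "cannot rewind input: $!";
    return $version;    # undef if neither header was found
}

# In-memory handle standing in for a report file.
my $report = "# hmmscan :: illustrative header\n# HMMER 3.0\n";
open(my $fh, '<', \$report) or die "cannot open in-memory handle: $!";
my $version = detect_hmmer_version($fh);
print "detected HMMER major version: ", $version // 'unknown', "\n";
```

For non-seekable input the lines would have to be read and then handed back to the parser some other way, which is where a pushback buffer comes in.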
19:49 rbuels bifx?
19:50 * rbuels looks askance at dbolser
19:51 deejoe bi[oin]f[ormati]x
19:52 deejoe or b12s ;-) for short
20:04 deafferret rbuels: picture of skances last longer  :(
20:04 deafferret pictures
20:16 pyrimidine pitures, as we say in Texas
20:16 deafferret pyrimidine: you in TX now?
20:16 pyrimidine no, just from there
20:17 deafferret ah. get that webcam fixed yet?  :)
20:18 pyrimidine nope, and I can't even find the link.  they prettied up the home page: http://www.igb.uiuc.edu/
20:18 deafferret you're hot for lisa stubbs?  ;)
20:19 pyrimidine actually, I do some work for her
20:20 deafferret you need a homepage.  Can't click on your name  :)
20:21 pyrimidine deafferret: not enough tuits these days
20:21 pyrimidine man I look old in that pic
20:21 deafferret huh. mouse-over bar very touchy in firefox
20:22 pyrimidine yes, I think they just switched over, wouldn't be surprised if there are still some things that need tweaking
20:26 deafferret Gorillaz++
20:27 deafferret this is a rather repetitive playlist tho  :/
20:28 deafferret woot! a freaking week later I got this damn calorie counter working  :)
20:31 carandraug left #bioperl
20:31 pyrimidine kblin: IIRC, Bio::SearchIO->new(-format => 'foo') dynamically loads the module 'Bio::SearchIO::foo', calls new(), then initializes the IO.  So, you could override both new and _initialize to suit your purposes, though it might be a little tricky.
20:31 pyrimidine $writer = $output_module->new(@args);
20:32 pyrimidine sorry, wrong line.  This is the one: return "Bio::SearchIO::${format}"->new(@args);
21:38 pyrimidine left #bioperl
22:10 kblin durn
22:18 occams_rzr joined #bioperl
22:25 * kblin resummons pyrimidine
22:26 dnewkirk waiting for pyrimidine to respawn, eh?
22:26 kblin yeah
22:26 deafferret kblin: he scrolls the backlogs, so you can go ahead and ask
22:27 kblin I already asked, more or less
22:28 kblin I don't really see how I can override new to call _initialize to return "Bio::SearchIO::${format}"->new(@args), because that'd run initialize again
22:40 rbuels kblin: so refactor it
22:42 rbuels kblin: you can modify other modules too
22:44 kblin rbuels: I know, I'd just like to keep the code changes minimal
22:44 rbuels kblin: it's nearly midnight.  time to take out the chainsaw.
22:44 * rbuels chuckle
22:44 rbuels s
22:45 kblin but I guess if the hmmer2 and hmmer3 parsers both subclass hmmer instead of SearchIO, and I mimic the SearchIO logic it  could work
22:45 kblin "malcolm solves his problems with a chainsaw and he never has the same problem twice"
22:47 bag_ joined #bioperl
22:48 kblin hm, I guess I'll just go for that
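
The design kblin settles on, with both concrete parsers subclassing a dispatching base class, can be sketched with toy packages. This is not BioPerl code: the `Sketch::*` package names, the `-version` argument, and the `init_calls` counter are all hypothetical, and the version is passed in explicitly here where the real parser would sniff it from the input.

```perl
use strict;
use warnings;

package Sketch::hmmer;

# Factory constructor. Called on the base class, it picks a subclass and
# re-dispatches; called on a subclass, it builds the object and runs
# _initialize exactly once.
sub new {
    my ($class, %args) = @_;
    if ($class eq __PACKAGE__) {
        my $version = $args{-version}
            or die "cannot determine HMMER version\n";
        return "Sketch::hmmer${version}"->new(%args);
    }
    my $self = bless { init_calls => 0 }, $class;
    $self->_initialize(%args);
    return $self;
}

sub _initialize {
    my ($self, %args) = @_;
    $self->{init_calls}++;    # stands in for the real IO setup
    return;
}

package Sketch::hmmer2;
our @ISA = ('Sketch::hmmer');
sub format_name { 'hmmer2' }

package Sketch::hmmer3;
our @ISA = ('Sketch::hmmer');
sub format_name { 'hmmer3' }

package main;

my $parser = Sketch::hmmer->new(-version => 3);
printf "%s, initialized %d time(s)\n", ref($parser), $parser->{init_calls};
```

Because the base-class branch returns before blessing anything, the setup code runs only in the subclass call, which avoids the double initialization that blocked the original plan.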
23:01 occams_rzr left #bioperl
23:17 raj_ left #bioperl
23:19 bag_ left #bioperl
23:28 kyanardag left #bioperl