IRC log for #bioperl, 2012-03-21

All times shown according to UTC.

Time Nick Message
00:18 gtuckerkellogg joined #bioperl
00:31 rbuels pyrimidine1++ # thanks for the retweet
00:41 CIA-66 bioperl-live: Florent Angly amplicons * r1fa852f / Bio/Tools/AmpliconSearch.pm : POD update: todos - http://git.io/36mfVg
00:42 CIA-66 bioperl-live: Florent Angly amplicons * r90f13d2 / Bio/Ontology/SimpleGOEngine/GraphAdaptor02.pm : Merge github.com:bioperl/bioperl-live into amplicons - http://git.io/CHDsEg
01:42 leprevost joined #bioperl
02:02 scottcain joined #bioperl
02:47 scottcain joined #bioperl
03:00 gtuckerkellogg joined #bioperl
08:00 gtuckerkellogg joined #bioperl
08:49 gtuckerkellogg joined #bioperl
08:53 leont joined #bioperl
09:22 kevin joined #bioperl
09:35 kev joined #bioperl
09:40 davidm_ joined #bioperl
09:41 daube joined #bioperl
09:45 daube joined #bioperl
10:07 daube hi all, got a rather general question regarding the databases maintained by NCBI: is there a bioperl function to be able to get a sequence from the query-like syntax used in the "coded-by" tags, e.g. "complement(CP001087.1:4026613..4027008)".
10:08 daube i can write something to get the sequence, but it involves downloading the ENTIRE sequence of the accession mentioned, which are often whole chromosomes so is very slow.
10:27 daube Here's the gist pastebin https://gist.github.com/2146033
10:30 carandraug joined #bioperl
14:04 scottcain joined #bioperl
14:14 gtuckerkellogg joined #bioperl
14:25 kai I think the problem is that ncbi doesn't support queries like "give me bases 4026613..4027008 from CP001087"
14:32 leprevost joined #bioperl
15:22 scottcain_ joined #bioperl
16:10 scottcain_ joined #bioperl
16:24 pyrimidine1 kai: actually, if you use efetch you can get regions of sequence
16:24 pyrimidine1 can even flip the strand
16:30 pyrimidine the trick is, the location would still have to be munged out from the feature itself, so it makes more sense to just pull in the seq and use the SeqFeature API to get the region of interest
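
The region fetch pyrimidine1 describes can be sketched as follows, as a rough illustration and not BioPerl code. It assumes the NCBI E-utilities efetch endpoint with its `seq_start`, `seq_stop`, and `strand` parameters (`strand=2` requesting the minus strand). The helper names `parse_coded_by` and `efetch_region_url` are hypothetical, and only the simple single-range form from daube's question is handled, not `join(...)` or fuzzy locations. The sketch only builds the URL and makes no network request.

```python
import re
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Matches only the simple form "ACCESSION:start..stop".
_LOCATION = re.compile(r"^(?P<acc>[A-Za-z0-9_.]+):(?P<start>\d+)\.\.(?P<stop>\d+)$")


def parse_coded_by(value):
    """Split a simple coded_by string into (accession, start, stop, strand).

    strand is 1 for the plus strand and 2 for a complement(...) location,
    mirroring the values efetch expects. Hypothetical helper.
    """
    text = value.strip()
    strand = 1
    if text.startswith("complement(") and text.endswith(")"):
        strand = 2
        text = text[len("complement("):-1]
    match = _LOCATION.match(text)
    if match is None:
        raise ValueError("unsupported location: %r" % value)
    return (match.group("acc"), int(match.group("start")),
            int(match.group("stop")), strand)


def efetch_region_url(value, db="nuccore", rettype="fasta"):
    """Build an efetch URL that asks only for the region named in value."""
    acc, start, stop, strand = parse_coded_by(value)
    params = {
        "db": db,
        "id": acc,
        "seq_start": start,
        "seq_stop": stop,
        "strand": strand,
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(efetch_region_url("complement(CP001087.1:4026613..4027008)"))
```

As pyrimidine notes, the location string still has to be pulled out of the feature first, so this only avoids the whole-chromosome download, not the parsing step.
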
16:34 pyrimidine joined #bioperl
16:53 pyrimidine rbuels: about to release the BioPerl dzil bundle to CPAN, let me know if I should pull the trigger
16:56 CIA-66 Bio-Coordinate: Chris Fields master * r14e416d / dist.ini : fix dist.ini - http://git.io/DmHDLg
16:56 CIA-66 Bio-Coordinate: Chris Fields master * r39585f9 / lib/Bio/Coordinate/Collection.pm : fix bug where multiple mappers are potentially passed in (as expected by method name) but are ignored - http://git.io/_4ZBmw
16:56 CIA-66 Bio-Coordinate: Chris Fields master * rfeece49 / (12 files in 2 dirs): untabify, remove extra spacing - http://git.io/Wo1knQ
16:56 CIA-66 Bio-Coordinate: Chris Fields master * rc64217b / (5 files in 2 dirs): line endings - http://git.io/F2bsyQ
16:56 CIA-66 Bio-Coordinate: Chris Fields master * rd5fc8d9 / dist.ini : skip tab tests for now (these were borking the inline test examples) - http://git.io/moI5BA
17:06 CIA-66 Bio-Coordinate: Chris Fields master * rce47658 / dist.ini : bundle is BioPerl, not BIOPERL - http://git.io/la07Tw
17:10 CIA-66 bioperl-live: Chris Fields topic/eutils-migration * rbda296a / (32 files in 8 dirs): remove EUtilities, now exists as a separate repo - http://git.io/0CpghQ
17:10 CIA-66 bioperl-live: Chris Fields topic/eutils-migration * r5ac860b / (25 files): remove EUtilities data - http://git.io/OKM9Xg
17:11 pyrimidine rbuels: also, I think we'll move out the code from the repo piecemeal and release
17:14 rbuels pyrimidine: when in doubt, pull the trigger
17:14 rbuels pyrimidine: just do it.
17:14 * rbuels always just does it
17:14 pyrimidine rbuels: will do
17:16 rbuels pyrimidine: also, Bio::GFF3::Parser::LowLevel is about 26x faster than Bio::FeatureIO::gff -version 3.    not surprising that it's way faster, but I was expecting something more like 5x or 10x, not 26x.
17:16 rbuels spending some quality time with NYTProf last night, I was able to just about double its speed from 13x to 26x
17:17 rbuels so, would be a good idea to have a look at using it to drive Bio::FeatureIO::gff
17:17 rbuels for the v3 parsing anyway
17:18 rbuels (although the performance gain will not be *that* huge, since most of the time in the bp code is spent on object creation that isn't going to get any faster by swapping out the engine)
17:18 pyrimidine rbuels: agreed.  haven't benchmarked that, so it's nice to know
17:18 rbuels (so i'm not putting swapping out the gff3 parsing in FeatureIO high on my priority list)
17:19 rbuels pyrimidine: but holy crap, if you don't really need objects, it's fast as shit
17:19 * rbuels chuckles
17:19 pyrimidine yeah, it's kinda stunning how much of a penalty there is when using bless()
17:20 pyrimidine particularly if the inheritance hierarchy is as complex as bioperl's
17:20 * rbuels nods
17:20 pyrimidine I think there is also a huge penalty for arg list munging via _rearrange() that we pay
17:22 rbuels yeah, there is
17:22 pyrimidine wow, genbank tests are really crapping out, will have to check on that
17:23 CIA-66 bioperl-live: Chris Fields topic/eutils-migration * r0aee0e5 / t/RemoteDB/EUtilities.t : remove remote eutil tests - http://git.io/Id69mA
17:24 pyrimidine afk #lunch
17:46 scottcain rbuels: pyrimidine: one of my suggested GSoC projects is to speed up GFF3 loading for Chado.  If the loader could be rewritten to not use objects, that would likely make a huge difference.
17:48 rbuels scottcain: indeed, that's quite true
17:48 rbuels might be tough though
17:48 rbuels but then, interns are good for tough, straightforward projects
17:48 scottcain exactly, that's why I want a student to work on it :-)
17:49 * rbuels nods
17:50 pyrimidine scottcain: if the data is distilled down into simple hashrefs, then you could write up convenience methods to do some of the work
17:50 pyrimidine tag('tagname$hr) to get
17:51 scottcain pyrimidine: yes, I think it would be, as rbuels put it, tough but straight forward.
17:51 pyrimidine meant, tag('tagname',$hr)
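
The "simple hashrefs plus convenience functions" idea can be sketched like this, in Python for illustration instead of Perl. The dictionary layout and the `parse_gff3_line` helper are assumptions made for the example and are not the output format of Bio::GFF3::Parser::LowLevel; `tag` mirrors the `tag('tagname',$hr)` call mentioned in the log. The sketch assumes a well-formed nine-column GFF3 feature line with integer start and end.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Turn one GFF3 feature line into a plain dict, with no objects involved.

    Hypothetical helper: the key names are chosen for this sketch only.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seq_id, source, ftype, start, end, score, strand, phase, attrs = cols

    attributes = {}
    if attrs != ".":
        for pair in attrs.split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            # Split multi-valued attributes on commas before unescaping,
            # so that an escaped comma (%2C) stays inside its value.
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]

    return {
        "seq_id": unquote(seq_id),
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


def tag(name, feature):
    """Return the list of values for an attribute tag, or [] if absent."""
    return feature["attributes"].get(name, [])


if __name__ == "__main__":
    line = "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
    feature = parse_gff3_line(line)
    print(tag("Name", feature))
```

Keeping accessors as plain functions over dicts sidesteps the per-feature object construction cost discussed earlier in the log, at the price of losing method dispatch and inheritance.
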
17:52 pyrimidine could also wrap any c-based parsers, not too hard to do (just wrapped a FASTQ parser myself)
17:52 rbuels i wonder if i should try to add some Inline::C to that gff3 parser
17:53 * rbuels doesn't know if a dep on Inline::C would be a good idea
17:53 * rbuels suspects not
17:54 pyrimidine rbuels: you could, but I would switch over to straight XS (Inline::C requires compilation each time, I don't think it caches by default)
17:54 pyrimidine afk again #workshop
17:55 rbuels yeah probably
17:56 rbuels nah.  it's fast enough for now
18:21 leont joined #bioperl
18:47 sl33v3_ joined #bioperl
20:54 sl33v3_ joined #bioperl
22:12 dnewkirk joined #bioperl
23:03 zenman joined #bioperl
23:44 20WAABZ3X joined #bioperl
23:44 zenman joined #bioperl
23:55 leont joined #bioperl