Camelia, the Perl 6 bug

IRC log for #bioperl, 2009-08-12

| Channels | #bioperl index | Today | | Search | Google Search | Plain-Text | summary

All times shown according to UTC.

Time Nick Message
00:49 driley joined #bioperl
01:02 driley joined #bioperl
01:53 splut joined #bioperl
12:19 kyanardag joined #bioperl
12:50 deafferret rbuels: check your purl factoid in #moose
12:51 deafferret http://www.imdb.com/title/tt0404030/
14:58 rbuels hahahaha
14:58 rbuels officious
14:59 rbuels i've never seen everything is illuminated, is that factoid a reference?
15:21 perl_splut joined #bioperl
15:52 deafferret rbuels: indeed. good stuff. Eugen Hütz   Gogol Bordello
15:52 deafferret Eugene Hutz
15:53 rbuels oh really?
15:53 * rbuels puts it in netflix queue
15:53 deafferret ya. I just Netflix-saw that last night
15:55 deafferret http://titirangistoryteller.files.​wordpress.com/2009/02/eii-osdb.jpg
17:14 pyrimidine joined #bioperl
17:18 pyrimidine left #bioperl
17:19 pyrimidine joined #bioperl
20:00 nErVe joined #bioperl
20:12 balin Hi all. Here is the task: I have a multiple sequence alingnment in FASTA format and a domain(substring) for one of the sequecnes. In the end I want to get a slice for all the sequences that are aligned with this domain. I tried SimpleAlign and it doesn't seem to do exactly this. Is there a simple way to do it?
20:14 perl_splut simple align does do it, but you have to pad all the sequences so that they are in the right position
20:18 balin the sequenses are already padded with dashes, or it is not what you meant?
20:21 rbuels balin: if the domain seq you're looking for has no gaps, and you have a seq in the multiple alignment that contains exactly that domain,
20:21 rbuels balin: then you can find the coords of that domain in the 'reference' seq and use splice()
20:22 rbuels balin: to get that region for the whole alignment
20:22 * pyrimidine points at rbuels
20:22 * deafferret too
20:22 pyrimidine what he said
20:23 rbuels er, not splice.  slice.
20:24 * deafferret shakes his pointed finger
20:24 balin I have thought of that too.  Now, how do I find the domain in a sequence?
20:25 rbuels can use the perl index() function on the $seq->seq
20:25 rbuels perldoc -f index
20:25 pyrimidine that's the fastest way by far
20:25 rbuels my $ind = index($refseq->seq,$domain->seq)+1
20:25 balin so, there in no wrapper for it in bioperl?
20:25 pyrimidine we can't do everything for you
20:26 rbuels my $domain_slice = $align->slice($ind,$ind+$domain->length);
20:26 pyrimidine that would be cheating
20:26 pyrimidine ;)
20:26 rbuels lol, what is the point of wrapping index()
20:26 rbuels well....
20:26 balin I was just thinking about encapsulation and stuff
20:26 * rbuels encapsulates dukeleto's shenanigans
20:27 pyrimidine ?
20:27 pyrimidine what's dukeleto up to now?
20:27 rbuels (sshhhh, he'll hear)
20:27 * pyrimidine stays quiet
20:27 balin ok, now what if we have gaps in the domain?
20:28 deafferret index won't work if the $domain gapped out, right?
20:28 rbuels heh.
20:28 rbuels yeah.
20:28 deafferret oh. ya. what he said.
20:28 rbuels pyrimidine: well, he's talking a little bit at the pdx.pm meeting tonight
20:28 pyrimidine if you know the placement of the gaps, index would work, otherwise you'll need a regex of some sort
20:28 rbuels pyrimidine: i end up hanging out with him a lot at pdx hackathons
20:28 pyrimidine oh, okay
20:29 pyrimidine rbuels:
20:29 pyrimidine rbuels: when are you planning on destroying bioperl?
20:29 rbuels i am accumulating tuits.
20:29 balin no, I would know where gaps are located
20:29 balin ^not!
20:29 pyrimidine I mean, 'restructuring' bioperl
20:29 rbuels "propose chunk of bioperl to split off is on my todo list"
20:30 * rbuels considers the thorny gap problem
20:30 rbuels maybe if you add the domain into the multiple alignment?
20:30 pyrimidine okay
20:31 balin and realign?
20:31 rbuels or just have it in there in the first place during the alignment
20:31 pyrimidine Where are you pulling this from?
20:31 pyrimidine (the alignment)
20:32 balin from homologene
20:33 rbuels ah.
20:33 pyrimidine This is one of the problem areas with the bp simplealign implementation
20:34 pyrimidine The alignment can have features
20:34 pyrimidine but features are geared towards sequences, not alignments
20:35 pyrimidine We need to genericize features to pertain to any start/end coordinates, whether on an alignment, sequence, array, etc
20:36 pyrimidine Then, from an alignment, we could get the feature and use a generic slice method to get the slice of whatever the feature is pointing to
20:36 rbuels well a feature is just a pair of coordinates right?
20:37 rbuels and some other stuff...
20:37 * rbuels looks...
20:37 pyrimidine (this would be analogous to seq() in the current SeqFeature implemention)
20:37 pyrimidine rbuels: yes
20:37 rbuels ls
20:37 rbuels woops wrong window
20:37 rbuels lol
20:37 deafferret alias ls=/names
20:38 rbuels hrm, i guess they are called SeqFeatures aren't they
20:38 rbuels 'tis a problem.
20:38 pyrimidine yep
20:38 pyrimidine and to get this integrated into bioperl would be a pita
20:39 * rbuels feels the molasses feeling of bioperl's monolith sapping his energy
20:39 pyrimidine b/c it adds an extra level of inheritance (Root->Feature->SeqFeature->etc)
20:39 rbuels but.  balin's problem.
20:40 rbuels balin: do you have a way to redo the multiple alignment?
20:40 balin realy don't want to
20:40 rbuels balin: that would probably be the easiest way to gap your domain so you get the right slice
20:40 rbuels yeah, understandable
20:41 pyrimidine balin: is there any meta data attached to the alignment?  Like an extra line indicating where the domains are?
20:41 balin nope
20:41 * rbuels gnashes his teeth
20:41 * pyrimidine looking at homologene data
20:43 balin I guess I'll just create a second string/array with numbers [1-endOfAlignemnt] then I'll remove the gaps with corresponding indixes. After that I could just use index, and retrieve alignment indices from that second array
20:43 pyrimidine looks like the XML output is the only one that includes any domain information
20:44 balin even then they only have CDD domains, and I have some from other sources
20:51 * rbuels does not understand balin's proposed strategy
20:51 rbuels seems to me that the only way some arbitrary domain could be mapped onto these existing seqs would be with *some sort* of alignment work being done
20:51 rbuels cause after all, that's what aligning is.
20:54 pyrimidine I think the idea is to go through the sequence keeping track of the position of the residues, match the domain using index() against the seq w/o gaps to get the start.end, then map those coordinates back to the alignment
20:54 balin To use index() I will need to remove all gaps. But then the indices will be wrong. To correct for this I'll store the indices separately
20:54 pyrimidine booyah!
20:56 perl_splut yeah, Bioperl is more of a toolkit than a complete application for Bioinformatics work
20:57 balin would it be useful enough to add it to bioperl?
20:57 * pyrimidine notes that 'toolkit' matches 6 times on the bioperl home page
20:59 pyrimidine yes, it would be useful, but it would need to somehow take into account variation in gap symbols (b/c that's what would be mapped, not the residues)
21:00 balin that would be set by gap_char
21:01 pyrimidine For the alignment, yes (this is another problem area)
21:02 pyrimidine b/c LocatableSeq also define what a gap symbol is (and does it globally, which is not good)
21:04 rbuels oh i see, because there will be a seq in there where you can exactly map the domain, you just have to correct for the gaps that might be in the reference seq
21:04 * rbuels understands now
21:04 balin well, LocatableSeq takes gap symbol as a parameter
21:05 balin in no_gaps()
21:08 pyrimidine Symbols are set as global variables.  For LocatableSeq: $GAP_SYMBOLS = '\-\.=~';
21:08 pyrimidine That can be modified, but it all LocatableSeq instances
21:09 pyrimidine http://bugzilla.open-bio.org/show_bug.cgi?id=2715
21:09 pyrimidine s/it all/it affects all/
21:10 balin oh, I was looking at api for version 1.4
21:10 pyrimidine For you case this shouldn't be an issue if the gaps don't change: example code (from Biome):
21:11 pyrimidine http://gist.github.com/166764
21:12 pyrimidine In this case I'm just mapping start/end to grab the subsequence, but in your case you could modify this to map residues
21:13 pyrimidine (and it could probably be optimized to use index instead of the regex
21:13 * pyrimidine noticing #moose and #perl6 are very active today!
21:17 balin thanks, I'm looking into it
21:19 pyrimidine BTW, the $gaps is a flag in the code; the original code is in github:
21:19 pyrimidine http://github.com/cjfields/biome/tree/master
21:19 pyrimidine (lib/Biome/Role/PrimarySeq.pm)
21:39 * pyrimidine commuting &
21:55 deafferret It's a brave new world for deafferrets.... open my $fh, '<:encoding(UTF-8)', $filename;
22:02 rbuels o_O
22:02 rbuels where is that :encoding thing documented?
22:03 balin http://perldoc.perl.org/functions/open.html
22:06 rbuels i've never looked at that before
22:06 * rbuels has learned new skill "PerlIO layers", 3 more skills to level 35
22:07 biomoose joined #bioperl
22:07 biomoose biome: Christopher Fields master SHA1-51f2b7e
22:07 biomoose minor updates
22:07 biomoose http://bit.ly/Siy5d
22:07 biomoose biome: Christopher Fields master SHA1-de4192b
22:07 biomoose minor updates (deps)
22:07 biomoose http://bit.ly/EN7oo
22:07 biomoose biome: Christopher Fields master SHA1-782f5bf
22:07 biomoose minor update
22:07 biomoose http://bit.ly/pQbcM
22:07 biomoose biome: Christopher Fields master SHA1-a0d01d1
22:07 biomoose minor updates
22:07 biomoose http://bit.ly/SaK8T
22:07 biomoose biome: Christopher Fields master SHA1-747a6a2
22:07 biomoose minor updates
22:07 biomoose http://bit.ly/MBhxi
22:07 biomoose left #bioperl
22:07 balin huh?
22:12 deafferret rbuels: http://search.cpan.org/~nwclark​/perl-5.8.9/pod/perlunifaq.pod
22:14 deafferret http://perldoc.perl.org/perlunifaq.html (5.10)

| Channels | #bioperl index | Today | | Search | Google Search | Plain-Text | summary