IRC log for #bioperl, 2012-03-08


All times shown according to UTC.

Time Nick Message
03:36 gtuckerkellogg joined #bioperl
05:09 sizz joined #bioperl
06:46 balin joined #bioperl
08:45 leont joined #bioperl
11:20 carandraug joined #bioperl
12:44 leont joined #bioperl
13:11 gtuckerkellogg joined #bioperl
16:58 donut_1 joined #bioperl
16:59 donut_1 I'm having a script issue and could use a bit of help.
16:59 donut_1 Anyone here have a little free time?
17:01 donut_1 I have a GeneID and I want to be able to yank that sequence from the NCBI Gene database.  I'd also like to get the sequence of +5k -5k bases from the start of the Gene seq.
17:01 donut_1 So I'm currently trying to use the "DNA sequence using EntrezGene IDs" example script from the Bioperl Eutilities HowTo page
17:02 donut_1 when I give it a GeneID, it searches in the 'nucleotide' database and gives me all this nice information that is unrelated to the GeneID (that I'm aware of)
17:02 donut_1 So I went in and changed the database flag -db to gene for all cases
17:02 donut_1 and now it spits out a bunch of junk code
17:10 jhannah donut_1: howdy. per /topic, please show us your "bunch of junk code" via gist.github.com
17:13 donut_1 jhannah: sure thing.  but the explanation could be simple.  if I change both -db flags to 'gene' as opposed to the default -db flag 1 = 'gene' and -db flag 2 = 'nucleotide', am I fundamentally wrong in doing this?
17:13 donut_1 or maybe to even simplify things a bit more, if i have a GeneID, what script at the BioPerl website should I begin with to pull sequence data from that GeneID?
17:14 donut_1 The next step is to be able to pull +5k upstream and +5k downstream from the GeneID start coordinate
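The +/-5 kb window discussed here comes down to simple coordinate arithmetic. Below is a minimal Python sketch. It assumes 1-based inclusive coordinates (the convention used by EFetch's `seq_start`/`seq_stop`) and that `start` is the gene's 5' end in genomic coordinates. The function name `flank_window` is hypothetical.

```python
from typing import Optional, Tuple


def flank_window(start: int, flank: int = 5000,
                 seq_length: Optional[int] = None) -> Tuple[int, int]:
    """Return (seq_start, seq_stop) spanning `flank` bases on each side of `start`.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if start < 1:
        raise ValueError("start must be a 1-based coordinate (>= 1)")
    if seq_length is not None and start > seq_length:
        raise ValueError("start lies beyond the end of the sequence")
    # Clamp at the first base so genes near the sequence start stay valid.
    lo = max(1, start - flank)
    hi = start + flank
    # Clamp at the last base when the sequence length is known.
    if seq_length is not None:
        hi = min(hi, seq_length)
    return lo, hi


print(flank_window(12000))  # (7000, 17000)
print(flank_window(3000))   # (1, 8000)
```

An unclamped window spans 2*flank + 1 bases, so a 5 kb flank yields 10,001 bases. Because the window is symmetric around `start`, strand only matters when deciding which coordinate is the gene's 5' end.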
17:18 jhannah I don't understand you when you type    -db flag 1 = 'gene' and -db flag 2 = 'nucleotide'     if you show me the actual code I will understand it   :)
17:19 donut_1 Ah yes.  My apologies. https://gist.github.com/2002175
17:20 donut_1 This is the sample script from the BioPerl website. When I leave it 'as is' and run the script with a particular GeneID, it gives me results from the nucleotide database giving me a sequence that does not match the gene seq. I'm looking for.
17:20 jhannah ah, yes. thank you. that's much easier for me.  :)    i've been out of the game for a while, hopefully someone with more clues will show up. i'm poking around on the wiki for you
17:22 donut_1 Thank you so much.  I've been beating my head against this wall for two days.  I assume that calling a seq. via a script would be somewhat on the simplistic side of things.  However I am no Perl programmer nor do I do hardly anything biochem related.
17:22 jhannah also I need to help my coworker with this other thing, if you can hang out a while someone (or me) might be able to help
17:22 donut_1 I will be here.
17:25 jhannah :)
17:53 pyrimidine donut_1: that gist link appears dead.  Did you work out the problem?
17:56 donut_1 give me a sec and I'll repost the code
17:57 donut_1 https://gist.github.com/2002353
17:57 donut_1 So this is the script I'm using.  It gives me seq data but not for the GeneID I'm trying to query
17:58 pyrimidine what GeneID?
17:59 donut_1 It's an EnsemblID
17:59 donut_1 ex.) ENSG00000000460
17:59 leont joined #bioperl
17:59 pyrimidine ah, then it definitely won't work.  eutils expects NCBI UIDs
18:00 donut_1 Perfect.  I thought it was something fundamentally wrong with my methods.
18:00 pyrimidine you could probably try an esearch on 'gene' to get a list of possible UIDs
18:01 donut_1 is there a bioperl script that takes ensemblIDs and gives sequence data?
18:02 pyrimidine donut_1: you should look into the Ensembl Perl API
18:02 pyrimidine or possibly use BioMart
18:02 donut_1 That's what I'm doing right this moment, looking at the Ensembl Perl API
18:06 pyrimidine donut_1: this version works for me: https://gist.github.com/2002399
18:07 pyrimidine you need a prelim step to get the proper UID from the Ensembl ID, the only way to get that is esearch
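The esearch step pyrimidine describes can be sketched in Python using only the standard library. The sketch assumes the public E-utilities esearch endpoint and its XML response layout, where `<eSearchResult>` contains `<IdList>` with one `<Id>` element per UID. The helper names are hypothetical, and this is not the code from the linked gist.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Union

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(ensembl_id: str, db: str = "gene", retmax: int = 20) -> str:
    """Build an esearch query URL for an Ensembl ID in an Entrez database."""
    params = {"db": db, "term": ensembl_id, "retmax": retmax}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_esearch_ids(xml_doc: Union[str, bytes]) -> List[str]:
    """Extract UIDs from an esearch XML response (the list may be empty or hold several)."""
    root = ET.fromstring(xml_doc)
    return [elem.text for elem in root.findall("./IdList/Id") if elem.text]


def fetch_gene_uids(ensembl_id: str) -> List[str]:
    """Run the esearch query over the network and return the UIDs it lists."""
    with urllib.request.urlopen(esearch_url(ensembl_id)) as resp:
        # Pass raw bytes so the parser honours the XML encoding declaration.
        return parse_esearch_ids(resp.read())
```

Only `fetch_gene_uids` touches the network. The URL builder and the parser are pure functions, so they can be checked against a saved response.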
18:08 donut_1 Wonderful.  I will give it a try.  So I've installed the API and I'm using a sample script to search for (I assume) the EnsemblID.  The script runs without error finally but it returns nothing to the screen.
18:08 pyrimidine donut_1:  the dangerous part: if you get more than a 1-to-1 mapping of UID to EnsemblID, this won't distinguish that.  You should probably run the queries in a loop.
18:09 pyrimidine and keep track of the EnsemblID => UID mapping(s)
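Keeping the EnsemblID => UID mapping means issuing one query per ID and storing every UID it returns. Below is a minimal sketch. It assumes a `lookup` callable (hypothetical; for example, a wrapper around an esearch request) that returns a list of UID strings for one Ensembl ID.

```python
from typing import Callable, Dict, Iterable, List


def map_ensembl_to_uids(ensembl_ids: Iterable[str],
                        lookup: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Query each Ensembl ID separately and keep every UID it maps to."""
    mapping: Dict[str, List[str]] = {}
    for eid in ensembl_ids:
        # One query per ID preserves the EnsemblID => UID link. A single
        # batched query would return a flat UID list with no record of
        # which ID produced which UID.
        mapping[eid] = list(lookup(eid))
    return mapping
```

Storing lists rather than single values keeps the zero-hit and multi-hit cases visible instead of silently overwriting them.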
18:13 donut_1 Yes i would run that in a loop.  Is there a wiki script example that allows you to search EnsemblID and returns a UID?
18:14 pyrimidine donut_1: just use the first part of that script (up to the @gene_ids line)
18:17 donut_1 I guess I'm a bit afraid to say this but I don't exactly know for sure what you mean.
18:19 pyrimidine donut_1: https://gist.github.com/2002399 # second example
18:20 donut_1 okay so the output is the UID?
18:20 pyrimidine donut_1: yup
18:22 donut_1 Oh okay.  So if I run your first example with an EnsemblID, I get the exact same results when I run your second example, get a UID and use the UID in the first example.
18:22 donut_1 Therefore, using your first example + EnsemblID is giving me the sequence I'm looking for?
18:22 donut_1 I see you have the -5000 +5000 in there so that is the +/- 5k bases I'm looking for?
18:24 donut_1 do you recommend I grab the UIDs first then run those through your 1st example?
18:27 pyrimidine donut_1: yes, you could grab the UIDs first
18:27 donut_1 and the +/-5k is already built into the script at the bottom there?
18:28 pyrimidine donut_1: I believe so
18:28 donut_1 Thank you so much.
18:28 pyrimidine donut_1: have you looked at biomart?  It's possible it would be much easier to get the UIDs from there
18:28 donut_1 I don't know Perl hardly at all but with this I can whip up a shell script to quickly loop through all the EIDs I have
18:28 donut_1 I actually haven't yet.
18:30 pyrimidine donut_1: any lang experience besides Perl?  Python, Ruby, etc?
18:30 donut_1 Python
18:31 donut_1 I didn't really want to come out and say that.  I hear there is a lot of bitterness with Perl programmers and Python.
18:33 pyrimidine naw,
18:33 pyrimidine the only bitterness comes from those who don't know any better
18:33 pyrimidine both have their advantages/disadvantages
18:34 pyrimidine I know a bit of Ruby, Python, Java, my primary lang is Perl
18:34 donut_1 well prior to Python I really didn't have any prog experience (other than shell scripting).  Python was the first by chance.  My boss told me to enroll in that class.
18:34 pyrimidine it's worth it
18:34 donut_1 Well I really enjoyed Python.  It reads well and for someone like me that is a huge plus.
18:35 pyrimidine it helps enforce that, yes
18:35 pyrimidine you can write readable code in any language, python's style enforces it a bit more
18:36 donut_1 Well this is really my first round with Perl.  So far so good.
18:38 pyrimidine just need to watch out for the few gotchas, specifically sigil variance
18:38 pyrimidine (what some pythonistas refer to as 'line noise')
18:38 pyrimidine though I suppose that applies to regexes as well ...
18:38 donut_1 Now you are speaking way over my head. :)
18:39 pyrimidine the @$% that precedes variable names
18:40 pyrimidine $ = scalar, @ = array, % = hash
18:40 pyrimidine etc
18:40 donut_1 Ah, yes.
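For a Python programmer, the sigils map roughly onto Python's built-in types. In the sketch below, the Perl 5 lines appear as comments beside their Python counterparts. The variable names are illustrative only.

```python
# my $uid = "12345";               # $ : a scalar, a single value
uid = "12345"

# my @uids = ("111", "222");       # @ : an array, an ordered list
uids = ["111", "222"]

# my %uid_for = (ENSG_A => "111"); # % : a hash, key => value pairs
uid_for = {"ENSG_A": "111"}

# The "sigil variance" gotcha: a single element of @uids or %uid_for is a
# scalar, so Perl 5 writes it with $ rather than @ or %:
#   $uids[0]          is "111"   (Python: uids[0])
#   $uid_for{ENSG_A}  is "111"   (Python: uid_for["ENSG_A"])
first_uid = uids[0]
mapped_uid = uid_for["ENSG_A"]
```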
18:58 donut_1 Okay I'm looping over the EIDs and logging the UIDs
18:58 donut_1 next step will be to pump these into the seq fetcher and log the seqs
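A subsequence such as a +/-5 kb window can be pulled with EFetch's `seq_start`, `seq_stop`, and `strand` parameters on a nucleotide record. Below is a minimal URL-building sketch. It assumes you already have the genomic accession and 1-based coordinates for the gene; obtaining those from the Gene record is not shown. The name `efetch_region_url` is hypothetical.

```python
import urllib.parse

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_region_url(accession: str, seq_start: int, seq_stop: int,
                      strand: int = 1) -> str:
    """Build an EFetch URL for a FASTA subsequence of a nucleotide record."""
    if seq_start < 1 or seq_stop < seq_start:
        raise ValueError("need 1 <= seq_start <= seq_stop")
    if strand not in (1, 2):
        # EFetch uses 1 for the plus strand and 2 for the minus strand.
        raise ValueError("strand must be 1 (plus) or 2 (minus)")
    params = {
        "db": "nuccore",  # nucleotide sequence records
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": seq_start,
        "seq_stop": seq_stop,
        "strand": strand,
    }
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)
```

Requesting `strand=2` asks NCBI for the reverse complement, which is convenient for minus-strand genes.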
18:58 donut_1 I'm so deeply grateful for your help
19:00 pyrimidine donut_1: no problem
19:07 donut_1 So there have been a couple cases where I feed it an EnsemblID and it returns nothing or it returns two UIDs
19:07 donut_1 is this expected and if so what suggestions do you have for what is going on here?
19:07 pyrimidine yes, that's not surprising
19:08 donut_1 Right now my script is looping through a few thousand EIDs.  I built in a sleep command so it doesn't overload the server.  I think I read 3 queries per second if I'm not mistaken.
19:08 donut_1 So if a UID is not returned, then NCBI does not have information for that particular EID?
19:08 pyrimidine yes, that should be fine
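A fixed sleep between requests works. A slightly more efficient option is to sleep only for whatever time remains since the previous request. Below is a minimal Python sketch. It assumes NCBI's guideline of at most three E-utilities requests per second without an API key. The `Throttle` class is hypothetical, and its clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Keep consecutive calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second: float = 3.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # when the previous call went through

    def wait(self) -> None:
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each request inside the loop. Time already spent processing a response then counts toward the gap.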
19:08 pyrimidine I don't think all EnsemblIDs will be found, or could possibly map to multiple IDs
19:08 pyrimidine may be database discordance
19:08 donut_1 So a manual verification is necessary here I would imagine
19:09 donut_1 but for the cases where it returns one UID, that should be kosher.
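Because IDs can map to zero, one, or several UIDs, one way to set aside the cases that need manual checking is to split the collected mapping three ways. Below is a minimal sketch. It assumes the mapping is a dict from Ensembl ID to a list of UID strings. The helper name `triage` is hypothetical.

```python
from typing import Dict, List, Tuple


def triage(mapping: Dict[str, List[str]]
           ) -> Tuple[Dict[str, str], Dict[str, List[str]], List[str]]:
    """Split EnsemblID => [UIDs] into unique, ambiguous, and missing cases."""
    unique: Dict[str, str] = {}
    ambiguous: Dict[str, List[str]] = {}
    missing: List[str] = []
    for eid, uids in mapping.items():
        distinct = sorted(set(uids))  # drop repeated hits for the same UID
        if not distinct:
            missing.append(eid)        # no NCBI record found
        elif len(distinct) == 1:
            unique[eid] = distinct[0]  # safe 1-to-1 mapping
        else:
            ambiguous[eid] = distinct  # needs manual verification
    return unique, ambiguous, missing
```

Only the `unique` entries go straight on to sequence fetching. The other two groups are the ones to review by hand.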
19:11 donut_1 I'm helping out a friend with all this.  Outside of what I'm supposed to do (use scripts to get seq data from database) I've no clue what is going on.
19:12 pyrimidine ah, now you get to see how much fun us bioinformaticians have :)
19:12 donut_1 pchem is just as 'fun'!
19:46 spanner joined #bioperl
20:00 jhannah donut_1: oh thank goodness pyrimidine showed up. he knows 1000X more than me  :)   pyrimidine++
20:00 jhannah (i'm back from lunch)
20:03 leprevost joined #bioperl
20:04 Topic for #bioperl is now Be patient! People chat here daily, but not necessarily the minute or hour you wandered in. Leave your IRC client connected. | http://bioperl.org/wiki/IRC#Getting_help | http://www.bioperl.org/wiki/Using_Git | nopaste to gist.github.com
20:57 rbuels who dat is?
21:07 rbuels joined #bioperl
21:09 rbuels joined #bioperl
21:38 philsf joined #bioperl
21:38 philsf joined #bioperl
21:48 donut_1 if pyrimidine is still around, Just wanted to say thank you very much for your help once again
21:49 donut_1 I've finally snagged all the UIDs that I could get.  I processed them through another script and now I'm looping over each one getting the sequence data for each
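When each fetched sequence comes back as FASTA text, splitting it into header and sequence makes it easy to log and to sanity-check. Below is a minimal standard-library parser. It assumes plain FASTA, meaning a `>` header line followed by sequence lines. The name `parse_fasta` is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

A quick sanity check is to confirm that each sequence's length equals `seq_stop - seq_start + 1` for the window that was requested.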
23:25 jhannah pyrimidine++++
23:25 jhannah rbuels: ACE HOOD!
23:26 jhannah http://www.youtube.com/watch?v=MkpvCACeaak
23:28 jhannah Ace "The Bard" Hood
23:33 jhannah oh dear. Ace appears to have some unresolved anger issues.
23:34 jhannah lolz
23:35 jhannah http://www.youtube.com/watch?v=g4jCEB0OTQE#!
23:39 jhannah wow. NSFW: http://www.youtube.com/watch?v=aJY4AOFoaZI lolz
23:39 jhannah rbuels: ^
23:40 rbuels jhannah: sssh, i'm watching Free My Niggas
23:41 jhannah "My Speakers" seems to suggest that women are sexy sometimes...
23:41 jhannah odd.
23:41 rbuels my goodness, that third one features the F word rather prominently
23:41 jhannah ya, it gets better at 1:20  :)
23:41 rbuels "My Speakers"?
23:42 rbuels ah, i have it
23:42 rbuels i don't think those are actually his speakers
23:42 jhannah rbuels: he's not wastin' time on you haters
23:43 rbuels wow, he rhymed "Bimmer" and "Leukemia"
23:43 * rbuels applauds
23:43 jhannah 23:28 <@jhannah> Ace "The Bard" Hood
23:43 rbuels indeed.  'twas apt.
23:44 jhannah this is how I get into every concert: http://www.youtube.com/watch?v=8JJrqejW4n0
23:45 jhannah lol. it is so wrong that i like this
23:46 rbuels make it rain is my second favorite, after loco with the cake
23:46 jhannah Verizon MiFi working extremely well over here in Council Bluffs. Woot!
23:52 jhannah 2 greatest songs of all time
23:53 jhannah hell, 2 greatest things ever created by mankind
