IRC log for #bioperl, 2010-05-03

All times shown according to UTC.

Time Nick Message
04:10 cawiss joined #bioperl
05:49 cawiss joined #bioperl
06:13 bag_ joined #bioperl
07:48 cawiss left #bioperl
07:48 cawiss joined #bioperl
14:02 aboui joined #bioperl
14:43 driveby_bot joined #bioperl
14:43 driveby_bot /home/svn-repositories/bioperl: r16959 (fangly) : typo
14:43 driveby_bot Diff: http://tinyurl.com/34lkrvm
14:45 driveby_bot joined #bioperl
14:45 driveby_bot /home/svn-repositories/bioperl: r16960 (fangly) : Protect singlet against the effect of gaps on their LocatableSeq end
14:45 driveby_bot Diff: http://tinyurl.com/32baddc
14:52 driveby_bot joined #bioperl
14:52 driveby_bot /home/svn-repositories/bioperl: r16961 (fangly) : The content of the assembly (the contigs and singlets) is now updated when a dissolved or cross-contig spectrum is determined
14:52 driveby_bot Diff: http://tinyurl.com/2gy974p
15:41 melic joined #bioperl
15:44 melic joined #bioperl
15:46 melic joined #bioperl
15:58 perl_splut joined #bioperl
16:01 melic joined #bioperl
16:01 melic joined #bioperl
16:06 driveby_bot joined #bioperl
16:06 driveby_bot /home/svn-repositories/bioperl: r16962 (fangly) : small cleaning
16:06 driveby_bot Diff: http://tinyurl.com/22l4su9
16:47 cawiss joined #bioperl
17:04 cawiss joined #bioperl
18:07 nuba left #bioperl
18:30 dmb_ joined #bioperl
18:35 pyrimidine joined #bioperl
19:32 Lynx_ joined #bioperl
20:07 spekki01 joined #bioperl
20:15 * jhannah plays the driveby_bot Blues on his harmonica
20:50 dave_messina joined #bioperl
20:52 dave_messina ?
20:52 perl_splut ?
20:54 dave_messina sorry, I was trying to get command help. It's been too long since I've used IRC :)
20:54 perl_splut heheh
20:54 dave_messina Does Mark Jensen hang around here?
20:55 perl_splut he might. not sure what nick he's hiding under if he does
20:55 dave_messina ok, no problem I'll just email him.
20:58 pyrimidine dave_messina: hi
20:58 jhannah http://bioperl.org/wiki/IRC  majensen occasionally, apparently
20:59 jhannah email? crazy kids
20:59 dave_messina pyrimidine - is that you Chris?
20:59 * pyrimidine shakes fist: 'git off my lawn you crazy kids'
20:59 pyrimidine dave_messina: yes
21:00 dave_messina jay - ha! I know...
21:00 dave_messina So maybe you guys know the answer to this:
21:00 pyrimidine 42
21:00 dave_messina :)
21:00 jhannah BioPerl is the answer, silly.
21:00 dave_messina Anyone have experience with Amazon EC2 instances?
21:01 jhannah I've diddled them slightly.
21:01 jhannah never for a real project
21:01 pyrimidine same here
21:01 pyrimidine though I've used our local cloud cluster a bit more
21:01 * jhannah has racked up $7 in EC2 charges over the last year or so   :)
21:02 pyrimidine mark is def. the one to talk to
21:02 dave_messina Ah, ok. I'll bug him then.
21:03 dave_messina I'm wondering if I can add stuff to an existing machine image, say the BioLinux image, and then make a new image from that.
21:03 jhannah yes
21:04 dave_messina Aha, cool. I want to throw a couple big DBs  on there, like nr, instead of having to mess with mounting EBS volumes with nr on them.
21:04 jhannah uh, your DBs should probably be in EC2 storage, not in an image...?
21:05 jhannah um. I should probably shut up now since this is not at all my expertise
21:05 dave_messina heh heh. :)
21:05 * pyrimidine wonders what the upload cost would be to get nr onto EC2...
21:05 jhannah I'd try to keep the server image just OS + BioPerl + other little tools if I were you
21:05 jhannah s/little/super-awesome/
21:05 dave_messina All uploading is free until June 30th.
21:06 dave_messina and in any case it's not that expensive.
21:06 pyrimidine definitely take advantage of that
21:07 dave_messina Jay, I'd tend to agree, and indeed that's how the BioLinux image is set up.
21:07 jhannah keep in mind that after it's not free you'll still want to push new NRs occasionally
21:07 * jhannah wonders what the max total image size is
21:07 dave_messina True, but since you can incrementally update nr, it won't be too bad.
21:08 * jhannah rubs rbuels' chin
21:08 dave_messina I think max is 1 TB right now.
21:08 jhannah really?? wow
21:08 pyrimidine how big is nr getting these days?
21:09 dave_messina uncompressed nr is 8 GB, but that excludes environmental nr, which is another 6 GB or so.
21:09 dave_messina that's the database only, not the raw fasta
21:10 pyrimidine ok
21:11 dave_messina NCBI BLAST database, that is.
21:11 jhannah "The maximum size of an image is 10240MB."
21:11 jhannah wow
21:12 jhannah oh, err... right.  so if   http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1145   can be believed then you might not fit NR + OS + tools on a single image
21:12 jhannah 10 GB considerably < 1 TB
21:13 dave_messina hmm. So 1 TB must be max size of an EBS (virtual hard disk) only.
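
The size comparison above can be checked with a little arithmetic. This is a minimal sketch: the 10240 MB image limit and the 8 GB and 6 GB database sizes are the figures quoted in the log, while the 3 GB allowance for the OS and tools is an assumed placeholder, not a number anyone in the channel gave.

```python
# Figures quoted in the log.
IMAGE_LIMIT_MB = 10240   # "The maximum size of an image is 10240MB."
NR_GB = 8                # uncompressed nr BLAST database
ENV_NR_GB = 6            # environmental nr, "another 6 GB or so"

# Assumption for illustration only: space taken by the OS plus tools.
OS_AND_TOOLS_GB = 3


def fits_in_image(payload_gb, limit_mb=IMAGE_LIMIT_MB):
    """Return True if a payload of payload_gb gigabytes fits under the image limit."""
    return payload_gb * 1024 <= limit_mb


nr_with_os = NR_GB + OS_AND_TOOLS_GB
everything = nr_with_os + ENV_NR_GB

print("nr alone fits:", fits_in_image(NR_GB))
print("nr + OS + tools fits:", fits_in_image(nr_with_os))
print("nr + env nr + OS + tools fits:", fits_in_image(everything))
```

Under these numbers nr by itself squeezes under the limit, but adding even a modest OS footprint pushes it over, which matches the conclusion reached in the channel.
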
21:14 jhannah image vs instance is a very critical distinction  :)
21:15 * pyrimidine would be classified as a newbie on that page
21:15 * jhannah wins then  :)
21:15 pyrimidine \o/
21:15 jhannah 'cause what I know would take many minutes to learn   woot!   job security
21:16 jhannah an image is a thing that has your OS + tools. If you need 10 servers you launch 10 instances of that 1 image
21:16 jhannah damn, pyrimidine just caught up with me  :(
21:16 pyrimidine yep
21:17 dave_messina so...guess I'll have to do it the hard way: make an S3 bucket containing nr, and then copy it over to the instance each time.  Their transfer rates are pretty decent, though, so it shouldn't take more than about 15 minutes. (well below compute time)
21:17 jhannah and you typically pay for one of their storages as a totally separate matter, so all 10 instances all read/write the same datasource
21:18 jhannah i think their intent is that you pay for a storage solution for the nr, then use it from your instances. whether or not it would actually be cheaper to do it some other way I don't know
21:19 pyrimidine I'm wondering if this is one of the main reasons we haven't seen much movement to using cloud yet
21:19 * pyrimidine stresses the 'yet'
21:19 * jhannah wonders if there is any system for sharing a single storage across *different customers* instances -- so you could just pay a library card style fee and someone else could keep NR up to date for you
21:19 jhannah no sense in everyone and their mom worrying about incremental nr updates
21:20 dave_messina Yep, and I'm happy to pay for storing nr, but I don't want to have to copy it around. I want to have read-only access to it from multiple instances concurrently.
21:20 dave_messina AFAIK you can't do real shared storage on EC2 yet.
21:20 pyrimidine well, there is some precedent for that.  ensembl has data up, but I don't think it's shared.
21:20 pyrimidine what dave said
21:20 jhannah dave_messina: ? oh, sure you can.... you get your storage set up once, and all 100 of your instances can use it instantly
21:21 jhannah they mount it, if you want to access it filesystem-style
21:21 dave_messina wait, how?
21:21 dave_messina are you talking about mounting an S3 bucket on an instance?
21:21 jhannah you put the trucks in the series of tubes, then your CC is billed!
21:22 jhannah :)   um, S3? looking
21:22 jhannah I haven't actually done this mount, mind you. I'm parroting a presentation I saw
21:22 dave_messina ah, gotcha.
21:22 * jhannah logs into the management thingy
21:22 dave_messina S3 is their Simple Storage Service.
21:23 jhannah can you mount S3?
21:23 pyrimidine I don't think so.
21:23 dave_messina that's what I'm asking. :) as far as I can tell, no.
21:23 * pyrimidine sees a lot about data transfer
21:24 jhannah Attachment Information:
21:24 jhannah i-1f124577:/dev/sda1 (attached)
21:24 jhannah Elastic Block Store - 5GB by default
21:24 jhannah mounts to your instances
21:24 jhannah EBS
21:24 spekki01 Anyone encounter any issues using bioperls esearch, im trying to submit a large list of accession numbers ~700 and get a gi number for each. But when i do this it doesnt give me back a corresponding amount of gi numbers?
21:25 jhannah spekki01: pinpoint one that's missing for us? then nopaste your code?
21:25 pyrimidine spekki01: that's not unusual.  You should use correspondence to map those across
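
The "map by correspondence" advice can be sketched without any particular library. This is an illustrative sketch only: the function name, the record shape (accession, GI pairs), and the sample accessions and GI values are all made up for the example and are not BioPerl or NCBI APIs. The idea is to key the returned records by accession rather than rely on the returned count or order matching the submitted list.

```python
def map_by_accession(requested, records):
    """Pair each requested accession with its GI, or None if nothing came back.

    requested: list of accession strings that were submitted.
    records:   iterable of (accession, gi) pairs, in any order and possibly
               incomplete (hypothetical shape chosen for this sketch).
    """
    found = {}
    for accession, gi in records:
        # Drop any ".N" version suffix so "AB000001.1" matches "AB000001".
        found[accession.split(".")[0]] = gi

    mapping = {acc: found.get(acc.split(".")[0]) for acc in requested}
    missing = [acc for acc, gi in mapping.items() if gi is None]
    return mapping, missing


# Made-up sample data: three accessions submitted, two records returned.
requested = ["AB000001", "AB000002", "AB000003"]
records = [("AB000003.1", 111), ("AB000001.1", 222)]

mapping, missing = map_by_accession(requested, records)
print(mapping)
print("no GI returned for:", missing)
```

Listing the missing accessions explicitly also answers the "pinpoint one that's missing" request directly.
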
21:25 jhannah dave_messina: so you pay like $0.73 a month or something for the EBS to exist + transfer fees
21:25 jhannah and all your instances mount that EBS
21:25 jhannah or I'm drunk
21:26 jhannah pretty sure I, personally, have done this with happy results
21:26 pyrimidine that sounds more like our local cloud
21:26 dave_messina Any of your instances *can* mount the EBS, but not more than one at a time.
21:26 dave_messina I think.
21:26 jhannah really? firing up 2 now
21:27 jhannah If I teach dave_messina something I can die a happy man
21:27 dave_messina if I can still learn something new, so can I.
21:28 dave_messina Jay, are you doing this via the website management control thingy? Or on the command line?
21:28 * pyrimidine thinking I may just have to set up something myself on EC2...
21:29 pyrimidine spekki01: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Get_accessions_.28as_well_as_other_information.29_for_a_list_of_GIs
21:29 pyrimidine man, we need a tinyurl bot
21:31 pyrimidine I think all the cloud talk scared off spekki01
21:32 spekki01 no no im still here lol just getting my code together
21:32 spekki01 http://codepad.org/QGfHtmHd
21:32 dave_messina Chris, I just started on it today, and so far it's gone pretty smoothly. The BioLinux image has got most of the standard bioinformatics tools already on there, so I was able to do some test runs pretty quickly.
21:33 spekki01 so thats what im attempting to run with a small chunk of the accession numbers, but i can never seem to get the same amount of gi numbers as accession numbers
21:33 pyrimidine spekki01: yes, but I don't see any use of bioperl there
21:33 jhannah darn. I lost my EC2 keys apparently
21:34 * pyrimidine always losing my keys. damn old age
21:35 spekki01 im in the process of rewriting it to http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook near the "esearch->esummary" example near the bottom
21:35 spekki01 like the*
21:49 jhannah dave_messina: oh dear. you were right. for shame, me
21:49 dave_messina Oh well. Thanks for looking into that for me, though.
21:50 jhannah what's the solution then?
21:50 jhannah ... you have to have a separate instance ... and NFS mount it yourself?
21:50 dave_messina I think it's this:
21:52 * dave_messina put nr in an S3 "bucket", since files on S3 buckets can be accessed by multiple instances simultaneously, just like a web server.
21:52 dave_messina But:
21:52 dave_messina You can't *mount* an S3 bucket. So each instance will have to copy nr over to its local disk.
21:53 dave_messina Now, with their transfer speeds it won't take too long, maybe 15 minutes. But still, sheesh.
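
The 15-minute estimate implies a particular sustained transfer rate, which can be worked out from the sizes mentioned earlier in the log. This is a rough sketch: the 8 GB size and 15 minutes come from the discussion, the helper names are invented for the example, and real S3-to-instance throughput is not measured here.

```python
def implied_rate_mb_per_s(size_gb, minutes):
    """Sustained rate (MB/s) needed to move size_gb gigabytes in the given minutes."""
    return size_gb * 1024 / (minutes * 60)


def transfer_minutes(size_gb, rate_mb_per_s):
    """Minutes needed to move size_gb gigabytes at rate_mb_per_s."""
    return size_gb * 1024 / rate_mb_per_s / 60


# nr alone (8 GB) in about 15 minutes, as estimated in the log.
rate = implied_rate_mb_per_s(8, 15)
print(f"implied rate: {rate:.1f} MB/s")

# Same rate applied to nr plus environmental nr (8 GB + 6 GB).
print(f"nr + env nr: {transfer_minutes(14, rate):.2f} minutes")
```

Because every instance pays this copy cost at startup, the overhead scales with the number of instances launched rather than being a one-time charge.
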
21:53 jhannah looks like you CAN NFS mount if you want   http://developer.amazonwebservices.com/connect/thread.jspa?messageID=92393&#92393
21:54 dave_messina Right, but it's slow: http://dsl-wiki.cs.uchicago.edu/index.php/Performance_Comparison:Remote_Usage,_NFS,_S3-fuse,_EBS
21:54 jhannah do you need more than 1 instance?
21:56 dave_messina I think so, since I want to run 1 million seqs against nr.
21:56 dave_messina Now, one thing I'm contemplating is to use their high-cpu instances, so I can get like 20 cores on an instance.
21:56 * jhannah rubs rbuels' chin
21:57 pyrimidine dave_messina: that's essentially how we have to do cloud here
21:57 pyrimidine have you looked at cloudburst?
21:57 dave_messina no I haven't ... will do so, though. thanks.
21:58 pyrimidine though that's really shortread mapping
21:58 dave_messina hey jhannah is there a way to determine how much $$$ I'm spending in realtime?
22:00 jhannah um, not that I know of
22:01 jhannah I've seen some demos of services that sit *on top of* EC2 and manage EC2 for you. don't remember if they had crawls of real time $ bleed or not...
22:02 pyrimidine http://www.lbl.gov/cs/CSnews/Metagenomics_sidebar.html
22:03 pyrimidine "Although the team achieved scalable performance on all of the evaluated platforms, their results show that the cost of running BLAST-based codes on commercial cloud architectures increased significantly as they scaled up. This was primarily due to the premium cost associated with on-demand access."
22:04 pyrimidine Also, Amazon is apparently significantly more expensive than locally owned clusters
22:07 dave_messina ah great, thanks for the link. I'm a little surprised it's more expensive given that Amazon must have economies of scale. I'll have to go read the paper linked in that article and find out the details.
22:08 pyrimidine this is from a year ago, though.  Not sure how this compares now.
22:10 pyrimidine though the desktop I am using is dual-quad core with 48G RAM, cost ~ $4500 (and it's probably less than that now)
22:13 dave_messina wow, that's such a great price.
22:14 dave_messina I had access to a 96G RAM machine a few years ago, and I'm pretty sure its price tag was over 50K.
22:29 * pyrimidine heading home
22:29 dave_messina bedtime for me. Good talking with you guys....
22:29 pyrimidine dave_messina: o/
22:29 pyrimidine left #bioperl
22:42 * jhannah plays his lonely harmonica
23:54 jhannah heehee... installing perl 5.12.0 on a 2000 node cluster  :)