
IRC log for #bioperl, 2013-03-28

All times shown according to UTC.

Time Nick Message
01:10 scottcain joined #bioperl
01:31 scottcain joined #bioperl
01:59 scottcain_ joined #bioperl
02:29 scottcain joined #bioperl
07:27 bbb_ joined #bioperl
08:37 j_wright joined #bioperl
09:01 carandraug joined #bioperl
12:55 Mithaldu deafferret: nope. it's likely too late now in any case, but i never did work that out
12:56 Mithaldu i strongly suggest you add a warning though that those modules, even if they look like streamers, are not memory-constant
13:54 carandraug looking at t/Annotation/Annotation.t, I see that it tests whether Bio::SimpleAlign or Bio::Cluster::UniGene return a Bio::Annotation object. Shouldn't those tests belong to the align and unigene tests instead?
14:52 github [bioperl-live] bosborne pushed 2 new commits to master: https://github.com/bioperl/bioperl-live/compare/98053e9a2b2e...2cd1789795e4
14:52 github bioperl-live/master 8a95e51 Brian Osborne: Add dna_to_aa_aln method and tests
14:52 github bioperl-live/master 2cd1789 Brian Osborne: Merge branch 'master' of github.com:bioperl/bioperl-live
15:52 deafferret Mithaldu: add what warnings to what now? sounds like you've got a good idea. you should patch that into there. :)
15:52 deafferret carandraug: probably? sounds like you know better than me.
15:53 Mithaldu 13-03-26@12:16:41 (Mithaldu) hey guys, got someone in #perl-help on irc.perl.org who has this script running into "Out of memory!" : http://paste.scsys.co.uk/237067
15:53 Mithaldu 13-03-26@12:16:57 (Mithaldu) any of you maybe got a clue whether there's a better way to do this?
15:54 Mithaldu deafferret: the issue here is that the dude who brought the code example thought it would stream
15:54 Mithaldu but instead the modules kept storing some data in the background, so as a stream this was quite leaky
15:55 deafferret do the docs claim it streams?
15:56 Mithaldu i haven't looked, but i'm fairly sure the docs neither deny nor confirm ;)
15:56 Mithaldu also see: duck-typing
15:57 deafferret -nod- I'd assume "does not stream" unless the docs claim it does -- since not streaming is easier to write :)
15:57 * deafferret assumes
15:57 Mithaldu you assume because you're familiar with the internals
15:57 Mithaldu or at least more familiar than J. Random Bioinf who turns up in #perl-help :)
15:58 deafferret no, just speaking in general, it is often easier to write source code that slurps everything in and then parses -- which is not a problem until your inputs are huge and then you go "whoops" and re-write it to stream
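The slurp-versus-stream distinction deafferret describes can be illustrated with a small Python analogue. This is not BioPerl code: `read_fasta_slurp` and `read_fasta_stream` are hypothetical names, and the parser assumes simple FASTA with `>` only at the start of header lines. The slurping reader pulls the whole file into one string before parsing, so its memory grows with the input; the streaming reader is a generator that holds only the record currently being built.

```python
import io
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta_slurp(handle) -> List[Record]:
    """Slurp: read the whole file into one string, then parse it.

    Peak memory grows with the size of the input file.
    """
    records: List[Record] = []
    for block in handle.read().split(">")[1:]:
        lines = block.splitlines()
        header = lines[0].strip() if lines else ""
        records.append((header, "".join(ln.strip() for ln in lines[1:])))
    return records


def read_fasta_stream(handle) -> Iterator[Record]:
    """Stream: yield one (header, sequence) record at a time.

    Only the record currently being assembled is held in memory.
    """
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    data = ">seq1\nACGT\nAC\n>seq2\nGG-T\n"
    for header, seq in read_fasta_stream(io.StringIO(data)):
        print(header, seq)
```

A streaming reader only keeps memory flat if the caller also lets each record go; if anything downstream caches the yielded objects, memory still grows, which is the behaviour Mithaldu is describing.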
15:58 Mithaldu well, uh
15:58 Mithaldu did you look at the code?
15:59 deafferret nope
15:59 Mithaldu please do
15:59 deafferret k, looking
16:00 * deafferret grumbles that lib/ doesn't exist, for the 4000th time
16:00 deafferret rbuels: ^ didn't you threaten to fix that 2 years ago? :)
16:00 Mithaldu i meant the sample i pasted
16:00 Mithaldu not bioperl :)
16:00 deafferret oh ya, I looked at that. the question is does Bio::AlignIO::fasta stream, right?
16:01 Mithaldu actually, you're right
16:01 Mithaldu looking at it, it is kind of reasonable to assume that $in would slurp the whole file
16:01 * deafferret disagrees
16:01 Mithaldu i guess it needs a section to the effect of "What to do if i have a bazillion gigabyte files to process?!"
16:02 deafferret the question is whether ->next_aln() slurps the whole file, right?
16:02 Mithaldu deafferret: what exactly are you disagreeing with? :o
16:02 deafferret so now we're in _readline() in fasta.pm...
16:02 Mithaldu there are two issues here
16:02 Mithaldu #1 - will the memory used be the same after next_aln as before next_aln
16:02 deafferret new() doesn't HAVE TO slurp anything. but it might
16:03 Mithaldu #2 - will the memory used be the same after write_aln as before write_aln
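Mithaldu's two questions amount to sampling memory while a stream is consumed and checking whether it stays flat. The sketch below shows that idea in Python with the standard `tracemalloc` module. It is an analogue, not a BioPerl diagnostic: `memory_samples`, `leaky_records` and `streaming_records` are hypothetical, and the "leak" is simulated by a module-level list standing in for data a reader might keep in the background.

```python
import tracemalloc
from typing import Iterable, Iterator, List


def memory_samples(items: Iterable, every: int = 1000) -> List[int]:
    """Consume `items`, recording currently traced memory (bytes) every `every` items.

    A roughly flat series suggests constant memory per step; a steadily
    rising series suggests something is retaining earlier records.
    """
    tracemalloc.start()
    samples: List[int] = []
    try:
        for count, _item in enumerate(items, start=1):
            if count % every == 0:
                current, _peak = tracemalloc.get_traced_memory()
                samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


_retained: List[str] = []  # hypothetical hidden cache inside a "streaming" reader


def leaky_records(n: int) -> Iterator[str]:
    """Looks like a stream, but silently keeps every record it yields."""
    for i in range(n):
        rec = f"record-{i}-" + "x" * 100
        _retained.append(rec)
        yield rec


def streaming_records(n: int) -> Iterator[str]:
    """Yields records without keeping references to earlier ones."""
    for i in range(n):
        yield f"record-{i}-" + "x" * 100


if __name__ == "__main__":
    print("streaming:", memory_samples(streaming_records(10_000)))
    print("leaky:    ", memory_samples(leaky_records(10_000)))
```

The same check applies to the writing side: wrap the loop that calls the writer instead of the reader and see whether the samples keep climbing.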
16:03 deafferret we're talking about 1000 small alignments? or single MASSIVE alignments?
16:03 Mithaldu a million small ones
16:04 Mithaldu as observed by the dude who made that code memory rises constantly with that code
16:04 deafferret k, then IF _readline() is steamy we should be OK, right? unless there's a huge leak
16:04 deafferret streamy
16:04 Mithaldu yeah
16:05 deafferret k. so... Root/IO.pm _readline() is ignorant of Bio* formattyness
16:06 Mithaldu i haven't a single clue what that means :)
16:06 deafferret huh. next_aln() claims "Function: returns the next alignment in the stream."
16:06 Mithaldu huh, so the docs actually do confirm it
16:07 Mithaldu keep in mind it's also possible that write_aln might be leaky
16:07 deafferret really? i find that less plausible :)
16:07 deafferret just guessing :)
16:07 Mithaldu well, you know the code
16:07 Mithaldu i'm just being openminded
16:07 * deafferret knows nothing :)
16:08 deafferret well, I know a few things about how perl works. other than that I know nothing :)
16:09 Mithaldu you're only op in #bioperl, huh? :P
16:09 deafferret so glancing at the guts https://github.com/bioperl/bioperl-live/blob/master/Bio/AlignIO/fasta.pm#L80
16:09 Mithaldu this is bat country!
16:10 deafferret looks to me like next_aln() slurps the entire file, so the doc is lying
16:10 Mithaldu hm
16:10 deafferret ... errr
16:10 deafferret ... wait, in the case of .fasta, one file can only have one alignment...? true or false?
16:10 * Mithaldu is not bio informatics
16:11 Mithaldu i'm just obsessed with fixing problems :)
16:11 deafferret hmm... looks to me like this code assumes that any given .fasta file can ONLY be ONE alignment. If true, then next_aln() slurps the entire file, which is a single piece "stream" :)
16:12 deafferret so technically "stream" means ONE, and it's working as documented.  :)
16:12 deafferret agree / disagree?
16:12 Mithaldu would there be an alignment that's 1.3 gigabytes?
16:12 Mithaldu deafferret: possible!
16:12 deafferret you'd have to ask crazy person working with that alignment...?
16:13 deafferret nothing is impossible when biology and grad students exist ;)
16:13 Mithaldu huh, so it can be possible
16:13 Mithaldu interesting
16:13 deafferret i mean -- think about it -- next_aln() says "GIVE ME THAT WHOLE 1.3GB ALIGNMENT", right?
16:14 deafferret so ... you better have A LOT of RAM -- since the objects are gonna be 10X bigger (or 100X or 1000X) than the ASCII file
16:14 Mithaldu yeah, that would be a reasonable problem
16:14 deafferret but realistically I've never seen any alignment over 40K or so, myself
16:14 Mithaldu asking the dude would be difficult
16:14 Mithaldu i lack the time right now, but i can check my logs later on and see if i can find him
16:14 deafferret dude: so, what are you smoking, exactly?
16:14 deafferret :)
16:14 deafferret Mithaldu++ # fighting the good fight
16:15 carandraug deafferret, well, I just pushed the change that splits the tests so I better be right :p
16:15 Mithaldu cheers :)
16:15 Mithaldu also, well
16:15 deafferret carandraug: you seem reasonable and sober today :)
16:15 Mithaldu having seen the perl book for bio informatics people, i expect ALL of them to be smoking all kinds of things
16:15 * deafferret sniffs carandraug's breath
16:15 carandraug deafferret, you suggesting that somedays I do not?
16:16 deafferret :)
16:17 carandraug deafferret, by the way, I'm trying to split bio-cluster into a separate distro now. Do you have an opinion on whether Bio/ClusterIO/dbsnp.pm and Bio/Cluster/SequenceFamily.pm should fit better on a Bio-variations and bio-seq dist?
16:18 deafferret carandraug: I don't have an opinion, sorry. I've never worked with those topics
16:23 carandraug deafferret, ok. I asked on the mailing list
16:27 carandraug deafferret, I frequently have fasta files of alignments with multiple sequences
16:28 deafferret carandraug: sure. but is there only one ALIGNMENT per .fasta alignment file?
16:29 carandraug deafferret, Mithaldu : it seems to me that the user should have used Bio::AlignIO::largemultifasta
16:30 carandraug deafferret, that would not be possible. How would you distinguish between alignments if there's more than one in the same file?
16:30 deafferret carandraug: how about like this?
16:30 deafferret ------------------------------------------------
16:30 deafferret ;)
16:31 deafferret I don't know what the spec allows :)
16:32 carandraug deafferret, are there any specs at all? It's just fasta format. I'm guessing technically you could have multiple aligns on the file and distinguish by their names, but those names should be used to identify the sequence, not the alignment
16:35 carandraug deafferret, ok, reading the specs, you cannot have more than one alignment per file. And all sequences must be of the same length
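Under the constraints carandraug reads from the spec (one alignment per file, every sequence the same length), a minimal validity check could look like the Python sketch below. `check_aligned_fasta` is a hypothetical helper, not part of BioPerl, and treating a repeated sequence name as an error is an added assumption. It keeps only a running length per sequence, so memory scales with the number of sequences rather than their length.

```python
from typing import Dict, Iterable


def check_aligned_fasta(lines: Iterable[str]) -> Dict[str, int]:
    """Treat the whole input as ONE alignment and check that every sequence
    (gap characters included) has the same length.

    Returns {sequence name: aligned length}. Raises ValueError on a repeated
    name (assumed invalid here) or on unequal lengths.
    """
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if name in lengths:
                raise ValueError(f"duplicate sequence name: {name!r}")
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    if len(set(lengths.values())) > 1:
        raise ValueError(f"unequal sequence lengths: {lengths}")
    return lengths


if __name__ == "__main__":
    print(check_aligned_fasta([">a", "AC-GT", ">b", "ACGGT"]))
```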
16:43 scottcain joined #bioperl
16:57 deafferret carandraug: ah, excellent. So, Mithaldu, looks like it's WorkingAsDocumented. Maybe largemultifasta is a good tip for $thatdude
16:57 carandraug deafferret, looking at the files, I don't understand why there's need for the code in these files. They should be using the code from Bio::SeqIO::fasta
16:58 deafferret need for what code in what files?
16:58 carandraug deafferret, code to read a fasta file
16:59 deafferret carandraug: I'm lost. We were exploring a complaint about a possible memory leak.
17:01 carandraug deafferret, Yes. And I'm exploring a separate thing: code duplication, which means that one could possibly have to fix it in 3 different places. And since Bio::SeqIO::fasta is likely to have a lot more use than Bio::AlignIO::fasta, if the code only existed there this would likely already have been fixed
17:02 deafferret what code is duplicated? (what subs in what files?)
17:03 deafferret code reduction is (almost) Always a Good Thing ;)
17:03 deafferret and often time consuming as hell, unfortunately :)
17:05 carandraug deafferret, Bio::AlignIO::fasta reads fasta file. Shouldn't that be job of Bio::SeqIO::fasta?
17:06 * deafferret looks
17:10 carandraug deafferret, the one in Bio::SeqIO::fasta seems much more complex, probably to accommodate more test cases
17:12 deafferret meh Bio::AlignIO::fasta / Bio::LocatableSeq / Bio::SeqIO::fasta
17:15 deafferret carandraug: ya, so I concur. I think Bio::AlignIO should use Bio::SeqIO::fasta, not _readline for itself
17:15 deafferret so I'd drop that assertion onto the mailing list and proceed unless / until someone objected
17:16 deafferret bioperl-- # not eating its own dog food
17:16 deafferret carandraug: patching Bio::AlignIO::fasta::next_aln() should be easy
17:17 carandraug deafferret, you think it's ok to run perltidy through the project?
17:17 deafferret no, i'm not a fan.
17:17 deafferret it'd create 99.9% white noise in the repo history, a big problem in Moose (for example)
17:18 deafferret carandraug: e.g. http://sartak.org/2013/03/reinstating-class-mops-commit-history-in-moose.html
17:18 carandraug deafferret, what are you a fan of? Having modules with hard tabs where in a space of 5 lines, a tab means 4 and 8 spaces?
17:19 deafferret what?
17:20 deafferret perltidy is great when everything is a huge mess. the code i've seen isn't a huge mess. so perltidy would just be creating TONS of white noise in the name of pedantry
17:21 deafferret plus lots of bioperl code is EASIER to read when NOT strictly consistent. $0.02
17:22 Mithaldu deafferret: waiting for my browser to boot
17:22 Mithaldu do the docs point towards largemultifasta or is that only discoverable by checking the module list?
17:24 deafferret Mithaldu: dunno. carandraug?
17:24 carandraug deafferret, look at Bio::AlignIO::fasta, between lines 98 and 121, the meaning of a tab changes 4 times
17:24 carandraug Mithaldu, I only found out by looking at the source
17:24 deafferret carandraug: so cleanse tabs in *that file*, not the whole project :)
17:25 Mithaldu carandraug: alright, i'll try to keep in the back of my mind that that needs a doc patch then
17:25 carandraug deafferret, which is what I have been doing. But I'm getting tired of doing that every time I open a bioperl file
17:26 Mithaldu deafferret: it is generally a better idea to normalize an entire repository at once
17:26 Mithaldu it does mean that when history-digging there's one huge thing, but it also means that before and after that point there is no noise
17:27 carandraug deafferret, plus, because I was doing it like that, I already let an indentation mistake slip through, which now means 2 commits to fix indentation instead of only 1
17:27 deafferret maybe just perltidy the tabs then? I'm afeared of a full perltidy sweep
17:27 Mithaldu deafferret: don't be afraid of perltidy :)
17:27 * deafferret shudders
17:27 Mithaldu if fear is the only thing holding you back then only you are holding yourself back ;)
17:28 carandraug not to mention the files that I reallyyyyy love with 3 space indentation. Those are much more difficult to change manually
17:28 deafferret I just hate (1) white noise mega-commits in history (2) perltidy being "correct" but making code less readable
17:28 deafferret <3 3 spaces :)
17:29 Mithaldu set up a perltidyrc for the project
17:29 carandraug deafferret, i wouldn't be afraid of (2). Code from perltidy reads very well. They even align the "="
17:29 Mithaldu also, why do you hate it when there's a single mega commit? :o
17:30 carandraug looking at it, it would appear that the -ce -nolq flags look best
17:30 deafferret Mithaldu: e.g. http://sartak.org/2013/03/reinstating-class-mops-commit-history-in-moose.html
17:30 * deafferret shrugs
17:30 Mithaldu that's a completely different issue
17:30 Mithaldu in that case the history of an entire repo was reduced to one single commit
17:31 deafferret carandraug: that's what branches are for. perltidy everything in a perltidy branch and then send out a "this OK to merge?" to the mailing list
17:31 Mithaldu that IS something i'd spew hellfire to prevent :)
17:31 deafferret Mithaldu: it is?  /me re-reads
17:31 Mithaldu make a few branches and let people bikeshed :P
17:31 Mithaldu that way you move the question from "do we?" to "which do we do?"
17:32 deafferret plus people can comment on specific bits easily in github.  github comments rock
17:32 Mithaldu they do
17:33 deafferret L81 "I freaking hate this change. BOO!"   ;)
17:35 deafferret Mithaldu: My point regarding that blog post stands -- whether it's perltidy or WHATEVER it is that commits 14K lines of code the problem when researching history is that *EVERYTHING* changed in that mega-commit. I despise this sort of white noise.
17:35 deafferret but hey, I'm just one dude, if the tabs were bugging me I'd argue the opposite point :)
17:35 Mithaldu deafferret: see, the thing is :)
17:35 deafferret the people working with the code get more sway than me bitching, hypothetically, from the side lines :)
17:35 Mithaldu perltidy commits are white noise
17:36 Mithaldu they only change formatting, not functionality
17:36 Mithaldu it's not even a refactoring
17:36 Mithaldu the code is literally the same, only the whitespace is different
17:36 Mithaldu so when researching history and you see a perltidy commit you can safely ignore its entirety :D
17:37 deafferret can git log ignore specific SHAs? that would be sweet
17:37 deafferret anyway, do whatever. I've been noisy enough. :)
17:38 deafferret decisions are made by the people who show up. i'm sitting on the sidelines. :)
17:38 deafferret please do make sure the commit comment says 'perltidy'  :)
17:38 Mithaldu git log does have a whole bunch of filtering options, it might actually be able to do that
17:39 deafferret ya, I rarely review history anyway, dunno why I care
17:39 deafferret <--   #bikeshedder
17:39 Mithaldu :D
17:40 deafferret git blame is nice when 99% of the code isn't "perltidy!"  :p
17:41 Mithaldu just start blaming after the perltidy commit
17:42 carandraug deafferret, it's also nicer when commit messages are not "no time for typing message! SOX are winning"
17:43 deafferret :)
17:44 carandraug deafferret, by the way, the same problem happens on largemultifasta at least between lines 180 and 200
17:45 carandraug oh! Also between 153 and 158 of that same file
17:45 * deafferret grants carandraug his blessing to Do The Right Thing
17:59 scottcain joined #bioperl
18:01 github [bioperl-live] carandraug pushed 1 new commit to master: https://github.com/bioperl/bioperl-live/commit/0e0ddd3b24a2e62c837bad955b2e15484df8496f
18:01 github bioperl-live/master 0e0ddd3 Carnë Draug: maint: remove hard tabs and trailing whitespace
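The kind of cleanup named in that commit message (remove hard tabs and trailing whitespace) can be sketched in a few lines of Python. The log does not say how the commit was actually produced, so `clean_source` is hypothetical and the tab width of 4 is an assumption. As carandraug points out, files whose tabs were written for both 4- and 8-column stops cannot be fixed correctly by any single width, so the result still needs a visual check.

```python
def clean_source(text: str, tabsize: int = 4) -> str:
    """Expand hard tabs to spaces (tab stops every `tabsize` columns) and strip
    trailing whitespace from every line, keeping a final newline if present."""
    cleaned = [line.expandtabs(tabsize).rstrip() for line in text.splitlines()]
    out = "\n".join(cleaned)
    return out + "\n" if text.endswith("\n") else out


if __name__ == "__main__":
    print(repr(clean_source("\tmy $x = 1;  \n    \treturn;\t\n")))
```

`str.expandtabs` is column-aware, so a tab after four spaces advances only to the next stop; that is the same behaviour that makes one tab look like 4 spaces on one line and 8 on another.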
18:01 dnewkirk joined #bioperl
18:54 carandraug deafferret, tell me about bioperl moose
18:59 deafferret carandraug: https://github.com/cjfields/biome
19:10 carandraug deafferret, is it used in production anywhere?
19:11 deafferret carandraug: dunno.  py<TAB><TAB><TAB>
19:13 carandraug is that pyramidine?
19:13 deafferret ya, Chris Fields is pyWHATYOUJUSTSAID, but he's not in here right now
19:14 carandraug deafferret, ah! I did not know pyramidine is Chris Fields. I acknowledge him in the paper that I'm writing now, but used his IRC nick
19:14 deafferret :)
19:15 carandraug I'm also acknowledging some other people from #perl using their nicks. I have never read that on a paper before
19:19 deafferret lol http://www.ohloh.net/p/bioperl
19:20 carandraug deafferret, what lol?
19:21 deafferret i'm generally amused by attempts to do high-level analysis on insanely complicated things, and then drawing conclusions like "105 years of effort"
19:22 carandraug deafferret, that's probably if bioperl was written by a single person. i think octave has a much larger number
19:22 carandraug deafferret, https://www.ohloh.net/p/octave
19:22 carandraug yup, 215 years
19:22 deafferret lol
19:24 deafferret lol http://www.ohloh.net/p/perl
19:25 carandraug deafferret, I especially enjoyed the "perl is mostly written in perl"
19:59 dnewkirk joined #bioperl
21:06 sl33v3 joined #bioperl
21:10 looper joined #bioperl
22:08 carandraug joined #bioperl
22:48 scottcain_ joined #bioperl

| Channels | #bioperl index | Today | | Search | Google Search | Plain-Text | summary