Time |
Nick |
Message |
05:17 |
|
egonw joined #bioclipse |
06:53 |
|
Gpox joined #bioclipse |
07:16 |
|
egonw_ joined #bioclipse |
07:59 |
egonw |
@tell jonalv http://chem-bla-ics.blogspot.com/2009/10/processing-chebi-mdl-sd-file-with-cdk.html |
07:59 |
zarah |
egonw's link is also http://tinyurl.com/yea5vv4 |
07:59 |
zarah |
Consider it noted. |
07:59 |
egonw |
@tell olas http://chem-bla-ics.blogspot.com/2009/10/processing-chebi-mdl-sd-file-with-cdk.html |
07:59 |
zarah |
egonw's link is also http://tinyurl.com/yea5vv4 |
07:59 |
zarah |
Consider it noted. |
07:59 |
egonw |
@tell masak can I do something like '@tell olas, jonalv FOO' ? if not, please consider this a feature request |
07:59 |
zarah |
Consider it noted. |
08:00 |
egonw |
@tell masak exact syntax does not matter |
08:00 |
zarah |
Consider it noted. |
08:14 |
|
shk3 joined #bioclipse |
08:25 |
|
egonw_ joined #bioclipse |
09:18 |
|
olass joined #bioclipse |
09:27 |
* egonw |
is at home |
09:28 |
egonw |
unplanned, but was not feeling well yesterday afternoon and evening |
09:28 |
egonw |
moreover, late at home (22:15 or so) |
09:32 |
* egonw |
profiling SDF property reading... |
09:32 |
egonw |
see my blog |
09:59 |
|
masak joined #bioclipse |
11:03 |
|
samuell joined #bioclipse |
11:11 |
|
olass joined #bioclipse |
11:22 |
egonw |
olass: confirmed that reading the metadata is the bottleneck |
11:22 |
olass |
what metadata is it? |
11:22 |
egonw |
99% of the time is spent on reading the SD fields |
11:22 |
olass |
why does this take so much time? |
11:23 |
egonw |
very much metadata |
11:23 |
egonw |
water |
11:23 |
egonw |
that has a lot of information |
11:23 |
egonw |
links to whatever... |
11:23 |
egonw |
it's the String building really |
11:23 |
egonw |
just very, very much String building |
11:23 |
olass |
ok, using StringBuffer I hope? |
11:24 |
egonw |
internally it always is... |
11:24 |
egonw |
but, gonna do some tuning now... |
11:24 |
olass |
I see |
11:24 |
egonw |
should be possible |
11:24 |
egonw |
since that has never been done |
11:24 |
olass |
maybe we should stick to ChEBI_lite.sdf for testing? |
11:24 |
olass |
without the metadata? |
11:26 |
egonw |
:) |
11:26 |
egonw |
yes, was thinking that too :) |
11:26 |
egonw |
but then say (like Lily Allen is now singing on the radio) to the user... F**k you very, very much ? |
11:26 |
Gpox |
use StringBuilder if thread safety is not needed |
11:26 |
egonw |
just use a lite SD file, not the heavy one you are using... |
11:26 |
egonw |
anyways... |
11:27 |
olass |
yes, agreed |
11:27 |
olass |
it does not solve the problem |
11:27 |
egonw |
made the use of stringbuffer explicit now... |
11:27 |
egonw |
let's see what boost that gives |
11:28 |
olass |
Gpox: What is StringBuilder? Better than StringBuffer? |
11:29 |
masak |
differ in thread safety, methinks. |
11:29 |
olass |
aha |
11:30 |
masak |
I never remember which is which, though :) |
11:30 |
Gpox |
and speed |
11:32 |
olass |
Gpox, egonw, masak: Don't forget to push what you have for the devel release tomorrow. I will make the 2.2.x branch at that time (not today) |
11:32 |
olass |
so tomorrow noon I guess |
11:33 |
masak |
I will be pushing this afternoon. |
11:33 |
olass |
\o/ |
11:38 |
egonw |
I'm pushing the JCP updates tonite or tomorrow early in the morning... |
11:41 |
egonw |
masak: hahahaha tvimter ? |
11:47 |
masak |
TwitVim, apparently. |
11:55 |
egonw |
using stringbuilder explicitly fixes the problem |
11:55 |
egonw |
working on patches for 2.0 and 2.2 |
11:55 |
egonw |
the improvement is incredible |
11:56 |
egonw |
mind blowing |
11:56 |
egonw |
but I don't see any diff between SBuilder and SBuffer |
11:59 |
egonw |
but I can live with the theory and will use SBuilder |
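
A minimal sketch of the difference discussed above, assuming a list of SD field lines that must be joined into one string. The class and method names are hypothetical and are not taken from Bioclipse or the CDK. With `+=` inside a loop, each pass builds a fresh string and copies everything accumulated so far, whereas one explicit `StringBuilder` keeps appending to the same buffer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not the actual Bioclipse or CDK code.
class ConcatDemo {

    // Repeated += copies the whole accumulated text on every iteration,
    // so the total work grows roughly quadratically with the input size.
    static String joinWithPlus(List<String> lines) {
        String result = "";
        for (String line : lines) {
            result += line + "\n";
        }
        return result;
    }

    // One explicit StringBuilder appends into a single growing buffer.
    static String joinWithBuilder(List<String> lines) {
        StringBuilder result = new StringBuilder();
        for (String line : lines) {
            result.append(line).append('\n');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("> <NAME>", "water", "");
        // Both variants produce the same text; only the cost differs.
        System.out.println(joinWithPlus(lines).equals(joinWithBuilder(lines)));
    }
}
```

Both methods return identical strings, so the choice matters only for speed, and the gap should widen as the amount of metadata per record grows.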
12:10 |
|
samuell joined #bioclipse |
12:16 |
masak |
egonw: StringBuilder: "A mutable sequence of characters.". StringBuffer: "A thread-safe, mutable sequence of characters." -- http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuffer.html |
12:16 |
zarah |
masak's link is also http://tinyurl.com/7fve4 |
12:16 |
masak |
so StringBuffer is the threadsafe one. rule of thumb: the StringBuffer gives you a 'buffer' of safety against thread problems. |
12:17 |
egonw |
sure |
12:17 |
egonw |
that's what it says... read that too |
12:17 |
egonw |
the builder, however, did not really prove to be significantly faster |
12:19 |
masak |
as long as the variable is local, you can use any which one you want, I guess. it's when it's a field or otherwise shared that it should be a StringBuffer. |
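
The rule of thumb above can be sketched as follows, under the assumption of two threads appending to one shared buffer. All names here are hypothetical. `StringBuffer.append` is synchronized, so no append is lost when the buffer is shared; a `StringBuilder` that never leaves the method needs no such locking.

```java
// Hypothetical demo class illustrating shared versus local buffers.
class SharedBufferDemo {

    // Two threads append to the same StringBuffer; because each append is
    // synchronized, the final length is always 2 * perThread.
    static int appendFromTwoThreads(int perThread) {
        StringBuffer shared = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.append('x');
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared.length();
    }

    // A builder confined to one method is only ever seen by one thread,
    // so the unsynchronized StringBuilder is sufficient here.
    static String buildLocally(int count) {
        StringBuilder local = new StringBuilder();
        for (int i = 0; i < count; i++) {
            local.append('x');
        }
        return local.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendFromTwoThreads(1000));
        System.out.println(buildLocally(3));
    }
}
```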
12:19 |
egonw |
yes, I know |
12:19 |
egonw |
I know how threading works... |
12:19 |
egonw |
that was never the point |
12:20 |
masak |
I assumed you knew. I'm just thinking out loud. :) |
12:20 |
egonw |
ah, ok |
12:30 |
|
edrin joined #bioclipse |
12:46 |
egonw |
Gpox: ping |
12:46 |
egonw |
olass: ping |
12:46 |
Gpox |
egonw: pong |
12:46 |
olass |
egonw: pong |
12:47 |
egonw |
Gpox, olass: the CDK code was slow at parsing the SD file |
12:47 |
egonw |
but... |
12:47 |
egonw |
(stupid me) |
12:47 |
egonw |
the mol table is *not* using the CDK code |
12:47 |
egonw |
Gpox: and while you are using a StringBuilder |
12:47 |
egonw |
not buffering the input when doing things char by char |
12:47 |
egonw |
makes it horribly slow too |
12:48 |
egonw |
Gpox: MoleculeTableManager lines 406-423 |
12:49 |
egonw |
but you are using the BufferedIS... |
12:50 |
egonw |
Gpox: I can try to pinpoint where most time is used... |
12:51 |
* egonw |
is annoyed he forgot that Bioclipse is using its own SD file parser :( |
12:52 |
egonw |
Gpox: assuming you have unit tests... does the MoleculeTableManager have unit tests? |
12:52 |
Gpox |
it should be possible to rewrite it to read lines |
12:52 |
Gpox |
no it doesn't |
12:53 |
egonw |
I'll try to make it read lines... |
12:53 |
egonw |
no, you better do that... |
12:53 |
egonw |
I stumble on line 3... |
12:53 |
egonw |
what is start?? |
12:55 |
Gpox |
the start of the properties block in the SDfile |
12:56 |
egonw |
gonna leave this to you |
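
A rough sketch of what "reading lines" could look like for the properties block of one SD record, assuming the usual layout: the connection table ends with `M  END`, each data item starts with a header such as `> <NAME>`, its value is closed by a blank line, and `$$$$` terminates the record. This is not the MoleculeTableManager code; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the actual Bioclipse parser.
class SdPropertiesSketch {

    // Reads the data items of a single SD record line by line.
    static Map<String, String> readProperties(BufferedReader reader) throws IOException {
        Map<String, String> properties = new LinkedHashMap<>();
        String line;
        // Skip the connection table; the properties block follows "M  END".
        while ((line = reader.readLine()) != null && !line.startsWith("M  END")) {
            // nothing to do for connection table lines
        }
        String name = null;
        StringBuilder value = new StringBuilder();
        while ((line = reader.readLine()) != null && !line.startsWith("$$$$")) {
            if (line.startsWith(">")) {
                // Header line: take the field name between '<' and '>'.
                int open = line.indexOf('<');
                int close = line.indexOf('>', open + 1);
                name = (open >= 0 && close > open) ? line.substring(open + 1, close) : null;
                value.setLength(0);
            } else if (line.isEmpty()) {
                // A blank line closes the current data item.
                if (name != null) {
                    properties.put(name, value.toString());
                }
                name = null;
            } else if (name != null) {
                if (value.length() > 0) {
                    value.append('\n');
                }
                value.append(line);
            }
        }
        if (name != null) {
            properties.put(name, value.toString());
        }
        return properties;
    }

    // Convenience wrapper for in-memory text.
    static Map<String, String> parse(String record) {
        try (BufferedReader reader = new BufferedReader(new StringReader(record))) {
            return readProperties(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String record = "M  END\n> <NAME>\nwater\n\n> <FORMULA>\nH2O\n\n$$$$\n";
        System.out.println(parse(record));
    }
}
```

Wrapping the input in a `BufferedReader` and calling `readLine()` avoids handling the stream one character at a time, and multi-line values are collected in a single `StringBuilder` per field.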
13:00 |
Gpox |
egonw: but getProperties(...) is only run once in a separate job iirc |
13:01 |
egonw |
well, it's not the parsing of the connection table that takes long... |
13:02 |
Gpox |
it does use the CDK SD file parser |
13:02 |
egonw |
where? |
13:03 |
Gpox |
SDFIndexEditorModel.getMolecule |
13:07 |
Gpox |
it should be possible to not pass it the properties section, the information is there to do that |
13:07 |
egonw |
but the MDLV2000Reader is not reading the data block |
13:08 |
egonw |
unless... |
13:08 |
egonw |
ok, found it... |
13:08 |
egonw |
another patch brewing... |
13:09 |
egonw |
testing |
13:10 |
egonw |
btw, no begging for google wave invites here? |
13:10 |
egonw |
#bioclipse++ |
13:11 |
egonw |
Gpox: OK, problem fixed |
13:14 |
CIA-51 |
bioclipse.cheminformatics: Egon Willighagen 2.0.x * r27cd147 / (2 files in 2 dirs): StringBuilder instead of += concatenation, boosting performance of SD file support - http://bit.ly/hVFrS |
13:15 |
CIA-51 |
bioclipse.cheminformatics: Egon Willighagen master * rd075a6e / (2 files in 2 dirs): StringBuilder instead of += concatenation, boosting performance of SD file support - http://bit.ly/yrHVD |
13:15 |
egonw |
olass, jonalv: please test |
13:28 |
|
masak joined #bioclipse |
13:28 |
|
mgerlich joined #bioclipse |
13:28 |
|
stain joined #bioclipse |
13:38 |
edrin |
egonw: did you know this: http://web.chemdoodle.com/overview.php ? |
13:38 |
zarah |
edrin's link is also http://tinyurl.com/ybmxp9s |
13:50 |
egonw |
edrin: yes |
13:51 |
egonw |
samuell: reading material for you: http://www.biomedcentral.com/1471-2105/10?issue=S10 |
13:51 |
zarah |
egonw's link is also http://tinyurl.com/y8pma8m |
13:52 |
samuell |
egonw: Thanks! |
14:05 |
egonw |
samuell: btw, you might find this one interesting too: http://esw.w3.org/topic/HCLSIG |
14:05 |
zarah |
egonw's link is also http://tinyurl.com/y8ugdvt |
14:06 |
samuell |
egonw: Yep, added to bookmarks. thx. |
14:09 |
CIA-51 |
bioclipse.cheminformatics: Egon Willighagen 2.0.x * r30bbc30 / plugins/net.bioclipse.cdk.ui/src/net/bioclipse/cdk/ui/wizards/NewFromSMILESWizard.java : Added note that it saves in CML format (clarifies #1626) - http://bit.ly/3eMEcP |
14:32 |
masak |
vim++ |
15:22 |
* egonw |
is updating bc2.2 with the latest CDK+JCP-Prim |
15:23 |
egonw |
but running into a lot of trouble never seen with eclipse 3.4 |
15:25 |
olass |
egonw: your fix to SDF reader was a real performance boost! |
15:25 |
olass |
egonw++ |
15:25 |
egonw |
and a few repeated refreshes makes all problems disappear like a summer in sweden |
15:25 |
egonw |
olass: yes, it apparently was a big bottleneck |
15:26 |
* egonw |
is happy that OS/X is such a crappy OS, that the bottleneck showed up :) |
15:26 |
egonw |
hahahaha |
15:26 |
olass |
hrmf |
15:26 |
egonw |
seriously... |
15:27 |
egonw |
for 2.4 we should plan a YourKit-on-Plugin-Unit-Tests session |
15:27 |
olass |
yup |
15:27 |
olass |
sounds like a plan |
15:31 |
egonw |
olass: ping |
15:31 |
egonw |
Gpox is not around... |
15:31 |
olass |
egonw: I'd like you to close bugs reported as resolved fixed, for example 73, 794, 795, 1064, etc |
15:31 |
olass |
egonw: pong |
15:32 |
olass |
Gpox no |
15:32 |
egonw |
there are a few compile errors resulting from the update I am about to do... |
15:32 |
olass |
he left 15.30:ish |
15:32 |
egonw |
but Gpox will fix those |
15:32 |
egonw |
yes, I know |
15:32 |
olass |
ok |
15:32 |
egonw |
:) |
15:32 |
olass |
so cdk will not compile from now until he fixes those? |
15:32 |
egonw |
so, I will update, but that will leave the repos slightly broken... |
15:32 |
olass |
fine with me |
15:32 |
egonw |
no, just the JCP part |
15:32 |
olass |
thx for the pointer |
15:33 |
olass |
ok |
15:33 |
egonw |
but he needs me to make this commit to proceed |
15:33 |
egonw |
I'm sure he and I will resolve it tomorrow |
15:33 |
olass |
yup, I know |
15:33 |
olass |
[17:31] < olass> egonw: I'd like you to close bugs reported as resolved fixed, for example 73, 794, 795, 1064, etc |
15:33 |
egonw |
I have a very long list... |
15:33 |
olass |
egonw: they don't show up in your queries? |
15:33 |
egonw |
I'll see if I can find some time tomorrow to do some admin stuff |
15:33 |
olass |
would be appreciated, they litter my lists |
15:34 |
egonw |
I skipped Hierta, btw |
15:34 |
olass |
me too :( |
15:34 |
olass |
no time |
15:34 |
egonw |
could not find the energy to push myself once more... |
15:34 |
olass |
that's life |
15:35 |
* samuell |
has to leave for a couple of hours. bbl |
15:35 |
olass |
bye samuell |
15:35 |
samuell |
bye |
15:35 |
egonw |
bye |
15:35 |
|
samuell left #bioclipse |
15:39 |
CIA-51 |
bioclipse.cheminformatics: Egon Willighagen master * r5f0923d / (168 files in 99 dirs): Pushed in a new CDK 1.3.0.+ version plus updated JChemPaint-Primary - http://bit.ly/GMoL4 |
15:40 |
CIA-51 |
bioclipse.rdf: Egon Willighagen master * r5035b4c / plugins/net.bioclipse.rdf/src/net/bioclipse/rdf/business/IRDFManager.java : Added API for downloading RDFa - http://bit.ly/ciAxu |
15:40 |
CIA-51 |
bioclipse.rdf: Egon Willighagen master * rba4ee45 / plugins/net.bioclipse.rdf/src/net/bioclipse/rdf/business/RDFManager.java : Implemented a cheap importRDFa method, using the W3C webservice - http://bit.ly/11haz6 |
15:52 |
|
mgerlich joined #bioclipse |
16:29 |
|
stain joined #bioclipse |
16:41 |
|
edrin left #bioclipse |
19:57 |
|
samuell joined #bioclipse |
20:29 |
|
olass joined #bioclipse |