IRC log for #cdk, 2010-06-01

All times shown according to UTC.

Time Nick Message
04:46 CIA-47 joined #cdk
05:12 sneumann_ joined #cdk
06:16 bag_ joined #cdk
06:27 sneumann_ joined #cdk
06:57 Gpox joined #cdk
06:58 sneumann joined #cdk
08:53 egonw joined #cdk
09:10 jbrefort joined #cdk
09:16 mgerlich Hi, could someone help me with a mol file problem?
09:17 mgerlich the following mol file raises a NumberFormatException when I try to read it with MDLV2000Reader
09:17 mgerlich http://www.massbank.jp/jsp/Dispatcher.jsp?type=mol&query=2ALPHA-ALLYL-2BETA-FORMYL-11ALPHA-METHOXY-6,6,8BETA-TRIMETHYL-5ALPHA-TRICYCLO%286.3.0.0%281,5%29%29UNDECANE&qtype=n&otype=m&site=9
09:17 zarah mgerlich's link is also http://tinyurl.com/3afhmg5
09:18 mgerlich this is my code to read the file (after it's stored on the local filesystem) and create an AtomContainer for it:
09:18 mgerlich FileInputStream fis = null;
09:18 mgerlich try {
09:18 mgerlich fis = new FileInputStream(f);
09:18 mgerlich } catch (FileNotFoundException e1) {
09:18 mgerlich System.err.println("File not found - " + f);
09:18 mgerlich return null;
09:18 mgerlich }
09:18 mgerlich InputStream is = fis;
09:18 mgerlich MDLV2000Reader reader = new MDLV2000Reader(is);
09:18 mgerlich IChemFile chemFile = new ChemFile();
09:18 mgerlich IAtomContainer container = null;
09:18 mgerlich try {
09:18 mgerlich chemFile = (IChemFile) reader.read(chemFile);
09:18 mgerlich container = ChemFileManipulator.getAllAtomContainers(chemFile).get(0);
09:18 mgerlich // remove hydrogens
09:19 mgerlich /container = AtomContainerManipulator.removeHydrogens(container);
09:19 mgerlich return container;
09:19 mgerlich } catch (java.lang.NumberFormatException e) {
09:19 mgerlich System.err.println("NumberFormatException occurred while parsing mol file - " + f);
09:19 mgerlich return null;
09:19 mgerlich } catch (CDKException e) {
09:19 mgerlich System.err.println("CDKException occurred for mol file - " + f);
09:19 mgerlich return null;
09:19 mgerlich }
09:19 mgerlich it seems that the NumberFormatException is thrown during the reader.read and I'm unable to catch it with the surrounding clause - any hints for that?
09:33 egonw that would be weird
09:39 sneumann mgerlich is just out for lunch
10:03 mgerlich hi i'm back
10:35 egonw joined #cdk
12:07 egonw joined #cdk
12:16 egonw mgerlich: I'm looking at the file now...
12:22 mgerlich hi egonw - thanks for the effort
12:37 jbrefort joined #cdk
13:14 carsten joined #cdk
13:59 egonw mgerlich: sorry... being very multitasking right now...
13:59 egonw will get back on it asap
14:00 mgerlich no problem :)
14:21 mgerlich @egonw: a test to convert this mol file into smiles via the babel tool also failed -> *** Open Babel Warning  in ReadMolecule
14:21 mgerlich WARNING: Problems reading a MDL file
14:21 mgerlich Invalid bond specification, atom numbers or bond order are wrong.
14:22 egonw checking
14:22 egonw that bit seems ok
14:22 egonw 22 atoms
14:22 egonw but...
14:22 egonw the header does have too many lines
14:23 egonw the " 22 24" line is supposed to be the fourth one... not the fifth
14:23 egonw try removing the first two lines:
14:23 egonw 1
14:23 egonw 51
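[Editor's note] egonw's point is that a V2000 molfile has a three-line header (title, program/timestamp, comment), so the counts line must be line 4. A minimal sketch of that check in plain Java follows, assuming the counts line can be recognised by its first two right-aligned, 3-character integer fields (atom and bond counts). The class and method names are hypothetical, and the detection is a heuristic, not what the CDK reader does.

```java
final class MolHeaderCheck {

    /** True if a 3-character field is a right-aligned non-negative integer, e.g. " 22". */
    static boolean isIntField(String field) {
        return field.matches(" {0,2}[0-9]{1,3}");
    }

    /**
     * Returns the 1-based number of the first line that looks like a counts line,
     * or -1 if no such line is found. Heuristic: a title line that happens to
     * start with six digits would be misdetected.
     */
    static int countsLineNumber(String mol) {
        String[] lines = mol.split("\\r?\\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (line.length() < 6) {
                continue; // too short to hold the atom and bond count fields
            }
            if (isIntField(line.substring(0, 3)) && isIntField(line.substring(3, 6))) {
                return i + 1;
            }
        }
        return -1;
    }

    /** Number of lines in front of the expected three-line header (0 for a well-formed file). */
    static int extraHeaderLines(String mol) {
        int n = countsLineNumber(mol);
        if (n < 0) {
            throw new IllegalArgumentException("no counts line found");
        }
        return n - 4;
    }

    public static void main(String[] args) {
        // Made-up input: two stray numeric lines in front of an otherwise normal header.
        String mol = "1\n51\nname\n  prog\n\n 22 24  0  0  0  0  0  0  0  0999 V2000\n";
        System.out.println("counts line at: " + countsLineNumber(mol));
        System.out.println("extra header lines: " + extraHeaderLines(mol));
    }
}
```

A positive result from extraHeaderLines suggests stray lines before the header; a negative one suggests that header lines are missing.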
14:26 mgerlich I tried removing these, leaving the third line as the first one - but this also raised the error
14:27 mgerlich assuming the complete mol-content is held inside the string mol: Pattern p = Pattern.compile("[0-9]+\n[0-9]+\n");
14:27 mgerlich mol = p.matcher(mol).replaceFirst("").trim();
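[Editor's note] Two things in the snippet above may bite, though the log does not confirm either as the cause of the remaining error: the pattern is not anchored, so replaceFirst can match two numeric lines anywhere in the file, and trim() also removes a blank title line, which would shift the counts line again. A hedged alternative is sketched below, assuming the stray prefix is exactly two digit-only lines at the very start of the string. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class MolPrefixStripper {

    // \A anchors the match to the very start of the input;
    // \r?\n accepts both Unix and Windows line endings.
    private static final Pattern EXTRA_PREFIX =
            Pattern.compile("\\A[0-9]+\\r?\\n[0-9]+\\r?\\n");

    /**
     * Removes two leading digit-only lines if present. The rest of the string is
     * returned untouched (no trim), so a blank title line in the header survives.
     */
    static String stripExtraLines(String mol) {
        return EXTRA_PREFIX.matcher(mol).replaceFirst("");
    }

    public static void main(String[] args) {
        // Made-up input with a blank title line after the two stray lines.
        String mol = "1\n51\n\n  prog\n\n 22 24  0\n";
        String cleaned = stripExtraLines(mol);
        System.out.println(cleaned.startsWith("\n  prog"));
    }
}
```

Leaving out trim() is deliberate: in a molfile the position of each header line matters, so whitespace-only lines are kept as they are.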
14:29 egonw I'll write up a blog post on how to use the CDK for MDL molfile validation...
14:43 egonw mmm... I already did: http://chem-bla-ics.blogspot.com/search?q=IChemObjectReaderErrorHandler
14:43 zarah egonw's link is also http://tinyurl.com/2f92jw9
14:51 egonw mgerlich: http://gist.github.com/421022
14:52 egonw mgerlich: I think it is safe to say the MassBank file is broken
14:54 mgerlich ah - always had a feeling like this ^^
14:54 egonw after removing the two first files, I do not see further problems...
14:54 egonw perhaps line length... not sure the CDK MDLV2000Reader checks that right now...
14:55 egonw seems not
14:55 mgerlich you mean the first two lines?
14:56 egonw yes :)
14:57 mgerlich ^^
16:52 jbrefort joined #cdk
17:34 sneumann joined #cdk
20:50 bag_ joined #cdk
23:17 CIA-47 cdk: Mark Rynbeek master * red510bf / src/main/org/openscience/cdk/io/RGroupQueryWriter.java :
23:17 CIA-47 cdk: Fix for character spacing for "APO" line in RGFile output
23:17 CIA-47 cdk: Signed-off-by: Rajarshi Guha <rajarshi.guha@gmail.com> - http://bit.ly/cJgb0X