
IRC log for #cdk, 2012-10-16


All times shown according to UTC.

Time Nick Message
05:28 sneumann joined #cdk
05:29 jbrefort joined #cdk
06:13 konditorn joined #cdk
07:11 Gpox joined #cdk
07:24 jonalv joined #cdk
07:50 konditorn_ joined #cdk
07:58 konditorn_ joined #cdk
07:58 egonw joined #cdk
08:40 konditorn joined #cdk
10:32 egonw joined #cdk
17:03 sneumann joined #cdk
17:25 birchsport joined #cdk
17:25 birchsport Is there a way with CDK to output SMARTS?
18:15 jbrefort joined #cdk
19:25 egonw hi birchsport
19:25 egonw sorry, I have been really busy these days...
19:25 egonw did you get a good answer on the ML?
19:25 egonw I was just about to close down my laptop...
19:25 birchsport yes, it was no
19:27 egonw if you end up with a substructure, the matching SMILES should be OK?
19:27 egonw that is, the branching should still be there...
19:28 birchsport Can you point to any decent example of this?  I have not been able to decipher how to go about doing this
19:28 birchsport I will keep looking though
19:28 egonw only thing I can think of right now... is that 'C' in SMILES is not entirely like
19:28 egonw 'C' in SMARTS...
19:28 egonw wrt implicit hydrogens...
19:28 egonw output a SMILES for a substructure?
19:29 egonw what is the output of your fragmentation code?
19:29 egonw a list of atom numbers, or a List<IAtomContainer> ?
19:29 egonw oh...
19:29 birchsport we have a method for fragmenting the code that generates a custom representation that has enough information to, in theory, recreate the SMARTS
19:29 birchsport it does not use CDK... it is custom C++
19:30 egonw how many fragments do you get per molecule, and do I understand correctly that you may have to merge fragments for a single mol?
19:30 birchsport well, for aniline for instance, we get 12 that are between 3 and 7 atoms in size... and the fragments are not linear, as they can branch
19:30 egonw if your C++ code generates atom numbers, that can be used as input in the CDK...
19:31 egonw and you want one SMARTS for each of those 12 substructures?
19:31 birchsport yes
19:31 egonw ah, OK...
19:31 egonw then I suggest to output the atom numbers, perhaps like this:
19:31 egonw 1,2,3,5
19:32 egonw 5,6,7
19:32 egonw 8,9,1,2,3,4,5,6
19:32 egonw that is...
19:32 egonw one fragment per line...
19:32 egonw where the numbers indicate the atom in the input file...
19:32 egonw then you can use this approach:
19:32 egonw 1. read the input file (MDL or SMILES)
19:33 egonw 2. for each atom number list line, do
19:33 egonw 2.1. create an IAtomContainer with just those selected atoms, and bonds between them
19:33 egonw 2.2. create a SMILES for that IAtomContainer
19:34 egonw that SMILES will handle the branching properly, and be a valid SMARTS query
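Steps 1 to 2.1 above can be sketched without any toolkit. The following is a minimal plain-Java illustration, not CDK code: the class name `FragmentSelection`, its methods, and the aniline atom numbering (N is atom 1, ring carbons are atoms 2 to 7) are all assumptions made for the example. It parses one atom-number line per fragment and keeps only the bonds whose two end atoms are both selected, which is what "those selected atoms, and bonds between them" amounts to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FragmentSelection {

    // Hypothetical heavy-atom bond list for aniline: N is atom 1, ring carbons are 2..7.
    static List<int[]> anilineBonds() {
        return Arrays.asList(
            new int[] {1, 2}, new int[] {2, 3}, new int[] {3, 4}, new int[] {4, 5},
            new int[] {5, 6}, new int[] {6, 7}, new int[] {7, 2});
    }

    // Parse one fragment line such as "1,2,3,5" into a set of 1-based atom numbers.
    static Set<Integer> parseLine(String line) {
        Set<Integer> atoms = new LinkedHashSet<>();
        for (String token : line.split(",")) {
            atoms.add(Integer.parseInt(token.trim()));
        }
        return atoms;
    }

    // Keep only the bonds whose two end atoms are both part of the fragment.
    static List<int[]> inducedBonds(List<int[]> bonds, Set<Integer> selected) {
        List<int[]> kept = new ArrayList<>();
        for (int[] bond : bonds) {
            if (selected.contains(bond[0]) && selected.contains(bond[1])) {
                kept.add(bond);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // One fragment per line, as suggested in the log.
        for (String line : new String[] {"1,2,3,7", "5,6,7"}) {
            Set<Integer> atoms = parseLine(line);
            List<int[]> kept = inducedBonds(anilineBonds(), atoms);
            System.out.println(line + " -> " + kept.size() + " bonds kept");
        }
    }
}
```

In a real CDK program the selected atoms and kept bonds would be copied from the parsed molecule into a new IAtomContainer, and that container would be passed to the toolkit's SMILES generator (step 2.2).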
19:34 birchsport that makes sense, but since our fragments are not linear, I have yet to see a smiles representation properly output branches like 123(4)5
19:35 birchsport where 4 and 5 branch from 3
19:35 birchsport or ccc(N)C
19:35 egonw that is why you start with the original structure as input...
19:35 egonw that has the proper connectivity...
19:35 egonw the SMILES generator will use the bonds, and knows how to handle branching
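To illustrate why the bond list is enough to recover branching, here is a small plain-Java sketch of a depth-first writer. It is not the CDK SMILES generator: the class `BranchWriter` and its methods are hypothetical, bond orders are ignored, and the fragment is assumed to be acyclic (ring closures are not handled, so a cyclic fragment would recurse forever). The example fragment is the `ccc(N)C` pattern mentioned above, with atoms 4 and 5 both attached to atom 3.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BranchWriter {

    // Build an undirected adjacency map from a list of atom-number pairs.
    static Map<Integer, List<Integer>> adjacency(int[][] bonds) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int[] b : bonds) {
            adj.computeIfAbsent(b[0], k -> new ArrayList<>()).add(b[1]);
            adj.computeIfAbsent(b[1], k -> new ArrayList<>()).add(b[0]);
        }
        return adj;
    }

    // Depth-first walk: every neighbour except the last is written as a
    // parenthesised branch; the last one continues the main chain.
    static String write(int atom, int parent,
                        Map<Integer, String> symbols,
                        Map<Integer, List<Integer>> adj) {
        StringBuilder sb = new StringBuilder(symbols.get(atom));
        List<Integer> next = new ArrayList<>();
        for (int n : adj.getOrDefault(atom, Collections.<Integer>emptyList())) {
            if (n != parent) {
                next.add(n);
            }
        }
        for (int i = 0; i < next.size(); i++) {
            String branch = write(next.get(i), atom, symbols, adj);
            if (i < next.size() - 1) {
                sb.append('(').append(branch).append(')');
            } else {
                sb.append(branch);
            }
        }
        return sb.toString();
    }

    // Hypothetical fragment: atoms 1-2-3 in a chain, atoms 4 (N) and 5 (C) both on atom 3.
    static String example() {
        Map<Integer, String> symbols = new HashMap<>();
        symbols.put(1, "c");
        symbols.put(2, "c");
        symbols.put(3, "c");
        symbols.put(4, "N");
        symbols.put(5, "C");
        int[][] bonds = {{1, 2}, {2, 3}, {3, 4}, {3, 5}};
        return write(1, -1, symbols, adjacency(bonds));
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints ccc(N)C
    }
}
```

The order of the atom numbers in a fragment line plays no role here; only the bonds taken from the original structure decide where the parentheses go.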
19:36 birchsport ok, let me try that
19:36 birchsport thanks for all the help!
19:36 egonw my atom numbering was not intended to imply linear fragments :)
19:36 egonw good luck!
19:37 egonw what I can recommend is to send an example structure (MDL molfile, or SMILES) to the list, with matching atom indices you get from your algorithm...
19:37 egonw some example data makes it a lot easier for people to write some example code
19:37 egonw OK, I am going offline now...
19:38 egonw I'll check the ML tomorrow
19:38 egonw bye
