What is the algorithm behind this structure proposal engine?



To understand the algorithm, consider a simple example: assume your spectrum consists of ONE SINGLE line - a doublet at 160.0 ppm. Which structures fulfill all constraints derived from this C-NMR spectrum?

For example:

1,2,4,5-Tetrazine          ( 161.0 ppm )
Formic acid, anhydride     ( 158.5 ppm )
1,2-diformyl hydrazine     ( 159.2 ppm )
Formyl hydrazine           ( 159.2 ppm )



Instead of using the exact shift value, the spectrum is translated into a PATTERN such as: "In the typical low-field heteroaromatic region there is a doublet; all other regions of the spectrum contain no signal." This is similar to Wolfgang Bremser's approach named SAHO-search - internally, the SAHO implementation used in CSEARCH is also used here.
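As a minimal sketch of this translation step, the following Python fragment maps a peak table to a pattern of (region, multiplicity) pairs. The region boundaries in REGION_EDGES and both function names are assumptions made for illustration only; the actual SAHO regions used by CSEARCH are not given on this page.

```python
# Hypothetical region boundaries in ppm (assumption, not the real SAHO regions).
REGION_EDGES = [0, 20, 40, 60, 80, 100, 120, 140, 150, 170, 190, 240]


def region_index(shift_ppm):
    """Return the index of the region that contains shift_ppm."""
    for i in range(len(REGION_EDGES) - 1):
        if REGION_EDGES[i] <= shift_ppm < REGION_EDGES[i + 1]:
            return i
    raise ValueError(f"shift {shift_ppm} ppm is outside the covered range")


def spectrum_pattern(peaks):
    """Translate [(shift_ppm, multiplicity), ...] into a sorted tuple of
    (region index, multiplicity) pairs. Exact shift values are discarded."""
    return tuple(sorted({(region_index(shift), mult) for shift, mult in peaks}))


# The query from the example: one single doublet at 160.0 ppm.
query = spectrum_pattern([(160.0, "D")])
```

With these assumed boundaries, the four example structures listed above (161.0, 158.5 and 159.2 ppm) give the same pattern as the query, which is exactly why they appear as proposals.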

This spectral information (the peak table) is encoded into a 15-character 'Spectral Hash Key' that describes the spectral pattern.

The spectrum of acetone ( 29 Q, 206 S ) is described like this: "There is one singlet in the typical CO region and one quartet in the region around 30 ppm; no other signals are present."
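How such a description could be reduced to a 15-character key can be sketched as follows. This is only an illustration: the SHA-256/base32 scheme, the region labels "CO" and "30ppm" and the function name spectral_hash_key are assumptions, not the encoding actually used on the server.

```python
import base64
import hashlib


def spectral_hash_key(pattern):
    """Derive a 15-character key from (region label, multiplicity) pairs.

    Assumption: SHA-256 followed by base32 is a stand-in for the real,
    undocumented key encoding.
    """
    # Sort first so that the order of the lines does not change the key.
    canonical = ";".join(f"{region}:{mult}" for region, mult in sorted(pattern))
    digest = hashlib.sha256(canonical.encode("ascii")).digest()
    return base64.b32encode(digest).decode("ascii")[:15]


# Acetone: one singlet in the CO region, one quartet around 30 ppm.
acetone_key = spectral_hash_key([("CO", "S"), ("30ppm", "Q")])
```

Any structure whose calculated spectrum leads to the same description receives the same key, regardless of the exact shift values.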

This description of the spectral pattern is in turn converted into a unique hash code defining the URL of a webpage that holds all structure proposals with an identical description of their spectra. The situation is complicated by the fact that a line cannot always be attributed to exactly one region of the spectrum - therefore, for ONE spectrum up to 1,024 alternate spectral hash keys can be constructed in this implementation. The structures are accessed via InChIKeys and the spectra via SpectralKeys, allowing direct access to the corresponding webpage hosted on 'http://nmrpredict.orc.univie.ac.at'.
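The ambiguity can be pictured as follows: if each line comes with a list of candidate regions, every combination of choices gives one alternate pattern. The sketch below enumerates these combinations and stops at the limit of 1,024 mentioned above. The data layout, the region numbers and the function name are assumptions; the real implementation may select its alternates differently.

```python
from itertools import product

MAX_ALTERNATES = 1024  # upper limit stated in the text


def alternate_patterns(candidates_per_line, limit=MAX_ALTERNATES):
    """Yield distinct patterns, one per combination of candidate regions.

    candidates_per_line: [(multiplicity, [region, ...]), ...], one entry per
    line of the spectrum (hypothetical layout).
    """
    mults = [mult for mult, _ in candidates_per_line]
    choices = [regions for _, regions in candidates_per_line]
    seen = set()
    for combo in product(*choices):
        pattern = tuple(sorted(set(zip(combo, mults))))
        if pattern in seen:
            continue
        seen.add(pattern)
        yield pattern
        if len(seen) >= limit:
            return


# Acetone, assuming the singlet lies close to a boundary between regions 9 and 10.
acetone_alternates = list(alternate_patterns([("Q", [1]), ("S", [9, 10])]))
```

With two candidate regions per line, ten ambiguous lines already give 2^10 = 1,024 combinations, so a cap of this size is reached quickly for larger molecules.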



How many structures are available?

70,000,000 structures have been taken from the PUBCHEM-Compounds and PUBCHEM-Substances files, corresponding to the downloads from November and December 2007. This set also includes the structures from Chemspider deposited at PUBCHEM.


How many C-NMR spectra have been calculated?

The number of calculated C-NMR spectra is somewhat smaller than the number of structures contained in the PUBCHEM files. Inorganics, polymers and compounds exceeding the limits of CSEARCH have been skipped automatically (e.g. more than 99 carbons, more than 63 oxygens, etc.).
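A filter of this kind can be sketched in a few lines. Only the two limits quoted above are used; the remaining CSEARCH limits are not listed on this page. Treating every carbon-free formula as inorganic is a simplification, polymers are not handled, and the function names are hypothetical.

```python
import re

# Limits quoted in the text; further CSEARCH limits are not given there.
MAX_ATOMS = {"C": 99, "O": 63}


def element_counts(formula):
    """Count atoms in a simple formula such as 'C3H6O' (no brackets, no charges)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def is_within_limits(formula):
    """Return True if a spectrum would be calculated for this formula."""
    counts = element_counts(formula)
    if counts.get("C", 0) == 0:
        return False  # no carbon at all: nothing to predict in a C-NMR spectrum
    return all(counts.get(el, 0) <= limit for el, limit in MAX_ATOMS.items())
```

For example, is_within_limits("C3H6O") accepts acetone, while a formula with 100 carbons or a carbon-free salt such as "NaCl" is skipped.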


How many structure-spectrum pairs are available?


2,963,385,376 structure-spectrum pairs are available - roughly 3 billion.


What is the typical search time for 3 billion structure-spectrum pairs?


The typical search time for 3 billion structure-spectrum pairs is below 2 seconds, in most cases below 1 second. Any redundant information (the same structure-spectrum pair found more than once) is removed automatically during the search - this costs a few milliseconds of CPU time.
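Removing such duplicates is cheap, which fits the few milliseconds mentioned above. A minimal sketch, assuming each hit is a (structure key, spectral key) tuple; the function name and the placeholder keys are made up for illustration:

```python
def remove_redundant(hits):
    """Drop repeated (structure key, spectral key) pairs, keeping the first
    occurrence and the original order of the hits."""
    seen = set()
    unique = []
    for pair in hits:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


# Placeholder keys, not real InChIKeys or SpectralKeys.
hits = [
    ("STRUCTURE-A", "KEY-1"),
    ("STRUCTURE-B", "KEY-1"),
    ("STRUCTURE-A", "KEY-1"),  # same pair found a second time
    ("STRUCTURE-A", "KEY-2"),  # same structure, different spectral key: kept
]
unique_hits = remove_redundant(hits)
```

Each hit costs one set lookup, so the effort grows only linearly with the number of hits returned by a search.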


Which technology has been used for spectrum calculation?

All spectra have been calculated using the NN-prediction engine of the CSEARCH software, enhanced by the auto-stereo recognition feature as implemented in the NMRPredict program.


What happens during a search?

Your peaklist is 'translated' into a URL pointing to the corresponding webpage on the 'NMRPredict' server. This webpage, addressed by your hash code, lists all structures from the PUBCHEM collection whose NMR spectra are 'similar' to your peaklist. These structures might give some hint about your unknown molecule. In cases where experimental data are available, the corresponding information derived from the CSEARCH, SPECINFO and NMRShiftDB collections is linked directly to the structures.
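The last step, going from a hash key to a page address, might look like the sketch below. Only the host name is taken from this page; the path prefix, the '.html' suffix and the function name are placeholders, because the real directory layout of the server is not documented here.

```python
from urllib.parse import quote, urljoin

BASE_URL = "http://nmrpredict.orc.univie.ac.at/"  # host named on this page


def result_page_url(spectral_key, path_prefix="proposals/"):
    """Build the address of the result page for one spectral hash key.

    Assumption: path_prefix and the '.html' suffix are hypothetical.
    """
    page = f"{path_prefix}{quote(spectral_key, safe='')}.html"
    return urljoin(BASE_URL, page)


# A made-up 15-character key, used only to show the shape of the result.
example_url = result_page_url("ABCDEFGHIJKLMNO")
```

Because the address follows directly from the key, a scheme like this can serve precomputed pages, so nothing has to be computed on the server at query time.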


Is there a link to experimental NMR data available?

Yes - in cases where an experimental spectrum is available, it is linked via the InChIKey. The collections indexed are CSEARCH, SPECINFO and NMRShiftDB as well as the 'University of Mainz'-collection.


How to perform a search?


Method 1:


Method 2:






Please keep in mind:







Page written by: Wolfgang.Robien(at)univie.ac.at on November 13th, 2007
Page online since: April 16th, 2008