WEB-Based CSEARCH Robot-Referee and Prediction Engine

Description of the CSEARCH Robot-Referee

What you have to do:

Draw a structure, assign chemical shift values (and optionally add unassigned lines), and send these data to a dedicated service email address.
Before first use your email address must be validated; contact me at wolfgang.robien@univie.ac.at. If you forgot to do that, you will still get the results back, but you cannot access them. Your results will be kept for 60 days, so there is plenty of time to obtain the access data afterwards.

What you get back:

An email holding your structure together with the assigned chemical shift values ready for importing into Isisdraw/Symyxdraw. This allows you to transfer your data into your manuscript e.g. written with Word using copy/paste.
An email holding a URL together with the access data, where you can retrieve your report giving you details about the evaluation, the spectrum prediction and the classification process.

Input procedure:

Draw your structure using Peter Ertl's JME-Editor; if you already have a MOLfile of your query, simply copy/paste the TEXT of the MOLfile into the window which opens when you click the link below the structure editor.
Enter your email address, the name of the project and the identifier for your compound into the 3 boxes. All reports will be sent to this email address.
Clicking „Continue“ opens the second page for entering the ASSIGNED chemical shift values. Assign as many lines as possible, even when the molecule is highly symmetric (e.g. for benzene the same shift value should be entered six times). Optionally, the experimentally determined multiplicity (e.g. from DEPT or APT) can be entered, and 'exchange flags' can be specified.
Clicking „Continue“ opens the next form, where you can optionally enter additional UNASSIGNED lines. Enter only lines belonging to your compound (please DO NOT enter lines coming from the solvent, etc.), and assign as many lines as you can. Usually this form should remain empty.
Clicking „Continue“ opens a summary showing the structure with the assigned lines at the corresponding carbon positions and a table of UNASSIGNED lines. If the input data need correcting you can step back; a detailed description of how to do that is given on this page. If you are satisfied with your input data, click the „Submit“ button, which starts the evaluation process.
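The requirement to assign every carbon can be checked before submission. The following is a minimal sketch, not part of the CSEARCH service: it assumes a V2000 MOLfile (fixed-column layout) and a dictionary mapping atom numbers to shift values; the function names carbon_indices and unassigned_carbons are hypothetical.

```python
def carbon_indices(molfile_text):
    """Return the 1-based atom numbers of all carbons in a V2000 MOLfile."""
    lines = molfile_text.splitlines()
    counts_line = lines[3]               # 4th line: counts line
    n_atoms = int(counts_line[0:3])      # columns 1-3: number of atoms
    atom_lines = lines[4:4 + n_atoms]    # atom block follows the counts line
    # The element symbol occupies columns 32-34 of each atom line.
    return [i for i, ln in enumerate(atom_lines, start=1)
            if ln[31:34].strip() == "C"]


def unassigned_carbons(molfile_text, shifts):
    """Return carbons without an assigned shift.

    `shifts` maps atom number -> chemical shift in ppm (hypothetical input
    format; the web form collects these values interactively).
    """
    return [i for i in carbon_indices(molfile_text) if i not in shifts]
```

For benzene, `shifts` would need six entries carrying the same value, one per carbon; an empty list returned by unassigned_carbons means every carbon has a value.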

Tips and Tricks:

You should immediately receive a verification email holding a MOLfile with your input data for integration into your manuscript. The email holds a short description of how to do that.
Usually the result is mailed back to you within 15 minutes.
The header line of the input mask holds a link to a 'serverstatus' file showing the date and time of the last server activity. If this time is more than 30 minutes in the past, the server might be down or very busy. Don't panic, your request isn't lost: it will be processed automatically whenever the server restarts or has finished all previous requests. Don't resend your request! Your request has been received on a highly redundant computer system and will be kept there even when the server is down.

Here you can recall the statusfile.
If you would like to get only a spectrum prediction, then enter ONLY ONE ASSIGNED chemical shift value into the FIRST BOX – use a value of 399.0; any other value will produce a complete report with a lot of warnings !
For redundancy reasons the server can be accessed via:

http://nmrpredict.orc.univie.ac.at/c13robot/robot.php

http://synthon.pch.univie.ac.at/csearchlite/robot.php

********* Keep in mind *********

Your request will be sent via email over the Internet without any encryption.

Your request will be processed on one of my computers and stored there.

In any case your data will be kept confidential, but I am able to see them. Furthermore in case of a strange result I will use your example for my internal development work in order to optimize the server.

If you have a problem with these facts, I offer a simple solution: Don't use this server !

Understanding the Report

The color coding scheme on structures is similar to a „traffic light“

GREEN: Good, well assigned information
YELLOW: A problem might be present
RED: Here is something wrong or at least there is a high probability for an error

Structure Proposal: Here the structure proposal as given by the author(s) is displayed – this is done for checking the input data.

Numbering Scheme derived from the drawing sequence used during the calculation: The structure is shown with attached numbers to the carbon atoms used in the subsequent calculations and tables.

The marked carbons have been fully assigned: Here the structure proposal is shown with all carbons highlighted in GREEN where the authors have assigned a chemical shift value; we use GREEN here, because this 1:1-correlation is necessary to utilize the dataset later on for spectrum prediction. The fully assigned lines are summarized below the structure.

The marked carbons have exchangeable assigned lines: Here the structure proposal is shown with all carbons highlighted in YELLOW where the authors have assigned lines, but these lines are grouped together into 'exchange groups' – therefore we use YELLOW here. The exchangeable assigned lines are summarized below the structure.

The marked carbons have no lines assigned: Here the chemical shift values are either missing or only given without any assignment, sometimes even without any multiplicity information. This should be avoided, because this makes a NMR-spectrum useless for storing in a database system – for this reason the corresponding carbons are marked with RED color. A list of unassigned chemical shift values is given below the structure.

Definition of stereocenters: This display highlights all stereocenters available in the molecule (only on chiral carbons). A RED marker shows a center with a missing up (or down) bond or an otherwise improperly defined chiral center.

Matching map of predicted versus experimental data: Here the structure is shown with highlighted carbon positions. A GREEN marker indicates good agreement between the predicted and the experimentally determined chemical shift value. A YELLOW marker is given when the predicted chemical shift deviates noticeably, but usually by less than 10 ppm. A RED marker shows a large deviation (usually more than 10 ppm) between prediction and experiment. A LARGE CONNECTED part of the structure marked in RED is a strong hint of a wrong structure proposal. The above-mentioned deviation of 10 ppm used for separating YELLOW and RED markers may vary according to the elements present in your query structure (for organometallic compounds it is increased) and also depends on the underlying parameters describing the quality of the spectrum prediction itself. A deviation of e.g. 8 ppm between experimental and predicted value at the 5-shell level has to be taken more seriously than a 10 ppm deviation at the 3-shell level. As a rule, a YELLOW box should be checked in any case, and a RED box is a very strong hint of an assignment error and/or an error in your structure proposal.
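The traffic-light logic can be illustrated with a minimal sketch. Only the 10 ppm YELLOW/RED boundary is taken from the description above; the GREEN/YELLOW boundary (yellow_from) is a hypothetical value chosen for illustration, and the real server adjusts its limits depending on the elements present and on the prediction quality.

```python
def classify_deviation(experimental, predicted, yellow_from=3.0, red_from=10.0):
    """Map the deviation between two shift values (ppm) to a marker color.

    yellow_from is an ASSUMED threshold (not documented by the service);
    red_from reflects the 'usually more than 10 ppm' rule of thumb.
    """
    deviation = abs(experimental - predicted)
    if deviation > red_from:
        return "RED"
    if deviation >= yellow_from:
        return "YELLOW"
    return "GREEN"
```

Passing a larger red_from mimics the relaxed limit described for organometallic compounds.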

Differences between predicted and experimental data in ppm: These differences are inserted into the structure display; for the sake of clarity any difference below 1.0 ppm is omitted.

Visualization of increments: The differences between predicted and experimental data are visualized as two spectra with connections between corresponding lines. This is an excellent way to detect interchanged signal assignments.

Experimental shift values as given by authors: In this picture ALL lines are given, independent from any assignment. This picture simply visualizes the input of chemical shift values. The multiplicity information is coded into the color of the line.

Singlet - Doublet - Triplet - Quartet - Odd (S or T) - Even (D or Q) - Unknown

Assigned spectrum as given by the author(s): Here only lines assigned by the author are given – the intention is definitely to show some authors that the chemical community deserves only well-assigned spectra.

Best predicted Spectrum: This image shows the 'best predicted spectrum' – in this case the multiplicities as derived from the assignment are used. The multiplicities are coded into the color of the line.

Singlet - Doublet - Triplet - Quartet

Overall impression on compatibility of multiplicity from structure and experiment: The input data are checked for consistency. The multiplicities calculated from the structure are compared against the multiplicities supplied by the author, any inconsistency is highlighted in the table above, the overall impression with respect to this evaluation is summarized in the quality bar below.

Quality of the Spectrum Prediction: Here the spectrum prediction itself has been evaluated. Why? For ranking it is important to know to what extent the prediction was successful. If reference data are missing in the underlying CSEARCH system, larger deviations have to be accepted to arrive at a fair ranking. If this parameter is not taken into account, every structure with functional groups that are uncommon in the reference database will be ranked lower. The consequence is a lower classification (e.g. 'Major Revision' instead of 'Minor Revision').

Preferred Chemical Shift Values from both predictions: In this picture the structure is given with attached chemical shift values, which have been selected as 'best choice' from both prediction techniques. The underlying color reflects the quality of the prediction.

Contribution of the Methods: The carbons in the structure are highlighted according to the source of the 'best predicted shift value'. A blue circle shows that the best predicted shift value comes from the HOSE-code method; a violet circle tells us that the network value has been preferred. A red circle shows that both methods were unable to produce a prediction.

Similarity between predicted and experimental data based on positions: It is necessary to evaluate the resulting signal assignment in terms of 2 parameters; the first one is used here. Assume a situation where you have an average deviation of 10 ppm per carbon, which is extremely high, but this deviation is concentrated on ONE SINGLE POSITION within the structure. In such a case a typing error is highly probable (e.g. 151.1 instead of 51.1 ppm). When the average deviation is distributed over many carbons (in some cases 'connected carbons'), there is a high probability of an error in the structure proposal itself.

Similarity between predicted and experimental data is X.Y ppm: It is necessary to evaluate the resulting signal assignment in terms of 2 parameters; the second one is used here. Assume a situation where you have an average deviation of 10 ppm per carbon, which is extremely high, but this deviation is concentrated on ONE SINGLE POSITION within the structure. In such a case a typing error is highly probable (e.g. 151.1 instead of 51.1 ppm). When the average deviation is distributed over many carbons (in some cases 'connected carbons'), there is a high probability of an error in the structure proposal itself.
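The distinction between a single typing error and a widely distributed deviation can be made concrete with a small sketch. It is an illustration of the reasoning above, not the server's actual scoring; the function name deviation_summary and the dictionary input format are assumptions.

```python
def deviation_summary(experimental, predicted):
    """Summarize per-carbon deviations between two shift assignments.

    Both arguments map carbon number -> shift in ppm (hypothetical format).
    Returns (mean deviation, carbon with largest deviation, share of the
    total deviation contributed by that single carbon).
    """
    deviations = {pos: abs(experimental[pos] - predicted[pos])
                  for pos in experimental}
    total = sum(deviations.values())
    mean = total / len(deviations)
    worst = max(deviations, key=deviations.get)
    share = deviations[worst] / total if total else 0.0
    return mean, worst, share
```

For a ten-carbon molecule in which one value was typed as 151.1 instead of 51.1 ppm and all other carbons match, the mean deviation is about 10 ppm while the share of the worst carbon is close to 1, which points to a typo. A similar mean with a small share would instead suggest a problem with the structure proposal.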

Eventually Symmetry Error: Same environment - Different shiftvalue: This warning tells you that you have different chemical shift values on identical functional groups. The underlying algorithm does not take stereochemistry into account; therefore you will also get warnings when the values are correct.

Eventually Symmetry Error: Same shiftvalue – Different environment: This warning tells you that you have identical chemical shift values on different functional groups; this might be a hint of a typo in your peak list.

Eventually Symmetry Error: Same environment - shiftvalue missing, but known: This warning tells you that symmetrical atoms have assigned values while no value is given for the marked atom. This is a strong hint that you were lazy during data input! The consequence is an unnecessarily low ranking, because missing lines are weighted heavily and contribute significantly to classifications like 'Major revision' or 'Reject'.
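The three symmetry warnings can be sketched as follows. This is a simplified illustration under stated assumptions: each carbon carries an opaque environment label (how the server actually describes environments is not specified here), shifts are compared for exact equality, and the function name symmetry_warnings is hypothetical.

```python
from collections import defaultdict


def symmetry_warnings(environments, shifts):
    """Return (warning text, affected carbons) tuples.

    environments: carbon number -> environment label (ASSUMED descriptor)
    shifts:       carbon number -> assigned shift in ppm (may omit carbons)
    """
    warnings = []

    # Group carbons by environment label.
    by_env = defaultdict(list)
    for atom, env in environments.items():
        by_env[env].append(atom)

    for atoms in by_env.values():
        known = {shifts[a] for a in atoms if a in shifts}
        missing = sorted(a for a in atoms if a not in shifts)
        if len(known) > 1:
            warnings.append(("same environment, different shiftvalue",
                             sorted(atoms)))
        if known and missing:
            warnings.append(("same environment, shiftvalue missing but known",
                             missing))

    # Group environments by shift value.
    envs_by_shift = defaultdict(set)
    for atom, value in shifts.items():
        envs_by_shift[value].add(environments[atom])
    for value, envs in envs_by_shift.items():
        if len(envs) > 1:
            atoms = sorted(a for a, v in shifts.items() if v == value)
            warnings.append(("same shiftvalue, different environment", atoms))

    return warnings
```

Because the labels ignore stereochemistry in this sketch as well, diastereotopic carbons with genuinely different shifts would still trigger the first warning, matching the caveat above.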

The quality bar:

The quality bar shows the quality of the evaluation graphically; the range is labelled from „Poor“ to „Good“. It should be stated that this bar really covers the range from „Horrible“ to „Excellent“; these two words reflect the situation with real-world NMR data much better than the pair „Poor/Good“.

Search the Internet using 2D-/3D-InChIKeys: The InChIKey is a structure descriptor well suited for text-based search engines. Clicking the descriptor launches a Google search using either the 2D-InChIKey (connectivity) or the 3D-InChIKey (connectivity, stereochemistry, mobile hydrogens), retrieving all webpages indexed by Google that hold information on the identical structure.
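A minimal sketch of how such search links could be built is given below. It assumes that the 2D key corresponds to the first 14-character block of a standard InChIKey (the block that hashes the connectivity) and that the 3D key is the full key; the exact query the report generates is not documented here, and the function name is hypothetical.

```python
from urllib.parse import quote_plus


def inchikey_search_urls(inchikey):
    """Build Google search URLs for the connectivity block and the full key."""
    connectivity_block = inchikey.split("-")[0]   # first 14 characters
    base = "https://www.google.com/search?q="
    return {
        "2D": base + quote_plus(connectivity_block),
        "3D": base + quote_plus(inchikey),
    }


# Usage with the standard InChIKey of benzene:
urls = inchikey_search_urls("UHOVQNZJYSORNB-UHFFFAOYSA-N")
```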

Identical structure search: All accessed CSEARCH databases are searched for the query structure; the structure comparison is based on a 2D model neglecting stereochemistry. If an identical structure is found, the structure and the spectrum are shown together with the literature citation. The literature citation is linked via the DOI (Digital Object Identifier) to the article on the publisher's webpage. Access to the literature depends on the license of your organization; abstracts are usually available for free.

Identical spectrum search: The query spectrum is used to retrieve identical spectra from all accessed CSEARCH-databases – the spectrum comparison is only based on chemical shift values, the multiplicity information is ignored. You won't believe how many people sell the same spectrum twice without stating that this is a structure revision. Only entries having a different structure than the query-structure are shown.
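The idea of a shift-only spectrum comparison can be sketched as follows. This is an illustration, not the CSEARCH implementation: the tolerance value, the structure identifiers and both function names are assumptions, and a real database search would be considerably more elaborate.

```python
def spectra_identical(query, reference, tolerance=0.1):
    """Compare two peak lists (ppm) by shift values only.

    Multiplicities are deliberately ignored; tolerance is an ASSUMED value.
    """
    if len(query) != len(reference):
        return False
    return all(abs(q - r) <= tolerance
               for q, r in zip(sorted(query), sorted(reference)))


def suspicious_hits(query_shifts, query_structure, entries):
    """Return entries with an identical spectrum but a different structure.

    entries: iterable of (structure identifier, peak list) tuples
    (hypothetical record layout).
    """
    return [(structure, shifts) for structure, shifts in entries
            if structure != query_structure
            and spectra_identical(query_shifts, shifts)]
```

Filtering out entries whose structure matches the query mirrors the rule above that only hits with a different structure are reported.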

Page written by: Wolfgang.Robien(at)univie.ac.at

Last update: 25-Feb-2010