As many readers of my webpages might remember, there was a nice "blogwar" last year regarding the quality of the NMRShiftDB data, initiated by a quality check according to the usual CSEARCH criteria. I don't want to discuss whether 6,200 ppm assignment errors found and corrected within 2 hours are small or large - I simply want to know, after 1 year, which mechanisms have been built into NMRShiftDB in order to avoid such misassignments.

Therefore my question:

Which mechanisms have been implemented in NMRShiftDB during the last 12 months in order to avoid obvious assignment errors?

NMRShiftDB is an Open Source and Open Access NMR database system with 1,484 registered users (as of March 26th, 2008) - therefore every contribution to this system influences future assignments very effectively. Imagine that you are working on a specific class of compounds and base your C-NMR assignments on a wrong reference dataset from NMRShiftDB. The consequence is again a wrong assignment; now this wrong assignment goes back into NMRShiftDB, because you want to contribute to this OPEN system. What happens next: when the next assignment for a similar partial structure is done, the wrong assignment is backed by "better" statistical parameters, because it now rests on two entries instead of one! A database, as a collection of (hopefully) facts, has a high impact on upcoming science - therefore any database supplier has a high degree of responsibility for the content (s)he provides. Data curation is therefore of extremely high value, but a prerequisite of data curation is error detection - therefore I repeat my question: Which mechanisms have been implemented in NMRShiftDB during the last 12 months in order to avoid obvious assignment errors? Other systems like "chemspider" have put a lot of effort into this topic and have discussed their efforts very frequently on their weblogs.

On the NMRShiftDB homepage there is a column 'Latest Additions'. The newest entry, from March 21st, 2008, is shown in the slide below (screenshot from March 26th, 2008). The CAS Registry number of this compound is 324029-99-4; there is no literature citation given for this compound in the CAS Registry file - therefore this seems to be an unpublished result contributed by J. Beutler, according to the structural and spectral identifier (nmrshiftdb.cubic.uni-koeln.de_jbeutler_2008-03-21_03:48:07_0415).

Original data

Translating this entry into a CSEARCH-type display gives

CSEARCH original

Comparing this entry against the values predicted with the CSEARCH-NN technology gives an average deviation of 7.7 ppm per carbon. Exchanging the assignments within the following pairs: 15/16, 13/14, 5/9, 11/12 and 8/10 reduces the average deviation to 1.8 ppm per carbon. This assignment is shown in the next slide. Please note that the average deviation for the pair C5/9 is more than 20 ppm per carbon!

CSEARCH New assignment
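The kind of check described above can be sketched in a few lines of Python. This is only a minimal illustration, not the CSEARCH implementation: the function names `mean_abs_deviation` and `suggest_swaps` are hypothetical, the shift values are invented for a four-carbon toy case and are not those of the NMRShiftDB entry, and the greedy pairwise exchange is an assumption about how such a search could be organised.

```python
from itertools import combinations


def mean_abs_deviation(assigned, predicted):
    """Average absolute deviation (ppm per carbon) between two shift tables."""
    return sum(abs(assigned[c] - predicted[c]) for c in assigned) / len(assigned)


def suggest_swaps(assigned, predicted, min_gain=1.0):
    """Greedily exchange pairs of assignments while this lowers the deviation.

    min_gain is the minimum improvement (ppm, summed over both carbons)
    required before an exchange is proposed; it is an arbitrary threshold.
    """
    current = dict(assigned)
    swaps = []
    while True:
        best, best_gain = None, min_gain
        for a, b in combinations(sorted(current), 2):
            before = abs(current[a] - predicted[a]) + abs(current[b] - predicted[b])
            after = abs(current[b] - predicted[a]) + abs(current[a] - predicted[b])
            if before - after > best_gain:
                best, best_gain = (a, b), before - after
        if best is None:
            return swaps, current
        a, b = best
        current[a], current[b] = current[b], current[a]
        swaps.append(best)


# Hypothetical toy data (ppm), keyed by carbon number; NOT the real entry.
assigned = {1: 128.0, 2: 115.0, 3: 135.0, 4: 22.0}
predicted = {1: 127.0, 2: 136.0, 3: 114.0, 4: 23.0}

swaps, reassigned = suggest_swaps(assigned, predicted)
print(mean_abs_deviation(assigned, predicted))    # 11.0
print(swaps)                                      # [(2, 3)]
print(mean_abs_deviation(reassigned, predicted))  # 1.0
```

A proposal generated this way rests on shift arguments alone; it flags entries worth a second look, and the final decision belongs to the 2D-NMR data.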

Please keep in mind: the original assignment was done using HH-COSY, HMQC and HMBC; the proposed assignment is based only on shift arguments. I have never seen any 2D-NMR spectrum of this compound!

I would very much appreciate seeing the original 2D-NMR spectra somewhere on the net - let me know if my assignment proposal is wrong! Your comments on my weblog are highly appreciated.

Since March 29th, 2008, the corrected dataset has been available on NMRShiftDB. All corrections I proposed have been applied to this dataset. Please keep in mind that the original assignment was done by extensive application of 2D-NMR (HH-COSY, HMQC and HMBC) - I have never seen these spectra; my corrections are based solely on one simple prediction using CSEARCH.

Corrected entry

According to the access logs of my webserver, I am quite sure that my proposed assignment has been checked using another professional program. I am quite happy that all my proposals have been verified by this independent test.

What are the facts:

