SPLASH, a hashed identifier for mass spectra

G Wohlgemuth, SS Mehta, RF Mejia, S Neumann… - Nature …, 2016 - nature.com
Nature Biotechnology, 2016
To the Editor: Over the past few years, as the use of mass spectrometry (MS) has increased, multiple spectral libraries, databases and software frameworks have been created to enable sharing and searching of MS data. However, finding all the spectra that correspond to a specific compound across different databases continues to be a challenge. A spectral identifier that improves the exchange of mass spectra, as well as provenance and duplicate detection, would address these issues and enhance searchability. MassBank [1] (http://www.massbank.jp and http://massbank.eu/MassBank/) has been the source of data for other open libraries, such as the Global Natural Products Social Molecular Networking [2] (GNPS) and Human Metabolome Database [3] (HMDB) libraries as well as the MetaboLights reference layer [4]. In turn, HMDB and community-contributed spectra from GNPS have also been imported into MassBank of North America (MoNA; http://mona.fiehnlab.ucdavis.edu/), while GNPS searches public MS data against the above-mentioned libraries as well as the National Institute of Standards and Technology (NIST) spectral library [5]. The mzCloud (https://www.mzcloud.org/) library contains some spectra generated from the same raw data that were used to create MassBank records. As these examples show, the complexity and the cross-import of data are increasing, together with the number of mass spectra, such that these different resources can now contain identical, or near-identical, spectra under different accession numbers. For example, the library entries PR100026 (MassBank, MoNA), 5464 (HMDB) and CCMSLIB00000222858 (GNPS) all refer to exactly the same mass spectrum of caffeine, originally sourced from MassBank. As the different libraries focus on different compound domains [7], users wishing to access mass spectra from all compounds