A navigator for human genome epidemiology

W Yu, M Gwinn, M Clyne, A Yesupriya, MJ Khoury - Nature genetics, 2008 - nature.com
Recent successes in large-scale genetic association studies call for renewed attention to integrating research results, not only among studies, but across disciplines [1]. At the molecular level, genetic polymorphisms provide a starting point for investigating the functions of complex biological systems. At the population level, epidemiologists can begin to use data on genetic variation, associations and interactions to interpret population attributable fractions and estimate the potential health impact of genetically directed interventions [2]. Publicly available genetic sequence databases have demonstrated their value in accelerating the Human Genome Project and advancing the field of molecular genetics; newer efforts, such as dbGaP and CGEMS, are now beginning to make genotype-phenotype data broadly available to the scientific community [3].
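As a brief illustration of the population attributable fraction mentioned above, the following is a minimal sketch using Levin's classic formula, PAF = p(RR - 1) / (1 + p(RR - 1)), where p is the prevalence of the risk genotype and RR is the relative risk. The function name and the example values are assumptions chosen for illustration; the text does not specify which estimator or inputs are intended.

```python
def population_attributable_fraction(prevalence: float, relative_risk: float) -> float:
    """Levin's formula for the population attributable fraction.

    prevalence    -- proportion of the population carrying the risk genotype (0 to 1)
    relative_risk -- risk in carriers divided by risk in non-carriers (> 0)

    Hypothetical helper for illustration only; not taken from the source text.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Made-up inputs: 20% genotype prevalence, relative risk of 1.5.
    paf = population_attributable_fraction(0.20, 1.5)
    print(f"PAF = {paf:.3f}")
```

Under this formula, a common variant with a modest relative risk can account for a larger share of cases than a rare variant with a high relative risk, which is why both prevalence and effect size matter when estimating potential health impact.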
The published scientific literature also reflects rapid growth in studies of human genetic factors in relation to health and disease. Since 2001, the Human Genome Epidemiology Network (HuGENet) has maintained a database of published, population-based epidemiologic studies of human genes extracted from PubMed [4]. We recently replaced our PubMed search strategy with a new approach using machine learning, which has reduced manual effort and increased both the sensitivity and specificity of screening. Our curator updates the database weekly with articles newly added to PubMed and assigns each article one or more study types (for example, observational study, meta-analysis or genome-wide association study) and data categories (for example, gene-disease association, gene-environment interaction or pharmacogenomics). Each article is indexed in the database with MeSH terms (using the MeSH hierarchical structure) and gene information from the National Center for Biotechnology Information (NCBI) Entrez Gene database. As of November 2007, the database had indexed more than 30,000 articles, referencing more than 3,000 genes and nearly 2,000 disease terms (Table 1). Most articles (80%) describe genetic associations. Approximately 20% of all articles were published in 2007, including 68 of 82 genome-wide association studies.
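The indexing scheme described above can be sketched as a simple record type. This is only an illustrative model, not the database's actual schema: the class names, field names, identifiers and placeholder values below are all assumptions, and only the study types and data categories named in the text are included.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class StudyType(Enum):
    OBSERVATIONAL = "observational study"
    META_ANALYSIS = "meta-analysis"
    GWAS = "genome-wide association study"


class DataCategory(Enum):
    GENE_DISEASE = "gene-disease association"
    GENE_ENVIRONMENT = "gene-environment interaction"
    PHARMACOGENOMICS = "pharmacogenomics"


@dataclass
class IndexedArticle:
    """Hypothetical record for one curated article."""
    pmid: str                                   # PubMed identifier
    study_types: set[StudyType] = field(default_factory=set)
    data_categories: set[DataCategory] = field(default_factory=set)
    mesh_terms: list[str] = field(default_factory=list)    # disease terms
    gene_symbols: list[str] = field(default_factory=list)  # from Entrez Gene


def count_by_category(articles: list[IndexedArticle]) -> Counter:
    """Count articles per data category; an article may count in several."""
    counts: Counter = Counter()
    for article in articles:
        counts.update(article.data_categories)
    return counts


if __name__ == "__main__":
    # Placeholder records; identifiers and terms are invented for illustration.
    articles = [
        IndexedArticle("PMID-A", {StudyType.GWAS}, {DataCategory.GENE_DISEASE},
                       ["Disease X"], ["GENE1"]),
        IndexedArticle("PMID-B", {StudyType.OBSERVATIONAL},
                       {DataCategory.GENE_DISEASE, DataCategory.GENE_ENVIRONMENT},
                       ["Disease Y"], ["GENE2", "GENE3"]),
    ]
    for category, n in count_by_category(articles).items():
        print(f"{category.value}: {n}")
```

Using sets for study types and data categories reflects the statement that a curator may assign one or more of each to a single article.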