The NHGRI-EBI GWAS Catalog has a new home at EMBL-EBI.
- Built on a new platform, the GWAS Catalog helps biologists explore the genomics of disease and human complexity.
- The re-launched service offers comprehensive, ontology-driven search functionality for better discoverability.
- The Catalog will now benefit from EMBL-EBI’s flexible, sustainable technical and scientific infrastructure.
The GWAS Catalog, originally developed by the National Human Genome Research Institute (NHGRI) in the US, has moved to EMBL-EBI in the UK and will be jointly curated by the two institutes. Created in 2008 to help researchers explore data from genome-wide association studies (GWAS), the Catalog has been redeployed on an ontology-driven platform supported by EMBL-EBI’s flexible technical and scientific infrastructure.
The GWAS Catalog is a rich resource of information about associations between DNA sequences and traits such as disease. Co-developed by NHGRI and EMBL-EBI since 2010, it is a quality-controlled, manually curated, literature-derived collection of close to 2,000 GWA studies that have each assayed at least 100,000 SNPs, together with all statistically robust SNP-trait associations. The search capability has been improved so that users no longer need to download the whole database, and can select as few or as many studies as they need.
The rebuilt NHGRI-EBI GWAS Catalog better serves the needs of biologists who may not be experts in bioinformatics. Its comprehensive search functionality, made possible by an underlying ontology and improved data modelling, makes it easier for biologists to find useful information about their trait of interest, even in GWAS with complex study designs. While one-to-one associations between SNPs and traits might seem simple to display, it is often the case that a trait will only come about given a certain combination of SNPs – and that is much more difficult to represent. The GWAS Catalog is being developed to offer a more multi-layered view of complex data so that it can display these combinations of SNPs that may lead to a disease.
“We’ve given the Catalog a more flexible infrastructure, one that is designed to represent what happens in biology and how studies are carried out. The improved modelling of the Catalog’s multi-dimensional information has increased the accuracy of the data representation, particularly for the more complex studies,” says Jackie MacArthur, GWAS Catalog Project Manager at EMBL-EBI.
The high-quality information represented in the Catalog is curated (and double-checked) both at EMBL-EBI and the NHGRI, and the project is developing tools to ensure these activities can be extended as the resource grows to meet the increase in eligible studies. In addition to new studies being added, the GWAS Catalog may extend in future to include targeted or exome arrays.
The GWAS Catalog can now be searched by many more indexed fields, including reported trait, synonyms, sample descriptions, genes and SNPs. This has been made possible by the BBSRC-funded BioSolr project, which is developing extensions to the search technology used in this and other EMBL-EBI resources. Integration with other public data resources such as Ensembl is an important goal for the project, as this will make it easier for researchers to explore their gene of interest in its genomic context.
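To give a rough sense of what searching across several indexed fields means, here is a minimal sketch in Python. It is not the Catalog's implementation and does not use Solr or BioSolr: the `Association` class, its field names, the `search` function and the placeholder records are all assumptions made up for illustration. Each record can hold more than one SNP, echoing the point that a trait may depend on a combination of SNPs.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

# Hypothetical field names, loosely mirroring the fields named in the article.
DEFAULT_FIELDS = ("reported_trait", "synonyms", "sample_description", "genes", "snps")


@dataclass
class Association:
    """One illustrative record; this is not the Catalog's real schema."""
    reported_trait: str
    synonyms: List[str] = field(default_factory=list)
    sample_description: str = ""
    genes: List[str] = field(default_factory=list)
    snps: List[str] = field(default_factory=list)  # may hold several SNPs


def field_values(record: Association, name: str) -> List[str]:
    """Return the contents of one field as a list of strings."""
    value = getattr(record, name)
    return value if isinstance(value, list) else [value]


def search(records: Sequence[Association], term: str,
           fields: Sequence[str] = DEFAULT_FIELDS) -> List[Association]:
    """Return records where `term` occurs, ignoring case, in any chosen field."""
    needle = term.lower()
    return [
        record for record in records
        if any(needle in value.lower()
               for name in fields
               for value in field_values(record, name))
    ]


# Placeholder records invented for this example; they are not Catalog data.
RECORDS = [
    Association("trait A", ["alias of A"], "example cohort one",
                ["GENE1"], ["rs0000001"]),
    Association("trait B", [], "example cohort two",
                ["GENE2", "GENE3"], ["rs0000002", "rs0000003"]),
]

if __name__ == "__main__":
    # A synonym finds the record even though the reported trait differs.
    print([r.reported_trait for r in search(RECORDS, "alias of a")])
    # Restricting the search to a single field.
    print([r.reported_trait for r in search(RECORDS, "rs0000003", fields=("snps",))])
```

A production search engine would tokenise and rank results rather than use plain substring matching; the sketch only shows how one query can be checked against several fields at once, or narrowed to a single one.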
“We are excited that the longstanding and productive collaboration between NHGRI and EMBL-EBI has resulted in a more robust infrastructure, which will provide improved access, quality and breadth of data for Catalog users,” says Lucia Hindorff, program director in the Division of Genomic Medicine at the NHGRI.
“We look forward to building additional data connections between Ensembl and this excellent resource. Having the GWAS Catalog at EMBL-EBI will enable better integration with our resources such that more complex biological questions can be investigated using these data,” says Fiona Cunningham, Variation Annotation Coordinator at EMBL-EBI.
Delivery and development of the Catalog are partly funded by NHGRI grant number 1U41HG007823-01 to Helen Parkinson and Paul Flicek.
BioSolr, a project addressing the challenges of making biomedical data easily accessible using the world-leading Apache Solr search-engine framework, is funded in part by BBSRC grant BB/M013146/1 to Sameer Velankar, Gerard Kleywegt and Helen Parkinson of EMBL-EBI, in collaboration with Flax, the Open Source Search Specialists.
This post was originally published on EMBL-EBI News.