Genetic Data Products

Available Products

Genotype Data Version 1 (2006-2008 Samples)

In 2006, saliva was collected using a mouthwash collection method. In 2008, saliva was collected using the Oragene DNA collection kit (OGR-250). Saliva completion rates were 83% in 2006 and 84% in 2008.

The genotyping was performed by the NIH Center for Inherited Disease Research (CIDR, X01HG005770-01), using the Illumina Human Omni-2.5 Quad beadchip, with coverage of approximately 2.5 million single nucleotide polymorphisms (SNPs). For more information about the specific SNPs included on the Illumina Human Omni-2.5 Quad beadchip, please refer to the files linked below. Genotyping quality control (QC) was performed by the Genetics Coordinating Center at the University of Washington, Seattle, WA. A copy of the QC report is available here.

The initial data product is available through dbGaP. Specific information on the data can be found on the NCBI website.

Imputations

Current dbGaP data products also include imputation of approximately 21 million DNA variants from the 1000 Genomes Project. Imputation increases the number of available markers and makes comparisons possible across platforms that do not assay the same genome-wide SNP panel. These imputation analyses were performed and documented by the Genetics Coordinating Center at the University of Washington, Seattle, WA. A copy of the imputation report is available here.

Important Note Concerning Flipped Strand Issues

In mid-2014, we became aware of an annotation issue with the Illumina HumanOmni2.5-v1_D manifest that caused strand flip errors in the annotation for approximately 20,000 SNPs genotyped in the Health and Retirement Study 2006/2008 samples. This manifest also served as the foundation for the 1000 Genomes imputation for those samples. We systematically investigated the problem and identified two issues with the Illumina HumanOmni2.5-v1_D manifest that led to the strand flip errors. The issues are described in detail in HRS1-2_dbGaPUserInfo_v3.pdf.

To correct the flipped strand issue, users should take the following action. SNPs affected by the strand swap are flagged as problem.type2.SNP = TRUE in HRS1-2_HumanOnmi2.5v1_D_flaggedSNPs.zip. Users should switch the coded and non-coded allele annotation for the affected SNPs in the annotation files. For example, if coded_allele=A and noncoded_allele=T, then after correction coded_allele should be T and noncoded_allele should be A.
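
The allele swap described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not HRS-supplied code: it assumes the flagged-SNP list and the annotation have each been read into a list of dictionaries, and the snp_id key is a hypothetical identifier name. Only problem.type2.SNP, coded_allele, and noncoded_allele come from the text above.

```python
def flagged_snp_ids(flag_rows, id_key="snp_id"):
    """Collect the IDs of SNPs flagged with problem.type2.SNP = TRUE.

    `id_key` is a hypothetical column name; substitute the identifier
    column actually present in the flagged-SNP file.
    """
    return {
        row[id_key]
        for row in flag_rows
        if str(row["problem.type2.SNP"]).strip().upper() == "TRUE"
    }


def swap_flagged_alleles(annotation_rows, flagged_ids, id_key="snp_id"):
    """Return copies of the annotation rows, swapping the coded and
    non-coded allele for every SNP whose ID is in `flagged_ids`."""
    corrected = []
    for row in annotation_rows:
        fixed = dict(row)  # copy so the input rows are left untouched
        if row[id_key] in flagged_ids:
            fixed["coded_allele"] = row["noncoded_allele"]
            fixed["noncoded_allele"] = row["coded_allele"]
        corrected.append(fixed)
    return corrected


# Toy rows standing in for the real files (values are made up).
flags = [
    {"snp_id": "snp_a", "problem.type2.SNP": "TRUE"},
    {"snp_id": "snp_b", "problem.type2.SNP": "FALSE"},
]
annotation = [
    {"snp_id": "snp_a", "coded_allele": "A", "noncoded_allele": "T"},
    {"snp_id": "snp_b", "coded_allele": "C", "noncoded_allele": "G"},
]

corrected = swap_flagged_alleles(annotation, flagged_snp_ids(flags))
for row in corrected:
    print(row["snp_id"], row["coded_allele"], row["noncoded_allele"])
```

The functions return new rows rather than editing in place, so the original annotation remains available for comparison after the correction.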


Genotype Data Version 2 (2006-2010 Samples)

Approximately 3,000 samples, collected during the HRS 2010 field period, have been genotyped and, combined with genotyped samples from 2006-2008, are available from dbGaP as Version 2 of the HRS data. The relationship matrix estimated using the KING Robust Method for these data will also be available with this update.

The 2010 sample addition includes half of the new cohort and minority sample expansion, with the remaining half to be added with Version 3.

The genotyping was performed by the NIH Center for Inherited Disease Research (CIDR, X01HG005770-01), using the Illumina HumanOmni2.5-8v1 array. Genotyping Quality Control was performed by the Genetics Coordinating Center at the University of Washington, Seattle, WA. A copy of the QC report is available here.

Imputations

Current dbGaP data products for Version 2 also include genotype imputations from the 1000 Genomes Project. These imputation analyses were performed and documented by the Genetics Coordinating Center at the University of Washington, Seattle, WA. A copy of the imputation report for the Version 2 data is available here.


Candidate Gene and SNP Files

The purpose of the HRS Candidate Gene and SNP files is to provide data users access to carefully selected subsets of the HRS genotype data available on dbGaP. These are smaller and more manageable files designed for data users who are interested in a specific gene or SNP. Users must have dbGaP approval before requesting and gaining access to these files from HRS.

Currently, there are two sets of files available: Cognition and Behavior, and Longevity. The SNPs and genes included in each package, along with file description documents detailing each package, are provided below.

Cognition and Behavior

Longevity

Principal components for all unrelated study subjects as well as principal components calculated within ethnic-specific samples are also provided with each package. For more information on the data collection, genotyping, imputation, and annotation of these files please refer to the Candidate Gene and SNP Data Description.


Exome Data (2006-2010)

Exonic variants have been measured using the Illumina Human Exome BeadChip v1 on the approximately 15,500 samples collected from 2006 through 2010. These exome data are available through HRS and dbGaP. These exonic variants are also being measured on the 2012 samples using the Illumina HumanOmni2.5 plus Exome array. For more information on the data collection, genotyping, and annotation of these data please refer to the Quality Control Report for Exome Chip Data.


Polygenic Score Data (PGS)

Complex health outcomes and behaviors of interest to the research community are often highly polygenic, meaning they reflect the aggregate effect of many different genes, so the use of single genetic variants or candidate genes may not capture the dynamic nature of more complex phenotypes. A polygenic score (PGS) aggregates millions of individual loci across the human genome and weights them by the strength of their association to produce a single quantitative measure of genetic risk. To facilitate use of these scores for research, HRS has created a set of PGS for public distribution based on several large, replicated GWAS.
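
As a rough illustration of the aggregation described above, a polygenic score for one person can be written as a weighted sum of allele dosages. The sketch below is a generic example and does not reproduce the HRS construction pipeline. The SNP names, weights, and dosages are invented for illustration, and it assumes the weights are per-allele GWAS effect estimates and the dosages count copies (0, 1, or 2) of the same coded allele.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages for one individual.

    dosages: dict mapping SNP ID -> copies of the coded allele (0 to 2)
    weights: dict mapping SNP ID -> per-allele effect estimate from a GWAS
    Only SNPs present in both inputs contribute to the score.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Hypothetical weights and one person's hypothetical dosages.
weights = {"snp_a": 0.25, "snp_b": -0.5, "snp_c": 0.125}
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 1}

score = polygenic_score(dosages, weights)
print(score)
```

Restricting the sum to SNPs found in both inputs is one simple way to handle markers missing from either the genotype data or the GWAS results. Other choices, such as imputing missing dosages, are equally plausible.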


Telomere Data (2008)

The 2008 Telomere Data release (Final, Version 1.0) includes average telomere length data from 5,808 HRS respondents who consented and provided a saliva sample during the 2008 interview wave. Assays were performed by Telome Health (Telomere Diagnostics, http://www.telomehealth.com/). Average telomere length was assayed using quantitative PCR (qPCR) by comparing telomere sequence copy number in each respondent's sample (T) to a single-copy gene copy number (S). The resulting T/S ratio is proportional to mean telomere length. Funding was provided by the National Institute on Aging (NIH U01 AG009740 and RC4 AG039029).
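
The T/S ratio described above is a simple quotient, sketched below. This is an illustrative calculation only, with made-up copy-number values. It does not reproduce the assay laboratory's procedure, which involves steps (such as deriving copy numbers from qPCR readings) that are not covered here.

```python
def ts_ratio(telomere_copies, single_copy_gene_copies):
    """Return T/S: telomere sequence copy number (T) divided by the
    copy number of a single-copy reference gene (S)."""
    if single_copy_gene_copies <= 0:
        raise ValueError("single-copy gene copy number must be positive")
    return telomere_copies / single_copy_gene_copies


# Hypothetical copy-number estimates for one sample.
ratio = ts_ratio(telomere_copies=3.0, single_copy_gene_copies=2.0)
print(ratio)
```

Because the ratio is proportional to mean telomere length rather than equal to it, values are best compared between samples assayed the same way.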

Documentation

How to Apply

Polygenic Score Data (PGS)

Polygenic Score Data (PGS) is available as public data from the HRS Data Download System, and is not covered by the steps below.

Telomere Data

2008 Telomere Data is available as an HRS Sensitive Health Data Product, and is not covered by the steps below. See the Sensitive Health Data page for details.

Other Genetic Data

All other HRS genetic data require dbGaP Authorized Access. See the YouTube video dbGaP: Apply for Controlled Access Data for a step-by-step demonstration of how the application process works.

Genotype Data

After dbGaP Authorized Access is granted, Genotype Data Version 1 and Genotype Data Version 2 are available directly from dbGaP.

Candidate Gene, Exome, and Cross-Reference Files

After dbGaP Authorized Access is granted, Candidate Gene and SNP Files, Exome Data, and HRS-dbGaP Cross-Reference Files (for linking to HRS phenotype measures not in dbGaP) are available from the HRS download site after additional application steps:

  1. Create an account on the HRS User Registration/File Download Web site, if you don't already have one
  2. Download and complete the Genetic Data Access Use Agreement
  3. Download and complete the Genetic Data Request Form
  4. Send a signed copy of these two documents via email (PDF) to hrsdatareq@umich.edu or via surface mail to:


    Health and Retirement Study
    DUA Review Committee
    426 Thompson Street
    Ann Arbor, Michigan 48104-2321

  5. You will be notified when access to download the files has been granted

HRS Public Data

Researchers can view the HRS public data (phenotypes) at any time by registering as an HRS data user. The concordance tool is a convenient way to search for survey content by topic/keyword.

Restricted Data Linkages

Users wishing to link to HRS restricted data products must submit a restricted data application.

Training

Genomics for Social Scientists

A one-week genomic data workshop focused on providing hands-on training for researchers working at the intersection of genetics and social science research, using data from the Health and Retirement Study (HRS) as a model.

View workshop website

Six Weeks to Genomic Awareness

This free, self-paced online course is designed to provide a foundation for understanding genomic advances and identifying the relevance of genomics to public health.

View course website

Additional Information