Genetic Data Products

Available Products

Genotype Data Version 1 (2006-2008 Samples)

In 2006, saliva was collected using a mouthwash collection method. In 2008, saliva was collected using the Oragene DNA collection kit (OGR-250). Saliva completion rates were 83% in 2006 and 84% in 2008. The genotyping was performed by the NIH Center for Inherited Disease Research (CIDR, X01HG005770-01), using the Illumina Human Omni-2.5 Quad beadchip, with coverage of approximately 2.5 million single nucleotide polymorphisms (SNPs). For more information about the specific SNPs included on the Illumina Human Omni-2.5 Quad beadchip, please refer to/download the files in the links below. Genotyping Quality Control was performed by the Genetics Coordinating Center at the University of Washington, Seattle, WA. A copy of the QC report is available here. The initial data product is available through dbGaP. Specific information on the data can be found on the NCBI website. HRS-dbGaP Cross-Reference File v1 (2006-2008), available from HRS, is required to link HHID/PN (the unique HRS identifier) to the identifier assigned to HRS genetic data stored in dbGaP, Genotype Data Version 1. See below for access details.

Imputations

Current dbGaP data products also include imputation of approximately 21 million DNA variants from the 1000 Genomes Project. Imputation increases the number of available markers and makes possible comparisons across platforms that do not assay the same genome-wide SNP panel. These imputation analyses were performed and documented by the Genetics Coordinating Center at the University of Washington, Seattle, WA. A copy of the imputation report is available here.

Important Note Concerning Flipped Strand Issues

In mid-2014, we became aware of an annotation issue with the Illumina HumanOmni2.5-v1_D manifest that caused strand flip errors in the annotation for approximately 20,000 SNPs genotyped in the Health and Retirement Study 2006/2008 samples. This manifest also served as the foundation for the 1000 Genomes imputation for those samples. We systematically investigated the problem and identified two issues with the Illumina HumanOmni2.5-v1_D manifest that led to the strand flip errors. The issues are described in detail in HRS1-2_dbGaPUserInfo_v3.pdf. To correct the flipped strand issue, users should take the following action. SNPs that are affected by the strand swap are flagged as problem.type2.SNP = TRUE in HRS1-2_HumanOnmi2.5v1_D_flaggedSNPs.zip. Users should switch the coded and non-coded allele annotation for the affected SNPs in the annotation files. For example, if coded_allele=A and noncoded_allele=T, then coded_allele should become T and noncoded_allele should become A.
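The allele swap described above can be sketched in a few lines of Python. This is only an illustration on toy data: the `coded_allele`, `noncoded_allele`, and `problem.type2.SNP` names come from the text, while the `snp_id` column, the example SNP names, and the in-memory list-of-dicts layout are assumptions. The actual annotation and flag files may be organized differently.

```python
def fix_strand_flips(annotation_rows, flagged_snp_ids):
    """Return a copy of the annotation with alleles swapped for flagged SNPs."""
    fixed = []
    for row in annotation_rows:
        row = dict(row)  # copy so the caller's data is left untouched
        if row["snp_id"] in flagged_snp_ids:
            # Exchange the coded and non-coded allele labels.
            row["coded_allele"], row["noncoded_allele"] = (
                row["noncoded_allele"],
                row["coded_allele"],
            )
        fixed.append(row)
    return fixed


# Toy annotation; "snp_id" is a hypothetical column name.
annotation = [
    {"snp_id": "snp_example_1", "coded_allele": "A", "noncoded_allele": "T"},
    {"snp_id": "snp_example_2", "coded_allele": "C", "noncoded_allele": "G"},
]

# Toy stand-in for the flag table (assumed layout).
flag_table = [
    {"snp_id": "snp_example_1", "problem.type2.SNP": "TRUE"},
    {"snp_id": "snp_example_2", "problem.type2.SNP": "FALSE"},
]

flagged = {r["snp_id"] for r in flag_table if r["problem.type2.SNP"] == "TRUE"}
corrected = fix_strand_flips(annotation, flagged)
```

Only the flagged SNP has its alleles exchanged; unflagged rows pass through unchanged, and the original `annotation` list is not modified.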

Access Information

Genotype Data Versions 1 and 2 are available directly from dbGaP and require dbGaP Authorized Access. See the YouTube video dbGaP: Apply for Controlled Access Data for a step-by-step demonstration of how the application process works.

HRS-dbGaP Cross-Reference File v1 is required to link HHID/PN (the unique HRS identifier) to the identifier assigned to HRS genetic data stored in the dbGaP system. Submit the Genetic Data Cross-Reference Request Form to apply for access to the cross-reference file after you receive your dbGaP Authorized Access. A completed and signed Genetic Data Access Use Agreement is also required.
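As a rough illustration of what the cross-reference file is used for, the sketch below attaches HHID and PN to genetic records through a shared identifier. The `genetic_id` column name, the example values, and the row layout are hypothetical; the real cross-reference file defines its own variable names and format.

```python
def link_to_survey(xref_rows, genetic_rows):
    """Attach HHID/PN to each genetic record that has a cross-reference entry."""
    # Map the genetic-data identifier to the HRS identifier pair.
    by_genetic_id = {r["genetic_id"]: (r["HHID"], r["PN"]) for r in xref_rows}
    linked = []
    for row in genetic_rows:
        key = by_genetic_id.get(row["genetic_id"])
        if key is None:
            continue  # no cross-reference entry, so the record cannot be linked
        linked.append({**row, "HHID": key[0], "PN": key[1]})
    return linked


# Toy data; identifiers are kept as strings so any leading zeros survive.
xref = [
    {"genetic_id": "G001", "HHID": "000001", "PN": "010"},
    {"genetic_id": "G002", "HHID": "000002", "PN": "020"},
]
genetic = [
    {"genetic_id": "G001", "call_rate": 0.99},
    {"genetic_id": "G999", "call_rate": 0.98},
]

linked = link_to_survey(xref, genetic)
```

Records without a match are dropped here for simplicity; an analysis might instead keep them and report how many failed to link.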

Genotype Data Version 2 (2006-2010 Samples)

Approximately 3,000 samples, collected during the HRS 2010 field period, have been genotyped and, combined with genotyped samples from 2006-2008, are available from dbGaP as Version 2 of the HRS data. The relationship matrix estimated using the KING Robust Method for these data will also be available with this update. The 2010 sample addition includes half of the new cohort and minority sample expansion with the remaining half to be added with Version 3. The genotyping was performed by the NIH Center for Inherited Disease Research (CIDR, X01HG005770-01), using the Illumina HumanOmni2.5-8v1 array. Genotyping Quality Control was performed by the Genetics Coordinating Center at the University of Washington, Seattle, WA. A copy of the QC report is available here. The initial data product is available through dbGaP. Specific information on the data can be found on the NCBI website. HRS-dbGaP Cross-Reference File v2 (2006-2010), available from HRS, is required to link HHID/PN (the unique HRS identifier) to the identifier assigned to HRS genetic data stored in dbGaP, Genotype Data Version 2. See below for access details.

Imputations

Current dbGaP data products for Version 2 also include genotype imputations from the 1000 Genomes Project. These imputation analyses were performed and documented by the Genetics Coordinating Center at the University of Washington, Seattle, WA. A copy of the imputation report for the Version 2 data is available here.

Access Information

HRS-dbGaP Cross-Reference File v2 is required to link HHID/PN (the unique HRS identifier) to the identifier assigned to HRS genetic data stored in the dbGaP system. Submit the Genetic Data Cross-Reference Request Form to apply for access to the cross-reference file after you receive your dbGaP Authorized Access. A completed and signed Genetic Data Access Use Agreement is also required.

Genotype Data Version 3 (2006-2012 Samples)

Respondents who consented to the saliva collection in 2006 (Phase 1), 2008 (Phase 2), 2010 (Phase 3), or 2012 (Phase 4) have been genotyped using Illumina HumanOmni2.5-arrays. The Phase 1 and 2 participants were genotyped together, and were imputed together previously (see dbGaP accession number phs000428.v1.p1). The Phase 3 participants were subsequently genotyped, and were imputed together with Phases 1-2 (dbGaP accession number phs000428.v2.p2). An additional 3,303 Phase 4 participants were genotyped in 2015. The four phases are now combined, yielding a total of 18,923 unique HRS participants: 15,620 from Phases 1-3, and 3,303 from Phase 4. After QC, there were a total of 18,916 unique HRS participants included in this dataset. The DNA samples were genotyped at the Center for Inherited Disease Research (CIDR) using the calling algorithm GenomeStudio version 2011.1, Genotyping Module version 1.9.4 and GenTrain version 1.0. Genotyping Quality Control was performed by the Kardia Lab at the University of Michigan. A copy of the QC report is available here. Additional information can be found on the NIAGADS website.

Imputations

Current NIAGADS data products also include genotype imputations using the 1000 Genomes and the Haplotype Reference Consortium (HRC) reference panels. These imputation analyses were performed and documented by the University of Michigan using the Michigan Imputation Server. A copy of the imputation report for 1000G is available here, and for HRC is available here.

Access Information

The data product (NG00119) is available directly from NIAGADS with an approved NIAGADS Data Access Request (DAR). More information on how to apply for access can be found on the NIAGADS website.

The HRS-NIAGADS Cross-Reference File is required to link HHID/PN (the unique HRS identifier) to the identifier assigned to HRS genetic data stored and distributed by NIAGADS. Submit the Genetic Data Cross-Reference Request Form to apply for access to the cross-reference file after your NIAGADS access is approved. A completed and signed Genetic Data Access Use Agreement is also required.

DNA Methylation (VBS 2016)

DNA methylation assays were done on a subsample (n=4,104) of people who participated in the 2016 Venous Blood Study. The sample includes all the participants of the 2016 Healthy Cognitive Aging Project (HCAP) who have provided blood samples, plus younger participants designated for future HCAP assessments, and a subsample of HCAP non‐participants. Additional information can be found on the NIAGADS website and in the data description.

Documentation

Data Description

Access Information

The data product (NG00153) is available directly from NIAGADS with an approved NIAGADS Data Access Request (DAR). More information on how to apply for access can be found on the NIAGADS website.

The HRS-NIAGADS DNAm Cross-Reference File, available from HRS, is required to link the DNA methylation data to HRS survey data. Submit the Genetic Data Cross-Reference Request Form to apply for access to the cross-reference file after your NIAGADS access is approved. A completed and signed Genetic Data Access Use Agreement is also required.

APOE and Serotonin Transporter Alleles (2021)

The APOE and Serotonin Transporter Alleles data product (Early, Version 1.0) includes data for the APOE isoform, directly genotyped using a Taqman allelic discrimination SNP assay where available, or imputed from preexisting genotype array data otherwise. This file also includes human serotonin transporter (5HTTLPR) short and long alleles measured using polymerase chain reaction (PCR). In total, there are 19,193 HRS participants in the data file: 17,237 with directly genotyped data for APOE and 1,956 additional participants with imputed data. There are 17,364 participants with valid values for 5HTTLPR.

Funding was provided by the National Institute on Aging (NIH U01 AG009740, RC2 AG036495, and RC4 AG039029).

Documentation

Data Description

Access Information

APOE and Serotonin Transporter Alleles Data is available as an HRS Sensitive Health Data Product. See the Sensitive Health Data page for application details.

Candidate Gene and SNP Files

The purpose of the HRS Candidate Gene and SNP files is to provide data users access to carefully selected subsets of the HRS genotype data available on dbGaP. These are smaller and more manageable files designed for data users who are interested in a specific gene or SNP. Currently, there are two sets of files available: Cognition and Behavior; Longevity. The specific SNPs and genes included in each package and file description documents with details on each data package are provided below.

Cognition and Behavior

Longevity

Principal components for all unrelated study subjects as well as principal components calculated within ethnic-specific samples are also provided with each package. For more information on the data collection, genotyping, imputation, and annotation of these files please refer to the Candidate Gene and SNP Data Description.

Access Information

Candidate Gene and SNP Files are available through a virtual desktop infrastructure (VDI) system that allows users to connect their own desktop to a secure data enclave. If you have an HRS Restricted Data Agreement (RDA), submit the Modify an Agreement form to request access to these products. If you need to apply for an RDA, begin at the Restricted Data: Access with VDI page.

Exome Data (2006-2010)

Exonic variants have been measured using the Illumina Human Exome BeadChip v1 on the approximately 15,500 samples collected from 2006 through 2010. These exome data are available through HRS and dbGaP. These exonic variants are also being measured on the 2012 samples using the Illumina HumanOmni2.5 plus Exome array. For more information on the data collection, genotyping, and annotation of these data please refer to the Quality Control Report for Exome Chip Data.

Access Information

Exome Data is available through a virtual desktop infrastructure (VDI) system that allows users to connect their own desktop to a secure data enclave. If you have an HRS Restricted Data Agreement (RDA), submit the Modify an Agreement form to request access. If you need to apply for an RDA, begin at the Restricted Data: Access with VDI page.

Epigenetic Clocks

DNA Methylation (DNAm) is one mechanism by which exposure to adverse life circumstances and environments is linked to health outcomes related to aging. A number of researchers have identified portions of the genome where methylation changes are related to either age or, more recently, to health outcomes linked to age. The resulting “methylation clocks” combine information from a small number of CpGs (typically 100-500) to produce indicators of epigenetic aging. Thirteen epigenetic clocks have been constructed using the HRS DNAm data collected in 2016 (n=4,018) through the 2016 Venous Blood Study (VBS).

Epigenetic Clocks: Supplement PACE and Grim2

Two additional epigenetic clocks, PACE and GrimAge Version 2, have been constructed using the DNA methylation data derived from the 2016 Health and Retirement Study Venous Blood Study.

Access Information

Epigenetic Clocks and Epigenetic Clocks: Supplement PACE and Grim2 are available as HRS Sensitive Health Data.

Polygenic Score Data (PGS)

Complex health outcomes and behaviors of interest to the research community are often highly polygenic, or reflect the aggregate effect of many different genes, so the use of single genetic variants or candidate genes may not capture the dynamic nature of more complex phenotypes. A polygenic score (PGS) aggregates millions of individual loci across the human genome and weights them by the strength of their association to produce a single quantitative measure of genetic risk. To facilitate use of these scores for research, HRS has created a set of PGS for public distribution based on several large, replicated GWAS.
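The weighting idea behind a polygenic score can be shown with a minimal sketch: each locus contributes its allele count multiplied by a GWAS-derived weight, and the contributions are summed. The SNP names, weights, and dosages below are invented for illustration, and the function is a generic weighted sum rather than the specific procedure HRS used to build its scores.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages over the loci that have a weight.

    dosages: mapping of SNP name -> count of the effect allele (0, 1, or 2,
             or a fractional value for imputed genotypes).
    weights: mapping of SNP name -> effect-size weight from a GWAS.
    Loci missing from `dosages` are skipped.
    """
    return sum(weights[snp] * dosages[snp] for snp in weights if snp in dosages)


# Hypothetical weights and one participant's hypothetical dosages.
weights = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 0.125}
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_score(dosages, weights)  # 0.5*2 + (-0.25)*1 + 0.125*0
```

In practice scores are usually standardized within a reference group before use, which this sketch leaves out.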

Access Information

Polygenic Score Data is available as an HRS Sensitive Health Data Product. See the Sensitive Health Data page for application details.

Telomere Data (2008)

The 2008 Telomere Data release (Final, Version 1.0) includes average telomere length data from 5,808 HRS respondents who consented and provided a saliva sample during the 2008 interview wave. Assays were performed by Telome Health (Telomere Diagnostics, http://www.telomehealth.com/). Average telomere length was assayed using quantitative PCR (qPCR) by comparing telomere sequence copy number in each patient's sample (T) to a single-copy gene copy number (S). The resulting T/S ratio is proportional to mean telomere length. Funding was provided by the National Institute on Aging (NIH U01 AG009740 and RC4 AG039029).
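The T/S ratio itself is simple arithmetic, sketched below with made-up numbers. The function name and inputs are hypothetical, and the sketch omits the standard curves, replicates, and plate controls that a real qPCR workflow would involve.

```python
def ts_ratio(telomere_copies, single_copy_gene_copies):
    """Return T/S: telomere copy number relative to a single-copy gene."""
    if single_copy_gene_copies <= 0:
        raise ValueError("single-copy gene copy number must be positive")
    return telomere_copies / single_copy_gene_copies


# Invented copy-number estimates for two samples.
sample_1 = ts_ratio(3.0, 2.0)
sample_2 = ts_ratio(2.0, 2.0)
```

Because the ratio is proportional to mean telomere length, a higher T/S value (as for `sample_1` here) indicates longer average telomeres relative to the other sample, not an absolute length in base pairs.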

Documentation

Access Information

Telomere Data is available as an HRS Sensitive Health Data Product. See the Sensitive Health Data page for application details.

VBS 2016 RNASeq Count Data

This data release includes raw counts and log2 counts-per-million (log2cpm) values from RNASeq analysis performed on a subsample of participants who consented to the HRS 2016 Venous Blood Study (n=3,748). The sample includes all the participants of the 2016 Healthy Cognitive Aging Project (HCAP) who provided blood samples, younger participants designated for future HCAP assessments, and a subsample of HCAP non-participants. This subsample is representative of the entire HRS sample. The same sample that was selected for DNA methylation analysis was selected for RNA sequencing.
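For readers unfamiliar with the log2cpm scale, the sketch below shows one common way to derive it from raw counts: scale each gene's count to a library of one million reads, add a small pseudocount so zero counts stay defined, and take the base-2 logarithm. The pseudocount of 1 and the gene names are assumptions made for this example; the data description should be consulted for the exact transformation used in the release.

```python
import math


def log2_cpm(counts, pseudocount=1.0):
    """Convert one sample's raw gene counts to log2 counts-per-million."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {
        gene: math.log2(count / total * 1_000_000 + pseudocount)
        for gene, count in counts.items()
    }


# Invented raw counts for a single sample.
raw_counts = {"gene_a": 750_000, "gene_b": 250_000, "gene_c": 0}
values = log2_cpm(raw_counts)
```

With this choice of pseudocount, a gene with zero reads maps to exactly 0 on the log scale, and values are comparable across samples with different sequencing depths.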

Documentation

Data Description

Access Information

VBS 2016 RNASeq Count Data is available as an HRS Sensitive Health Data Product. See the Sensitive Health Data page for application details.
