Abstract
Since the Surgeon General’s report in 1964, cigarette use has been declining in the US and in most other high-income countries (Antman, Arnett, Jessup, & Sherwin, 2014; Jha & Peto, 2014; M. Ng et al., 2014; Spanagel, 2017). Multiple studies have identified cigarette smoking as a primary risk factor for many diseases, including lung cancer and heart disease, contributing to 5 million deaths globally (Jha & Peto, 2014). The risk is not limited to regular smokers: intermittent smokers also have a higher risk of cancer and cardiovascular disease (Schane, Ling, & Glantz, 2010). Secondhand smoke is also a public health concern, since mere exposure has been linked to an increased risk of lung cancer and stroke (Kim, Ko, Kwon, & Lee, 2018). Children are particularly vulnerable because they breathe at a faster rate than adults. In children, secondhand smoke has been associated with increased rates of bacterial infection, acute respiratory illness, and hospitalization for asthma attacks (Cao, Yang, Gan, & Lu, 2015; Z. Wang et al., 2015).
The only other substance whose public health burden approaches that of tobacco is alcohol. An estimated 2.4 billion people worldwide use alcohol (Gowing et al., 2015; Griswold et al., 2018; Jha & Peto, 2014). Alcohol use has been clearly linked not only to a wide variety of diseases (e.g., liver cirrhosis) but also to unintentional injuries such as traffic accidents and falls (Griswold et al., 2018; Rehm, 2011).
Twin studies routinely find that nicotine and alcohol use are heritable (Polderman et al., 2015), with roughly 50% of the phenotypic variance accounted for by additive genetic effects. Despite this substantial heritability, prior to 2019 only a handful of specific genetic variants or genomic regions had been reliably associated with substance use or dependence. The decreasing cost of array genotyping and genome sequencing since the mid-2000s (Mardis, 2011) has led to a marked increase in the number of studies that use these technologies to investigate genetic associations with complex traits and diseases.
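The sketch below shows how a twin study arrives at a heritability figure like the one above. It uses Falconer's formula, a classic approximation that compares the trait correlation in identical (monozygotic, MZ) twin pairs with that in fraternal (dizygotic, DZ) pairs. It assumes purely additive genetic effects and equal shared environments for both twin types. Modern analyses, including those summarized by Polderman et al. (2015), usually fit structural equation (ACE) models instead, so this is a simplification. The twin correlations used here are hypothetical values chosen for illustration, not results from any study.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict[str, float]:
    """Approximate ACE variance components from MZ and DZ twin correlations.

    Falconer's approximation (assumes additive genetic effects only and
    equal shared environment for MZ and DZ pairs):
      A (additive genetic)   = 2 * (r_mz - r_dz)
      C (shared environment) = 2 * r_dz - r_mz
      E (unique environment) = 1 - r_mz
    """
    if not (-1.0 <= r_mz <= 1.0 and -1.0 <= r_dz <= 1.0):
        raise ValueError("correlations must lie in [-1, 1]")
    a = 2.0 * (r_mz - r_dz)
    c = 2.0 * r_dz - r_mz
    e = 1.0 - r_mz
    return {"A": a, "C": c, "E": e}


# Hypothetical twin correlations for a substance use trait (illustration only).
components = falconer_ace(r_mz=0.60, r_dz=0.35)
for name, value in components.items():
    print(f"{name}: {value:.2f}")
```

With these made-up correlations the additive genetic share comes out at 0.50. The three components always sum to 1 by construction.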
The standard analytical approach for gene-disease mapping has become the genome-wide association study (GWAS). Put simply, a GWAS is a series of association tests between individuals’ genotypes, most commonly at single-nucleotide polymorphisms (SNPs), and the phenotype of interest. GWAS and GWAS meta-analyses have successfully identified several functional and potentially causal variants associated with substance dependence (Laura J. Bierut et al., 2010; Hancock et al., 2018; Walters et al., 2018). However, these variants account for a tiny fraction of the phenotypic variation. This has led to the conclusion that behavioral phenotypes are highly polygenic: many variants, each of small effect, jointly influence the phenotype. Indeed, the effect of any single variant is so small that very large samples are needed to detect it (Visscher et al., 2017). Alcohol and nicotine dependence are clinically relevant phenotypes and tend to have higher heritability estimates than measures of consumption (Verhulst, Neale, & Kendler, 2015; Vink, Willemsen, & Boomsma, 2005), which would make them better candidate phenotypes for GWAS. However, the required sample sizes are difficult to achieve for substance dependence, because cases require a clinical diagnosis and additional work is needed to identify suitable controls (Dick, Meyers, Rose, Kaprio, & Kendler, 2011). It is more practical to work with simple substance use phenotypes. These are routinely collected in biomedical studies through short survey questions (“Do you smoke regularly?” or “How many drinks per week do you typically consume?”) and are common in medical records from routine health check-ups or hospital intakes. Previous GWAS meta-analyses of alcohol and nicotine use have found several significant loci (Schumann et al., 2016; Tobacco and Genetics Consortium, 2010), and we aim to increase the sample size further in order to capture more variants associated with substance use.
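The sketch below illustrates two points made above: a GWAS is a series of per-variant association tests, and small effects demand very large samples. It runs on simulated data and is not the pipeline used in this work. Each SNP is coded as an allele dosage (0, 1 or 2 copies of the effect allele), and a quantitative phenotype is regressed on each dosage in turn by ordinary least squares. The sample size needed to detect a variant explaining a given fraction of phenotypic variance is then computed from a normal approximation. It uses the conventional genome-wide threshold of p < 5 × 10⁻⁸ and 80% power. Real GWAS software also adjusts for covariates such as ancestry principal components, which this sketch omits.

```python
import math
import random
from statistics import NormalDist


def snp_association(dosages, phenotype):
    """Test one SNP: OLS regression of a quantitative phenotype on allele dosage.

    Returns (beta, standard_error, two_sided_p). The p-value uses a normal
    approximation, which is adequate at GWAS sample sizes.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


def required_sample_size(variance_explained, alpha=5e-8, power=0.80):
    """Approximate N to detect a SNP explaining `variance_explained` of the
    phenotypic variance at significance `alpha` with the given power.
    Small-effect normal approximation: N ~ (z_{1-alpha/2} + z_power)^2 / R^2."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil((z_alpha + z_power) ** 2 / variance_explained)


# Simulated data (illustration only): snp_a affects the phenotype, snp_b does not.
random.seed(1)
n = 5000
snp_a = [sum(random.random() < 0.3 for _ in range(2)) for _ in range(n)]
snp_b = [sum(random.random() < 0.3 for _ in range(2)) for _ in range(n)]
phenotype = [0.25 * a + random.gauss(0.0, 1.0) for a in snp_a]

# A GWAS repeats the same test for every SNP against one phenotype.
results = {name: snp_association(g, phenotype)
           for name, g in (("snp_a", snp_a), ("snp_b", snp_b))}
for name, (beta, se, p) in results.items():
    print(f"{name}: beta={beta:.3f} (SE {se:.3f}), p={p:.1e}")

# Required sample size grows rapidly as the per-SNP effect shrinks.
for r2 in (0.01, 0.001, 0.0001):
    print(f"variance explained {r2:.2%}: N ~ {required_sample_size(r2):,}")
```

Under this approximation, a variant explaining 0.01% of phenotypic variance needs roughly 400,000 participants to reach genome-wide significance with 80% power. This is the regime in which meta-analyses of over a million people become necessary.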
In the first chapter, we used GWAS meta-analysis to discover common variants (allele frequency > 0.1%) associated with alcohol and nicotine use. It is the largest GWAS meta-analysis of alcohol and nicotine use to date, combining summary statistics from over 30 GWASs and including over 1.2 million participants of European descent.
For nicotine use, we examined cigarette smoking from initiation to cessation. The four phenotypes are:
• Smoking initiation: Binary phenotype indicating whether the participant has ever been a regular smoker (also commonly defined as having smoked more than 100 cigarettes). Coded 2 for regular smokers and 1 for never regular smokers.
• Age of initiation: Quantitative phenotype for the age at which the participant started smoking regularly. Individuals who are not regular smokers were set to missing.
• Cigarettes per day: Binned phenotype (1-5) for the number of cigarettes smoked per day. Individuals who are not regular smokers were set to missing.
• Smoking cessation: Binary phenotype indicating whether the participant is a current or former regular smoker. Coded 2 for current and 1 for former smokers. Individuals who are not regular smokers were set to missing.
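The coding rules above can be sketched as a small function that maps one participant's survey answers onto the four smoking phenotypes. The record fields (`ever_regular`, `age_started`, `cigs_per_day`, `current_smoker`) and the output keys are hypothetical names. The CPD bin edges (1-5, 6-15, 16-25, 26-35, 36 or more) are also an assumption for illustration, because the text states only that cigarettes per day was binned into five categories. `None` stands for a missing value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurveyRecord:
    # Hypothetical raw survey fields for one participant.
    ever_regular: bool              # ever a regular smoker (e.g., >100 cigarettes)
    age_started: Optional[float]    # age at which regular smoking began
    cigs_per_day: Optional[float]   # typical number of cigarettes smoked per day
    current_smoker: Optional[bool]  # True = current, False = former


# Assumed upper bounds for CPD bins 1-4; anything above 35 falls in bin 5.
CPD_BIN_UPPER = (5, 15, 25, 35)


def bin_cpd(cigs: float) -> int:
    """Map cigarettes per day onto assumed bins 1-5."""
    for code, upper in enumerate(CPD_BIN_UPPER, start=1):
        if cigs <= upper:
            return code
    return 5


def code_smoking_phenotypes(rec: SurveyRecord) -> dict:
    """Apply the 2/1 coding and set downstream phenotypes missing for non-smokers."""
    initiation = 2 if rec.ever_regular else 1
    if not rec.ever_regular:
        # Never-regular smokers are missing for all downstream phenotypes.
        return {"SmkInit": initiation, "AgeSmk": None, "CigDay": None, "SmkCes": None}
    return {
        "SmkInit": initiation,
        "AgeSmk": rec.age_started,
        "CigDay": None if rec.cigs_per_day is None else bin_cpd(rec.cigs_per_day),
        "SmkCes": None if rec.current_smoker is None else (2 if rec.current_smoker else 1),
    }


smoker = code_smoking_phenotypes(SurveyRecord(True, 16, 20, False))   # former smoker
never = code_smoking_phenotypes(SurveyRecord(False, None, None, None))
print(smoker)
print(never)
```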
We analyzed one alcohol use phenotype, which measures heaviness of use:
• Drinks per week: Quantitative phenotype for the number of alcoholic drinks the participant consumes per week. Studies were asked to left-anchor and log-transform this phenotype.
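The sketch below shows one possible reading of the instruction to left-anchor and log-transform drinks per week. Each study's observed values are shifted so that the smallest value becomes 1, and the natural logarithm is then taken. The lowest value therefore maps to 0, and the long right tail of heavy drinkers is compressed. The exact anchoring rule is an assumption here, not a detail taken from the study protocol. Missing values are simply passed through.

```python
import math
from typing import Optional, Sequence


def left_anchor_log(values: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Left-anchor at 1 and log-transform, leaving missing values as None.

    Assumed rule: x' = log(x - min(x) + 1), so the minimum observed value maps to 0.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return [None for _ in values]
    shift = 1.0 - min(observed)
    return [None if v is None else math.log(v + shift) for v in values]


# Hypothetical drinks-per-week values from one study (None = missing).
drinks_per_week = [0, 1, 3, 7, 14, 28, None, 70]
transformed = left_anchor_log(drinks_per_week)
print([None if v is None else round(v, 2) for v in transformed])
```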
We discovered 566 conditionally independent variants in 406 loci associated with nicotine and alcohol use. Using these results, we performed cell, tissue, gene-set, and pathway enrichment analyses on each set of meta-analysis results to understand the biological mechanisms underlying each trait. An advantage of including both alcohol and nicotine use phenotypes is that we can jointly explore the results for common variants that may contribute to a more general substance use factor. Alcohol and nicotine use are highly comorbid behaviors (Meyerhoff et al., 2006), so some common variants may affect use of both substances pleiotropically. We examined the genetic correlations between the phenotypes and performed a pleiotropy analysis to test whether any genes overlap across the five traits. Lastly, to assess the utility of these results, we calculated polygenic risk scores (PRS) from the meta-analysis results. These scores significantly predicted the same phenotypes in two independent samples.
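In its simplest form, a polygenic risk score is a weighted sum of a person's effect-allele dosages, with weights taken from the GWAS effect sizes. The sketch below shows only that basic calculation. The SNP identifiers and effect sizes are made-up placeholders. Practical PRS methods also select or shrink SNPs (for example, by clumping and p-value thresholding, or by Bayesian shrinkage), which this sketch leaves out.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Sum of effect-allele dosage x GWAS effect size over SNPs present in both maps."""
    return sum(weights[snp] * dose for snp, dose in dosages.items() if snp in weights)


# Hypothetical per-SNP effect sizes (betas) from a GWAS meta-analysis.
gwas_betas = {"rsA": 0.05, "rsB": -0.02, "rsC": 0.01}

# Hypothetical effect-allele dosages (0-2) for two people in a target sample.
person_1 = {"rsA": 2, "rsB": 0, "rsC": 1}
person_2 = {"rsA": 0, "rsB": 2, "rsC": 1}

score_1 = polygenic_score(person_1, gwas_betas)
score_2 = polygenic_score(person_2, gwas_betas)
print(score_1, score_2)
```

In an independent target sample, these scores would then be entered as a predictor of the observed phenotype, typically alongside covariates. The additional variance they explain indicates how well the GWAS results predict the trait.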
Previous large-scale GWAS meta-analyses of alcohol and nicotine use have found several significant loci (Schumann et al., 2016; Tobacco and Genetics Consortium, 2010). However, the heritability explained by these SNPs falls far short of the heritability estimated from twin studies. SNP-based heritability estimates from published GWAS meta-analysis results are generally under 10% (Zheng et al., 2017), much less than the 30-60% typically found in twin studies (Grant et al., 2009; Polderman et al., 2015; Verhulst et al., 2015; Vink et al., 2005). The discrepancy between heritabilities estimated from twin studies and from genotyped variants has been termed the “missing heritability” (Eichler et al., 2010; Gibson, 2012; Maher, 2008). One hypothesis is that the effect of each individual variant is much smaller than previously expected, so hundreds of thousands of individuals may be needed to detect them. Another common hypothesis concerns the genetic architecture of the trait: rare variants with large effects may account for the missing heritability. Some highly penetrant Mendelian diseases, such as cystic fibrosis, are caused by low-frequency variants, so it is plausible that the same may be true for behavioral traits. From an evolutionary perspective, a variant with a large deleterious effect is expected to be selected against in a population and thus to persist at a lower frequency (Gibson, 2012).
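The classic mutation-selection balance approximation from population genetics makes this evolutionary argument quantitative. Consider a deleterious allele that arises by mutation at rate μ per generation and lowers heterozygote fitness to 1 − h·s. Its equilibrium frequency is roughly q ≈ μ/(h·s). For a fully recessive allele (h = 0), the equilibrium frequency is roughly q ≈ √(μ/s). The mutation rate and selection coefficients below are arbitrary illustrative values, not estimates for any substance use variant.

```python
import math


def equilibrium_frequency(mu: float, s: float, h: float) -> float:
    """Approximate equilibrium allele frequency under mutation-selection balance.

    mu: mutation rate to the deleterious allele per generation
    s:  selection coefficient against homozygotes (fitness 1 - s)
    h:  dominance coefficient (heterozygote fitness 1 - h*s)
    """
    if h > 0:
        return mu / (h * s)      # partially or fully dominant allele
    return math.sqrt(mu / s)     # fully recessive allele


mu = 1e-6  # illustrative mutation rate, not an empirical estimate
for s in (0.001, 0.01, 0.1, 0.5):
    q = equilibrium_frequency(mu, s, h=0.5)
    print(f"s = {s:<5}: equilibrium allele frequency ~ {q:.1e}")
```

The stronger the selection against the allele, the rarer it is at equilibrium. Large-effect risk variants would therefore be expected to sit in the rare-variant range that array-based GWAS tag poorly.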
In the second chapter, we performed an exome meta-analysis in parallel with the first chapter in order to find rare variants that may be associated specifically with nicotine use. We examined four nicotine use phenotypes:
• Smoking initiation: Binary phenotype indicating whether the participant has ever been a regular smoker (also commonly defined as having smoked more than 100 cigarettes). Coded 2 for regular smokers and 1 for never regular smokers.
• Cigarettes per day (CPD): Quantitative phenotype for the average number of cigarettes smoked per day by ever smokers.
• Pack-years: Quantitative phenotype computed as packs per day × years smoked, with a pack defined as 20 cigarettes. Years smoked typically runs from the age at smoking initiation to the current age for current smokers, or to the age at cessation for former smokers.
• Smoking cessation: Binary phenotype indicating whether the participant is a current or former regular smoker. Coded 2 for current and 1 for former smokers. Individuals who are not regular smokers were set to missing.
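The pack-years definition above translates directly into a short function. The argument names are hypothetical. A real cohort would also need rules for missing or inconsistent ages, which this sketch does not handle beyond clamping negative durations to zero.

```python
from typing import Optional


def pack_years(cigs_per_day: float, age_started: float, current_age: float,
               age_quit: Optional[float] = None) -> float:
    """Pack-years = (cigarettes per day / 20) x years smoked.

    Years smoked run from the age at initiation to the age at cessation for
    former smokers (age_quit given), or to the current age for current smokers.
    """
    end_age = age_quit if age_quit is not None else current_age
    years_smoked = max(0.0, end_age - age_started)
    return (cigs_per_day / 20.0) * years_smoked


current = pack_years(20, age_started=18, current_age=48)               # 30.0
former = pack_years(10, age_started=16, current_age=60, age_quit=36)   # 10.0
print(current, former)
```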
The exome meta-analysis, conducted simultaneously with the GWAS meta-analysis, found 40 common loci (also implicated in the GWAS meta-analysis) associated with nicotine use but no conclusive rare-variant associations. We also checked for conditionally independent rare variants within previously associated loci and found one low-frequency variant (allele frequency ~1%). To characterize these loci, we queried the GWAS Catalog and the quantitative trait locus (QTL) resources GTEx v7, Brain xQTL, and BRAINEAC, and also performed pathway enrichment analysis. Lastly, we applied Mendelian randomization to our results and several key phenotypes associated with smoking. We found causal associations between smoking initiation and educational attainment.
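The text does not state which Mendelian randomization estimator was used. As one common example, the inverse-variance-weighted (IVW) estimator combines per-SNP Wald ratios (the SNP's effect on the outcome divided by its effect on the exposure) into a single causal estimate. The sketch below uses fixed-effect IVW with first-order standard errors. The summary statistics are invented for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted MR estimate.

    Each SNP gives a Wald ratio b_out / b_exp with first-order standard error
    se_out / |b_exp|; ratios are combined with weights 1 / se_ratio**2.
    """
    ratios, weights = [], []
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        ratios.append(by / bx)
        weights.append(1.0 / (sy / abs(bx)) ** 2)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


# Invented SNP-exposure and SNP-outcome summary statistics (illustration only).
bx = [0.04, 0.03, 0.05]
by = [-0.008, -0.006, -0.010]
sy = [0.002, 0.002, 0.002]
estimate, se = ivw_estimate(bx, by, sy)
print(f"IVW estimate {estimate:.3f} (SE {se:.4f})")
```

The estimate is interpretable as a causal effect only under the usual instrumental-variable assumptions. In particular, each SNP must influence the outcome only through the exposure.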
To understand the mechanisms of these addictions, many animal studies, most commonly in mice, have modeled drug addiction from use to relapse (Lynch, Nicholson, Dance, Morgan, & Foley, 2010). The biology and chemistry of alcohol and nicotine have been studied extensively (Benowitz, Hukkanen, & Jacob, 2009; Cederbaum, 2012; Edenberg, 2007). Yet gaps remain in our knowledge of how and why individuals differ in their metabolism of these substances. The underlying biology of substance metabolism may be shared across mammalian species, but human-specific traits and behaviors are much harder to model and replicate in mice.
A common way to measure these underlying mechanisms in humans is to examine endophenotypes associated with the complex phenotype of interest. Endophenotypes are simple, heritable traits that are stable within individuals and associated with a more complex phenotype. Examples include biomarkers such as cotinine and brain-based measures such as electroencephalography (EEG). Endophenotypes are viewed as lying closer than the complex phenotype to the underlying biological pathways or cognitive processes, which may be expressed as part of that phenotype's heterogeneity.
Several studies have linked alcohol use disorder to various brain-based endophenotypes (Carlson, Iacono, & McGue, 2002; Malone, Iacono, & McGue, 2001). In the third chapter, we tested the results from the imputed GWAS meta-analyses for association with these endophenotypes in order to understand their connection to substance use. None of the associations were significant after correction for multiple testing.
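The text does not specify which multiple-testing correction was applied. The sketch below shows two standard options applied to a list of p-values. Bonferroni controls the family-wise error rate. Benjamini-Hochberg controls the false discovery rate and is less conservative. The p-values are invented for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0 where p < alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p < alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, q=0.05):
    """Step-up Benjamini-Hochberg procedure controlling the FDR at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k * q / m; reject the k smallest p-values.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


pvals = [0.004, 0.03, 0.02, 0.2, 0.6, 0.01]  # invented p-values
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print(bonf)
print(bh)
```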