Editorial
Cell 167, November 17, 2016 © 2016 Published by Elsevier Inc. 1139
Leading Edge
Analysis
or they leave out the larger meaning of a discovery. Clearly, science writers can’t do this job alone. So, how can research scientists work with journalists to improve science communication to the general public? Here are some key ways that scientists can help.
1. Spend Time with Science Writers
One way for scientists to engage in larger, public conversations is simply to spend time with science journalists. They can look for science writers who cover their area of research and connect with them, either by commenting publicly on their articles or by contacting them privately. To find science journalists, scientists can read STAT, Quanta Magazine, The Open Notebook, The Last Word on Nothing (a group blog), or Mosaic, a science magazine published by the Wellcome Trust. Scientists can offer to discuss their own research specifically or the world of science more generally.
Scientists who see themselves as part of the larger, public project of science can make a big difference. ‘‘If you want to encourage [public] funding, obviously, you have to reach out in partnership with the taxpayers,’’ says Shell. ‘‘By being available to journalists and by being open to these conversations, you can minimize the misinformation that gets out [as well as] the anti-science sentiment, which some people think is growing.’’
In addition to connecting with individual journalists, scientists could contact a university’s science journalism program and offer to speak to a class or mentor a journalist. They could also contact a professional organization, such as the NASW or the Council for the Advancement of Science Writing (CASW). These two organizations hold a joint meeting each year to discuss the craft of writing and to learn about new advances in science. CASW’s ‘‘New Horizons in Science’’ program consists of briefings from top scientists on near-future research, especially on topics expected to stir up controversy across society. The goal ‘‘is to have a really sustained interaction between scientists and journalists in a setting where they can spend some time together, have a lot of conversations, and have formal presentations on new areas of research,’’ says Rosalind Reid, executive director of CASW. ‘‘Our sessions are highly interactive, and many times, the scientists say, ‘This is one of the most fascinating speaking engagements I’ve had.’ We encourage the scientists to stick around for a couple of hours after they give a talk so that there can be lots and lots of questions.’’
Closer to home, scientists can attend events hosted by regional groups, such as the Northern California Science Writers Association, the New England Association of Science Writers, or the Philadelphia-area Science Writers Association. What better way to connect informally with journalists than to go on a museum tour or a nature hike with them, or to attend a book talk and have a beer with them?
Another place where scientists and journalists are learning from each other is the Kavli Conversations on Science Communication, hosted by New York University’s Science, Health and Environmental Reporting Program. In a series of live conversations, a scientist and a journalist discuss a specific scientific topic and how best to communicate that topic to the public. Recent topics include the microbiome, genetic engineering, animal cognition, and, on a lighter note, humor in science.
Because science moves so quickly and is so technically complicated, science writers struggle to keep up. With a scientist colleague as a behind-the-scenes resource, a journalist could feel comfortable asking ‘‘dumb’’ questions in a nonjudgmental environment.
Journalists in the 2016 Marine Biological Laboratory hands-on research course learn to pipette. Photo courtesy of Brad Shuster.
The Marine Biological Laboratory’s Logan Science Journalism Program creates exactly this sort of nonjudgmental environment. The program allows working journalists to immerse themselves in a scientific community, where they ‘‘get unfiltered, unrestricted access to scientists,’’ says Brad Shuster, a biologist at New Mexico State University who has taught the ten-day biomedical research course for six years.
‘‘We’re together for 12–14 hours a day, and we have informal discussions about anything they want: politics, the ethics of funding, the state of science journalism,’’ says Shuster. The day starts with coffee and a short lesson at the whiteboard, followed by lab work all day. The journalists use sea urchins to study early development, and they perform genetic screens on yeast to look for interesting mutations. The scientists and journalists eat meals together, attend community-wide talks, and return to the lab after dinner.
Phong Tran, a biologist at the Perelman School of Medicine at the University of Pennsylvania and the Institut Curie, has taught the biomedical course for three years. His goal is to ‘‘get the students to think like a scientist’’ by having them design their own projects and answer novel research questions. Tran has found that journalists, like scientists, have ‘‘an inherent curiosity to know things’’ and a drive to get to the bottom of an issue through extensive, detailed research. He started teaching the course because he ‘‘wanted the
Susan Matheson
Cambridge, MA, USA
http://dx.doi.org/10.1016/j.cell.2016.10.051
Bench to Bedside
NAME
Exondys 51 (Eteplirsen)
APPROVED FOR
Patients with Duchenne muscular dystrophy who have a confirmed mutation amenable to exon 51 skipping
TYPE
Phosphorodiamidate morpholino oligomer (PMO)
MOLECULAR TARGETS
RNA transcript of DMD, exon 51
CELLULAR TARGETS
Skeletal muscles expressing the DMD transcript
EFFECTS ON TARGETS
Causes exon 51 of the DMD transcript to be skipped, which restores the RNA reading frame and produces an internally deleted but partially functional dystrophin protein
DEVELOPED BY
Sarepta Therapeutics
Figure: skeletal muscle fiber; a DMD transcript carrying a patient mutation that lacks exon 50 (exons 48, 49, 51, 52, 53, 54) is bound at exon 51 by the Exondys 51 PMO; skipping exon 51 restores the reading frame of the DMD transcript (exons 48, 49, 52, 53, 54).
Exondys 51 is the first therapy for Duchenne muscular dystrophy (DMD) to receive accelerated approval from the FDA. The approval was based on a <1% increase in dystrophin expression, used as a surrogate marker. Exondys 51 targets DMD exon 51 for skipping, which restores the reading frame for 13% of Duchenne patients.
27 years: average lifespan
Exondys 51 is applicable for 13% of Duchenne boys
165 m: patients treated with Exondys 51 walk 165 meters further in 6 minutes
1987 Cloning of the DMD gene
1988 Out-of-frame hypothesis
1996 First demonstration of exon skipping for Duchenne
2003 PMO exon skipping in a Duchenne model
2012 Sarepta’s Phase IIB 48-week results
2014 Sarepta’s Phase III trial ongoing
2016 Accelerated approval granted
References for further reading are available with this article online: www.cell.com/cell/fulltext/S0092-8674(16)31511-2
The International Human Epigenome Consortium (IHEC) coordinates the generation of a catalog of high-resolution reference epigenomes of major primary human cell types. The studies now presented (see the Cell Press IHEC web portal at http://www.cell.com/consortium/IHEC) highlight the coordinated achievements of IHEC teams in gathering and interpreting comprehensive epigenomic datasets to gain insights into the epigenetic control of cell states relevant to human health and disease.
One of the great mysteries in developmental biology is how the same genome can be read by cellular machinery to generate the plethora of different cell types required for eukaryotic life. As appreciation grew for the central roles of transcriptional and epigenetic mechanisms in specifying cellular fates and functions, researchers around the world encouraged scientific funding agencies to develop an organized and standardized effort to exploit epigenomic assays to shed additional light on this process (Beck et al., 1999; Jones and Martienssen, 2005; American Association for Cancer Research Human Epigenome Task Force; European Union, Network of Excellence, Scientific Advisory Board 2008).
In March 2009, leading scientists and representatives of international health research funding agencies were invited to a meeting in Bethesda, Maryland (US), to gauge the level of interest in an international epigenomics project and to identify potential areas of focus. This meeting and a subsequent conference in January 2010 in Paris (France) ultimately led to the creation of the International Human Epigenome Consortium (IHEC).
The primary goals of IHEC are to coordinate the production of reference maps of human epigenomes for key cellular states relevant to health and disease, to facilitate rapid distribution of the data to the research community, and to accelerate the translation of this new knowledge to improve human health. A critical component of IHEC is the coordinated development of common bioinformatics standards, data models, and analytical tools to organize, integrate, and display the epigenomic data generated. IHEC members all contribute to these primary goals, but they also have individual complementary goals. These include developing new and improved ways to monitor or manipulate the epigenome, discovering new epigenomic mechanisms, training the next generation of epigenome researchers, exploring epigenomic features associated with disease states, and translating epigenomic discoveries into improvements to human health. This is in keeping with the overarching vision of IHEC, which is to help address fundamental questions about how the genome and environment interact during development and aging, and how the epigenome influences health and disease.
A consortium model has many strengths because it brings together research expertise and knowledge from across the world. These strengths include the ability to implement and monitor high-quality data and assay standards and to maximize coverage of human cells and tissues while avoiding unnecessary duplication. Additionally, this model helps harmonize data collection, management, and analysis to facilitate sharing and retrieval across countries, and it provides open access to data and results. IHEC provides a mechanism for communication among members and a forum for coordination, with the objective of maximizing efficiency among researchers working to understand, treat, and prevent diseases.
Current full members of IHEC include: AMED CREST/IHEC Team Japan; DLR-PT for BMBF German Epigenome Programme DEEP; CIHR Canadian Epigenetics, Environment, and Health Research Consortium (CEEHRC); European Union FP7 BLUEPRINT Project; Hong Kong Epigenomics Project; KNIH Korea Epigenome Project; NHGRI ENCODE; the NIH Roadmap Epigenomics Program; and the Singapore Epigenome Project (http://ihec-epigenomes.org/). In the subsequent sections, we give an overview of experimental and computational tools developed by IHEC members and highlight key findings from a collection of recent publications by IHEC members.
Identifying Heterogeneity in Epigenomic Measurements
Cellular and allelic heterogeneity presents a significant challenge for interpreting epigenomic signatures, which are typically derived from heterogeneous populations of millions of individual cells. To address this challenge, we have developed a series of molecular and computational approaches to deconvolute epigenomic signatures from heterogeneous populations. Three independent strategies are presented to explore the heterogeneity at bivalent domains, a ‘‘poised
We review emerging strategies to protect the privacy of research participants in international epigenome research: open consent, genome donation, registered access, automated procedures, and privacy-enhancing technologies.
With the advent of the Human Genome Project and the emergence of next-generation sequencing (NGS) technologies, open data sharing has become increasingly popular within the scientific community. This model is now the norm for large-scale ‘‘OMICS’’ research projects. In epigenomic research, open science will make it easier to associate rich epigenetic datasets (such as DNA methylation, RNA expression, chromatin states, and conformation) with data on participants’ medical and environmental exposures. These associations will provide a reference for studies of the genetic and epigenetic events that underlie human development, diversity, and disease. Independent evaluation of the robustness of analytical strategies and experimental conclusions is a critical part of the scientific process, and it hinges on access to materials. In this case, those materials are raw sequence files and associated metadata.
Although the open science model has made significant contributions to the progress of ‘‘OMICS’’ research, some stakeholders have resisted it. The private sector has raised concerns over intellectual property rights, and data producers have raised concerns over attribution and recognition of their work. Another important critique comes from privacy advocates, because genetic information is inherently identifying and third parties could misuse the data. A further daunting challenge is the inevitable variation among the large number of data producers who now have access to NGS technologies in all branches of the life sciences, most of whom operate outside large international consortia. Researchers producing valuable data with public funding still lack familiarity with recent scientific norms for governing, curating, and sharing big data. This can lead to delays in data sharing, as well as incomplete quality control.
The initial response from policymakers has been to propose an intermediate approach involving ‘‘controlled access’’ to potentially identifying human genomic data and associated metadata. This process requires, at a minimum, that researchers requesting access to the data complete an access agreement providing personal and institutional identification. Researchers are also required to describe the purpose of their research and to commit to a number of good privacy and security practices for the processing of controlled data.
The controlled access approach is not a standalone, comprehensive method to ensure the complete protection of potentially identifying health data. Rather, controlled access should be deployed as part of an overall data privacy protection framework that includes state-of-the-art physical, administrative, and technical security safeguards working in conjunction with national and international privacy norms. It should also include the development of an effective compliance and accountability framework.
Yet, controlled access has been criticized by some members of the scientific community who believe that it represents a strong impediment to open science research. In an attempt to address this critique, other potential models have been proposed and implemented, such as open consent, genome donation, registered access, and the use of diverse privacy-enhancing technologies. However, these approaches have yet to gain broad acceptance from the international research community.
Based on the authors’ experience as members of the International Cancer Genome Consortium (ICGC), the International Human Epigenome Consortium (IHEC), and the Global Alliance for Genomics and Health (GA4GH), this Commentary provides an overview of the trade-offs involved in controlled-access approaches for epigenomics research and assesses the potential of various proposed alternatives.
It is particularly timely to discuss this issue in the field of epigenetics, where research is shifting from animal models to human participants. Epigenomic datasets are even richer and more informative than genomic data that capture sequence variation only. This increases the benefits of open data sharing but also exacerbates its challenges, making concerns about ethics and data security all the more relevant in this field. Members of IHEC have adopted a tiered strategy for sharing data. Completely open access is the default policy, and a controlled access approach is used where the sensitivity of the data requires greater care. The IHEC Bioethics Workgroup conducts research to support IHEC’s data sharing policies and regularly evaluates the risks and benefits of evolving data sharing strategies (Dyke et al., 2016b).
The Controlled Access Approach
According to the 2009 Toronto Statement on Data Sharing, ‘‘Data about human subjects participating in genetic and epidemiological research require particularly
Getting together to exchange ideas, forge collaborations, and disseminate knowledge is a long-standing tradition of scientific communities. This Commentary covers how conferences are serving the community, what their current challenges are, and what is in store for the future of conferences.
Conferences are a big part of scientists’ lives. They provide an important forum for bringing a research community together and serve as a platform for disseminating knowledge and forging collaborations. However, we are now in an era of rapid technological change that influences how the research community interacts and communicates science. With so many emerging forms of digital communication and social media, one can question whether the face-to-face meeting is still relevant. The globalization of science brings many stakeholders from diverse regions of the world to the table, and with this comes a need to find a common meeting ground to discuss emerging biology. This need is compounded by the fact that individual scientists today face tighter budgets. Consequently, many investigators are cutting travel expenses to preserve precious funding for their research activities. If funds to support research conferences shrink, there may be fewer opportunities to participate in vital face-to-face meetings. Yet to make progress in research and education, scientists depend on an open exchange of knowledge and on the ability to come together to discuss, debate, and tackle emerging issues in their field of study.
The quest for new knowledge has been a driving force for centuries. Even early Greek philosophers, including Socrates, Plato, and Aristotle, understood the need to present new ideas, question dogma, and participate in vigorous debate. In this regard, face-to-face meetings provide an excellent opportunity to bring together people with different views and advance a field of study.
Throughout my career, I have had the opportunity to participate in various scientific conferences as an attendee, speaker, and organizer. I have also had the privilege of working with nonprofit organizations and professional societies. At Keystone Symposia, I served as Chair of the Board of Directors. At the European Association for the Study of Diabetes, I served as President and as a Member of the Executive Committee. Collectively, these experiences have given me insight from several different directions into the future of scientific conferences. In this short Commentary, I will share thoughts, ideas, and personal reflections on the importance of scientific conferences, focusing on how they serve the community, what the current challenges are, and what the outlook is for the future.
Breaking into the Business
Each one of us has a unique view of the importance of scientific conferences, and this view is likely to shift during the various stages of our careers. One of my earliest conference experiences came as a graduate student, when I had the honor of presenting my first short talk at a large society meeting. The memory of this experience is still quite vivid. I practiced the talk numerous times and discussed it with my colleagues for hours in anticipation of the questions I might receive from the delegates. The final publication was greatly enhanced by the input received during the presentation (Kaiserauer et al., 1989). Decades later, I still greatly appreciate that experience and how invaluable it was at such an early stage of my career. Presenting my unpublished work gave me confidence, because it allowed me to test new ideas and get constructive feedback on the project. Moreover, it opened a network that has enriched my career. I have since learned that, for many scientists, this is not an uncommon experience, and I consider it one of the values of face-to-face meetings.
Not All Conferences Are Born Equal
Every week, numerous scientific conferences take place around the world. These meetings come in many different flavors. But scientists do not have limitless amounts of time and money to attend every meeting, so it is vital to prioritize how resources are spent. Different meetings serve different purposes, and one should choose carefully. Depending on your career stage, you may choose one type of conference over another. For example, young trainees may be better off attending national meetings. There they can meet their local community, learn what their peers are doing and what techniques are available in nearby laboratories, and start potential collaborations with people who are readily available to them. Postdoctoral fellows and early-stage investigators may be better served by attending larger, international meetings to ‘‘break into the business’’ and develop new contacts or collaborations. More senior scientists may focus on selective, invitation-only meetings (think-tank style) in which the future direction of the field may be set, as this can be
person, you can form a stronger network than you could by attending a conference online. You never know who you might meet at a scientific conference: it could be the head of a new research institute, the founder of an up-and-coming biotech company, or the next Nobel Laureate. My own view is that virtual meetings will never replace face-to-face versions because of this unpredictable human element.
Solving Complex Problems
Over the years, some of the strongest scientific partnerships have been formed through interactions at scientific conferences. In my own field of diabetology, for example, scientific conferences have provided a forum for long-standing debates on the pathogenesis of type 2 diabetes. Given the complexity of this metabolic disease, the contributions of countless investigators have been vital to progress. The synthesis of different ideas and approaches that happens at meetings (Figure 1) has improved our understanding of the etiology and pathophysiology of the disease, and multiple treatment options are now available. Despite this, there is no cure for diabetes, and the need for scientific exchange remains. Broader expertise from many disciplines may be required to tackle the complex etiology of a disease like type 2 diabetes. Progress will not be made in isolation. Many fields evolve in large part because of the debates and discussions that occur at scientific meetings.
On the Horizon
So how can more people be brought to the table to address the vast research questions that we as scientists face? One could imagine a ‘‘future’’ scientific conference in which people gather for a face-to-face meeting where new and unpublished research is presented. This meeting could be paired with a virtual offering of the presentations, supported by an online discussion or question-and-answer forum to bring in a wider community. This approach would leave a public record of the pioneering work presented at the face-to-face meeting. It would also create a forum for continued dialog in which the community could become actively engaged. The hope would be that the online comments and dialog might help improve the work before it enters the official scientific record as a journal publication. Journals don’t consider presentations at meetings to be previous publications that undermine the novelty of a set of findings. Thus, presentations at scientific conferences could be brought into the historical record by becoming more formally integrated with the eventual journal publication of the work. Many professional societies and associations are still working out how future meetings can use this attribute to help disseminate scientific ideas while also increasing their own relevance and value.
Building Bridges and Filling in the Gaps
To tackle the complexity of the many communicable and non-communicable diseases facing humankind, clinical and experimental scientists in inter- and cross-disciplinary areas will need to keep coming together as a community in face-to-face interactive exchanges to share knowledge. In the foreseeable future, digital and social media can play an important role, but scientific conferences will remain vital to the community as an arena for dialog, debate, and discussion. Face-to-face meetings give you the
Knowing the configuration of the nuclear pore is essential for appreciating the underlying mecha-
nisms of nucleo-cytoplasmic communication. Now, Fernandez-Martinez et al. present a high-
resolution structure of the cytoplasmic nuclear pore-mRNA export holo-complex, challenging our
textbook depiction of this massive membrane-embedded complex.
Often referred to as the gateway to the genome, the nuclear pore complex (NPC) functions as a selectively permeable channel that mediates nucleo-cytoplasmic transport, such as the import of proteins and the export of newly generated mRNAs. The general picture of transport across the NPC has been drawn. However, molecular understanding of the functional interplay between the NPC and the transporting complexes remains incomplete, largely because it is difficult to solve the structure of such an enormous multi-subunit complex. The 65–125 MDa NPC consists of multiple copies of 30 different components, termed nucleoporins (Nups), and many Nups contain unstructured domains with stretches of phenylalanine-glycine (FG) repeats (D’Angelo and Hetzer, 2008). FG-rich Nups make up the NPC sub-complexes that interface with the mRNA export machinery and play key roles in mRNA export (Knockenhauer and Schwartz, 2016). In this issue of Cell, Fernandez-Martinez et al. (2016) present a sub-nanometer-resolution structure of the yeast Nup82 holo-complex. This structure provides critical information for understanding the molecular underpinnings of mRNA export. It also reveals that our generally accepted depiction of the nuclear pore has been incorrect for decades.
Export of mRNA is a multi-step process involving several key interactions with the NPC (Folkmann et al., 2011; Knockenhauer and Schwartz, 2016). After post-transcriptional modifications and processing, mRNA associates with ribonucleoproteins (RNPs) to form mRNP particles. With the assistance of the export receptor proteins Mex67 and Mtr2, mRNPs traverse the central transport channel of the NPC, which is lined with FG-rich Nups. At the cytoplasmic face of the NPC, the RNA helicase Dbp5 and its binding partner Gle1 remodel the mRNP and remove the export receptors. This step ensures the directionality of mRNP transport and releases the mRNA into the cytoplasm for translation. The remodeling of mRNPs occurs in association with the Nup82 holo-complex, which is composed of Nup159, Nup82, Nsp1, and Dyn2 and acts as a binding hub for the mRNP remodelers Gle1 and Dbp5. The last step of remodeling has been proposed to occur on cytoplasmic fibrils formed by the Nup82 complexes, which are thought to protrude away from the NPC scaffold into the cytoplasm (Figure 1). This step is one of the least understood transitions in the mRNA export process. Under the earlier proposed structure of the NPC, it is unclear how mRNPs would ‘‘jump’’ from the inner channel to the distally located cytoplasmic fibrils, and how mRNP transport is coupled to final remodeling (Folkmann et al., 2011; Knockenhauer and Schwartz, 2016).
In the current study, the authors base their approach on previously developed modeling methods that integrate diverse sets of biochemical and biophysical data to assemble structures of large complexes. These methods were also used to solve the recent structure of the NPC scaffold complexes (Shi et al., 2015; von Appen et al., 2015). The extensive array of high-resolution methods and information used here is remarkable. It includes (1) quantitative mass spectrometry, such as QconCAT-MS, on the purified Nup82 complex to determine its stoichiometry; (2) negative stain electron microscopy (EM) to determine the average morphology and dimensions of the complex; (3) integration of available atomic structures of individual domains and components, and of similar structures; (4) NMR data on disordered FG repeat domains; and (5) chemical crosslinking followed by mass spectrometry (CX-MS) to define protein residues that are in spatial proximity. Through this highly integrative approach, the authors generated a model structure of the Nup82 complex that satisfied all of the input constraints at a final resolution of 9.0 Å.
The solved structure demonstrates that the Nup82 holo-complex assembles into an asymmetric ‘‘D’’-shaped particle formed by compositionally identical subunits. Each subunit consists of Nup82, Nup159, and Nsp1, which bind to a Dyn2 dimer. One of the subunits forms the ‘‘rod’’ and the other forms the ‘‘loop’’ of the D-shaped holo-complex.
With a structure of the Nup82 holo-complex in place, the authors used CX-MS to determine how the holo-complex associates with the rest of the NPC. Most of the identified crosslinks connected the Nup82 complex to the NPC scaffold via the Y-shaped Nup84 complex. By combining these data with previously determined density maps and crystallographic data on the yeast Nup84 complex (Alber et al., 2007; Kelley et al., 2015), the authors proposed a structural model for the entire Nup82-Nup84 complex assembly. Unexpectedly, the model revealed that the Nup82 holo-complex does not extend away from the NPC scaffold, as has been proposed and drawn for decades. Instead, it sits right over the scaffold ring of the NPC and faces downward into the central transport channel of the NPC, with its FG regions also projecting into the channel (Figure 1).
This structural information has far-reaching implications for our understanding of mRNA export. It demonstrates that the Nup82 holo-complexes do not form distant cytoplasmic fibrils. Instead, they form ‘‘struts’’ that hover immediately over the exit point of the traversing mRNP particles (Figure 1). The FG domains projecting from the ‘‘struts’’ therefore form an FG continuum with the underlying FG domains of the central channel Nups. This continuum provides a straightforward path for the traveling mRNPs to reach the remodeling proteins bound to the Nup82 complex. In this way, the mRNP particles could make efficient contact with Dbp5 and Gle1 to undergo final remodeling, with no ‘‘jumping’’ required.
This model of mRNA export was further validated by the identification of crosslinks between the Nup82 holo-complex and the mRNA export machinery. These crosslinks showed that the Nup82-Nup84 scaffold orients Gle1 and its Dbp5-interacting domain in the same way, downward toward the inner channel (Fernan-
a collection of truncation mutants for the Nup84 sub-complex components. Remarkably, mutations in Nup84 complex components that caused an mRNA export defect mapped largely to the Nup85-Seh1 arm of the Nup84 complex ‘‘Y,’’ which is precisely the interface that the structure predicts to interact with the Nup82 holo-complex. Consistent with this, truncation mutants of Nup85 lost Nup82 from the nuclear pore, as assessed with a Nup82-GFP reporter. This result reinforces the notion that the mRNA export defects associated with the Nup84 complex are explained by its interaction with the Nup82 holo-complex.
The study further demonstrated that this structure of the yeast Nup82-Nup84 complex assembly aligns well with the available cryo-EM density map of the human NPC (von Appen et al., 2015). This alignment suggests that the unexpected configuration of the cytoplasmic portion of the NPC, and probably the proposed model of mRNA export, are evolutionarily conserved. This is particularly exciting since components of the human homolog of the Nup82 holo-complex, the Nup88-Nup214 com-
REFERENCES
Alber, F., Dokudovskaya, S., Veenhoff, L.M., Zhang, W., Kipper, J., Devos, D., Suprapto, A., Karni-Schmidt, O., Williams, R., Chait, B.T., et al. (2007). Nature 450, 695–701.
D’Angelo, M.A., and Hetzer, M.W. (2008). Trends Cell Biol. 18, 456–466.
Fernandez-Martinez, J., Kim, S.J., Shi, Y., Upla, P., Pellarin, R., Gagnon, M., Chemmama, I.E., Wang, J., Nudelman, I., Zhang, W., et al. (2016). Cell 167, this issue, 1215–1228.
Folkmann, A.W., Noble, K.N., Cole, C.N., and Wente, S.R. (2011). Nucleus 2, 540–548.
Kaneb, H.M., Folkmann, A.W., Belzil, V.V., Jao, L.E., Leblond, C.S., Girard, S.L., Daoud, H., Noreau, A., Rochefort, D., Hince, P., et al. (2015). Hum. Mol. Genet. 24, 1363–1373.
Kelley, K., Knockenhauer, K.E., Kabachinski, G., and Schwartz, T.U. (2015). Nat. Struct. Mol. Biol. 22, 425–431.
Knockenhauer, K.E., and Schwartz, T.U. (2016). Cell 164, 1162–1171.
Shi, Y., Pellarin, R., Fridy, P.C., Fernandez-Martinez, J., Thompson, M.K., Li, Y., Wang, Q.J., Sali, A., Rout, M.P., and Chait, B.T. (2015). Nat. Methods 12, 1135–1138.
von Appen, A., Kosinski, J., Sparks, L., Ori, A., DiGuilio, A.L., Vollmer, B., Mackmull, M.T., Banterle, N., Parca, L., Kastritis, P., et al. (2015). Nature 526,
dez-Martinez et al., 2016). Furthermore, plex, have been associated with a variety 140–143.
to obtain functional in vivo validation of human pathologies. For instance, both Xu, S., and Powers, M.A. (2009). Semin. Cell Dev.
of the structure, the authors analyzed mis-expression of Nup88 and oncogenic Biol. 20, 620–630.
*Correspondence: dennis_kasper@hms.harvard.edu
http://dx.doi.org/10.1016/j.cell.2016.10.047
In this issue of Cell, Desai et al. compare how dietary fiber affects the gut microbiota and susceptibility to disease. They find that a fiber-free diet promotes mucus-degrading bacteria and susceptibility to Citrobacter rodentium infection.
The Western diet, characterized by increased fat and sugar intake and decreased fiber intake, has been implicated in a wide variety of diseases, including cancer, type 2 diabetes, and cardiovascular disease (Cordain et al., 2005). Furthermore, the Western diet is associated with changes in the bacteria in our gut, i.e., our gut microbiota (Turnbaugh et al., 2008). These associations have led to the premise that diet influences the gut microbiota, which in turn influences health and disease. Generally, the Western diet is less diverse than more traditional diets (Turnbaugh et al., 2008). Many researchers hope that, if we determine which of the bacteria necessary to promote a healthy gut are missing from our diet, we will be able to design a remedial probiotic (bacterial) or prebiotic (a supplement that feeds specific bacteria), or change our diet to harness disease.
To understand the mechanisms by which the Western diet influences the microbiota and disease, many studies of mice have focused on high-fat diets (Devkota et al., 2012; Mahana et al., 2016; Turnbaugh et al., 2008). It is commonly believed that increased fat intake is the cause of obesity-associated diseases, yet low-fat diets have failed to yield major health benefits (Taubes, 2001). Instead, other aspects of the Western diet may be responsible for gut microbiota changes and disease. In fact, recent studies show that dietary fiber promotes the growth of symbiotic bacteria that increase the production of short-chain fatty acids and protect the host from a variety of diseases (Sonnenburg and Sonnenburg, 2014). Little is known, however, about the mechanisms by which reduced fiber influences the microbiota and disease.
In a study in this issue of Cell, Desai et al. (2016) designed a synthetic microbiota (SM) to investigate how dietary fiber affects the composition of the gut microbiota and protection from disease. This SM is composed of 14 commensal species that represent the five dominant bacterial phyla present in the human gut. To characterize the metabolic properties of these species, the authors evaluated the growth of each in vitro on 42 different plant- and animal-derived mono- and polysaccharides. Germ-free (GF) mice colonized with the SM were fed fiber-rich (FR; 15% fiber from minimally processed grains and plants), fiber-free (FF), or prebiotic (Pre; purified soluble glycans found in prebiotics) diets. To imitate the fluctuating human diet, some groups of mice were alternately fed FR and FF or Pre and FF diets, switching every other day or every 4 days. The numbers of two mucus-degrading bacterial species, Akkermansia muciniphila and Bacteroides caccae, increased on the FF diet, whereas the growth of two fiber-metabolizing species, Bacteroides ovatus and Eubacterium rectale, decreased. These changes also occurred when mice were fed alternating diets. The Pre diet affected the microbial community in a manner similar to the FF diet. This observation suggests that eating foods containing prebiotics does not have the same beneficial effect as actually eating dietary fiber. Transcriptional analysis of the bacteria confirmed these findings, showing an increase in fiber-metabolizing enzymes on the FR diet and in mucus-degrading enzymes on the FF diet (Desai et al., 2016).
Further analysis of the intestines of these mice showed a thicker mucus layer in FR mice than in FF mice. This difference suggests that, in the absence of dietary fiber, mucus-degrading bacteria outcompete fiber-metabolizing bacteria by degrading the host mucus lining. The mice on the alternating FR and FF diets had a mucus layer of intermediate thickness (Desai et al., 2016).
To investigate whether the FF diet had any negative health effects, GF and SM mice fed FR or FF diets were challenged with the mouse pathogen Citrobacter rodentium. SM mice fed the FF diet became much sicker than mice fed the FR diet. GF mice on either diet did not display disease symptoms, a result suggesting that a synergistic effect between the thin mucus layer and mucus-degrading bacteria promotes C. rodentium disease (Figure 1). These data suggest that an FF diet promotes outgrowth of mucus-degrading bacteria, which, in conjunction with a thinner mucus wall, increases susceptibility to the pathogen.
Using a synthetic, well-characterized microbiota, this paper tracks how dietary fiber influences gut bacterial composition and how the bacteria studied affect gut health. Even though GF mice were gavaged with the same set of bacteria, dietary fiber influenced the ratio of these bacteria in the gut. These results help explain why, although gut bacteria have been shown to play a major role in health and disease, studies of supplemental probiotics have found an underwhelming impact. The present study suggests that, if probiotics are given but the
REFERENCES
*Correspondence: avisel@lbl.gov
http://dx.doi.org/10.1016/j.cell.2016.10.054
Distal regulatory elements, such as enhancers, play a central role in controlling gene expression in mammalian genomes. Enhancer sequences act as substrates for binding of tissue-specific transcription factors and drive transcription through physical interaction with gene promoters (Spitz and Furlong, 2012). Recent chromatin profiling studies reveal the exceptional cell type and temporal specificity of enhancer activity, which exceeds that of other classes of gene regulatory sequences (Ernst and Kellis, 2010; Nord et al., 2013). This striking specificity has driven large-scale efforts to annotate regulatory elements and gene transcription in the human genome under a wide variety of conditions. Two other factors have contributed: advances in sequencing technologies and the increasingly recognized importance of non-coding sequences in human development and disease. The International Human Epigenome Consortium (IHEC) (Bae, 2013) connects many of these projects, with the goal of characterizing 1,000 epigenomes from different human cell types at diverse developmental stages and disease states.
New studies published in this issue of Cell and in Cell Reports are described in greater detail throughout the following sections of this Minireview. They build upon IHEC efforts to explore the role of cell-type-specific regulation and begin to address several important challenges in the field (Schmitt et al., 2016; Javierre et al., 2016; Breeze et al., 2016; Pellacani et al., 2016). In brief, Pellacani et al. (2016) tackle the question of the cell type specificity of enhancers across the individual cell types that make up heterogeneous tissues. The authors use chromatin profiling methods to identify regulatory elements active in the distinct cell populations that comprise mammary tissue. While chromatin profiling is powerful for identifying predicted enhancer sequences, it is limited in its ability to elucidate the gene target(s) of the predicted enhancers. To address this challenge, Javierre et al. (2016) and Schmitt et al. (2016) use cutting-edge chromosome conformation capture techniques to map enhancer-promoter interactions in a variety of human tissues and primary cell types. Finally, disease-associated variants identified in genome-wide association studies (GWAS) are overwhelmingly non-coding (Altshuler et al., 2010; Visel et al., 2009) and are enriched in non-coding loci harboring regulatory functions (Maurano et al., 2012). Nevertheless, examples of non-coding sequence variants conclusively and mechanistically linked to disease remain limited. The functional genome annotations from the series of new papers (Schmitt et al., 2016; Javierre et al., 2016; Pellacani et al., 2016) help bridge the gap between disease-associated non-coding variants and their regulatory gene targets. So does the computational algorithm for integrating epigenomic findings described in Breeze et al. (2016). Using these complementary techniques to explore the regulatory landscape in human tissues and isolated primary cell populations, these studies report insights and resources that will be instrumental in linking variants with causal mechanisms of disease.
Insights into Cell-Type-Specific Regulation
Histone ChIP-seq has now become a standard method to identify regulatory regions genome-wide (Park, 2009). ChIP-seq combines chromatin immunoprecipitation of modified histones with high-throughput sequencing to identify active enhancers and other regulatory features. The underlying DNA sequence does not vary between cell types, but histone modifications mark regions that are active or repressed in vivo in a tissue-specific manner. When paired with technologies for capturing specific cell types, ChIP-seq can be used to identify differential regulation in cell populations derived from heterogeneous tissue. An elegant example of this approach is provided by Pellacani et al. (2016), who generate histone ChIP-seq, DNA methylation, and gene expression data to identify cell-type-specific regulatory elements in primary human mammary tissue. Consistent with previous findings (Gascard et al., 2015), their results show widespread differences among the cell types isolated from this heterogeneous tissue. They also show differences relative to previous results from immortalized mammary cell lines. The biological relevance of these observations is reinforced by two findings: differential enhancer utilization in mammary cell types is consistent with cell-specific gene expression, and cell-type-specific enhancers are enriched for unique
with enhancer annotations and clustered genes according to enhancer specificity for each cell type. This analysis identifies sets of genes that are dynamically regulated in different cell types across the hematopoietic tree. The correlation between cell-type-specific enhancer activity and gene expression supports a functional role for these interactions in regulating cell fate and differentiation.
Interpretation of Genetic and Epigenetic Variation in Disease
Elucidating the mechanistic role of non-coding sequence variation in human disease remains an unmet challenge. Tissue- and cell-type-specific annotations of regulatory elements generated by ChIP-seq are now widely available through the work of IHEC members and individual investigators. These efforts represent an important first step in bridging this gap. Work is now under way to integrate these diverse maps into high-confidence enhancer annotations, in order to identify which disease-associated variants are most likely to affect gene regulatory sequences (Dickel et al., 2016). Chromosome conformation capture techniques complement these datasets by linking tissue-specific enhancers with candidate gene targets. Such approaches are increasingly being used to interpret non-coding disease-associated variation (Martin et al., 2015; Won et al., 2016). Most studies thus far have focused on one specific cell type or tissue to prioritize GWAS variants. In contrast, Javierre et al. (2016) and Schmitt et al. (2016) analyze genome interactions across many tissue types or cell populations, further facilitating the prioritization of regulatory candidates. The papers show that lineage- and cell-type-specific regulatory regions are enriched for genetic variation from association studies of phenotypes with similar cell specificity. Javierre et al. (2016) also use lineage-specific interactions elucidated by promoter capture Hi-C (PCHi-C) to create a prioritized list of genes that may be implicated in disease through interactions with disease-associated non-coding regions identified by GWAS. One type of interaction diagrammed in Figure 1 is “lineage-specific promoter interactions.” Hypothetically, the presence of a phenotype-associated variant in an enhancer that interacts with two promoters in a relevant cell lineage would prioritize these genes
similar analyses for EWAS results. The new tool maps regions of differential methylation that have been implicated in disease through EWAS to regulatory regions genome-wide. Thus, eFORGE identifies potential mechanistic links between cell-type-specific distal regulation and epigenome-wide association studies, information that could aid in the development of disease treatments.
The compelling new studies presented here use epigenomic data to assess the regulatory architecture across an impressive range of primary human cells and tissues. Their findings emphasize the cell type specificity of regulatory interactions and the dynamic nature of regulatory networks, and this information will be valuable for the interpretation of human disease findings. While this Minireview focused on assessing non-coding variants from GWAS, cell-type-specific interactions can also be used to interpret rare non-coding variation from whole-genome sequencing studies (Weedon et al., 2014), a technology that is being adopted with increasing frequency for human disease studies. The computational and experimental resources from these epigenomic studies will be valuable for understanding chromatin structure, as well as for facing the considerable challenge of linking non-coding variation with cell-specific mechanisms of disease.
ACKNOWLEDGMENTS
This work was supported by National Institutes of Health grants R01HG003988, U54HG006997, U01DE024427, R24HL123879, and UM1HL098166. Research conducted at the E.O. Lawrence Berkeley National Laboratory was performed under Department of Energy Contract DE-AC02-05CH11231, University of California.
REFERENCES
Altshuler, D., Lander, E., and Ambrogio, L.A. (2010). Nature 476, 1061–1073.
Bae, J.-B. (2013). Genomics Inform. 11, 7–14.
Breeze, C.E., Paul, D.S., van Dongen, J., Butcher, L.M., Ambrose, J.C., Barrett, J.E., Lowe, R., Rakyan, V.K., Iotchkova, V., Frontini, F., et al. (2016).
Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., et al. (2009). Science 326, 289–293.
Liu, Y., Aryee, M.J., Padyukov, L., Fallin, M.D., Hesselberg, E., Runarsson, A., Reinius, L., Acevedo, N., Taub, M., Ronninger, M., et al. (2013). Nat. Biotechnol. 31, 142–147.
Martin, P., McGovern, A., Orozco, G., Duffus, K., Yarwood, A., Schoenfelder, S., Cooper, N.J., Barton, A., Wallace, C., Fraser, P., et al. (2015). Nat. Commun. 6, 10069.
Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J., et al. (2012). Science 337, 1190–1195.
Nord, A.S., Blow, M.J., Attanasio, C., Akiyama, J.A., Holt, A., Hosseini, R., Phouanenavong, S., Plajzer-Frick, I., Shoukry, M., Afzal, V., et al. (2013). Cell 155, 1521–1531.
Park, P.J. (2009). Nat. Rev. Genet. 10, 669–680.
Pellacani, D., Bilenky, M., Kannan, N., Heravi-Moussavi, A., Knapp, D.J.H.F., Gakkhar, S., Moksa, M., Carles, A., Moore, R., Mungall, A.J., et al. (2016). Cell Rep. 17. Published online November 15, 2016. http://dx.doi.org/10.1016/j.celrep.2016.10.058.
Rao, S.S.P., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., and Aiden, E.L. (2014). Cell 159, 1665–1680.
Schmitt, A.D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C.L., Li, Y., Lin, S., Lin, Y., Barr, C.L., and Ren, B. (2016). Cell Rep. 17. Published online November 15, 2016. http://dx.doi.org/10.1016/j.celrep.2016.10.061.
Schoenfelder, S., Furlan-Magaril, M., Mifsud, B., Tavares-Cadete, F., Sugar, R., Javierre, B.-M., Nagano, T., Katsman, Y., Sakthidevi, M., Wingett, S.W., et al. (2015). Genome Res. 25, 582–597.
Spitz, F., and Furlong, E.E.M. (2012). Nat. Rev. Genet. 13, 613–626.
Visel, A., Rubin, E.M., and Pennacchio, L.A. (2009). Nature 461, 199–205.
Weedon, M.N., Cebola, I., Patch, A.-M., Flanagan, S.E., De Franco, E., Caswell, R., Rodríguez-Seguí, S.A., Shaw-Smith, C., Cho, C.H.-H., Lango Allen, H., et al.; International Pancreatic Agenesis Consortium (2014). Nat. Genet. 46, 61–64.
Won, H., de la Torre-Ubieta, L., Stein, J.L., Parikshak, N.N., Huang, J., Opland, C.K., Gandal, M.J., Sutton, G.J., Hormozdiari, F., Lu, D., et al. (2016). Nature 538, 523–527.
The hematopoietic system plays a major role in human health. Two studies by Astle et al. and
Chen et al. published in this issue of Cell use genome-wide association and functional genomics
approaches to provide deep insights into the role of genetic variants in hematological traits.
We discuss these discoveries and future strategies toward completing our understanding of the
genetic basis for variation in human traits.
LD patterns between eQTL, meQTL, and hQTL variants showed that almost half of the eQTL variants were also associated with an epigenomic QTL. This suggests highly coordinated genetic influences on gene expression, DNA methylation, and chromatin binding. As the authors point out, however, proving causal links between different molecular changes remains challenging. In contrast, comparison of sQTLs with eQTLs points to predominantly independent genetic influences on splicing and expression, similar to what has been described in previous studies (Lappalainen et al., 2013; Li et al., 2016).
The largest eQTL studies so far have been performed in whole blood due to its easy availability (Battle et al., 2014; Westra et al., 2013; Wright et al., 2014). The recent focus, however, has been on analyzing eQTLs and other molecular QTLs in three settings: different tissues (Aguet et al., 2016), computationally deconvoluted cell types (Westra et al., 2015; Zhernakova et al., 2015), and purified cell types (Fairfax et al., 2012; Naranbhai et al., 2015; Raj et al., 2014; Chen et al., 2016). The epigenomic QTL data by Chen et al. (2016) provide additional insights into cell-type specificity, which is often analyzed only at the eQTL level. They show that sharing of genetic associations across monocytes, neutrophils, and naive T cells was greatest for meQTLs, followed by eQTLs and then H3K4me1 and H3K27ac hQTLs, which is consistent with cell-type-specific roles of enhancers. Given that much of GWAS heritability lies in enhancers (Finucane et al., 2015), analysis of specific cell types may be a key approach for better understanding the regulatory function of GWAS loci.
Finally, to link disease-associated loci to their causative genes and identify putative regulatory mechanisms, the authors performed colocalization analysis of GWAS data from six autoimmune diseases with their molecular QTLs. Interestingly, of the 115 loci with colocalized associations, a large number involve chromatin or splicing changes without a detectable corresponding eQTL, highlighting the importance of studying diverse molecular traits. Additional experiments are needed to prove that these molecular events are causal to disease. Even so, these analyses provide a list of disease loci, each with a solid hypothesis of the molecular mechanism.
GWASs have become an essential tool for characterizing the genetic contribution to human disease. Large, comprehensive GWASs analyzed with statistical sophistication can provide a wealth of novel information on the mechanisms of biological processes underlying disease etiology. The study by Astle et al. (2016) is a prime example of such work. They performed a genome-wide association analysis in 173,480 individuals of European ancestry, testing 29.5 million polymorphic DNA sequence variants for association with 36 hematological traits. This led to the discovery of 2,706 independent genetic variants associated with red cell, white cell, and platelet indices, a nearly 10-fold increase in the number of known associations (Vasquez et al., 2016). More than 10% of the associated variants are rare or low frequency, exemplifying the importance of high-resolution imputation based on whole-genome sequencing.
In line with previous GWAS results, the majority of associated variants are common, lie in non-coding regions of the genome, and have small effects. However, Astle et al. (2016) also identify a large number of rare variants that are enriched for large effect sizes and for effects on protein sequence. Nine of the associated variants are classified as pathogenic in the ClinVar database. Coding-associated variants are strongly enriched among Mendelian rare disease genes. Together, these results show overlap between Mendelian disease mutations and the GWAS discoveries described here. The study also provides a large catalog of new rare and low-frequency variants associated with diverse hematological traits, including associations with a plausible biological hypothesis and possible medical importance. These results indicate how well-powered GWASs reach into the classical domain of Mendelian genetics and how these previously largely separate fields inform each other.
The genetic associations with red cells, white cells, and platelets were observed to be predominantly different, which is consistent with the different biological roles of these cells. To analyze the functional underpinnings of the discovered GWAS loci, Astle et al. (2016) use epigenetic reference maps. They also draw on the genetic associations with molecular traits that Chen et al. (2016) describe in trait-matched primary blood cells. Heritability analysis shows that the largest proportions of GWAS associations are driven by enhancer elements and transcribed regions. The latter include both coding effects and regulatory effects, both proximal and post-transcriptional (Gaffney et al., 2012). The analysis of GWAS and QTL associations revealed 198 GWAS loci with a colocalized molecular QTL, pinpointing likely causal genes and biological mechanisms for the hematological trait associations.
Most hematopoietic traits can be considered intermediate phenotypes rather than diseases themselves. Furthermore, while hematological traits associate with many diseases, it has usually remained unclear whether the blood trait changes are a cause or a consequence of the disease. Genetic associations provide an opportunity to decipher the causal relationship. Astle et al. (2016) use this opportunity to investigate causal links between hematopoietic traits and multiple common complex diseases, in particular autoimmune and cardiovascular diseases. This reveals a number of cases where hematological changes appear to be causal for disease, including both expected and entirely novel associations. This is of major value in understanding multiple layers of biological processes underlying diseases. Overall, these findings enhance our understanding of genetic effects on hematopoietic processes and will help to identify novel therapeutic targets for hematological and other diseases.
Altogether, these two studies build comprehensive catalogs of genetic effects. These catalogs cover proximal molecular traits of chromatin and gene expression as well as physiological traits of the hematopoietic system. Using available GWAS summary statistics, the studies link their data to disease, thus building a comprehensive chain of associations at multiple levels. These large-scale studies show what well-coordinated cohort datasets can deliver when combined with increasingly affordable assays, both for physiological and molecular phenotyping and for genome sequencing. A further scale-up and expansion of these approaches holds major promise for the future. Increasing sample sizes and sequencing resolution are particularly important to reliably discover trans-acting regulatory variants (Bonder et al., 2015; Jo et al., 2016; Westra et al., 2013), which account for more than half of the genetically explained variance in gene expression (Battle and Montgomery, 2014),
ACKNOWLEDGMENTS
S.K.-H. is supported by a research fellowship of the DFG. T.L. is supported by NIH grants R01MH106842, UM1HG008901, and HSN2682010000029C.
REFERENCES
Aguet, F., Brown, A.A., Castel, S., Davis, J.R., Mohammadi, P., Segre, A.V., Zappala, Z., Abell, N.S., Fresard, L., Gamazon, E.R., et al. (2016). bioRxiv, http://biorxiv.org/content/early/2016/09/09/074450
Albert, F.W., and Kruglyak, L. (2015). Nat. Rev. Genet. 16, 197–212.
Astle, W.J., Elding, H., Jiang, T., Allen, D., Ruklisa, D., Mann, A.L., Mead, D., Bouman, H., Riveros-Mckay, F., Kostadima, M.A., et al. (2016). Cell 167. http://dx.doi.org/10.1371/journal.pbio.0000051.
Barreiro, L.B., Tailleux, L., Pai, A.A., Gicquel, B., Marioni, J.C., and Gilad, Y. (2012). Proc. Natl. Acad. Sci. USA 109, 1204–1209.
Battle, A., and Montgomery, S.B. (2014). Hum. Genet. 133, 727–735.
Battle, A., Mostafavi, S., Zhu, X., Potash, J.B., Weissman, M.M., McCormick, C., Haudenschild, C.D., Beckman, K.B., Shi, J., Mei, R., et al. (2014). Genome Res. 24, 14–24.
Birney, E., Smith, G.D., and Greally, J.M. (2016). PLoS Genet. 12, e1006105.
Bonder, M.J., Luijk, R., Zhernakova, D., Moed, M., Deelen, P., Vermaat, M., van Iterson, M., van Dijk, F., van Galen, M., Bot, J., et al. (2015). bioRxiv, http://biorxiv.org/content/early/2015/12/01/033084.1
Chen, L., Ge, B., Casale, F.P., Vasquez, L., Kwan, T., Garrido-Martín, D., Watt, S., Yang, Y., Kundu, K., Ecker, S., et al. (2016). Cell 167. http://dx.doi.org/10.1371/journal.pbio.0000051.
Fairfax, B.P., Humburg, P., Makino, S., Naranbhai, V., Wong, D., Lau, E., Jostins, L., Plant, K., Andrews, R., McGee, C., and Knight, J.C. (2014). Science 343, 1246949.
Lappalainen, T., Sammeth, M., Friedländer, M.R., ’t Hoen, P.A., Monlong, J., Rivas, M.A., Gonzàlez-Porta, M., Kurbatova, N., Griebel, T., Ferreira, P.G., et al.; Geuvadis Consortium (2013). Nature 501, 506–511.
Lee, M.N., Ye, C., Villani, A.C., Raj, T., Li, W., Eisenhaure, T.M., Imboywa, S.H., Chipendo, P.I., Ran, F.A., Slowikowski, K., et al. (2014). Science 343, 1246980.
Li, Y.I., van de Geijn, B., Raj, A., Knowles, D.A., Petti, A.A., Golan, D., Gilad, Y., and Pritchard, J.K. (2016). Science 352, 600–604.
Naranbhai, V., Fairfax, B.P., Makino, S., Humburg, P., Wong, D., Ng, E., Hill, A.V.S., and Knight, J.C. (2015). Nat. Commun. 6, 7545.
Pai, A.A., Pritchard, J.K., and Gilad, Y. (2015). PLoS Genet. 11, e1004857–e1004858.
Raj, T., Rothamel, K., Mostafavi, S., Ye, C., Lee, M.N., Replogle, J.M., Feng, T., Lee, M., Asinovski, N., Frohlich, I., et al. (2014). Science 344, 519–523.
Vasquez, L.J., Mann, A.L., Chen, L., and Soranzo, N. (2016). ISBT Sci. Ser. 11(Suppl 1), 211–219.
Westra, H.-J., Arends, D., Esko, T., Peters, M.J., Schurmann, C., Schramm, K., Kettunen, J., Yaghootkar, H., Fairfax, B.P., Andiappan, A.K., et al. (2015). PLoS Genet. 11, e1005223–e17.
Westra, H.-J., Peters, M.J., Esko, T., Yaghootkar, H., Schurmann, C., Kettunen, J., Christiansen, M.W., Fairfax, B.P., Schramm, K., Powell, J.E., et al. (2013). Nat. Genet. 45, 1238–1243.
Wright, F.A., Sullivan, P.F., Brooks, A.I., Zou, F., Sun, W., Xia, K., Madar, V., Jansen, R., Chung, W., Zhou, Y.-H., et al. (2014). Nat. Genet. 46, 430–437.
Ye, C.J., Feng, T., Kwon, H.K., Raj, T., Wilson, M.T., Asinovski, N., McCabe, C., Lee, M.H., Frohlich, I., Paik, H.I., et al. (2014). Science 345, 1254665.
Zhernakova, D., Deelen, P., Vermaat, M., van Iterson, M., van Galen, M., Arindrarto, W., van ’t Hof, P., Mei, H., van Dijk, F., Westra, H.-J., et al. (2015). bioRxiv, http://biorxiv.org/content/early/2015/11/30/033217
*Correspondence: wysocka@stanford.edu
http://dx.doi.org/10.1016/j.cell.2016.09.018
A class of cis-regulatory elements, called enhancers, plays a central role in orchestrating spatiotemporally precise gene-expression programs during development. Consequently, divergence in enhancer sequence and activity is thought to be an important mediator of inter- and intra-species phenotypic variation. Here, we give an overview of emerging principles of enhancer function, current models of enhancer architecture, genomic substrates from which enhancers emerge during evolution, and the influence of three-dimensional genome organization on long-range gene regulation. We discuss intricate relationships between distinct elements within complex regulatory landscapes and consider their potential impact on the specificity and robustness of transcriptional regulation.
(D) Enhancer elements can be exapted from transposable elements. For example, following endogenization and unequal homologous recombination, the long terminal repeats (LTRs) of endogenous retroviral (ERV) elements can gain tissue-specific regulatory activity through accumulation of mutations and emergence of TFBSs.
(E) Enhancer activity can be transferred to a new gene target, for example, through a genomic inversion event.
highly transcribed genes (Hsieh et al., 2015). Interestingly, these self-associating topological domains in S. cerevisiae encompass 1–5 genes, approximately the same order of gene number as mammalian TADs. This suggests that gene size and intergenic spacing may influence the size and formation of topological domains. Further interrogation of the topological landscapes at high resolution across the eukaryotic tree of life will be necessary to understand the evolution and functional role of topological domain architecture.
Formation of TAD Boundaries and CTCF-Mediated Loops
Since TAD boundaries appear to restrict both enhancer function (Dowen et al., 2014; Flavahan et al., 2016; Guo et al., 2015; Lupiáñez et al., 2015) and spreading of chromatin marks (Narendra et al., 2015), they fulfill a canonical definition of insulator elements. Indeed, in Drosophila TAD boundaries are enriched for insulator-binding proteins, such as CP190, CTCF, and BEAF-32 (Van Bortle et al., 2014; Sexton et al., 2012; Phillips-Cremins et al., 2013). In mammals they are enriched for CTCF and cohesin (Dowen et al., 2014; Sofueva et al., 2013). In fact, high-resolution Hi-C maps in mammalian cell lines revealed that 86% of roughly 10,000 long-range contact peaks, interpreted as anchors for chromatin loops, are associated with CTCF (Rao et al., 2014). Moreover, CTCF sites that are engaged
Duque, T., and Sinha, S. (2015). What does it take to evolve an enhancer? A simulation-based study of factors influencing the emergence of combinatorial regulation. Genome Biol. Evol. 7, 1415–1431.
Eagen, K.P., Hartl, T.A., and Kornberg, R.D. (2015). Stable Chromosome Condensation Revealed by Chromosome Conformation Capture. Cell 163, 934–946.
El-Sherif, E., and Levine, M. (2016). Shadow enhancers mediate dynamic shifts of gap gene expression in the Drosophila embryo. Curr. Biol. 26, 1164–1169.
Erceg, J., Saunders, T.E., Girardot, C., Devos, D.P., Hufnagel, L., and Furlong, E.E. (2014). Subtle changes in motif positioning cause tissue-specific effects on robustness of an enhancer’s activity. PLoS Genet. 10, e1004060.
Fabre, P.J., Benke, A., Joye, E., Nguyen Huynh, T.H., Manley, S., and Duboule, D. (2015). Nanoscale spatial organization of the HoxD gene cluster in distinct transcriptional states. Proc. Natl. Acad. Sci. USA 112, 13964–13969.
Farley, E.K., Olson, K.M., Zhang, W., Brandt, A.J., Rokhsar, D.S., and Levine, M.S. (2015). Suboptimization of developmental enhancers. Science 350, 325–328.
Farré, M., Robinson, T.J., and Ruiz-Herrera, A. (2015). An Integrative Breakage Model of genome architecture, reshuffling and evolution: The Integrative Breakage Model of genome evolution, a novel multidisciplinary hypothesis for the study of genome plasticity. Bioessays 37, 479–488.
Feng, S., Cokus, S.J., Schubert, V., Zhai, J., Pellegrini, M., and Jacobsen, S.E. (2014). Genome-wide Hi-C analyses in wild-type and mutants reveal high-resolution chromatin interactions in Arabidopsis. Mol. Cell 55, 694–707.
Feschotte, C. (2008). Transposable elements and the evolution of regulatory networks. Nat. Rev. Genet. 9, 397–405.
Flavahan, W.A., Drier, Y., Liau, B.B., Gillespie, S.M., Venteicher, A.S., Stemmer-Rachamimov, A.O., Suvà, M.L., and Bernstein, B.E. (2016). Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature 529, 110–114.
Frankel, N., Davis, G.K., Vargas, D., Wang, S., Payre, F., and Stern, D.L. (2010). Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature 466, 490–493.
Frankel, N., Erezyilmaz, D.F., McGregor, A.P., Wang, S., Payre, F., and Stern, D.L. (2011). Morphological evolution caused by many subtle-effect substitutions in regulatory DNA. Nature 474, 598–603.
Fudenberg, G., Imakaev, M., Lu, C., Goloborodko, A., Abdennur, N., and Mirny, L.A. (2015). Formation of Chromosomal Domains by Loop Extrusion. Cell Rep. 15, 2038–2049.
Fukaya, T., Lim, B., and Levine, M. (2016). Enhancer Control of Transcriptional Bursting. Cell 166, 358–368.
Gallego Romero, I., Pavlovic, B.J., Hernando-Herraez, I., Zhou, X., Ward, M.C., Banovich, N.E., Kagan, C.L., Burnett, J.E., Huang, C.H., Mitrano, A., et al. (2015). A panel of induced pluripotent stem cells from chimpanzees: a resource for comparative functional genomics. Elife 4, e07103.
Galtier, N., and Duret, L. (2007). Adaptation or biased gene conversion? Extending the null hypothesis of molecular evolution. Trends Genet. 23, 273–277.
Gordân, R., Shen, N., Dror, I., Zhou, T., Horton, J., Rohs, R., and Bulyk, M.L. (2013). Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep. 3, 1093–1104.
Grob, S., Schmid, M.W., and Grossniklaus, U. (2014). Hi-C analysis in Arabidopsis identifies the KNOT, a structure with similarities to the flamenco locus of Drosophila. Mol. Cell 55, 678–693.
Grubert, F., Zaugg, J.B., Kasowski, M., Ursu, O., Spacek, D.V., Martin, A.R., Greenside, P., Srivas, R., Phanstiel, D.H., Pekowska, A., et al. (2015). Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions. Cell 162, 1051–1065.
Guo, Y., Xu, Q., Canzio, D., Shou, J., Li, J., Gorkin, D.U., Jung, I., Wu, H., Zhai, Y., Tang, Y., et al. (2015). CRISPR Inversion of CTCF Sites Alters Genome Topology and Enhancer/Promoter Function. Cell 162, 900–910.
Guturu, H., Doxey, A.C., Wenger, A.M., and Bejerano, G. (2013). Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements. Philos. Trans. R. Soc. Lond. B Biol. Sci. 368, 20130029.
Harmston, N., Ing-simmons, E., Tan, G., Perry, M., Merkenschlager, M., and Lenhard, B. (2016). Topologically associated domains are ancient features that coincide with Metazoan clusters of extreme noncoding conservation. bioRxiv, http://dx.doi.org/10.1101/042952.
Hay, D., Hughes, J.R., Babbs, C., Davies, J.O., Graham, B.J., Hanssen, L.L., Kassouf, M.T., Oudelaar, A.M., Sharpe, J.A., Suciu, M.C., et al. (2016). Genetic dissection of the α-globin super-enhancer in vivo. Nat. Genet. 48, 895–903.
Heger, P., Marin, B., Bartkuhn, M., Schierenberg, E., and Wiehe, T. (2012). The chromatin insulator CTCF and the emergence of metazoan diversity. Proc. Natl. Acad. Sci. USA 109, 17507–17512.
Hilton, I.B., D’Ippolito, A.M., Vockley, C.M., Thakore, P.I., Crawford, G.E., Reddy, T.E., and Gersbach, C.A. (2015). Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 33, 510–517.
Hnisz, D., Abraham, B.J., Lee, T.I., Lau, A., Saint-André, V., Sigova, A.A., Hoke, H.A., and Young, R.A. (2013). Super-enhancers in the control of cell identity and disease. Cell 155, 934–947.
Hong, J.-W., Hendrix, D.A., and Levine, M.S. (2008). Shadow enhancers as a source of evolutionary novelty. Science 321, 1314.
Hsieh, T.-H.S., Weiner, A., Lajoie, B., Dekker, J., Friedman, N., and Rando, O.J. (2015). Mapping Nucleosome Resolution Chromosome Folding in Yeast by Micro-C. Cell 162, 108–119.
Hou, C., Li, L., Qin, Z.S., and Corces, V.G. (2012). Gene density, transcription, and insulators contribute to the partition of the Drosophila genome into physical domains. Mol. Cell 48, 471–484.
Iampietro, C., Gummalla, M., Mutero, A., Karch, F., and Maeda, R.K. (2010). Initiator elements function to determine the activity state of BX-C enhancers. PLoS Genet. 6, e1001260.
Jacques, P.-É., Jeyakani, J., and Bourque, G. (2013). The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet. 9, e1003504.
Understanding how transcriptional enhancers control over 20,000 protein-coding genes to maintain cell-type-specific gene expression programs in all human cells is a fundamental challenge in regulatory biology. Recent studies suggest that gene regulatory elements and their target genes generally occur within insulated neighborhoods, which are chromosomal loop structures formed by the interaction of two DNA sites bound by the CTCF protein and occupied by the cohesin complex. Here, we review evidence that insulated neighborhoods provide for specific enhancer-gene interactions, are essential for both normal gene activation and repression, form a chromosome scaffold that is largely preserved throughout development, and are perturbed by genetic and epigenetic factors in disease. Insulated neighborhoods are a powerful paradigm for gene control that provides new insights into development and disease.
Introduction
Many recent reports describe evidence that specific chromosome structures play important roles in gene control. A core principle that has emerged from these studies is that genes and their regulatory elements typically occur together within specific DNA loop structures, which we have called “insulated neighborhoods.” Here, we review evidence that insulated neighborhoods are structural and functional units of gene control, and we explain how they are used during development to control the diverse cell identities that contribute to complex animals. We explain how insulated neighborhoods form the mechanistic basis of higher-order chromosome structures, such as topologically associating domains (TADs), we discuss how genetic and epigenetic perturbations of neighborhood boundaries contribute to disease, and we outline how further study of neighborhood structure and function will lead to additional insights into development and disease. There are other excellent reviews that provide historical perspective and summarize key insights into chromosome structure (Bickmore and van Steensel, 2013; Cavalli and Misteli, 2013; de Laat and Duboule, 2013; Dekker and Heard, 2015; Dekker and Mirny, 2016; Gibcus and Dekker, 2013; Gorkin et al., 2014; Merkenschlager and Nora, 2016; Phillips and Corces, 2009; Phillips-Cremins and Corces, 2013); here, we focus on the insulated neighborhood as a model for further exploration of the principles that underpin gene control in mammalian systems.
The Enhancer-Gene-Specificity Conundrum
Cell-type-specific gene expression programs in humans are generally controlled by gene regulatory elements called enhancers (Buecker and Wysocka, 2012; Heinz et al., 2015; Levine et al., 2014; Ong and Corces, 2011; Ren and Yue, 2015). Enhancers, first described over 30 years ago (Banerji et al., 1981; Benoist and Chambon, 1981; Gruss et al., 1981), are segments of DNA that are typically a few hundred base pairs in length and are occupied by multiple transcription factors that recruit co-activators and RNA polymerase II to target genes (Bulger and Groudine, 2011; Spitz and Furlong, 2012; Tjian and Maniatis, 1994). Tens of thousands of enhancers are estimated to be active in any given human cell type (ENCODE Project Consortium, 2012; Roadmap Epigenomics et al., 2015). Enhancers and their associated factors can regulate expression of genes located far upstream or downstream by looping to the promoters of these genes, so the features that cause enhancers to regulate only specific genes, generally on their own chromosomes, have been something of a mystery for several decades (Figure 1A). This mystery, which we will call the enhancer-gene-specificity conundrum, is important to solve because the majority of disease-associated non-coding variation occurs in the vicinity of enhancers and, thus, likely impacts these enhancers’ target genes (Ernst et al., 2011; Farh et al., 2015; Hnisz et al., 2013; Maurano et al., 2012).
Some of the specificity of enhancer-gene interactions may be due to the interaction of DNA-binding transcription factors at enhancers with specific partner transcription factors at promoters (Butler and Kadonaga, 2001; Choi and Engel, 1988; Ohtsuki et al., 1998). Each cell type expresses hundreds of different transcription factors, and these bind to DNA sequences in enhancers and in promoter-proximal regions. Diverse factors bound at these two sites interact with large cofactor complexes and could, in principle, interact with one another to produce some degree of enhancer-gene specificity (Zabidi et al., 2015). It is not clear to what extent this mechanism contributes to specific enhancer-gene interactions throughout the human genome.
insulated neighborhoods of human T cells, 90% of enhancer-promoter loops are fully contained within the neighborhood boundaries (Hnisz et al., 2016). It is also possible to estimate each neighborhood’s insulation efficacy using an “insulation score.” The insulation score of a neighborhood is calculated as the percentage of enhancer-promoter interactions that are fully contained within the neighborhood. In human ESCs, 59% of insulated neighborhoods have an insulation score of 100%.
Genetic perturbation of neighborhood anchor sequences has provided evidence for their structural and functional roles as insulators (Dowen et al., 2014; Flavahan et al., 2016; Hnisz et al., 2016; Ji et al., 2016; Narendra et al., 2015). In a dozen loci and in multiple cell types, CRISPR/Cas9 deletion of CTCF binding sites at the anchors of insulated neighborhoods has been shown to produce changes in the expression of genes within the neighborhoods and immediately adjacent to the deleted neighborhood boundary. For example, the miR-290–295 miRNA gene cluster, which plays important roles in ESC pluripotency, occurs within an insulated neighborhood together with a super-enhancer; when a CTCF loop anchor site of this neighborhood was deleted, there was a reduction in expression of the miRNA precursor and activation of an adjacent gene outside of the neighborhood concomitant with looping of the super-enhancer to this outside gene (Figure 2D). Furthermore, when genes occur within multiple nested insulated neighborhoods, deletion of multiple boundary sites was required to observe changes in gene expression (Dowen et al., 2014). Thus, insulated neighborhood boundaries constrain the activity of enhancers to genes within the neighborhood. Insulated neighborhood boundaries are also necessary to maintain repression of genes within the neighborhood; deletion of a CTCF anchor of an insulated neighborhood containing a Polycomb repressed gene led to the activation of that gene (Dowen et al., 2014).
The finding that cancer cells can activate oncogenes through somatic mutations or epigenetic modifications that disrupt insulated neighborhood boundaries provides additional evidence that neighborhood loop anchors have functional insulating properties (Figure 2E) (Flavahan et al., 2016; Hnisz et al., 2016; Katainen et al., 2015). Silent proto-oncogenes typically occur within insulated neighborhoods, and genetic modification of the neighborhood loop anchors can cause activation of these oncogenes (Flavahan et al., 2016; Hnisz et al., 2016). Somatic mutations occur frequently and recurrently in the loop anchors of oncogene-containing insulated neighborhoods in a variety of cancer cells (Figure 2E). Indeed, the CTCF DNA-binding motif in loop anchor regions is among the most-altered human-transcription-factor-binding sequences in cancer cells (Ji et al., 2016). These observations are consistent with the idea that mutations that alter the loop anchor sites of oncogene-containing insulated neighborhoods make an important contribution to the misregulation of gene expression that is inherent to the cancer state (Flavahan et al., 2016; Hnisz et al., 2016; Katainen et al., 2015).
Maintenance of Loop Anchors during Development
The majority of insulated neighborhoods that have been mapped in human ESCs appear to be maintained during development because the experimental evidence indicates that CTCF binding
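The insulation score defined in this section is a simple ratio, and a short Python sketch can make the bookkeeping explicit. It is illustrative only and rests on assumptions the text does not state: a neighborhood is modeled as the span between its two CTCF anchor coordinates on one chromosome (anchors treated as inclusive), an enhancer-promoter interaction is modeled as a pair of positions, and the denominator is taken to be every interaction with at least one end inside the neighborhood. The names `Neighborhood`, `Interaction`, `insulation_score`, and `percent_fully_insulated` are hypothetical and do not correspond to the authors' analysis pipeline.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Neighborhood:
    """Hypothetical model: the span between two CTCF loop anchors on one chromosome."""
    chrom: str
    start: int  # left anchor coordinate (bp)
    end: int    # right anchor coordinate (bp); assumes start < end

    def contains(self, chrom: str, pos: int) -> bool:
        # Anchors are treated as inclusive boundaries (an assumption).
        return chrom == self.chrom and self.start <= pos <= self.end


@dataclass(frozen=True)
class Interaction:
    """Hypothetical enhancer-promoter contact with both ends on one chromosome."""
    chrom: str
    enhancer_pos: int
    promoter_pos: int


def insulation_score(nb: Neighborhood, interactions: Sequence[Interaction]) -> Optional[float]:
    """Percentage of interactions touching `nb` whose two ends both lie inside it.

    Returns None when no interaction has an end inside the neighborhood.
    """
    involved = 0   # interactions with at least one end inside the neighborhood
    contained = 0  # interactions with both ends inside the neighborhood
    for it in interactions:
        enh_in = nb.contains(it.chrom, it.enhancer_pos)
        prom_in = nb.contains(it.chrom, it.promoter_pos)
        if enh_in or prom_in:
            involved += 1
            if enh_in and prom_in:
                contained += 1
    if involved == 0:
        return None
    return 100.0 * contained / involved


def percent_fully_insulated(
    neighborhoods: Sequence[Neighborhood], interactions: Sequence[Interaction]
) -> Optional[float]:
    """Share (%) of scorable neighborhoods whose insulation score is exactly 100%."""
    scores = [insulation_score(nb, interactions) for nb in neighborhoods]
    scored = [s for s in scores if s is not None]
    if not scored:
        return None
    return 100.0 * sum(s == 100.0 for s in scored) / len(scored)


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    nb1 = Neighborhood("chr7", 1_000_000, 1_200_000)
    nb2 = Neighborhood("chr7", 2_000_000, 2_200_000)
    loops = [
        Interaction("chr7", 1_050_000, 1_150_000),  # fully inside nb1
        Interaction("chr7", 1_100_000, 1_180_000),  # fully inside nb1
        Interaction("chr7", 1_190_000, 1_300_000),  # crosses nb1's right anchor
        Interaction("chr7", 2_000_000, 2_100_000),  # fully inside nb2
    ]
    print(round(insulation_score(nb1, loops), 1))  # 66.7
    print(percent_fully_insulated([nb1, nb2], loops))  # 50.0
```

In this toy example the first neighborhood scores about 66.7% because one of its three involved interactions crosses an anchor, whereas the second scores 100%. A genome-wide statistic such as the 59% reported for human ESCs is analogous to what `percent_fully_insulated` would return over all mapped neighborhoods, although the real figure depends on how interactions were called and assigned.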
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380.
Dixon, J.R., Jung, I., Selvaraj, S., Shen, Y., Antosiewicz-Bourget, J.E., Lee, A.Y., Ye, Z., Kim, A., Rajagopal, N., Xie, W., et al. (2015). Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331–336.
Dong, J., Panchakshari, R.A., Zhang, T., Zhang, Y., Hu, J., Volpi, S.A., Meyers, R.M., Ho, Y.J., Du, Z., Robbiani, D.F., et al. (2015). Orientation-specific joining of AID-initiated DNA breaks promotes antibody class switching. Nature 525, 134–139.
Dowen, J.M., Bilodeau, S., Orlando, D.A., Hübner, M.R., Abraham, B.J., Spector, D.L., and Young, R.A. (2013). Multiple structural maintenance of chromosome complexes at transcriptional regulatory elements. Stem Cell Reports 1, 371–378.
Dowen, J.M., Fan, Z.P., Hnisz, D., Ren, G., Abraham, B.J., Zhang, L.N., Weintraub, A.S., Schuijers, J., Lee, T.I., Zhao, K., and Young, R.A. (2014). Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159, 374–387.
Doyle, B., Fudenberg, G., Imakaev, M., and Mirny, L.A. (2014). Chromatin loops as allosteric modulators of enhancer-promoter interactions. PLoS Comput. Biol. 10, e1003867.
ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74.
Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X., Wang, L., Issner, R., Coyne, M., et al. (2011). Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49.
Farh, K.K., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W.J., Beik, S., Shoresh, N., Whitton, H., Ryan, R.J., Shishkin, A.A., et al. (2015). Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343.
Flavahan, W.A., Drier, Y., Liau, B.B., Gillespie, S.M., Venteicher, A.S., Stemmer-Rachamimov, A.O., Suvà, M.L., and Bernstein, B.E. (2016). Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature 529, 110–114.
Fogarty, M.P., Cannon, M.E., Vadlamudi, S., Gaulton, K.J., and Mohlke, K.L. (2014). Identification of a regulatory variant that binds FOXA1 and FOXA2 at the CDC123/CAMK1D type 2 diabetes GWAS locus. PLoS Genet. 10, e1004633.
Francis, N.J., Kingston, R.E., and Woodcock, C.L. (2004). Chromatin compaction by a polycomb group protein complex. Science 306, 1574–1577.
Fudenberg, G., Imakaev, M., Lu, C., Goloborodko, A., Abdennur, N., and Mirny, L.A. (2016). Formation of Chromosomal Domains by Loop Extrusion. Cell Rep. 15, 2038–2049.
Fukaya, T., Lim, B., and Levine, M. (2016). Enhancer Control of Transcriptional Bursting. Cell 166, 358–368.
Gorkin, D.U., Leung, D., and Ren, B. (2014). The 3D genome in transcriptional regulation and pluripotency. Cell Stem Cell 14, 762–775.
Gröschel, S., Sanders, M.A., Hoogenboezem, R., de Wit, E., Bouwman, B.A., Erpelinck, C., van der Velden, V.H., Havermans, M., Avellino, R., van Lom, K., et al. (2014). A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in leukemia. Cell 157, 369–381.
Grubert, F., Zaugg, J.B., Kasowski, M., Ursu, O., Spacek, D.V., Martin, A.R., Greenside, P., Srivas, R., Phanstiel, D.H., Pekowska, A., et al. (2015). Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions. Cell 162, 1051–1065.
Gruss, P., Dhar, R., and Khoury, G. (1981). Simian virus 40 tandem repeated sequences as an element of the early promoter. Proc. Natl. Acad. Sci. USA 78, 943–947.
GTEx Consortium (2015). Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660.
Guo, Y., Xu, Q., Canzio, D., Shou, J., Li, J., Gorkin, D.U., Jung, I., Wu, H., Zhai, Y., Tang, Y., et al. (2015). CRISPR Inversion of CTCF Sites Alters Genome Topology and Enhancer/Promoter Function. Cell 162, 900–910.
Handoko, L., Xu, H., Li, G., Ngan, C.Y., Chew, E., Schnapp, M., Lee, C.W., Ye, C., Ping, J.L., Mulawadi, F., et al. (2011). CTCF-mediated functional chromatin interactome in pluripotent cells. Nat. Genet. 43, 630–638.
Hark, A.T., Schoenherr, C.J., Katz, D.J., Ingram, R.S., Levorse, J.M., and Tilghman, S.M. (2000). CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 405, 486–489.
Hartl, T.A., Smith, H.F., and Bosco, G. (2008). Chromosome alignment and transvection are antagonized by condensin II. Science 322, 1384–1387.
Heidari, N., Phanstiel, D.H., He, C., Grubert, F., Jahanbani, F., Kasowski, M., Zhang, M.Q., and Snyder, M.P. (2014). Genome-wide map of regulatory interactions in the human genome. Genome Res. 24, 1905–1917.
Heinz, S., Romanoski, C.E., Benner, C., and Glass, C.K. (2015). The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol. 16, 144–154.
Hnisz, D., Abraham, B.J., Lee, T.I., Lau, A., Saint-André, V., Sigova, A.A., Hoke, H.A., and Young, R.A. (2013). Super-enhancers in the control of cell identity and disease. Cell 155, 934–947.
Hnisz, D., Weintraub, A.S., Day, D.S., Valton, A.L., Bak, R.O., Li, C.H., Goldmann, J., Lajoie, B.R., Fan, Z.P., Sigova, A.A., et al. (2016). Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science 351, 1454–1458.
Hou, C., Zhao, H., Tanimoto, K., and Dean, A. (2008). CTCF-dependent enhancer-blocking by alternative chromatin loop formation. Proc. Natl. Acad. Sci. USA 105, 20398–20403.
Hu, J., Zhang, Y., Zhao, L., Frock, R.L., Du, Z., Meyers, R.M., Meng, F.L., Schatz, D.G., and Alt, F.W. (2015). Chromosomal Loop Domains Direct the Recombination of Antigen Receptor Genes. Cell 163, 947–959.
Katainen, R., Dave, K., Pitkänen, E., Palin, K., Kivioja, T., Välimäki, N., Gylfe, A.E., Ristolainen, H., Hänninen, U.A., Cajuso, T., et al. (2015). CTCF/cohesin-binding sites are frequently mutated in cancer. Nat. Genet. 47, 818–821.
Kellum, R., and Schedl, P. (1991). A position-effect assay for boundaries of higher order chromosomal domains. Cell 64, 941–950.
Roadmap Epigenomics Consortium, Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M.J., et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330.
Kurukuti, S., Tiwari, V.K., Tavoosidana, G., Pugacheva, E., Murrell, A., Zhao, Z., Lobanenkov, V., Reik, W., and Ohlsson, R. (2006). CTCF binding at the H19 imprinting control region mediates maternally inherited higher-order chromatin conformation to restrict enhancer access to Igf2. Proc. Natl. Acad. Sci. USA 103, 10684–10689.
Lawrence, M.S., Stojanov, P., Mermel, C.H., Robinson, J.T., Garraway, L.A., Golub, T.R., Meyerson, M., Gabriel, S.B., Lander, E.S., and Getz, G. (2014). Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501.
Levine, M., Cattoglio, C., and Tjian, R. (2014). Looping back to leap forward: transcription enters a new era. Cell 157, 13–25.
Liu, M., Maurano, M.T., Wang, H., Qi, H., Song, C.Z., Navas, P.A., Emery, D.W., Stamatoyannopoulos, J.A., and Stamatoyannopoulos, G. (2015). Genomic discovery of potent chromatin insulators for human gene therapy. Nat. Biotechnol. 33, 198–203.
Liu, X.S., Wu, H., Ji, X., Stelzer, Y., Wu, X., Czauderna, S., Shu, J., Dadon, D., Young, R.A., and Jaenisch, R. (2016). Editing DNA methylation in the mammalian genome. Cell 167, 233–247.
Lupiáñez, D.G., Kraft, K., Heinrich, V., Krawitz, P., Brancati, F., Klopocki, E., Horn, D., Kayserili, H., Opitz, J.M., Laxova, R., et al. (2015). Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161, 1012–1025.
Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J., et al. (2012). Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195.
McGeachie, M.J., Yates, K.P., Zhou, X., Guo, F., Sternberg, A.L., Van Natta, M.L., Wise, R.A., Szefler, S.J., Sharma, S., Kho, A.T., et al.; CAMP Research Group (2016). Genetics and genomics of longitudinal lung function patterns in asthmatics. Am. J. Respir. Crit. Care Med. Published online July 1, 2016. http://dx.doi.org/10.1164/rccm.201602-0250OC.
Merkenschlager, M., and Nora, E.P. (2016). CTCF and Cohesin in Genome Folding and Transcriptional Gene Regulation. Annu. Rev. Genomics Hum. Genet. 17, 17–43.
Murrell, A., Heeson, S., and Reik, W. (2004). Interaction between differentially methylated regions partitions the imprinted genes Igf2 and H19 into parent-specific chromatin loops. Nat. Genet. 36, 889–893.
Narendra, V., Rocha, P.P., An, D., Raviram, R., Skok, J.A., Mazzoni, E.O., and Reinberg, D. (2015). CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science 347, 1017–1021.
Nativio, R., Sparago, A., Ito, Y., Weksberg, R., Riccio, A., and Murrell, A. (2011). Disruption of genomic neighbourhood at the imprinted IGF2-H19 locus in
Ong, C.T., and Corces, V.G. (2014). CTCF: an architectural protein bridging genome topology and function. Nat. Rev. Genet. 15, 234–246.
Ong, C.T., Van Bortle, K., Ramos, E., and Corces, V.G. (2013). Poly(ADP-ribosyl)ation regulates insulator function and intrachromosomal interactions in Drosophila. Cell 155, 148–159.
Phillips, J.E., and Corces, V.G. (2009). CTCF: master weaver of the genome. Cell 137, 1194–1211.
Phillips-Cremins, J.E., and Corces, V.G. (2013). Chromatin insulators: linking genome organization to cellular function. Mol. Cell 50, 461–474.
Phillips-Cremins, J.E., Sauria, M.E., Sanyal, A., Gerasimova, T.I., Lajoie, B.R., Bell, J.S., Ong, C.T., Hookway, T.A., Guo, C., Sun, Y., et al. (2013). Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153, 1281–1295.
Pomerantz, M.M., Ahmadiyeh, N., Jia, L., Herman, P., Verzi, M.P., Doddapaneni, H., Beckwith, C.A., Chan, J.A., Hills, A., Davis, M., et al. (2009). The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat. Genet. 41, 882–884.
Rao, S.S., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., and Aiden, E.L. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680.
Ren, B., and Yue, F. (2015). Transcriptional enhancers: bridging the genome and phenome. Cold Spring Harb. Symp. Quant. Biol. 80, 17–26.
Saldaña-Meyer, R., González-Buendía, E., Guerrero, G., Narendra, V., Bonasio, R., Recillas-Targa, F., and Reinberg, D. (2014). CTCF regulates the human p53 gene through direct interaction with its natural antisense transcript, Wrap53. Genes Dev. 28, 723–734.
Sanborn, A.L., Rao, S.S., Huang, S.C., Durand, N.C., Huntley, M.H., Jewett, A.I., Bochkov, I.D., Chinnappan, D., Cutkosky, A., Li, J., et al. (2015). Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl. Acad. Sci. USA 112, E6456–E6465.
Schmitt, A.D., Hu, M., and Ren, B. (2016). Genome-wide mapping and analysis of chromosome architecture. Nat. Rev. Mol. Cell Biol. Published online September 1, 2016. http://dx.doi.org/10.1038/nrm.2016.104.
Seila, A.C., Calabrese, J.M., Levine, S.S., Yeo, G.W., Rahl, P.B., Flynn, R.A., Young, R.A., and Sharp, P.A. (2008). Divergent transcription from active promoters. Science 322, 1849–1851.
Sigova, A.A., Mullen, A.C., Molinie, B., Gupta, S., Orlando, D.A., Guenther, M.G., Almada, A.E., Lin, C., Sharp, P.A., Giallourakis, C.C., and Young, R.A. (2013). Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells. Proc. Natl. Acad. Sci. USA 110, 2876–2881.
Smith, E.M., Lajoie, B.R., Jain, G., and Dekker, J. (2016). Invariant TAD Boundaries Constrain Cell-Type-Specific Looping Interactions between Promoters and Distal Elements around the CFTR Locus. Am. J. Hum. Genet. 98, 185–201.
Spitz, F., and Furlong, E.E. (2012). Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 13, 613–626.
Splinter, E., Heath, H., Kooren, J., Palstra, R.J., Klous, P., Grosveld, F., Galjart, N., and de Laat, W. (2006). CTCF mediates long-range chromatin looping and local histone modification in the beta-globin locus. Genes Dev. 20, 2349–2354.
Tang, Z., Luo, O.J., Li, X., Zheng, M., Zhu, J.J., Szalaj, P., Trzaskoma, P., Magalska, A., Wlodarczyk, J., Ruszczycki, B., et al. (2015). CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription. Cell 163, 1611–1627.
Tjian, R., and Maniatis, T. (1994). Transcriptional activation: a complex puzzle with few easy pieces. Cell 77, 5–8.
Tolhuis, B., Palstra, R.J., Splinter, E., Grosveld, F., and de Laat, W. (2002). Looping and interaction between hypersensitive sites in the active beta-globin locus. Mol. Cell 10, 1453–1465.
UK10K Consortium, Walter, K., Min, J.L., Huang, J., Crooks, L., Memari, Y., McCarthy, S., Perry, J.R., Xu, C., Futema, M., et al. (2015). The UK10K project identifies rare variants in health and disease. Nature 526, 82–90.
Vietri Rudan, M., Barrington, C., Henderson, S., Ernst, C., Odom, D.T., Tanay, A., and Hadjur, S. (2015). Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 10, 1297–1309.
Wang, H., Maurano, M.T., Qu, H., Varley, K.E., Gertz, J., Pauli, F., Lee, K., Canfield, T., Weaver, M., Sandstrom, R., et al. (2012). Widespread plasticity in CTCF occupancy linked to DNA methylation. Genome Res. 22, 1680–1688.
Wang, H., Zang, C., Taing, L., Arnett, K.L., Wong, Y.J., Pear, W.S., Blacklow, S.C., Liu, X.S., and Aster, J.C. (2014). NOTCH1-RBPJ complexes drive target gene expression through dynamic interactions with superenhancers. Proc. Natl. Acad. Sci. USA 111, 705–710.
Zabidi, M.A., Arnold, C.D., Schernhuber, K., Pagani, M., Rath, M., Frank, O., and Stark, A. (2015). Enhancer-core-promoter specificity separates developmental and housekeeping gene regulation. Nature 518, 556–559.
Correspondence
anna.babour@inserm.fr
In Brief
A chromatin remodeling complex retains premature mRNPs in proximity to their transcription site, ensuring an accurate surveillance mechanism that proofreads the efficiency of mRNA biogenesis.
Highlights
• The chromatin remodeling complex ISW1 controls nuclear poly(A) RNA accumulation
France
⁴Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
⁵Lead Contact
*Correspondence: anna.babour@inserm.fr
http://dx.doi.org/10.1016/j.cell.2016.10.048
Cell 167, 1201–1214, November 17, 2016 © 2016 Elsevier Inc.
Figure 1. The Chromatin-Associated ISW1 Complex Controls Nuclear Accumulation of Poly(A) RNA
(A) Co-immunoprecipitation of Isw1-13Myc and Mex67. Mock, pre-immune serum.
(B) Inactivation of the ISW1 complex does not affect poly(A) RNA localization as observed by oligo(dT) FISH analyses (n = 3, mean ± SD).
(C) Deletion of ISW1 rescues the growth of the mex67ΔUBA mutant. The indicated strains containing pRS316-MEX67 were grown at 30°C on 5-FOA plates to counterselect pRS316-MEX67.
(D) Inactivation of the ISW1 complex rescues the poly(A) RNA nuclear accumulation defect of the mex67ΔUBA mutant. FISH analysis and quantification were performed as in (B) in the indicated strains grown for 2 hr at 30°C.
(E) Deletion of each subunit of the ISW1 complex rescues the growth of the npl3-1 mutant at 30°C on YPD.
(F) Inactivation of the ISW1 complex reduces the poly(A) nuclear accumulation defect of the npl3-1 mutant. Subcellular localization of poly(A) RNA was analyzed as in (B) in the different strains grown overnight at 25°C in YPD and shifted for 3 hr at 30°C prior to fixation. Scale bar, 5 μm.
(G) Overexpression of Isw1 inhibits the growth of npl3-1 cells. Left: serial dilutions of strains grown on selective media. Right: total protein extracts from WT ISW1-3FL or npl3-1 ISW1-3FL cells transformed with pRS415GPD or pRS415GPD3FL-ISW1 were analyzed by western blot with anti-FLAG and anti-Mex67 (loading control) antibodies.
See also Figure S1.
Figure 3. Inactivation of ISW1 Rescues Nuclear Export of Improper mRNPs and Resulting Genetic Instability
(A and B) Deletion of ISW1 does not affect transcription elongation. Serial dilutions of the indicated strains were grown with or without MPA.
(C) LYS2 transcription shut off experimental setting: npl3-1 and npl3-1 isw1D cells were shifted from 25 C to 30 C for 1 hr and transcription was blocked by
addition of phenanthroline (t = 0). Samples were collected for analysis at t = 0, 300 ,and 600 .
(D) Similar CTD recruitment to LYS2 in npl3-1 and npl3-1 isw1D cells analyzed by chromatin immunoprecipitation (ChIP) and normalized to the value at t = 0 (n = 3,
mean ± SD). See Figure S3C for non-normalized values.
(E) npl3-1 and npl3-1 isw1D show similar LYS2 mRNA levels as analyzed by qRT-PCR and normalized to ACT1 mRNA expression (n = 5, mean ± SD).
(F) Deletion of ISW1 releases the LYS2 transcripts accumulated in a nuclear dot of npl3-1 cells. The subcellular localization of the LYS2 transcript after blocking transcription with phenanthroline in npl3-1 and npl3-1 isw1Δ cells was analyzed by FISH using Quasar570-LYS2 probes. For each time point, the percentage of cells showing a nuclear dot was scored (n = 3, mean ± SD). White arrows point to nuclear localized LYS2 transcripts. Scale bar, 5 µm.
(G) ISW1 inactivation reduces the number of spontaneous Rad52 foci in npl3-1 cells grown for 3 hr at 30°C. Fluorescence microscopic examination of the indicated cells transformed with a pRS415-Rad52-YFP plasmid. White arrows highlight Rad52 foci in npl3-1 unbudded cells. For each cell type, an average of 300 budded and unbudded cells were examined (n = 3, mean ± SD).
(H) ISW1 inactivation partially rescues the hyperrecombination phenotype of the npl3-1 mutant. Recombination was analyzed in the indicated strains carrying pL or pLYDN plasmids, grown for 3 hr at 30°C and plated at 25°C. Average and standard deviation of three fluctuation tests consisting of the median value of 12 independent colonies for each condition are shown.
See also Figure S3.
(G) Isw1 UV cross-links to RNA in vivo. HTP-tagged Isw1 was cross-linked (+) or not (−) and purified from cell extracts. A total of 2.5% of the nickel eluate was resolved by SDS-PAGE after (lanes 5–6) or without (lanes 3–4) RNase treatment and detected by autoradiography (upper panel) or anti-HIS western blot (lower panel). An untagged strain was used as a control (lanes 1–2). The red asterisk indicates a contaminant band.
See also Figure S6.
Further information and requests for reagents may be directed to, and will be fulfilled by the Lead Contact, Anna Babour
(anna.babour@inserm.fr).
Plasmids
The isw1K227R mutation was amplified from strain YTT1223 (T. Tsukiyama) and subcloned into pRS416-ISW1-2FL using the BamHI (853) and XbaI (1555) restriction sites. An isw1ΔSANT fragment was amplified from the isw1Δ2641-2799 strain (J. Mellor collection, strain MP2) and subcloned into pRS416-ISW1-2FL using the AgeI and NheI restriction enzymes to generate pRS416-isw1ΔSANT-2FL. Similarly, an isw1ΔSLIDE fragment was amplified from the isw1Δ2961-3150 strain (J. Mellor collection, strain MP3) using a downstream primer bearing two copies of the Flag sequence followed by a termination codon and a PstI restriction site. An AgeI-PstI fragment was subcloned into pRS416-ISW1-2FL, generating the pRS416-isw1ΔSLIDE-2FL plasmid. pRS415GPD3FL-ISW1 was built as follows: the ISW1 ORF was cloned between the BamHI and PstI sites of pRS415GPD, and a 3FL tag was then inserted at the BamHI site. All constructs were verified by sequencing. Plasmids used in this study are reported in the Key Resources Table.
METHOD DETAILS
Fluorescence Microscopy
Three-dimensional stacks with a 0.2-µm step were acquired by 3D deconvolution microscopy on a DMR upright microscope (Leica) equipped with a CoolSNAP HQ2 charge-coupled device (CCD) camera (Photometrics) and a 100× Plan Apochromat HCX oil immersion objective (NA = 1.4) controlled in the z axis by a piezoelectric motor (LVDT; Physik Instrumente). For each condition, 200-300 Hoechst-stained cells were examined in 3 biologically independent experiments. Deconvolution, when applied, was performed automatically on batches of image stacks using an iterative, measured point spread function (PSF)-based algorithm (Gold-Meinel), with identical processing parameters and number of iterations for all stacks. Maximum intensity projections were generated using ImageJ software.
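A maximum intensity projection collapses each z-stack into a single 2D image. The sketch below shows the underlying operation in plain Python for illustration only. It is not the ImageJ implementation, and the stack[z][y][x] layout and the demo values are assumptions.

```python
from typing import List

# A z-stack indexed as stack[z][y][x] (assumed layout for this sketch).
Stack = List[List[List[float]]]


def max_intensity_projection(stack: Stack) -> List[List[float]]:
    """Collapse a z-stack into one 2D image by keeping, for every (y, x)
    pixel, the brightest value found along the z axis."""
    if not stack or not stack[0] or not stack[0][0]:
        raise ValueError("stack must contain at least one non-empty plane")
    height, width = len(stack[0]), len(stack[0][0])
    for plane in stack:
        if len(plane) != height or any(len(row) != width for row in plane):
            raise ValueError("all z-planes must have the same shape")
    return [
        [max(plane[y][x] for plane in stack) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 3-plane stack of 2x2 images
demo = [
    [[1, 5], [0, 2]],
    [[3, 1], [4, 0]],
    [[2, 2], [1, 9]],
]
print(max_intensity_projection(demo))  # [[3, 5], [4, 9]]
```

In ImageJ, the equivalent operation is Image > Stacks > Z Project with the Max Intensity projection type.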
Co-immunoprecipitation Experiments
For co-IP experiments between Isw1 and Rrp4, WT and npl3-1 log-phase cells were shifted to 30°C for 3 hr. For all co-IPs, cells were harvested by centrifugation at 4°C and rapidly frozen in liquid nitrogen before cryolysis. Frozen cell grindates were prepared as previously described (Oeffinger et al., 2007) and thawed in IP buffer (20 mM HEPES pH 7.4, 150 mM NaCl, 5 mM MgCl2, 10% glycerol, 0.5% Triton X-100, and protease inhibitors [Roche]). The cleared lysate was incubated for 90 min at 4°C with EZview Red anti-HA beads (Sigma) or Dynabeads Protein G (Novex Life Technologies). Samples were then washed 5 times with IP buffer for 5 min at
Chromatin Immunoprecipitation
Cell cultures were cross-linked for 10 min with 1.2% formaldehyde (37%; Sigma), and the reaction was quenched with 360 mM glycine. Cell pellets (40 OD) were resuspended in 1 mL of lysis buffer (50 mM HEPES-KOH pH 7.5, 140 mM NaCl, 1 mM EDTA pH 8.0, 1% Triton X-100, 0.1% sodium deoxycholate, 0.05% SDS, protease inhibitors). Cells were lysed using the MagNA Lyser (Roche), and the lysates were sonicated for two rounds of 10 min (Diagenode). Sheared chromatin (∼300-bp DNA fragments) was separated from cellular debris by centrifugation for 20 min at 13,000 rpm at 4°C. IP was performed by rotating samples overnight at 4°C after addition of 5 µg anti-CTD antibody (8WG16; Covance) to 0.5 mg of protein from chromatin extracts (500 µL final volume). 50 µL of prewashed protein G Sepharose beads (GE Healthcare) were added to each sample and incubated for 2 hr at room temperature. Beads were successively washed for 5 min in 1 mL of each of the following buffers: lysis buffer; 500 mM NaCl buffer (50 mM HEPES-KOH pH 7.5, 500 mM NaCl, 1 mM EDTA pH 8.0, 1% Triton X-100, 0.1% sodium deoxycholate); buffer III (10 mM Tris-HCl pH 8.0, 1 mM EDTA pH 8.0, 250 mM LiCl, 1% NP-40, 1% sodium deoxycholate); and TE pH 8.0 (10 mM Tris-HCl pH 8.0, 1 mM EDTA pH 8.0). Elution was performed in 100 µL elution buffer (50 mM Tris-HCl pH 7.5, 1% SDS, 10 mM EDTA pH 8.0) for 20 min at 65°C. IP and input samples were reverse cross-linked overnight at 65°C after addition of 1 mg/mL proteinase K. Genomic DNA was purified using the QIAGEN PCR purification kit and quantified by qPCR (SYBR Green I Master; LightCycler 480; Roche).
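Figure 3D reports CTD occupancy at LYS2 normalized to its value at t = 0. The Methods do not say how raw qPCR values were converted before that step, so the sketch below assumes one common convention: IP signal relative to input, computed from Ct values with 100% amplification efficiency. The function names, the input_fraction parameter, and the Ct values are hypothetical.

```python
import math
from typing import Dict


def enrichment_over_input(ct_ip: float, ct_input: float,
                          input_fraction: float = 1.0) -> float:
    """IP signal relative to input, assuming the qPCR signal doubles each cycle.

    input_fraction is the share of chromatin kept as input relative to the
    amount used for the IP (hypothetical parameter, not stated in the Methods).
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Shift the input Ct to the value a 100% input would have given.
    ct_input_full = ct_input - math.log2(1 / input_fraction)
    return 2.0 ** (ct_input_full - ct_ip)


def normalize_to_t0(signal_by_time: Dict[int, float]) -> Dict[int, float]:
    """Express every time point relative to the value at t = 0."""
    baseline = signal_by_time[0]
    if baseline == 0:
        raise ValueError("signal at t = 0 must be non-zero")
    return {t: s / baseline for t, s in signal_by_time.items()}


# Hypothetical (IP Ct, input Ct) pairs at three time points, in minutes
cts = {0: (25.0, 20.0), 30: (26.0, 20.0), 60: (27.0, 20.0)}
signal = {t: enrichment_over_input(ip, inp) for t, (ip, inp) in cts.items()}
print(normalize_to_t0(signal))  # {0: 1.0, 30: 0.5, 60: 0.25}
```

Normalizing to t = 0 removes differences in absolute ChIP efficiency between strains, so only the change after phenanthroline addition is compared.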
The number of independent experimental replicates and the definitions of center and precision measures are reported in the figure legends (n, mean ± SD). Significance of the observed differences was evaluated using Student's t test (*p = 0.01–0.05; **p = 0.001–0.01; ***p < 0.001).
For quantification of FISH experiments, the percentage of cells with nuclear accumulation of poly(A) RNA or bearing a nuclear “dot” was measured from 100 to 300 Hoechst-stained cells of each genotype in 3 independent experiments (mean ± SD).
For measurement of the hyperrecombination phenotype, the recombination frequency of each fluctuation test was calculated as the median value of twelve independent colonies for each transformant studied, and three fluctuation tests were performed. The mean frequencies are represented (n = 3, mean ± SD).
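The summary statistic described above (a median over twelve colonies per fluctuation test, then mean ± SD across three tests) can be sketched as follows. The per-colony frequency definition (recombinants divided by total viable cells) is an assumption, since plating details are not given here, and all numbers are invented for illustration.

```python
from statistics import mean, median, stdev
from typing import Sequence, Tuple


def recombination_frequency(recombinants: int, total_cells: int) -> float:
    """Per-colony frequency: recombinant cells over total viable cells
    (assumed definition for this sketch)."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return recombinants / total_cells


def fluctuation_test(colony_frequencies: Sequence[float]) -> float:
    """One fluctuation test: the median frequency over independent colonies."""
    if not colony_frequencies:
        raise ValueError("no colonies given")
    return median(colony_frequencies)


def summarize(tests: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Mean and SD of the per-test medians (n = number of fluctuation tests)."""
    medians = [fluctuation_test(t) for t in tests]
    return mean(medians), stdev(medians)


# Hypothetical frequencies (in units of 1e-4) for 3 tests of 12 colonies each
tests = [
    [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3],
    [2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4],
    [3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5],
]
print(summarize(tests))  # (3.0, 1.0)
```

Taking the median within each test limits the influence of "jackpot" colonies in which a recombination event happened early, which would otherwise inflate a mean.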
The accession number for the RIP-seq data reported in this paper is ArrayExpress: E-MTAB-4826. Downstream bioinformatics analysis was carried out using custom code written in MATLAB to determine average protein levels (TBP, Isw1, etc.) in different gene classes. Code is available on request.
Autoradiograms were quantified using ImageJ.
Correspondence
chait@rockefeller.edu (B.T.C.),
sali@salilab.org (A.S.),
rout@rockefeller.edu (M.P.R.)
In Brief
mRNAs escape the nucleus with help
from a nuclear pore subcomplex that sits
directly over the transport channel in the
cytoplasm.
Highlights
• Integrative structure at 9 Å precision of the endogenous Nup82 holo-complex
Cell 167, 1215–1228, November 17, 2016 © 2016 Elsevier Inc.
nucleus. In the final stage, the remodeled mRNA is released into the cytoplasm for translation.
Unfortunately, the precise coordination of these processes at the molecular scale has not been elucidated, in large part due to the lack of sufficiently detailed information on the spatial arrangement of transport and remodeling components relative to each other and the NPC. Localization studies have led to the proposal that the Nup82 complex forms filaments that project orthogonally from the cytoplasmic face of the NPC; such a location would imply that exporting mRNPs must first transit the central channel of the NPC before being transferred out to these peripheral cytoplasmic filaments, where the final stages of mRNP remodeling and export would occur distally from the central channel of the NPC (reviewed in Folkmann et al., 2011; Knockenhauer and Schwartz, 2016; Oeffinger and Zenklusen, 2012). However, exactly how this transfer would be accomplished, and how central channel transit and mRNP processing could be coordinated, remained unclear.
To understand these processes, we solved the structure of the endogenous Nup82 complex by using an integrative approach that relies on multiple structural and proteomic data sources (Alber et al., 2007b; Shi et al., 2014). We also determined how the Nup82 complex is anchored to the cytoplasmic face of the NPC via the Nup84 complex, a seven-member assembly forming the outer rings. In addition, we used a combined structural and functional mapping analysis to elucidate the major mechanism responsible for mRNA export defects affecting Nup84 complex components. Finally, we integrate our data into a detailed map of the whole cytoplasmic mRNA export and remodeling machinery. We show that, surprisingly, the Nup82 complex positions the cytoplasmic FG repeats and mRNP remodeling machinery right over the NPC's central channel rather than on distal cytoplasmic filaments, as previously supposed.
RESULTS
Solving the Structure of the Endogenous Nup82 Holo-complex
We solved the structure of the endogenous native Nup82 holo-complex (Figure 1) using an integrative modeling approach that has previously allowed us and others to successfully determine the molecular architecture of numerous other large native assemblies (Sali et al., 2015). Such integrative strategies have proven to be suited for the structural analysis of large endogenous complexes that are by nature flexible, contain unstructured regions, and are conformationally heterogeneous (Shi et al., 2014; Shi et al., 2015).
We measured the native stoichiometry of the purified Nup82 holo-complex by a combination of QConCAT-MS (Pratt et al., 2006) and classical Siegel and Monte biophysical measurements (Figure S1; STAR Methods). The consensus of our analyses results in a stoichiometry of 2:2:2:2 (Nup159:Nup82:Nsp1:Dyn2), consistent with that previously measured (Gaik et al., 2015) for a truncated overexpressed version of the complex, with the exception of the Dyn2 dimer, a labile component that, unless overexpressed (Figure S1E), is present as a single dimer in the average native complex. The morphology and dimensions of the complex were determined by negative stain EM, where 4,266 particles were classified into 23 class averages (Figure S2C); a majority of these (21) showed what appears to be a single dimer of Dyn2, in agreement with a previous study (Gaik et al., 2015) and with our stoichiometry (see above), and were thus included in the calculation. Interestingly, two of the class averages seemingly presented two consecutive dimers of Dyn2 (Figure S2C, arrowheads), underscoring the previously observed heterogeneity of the complex in vivo (Gaik et al., 2015). Instead of using a highly uncertain 3D map computed via single-particle reconstruction based on a heterogeneous set of images, we relied on much more robustly computed 2D class averages, following a previously demonstrated procedure (Shi et al., 2014). Only the structured portions of the complex were constrained by the EM data, because we showed that the unstructured FG repeats are not revealed by negative stain EM (Figure S2D).
All components of the complex were used in the final calculation, including FG repeats to account for their excluded volume and emanating points. Protein representations were derived from the atomic structures in the Protein Data Bank, where available, or comparative models were built with MODELER 9.13 (Sali and Blundell, 1993) based on the closest homolog with a known structure detected by HHPred (Söding, 2005) (Figure S3; Table S1); disordered FG-repeat-containing regions were modeled as flexible strings of beads, guided by our recent nuclear magnetic resonance (NMR) data (Hough et al., 2015). Finally, the residue-specific spatial proximity and orientation of the different subunits were determined by a comprehensive chemical cross-linking with mass spectrometry readout (CX-MS) method, using two complementary cross-linkers (Figures 2A and S2A) (Shi et al., 2014). To reduce the intrinsic ambiguity of cross-link data arising from the presence of two copies of each protein, we also analyzed a strain expressing an exogenous homolog of Nup82 (skNup82) from the yeast Saccharomyces kudriavzevii (Borneman et al., 2012) (Figure S2A; STAR Methods), whose distinct protein sequence allows cross-links to it to be distinguished from the endogenous Nup82. We identified a total of 1,131 cross-links (Table S2) that include 662 unique disuccinimidyl suberate (DSS) and 126 unique 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC) cross-links from the wild-type yeast strain and 343 unique DSS cross-links from the skNup82-containing complex (Figure S2A). The majority of the identified inter-molecular cross-links mapped to the coiled-coil, C-terminal regions of Nup159 and Nsp1 and the whole Nup82 and Dyn2 proteins. Few inter-molecular cross-links were found to connect to the FG regions of Nup159 or Nsp1 and none connected to the β-propeller domain of Nup159, strongly indicating that those domains are dynamic, peripheral, and not located in proximity to the core of the complex (Gaik et al., 2015).
We computed the structure of the Nup82 complex (Figure 1) through our integrative modeling approach as implemented in the Integrative Modeling Platform (IMP) program (Russel et al., 2012) using the data described above. A detailed assessment of the input data and the resulting model is shown in Table 1 and STAR Methods. In summary, the 463 best-scoring solutions satisfy within stringent tolerances the data used to compute them. The clustering analysis of the best-scoring solutions identified a single dominant cluster of 370 similar structures.
The corresponding localization probability density map represents the probability of any volume element being occupied by a given protein (Figure 1). The 9.0 Å precision of the core structured region is sufficiently high to pinpoint the locations and orientations of the constituent proteins and domains, demonstrating the quality of the input data, including the cross-links and EM 2D class averages (Figure S4; Table 1).
Our structure is validated by seven considerations as follows. First, the EDC and DSS cross-links are highly consistent with each other, despite different chemistries, and there is significant, highly non-random clustering of both EDC and DSS cross-links into equivalent “cliques” (Figure 2A). These represent immediately adjacent regions in the complex, as validated by those cliques that coincide with known crystallographic interface regions, such as Nup159:Dyn2 (PDB: 4DS1) (Romes et al., 2012) and Nup159:Nup82 (PDB: 3PBP) (Yoshida et al., 2011) (Figure 2B); indeed, in our final calculated structure these cliques represent immediately adjacent regions in the complex. Second, those few cross-links in violation of strict distance limits in our structure are nevertheless right next to one of the cliques; they are thus consistent with the structure when locally limited flexibility is taken into account (Figures 2A and S4D). Third, mass tagging of our structure is consistent with the localization of GFP tags on both the Nup82 and Nup159 C termini (Figure 2C).
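The distance check behind the second validation criterion can be illustrated with a small sketch that compares each cross-link against a maximum Cα–Cα distance in a model. The cutoff value, residue numbers, and coordinates below are hypothetical, and this is not the IMP scoring function used in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Residue = Tuple[str, int]           # (protein, residue number); hypothetical IDs
Coord = Tuple[float, float, float]  # Cα position in Å


def violated_crosslinks(
    coords: Dict[Residue, Coord],
    crosslinks: Sequence[Tuple[Residue, Residue]],
    max_distance: float,
) -> List[Tuple[Residue, Residue, float]]:
    """Return the cross-links whose Cα-Cα distance in the model exceeds
    max_distance, together with that distance."""
    violations = []
    for a, b in crosslinks:
        d = math.dist(coords[a], coords[b])
        if d > max_distance:
            violations.append((a, b, d))
    return violations


def satisfaction_rate(
    coords: Dict[Residue, Coord],
    crosslinks: Sequence[Tuple[Residue, Residue]],
    max_distance: float,
) -> float:
    """Fraction of cross-links satisfied by the model."""
    if not crosslinks:
        raise ValueError("no cross-links given")
    n_bad = len(violated_crosslinks(coords, crosslinks, max_distance))
    return 1 - n_bad / len(crosslinks)


# Hypothetical model coordinates and cross-links
coords = {
    ("Nup82", 100): (0.0, 0.0, 0.0),
    ("Nup159", 1400): (20.0, 0.0, 0.0),
    ("Nsp1", 700): (0.0, 40.0, 0.0),
}
links = [
    (("Nup82", 100), ("Nup159", 1400)),
    (("Nup82", 100), ("Nsp1", 700)),
]
CUTOFF = 35.0  # illustrative cutoff in Å (assumption, not from the paper)
print(satisfaction_rate(coords, links, CUTOFF))  # 0.5
print(violated_crosslinks(coords, links, CUTOFF))
```

Listing the violated links, rather than only a satisfaction rate, is what allows one to check whether they sit next to a clique, as in the argument above.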
connecting the Nup82 holo-complex to other parts of the NPC, revealing interaction sites, as described below.
Features of the Nup82 Holo-complex
The C termini of Nup82, Nup159, and Nsp1 share a common domain arrangement, formed by consecutive helical coiled-coil regions of different length, connected by flexible linkers. They assemble (together with Dyn2) to form the Nup82 holo-complex, a roughly “D”-shaped particle, which is formed by the asymmetric assembly of two compositionally identical subunits (termed subunit 1 [s1] and subunit 2 [s2] in Figure 1). Each subunit consists mainly of parallel, three-stranded, hetero-trimeric coiled-coils connected by flexible linkers, consisting of a single copy of the C termini of Nup82, Nup159, and Nsp1. However, the two subunits adopt different configurations, mainly due to the different degree of flexion of the hinges between hetero-trimeric coiled-coil segments (termed CCSs) and the relative position of the Nup82 β-propellers. Subunit 1 mainly forms the “rod,” while subunit 2 forms the “loop” of the holo-complex, with both subunits contributing to the spurs (Figure 1). The CCS1s2 and CCS2s2 trimers constitute the extended loop that can be observed in certain orientations of the particle (Figure 1A, left and center). The denser region of the complex is formed by trimeric parallel CCS domains that form the slightly bent, elongated central rod. Both Nup82 β-propellers are located side by side on top of the rod formed by subunit s1, with Nup82 β-propeller s2 located in trans in a distal position from the CCS1-2s2 loop. The two ends of the central rod are each formed by the C-terminal (spur-1) and the N-terminal (spur-2) bundles of the CCS domains. Two copies of Dyn2 form a dimer that is perpendicular to spur-2 and seems to help lock the two subunits into their asymmetric arrangement. Dyn2 also helps to orient the two Nup159 copies, so that their FG regions emanate in parallel from that end of the complex. Interestingly, the FG regions of Nsp1 also project from spur-2, forming, together with the Nup159 FGs, an intrinsically disordered plume. In agreement with prior work, the hump formed by the Nup82 β-propellers helps to lock down the C termini of Nup159 and form the attachment site for two Nup116 copies (Yoshida et al., 2011) (see below).
Structure of the Nup82-Nup84 Complex Assembly and the Cytoplasmic mRNA Export Platform
To understand how the Nup82 holo-complex is associated with the whole NPC, we isolated it under conditions that preserved its interaction with other Nups (Fernandez-Martinez et al., 2012). CX-MS was used to analyze those proteins proximally associated with each of the Nup82 holo-complex's components (Table S3). Notably, most of the identified cross-links connected the spur-1 region of the Nup82 holo-complex to components of the Nup84 complex hub (Figure 3; Table S3) (Shi et al., 2014); indeed, a direct physical connection between the Nup82 and Nup84 complexes was recently demonstrated in Chaetomium thermophilum (Kellner et al., 2016). Our data, together with our prior map of the Nup84 complex (Shi et al., 2014),
Gle1, and Nup159 N termini; PDB: 3RRM; Montpetit et al., 2011; Gle2/RAE1; PDB: 3MMY; Ren et al., 2010; Nup116 C termini; PDB: 3PBP; Yoshida et al., 2011;
and 3NF5; Sampathkumar et al., 2012). The Gle1 N terminus is represented with a homology model of its predicted coiled-coil region as a red ribbon inside a light
gray density of the approximate expected size for the domain.
See also Table S3.
fully consistent, as shown in Figure 5A. The Nup82 holo-complex overlaps with the localization density of Nup82, facing down into the central channel, and is in close proximity to the Nup85 arm of the Nup84 complex.
Previous attempts to align a single EM envelope for the yeast Nup82 complex to a human cryo-EM NPC map (Bui et al., 2013) led to divergent and ambiguous results (Gaik et al., 2015). However, we were able to unambiguously dock the yeast Nup82-Nup84 complex assembly into the available human cryo-EM maps (Bui et al., 2013; von Appen et al., 2015). When the Nup84 complex was aligned to the corresponding inner copy of its homolog (the Nup107-160 complex), the Nup82 holo-complex aligned with a density projecting only from the cytoplasmic ring, pointing toward the central channel (Figures 5B and 5C). It has been suggested that this protrusion might indeed represent some aspect of the Nup88-Nup214 complex, the vertebrate counterpart to the Nup82 holo-complex (Bui et al., 2013). The yeast and human alignments both support an overall conservation for certain major features of NPC architecture between fungi and metazoa and provide further
DISCUSSION
Further information and requests for reagents may be directed to and will be fulfilled by the Lead Contact author Michael P. Rout
(rout@rockefeller.edu).
Yeast Strains
All Saccharomyces cerevisiae strains used in this study are listed in the Key Resources Table, with the exception of the Nup84 complex truncation mutants, which were described in detail previously (Fernandez-Martinez et al., 2012). The Nup82 complex tagged strains were constructed in a W303 (MATa/α ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100) background. Unless otherwise stated, strains were grown at 30°C in YPD medium (1% yeast extract, 2% Bacto peptone, and 2% glucose). The Saccharomyces kudriavzevii strain was obtained from the American Type Culture Collection (ATCC 2601) and grown under the same conditions as described above for S. cerevisiae.
METHOD DETAILS
Chemical Cross-linking and Mass Spectrometry Analysis of the S. cerevisiae/S. kudriavzevii Nup82 Holo-complex
To define the relative orientation of the two copies of Nup82 present in the Nup82 holo-complex, we expressed an exogenous copy of Nup82 from the yeast Saccharomyces kudriavzevii (hereafter skNup82). We selected S. kudriavzevii because it is a closely related species that forms natural hybrids with S. cerevisiae, some of which are used in winemaking (Borneman et al., 2012), and because the amino acid conservation between the two species is particularly high, ensuring functionality of skNup82 while providing enough sequence variation to identify the species-specific peptides. The S. kudriavzevii strain was obtained from ATCC (ATCC 2601), and genomic DNA was prepared using standard methods. The 3′ UTR and open reading frame of skNup82 were amplified and sequenced to account for potential mutations detected in the sequence available in the public database (GenBank: EHN01740.1). The verified wild-type skNup82 sequence was found to encode a 716-amino-acid protein with 75% identity to the scNup82 primary sequence (alignment available upon request). The upstream 190-nucleotide (promoter) region and the gene sequence were amplified using primers skN82Prom-F (5′-CACCGAAAGTTTATAGATTCAT-3′) and skN82GTW_R2 (5′-GCTGGGCCCCTGGAACAGAACTTCCAGGCCGTTTTTTGGCTGAGTATTAGTG-3′), the latter introducing an in-frame PreScission protease cleavage site at the end of the skNup82 coding sequence. The PCR product was cloned using the pENTR/D-TOPO Cloning Kit (Thermo Fisher Scientific) and then transferred to a modified pAG305GPD-ccdB-EGFP plasmid (Addgene), in which the GPD promoter had been eliminated through SacI-XbaI (New England Biolabs) cleavage and refill. The resulting integrative plasmid, pAG305-skNup82ppx-EGFP, was linearized using ClaI (New England Biolabs) and transformed into a diploid W303 S. cerevisiae strain. Successful integrations were assessed by PCR; correct expression and localization of the skNup82-EGFP construct were confirmed by western blot and fluorescence microscopy, which showed the characteristic nuclear rim staining of a properly localized nucleoporin. Affinity purification of the Nup82 complex using skNup82-EGFP as a handle recovered all the components of the native Nup82 complex, including a substoichiometric amount of scNup82, indicating correct incorporation of the construct into the native Nup82 complex. The isolated, purified complex (see above for details on purification) was analyzed by CX-MS (see above).
Fluorescence Microscopy
Nup82 was genomically tagged with GFP in selected Nup84 complex truncation mutant strains using standard techniques. Cells were grown in YPD medium at 30°C and visualized with a 63× 1.4 NA Plan-Apochromat objective on a Carl Zeiss Axioplan 2 microscope equipped with a cooled Hamamatsu ORCA-ER CCD camera. The system was controlled with Openlab imaging software (PerkinElmer). Images were processed with ImageJ (http://imagej.net/Welcome) and Adobe Photoshop (Adobe).
Clustering
A prerequisite for structure analysis is the clustering of the structures generated by satisfying the input data (Alber et al., 2007b; Shi et al., 2014). We used Cα root-mean-square deviation (RMSD) quality-threshold clustering (Shi et al., 2014). In general, there are three
possible modeling outcomes, based on the number of clusters of models and consistency between the models and information (Shi
et al., 2014). First, if only a single model (or a cluster of similar models) satisfies all restraints and all input information, there is likely
sufficient information for determining the structure (with the precision corresponding to the variability within the cluster). Second, if
two or more different models are consistent with the input restraints, the information is insufficient to define the single state or there
are multiple significantly populated states. If the number of distinct models is small, structural differences between models may sug-
gest additional experiments to narrow down the number of possible solutions. Third, if no model satisfies all input information, the
information or its interpretation in terms of the inferred spatial restraints is incorrect, in which case the representation needs to be
modified to include additional degrees of freedom, and/or sampling needs to be improved.
In the case of the Nup82 complex, the clustering analysis identified a single dominant cluster of 370 similar structures (Figures S4A
and S5B), corresponding to the most favorable outcome of the three possibilities described above. The average RMSD between the
major (370 structures) and minor clusters (93 structures) is relatively low at approximately 20 Å, considering the resolution of the data,
the resolution of the coarse-grained molecular representation, and the variation within each cluster (Shi et al., 2014) (Figure S4A). As a
result, localization of all components is effectively identical between the major and minor clusters, differing only in the orientation of
the Nup82 β-propeller (Figure S5B). Most importantly, our functional interpretation of the structure is completely robust with regard to
the differences between the means of the two clusters.
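RMSD-based clustering of the kind described above can be sketched in plain Python. This is a simplified variant under stated assumptions: models are taken to be already superposed in a common frame (no optimal fitting is computed), and each candidate cluster is every unassigned model within the threshold of a seed, whereas the original quality-threshold algorithm bounds the cluster diameter. The IMP/PMI implementation used in the paper may differ, and the toy models are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Cα RMSD between two models assumed to be superposed already."""
    if len(a) != len(b) or not a:
        raise ValueError("models must have the same, non-zero number of Cα atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def qt_cluster(models: Sequence[Sequence[Coord]], threshold: float) -> List[List[int]]:
    """Simplified quality-threshold clustering: repeatedly take the largest set
    of unassigned models lying within `threshold` RMSD of some seed model."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    remaining = list(range(len(models)))
    clusters: List[List[int]] = []
    while remaining:
        best: List[int] = []
        for seed in remaining:
            candidate = [j for j in remaining
                         if ca_rmsd(models[seed], models[j]) <= threshold]
            if len(candidate) > len(best):
                best = candidate
        clusters.append(best)  # never empty: the seed itself has RMSD 0
        taken = set(best)
        remaining = [j for j in remaining if j not in taken]
    return clusters


def shifted(model: Sequence[Coord], dx: float) -> List[Coord]:
    """Hypothetical helper: translate a toy model along x."""
    return [(x + dx, y, z) for x, y, z in model]


base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
models = [base, shifted(base, 0.5), shifted(base, 1.0), shifted(base, 10.0)]
print(qt_cluster(models, threshold=1.5))  # [[0, 1, 2], [3]]
```

The first (largest) cluster returned plays the role of the dominant cluster; comparing the mean models of the clusters gives the inter-cluster RMSD discussed above.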
Convergence of Sampling
Any structure determination or computational modeling exercise can be described as a structural sampling process, guided by a
scoring function (Alber et al., 2007a). Generally, good-scoring structures need to be found by a sampling, optimization, or enumer-
ation scheme. Unless structures are enumerated, the very first test needs to estimate the thoroughness of structural sampling or opti-
mization (Shi et al., 2014), which is often stochastic (e.g., Monte Carlo and Molecular Dynamics simulations). For stochastic methods,
thoroughness of sampling can be assessed by showing that two independent runs (e.g., using random starting configurations or
different random number generator seeds) do not result in significantly different solutions (Alber et al., 2007a; Fernandez-Martinez
et al., 2012; Shi et al., 2014). Given two or more sets of structures from independent runs, we first cluster structures from all sets
together, followed by assessing whether or not the runs contribute evenly to the population of each cluster, using the p value
from the χ² contingency test for homogeneity of proportions (McDonald, 2014).
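The homogeneity test described above can be sketched with the Python standard library alone. This is a minimal illustration, not the IMP/PMI implementation: the function names are hypothetical, the table layout (one row per independent run, one column per cluster, every cluster non-empty) is an assumption, and the p value is obtained from the closed-form χ² upper tail for integer degrees of freedom.

```python
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)            # Q(1, y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1)) if y > 0 else 0.0
        s += 1.0
    return q


def homogeneity_test(table: List[List[int]]) -> Tuple[float, int, float]:
    """Chi-square test that independent runs populate the clusters evenly.

    table[r][c] = number of good-scoring models from run r that fall in cluster c.
    Returns (chi-square statistic, degrees of freedom, p value).
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, chi2_sf(stat, df)
```

A large p value means the runs populate the clusters in statistically indistinguishable proportions, which is the outcome expected when sampling has converged; a small one flags runs that explored different regions of structure space.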
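For the Nup82 complex, the p value of 0.972 (Table 1), far above a conventional significance threshold such as 0.05, indicated that the independent runs did not differ significantly in how they populated the clusters and, hence, that our Monte Carlo algorithm sampled all top-scoring solutions at a resolution better than the precision of the dominant cluster. The caveat is that passing this sampling test is not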
absolute evidence of thorough sampling; a positive outcome of the test may be misleading if, for example, the landscape contains
only a narrow, and thus difficult to find, pathway to the pronounced minimum corresponding to the correct structure.
Software
The modeling protocol (i.e., stages 2, 3, and 4) was scripted using the Python Modeling Interface (PMI), version c7411c3, a library for modeling macromolecular complexes based on our open-source Integrative Modeling Platform (IMP) package, version 2.5 (https://integrativemodeling.org) (Russel et al., 2012).
To display the CX-MS data we used the software CX-Circos (http://cx-circos.net).
Data Resources
The chemical cross-linking with mass spectrometric readout data used in this study was deposited in the Chorus database (https://chorusproject.org/pages/index.html).
Files containing the input data, modeling scripts, and output structures are available online (https://salilab.org/nup82; https://github.com/salilab/nup82).
Correspondence
ramak@mrc-lmb.cam.ac.uk (V.R.),
rhegde@mrc-lmb.cam.ac.uk (R.S.H.)
In Brief
The individual decoding factor·GTPase
complexes involved in protein synthesis
differentially remodel local protein and
RNA elements on ribosomes to ensure
translation fidelity.
Cell 167, 1229–1240, November 17, 2016 © 2016 MRC Laboratory of Molecular Biology. Published by Elsevier Inc. 1229
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
2012; des Georges et al., 2014; Preis et al., 2014). However, the molecular interactions that accompany initial selection, communicate information from the decoding center to each GTPase, and mediate decoding factor accommodation in each case remain incompletely understood. Using high-resolution electron cryo-microscopy (cryo-EM), we analyze the molecular basis of specificity at the decoding center for each mammalian decoding factor·translational GTPase complex, compare potential GTPase activation mechanisms, and describe the conformational changes governing the accommodation of decoding factors. These results provide new insights into how these related complexes are able to make discriminatory interactions to recognize the appropriate ribosome-mRNA substrates to maintain overall translational fidelity.

RESULTS AND DISCUSSION

Cryo-EM Structures of Eukaryotic Translational Decoding Complexes
Translational decoding complexes (here defined as the elongation complex, 80S·aa-tRNA·eEF1A; the termination complex, 80S·eRF1·eRF3; and the rescue complex, 80S·Pelota·Hbs1l) are transient states that either rapidly dissociate or progress to an accommodated state upon codon recognition. We therefore developed methods to trap or assemble these complexes (Figure S1 and STAR Methods). To prepare the elongation complex, ongoing in vitro translation reactions in rabbit reticulocyte lysate of an N-terminally tagged protein were inhibited by the elongation inhibitor didemnin B (Rinehart et al., 1981), and the ribosome-nascent chain complexes (RNCs) were affinity purified via the partially synthesized nascent polypeptide. To generate the termination complex, we programmed and affinity purified RNCs with a UGA stop codon in the A site that were reconstituted with eRF1, eRF3, and the nonhydrolyzable GTP analog GMPPCP. Rescue complexes were prepared similarly to produce RNCs containing an empty A site (generated with a truncated mRNA), or an A site occupied by either a stop codon or an AAA codon within a polyadenylated (poly(A)) tail, that were reconstituted with Pelota, Hbs1l, and GMPPCP. The structure of each complex was solved by cryo-EM to between 3.3 and 3.8 Å resolution (Figure S2; Tables S1 and S2).
Each complex represents an unrotated ribosome containing canonical P- and E-site tRNAs (Figures 1, 2, 3, and S2). The GTPase (G) domain and domains 2 and 3 of each GTPase (Figure S3A) were well resolved, while the highly divergent N-terminal extensions of Hbs1l and eRF3 were not visualized, presumably due to their flexibility. Each decoding factor (Figure S3B) assumes a pre-accommodated conformation: the tRNA acceptor arm or the homologous M-C domains of eRF1 or Pelota interact with the GTPase, and the tRNA anticodon stem loop or the structurally distinct N domain of eRF1 or Pelota occupies the decoding center (Figures 1, 2, and 3).

Decoding Factor Interactions at the Ribosomal Decoding Center
Sense Codon Decoding in Eukaryotes
As the ribosomes in the elongation complex (Figure 1A) are stalled at different codons by didemnin B, the densities for the mRNA, aa-tRNAs, and the nascent chain are averages of the species captured. Despite this, the density at the decoding center is well defined, revealing that decoding in eukaryotes shares many features with that in bacteria (Ogle et al., 2001). In particular, the decoding nucleotides A1824 and A1825 (A1492 and A1493 in bacteria) are flipped out of helix 44 (h44) of 18S rRNA. Together with G626 (G530 in bacteria) in the anti-conformation, these bases inspect the geometry of the minor groove of the codon-anticodon helix (Figure 1B) and help stabilize the A-site tRNA via hydrogen bonding. These interactions monitor Watson-Crick base-pairing at the first two codon positions (+1 and +2) while providing tolerance at the +3 wobble position.
As in bacteria (Ogle et al., 2001), the ribosomal protein uS12 projects a loop into the decoding center (Figures 1C and S4A). Gln61 (Lys44 in E. coli) at the apex of the loop indirectly hydrogen bonds with A1824 in its flipped-out position and with the +2 nucleotide. Pro62 adopts a conserved cis-peptide conformation (Noeske et al., 2015) that allows its backbone carbonyl to form a water- or metal-mediated hydrogen bond with the +3 nucleotide (Figures 1C and S4A). Additional hydrogen bonds may be introduced by environmental condition-dependent hydroxylation of Pro62 (Loenarz et al., 2014; Noeske et al., 2015). Notably, these hydrogen bonds are only with the mRNA backbone, allowing for wobble base-pairing at the +3 position.
Relative to bacterial decoding, the eukaryotic-specific ribosomal protein eS30 may enhance the stability of a correct codon-anticodon interaction. In the presence of a cognate aa-tRNA, the N terminus of eS30 becomes ordered, allowing a conserved histidine (His76) to reach into a groove between the phosphate backbone of the anticodon +1 position and the two flipped-out decoding bases to form potentially stabilizing contacts (Figures 1B and 1D). Because this groove depends on the flipped nucleotides that accompany canonical codon-anticodon base-pairing, this interaction may preferentially stabilize cognate tRNAs to enhance discrimination.
The A- and P-site tRNAs also appear to stabilize 15 residues at the C terminus of uS19 that interact with the phosphate backbone of the P-site tRNA and may make electrostatic interactions with the A-site tRNA (Figure 1E). Similar tRNA-dependent transitions in ribosomal proteins are observed in bacteria, with the C terminus of uS13 instead of uS19 threading between the anticodon stem loops of the A- and P-site tRNAs (Jenner et al., 2010). Deletion of the uS13 C terminus in bacteria is associated with a reduced rate of translation and less efficient tRNA selection (Faxén et al., 1994). Thus, the contacts formed by uS19, and especially by eS30, which is dependent on a cognate aa-tRNA, could increase the stability of aa-tRNAs during initial selection and accommodation, thereby reducing erroneous ejection of cognate aa-tRNAs during kinetic proofreading.
Stop Codon Decoding by eRF1
Unlike translation elongation, the factors and mechanisms mediating translation termination are not conserved between prokaryotes and eukaryotes (Dever and Green, 2012). This includes the mechanism of stop codon recognition, as well as the role of termination-associated GTPases. Recent cryo-EM structures have revealed how accommodated eRF1 interacts with stop codons (Brown et al., 2015b; Matheisl et al., 2015). However,
the mechanism of stop codon recognition during the initial eRF1·eRF3 interaction with 80S ribosomes was unclear, as earlier structures had only visualized this complex at moderate resolution (Taylor et al., 2012; des Georges et al., 2014; Preis et al., 2014; Muhs et al., 2015). To address this problem, programmed RNCs with a UGA stop codon in the A site were used to isolate three intermediate states along the canonical termination pathway: (1) delivery of eRF1 to the stop codon by eRF3; (2) accommodated eRF1; and (3) accommodated eRF1 after ABCE1 recruitment (Figures 2A, S1, S2, and S4B–S4D) (Brown et al., 2015b).
The structures show that the stop codon maintains the same compacted geometry and interactions with the eRF1 N domain (Brown et al., 2015b; Matheisl et al., 2015) throughout the termination pathway (Figures 2B and S4B–S4D), despite large rearrangements of the M and C domains of eRF1 (see below). In this configuration, the +2 and +3 stop codon bases stack with a flipped-out A1825, and the base following the stop codon (+4) stacks with G626 in the anti-conformation (Figures 2B, 2C, and S4E) (Brown et al., 2015b; Matheisl et al., 2015). Improved density for the mRNA further reveals that the +5 base can stack with nucleotide C1698 of 18S rRNA, which protrudes into the mRNA channel (Figures 2B and 2C). The increased stability imparted by this additional stacking interaction explains why a +5 purine can increase the effectiveness of a “weak” stop codon with a +4 pyrimidine (McCaughan et al., 1995).
[Figure 1 panel labels: E; uS12; uS13 and uS19 (80S, 70S); eS30 His76; P-site tRNA; A/T aa-tRNA; mRNA]
[Figure 2 panel labels: C; eRF1; 40S; 18S rRNA; mRNA (NC-stop, +1 to +5); G626; A1825; C1698; Mg]
Recognition of Stalled Translation Complexes by Pelota
Pelota has been reported to bind stalled ribosomes with an empty A site as well as those with an mRNA-occupied A site without sequence preference (Shoemaker et al., 2010). To determine the basis for this sequence-independent engagement by the rescue complex, we utilized our reconstitution method to assemble 80S·Pelota·Hbs1l complexes with an A site that lacked mRNA (assembled on a truncated mRNA), or that contained either the UGA stop codon or the AAA sense codon (due to translation stalling within a poly(A) tail) (Figures 3A, S1, and S2). The complex assembled on a truncated mRNA shows that the β3′-β4′ loop of Pelota extends from the N domain to protrude into the empty mRNA channel, following the path normally taken by mRNA (Figures 3B and S4F). A similar path is taken by the shorter β3′-β4′ loop of yeast Dom34 as observed at moderate resolution (Becker et al., 2011). However, the higher-resolution information in our map allows the details of this interaction to be analyzed. The highly conserved residue Arg45 at the top of the β3′-β4′ loop appears to play an anchoring role in the complex. Arg45 can hydrogen bond with His100, which is part of a conserved (Y/F/H)HT sequence on β6′ that interacts with 18S rRNA (Figure 3C). Arg45 is also part of a wider hydrogen-bonding network that includes the decoding nucleotide G626 in the anti-conformation (Figure 3C). Residues 60–61 prevent the decoding nucleotide A1824 from flipping out of h44, while A1825 is flipped out and interacts with Arg62. Together, these and other potential interactions with uS3 and uS5 probably stabilize the otherwise flexible and poorly conserved loop (Kobayashi et al., 2010). Thus, the β3′-β4′ loop is well positioned to sense A site occupancy.
Surprisingly, in both reconstructions containing mRNA sequence downstream of the P site, the conformation of the β3′-β4′ loop in the mRNA channel is unchanged, and we observe little to no density for the mRNA in the A site, while the mRNA upstream of the A site is also noticeably more disordered (Figures 3D and 3E). The high occupancy of Pelota·Hbs1l in these datasets (26%), the purity of our biochemically isolated complexes, and the absence of evidence for endonucleolytic mRNA cleavage in our samples suggest that Pelota·Hbs1l is not recognizing a minor population of ribosomes that do not contain mRNA in the A site. Instead, we favor a mechanism by which the Pelota β3′-β4′ loop is able to bind a variety of mRNA substrates and, in doing so, destabilizes the mRNA within the channel. In support of this, the moderate-resolution structure of Dom34·Hbs1 bound to ribosomes stalled by mRNA secondary structure (Becker et al., 2011) also noted poor density within the mRNA channel.
Distinct Molecular Interactions Govern Decoding Factor Selection
Comparisons of the overall architectures (Figures 1A, 2A, and 3A) and the decoding centers of our structures (Figures 1B, 2B, and 3B) suggest that the mammalian ribosome does not display translational status-specific cues to favor engagement by a particular decoding factor·translational GTPase complex. Instead, successful recognition relies on decoding factors exploiting the inherent plasticity of the mRNA and the ribosomal decoding center, with sampling preference being biased by the overall abundance and local concentrations of each complex.
Highly specific interactions form between decoding factors and mRNA sequences during elongation and termination. In particular, the ribosomal protein eS30 may contribute to increasing the stringency of sense codon decoding in eukaryotes relative to bacteria. By contrast, the β3′-β4′ loop of Pelota invariably inserts into the mRNA channel and follows the path normally taken by mRNA, regardless of the mRNA substrate (Figure 3). Having to compete with mRNA for the channel may mean that Pelota·Hbs1l undergoes more futile attempts to engage the ribosome than other decoding complexes. This barrier and the relatively low abundance of Pelota and Hbs1l (Geiger et al., 2012) probably render Pelota·Hbs1l a poor competitor for elongating or terminating ribosomes. Only during protracted periods of stalling, or with a truncated mRNA, would the likelihood of the β3′-β4′ loop engaging the ribosomal A site increase.
Once inserted, the loop maintains the mRNA in a less stable state that may facilitate subsequent endonucleolytic cleavage and/or ribosome splitting. Although endogenous substrates of Pelota·Hbs1l remain poorly characterized, this model is
loop (Figure 6C). In our complexes, this insertion forms an amphipathic α helix (α2; Figures 6A–6C) connected by a short loop to a helical turn before adopting the same conformation observed in the EF-Tu·GMPPCP complex (Voorhees et al., 2010) (Figure 6A). In the elongation complex, the α2 helix lies across the surface of eEF1A to bury the hydrophobic face, while the polar residues on the other side interact with the ribosome. At the top of the α2 helix, Arg37 stacks with nucleotide A464 from h14 of SSU rRNA. The C-terminal part of the α2 helix and the following loop (residues 48–53) make multiple interactions with uL14: Glu48 of eEF1A potentially forms a salt bridge with Arg131 of uL14, and contacts between the eEF1A loop and Arg6 and Gly7 of uL14 appear to stabilize the usually disordered N terminus of uL14. Additional contacts occur between Ser53 and nucleotide G4600 of the SRL (Figure 6B). A similar network of interactions is seen for the α2 helix of eRF3 and Hbs1l. Together, these eukaryotic-specific interactions may help to stabilize the switch 1 region, perhaps explaining why it is not disordered despite loss of the γ-phosphate in the didemnin B-stalled elongation complex.
An effect of the ordered switch 1 region in the elongation complex is that the eEF1A catalytic residues adopt the same conformation as seen in the ‘‘activated’’ state of EF-Tu trapped on the bacterial ribosome by GMPPCP (Figure S6B) (Voorhees et al., 2010). In this conformation, the eEF1A catalytic histidine His95 on the switch 2 loop is coordinated by the phosphate backbone of nucleotide A4607 of the SRL (His84 and A2662, respectively, in E. coli), and the hydrophobic gate formed by residues Val16 and Ile71 (Val20 and Ile61 in EF-Tu) appears to be in an open conformation. Similar configurations were observed in the termination and rescue complexes, which were reconstituted with GMPPCP (Figure S6B). Notably, the G domain of Hbs1l is further from the SRL, and the catalytic histidine (His348) is less strongly coordinated. This could increase the length of time Pelota·Hbs1l needs to be associated with the ribosome before hydrolysis occurs, thereby increasing the stringency for a productive encounter.
Specialization of Translational GTPases Regulates Initial Selection and Activation
Although the three translational GTPase partners share considerable structural similarity and superpose with root-mean-square deviation (RMSD) values between 1.4 and 1.9 Å, they cannot complement each other (Wallrapp et al., 1998) and possess divergent interfaces specialized to interact with their respective decoding factor (Figures S5B, S5F, and S5G). Similar sub-functionalization has not occurred in archaea, where aEF1α plays an omnipotent role to deliver aa-tRNA, aRF1, and aPelota to ribosomes (Saito et al., 2010).
Our structures suggest several advantages of having a dedicated translational GTPase for each decoding factor in maintaining overall translational fidelity. First, improved affinity between decoding factors and individual GTPases (Figures S5A–S5G), combined with distinct temporal and spatial distribution patterns, probably contributes to higher selectivity during decoding. Second, non-redundant pairing may allow for distinct mechanisms for communicating decoding events to the GTPase (e.g., Figure 4), possibly via direct interactions between the decoding factor and motifs needed for GTP hydrolysis (Figures S5J and S5L). Finally, specialized complexes may have different dissociation constants and basal activation barriers to GTP hydrolysis that could alter the general competitiveness of each decoding complex (Figures 4 and S6B).
Conformational Changes Coordinate Decoding Factor Accommodation
After GTP hydrolysis and GTPase dissociation, the decoding factor needs to accommodate fully into the PTC without dissociating from the ribosome. Our structural snapshots of the translation termination pathway reveal the conformational effects of accommodation on eRF1 and the ribosome (Figure 7A). After eRF3 dissociates, the M and C domains of eRF1 undergo large interdependent rotations relative to the static N domain. The pre-accommodated and accommodated M domains are related
in the rates of GTPase activation and accommodation of eukaryotic decoding complexes. Together, these distinctions likely translate into decisive differences in the competitive advantage of each decoding complex for different ribosome-mRNA substrates.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:
• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING

REFERENCES
Alksne, L.E., Anthony, R.A., Liebman, S.W., and Warner, J.R. (1993). An accuracy center in the ribosome conserved over 2 billion years. Proc. Natl. Acad. Sci. USA 90, 9538–9541.
Amunts, A., Brown, A., Bai, X.-C., Llácer, J.L., Hussain, T., Emsley, P., Long, F., Murshudov, G., Scheres, S.H.W., and Ramakrishnan, V. (2014). Structure of the yeast mitochondrial large ribosomal subunit. Science 343, 1485–1489.
Andersen, G.R., Pedersen, L., Valente, L., Chatterjee, I., Kinzy, T.G., Kjeldgaard, M., and Nyborg, J. (2000). Structural basis for nucleotide exchange and competition with tRNA in the yeast elongation factor complex eEF1A:eEF1Balpha. Mol. Cell 6, 1261–1266.
Andersson, D.I., van Verseveld, H.W., Stouthamer, A.H., and Kurland, C.G. (1986). Suboptimal growth with hyper-accurate ribosomes. Arch. Microbiol. 144, 96–101.
Bischoff, L., Berninghausen, O., and Beckmann, R. (2014). Molecular basis for the ribosome functioning as an L-tryptophan sensor. Cell Rep. 9, 469–475.
Blanchard, S.C., Gonzalez, R.L., Kim, H.D., Chu, S., and Puglisi, J.D. (2004). tRNA selection and kinetic proofreading in translation. Nat. Struct. Mol. Biol. 11, 1008–1014.
Brown, A., Long, F., Nicholls, R.A., Toots, J., Emsley, P., and Murshudov, G. (2015a). Tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions. Acta Crystallogr. D Biol. Crystallogr. 71, 136–153.
Brown, A., Shao, S., Murray, J., Hegde, R.S., and Ramakrishnan, V. (2015b). Structural basis for stop codon recognition in eukaryotes. Nature 524, 493–496.
Bruno, I.J., Cole, J.C., Kessler, M., Luo, J., Motherwell, W.D., Purkis, L.H., Smith, B.R., Taylor, R., Cooper, R.I., Harris, S.E., and Orpen, A.G. (2004). Retrieval of crystallographically-derived molecular geometry information. J. Chem. Inf. Comput. Sci. 44, 2133–2144.
Carelli, J.D., Sethofer, S.G., Smith, G.A., Miller, H.R., Simard, J.L., Merrick, W.C., Jain, R.K., Ross, N.T., and Taunton, J. (2015). Ternatin and improved synthetic variants kill cancer cells by targeting the elongation factor-1A ternary complex. eLife 4, e10222.
Chauvin, C., Salhi, S., Le Goff, C., Viranaicken, W., Diop, D., and Jean-Jean, O. (2005). Involvement of human release factors eRF3a and eRF3b in translation termination and regulation of the termination complex formation. Mol. Cell. Biol. 25, 5801–5811.
Chen, V.B., Arendall, W.B., 3rd, Headd, J.J., Keedy, D.A., Immormino, R.M., Kapral, G.J., Murray, L.W., Richardson, J.S., and Richardson, D.C. (2010). MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 66, 12–21.
Chen, S., McMullan, G., Faruqi, A.R., Murshudov, G.N., Short, J.M., Scheres, S.H.W., and Henderson, R. (2013). High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy. Ultramicroscopy 135, 24–35.
Cheng, Z., Saito, K., Pisarev, A.V., Wada, M., Pisareva, V.P., Pestova, T.V., Gajda, M., Round, A., Kong, C., Lim, M., et al. (2009). Structural insights into eRF3 and stop codon recognition by eRF1. Genes Dev. 23, 1106–1118.
Cochella, L., and Green, R. (2005). An active role for tRNA in decoding beyond codon:anticodon pairing. Science 308, 1178–1180.
Crepin, T., Shalak, V.F., Yaremchuk, A.D., Vlasenko, D.O., McCarthy, A., Negrutskii, B.S., Tukalo, M.A., and El’skaya, A.V. (2014). Mammalian translation elongation factor eEF1A2: X-ray structure and new features of GDP/GTP exchange mechanism in higher eukaryotes. Nucleic Acids Res. 42, 12939–12948.
Crews, C.M., Collins, J.L., Lane, W.S., Snapper, M.L., and Schreiber, S.L. (1994). GTP-dependent binding of the antiproliferative agent didemnin to elongation factor 1 alpha. J. Biol. Chem. 269, 15411–15414.
Demeshkina, N., Jenner, L., Westhof, E., Yusupov, M., and Yusupova, G. (2012). A new understanding of the decoding principle on the ribosome. Nature 484, 256–259.
des Georges, A., Hashem, Y., Unbehaun, A., Grassucci, R.A., Taylor, D., Hellen, C.U.T., Pestova, T.V., and Frank, J. (2014). Structure of the mammalian ribosomal pre-termination complex associated with eRF1·eRF3·GDPNP. Nucleic Acids Res. 42, 3409–3418.
Guydosh, N.R., and Green, R. (2014). Dom34 rescues ribosomes in 3′ untranslated regions. Cell 156, 950–962.
Hossain, M.B., van der Helm, D., Antel, J., Sheldrick, G.M., Sanduja, S.K., and Weinheimer, A.J. (1988). Crystal and molecular structure of didemnin B, an antiviral and cytotoxic depsipeptide. Proc. Natl. Acad. Sci. USA 85, 4118–4122.
Jenner, L.B., Demeshkina, N., Yusupova, G., and Yusupov, M. (2010). Structural aspects of messenger RNA reading frame maintenance by the ribosome. Nat. Struct. Mol. Biol. 17, 555–560.
Klink, B.U., Goody, R.S., and Scheidig, A.J. (2006). A newly designed microspectrofluorometer for kinetic studies on protein crystals in combination with x-ray diffraction. Biophys. J. 91, 981–992.
Kobayashi, K., Kikuno, I., Kuroha, K., Saito, K., Ito, K., Ishitani, R., Inada, T., and Nureki, O. (2010). Structural basis for mRNA surveillance by archaeal Pelota and GTP-bound EF1α complex. Proc. Natl. Acad. Sci. USA 107, 17575–17579.
Kobayashi, K., Saito, K., Ishitani, R., Ito, K., and Nureki, O. (2012). Structural basis for translation termination by archaeal RF1 and GTP-bound EF1α complex. Nucleic Acids Res. 40, 9319–9328.
Kramer, E.B., Vallabhaneni, H., Mayer, L.M., and Farabaugh, P.J. (2010). A comprehensive analysis of translational missense errors in the yeast Saccharomyces cerevisiae. RNA 16, 1797–1808.
Krastel, P., Roggo, S., Schirle, M., Ross, N.T., Perruccio, F., Aspesi, P., Jr., Aust, T., Buntin, K., Estoppey, D., Liechty, B., et al. (2015). Nannocystin A: An Elongation Factor 1 inhibitor from Myxobacteria with differential anti-cancer properties. Angew. Chem. Int. Ed. Engl. 54, 10149–10154.
Kucukelbir, A., Sigworth, F.J., and Tagare, H.D. (2014). Quantifying the local resolution of cryo-EM density maps. Nat. Methods 11, 63–65.
Lee, H.H., Kim, Y.-S., Kim, K.H., Heo, I., Kim, S.K., Kim, O., Kim, H.K., Yoon, J.Y., Kim, H.S., Kim, D.J., et al. (2007). Structural and functional insights into Dom34, a key component of no-go mRNA decay. Mol. Cell 27, 938–950.
Li, L.H., Timmins, L.G., Wallace, T.L., Krueger, W.C., Prairie, M.D., and Im, W.B. (1984). Mechanism of action of didemnin B, a depsipeptide from the sea. Cancer Lett. 23, 279–288.
Li, X., Mooney, P., Zheng, S., Booth, C.R., Braunfeld, M.B., Gubbens, S., Agard, D.A., and Cheng, Y. (2013). Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM. Nat. Methods 10, 584–590.
Loenarz, C., Sekirnik, R., Thalhammer, A., Ge, W., Spivakovsky, E., Mackeen, M.M., McDonough, M.A., Cockman, M.E., Kessler, B.M., Ratcliffe, P.J., et al. (2014). Hydroxylation of the eukaryotic ribosomal decoding center affects translational accuracy. Proc. Natl. Acad. Sci. USA 111, 4019–4024.
Matheisl, S., Berninghausen, O., Becker, T., and Beckmann, R. (2015). Structure of a human translation termination complex. Nucleic Acids Res. 43, 8615–8626.
Requests for reagents may be directed to Lead Contact Ramanujan S. Hegde (rhegde@mrc-lmb.cam.ac.uk).
Cell Lines
HEK293T cells used for protein expression were maintained in DMEM (high glucose, GlutaMAX, pyruvate) with 10% fetal bovine
serum.
METHOD DETAILS
Constructs
An SP64-based plasmid encoding 3X Flag-tagged Sec61β containing the autonomously folding villin headpiece (VHP) domain was
used to generate transcripts truncated after the Val68 codon of Sec61β (Shao et al., 2013). For termination complexes, the same
Sample Preparations
Elongation complex
A transcript encoding 3X Flag-tagged KRas was translated in vitro. Didemnin B was added after 7 min to a final concentration of 50 µM to stall ribosome-nascent chain complexes (RNCs) at the stage of tRNA delivery by eEF1A, and the reaction was allowed to proceed to 25 min. The 4 mL translation reaction was directly incubated with 100 µL (packed volume) of anti-Flag M2 beads (Sigma) for 1 hr at 4°C with gentle mixing. The beads were washed sequentially with 6 mL 50 mM HEPES, pH 7.4, 100 mM KOAc, 5 mM Mg(OAc)2, 0.1% Triton X-100, 1 mM DTT; 6 mL 50 mM HEPES, pH 7.4, 250 mM KOAc, 5 mM Mg(OAc)2, 0.5% Triton X-100, 1 mM DTT; and 6 mL RNC buffer (50 mM HEPES, pH 7.4, 100 mM KOAc, 5 mM Mg(OAc)2, 1 mM DTT). Two sequential elutions were carried out with 100 µL 0.1 mg/mL 3X Flag peptide (Sigma) in RNC buffer at room temperature for 25 min. The elutions were combined and centrifuged at 100,000 rpm at 4°C for 40 min in a TLA120.2 rotor (Beckman Coulter) before resuspension of the ribosomal pellet in RNC buffer containing 5 µM didemnin B. The resuspended RNCs were adjusted to 120 nM and frozen directly on grids for cryo-EM analysis.
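Centrifugation steps here are specified in rpm for particular rotors; converting to relative centrifugal force (RCF, in multiples of g) makes them transferable to other rotors. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r(cm) × rpm². The function names are hypothetical, and the rotor radius in the usage line is an assumption to be checked against the manufacturer's rotor specifications.

```python
import math


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at radius_cm for a speed in rpm.

    Standard relation: RCF = 1.118e-5 * r[cm] * rpm**2.
    """
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Inverse of rcf_from_rpm: rotor speed needed to reach a given RCF."""
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


# Assumed maximum radius of about 3.89 cm for a TLA120.2 rotor (verify before use):
print(round(rcf_from_rpm(100_000, 3.89)))  # prints 434902, i.e. about 4.35e5 x g at r_max
```

Because RCF scales linearly with radius, the value at the average radius of the tube is lower than at r_max; protocols that quote g-force should state which radius was used.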
Termination complexes
3X Flag-tagged Sec61β containing the autonomously folding villin headpiece domain with a UGA stop codon was translated in vitro with 0.5 µM eRF1(AAQ) to trap termination complexes (Brown et al., 2015b). After 25 min, translation reactions were adjusted to 750 mM KOAc, 15 mM Mg(OAc)2 and centrifuged through a 0.5 M sucrose cushion containing 50 mM HEPES, pH 7.4, 750 mM KOAc, 15 mM Mg(OAc)2 at 100,000 rpm for 1 hr at 4°C in a TLA100.3 rotor (Beckman Coulter). The ribosome pellets from 4 mL translation reactions were resuspended in RNC buffer and incubated with 100 µL (packed volume) of anti-Flag M2 beads (Sigma) for 1–1.5 hr at 4°C with gentle mixing. The beads were washed sequentially with 6 mL 50 mM HEPES, pH 7.4, 100 mM KOAc, 5 mM Mg(OAc)2, 0.1% Triton X-100, 1 mM DTT; 6 mL 50 mM HEPES, pH 7.4, 250 mM KOAc, 5 mM Mg(OAc)2, 0.5% Triton X-100, 1 mM DTT; and 6 mL RNC buffer.
Miscellaneous biochemistry
SDS-PAGE was performed with 10% or 12% Tris-tricine polyacrylamide gels run at 100 V for 85–90 min. For autoradiography and direct visualization of protein bands, gels were fixed and stained with Coomassie R250, then either destained and imaged directly or dried and exposed to MR film (Kodak Carestream BioMax). For immunoblotting, proteins were transferred to 0.2 µm nitrocellulose membrane (Bio-Rad) in a wet transfer system at 100 V for 50 min. Blots were blocked and incubated with primary and secondary antibodies in 5% milk in PBS + 0.1% Tween. Antibodies were used at the following dilutions: 1:4000 anti-Hbs1l, 1:4000 anti-ABCE1, 1:1000 anti-eRF1, 1:1000 anti-eEF1A, 1:100 anti-uL6, and 1:100 anti-uS9. Secondary antibodies were used at 1:2500 or 1:5000.
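As a quick check on the working-dilution arithmetic, a hypothetical helper can convert the '1:N' antibody dilutions listed above into stock volumes. The final volume of blocking buffer used in the example is an assumption; the text does not state it.

```python
def stock_volume_ul(dilution: str, final_volume_ml: float) -> float:
    """Volume of antibody stock (µL) needed for a '1:N' working dilution.

    Reads '1:N' as 1 part stock per N parts final volume. The alternative
    reading (1 part stock + N parts diluent) differs by about 1/N:
    roughly 1% at 1:100 and 0.025% at 1:4000.
    """
    parts_stock, parts_total = (float(x) for x in dilution.split(":"))
    return final_volume_ml * 1000.0 * parts_stock / parts_total


# Hypothetical 10 mL of 5% milk in PBS + 0.1% Tween per blot:
for ab, dil in [("anti-Hbs1l", "1:4000"), ("anti-eRF1", "1:1000"), ("anti-uL6", "1:100")]:
    print(ab, stock_volume_ul(dil, 10.0), "µL")
```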
Functional assays were conducted with 35S-methionine-labeled RNCs isolated under high salt conditions and affinity purified via
the Flag tag exactly as described for cryo-EM grid preparation. The radiolabeled RNCs were then incubated with the recombinant
proteins, 1 mM puromycin, or 0.5 mM GTP or GMPPCP at 32°C for 15 min before analysis by SDS-PAGE and autoradiography.
To sequence 28S rRNA, ribosomes were isolated from crude RRL from two rabbits under high salt conditions, and the RNA was extracted using the RNeasy system (QIAGEN). Electrophoresis on 5% TBE-acrylamide gels and toluidine blue staining verified high recovery of 28S and 18S rRNA bands. The RNA sample was reverse transcribed with ArrayScript reverse transcriptase (Thermo Fisher)
according to the manufacturer’s instructions and used for PCR reactions to amplify and sequence portions of the 28S sequence with
Sanger sequencing. This revealed some rabbit-to-rabbit variability, and allowed for certain portions (but not all) of the 28S rRNA
sequence to be determined with high confidence based on alignments with highly conserved regions. These regions were incorpo-
rated into the final model (see below).
Image Processing
Details for the processing of each complex are presented in Table S1. Movie frames were aligned using whole-image motion correction (Li et al., 2013) to reduce beam-induced blurring of the images. Micrographs that displayed evidence of astigmatism, charging, contamination, or poor contrast were excluded. Parameters of the contrast transfer function for each motion-corrected micrograph
were obtained using Gctf (Zhang, 2016). Ribosome particles were selected from the images using the interactive semi-automatic
swarm tool in the e2boxer.py program of EMAN2 (Tang et al., 2007) or with semi-automated particle picking implemented in RELION
1.4 (Scheres, 2015). Reference-free two-dimensional class averaging was used to discard non-ribosomal particles, with those picked
using RELION subjected to an additional sorting step (Scheres, 2015).
Particles retained after two-dimensional classification underwent an initial three-dimensional refinement using a cryo-EM
reconstruction of a rabbit ribosome (EMDB 3039), low-pass filtered to 30 Å, as an initial model. After refinement, statistical
particle-based movie correction was performed in RELION 1.4 (Scheres, 2015), including a resolution- and dose-dependent model for
radiation damage in which each frame is B-factor weighted as estimated from single-frame reconstructions (Scheres, 2014).
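The frame weighting described above can be sketched in a few lines. This minimal sketch assumes RELION-style relative per-frame B-factors B_f (more negative for more damaged frames) and intercepts C_f, with weights w_f(k) = C_f·exp(B_f·k²/4) normalized to sum to 1 at each spatial frequency k; the B-factor values in the example are hypothetical, and RELION's own implementation differs in detail.

```python
import math
from typing import List, Sequence


def frame_weights(b_factors: Sequence[float], scales: Sequence[float], k: float) -> List[float]:
    """Per-frame weights at spatial frequency k (in 1/Å).

    w_f(k) = C_f * exp(B_f * k**2 / 4), normalized over frames so that the
    weights sum to 1. Frames with more negative B-factors contribute less
    at high spatial frequency.
    """
    if len(b_factors) != len(scales) or not b_factors:
        raise ValueError("need one B-factor and one scale per frame")
    raw = [c * math.exp(b * k * k / 4.0) for b, c in zip(b_factors, scales)]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    # Hypothetical relative B-factors (Å^2) for a four-frame movie.
    b = [0.0, -25.0, -50.0, -100.0]
    c = [1.0, 1.0, 1.0, 1.0]
    for resolution in (10.0, 3.0):
        w = frame_weights(b, c, 1.0 / resolution)
        print(f"{resolution:>4} Å:", " ".join(f"{x:.3f}" for x in w))
```

With these assumed values, all frames contribute comparably at 10 Å, whereas at 3 Å the later, more damaged frames are strongly down-weighted.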
The resulting ‘shiny’ particles were then subjected to three-dimensional classification to separate different compositions and
conformations of the ribosome complexes and to isolate particles with high occupancy of the desired factors. This step was omitted
for the 80S•aa-tRNA•eEF1A complex. Particles retained after three-dimensional classification were subjected to focused classification
with signal subtraction (FCwSS) (Bai et al., 2015) to further isolate particles containing the desired factor. After FCwSS, an additional
round of 3D classification and refinement was used to obtain the final maps.
Reported resolutions are based on the Fourier shell correlation (FSC) 0.143 criterion (Rosenthal and Henderson, 2003).
High-resolution noise substitution was used to correct for the effects of a soft mask on FSC curves (Chen et al., 2013). Before
visualization, density maps were corrected for the modulation transfer function of the Falcon II detector and then sharpened by
applying a negative B-factor that was estimated using automated procedures (Rosenthal and Henderson, 2003). Local resolution was
quantified using ResMap (Kucukelbir et al., 2014).
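The FSC 0.143 criterion can be illustrated with a minimal NumPy sketch that correlates two half-maps shell by shell in Fourier space and interpolates where the curve first falls below 0.143. It assumes cubic, unmasked half-maps with one shell per Fourier pixel and an FSC that starts above threshold at low frequency. It does not implement the soft-mask correction by high-resolution noise substitution mentioned above, and the function names are hypothetical.

```python
import numpy as np


def fsc_curve(half1: np.ndarray, half2: np.ndarray, voxel_size: float):
    """Return (spatial frequency in 1/Å, FSC) for two cubic half-maps."""
    if half1.shape != half2.shape or len(set(half1.shape)) != 1:
        raise ValueError("half-maps must be cubic and identically sized")
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
    # Integer radius (shell index) of every Fourier voxel.
    idx = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(idx, idx, idx, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    n_shells = n // 2  # shells up to (but not including) Nyquist
    inside = shell < n_shells
    s = shell[inside]
    # Sum cross-power and per-map power within each shell.
    cross = np.bincount(s, weights=np.real(f1 * np.conj(f2))[inside], minlength=n_shells)
    p1 = np.bincount(s, weights=(np.abs(f1) ** 2)[inside], minlength=n_shells)
    p2 = np.bincount(s, weights=(np.abs(f2) ** 2)[inside], minlength=n_shells)
    fsc = cross / np.sqrt(p1 * p2)
    freqs = np.arange(n_shells) / (n * voxel_size)
    return freqs, fsc


def resolution_at(freqs, fsc, threshold: float = 0.143) -> float:
    """Resolution (Å) where FSC first drops below threshold (linear interpolation)."""
    for i in range(1, len(fsc)):
        if fsc[i] < threshold:
            t = (threshold - fsc[i - 1]) / (fsc[i] - fsc[i - 1])
            f = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return float("inf") if f == 0 else 1.0 / f
    # No crossing: the curve stays above threshold out to the last shell.
    return 1.0 / freqs[-1]
```

In practice, the two half-maps would come from independently refined halves of the particle set; identical inputs give FSC = 1 in every shell, which is a useful sanity check.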
Model building
Ribosome
Both subunits of the mammalian ribosome (PDB accession code 3JAH) (Brown et al., 2015b) were individually docked into the map
with Chimera (Pettersen et al., 2004). The atomic models of the ribosomal proteins and 18S rRNA were modified in Coot v0.8 to agree
with the rabbit sequences and optimized for fit to density using rigid body fitting followed by real-space refinement in Coot (Brown
et al., 2015a; Emsley et al., 2010). Where possible, the atomic model of 28S rRNA was modified to reflect the rabbit sequence
(OryCun2.0 GCA_000003625.1). However, since this sequence had insufficient coverage, we also attempted to sequence the 28S
rRNA directly from ribosomes extracted from RRL (see above for experimental procedures). The model was then modified to agree
with regions with high sequencing confidence (bases 725-965, 1271-2888, 3584-3867) or, in well-conserved areas, to better match
the complete 28S rRNA sequences from human (NCBI accession NR_003287.2) and rat (a closely related rodent; NCBI accession
NR_046246.1). Human numbering is used for the rRNA (NCBI accession NR_003287.2 for 28S, and X03205.1 for 18S). See Table
S3 for the numbering and sequence in the ribosome model aligned with the human reference.
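Reporting the rabbit model in human rRNA numbering amounts to walking a pairwise alignment with two residue counters. The minimal sketch below assumes that the alignment (such as the one summarized in Table S3) is already available as two gapped strings; the function name and toy sequences are hypothetical.

```python
from typing import Dict, Optional


def map_numbering(model_aln: str, ref_aln: str,
                  model_start: int = 1, ref_start: int = 1) -> Dict[int, Optional[int]]:
    """Map model residue numbers to reference (e.g., human) numbering.

    model_aln and ref_aln are the two rows of one pairwise alignment, with
    '-' for gaps. Model residues opposite a reference gap (insertions
    relative to the reference) map to None.
    """
    if len(model_aln) != len(ref_aln):
        raise ValueError("alignment rows must have equal length")
    mapping: Dict[int, Optional[int]] = {}
    m, r = model_start, ref_start
    for a, b in zip(model_aln, ref_aln):
        if a != "-":
            mapping[m] = r if b != "-" else None
            m += 1
        if b != "-":
            r += 1  # reference residues advance even opposite a model gap
    return mapping


if __name__ == "__main__":
    # Toy alignment: model residue 3 is an insertion, and reference
    # residue 3 has no model counterpart.
    print(map_numbering("ACG-UA", "AC-GUA"))
```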
Elongation complex
Because our structure represents a mixture of species, the starting models for the P- and E-site tRNAs and the mRNA were
taken from our previous structure (PDB accession code 3JAH) (Brown et al., 2015b). P-site tRNAVal was also used as an initial
model for the A-site tRNA. The fit of the tRNAs and mRNA to the density was optimized using rigid body fitting and real space
refinement. The crystal structure of yeast eEF1A (PDB accession code 1F60) (Andersen et al., 2000) was docked into density at
the GAC. The switch I loop region was taken from the structure of EF-Tu bound to GMPPNP (PDB accession code 2C78) (Parmeggiani
et al., 2006). The model of eEF1A was modified to the rabbit sequence (UniProt ID: P68105) and manually fit to density.
The small-molecule crystal structure of didemnin B (Hossain et al., 1988) was docked into empty density near eEF1A and adjusted
in Coot using real-space refinement with chemical restraints generated using Phenix.elbow (Adams et al., 2010). The geometry of
the didemnin B model was analyzed using Mogul, a molecular-geometry library derived from the Cambridge Structural Database
(CSD) (Bruno et al., 2004). Some of the restraints generated by Phenix.elbow were adjusted to match the median angles and
distances identified by Mogul. These modified restraints were then applied during refinement in Coot and REFMAC.
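The restraint adjustment described above can be sketched as resetting an ideal value to the Mogul median whenever the two disagree by more than a chosen number of standard deviations. This is a hypothetical illustration: the `Restraint` record, the atom names, the `n_sigma` cutoff, and the numbers are invented for the example and do not reflect the Phenix.elbow or Mogul file formats.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Restraint:
    atoms: Tuple[str, ...]  # two atoms for a bond, three for an angle
    target: float           # ideal value: Å for bonds, degrees for angles
    sigma: float            # allowed deviation, same units as target


def adopt_mogul_medians(restraints: List[Restraint],
                        medians: Dict[Tuple[str, ...], float],
                        n_sigma: float = 1.0) -> List[Restraint]:
    """Reset targets to the Mogul median when |median - target| > n_sigma * sigma.

    Restraints without a Mogul median, or within tolerance, are kept unchanged.
    """
    updated = []
    for r in restraints:
        median = medians.get(r.atoms)
        if median is not None and abs(median - r.target) > n_sigma * r.sigma:
            r = replace(r, target=median)
        updated.append(r)
    return updated


if __name__ == "__main__":
    bond = Restraint(("C1", "N2"), 1.50, 0.02)           # hypothetical values
    angle = Restraint(("C1", "N2", "C3"), 120.0, 3.0)
    mogul = {("C1", "N2"): 1.46, ("C1", "N2", "C3"): 121.0}
    for r in adopt_mogul_medians([bond, angle], mogul):
        print(r)
```

Scaling the cutoff by each restraint's sigma lets bonds (Å) and angles (degrees) share one rule despite their different units.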
Molecular graphics
All figures were generated with Chimera (Pettersen et al., 2004) or PyMOL (Schrödinger, LLC).
Data Resources
Nine maps have been deposited with the EMDB with accession codes EMDB: 4129, EMDB: 4130, EMDB: 4131, EMDB: 4132, EMDB:
4133, EMDB: 4134, EMDB: 4135, EMDB: 4136, and EMDB: 4137. Atomic coordinates have been deposited with the Protein Data
Bank under accession codes PDB: 5LZS, PDB: 5LZT, PDB: 5LZU, PDB: 5LZV, PDB: 5LZW, PDB: 5LZX, PDB: 5LZY, and PDB: 5LZZ.
Figure S1. Isolation of Translational Decoding Complexes for Cryo-EM, Related to Figure 1
(A) Schematic of the mRNA constructs used for in vitro translation and isolation of ribosome-nascent chain complexes (RNCs). The start codon (AUG), stop codon
(UAG or UGA), and coding regions for the 3X Flag tag (green), the autonomously folding villin headpiece (VHP) domain (blue), the cytosolic portion of Sec61β
(orange), and KRas (purple) are indicated.
(B) Experimental strategies for isolating the indicated RNCs from in vitro translation (IVT) reactions.
(C) SDS-PAGE and Coomassie staining of isolated RNCs representing the elongation complex (80S•aa-tRNA•eEF1A); pre-accommodated (80S•eRF1•eRF3) or
accommodated (80S•eRF1) termination complexes; and the rescue complex (80S•Pelota•Hbs1l) reconstituted with a truncated mRNA (see panel A). Copurified,
exogenously added, and ribosomal (ribo. prot.) proteins are indicated.
(D) The long NC construct (see panel A) was translated in vitro in rabbit reticulocyte lysate (RRL) with the indicated translational inhibitors added at the following
concentrations: 50 µg/mL cycloheximide (CHX), 10 µM anisomycin, 200 µM emetine, and 50 µM didemnin B. The translation reactions were affinity purified via the
3X Flag tag on the nascent chain. The elutions and inputs were analyzed by SDS-PAGE and immunoblotting for the indicated proteins, revealing that didemnin B
specifically traps eEF1A on the isolated RNCs.
[Figure residue: FSC versus resolution (Å) plots for four reconstructions, showing overall FSC curves at the 0.143 threshold (3.5 Å, 3.7 Å, 4.0 Å, and 3.5 Å) and model-versus-map FSC curves for self- and cross-validation (FSC = 0.5); labels 60S, 40S, and mRNA.]
Figure S7. Conformational Changes during Decoding Factor Accommodation, Related to Figure 7
(A) The minidomain of pre-accommodated eRF1 (colored by domains) forms an interaction (circled) with eS31 (yellow) that is stabilized by G1507 of 18S rRNA.
(B) Upon accommodation, the M (purple) and C (pale blue) domains of eRF1, and the L7/L12 rRNA stalk base (blue) supporting uL11 (light cyan) undergo
conformational changes to establish new interactions (circled) between the eRF1 minidomain with uL11 and the L7/L12 stalk base. Arrows indicate the direction
and magnitude of movement of the minidomain and uL11 from the pre-accommodated state.
Article
Correspondence
p.vanbergen@uu.nl (P.M.P.v.B.e.H.),
m.baldus@uu.nl (M.B.)
In Brief
An NMR approach shows how receptors
move in native membranes at high
resolution, revealing that, while the
intracellular domain of EGFR is rigid, the
extracellular domain is highly dynamic
until bound by ligand.
Highlights
• NMR can be applied to study activation of full-length EGFR in native membranes
Sciences
4Biomolecular Imaging, Bijvoet Center for Biomolecular Research
Cell 167, 1241–1251, November 17, 2016 © 2016 Elsevier Inc. 1241
Figure 1. Preparation of EGFR-Rich Membrane Vesicles from A431 Cells
Schematic presentation of the preparation of A431 membrane vesicles. For MS/EM/dSTORM/gSTED studies, cells were grown in DMEM, while for
ssNMR studies, A431 cells were cultured in [13C, 15N]-labeled DMEM (20 plates were required for one sample). Cells were scraped from the plates and
vesiculated by passing them through a syringe 10 times. After removal of unbroken cells and cell nuclei by low-speed centrifugation, the membrane vesicles were
pelleted at high speed and loaded into an ssNMR rotor. Note that all methods can also be used to study whole cells.
the negative cooperativity of ligand binding (Arkhipov et al., 2014). On the other hand, recent FRET studies speak in favor of an increased distance between domain I and the membrane after ligand binding, in line with an upright position of the dimerized ECDs (Valley et al., 2015; Ziomkiewicz et al., 2013).
These studies, together with previous work highlighting the influence of native membrane lipids such as cholesterol (den Hartigh et al., 1992) or gangliosides (Coskun et al., 2011; Miljan and Bremer, 2002), as well as of receptor glycosylation (Liu et al., 2011), on receptor activation and internalization, underline the notion that a comprehensive understanding of receptor activation requires the study of full-length EGFR in its native environment. To address this aspect, we describe here the development and application of a solid-state NMR (ssNMR)-based approach to directly examine structural and dynamical properties of full-length EGFR in native membrane vesicles before and after activation. Unlike solution-state NMR, in which small membrane proteins such as the transmembrane region of EGFR can be studied in membrane mimetics (Endres et al., 2013), ssNMR can give detailed structural insight into the role of the lipid bilayer in protein structure in synthetic (Matsushita et al., 2013) or native bacterial membranes (Kaplan et al., 2015), largely irrespective of protein size and mobility. In addition, ssNMR can probe changes in local or overall protein dynamics at ambient temperature, through the reduction in signal intensity that motion causes in dipolar-based experiments (Etzkorn et al., 2010; Hong et al., 2012; Schneider et al., 2010), and at low temperatures, by tracking ssNMR line-width variations due to backbone fluctuations (Koers et al., 2014). Importantly, the latter studies are fully compatible with sensitivity-enhancement methods such as dynamic nuclear polarization (DNP), which increases NMR signals by one to two orders of magnitude (Ni et al., 2013). Combining this high-sensitivity technique with tailored amino-acid labeling allows local protein structure to be studied even in complex molecular environments (Kaplan et al., 2015).
To investigate EGFR in its native membrane environment by ssNMR, we used A431 cells to extract EGFR-enriched membrane vesicles amenable to NMR studies. For reference, we characterized these membrane vesicle preparations by electron microscopy, super-resolution light microscopy, and mass spectrometry (MS). Using previous structural information obtained for EGFR domains (Ferguson et al., 2003; Lu et al., 2010; Ogiso et al., 2002; Stamos et al., 2002) and assuming the C-terminal domain to be unstructured, we monitored EGFR structure and dynamics at global and residue-specific levels. Taken together, our NMR data reveal dynamics of specific EGFR regions in the unliganded state that are strongly reduced by ligand binding, suggesting that a reduction in conformational entropy contributes to the free energy of EGFR dimerization.
RESULTS
Isolation and Characterization of EGFR-Rich A431 Membrane Vesicles
To investigate EGFR in its native membrane environment, we used A431 cells, which are known to express EGFR at a high level (1–2 × 10⁶ receptors per cell) (Haigler et al., 1978), to produce EGFR-containing membrane vesicles amenable to our multi-technique approach (Figure 1). Confocal microscopy of A431 cells and EGFR-negative cells confirmed high-level expression of EGFR (Figure 2A). In addition, super-resolution light microscopy (dSTORM) experiments using anti-EGFR nanobodies and cryo-electron microscopy showed that the isolated vesicles were 50–250 nm in size, with EGFR localized to the membrane (Figures 2B and 2C). To determine the orientation of EGFR in these vesicles, we treated the vesicles with Proteinase K for 15 min at 4°C and analyzed the samples by western blotting using an antibody specific for the intracellular domain. Densitometric comparison of the EGFR band intensity with that of the remaining 65 kDa intracellular-domain band suggests that approximately 85% ± 6.4% (mean ± SEM, n = 3) of EGFR is in the correct, outside-out orientation (Figure 2D). This was confirmed by three-color gated stimulated emission depletion (gSTED) microscopy, in which the vesicles were stained with the lipophilic membrane dye DiI, EGF-A488, and an anti-EGFR nanobody conjugated to A647 (NB-A647). Almost all vesicles showed EGF binding (Figure 2E). Fluorescence intensity analysis of colocalized vesicles shows a high degree of correlation between EGF-A488 and NB-A647 (Pearson correlation coefficient: r = 0.577, N = 84, p < 0.001) (Figure 2F). In addition, we observed ligand-induced phosphorylation (Figure 2G), confirming high-level expression of functionally active EGFR in the isolated membrane vesicles. To probe the level of EGFR expression, we
experiments, which significantly increase NMR signal intensity via electron polarization (Ni et al., 2013). The increased sensitivity (with a DNP enhancement factor ε ≈ 20 at 800 MHz and ≈ 80 at 400 MHz) allowed us to perform 2D and 3D NCOCX experiments under 400 MHz (Figures 5B, 5D, and 5E) and 800 MHz (Figure S6B) DNP conditions, as well as a 2D 15N-edited 13C-13C experiment (Baker et al., 2015) under 400 MHz DNP conditions (Figure 5C). Again, we made use of standard spectral regions expected for α-helix (red), β-strand (blue), and random-coil (black) ssNMR frequencies in both the 13C and 15N dimensions. These spectral regions are indicated for expected Phe and Met correlations by solid and dashed lines, respectively, in Figures 5B–5E.
In general, the addition of EGF can lead to chemical-shift or line-width changes in ssNMR data of EGFR due to local alterations in protein structure and dynamics or due to the presence of a nearby ligand. Interestingly, the addition of EGF significantly increased spectral resolution in both NC (Figures 5B, 5D, 5E, and S6) and 15N-edited CC (Figure 5C) experiments. This is indicative of a reduction in the local backbone and side-chain fluctuations that cause line broadening at low temperatures (Koers et al., 2014), and of a dominant contribution of EGFR correlations to the spectrum. In line with the latter conclusion, we found the most dominant signals in the Phe α-helical and random-coil regions (Wang and Jardetzky, 2002), in full analogy to the three correlations expected for EGFR in domain I and domain III, as well as in the CT, respectively (Figure 5A). In addition, we found signals at a 15N chemical shift of 124 ppm (Figure 5B, NCOCX experiment), which is characteristic of Leu residues in β-strand or random-coil conformations, and a clear β-strand Phe correlation in our 15N-edited 13C-13C experiment (Figure 5C) that can only stem from EGFR, namely, from the sequential pair ³⁸⁰FL³⁸¹ in domain III (Figure 5A; denoted Pheβ(1,0) in Figure 5B). In the crystal structure (Ogiso et al., 2002), the ³⁸⁰FL³⁸¹ pair is located close to the EGF binding site (see Figures 6A and 6B), which would readily explain the observed chemical-shift changes for the tentatively identified correlation in Figure 5C. We also observe a clear shift and spectral changes after EGF binding in the random-coil Phe Cβ region, which strongly suggests that these signals stem from ³⁵⁷FT³⁵⁸ (D3), as well as from the two sequential correlations in the CT (as indicated in Figure 5C).
In full accordance with a dominant contribution of EGFR to our spectra, we did not observe methionine correlations in β-strand conformations (indicated Metβ(0,0) in Figure 5C). Instead, we detected Met correlations (which can be discriminated on the basis of their characteristic Cβ shifts) in α-helix and random-coil conformations, in line with EGFR and actin predictions. In summary, our spectra recorded at DNP temperatures (100 K) indicate that EGFR signals dominate these low-temperature DNP (LT-DNP) spectra as well. For both EGF-free and EGF-bound conformations, our observed correlations globally matched expectations from previous X-ray structures of the corresponding EGFR subdomains, and we could tentatively assign chemical-shift changes to a residue pair located close to the EGF binding site previously seen in protein crystals. In addition, our ssNMR data suggested that EGF binding significantly reduces local backbone and side-chain fluctuations that, in the unliganded receptor, give rise to structural disorder at low temperatures.
DISCUSSION
Increasing evidence suggests that a comprehensive view of EGFR activation requires the study of the structure and dynamics of the full-length receptor in its native cell membrane setting (Bessman et al., 2014; Kovacs et al., 2015). NMR has long contributed such information for molecules that tumble rapidly under in vitro (Arkhipov et al., 2013; Kern and Zuiderweg, 2003; Kerns et al., 2015; Nygaard et al., 2013) and, more recently, in-cell conditions (Banci et al., 2013; Serber et al., 2001; Smith et al., 2015). On the other hand, ssNMR increasingly makes it possible to conduct such studies on large, possibly membrane-embedded, protein complexes in their natural cell environment (Chow et al., 2014; Frederick et al., 2015; Kaplan et al., 2015; Renault et al., 2012). Here, we have shown how to extend such studies to large eukaryotic protein receptors in their native membrane setting by isolating fully and specifically [13C, 15N]-labeled membrane vesicles that express the functional receptor of interest at high levels. Combining 2D ssNMR data at ambient temperatures with DNP studies of specifically labeled membrane vesicles allowed us to examine the overall structure and dynamics of full-length EGFR before and after ligand binding in situ.
Taken together, our ssNMR analysis suggests that the spectroscopic changes observed upon EGF binding are largely due to alterations in receptor dynamics. Before activation, our data are in accordance with a highly dynamic ECD and CT and a rigid KD, in line with earlier studies suggesting autoinhibitory interactions of the KD and the N-terminal portions of the intracellular JM region with the intracellular membrane surface (Endres et al., 2013; Sengupta et al., 2009) (Figures 6A and 6B). The fluctuations (local and global) of the ECD in the absence of EGF preclude strong interactions of the ECD with the membrane, which disagrees with recent MD studies on the EGFR ECD (Arkhipov et al., 2013). Rather, our results are in line with earlier experimental results (Coskun et al., 2011; den Hartigh et al., 1992; Liu et al., 2011; Miljan and Bremer, 2002) and recent computational studies (Kaszuba et al., 2015) suggesting a key role of the natural composition of the cell membrane and of receptor glycosylation in receptor dynamics. Indeed, in our ssNMR experiments we observed additional mobility in other endogenous cellular components, including lipids and sugars, suggesting the presence of receptor dynamics at different timescales in our samples (Figure S4).
Our results suggest a model for EGFR activation in which the ECD is present on the surface of resting cells as an ensemble of different conformers. Both the closed, tethered conformation and the open conformation, in which the autoinhibitory tether between domains II and IV is released, can be expected (Figure 6A). Based on a previously suggested ΔG of 1 to 2 kcal/mol for the domain II/IV interaction, 80%–97% of the ECD was expected to adopt the closed conformation (Ferguson et al., 2003). In this framework, the global and local dynamics probed in our ssNMR studies would be most compatible with global domain motions of large portions of the ECD combined with local backbone fluctuations (detected by DNP-ssNMR for the EGF binding region, as well as the dimerization interface; Figure 6B) that can lead to the open conformation, enabling the ectodomain to form inactive (pre)dimers previously detected for very short time periods (Low-Nam et al., 2011). The highly dynamic nature of the unliganded EGFR also explains the ligand-independent dimerization and activation of EGFR at the higher expression levels found in the plasma membrane of different cancer cells. The ECD dynamics populate the extended conformation, which is prone to form dimers. Because the fraction of receptors in the extended conformation does not change with expression level, higher expression increases the absolute number of EGFRs in the extended conformation, resulting in a higher probability of predimer formation. Similarly, this model explains the observation that the ECD in active EGFR mutants can lead to enhanced ligand-independent dimerization (Valley et al., 2015). Deletion of parts of the ECD, as has occurred in viral and oncogenic variants such as v-ErbB and EGFRvIII, releases the closed conformation, resulting in a larger number of less stable ligand-independent dimers. As a consequence, basal kinase activity is elevated, although it remains below EGF-induced kinase activity.
We hypothesize that EGF can bind all conformers, including the open conformation. In this model, EGF does not induce a conformational change of the receptor but rather stabilizes the open conformation, which precedes receptor dimerization. The reduced dynamics of the liganded EGFR result in a rigid conformation with reduced conformational entropy, which contributes to the binding of the multiple low-affinity interaction motifs that are present not only in the ECD (Dawson et al., 2005) but throughout EGFR. This cooperative binding of two rigid EGFR monomers involves the tether in domain II, the GxxxG or GG4 motifs in the TM, the anti-parallel α-helices in the JM, as well as the KD (Doerner et al., 2015; Ferguson et al., 2003; Jura et al., 2009; Lu et al., 2010). In this way, the reduction in global, as well as local, dynamics contributes to the
(F) Scatterplot of integrated fluorescence intensities of individual A431 membrane vesicles in the EGF-A488 and anti-EGFR NB-A647 channels.
(G) Phosphorylation assay of A431 plasma membrane vesicles to detect phosphorylated EGFR (pEGFR) with anti-pY1068 antibody. A431 cells were incubated at 37°C with (+) or without (−) EGF for 10 min. For membrane vesicle samples, either A431 cells were incubated at 37°C for 10 min with EGF (+), followed by vesicle preparation, or vesicles were first prepared from A431 cells, after which they were incubated at 37°C without (−) or with (+) EGF.
Connecting lines track experimentally observed Phe (solid lines, annotated Phe[X,Y]) and Met (dashed lines, annotated Met[X,Y]) correlations in α-helical, β-strand,
and random-coil conformations, respectively. X and Y stand for the number of sequential correlations predicted for EGFR (X) and actin (Y) on the basis of
Figure 5A (EGFR) and Figure S6A (actin). In (C), tentative assignments for the ³⁸⁰FL³⁸¹ pair, as well as for the spectral correlations consistent with signals
stemming from ³⁵⁷FT³⁵⁸ (DIII) and from the two sequential correlations in the CT, are indicated. All experiments were conducted under 400 MHz DNP conditions.
See also Figures S5 and S6.
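The two-state estimate in the Discussion (a ΔG of 1 to 2 kcal/mol for the domain II/IV interaction implying that 80%–97% of the ECD is closed) follows from a Boltzmann population. This minimal sketch assumes that ΔG is the free-energy advantage of the closed (tethered) state over the open state and a temperature of 25°C; the source does not state which temperature underlies the quoted range. Under these assumptions, ΔG = 1 and 2 kcal/mol give roughly 84% and 97% closed, close to the quoted range. The second function restates the expression-level argument: with a fixed open fraction, the number of open receptors scales with the total (the receptor counts are hypothetical).

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol·K)


def fraction_closed(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Closed-state fraction for a two-state equilibrium closed <-> open.

    delta_g_kcal is how much more stable the closed state is, so the
    equilibrium constant is K = [closed]/[open] = exp(ΔG / RT).
    """
    k_eq = math.exp(delta_g_kcal / (R_KCAL * temperature_k))
    return k_eq / (1.0 + k_eq)


def open_receptors(total: float, delta_g_kcal: float) -> float:
    """Receptors expected in the open state: a fixed fraction of the total."""
    return total * (1.0 - fraction_closed(delta_g_kcal))


if __name__ == "__main__":
    for dg in (1.0, 2.0):
        print(f"ΔG = {dg:.0f} kcal/mol: {fraction_closed(dg):.1%} closed")
    # The open fraction is independent of expression level, so doubling the
    # receptor count (hypothetical 1e6 vs 2e6 per cell) doubles open receptors.
    for total in (1e6, 2e6):
        print(f"{total:.0e} receptors -> {open_receptors(total, 1.0):.2e} open")
```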
Supplemental Information includes six figures and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.038.
AUTHOR CONTRIBUTIONS
M.K., P.B.H., and M.B. designed experiments. M.K. and S.N. prepared isolated labeled EGFR membrane vesicles. M.K. and K.H. conducted ssNMR experiments. C.d.H. performed the phosphorylation assay. M.K. and C.d.H. performed confocal microscopy. P.J. and W.J.C.G. performed EM. E.A.K.,
REFERENCES
Chow, W.Y., Rajan, R., Muller, K.H., Reid, D.G., Skepper, J.N., Wong, W.C., Brooks, R.A., Green, M., Bihan, D., Farndale, R.W., et al. (2014). NMR spectroscopy of native and in vitro tissues implicates polyADP ribose in biomineralization. Science 344, 742–746.
Coskun, Ü., Grzybek, M., Drechsel, D., and Simons, K. (2011). Regulation of human EGF receptor by lipids. Proc. Natl. Acad. Sci. USA 108, 9044–9048.
Cox, J., and Mann, M. (2008). MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372.
Garrett, T.P.J., McKern, N.M., Lou, M., Elleman, T.C., Adams, T.E., Lovrecz, G.O., Zhu, H.-J., Walker, F., Frenkel, M.J., Hoyne, P.A., et al. (2002). Crystal structure of a truncated epidermal growth factor receptor extracellular domain bound to transforming growth factor alpha. Cell 110, 763–773.
Garrett, T.P.J., McKern, N.M., Lou, M., Elleman, T.C., Adams, T.E., Lovrecz, G.O., Kofler, M., Jorissen, R.N., Nice, E.C., Burgess, A.W., and Ward, C.W. (2003). The crystal structure of a truncated ErbB2 ectodomain reveals an active conformation, poised to interact with other ErbB receptors. Mol. Cell 11, 495–505.
Gradmann, S., Ader, C., Heinrich, I., Nand, D., Dittmann, M., Cukkemane, A., van Dijk, M., Bonvin, A.M.J.J., Engelhard, M., and Baldus, M. (2012). Rapid prediction of multi-dimensional NMR data sets. J. Biomol. NMR 54, 377–387.
Haigler, H., Ash, J.F., Singer, S.J., and Cohen, S. (1978). Visualization by fluorescence of the binding and internalization of epidermal growth factor in human carcinoma cells A-431. Proc. Natl. Acad. Sci. USA 75, 3317–3321.
den Hartigh, J.C., van Bergen en Henegouwen, P.M., Verkleij, A.J., and Boonstra, J. (1992). The EGF receptor is an actin-binding protein. J. Cell Biol. 119, 349–355.
Hoffman, D.B., Pearson, C.G., Yen, T.J., Howell, B.J., and Salmon, E.D. (2001). Microtubule-dependent changes in assembly of microtubule motor proteins and mitotic spindle checkpoint proteins at PtK1 kinetochores. Mol. Biol. Cell 12, 1995–2009.
Hong, M., Zhang, Y., and Hu, F. (2012). Membrane protein structure and dynamics from NMR spectroscopy. Annu. Rev. Phys. Chem. 63, 1–24.
Jura, N., Endres, N.F., Engel, K., Deindl, S., Das, R., Lamers, M.H., Wemmer, D.E., Zhang, X., and Kuriyan, J. (2009). Mechanism for activation of the EGF receptor catalytic domain by the juxtamembrane segment. Cell 137, 1293–1307.
Kaplan, M., Cukkemane, A., van Zundert, G.C.P., Narasimhan, S., Daniëls, M., Mance, D., Waksman, G., Bonvin, A.M.J.J., Fronzes, R., Folkers, G.E., and Baldus, M. (2015). Probing a cell-embedded megadalton protein complex by DNP-supported solid-state NMR. Nat. Methods 12, 649–652.
Kaszuba, K., Grzybek, M., Orłowski, A., Danne, R., Róg, T., Simons, K., Coskun, Ü., and Vattulainen, I. (2015). N-Glycosylation as determinant of epidermal growth factor receptor conformation in membranes. Proc. Natl. Acad. Sci. USA 112, 4334–4339.
Matsushita, C., Tamagaki, H., Miyazawa, Y., Aimoto, S., Smith, S.O., and Sato, T. (2013). Transmembrane helix orientation influences membrane binding of the intracellular juxtamembrane domain in Neu receptor peptides. Proc. Natl. Acad. Sci. USA 110, 1646–1651.
Mikhaylova, M., Cloin, B.M.C., Finan, K., van den Berg, R., Teeuw, J., Kijanka, M.M., Sokolowski, M., Katrukha, E.A., Maidorn, M., Opazo, F., et al. (2015). Resolving bundled microtubules using anti-tubulin nanobodies. Nat. Commun. 6, 7933.
Miljan, E.A., and Bremer, E.G. (2002). Regulation of growth factor receptors by gangliosides. Sci. STKE 2002, re15.
Morris, G.A., and Freeman, R. (1979). Enhancement of Nuclear Magnetic Resonance Signals by Polarization Transfer. J. Am. Chem. Soc. 101, 760–762.
Ni, Q.Z., Daviso, E., Can, T.V., Markhasin, E., Jawla, S.K., Swager, T.M., Temkin, R.J., Herzfeld, J., and Griffin, R.G. (2013). High frequency dynamic nuclear polarization. Acc. Chem. Res. 46, 1933–1941.
Nygaard, R., Zou, Y., Dror, R.O., Mildorf, T.J., Arlow, D.H., Manglik, A., Pan, A.C., Liu, C.W., Fung, J.J., Bokoch, M.P., et al. (2013). The dynamic process of β2-adrenergic receptor activation. Cell 152, 532–542.
Ogiso, H., Ishitani, R., Nureki, O., Fukai, S., Yamanaka, M., Kim, J.-H., Saito, K., Sakamoto, A., Inoue, M., Shirouzu, M., and Yokoyama, S. (2002). Crystal structure of the complex of human epidermal growth factor and receptor extracellular domains. Cell 110, 775–787.
Pines, A., Gibby, M.G., and Waugh, J.S. (1973). Proton-Enhanced NMR of Dilute Spins in Solids. J. Chem. Phys. 59, 15–19.
Renault, M., Tommassen-van Boxtel, R., Bos, M.P., Post, J.A., Tommassen, J., and Baldus, M. (2012). Cellular solid-state nuclear magnetic resonance spectroscopy. Proc. Natl. Acad. Sci. USA 109, 4863–4868.
Santoni, V., Molloy, M., and Rabilloud, T. (2000). Membrane proteins and proteomics: un amour impossible? Electrophoresis 21, 1054–1070.
Sauvée, C., Rosay, M., Casano, G., Aussenac, F., Weber, R.T., Ouari, O., and Tordo, P. (2013). Highly efficient, water-soluble polarizing agents for dynamic nuclear polarization at high frequency. Angew. Chem. Int. Ed. Engl. 52, 10858–10861.
Schneider, R., Seidel, K., Etzkorn, M., Lange, A., Becker, S., and Baldus, M. (2010). Probing molecular motion by double-quantum (13C,13C) solid-state NMR spectroscopy: application to ubiquitin. J. Am. Chem. Soc. 132, 223–233.
Stamos, J., Sliwkowski, M.X., and Eigenbrot, C. (2002). Structure of the epidermal growth factor receptor kinase domain alone and in complex with a 4-anilinoquinazoline inhibitor. J. Biol. Chem. 277, 46265–46272.
Tebbutt, N., Pedersen, M.W., and Johns, T.G. (2013). Targeting the ERBB family in cancer: couples therapy. Nat. Rev. Cancer 13, 663–673.
Valley, C.C., Arndt-Jovin, D.J., Karedla, N., Steinkamp, M.P., Chizhik, A.I., Hlavacek, W.S., Wilson, B.S., Lidke, K.A., and Lidke, D.S. (2015). Enhanced
Yarden, Y. (2001). The EGFR family and its ligands in human cancer. signalling mechanisms and therapeutic opportunities. Eur. J. Cancer 37 (Suppl 4), S3–S8.
Ziomkiewicz, I., Loman, A., Klement, R., Fritsch, C., Klymchenko, A.S., Bunt, G., Jovin, T.M., and Arndt-Jovin, D.J. (2013). Dynamic conformational transitions of the EGF receptor in living mammalian cells determined by FRET and fluorescence lifetime imaging microscopy. Cytometry A 83, 794–805.
Further information and requests for reagents may be directed to Paul van Bergen en Henegouwen (p.vanbergen@uu.nl).
A431 cells obtained from ATCC (CRL-1555, LGC Standards, Germany) and EGFR-negative NIH 3T3 clone 2.2 murine fibroblasts
were cultured in Dulbecco’s modified Eagle’s medium (DMEM; Gibco, Invitrogen, Paisley, UK) containing 10% (v/v) fetal calf serum
(FCS), L-glutamine, penicillin, and streptomycin at 37°C in an atmosphere containing 5% CO2.
METHOD DETAILS
Phosphorylation assay
Phosphorylation of EGFR was induced by adding 8 nM EGF either to the cells in medium before membrane vesicles were prepared or to membrane vesicles in phosphorylation buffer for 10 min at 37°C. Proteins were separated by SDS-PAGE and blotted onto a PVDF membrane. The membrane was incubated with R-α-phosphoEGFR (Y1068) (Cell Signaling Technology, Danvers, Massachusetts) and M-α-Actin, followed by G-α-R800 (LI-COR) and G-α-mouse700 (LI-COR). To detect total EGFR, the blot was first stripped with stripping buffer, then blocked and incubated with R-α-EGFR (C74B9, Cell Signaling Technology), followed by G-α-R800. Detection was performed with the Odyssey imaging system (LI-COR), and bands were quantified using Odyssey software.
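Odyssey software reports an integrated intensity per band, but how the three signals were combined is not stated here. A minimal sketch, assuming background-corrected intensities have already been exported and that phosphorylation is expressed as the pY1068 signal per total EGFR relative to an untreated lane (the `Lane` record and both helper functions are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class Lane:
    """Background-corrected band intensities for one lane (hypothetical values)."""
    label: str
    p_egfr: float  # R-α-phosphoEGFR (Y1068), read with G-α-R800
    actin: float   # M-α-Actin, read with G-α-mouse700
    egfr: float    # R-α-EGFR after stripping and re-probing, read with G-α-R800


def relative_phosphorylation(lanes, reference_label):
    """pY1068 signal per total EGFR, scaled so that the reference lane equals 1.0."""
    ratios = {lane.label: lane.p_egfr / lane.egfr for lane in lanes}
    reference = ratios[reference_label]
    return {label: ratio / reference for label, ratio in ratios.items()}


def actin_normalised(lanes):
    """Total EGFR per actin, used only as a loading check across lanes."""
    return {lane.label: lane.egfr / lane.actin for lane in lanes}
```

Normalizing to total EGFR from the re-probed blot corrects for different receptor amounts per lane, while the actin ratio only checks loading; which ratio was actually reported is not specified.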
13C,15N labeling of eukaryotic cells
A431 cells were cultured in the labeled medium described above on Corning cell culture dishes (150 mm × 25 mm). Cells cultured in the labeled medium during the first week (2–3 passages) were not used for sample preparation, to ensure full incorporation of the labeled substances into the cells. Once the plates were 80%–90% confluent, cells were incubated with PBS containing 2 mM EGTA at 37°C for 15 min and then scraped. Cells were spun at 500 × g for 10 min at 4°C; the pellet was resuspended in PBS, spun again at 500 × g for 10 min at 4°C, and used to prepare membrane vesicles as described below. Approximately 20 plates (150 mm × 25 mm) were used to fill a 3.2 mm rotor with [13C,15N]-labeled A431 membrane vesicles.
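The spins above are specified as relative centrifugal force (500 × g), which maps to a rotor speed only once the rotor radius is known. The sketch below uses the standard relation RCF = ω²r/g; the 10 cm radius in the example is a hypothetical value, not the centrifuge used in this work.

```python
import math

G_CM_PER_S2 = 980.665  # standard gravity in cm/s^2


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force, omega^2 * r / g, with omega in rad/s and r in cm."""
    omega = 2.0 * math.pi * rpm / 60.0
    return omega ** 2 * radius_cm / G_CM_PER_S2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed needed to reach a given RCF at the given radius."""
    omega = math.sqrt(rcf * G_CM_PER_S2 / radius_cm)
    return omega * 60.0 / (2.0 * math.pi)


# Hypothetical rotor with a 10 cm radius: 500 x g needs roughly 2,100 rpm.
speed = rpm_from_rcf(500, 10.0)
```

Specifying g-force rather than rpm keeps the protocol transferable between centrifuges with different rotor geometries.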
Cryo-electron microscopy
For the preparation of thin vitrified specimens of the A431 vesicles, a 3 µl drop of sample was placed on the surface of a glow-discharged Quantifoil micromachined holey carbon (R 2/2) TEM grid (Quantifoil Micro Tools GmbH, Jena, Germany) held by the Vitrobot Mark IV tweezers (FEI, Eindhoven, the Netherlands). Before the sample was introduced into the Vitrobot, its environmental chamber was equilibrated at room temperature (22°C) with the humidity set to 100%. Blotting conditions were chosen so that a 10–500 nm liquid specimen film spanning the 2 µm holes of the R 2/2 Quantifoil grid was formed when excess sample was removed by the blotting filter paper in the Vitrobot. The specimen was released and fell through the opening shutter into liquid ethane at its freezing point, where the thin specimen films were vitrified. The vitreous specimen was transferred under liquid nitrogen into a Gatan 626 single-tilt liquid-nitrogen cryo holder (Gatan GmbH, Munich, Germany) and into a Tecnai 20 LaB6 electron microscope (FEI, Eindhoven, the Netherlands), where the specimen temperature was maintained below −165°C. Micrographs of the vesicles were recorded in TIF format with an Eagle 4k × 4k CCD camera (FEI, Eindhoven, the Netherlands) under normal and low-dose conditions at a nominal underfocus of 3 µm. Vesicle diameters were measured using the IMOD software package (Kremer et al., 1996).
Mass spectrometry
A431 vesicles and cells were lysed in 50 mM ammonium bicarbonate, 1% SDC, 10 mM TCEP, 100 mM Tris, 40 mM chloroacetamide, and complete protease inhibitor cocktail (Roche) and boiled for 5 min at 95°C. The supernatant was diluted 10-fold and digested overnight with LysC (1:75) and trypsin (1:50). SDC was removed by acidifying the samples with formic acid and centrifuging. The supernatant was desalted using C18 Sep-Pak cartridges (Waters), vacuum-dried, and stored at −80°C until further analysis. Peptide mixtures were reconstituted in 10% formic acid, and 1 µg of protein digest from each sample was analyzed by nano-LC-MS/MS on an Orbitrap Q Exactive Plus (Thermo Fisher Scientific, Bremen). The digest was trapped on an in-house made trap column (ReproSil-Pur C18, Dr. Maisch, 100 µm × 2 cm, 3 µm) by loading for 10 min with solvent A (0.1% formic acid) and separated on an analytical column (Poroshell 120 EC C18, Agilent Technologies, 50 µm × 50 cm, 2.7 µm) using a 2 hr linear gradient from 13% to 40% solvent B (0.1% formic acid, 80% ACN). During each scan cycle, the 10 most intense peptide precursors were selected for higher-energy collisional dissociation (HCD). Raw data files were processed with MaxQuant version 1.5.3.30 and searched against the human UniProt database (February 2016, 151,869 entries). The false discovery rate was set to 1% at both the protein and peptide levels. Peptide intensities were normalized to the total peptide intensity in each LC-MS run. For relative quantification, the intensities of all unique and razor peptides of a protein were summed (Cox and Mann, 2008).
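The last two quantification steps (per-run normalization, then summing unique plus razor peptides per protein) can be sketched in a few lines. This is a simplified illustration that assumes each peptide has already been assigned to exactly one protein (its unique protein or its razor assignment); it is not MaxQuant's implementation, and the peptide and protein names in the test are made up.

```python
from collections import defaultdict


def normalise_run(peptide_intensities):
    """Express each peptide intensity as a fraction of the run's total peptide intensity."""
    total = sum(peptide_intensities.values())
    if total <= 0:
        raise ValueError("run has no peptide intensity")
    return {pep: value / total for pep, value in peptide_intensities.items()}


def protein_intensities(normalised, peptide_to_protein):
    """Sum normalised intensities per protein.

    peptide_to_protein assigns each unique or razor peptide to one protein;
    peptides without an assignment are ignored.
    """
    sums = defaultdict(float)
    for pep, value in normalised.items():
        protein = peptide_to_protein.get(pep)
        if protein is not None:
            sums[protein] += value
    return dict(sums)
```

Normalizing within each run first makes the summed protein values comparable between runs that differ in total injected material or ionization efficiency.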
The DQSQ CC 2D datasets (with and without EGF) in Figure 4 were acquired using 110 t1 points with a spectral window of 46656.176 Hz in t1. The spectra were processed using an exponential multiplication (EM) window function with a line broadening of 100 Hz in t2 and t1, zero filling to 2K and 1K points in t2 and t1, respectively, and linear prediction with 8 coefficients in t1. Note that the spectrum with EGF was multiplied by a factor of 1.4 to compensate for the difference in sample amount relative to the sample without EGF.
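The processing parameters above (EM window with 100 Hz line broadening, zero filling, and the 1.4 scaling of the +EGF spectrum) can be illustrated on the t1 dimension of the DQSQ data. A minimal pure-Python sketch, assuming each t1 point is one complex increment spaced by 1/SW; Fourier transformation and linear prediction are left out, and all function names are hypothetical.

```python
import math


def em_window(n_points, spectral_width_hz, lb_hz):
    """Exponential multiplication, w(t) = exp(-pi * LB * t) with t = k / SW.

    Each Lorentzian line is broadened by LB Hz (full width at half maximum).
    """
    dwell_s = 1.0 / spectral_width_hz
    return [math.exp(-math.pi * lb_hz * k * dwell_s) for k in range(n_points)]


def apodize_and_zero_fill(fid, window, size):
    """Multiply the FID by the window, then pad with zeros up to `size` points."""
    if len(window) != len(fid):
        raise ValueError("window and FID must have the same length")
    if size < len(fid):
        raise ValueError("zero-fill size must be at least the FID length")
    weighted = [complex(x) * w for x, w in zip(fid, window)]
    return weighted + [0j] * (size - len(weighted))


def scale(points, factor):
    """Uniform intensity scaling, e.g. the factor 1.4 applied to the +EGF spectrum."""
    return [factor * p for p in points]


# t1 of the DQSQ data: 110 points, SW = 46656.176 Hz, LB = 100 Hz, zero filled to 1K.
window = em_window(110, 46656.176, 100.0)
fid_t1 = [1.0 + 0.0j] * 110  # placeholder FID; real data come from the spectrometer
processed = apodize_and_zero_fill(fid_t1, window, 1024)
plus_egf = scale(processed, 1.4)
```

The exponential window trades resolution for sensitivity, which suits broad signals from a heterogeneous membrane sample.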
The NCOCX experiments in Figure 5 (400 MHz DNP) were acquired using 8 points in t1 and 13 points in t2, with spectral widths of 1620.745 Hz in t1 and 3012.048 Hz in t2. The spectra were processed using a squared sine-bell window function (SSB = 3) in t3, t2, and t1, with zero filling to 4K points in t3 and to 128 points in t2 and t1.
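The numbers attached to the squared sine-bell window functions in these processing descriptions (3 for the NCOCX data, 2 or 2.5 elsewhere) are SSB values that shift the start of the bell. A sketch assuming the common Bruker-style definition, in which the sine phase runs from π/SSB to π across the FID; the exact convention of the processing software used here is not stated.

```python
import math


def squared_sine_bell(n_points, ssb):
    """Squared sine-bell window, assuming the Bruker-style convention

    w(k) = sin^2((pi - phi) * k / (n - 1) + phi), phi = pi / SSB.
    SSB = 2 gives a cosine-squared bell (starts at 1, ends at 0);
    SSB of 0 or 1 is treated as a pure sine bell (phi = 0).
    """
    if n_points < 2:
        raise ValueError("need at least two points")
    phi = math.pi / ssb if ssb > 1 else 0.0
    return [math.sin((math.pi - phi) * k / (n_points - 1) + phi) ** 2
            for k in range(n_points)]


# NCOCX (Figure 5) t1: 8 points, SSB = 3; 15N-edited CC t1: 50 points, SSB = 2.
w_ncocx_t1 = squared_sine_bell(8, 3)
w_cc_t1 = squared_sine_bell(50, 2)
```

Larger SSB values keep more weight on the first points (better sensitivity), while values near 1 suppress them and emphasize resolution.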
The 15N-edited CC experiments in Figure 5C were acquired using 50 points in t1 with a spectral width of 12569.131 Hz in t1. The spectra were processed using a squared sine-bell window function (SSB = 2) in t2 and t1, with zero filling to 2K points in t2 and to 128 points in t1.
The DQSQ CC 2D dataset in Figure S1C was acquired using 110 t1 points with a spectral window of 46656.176 Hz in t1. The spectrum was processed using an EM window function with a line broadening of 100 Hz in t2 and t1, zero filling to 2K and 1K points in t2 and t1, respectively, and linear prediction with 8 coefficients in t1.
The PARIS CC 2D dataset in Figure S2 was acquired using 221 t1 points with a spectral window of 36982.246 Hz in t1. The spectrum was processed using an EM window function with a line broadening of 100 Hz in both t1 and t2, zero filling to 1K and 2K points in t1 and t2, respectively, and linear prediction with 4 coefficients in t1.
The HC HETCOR 2D dataset in Figure S4 was acquired using 92 t1 points with a spectral window of 7142.86 Hz in t1. The spectrum was processed using a squared sine-bell window function (SSB = 2) in both t1 and t2, zero filling to 1K and 2K points in t1 and t2, respectively, and linear prediction with 40 coefficients in t1.
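Linear prediction extends a truncated indirect-dimension FID before zero filling, and the quoted coefficient counts (4, 8, or 40) set the prediction order. A minimal forward-prediction sketch using NumPy least squares; processing packages add refinements (for example, root reflection or backward prediction) that are omitted here, and `forward_lp` is a hypothetical helper.

```python
import numpy as np


def forward_lp(fid, n_coeff, n_extra, rcond=1e-10):
    """Extend a 1D FID by n_extra points with forward linear prediction.

    The coefficients a_k solve x[n] ~ sum_k a_k * x[n - k] (k = 1..n_coeff)
    in the least-squares sense over all complete windows of the measured FID.
    """
    x = np.asarray(fid, dtype=complex)
    if len(x) <= n_coeff:
        raise ValueError("FID must be longer than the number of coefficients")
    # Each row holds the n_coeff preceding points, most recent first.
    rows = np.array([x[n - n_coeff:n][::-1] for n in range(n_coeff, len(x))])
    targets = x[n_coeff:]
    # rcond discards near-zero singular values, which keeps the fit stable.
    coeffs, *_ = np.linalg.lstsq(rows, targets, rcond=rcond)
    out = list(x)
    for _ in range(n_extra):
        out.append(np.dot(coeffs, out[-1:-n_coeff - 1:-1]))
    return np.array(out)
```

For a noise-free decaying complex exponential the prediction is essentially exact; on real data, the order is a compromise between modelling all signals and fitting noise.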
The NCOCX experiments in Figure S6B (800 MHz DNP) were acquired using 15 t1 points with a spectral width of 3333.33 Hz in t1. The spectra were processed using a squared sine-bell window function (SSB = 2.5) in both t1 and t2, with zero filling to 4K and 1K points in t2 and t1, respectively.
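The spectral widths and point counts above fix the t1 sampling interval (dwell time = 1/SW), the maximum evolution time, and the digital resolution after zero filling. A short sketch, assuming each t1 point is one complex increment (whether the quoted counts are complex or real points is not stated); the function name is hypothetical.

```python
def indirect_dimension_summary(n_points, spectral_width_hz, zero_fill_to):
    """Return (dwell time in s, maximum evolution time in s, digital resolution in Hz/point)."""
    if n_points < 1 or spectral_width_hz <= 0 or zero_fill_to < n_points:
        raise ValueError("invalid acquisition or processing parameters")
    dwell_s = 1.0 / spectral_width_hz           # t1 increment
    max_evolution_s = (n_points - 1) * dwell_s  # last sampled t1 value
    resolution_hz = spectral_width_hz / zero_fill_to
    return dwell_s, max_evolution_s, resolution_hz


# PARIS dataset in Figure S2: 221 t1 points, 36982.246 Hz, zero filled to 1K (1024) points.
dwell, t1_max, res = indirect_dimension_summary(221, 36982.246, 1024)
print(f"dwell {dwell * 1e6:.2f} us, t1 max {t1_max * 1e3:.2f} ms, {res:.1f} Hz/point")
```

Under that assumption the PARIS t1 dimension has a dwell time of about 27 µs, a maximum evolution time of about 5.95 ms, and roughly 36 Hz per point after zero filling; zero filling only interpolates and does not add true resolution.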
Figure S1. Temperature Dependence of One- and Two-Dimensional ssNMR Experiments Using [13C,15N]-Labeled A431 Plasma Membrane Vesicles with and without EGF, Related to Figure 4
(A) 13C CP experiment (cross polarization, which probes the rigid parts of the sample; Pines et al., 1973) on [13C,15N]-labeled A431 plasma membrane vesicles without EGF at 253 K (blue) and 285 K (orange).
(B) INEPT-based experiment (see Morris and Freeman, 1979), which probes the mobile parts of the sample, on [13C,15N]-labeled A431 plasma membrane vesicles without EGF at 253 K (blue) and 285 K (orange).
(C) 2D (13C,13C) double-quantum/single-quantum (DQSQ) experiment with (red) and without (blue) EGF performed at 253 K.
(D) First increment of the 2D NCα experiment on [13C,15N]-labeled A431 plasma membrane vesicles without EGF (blue at 253 K, orange at 285 K) and with EGF (red at 253 K, green at 285 K).
Figure S2. Comparison of ssNMR Spectra of [13C,15N]-Labeled A431 Membrane Vesicles at Low Temperatures to EGFR Chemical-Shift Predictions, Related to Figure 4
The 2D (13C,13C) PARIS experiment was performed at 253 K. Black crosses represent FANDAS (Gradmann et al., 2012) predictions for EGFR based on the different available structures, assuming random-coil chemical shifts for the C-terminal region (CT). Note that the peaks at 70 ppm stem from lipids. As mentioned in the Materials and Methods, EGFR samples were prepared using unlabeled glutamine, tryptophan, and cysteine; these residues were therefore not included in the FANDAS correlation map. FANDAS predictions were based on the following structures: 1NQL (extracellular domain, inactive), 2M20 (transmembrane domain), 2M20 (juxtamembrane segment), 1M14 (kinase domain), and 1M14 (part of the C-terminal tail).
Figure S3. ssNMR Signal Patterns for Extended Measurement Periods, Related to Figure 4
1D 13C CP and INEPT spectra of [13C,15N]-labeled A431 vesicles with and without EGF, recorded during the course of the 2D experiments. At the end of the measurements (day 16), both samples showed the same profile as at the beginning of the measurements. Data were recorded on a 700 MHz NMR spectrometer.
Figure S4. Mobile Molecules Appear at Higher Temperature in 2D ssNMR Data, Related to Figure 4
2D INEPT experiment (see Andronesi et al., 2005) on [13C,15N]-labeled A431 membrane vesicles without EGF performed at 285 K, showing mobile molecular components.
Figure S5. Secondary-Structure Analysis of EGFR, Actin, and EGFR Domains, Related to Figures 4 and 5
(A) Comparison of the distribution of Ser, Thr, Pro, and Ala residues in different secondary structures between EGFR (red) and Actin (blue). The y axis represents the number of each amino acid in the corresponding secondary structure.
(B) Heatmaps of the distribution of Ala, Pro, Ser, and Thr residues in EGFR for the three secondary-structure elements (α-helix, β-strand, and random coil). Red and green indicate the highest and lowest numbers of occurrences, respectively.
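Panels (A) and (B) rest on tallying Ala, Pro, Ser, and Thr residues by secondary-structure class. A sketch of that counting, assuming a per-residue secondary-structure string such as DSSP output reduced to three states (H/G/I as helix, E/B as strand, everything else as coil); the toy sequence in the test is neither EGFR nor Actin.

```python
from collections import Counter

TARGET_RESIDUES = set("APST")
HELIX_CODES = set("HGI")   # DSSP alpha-, 3-10 and pi-helix
STRAND_CODES = set("EB")   # DSSP strand and isolated bridge


def three_state(code):
    """Reduce a DSSP code to alpha-helix, beta-strand or random coil."""
    if code in HELIX_CODES:
        return "alpha-helix"
    if code in STRAND_CODES:
        return "beta-strand"
    return "random coil"


def count_by_structure(sequence, ss_string):
    """Count Ala, Pro, Ser and Thr residues per three-state secondary-structure class."""
    if len(sequence) != len(ss_string):
        raise ValueError("sequence and secondary-structure string differ in length")
    counts = Counter()
    for aa, code in zip(sequence.upper(), ss_string.upper()):
        if aa in TARGET_RESIDUES:
            counts[(aa, three_state(code))] += 1
    return counts
```

The resulting (residue, class) counts map directly onto the bars in (A) and the cells of the heatmap in (B).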
Figure S6. Sequential Correlations Predicted for Actin in the MFTL-Labeled A431 Membrane Vesicles and High-Field DNP Data, Related to Figure 5
(A) The three expected sequential correlations of Actin in the MFTL-labeled A431 membrane vesicles.
(B) 2D NCOCX of MFTL-labeled A431 vesicles with (cyan) and without (orange) EGF recorded on an 800 MHz DNP spectrometer (Koers et al., 2014). Dotted lines connect the Cβ region of Phe in both spectra.
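Which sequential NCOCX cross peaks to expect under amino-acid-selective labeling follows directly from the sequence: the N(i+1) to CO(i) to Cx(i) transfer needs both neighbouring residues to carry labels. A sketch, assuming "MFTL" denotes 13C,15N-labeled Met, Phe, Thr, and Leu and ignoring metabolic scrambling; the sequence in the test is a toy example, not the Actin sequence.

```python
def sequential_pairs(sequence, labeled="MFTL"):
    """Return 1-based (i, residue_i, residue_i+1) where both neighbours are labeled.

    In an NCOCX experiment the sequential cross peak links N of residue i+1
    to CO and side-chain carbons of residue i, so it is only expected when
    both residues carry 13C/15N labels.
    """
    labeled_set = set(labeled.upper())
    seq = sequence.upper()
    return [(i + 1, seq[i], seq[i + 1])
            for i in range(len(seq) - 1)
            if seq[i] in labeled_set and seq[i + 1] in labeled_set]
```

Run on a full protein sequence, the function lists candidate labeled pairs; each hit is a possible sequential cross peak in the NCOCX spectrum.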
Article
Correspondence
jfliu@mail.hust.edu.cn (J.L.),
shawnxu@umich.edu (X.Z.S.X.)
In Brief
A taste receptor homolog absorbs UV
light and mediates avoidance behavior in
C. elegans in response to light exposure.
Highlights
• LITE-1, a taste receptor homolog, is a bona fide photoreceptor that senses UV light
of MOE, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
2 Life Sciences Institute and Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, MI 48109, USA
3 Departments of Structural Biology and Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
4 Department of Pharmacology, Case Western Reserve University, Cleveland, OH 44106, USA
5 Present address: Institute of Neuroscience, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
6 Lead Contact
Cell 167, 1252–1263, November 17, 2016 © 2016 Elsevier Inc.
Figure 1. LITE-1 Adopts an Unusual Membrane Topology, with Its C Terminus Facing Extracellularly and N Terminus Located Intracellularly
[Figure 1 image: (A) membrane topology schematic with the αN-LITE-1 and αC-LITE-1 epitopes; (B) GFP, antibody staining, and merged confocal images under non-permeabilized (surface) and permeabilized staining for αC-LITE-1, αN-LITE-1, and αN-Myc; (C) YFP, DsRed, and merged images for N-YFP::ZIP::LITE-1 + C-YFP::ZIP + DsRed and N-YFP::ZIP::LITE-1 + C-YFP::ΔZIP + DsRed]
(A) A schematic of LITE-1 membrane topology. Antibodies were raised against the N-terminal (αN-LITE-1) and C-terminal (αC-LITE-1) peptides (15 aa) of the LITE-1 long isoform.
(B) LITE-1 displays a distinct membrane topology, with its C terminus facing extracellularly and its N terminus located in the cytoplasm. Shown are confocal images from immunofluorescence staining. LITE-1 was co-expressed with GFP as a transgene in muscles under the myo-3 promoter. Staining was performed on primary cultured cells under non-permeabilizing conditions for surface staining or under permeabilizing conditions to stain the entire cell. αN-LITE-1 and αC-LITE-1 detect the N- and C-terminal ends of LITE-1, respectively. αN-Myc stains the Myc tag fused to the N-terminal end of LITE-1. See Figure S1 for controls. Scale bar, 2 µm.
(C) BiFC images showing that the N terminus of LITE-1 is located in the cytoplasm. Shown on the left are schematics describing the design of the BiFC approach. Shown on the right are fluorescence images. N-YFP::ZIP::LITE-1 was expressed as a transgene in muscles using the myo-3 promoter. C-YFP::ZIP (or C-YFP::ΔZIP, which lacks a zipper domain) and DsRed were co-expressed as a separate transgene in muscles using the same promoter. The two transgenes were crossed together to examine reconstitution of YFP fluorescence in muscles. YFP fluorescence can be detected only if the N terminus of LITE-1 is located intracellularly. Muscles were acutely dissected from transgenic worms using a protocol described previously (Liu et al., 2013). Scale bar, 100 µm.
Also see Figure S1.
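The topology argument in (B) and (C) is a simple decision rule: an epitope detected without permeabilization must face outward, and BiFC signal requires the N terminus to reach the cytoplasmic C-YFP::ZIP partner. A sketch of that logic; the function names are hypothetical, the permeabilized-staining check stands in for the antibody controls mentioned in the legend, and the ΔZIP partner is treated here as a strict negative control.

```python
def infer_termini(surface, permeabilized):
    """Assign each terminus to a side of the membrane from staining results.

    surface / permeabilized map 'N' or 'C' to True if the terminus-specific
    antibody stained non-permeabilized or permeabilized cells, respectively.
    """
    result = {}
    for end, seen_total in permeabilized.items():
        if not seen_total:
            result[end] = "undetermined"   # antibody failed even with full access
        elif surface.get(end, False):
            result[end] = "extracellular"  # reachable without permeabilization
        else:
            result[end] = "intracellular"  # only reachable after permeabilization
    return result


def bifc_expected(n_terminus_side, partner_has_zipper=True):
    """YFP reconstitution needs the N-YFP fragment on LITE-1's N terminus to meet
    the cytoplasmic C-YFP::ZIP partner; the C-YFP::ΔZIP control lacks the zipper."""
    return n_terminus_side == "intracellular" and partner_has_zipper
```

The two assays are independent readouts: the staining rule localizes both termini, while BiFC confirms the cytoplasmic N terminus in intact muscle.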
as a chemoreceptor (Yau and Hardie, 2009). In this case, LITE-1 would sense light-produced chemicals, but not light per se. To address this conundrum, here we purified LITE-1 protein from worm lysate and found that it directly absorbs UVA and UVB light. This property of LITE-1, together with its capacity to produce light-evoked functional outputs in vivo, indicates that LITE-1 is a photoreceptor. LITE-1 bears a number of unique features that distinguish it from other photoreceptors. These include an exceptionally high efficiency in photoabsorption, an ability to sense both UVA and UVB light, a strict dependence on conformation for photoabsorption, a strong resistance to bleaching by UV light, and a reversed membrane topology compared to opsins. These results identify LITE-1, a taste receptor homolog, as a unique photoreceptor with features not seen in any known photoreceptor. Thus, novel types of photoreceptors are present in the animal kingdom. Furthermore, we identified two tryptophan residues in LITE-1 that are critical for photoabsorption. Remarkably, introducing such a tryptophan residue into another GR family member promotes photosensitivity, opening up the intriguing prospect that it might be possible to genetically engineer new photoreceptors.
RESULTS
LITE-1 Adopts a Membrane Topology Opposite to Conventional 7-TM Receptors
As a first step, we considered whether LITE-1 is related to any known photoreceptors. LITE-1 is predicted to contain 7-TM domains (Figure 1A). The only known 7-TM photoreceptors in metazoans are opsins, but LITE-1 has no significant homology with opsins at the sequence level (Edwards et al., 2008; Liu et al., 2010). As both insect OR (olfactory receptor) and GR (gustatory receptor) members were shown to possess a membrane topology opposite to that of conventional 7-TM receptors (Benton et al., 2006; Zhang et al., 2011), we questioned whether LITE-1 and opsins are even related at the membrane topology level.
absorbance peak of the purified Rho was 500 nm, identical to that published in the literature (Figure S3D versus Figure 2I). This set of control experiments also validated our experimental system, including protein expression, purification, concentration determination, and spectral analysis.
In another control experiment, we purified the mammalian adenosine A2A receptor (A2AR) ectopically expressed in worm muscles (Salom et al., 2012) (Figures S3A and S3B). Like LITE-1 and opsins, A2AR is a 7-TM receptor, but it is not expected to be photosensitive. Indeed, as predicted, this receptor did not absorb light when purified and tested side by side with LITE-1 and Rho (Figure S3C). Thus, multiple control experiments support the conclusion that LITE-1 absorbs photons and does so at a high efficiency. This property of LITE-1, together with its capacity to produce various light-induced functional outputs [e.g., light-induced muscle contraction, calcium transients, and avoidance behavior (Figures 2A–2D and S2; Movies S1 and S2)], indicates that LITE-1 is a photoreceptor. LITE-1 is also the only photoreceptor that shows strong absorption of both UVA and UVB light.
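The claim that LITE-1 absorbs photons at a high efficiency is quantified through its molar extinction coefficient, obtained from the Beer-Lambert law, A = εcl. A sketch that recovers ε from absorbance and concentration and compares it with a composition-based estimate. The per-residue values (about 5,500 M⁻¹cm⁻¹ for Trp and 1,490 M⁻¹cm⁻¹ for Tyr at 280 nm) are standard literature approximations assumed here, and the absorbance of 0.72 is hypothetical, chosen to match the roughly 4 × 10⁶ M⁻¹cm⁻¹ reported for the LITE-1 mutants at the 0.18 µM concentration used in the photobleaching experiment, with a 1 cm path assumed.

```python
# Approximate per-residue molar absorption coefficients at 280 nm (M^-1 cm^-1);
# standard literature values assumed for this sketch, not taken from the article.
EPS_280 = {"W": 5500.0, "Y": 1490.0}


def extinction_coefficient(absorbance, conc_molar, path_cm=1.0):
    """Solve the Beer-Lambert law A = epsilon * c * l for epsilon (M^-1 cm^-1)."""
    if conc_molar <= 0 or path_cm <= 0:
        raise ValueError("concentration and path length must be positive")
    return absorbance / (conc_molar * path_cm)


def composition_estimate(n_trp, n_tyr):
    """Sequence-based epsilon_280 from aromatic residue counts (cystines ignored)."""
    return n_trp * EPS_280["W"] + n_tyr * EPS_280["Y"]


measured = extinction_coefficient(0.72, 0.18e-6)   # hypothetical A280, 0.18 uM, 1 cm path
estimate = composition_estimate(n_trp=6, n_tyr=0)  # six Trp per the text; Tyr count not given
fold_excess = measured / estimate
```

Even counting all six tryptophans of LITE-1, the composition-based estimate is about 33,000 M⁻¹cm⁻¹, more than 100-fold below a coefficient of 4 × 10⁶ M⁻¹cm⁻¹, which is one way to put the reported efficiency in perspective.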
LITE-1 Strictly Depends on Its Conformation for Photoabsorption
We next sought to characterize the photoabsorption of LITE-1. A photoreceptor is usually composed of two moieties: a host protein and a prosthetic chromophore (Falciatore and Bowler, 2005; Wang and Montell, 2007; Yau and Hardie, 2009). The spectral properties of a photoreceptor are certainly affected by the host protein. However, the absolute ability of a photoreceptor to absorb light does not rely on the host protein, as light absorption is mediated by the chromophore (e.g., retinal, flavin, bilin, and p-coumaric acid) (Falciatore and Bowler, 2005; Marti et al., 1991; Radding and Wald, 1956). Consequently, denaturing a photoreceptor usually shifts its absorbance peaks to different wavelengths but does not eliminate them, as they are mediated by the associated chromophore (Dutta et al., 2010; Hagins, 1973; Hubbard, 1969; Maglova et al., 1989). Surprisingly, this does not appear to be the case for LITE-1. Denaturing LITE-1 with urea abolished its light absorption, eliminating both the 280 and 320 nm peaks (Figure 3A). By comparison, the same urea treatment failed to abolish light absorption by bacterial rhodopsin (bRho) but instead shifted its absorbance peak from 568 nm to 370 nm (Figure 3B), the latter being the signature peak of free retinal, the chromophore of bRho (Sperling and Rafferty, 1969). A similar phenomenon was observed with our purified bovine rhodopsin (Rho) (Figures S3E and S3F). Notably, the 280 nm peak of denatured bRho remained unchanged (Figure 3B), consistent with the notion that this peak is mediated by the intrinsic light absorption of tryptophan residues in the bRho protein. This peak was less distinct in denatured LITE-1 in Figure 3A because the concentration of LITE-1 used was one-tenth that of bRho. We also treated LITE-1 with other denaturing agents such as NaOH and observed a similar phenomenon (Figures 3C and 3D). These observations demonstrate that, unlike typical photoreceptors, LITE-1 strictly depends on its conformation for photoabsorption.
We also tested H2O2. Interestingly, H2O2 treatment abolished LITE-1's photoabsorption (Figure S4A). As an oxidizing agent, H2O2 can damage the function of proteins, lipids, and nucleic acids (Fridovich, 2013). Oxidation of LITE-1 may affect its conformation, which is required for its absorption of light. Similarly, H2O2 treatment also destroyed the spectral fingerprint of bRho by shifting its absorbance peak from 568 to 370 nm (Figure S4B). Thus, H2O2 appears to inhibit the photoabsorption of both LITE-1 and bRho in vitro. Nevertheless, as it is difficult to estimate the endogenous concentration of H2O2, whether and how H2O2 affects LITE-1 function in vivo remains to be determined.
Genetic Screens Identify Residues Critical for LITE-1 Function
To obtain a better understanding of LITE-1 photoabsorption, we attempted to identify residues critical for LITE-1 function. In a genetic screen for mutant animals defective in UV-light-induced avoidance behavior, we isolated several lite-1 mutant alleles (Liu et al., 2010). We hypothesized that mutations in transmembrane domains are more likely to affect the photoabsorption of LITE-1 than its coupling to downstream signaling molecules. Two mutants, lite-1(xu8) and lite-1(xu10), thus came to our attention, as the mutated residues (A332V and S226F, respectively) reside in putative transmembrane domains (Figure 7I). The objective was to purify these mutant forms of LITE-1 protein and then characterize their photoabsorption in vitro. We first tested their role in vivo and found that, as
expected, A332V and S226F mutations disrupted LITE-1 function in vivo. Specifically, worms ectopically expressing LITE-1 harboring either mutation were no longer sensitive to UVA light in behavioral assays (Figures 4A and S5A). In addition, these two point mutations nearly abolished UVA-light-evoked calcium transients in muscle cells ectopically expressing LITE-1 (Figures 4B–4E). We successfully purified LITE-1A332V and LITE-1S226F proteins to homogeneity (Figures 4F and 4G). LITE-1A332V and LITE-1S226F displayed absorbance spectra distinct from that of wild-type LITE-1: both lost the 320 nm peak but retained normal absorption at 280 nm (Figures 5A and 5B). Thus, the two mutations disrupted LITE-1's absorption of UVA but not UVB light. This is consistent with the design of our genetic screen, which targeted mutants defective in responding to UVA but not UVB light, since the optical system of the microscope used to evoke and assay phototaxis behavior did not transmit UVB light (Liu et al., 2010).
Given that LITE-1A332V and LITE-1S226F proteins retained normal absorption of UVB light in vitro, one would predict that these two mutant forms of LITE-1 preserve sensitivity to UVB light in vivo. To test this idea, we set up an optical path that delivered UV light directly to the worm. Indeed, although transgenic worms expressing these two mutant forms of LITE-1 were insensitive to UVA light (Figures 4A and S5A), they were nevertheless sensitive to UVB light (Figures 5C and S5B). In addition, as with wild-type LITE-1, UVB light induced robust calcium transients in muscle cells ectopically expressing these two mutant forms of LITE-1 (Figures 5D–5H). These results are in line with the spectral analysis (Figures 5A and 5B). Thus, the absorption of UVA and UVB light by LITE-1 appears to be separable, providing further evidence for the specificity of LITE-1 photoabsorption.
LITE-1 Absorption of UVB but Not UVA Light Shows Resistance to Photobleaching
Prolonged light illumination bleaches photoreceptors (Wang and Montell, 2007; Yau and Hardie, 2009). We tested this property of LITE-1 and found that pre-exposure to UV light readily bleached LITE-1's ability to absorb UVA light, eliminating its 320 nm peak (Figure 5I). Surprisingly, such treatment spared the 280 nm peak (Figure 5I), indicating that the ability of LITE-1 to capture UVB light is more stable and relatively resistant to photobleaching. This experiment reveals an additional feature that distinguishes LITE-1's absorption of UVA and UVB light.
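Which mutations an EMS screen can recover is easy to check on codons. EMS mainly causes G:C to A:T transitions, so on the coding strand a G can become A and a C can become T. A sketch, assuming for illustration that A332 and S226 are encoded by GCT and TCC (the actual lite-1 codons are not given here); the small codon table covers only the codons used and follows the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Minimal codon table covering only the codons used below (standard genetic code).
CODON_TO_AA = {
    "TGG": "W", "TAG": "*", "TGA": "*",
    "GCT": "A", "ACT": "T", "GTT": "V",
    "TCC": "S", "TTC": "F", "TCT": "S",
}


def ems_transitions(codon):
    """All single-base EMS-type changes of a codon, read on the coding strand.

    G -> A on the coding strand, and C -> T on the coding strand
    (a G -> A change on the template strand).
    """
    swap = {"G": "A", "C": "T"}
    return [codon[:i] + swap[base] + codon[i + 1:]
            for i, base in enumerate(codon) if base in swap]


for codon in ("TGG", "GCT", "TCC"):
    outcomes = [CODON_TO_AA[m] for m in ems_transitions(codon)]
    print(codon, CODON_TO_AA[codon], "->", outcomes)
# prints:
# TGG W -> ['*', '*']
# GCT A -> ['T', 'V']
# TCC S -> ['F', 'S']
```

Both possible hits on the single tryptophan codon TGG yield stop codons (TAG, TGA), consistent with the article's argument that the screen would not recover tryptophan missense alleles, whereas Ala to Val and Ser to Phe are each reachable by a single EMS-type change.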
[Figure 5 image: (A and B) absorbance (O.D.) versus wavelength (250–700 nm); (C) body paralysis (%) in response to UVB; (D–G) UVB-evoked RCaMP calcium traces in muscles of lite-1 transgenic worms; (H) bar graph; (I) absorbance of LITE-1 (0.18 µM) after UV pre-exposure]
(A and B) S226F and A332V mutations disrupt LITE-1's absorption of UVA but not UVB light in vitro. The extinction coefficients at 280 nm for LITE-1A332V and LITE-1S226F are 4.0 × 10⁶ M⁻¹cm⁻¹ and 3.75 × 10⁶ M⁻¹cm⁻¹, respectively, similar to that of wild-type LITE-1 (Figure 2I).
(C) S226F and A332V mutations do not disrupt the sensitivity of LITE-1 to UVB light in vivo, as shown by behavioral assays. LITE-1 harboring S226F or A332V was expressed as a transgene in muscle cells under the myo-3 promoter. WT (wild-type) …
… muscles to elicit calcium transients. Shades along the traces in (D–G) represent SEM.
(H) Bar graph. n ≥ 10. ***p < 0.00001 (ANOVA with Bonferroni test).
(I) LITE-1 absorption of UVB but not UVA light shows resistance to photobleaching. LITE-1 was pre-exposed to UV light for 5 min (17 mW/mm², 302 nm) at room temperature prior to spectrophotometric analysis. Pre-exposure to UV light for 30 min still did not notably affect the UVB photo-
Two Tryptophan Residues Are Required for LITE-1 Function
Our success in identifying residues critical for LITE-1's absorption of UVA light encouraged us to explore what may underlie its absorption of UVB light. Tryptophan residues show intrinsic absorption of UVB light, peaking at 280 nm. It is also known that light absorption by tryptophan is quite resistant to photobleaching (Wu et al., 2008). These two features together led us to ask whether tryptophan residues in LITE-1 mediate its absorption of UVB light. Six tryptophan residues are found in LITE-1 (Figure 7I). However, even if any of these tryptophan residues were important for LITE-1 function, they would not be expected to be picked up by our genetic screen, as the mutagen used in the screen (EMS) would typically convert a tryptophan codon into a stop codon rather than generate a missense mutation.
Therefore, to test this hypothesis, we mutated each of the six tryptophan residues to alanine through site-directed mutagenesis and expressed the corresponding mutant forms of LITE-1 as transgenes in muscle cells. We first examined their function in vivo. Two tryptophan residues, W77 and W328, when mutated to alanine, abolished the sensitivity of LITE-1 to UVA light in vivo in behavioral assays (Figures 6A and S6A), whereas mutating the other four tryptophan residues had no notable effect (Figure 6A). We obtained a similar result when mutating W77 and W328 to phenylalanine (F) (Figure 6A). Furthermore, the two tryptophan mutations W77F and W328F nearly eliminated UVA-light-induced calcium transients in muscle cells ectopically expressing LITE-1 (Figures 6C–6F). These data identify a critical role for W77 and W328 in LITE-1 function in vivo.
Lastly, we purified the two mutant forms of LITE-1, LITE-1W77F and LITE-1W328F, to homogeneity (Figures 4F and 4G) and examined their photoabsorption in vitro. Strikingly, the W77F and W328F mutations not only abolished LITE-1's absorption of UVA light at 320 nm, but also nearly eliminated its absorption of UVB light at
GUR-3 (Figures 7C, 7D, and S7C–S7E). We then mutated residue Y79 in GUR-3, which corresponds to W77 in LITE-1, to W (i.e., GUR-3Y79W) (Figure S7A). Strikingly, worms ectopically expressing the tryptophan-bearing GUR-3Y79W became very sensitive to UVB light (Figures 7A and S7G). UVA light was not as effective on these worms (Figures S7B–S7F). This result was expected, as UVA absorption by LITE-1 apparently requires additional key elements such as residues S226 and A332 and perhaps others (Figures 4A–4E). We also examined UVB-light-evoked calcium transients in muscle cells and found that ectopic expression of the tryptophan-bearing GUR-3Y79W greatly potentiated the UVB-light-induced calcium response in these cells (Figures 7B–7D). Thus, introducing a tryptophan residue into GUR-3 promotes photosensitivity.
Having characterized the photosensitivity of GUR-3Y79W and GUR-3 in vivo, we then purified both proteins to homogeneity (Figures 7E and 7F) and examined their photoabsorption in vitro (Figures 7G and 7H). As expected, GUR-3 showed little absorption of UVB light (Figure 7H). By contrast, strong absorption of UVB light at 280 nm was observed for GUR-3Y79W (Figure 7G). The extinction coefficient of this tryptophan-bearing GUR-3Y79W protein reached the order of 10⁶ M⁻¹cm⁻¹ (1.03 × 10⁶ M⁻¹cm⁻¹), about one-third of that found for LITE-1. These data provide a biochemical basis for the observed photosensitivity of GUR-3Y79W. This set of experiments also raises the intriguing prospect that it might be possible to genetically engineer new photoreceptors.
DISCUSSION
In summary, our results demonstrate that the C. elegans taste receptor homolog LITE-1 is a bona fide photoreceptor. As some photoreceptors are multifunctional (for example, Drosophila rhodopsin also responds to heat; Shen et al., 2011), it remains
insects (Clyne et al., 2000; Liu et al., 2010; Scott et al., 2001). Some of them in fact do not act as chemoreceptors (Thorne and Amrein, 2008). For example, Drosophila Gr28b(d) encodes a thermosensor (Ni et al., 2013), while another Gr28b isoform has been implicated in UV-light-induced avoidance behavior (Xiang et al., 2010). The inverted membrane topology makes it unlikely that LITE-1 functions as a GPCR. Interestingly, LITE-1 can functionally interact with G protein signaling (Liu et al., 2010), but given its atypical topology, this interaction is likely to be indirect (Liu et al., 2010). It is also unclear whether LITE-1 possesses ion channel activity like some OR and GR members. At the sequence level, no clear mammalian LITE-1 homologs could be identified. This, however, does not necessarily imply a lack of LITE-1 orthologs in mammals, as 7-TM receptors tend to share limited homology even within the same subfamily. In fact, 7-TM receptors with a reversed membrane topology are present in the mammalian genome (Iwabu et al., 2010). For example, the
Supplemental Information includes seven figures and two movies and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.053.
AUTHOR CONTRIBUTIONS
J.G. performed the experiments and analyzed the data. Y.Y., B.Z., Z.W., J.P., and Z.F. assisted J.G. in performing the experiments. A.W. initiated the project and generated reagents. L.K. performed immunostaining on primary cultured muscle cells and analyzed the data. J.G., J.L., and X.Z.S.X. wrote the paper.
ACKNOWLEDGMENTS
We thank Tom Kerppola for BiFC plasmids; David Salom and Kris Palczewski for technical assistance and providing strains; Zhaohui Xu for helpful discussions; and Wenyuang Zhang, Jiejun Zhou, John Tesmer, Frederick Stull, and James Bardwell for technical assistance. Some strains were obtained from the CGC. This work utilized the Core Center for Vision Research funded by P30 EY007003 from the NEI. A.W. was supported by a predoctoral training grant from the NEI (T32EY013934). This work was supported by the NSFC (31130028, 31225011, and 31420103909 to J.L.), the Program of Introducing
Bellono, N.W., Kammel, L.G., Zimmerman, A.L., and Oancea, E. (2013). UV light phototransduction activates transient receptor potential A1 ion channels in human melanocytes. Proc. Natl. Acad. Sci. USA 110, 2383–2388.
Benton, R., Sachse, S., Michnick, S.W., and Vosshall, L.B. (2006). Atypical membrane topology and heteromeric function of Drosophila odorant receptors in vivo. PLoS Biol. 4, e20.
Bhatla, N., and Horvitz, H.R. (2015). Light and hydrogen peroxide inhibit C. elegans Feeding through gustatory receptor orthologs and pharyngeal neurons. Neuron 85, 804–818.
Christensen, M., Estevez, A., Yin, X., Fox, R., Morrison, R., McDonnell, M., Gleason, C., Miller, D.M., 3rd, and Strange, K. (2002). A primary culture system for functional analysis of C. elegans neurons and muscle cells. Neuron 33, 503–514.
Christie, J.M., Arvai, A.S., Baxter, K.J., Heilmann, M., Pratt, A.J., O’Hara, A., Kelly, S.M., Hothorn, M., Smith, B.O., Hitomi, K., et al. (2012). Plant UVR8 photoreceptor senses UV-B by tryptophan-mediated disruption of cross-dimer salt bridges. Science 335, 1492–1496.
Clyne, P.J., Warr, C.G., and Carlson, J.R. (2000). Candidate taste receptors in Drosophila. Science 287, 1830–1834.
de Bono, M., and Maricq, A.V. (2005). Neuronal substrates of complex behaviors in C. elegans. Annu. Rev. Neurosci. 28, 451–501.
Dutta, A., Kim, T.Y., Moeller, M., Wu, J., Alexiev, U., and Klein-Seetharaman, J. (2010). Characterization of membrane protein non-native states. 2. The SDS-unfolded states of rhodopsin. Biochemistry 49, 6329–6340.
Edwards, S.L., Charlie, N.K., Milfort, M.C., Brown, B.S., Gravlin, C.N., Knecht, J.E., and Miller, K.G. (2008). A novel molecular solution for ultraviolet light detection in Caenorhabditis elegans. PLoS Biol. 6, e198.
Falciatore, A., and Bowler, C. (2005). The evolution and function of blue and red light photoreceptors. Curr. Top. Dev. Biol. 68, 317–350.
Foster, R.G., and Soni, B.G. (1998). Extraretinal photoreceptors and their regulation of temporal physiology. Rev. Reprod. 3, 145–150.
Fridovich, I. (2013). Oxygen: how do we stand it? Med. Princ. Pract. 22, 131–137.
Hagins, F.M. (1973). Purification and partial characterization of the protein component of squid rhodopsin. J. Biol. Chem. 248, 3298–3304.
Hu, C.D., Chinenov, Y., and Kerppola, T.K. (2002). Visualization of interactions among bZIP and Rel family proteins in living cells using bimolecular fluorescence complementation. Mol. Cell 9, 789–798.
Hubbard, R. (1969). Absorption spectrum of rhodopsin: 500 nm absorption band. Nature 221, 432–435.
Insinna, C., Daniele, L.L., Davis, J.A., Larsen, D.D., Kuemmel, C., Wang, J., Nikonov, S.S., Knox, B.E., and Pugh, E.N., Jr. (2012). An S-opsin knock-in mouse (F81Y) reveals a role for the native ligand 11-cis-retinal in cone opsin biosynthesis. J. Neurosci. 32, 8094–8104.
Iwabu, M., Yamauchi, T., Okada-Iwabu, M., Sato, K., Nakagawa, T., Funata, M., Yamaguchi, M., Namiki, S., Nakayama, R., Tabata, M., et al. (2010). Adiponectin and AdipoR1 regulate PGC-1alpha and mitochondria by Ca(2+) and AMPK/SIRT1. Nature 464, 1313–1319.
Kolesnikov, A.V., Kisselev, O.G., and Kefalov, V.J. (2014). Signaling by Rod and Cone Photoreceptors: Opsin Properties, G-protein Assembly, and Mech-
Maglova, L., Atanasov, B., and Keszthelyi, L. (1989). Unfolding of monomeric bacteriorhodopsin in water-urea solution. Biochim. Biophys. Acta 975, 271–276.
Marti, T., Rösselet, S.J., Otto, H., Heyn, M.P., and Khorana, H.G. (1991). The retinylidene Schiff base counterion in bacteriorhodopsin. J. Biol. Chem. 266, 18674–18683.
Matsuyama, T., Yamashita, T., Imamoto, Y., and Shichida, Y. (2012). Photochemical properties of mammalian melanopsin. Biochemistry 51, 5454–5462.
Moore, C., Cevikbas, F., Pasolli, H.A., Chen, Y., Kong, W., Kempkes, C., Parekh, P., Lee, S.H., Kontchou, N.A., Yeh, I., et al. (2013). UVB radiation generates sunburn pain and affects skin by activating epidermal TRPV4 ion channels and triggering endothelin-1 signaling. Proc. Natl. Acad. Sci. USA 110, E3225–E3234.
Ni, L., Bronk, P., Chang, E.C., Lowell, A.M., Flam, J.O., Panzano, V.C., Theobald, D.L., Griffith, L.C., and Garrity, P.A. (2013). A gustatory receptor paralogue controls rapid warmth avoidance in Drosophila. Nature 500, 580–584.
Oesterhelt, D., and Hess, B. (1973). Reversible photolysis of the purple complex in the purple membrane of Halobacterium halobium. Eur. J. Biochem. 37, 316–326.
Okano, T., Fukada, Y., Shichida, Y., and Yoshizawa, T. (1992). Photosensitivities of iodopsin and rhodopsins. Photochem. Photobiol. 56, 995–1001.
Radding, C.M., and Wald, G. (1956). Acid-base properties of rhodopsin and opsin. J. Gen. Physiol. 39, 909–922.
Rizzini, L., Favory, J.J., Cloix, C., Faggionato, D., O’Hara, A., Kaiserli, E., Baumeister, R., Schäfer, E., Nagy, F., Jenkins, G.I., and Ulm, R. (2011). Perception of UV-B by the Arabidopsis UVR8 protein. Science 332, 103–106.
Salom, D., Cao, P., Sun, W., Kramp, K., Jastrzebska, B., Jin, H., Feng, Z., and Palczewski, K. (2012). Heterologous expression of functional G-protein-coupled receptors in Caenorhabditis elegans. FASEB J. 26, 492–502.
Scott, K., Brady, R., Jr., Cravchik, A., Morozov, P., Rzhetsky, A., Zuker, C., and Axel, R. (2001). A chemosensory gene family encoding candidate gustatory and olfactory receptors in Drosophila. Cell 104, 661–673.
Shen, W.L., Kwon, Y., Adegbola, A.A., Luo, J., Chess, A., and Montell, C. (2011). Function of rhodopsin in temperature discrimination in Drosophila. Science 331, 1333–1336.
Sperling, W., and Rafferty, C.N. (1969). Relationship between absorption spectrum and molecular conformations of 11-cis-retinal. Nature 224, 590–594.
Thompson, C.L., and Sancar, A. (2002). Photolyase/cryptochrome blue-light photoreceptors use photon energy to repair DNA and reset the circadian clock. Oncogene 21, 9043–9056.
Thorne, N., and Amrein, H. (2008). Atypical expression of Drosophila gustatory receptor genes in sensory and central neurons. J. Comp. Neurol. 506, 548–568.
Vought, B.W., Dukkipatti, A., Max, M., Knox, B.E., and Birge, R.R. (1999). Photochemistry of the primary event in short-wavelength visual opsins at low temperature. Biochemistry 38, 11287–11297.
Wang, T., and Montell, C. (2007). Phototransduction and retinal degeneration in Drosophila. Pflugers Arch. 454, 821–847.
Wang, R., Mellem, J.E., Jensen, M., Brockie, P.J., Walker, C.S., Hoerndli, F.J., Hauth, L., Madsen, D.M., and Maricq, A.V. (2012). The SOL-2/Neto auxiliary
Requests for reagents and resources may be directed to the Lead Contact, X.Z. Shawn Xu (shawnxu@umich.edu).
C. elegans strains were maintained at 20°C on nematode growth medium (NGM) plates seeded with OP50 bacteria. Liquid culture was used to produce large quantities of worms for protein purification (see Methods Details). Transgenic lines were generated by injecting plasmid DNA directly into the hermaphrodite gonad. Integrated transgenic strains were outcrossed at least six times before being used for protein purification.
METHODS DETAILS
Molecular biology
All plasmids are listed in the Key Resources Table. All LITE-1 and GUR-3 constructs carry a 1D4 tag at the C terminus, except for the LITE-1 construct in Figure 1, which carries no such tag. A Myc tag was included only in the construct used in Figure 1B. As listed in the Key Resources Table, some plasmids contain an SL2::YFP fragment, which directs expression of YFP as a separate transcript under the control of the same promoter as its upstream gene, in an operon-like fashion. This allows YFP to serve as a co-expression marker in muscle cells, driven by the same muscle-specific myo-3 promoter that drives LITE-1 expression.
Quantification and statistical parameters, including error bars (SEM), n numbers, and p values, are indicated in the legend of each figure. For multiple-group comparisons, we applied ANOVA followed by a post hoc test. We considered p < 0.05 significant.
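The multiple-group comparisons described here start from a one-way ANOVA F statistic. The sketch below computes F with the Python standard library only; the function name `one_way_anova_f` and the example numbers are hypothetical, and the conversion of F to a p value (for example with `scipy.stats.f_oneway`) and the particular post hoc test are not specified in the text, so both are left out.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Illustrative sketch only: computes the F statistic, not the p value.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Between-group sum of squares: group size times squared deviation of each group mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up values for three genotypes (not data from this study)
    print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

A post hoc test would then compare specific pairs of groups only after the overall F test indicates a difference.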
[Figure panels: immunoblots probed with α C-LITE-1, α N-LITE-1, and α N-Myc]
[Figure S2 plot: locomotion speed of lite-1 transgenic worms under UVA light; time scale, 5 s]
Figure S2. Ectopic Expression of LITE-1 as a Transgene in Muscle Cells Confers Photosensitivity, Related to Figure 2
LITE-1 was expressed as a transgene in muscle cells under the myo-3 promoter. Worm locomotion speed was monitored and quantified with the WormLab system (MBF Bioscience). UVA light (350 ± 20 nm, 0.8 mW/mm²) directed onto the worms induced muscle contraction in lite-1 transgenic worms but not in WT worms, paralyzing the former (locomotion speed reduced to zero) but not the latter. To minimize the effect of the endogenous lite-1 gene on locomotion speed under UV light, the experiments were done in the lite-1(xu7) mutant background (i.e., both genotypes carried the lite-1(xu7) mutation). Shades along the traces denote error bars (SEM). n = 25.
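The shaded bands in these locomotion traces represent SEM across worms at each time point. A minimal sketch of that calculation, assuming each worm's speed trace is a list sampled at the same time points; the function name `mean_sem_trace` and the sample speeds are hypothetical and are not WormLab output:

```python
import math
from statistics import fmean, stdev


def mean_sem_trace(traces):
    """Per-time-point mean and SEM across equal-length speed traces (mm/s)."""
    n = len(traces)
    if n < 2:
        raise ValueError("SEM needs at least two traces")
    length = len(traces[0])
    if any(len(t) != length for t in traces):
        raise ValueError("all traces must have the same number of samples")
    means, sems = [], []
    for column in zip(*traces):  # one column = all worms at one time point
        means.append(fmean(column))
        # SEM = sample standard deviation / sqrt(number of worms)
        sems.append(stdev(column) / math.sqrt(n))
    return means, sems


if __name__ == "__main__":
    # Three made-up worms whose speed drops to zero once paralysis sets in
    worms = [[0.12, 0.10, 0.0], [0.14, 0.08, 0.0], [0.10, 0.09, 0.0]]
    print(mean_sem_trace(worms))
```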
[Figure S3 panels: (A) Coomassie-stained gel and (B) blot with α1D4, molecular-weight markers 19–191 kDa; (C, D, G, H) absorbance (O.D.) versus wavelength (250–700 nm)]
Figure S3. Comparison of the Spectral Properties of LITE-1, Bovine Rhodopsin, and Adenosine A2A Receptor Purified from Worm Muscles,
Related to Figure 2
(A and B) LITE-1, Rho, and A2AR were purified side by side from transgenic worms under the same conditions. All transgenes carry a 1D4 tag at the C terminus.
(A) Coomassie staining. (B) Western blot.
(C) LITE-1 shows strong photoabsorption at 0.5 μM, whereas A2AR does not.
(D) Rho shows minimal photoabsorption at 0.5 μM and only modest photoabsorption at a higher concentration (2.7 μM). Note that the y axis scales in (C) and (D) differ.
(E and F) Denaturing LITE-1 with urea abolishes its photoabsorption (E), whereas the same treatment does not eliminate the photoabsorption of Rho and instead shifts its 500 nm absorbance peak to 370 nm (F).
(G and H) Denaturing LITE-1 with NaOH abolishes its photoabsorption (G), whereas the same treatment does not eliminate the photoabsorption of Rho and instead shifts its 500 nm absorbance peak to 370 nm (H).
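Absorbance scales linearly with protein concentration (Beer–Lambert law, A = ε·c·l), which is why these spectra are compared at stated concentrations. A back-of-the-envelope sketch, assuming a 1 cm path length and a molar extinction coefficient of about 40,600 M⁻¹ cm⁻¹ for bovine rhodopsin near 500 nm (a literature value, not given in this legend); no extinction coefficient is assumed for LITE-1, and the function names are hypothetical:

```python
def absorbance(epsilon_per_molar_cm, concentration_molar, path_cm=1.0):
    """Beer-Lambert law: A = epsilon * c * l (dimensionless)."""
    return epsilon_per_molar_cm * concentration_molar * path_cm


def concentration_from_absorbance(a, epsilon_per_molar_cm, path_cm=1.0):
    """Invert Beer-Lambert to estimate molar concentration from absorbance."""
    return a / (epsilon_per_molar_cm * path_cm)


if __name__ == "__main__":
    EPS_RHO_500 = 40_600  # assumed literature value for bovine rhodopsin, M^-1 cm^-1
    for c_um in (0.5, 2.7):  # example micromolar concentrations
        a = absorbance(EPS_RHO_500, c_um * 1e-6)
        print(f"{c_um} uM rhodopsin -> A500 ~ {a:.3f}")
```

Under these assumptions, micromolar rhodopsin gives absorbances of roughly 0.02 to 0.11 at 500 nm, small values that are easy to miss on an axis scaled for a strongly absorbing sample.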
[Figure S4 plot panels (A, B): absorbance (O.D.) versus wavelength (250–700 nm) for LITE-1 (0.4 μM) and bRho (4 μM), mock versus H2O2 (0.1 mM)]
[Figure S5 plot: (B) locomotion speed (mm/s) under UVB light for WT, lite-1 transgene, and lite-1(S226F) transgene; time scale, 5 s]
Figure S5. Residues S226 and A332 in LITE-1 Are Critical for Its Sensitivity to UVA but Not UVB Light In Vivo, Related to Figures 4 and 5
(A and B) LITE-1(S226F) and LITE-1(A332V) were expressed as transgenes in muscle cells under the myo-3 promoter. Worm locomotion speed was monitored and quantified with the WormLab system (MBF Bioscience). UVA (350 ± 20 nm, 0.8 mW/mm²) (A) or UVB (280 ± 10 nm, 0.03 mW/mm²) (B) light was directed onto the worms, inducing muscle contraction and thus paralysis (locomotion speed reduced to zero). To minimize the effect of the endogenous lite-1 gene on locomotion speed under UV light, the experiments were done in the lite-1(xu7) mutant background (i.e., all genotypes carried the lite-1(xu7) mutation). Shades along the traces denote error bars (SEM). n = 25.
[Figure S6 plots: locomotion speed (mm/s) under UVA (A) and UVB (B) light for lite-1, lite-1(W77F), and lite-1(W328F) transgenes; time scale, 5 s]
Figure S6. The Two Tryptophan Residues W77 and W328 in LITE-1 Are Required for Its Sensitivity to Both UVA and UVB Light In Vivo, Related
to Figure 6
(A and B) LITE-1(W77F) and LITE-1(W328F) were expressed as transgenes in muscle cells under the myo-3 promoter. Worm locomotion speed was monitored and quantified with the WormLab system (MBF Bioscience). UVA (350 ± 20 nm, 0.8 mW/mm²) (A) or UVB (280 ± 10 nm, 0.03 mW/mm²) (B) light was directed onto the worms. Both tryptophan mutations disrupted the ability of LITE-1 to mediate UVA- and UVB-induced paralysis caused by muscle contraction (locomotion speed reduced to zero). To minimize the effect of the endogenous lite-1 gene on locomotion speed under UV light, the experiments were done in the lite-1(xu7) mutant background (i.e., all genotypes carried the lite-1(xu7) mutation). Shades along the traces denote error bars (SEM). n = 25.
(A)
                   *
LITE-1   74  I Y S W L V F C L L L F T T L R K F N Q V G V R P N G T R E N - L Q - E F F A N  111
GUR-3    76  I Y N Y L T L A I L T A A T I R R I S Q I K Q K S A T N E E K D A A - F H V L N  114
GUR-1   114  L F L F R L L A I F P A T T D R K S R R - - - - - K R N H R S I I K L I L Y V N  148
GUR-4    52  - - - - - - - - - - L R - - - I D L - - - - - - R K P G A K R N I - - - - - - - N  66
GUR-5    45  - - - - - - - - - - L R - - - L D F V - - - - - N S D G W A R K I - - - - - - - N  60

                                           *
LITE-1  314  A Q - S I C W S E V V S I V I W I V N A I L V L L L F S L P A F M I N  347
GUR-3   316  N G I Q A D M A E T F S V A I W L T N T M L A L M L F S I P A F M I A  350
GUR-1   394  V H V K I C W A A Y Q V - - - - - V M A I L H I I I I C S T G M M T N  423
GUR-4   304  Y D L I L C M P - - - - - - - - - - - - - - - T I G L C A F S F F A V  323
GUR-5   297  T D F L I C M P - - - - - - - - - - - - - - - F I L F C T C A F C S V  316
[Figure S7 plots: (B) body paralysis (%) of gur-3 transgenic worms; (C–E) calcium imaging, time scale 10 s; (F and G) locomotion speed (mm/s) under UVA (F) and UVB (G) light for lite-1, gur-3, and gur-3(Y79W) transgenes, time scale 5 s]
Figure S7. Sequence Alignment of C. elegans GR Family Proteins and Additional Data Related to GUR-3, Related to Figure 7
(A) The two tryptophan residues W77 and W328 in LITE-1 are marked with a red asterisk. W77 is not conserved in any other GR member; W328 is found only in GUR-3. The sequence between residues 112 and 313 of LITE-1 is not shown, as this large segment shares limited homology between LITE-1 and the other GRs.
(B) Mutating Y79 to W in GUR-3 does not promote its sensitivity to UVA light in vivo, as shown by the paralysis assay. GUR-3(Y79W) and GUR-3 were expressed as transgenes in muscle cells. Worms were exposed to a 20 s pulse of UVA light (350 ± 20 nm, 0.8 mW/mm²), and those showing muscle contraction-induced paralysis during illumination were scored positive. n = 50. Error bars, SEM. p = 0.153 (t test).
(C–E) Mutating Y79 to W in GUR-3 does not promote its sensitivity to UVA light in vivo, as shown by calcium imaging. A 5 s pulse of UVA light (340 ± 20 nm, 0.7 mW/mm²) was directed onto the worm. (C and D) Imaging traces. (E) Bar graph. n = 20. p = 0.779 (t test).
(F and G) Mutating Y79 to W in GUR-3 promotes its sensitivity to UVB but not UVA light in vivo, as shown by the locomotion assay. The assay was done as in Figure S6. The lite-1 transgene traces are duplicated from Figure S6 and included for comparison.
Article
Correspondence
diefenbach@uni-mainz.de (A.D.),
antigoni.triantafyllopoulou@uniklinik-freiburg.de (A.T.)
In Brief
Polyploid macrophages develop in
response to chronic inflammatory
signaling from toll-like receptors via
replication stress and activation of the
DNA damage response.
Highlights
• Polyploid macrophage fate is controlled by persistent inflammatory stimuli
1Department of Rheumatology and Clinical Immunology, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Germany
3Institute of Human Genetics, Biozentrum, Am Hubland, 97074 Würzburg, Germany
4Molecular Carcinogenesis Group, Department of Histology and Embryology, School of Medicine, National and Kapodistrian University of
1264 Cell 167, 1264–1280, November 17, 2016 © 2016 Elsevier Inc.
Thomas Haaf,3 Thomas Ness,16 Mario M. Zaiss,17 Reinhard E. Voll,1 Sachin D. Deshmukh,18 Marco Prinz,7,19
Torsten Goldmann,20 Christoph Hölscher,21,22,23 Anja E. Hauser,8 Andres J. Lopez-Contreras,24 Dominic Grün,6
Vassilis Gorgoulis,4,25,26,27 Andreas Diefenbach,9,10,* Philipp Henneke,2,28 and Antigoni Triantafyllopoulou1,2,30,*
23German Centre for Infection Research, 23845 Borstel, Germany
24Center for Chromosome Stability, Department of Cellular and Molecular Medicine, Panum Institute, University of Copenhagen, 2200 Copenhagen N, Denmark
25Faculty Institute of Cancer Sciences, Manchester Academic Health Sciences Centre, University of Manchester, Manchester M20 4QL, UK
26Biomedical Research Foundation, Academy of Athens, 115 27 Athens, Greece
27Department of Pathophysiology School of Medicine, National and Kapodistrian University of Athens, 115 27 Athens, Greece
28Center for Pediatrics and Adolescent Medicine, Medical Center, University of Freiburg, Faculty of Medicine, University of Freiburg,
formation of a granuloma, a compact and often highly ordered aggregate of immune cells that forms in response to a persistent inflammatory stimulus. At its core, the granuloma consists of different macrophage (MF) subsets displaying a range of morphologies, such as epithelioid MF, foam cells (i.e., MF loaded with lipid droplets), mononuclear (MoNucl), binuclear (BiNucl), and multinuclear (MultiNucl) MF (or MMF), and Langhans giant cells (Ramakrishnan, 2012; Williams and Williams, 1983). The molecular programs that control the differentiation of such MF populations in response to a chronic stimulus are likely critical to disease outcome. A prominent example is tuberculosis, a pandemic infectious disease caused by Mycobacterium (M.) tuberculosis. M. tuberculosis infects approximately a third of the world’s population and is the leading cause of death from a bacterial infection worldwide (Nathan, 2009). In tuberculosis, distinct spectra of MF differentiation determine disease outcome. On one end of the spectrum, microbicidal MF kill intracellular bacteria. On the other end, permissive MF provide mycobacteria with a replicative niche. These spectra are characterized by distinct metabolic and effector profiles. For example, microbicidal MF produce reactive nitrogen species, MF characterized by lipid accumulation (foam cells) are associated with granuloma necrosis and bacterial persistence (Peyron et al., 2008; Russell et al., 2009), and MF expressing extracellular matrix (ECM)-remodeling molecules, such as matrix metalloproteinase 9 (MMP9), are crucial for granuloma formation and bacterial spread (Taylor et al., 2006; Volkman et al., 2010). It is currently unknown whether the various phenotypically distinct MF subsets contained in granulomas exhibit distinct metabolic or functional profiles. Thus, the mechanisms controlling MF differentiation and the functional profiles of the various MF subsets contained in granulomas are key to identifying novel strategies to promote host resistance.
Within the scope of understanding MF cell-fate decisions in granulomas, an important and unresolved question relates to MF polyploidization. It is generally believed that the formation of polyploid giant cells can be explained by cell-to-cell fusion (Helming and Gordon, 2009). While this has been well documented for RANKL-induced osteoclasts and for the generation of MMF by stimulating non-cycling progenitors with mycobacteria or bacterial lipoproteins (BLPs) in vitro (Puissegur et al., 2007), direct evidence for cell-to-cell fusion as the leading process for the genesis of the various polyploid MF subsets found in granulomatous diseases is lacking. The fact that MF in granulomas carry varying copy numbers of their genomic information poses a series of significant basic questions that have not been addressed to date. Does the formation of polyploid MF pose a threat to their genomic stability? What is the role of the DNA damage response in MF differentiation into polyploid subsets? Do polyploid MF constitute a distinct fate that contributes to the pathogenesis of granulomatous diseases?
Here, using an array of techniques, we delineate a MF differentiation pathway in response to persistent inflammatory stimuli. Sensing of BLP controlled the differentiation of proliferating MF precursors into polyploid MF expressing distinct metabolic and ECM remodeling gene expression signatures. Toll-like receptor (TLR)2 signaling via MyD88 promoted MF genome duplications via mitotic defects but not by cell-to-cell fusion. BLP-induced polyploid MF grew further by re-entering the cell cycle and overcoming p53-dependent barriers to their proliferation. TLR2 signaling promoted MF polyploidy and alleviated genomic instability by regulating Myc and the DNA damage response (DDR). Thus, we uncover a previously unknown role for growth and DDR signals in determining MF differentiation in the presence of persisting inflammatory stimuli.
RESULTS
MF with Varying Numbers of Nuclei Co-localize with Proliferating F4/80+ Precursors in Small Mycobacterial Granulomas
To identify mechanisms by which persistent inflammatory signals instruct MF differentiation in granulomas, we used infection with M. bovis Bacillus Calmette-Guérin (BCG). BCG induced the formation of BiNucl MF and MMF in liver granulomas (Figures S1A and S1B) and increased the numbers of proliferating (Ki67+) F4/80+ cells (Figures S1C and S1D). In smaller granulomas, an organized topographical arrangement of Ki67+F4/80+ precursors and BiNucl MF or MMF emerged: the former were positioned at the periphery (Figure S1E), while BiNucl MF and MMF were found primarily in the center of the granuloma (Figure S1F). In more mature granulomas, Ki67+F4/80+ cells were fewer and their location within granulomas was ill defined (Figure S1G). These data raised the possibility that during BCG infection
(E–J) MafB negatively regulates MMF formation. (E) qRT-PCR of Mafb mRNA expression. Mean ± SD of triplicate determinations from three independent experiments. (F and G) Immunoblotting (IB) for MafB and IRF8; representative of two independent experiments. (H) Violin plots comparing Mafb expression by scRNA-seq. (I and J) MF precursors transduced with empty retroviral vector pMX-IRES-EGFP (EV) or pMX-Mafb-IRES-EGFP (MafbV) prior to stimulation. (I) IB for MafB. (J) Numbers of MMF. Mean ± SD from three independent experiments.
(K and L) Metabolic gene signatures in BLP-induced MMF. (K) Violin plots comparing expression of Apoe and Abca1 by scRNA-seq. (L) IF for Nile Red.
(M) Heatmap of selected genes differentially expressed in Kupffer cells, granuloma F4/80hi MF, and F4/80low BiNucl MF and MMF. Means of duplicate determinations from five to nine independent biological replicates per group.
(E and M) qRT-PCR data normalized to Gapdh mRNA expression.
(H and K) y axis, log2(normalized count + 0.1) expression levels; black point, mean expression level.
*p < 0.05, **p < 0.01; scale bars, 10 μm. See also Figures S1, S2, S3, and S4.
Figure 2. MMF Formation from MF Precursors Does Not Involve Cell-to-Cell Fusion
(A) IF for CD45.1, CD45.2 in stimulated MF precursors. White arrows, CD45.1+CD45.2– MMF (middle), CD45.1+CD45.2+ osteoclast-like MMF (bottom).
(B and C) IF for CD45.1, CD45.2 on liver granuloma cryosections from BCG-infected CD45.1:CD45.2 chimeras. (B) Example of a CD45.1+CD45.2– MMF.
(C) Numbers of BiNucl and MMF with the indicated phenotype; n = 5 chimeras, N.D., not detectable.
(D–K) BLPs and mycobacteria regulate nuclear ploidy. (D–F and I) QIBC; (G, H, J, and K) FISH. (D) Nuclear area per single nucleus. Black line, mean nuclear area. One representative experiment of three independent experiments. (E and F) DNA content per nucleus. (E) Representative histograms. Red line, cutoff for DNA content >4c. (F) Percentage of polyploid (>4c) nuclei, as in (E). Mean ± SD from three independent experiments. (G and H) FISH for chromosomes 2, 11, X, and 16 in vitro. (G) Representative images. (H) Percentage of total nuclei with the indicated number of FISH signals; 155–212 nuclei were analyzed per condition. Mean ± SD from three independent experiments. (I) Distribution of polyploid nuclei in BLP-stimulated MoNucl, BiNucl, and MultiNucl cells. (J and K) FISH for chromosomes 2 and 11 in lung cryosections from M. tuberculosis-infected WT and Il13Tg mice. (J) Representative images. (K) Numbers of polyploid nuclei (FISH signals ≥2;2 per nucleus) from 25 visual fields in MF-rich granuloma areas.
*p < 0.05, **p < 0.01, ***p < 0.001; scale bars, 10 μm. See also Figure S5.
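Panels (E) and (F) classify nuclei by DNA content and count those above 4c as polyploid. A minimal sketch of such gating, assuming integrated DAPI intensity per nucleus and normalizing to the median of an assumed 2c (G1) reference population; the reference choice, the function names, and the numbers are hypothetical and do not reproduce the authors' QIBC pipeline:

```python
from statistics import median


def dna_content_c(intensities, g1_reference):
    """Convert integrated DAPI intensities to DNA content in 'c' units.

    Assumes the median of g1_reference corresponds to 2c.
    """
    one_c = median(g1_reference) / 2.0
    return [x / one_c for x in intensities]


def percent_polyploid(intensities, g1_reference, cutoff_c=4.0):
    """Percentage of nuclei whose DNA content is strictly above cutoff_c (>4c)."""
    contents = dna_content_c(intensities, g1_reference)
    n_poly = sum(1 for c in contents if c > cutoff_c)
    return 100.0 * n_poly / len(contents)


if __name__ == "__main__":
    g1 = [980, 1000, 1020]  # made-up 2c reference intensities
    nuclei = [1000, 2000, 1900, 4500, 8200]  # made-up: 2c, 4c, 3.8c, 9c, 16.4c
    print(percent_polyploid(nuclei, g1))
```

Because the cutoff is strict, a nucleus at exactly 4c (a G2/M diploid cell) is not counted as polyploid.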
(B–E) Examples of still images from selected time points. (B) Successful cell divisions in medium without FSL-1. The corresponding movie is Movie S1.
(C–E) Examples of cytokinesis failure outcomes in FSL-1-stimulated MF precursors.
(C) Cytokinesis failure leads to a BiNucl daughter cell (top), which re-enters mitosis and again fails cytokinesis, generating another BiNucl daughter cell (bottom). The corresponding movies are Movies S2 (top) and S4 (bottom).
(D) Cytokinesis failure leads to a BiNucl daughter cell (top). Lagging chromosome (yellow arrow) visualized at the cleavage furrow, cleavage furrow regression,
and formation of a BiNucl daughter cell containing a micronucleus (MN, bottom). The corresponding movies are Movies S3 (top) and S6 (bottom).
(E) A MoNucl parent cell undergoes a tripolar mitosis and fails cytokinesis, producing a MultiNucl daughter cell (top). Yellow arrows, lagging chromosomes and MN (bottom). The corresponding movies are Movies S5 (top) and S7 (bottom).
(F) Outcome of single-cell divisions from MoNucl and BiNucl parent cells. Mean ± SD from three independent experiments.
(G) Percentage of MoNucl, BiNucl, and MultiNucl daughter cells per 100 mitoses.
(H) Percentage of MoNucl and BiNucl cells entering mitosis during the live-cell imaging session.
(I and J) MMF formation via endoreplication and cytokinesis failure (I) or recurrent cytokinesis failure (J).
(K) Percentage of cells containing MN by QIBC. Mean ± SD from three independent experiments.
(L) Lagging chromosomes in a mitotic macrophage in BCG liver granuloma.
(G and H) n > 300 mitotic events, representative of two independent experiments. *p < 0.05; scale bars, 10 μm; timescale, hours:minutes. See also Movies S1, S2, S3, S4, S5, S6, and S7.
Figure 7. BLPs Activate the DDR and Induce MF Polyploidy via Myc
(A–C) p53 suppresses BLP-induced polyploid MF, QIBC. (A) Representative images; (B) percentage of BiNucl MF and MMF; (C) percentage of polyploid (>4c) nuclei.
(B and C) Mean ± SD from averages of triplicate replicates from two independent experiments.
(D) qRT-PCR of Myc mRNA expression, normalized to Gapdh mRNA expression. Mean ± SD of triplicate determinations pooled from three independent experiments.
(E) IB of nuclear lysates for Myc. Representative of two independent experiments.
(F–H) Myc regulates S phase TLR2-DDR signaling, QIBC. (F) Representative images. (G) Mean γH2AX intensity per nucleus. Red lines, cutoff for γH2AXhi expression. Black lines, mean values. (H) Percentage of γH2AXhi nuclei, as in (G).
(I) Mean EdU versus total DAPI intensity (top) and γH2AX versus total DAPI intensity (bottom). Purple line, cutoff for EdU positivity. Darker dots (bottom), EdU+ nuclei. Red line (bottom), cutoff for γH2AX positivity.
(J) Percentage of γH2AX+ nuclei among EdU+ nuclei, as in (I).
(K) Percentage of EdU+ nuclei, as in (I).
(L and M) Myc regulates BLP-induced lipid droplet accumulation. QIBC of cytoplasmic lipid droplet accumulation. (L) Nile Red IF. (M) Percentage of nuclei associated with Nile Red+ cytoplasmic droplets.
(H, J, K, and M) Mean ± SD from three independent experiments. *p < 0.05, **p < 0.01, ***p < 0.001; scale bars, 10 μm.
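Panels (I) to (K) gate nuclei on two intensity cutoffs and report a conditional percentage (γH2AX+ among EdU+ nuclei) alongside the overall EdU+ fraction. A minimal sketch of that two-step gating, assuming per-nucleus mean intensities and fixed cutoffs; the `Nucleus` class, the cutoff values, and the data are hypothetical:

```python
import math
from dataclasses import dataclass


@dataclass
class Nucleus:
    edu_mean: float   # mean EdU intensity per nucleus (arbitrary units)
    h2ax_mean: float  # mean gammaH2AX intensity per nucleus (arbitrary units)


def gated_percentages(nuclei, edu_cutoff, h2ax_cutoff):
    """Return (% EdU+ among all nuclei, % gammaH2AX+ among EdU+ nuclei).

    The second value is NaN when no nucleus passes the EdU gate.
    """
    if not nuclei:
        raise ValueError("no nuclei to gate")
    # Gate 1: EdU positivity marks nuclei that synthesized DNA (S phase)
    edu_pos = [n for n in nuclei if n.edu_mean > edu_cutoff]
    pct_edu = 100.0 * len(edu_pos) / len(nuclei)
    if not edu_pos:
        return pct_edu, math.nan
    # Gate 2: gammaH2AX positivity is evaluated only within the EdU+ subset
    h2ax_pos = [n for n in edu_pos if n.h2ax_mean > h2ax_cutoff]
    return pct_edu, 100.0 * len(h2ax_pos) / len(edu_pos)


if __name__ == "__main__":
    cells = [Nucleus(50, 10), Nucleus(300, 80), Nucleus(400, 20),
             Nucleus(20, 90), Nucleus(250, 60)]
    print(gated_percentages(cells, edu_cutoff=100, h2ax_cutoff=50))
```

Returning NaN rather than 0 keeps an empty EdU+ gate from being mistaken for a true absence of γH2AX signal.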
A.T., L.H., and V.H. designed, performed, and analyzed the majority of the experiments with help from K.G., J. Senges, N.M., and T.A. The indicated experiments were performed and analyzed by I.N., T.H. (FISH, SKY, cytogenetics), T.N. (DNA fiber assays, metaphase analysis), S., D.G. (scRNA-seq), K.E., V.G., T.N., R.E.V., and T.G. (human studies), D.E. and M.P. (LCM-MF gene expression), S.D.D. (microscopy), and J. Stefanowski, A.E.H., M.M.Z., C.K., and C.D. (osteoclast analysis in vivo). C.H. (M. tuberculosis infections), D.P. (microarrays), B.K. (LCI), M.S. (pathology), and M. Follo (QIBC) helped with experiments. D.W., M. Fliegauf, S.S., M.S., and A.J.L.-C. provided critical reagents; L.R. and A.D. analyzed gene array data; S.S., M.H.S., and S.D.D. provided intellectual input. A.J.L.-C. and V.G. provided intellectual input on RS and DDR. V.G. directed the human studies. A.D. co-directed research and revised the manuscript. P.H. oversaw initial experiments. A.T. directed research and wrote the manuscript with input from co-authors.

ACKNOWLEDGMENTS
We thank L. Ivashkiv, Y. Tanriver, M. Lenardo, E. Trompouki, and P. Heun for helpful discussions and R. Rzepka, J. Volz, A. Hölscher, K. Schrenk, M. Vavra, A. Imm, and the Advanced Medical Bioimaging Core Facility of the Charité for excellent technical assistance; C. Blattner and M. Oren for p53−/− mice. The work was supported by the European Union’s Seventh Framework Pro-

Coschi, C.H., Martens, A.L., Ritchie, K., Francis, S.M., Chakrabarti, S., Berube, N.G., and Dick, F.A. (2010). Mitotic chromosome condensation mediated by the retinoblastoma protein is tumor-suppressive. Genes Dev. 24, 1351–1363.
Davoli, T., and de Lange, T. (2011). The causes and consequences of polyploidy in normal development and cancer. Annu. Rev. Cell Dev. Biol. 27, 585–610.
Davoli, T., Denchi, E.L., and de Lange, T. (2010). Persistent telomere damage induces bypass of mitosis and tetraploidy. Cell 141, 81–93.
Emson, C.L., Bell, S.E., Jones, A., Wisden, W., and McKenzie, A.N. (1998). Interleukin (IL)-4-independent induction of immunoglobulin (Ig)E, and perturbation of T cell development in transgenic mice expressing IL-13. J. Exp. Med. 188, 399–404.
Fenech, M., Kirsch-Volders, M., Natarajan, A.T., Surralles, J., Crott, J.W., Parry, J., Norppa, H., Eastmond, D.A., Tucker, J.D., and Thomas, P. (2011). Molecular mechanisms of micronucleus, nucleoplasmic bridge and nuclear bud formation in mammalian and human cells. Mutagenesis 26, 125–132.
Fujiwara, T., Bandi, M., Nitta, M., Ivanova, E.V., Bronson, R.T., and Pellman, D. (2005). Cytokinesis failure generating tetraploids promotes tumorigenesis in p53-null cells. Nature 437, 1043–1047.
Ganem, N.J., and Pellman, D. (2007). Limiting the proliferation of polyploid cells. Cell 131, 437–440.
Further information and requests for reagents may be directed to, and will be fulfilled by, the corresponding author Antigoni Triantafyllopoulou (antigoni.triantafyllopoulou@uniklinik-freiburg.de).
Human specimens
Formalin-fixed, paraffin-embedded sections from 10 M. tuberculosis lung, 15 sarcoidosis skin, and 10 giant cell arteritis temporal artery biopsies, obtained for diagnostic purposes, were analyzed. The demographics of the patients are listed below.
M. tuberculosis patients (Borstel Cohort): 6 males and 4 females, 18–72 years old at the time of biopsy.
Skin sarcoidosis patients (Athens Cohort): 2 males and 13 females, 36–75 years old at the time of biopsy.
Giant cell arteritis patients (Athens Cohort): 2 males and 5 females, 51–67 years old at the time of biopsy.
Giant cell arteritis patients (Freiburg Cohort): 3 males and 7 females, 65–84 years old at the time of biopsy.
Protocols for experimental use of clinical samples were approved by the Ethics Committees of the Medical School of Athens
(sarcoidosis and giant cell arteritis samples), the University of Freiburg (giant cell arteritis samples) and the University of Lübeck
(M. tuberculosis samples).
Mice
Conventional C57BL/6 mice were purchased from Charles River or Janvier. IL-13-transgenic (tg) mice were previously described
(Emson et al., 1998). Age- and sex-matched mice aged 8–12 weeks were used for all in vitro and in vivo experiments. To generate bone marrow chimeras, C57BL/6 CD45.2+ mice were lethally γ-irradiated (900 rads) with a cesium source and subsequently reconstituted with a mixture of bone marrow cells from C57BL/6 CD45.2+ and C57BL/6 CD45.1+ congenic mice. For the first 4 weeks, mice received antibiotic-containing drinking water. Animals were allowed to reconstitute for 6–12 weeks prior to infection with M. bovis BCG. Following reconstitution, the bone marrow of chimeric mice contained roughly equal numbers of CD45.1+ and CD45.2+
leukocytes. All animal experiments were approved and performed in accordance with the guidelines of the local animal care and
use committees of the Regierungspräsidium Freiburg and Kiel.
METHOD DETAILS
Retroviral transductions
Retrovirus packaging was performed by transfecting the retroviral vectors into Phoenix cells, using FuGENE6 (Roche) for the pBABE-H2BGFP vector (pBABE-H2BGFP was a gift from Fred Dick; Addgene plasmid #26790) and CalPhos transfection reagent (Clontech) for the pMX-Mafb-IRES-Egfp and empty pMX-IRES-Egfp vectors. Bone marrow cells were infected with the recombinant retroviruses in the presence of 4 μg/ml polybrene and M-CSF (20 ng/ml) for 24 h, after which the medium was changed. After 48 h, adherent cells were collected, and GFP+ cells were sorted and re-plated for stimulation.
FISH
Labeled probes for loci on four mouse chromosomes, 2qH3 (Anurka), 11qE1 (Tlk2), 16qC4 (Rcan1), and XqF1 (Rab9b), were hybridized to methanol-acetic acid-fixed cells according to the supplier’s instructions (Kreatech). After hybridization and washing, cells with specific hybridization signals were photographed through specific filter sets on a fluorescence microscope (Axio Imager, Zeiss) equipped with a CCD camera, and digitized images of the FITC, CY3, and DAPI signals of the same cell were merged using the FISH imaging software FISHView 2.0 (Applied Spectral Imaging).
SKY
Metaphase chromosomes were prepared according to standard procedures. Hybridization with mouse SKY chromosome paints (SkyPaint, Applied Spectral Imaging) was carried out following the manufacturer's instructions. After hybridization and washing, spectral images were acquired using a HiSky system (SD300) and dedicated spectral imaging software (version 2.6). The SKY images obtained were then analyzed with SkyView software, version 6.0 (Applied Spectral Imaging). Karyotypes depicted in the figures were prepared from the spectrally classified pseudo-colored chromosomes.
Gapdh (Mm03302249_g1),
Emr1 (Mm00802529_m1),
Apoe (Mm01307193_g1),
Nfkbiz (Mm00600522_m1),
Ccl5 (Mm01302427_m1),
Chi3l3 (Mm00657889_mH),
Lox (Mm00495386_m1),
Ctsk (Mm00484039_m1),
Mmp9 (Mm00600163_m1),
Pcna (Mm00448100_g1),
Ccnd2 (Mm00438070_m1),
Mcm6 (Mm00484848_m1),
Blm (Mm00476150_m1),
Rad50 (Mm00485504_m1),
Rad52 (Mm00448543_m1) and
Myc (Mm00487804_m1).
Data are presented as mean ± SD. Sample number (n) indicates the number of independent biological samples in each experiment. Sample numbers and experimental repeats are indicated in the figures and figure legends or in the Methods section above. p values were determined by Student's t test with a 95% confidence interval. All statistical tests were performed with GraphPad Prism v4 software (*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001; n.s., not significant).
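As a sketch of the test described above (not the authors' Prism workflow), the following stdlib-only Python computes a pooled-variance Student's t statistic, a two-sided p value (sidedness is an assumption; the text does not state it) via the regularized incomplete beta function, and the asterisk label used in the legends. All function names are illustrative; in practice `scipy.stats.ttest_ind` with its default equal-variance setting performs the same test.

```python
import math
from statistics import mean, stdev


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for this m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), which gives the tail probability of Student's t distribution."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def student_t_test(x, y):
    """Two-sided, two-sample Student's t test with pooled (equal) variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    # Two-sided p value: P(|T| >= |t|) = I_{df/(df + t^2)}(df/2, 1/2)
    p = regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p


def significance_label(p):
    """Asterisk convention from the legends (*p < 0.05 ... ****p < 0.0001)."""
    for threshold, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return label
    return "n.s."


# Usage with toy values (not data from this study)
t, df, p = student_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f}, df = {df}, p = {p:.4f} ({significance_label(p)})")
```

Student's test assumes equal variances in both groups; when that is doubtful, Welch's variant (unequal variances) is the usual alternative.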
DATA AVAILABILITY
The accession number for the gene array data reported in this paper is ArrayExpress: E-MTAB-5085. The accession number for the
scRNA seq data reported in this paper is NCBI GEO: GSE86929.
[Figure S4 panels: Reactome pathways enriched among upregulated and downregulated genes for the comparisons F>4c vs F2c, F>4c vs Ctrl, and F2c vs Ctrl]
Figure S4. Gene Signatures Characteristic of Polyploid Macrophages-Reactome Pathway Enrichment Analysis, Related to Figure 1
MΦ precursors stimulated with FSL-1 or medium for 6 days were isolated based on their DNA content and analyzed by single-cell RNA-seq. Reactome pathway enrichment analysis was performed on genes showing at least a 2-fold expression difference between the two subgroups with an adjusted p value below 0.05.
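The gene-selection step before the Reactome enrichment can be sketched as follows, splitting genes into the upregulated and downregulated lists shown in the panels. This assumes a per-gene table of log2 fold changes and adjusted p values; the input layout, the function name, and the placeholder gene names are hypothetical, and the enrichment itself would be run separately (for example with the Reactome analysis service).

```python
import math


def split_de_genes(results, min_fold=2.0, max_padj=0.05):
    """Return (upregulated, downregulated) gene lists for pathway enrichment.

    results: dict gene -> (log2_fold_change, adjusted_p). This layout is
    hypothetical; the legend only states the 2-fold and adjusted-p criteria.
    """
    min_log2 = math.log2(min_fold)  # a 2-fold change corresponds to |log2 FC| >= 1
    up, down = [], []
    for gene, (log2fc, padj) in results.items():
        if padj >= max_padj:
            continue  # not significant after multiple-testing correction
        if log2fc >= min_log2:
            up.append(gene)
        elif log2fc <= -min_log2:
            down.append(gene)
    return sorted(up), sorted(down)


# Toy input with placeholder gene names (not study data)
toy = {"geneA": (1.5, 0.01), "geneB": (-2.2, 0.001), "geneC": (3.0, 0.2), "geneD": (0.4, 0.001)}
up, down = split_de_genes(toy)
```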
[Figure S5 panels: chimera scheme (CD45.1:CD45.2, 50%:50%; uninfected chimera control and uninfected non-chimeric (CD45.1 × CD45.2) F1 control; weeks -6, 0, and 2–3 with M. bovis BCG infection and analysis); CD45.2, CD45.1, merge, and DAPI staining; endoreplication and cytokinesis-failure cell-cycle schematics (G1, S, G2, M; 2c, 4c); chromosome 11 FISH with DAPI]
Figure S5. MMΦ Formation from MΦ Precursors Does Not Involve Cell-to-Cell Fusion, Related to Figure 2
(A and B) M. bovis BCG-induced MMΦ do not require cell-to-cell fusion in vivo. (A) CD45.2 mice were lethally irradiated and reconstituted with a mixture of BM cells from CD45.1 and CD45.2 mice. 6–12 weeks later, uninfected mice were analyzed. Representative images of bone cryosections from 2 independent experiments show roughly equal numbers of CD45.1+ and CD45.2+ single-positive bone marrow leukocytes. Scale bar, 100 μm.
(B) CD45.2 mice were lethally irradiated and reconstituted with a 1:1 mixture of bone marrow cells from CD45.1 and CD45.2 mice. 6–12 weeks later, the mice were infected with M. bovis BCG i.p. and analyzed 2–3 weeks p.i. Liver granuloma cryosections were stained with antibodies to CD45.1, CD45.2, and F4/80. Representative images from 3 independent experiments with n = 5 mice per experiment are shown. p.i., post infection.
(C) Non-chimeric uninfected (CD45.1 × CD45.2)F1 mice were analyzed as controls. Representative images of bone cryosections from 2 independent experiments show numerous leukocytes double positive for CD45.1 and CD45.2. Scale bar, 100 μm.
(D) Schematic depiction of polyploidy arising via endoreplication: cycling cells undergo multiple cell cycles without entering mitosis, doubling their DNA content within a single nucleus.
(E) Schematic depiction of polyploidy arising via cytokinesis failure: cells enter mitosis but fail to physically split into two after chromosome segregation, giving rise to BiNucl tetraploid daughter cells.
(F) C57BL/6 wild-type mice were infected with M. bovis BCG i.p. At 2–3 weeks p.i., liver cryosections were analyzed by FISH for chromosome 11. Representative images from 3 independent experiments are shown. p.i., post infection.
(G) Schematic depiction of polyploidy arising via cytokinesis failure: following a first cytokinesis failure, BiNucl tetraploid daughter cells re-enter mitosis and redistribute their genetic content into two nuclei, thus generating a BiNucl daughter cell with 4c DNA content in each nucleus.
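The three routes in panels D, E, and G differ only in how DNA content (in c units) is distributed among nuclei. The toy bookkeeping model below reproduces the outcomes described in the legend. It assumes idealized, complete replication and exactly equal segregation, ignores aneuploidy, and uses illustrative function names that do not come from the study.

```python
def s_phase(nuclei):
    """DNA replication: each nucleus doubles its DNA content (in c units)."""
    return [c * 2 for c in nuclei]


def endoreplication(nuclei):
    """Endoreplication (panel D): replicate without mitosis; nuclei stay intact."""
    return s_phase(nuclei)


def mitosis_without_cytokinesis(nuclei):
    """Panels E and G: chromosomes segregate into two nuclei, but the cell does not
    divide, so the total DNA is shared equally by two nuclei in one cell."""
    total = sum(nuclei)
    return [total // 2, total // 2]


diploid = [2]                                                         # one 2c nucleus in G1
endo = endoreplication(endoreplication(diploid))                      # [8]: one 8c nucleus
first_failure = mitosis_without_cytokinesis(s_phase(diploid))         # [2, 2]: BiNucl, 4c total
second_failure = mitosis_without_cytokinesis(s_phase(first_failure))  # [4, 4]: 4c per nucleus
```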
[Figure S6 panels: (A) relative Ccnd1 and Ccnd2 expression in control and FSL-1-stimulated cells, days 0–6; (B) immunoblot for cyclin D1, cyclin D2, and Tbp with or without FSL-1, days 0–6; (C–E) BrdU and DAPI staining of Myd88+/+ and Myd88-/- cells under control or FSL-1 conditions, DAPI total intensity per nucleus (A.U.), gates A and B]
Figure S6. TLR2 Signaling Confers a Proliferation Advantage to Polyploid MΦ Progeny, Related to Figure 4
(A) qRT-PCR of Ccnd1 and Ccnd2 mRNA, normalized to Gapdh mRNA expression. Mean ± SD of triplicate determinations pooled from 3 independent experiments.
(B) IB of nuclear lysates for cyclins D1 and D2. Example of 2 independent experiments.
(C–E) Increased BrdU incorporation into polyploid nuclei; IF for BrdU and DAPI; QIBC. (C) Representative images. (D) Mean BrdU fluorescence versus total DAPI intensity per single nucleus. (E) Percent of BrdU+ nuclei belonging in gate A or B, as in (D). n = 1,000–5,000 nuclei per condition. Mean ± SD from 3 independent experiments.
*p < 0.05, **p < 0.01. Scale bar, 10 μm.
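The per-nucleus quantification in panels D and E can be sketched as follows: each nucleus contributes a total DAPI intensity (DNA content) and a mean BrdU intensity, and BrdU+ nuclei are assigned to DAPI gates. This reads (E) as the share of all BrdU+ nuclei that fall in each gate, which is an interpretation; the gate boundaries, the BrdU threshold, and the function name are hypothetical.

```python
def brdu_gate_percentages(nuclei, gates, brdu_threshold):
    """Share (%) of BrdU+ nuclei whose total DAPI intensity falls in each gate.

    nuclei: list of (total_dapi_intensity, mean_brdu_intensity) per nucleus.
    gates: dict gate name -> (low, high), a half-open DAPI range [low, high).
    brdu_threshold: mean BrdU intensity above which a nucleus counts as BrdU+.
    """
    positive_dapi = [dapi for dapi, brdu in nuclei if brdu > brdu_threshold]
    if not positive_dapi:
        return {name: 0.0 for name in gates}
    return {
        name: 100.0 * sum(low <= dapi < high for dapi in positive_dapi) / len(positive_dapi)
        for name, (low, high) in gates.items()
    }


# Toy measurements in arbitrary units (not study data)
toy_nuclei = [(1.0, 5.0), (2.1, 50.0), (2.2, 60.0), (4.5, 70.0), (4.2, 3.0), (8.5, 80.0)]
toy_gates = {"A": (1.5, 3.0), "B": (3.5, 10.0)}
shares = brdu_gate_percentages(toy_nuclei, toy_gates, brdu_threshold=10.0)
```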
Figure S7. Replication Stress and Activated DDR in Granulomas Enriched in MMΦ In Vivo, Related to Figure 6
(A–C) Representative images of uninvolved tissue adjacent to granulomatous areas from 10–15 patient biopsies per granulomatous disease are shown. (A) IHC for γH2AX. (B) IF for p-RPA2. (C) IF for 53BP1. n = 10–17 patient biopsies per disease.
(D) GSEA for p53-dependent genes in MΦ precursors stimulated for 6 days with FSL-1 versus medium. Gene array was performed with 4 independent biological replicates per group.
Scale bars, 100 μm.
Article
Correspondence
yalanwang@mdanderson.org (Y.A.W.),
rdepinho@mdanderson.org (R.A.D.)
In Brief
Epigenetic activation of WNT5A
expression contributes to glioblastoma
tumor recurrence by promoting
differentiation of glioma-derived stem
cells into endothelial cells.
Cell 167, 1281–1295, November 17, 2016 © 2016 Elsevier Inc.
[Figure 1 panels: (A) immunoblots of pAKT (S473), pAKT (T308), total AKT, pERK, total ERK, and actin in CTRL, p53DN, and p53DN-AKT hNSCs; (B) colony number; (C) survival curves over weeks for CTRL (n = 5), p53DN (n = 5), and p53DN-AKT (n = 10), p < 0.0001; (D–F) Nestin, GFAP, and Ki67 staining; (G) top 10 enriched pathways with FDR < 0.25 (normalized ES and -log10 p; e.g., HEMATOPOIESIS_STEM_CELL_NUMBER_LARGE_VS_TINY_UP, ENDOTHELIAL CELL MARKER CD31+ VS. CD31- UP) and enrichment profile for Oncogenic Activation (85 genes)]
Figure 1. Overexpression of p53DN and myr-AKT Generates Malignant Glioma and Upregulates EC Signaling Pathway
(A) Immunoblot analysis of overexpressed oncogenes in hNSCs.
(B) Soft agar colony formation of hNSCs expressing p53DN or p53DN/myr-AKT (p53DN-AKT). Error bars represent SD of triplicate wells. **p < 0.01. Representative images are shown.
[Figure 2 and Figure 3 panels: flow cytometry of CD133-PE versus CD144-FITC in Vector, p53DN, and p53DN-AKT hNSCs in NSC or EC media and with DMSO or RAPA; CD105, VEGFR2, vWF, and Dil-AcLDL/DAPI staining; immunoblots of pAKT (Ser473), AKT, p-p70S6K (Thr389), p70S6K, pS6 (Ser235/236), S6, and actin; CD133+/CD144- and CD133+/CD144+ percentages in patient-derived GSCs (TS543, TS576, TS586, BT112, TS603, BT147) with GFP or myr-AKT and with DMSO or RAPA; shRNA knockdown of candidate genes (CXCL14, DLX5, DMRT3, GPR37, MYLIP, NUDT14, TCF7, WNT5A) versus scramble, with CD133+/CD144+ (%) and number of branch points; DMSO versus BOX5 with myr-AKT or WNT5A overexpression]
chromatin status defined by both H3K4me3 and H3K27me3 marks (Bernstein et al., 2006; Figures 4A, 4B, S3A, and S3B). In contrast, the WNT5A promoter of WNT5A-expressing iGSCs exhibited an active H3K27ac mark with concomitant loss of the repressive H3K27me3 mark (Figures 4A and 4B). These patterns are consistent with the poised WNT5A promoter being epigenetically activated during transformation.
To further explore the mechanisms governing transcriptional regulation of the WNT5A locus under AKT activation, we analyzed TCGA proteomic datasets (RPPA), which further confirmed the correlation between WNT5A mRNA levels and the mTOR/S6K pathway (Figure S4A). We next identified a significant negative correlation between WNT5A expression and known master transcription factors of NSC self-renewal and lineage determination, including Gli2, FoxG1, SOX2, PAX4/6, and HES1, in this specific context (Figure S4A). These findings indicate that downregulation of the neurogenesis TFs may be necessary for EC lineage differentiation of GSCs. Moreover, only the promoters of the PAX subclass (PAX4 and PAX6) exhibited a gain in the repressive H3K27me3 mark following the transition from hNSCs to iGSCs (Figures 4C and S4B–S4G). Correspondingly, the WNT5A locus possesses PAX6 binding motifs located in regulatory region 1 (R1), regulatory region 2 (R2), and the promoter region (P) (Figures 4A and 4B), which were further validated by ChIP-PCR in hNSCs
[Figure 4 panels: ChIP-seq tracks for H3K27ac, H3K4me3, H3K4me1,2, and H3K27me3 in hNSCs and iGSCs at the WNT5A locus (regions R1, R2, and P), the PAX6 locus (chr11, hg19), and the DLX5/DLX6 locus (chr7, hg19); PAX6 and DLX5 binding motifs; ChIP-PCR relative fold enrichment at R1, R2, and P; relative WNT5A mRNA levels in iGSC, TS603, and BT147 with PAX6 OE versus Vector, and in iGSC, TS543, and TS576 with DLX5 OE versus Vector]
[Figure 5 panels: DAPI/TRA-1-85 and DAPI/TRA-1-85/CD34 staining of Vector and WNT5A OE tumors in intratumor and peritumor areas (Tumor-1, Tumor-2); TRA-1-85+/CD34+ (%) (p = 7.94e-05, p = 7.39e-05, p = 0.004); GCV(-) versus GCV(+) tumors and CD34 microvascular density (MVD), p < 0.0001]
Figure 5. WNT5A-Mediated Endothelial Lineage Differentiation in Tumor Neovascularization and Satellite Lesion Formation
(A) Representative images of hemorrhagic lesions in mouse brains injected with TS543 cells overexpressing WNT5A (WNT5A OE) versus control (Vector). H&E and IHC analyses of tumor sections show microvascular hyperplasia (black arrows) and expression of CD34 and WNT5A. Scale bar, 50 μm.
(B) Representative images of satellite lesions in peritumoral areas. Scale bar, 200 μm.
(C) Representative images of GdECs (yellow arrows) identified by co-staining for TRA-1-85 and CD34 in intratumoral and peritumoral areas. Scale bar, 25 μm.
(D) Quantitation of TRA-1-85+/CD34+ cells using the Vectra software system (n = 3 tumors).
(E) High magnification of the boxed area in (C). Scale bar, 10 μm.
(F) IHC staining of CD34 in intracranial tumors derived from pCD144-GFP-infected WNT5A-TS543 cells following GCV treatment. Representative images at low (scale bar, 100 μm) and high (scale bar, 50 μm) magnification.
(G) Dotplots for quantitation of MVD in tumors with or without GCV treatment (n = 4 tumors, five fields per tumor).
(H) Representative images of tumor appearance (left; scale bar, 2,000 μm) and peritumoral satellite lesions (right; scale bar, 200 μm).
See also Figure S5.
[Figure 6 panels: distance from host EC (μm) for TRA-1-85+/pCD144-GFP- and TRA-1-85+/pCD144-GFP+ cells; fluorescence intensity of invaded HBMECs for TS543-WNT5A and TS603 and with rWNT5A, rWNT3A, or BOX5; relative CD144 and WNT5A mRNA levels in pCD144-GFP- and pCD144-GFP+ cells; DAPI/TRA-1-85, DAPI/pCD144-GFP, DAPI/CD34, and merged images in large and small lesions; neurosphere formation of TS543 and TS603 with pCD144-GFP-/pCD144-GFP+ cells and HBMECs]
Figure 6. Recruitment of Host ECs by WNT5A-Mediated GdECs Contributes to GSCs Self-Renewal and Proliferation
(A) Representative IF images showing that GdECs (green arrows), compared with tumor cells (red arrows), are in close proximity to mouse ECs (white arrow) in tumor sections. Scale bar, 10 μm.
(B) Dotplots show the distance from mouse ECs to the nearest tumor cells and GdECs, respectively (n ≥ 15).
(C) Illustration of the transwell system used to measure EC recruitment.
(D) Fluorescence intensity showing HBMEC recruitment after co-culture with GdECs for 24 hr (n ≥ 3).
(E) qRT-PCR for CD144 and WNT5A mRNA levels in sorted pCD144-GFP– and pCD144-GFP+ cells from TS543-WNT5A and TS603 (n = 3).
(F) Fluorescence intensity showing HBMEC recruitment after co-culture with NSC media containing rWNT5A (0.5 μg/ml) or rWNT3A (0.05 μg/ml) (n = 3).
(G) Representative images of GdECs (green arrows) and mouse ECs (white arrows) in satellite lesions of various sizes. Scale bar, 20 μm.
(H) Neurosphere formation of TS543 or TS603 co-cultured with GdECs and HBMECs (n = 3). Cartoon depicts the experimental approach. Error bars represent SD of the mean; *p < 0.05 and **p < 0.01.
See also Figure S6.
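Panel B (and, for patient tissue, Figure 7C) reports for each host EC the distance to the nearest tumor cell or GdEC. Assuming segmented cell centroids in micrometres are available (the legends do not describe the segmentation output, and the function name is hypothetical), a brute-force nearest-neighbour calculation could look like this. It needs Python 3.8+ for `math.dist`; for thousands of cells a spatial index such as `scipy.spatial.cKDTree` would be faster.

```python
import math


def nearest_distances(sources, targets):
    """Distance from each source cell to its closest target cell (brute force).

    sources, targets: lists of (x, y) centroids in micrometres.
    """
    if not targets:
        raise ValueError("at least one target cell is required")
    return [min(math.dist(s, t) for t in targets) for s in sources]


# Toy coordinates in micrometres, not measurements from the study
host_ecs = [(0.0, 0.0), (10.0, 0.0)]
gdecs = [(3.0, 4.0), (10.0, 2.0)]
tumor_cells = [(0.0, 20.0)]
to_gdec = nearest_distances(host_ecs, gdecs)
to_tumor = nearest_distances(host_ecs, tumor_cells)
```

Comparing the two resulting distance distributions (for example with a rank-based test, as in Figure 7C) then indicates which cell type sits closer to host ECs.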
GBM sections, pCD144-GFP+ GdECs were consistently in close proximity to host ECs (CD34+/TRA-1-85–) in the peritumoral satellite lesions, and the larger satellite lesions possessed greater numbers of GdECs and mouse host ECs (Figure 6G). Additionally, GCV-mediated depletion of GdECs resulted in diminished satellite lesion formation (Figure 5H), although individual SOX2-positive GSCs were still present throughout the peritumoral area (data not shown). These observations suggest that GdECs are required for the maintenance and expansion of the peritumoral satellite lesions, prompting us to speculate that GdECs recruit host ECs, which may act synergistically to provide a microenvironment that supports the growth and survival of GSCs in these peritumoral areas. To test this hypothesis, we assessed tumor sphere formation to measure proliferation and self-renewal of GSCs in the presence of GdEC + HBMEC co-cultures. Strikingly, only GdEC/HBMEC co-cultures, but not GdEC or HBMEC cultures alone, increased sphere formation of the GSCs TS543 and TS603 (Figures 6H and S6E). These co-cultures also increased soft agar colony formation of TS543 and TS603 (Figures S6F and S6G). These observations gain added significance in light of emerging evidence for the crucial role of ECs in NSC/GSC niche formation that supports NSC/GSC growth and survival (Calabrese et al., 2007; Shen et al., 2004; Zhu et al., 2011). Together, these observations support our model that GSC differentiation into GdECs stimulates host EC recruitment via WNT5A to create a vascular-like niche supporting GSC growth and survival, thereby promoting tumor cell growth beyond the primary tumor microenvironment.
[Figure 7 panels: DAPI/SOX2, DAPI/CD31, DAPI/CD105, DAPI/CD133, and merged staining; raw image and score map; distance from host ECs (SOX2-/CD31+) to GSCs (SOX2+/CD31-) and GdECs (SOX2+/CD31+), Wilcoxon rank test p < 2.2e-16; SOX2/CD31/hematoxylin staining; paired intratumor/peritumor comparison (p = 0.0059); survival curves with median survival 8.4 mo and 4.9 mo; GdEC staining index (%) in primary versus recurrent tumors, Wilcoxon rank test p = 5.5e-07; GdEC signature score versus normalized WNT5A mRNA in recurrent tumors and between primary/recurrent pairs, Fisher exact test p = 0.003, odds ratio = 3.884]
Figure 7. Correlation of WNT5A-Mediated GdEC with Peritumoral Satellite Lesion and Tumor Recurrence in GBM Patients
(A) Representative images of GdECs (yellow arrows) defined using the indicated EC and GSC markers. White arrows denote host ECs. Scale bar, 20 μm.
(B) Representative images with IHC double staining and cell segmentation obtained from Caliper InForm analysis software show that GdECs (SOX2+/CD31+, yellow) lie in closer proximity to host ECs (SOX2–/CD31+, green) than GSCs (SOX2+/CD31–, red) do in tumor sections. SOX2–/CD31– cells are marked in blue. Scale bar, 20 μm.
(C) Boxplot of distances from host ECs to the nearest GSCs and GdECs, respectively (n = 300).
(D) Correlation between WNT5A mRNA expression and GdEC signature score. n = 364 (IDHwt GBMs); mRNA expression was normalized across genes.
(E) Representative image of H&E staining showing intratumoral and peritumoral regions (black dashed line) in a GBM patient's sample. Black arrows denote peritumoral satellite lesions. Scale bar, 200 μm.
(F) Representative images of GdECs (black arrows) and host ECs (red arrows) in satellite lesions of various sizes in IHC double-stained tumor sections. Scale bar, 25 μm.
(G) Primary tumors from fourteen patients were divided into two groups (low and high) by WNT5A staining index. Tumor sections with more than ten peritumoral satellite lesions were counted as the highest score. *p = 0.04 by the log-rank test for PFS between the two groups; HR = 3.45 (high versus low).
(H) Comparison of WNT5A mRNA expression between nine pairs of intratumoral and peritumoral regions from GBM patients. Each dot in the scatterplot represents a pair. The boxplot summarizes the distribution of WNT5A expression in the nine intratumoral and nine peritumoral regions, respectively.
(I) TCGA GBMs (IDHwt, n = 228) were used for PFS analysis. Red and blue lines show survival curves for the 20% of GBMs with the highest and the lowest WNT5A mRNA expression, respectively.
(J) Representative images of WNT5A (brown) and CD31 (red) staining of paired primary/recurrent tumors from one GBM patient. Scale bar, 25 μm.
(K) Unbiased quantification of GdEC frequency in primary and recurrent GBMs (n = 150).
(L) Correlation between WNT5A expression and GdEC signature scores in recurrent GBMs. The small boxplot panel shows all 81 pairs, while the large boxplot panel shows the majority of samples.
(M) Association between the differences in WNT5A mRNA expression and in GdEC signature score across 81 matched primary/recurrent GBM pairs. Each circle in the scatterplot represents a GBM pair; mRNA expression was normalized across genes.
See also Figure S7 and Tables S2, S4, S5, and S6.
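Panel I compares the 20% of IDHwt GBMs with the highest WNT5A mRNA against the 20% with the lowest. A sketch of that split, assuming a mapping from sample ID to expression value, follows. Rounding the group size down is an assumption, since the legend does not say how 20% of 228 samples was rounded, and the function name is hypothetical. The PFS comparison between the two groups could then use a log-rank test, for example `lifelines.statistics.logrank_test`.

```python
def extreme_groups(expression, fraction=0.2):
    """Split samples into the top and bottom `fraction` by expression.

    expression: dict sample ID -> WNT5A mRNA level. The group size is rounded
    down (an assumption). Returns (high_group, low_group) as lists of sample IDs.
    """
    ranked = sorted(expression, key=expression.get)  # lowest to highest
    k = int(len(ranked) * fraction)
    low = ranked[:k]
    high = ranked[len(ranked) - k:]
    return high, low


# Toy example with 10 placeholder samples (not study data)
toy = {f"s{i}": float(i) for i in range(1, 11)}
high, low = extreme_groups(toy)
```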
SUPPLEMENTAL INFORMATION
Supplemental Information includes seven figures and seven tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.039.
AUTHOR CONTRIBUTIONS
B.H., Y.A.W., and R.A.D. designed the project and analyzed data; B.H. performed the experiments; Q.W. performed bioinformatics analysis for ChIP-seq, RNA-seq, DNA microarray, and clinical datasets; S.H. performed ChIP-seq, DNA microarray, and data analysis; R.G.W.V., Y.Z., and J.Z. provided assistance with TCGA data analysis; C.-E.G.S., D.O., M.M.M., P.D., Y.W.H., G.W., Z.T., H.Y., and W.-T.L. provided assistance with cell culture and molecular biochemical experiments; Q.C. provided assistance with image capture; Q.W., J.Z., Y.Y., N.L., and L.C. provided assistance with analysis of paired GBM samples; Z.D.L., G.N.F., J.J.P., and M.S.B. provided TCGA GBM biospecimens; E.P.S., G.N.F., and L.J.C. provided assistance with pathological analysis of human GBM samples; L.C. provided intellectual contributions and designed the early study. S.H., C.-E.G.S., D.O., X.L., J.H., and D.J.S. provided critical intellectual contributions throughout the project; B.H., Y.A.W., and R.A.D. wrote the manuscript.
ACKNOWLEDGMENTS
The authors thank Dr. Raghu Kalluri for critical reading and comments; Dr. Keith L. Ligon for initial assistance with histopathological analysis and for providing GSC lines; Drs. Colin Watts, Andrea Sottoriva, and Sara G.M. Piccirillo for providing detailed information about their published gene expression profile datasets; Verlene K. Henry and her staff for their help with mouse brain implantation; Keith A. Michel and Charles V. Kingsley for assistance with MRI imaging and analysis; Shan Jiang for excellent mouse husbandry and care; Dr. Jared K. Burks for assistance with confocal imaging and the PE Vectra system; and Dr. Karen C. Dwyer and her staff for assistance with flow cytometry. The Sequencing & Non-Coding RNA Program and the Sequencing and Microarray Facility at MDACC provided sequencing services. This research is supported by UCSF Brain Tumor SPORE Tissue Bank P50 CA097257 (J.J.P.), NIH 2P50CA127001 (Y.A.W.), 5P01CA095616 (R.A.D. and L.C.), the Ben and Catherine Ivy Foundation Research Award (2009, R.A.D. and L.C.), and the Clayton Foundation (R.A.D.). The core facilities are supported by P30CA16672.

Brennan, C.W., Verhaak, R.G., McKenna, A., Campos, B., Noushmehr, H., Salama, S.R., Zheng, S., Chakravarty, D., Sanborn, J.Z., Berman, S.H., et al.; TCGA Research Network (2013). The somatic genomic landscape of glioblastoma. Cell 155, 462–477.
Calabrese, C., Poppleton, H., Kocak, M., Hogg, T.L., Fuller, C., Hamner, B., Oh, E.Y., Gaber, M.W., Finklestein, D., Allen, M., et al. (2007). A perivascular niche for brain tumor stem cells. Cancer Cell 11, 69–82.
Cancer Genome Atlas Research Network (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068.
Ceccarelli, M., Barthel, F.P., Malta, T.M., Sabedot, T.S., Salama, S.R., Murray, B.A., Morozova, O., Newton, Y., Radenbaugh, A., Pagnotta, S.M., et al.; TCGA Research Network (2016). Molecular profiling reveals biologically discrete subsets and pathways of progression in diffuse glioma. Cell 164, 550–563.
Chen, J., Li, Y., Yu, T.S., McKay, R.M., Burns, D.K., Kernie, S.G., and Parada, L.F. (2012). A restricted cell population propagates glioblastoma growth after chemotherapy. Nature 488, 522–526.
Cheng, C.W., Yeh, J.C., Fan, T.P., Smith, S.K., and Charnock-Jones, D.S. (2008). Wnt5a-mediated non-canonical Wnt signalling regulates human endothelial cell proliferation and migration. Biochem. Biophys. Res. Commun. 365, 285–290.
Cheng, L., Huang, Z., Zhou, W., Wu, Q., Donnola, S., Liu, J.K., Fang, X., Sloan, A.E., Mao, Y., Lathia, J.D., et al. (2013). Glioblastoma stem cells generate vascular pericytes to support vessel function and tumor growth. Cell 153, 139–152.
de Groot, J.F., Fuller, G., Kumar, A.J., Piao, Y., Eterovic, K., Ji, Y., and Conrad, C.A. (2010). Tumor invasion after treatment of glioblastoma with bevacizumab: Radiographic and pathologic correlation in humans and mice. Neuro-oncol. 12, 233–242.
Dunn, G.P., Rinne, M.L., Wykosky, J., Genovese, G., Quayle, S.N., Dunn, I.F., Agarwalla, P.K., Chheda, M.G., Campos, B., Wang, A., et al. (2012). Emerging insights into the molecular and cellular basis of glioblastoma. Genes Dev. 26, 756–784.
Ferrara, N., Hillan, K.J., Gerber, H.P., and Novotny, W. (2004). Discovery and development of bevacizumab, an anti-VEGF antibody for treating cancer. Nat. Rev. Drug Discov. 3, 391–400.
Folkins, C., Shaked, Y., Man, S., Tang, T., Lee, C.R., Zhu, Z., Hoffman, R.M., and Kerbel, R.S. (2009). Glioma tumor stem-like cells promote tumor
Further information and requests for reagents may be directed to, and will be fulfilled by, the corresponding author Ronald A. DePinho (rdepinho@mdanderson.org).
METHOD DETAILS
Anchorage-Independent Growth Assays, Transwell Assay and Matrigel-based Tube Formation Assay
Anchorage-independent growth assays were performed in triplicate in 6-well or 48-well plates. The indicated cells (2 × 10⁴ or 1 × 10³ per well) were seeded in NSC proliferation medium with EGF and bFGF containing 0.4% low-melting agarose, on top of a bottom layer of 1% low-melting agarose in NSC proliferation medium with EGF and FGF. After 14–21 days, colonies were stained with iodonitrotetrazolium chloride (Sigma) and counted.
Transwell assays were performed in BD FluoroBlok 96-multiwell insert systems (3.0 μm pore size) according to the manufacturer's protocol (BD Biosciences). HBMECs were seeded in transwell inserts at 1 × 10⁴ cells/well in EC media overnight. After 4 hr of starvation in EC basal media at 37°C in a 5% CO₂ incubator, the inserts were transferred into basal chambers containing the indicated chemoattractant in NSC media. After 24 hr of incubation, the inserts were transferred into a second 96-well plate containing 4 μg/mL Calcein AM (BD Biosciences) in DPBS and incubated for 1 hr at 37°C, 5% CO₂; fluorescence of invaded cells was then read at 494/517 nm (Ex/Em) on a fluorescence plate reader. Neurosphere formation was assessed by transwell assay in 24-well plates by culturing sorted GdECs or non-GdECs with HBMECs (1 × 10⁴ of the indicated cells) in transwell inserts containing NSC media, with GSCs cultured in the basal chamber at 1 cell per microliter (500 μl/well) in NSC media. GSC neurospheres were counted after 7 days.
EC tube formation was assessed with a growth factor-reduced Matrigel assay kit (BD Biosciences) in three-dimensional (3D) culture according to the manufacturer's instructions. CD133+/CD144+ cells sorted from p53DN-AKT-hNSCs were infected with lentivirus carrying shRNA targeting the indicated genes (Figures 3C and 3D) or were treated with BOX5 (100 μM) (Figures 3F and 3G). Cells were harvested 48 hr after infection or treatment and then cultured in growth factor-reduced Matrigel. Quantification was performed after 8–12 hr. To quantify tube formation, branch points (points from which 3 or more tubular branches emanate) were examined with an inverted microscope at 40× magnification and counted in 5 random fields per well.
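The counting rule above (a branch point is a point from which three or more tube segments emanate) can be expressed on a graph representation of the tube network. This sketch assumes each field has already been traced into an edge list of segment endpoints, which the protocol does not describe (counting was done by microscopy); the function names and toy networks are hypothetical.

```python
from collections import Counter


def count_branch_points(segments, min_branches=3):
    """Count junctions from which at least `min_branches` tube segments emanate.

    segments: list of (node_a, node_b) pairs describing traced tube segments.
    """
    degree = Counter()
    for a, b in segments:
        degree[a] += 1
        degree[b] += 1
    return sum(1 for d in degree.values() if d >= min_branches)


def mean_branch_points(fields):
    """Average branch-point count over the fields scored in one well."""
    return sum(count_branch_points(f) for f in fields) / len(fields)


# Toy networks: two three-way junctions (nodes 0 and 3), and an unbranched tube
branched = [(0, 1), (0, 2), (0, 3), (3, 4), (3, 5)]
unbranched = [(0, 1), (1, 2)]
```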
Identification of Histone H3K27 Status Switch Genes and AKT Activation Signature Genes
Genomic regions within 2 kilobases upstream and downstream of gene transcriptional start sites (TSSs) were examined for histone modification peaks called with Model-based Analysis of ChIP-Seq (MACS). Histone H3K27 status switch genes were identified as the group of genes with dynamic changes in H3K27me3 and H3K27ac in iGSCs compared with hNSCs. AKT activation signature genes (417) were identified by comparing gene expression profiles, requiring at least 2-fold changes in the following comparisons: 3 independent tumor sphere lines derived from p53DN-AKT-hNSCs (iGSC-1, iGSC-2, and iGSC-3) versus hNSCs; two independent p53DN-AKT-hNSC lines (with different levels of AKT activation) versus hNSCs; and one p53DN-AKT-hNSC line (higher AKT levels) versus the other p53DN-AKT-hNSC line (lower AKT levels).
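The two gene-selection steps above can be sketched in Python. `has_peak_near_tss` tests whether any peak (for example from a MACS peak file) overlaps the TSS ± 2 kb window; `h3k27_switched` flags a gene whose H3K27me3/H3K27ac presence pattern differs between hNSCs and iGSCs, which is one hypothetical reading of "dynamic histone modification changes"; and `akt_signature` keeps genes that pass the 2-fold threshold in every comparison. Requiring every comparison, and the same direction of change, are assumptions; all names and toy inputs are illustrative.

```python
import math


def has_peak_near_tss(tss, peaks, window=2000):
    """True if any peak overlaps TSS +/- `window` bp on the same chromosome.

    tss: (chrom, position); peaks: list of (chrom, start, end).
    """
    chrom, pos = tss
    lo, hi = pos - window, pos + window
    return any(c == chrom and start <= hi and end >= lo for c, start, end in peaks)


def h3k27_switched(tss, hnsc_me3, hnsc_ac, igsc_me3, igsc_ac, window=2000):
    """Hypothetical rule: the (H3K27me3, H3K27ac) presence pattern at the TSS
    differs between hNSCs and iGSCs."""
    before = (has_peak_near_tss(tss, hnsc_me3, window), has_peak_near_tss(tss, hnsc_ac, window))
    after = (has_peak_near_tss(tss, igsc_me3, window), has_peak_near_tss(tss, igsc_ac, window))
    return before != after


def akt_signature(comparisons, min_fold=2.0):
    """Genes changed >= `min_fold` in the same direction in every comparison.

    comparisons: list of dicts gene -> log2 fold change, one dict per comparison
    (for example, the six comparisons listed above).
    """
    cutoff = math.log2(min_fold)
    shared = set.intersection(*(set(c) for c in comparisons))
    hits = []
    for gene in shared:
        values = [c[gene] for c in comparisons]
        if all(v >= cutoff for v in values) or all(v <= -cutoff for v in values):
            hits.append(gene)
    return sorted(hits)
```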
For quantification of microvessel density (MVD), images of tumor sections with IF or IHC staining were captured using a Pannoramic 250 Flash II digital slide scanner. Measurement was performed in a single area of intratumoral or peritumoral tumor (0.178 mm² in Pannoramic view) representative of the highest microvessel density ("hot spot"). CD34-positive cells or microvessels were counted. Five fields in each tumor were randomly selected for MVD analysis, and statistical analysis was performed using Welch's t test in GraphPad Prism 6.
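Welch's t test, used here rather than Student's test, does not assume equal variances between groups. A minimal sketch of the statistic and its Welch–Satterthwaite degrees of freedom follows, together with an optional conversion of per-field counts to vessels per mm² based on the 0.178 mm² field stated above. The conversion is only illustrative (the Methods report counts per field), and the function names are hypothetical. The p value would then come from the t distribution with these fractional degrees of freedom, for example via `scipy.stats.ttest_ind(x, y, equal_var=False)`.

```python
import math
from statistics import mean, variance

HOTSPOT_AREA_MM2 = 0.178  # field area stated in the Methods (Pannoramic view)


def vessels_per_mm2(counts, area_mm2=HOTSPOT_AREA_MM2):
    """Convert CD34+ vessel counts per field into densities per mm^2."""
    return [n / area_mm2 for n in counts]


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of the mean, group x
    vy = variance(y) / len(y)  # squared standard error of the mean, group y
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

With equal group sizes and equal variances, the Welch degrees of freedom reduce to n1 + n2 - 2, the Student's value.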
GdECs were quantified by co-localization analysis using the Caliper Vectra Image System and InForm software. Briefly, IF- or IHC-stained slides (double staining with Warp Red and DAB) were loaded onto the Vectra slide scanner. Vectra Nuance 3.0.0 software was used to build the spectral libraries, each from a single chromogen (e.g., DAPI, AlexaFluor-488, AlexaFluor-594, DAB, Warp Red, hematoxylin). Nuance multispectral image cubes were acquired with a 20× objective lens (0.5 μm/pixel) using a full CCD frame at 1 × 1 binning (1360 × 1024 pixels) for analysis. For GdECs in IF-stained xenograft tumors (Figure 5D), at least 3 image fields from 3
Data Resources
The gene expression profile by microarray and the histone landscape by ChIP-Seq in this paper have been deposited in NCBI GEO:
GSE85615 and GSE86624.
[Figure S1 panels: overall survival (months) by total S6, S6_pS235/236, and S6_pS240/244 levels (13.5 mo vs 4.5 mo); H3K27me3 and H3K27ac mean score from -2 to +2 kb around the TSS in hNSCs and iGSC-1, iGSC-2, and iGSC-3; survival (%) over weeks; Nestin/DAPI and Tuj1/GFAP/DAPI staining in NSC media and with 1% FBS; relative mRNA levels of CD144, VEGFR2, CD31, TIE2, vWF, and CD105; Dil-AcLDL/DAPI in NSC and EC media; DAPI/CD34/TRA-1-85 and DAPI/CD31/TRA-1-85 staining in Tumor-1 and Tumor-2]
Figure S1. Characterization of EC Phenotypes in Tumor Neurospheres Derived from p53DN-AKT-hNSCs, Related to Figure 1
(A) Overall survival relative to the level of AKT pathway activation in TCGA GBM cohorts. TCGA GBM samples with IDH wild-type status, TP53 mutations, proteomic data (RPPA), and clinical data were divided into two groups according to the indicated protein levels using an optimal cutoff. Patient survival is shown relative to the levels of total AKT, AKT-pT308, AKT-pS473, S6, S6-pS235/236, and S6-pS240/244.
(B) Images of soft agar colony formation assays in 6-well plates showing transformation of hNSCs expressing p53DN or p53DN and myr-AKT (p53DN-AKT).
[Figure S2 panels: CD144 and CD31 staining; DMSO versus RAPA; CD105, VEGFR2, vWF, and Dil-AcLDL/DAPI in HUVECs; phase contrast and Dil-AcLDL/DAPI; EC signature score for HUVEC and the CD133-/CD144-, CD133+/CD144-, CD133+/CD144+, and CD133-/CD144+ fractions; DAPI/eNOS/VEGFR2; tube formation by TS543, TS576, and TS586]
Figure S2. AKT Activation Induces Endothelial Lineage Differentiation of hNSCs, Related to Figure 2
(A) IF staining of CD144 and CD31 in hNSCs expressing empty vector (Vector), p53DN-transduced hNSCs (p53DN), and p53DN-AKT-transduced hNSCs (p53DN-AKT). Scale bars, 50 μm.
(B) Immunofluorescence analysis of HUVECs cultured in EC media for EC marker (CD105, VEGFR2, and vWF) expression and functional uptake of DiI-AcLDL. Scale bar, 40 μm.
(C) IF staining for EC marker (CD105 and VEGFR2) expression and DiI-AcLDL uptake with rapamycin (RAPA) treatment (50 nM) in sorted CD133+/CD144+ cells from p53DN-AKT-hNSCs. Scale bars, 50 μm.
(D) Representative images showing tubular network formation and DiI-AcLDL uptake of sorted CD133+/CD144+ cells from p53DN-AKT-hNSCs under EC culture conditions. Scale bars, 100 μm (top) and 50 μm (bottom).
(E) EC signature scores were calculated using the gene expression profiles of HUVECs (GSE20986) and of the sorted p53DN-AKT-hNSC cell fractions CD133-/CD144-, CD133+/CD144-, CD133+/CD144+, and CD133-/CD144+.
(F) Immunofluorescence analysis of sorted subpopulations from p53DN-AKT-hNSCs cultured in EC media for 3 days for EC marker (VEGFR2 and eNOS) expression. Scale bar, 40 μm.
(G) Representative images showing tubular network formation on Matrigel by patient-derived GSCs (TS543, TS576, TS586, TS603, BT112, and BT147) under EC culture conditions. Scale bars, 100 μm.
Figure S3. AKT Upregulates WNT5A in EC Lineage Differentiation of hNSCs, Related to Figure 3
(A) qRT-PCR and (B) immunoblotting analyses of WNT5A expression in hNSCs, p53DN-hNSCs, and p53DN-AKT-hNSCs.
(C) Immunoblots showing the WNT5A/CaMKII pathway in BOX5-treated (100 μM) p53DN-hNSCs overexpressing myr-AKT or WNT5A.
(D) Representative FACS plots showing the percentage of CD133+/CD144+ cells in p53DN-hNSCs that overexpress myr-AKT or WNT5A under treatment with the WNT5A antagonist BOX5 (50 μM) for 72 hr.
(E) Immunoblots showing the WNT5A/CaMKII pathway in CD133+/CD144+ cells sorted from p53DN-AKT-hNSCs with BOX5 treatment (100 μM).
[Figure image: correlation heatmap (color scale −0.6 to 0.6) of RPPA protein and phosphoprotein levels (e.g., Akt, mTOR_pS2448, S6_pS235_S236, S6_pS240_S244, p70S6K_pT389, VEGFR2), with columns labeled WNT5A, Odds Ratio, and Sig. (* p < 0.01, ** p < 0.001, *** p < 0.0001; Non-Sig.)]
[Figure image: H3K4me3 and H3K4me1,2 tracks in hNSCs and iGSCs (hg19, chr3:193,859,000; 2 kb scale), with panels labeled TCF4, PAX4, and HES1; relative mRNA levels in sorted CD133+/CD144- and CD133+/CD144+ cells; WNT5A immunoblots (Actin control) after PAX6 or DLX5 overexpression in iGSC, TS603, BT147, TS543, and TS576 cells; schematic of core transcriptional networks in NSCs (SOX2, TCF4, PAX6, FOXG1, DLX5), with Class 3 lineages (endothelial cells; WNT5A signaling) repressed and Class 2 lineages (neurons, oligodendrocytes, astrocytes) poised]
[Figure image (Vector vs WNT5A OE): tumor volume (mm^3; p = 0.0397), survival after implantation (n = 5 per group), CD34-based MVD, CD31 and vWF staining, peritumoral satellites per mm^2 (p < 0.0001), pCD144-GFP+ cells (0.61% vs 6.98%), pCD144-GFP/TUNEL/DAPI staining, DAPI/TRA-1-85 staining, and survival with GCV(−) vs GCV(+) (n = 5 per group; p = 0.012)]
[Figure S6 image; panel B (CD34-based MVD in areas with low vs high frequency of GdECs): p = 1.5e-05]
Figure S6. WNT5A-Mediated GdECs Recruit Existing ECs for GSC Growth, Related to Figure 6
(A) Representative images showing the density of existing endothelial cells (TRA-1-85-/CD34+) and GdECs (pCD144-GFP+, green arrows) in the peritumoral areas. Scale bars, 50 μm.
(B) Boxplots show the CD34-based MVD analyzed in peritumoral areas with low (less than 5%) and high (more than 5%) frequency of GdECs (n = 3 tumors, 5 fields
per tumor).
(C) Representative images show the distance between mouse endothelial cells (TRA-1-85-/CD34+, white arrows) and the nearest GdECs (pCD144-GFP+, green arrows) or tumor cells (TRA-1-85+/GFP-, red arrows) in multiple peritumoral areas (P1-P4). Scale bar, 25 μm.
(D) The number of HBMECs counted after 72 hr of treatment with or without rWNT5A (0.5 μg/ml) in serum-free EC media. Error bars represent SD of the mean, n = 3; *p < 0.05, **p < 0.01.
(E) Representative images showing neurosphere formation of TS543 and TS603 co-cultured with GdECs and/or HBMECs in transwells for 7 days. Scale bars, 200 μm.
(F and G) Soft agar colony formation assays in 48-well plates showing the anchorage-independent growth of GSCs co-cultured with GdECs and HBMECs for TS543 (F) and TS603 (G). Error bars represent SD of the mean for 5 wells.
[Figure S7 image, panels A–N; panel A: IDHwt GBMs (n = 364) vs non-tumor tissue (n = 10); panel D: Spearman correlation test, rho = 0.185, p = 0.0004]
Figure S7. WNT5A and GdECs Are Strongly Correlated with Tumor Recurrence in Human GBMs, Related to Figure 7
(A) WNT5A mRNA expression in TCGA IDHwt GBM tumors compared to non-tumor brain tissues. Gene expression was normalized by RMA and p value was
calculated by Wilcoxon Rank test.
(B) Average WNT5A mRNA levels in 12 fresh IDHwt GBM specimens from TCGA, divided into Low WNT5A (n = 6) and High WNT5A (n = 6) groups.
(C) Quantification of the percentage of GdECs (CD105+/SOX2+) in the 12 tumors from the Low and High WNT5A groups. The p value was calculated by unpaired Student's t test.
(D) Correlation between WNT5A mRNA expression and EC signature score (n = 364 IDHwt); mRNA expression was normalized across genes.
(E) Identification of GdECs in tumor vessels by an automated quantitative pathology imaging system. Representative images with IHC double-staining and cell segmentation obtained from Caliper InForm analysis software show tumor vessels with GdECs (SOX2+/CD31+, yellow) in close proximity to host ECs (SOX2-/CD31+, green) in GBM patient specimens. SOX2+/CD31- cells are marked in red and SOX2-/CD31- cells in blue. Scale bars, 20 μm.
(F) Representative IHC images show WNT5A and CD31 staining in the primary tumors of two patients with peritumoral satellite lesions. Scale bars, 25 μm (top panel); 50 μm (bottom panel).
(G) Comparison of GdEC signature scores between 9 pairs of intratumor and peritumor regions from GBM patients. Each dot represents a pair. Boxplots summarize the distribution of GdEC signature scores in the 9 intratumor and 9 peritumor regions, respectively.
(H and I) Boxplots showing WNT5A expression and GdEC signature score in 39 samples from contrast-enhancing (CE) regions and 36 samples from non-
enhancing (NE) regions from 27 different glioma patients.
(J) Representative double-stained IHC images show WNT5A and CD31 staining in paired primary and recurrent GBMs from 2 patients. Scale bars, 25 μm.
(K) Quantification of WNT5A and CD31 staining index in 14 paired primary and recurrent GBMs. The p values were calculated by Wilcoxon signed-rank test.
(L) Correlation between WNT5A expression and GdEC signature scores in primary GBMs. The boxplot inset shows all 81 pairs, while the large boxplot panel shows the majority of samples (n = 69).
(M) Association between the differences in WNT5A mRNA expression and in EC signature score across 81 matched primary/recurrent GBM pairs. Each circle represents a GBM pair. The mRNA expression was normalized across genes.
(N) Cartoon showing the model for GSC-EC differentiation and recruitment contributing to satellite lesion formation and tumor recurrence. It may be possible to block tumor recurrence by targeting this process.
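Panel (K) relies on the Wilcoxon signed-rank test for paired primary/recurrent samples. As a minimal sketch (not the authors' analysis code), the signed-rank sums W+ and W− can be computed as below. The function name and the example lists are hypothetical. This sketch drops zero differences, which is one common convention, and gives tied absolute differences their average rank. Turning W into a p value needs the null distribution, which libraries such as `scipy.stats.wilcoxon` provide.

```python
from typing import List, Tuple


def signed_rank_sums(primary: List[float], recurrent: List[float]) -> Tuple[float, float]:
    """Return (W+, W-) for paired samples, using differences recurrent - primary."""
    # Drop pairs with zero difference (one common convention).
    diffs = [r - p for p, r in zip(primary, recurrent) if r != p]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over a run of tied absolute differences.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(rank for rank, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical staining indices for four primary/recurrent pairs.
print(signed_rank_sums([1, 2, 3, 4], [2, 4, 6, 8]))
```

The ranks 1..n sum to n(n + 1)/2, so W+ + W− always equals that total. This makes a quick sanity check.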
Article
Correspondence
kateri.moore@mssm.edu
In Brief
HSCs count and remember the number of
times they have divided to limit their cell
divisions, a mechanism that may underlie
many phenomena associated with HSC
aging.
Highlights
• A rare population of dormant LR-HSCs persists throughout adult life
Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1496, New York, NY 10029, USA
4Mathematical Sciences
5Centre for Human Development, Stem Cells and Regeneration, Faculty of Medicine
6Institute for Life Sciences
*Correspondence: kateri.moore@mssm.edu
http://dx.doi.org/10.1016/j.cell.2016.10.022
1296 Cell 167, 1296–1309, November 17, 2016 Published by Elsevier Inc.
Figure 1. LR-HSCs Persist in BM throughout Life and Contain All LT-HSC Activity in Aging BM
(A) Schematic of long-term dox treatments. Two- to 4-month-old 34/H2BGFP mice were placed on dox for periods ranging from 3–22 months. At the end of the dox chase, BM was analyzed for the presence of LR-HSCs.
(B) Histogram of the LSKCD48-Flk2–CD150+ HSC compartment before and after 12-month dox chase. LR-HSCs were determined by gating above the back-
ground GFP levels of single transgenic TetO-H2BGFP HSCs.
(C) Time course of label dilution after initiation of dox chase; n = 2–15 mice per time point.
(D) Percent of HSCs that are label-retaining after 10–22 months of dox chase (sLR-HSCs). n = 42 mice from 12 independent experiments.
(E–L) HSC populations were sorted from 19-month-old mice chased with dox for 15 months into Total, GFPHi, and GFPLo HSC populations. 200 cells from each
population were competitively transplanted per mouse. (E) Gating strategy for Total, GFPHi, and GFPLo HSC fractions. (F and G) Blood chimerism of granulocytes
(F), and total white blood cells (G) during primary and secondary transplants. (H–J) Analysis of donor-derived stem and progenitor cell compartments in recipient
BM. Gating strategy (H) and quantification of donor-derived HSPCs in primary (I) and secondary (J) transplantations after 22 and 24 weeks, respectively. (K and L)
Lineage distribution of donor-derived peripheral blood in primary (K) and secondary (L) hosts at 22 and 24 weeks, respectively. n = 8–14 mice per group from two
independent experiments. Data are displayed as the mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001 by Welch’s t test.
See also Figures S1 and S2 and Table S1.
(A) Reconstitution curves of donor-derived (CD45.2+) total white blood cells, myeloid, B cells, and T cells for each transplanted mouse through 24 weeks in
primary and secondary recipients. The transition from primary to secondary transplantation is marked by the x axis break. The horizontal line marks the threshold
of successful reconstitution. Secondary transplants are displayed as the mean ± SEM.
(B) Examples of the five reconstitution patterns observed. Definition of repopulation patterns: myeloid-restricted only repopulated myeloid cells; bipotent pro-
genitors gave rise to myeloid and B cells; ST-HSCs showed transient repopulation of all three lineages, with donor chimerism of at least one lineage dropping
below threshold by 24 weeks after primary transplantation; IT-HSCs repopulated all three lineages, but had at least one lineage drop below threshold by 24 weeks
after secondary transplantation; and LT-HSCs maintained repopulation in all three lineages above threshold throughout both primary and secondary
transplantation.
(C) Distribution of repopulating cell types found within each aging HSC compartment. The zoomed region represents 10% of the Total HSC compartment.
(D) Heatmaps displaying the regeneration of primitive BM populations by clonally transplanted aging HSCs after primary transplant. Transplanted cell populations are listed above each column: initially sorted cell (left) and retrospectively categorized repopulating cell (right). Regenerated HSPC types are listed to the right. The darker the chamber, the greater the proportion of reconstituted mice that regenerated the given cell type. Numbers within each chamber give the proportion (as a decimal) of reconstituted mice with each cell type.
See also Figure S5 and Table S2.
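The five repopulation patterns defined in (B) amount to a small decision rule. The sketch below encodes that rule under several stated assumptions:

- The input layout is hypothetical: per-lineage lists of donor chimerism for the primary and secondary hosts, each ending at the 24-week time point.
- The threshold value is not given here, so it is left as a parameter.
- "Dropping below threshold by 24 weeks" is judged from the final value only, and "maintained ... throughout" is likewise approximated by the 24-week values.

It is an illustration of the stated definitions, not the authors' classification code.

```python
from typing import Dict, List, Tuple

# Hypothetical input: lineage -> (primary chimerism %, secondary chimerism %),
# each list ordered in time and ending at the 24-week time point.
Chimerism = Dict[str, Tuple[List[float], List[float]]]
LINEAGES = ("myeloid", "B", "T")


def classify_repopulation(data: Chimerism, threshold: float) -> str:
    """Assign one of the five repopulation patterns defined in the legend."""
    # A lineage counts as repopulated if it ever reaches threshold in the primary host.
    repopulated = {lin for lin in LINEAGES if any(v >= threshold for v in data[lin][0])}
    if repopulated == {"myeloid"}:
        return "myeloid-restricted"
    if repopulated == {"myeloid", "B"}:
        return "bipotent progenitor"
    if repopulated != set(LINEAGES):
        return "unclassified"
    # Simplification: judge "dropping below threshold" by the final (24-week) value.
    if any(data[lin][0][-1] < threshold for lin in LINEAGES):
        return "ST-HSC"
    if any(data[lin][1][-1] < threshold for lin in LINEAGES):
        return "IT-HSC"
    return "LT-HSC"
```

The primary-host check runs before the secondary-host check. As a result, mice classified as ST-HSC or progenitor-repopulated do not need secondary data.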
Figure 4. HSCs Count Symmetric Self-Renewal Divisions throughout Adult Life and Progress toward Dormancy
(A–C) Analysis of H2BGFP subpopulations within the GFPHi LR-HSC compartment. (A) Histogram displaying the H2BGFP peaks 0–4 visible within the LR-HSC compartment of young (3–4 months on dox) and aging mice (14–22 months on dox). (B) Quantification of (A). n = 21 and 13 mice from six independent experiments for young and aging mice, respectively. (C) Least-squares fitting of single-cell GFP intensity data collected from LR-HSCs found within each GFP peak of young mice. Observed experimental data are plotted as open circles, while predictions of a theoretical model in which H2BGFP concentration is reduced by a factor of 2 with each cell division are given by the dashed blue line. Experimental data were collected from 1,568 single LR-HSCs from six independent experiments.
(D) Percentage of LR-HSCs within the HSC compartment. n = 19 and 13 young and aging mice, respectively.
(E) Absolute number of LR-HSCs per long bone in young and aging mice. Predicted LR-HSC numbers were generated by extrapolating the expansion of each
young LR-HSC data point based on the distribution of cells found in peaks 0–4 for each mouse using the model in (F), then corrected based on the average
distribution of cells found in aging mice in (B). n = 17 and 10 mice from five and three independent experiments for young and aging mice, respectively.
(F) Symmetric self-renewal expansion model of LR-HSCs. As LR-HSCs slowly divide throughout adult life they transition from peak 0 to peak 4, symmetrically self-
renewing to double their numbers with each cell division. Arrows in the histogram depict the expansion capacity of cells as they progressively divide to reach
peak 4. Numbers displayed are the average numbers of LR-HSCs in each peak per long bone from 17 young mice in five independent experiments. Boxed in red is
the summation of LR-HSCs predicted to accumulate in peak 4 with aging.
(G) Mathematical modeling of cell-cycle progression as a function of divisional history within the LR-HSC compartment. Five models were considered (see the
STAR Methods for details). Displayed are representations of cell-cycle time progressions for each model (red dashed lines), as well as the experimentally
determined (open circles) and model-predicted sLR-HSC numbers (dashed blue curves) found in each GFP peak of aging mice. Cell-cycle times for the step
function and super-exponential models are actual times predicted by the model. As the constant, linear, and exponential models do not fit the data well, their
corresponding cell-cycle times are only visual representations.
(H and I) Distribution of LR-HSCs across each GFP peak (H), and quantification of LR-HSC absolute numbers after various lengths of dox chase (I). Legends refer to the length of dox chase. Data are representative of two to six independent experiments per group.
(J) Cell-cycle analysis of GFP Peak cells in young (5 months old, 3 month dox chase, n = 3) and aging (11 months old, 9 month dox chase, n = 2) mice. Each mouse
represents an independent experiment. Data are displayed as the mean ± SEM. **p < 0.01, ***p < 0.001 by Welch’s t test.
See also Figure S6.
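The expansion logic in (F) can be made concrete with a short calculation. Under symmetric self-renewal, a cell currently in peak n doubles at each of the 4 − n divisions it needs to reach peak 4, so it contributes 2^(4−n) cells to peak 4. The sketch below sums these contributions. The counts in the example are hypothetical, not the measured values. The sketch also omits the correction based on the aging distribution in (B) that the legend describes for (E).

```python
from typing import Sequence

LAST_PEAK = 4  # peaks 0-4 are resolved in the LR-HSC compartment


def predicted_peak4_total(counts_per_peak: Sequence[float]) -> float:
    """Cells expected in peak 4 once every LR-HSC has divided up to peak 4.

    counts_per_peak[n] is the number of LR-HSCs per long bone currently in peak n.
    Each symmetric self-renewal division doubles the cell number, so a peak-n cell
    yields 2 ** (LAST_PEAK - n) descendants in peak 4.
    """
    return sum(count * 2 ** (LAST_PEAK - n) for n, count in enumerate(counts_per_peak))


# Hypothetical example: one cell in each of peaks 0-4.
print(predicted_peak4_total([1, 1, 1, 1, 1]))
```

A single peak-0 cell contributes 16 cells. A cell already in peak 4 contributes only itself. This asymmetry is why the deeply label-retaining peaks dominate the predicted accumulation.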
accurately documenting large numbers of cell divisions in vivo have largely precluded observations of these phenomena. Here, we used H2BGFP label dilution to track HSC cell divisions accrued through the process of aging and investigate their impact on regenerative potential.

Divisional History and Heterogeneity of the Aging HSC Compartment
Previous work using H2BGFP label-retention suggests that the fewer times HSCs cycle over time, the greater their transplantable regenerative potential (Foudi et al., 2009; Qiu et al., 2014; Wilson et al., 2008). We showed that label-retaining cells within the functionally heterogeneous LSK population could long-term repopulate a mouse at a frequency of 1 in 2.9 cells (Qiu et al., 2014). Here, we used label-retention to dissect the heterogeneity of the aging HSC compartment. We find that clonal sLR-HSCs function exclusively with IT- or LT-HSC potential. On the other hand, the proliferative non-LR-HSCs contain a diverse class of progenitors with reduced self-renewal and differentiation potential. This non-LR-HSC population, being the product
but actually expands over time. This partially explains the observation that aging HSCs show impaired function upon transplantation due to the diminished frequency of LR-HSCs within the stem cell compartment, while also revealing how LT-HSCs increase with aging via an increased absolute number of LR-HSCs within the whole BM. This compartment symmetrically self-renews over time, precluding contribution to homeostatic hematopoiesis. The non-LR-HSC compartment also expands over time, but to a greater extent, and contains cells with limited self-renewal and differentiation capacity upon transplantation. These cells are likely to support homeostatic hematopoiesis. With aging, the non-LR-HSC compartment becomes dominated by CD41-expressing cells enriched for myeloid-restricted repopulation potential. This partially accounts for the increased propensity of the compartment as a whole to produce greater myeloid cell output with age.

It remains uncertain whether the non-LR-HSC compartment is directly derived from the LR-HSC compartment as a function of continuously accumulated proliferative history. Because the aging non-LR-HSC compartment shows attenuated repopulation potential and increased myeloid cell output, this would be consistent with recent studies indicating that increased divisional history recapitulates these hallmarks of aged HSCs (Beerman et al., 2013; Walter et al., 2015). However, it is not yet clear whether the aging LR- and non-LR-HSC compartments differ in other described phenotypes of aged HSCs, including the surrogate DNA damage marker γH2AX foci or cdc42 localization.

It will be interesting to investigate the underlying molecular mechanisms responsible for this cellular memory. Several studies have tracked cell division numbers in Bacillus subtilis sporulation, Drosophila spermatogenesis, and oligodendrocyte precursor differentiation. These studies reveal that after several
Morrison, S.J., Wandycz, A.M., Akashi, K., Globerson, A., and Weissman, I.L. (1996). The aging of hematopoietic stem cells. Nat. Med. 2, 1011–1016.
Müller-Sieburg, C.E., Cho, R.H., Thoman, M., Adkins, B., and Sieburg, H.B. (2002). Deterministic regulation of hematopoietic stem cell self-renewal and differentiation. Blood 100, 1302–1309.
Muller-Sieburg, C.E., Cho, R.H., Karlsson, L., Huang, J.F., and Sieburg, H.B. (2004). Myeloid-biased hematopoietic stem cells have extensive self-renewal capacity but generate diminished lymphoid progeny with impaired IL-7 responsiveness. Blood 103, 4111–4118.
Nakamura-Ishizu, A., Takizawa, H., and Suda, T. (2014). The analysis, roles and regulation of quiescence in hematopoietic stem cells. Development 141, 4656–4666.
Pang, W.W., Price, E.A., Sahoo, D., Beerman, I., Maloney, W.J., Rossi, D.J., Schrier, S.L., and Weissman, I.L. (2011). Human bone marrow hematopoietic stem cells are increased in frequency and myeloid-biased with age. Proc. Natl. Acad. Sci. USA 108, 20012–20017.
Paul, F., Arkin, Y., Giladi, A., Jaitin, D.A., Kenigsberg, E., Keren-Shaul, H., Winter, D., Lara-Astiaso, D., Gury, M., Weiner, A., et al. (2015). Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell 163, 1663–1677.
Perié, L., Duffy, K.R., Kok, L., de Boer, R.J., and Schumacher, T.N. (2015). The branching point in erythro-myeloid differentiation. Cell 163, 1655–1662.
Pietras, E.M., Warr, M.R., and Passegué, E. (2011). Cell cycle regulation in hematopoietic stem cells. J. Cell Biol. 195, 709–720.
Qiu, J., Papatsenko, D., Niu, X., Schaniel, C., and Moore, K. (2014). Divisional history and hematopoietic stem cell function during homeostasis. Stem Cell Reports 2, 473–490.
Si, K., Lindquist, S., and Kandel, E.R. (2003b). A neuronal isoform of the aplysia CPEB has prion-like properties. Cell 115, 879–891.
Sieburg, H.B., and Müller-Sieburg, C.E. (2004). Classification of short kinetics by shape. In Silico Biol. (Gedrukt) 4, 209–217.
Sudo, K., Ema, H., Morita, Y., and Nakauchi, H. (2000). Age-associated characteristics of murine hematopoietic stem cells. J. Exp. Med. 192, 1273–1280.
Sun, J., Ramos, A., Chapman, B., Johnnidis, J.B., Le, L., Ho, Y.J., Klein, A., Hofmann, O., and Camargo, F.D. (2014). Clonal dynamics of native haematopoiesis. Nature 514, 322–327.
van der Wath, R.C., Wilson, A., Laurenti, E., Trumpp, A., and Liò, P. (2009). Estimating dormant and active hematopoietic stem cell kinetics through extensive modeling of bromodeoxyuridine label-retaining cell dynamics. PLoS ONE 4, e6972.
Walter, D., Lier, A., Geiselhart, A., Thalheimer, F.B., Huntscha, S., Sobotta, M.C., Moehrle, B., Brocks, D., Bayindir, I., Kaschutnig, P., et al. (2015). Exit from dormancy provokes DNA-damage-induced attrition in haematopoietic stem cells. Nature 520, 549–552.
Wilson, A., Laurenti, E., Oser, G., van der Wath, R.C., Blanco-Bose, W., Jaworski, M., Offner, S., Dunant, C.F., Eshkind, L., Bockamp, E., et al. (2008). Hematopoietic stem cells reversibly switch from dormancy to self-renewal during homeostasis and repair. Cell 135, 1118–1129.
Yamamoto, R., Morita, Y., Ooehara, J., Hamanaka, S., Onodera, M., Rudolph, K.L., Ema, H., and Nakauchi, H. (2013). Clonal analysis unveils self-renewing lineage-restricted progenitors generated directly from hematopoietic stem cells. Cell 154, 1112–1126.
Information and requests for reagents may be directed to the lead contact Kateri Moore (kateri.moore@mssm.edu).
Tg(tetO-HIST1H2BJ/GFP)47Efu/J (TetO-H2BGFP), hCD34-tTA (hCD34), C57BL/6 (B6), and the congenic B6.SJL-Ptprca Pepcb/BoyJ (SJL) mice were acquired and maintained as previously described (Qiu et al., 2014). Double transgenic 34/H2BGFP mice were derived by crossbreeding the single transgenic TetO-H2BGFP and hCD34 mice. F1 mice from this cross were used
for all experiments, with the exception of cell cycle analysis in Figures S2I and S2J, which was performed on B6 BM. The F1 progeny
of crosses from TetO-H2BGFP and B6 mice were used for background GFP gating controls in all label-dilution experiments. Dox was
administered through the drinking water at 1 mg/ml to mice beginning between 2-4 months of age and changed twice weekly. Both
male and female mice were used in all experiments. Sample sizes for experiments were determined without formal power calcula-
tions. Animal experiments were approved by the Institutional Animal Care and Use Committee and conducted in accordance with the
Animal Welfare Act.
METHOD DETAILS
Transplantation Assays
HSCs were sorted from 19-month-old 34/H2BGFP mice (CD45.2) that had been chased with dox for 15 or 17 months. Sorting separated the HSCs into populations based on label retention. Sorted cells from each population were injected retro-orbitally into lethally irradiated SJL (CD45.1) mice (2 rounds of 550 rads, three hours apart) at a dosage of 200 sorted cells plus 1.3 × 10⁵ Lin/CD48/Flk2-depleted competitor BM cells (CD45.1) per mouse. Mice were bled from the retro-orbital venous plexus at timed intervals post-transplantation, and red blood cells were lysed. The contribution of donor-derived CD45.2+ cells to the B cell (B220+), T cell (CD4/CD8+), and myeloid (CD11b/Gr1+) lineages was then assessed. Granulocytes were identified as SSCHiCD11b/Gr1+ cells. Secondary transplants were performed at 24 weeks after primary transplantation, and 5 × 10⁶ cells of pooled whole BM from each group were transplanted into secondary hosts.
Cell-Cycle Analysis
Cells were stained and prepared as above, then fixed in 2% methanol-free paraformaldehyde diluted in PBS. Cells were then washed
three times with PBS containing 5% NCS, permeabilized in 0.2% Triton X-100, then stained with anti-Ki-67 (PE, eBioscience), and
DAPI prior to analysis. For cell cycle analysis of GFP peak cells, LSKCD48-Flk2-CD150+ cells were first enrichment sorted prior to
fixation, subsequent staining, and cell cycle analysis.
Mathematical Modeling
Assuming that GFP dilutes by a factor of 2 with each cell division, the relative positions of the GFP peaks in the LR-HSC population are
described by the following model:
$$y_n = \frac{y_0 - c}{2^n} + c,$$
where $y_n$ is the mean fluorescence intensity (MFI) at the $n$th peak and $c$ is a constant that accounts for background fluorescence. The MFI of GFP levels in HSCs of single transgenic TetO-H2BGFP mice was used to estimate $c$.
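Because each division halves the H2BGFP label, the peak model is linear in the unknown amplitude y0 − c once c is fixed from single-transgenic controls. The sketch below generates expected peak positions and recovers y0 by ordinary least squares with c held fixed. It is one simple way to carry out the kind of comparison shown in Figure 4C. The function names and MFI values are hypothetical, and this is not the authors' code.

```python
from typing import List, Sequence


def predicted_peak_mfi(y0: float, c: float, n_peaks: int = 5) -> List[float]:
    """Expected MFI of peaks 0..n_peaks-1 when the label halves at each division."""
    return [(y0 - c) / 2 ** n + c for n in range(n_peaks)]


def fit_y0(observed_mfi: Sequence[float], c: float) -> float:
    """Least-squares estimate of y0 with the background c held fixed.

    The model y_n - c = (y0 - c) * 2**-n is linear in A = y0 - c, so the
    least-squares solution is A = sum(w_n * z_n) / sum(w_n**2), with
    w_n = 2**-n and z_n = observed_n - c.
    """
    weights = [2.0 ** -n for n in range(len(observed_mfi))]
    shifted = [y - c for y in observed_mfi]
    amplitude = sum(w * z for w, z in zip(weights, shifted)) / sum(w * w for w in weights)
    return amplitude + c


# Hypothetical values: peak-0 MFI of 1000 and background of 40.
peaks = predicted_peak_mfi(1000.0, 40.0)
print(peaks, fit_y0(peaks, 40.0))
```

Holding c fixed keeps the fit to a single closed-form parameter. A joint fit of y0 and c would need two-parameter least squares instead.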
Expansion of LR-HSC numbers during aging was described by the following model:
$$\begin{aligned}
\frac{dx_0}{dt} &= -k_0 x_0,\\
\frac{dx_1}{dt} &= 2k_0 x_0 - k_1 x_1,\\
\frac{dx_2}{dt} &= 2k_1 x_1 - k_2 x_2,\\
\frac{dx_3}{dt} &= 2k_2 x_2 - k_3 x_3,\\
\frac{dx_4}{dt} &= 2k_3 x_3 - k_4 x_4,
\end{aligned}$$
where $x_n(t)$ is the expected number of cells per long bone that have divided $n$ ($n = 0, 1, 2, 3, 4$) times since the start of the dox chase, and $\ln(2)/k_n$ is the expected length of time to the next division for a cell that has previously divided $n$ times. The above system is linear and may therefore be directly integrated and easily compared with experimental data. To determine how divisional history affects cell-cycle time, we fit the following functional forms for $k_n$ to experimental data using least-squares fitting.
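The linear system above can be integrated numerically for any choice of rates $k_n$. A minimal sketch follows, using classical fourth-order Runge-Kutta in plain Python. The rate values are hypothetical placeholders, not the fitted functional forms. The closed-form $x_1(t)$ is included only as an accuracy check.

```python
import math
from typing import List, Sequence


def peak_model_rhs(x: Sequence[float], rates: Sequence[float]) -> List[float]:
    """dx_n/dt for the label-dilution expansion model (x[n]: cells that divided n times)."""
    dxdt = [-rates[0] * x[0]]
    for n in range(1, len(x)):
        # Each division of a cell in peak n-1 produces two cells in peak n.
        dxdt.append(2 * rates[n - 1] * x[n - 1] - rates[n] * x[n])
    return dxdt


def integrate_rk4(x_init: Sequence[float], rates: Sequence[float],
                  t_end: float, steps: int = 10_000) -> List[float]:
    """Classical fourth-order Runge-Kutta integration from t = 0 to t_end."""
    h = t_end / steps
    x = list(x_init)
    for _ in range(steps):
        s1 = peak_model_rhs(x, rates)
        s2 = peak_model_rhs([xi + h / 2 * d for xi, d in zip(x, s1)], rates)
        s3 = peak_model_rhs([xi + h / 2 * d for xi, d in zip(x, s2)], rates)
        s4 = peak_model_rhs([xi + h * d for xi, d in zip(x, s3)], rates)
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, s1, s2, s3, s4)]
    return x


def x1_closed_form(x0_init: float, k0: float, k1: float, t: float) -> float:
    """Analytical x_1(t) when only peak 0 is populated at t = 0 and k0 != k1."""
    return 2 * k0 * x0_init * (math.exp(-k0 * t) - math.exp(-k1 * t)) / (k1 - k0)


# Hypothetical rates per month, with k_4 = 0 so that peak 4 is absorbing.
final = integrate_rk4([100.0, 0, 0, 0, 0], [0.1, 0.2, 0.3, 0.4, 0.0], t_end=10.0)
print(final)
```

Summing the equations with weights $2^{-n}$ gives $\frac{d}{dt}\sum_n x_n 2^{-n} = -k_4 x_4/16$. With $k_4 = 0$, the weighted sum of $x_n/2^n$ over all peaks therefore stays equal to the starting peak-0 count. This is a convenient sanity check on any integration.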
To account for the gain of CD41 expression in the LR-HSC fraction during aging, we assume the following: each time a CD41-negative LR-HSC divides, each daughter gains CD41 expression with probability $a$ and remains CD41 negative with probability $1 - a$. The dynamics of CD41 expression within the label-retaining fraction are then described by the following model:
$$\begin{aligned}
\frac{dx_0}{dt} &= -k_0 x_0, & \frac{dy_0}{dt} &= -k_0 y_0,\\
\frac{dx_1}{dt} &= 2(1-a)k_0 x_0 - k_1 x_1, & \frac{dy_1}{dt} &= 2k_0 y_0 + 2a k_0 x_0 - k_1 y_1,\\
\frac{dx_2}{dt} &= 2(1-a)k_1 x_1 - k_2 x_2, & \frac{dy_2}{dt} &= 2k_1 y_1 + 2a k_1 x_1 - k_2 y_2,\\
\frac{dx_3}{dt} &= 2(1-a)k_2 x_2 - k_3 x_3, & \frac{dy_3}{dt} &= 2k_2 y_2 + 2a k_2 x_2 - k_3 y_3,\\
\frac{dx_4}{dt} &= 2(1-a)k_3 x_3 - k_4 x_4, & \frac{dy_4}{dt} &= 2k_3 y_3 + 2a k_3 x_3 - k_4 y_4,
\end{aligned}$$
where $x_n(t)$ and $y_n(t)$ are the expected numbers of CD41- and CD41+ LR-HSCs per long bone that have divided $n$ ($n = 0, 1, 2, 3, 4$) times since the start of the dox chase. To minimize the number of free parameters, we assumed that CD41 status does not alter cell-cycle progression and used model 5, above, to account for the onset of cellular quiescence. In this case, the full model has three free parameters ($k$, $b$, and $a$).
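A minimal numerical sketch of the CD41 model follows. It uses simple per-peak rates in place of model 5 (whose parameterization in $k$ and $b$ is not reproduced here). The rates and the value of $a$ are hypothetical, and the state vector stores the CD41- counts first, then the CD41+ counts.

```python
from typing import Callable, List, Sequence


def cd41_model_rhs(state: Sequence[float], rates: Sequence[float], a: float) -> List[float]:
    """d/dt of (x_0..x_{m-1}, y_0..y_{m-1}): CD41- counts first, then CD41+ counts."""
    m = len(rates)
    x, y = state[:m], state[m:]
    dx = [-rates[0] * x[0]]
    dy = [-rates[0] * y[0]]
    for n in range(1, m):
        # A dividing CD41- cell yields on average 2(1 - a) CD41- and 2a CD41+
        # daughters; CD41+ cells only produce CD41+ daughters.
        dx.append(2 * (1 - a) * rates[n - 1] * x[n - 1] - rates[n] * x[n])
        dy.append(2 * rates[n - 1] * y[n - 1] + 2 * a * rates[n - 1] * x[n - 1] - rates[n] * y[n])
    return dx + dy


def rk4(f: Callable[[List[float]], List[float]], state: Sequence[float],
        t_end: float, steps: int = 5000) -> List[float]:
    """Classical fourth-order Runge-Kutta integration of ds/dt = f(s) from 0 to t_end."""
    h = t_end / steps
    s = list(state)
    for _ in range(steps):
        s1 = f(s)
        s2 = f([v + h / 2 * d for v, d in zip(s, s1)])
        s3 = f([v + h / 2 * d for v, d in zip(s, s2)])
        s4 = f([v + h * d for v, d in zip(s, s3)])
        s = [v + h / 6 * (p + 2 * q + 2 * r + w) for v, p, q, r, w in zip(s, s1, s2, s3, s4)]
    return s


# Hypothetical example: per-peak rates (1/month) and CD41 gain probability a.
rates = [0.1, 0.2, 0.3, 0.4, 0.5]
a = 0.3
start = [100.0, 0.0, 0.0, 0.0, 0.0] + [0.0] * 5
final = rk4(lambda s: cd41_model_rhs(s, rates, a), start, t_end=10.0)
x_final, y_final = final[:5], final[5:]
print(x_final, y_final)
```

Assume all cells start as CD41- in peak 0. Because CD41 status does not change the rates, the fraction still CD41- in peak n is $(1 - a)^n$. The totals $x_n + y_n$ follow the label-dilution model above unchanged.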
Data are presented as mean ± SEM. The sample size for each experiment and the number of replicate experiments are included in the figure legends. Statistical significance was determined by Welch's t test, paired t test, or one-way ANOVA followed by a test for linear trend using GraphPad Prism 6 (GraphPad Software, La Jolla, CA). P values < 0.05 were considered significant. P values for each experiment are included in the associated figure legends.
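Welch's t test, used throughout the figure legends, does not assume equal variances between groups. The sketch below computes the t statistic and the Welch–Satterthwaite degrees of freedom using only the Python standard library. It is an illustration, not the GraphPad Prism procedure used in the study. A p value would then come from the t distribution with those degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The example data are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


print(welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

The degrees of freedom are generally non-integer and never exceed n_a + n_b − 2. This makes the test somewhat more conservative than Student's t test when variances differ.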
Figure S1. Dynamic Range of the hCD34-tTA × TetO-H2BGFP System and Specificity of the hCD34 Promoter to a Subset of HSCs during Adulthood, Related to Figure 1
(A) Dynamic range of the H2BGFP reporter system in the absence of dox chase. Vertical lines mark successive one-half dilutions of H2BGFP fluorescence intensity, indicating a range of 7–8 H2BGFP dilutions before background level is reached.
(B) Schematic for examining H2BGFP loss over time in 34/H2BGFP animals without dox treatment.
(C) Histograms depicting H2BGFP level in the LSKCD48-CD150+ BM HSC compartment from mice of various ages that have never been exposed to dox. The
upper and lower range of GFPHi HSC frequency is displayed for each age group.
(D) Quantification of GFPHi HSC frequency from mice of various age groups never exposed to dox (n = 3-12 mice per group). Data are displayed as the mean ±
SEM. Statistical significance was assessed by one-way ANOVA followed by test for linear trend; **p < 0.01.
(E) Schematic for testing active H2BGFP labeling of the HSC compartment after dox release. Single transgenic hCD34 and H2BGFP mice were mated together to
produce double transgenic 34/H2BGFP mice that were born on dox. Progeny were raised on dox until 8 weeks (56 days) of age, at which point dox was removed.
BM was then collected at various time points after dox removal, and LSKCD48-CD150+ cells were analyzed for the presence of H2BGFP above background
levels.
(F) Time course kinetics of H2BGFP labeling after dox release. Data are displayed as the mean ± SEM (n = 3-5 mice per group from two independent experiments).
Figure S2. Leakiness of the hCD34-tTA × TetO-H2BGFP System, Related to Figure 1
(A) Experimental setup. Single transgenic hCD34-tTA and TetO-H2BGFP mice were mated while exposed to dox through the drinking water. Pups born from
these matings were maintained on dox until adulthood, at which point BM was analyzed for the presence of H2BGFP expression above background levels.
(B) Histogram showing GFP levels of LSKCD48-CD150+ cells from BM of 34/H2BGFP mice born on dox.
(C) Modified experimental timeline. Mice born on dox were analyzed after a year of continuous dox treatment.
(D) Histograms of GFP levels from three 34/H2BGFP mice born and maintained on dox for 1 year, and three single transgenic TetO-H2BGFP mice (background).
(E) Quantification of the brightest GFP intensity from each mouse displayed in (D).
Data are displayed as the mean ± SEM.
Figure S3. Quantification of Young and Aging HSC Populations, and Cell-Cycle Analysis of HSCs Based on CD41 Expression, Related to
Figure 2
(A and B) Frequency (A) and absolute number (B) of HSCs in young and aging bone marrow. n = 10-17 mice per group.
(C and D) Frequencies (C) and absolute numbers (D) of various HSPC populations (I-III) in young and aging bone marrow. n = 6-7 mice per group.
(E and F) Frequency (E) and absolute number (F) of CD41+ HSCs in young and aging bone marrow. n = 6-10 mice per group.
(G and H) Frequencies (G) and absolute number (H) of HSC populations characterized based on CD41 expression and label retention in young and aging bone
marrow. n = 6-10 mice per group from 2-3 independent experiments.
(I and J) Representative images (I) and quantification (J) of CD41– and CD41+ HSC snapshot cell cycle profiles. n = 6 mice per group from two independent
experiments.
(K) Histograms displaying the H2BGFP label retention over time of CD41– and CD41+ HSCs. Histograms are representations of young mice chased with dox for
12 weeks.
(L) Quantification of H2BGFP label retention in (K). n = 9-11 mice per group from three independent experiments. Data are displayed as the mean ± SEM. *p <
0.05, **p < 0.01, ***p < 0.001 by Welch’s t test (quantifications), or paired Student t test (cell cycle).
Figure S4. Megakaryocyte Potential of HSC Compartment with Aging Based on Divisional History, Related to Figure 2
Single cells from the GFPHi, GFPLo, and Total HSC populations were sorted from young (5 months old, dox treated 3 months) and aging (11 months old, dox
treated 9 months) mice into wells of a 96 well plate and were cultured in the presence of SCF, IL-3, and Tpo.
(A–D) Images of representative colonies after 13 days in culture. Mixed cell colonies containing both small and large cells (A and B), small cell only colonies (C), and
large cell only colonies (D). Yellow arrows mark large megakaryocyte-like cells.
(E and F) Representative images of cytospun mixed (E) and small cell only colonies (F) stained with H&E. Only mixed colonies showed megakaryocytes with large
multi-lobed nuclei (black arrows). Large cell only colonies generated too few cells to be mounted on slides for staining.
(G) Quantification of colony types found from each sorted HSC population.
(H) Quantification of colony size at day 13 generated from each sorted HSC population. Data are displayed as the mean ± SEM of 64-130 single cells per group
from 4 independent experiments.
Figure S5. Synchronistic Repopulation Kinetics in Paired Secondary Transplantations, Related to Figure 3
Bone marrow from each mouse repopulated with 15 cells from aging HSC populations was transplanted into paired secondary hosts. Repopulation kinetics were
followed in both secondary recipients over 24 weeks to determine the degree of synchronicity of total white blood cell repopulation (%CD45.2+) in independent
hosts. We quantitatively defined the degree of synchronicity as the Hamming distance between pairs of time series.
(A) Repopulation curves grouped into 2 clusters based on the degree of synchronicity. The cluster boxed in gray contains curves with kinetics determined to be synchronous, while the cluster boxed in red contains asynchronous kinetics for paired secondary hosts. The letter to the right of each repopulation curve indicates the retrospectively identified repopulating cell type: L, LT-HSC; I, IT-HSC; S, ST-HSC; B, bipotent progenitor; M, myeloid progenitor.
(B) Scaled paired secondary repopulation curves used to determine symbolic dynamics data from each secondary repopulation curve. The orange and blue
curves represent individual secondary recipients. Shaded plots indicate asynchronous repopulation behavior.
(C) Hamming distance measurements. Any two kinetics were defined as asynchronous (red dots) if their Hamming distance was > 2. In one exception, secondary repopulation was considered asynchronous with a Hamming distance of 2 because of a shorter symbolic dynamics sequence.
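The synchrony criterion in (C) can be sketched as follows. The legend does not say how the scaled curves were converted into symbolic sequences. The symbolization used here is therefore an assumption, as are the function names and the tolerance value: min-max scaling, then "U", "D", or "S" for a rise, a fall, or a change within tolerance between consecutive time points. Only the Hamming-distance threshold of 2 comes from the legend.

```python
def symbolize(series, tol=0.05):
    """Min-max scale a time series, then encode each step as U (up), D (down), or S (steady).

    The encoding scheme and tolerance are illustrative assumptions.
    """
    if len(series) < 2:
        raise ValueError("need at least two time points")
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        return "S" * (len(series) - 1)
    scaled = [(v - lo) / span for v in series]
    out = []
    for prev, cur in zip(scaled, scaled[1:]):
        step = cur - prev
        out.append("U" if step > tol else "D" if step < -tol else "S")
    return "".join(out)


def hamming(a, b):
    """Number of positions at which two equal-length symbol sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_asynchronous(s1, s2, threshold=2, tol=0.05):
    """Paired repopulation curves are called asynchronous if the distance exceeds the threshold."""
    return hamming(symbolize(s1, tol), symbolize(s2, tol)) > threshold


if __name__ == "__main__":
    host_a = [0, 10, 20, 15, 30]  # %CD45.2+ over time (toy values)
    host_b = [0, 12, 22, 18, 35]
    host_c = [0, 5, 3, 8, 2]
    print(is_asynchronous(host_a, host_b), is_asynchronous(host_a, host_c))
```

Sequences of unequal length raise an error here, so the legend's special case involving a shorter symbolic sequence is not modeled.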
Figure S6. Cell-Cycle Profiles of HSCs from Each GFP Peak, Related to Figure 4
(A) Gating strategy for each HSC population. Cells were enrichment sorted on LSKCD48-Flk2-CD150+ cells, then fixed, stained for Ki67 and DAPI, then analyzed
by flow cytometry for cell cycle state. Displayed are cells sorted from a young mouse.
(B and C) Static cell cycle profile of Total, GFPHi, and Peak 0-4 cells from Young (5 months old, 3 months on dox; B) and Aging (11 months old, 9 months on dox; C)
mice. Between 5000-15000 events in the Total HSC gate were acquired for each sample. n = 3 and 2 for Young and Aging mice respectively.
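Ki67/DAPI staining is commonly interpreted as follows. Ki67-negative cells with 2N DNA content are in G0. Ki67-positive cells with 2N DNA content are in G1. Cells with more than 2N DNA content are in S/G2/M. The sketch below implements that gating logic. It assumes per-event Ki67 and DAPI intensities and hand-chosen thresholds; the function names and thresholds are hypothetical and are not taken from the gating shown in (A).

```python
from collections import Counter


def cell_cycle_state(ki67, dapi, ki67_threshold, dapi_2n_max):
    """Assign one event to G0, G1, or S/G2/M from Ki67 and DAPI intensities."""
    if dapi > dapi_2n_max:
        return "S/G2/M"  # more than 2N DNA content
    return "G1" if ki67 >= ki67_threshold else "G0"


def cycle_fractions(events, ki67_threshold, dapi_2n_max):
    """Fraction of events in each phase; events is a list of (ki67, dapi) pairs."""
    if not events:
        raise ValueError("no events")
    counts = Counter(
        cell_cycle_state(k, d, ki67_threshold, dapi_2n_max) for k, d in events
    )
    return {phase: counts[phase] / len(events) for phase in ("G0", "G1", "S/G2/M")}


if __name__ == "__main__":
    toy_events = [(10, 100), (500, 100), (500, 180), (20, 100)]
    print(cycle_fractions(toy_events, ki67_threshold=100, dapi_2n_max=120))
```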
Article
Correspondence
peter.kharchenko@post.harvard.edu (P.V.K.),
david_scadden@harvard.edu (D.T.S.)
In Brief
Hematopoietic stem cells display heterogeneous, stereotypic clonal behavior that is conserved under various conditions, and differences in their epigenome, rather than the niche, are responsible for this remarkable memory.
Highlights
• Clonal tracking demonstrates clone-specific functional heterogeneity in vivo
1310 Cell 167, 1310–1322, November 17, 2016 © 2016 Elsevier Inc.
Cerulean intercalated by multiple LoxP pairs (Figure 1A) to enable Cre-induced stochastic recombination and expression. The design is very similar to the independently created ‘‘Confetti’’ mouse (Snippert et al., 2010), with the distinction that the HUe mouse has 20 tandemly integrated cassettes. This enables a wider range (theoretically >10³) of possible colors generated by random combinations, analogous to the range of colors a television screen generates from three basic hues (red, blue, green). We crossed HUe with various promoter-driven Cres to demonstrate marking in mesenchymal or hematopoietic tissue (Figures 1C–1F).
To examine the efficiency of HUe in marking hematopoietic cells, we crossed the HUe mouse with the interferon-inducible Mx1-Cre strain (Kühn et al., 1995) (herein Mx1-Cre;HUe). We did not observe background fluorescence in the absence of Cre, including in transplantation-mediated stress settings (data not shown). We activated endogenous hematopoietic cell labeling by administering polyinosinic:polycytidylic acid (pIpC) to Mx1-Cre;HUe mice and evaluated the mice after an interval (>30 days) by which the effects of interferon induction have long been shown to subside (Essers et al., 2009). Intra-vital imaging in live animals showed labeling of cells in the calvarial bone
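The sketch below gives a rough illustration of how the 20 tandem cassettes described above expand the color range, counting distinguishable outcomes under two assumptions. First, each cassette ends up in one of `n_states` states after Cre recombination. Second, the observed hue depends only on how many cassettes are in each state, not on which cassettes they are. Both assumptions and the function name are hypothetical and are not taken from the HUe construct design. The count is the standard stars-and-bars formula C(n + k − 1, k − 1).

```python
from math import comb


def distinguishable_hues(n_cassettes: int, n_states: int) -> int:
    """Count distinct per-state cassette tallies (stars and bars).

    Hypothetical model: a cell's hue is determined only by how many cassettes
    sit in each fluorescent state, not by which cassette is which.
    """
    if n_cassettes < 0 or n_states < 1:
        raise ValueError("need n_cassettes >= 0 and n_states >= 1")
    return comb(n_cassettes + n_states - 1, n_states - 1)


if __name__ == "__main__":
    # Three expressed colors (red, blue, green) per cassette.
    print(distinguishable_hues(20, 3))  # 231
    # Adding a hypothetical fourth, non-expressing state per cassette.
    print(distinguishable_hues(20, 4))  # 1771
```

Under the three-state assumption, a single cassette gives only 3 hues. With 20 cassettes, the count exceeds 10³ once a fourth state is allowed, which is one possible reading of the theoretical estimate. Cell-to-cell intensity differences would add further resolution but are not modeled here.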
HSC Behavior Is Highly Cell Autonomous
A major advantage of the HUe model is that we can measure and characterize the behavior of endogenous HSCs in vivo, then selectively isolate live HSCs based on fluorescent tagging, transplant them into new hosts, and study their long-term behavior in competition or under varying stress conditions. This cannot be achieved by DNA barcoding or transposon insertion analyses because these methods require the destruction of cells. Transplanting equal aliquots of randomly fluorescent-tagged donor HSCs into 20–40 C57BL/6J recipients resulted in an unanticipated consistency of clonal behavior in recipients (p < 10⁻¹⁶) (Figures 3A, S4C, and S5A–S5C). That is, after transplant, the individual clones in the recipients behaved as they had as endogenous HSCs in the donor in terms of cell proliferation (defined by clone size) and lineage commitment. Each color-defined clone behaved similarly in different recipients, consistently exhibiting cell activation, proliferation, and lineage differentiation characteristics distinct from those of the other clones. The individual HUe fluorescent profiles of different cell types (Figure 3A, e.g., B cells from recipient 1) collected from multiple recipients were analyzed using unbiased hierarchical clustering. Clustering of the HUe fluorescent profiles grouped the same cell types together even though they came from different recipients (Figure 3B). This demonstrates that the extent and consistency of clone-specific biases were large enough to distinguish different hematopoietic cell types in recipient mice solely on the basis of their clonal composition, as measured by the fluorescent distribution of each cell type. We termed the group of transplanted
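The unbiased hierarchical clustering of HUe fluorescent profiles described above can be sketched with average-linkage agglomeration on a correlation distance (1 − Pearson r). The text does not name the distance or linkage used. Both are assumptions here, as is the profile format: a list of cell fractions per fluorescence-color bin for each sample. The point illustrated is that samples of the same cell type from different recipients merge first when their clonal compositions are similar.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage on 1 - Pearson r.

    profiles: sample label -> list of cell fractions per fluorescence-color bin.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of sample labels.
    """
    labels = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(labels, 2)
    }

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    toy = {  # hypothetical color-bin fractions for two cell types in two recipients
        "B_rec1": [0.50, 0.30, 0.10, 0.10],
        "B_rec2": [0.48, 0.32, 0.10, 0.10],
        "T_rec1": [0.10, 0.10, 0.30, 0.50],
        "T_rec2": [0.12, 0.08, 0.30, 0.50],
    }
    for a, b, d in average_linkage(toy):
        print(a, b, round(d, 3))
```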
[Figure panels C and D not reproduced. Recoverable labels: Saline Control, LPS Treated, and Clonal Difference; LPS and IR treatments; compartments LKS, SLAM, CLP, CMP, GMP, MEP, B220, CD3, Gr1, Mac1, Ter119, T cells, and B cells; density of correlations of difference patterns (x axis from −1.0 to 1.0) for ∆LKS–∆LKS (P < 0.01), ∆LKS vs. ∆CMP, and ∆CMP–∆CMP comparisons; Pearson R = 0.35, 0.01, and 0.41; significance scale P < 0.001, P < 0.01, P < 0.05, P < 0.10, not significant.]
Figure 5. Interrogation of the Molecular Signature Associated with Distinct Functions of HSC Clones
(A) To examine molecular differences associated with phenotypically distinct HSC clones, LT-HSCs belonging to two selected clones (Cohort1.Y and Cohort1.R) were harvested from a HUe recipient cohort and subjected to RNA-seq (transcriptome) and WGBS (DNA methylation) assays, as well as to flow cytometric measurement of multi-lineage reconstitution.
(B–D) Both long-term lineage contribution and clone size of the two selected LT-HSC (LineageLoSca+cKit+CD48−CD150+) clones (Cohort1.Y and Cohort1.R) in the HSC, MPP, CLP, CMP, GMP, MEP, B cell, T cell, monocyte, granulocyte, and erythroid compartments were measured by flow cytometry (B and D) and analyzed as described in Figure S3. The percentage of cells representing either Cohort1.Y or Cohort1.R among all fluorescent cells in each hematopoietic compartment is shown. The Cohort1.R clone exhibited a higher proliferation rate, increasing in size (density of cells) from the HSC to the MPP compartment (C and D), and was present in all hematopoietic compartments, particularly toward myelopoiesis. In contrast, the Cohort1.Y clone showed a lower proliferation rate (i.e., decreased clone density from HSC to MPP) and a strong presence in the CLP compartment (C), but reduced production in the CMP, GMP, and downstream myeloid compartments.
See also Figures S6 and S7.
Figure 6. Immunophenotypically Equivalent HSCs Have Distinct Functional Attributes that Are Associated with Distinct Transcriptional and
Epigenetic Regulatory States
(A) The epigenetic state of both the Cohort1.Y and Cohort1.R clones matched that expected of HSCs. The DNA methylation state of enhancers activated at different stages of hematopoiesis was examined in the two clones. Both clones showed equally low methylation levels at enhancers active at the HSC stage, with higher methylation at enhancer regions activated at the later MPP and CLP stages. Whiskers represent 95% confidence intervals.
(B) The higher proliferative bias of the Cohort1.R clone was apparent from its epigenetic state. Gene set enrichment analysis (GSEA) showed higher DNA methylation of HSC-specific enhancers and lower methylation of MPP-specific enhancers in the Cohort1.R clone relative to the Cohort1.Y clone. Similarly, the Cohort1.R clone showed higher DNA methylation at HSC-specific promoter regions and lower methylation at MPP-specific promoter regions. Combined with the correspondingly higher expression of MPP-specific genes and lower expression of HSC-specific genes in the Cohort1.R clone, all three types of molecular signatures reflect the higher proliferative bias of the Cohort1.R clone. In each GSEA plot, the genes (enhancers/promoters) are ranked according to their relative expression (DNA methylation level) ratio between Cohort1.Y and Cohort1.R, with the highest Y/R ratios positioned on the left. The top plot shows the running-sum statistic, with the point of maximum deviation from 0 taken as the enrichment score of that set (red vertical line). The middle plot marks the positions of the genes (promoters/enhancers) that belong to the set. The bottom plots show the log2 fold ratio of expression (DNA methylation) magnitudes between Cohort1.Y and Cohort1.R.
(C) GSEA showed higher expression of proliferation-associated genes and G1-phase-associated genes in the Cohort1.R clone compared to the Cohort1.Y clone, consistent with the higher relative contribution of the Cohort1.R clone to the MPP compartment observed in the fluorescence data. Higher relative expression of genes associated with the unmobilized HSC and G0-phase signatures was seen in the Cohort1.Y clone.
(D) Enhancer state reflected the lymphoid-specific bias of the Cohort1.Y clone. Consistent with the pronounced lymphoid bias observed for the Cohort1.Y clone in the fluorescence data, the Cohort1.Y clone showed lower DNA methylation at CLP-specific enhancer elements and higher methylation at CMP-specific enhancers relative to the Cohort1.R clone.
(E) Although both the Cohort1.R and Cohort1.Y clones were immunophenotypically defined as HSCs, molecular profiling of their epigenetic and transcriptional landscapes revealed distinctive signatures reflecting their differential functional behavior. Consistent with its larger clone size, the Cohort1.R clone had a distinctive DNA methylation pattern at enhancer and promoter regions, as well as transcription of genes indicative of a proliferative cell state. In comparison, the Cohort1.Y clone showed pronounced lymphoid output, and this lineage preference was manifested by lower DNA methylation of lymphoid-specific enhancer regions, while no discernible pattern was detected in promoter methylation or gene transcription.
See also Figures S6 and S7.
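The enrichment score described in (B) follows the running-sum idea of GSEA. The procedure walks down the ranked list, steps up at members of the set and down otherwise, and reports the largest deviation from zero. The minimal sketch below assumes the weighted form used by standard GSEA, with exponent `weight`; setting weight=0 gives the classic unweighted, Kolmogorov-Smirnov-like statistic. It is an illustration rather than the authors' implementation, and it omits permutation-based significance testing.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score.

    ranked_genes: genes ordered from highest to lowest ranking metric
        (here, highest Y/R ratio first, as in the figure).
    scores: the ranking metric for each gene, in the same order.
    gene_set: members of the set being tested.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a non-empty proper subset of the ranked list")
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("all set members have zero score; use weight=0")
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up by the member's weighted score; step down uniformly otherwise.
        running += abs(s) ** weight / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


if __name__ == "__main__":
    genes = ["a", "b", "c", "d"]
    ratio = [2.0, 1.0, -1.0, -2.0]  # hypothetical log2 Y/R values, highest first
    print(enrichment_score(genes, ratio, {"a", "b"}, weight=0))
```

A positive score means the set is concentrated at the left (high Y/R) end of the ranking. A negative score means it sits at the right (high R/Y) end.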
We are grateful to Drs. Jeff W. Lichtman, Jean Livet, and Joshua R. Sanes for their kind gifts of the Brainbow fluorescent vectors. We also thank Laura Prickett, Kathryn E. Folz-Donahue, and Meredith Weglarz at the Flow Cytometry Core Facility of the Harvard Stem Cell Institute for their technical assistance. We are grateful for support from Science for Life Laboratory, the Knut and Alice Wallenberg Foundation, and the National Genomics Infrastructure, funded by the Swedish Research Council, for assistance with single-cell RNA-seq measurements. This work was supported by NIH grant 1R21HL126070-01A1 to V.W.C.Y.; DK103074, CA193461, and the Gerald and Darlene Jordan Chair to D.T.S.; and the Ellison Medical Foundation AG-NS-0965-12 and NIA 5K25AG037596 to P.V.K.

Received: January 18, 2016
Revised: August 9, 2016
Accepted: October 25, 2016
Published: November 17, 2016

REFERENCES

Aiuti, A., Biasco, L., Scaramuzza, S., Ferrua, F., Cicalese, M.P., Baricordi, C., Dionisio, F., Calabria, A., Giannelli, S., Castiello, M.C., et al. (2013). Lentiviral hematopoietic stem cell gene therapy in patients with Wiskott-Aldrich syndrome. Science 341, 1233151.
Anders, S., and Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biol. 11, R106.
Biasco, L., Pellin, D., Scala, S., Dionisio, F., Basso-Ricci, L., Leonardelli, L., Scaramuzza, S., Baricordi, C., Ferrua, F., Cicalese, M.P., et al. (2016). In vivo tracking of human hematopoiesis reveals patterns of clonal dynamics during early and steady-state reconstitution phases. Cell Stem Cell 19, 107–119.
Bock, C., Beerman, I., Lien, W.H., Smith, Z.D., Gu, H., Boyle, P., Gnirke, A., Fuchs, E., Rossi, D.J., and Meissner, A. (2012). DNA methylation dynamics during in vivo differentiation of blood and skin stem cells. Mol. Cell 47, 633–647.
Busch, K., Klapproth, K., Barile, M., Flossdorf, M., Holland-Letz, T., Schlenner, S.M., Reth, M., Höfer, T., and Rodewald, H.R. (2015). Fundamental properties of unperturbed haematopoiesis from stem cells in vivo. Nature 518, 542–546.
Chambers, S.M., Boles, N.C., Lin, K.Y., Tierney, M.P., Bowman, T.V., Bradfute, S.B., Chen, A.J., Merchant, A.A., Sirin, O., Weksberg, D.C., et al. (2007). Hematopoietic fingerprints: an expression database of stem cells and their progeny. Cell Stem Cell 1, 578–591.
Ding, L., Ley, T.J., Larson, D.E., Miller, C.A., Koboldt, D.C., Welch, J.S., Ritchey, J.K., Young, M.A., Lamprecht, T., McLellan, M.D., et al. (2012). Clonal
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589.
Jordan, C.T., and Lemischka, I.R. (1990). Clonal and systemic analysis of long-term hematopoiesis in the mouse. Genes Dev. 4, 220–232.
Kharchenko, P.V., Tolstorukov, M.Y., and Park, P.J. (2008). Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat. Biotechnol. 26, 1351–1359.
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36.
Kittler, R., Pelletier, L., Heninger, A.K., Slabicki, M., Theis, M., Miroslaw, L., Poser, I., Lawo, S., Grabner, H., Kozak, K., et al. (2007). Genome-scale RNAi profiling of cell division in human tissue culture cells. Nat. Cell Biol. 9, 1401–1412.
Kühn, R., Schwenk, F., Aguet, M., and Rajewsky, K. (1995). Inducible gene targeting in mice. Science 269, 1427–1429.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359.
Lara-Astiaso, D., Weiner, A., Lorenzo-Vivas, E., Zaretsky, I., Jaitin, D.A., David, E., Keren-Shaul, H., Mildner, A., Winter, D., Jung, S., et al. (2014). Immunogenetics. Chromatin state dynamics during blood formation. Science 345, 943–949.
Lemischka, I.R. (1993). Retroviral lineage studies: some principals and applications. Curr. Opin. Genet. Dev. 3, 115–118.
Lemischka, I.R., Raulet, D.H., and Mulligan, R.C. (1986). Developmental potential and dynamic behavior of hematopoietic stem cells. Cell 45, 917–927.
Livet, J., Weissman, T.A., Kang, H., Draft, R.W., Lu, J., Bennis, R.A., Sanes, J.R., and Lichtman, J.W. (2007). Transgenic strategies for combinatorial expression of fluorescent proteins in the nervous system. Nature 450, 56–62.
Lu, R., Neff, N.F., Quake, S.R., and Weissman, I.L. (2011). Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Nat. Biotechnol. 29, 928–933.
Lu, F., Liu, Y., Jiang, L., Yamaguchi, S., and Zhang, Y. (2014). Role of Tet proteins in enhancer activity and telomere elongation. Genes Dev. 28, 2103–2119.
Mazurier, F., Gan, O.I., McKenzie, J.L., Doedens, M., and Dick, J.E. (2004). Lentivector-mediated clonal tracking reveals intrinsic heterogeneity in the human hematopoietic stem cell compartment and culture-induced stem cell impairment. Blood 103, 545–552.
Notta, F., Mullighan, C.G., Wang, J.C., Poeppl, A., Doulatov, S., Phillips, L.A., Ma, J., Minden, M.D., Downing, J.R., and Dick, J.E. (2011). Evolution of human BCR-ABL1 lymphoblastic leukaemia-initiating cells. Nature 469, 362–367.
Oki, T., Nishimura, K., Kitaura, J., Togami, K., Maehara, A., Izawa, K., Sakaue-Sawano, A., Niida, A., Miyano, S., Aburatani, H., et al. (2014). A novel cell-cycle-indicator, mVenus-p27K-, identifies quiescent cells and visualizes G0-G1 transition. Sci. Rep. 4, 4012.
Picelli, S., Björklund, A.K., Faridani, O.R., Sagasser, S., Winberg, G., and Sandberg, R. (2013). Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098.
Shi, P.A., Hematti, P., von Kalle, C., and Dunbar, C.E. (2002). Genetic marking as an approach to studying in vivo hematopoiesis: progress in the non-human primate model. Oncogene 21, 3274–3283.
Snippert, H.J., van der Flier, L.G., Sato, T., van Es, J.H., van den Born, M., Kroon-Veenboer, C., Barker, N., Klein, A.M., van Rheenen, J., Simons, B.D., and Clevers, H. (2010). Intestinal crypt homeostasis results from neutral competition between symmetrically dividing Lgr5 stem cells. Cell 143, 134–144.
Snodgrass, R., and Keller, G. (1987). Clonal fluctuation within the haematopoietic system of mice reconstituted with retrovirus-infected stem cells. EMBO J. 6, 3955–3960.
Tsang, J.C., Yu, Y., Burke, S., Buettner, F., Wang, C., Kolodziejczyk, A.A., Teichmann, S.A., Lu, L., and Liu, P. (2015). Single-cell transcriptomic reconstruction reveals cell cycle and multi-lineage differentiation defects in Bcl11a-deficient hematopoietic stem cells. Genome Biol. 16, 178.
Venezia, T.A., Merchant, A.A., Ramos, C.A., Whitehouse, N.L., Young, A.S., Shaw, C.A., and Goodell, M.A. (2004). Molecular signatures of proliferation and quiescence in hematopoietic stem cells. PLoS Biol. 2, e301.
Verovskaya, E., Broekhuis, M.J., Zwart, E., Weersing, E., Ritsema, M., Bosman, L.J., van Poele, T., de Haan, G., and Bystrykh, L.V. (2014). Asymmetry in skeletal distribution of mouse hematopoietic stem cell clones and their equilibration by mobilizing cytokines. J. Exp. Med. 211, 487–497.
Wilson, N.K., Kent, D.G., Buettner, F., Shehata, M., Macaulay, I.C., Calero-Nieto, F.J., Sánchez Castillo, M., Oedekoven, C.A., Diamanti, E., Schulte, R., et al. (2015). Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell 16, 712–724.
Wu, Y., Zhou, H., Fan, X., Zhang, Y., Zhang, M., Wang, Y., Xie, Z., Bai, M., Yin, Q., Liang, D., et al. (2015). Correction of a genetic disease by CRISPR-Cas9-mediated gene editing in mouse spermatogonial stem cells. Cell Res. 25, 67–79.
Xi, R., Lee, S., Xia, Y., Kim, T.M., and Park, P.J. (2016). Copy number analysis of whole-genome data using BIC-seq2 and its application to detection of cancer susceptibility variants. Nucleic Acids Res. 44, 6274–6286.
Requests should be addressed to and will be fulfilled by Lead Contact David T. Scadden (david_scadden@harvard.edu).
Mouse Models
HUe, Prx1-CreER, Mx1-Cre, Col(II)-CreER, and C57BL/6J strains were used and cross-bred as needed in this study. The HUe and Prx1-CreER strains were made in-house, while B6.Cg-Tg(Mx1-cre)1Cgn/J (Mx1-Cre), FVB-Tg(Col2a1-cre/ERT)KA3Smac/J (Col(II)-CreER), and C57BL/6J mice were obtained from the Jackson Laboratory. To image labeled cells of the limb bud mesenchyme, Prx1-CreER was crossed with HUe to create Prx1-CreER;HUe. 2 mg of 4OH-tamoxifen and 1 mg of progesterone were injected into pregnant females (< 20 g) at E18.5. Mice were sacrificed for imaging between postnatal day 1 and 1 month of age. To image labeled cartilage cells, Col(II)-CreER was crossed with HUe to create Col(II)-CreER;HUe. Col(II)-CreER;HUe mice at 2 months of age were injected with 2 mg of 4OH-tamoxifen and sacrificed for imaging 2–4 weeks post-injection. To study hematopoiesis, Mx1-Cre was crossed with HUe to create the Mx1-Cre;HUe strain. To induce hematopoietic cell labeling, Mx1-Cre;HUe mice were injected with 12.5 µg pIpC/g body weight at two weeks of age. For most transplantation studies, 6- to 8-month-old Mx1-Cre;HUe and C57BL/6J mice were used. To track endogenous hematopoiesis, bone marrow aspirates were obtained from Mx1-Cre;HUe mice at 2, 3, 5, and 10 months of age. For all studies, age-matched littermates were used as experimental controls. All animal housing, usage, and procedures were approved by the Institutional Animal Care and Use Committee of Massachusetts General Hospital.
METHOD DETAILS
Flow Cytometry
For each mouse, tibiae, femurs, iliac crests, and spines were collected for bone marrow cells. Isolation and enumeration of the different hematopoietic cell types were performed by flow cytometry. Bone marrow cells harvested from each animal were ACK lysed before antibody staining. We routinely stained 5 × 10⁷ cells per sample for the stem cell population and 1 × 10⁷ cells per sample for each progenitor and mature population. The lineage cocktail consists of biotinylated B220, CD3e, CD4, CD8a, CD19, CD11b, Gr1, Ter119, CD11c, and NK1.1 antibodies, and fluorochrome-conjugated streptavidin was used to detect the lineage cocktail. Using the following antibody combinations, we identified hematopoietic subpopulations at the stem cell level: hematopoietic stem cells (HSCs) (Lineage-Pacific Orange, cKit-APC-Cy7, Sca-PE-Cy7, CD48-APC, CD150-PE-Cy5); at the stem/progenitor level: multipotent progenitor cells (MPPs) (Lineage-Pacific Orange, cKit-APC-Cy7, Sca-PE-Cy5), common lymphoid progenitors (CLPs) (Lineage-Pacific Orange, cKit-APC-Cy7, Sca-PE-Cy5, CD127-PE-Cy7), and common myeloid progenitors (CMPs), granulocyte macrophage progenitors (GMPs), and megakaryocyte erythroid progenitors (MEPs) (all three with Lineage-Pacific Orange, cKit-APC-Cy7, Sca-PE-Cy5, CD16/32-PE-Cy7, CD34-eFluor660); and mature lineages: B cells (B220-APC), T cells (CD3-APC), monocytes (Mac1-APC), granulocytes (Gr1-APC), and erythroid cells (Ter119-APC). All populations were analyzed on a BD FACSAria II Cell Sorter equipped with ultraviolet, violet, blue, yellow/green, and red lasers.
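The antibody panels above can be turned into gating rules, as in the sketch below. It assumes each event has already been called negative ("-"), low ("lo"), or positive ("+") for each marker. It uses commonly applied phenotype definitions:
• HSC: Lin− cKit+ Sca1+ CD48− CD150+
• CLP: Lin− CD127+ cKit lo Sca1 lo
• CMP: Lin− cKit+ Sca1− CD34+ CD16/32 lo
• GMP: Lin− cKit+ Sca1− CD34+ CD16/32+
• MEP: Lin− cKit+ Sca1− CD34− CD16/32−

The HSC rule matches the LT-HSC phenotype used elsewhere in this study. The progenitor rules are conventional definitions that this section does not state, so treat them, and the function itself, as hypothetical.

```python
def classify(event):
    """Assign a population from per-marker calls ('-', 'lo', or '+').

    event: marker name -> call. Missing markers are treated as unknown ('?').
    Phenotype rules are conventional definitions, assumed for illustration.
    """

    def call(marker):
        return event.get(marker, "?")

    if call("Lin") not in ("-", "lo"):
        return "unclassified" if call("Lin") == "?" else "lineage+"
    if call("CD127") == "+" and call("cKit") == "lo" and call("Sca1") == "lo":
        return "CLP"
    if call("cKit") == "+" and call("Sca1") == "+":
        if call("CD48") == "-" and call("CD150") == "+":
            return "HSC"
        return "LKS (non-HSC)"
    if call("cKit") == "+" and call("Sca1") == "-":
        myeloid = {("+", "lo"): "CMP", ("+", "+"): "GMP", ("-", "-"): "MEP"}
        return myeloid.get((call("CD34"), call("CD16/32")), "unclassified")
    return "unclassified"


if __name__ == "__main__":
    print(classify({"Lin": "-", "cKit": "+", "Sca1": "+", "CD48": "-", "CD150": "+"}))
```

MPPs are not separated in this sketch because the text lists only lineage, cKit, and Sca markers for them. They fall into "LKS (non-HSC)".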
Single-cell RNA-Seq
HUe mice were induced with pIpC two weeks before single-cell sorting of LKS SLAM cells. Whole bone marrow was isolated from femurs and tibiae by gently crushing the bones, filtered through a 40 µm cell strainer, and resuspended in Media 199 (ThermoFisher Scientific) supplemented with 2% fetal bovine serum (ThermoFisher Scientific) and RNase Out (ThermoFisher Scientific). Cells were stained for LKS SLAM markers, and with Calcein AM (ThermoFisher Scientific) and propidium iodide for cell viability detection. Single cells were sorted using a BD FACSAria II (BD Biosciences) into 384-well PCR plates (ThermoFisher Scientific) containing standard lysis buffer. Whole transcriptome amplification was performed using the Smart-seq2 protocol (Picelli et al., 2013), and libraries were prepared with Nextera XT (Illumina). Samples were pooled and sequenced on an Illumina NextSeq 500 instrument using 50 bp paired-end reads. The analysis was carried out using the PAGODA package v1.99.3 (Fan et al., 2016).
ATAC-Seq
We used an independent cohort of HUe recipient mice to select new clones for a second set of experiments that integrated ATAC-, DNA-, and RNA-seq analysis with HSC behavior. A new HUe recipient cohort of 32 mice was divided into two sub-cohorts: one for monitoring lineage output by flow cytometry, and the other for isolating LT-HSC clones. Two new LT-HSC (LineageLoSca+cKit+CD48-CD150+) clones (Red2 and Yellow2) were isolated from multiple recipient mice by flow cytometry and subjected to ATAC-Seq, WGBS, and RNA-seq. For ATAC-Seq, 10,000 cells of each clone were lysed in 50 µL of cold lysis buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630) and immediately subjected to a transposition reaction at 37°C for 30 min with 2.5 µL of transposase enzyme (Illumina Nextera DNA Preparation Kit). Transposed DNA was purified using the QIAGEN MinElute PCR Purification Kit and subjected to library amplification using NEBNext High-Fidelity 2X PCR Master Mix, Invitrogen SYBR Green I Dye, and primers (Table S2). Prior to sequencing, the ATAC-Seq library was assayed for quality using TapeStation
Gene Set Enrichment Analysis for Testing Cell Type Bias of Clones
Gene set enrichment analysis (GSEA) was performed to investigate whether the gene expression profiles of the selected clones show bias toward a specific hematopoietic cell type. In the GSEA, genes were ranked by the Z score corresponding to the p value of the expression difference between the clones. Reads were aligned with TopHat2 (Kim et al., 2013) using default parameters, and differential expression was assessed with DESeq (Anders and Huber, 2010) using the local fit option. GSEA was used to test gene sets obtained from differential expression analysis of RNA-seq data across 16 different hematopoietic cell types from Lara-Astiaso et al. (2014). Following the same steps as described by Lara-Astiaso et al., the RNA-seq data (GEO: GSE60101) were aligned to the mm9 mouse genome assembly using Bowtie 2 (Langmead and Salzberg, 2012) with default parameters. Read counts of genes were calculated using ‘analyzeRNA.pl’ from the Homer package (Heinz et al., 2010). Differential gene expression analysis was performed with DESeq between a given pair of cell types (e.g., HSC and MPP). Genes with significantly higher expression in a given cell type (FDR-corrected p value < 0.05) were selected as a set for GSEA (Figure 6B).
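The ranking and set-selection steps above can be sketched as follows. The text does not state two details, so both are assumptions here. First, p values are treated as two-sided and converted to a Z score that carries the sign of the log2 fold change. Second, "FDR-corrected" is taken to mean Benjamini-Hochberg adjustment. The sketch starts from per-gene p values and fold changes and does not reproduce the TopHat2 or DESeq steps. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def signed_z(p_value, log2_fold_change):
    """Two-sided p value -> Z score carrying the direction of the fold change."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p value must be in (0, 1]")
    # Using the lower tail (p/2) keeps this stable for very small p values.
    z = abs(NormalDist().inv_cdf(p_value / 2.0))
    return math.copysign(z, log2_fold_change)


def rank_genes(results):
    """results: gene -> (p value, log2 fold change). Highest signed Z first."""
    return sorted(results, key=lambda g: signed_z(*results[g]), reverse=True)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def higher_in_first(de_results, fdr=0.05):
    """Genes significantly higher in the first cell type of a pair.

    de_results: gene -> (p value, log2 fold change first/second).
    """
    genes = list(de_results)
    adj = benjamini_hochberg([de_results[g][0] for g in genes])
    return {g for g, q in zip(genes, adj) if q < fdr and de_results[g][1] > 0}


if __name__ == "__main__":
    toy = {"g1": (0.01, 1.5), "g2": (0.5, 0.2), "g3": (0.001, -3.0)}
    print(rank_genes(toy))
```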
GSEA was also applied to test whether the selected clones show significant bias in DNA methylation at the promoters (within 2 kb upstream of transcription start sites) of the identified cell-type-specific genes (Figure 6B). In this GSEA, promoters were ranked by the maximum likelihood estimates of the log2 fold ratio of promoter methylation between the clones, calculated with the SPP package (Kharchenko et al., 2008).
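A sketch of the promoter definition and ranking follows. The strand-aware 2 kb upstream window matches the text. The coordinate convention (0-based, half-open intervals) is an assumption. The text ranks promoters by SPP maximum likelihood estimates of the log2 methylation ratio. SPP's estimator is not reproduced here; the pseudocount-based ratio below is a simpler, hypothetical stand-in, and the function names are illustrative.

```python
import math


def promoter_window(tss, strand, upstream=2000):
    """Promoter interval (0-based, half-open) covering `upstream` bp 5' of the TSS.

    The TSS base itself is excluded; on the minus strand, upstream lies at
    higher genomic coordinates.
    """
    if strand == "+":
        return max(0, tss - upstream), tss
    if strand == "-":
        return tss + 1, tss + 1 + upstream
    raise ValueError("strand must be '+' or '-'")


def log2_methylation_ratio(meth_y, total_y, meth_r, total_r, pseudo=1.0):
    """log2 of the Yellow/Red methylation-fraction ratio with a pseudocount.

    A simple stand-in for the SPP maximum likelihood estimate used in the study.
    """
    frac_y = (meth_y + pseudo) / (total_y + 2 * pseudo)
    frac_r = (meth_r + pseudo) / (total_r + 2 * pseudo)
    return math.log2(frac_y / frac_r)


def rank_promoters(counts):
    """counts: promoter id -> (meth_y, total_y, meth_r, total_r). Highest Y/R first."""
    return sorted(counts, key=lambda k: log2_methylation_ratio(*counts[k]), reverse=True)


if __name__ == "__main__":
    print(promoter_window(10000, "+"), promoter_window(10000, "-"))
```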
Finally, GSEA was also employed to study potential cell-type bias of the two clones in DNA methylation of enhancers (Figures 6B and 6D). For this GSEA, enhancers were ranked by the maximum likelihood estimates of the log2 fold ratio of DNA methylation between the ‘Yellow’ and ‘Red’ clones. For a given pair of cell types (HSC versus MPP in Figure 6B, or CLP versus CMP in Figure 6D), we selected the top 500 enhancers that are most ‘active’ (as determined by H3K27ac read counts) in one cell type and are not ‘on’ (as determined by H3K4me1 counts) in the paired cell type, and vice versa.
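The enhancer selection rule can be sketched as below. The text does not give the read-count cutoff used to call an enhancer ‘on’ by H3K4me1, so `on_threshold` is a hypothetical parameter, as are the function names. Only the top-500 size and the H3K27ac/H3K4me1 criteria come from the text.

```python
def select_specific_enhancers(k27ac_a, k4me1_b, top_n=500, on_threshold=10):
    """Enhancers most active in cell type A that are not 'on' in cell type B.

    k27ac_a: enhancer id -> H3K27ac read count in A.
    k4me1_b: enhancer id -> H3K4me1 read count in B (missing ids count as 0).
    on_threshold: hypothetical H3K4me1 count at or above which an enhancer is 'on'.
    """
    candidates = [e for e in k27ac_a if k4me1_b.get(e, 0) < on_threshold]
    candidates.sort(key=lambda e: k27ac_a[e], reverse=True)
    return candidates[:top_n]


def enhancer_sets(k27ac, k4me1, a, b, top_n=500, on_threshold=10):
    """Both directions for a pair of cell types.

    k27ac, k4me1: cell type -> {enhancer id: read count}.
    Returns (A-specific list, B-specific list).
    """
    return (
        select_specific_enhancers(k27ac[a], k4me1[b], top_n, on_threshold),
        select_specific_enhancers(k27ac[b], k4me1[a], top_n, on_threshold),
    )


if __name__ == "__main__":
    k27ac = {"HSC": {"e1": 100, "e2": 80, "e3": 50, "e4": 5},
             "MPP": {"e1": 2, "e2": 10, "e3": 5, "e4": 90}}
    k4me1 = {"HSC": {"e1": 50, "e2": 40, "e3": 30, "e4": 1},
             "MPP": {"e1": 1, "e2": 60, "e3": 3, "e4": 70}}
    print(enhancer_sets(k27ac, k4me1, "HSC", "MPP", top_n=2))
```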
HUe RNA-seq, WGBS, and ATAC-seq data are accessible through GEO (GEO: GSE87527).
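[Figure S1 panels not reproduced. Recoverable labels: (A) poly(I:C)-induced Mx1-Cre;HUe cells, in vitro and in vivo; (B) 2,000 progenitors (CLP gated on IL7R and c-Kit; GMP gated on CD16/32 and c-Kit, after Lin and Sca gating) transplanted into C57BL/6J recipients and analyzed 8 days post-transplant; (C) Mx1-Cre;HUe LKS cells, poly(I:C), 9.5 Gy, 1 Mx1-Cre;HUe HSC, analyzed 30 weeks post-transplantation; fluorescence channels RFP, CFP, YFP/GFP.]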
Figure S1. Faithful In Vivo Propagation of Fluorescence over Generations Enables Clonal Tracking, Related to Figure 1
(A) Demonstration of fluorescence fidelity in vitro. Bone marrow cells harvested from induced Mx1-Cre;HUe mice were plated at low density in methylcellulose-containing medium to allow single-cell-derived hematopoietic colonies to emerge, and were imaged under a fluorescence microscope. Uniform fluorescence within individual colonies showed that color was maintained over generations of cell division in vitro.
(B) Demonstration of fluorescence fidelity in vivo. Bone marrow cells harvested from induced Mx1-Cre;HUe mice were flow sorted to isolate CLP, CMP, GMP, and MEP populations. Two thousand cells of each population were intravenously transplanted into each of 5 sublethally irradiated C57BL/6J recipients. Spleens of recipient mice were harvested 8 days post-transplantation, and cells were subjected to flow cytometric analysis of HUe fluorescence. Endogenous HUe fluorescence emanating from the cells was plotted in a three-dimensional graph, with the x axis (tDimer2, red fluorescence), y axis (Cerulean, blue fluorescence), and z axis (EYFP, green fluorescence) representing increasing fluorescence intensities on a log scale. Recipient mice that received the same batch of donor cells exhibited a HUe fluorescence profile nearly identical to that of the donor cells injected into them.
[Figure S2 panels not reproduced. (A) Southern blot detected multiple copies of the transgene inserted into the mouse genome in different founder lines (Founders 1, 2, 3, 4, and 6; 7.4 kb band; no-copy, 1-copy, and 5-copy controls). (B) DNA fingerprinting with probes to detect random genomic rearrangement in the presence of Cre. Experimental design: flow-sorted Mx1-Cre;HUe GMPs from poly(I:C)-induced mice, selected as color-restricted populations (Profiles 1–3), were transplanted at 5,000 cells per recipient into 4.5 Gy-irradiated recipients (Mice 1–21).]
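[Figure S3 panels not reproduced. Legend: increase in density; decrease in density. (B) Quantification of HUe clonal changes from 5 to 10 months old by a custom-designed MClust R program: Month 5, Difference (Z-score scale −3 to 3), and Month 10 projections on Green, Blue, and Red axes. Clones with significant changes (#, number of cells; %nAF, percentage of non-autofluorescent cells; %T, percentage of total cells):
E1: Month 5, #: 9678.7, %nAF: 46.5, %T: 16.9; Month 10, #: 14400.7, %nAF: 92.6, %T: 55.1
W1: Month 5, #: 7976.7, %nAF: 38.3, %T: 13.9; Month 10, #: 553.8, %nAF: 3.6, %T: 2.1
N1: Month 5, #: 2855.1, %nAF: 13.7, %T: 5.0; Month 10, #: 329.8, %nAF: 2.1, %T: 1.3]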
Figure S3. Quantification of Hematopoietic Clonal Changes under Homeostatic Conditions—Mouse #8, 5 to 10 Months Old, Related to
Figures 2, 4, 5, and 7
(A) An Mx1-Cre;HUe mouse was induced for endogenous HUe fluorescence with pIpC at one month old. Bone marrow aspirates were obtained from the tibiae at 5 and 10 months of age. HUe fluorescence in total bone marrow cells was projected in a spherical graph, with the x axis (tDimer2), y axis (Cerulean), and z axis (EYFP) representing increasing fluorescence intensities on a log scale. Clones that showed changes are highlighted with colors: red represents an increase in density, while blue represents a decrease in density in comparison to the other time point.
(B) To quantify the pattern changes between the two time points, HUe fluorescence in 3D space was projected into two-dimensional plots using a sinusoidal projection: 5 months old (left) and 10 months old (right), with the difference plot in between. In the difference plot, red represents a decrease in cell density at the indicated HUe fluorescence at 10 months old, whereas blue represents an increase in cell density at 10 months old. White contour lines indicate statistically significant changes at a Z-score threshold of ±3. Clones that were statistically different between the two time points are summarized in the panel below. For each clone, we scored the change in the absolute number of cells, the percentage of non-autofluorescent cells, and the percentage of the total number of cells.
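One plausible reading of the sinusoidal projection in (B) is sketched below. It log-transforms the three fluorescence intensities and treats the direction of the resulting vector as a point on a sphere, with longitude in the tDimer2-Cerulean plane and latitude toward EYFP. It then maps that point to the plane with the sinusoidal (equal-area) projection x = λ cos φ, y = φ. The log offset, axis assignment, and function names are assumptions; the custom MClust R program used in the study is not reproduced.

```python
import math


def to_sphere(x, y, z):
    """Direction of (x, y, z) as (longitude, latitude) in radians."""
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0:
        raise ValueError("zero vector has no direction")
    # Clamp guards against rounding pushing the ratio just outside [-1, 1].
    return math.atan2(y, x), math.asin(max(-1.0, min(1.0, z / norm)))


def sinusoidal(lon, lat, lon0=0.0):
    """Sinusoidal equal-area projection of a point on the unit sphere."""
    return (lon - lon0) * math.cos(lat), lat


def project_event(red, blue, green, offset=1.0):
    """Project one event's (tDimer2, Cerulean, EYFP) intensities to 2D.

    Assumes non-negative intensities; they are log10-transformed after adding
    a hypothetical offset so that zero readings stay finite.
    """
    coords = [math.log10(v + offset) for v in (red, blue, green)]
    return sinusoidal(*to_sphere(*coords))


if __name__ == "__main__":
    print(project_event(9.0, 0.0, 0.0))
    print(project_event(99.0, 99.0, 99.0))
```

Because log-transformed intensities are non-negative, all events fall in one octant of the sphere. Density differences between time points could then be computed on a 2D grid over the projected coordinates.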
[Figure S4 panels not reproduced. (A and B) Experimental Sets 1 and 2; axes: frequency and total clonal difference. (C) Total BM; C57BL/6J Recipients 1–3, 16 weeks.]
Figure S4. Hematopoiesis Is a Composite of Dissimilar Clones with Stereotypic Behavior, Related to Figures 2 and 3
(A) Distribution of clone sizes (measured as fraction of total cells) is shown for the two sets of mice.
(B) A total fraction of cells affected by shifts in the clonal composition between the adjacent time points is shown. Whiskers show 95% confidence interval.
(C) Another example of an HUe recipient cohort. Mx1-Cre;HUe mice with activated endogenous fluorescence were used as donors. Bone marrow cells from multiple Mx1-Cre;HUe donors were pooled and flow sorted to isolate HSPCs (Lineage^lo Sca+ cKit+). HSCs with random endogenous fluorescence were mixed with support cells from C57BL/6J mice and transplanted into each of 20 lethally irradiated C57BL/6J recipients. After 16 weeks of reconstitution, the recipients showed highly consistent clonal patterns, including proliferation, fluorescence, and lineage characteristics, across all levels of the hematopoietic hierarchy. HUe clonal fluorescence patterns of B cells, monocytes, and erythroid cells in multiple recipients are shown, illustrating consistency among recipients and distinctions between cell compartments.
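Panel (B) reports the total fraction of cells affected by shifts in clonal composition between adjacent time points, but it gives no formula. One natural reading, assumed here and not taken from the paper, is the total variation distance between the two clone-frequency vectors. That distance is half the summed absolute change in each clone's fraction, which equals the minimum fraction of cells that would have to be reassigned to turn one composition into the other. The clone labels and counts below are hypothetical, and the sketch does not reproduce the confidence intervals.

```python
def total_clonal_difference(before: dict, after: dict) -> float:
    """Total variation distance between two clonal compositions.

    `before` and `after` map a clone label to its cell count (or fraction) at two
    adjacent time points; each is normalized to fractions before comparison.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    clones = set(before) | set(after)   # a clone may appear at only one time point
    return 0.5 * sum(
        abs(before.get(c, 0) / total_before - after.get(c, 0) / total_after)
        for c in clones
    )


# Hypothetical counts: clone E expands while clone W shrinks.
shift = total_clonal_difference({"E": 50, "W": 40, "N": 10}, {"E": 80, "W": 10, "N": 10})
```

A value of 0 means the composition is unchanged. A value of 1 means the two time points share no clones at all.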
[Figure S5 plot residue: (A) Clonal consistency within each cell type across multiple mice in an HUe recipient cohort. (B) The clonal pattern of each cell type is uniquely distinct from the others (correlation coefficients within each cell type compared against other cell types). (C) Clonal differences among cell types are consistent within a given HUe recipient cohort (Recipient Cohorts 1 and 2; Ter119, SLAM, Mac1, B220, GMP, CMP, MEP, CD3, CLP, LKS, and Gr1 cells; significance levels P < 0.001, P < 0.01, P < 0.05, P < 0.10, or not significant). (D) Number of non-AF clusters per cell type after LPS treatment versus control, at 12 hours and 44 days.]
[Figure S6 plot residue: (A) tSNE plots (dimension 1 versus dimension 2) colored by clone (red, green, blue) and by cell PC score (low, neutral, high); (B) cluster assignment into clusters A–D; (C) cluster fraction versus expected fraction across clusters A–D for the red, green, and blue clones, with significance stars.]
Figure S6. Transcriptional Heterogeneity within and between Clones as Illustrated by Single-Cell RNA-Seq Analysis, Related to Figures 5, 6,
and 7
Single HSCs (Lineage^lo cKit+ Sca+ CD48− CD150+) belonging to individual clones were flow sorted from a pIpC-induced Mx1-Cre;HUe mouse and subjected to single-cell RNA-seq analysis.
(A) tSNE visualization of transcriptional heterogeneity, with cells colored (left to right) by the HUe clone they belong to (red, green, or blue), the intensity of the mitotic signature (orange, high mitotic expression activity; green, low), and the intensity of a B cell-like signature. The clones were notably biased toward particular transcriptional states; however, cells of both large clones (red and green) were found throughout the transcriptional space, indicating that despite overall transcriptional and phenotypic bias, substantial intra-clonal transcriptional variability was present.
(B) Transcriptional cluster definitions. Clusters A–D describe key subpopulations; the small erythroid-like and neutrophil-like groups in the center are omitted.
(C) Tests of the distribution of different HUe clones across transcriptional clusters. Each subplot tests the distribution of one HUe clone (red, green, or blue). Clusters are shown on the x axis, and the y axis gives the fraction of each transcriptional cluster taken up by the given clone. The dashed horizontal gray line shows the fraction of the cluster the clone would be expected to account for based on its overall frequency. Bars show the observed fraction, with whiskers giving the 95% CI. Stars indicate statistical significance (*p < 0.05, **p < 0.01, ***p < 0.001).
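Panel (C) compares each clone's observed share of a transcriptional cluster with the share expected from the clone's overall frequency. The caption does not name the interval or test used. This sketch therefore swaps in two standard choices: a Wilson score interval for the 95% CI and an exact two-sided binomial test against the expected fraction. The cell counts in the example are hypothetical.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for an observed fraction k/n (z = 1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def binom_two_sided_p(k: int, n: int, p0: float) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes no likelier than k."""
    pmf = [math.comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)   # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Hypothetical: the red clone makes up 30 of 40 cells in cluster A but 50% of all cells.
low, high = wilson_interval(30, 40)
p_value = binom_two_sided_p(30, 40, 0.5)
```

An observed bar whose interval excludes the dashed expected line corresponds to a small binomial p-value. The two outputs tell the same story from different angles.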
[Figure S7 plot residue: log2 coverage ratio (−4 to 4) across chromosomes 1–19. (A) Cohort1.R: 35 CNVs. (B) Cohort1.Y: 76 CNVs. (C) Cohort1.R vs Cohort1.Y: 2 CNVs. (D) Cohort2.P: 52 CNVs. (E) Cohort2.G: 65 CNVs. (F) Cohort2.P vs Cohort2.G: 0 CNVs.]
Figure S7. Analysis of Genomic Differences between the Clones, Related to Figures 5, 6, and 7
The CNV profiles obtained from the WGBS data for each clone are shown. Black and red dots represent log2 copy ratios of bins and CNV segments, respectively. The blue lines mark log2 copy ratios of −1, 0, and +1.
(A and B) Control-free CNV profiles for the R and Y clones from Cohort1. Many CNVs were detected within each clone (see titles), though almost all occur in both clones.
(C) CNV profile resulting from direct comparison of the Cohort1.R and Cohort1.Y clones. The two closely spaced CNVs reported on chromosome 4 appeared to be WGBS artifacts. The first deletion was also called in both Cohort1.R and Cohort1.Y, with almost identical breakpoints, by the control-free CNV calling method (A and B), indicating that the clone-specific deletion was likely a calling artifact. The adjacent large CNV (31.6 Mbp) was not called in either the Cohort1.R or the Cohort1.Y clone. Instead, the region consisted of four smaller CNV segments with similar breakpoints and log2 ratios in both clones, indicating that the large deletion was unlikely to be a genuine clone-specific deletion and more likely resulted from erroneous CNV segmentation.
(D–F) Analogous control-free and direct-comparison plots for the Cohort2.P and Cohort2.G clones. Here the CNV pattern appears identical between clones, with no notable deviations in the direct comparison. In summary, we found no convincing evidence of clone-specific CNVs despite the reasonable sensitivity of our CNV detection method (see the STAR Methods).
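The direct-comparison panels plot per-bin log2 coverage ratios between two clones. A one-copy loss in a diploid genome halves coverage, so it sits near a log2 ratio of −1. The actual CNV caller is described in the STAR Methods, not here. The sketch below substitutes a deliberately simple scheme: median-centered log2 ratios, followed by calling runs of consecutive bins beyond a fixed threshold. The threshold, pseudocount, and minimum run length are hypothetical parameters.

```python
import math
import statistics


def log2_ratios(cov_a: list, cov_b: list, pseudo: float = 0.5) -> list:
    """Per-bin log2(coverage A / coverage B), centered so the median bin sits at 0."""
    raw = [math.log2((a + pseudo) / (b + pseudo)) for a, b in zip(cov_a, cov_b)]
    med = statistics.median(raw)
    return [r - med for r in raw]


def call_segments(ratios: list, thresh: float = 0.4, min_bins: int = 3) -> list:
    """Return (first_bin, last_bin, 'gain' or 'loss') for runs of >= min_bins bins beyond +/-thresh."""
    calls, start, state = [], 0, None
    for i, r in enumerate(ratios + [0.0]):   # trailing neutral sentinel closes any open run
        s = "gain" if r > thresh else "loss" if r < -thresh else None
        if s != state:
            if state is not None and i - start >= min_bins:
                calls.append((start, i - 1, state))
            start, state = i, s
    return calls


# Hypothetical coverage: bins 10-14 of clone A carry half the depth of clone B.
clone_a = [100] * 10 + [50] * 5 + [100] * 10
clone_b = [100] * 25
segments = call_segments(log2_ratios(clone_a, clone_b))
```

Requiring several consecutive bins guards against single noisy bins. As the legend itself shows, segmentation choices can still split or merge real events.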
Article
In Brief
Loss of communication between epithelial and immune cells in the skin underlies the slowdown in wound healing associated with aging.
Highlights
• Intrinsic and extrinsic defects impair wound re-epithelialization in aged skin
*Correspondence: fuchslb@rockefeller.edu
http://dx.doi.org/10.1016/j.cell.2016.10.052
Cell 167, 1323–1338, November 17, 2016 © 2016 Elsevier Inc. 1323
rounded, and begin to produce epidermal mitogens such as FGF7/10 and IGF1, facilitating wound re-epithelialization (Jameson et al., 2002). Mice lacking the T cell receptor δ subunit (TCRδ) show pronounced delays in cutaneous wound healing (Itohara et al., 1993; Jameson et al., 2002). However, these mice lack all γδ T cells, including both epidermal DETCs and dermal Vγ4Vδ1 T cells; each secretes a different repertoire of factors and cytokines that could affect wound repair (Gray et al., 2011; Sumaria et al., 2011).
Mice that selectively lack Vγ5Vδ1 DETCs have been described (Boyden et al., 2008; Turchinovich and Hayday, 2011) but have not been tested for possible defects in wound repair. These mice harbor a null mutation in selection and upkeep of intraepithelial T cells 1 (Skint1) and lack canonical Vγ5Vδ1-expressing DETCs. Skint1 is the founding member of a family (Skint1–11) of butyrophilin-like proteins containing transmembrane-spanning domains and extracellular IgV and IgC domains (Mohamed et al., 2015). During development, Skint1 is expressed by thymic epithelial cells, promoting functional differentiation of DETC progenitors (Boyden et al., 2008). A number of Skint family members are also expressed in the skin epidermis and intestinal epithelium (Boyden et al., 2008). However, their functions in these adult tissues remain unexplored.
In the present study, we were drawn to DETCs and Skints through an unbiased approach to defining the age-related defects that underlie impaired re-epithelialization after skin wounding. Using the mouse as a model system, we first showed that re-epithelialization to restore the skin barrier is delayed in aged mice. We found that aged epidermal keratinocytes are less transcriptionally dynamic after wounding and fail to regulate key processes necessary for wound repair. Many genes facilitating interactions with immune cells were not properly activated in basal keratinocytes at the wound edge of aged skin. Most notable were the Skint genes. When we investigated the DETCs, we found that our unwounded aged mice harbored Vγ5Vδ1 DETCs and hence differed from Skint1-null mice. However, the DETCs displayed an age-related, wound-specific defect in their behavior.
Our findings brought to the forefront prior speculation, never tested, that SKINTs or some other interacting ligand(s) on wound-proximal keratinocytes might function in the DETC response to injury (Havran et al., 1991; Jameson et al., 2004; Komori et al., 2012). We therefore asked whether Skints might function in adult tissue homeostasis and wound repair, and whether perturbations in SKINTs might affect DETCs and/or their communication with epidermal cells to account for some of the age-related defects in wound healing.
Specifically, we discovered that young mice conditionally knocked down for Skint3 and Skint9 in epidermal keratinocytes display defects in wound repair and in wound-related DETC behavior. Similarly, young mice that (a) lack Vγ5Vδ1 DETCs altogether or (b) have DETCs but either lack the Skint3-4-9 gene cluster or are epidermally knocked down for individual Skints also exhibit delayed skin re-epithelialization during wound repair. Finally, we identified conserved STAT3-binding motifs in Skint promoters and showed that STAT3 signaling and one of its upstream activators, interleukin-6, are diminished in aged, wounded skin. Moreover, Stat3-null mice display DETC and wound-repair defects strikingly similar to those in aged mice, and elevating this signaling pathway can stimulate Skint expression and improve epidermal migration in aged skin. These findings not only demonstrate proof of principle but also offer new promise for therapeutic intervention in elderly individuals who need a boost in restoring the skin barrier after injury.
RESULTS
Aged Animals Maintain a Functional Epidermis in Homeostasis
The dorsal (backskin) epidermis of young (2–4 month) mice is a stratified epithelial tissue composed of dead outer stratum corneum cells, differentiating granular and spinous layers, and an inner proliferative basal layer attached to an underlying basement membrane (Figure 1A). The corresponding epidermis of aged (22–24 month) female C57BL/6J animals also displayed these morphological features, although a ~20% reduction in epidermal thickness was accompanied by an equivalent dermal thinning (Figures 1B and 1C). Immunofluorescence microscopy confirmed a seemingly normal differentiation program in aged mouse skin (Figure 1D and data not shown). In all, we immunostained for basement membrane protein β4 integrin (CD104), basal keratins 5 and 14 (K5 and K14), spinous layer keratins (K10 and K1), wound-response keratins (K6 and K17), and granular layer proteins filaggrin and loricrin, and observed no obvious structural differences between aged and young skin.
To probe more deeply for differences between young and aged epidermal keratinocytes in vivo, we used fluorescence-activated cell sorting (FACS) to purify basal layer keratinocytes (α6-integrin^high CD34^negative Sca1^high) from young and aged mouse skin, followed by deep sequencing (RNA-seq) of their mRNAs. Comparative expression analysis of duplicate RNA-seq data sets revealed 74 genes that were ±2-fold differentially expressed (q < 0.05) between young and aged keratinocytes (Figure 1E and Table S1). Overall, however, these transcripts (56 upregulated, 18 downregulated) changed relatively modestly (1.9-fold on average), indicating that under homeostatic conditions, aged animals maintain an epidermis that is architecturally and transcriptionally similar to that of young mice.
Aged Skin Is Slow to Re-epithelialize Wounds Following Injury
We next challenged the epidermis by wounding, to see whether the epidermis of aged mice could mount an injury response comparable to that of younger mice. Six-millimeter punch biopsies created full-thickness (epidermis + dermis) wounds on the animals' backs, which typically healed by 7 days (d7). As shown in images from representative experiments, young mice consistently closed their wounds faster than their aged counterparts, with little macroscopic difference in wound contraction (Figure S1). When quantified over five independent wound studies, the largest differences in wound area were consistently between d3 and d5 post-wounding, when wound closure was always faster in young animals (Figures S1B and S1C).
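The young-versus-aged RNA-seq comparison filters genes on two criteria: at least a ±2-fold change and q < 0.05. The text does not name the differential-expression tool. As an assumption, the sketch derives q-values from per-gene p-values with the Benjamini-Hochberg procedure and then applies both cutoffs. The gene names, fold changes, and p-values in the example are hypothetical.

```python
import math


def benjamini_hochberg(pvals: list) -> list:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):   # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def differential_genes(genes, log2fc, pvals, min_fold=2.0, max_q=0.05):
    """Split genes with |fold change| >= min_fold and q < max_q into up- and downregulated lists."""
    cutoff = math.log2(min_fold)   # 2-fold corresponds to |log2 FC| >= 1
    q = benjamini_hochberg(pvals)
    up = [g for g, fc, qi in zip(genes, log2fc, q) if qi < max_q and fc >= cutoff]
    down = [g for g, fc, qi in zip(genes, log2fc, q) if qi < max_q and fc <= -cutoff]
    return up, down


up, down = differential_genes(["GeneA", "GeneB", "GeneC", "GeneD"],
                              [1.5, -2.0, 0.5, 3.0],
                              [0.01, 0.04, 0.03, 0.005])
```

GeneC passes the q-value cutoff but is dropped by the fold-change filter. This shows why the two criteria are applied jointly.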
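[Figure 1 plot residue: schematic of skin layers (granular layer, LOR and FLG; spinous layer, K1 and K10; basal layer, K5 and K14; basement membrane, CD104; dermis; hair follicles; subcutaneous fat) and a volcano plot of −log10 p value versus log2 fold change with K5 and K10 labeled.]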
To visualize the re-epithelialization process, wounded skins were collected at intervals and immunostained for K14 (Figure 2A). Wounds at d1 post-wounding exhibited little or no sign of re-epithelialization, but thereafter an epithelial tongue of migrating keratinocytes was visible underneath the eschar (Figure 2C). Beginning at d3 and culminating at d5, a marked delay in re-epithelialization was evident in the wound beds of aged mice. At d5, when re-epithelialization had closed the wound (96% ± 4%) in young animals, the epithelial tongues from opposing sides of the wounds of aged animals had migrated less than halfway (41% ± 8%) into the wound site (Figure 2C). By d7, aged wounds were still not completely closed (92% ± 5%).
The delay in wound closure in aged animals corresponded well to the decreased migration of the epithelial tongue under the eschar, as quantified in Figure 2D. Moreover, signs of epidermal responsiveness, as judged by enhanced epidermal thickening and associated wound-induced keratin K17, extended farther from the wound site in young than in aged mice (Figure S2).
Declines in Proliferative Capacity of Aged Keratinocytes
To understand the basis of the delayed re-epithelialization of wounds in aged animals, we assayed the ability of young and aged keratinocytes to proliferate and migrate in response to injury. To this end, at intervals after wounding, mice were pulsed with 5-ethynyl-2′-deoxyuridine (EdU) for 3 hr before their skins were harvested and analyzed. As quantified both in tissue sections of wounded skin and in basal keratinocytes analyzed by flow cytometry, significantly fewer EdU-labeled cells were seen in aged versus young epidermis at d3 and d5 post-wounding (Figure 3A). This difference was not seen under homeostatic conditions, where basal levels of proliferation were similar (Figure 3B). Together, these findings suggest that the defect is rooted specifically in the wound response. Consistent with our punch wound studies, we also observed a notable decrease in proliferation in aged versus young epidermal keratinocytes from skins analyzed 24 hr after waxing to depilate the skin, a procedure that stimulates basal layer proliferation analogous to punch wounds (Figure 3C).
[Plot residue: wound sections (epidermis, dermis; S, scab) at d1, d3, d5, and d7; plot of log2(fold change) for yg wound, yg epi, aged wound, and aged epi samples.]
D. Genes changed in aged vs young keratinocytes after wounding (log2 fold change in parentheses)
Immune function: Lat2 (-1.1), Ik (-1.2), Il10 (-1.3), Il7 (-1.5), Defb1 (-1.5), Cxcl12 (-1.6), Il15 (-1.9), Ccl20 (-3.1), Ccl2 (-3.2), Ccrl1 (-4.4), Il1r2 (-4.5), Il6 (-4.5), Skint2 (-7.1), Skint3 (-3.0), Skint9 (-4.0), Skint5 (-3.7), Rfx1 (2.3), Il17b (1.9), Il28ra (1.7), Il18 (1.6), Tnfrsf14 (1.7), Traf6 (1.3), CD97 (3.1)
Cell cycle: Inha (2.6), Cdk2 (2.4), Ccnd3 (2.2), Ccnd1 (2.1), Rb1 (1.9), Bub1b (1.9), Cdc25a (1.7), Pten (1.5), Ccna2 (1.4), Cdkn2c (1.4), Chek1 (1.3), Apc (1.2), Cdc25c (1.2), Cdkn2b (1.2), Pin1 (-1.3), Bccip (-1.3), Cdc16 (-1.3), Cdc6 (1.3)
Locomotive behavior: Cxcl9 (-1.3), Ccl1 (-1.3), Cklf (-1.6), Cxcl12 (-1.6), Cxcl1 (-1.6), Cxcl10 (-2.0), Ccl7 (-2.4), Ccl8 (-2.8), Ccrl2 (-2.8), Ccl20 (-3.1), Ccl2 (-3.2), Cxcl14 (-3.6), Ccrl1 (-4.4), Lamb1 (3.3), Nexn (-1.3), Cdh13 (-1.4), Myh9 (6.4), Slit1 (3.3)
(C) Volcano plot of differentially regulated genes between young wound and aged wound samples. Vertical rose-colored lines denote ±2-fold changes; the horizontal rose line denotes the p = 0.05 threshold. Blue dots indicate genes with a GO annotation related to immune function (note the marked failure of many of these genes to be upregulated in aged wounds).
(D) Table of selected genes for the indicated GO terms from RNA-seq analysis (blue, downregulated; green, upregulated; values are log2 fold changes). Data are represented as mean ± SEM.
See also Figure S4 and Tables S2, S3, S4, S5, and S6.
(F) Sagittal immunofluorescence images of skin (wounded and unwounded), immunostained for DETCs. Dashed lines denote epidermal/dermal boundaries. Scale bars, 25 µm. Insets show DETCs highlighted with arrows.
(G) Quantification of the number of dendrites per DETC. n = 5. Student's t test was used to assess statistical significance.
(H) Whole-mount DETC immunofluorescence of ear skin before and at d1 and d3 after wounding. Scale bars, 100 µm. n = 3. Yellow dotted lines denote the wound edge (wd).
(I) Density plots of the distribution of rounded (dendrite-free) DETCs in ear-skin whole-mount preparations at the indicated times post-wounding. Vertical lines represent the mean distance of rounded DETCs from the wound edge (0 µm).
(J) Quantification of DETCs at the wound site at times after injury. Data are represented as mean ± SEM. See also Figure S5.
[Plot residue: young and aged, wounded and unwounded sections at d5 post-wounding; K14/DAPI-stained wounds from Scr, Skint3, and Skint9 shRNA-transduced mice (p = 0.02, p = 0.007), with percent of DETCs and dendrites per DETC; FVBjax, FVBtac, and C57BL6 wounds at d1 and d3; γδTCR (DETC) whole mounts (unwounded, d1, d3) and rounded DETC density versus distance from wound edge (µm) for FVBjax and C57BL6.]
Figure 6. Failure of Keratinocytes To Up-Regulate Skints Results in Impaired Wound Healing in Young Mice
(A) Heatmap of Skint gene family expression from RNA-seq data. Asterisks denote splice variants. Yg wound, young keratinocytes isolated from the wound edge; aged wound, aged keratinocytes isolated from the wound edge; young epi, young keratinocytes under homeostatic conditions; aged epi, aged keratinocytes under homeostatic conditions.
(B) qRT-PCR of Skint mRNAs from keratinocytes isolated from wound edges.
(C) Illustration of in utero lentiviral infection into the amniotic sacs of E9.5 embryos and selective transduction of mouse skin epidermis.
(D) Knockdown efficiency of Skint shRNAs, measured by qRT-PCR of adult epidermis prior to wounding.
(E) Immunofluorescence images of d3 backskins of wounded young mice whose epidermis was transduced in utero with the indicated Skint shRNAs. Scr, scrambled. Tissue sections are immunolabeled for K14 (green) and DAPI (blue). S, scab; arrows denote wound edge. Scale bar, 500 µm.
(F) Wound closure at d5 in young mice transduced with the indicated shRNAs. n = 4.
(G and H) Quantification of DETC number (G) and number of dendrites per DETC (H) in sections of unwounded and wounded skins transduced as indicated. n = 4.
(I) Immunofluorescence images of backskin re-epithelialization at d5 or d3 after wounding of young mice of the indicated strains. Note that FVBJax lacks Skints 3-4-9 and FVBTac lacks Vγ5Vδ1 DETCs. Tissue sections are immunostained for K14 (green) and DAPI (blue). S, scab; arrows denote wound edge. Scale bar, 100 µm. n = 2.
(J) Quantification of wound closure by re-epithelialization.
(K) Quantification of dendrites per DETC from tissue sections of wounds in C57BL6 and FVBJax mice at the indicated time points.
(L) Whole-mount immunofluorescence and quantification of DETCs in ear skin of young FVBJax versus C57BL/6 mice. Scale bars, 100 µm. n = 2.
(M) Density plots of the distribution of rounded (dendrite-free) DETCs in ear-skin whole-mount preparations at the indicated times post-wounding. Vertical lines represent the mean distance of rounded DETCs from the wound edge (0 µm). Data are represented as mean ± SEM. See also Figure S6.
(J) DIC images (from d5 time-point) of explant cultures from young and aged tissue biopsies treated with IL-6 at concentrations indicated. Dashed lines denote the
borders of keratinocyte outgrowth; E, explant. Scale bars, 10 mm. n = 8.
(K) Quantifications are of the distance of outgrowth of keratinocytes in explants during a 7 day time-course. n = 8. Data are represented as mean ± SEM. See also
Figure S7.
Further information and requests for reagents may be directed to, and will be fulfilled by, the Lead Contact, Elaine Fuchs (fuchslb@rockefeller.edu).
METHOD DETAILS
Wounding Study
Punch biopsies were performed on anesthetized mice in the telogen phase of the hair cycle. For backskin wounds, dorsal hair was cut with clippers and the skin was swabbed with EtOH before wounding. 6 mm biopsy punches (Miltex) were used to make full-thickness wounds. For ear punch biopsies, animals were anesthetized and a 2 mm biopsy punch was used to make a through-and-through wound (hole) in the center of each ear. After wounding, tissue was collected at 0.5d, 1d, and 3d for immunostaining (details below). Depilation was performed as described (Keyes et al., 2013). For EdU pulse experiments, mice were injected intraperitoneally with EdU (50 µg/g; Sigma-Aldrich) at the specified intervals before collection.
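The EdU dose is weight-based. Assuming the dose is 50 µg of EdU per gram of body weight, the injection volume follows from simple arithmetic. The 5 mg/ml stock concentration below is a hypothetical value and is not given in the methods.

```python
def edu_injection_volume_ul(body_weight_g: float,
                            dose_ug_per_g: float = 50.0,
                            stock_mg_per_ml: float = 5.0) -> float:
    """Volume (µl) of EdU stock for a weight-based intraperitoneal dose.

    stock_mg_per_ml is a hypothetical stock concentration; 1 mg/ml equals 1 µg/µl.
    """
    dose_ug = dose_ug_per_g * body_weight_g   # total EdU mass in µg
    return dose_ug / stock_mg_per_ml          # µg / (µg/µl) = µl


# A 25 g mouse at 50 µg/g receives 1,250 µg, i.e. 250 µl of a 5 mg/ml stock.
volume = edu_injection_volume_ul(25.0)
```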
Cell Culture
Young and aged basal keratinocytes were FACS-isolated from animals and plated on mitomycin-C-treated J2 fibroblast feeder cells to establish primary cell lines. Independent clones were cultured and passaged in E-media supplemented with 15% serum and a final concentration of 0.3 mM Ca2+ for 3 passages and then moved to feeder-free culture (Rheinwald and Green, 1975). For colony-forming efficiency assays, viability of epithelial keratinocytes was determined by trypan blue (Sigma) staining on a hemocytometer after FACS isolation. Equal numbers of live cells were plated, in triplicate, onto mitomycin-C-treated dermal fibroblasts in E-media supplemented with 15% serum and 0.3 mM Ca2+. After 14 days in culture, cells were fixed and stained with 1% Rhodamine B (Sigma). Colony diameter was measured from scanned images of plates using ImageJ, and colonies were counted. For IL-6 treatment experiments, keratinocytes were serum starved for 24 hr and then treated with recombinant mIL-6 (R&D Systems) at 10 ng/ml for the indicated times. Cells were collected directly in Trizol (Invitrogen), and RNA was extracted for qRT-PCR (see below).
Cell adhesion and cell spreading assays were performed as described previously (Humphries, 2009). Wells were coated with 10 µg/ml human plasma fibronectin (Millipore), 40 µg/ml rat tail collagen-I (Corning), 0.1% (w/v) poly-L-lysine (Sigma), and 1 mg/ml BSA (Sigma) for 1 hr at room temperature, washed three times with PBS, and used in cell adhesion assays.
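The colony-forming efficiency readout reduces to the number of colonies counted per live cell plated. Results are summarized across the triplicate plates as mean ± SEM, the error-bar convention stated under statistics. The sketch assumes a hypothetical seeding number, which the methods do not give.

```python
import math
import statistics


def colony_forming_efficiency(colonies: list, cells_plated: int) -> tuple:
    """Mean and SEM of colony-forming efficiency (%) across replicate plates."""
    cfe = [100.0 * c / cells_plated for c in colonies]   # percent of plated live cells
    sem = statistics.stdev(cfe) / math.sqrt(len(cfe))
    return statistics.mean(cfe), sem


# Hypothetical triplicate plates, each seeded with 1,000 live cells.
mean_cfe, sem_cfe = colony_forming_efficiency([20, 30, 40], 1000)
```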
For scratch migration assays, keratinocytes were plated in 6-well tissue culture dishes and allowed to reach confluency. Scratches were then created by manually scraping the cell monolayer with a pipette tip. The dishes were washed with PBS, replenished with E-media supplemented with 1 mM HEPES, and imaged for 25–36 hr in 5% CO2 on a PerkinElmer Volocity spinning disk system equipped with a heated enclosure and gas mixer (Solent) and a 20X/0.75 CFI Plan-Apo objective. Migration of individual keratinocytes was tracked manually using ImageJ software. Transwell migration assays were performed in 6-well plates (Corning). The bottom of each well was coated with 10 µg/ml fibronectin, and fibroblast-conditioned E-media containing 0.3 mM Ca2+ was added. Young and aged keratinocytes were serum starved for 24 hr, and a total of 20,000 cells/well were plated in serum-free medium containing 0.3 mM Ca2+. At the indicated time points, cells were washed off the top membrane, and the cells on the bottom membrane were fixed. Cells were stained with hematoxylin and eosin and counted under the microscope.
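Manual tracking yields a time series of positions for each cell. The methods do not state which metrics were computed from the tracks. A common way to summarize them, assumed here, is mean speed (path length over elapsed time) together with a directionality ratio (net displacement over path length). Hours and micrometers are assumed as units; in practice they follow whatever the tracker exports.

```python
import math


def track_metrics(track: list) -> tuple:
    """Mean speed and directionality ratio for one manually tracked cell.

    `track` holds (t_hr, x_um, y_um) tuples in time order. Speed is path length
    divided by elapsed time; directionality is net displacement divided by path
    length (1.0 for a straight path, near 0 for a cell that returns to its start).
    """
    steps = zip(track, track[1:])
    path = sum(math.dist(a[1:], b[1:]) for a, b in steps)
    elapsed = track[-1][0] - track[0][0]
    net = math.dist(track[0][1:], track[-1][1:])
    return path / elapsed, (net / path if path else 0.0)


# Hypothetical track: a cell moving 5 µm per hour in a straight line.
speed, directionality = track_metrics([(0, 0, 0), (1, 3, 4), (2, 6, 8)])
```

Directionality separates cells that migrate into the scratch from cells that are motile but wander. Speed alone cannot make that distinction.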
For explant assays, backskin tissue was harvested, hair was removed with Nair, and the tissue was washed with PBS. Subcutaneous fat was gently removed with a scalpel. Explants were cut out using a 1.5 mm dermal biopsy punch (Miltex), placed on fibronectin-coated 24-well tissue culture dishes, secured to the bottom of the dish with 1–2 µL Matrigel (Corning), and submerged in E-media containing 0.3 mM Ca2+. Outgrowths from explants were imaged at the indicated time points and analyzed with ImageJ. For IL-6-treated explants, 2 mm biopsy punches were used to cut out explants, which were treated with 10 ng/ml or 50 ng/ml IL-6 in E-media containing 0.3 mM Ca2+. Images were taken at the indicated time points, and outgrowths were measured using ImageJ.
Flow Cytometry
Preparation of adult mouse backskins for keratinocyte isolation and staining were performed as previously described (Nowak and Fuchs, 2009). Briefly, subcutaneous fat was removed from skins with a scalpel, and skins were placed dermis side down on trypsin (GIBCO) at 37°C for 45 min. Single-cell suspensions were obtained by scraping the skin to remove the epidermis and hair follicles from the dermis. Cells were then filtered through 70 µm and then 40 µm strainers. Cell suspensions were incubated with the appropriate antibodies for 30 min on ice. The following antibodies were used for FACS: α6-integrin (BD PharMingen), CD34 (eBiosciences), and Sca-1 (eBiosciences). DAPI was used to exclude dead cells. Cell isolations were performed on FACS Aria sorters running FACS Diva software (BD Biosciences). For EdU incorporation experiments, staining was performed using the Click-iT EdU Alexa Fluor 488 Flow Cytometry Kit (Life Technologies) per the manufacturer's instructions. FACS analyses were performed using LSRII FACS analyzers, and results were analyzed with FlowJo software.
For analysis of immune cells at the wound site, wound tissue was isolated from the backskin, keeping margins as close to the wound as possible. Tissue was minced in medium (RPMI with L-glutamine, sodium pyruvate, acid-free HEPES, penicillin, and streptomycin), Liberase TL (Roche) was added (25 µg/ml), and the tissue was digested for 90 min at 37°C with gentle shaking. Digestion was stopped by adding 20 µl of 0.5 M EDTA and 1 µl of 10% DNase solution. Cells were passed through a 70 µm strainer and stained with the following antibodies from eBiosciences: Ly6c-FITC 1:100, Ly6g-PE 1:200, CD11c-PECy7 1:150, CD11b-PacBlue 1:300, MHCII-AF700 1:300, CD45-A780 1:100, CD64-PerCP-Cy5 1:200, TCRβ-PCRP 1:200, and γδTCR-APC 1:400. Dead cells were excluded using a LIVE/DEAD Fixable Blue Dead Cell Stain Kit for UV excitation (Molecular Probes). FACS analyses were performed using LSRII FACS analyzers, and results were analyzed with FlowJo software.
RT-qPCR
RNA was purified from FACS-sorted cells by sorting directly into Trizol LS (Invitrogen), followed by purification with the Direct-zol RNA MiniPrep kit (Zymo Research). Equivalent amounts of RNA were reverse-transcribed with the SuperScript VILO cDNA Synthesis Kit (Invitrogen). cDNAs were normalized to equal amounts using primers against β-actin. cDNAs were mixed with the indicated primers and Power SYBR Green PCR Master Mix (Applied Biosystems), and quantitative PCR (qPCR) was performed on an Applied Biosystems 7900HT Fast Real-Time PCR system. Primer sequences for RT-PCR were obtained from the Roche Universal ProbeLibrary.
For Vγ5 qPCR, unwounded and wounded skin was incubated in 50 mM EDTA for 1 hr to separate the epidermis from the dermis. Epidermal cells were immediately frozen in liquid nitrogen. Frozen tissues were homogenized using a Bessman Tissue Pulverizer (Spectrum) and collected in Trizol (Invitrogen). RNA was extracted using the Direct-zol RNA MiniPrep kit (Zymo Research) per the manufacturer's instructions. Equivalent amounts of RNA were reverse-transcribed with the SuperScript VILO cDNA Synthesis Kit (Invitrogen). cDNAs were mixed with the indicated primers and Power SYBR Green PCR Master Mix (Applied Biosystems), and quantitative PCR (qPCR) was performed on an Applied Biosystems 7900HT Fast Real-Time PCR system.
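The methods state that cDNAs were normalized with β-actin primers but do not give the quantification formula. The sketch assumes the widely used 2^−ΔΔCt method, which presumes near-100% amplification efficiency for both primer pairs. Under that assumption, relative expression of a target transcript in one sample versus a reference sample could be computed as follows. The Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_actin: float,
                        ct_target_ref: float, ct_actin_ref: float) -> float:
    """Fold change of a target transcript versus a reference sample (2^-ddCt),
    normalized to beta-actin and assuming ~100% amplification efficiency."""
    delta_sample = ct_target - ct_actin              # dCt in the sample of interest
    delta_reference = ct_target_ref - ct_actin_ref   # dCt in the reference sample
    return 2.0 ** -(delta_sample - delta_reference)


# Hypothetical Ct values: relative to beta-actin, the target crosses threshold
# 2 cycles earlier in the sample than in the reference, i.e. ~4-fold higher.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

If primer efficiencies differ substantially, an efficiency-corrected model would be needed instead.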
Student's t test was used to determine the significance of differences between two groups, using Prism5 software. Box-and-whisker plots are used to describe the entire population without assumptions about its statistical distribution. Error bars on graphs denote SEM. For all statistical tests, p < 0.05 was accepted as a significant difference.
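For reference, the two-group comparison reduces to the pooled-variance Student's t statistic, and the error bars to the standard error of the mean. The stdlib-only sketch below computes both. Turning t into a p-value requires the t distribution with n_a + n_b − 2 degrees of freedom, for example via `scipy.stats.ttest_ind`, whose default is this same equal-variance test. The input values are hypothetical.

```python
import math
import statistics


def students_t(a: list, b: list) -> tuple:
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def mean_sem(x: list) -> tuple:
    """Mean and standard error of the mean, as plotted in the error bars."""
    return statistics.mean(x), statistics.stdev(x) / math.sqrt(len(x))


# Hypothetical measurements for two groups of three animals each.
t_stat, dof = students_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```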
Figure S1. Wound Closure in Young and Aged Animals, Related to Figure 2
(A) Representative examples of skin from young (2–4 months) and aged (22–24 months) mice subjected to 6 mm punch biopsy. Representative wounds shown at
the indicated time-points after wounding.
(B) Area of wound over time-course measured from images. n = 5.
(C) One-phase exponential decay modeling of wound closure in young and aged animals using the equation $N(t) = N_0 e^{-t/\tau}$, where the time constant $\tau = 1/\lambda$ and $\lambda$ is the exponential decay constant. Span is the difference between the initial size of the wound and the plateau of wound closure. The fitted curves are significantly different (Mann-Whitney test, p < 0.0001).
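The decay model in (C) can be fitted by ordinary least squares after taking logarithms, because $\ln N(t) = \ln N_0 - t/\tau$ is linear in t. This is a simplification. The caption's Span term implies a plateau, which a full nonlinear one-phase decay fit would estimate and this sketch omits. The wound areas and time points in the example are hypothetical and noise-free.

```python
import math


def fit_one_phase_decay(t: list, area: list) -> tuple:
    """Fit N(t) = N0 * exp(-t / tau) by linear least squares on ln(area).

    Returns (N0, tau, lam) with lam = 1 / tau. Requires all areas > 0; no plateau term.
    """
    y = [math.log(a) for a in area]
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    tau = -1.0 / slope
    return math.exp(intercept), tau, 1.0 / tau


# Hypothetical wound areas (mm^2) generated with N0 = 28.3 and tau = 2.5 days.
days = [0, 1, 3, 5, 7]
areas = [28.3 * math.exp(-d / 2.5) for d in days]
n0, tau, lam = fit_one_phase_decay(days, areas)
```

A larger τ (smaller λ) means slower closure. The delayed healing of aged wounds would appear as a longer time constant.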
Figure S2. Epidermal Response to Wounding, Related to Figure 2
(A) Sagittal images of young and aged backskin after punch biopsy at the indicated time points, immunostained for K17 and β4-integrin (CD104) as indicated by the color-coded secondary antibodies. Scale bar, 200 µm. "S" denotes scab. n = 5.
(B) Quantification of the distance away from the wound site where upregulation of K17 can be observed in epidermal keratinocytes. Data are represented as
mean ± SEM.
Figure S3. In Vitro Keratinocyte Assays, Related to Figure 3
(A) Quantification of migration in Boyden chamber assays of young and aged keratinocytes 12 or 24 hr after seeding. Migration is expressed as the percentage of cells that reach the bottom of the chamber relative to the total number of cells seeded in the upper chamber.
(B) Cell adhesion in vitro. FACS-isolated basal epidermal keratinocytes from young and aged skins were assayed for their ability to attach to tissue culture dishes coated with the indicated substrates (fibronectin, collagen I, poly-lysine, and bovine serum albumin). Shown are representative images of young and aged keratinocytes on fibronectin-coated plates after the adhesion assay, and a plot of cells bound to fibronectin-coated plates at 10, 30, and 60 min. n = 6.
(C) Quantification of cell adhesion to the indicated substrates.
(D) Quantification of cell spreading (area of each cell attached to the dish) after the cell adhesion assay. n = 6.
(E) Colony-forming efficiency of young and aged keratinocytes isolated by FACS. n = 15.
(F) Quantification of colony number and colony size for in vitro growth assays. Data are represented as mean ± SEM.
Figure S4. Global Transcriptional Analysis in Young and Aged Keratinocytes, Related to Figure 4
(A) Schematic of keratinocyte isolation from the wound edge and the FACS isolation strategy.
(B) RNA-seq data from isolated keratinocytes for epithelial markers Krt14/Krt16, Krt5, Trp63, Itga6, and Klf5, and markers for endothelial cells (Cd31), immune cells (Cd45), fibroblasts (Cd140a), and melanocytes (Sox10).
(C) Heatmap of unsupervised hierarchical clustering of RNA-sequencing data after k-means clustering of transcripts into 25 groups. Young wd, keratinocytes isolated from the wound edge of a young mouse; aged wd, keratinocytes isolated from the wound edge of an aged mouse; young epi, young keratinocytes under homeostatic conditions; aged epi, aged keratinocytes under homeostatic conditions.
(D) Venn diagrams comparing young keratinocytes versus young wounded keratinocytes (Yg, young; 500 genes up- and 1,679 genes downregulated) and aged keratinocytes versus aged wounded keratinocytes (Ag, aged; 236 genes up- and 328 genes downregulated), at the indicated fold changes. Overlapping regions denote genes similarly regulated in both young and aged wounded skin.
(E) Dot plot of leading-edge results in GSEA. Each dot represents a GO term grouped into the 8 core functional categories. Dot size denotes the −log FDR value of each GO term.
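The Venn comparison in (D) amounts to thresholding fold changes and then intersecting gene sets. A minimal sketch follows. It assumes fold changes are linear ratios (wounded/unwounded), so "down" means a fold change at or below 1/threshold. The gene names and values are placeholders.

```python
def split_by_fold(fold_changes, threshold):
    """Return (up, down) gene sets from linear fold changes (wounded / unwounded).

    A gene is 'up' if fc >= threshold and 'down' if fc <= 1 / threshold.
    """
    up = {g for g, fc in fold_changes.items() if fc >= threshold}
    down = {g for g, fc in fold_changes.items() if fc <= 1.0 / threshold}
    return up, down


def venn_regions(young, aged):
    """Partition two gene sets into the three regions of a two-set Venn diagram."""
    return {"young_only": young - aged, "shared": young & aged, "aged_only": aged - young}


# Hypothetical fold changes for placeholder genes (not data from this study).
young_fc = {"geneA": 3.0, "geneB": 2.5, "geneC": 0.2, "geneD": 1.1}
aged_fc = {"geneA": 2.2, "geneB": 1.3, "geneC": 0.4, "geneE": 4.0}
yg_up, yg_down = split_by_fold(young_fc, 2.0)
ag_up, ag_down = split_by_fold(aged_fc, 2.0)
print(venn_regions(yg_up, ag_up))
print(venn_regions(yg_down, ag_down))
```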
[Figure panels A–E: γδTCR and DAPI staining of young and aged skin, unwounded and post-wound (d1, d3, d5, d7); quantification of percent of DETCs and dendrites per DETC]
[Figure S7 panels: Skint promoter diagram (−4,000 to 0 bp) with motif positions; logos of conserved motifs 1–3 (p-values 3.2e-66, 3.1e-50, and 4.3e-49); transcription factors matched to each motif (e.g., Stat3, Stat4, Stat5a/b, Stat6, Nr3c1, Meis1, Gfi1b); STAT3 binding site logo; pSTAT3 and K5 staining at 0, 0.5, 2, and 6 hr; pSTAT3, K5, CD104, and γδTCR staining of young, aged, Stat3 Het, and Stat3 cKO skin, unwounded and post-wound (d3, d5, d7); S, scab]
Figure S7. STAT3 Regulation of Skint Genes and Response to STAT3 Induction by Epidermal Keratinocytes In Vitro, Related to Figure 7
(A) Diagram of the Skint locus used for promoter analysis. The diagram below shows the location of each of the three motifs found across all 11 Skint promoters.
(B) Top three conserved motifs found in promoter regions (−4,000 bp from the start codon) of the Skint family of genes.
(C) Transcription factors found by TomTom to have binding sites in each motif. Transcription factors highlighted in red were found to be expressed with an FPKM > 1 in our RNA-seq data.
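The red highlighting in (C) reflects an expression filter. A minimal sketch follows. It assumes the TomTom matches and the FPKM table are already available as plain Python collections. The factor names and FPKM values are hypothetical placeholders.

```python
def expressed_factors(tomtom_hits, fpkm, min_fpkm=1.0):
    """Keep motif-matched transcription factors with FPKM strictly above min_fpkm.

    Factors absent from the expression table are treated as not expressed.
    """
    return sorted(tf for tf in set(tomtom_hits) if fpkm.get(tf, 0.0) > min_fpkm)


# Placeholder factor names and FPKM values (hypothetical, not from the study).
hits = ["TF1", "TF2", "TF3", "TF4", "TF1"]  # a factor can match several motifs
fpkm = {"TF1": 12.5, "TF2": 0.4, "TF4": 1.0}
print(expressed_factors(hits, fpkm))  # ['TF1']
```

The comparison is strict, so a factor with FPKM exactly 1.0 is excluded, matching "FPKM > 1".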
Correspondence
mahesh.desai@lih.lu (M.S.D.),
emartens@umich.edu (E.C.M.)
In Brief
Regular consumption of dietary fiber
helps prevent erosion of the intestinal
mucus barrier by the gut microbiome,
blunting pathogen infection and reducing
the incidence of colitis.
Highlights
• Characterized synthetic bacterial communities enable functional insights in vivo
SUMMARY
Despite the accepted health benefits of consuming dietary fiber, little is known about the mechanisms by which fiber deprivation impacts the gut microbiota and alters disease risk. Using a gnotobiotic mouse model, in which animals were colonized with a synthetic human gut microbiota composed of fully sequenced commensal bacteria, we elucidated the functional interactions between dietary fiber, the gut microbiota, and the colonic mucus barrier, which serves as a primary defense against enteric pathogens. We show that during chronic or intermittent dietary fiber deficiency, the gut microbiota resorts to host-secreted mucus glycoproteins as a nutrient source, leading to erosion of the colonic mucus barrier. Dietary fiber deprivation, together with a fiber-deprived, mucus-eroding microbiota, promotes greater epithelial access and lethal colitis by the mucosal pathogen Citrobacter rodentium. Our work reveals intricate pathways linking diet, the gut microbiome, and intestinal barrier dysfunction, which could be exploited to improve health using dietary therapeutics.
INTRODUCTION
The diet of industrialized nations has experienced a decrease in fiber intake, which for many is now well below the recommended daily range of 28–35 g for adults, and this deficit has been linked to several diseases (Burkitt et al., 1972; Sonnenburg and Sonnenburg, 2014). Fiber provides direct physical benefits, including increased fecal bulking and laxation (Burkitt et al., 1972). However, another feature of dietary fiber, a nutrient category that includes a broad array of polysaccharides that are not digestible by human enzymes, has also drawn it into the spotlight: it provides an important substrate to the community of microbes (microbiota) that inhabits the distal gut (Sonnenburg and Sonnenburg, 2014). Unlike humans, who produce 17 gastrointestinal enzymes to digest mostly starch, our gut microbiota produces thousands of complementary enzymes with diverse specificities, enabling these microbes to depolymerize and ferment dietary polysaccharides into host-absorbable short-chain fatty acids (SCFAs) (El Kaoutari et al., 2013). Thus, the physiology of the gut microbiota is geared toward dietary polysaccharide metabolism. At present, relatively little is known about how a fiber-deprived gut microbiota fulfils its energy demands and how low-fiber-induced microbiota changes impact our health.
Apart from dietary fiber, an alternative energy source for the microbiota is the glycoprotein-rich mucus layer that overlies the gut epithelium as a first line of defense against both commensal microbes and invading pathogens (Johansson et al., 2013; McGuckin et al., 2011). The colonic mucus layer is a dynamic and chemically complex barrier composed largely of secreted mucin-2 glycoprotein (MUC2) (Johansson et al., 2008). Goblet cells secrete MUC2 as a disulfide cross-linked network that expands to form an inner layer, which is tightly adherent to the epithelium and is poorly colonized by commensal bacteria. As bacterial and host enzymes continuously hydrolyze the luminal edge of this layer, a looser outer layer is formed that supports a denser and metabolically distinct community (Li et al., 2015). A key nutritional aspect of the mucus layer for gut bacteria is its high polysaccharide content, with up to 80% of the mucin biomass composed mostly of O-linked glycans (Johansson et al., 2013). However, only a distinct subset of gut microbiota species has evolved the capacity to utilize this nutrient source (Hoskins and Boulding, 1981; Png et al., 2010).
The direct impact of fiber polysaccharides on the microbiota, combined with the ability of at least one nutritional generalist
Cell 167, 1339–1353, November 17, 2016 © 2016 Elsevier Inc. 1339
(Bacteroides thetaiotaomicron) to shift from dietary polysaccharides to mucus glycan metabolism in the absence of fiber (Sonnenburg et al., 2005), suggests a connection between diet and the status of the colonic mucus barrier. Indeed, three previous reports have correlated reduced dietary fiber with thinner colonic mucus (Brownlee et al., 2003; Earle et al., 2015; Hedemann et al., 2009). Nevertheless, the underlying mechanisms with respect to involvement of the microbiota and, perhaps more importantly, the consequences for the host remain largely unknown. Such knowledge is important because it could explain why deviations or imbalances in gut microbial community membership and physiology (“dysbiosis”) correlate with several negative health outcomes, including pathogen susceptibility, inflammatory bowel disease (IBD), and colon cancer (Cameron and Sperandio, 2015; Flint et al., 2012; McKenney and Pamer, 2015). Finally, such knowledge could inform therapeutic and preventative strategies to correct these conditions.
The integrity of the mucus layer is critical for health. Genetic ablation of Muc2 in mice brings bacteria into close contact with the epithelium, leading to inflammation and colon cancer (Van der Sluis et al., 2006). Additional studies have implicated reduced or abnormal mucus production or O-glycosylation in the development of intestinal inflammation (Fu et al., 2011; Larsson et al., 2011) and in the penetration of commensal bacteria into the inner mucus layer in murine models of colitis and in ulcerative colitis patients (Johansson et al., 2014). Moreover, the mucus barrier, a reservoir of antimicrobial peptides and immunoglobulins, is the first structure that a mucosal pathogen must overcome to establish an infection (McGuckin et al., 2011). Given that the status of the mucus layer is precariously balanced between replenishment by goblet cells and degradation by gut bacteria, we hypothesized that a fiber-deprived microbiota would progressively forage on this barrier, leading to inflammation and/or increased pathogen susceptibility.
We aimed to investigate the mechanistic connections between chronic or intermittent dietary fiber deprivation and microbiota composition and physiology, as well as the resulting effects on the mucus barrier. To create a model that facilitates functional interpretation, we assembled a synthetic gut microbiota from fully sequenced human gut bacteria in gnotobiotic mice. In the face of reduced dietary fiber, we examined changes in community physiology and susceptibility to Citrobacter rodentium, a murine pathogen that models human enteric E. coli infection (Collins et al., 2014). We demonstrate that a microbiota deprived of dietary fiber damages the colonic mucus barrier and promotes pathogen susceptibility. Our findings suggest a mechanism through which diet alters the activity of the gut microbiota and impacts health, which is important prerequisite knowledge for rationally designing future dietary interventions and therapeutics.
RESULTS
A Synthetic Human Gut Microbiota with Versatile Fiber Polysaccharide Degrading Capacity
Diet changes are known to rapidly affect the composition of the microbiota in humans and rodents (David et al., 2014; Faith et al., 2011; McNulty et al., 2013; Rey et al., 2013). However, the full complexity of the gut microbiota is a barrier to deriving detailed conclusions because sequence-based approaches (16S rRNA gene and meta-genomics/-transcriptomics) suffer from substantial functional uncertainty. Thus, to test our hypothesis that specific members within a fiber-deprived gut microbiota cause damage by increasingly foraging for nutrients in the protective mucus layer, we designed a synthetic microbiota (SM) containing 14 species of fully sequenced commensal human gut bacteria (Figure 1A). The selected species were chosen to represent the five dominant phyla and collectively possess important core metabolic capabilities (Figure S1A).
To provide an additional layer of functional knowledge about complex carbohydrate metabolism, we pre-evaluated our 14 species for growth in vitro on a panel of 42 plant- and animal-derived mono- and polysaccharides, including purified mucin O-glycans (MOGs), as sole carbon sources (Martens et al., 2011). These growth assays allowed us to determine that all major groups of dietary fiber and host mucosal polysaccharides could be used by one or more strains in our community, as well as which bacteria target each glycan (Figures 1A, S1A, and S1B; Table S1). It is evident that the four mucin-degrading species fall into two categories: mucin specialists (A. muciniphila and B. intestinihominis), which grow only on MOGs as a sole polysaccharide source, and mucin generalists (B. thetaiotaomicron and B. caccae), which each grow on several other polysaccharides. Overall, our choice of species is physiologically and ecologically representative of the more complex native gut microbiota. Because our community is composed of bacteria with determined carbohydrate metabolic abilities, it allows us to address our central hypothesis in more precise, mechanistic detail.
To develop a gnotobiotic model, we assembled the SM in germfree mice, which were fed a standard fiber-rich (FR) laboratory diet that contains 15% dietary fiber from minimally processed grains and plants (Figures 1B and 1C). Colonized animals were maintained on the FR diet for 14 days to monitor the reproducibility and stability of community assembly (Figure 1B). All of the introduced species persisted in each mouse between 6 and 54 or 66 days of colonization, depending on the length of the experiment (n = 37 total, two independent experiments; analyzed by both 16S rRNA sequencing [Table S2] and qPCR approaches [Table S3]). Individual mice exhibited reproducible SM assembly irrespective of caging, mouse gender, experimental replicate, or method of analysis (Figure S2; Tables S2 and S3). In addition to 29 germfree control animals, a total of four different gnotobiotic colonization experiments (51 SM-colonized mice in total; experiments 1–4) were performed according to the timeline shown in Figure 1B.
Both Chronic and Intermittent Fiber Deficiency Promote Enrichment of Mucus-Degrading Bacteria
Although dietary changes are known to perturb microbiota composition, the impact of diet variation, especially chronic or intermittent fiber deficiency, on the activities and abundance of mucin-degrading bacterial communities has not been studied in functional detail. After validating stable SM colonization, three groups of mice were maintained by constant feeding of one of three different diets: fiber-rich (FR), fiber-free (FF), or prebiotic
[Figure 1 panels: (A) heatmap of normalized growth (binned from 1.0–0.8 down to no growth) of 13 SM species on mucus O-glycans, host glycans, plant and fungal polysaccharides, and monosaccharides; (B) timeline of germ-free mice gavaged with the SM and fed the FR diet, then switched to FR, FF, or Pre diets or to daily or 4-day FR/FF and Pre/FF alternations, with fecal sampling and host and microbial readouts; (C) compositions of the FR, FF, and Pre diets]
Figure 1. Carbohydrate Utilization by the Synthetic Human Gut Microbiota Members and Gnotobiotic Mouse Treatments
(A) Heatmap showing normalized growth values of 13/14 synthetic human gut microbiota (SM) members.
(B) Schematic of the gnotobiotic mouse model illustrating the timeline of colonization, feeding strategies, and fecal sampling.
(C) Compositions of the three distinct diets employed in this study (common additives such as vitamins and minerals are not shown). The prebiotic mix contained
equal proportions of 14 host indigestible polysaccharides (see Table S1).
See also Figure S1.
(Pre). In contrast to the FR diet, which contained naturally milled food ingredients with intact fiber particles, the Pre diet was designed to study the effect of adding a mixture of purified, soluble glycans, similar to those used in prebiotic formulations (Figure 2C). To imitate the meal-to-meal fluctuation in the amount of fiber in the human diet, four other groups were alternated between the FR and FF or the Pre and FF diets on a daily or 4-day basis (Figure 1B).
Fecal microbial community dynamics showed that in mice switched to the FF diet, several species rapidly and reproducibly changed in abundance (Figures 2A, 2B, and S3). Four species (A. muciniphila, B. caccae, B. ovatus, and E. rectale) were highly responsive to diet change. A. muciniphila and B. caccae are able to degrade MOGs in vitro. B. ovatus and E. rectale cannot metabolize MOGs, but together they can use a broad range of polysaccharides found in dietary fiber (Figure 1A). In the absence of fiber, the abundance of A. muciniphila and B. caccae increased rapidly, with a corresponding decrease in the fiber-degrading species (Figure 2A). The Pre diet, which contains purified polysaccharides and is otherwise isocaloric with the FF diet, had an effect on community composition similar to that of the FF diet but separated slightly from FF by PCoA ordination, likely due to increased Bacteroides abundance (Figures 2A and 2B). Intriguingly, the abundances of the same four bacteria noted above fluctuated rapidly on a daily basis when the FR and FF diets were oscillated (Figures 2C and S3), corroborating their ability to respond dynamically to variations in dietary fiber. The increase in mucin-degrading species observed in fecal samples matched cecal abundances at the end of the experiment (Figure 2D and panels to the right of plots in Figure 2A). Moreover, similar levels of mucin-degrading bacteria were quantified in the colonic lumen and mucus layer using laser capture microdissection (Figure 2E), indicating that proliferation of mucin-degrading bacteria in this model is a community-wide effect and not limited to the mucus layer.
Many of the other bacteria (except R. intestinalis and B. intestinihominis) were sensitive to changes between the FR and FF diets on daily and 4-day bases, albeit to lower degrees (Figure S3B; Table S2). Two additional species especially sensitive to diet change were Desulfovibrio piger (increased on FF diet)
[Figure 2 panels: (A) fecal and cecal relative abundance (%) of the SM species over days 6–54 on FR, FF, and Pre diets, with 16S rRNA and transcript data for cecal samples; (B) PCoA of community composition (PCoA 2, 9%) for FR, FF, Pre, and alternating-diet groups; (C) change in fecal relative abundance (%) during 1-day FR/FF alternation, including E. rectale and B. ovatus; (D) cecal mucus-degrading bacteria (relative % abundance) by diet group; (E) relative abundance of mucus degraders (A. muciniphila, B. caccae) and fiber degraders (E. rectale, B. ovatus) in colonic lumen vs. mucus on FR and FF diets]
[Figure 3 panels: (A) fold changes of CAZyme families (fiber-targeting vs. mucus-targeting enzymes) for FR/FF and Pre/FF comparisons, colored by species; (B) fold change of mucin O-glycan (MOG)-specific transcripts (FF/FR, normalized to community); (C) cecal microbial enzyme activity (μmol/min/mg of protein); (D) cecal SCFA and organic acid (OA) concentrations]
Figure 3. Diet-Specific Changes in Carbohydrate Active Enzyme Expression Reveal a Community Shift from Fiber to Mucus Degradation
(A) Positive and negative fold-changes in transcripts encoding carbohydrate active enzymes (CAZymes) between either FR/FF (top) or Pre/FF (bottom)
comparisons. Only CAZyme families (x axis) in which >2-fold changes and p < 0.05 (Student’s t test) were observed for all of the genes in that family in RPKM-
normalized cecal community transcriptomes are shown as averages; open circles denote statistically insignificant differences. n = 3 mice/group, experiment 1.
(B) Fold-change values of empirically validated (Table S5), MOG-specific transcripts of three mucus-degrading bacteria. n = 3 mice/group, experiment 1. Data are
shown as average and error bars represent SEM. Student’s t test.
(C) Activities of cecal enzymes determined by employing p-nitrophenyl-linked substrates. n = 4 for FR and FF groups and n = 3 for other groups, experiment 1.
Data are shown as average and error bars represent SD. One-way ANOVA, FR diet group versus other groups.
(D) Concentrations of organic acid (OA, succinate) and short-chain fatty acids (SCFA) determined from cecal contents. n = 4 mice/group; 2 mice/dietary group in
two independent experiments (#2A and 3). Middle lines indicate average of the individual measurements shown and error bars represent SEM. Student’s t test.
See also Figure S4 and Tables S4, S5, and S6.
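The inclusion rule in panel (A) can be written as a filter over per-gene results. A minimal sketch follows. It assumes the p-values have already been computed (the legend names a Student's t test) and that fold changes use a signed convention in which negative values denote decreases. The family labels and numbers are hypothetical placeholders.

```python
from statistics import mean


def filter_cazyme_families(genes, min_fold=2.0, alpha=0.05):
    """Average fold change for families in which every gene passes both cutoffs.

    genes: iterable of (family, signed_fold_change, p_value), one tuple per gene.
    Signed convention (assumed): +3 means 3-fold higher, -3 means 3-fold lower.
    """
    by_family = {}
    for family, fold, p in genes:
        by_family.setdefault(family, []).append((fold, p))
    kept = {}
    for family, rows in by_family.items():
        # Every gene in the family must show a >2-fold change with p < 0.05.
        if all(abs(fold) > min_fold and p < alpha for fold, p in rows):
            kept[family] = mean(fold for fold, _ in rows)
    return kept


# Placeholder families and hypothetical values (not data from this study).
rows = [("FamA", 4.0, 0.01), ("FamA", 3.0, 0.02),
        ("FamB", -5.0, 0.001), ("FamB", 1.5, 0.03),
        ("FamC", 2.5, 0.20)]
print(filter_cazyme_families(rows))  # {'FamA': 3.5}
```

FamB is dropped because one of its genes changes less than 2-fold. FamC is dropped on its p-value.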
[Figure 4 panels: (A) Alcian blue-stained colonic sections; (B) Muc2 and DAPI immunofluorescence; (C) thickness of inner mucus layer (μm) across diet groups, with SM or germ-free, with numbers of mice and measurements per group; (D) transcript levels of mucus-related genes on FR and FF diets (with SM); (E) fecal lipocalin (pg per g feces); (F) colon length by diet group; (G) heatmap of host cecal gene fold changes]
Figure 4. Microbiota-Mediated Erosion of the Colonic Mucus Barrier and Host Responses
(A) Alcian blue-stained colonic sections showing the mucus layer (arrows). Scale bars, 100 μm. Opposing black arrows with shafts delineate the mucus layer that was measured, and triangular arrowheads point to pre-secretory goblet cells.
(B) Immunofluorescence images of colonic thin sections stained with α-Muc2 antibody and DAPI. Opposing white arrows with shafts delineate the mucus layer. The inset (FF diet group) shows a higher magnification of bacteria-sized, DAPI-stained particles in closer proximity to the host epithelium and even crossing this barrier. Scale bars, 100 μm; inset, 10 μm.
(C) Blinded colonic mucus layer measurements from Alcian blue-stained sections. Mice in the FR- and FF-fed colonized groups (experiments 1 and 2A) and in the FR-diet-fed germfree groups are from two independent experiments; all other colonized mice are from experiment 1. The asterisk and dagger indicate that the colons of only two mice and one mouse, respectively, contained fecal masses. Data are presented as average and error bars represent SEM. Statistically significant differences (p < 0.01) are annotated with different letters. One-way ANOVA with Tukey's test.
(D) Microarray-derived transcript levels of genes involved in the production of colonic mucus (n = 4 for the FR diet group and n = 3 for the FF diet group). Data are from two independent experiments (#2A and 3). Values are shown as average and error bars represent SEM. Student's t test.
(E) Levels of fecal lipocalin (LCN2) measured by ELISA in the FR and FF diet-fed groups (day 50, Figure S6A; experiment 2A). n = 7 mice/group. Middle lines indicate the average of the individual measurements shown and error bars represent SEM. Mann-Whitney test.
(F) Colon lengths of mice subjected to different dietary treatments. Data for the FR (with SM) and FF (with SM) groups are representative of three independent experiments (experiments 1, 2A, and 3). Middle lines indicate the average of the individual measurements shown and error bars represent SEM. One-way ANOVA, FR diet group (with SM) versus other groups.
(G) Changes in the host cecal transcriptome between FR and FF diet conditions. The heatmap shows statistically significant fold changes of genes identified by Ingenuity Pathway Analysis (false discovery rate [FDR] < 0.05 and absolute log2 fold change > 0.5). n = 4 for the FR diet group and n = 3 for the FF diet group; data are from two independent experiments (#2A and 3).
See also Figure S5 and Table S7.
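The gene selection in (G) combines an FDR cutoff with a fold-change cutoff. The legend does not say how the FDR values were obtained. The sketch below therefore substitutes the Benjamini-Hochberg procedure as an assumption, and it treats fold changes as linear FF/FR ratios. The gene names and values are placeholders.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(genes, fdr_cutoff=0.05, min_abs_log2fc=0.5):
    """genes: list of (name, linear fold change FF/FR, p-value).

    Keeps genes with FDR < fdr_cutoff and |log2 fold change| > min_abs_log2fc.
    """
    fdrs = benjamini_hochberg([p for _, _, p in genes])
    kept = []
    for (name, fold, _), fdr in zip(genes, fdrs):
        log2fc = math.log2(fold)
        if fdr < fdr_cutoff and abs(log2fc) > min_abs_log2fc:
            kept.append((name, log2fc, fdr))
    return kept


# Placeholder genes with hypothetical values (not from this study).
genes = [("g1", 2.0, 0.001), ("g2", 1.2, 0.001), ("g3", 0.5, 0.01), ("g4", 3.0, 0.5)]
for name, log2fc, fdr in select_genes(genes):
    print(name, round(log2fc, 2), round(fdr, 4))
```

Here g2 passes the FDR cutoff but fails the fold-change cutoff (log2 1.2 ≈ 0.26). g4 fails on FDR.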
[Figure 5 and Figure 6 panels: fecal C. rodentium (Cr) burdens, weight change (%), and survival (%) over 0–10 days post infection (dpi) for FR and FF diet groups (SM+Cr or Cr only) with caging details; cecal measurements at 10 dpi; setup for infection with luciferase-carrying C. rodentium in mice with thick or thin colonic mucus (with SM or germ free), with readouts at 4 dpi; fecal burdens (log10 CFU/g feces); bioluminescence of flushed colons (radiance, p/sec/cm2/sr) by diet group and cage]
Figure 6. Fiber-Deprived Gut Microbiota Promotes Faster C. rodentium Access to the Colonic Epithelium
(A) Experimental setup for luminescent C. rodentium experiment (experiment 4).
(B) Fecal burdens of C. rodentium at 4 dpi. Data are shown as averages and error bars represent SEM; statistically significant differences are shown with different
letters (p < 0.001). One-way ANOVA with Tukey’s test.
(C) Bioluminescence images of flushed colons showing the location and intensity of adherent C. rodentium colonization.
(D) Quantified bioluminescence intensities of C. rodentium from (C) and Figure S7C. Middle lines indicate average of the individual measurements shown.
Kruskal-Wallis one-way ANOVA with Dunn’s test.
(E) Transmission electron microscopy images of representative colonic regions from flushed colons; arrowheads denote individual C. rodentium cells and “P” denotes epithelial pedestals in the high power/FF image. Scale bars, 10 μm (low power views) and 2 μm (high power views).
See also Figure S7.
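Panel (D) uses a Kruskal-Wallis test followed by Dunn's test. As a minimal sketch of the first step only, the code below computes the Kruskal-Wallis H statistic using average ranks for ties and the standard tie correction. It omits the p-value (H is referred to a chi-square distribution with k − 1 degrees of freedom) and Dunn's post hoc comparisons. The group values are placeholders.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with the standard correction for tied values."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1.0 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical radiance values for three placeholder groups.
h = kruskal_wallis_h([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0])
print(round(h, 3))  # 7.2
```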
interpretations. We demonstrate that fiber deficiency allows the subset of mucin-degrading bacteria to increase their population and express mucin-degrading CAZymes to access mucin as a nutrient. While the ability to annotate CAZyme functions is well developed vis-à-vis many other metabolic functions that are important in the microbiome (El Kaoutari et al., 2013), there are still substantial ambiguities in connecting such predictions with precise catalytic roles. Here, we not only leverage knowledge of the substrate and enzyme specificities associated with some of the well-studied species in our SM (Table S4 and references therein), but we also employ new in vitro growth and transcriptional profiling experiments for key mucus-degrading bacteria (B. caccae and A. muciniphila). Our results point out a poignant example of how this evolving “bottom up” approach
Kamada, N., Kim, Y.-G., Sham, H.P., Vallance, B.A., Puente, J.L., Martens, E.C., and Núñez, G. (2012). Regulated virulence controls the ability of a pathogen to compete with the gut microbiota. Science 336, 1325–1329.
Kaper, J.B., Nataro, J.P., and Mobley, H.L.T. (2004). Pathogenic Escherichia coli. Nat. Rev. Microbiol. 2, 123–140.
Kozich, J.J., Westcott, S.L., Baxter, N.T., Highlander, S.K., and Schloss, P.D. (2013). Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Appl. Environ. Microbiol. 79, 5112–5120.
Larsson, J.M.H., Karlsson, H., Crespo, J.G., Johansson, M.E.V., Eklund, L., Sjövall, H., and Hansson, G.C. (2011). Altered O-glycosylation profile of MUC2 mucin occurs in active ulcerative colitis and is associated with increased inflammation. Inflamm. Bowel Dis. 17, 2299–2307.
Rogowski, A., Briggs, J.A., Mortimer, J.C., Tryfona, T., Terrapon, N., Lowe, E.C., Baslé, A., Morland, C., Day, A.M., Zheng, H., et al. (2015). Glycan complexity dictates microbial resource allocation in the large intestine. Nat. Commun. 6, 7481.
Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., Lesniewski, R.A., Oakley, B.B., Parks, D.H., Robinson, C.J., et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541.
Schneider, C.A., Rasband, W.S., and Eliceiri, K.W. (2012). NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675.
Sonnenburg, E.D., and Sonnenburg, J.L. (2014). Starving our microbial self: the deleterious consequences of a diet deficient in microbiota-accessible carbohydrates. Cell Metab. 20, 779–786.
Further information may be obtained from the Lead Contact Eric C. Martens (Email: emartens@umich.edu; address: University of
Michigan Medical School, Ann Arbor, Michigan 48109, USA).
METHOD DETAILS
Experimental Design
A total of four gnotobiotic animal experiments (Experiments 1–4; also mentioned in figure legends) were performed; details of the experimental replication are provided in the corresponding figure legends. Both male and female mice were used at random, depending on the availability of animals. Gnotobiotic Experiment 1 contained 2 male mice in the Fiber-rich (FR) group, 2 male mice in the Fiber-free (FF) group, and 1 male mouse in the Prebiotic (Pre) group; all other animals in Gnotobiotic Experiment 1 were female. Gnotobiotic Experiments 2A and 2B used only male mice. All animals in Gnotobiotic Experiment 3 were female. The sex of the animals in Gnotobiotic Experiment 4 is shown in Figure 6 (both males and females were used). For infection with wild-type C. rodentium in germfree (GF) mice, only male mice were used. The sex of the GF mice used for infection with luciferase-expressing C. rodentium is given in Figure S7 (both males and females were used). Finally, all GF mice used for measurement of the colonic mucus layer (Figure 4C) were female. The researchers were not blinded to the identities of the treatment groups; however, the technician who assigned individual gnotobiotic animals to treatment groups was not aware of the experimental details. Measurements of the colonic mucus layer were single blinded (see details below in the relevant section). The pathologist who devised the inflammation-scoring rubric was not blinded, whereas the pathologist who performed the histology scoring and the technician who performed electron microscopy were blinded to the identities of the treatment groups (see below for details of the methods). No data were excluded from the final analysis.
Sample size estimations were performed as follows, in consultation with a statistician. Based on previous studies, an effect size (ratio of the mean difference to the within-group standard deviation) of 3 was assumed to be reasonable for readouts such as mucus layer measurements, enzyme assays, and measurement of transcript changes. With 3 animals in each group and a two-sided 5% significance level, this yields a power of 78% for the t test. Therefore, 3 animals were used for some of the feeding groups (those alternated between different diets). For feeding groups that were more important for the central research question of the study (e.g., constant feeding on Fiber-rich (FR) and Fiber-free (FF) diets), at least 4 animals were used to obtain higher power. For C. rodentium infection experiments, 5 animals per group were used in most cases, based on the results of our previous study (Kamada et al., 2012).
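The 78% power figure can be checked approximately with the standard noncentral-t formulation of a two-sided, equal-variance two-sample t test. The sketch below is an illustration under stated assumptions (equal group sizes, equal variances, effect size defined as the mean difference divided by the within-group SD); the function names are hypothetical, and this is not the statistician's actual calculation. To stay within the Python standard library, it integrates over the chi-square distribution of the pooled variance instead of calling a noncentral-t routine.

```python
import math
from typing import Callable


def simpson(f: Callable[[float], float], a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule on [a, b] using n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2)


def t_critical(alpha: float, df: int) -> float:
    """Two-sided critical value t_{1-alpha/2, df}, found by bisection on the CDF."""
    target = 1.0 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        # The t density is symmetric, so CDF(mid) = 0.5 + integral of the density over [0, mid].
        if 0.5 + simpson(lambda u: t_pdf(u, df), 0.0, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi2_pdf(v: float, df: int) -> float:
    """Chi-square density; finite at v = 0 for df >= 2."""
    return v ** (df / 2 - 1) * math.exp(-v / 2) / (2 ** (df / 2) * math.gamma(df / 2))


def two_sample_t_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Power of a two-sided, equal-variance two-sample t test with n animals per group."""
    df = 2 * n_per_group - 2
    if df < 2:
        raise ValueError("need at least 2 animals per group")
    ncp = effect_size * math.sqrt(n_per_group / 2)  # noncentrality parameter
    crit = t_critical(alpha, df)

    def integrand(v: float) -> float:
        # Given the chi-square draw v for the pooled variance, the statistic is
        # (Z + ncp) / sqrt(v / df); sum the probabilities of both rejection tails.
        scale = crit * math.sqrt(v / df)
        return chi2_pdf(v, df) * (norm_cdf(ncp - scale) + norm_cdf(-ncp - scale))

    upper = df + 40 * math.sqrt(2 * df)  # far beyond any relevant chi-square mass
    return simpson(integrand, 0.0, upper, n=4000)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, round(two_sample_t_power(3.0, n), 3))
```

With an effect size of 3 and 3 animals per group, this evaluates to roughly 0.78, in line with the figure quoted in the text. Moving to 4 animals per group raises the power, which is the rationale given for the constant-feeding groups. With an effect size of 0, the function returns the significance level (0.05), which is a useful sanity check on the integration.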
qPCR
In addition to Illumina sequencing of the 16S rRNA genes (V4 region), phylotype-specific bacterial primers were designed as a second approach to quantifying relative bacterial abundance in fecal samples. The primers were designed against randomly selected genes that were checked for homology against the other 13 species in each case. The primer sequences are listed in Table S3A. The primers were tested for specificity against the target strain by comparing the primer and target gene sequences against sequences in public databases. Moreover, the specificity of each primer was validated by the following three approaches: 1) by
Immunofluorescence Staining
The immunofluorescence staining for Muc2 mucin was performed on colonic thin sections using modified versions of the protocols from Johansson and Hansson, 2012 and an immunohistochemistry/tissue section staining protocol from BD Biosciences, USA (http://www.bdbiosciences.com). The sections were deparaffinized by dipping them in 50 mL Falcon conical tubes filled with xylene (Sigma-Aldrich, USA) for 5 min, followed by transfer to another tube with fresh xylene for 5 min; care was taken to completely immerse the tissue in the liquid (also in the subsequent steps). This was followed by two dehydration steps of 5 min each in 100% isopropanol contained in conical tubes. The slides were then washed by dipping them in conical tubes containing Milli-Q water. Antigens were retrieved by placing the slides in a glass beaker with enough BD Retrievagen A (pH 6.0; BD Biosciences, USA) to cover them. The sections were then heated by microwaving and held at about 89°C for 10 min (microwaving was repeated during this time, as required). The slides were then cooled for 20 min at room temperature and washed 3 times with Milli-Q water. Excess liquid was gently blotted away, and a PAP pen was used to draw a circle around the tissue area to better hold liquid on the tissue during subsequent steps. Blocking was performed by immersing the slides in blocking buffer (1:10 dilution of goat serum (Sigma, USA) in 1x Tris-buffered saline (TBS; 500 mM NaCl, 50 mM Tris, pH 7.4)) and incubating them at room temperature for 1 hr. For primary antibody staining, the tissue sections were covered with a 1:200 dilution of Mucin 2 antibody (H-300) (original concentration: 200 μg/ml; Santa Cruz Biotechnology, USA) in the blocking buffer and incubated for 2 hr at room temperature. After incubation, the excess liquid was blotted away and the slides were rinsed 3 times in 1x TBS (in conical tubes) for 5 min each. Secondary antibody staining was performed by covering the tissue sections with a 1:200 dilution of Alexa Fluor 488-conjugated goat anti-rabbit IgG antibody (original concentration: 2 mg/ml; Thermo Fisher Scientific, USA) in blocking buffer and incubating the sections for 1 hr at room temperature in the dark. The excess liquid was blotted away and the sections were rinsed twice for 5 min each with TBS. Next, the sections were stained for 5 min at room temperature in the dark using a 10 mg/ml DAPI solution diluted in 1x TBS (Sigma-Aldrich, USA). The sections were then rinsed with Milli-Q water and blotted dry. Finally, the sections were covered with ProLong Gold Antifade Mountant (Invitrogen, USA) and coverslips, and the edges of the coverslips were sealed with nail polish. The slides were kept at room temperature in the dark for at least 24 hr and then visualized with an Olympus BX60 upright fluorescence microscope (Olympus, USA).
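The serum and antibody dilutions above reduce to simple C1V1 = C2V2 arithmetic. The helper below is a minimal sketch that assumes "1:N" means 1 part stock in N parts total (some labs instead read it as 1 part stock plus N parts diluent); the function names and the 1000 µL working volume are hypothetical choices for illustration, not values from the protocol.

```python
from typing import Tuple


def dilution_volumes(final_volume_ul: float, dilution_factor: float) -> Tuple[float, float]:
    """Return (stock_ul, diluent_ul) for a 1:N dilution.

    Assumes 1:N means 1 part stock in N parts total volume.
    """
    if dilution_factor < 1:
        raise ValueError("dilution factor must be >= 1")
    stock_ul = final_volume_ul / dilution_factor
    return stock_ul, final_volume_ul - stock_ul


def diluted_concentration(stock_concentration: float, dilution_factor: float) -> float:
    """Working concentration after a 1:N dilution, in the same units as the stock."""
    return stock_concentration / dilution_factor


if __name__ == "__main__":
    # Hypothetical 1000 uL batches: 1:10 goat serum in TBS, then a 1:200 antibody dilution.
    print(dilution_volumes(1000, 10))      # serum vs. TBS volumes
    print(dilution_volumes(1000, 200))     # antibody vs. blocking buffer volumes
    print(diluted_concentration(200, 200)) # a stock labeled "200" becomes "1" in the same units
```

Whatever mass-per-volume unit the antibody stock is labeled in, a 1:200 dilution of a stock at 200 gives a working concentration of 1 in that same unit; for the secondary antibody at 2 mg/ml, the same dilution gives 0.01 mg/ml.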
Tissue Histology
For histology of the GI tracts of C. rodentium-infected mice, the intestinal segments (cecum and colon together) were first fixed in Carnoy's fixative for 3 hr and then transferred to fresh Carnoy's fixative overnight. Next, the samples were washed twice in 100% methanol for 30 min each, followed by two washes in 100% ethanol for 20 min each. The samples were then stored in 100% ethanol at 4°C until further use. After the 100% ethanol washes, the intestinal tissue samples were divided into 3 sections for histology: cecum, ascending colon, and descending colon/rectum. These sections were embedded, processed
Statistical Analyses
All experimental analyses were conducted in consultation with a statistician. Unless otherwise stated in the individual method sections above, all statistical analyses were performed using Prism 5.04 (GraphPad Software, Inc.); the exception was colony-forming units (CFU) for C. rodentium (Figure 5A), which were analyzed in Excel. Statistically significant differences are shown with asterisks as follows: *p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001; ns indicates comparisons that are not significant. The numbers of animals (n) used for individual experiments, details of the statistical tests used, and pooled values for several biological replicates are indicated in the respective figure legends. Two-tailed tests were used in all cases. Because the microbiome data generally did not follow a normal distribution, a nonparametric test such as the Wilcoxon test was used for these data. An exception was the data in Figure 2C, which generally followed a normal distribution, so a t test was used (see details in the section Illumina sequencing and data analysis). For the other data, a t test was used when the data appeared normally distributed; otherwise, a nonparametric (Mann-Whitney) test was used (for example, for Figure 4E). Finally, ANOVA (parametric) and Kruskal-Wallis (nonparametric) methods were used to test for differences among more than two groups. For the data in Figure 6D, a nonparametric approach with Dunn's test, involving pairwise comparisons, was used.
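The nonparametric branch and the asterisk convention can be illustrated with an exact two-sided Mann-Whitney U test. For group sizes like those used here (3 to 7 animals), every assignment of the pooled ranks to the two groups can be enumerated. This is a sketch for illustration, not the Prism implementation the authors used; the function names are hypothetical, and with tied values Prism and other packages may compute p-values differently.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def mann_whitney_exact(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U for group a, exact two-sided p) by enumerating all rank splits."""
    pooled = list(a) + list(b)
    ranks = midranks(pooled)
    n_a = len(a)
    offset = n_a * (n_a + 1) / 2
    expected_u = n_a * len(b) / 2  # mean of U under the null hypothesis
    u_obs = sum(ranks[:n_a]) - offset
    observed_gap = abs(u_obs - expected_u)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        u = sum(ranks[i] for i in chosen) - offset
        total += 1
        # Count splits at least as far from the null expectation as the observed one.
        if abs(u - expected_u) >= observed_gap - 1e-9:
            extreme += 1
    return u_obs, extreme / total


def significance_stars(p: float) -> str:
    """Map a p-value to the asterisk convention used in the figures."""
    for threshold, stars in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return stars
    return "ns"
```

For two completely separated groups of four (for example, [1, 2, 3, 4] versus [5, 6, 7, 8]), only 2 of the 70 possible splits are as extreme as the observed one, so p = 2/70 ≈ 0.029, which is labeled "*". Enumeration grows combinatorially, so for much larger groups a normal approximation would be used instead.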
Accession Numbers
Data from this study have been deposited in the NCBI Short-Read Archive (SRA) and Gene Expression Omnibus (GEO) databases
under the following accession and/or BioProjectID identifiers: 16S rRNA gene sequences and metadata (SRA: SRP065682,
PRJNA300261); RNA-seq data (NCBI: SRP092534, SRP092530, SRP092478, SRP092476, SRP092461, SRP092458, SRP092453);
mouse microarray data (GEO: GSM2084849-55). The commands used to analyze the 16S rRNA gene data can be found online at
the following link: https://github.com/aseekatz/mouse.fiber.
[Figure S1 image: (A) schematic of dietary fiber, starch, and host mucus glycan degradation by SM members, with SCFA, H2, and sulfate-reduction cross-feeding; (B) growth curves (OD600 versus time, h) on glycans including inulin, xylan, arabinoxylan, cellobiose, and galactomannan]
Figure S1. Versatile Metabolic Abilities Contributed by Members of the Human Gut Synthetic Microbiota (SM), Related to Figure 1
(A) A schematic displaying abilities of the SM to degrade a wide variety of dietary and host-derived polysaccharides and possible metabolic interactions between
members of the SM. GAGs, Glycosaminoglycans.
(B) Representative growth curves of selected members of the SM on several polysaccharides and glycans as sole carbon sources (n = 2 for each glycan; values
are shown as averages). The absorbance was measured every 10 min. See Table S1 for raw and normalized growth values and growth media descriptions for the
13 members of the SM evaluated for carbohydrate growth ability (all except D. piger).
[Figure S2 image: PCoA plots (PCo 2 axis, 7%) of fecal samples colored by experiment (Experiments 1 and 2, top) and by sex (male, female, bottom), with FR and FF cages listed in the legend]
Figure S2. PCoA Plots Demonstrating Clustering of Fecal Bacterial Communities over Time in Two Feeding Regimens, Related to Figure 2
Principal coordinates analysis (PCoA) of microbial community dissimilarity (Bray-Curtis) in fecal samples (collected according to Figure 1B) as determined by 16S
rRNA-based sequencing (V4 region). Samples from both Experiments 1 and 2 are shown, with samples coded by experiment (top panel) or sex (bottom panel) and
by cage (legend) (Experiment 1: n = 4 mice/group; Experiment 2: n = 7 mice/group).
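The Bray-Curtis dissimilarity that underlies these PCoA plots compares two samples' taxon profiles as the sum of absolute abundance differences divided by the sum of all abundances, giving 0 for identical profiles and 1 for profiles with no shared taxa. The sketch below computes relative abundances and a pairwise Bray-Curtis matrix in plain Python. It is an illustration only: the authors' actual commands are in the linked mothur-based repository, and whether read counts were subsampled before this step is not stated here. Function names are hypothetical.

```python
from typing import List, Sequence


def relative_abundance(counts: Sequence[float]) -> List[float]:
    """Convert per-taxon read counts into percentages of the sample total."""
    total = float(sum(counts))
    if total == 0:
        raise ValueError("sample has no reads")
    return [100.0 * c / total for c in counts]


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("profiles must list the same taxa in the same order")
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


def bray_curtis_matrix(profiles: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric pairwise dissimilarity matrix, the usual input to PCoA."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = bray_curtis(profiles[i], profiles[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

PCoA then embeds this matrix in a few axes (such as the PCo 2 axis shown) so that distances between points approximate the dissimilarities.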
[Figure S3 image: (A) relative abundance (%) of A. muciniphila, B. caccae, D. piger, and other SM species over days 6-54 in the FR, FF, Pre, and 1-day FR/FF groups, and with all groups fed the FR diet; (B) relative abundance heatmap of SM species in the FR, FF, 1-day FR/FF, and 4-day FR/FF groups]
Figure S3. Fecal Microbial Community Dynamics in Mice from Distinct Dietary Feeding Groups, Related to Figure 2
(A) Relative abundance of indicated bacteria over time in mice subjected to various dietary regimens, as determined by Illumina-based 16S rRNA sequencing (Experiment 1). An explanation for the inverse relationship between the relative abundances of D. piger and M. formatexigens on FR and FF diets is their
[Figure S4 image: Venn diagrams of upregulated B. caccae (A; 830 total genes upregulated) and A. muciniphila (B) genes in vitro (MOG) and in vivo (FR, FF, FF-FR 1-day), with histograms of enzyme family frequencies (GH, PL, and sulfatase families) and the mucin linkages they may target; (C) mucin O-glycan structures (sulfated core 1, A and B blood group antigens) with a key for galactose, fucose, N-acetylglucosamine, N-acetylgalactosamine, and N-acetylneuraminic acid]
Figure S4. Dynamic Changes in Transcriptional Profiles of B. caccae and A. muciniphila In Vivo and In Vitro, Related to Figure 3
Figures are based on RNA-Seq measurements of B. caccae (A) and A. muciniphila (B) responses in vitro (minimal medium with simple sugars or MOG) and in vivo (constant feeding or daily alternation of FR and FF diets). In vivo samples are from the entire cecal community at the end of Experiment 1. Gene transcripts that were increased > 5-fold relative to the corresponding simple sugar references are included for each bacterium. Venn diagrams show the overlap and differences of the transcripts between various groups. Numbers indicate the total differentially regulated gene count for a given sector, while numbers in parentheses denote the numbers of carbohydrate-degrading enzymes (glycoside hydrolase, polysaccharide lyase, or carbohydrate esterase families were counted toward this number; sulfatases and carbohydrate binding module, CBM, families were not counted). Note that A. muciniphila shows less regulatory versatility, as manifested by most of its upregulated enzymes being confined to the core (dark pink) sector containing all of the in vivo samples. This suggests that MOG triggers only a small percentage of this species' O-glycan degrading responses in vitro, although 8 enzymes were also triggered in vitro. The corresponding histograms display frequencies of related enzyme families (shown with colors matching their respective Venn sectors). Possible mucin-related degradative functions are given above each family-specific histogram bar. For in vitro samples, n = 2 for each MOG and simple sugar grown condition; for in vivo samples, n = 3 mice/group (Experiment 1).
(C) Schematic mucin O-glycan structures, from among 102 that can be found on human and murine Muc2, showing the sites at which the enzymes noted in (A) and (B) would be expected, or are known, to cleave. See also Tables S4 and S5 for in vivo and in vitro transcript data.
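The Venn sectors in this legend follow from two steps: keep transcripts above the 5-fold cutoff in each condition, then assign each gene to exactly one sector defined by the set of conditions in which it passed. The sketch below illustrates that bookkeeping, plus the parenthetical CAZyme count that includes GH, PL, and CE families but not sulfatases or CBMs. The input dictionaries, gene IDs, and family labels are hypothetical; the actual per-gene values are in Tables S4 and S5.

```python
from typing import Dict, FrozenSet, Iterable, Mapping, Set


def upregulated(fold_changes: Mapping[str, float], threshold: float = 5.0) -> Set[str]:
    """Genes whose fold change versus the simple-sugar reference exceeds the cutoff."""
    return {gene for gene, fc in fold_changes.items() if fc > threshold}


def venn_sectors(groups: Mapping[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Place each gene in exactly one sector: the set of conditions it appears in."""
    sectors: Dict[FrozenSet[str], Set[str]] = {}
    for gene in set().union(*groups.values()):
        key = frozenset(name for name, genes in groups.items() if gene in genes)
        sectors.setdefault(key, set()).add(gene)
    return sectors


def count_cazymes(genes: Iterable[str], family_of: Mapping[str, str]) -> int:
    """Count GH, PL, and CE family genes; sulfatases and CBMs are not counted."""
    return sum(1 for g in genes if family_of.get(g, "").startswith(("GH", "PL", "CE")))


if __name__ == "__main__":
    # Hypothetical fold changes for three conditions.
    fold = {
        "MOG": {"g1": 6.0, "g2": 2.0},
        "FF": {"g1": 10.0, "g3": 7.0},
        "FR": {"g3": 5.0},  # exactly 5-fold does not pass "> 5-fold"
    }
    sets = {name: upregulated(values) for name, values in fold.items()}
    for sector, genes in venn_sectors(sets).items():
        print(sorted(sector), sorted(genes))
```

Each sector's total corresponds to the unbracketed number in a Venn region, and `count_cazymes` applied to that sector's genes corresponds to the bracketed number.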
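[Figure S5 image: (A) H&E-stained rectum sections from mice on the FR and FF diets; (B) body weights; (C) top IPA categories, including organ morphology, inflammatory response, and organismal survival; mice were gavaged with the synthetic microbiota (SM) on 3 consecutive days (days 1, 2, and 3)]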
Figure S5. Histology Images, Body Weights, and Additional Cecal Tissue Transcriptional Responses of Gnotobiotic Mice Fed Fiber-Rich (FR)
and Fiber-Free (FF) Diets, Related to Figure 4
(A) Representative histology images (hematoxylin and eosin-stained colonic thin sections) showing no overt signs of inflammation under either dietary regimen (Experiment 1), in the absence of C. rodentium. Scale bars, 500 μm.
(B) Weight change in mice over time. Values are shown as averages ± SEM; n = 4 for the FR and FF groups and n = 3 for the Pre group (Experiment 1). ns, not significant; one-way ANOVA with Tukey's test.
(C) Top 15 diseases and functions altered between the two dietary regimens, as detected by Ingenuity Pathway Analysis of microarray data (cecal tissue mRNA). n = 4 for the FR diet group and n = 3 for the FF diet group (Experiments 2A and 3).
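Several legends here (body weights, rectal mucus layer) rely on one-way ANOVA with Tukey's test. The F statistic behind the ANOVA is the ratio of between-group to within-group mean squares. The sketch below computes it in plain Python and is for illustration only: it does not compute the p-value or Tukey's post hoc comparisons, which came from Prism in this study, and the function name is hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and within-group replication")
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((x - fmean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical measurements for three groups of mice.
    print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [2.0, 3.0, 4.0]]))
```

With exactly two groups, F equals the square of the pooled two-sample t statistic. For example, [1, 2, 3] versus [4, 5, 6] gives F = 13.5 with (1, 4) degrees of freedom, and t ≈ 3.67.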
[Figure S6 image: (A) stream plots of fecal bacterial relative abundance (%) of SM species and C. rodentium (Cr) over days 6-66 in FR-fed (n = 7) and FF-fed (n = 7) mice; timeline for Experiments 2A and 2B: 2 mice per group sacrificed for mucus layer measurements (day 51) and infection with Cr (day 56; n = 5 mice per group); (B) histology]
Figure S6. Microbial Community Structure Pre- and Post-Citrobacter rodentium Infection and Severity of Colitis Post-C. rodentium Infec-
tion, Related to Figure 5
(A) Stream plots illustrating fecal microbial community dynamics over time, based on Illumina sequencing of the V4 region of 16S rRNA genes (Experiments 2A and 2B); see Figure 1B for the timeline. See Table S2 for the % relative abundance of each species in individual mice. The experimental setup for gnotobiotic Experiments 2A and 2B is also shown.
(B) Histological images illustrating the similar severity of C. rodentium-associated hyperplasia in SM-colonized mice from different feeding groups and in germfree animals exposed only to the pathogen. The images are hematoxylin and eosin (H&E)-stained sections of unflushed cecal tissue, all at 10 dpi (Experiment 2B). Scale bars, 500 μm; higher-power inset bars, 50 μm.
[Figure S7 image: (A) PAS-AB-stained rectal sections from SM + Cr and GF + Cr mice on the FR and FF diets; (B) rectal mucus layer thickness for the same four groups (mice: 5, 1, 5, and 4; measurements: 152, 37, 136, and 170); (C) bioluminescence of flushed colons]
Figure S7. Thickness of the Rectal Mucus Layer Post-Citrobacter rodentium Infection and Bioluminescence Images of Flushed Colons
Showing Colonization Intensity of Luciferase-Expressing C. rodentium, Related to Figures 5 and 6
(A) Periodic acid-Schiff (PAS)-Alcian Blue (AB)-stained colonic thin sections showing the mucus layer (indicated by opposing arrows) in the recta of different groups of mice at 10 dpi (Experiment 2B). Scale bar, 50 μm.
(B) Mucus layer measurements in the recta of mice from PAS-AB-stained thin sections (exemplified in A). The asterisk indicates that the FF-SM group had only one mouse whose rectal mucus layer could be measured, because the other mice in this group were severely affected by colitis. Data are shown as average ± SEM; statistically significant differences are indicated by different letters (p < 0.01); one-way ANOVA with Tukey's test.
(C) Flushed colons showing the intensity of adherent, luciferase-expressing C. rodentium in germfree (GF) mice pre-fed the FR or FF diet and infected with the pathogen (that is, mice without the synthetic microbiota).
Article
Correspondence
h.stunnenberg@ncmls.ru.nl
In Brief
As part of the International Human
Epigenome Consortium (IHEC), this study
reveals that β-glucan reverses the state of
epigenetic immune tolerance that
develops after exposure to LPS and
restores the ability of human
macrophages to produce cytokines that
are critical for anti-pathogen responses.
Explore the Cell Press IHEC webportal at
http://www.cell.com/consortium/IHEC.
http://dx.doi.org/10.1016/j.cell.2016.09.034
Cell 167, 1354–1368, November 17, 2016 © 2016 Elsevier Inc.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
[Figure 1 image: (A) innate immunity memory model (BG or LPS exposure, wash out, d0 to d6); (B) PCA of distal H3K27ac and H3K4me1 dynamics; (C) H3K27ac logFC over time for the BG up/LPS down, LPS up, differentiation gain, and differentiation loss clusters; (D) PCA of gene expression; (E) heatmap of naive, LPS, and BG samples at 1 h, 4 h, d1, and d6]
(C) A total of 17,500 H3K27ac dynamic gene-distal regions were identified and separated into four clusters: BG up/LPS down, LPS up, differentiation gain, and differentiation loss. Solid lines are the median log-FC relative to day 0, and shaded areas represent the 25th to 75th percentiles. Naive cells are shown as a green line, LPS as a red line, and BG as a purple line. H3K4me1 at these regions is shown in Figure S1B; LPS induces early H3K27ac accumulation followed by long-term H3K4me1 marking, while BG induces concurrent accumulation of H3K27ac and H3K4me1.
(D) PCA plots showing the relationships among all samples based on dynamic gene expression. PC1 explains most of the variation and is associated with differentiation. PC2 is LPS related, with the LPS 4 hr and LPS day 1 samples separating from the corresponding naive and BG samples.
(E) Heatmap of differentiation-associated genes, as well as those induced by LPS or BG exposure. In general, BG-exposed cells begin to express differentiation-associated genes earlier (at day 1) than naive cells, while LPS-exposed cells lag behind.
(F) Top pathways associated with differentiation that show opposing directions in response to BG and LPS.
See also Figures S1, S2, and S3 and Tables S1, S2, S3, and S4.
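The cluster traces in panel (C) summarize many regions per time point: a median log fold change relative to day 0, with a band spanning the 25th to 75th percentiles. The sketch below shows that summary for one cluster in plain Python. It assumes each region is given as a list of normalized H3K27ac signal values ordered by time point (day 0 first) and adds a pseudocount of 1 to avoid taking the log of zero; both the input layout and the pseudocount are assumptions, not details from the paper, and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def log2_fc_vs_day0(signal: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """log2 fold change of each time point relative to the first (day 0) value."""
    base = signal[0] + pseudocount
    return [math.log2((s + pseudocount) / base) for s in signal]


def summarize_cluster(regions: Sequence[Sequence[float]]) -> List[Tuple[float, float, float]]:
    """Per time point: (25th percentile, median, 75th percentile) of log2 FC across regions."""
    fold_changes = [log2_fc_vs_day0(region) for region in regions]
    summary = []
    for t in range(len(fold_changes[0])):
        values = [fc[t] for fc in fold_changes]
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary.append((q1, statistics.median(values), q3))
    return summary


if __name__ == "__main__":
    # Three hypothetical regions measured at d0, 1h, and 4h.
    for row in summarize_cluster([[1, 1, 3], [1, 3, 7], [1, 7, 15]]):
        print(row)
```

The middle value of each tuple would form the solid line, and the outer two would bound the shaded band.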
[Figure 2 image: (A) relative importance (RF versus PLS, 0-100) of TF motifs at enhancers and promoters, including NFKB, CREB1, SPI1, IRF, NRF1, MITF, ZFP161, E2F, JUNB, and KLF15; (B) motif abundance over background at H3K27ac dynamic enhancers and promoters for the LPS up, BG up/LPS down, differentiation gain, differentiation loss, and background sets]
Figure 2. Motif Enrichment at Epigenetically Dynamic Promoters and Enhancers and Associated Transcription Factor Networks
Motif enrichment analysis was performed on ATAC-sequencing (nucleosome-free) peaks that overlap H3K27ac dynamic enhancers and promoters.
(A) Random forest (RF) and partial least-squares (PLS) classifiers were trained on the TF motifs found by GIMME to rank features (TF motifs) by their ability to separate the four H3K27ac clusters shown in Figure 1C. Both classifiers produce a feature importance score (between 0 and 100), which measures how "characteristic" the presence or absence of the TF motif is for the considered cluster. Green dots represent positive features (motif over-represented in the cluster), and red dots represent negative features (motif under-represented in the cluster). The EGR2 motif was the strongest positive feature for the BG up/LPS down enhancer cluster, NF-κB for the LPS up cluster, SPI1 (PU.1) for the differentiation gain cluster, and JUNB for the differentiation loss cluster. At the promoter regions, NF-κB was a positive feature for the LPS up cluster, MITF for the differentiation gain cluster, and CREB1 and JUNB for the differentiation loss cluster.
(B) Motif enrichment is plotted as the absolute difference in abundance compared with background (yellow, higher abundance than background; blue, lower abundance than background) for the top enriched motifs. Consistently identified transcription factor motifs include SPI1 at differentiation-associated enhancers, NF-κB at LPS enhancers, and EGR2 and MITF at BG enhancers. The increases in abundance over background are consistent with the importance scores.
(C) A diagram of the transcription factor network based on EGR2 and MITF motif occurrence at BG-induced lysosome and lipid metabolism genes. Purple arrows indicate the direction of expression induced by BG exposure, and red arrows indicate the direction of expression induced by LPS exposure. BG exposure induces transient expression of these genes, whereas LPS exposure inhibits their activation. The full network based on promoter abundance is shown in Figure S4B.
See also Figure S4.
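Panel (B) reports motif enrichment as an absolute difference in abundance versus background, and panel (A) reports classifier importances on a 0-100 scale. The sketch below shows one simple reading of both quantities. Abundance is taken as the fraction of peaks carrying a motif hit, and importances are min-max rescaled to 0-100. Both definitions, the function names, and the example counts are assumptions for illustration; GIMME's own scoring and the classifiers' scaling may differ.

```python
from typing import Dict, Mapping


def abundance_over_background(hits_cluster: int, n_cluster: int,
                              hits_background: int, n_background: int) -> float:
    """Fraction of cluster peaks with the motif minus the fraction of background peaks with it.

    Positive values correspond to "higher abundance than background".
    """
    return hits_cluster / n_cluster - hits_background / n_background


def scale_importance(raw: Mapping[str, float]) -> Dict[str, float]:
    """Min-max rescale raw feature importances to the 0-100 range (assumed convention)."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {motif: 0.0 for motif in raw}
    return {motif: 100.0 * (value - lo) / (hi - lo) for motif, value in raw.items()}


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 cluster peaks versus 10 of 100 background peaks.
    print(abundance_over_background(30, 100, 10, 100))
    print(scale_importance({"SPI1": 4.0, "JUNB": 2.0, "CREB1": 0.0}))
```

Under this reading, a motif present in 30% of a cluster's peaks but only 10% of background peaks would have an enrichment of 0.2, and the top-ranked motif in each classifier would score 100.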
[Figure 3 image: (A) model including LPS re-exposure at day 6; (B) PCA plots of expression, H3K27ac, and H3K4me1 for naive, LPS, and BG samples over time and after re-exposure; (C) expression heatmap (naive day 6 versus LPS re-exposure) of tolerized, partially tolerized, and responsive genes]
Figure 3. Macrophage Endotoxin Tolerance Defined at the Transcriptional Level following LPS Re-exposure
(A) The innate immune memory model, including data collection at LPS re-exposure at day 6.
(B) PCA plots of dynamic RNA-seq, H3K27ac at promoters and enhancers, and H3K4me1 peaks, including LPS re-exposure samples. After re-exposure to LPS, significant enhancer H3K27ac changes occur in LPS-Mfs, indicating that they are capable of activating their enhancers. However, their response is weaker than that of monocytes, naive-Mfs, and BG-Mfs, as seen on the second principal component. Unlike RNA and H3K27ac, H3K4me1 does not change significantly following LPS re-exposure in any of the three macrophage subtypes.
(C) The total macrophage transcriptional response (750 genes) to LPS was separated into three groups based on the induction of genes in LPS-Mfs relative to naive-Mfs and BG-Mfs, revealing a gradient in the LPS-Mf response to LPS re-exposure. The groups are (G1) tolerized genes, (G2) partially tolerized genes, and (G3) responsive genes.
See also Figure S5.
cytokines, but maintain their ability to express other genes, such as those required for tissue repair (Foster et al., 2007). Given the wide-ranging epigenetic alterations in LPS-Mfs (Figure 1C; Table S1), we sought to investigate the epigenetic basis for endotoxin tolerance by exposing differentiated naive-Mfs, LPS-Mfs, and BG-Mfs to LPS for 4 hr (LPS re-exposure) (Figure 3A). The overall transcriptional and histone modification changes induced in macrophages by LPS re-exposure are shown in Figure 3B; few differences were observed between naive-Mfs and BG-Mfs. LPS-Mfs respond avidly to LPS re-exposure, both transcriptionally and through H3K27ac deposition at promoters and distal enhancers (observable as a large shift in PC2; Figure 3B). This indicates that tolerized macrophages can and do respond to LPS at the epigenetic and transcriptional levels. However, the H3K27ac and H3K4me1 principal-component analysis (PCA) shows that the epigenetic profile of LPS-Mfs is markedly different from that of naive-Mfs and BG-Mfs (observable as a lag of LPS-Mfs on PC1; Figure 3B).
Polytomous modeling was used to separate genes based on their transcriptional response to LPS re-exposure (4 hr) in macrophages at day 6. In total, 780 genes showed higher expression (FC > 2, posterior probability > 0.3) in naive-Mfs following 4-hr
[Figure 4 image: (A) RNA heatmap for naive-Mf, LPS-Mf, and restimulated cells; (B, C) promoter motif abundance and relative importance (RF versus PLS, 0-100), including IRF, STAT2, EGR2, KLF6, E2F3, SP1, ZNF350, and ZBTB7B; (D) H3K27ac logFC at dynamic promoters of G1 (n = 95), G2 (n = 106), and G3 (n = 189) genes from d0 to d6 and after re-exposure (R)]
Figure 4. Histone Modification Dynamics and Open Chromatin Analysis at Tolerized Gene Promoters
(A) Heatmap showing average expression of 777 LPS-responsive genes in naive-Mfs. Genes are ranked by their induction in LPS-Mfs, first by tolerance group (G1, G2, and G3) and then by relative induction compared to naive-Mfs within each group. The response of LPS-Mfs to LPS re-exposure forms a gradient, with the most tolerized genes on the left and the most responsive genes on the right.
(B) Heatmap showing the abundance of significant motifs in the promoter regions of the three macrophage LPS-responsive gene groups. The tolerized gene promoters are enriched for several transcriptional repressors, such as EGR2 and TP53, while the partially tolerized gene promoters are enriched for IRF and STAT motifs.
(C) Importance scores (between 0 and 100) from random forest (RF) and partial least-squares (PLS) classifiers for each tolerized gene cluster (G1, tolerized; G2, partially tolerized; G3, responsive). Green dots represent over-represented motifs, and red dots under-represented motifs. The top features of G1 gene promoters are the E2F3, EGR2, and ZBTB7B motifs. The top features of G2 gene promoters are IRF and STAT, while G3 promoters have no over-represented features but are depleted of EGR2, E2F3, and ZNF350.
(D) Median H3K27ac at dynamic promoters of G1, G2, and G3 genes; shaded areas represent the 25th to 75th percentiles. LPS-Mfs do not accumulate H3K27ac at tolerized genes but do so at the promoters of responsive genes. H3K4me3 at these promoters is shown in Figure S6.
See also Figure S6.
[Figure 5 image: (A) in vitro tolerance reversal model; (B, C) IL-6 release (pg/mL) per donor for naive, tolerized, BG-rescued, and IBET-treated macrophages; (D) ex vivo human endotoxemia model (LPS injection, 24 h BG, 72 h culture, LPS re-exposure, cytokine release); (E, F) IL-6 and TNF release (pg/mL) for naive and tolerized monocytes with or without BG]
Figure 5. BG Can Reverse Both In Vitro and In Vivo LPS-Induced Tolerance and Reinstate Proper Cytokine Production in Macrophages
(A) The in vitro monocyte tolerance reversal model, with BG added therapeutically after 24 hr of LPS exposure (rescue-Mφs). The histone mimic and inflammation blocker IBET was used preventatively (co-cultured with LPS for 24 hr; LPS-co-IBET-Mφs) and therapeutically (added after 24 hr of LPS exposure; LPS + IBET-Mφs). Following several days of rest, macrophages were re-exposed to LPS, and cytokine release was measured after 24 hr.
(B) BG reinstates IL-6 release in tolerized macrophages. Data from six donors are shown for naive-Mφs, LPS-Mφs, and rescue-Mφs.
(C) Preventative use of IBET blocks the first LPS response in monocytes, resulting in differentiation into macrophages that can release cytokines upon the second LPS exposure. Therapeutic use of IBET does not reinstate cytokine release in macrophages.
(D) Experimental human endotoxemia model with ex vivo BG administration. Monocytes were isolated from 12 healthy volunteers before (naive) and 4 hr after LPS injection (tolerized). Naive or tolerized monocytes were exposed either to BG for 24 hr followed by culture medium, or to culture medium alone. After 3 days ex vivo, monocytes were re-exposed to LPS, and cytokines were measured 24 hr later.
(E and F) BG recovered IL-6 release in 9 of 12 tolerized monocyte samples (E) and TNF release in 8 of 12 (F). Data are presented as mean ± SD.
[Figure image: similarity to the naive response and log fold-change (logFC) for naive-Mφs, LPS-Mφs, rescue-Mφs, LPS-co-IBET-Mφs, and LPS + IBET-Mφs; (G1) most tolerized and (G2) partially tolerized gene groups, with individual genes IRF1, IRF8, ITIH4, and CXCL10.]
established distal enhancers that were modulated in the opposite direction by BG or LPS exposure (Figure 1D). Deposition of H3K27ac and H3K4me1 at these regions was accelerated by BG exposure and delayed or completely blocked by LPS exposure. Accordingly, expression of genes near these elements was induced by BG, peaking at 24 hr post-exposure, while they remained lowly expressed in LPS-exposed monocytes (Figure 1F). These genes were involved in lipid metabolism and biosynthesis, phagocytosis, and lysosome maturation (Figures 1G and S2) and have clear TF motif signatures for EGR2, MITF, and ARNT (Figure 2A). Interestingly, EGR2, a TF downstream of the BG receptor dectin-1, showed clear transient upregulation by BG but remained inactive in LPS-exposed monocytes, suggesting a possible role in modulating these pathways (Figure S4). TFs and pathways linking lipid biosynthesis and inflammation have been described (Spann et al., 2012). Further, the macrophage response to infection requires a substantial amount of energy, and shifts in metabolism and energy production are a hallmark of macrophage polarization to M1 or M2 subtypes (Ghesquière et al., 2014), as well as of the establishment of trained immunity (Cheng et al., 2014).
[Figure image: PCA of samples (PC2 explains 8.5%) by treatment (naive, LPS-Mφ, rescue-Mφ, LPS-coIBET-Mφ), donor (1–3), and time (d0, day 6); (C) ATP9B and (D) LPL loci showing LPS-repressed H3K27ac in naive-Mφ, LPS-Mφ, and rescue-Mφ replicates 1 and 2.]
Reversal of tolerance after the initial inflammation phase has garnered interest because of the limited success of inflammation-blocking treatments in reducing overall sepsis mortality (Angus and van der Poll, 2013) and because the majority of sepsis deaths occur due to secondary hospital infection during the tolerized phase (Gilroy and Yona, 2015). Our hypothesis was that BG can reverse LPS-induced tolerance because it discordantly regulates pathways that LPS also affects. Specifically, LPS fails to activate EGR2 and MITF, key regulators of lipid, lysosome, and metabolism genes, while BG induces their expression (Figures 1 and 2). Recently, IFNG was shown to partially recover metabolic function in tolerized monocytes from sepsis patients, indicating that reversal of tolerance using innate immune “trainers” is a viable therapeutic strategy (Cheng et al., 2016). We show that BG exposure can indeed reverse the tolerance induced in macrophages by LPS exposure, with rescue-Mφs showing higher release of cytokines in response to a second LPS stimulus (Figures 5A–5C). This was in contrast to the inflammation blocker IBET151, which only prevented tolerance when used to block the initial LPS response but could not reverse it when given to cells after LPS-induced inflammation (Figure 5). To further relate our findings to the in vivo situation, we used an experimental human endotoxemia model to induce tolerance in vivo (Draisma et al., 2009; Kox et al., 2014). In terms of cytokine production, in-vivo-tolerized monocytes behave similarly to their in-vitro-tolerized counterparts. The tolerized state of in vivo LPS-exposed monocytes is similar to that of ex-vivo-exposed monocytes and, most importantly, can also be rescued by ex vivo BG exposure (Figures 5D–5F),
Cheng, S.C., Quintin, J., Cramer, R.A., Shepardson, K.M., Saeed, S., Kumar, V., Giamarellos-Bourboulis, E.J., Martens, J.H., Rao, N.A., Aghajanirefah, A., et al. (2014). mTOR- and HIF-1α-mediated aerobic glycolysis as metabolic basis for trained immunity. Science 345, 1250684.
Cheng, S.C., Scicluna, B.P., Arts, R.J., Gresnigt, M.S., Lachmandas, E., Giamarellos-Bourboulis, E.J., Kox, M., Manjeri, G.R., Wagenaars, J.A., Cremer, O.L., et al. (2016). Broad defects in the energy metabolism of leukocytes underlie immunoparalysis in sepsis. Nat. Immunol. 17, 406–413.
de la Rica, L., Rodríguez-Ubreva, J., García, M., Islam, A.B., Urquiza, J.M., Hernando, H., Christensen, J., Helin, K., Gómez-Vaquero, C., and Ballestar, E. (2013). PU.1 target genes undergo Tet2-coupled demethylation and DNMT3b-mediated methylation in monocyte-to-osteoclast differentiation. Genome Biol. 14, R99.
Draisma, A., Pickkers, P., Bouw, M.P., and van der Hoeven, J.G. (2009). Development of endotoxin tolerance in humans in vivo. Crit. Care Med. 37, 1261–1267.
Foster, S.L., Hargreaves, D.C., and Medzhitov, R. (2007). Gene-specific control of inflammation by TLR-induced chromatin modifications. Nature 447, 972–978.
Ghesquière, B., Wong, B.W., Kuchnio, A., and Carmeliet, P. (2014). Metabolism of stromal and immune cells in health and disease. Nature 511, 167–176.
Ghisletti, S., Barozzi, I., Mietton, F., Polletti, S., De Santa, F., Venturini, E., Gregory, L., Lonie, L., Chew, A., Wei, C.L., et al. (2010). Identification and characterization of enhancers controlling the inflammatory gene expression program in macrophages. Immunity 32, 317–328.
Gilroy, D.W., and Yona, S. (2015). HIF1α allows monocytes to take a breather during sepsis. Immunity 42, 397–399.
Glass, C.K., and Natoli, G. (2016). Molecular control of activation and priming in macrophages. Nat. Immunol. 17, 26–33.
Goodridge, H.S., Simmons, R.M., and Underhill, D.M. (2007). Dectin-1 stimulation by Candida albicans yeast or zymosan triggers NFAT activation in macrophages and dendritic cells. J. Immunol. 178, 3107–3115.
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589.
Huang da, W., Sherman, B.T., and Lempicki, R.A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57.
Kleinnijenhuis, J., Quintin, J., Preijers, F., Joosten, L.A., Ifrim, D.C., Saeed, S., Jacobs, C., van Loenhout, J., de Jong, D., Stunnenberg, H.G., et al. (2012). Bacille Calmette-Guerin induces NOD2-dependent nonspecific protection from
Lavin, Y., Winter, D., Blecher-Gonen, R., David, E., Keren-Shaul, H., Merad, M., Jung, S., and Amit, I. (2014). Tissue-resident macrophage enhancer landscapes are shaped by the local microenvironment. Cell 159, 1312–1326.
Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.
Mammana, A., and Chung, H.R. (2015). Chromatin segmentation based on a probabilistic model for read counts explains a large portion of the epigenome. Genome Biol. 16, 151.
McLean, C.Y., Bristor, D., Hiller, M., Clarke, S.L., Schaar, B.T., Lowe, C.B., Wenger, A.M., and Bejerano, G. (2010). GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501.
Netea, M.G., Joosten, L.A., Latz, E., Mills, K.H., Natoli, G., Stunnenberg, H.G., O’Neill, L.A., and Xavier, R.J. (2016). Trained immunity: A program of innate immune memory in health and disease. Science 352, aaf1098.
Nicodeme, E., Jeffrey, K.L., Schaefer, U., Beinke, S., Dewell, S., Chung, C.W., Chandwani, R., Marazzi, I., Wilson, P., Coste, H., et al. (2010). Suppression of inflammation by a synthetic histone mimic. Nature 468, 1119–1123.
Nishikawa, K., Iwamoto, Y., Kobayashi, Y., Katsuoka, F., Kawaguchi, S., Tsujita, T., Nakamura, T., Kato, S., Yamamoto, M., Takayanagi, H., and Ishii, M. (2015). DNA methyltransferase 3a regulates osteoclast differentiation by coupling to an S-adenosylmethionine-producing metabolic pathway. Nat. Med. 21, 281–287.
Ostuni, R., Piccolo, V., Barozzi, I., Polletti, S., Termanini, A., Bonifacio, S., Curina, A., Prosperini, E., Ghisletti, S., and Natoli, G. (2013). Latent enhancers activated by stimulation in differentiated cells. Cell 152, 157–171.
Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842.
Quintin, J., Saeed, S., Martens, J.H., Giamarellos-Bourboulis, E.J., Ifrim, D.C., Logie, C., Jacobs, L., Jansen, T., Kullberg, B.J., Wijmenga, C., et al. (2012). Candida albicans infection affords protection against reinfection via functional reprogramming of monocytes. Cell Host Microbe 12, 223–232.
Quintin, J., Cheng, S.C., van der Meer, J.W., and Netea, M.G. (2014). Innate immune memory: towards a better understanding of host defense mechanisms. Curr. Opin. Immunol. 29, 1–7.
Rialdi, A., Campisi, L., Zhao, N., Lagda, A.C., Pietzsch, C., Ho, J.S., Martinez-Gil, L., Fenouil, R., Chen, X., Edwards, M., et al. (2016). Topoisomerase 1 inhibition suppresses inflammatory genes and protects from death by inflammation. Science 352, aad7993.
Saeed, S., Quintin, J., Kerstens, H.H., Rao, N.A., Aghajanirefah, A., Matarese, F., Cheng, S.C., Ratter, J., Berentsen, K., van der Ent, M.A., et al. (2014). Epigenetic programming of monocyte-to-macrophage differentiation and trained innate immunity. Science 345, 1251086.
Further information and requests for reagents may be directed to, and will be fulfilled by, the corresponding author Hendrik G. Stunnenberg (h.stunnenberg@ncmls.ru.nl).
METHOD DETAILS
Cytokine Assays
TNFα and IL-6 were measured by ELISA according to the manufacturers’ protocols (IL-6: Sanquin; TNFα: R&D). For cytokine production assays, differences between groups were analyzed using the Wilcoxon signed-rank test. The significance level was set at p < 0.05.
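The Wilcoxon signed-rank test compares paired measurements, such as the same donor’s cells under two conditions, without assuming normally distributed differences. The minimal sketch below computes an exact two-sided p value by enumerating every sign assignment of the ranked differences. It is an illustration rather than the authors’ analysis code, and the function names are hypothetical. With 12 donors it enumerates 4,096 sign patterns, which is trivial.

```python
from itertools import product
from typing import List, Sequence, Tuple


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided exact Wilcoxon signed-rank test for paired samples x[i], y[i].

    Zero differences are discarded. Returns (W+, p), where W+ is the sum of the
    ranks of positive differences y - x. The null distribution enumerates all
    2**n sign assignments, so this is only practical for small n (about 20 or fewer).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = _average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - centre)
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(w - centre) >= observed - 1e-9:  # at least as far from the centre
            extreme += 1
    return w_plus, extreme / 2 ** n
```

Pass the two conditions in the same donor order. One practical consequence: with six donors, as in Figure 5B, even a perfectly consistent increase gives the smallest attainable two-sided p value, 2/64 ≈ 0.031.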
Chromatin Immunoprecipitation
Purified cells were fixed with 1% formaldehyde (Sigma) at a concentration of approximately 10 million cells/ml. Fixed cell preparations were sonicated using a Diagenode Bioruptor UCD-300 for 3 × 10 min (30 s on; 30 s off). 67 µl of chromatin (1 million cells) was incubated with 229 µl dilution buffer, 3 µl protease inhibitor cocktail, and 0.5–1 µg of H3K27ac, H3K4me3, H3K4me1, H3K27me3, H3K9me3, or H3K36me3 antibody (Diagenode) overnight at 4°C with rotation. Protein A/G magnetic beads were washed in dilution buffer with 0.15% SDS and 0.1% BSA, added to the chromatin/antibody mix, and rotated for 60 min at 4°C. Beads were then washed five times with 400 µl buffer for 5 min at 4°C. After washing, chromatin was eluted using elution buffer for 20 min. The supernatant was collected, 8 µl 5 M NaCl and 3 µl proteinase K were added, and samples were incubated for 4 hr at 65°C. Finally, samples were purified using the QIAGEN Qiaquick MinElute PCR purification kit and eluted in 20 µl EB. Detailed protocols can be found on the Blueprint website (http://www.blueprint-epigenome.eu/UserFiles/file/Protocols/Histone_ChIP_May2013.pdf).
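Because every immunoprecipitation above uses the same per-reaction volumes, preparing a dilution buffer and protease inhibitor master mix for several reactions is simple multiplication. A minimal sketch follows. The 10% pipetting overage is a hypothetical default that is not part of the protocol, chromatin is assumed to be added separately to each tube, and the function names are illustrative.

```python
from dataclasses import dataclass

# Per-immunoprecipitation volumes (µl) from the protocol above (1 million cells of chromatin)
CHROMATIN_UL = 67.0
DILUTION_BUFFER_UL = 229.0
PROTEASE_INHIBITOR_UL = 3.0


@dataclass(frozen=True)
class MasterMix:
    dilution_buffer_ul: float
    protease_inhibitor_ul: float


def ip_volume_ul() -> float:
    """Volume of one IP before antibody and beads are added."""
    return CHROMATIN_UL + DILUTION_BUFFER_UL + PROTEASE_INHIBITOR_UL


def master_mix(n_reactions: int, overage: float = 0.10) -> MasterMix:
    """Dilution buffer + protease inhibitor for n_reactions IPs.

    overage is a hypothetical pipetting margin (0.10 = 10% extra), not part of the protocol.
    """
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    if overage < 0:
        raise ValueError("overage cannot be negative")
    factor = n_reactions * (1.0 + overage)
    return MasterMix(DILUTION_BUFFER_UL * factor, PROTEASE_INHIBITOR_UL * factor)
```

For example, one sample profiled for the six antibodies listed above needs six IPs of 299 µl each. In that case, `master_mix(6)` gives about 1,511.4 µl dilution buffer and 19.8 µl protease inhibitor.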
Statistical parameters, including the exact value of n, the definition of center, dispersion and precision measures (mean ± SEM), and statistical significance, are reported in the figures and figure legends. Data are judged to be statistically significant when p < 0.05 by two-tailed Student’s t test or two-way ANOVA, where appropriate.
Data Resources
Raw data files for the RNA, ATAC, and ChIP sequencing and analysis have been deposited in the NCBI Gene Expression Omnibus
under accession number: GSE85246.
GEO SubSeries linked to GSE85246:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85243
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85245
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE87218
[Figure S1 image: PCA plots (PC1 vs. PC2) of H3K27ac and H3K4me3 promoter dynamics and of H3K27me3 and H3K9me3 dynamics, by time (d0, 4 hour, day 1, day 6) and treatment (naive, LPS, BG); % fraction of bins in each chromatin state (promoter, enhancer, heterochromatin, bivalent) for Mo d0 and naive, LPS, and BG samples at 4h, d1, and d6; H3K27ac and H3K4me1 signal scale, log(mean count + 1).]
Figure S1. Summary of Dynamic Histone Marks and PCA Plots of Dynamic Active Histone Modifications at Promoters and Repressive Marks, Related to Figure 1
(A) Percentage of histone ChIP-seq peaks designated as dynamic across time points and between treatments. H3K27ac was the most dynamic modification, with almost a third of regions showing significant changes.
(B) Heatmap showing H3K27ac and H3K4me1 signal intensity at dynamic H3K27ac enhancers, ±12 kb from the peak center.
(C) PCA plots for all time points for H3K27ac dynamic promoters, H3K4me3 dynamic promoters, dynamic H3K27me3 regions, and dynamic H3K9me3 regions. H3K27ac and H3K4me3 at promoters behave similarly over time and in response to LPS or BG exposure, and they reflect the behavior of H3K27ac at enhancers. Unlike active marks, repressive marks change little up to day 1.
(D) Stacked plots showing chromatin state changes over differentiation at “LPS-Mφ up” and “BG up / LPS down” H3K4me1 enhancers. These enhancers are established through the H3K27ac dynamics shown in Figure 1C. The genome was segmented into 9 chromatin states based on the 5 histone marks analyzed. This analysis indicates that the H3K4me1 increase is associated with loss of H3K27me3.
[Figure S2 image: (A) Number of genes at each time point that deviate from RPMI by FC >2 (up and down; 1h, 4h, d1, d6). (B) Overlap between H3K27ac clusters in Figure 1C and RNA clusters in Figure S2A (LPS up, BG up; differentiation gain and loss). (C) LPL (lipoprotein lipase): promoter and enhancer belong to the ‘BG up / LPS down’ cluster; RNA-seq and histone modification tracks (H3K4me3, H3K27ac, H3K27me3) for d0 and naive, LPS, and BG at 4h, d1, and d6.]
Figure S2. RNA-Seq Dynamics in Response to LPS and BG and Relationship to Histone Marks, Related to Figure 1
(A) Number of genes showing treatment-specific (LPS or BG) expression at each time point (1h, 4h, d1, d6). LPS exposure induces the largest number of genes at each time point, with a minimum of 110 transcripts at 1h and a maximum of 650 transcripts at day 1. Up to 100 genes maintain LPS-specific expression at d6. By comparison, BG-induced gene expression peaks at d1, and a fraction of it is maintained to d6.
(B) Overlap between gene expression groups and promoter H3K27ac clusters. LPS-induced H3K27ac accumulation at promoters correlates well with LPS-induced gene expression at all time points. However, at day 1 and day 6, the ‘LPS-up’ genes are equally explained by a lag in differentiation-associated repression in LPS-treated cells. Conversely, BG exposure leads to faster expression of differentiation-associated genes, with higher overlap between ‘BG-up’ genes and the ‘differentiation gain’ and BG-associated H3K27ac promoter clusters.
(C) Example tracks of a BG-induced/LPS-repressed gene, LPL (lipoprotein lipase), and of an LPS-induced gene.
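Panel A of Figure S2 counts genes whose expression deviates from the RPMI control by a fold change greater than 2 in either direction. The following is a minimal sketch of that count. It assumes one expression value per gene (for example, RPKM) for the treated and RPMI samples and uses a hypothetical pseudocount of 1 to avoid dividing by zero. The paper’s actual criteria (replicate handling, statistical testing) are not described here.

```python
import math
from typing import Dict, Tuple


def count_deviating_genes(
    treated: Dict[str, float],
    control: Dict[str, float],
    fc_threshold: float = 2.0,
    pseudocount: float = 1.0,
) -> Tuple[int, int]:
    """Count genes up or down by more than fc_threshold relative to control (e.g., RPMI).

    pseudocount is a hypothetical choice that keeps zero-expression genes finite.
    Genes missing from control are skipped. Returns (n_up, n_down).
    """
    cutoff = math.log2(fc_threshold)
    n_up = n_down = 0
    for gene, value in treated.items():
        if gene not in control:
            continue
        log_fc = math.log2((value + pseudocount) / (control[gene] + pseudocount))
        if log_fc > cutoff:  # strictly more than the threshold fold change
            n_up += 1
        elif log_fc < -cutoff:
            n_down += 1
    return n_up, n_down
```

Running this once per treatment and time point (1h, 4h, d1, d6) against the matching RPMI sample yields the up and down bars of such a plot.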
[Figure S3 image: (A) correlation plot of DNA methylation across samples (d0 and naive, LPS, and BG at 1h, 4h, d1, d6); (B) DNA methylation (all CpGs) at 2,700 DMRs for d0 and naive, LPS, and BG at day 6; (C) DMR context: distal H3K4me1, distal H3K27ac, open chromatin (ATAC), promoter H3K4me3; (D) 5mC+5hmC level over time (Mo/d0, 1h, 4h, d1, d6).]
Figure S3. DNA Methylation Dynamics in Monocyte-to-Macrophage Differentiation and Tolerance and Training, Related to Figure 1
(A) Correlation plot of DNA methylation values, showing clear separation of LPS d1 and LPS d6 from other samples.
(B) Boxplot of 2,700 DMRs, showing that the general trend is loss of methylation during monocyte-to-macrophage differentiation.
(C) Chromatin context of DMRs. The majority (91%) of DMRs occur in distal regions marked by H3K4me1, and 69% occur at H3K27ac-marked enhancers and open chromatin regions. Only 6% occur at promoters.
(D) Boxplots showing DNA methylation over time for macrophage subtype-specific DMRs. The analysis identified DMRs common to all macrophages, as well as DMRs established only in LPS-Mφs or not established in LPS-Mφs.
(E) Heatmap of H3K27ac changes at DMRs. Generally, DNA demethylation at DMRs was associated with accumulation of H3K27ac.
[Figure images: time-course RNA-seq (RPKM) and 4-hour validation (fold change relative to RPMI) of EGR2 and other TFs in naive and BG-treated cells; network of lipid-metabolism and lysosome genes (e.g., LPL, CD36, LAMP1, LIPA, SCD) annotated by motif presence and expression peak at d6; RNA levels and relative cytokine release (TNF, IL6) over naive-Mφ at day 6 and after restimulation; tolerized and responsive pathways (e.g., Jak-STAT signaling, apoptosis, phagocytosis, chemokine and Toll-like signaling); NFKB1 and RELA expression in naive-Mφ and LPS-Mφ; logFC time courses (d0, 1h, 4h, d1, d6, restimulation) for RPMI, LPS, and BG.]
Figure S6. Active Histone Mark Changes at Promoters of Tolerized and Responsive Genes and Overall Chromatin States at the Same Promoters, Related to Figure 4
(A) Expression at day 6 and at LPS re-exposure for STAT2, STAT5A, IRF1, and IRF8 (mean RPKM of 4 donors; error bars represent standard deviation). These pro-inflammatory TFs show a tolerized response to LPS re-exposure in LPS-Mφs. The inability of these genes to be activated may contribute to the tolerance of downstream targets, as suggested by the enrichment of their motifs in the G2 partially tolerized gene promoters (Figure 4B).
(B) Expression at day 6 and at LPS re-exposure for NFKB1 and RELA. These TFs are responsive to LPS re-exposure in LPS-Mφs, and their motifs are not significantly enriched in tolerized genes. This suggests that NF-κB signaling is not impaired at the level of transcription. Data are represented as mean ± SD.
(C) LPS-Mφs do not accumulate H3K4me3 at tolerized genes but do so at the promoters of responsive genes. This pattern is similar to that of H3K27ac shown in Figure 4D.
[Figure S7 image: (A) set-up, with a d0 to d6 timeline (24h, 48h) and cultures in RPMI, RPMI + BG, LPS, or LPS + BG; (B) differentiation-associated, lipid metabolism, and oxidative phosphorylation genes, and LAMP1 expression (RPKM) in Mo, naive, LPS, BG, and LPS + BG cells at 4h, d3, and d6; (C) RNA levels.]
Figure S7. Expression of Genes Involved in Lipid Biosynthesis and Metabolism following BG Reversal of LPS-Induced Tolerance, Related to Figure 7
(A) Experimental set-up, indicating the collection of samples for gene expression analysis. Samples were collected at day 1 + 4 hr: monocytes were treated with medium (RPMI) or LPS for 24 hr, then exposed to BG for 4 hr and collected. Additional samples were collected at day 3 and day 6.
(B) BG exposure following LPS recovers the expression of genes involved in lipid biosynthesis and oxidative phosphorylation as early as day 3. LAMP1 is an example of a lysosome gene that shows high expression in BG-Mφs and low expression in LPS-Mφs; BG exposure recovers its expression in LPS-BG-Mφs.
(C) BG addition at day 1 in naive monocytes induces the expression of EGR2, MITF, and CSF1, as it does when added at day 0 (Figure S4C). In tolerized monocytes, BG induces the expression of EGR2 and MITF, but to a lesser degree. This indicates that BG receptor pathways are not completely disrupted by LPS exposure, providing a basis for BG reversal of LPS-induced tolerance.
Resource
Correspondence
mf471@cam.ac.uk (M.F.),
cew54@medschl.cam.ac.uk (C.W.),
mikhail.spivakov@babraham.ac.uk (M.S.),
peter.fraser@babraham.ac.uk (P.F.)
In Brief
This study deploys a promoter capture
Hi-C approach in 17 primary blood cell
types to match collaborating regulatory
regions and identify genes regulated by
noncoding disease-associated variants.
Explore this and other papers at the Cell Press IHEC webportal at http://www.cell.com/consortium/IHEC.
Highlights
• High-resolution maps of promoter interactions in 17 human primary blood cell types
SUMMARY
Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases.
INTRODUCTION
Genomic regulatory elements such as transcriptional enhancers determine spatiotemporal patterns of gene expression. It has been estimated that up to 1 million enhancer elements with gene regulatory potential are present in mammalian genomes (ENCODE Project Consortium, 2012). Although a number of well-characterized enhancers map close to their target genes, assignment based on linear proximity is error prone, as many enhancers map large distances away from their targets, bypassing the nearest gene (Mifsud et al., 2015; Sanyal et al., 2012; Schoenfelder et al., 2015). Long-range gene regulation by enhancers in vivo involves close spatial proximity between distal enhancers and their target gene promoters in the three-dimensional nuclear space (Carter et al., 2002), most likely involving a direct interaction (Deng et al., 2014), while the intervening sequences are looped out. Thus, a comprehensive catalog of promoter-interacting regions (PIRs) is a requisite to fully understand genome transcriptional control.
Thousands of disease- and trait-associated genetic variants have been identified by genome-wide association studies (GWAS). The vast majority of these variants are located in non-coding regions of the genome, often at considerable genomic distances from annotated genes, making assessment of their potential function in disease etiology problematic. However, GWAS variants are enriched in close proximity to DNase I
Cell 167, 1369–1384, November 17, 2016 © 2016 The Authors. Published by Elsevier Inc. 1369
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Table 1. Summary of PCHi-C Datasets Generated in This Study
Cell Type | Acronym | Biological Replicates | Unique Captured Read Pairsᵃ | Detected Promoter Interactionsᵇ
Megakaryocytes | MK | 4 | 653,848,788 | 150,779
Erythroblasts | Ery | 3 | 588,786,672 | 151,215
Neutrophils | Neu | 3 | 736,055,569 | 142,435
Monocytes | Mon | 3 | 572,357,387 | 165,947
Macrophages M0 | Mφ0 | 3 | 668,675,248 | 180,190
Macrophages M1 | Mφ1 | 3 | 497,683,496 | 171,031
Macrophages M2 | Mφ2 | 3 | 523,561,551 | 186,172
Endothelial precursors | EndP | 3 | 420,536,621 | 145,888
Naive B cells | nB | 3 | 629,928,642 | 189,720
Total B cells | tB | 3 | 702,533,922 | 213,539
Fetal thymus | FetT | 3 | 776,491,344 | 166,743
Naive CD4+ T cells | nCD4 | 4 | 844,697,853 | 210,074
Total CD4+ T cells | tCD4 | 3 | 836,974,777 | 199,525
Non-activated total CD4+ T cells | naCD4 | 3 | 721,030,702 | 211,720
Activated total CD4+ T cells | aCD4 | 3 | 749,720,649 | 213,235
Naive CD8+ T cells | nCD8 | 3 | 747,834,572 | 216,232
Total CD8+ T cells | tCD8 | 3 | 628,771,947 | 204,382
Total | | | 11,299,489,740 | 698,187ᶜ
ᵃ Total numbers of valid read pairs across all biological replicates are listed. See Table S1 for replicate-level statistics.
ᵇ Interactions with CHiCAGO scores >5. This excludes 9,396 interactions involving 484 captured non-promoter fragments that are not considered further in the study.
ᶜ Unique interactions detected in at least one cell type.
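The table counts interactions per cell type above a CHiCAGO score cutoff, and its total counts “unique” interactions, meaning each bait/other-end pair is counted once across all cell types. Figure 1D extends this into a saturation curve averaged over 100 random orderings of cell types. The minimal sketch below assumes a hypothetical input layout: per cell type, a list of (captured promoter fragment, other-end fragment, score) tuples, with non-promoter baits already removed. It uses score ≥ 5, as in the Figure 1 legend, whereas footnote b above reads “>5”. The `threshold` parameter makes either convention a one-argument change. This is not the CHiCAGO package or the authors’ code.

```python
import random
import statistics
from typing import Dict, List, Set, Tuple

Call = Tuple[str, str, float]  # (captured promoter fragment, other-end fragment, CHiCAGO score)
Interaction = Tuple[str, str]


def significant(calls: Dict[str, List[Call]], threshold: float = 5.0) -> Dict[str, Set[Interaction]]:
    """Per cell type, the set of interactions whose CHiCAGO score is >= threshold."""
    return {
        cell: {(bait, other) for bait, other, score in rows if score >= threshold}
        for cell, rows in calls.items()
    }


def unique_interactions(per_cell: Dict[str, Set[Interaction]]) -> int:
    """Interactions detected in at least one cell type (size of the union)."""
    return len(set().union(*per_cell.values()))


def saturation_curve(
    per_cell: Dict[str, Set[Interaction]], n_orderings: int = 100, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Mean and SD of the cumulative unique-interaction count as cell types are added in random order."""
    if n_orderings < 2:
        raise ValueError("need at least two orderings for an SD")
    rng = random.Random(seed)
    cells = list(per_cell)
    curves = []
    for _ in range(n_orderings):
        rng.shuffle(cells)
        seen: Set[Interaction] = set()
        curve = []
        for cell in cells:
            seen |= per_cell[cell]
            curve.append(len(seen))
        curves.append(curve)
    columns = list(zip(*curves))  # one column per number of cell types analyzed
    return [statistics.fmean(c) for c in columns], [statistics.stdev(c) for c in columns]
```

The last point of the curve always equals `unique_interactions`, with zero spread, because every ordering eventually includes all cell types. Counting PIRs instead of interactions only requires collecting the other-end fragments.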
[Figure 1 image: (A) project schematic: PCHi-C physical interactions of 31,253 annotated promoters, with DHSs, TADs (sDI), Hi-C, and CHiCAGO; (D) cumulative numbers of interactions and PIRs (×1000); (E) frequency of interactions crossing a TAD boundary.]
Figure 1. Promoter Capture Hi-C across 17 Human Primary Blood Cell Types
(A) Schematic representation of the project.
(B) Interaction landscape of INPP4B gene promoter along a 5-Mb region in naive CD4+ (nCD4) cells (PCHi-C, top panel). Each dot denotes a sequenced di-tag
mapping, on one end, to the captured HindIII fragment containing INPP4B gene promoter, and on the other end, to another HindIII fragment located as per the
x axis coordinate; the y axis shows read counts per di-tag. Red dots denote high-confidence PIRs (CHiCAGO score ≥5), and their interactions with INPP4B
promoter are shown as red arcs. Gray lines denote expected counts per di-tag according to the CHiCAGO background model, and dashed lines show the upper
bound of the 95% confidence interval. Genes whose promoters were found to physically interact with INPP4B promoter are labeled in bold. Promoters selectively
interact with specific DNase hypersensitivity sites (DHSs, middle panel) defined in the same cell type from the ENCODE project. Some of these interactions occur
within the same topologically associated domain (TADs, black line, as defined according to the standardized directionality index score, sDI), while others span
TAD boundaries. A conventional Hi-C profile for the same locus in nCD4 cells is shown in the bottom panel.
(C) Interaction landscape of the INPP4B, RHAG, ZEB2-AS, and ALAD promoters in naive CD4+ cells (nCD4), erythroblasts (Ery), and monocytes (Mon). Dot plots
as in (B), with high-confidence PIRs shown in red (CHiCAGO score ≥5) and sub-threshold PIRs (3 < CHiCAGO score < 5) shown in blue.
(D) The numbers of unique interactions (left) and PIRs (right) detected for a given number of analyzed cell types. Lines and dots show the mean values over 100
random orderings of cell types; gray ribbons show SDs.
(E) Proportions of interactions crossing TAD boundaries per cell type; observed and expected frequencies of TAD boundary-crossing interactions. Error bars
show ±SD across 1000 permutations (see Quantification and Statistical Analysis).
See also Figures S1 and S2, Table S1, and Data S1.
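Panel (E) compares the observed share of interactions that cross a TAD boundary with an expectation from 1,000 permutations. The paper’s permutation scheme is given in its Quantification and Statistical Analysis section, which is not part of this excerpt. The sketch below therefore substitutes a simple, hypothetical null model: each interaction keeps its span but is placed at a uniformly random position on one chromosome. Each interacting end is assumed to be a single coordinate (for example, a fragment midpoint), and boundaries are assumed to be a sorted list of positions.

```python
import bisect
import random
import statistics
from typing import Sequence, Tuple

Pair = Tuple[int, int]  # (bait coordinate, other-end coordinate) on one chromosome, in bp


def crosses_boundary(a: int, b: int, boundaries: Sequence[int]) -> bool:
    """True if a TAD boundary lies strictly between the two interacting ends (boundaries sorted)."""
    lo, hi = min(a, b), max(a, b)
    i = bisect.bisect_right(boundaries, lo)  # index of the first boundary > lo
    return i < len(boundaries) and boundaries[i] < hi


def crossing_fraction(pairs: Sequence[Pair], boundaries: Sequence[int]) -> float:
    """Fraction of interactions that span at least one TAD boundary."""
    if not pairs:
        raise ValueError("no interactions")
    return sum(crosses_boundary(a, b, boundaries) for a, b in pairs) / len(pairs)


def expected_crossing(
    pairs: Sequence[Pair], boundaries: Sequence[int], chrom_len: int,
    n_perm: int = 1000, seed: int = 0,
) -> Tuple[float, float]:
    """Mean and SD of the crossing fraction after randomly relocating each interaction.

    Hypothetical null model: spans are preserved; start positions are uniform on [0, chrom_len - span].
    """
    if not pairs:
        raise ValueError("no interactions")
    if n_perm < 2:
        raise ValueError("need at least two permutations for an SD")
    spans = [abs(b - a) for a, b in pairs]
    if max(spans) > chrom_len:
        raise ValueError("an interaction is longer than the chromosome")
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_perm):
        shifted = []
        for span in spans:
            start = rng.randint(0, chrom_len - span)
            shifted.append((start, start + span))
        fractions.append(crossing_fraction(shifted, boundaries))
    return statistics.fmean(fractions), statistics.stdev(fractions)
```

Preserving each span matters because long-range interactions cross boundaries more often simply by covering more sequence. A null model that ignored distance would make the observed fraction look depleted or enriched for the wrong reason.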
[Figure 2 image: (A) PCA (PC1 vs. PC2) of CHiCAGO scores per biological replicate, with an inset for CD4+ and CD8+ T cells; (B) dendrogram (height) of cell types above a heatmap of CHiCAGO scores for interaction clusters 1–34, grouped into lymphoid and myeloid cell types; (C) cluster specificity scores per cell type.]
Figure 2. Promoter Interactions Reflect the Lineage Relationships of the Hematopoietic Tree
(A) Principal Component Analysis (PCA) of the CHiCAGO interaction scores for each individual biological replicate (nB, naive B cells; tB, total B cells; FetT, fetal
thymus; aCD4, activated CD4+ T cells; naCD4, non-activated CD4+ T cells; tCD4, total CD4+ T cells; nCD8, naive CD8+ T cells; nCD4, naive CD4+ T cells; tCD8,
total CD8+ T cells; Mon, monocytes; Neu, neutrophils; Mφ0–2, macrophages M0, M1, M2; EndP, endothelial precursors; MK, megakaryocytes; Ery, erythroblasts).
The inset shows the results of a separately performed PCA for CD4+ and CD8+ T cells only.
(B) Top (dendrogram): hierarchical clustering of the cell types according to their promoter interaction profiles. Bottom (heatmap): Autoclass Bayesian clustering of
interactions according to their cell-type specificity. Cluster IDs are shown on the right. Cluster 9 containing 108,066 interactions is not shown for clarity.
(C) Cell-type specificity of interaction clusters. The heatmap shows cluster specificity scores in each cell type (see Quantification and Statistical Analysis for
details). Cell types and clusters are arranged as in (B).
See also Figures S3A and S3B.
available (Figure 3). We found PIRs to be significantly enriched for histone marks associated with active enhancers, such as H3K27ac and H3K4me1, in comparison with distance-matched random controls (Figures 3A and 3B). We also found enrichment for H3K4me3 and H3K36me3 at PIRs, which are marks associated with active promoters and transcribed regions, respectively, consistent with non-coding transcription of regulatory regions (Natoli and Andrau, 2012).
[Figures 3–5 (captions not recovered). Recoverable panel content: histone-mark enrichment at PIRs (including H3K27ac, H3K4me1, H3K4me3 and H3K36me3) in Mφ0–2 and Ery; regulatory build annotation, HindIII fragments, ChromHMM activity and PCHi-C interactions in Ery, Mon and nCD8 at the β-globin locus control region (LCR; Ensembl Homo sapiens version 83.37 (GRCh37.p13), Chromosome 11: 5,241,525–5,392,845); correspondence of promoter interactions with active and non-active promoters and enhancers (observed/expected; p = 6e-5 and p = 0.02); mean gene specificity scores (interactions with active enhancers) against the number of active enhancers (mean centred), and gene specificity score heatmaps by cluster ID for nCD4, Mφ0–2, Mon, Neu, Ery and MK; eQTL examples involving ARID1A and ZDHHC18 (chr1) and NDUFAF4 and ZBTB2 (chr6), with SNPs rs71636780 and rs117561058, showing genes, baited promoter fragments, PIRs, SNPs and gene expression by genotype.]
red blood cell traits (Figure 6D), inviting further in-depth validation by specialist communities. The COGS prioritization strategy produced distinct results from a “brute-force” approach based on promoter colocalization with disease susceptibility regions (DSRs) within the same TADs, which yielded considerably more candidates per disease (on average, 5-fold more), and did not capture all those prioritized with COGS (Figure S6B).
We further focused on a subset of 421 highest-scoring genes prioritized for at least one autoimmune disease. Taking into account known and predicted protein-protein interactions and pathway co-localization of their products, we constructed a
[Figure 6 panels: (A) blockshifter Z scores for lymphoid vs myeloid and for Mon, Mφ & Neu vs MK & Ery, with traits colored as autoimmune, blood, metabolic or other; (B) enrichment by individual cell type; (C) input GWAS data (−log10 p) and PCHi-C integrated analysis (probability) across a chromosome 1 region including C1orf137, PTGFRN, CD101, TRIM45, CD58, CD2, TTF2 and IGSF3; (D) Reactome pathway bubble plot by trait; (E) network of prioritized genes with physical interaction, predicted interaction and pathway edges.]
Figure 6. Promoter Interactions Link GWAS SNPs with Putative Target Genes
(A) Enrichment of GWAS summary statistics at PIRs by tissue type. Axes reflect blockshifter Z scores for two different tissue group comparisons, first lymphoid
versus myeloid, then additionally within the myeloid lineage. Traits are labeled and colored by category (BMI, body mass index; BP_D, diastolic blood pressure;
BP_S, systolic blood pressure; CD, Crohn’s disease; CEL, celiac disease; FNBMD, femoral neck bone mineral density; GLC, glucose sensitivity; GLC_B, glucose
sensitivity BMI-adjusted; HB, hemoglobin; HDL, high-density lipoprotein; HEIGHT, height; INS, insulin sensitivity; INS_B, insulin sensitivity BMI-adjusted; LDL,
low-density lipoprotein; LSBMD, lumbar spine bone mineral density; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration;
MCV, mean corpuscular volume; MS, multiple sclerosis; PBC, primary biliary cirrhosis; PCV, packed cell volume; PLT, platelet count; PV, platelet volume; RA,
rheumatoid arthritis; RBC, red blood cell count; SLE, systemic lupus erythematosus; T1D, type 1 diabetes; T2D, type 2 diabetes; TC, total cholesterol; TG,
triglycerides; UC, ulcerative colitis).
(B) Blockshifter enrichment Z scores of GWAS summary statistics in PIRs by individual tissue type using endothelial cells as a control. Red indicates enrichment in
the labeled tissue; green indicates enrichment in the endothelial cell control.
(C) Example of the COGS gene prioritization method in 1p13.1 RA susceptibility region. GWAS summary p values for association with RA (Okada et al., 2012) (top)
are transformed into posterior probabilities for each variant being causal (middle), which are then aggregated at all PIRs interacting with a given gene, accounting for
LD, to compute gene scores. Arcs representing promoter-PIR interactions are color coded with genes.
(D) Bubble plot of traits with significant enrichment (p.adj < 0.05) in one or more pathways from the Reactome database (Fabregat et al., 2016). Top numbers
indicate the total number of genes analyzed for each trait (gene score >0.5), bubble size indicates the ratio of test genes to those in the pathway, and blue to red
corresponds to decreasing adjusted p value for enrichment.
(E) The ‘‘core autoimmune disease network’’ containing the 421 highest-scoring genes prioritized for autoimmune disease. Genes (nodes) are color coded based
on diseases for which they were prioritized as candidates by the COGS algorithm. Edges between genes are drawn based on prior knowledge about their physical
interactions, predicted interactions and pathway associations obtained from GeneMania (Montojo et al., 2010) and are color coded accordingly. Inset shows
gene names for the highest-connected central part of the network. See Quantification and Statistical Analysis.
See also Figure S6 and Table S3.
STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:
• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
• METHOD DETAILS
  ◦ Cell Isolation and Purity Test
  ◦ Cell Fixation
  ◦ Hi-C Library Preparation
  ◦ Biotinylated RNA Bait Library Design
  ◦ PCHi-C
  ◦ Sequencing
• QUANTIFICATION AND STATISTICAL ANALYSIS
CONSORTIA
The contributing members of the BLUEPRINT Consortium (http://www.blueprint-epigenome.eu) are Joost H. Martens, Bowon Kim, Nilofar Sharifi, Eva M. Janssen-Megens, Marie-Laure Yaspo, Matthias Linser, Alexander Kovacsovics, Laura Clarke, David Richardson, Avik Datta, and Paul Flicek.
AUTHOR CONTRIBUTIONS
Conceptualization, P.F. and M.S.; Methodology, B.M.J., O.S.B., C.W., S.M.H., J.C., P.F.-P., and M.S.; Investigation, B.M.J.; Formal Analysis, O.S.B., S.P.W., R.K., S.M.H., S.S., J.C., S.W.W., C.V., M.J.T., P.F-P., F.W., C.W., and M.S.; Resources, M.F., F.B., S.F., A.J.C., K.R., K.D., L.G., BLUEPRINT Consortium, H.G.S., M.K., J.A.T., D.R.Z., and W.H.O.; Writing, M.S., B.M.J., and P.F., with contributions from all authors; Supervision, P.F., M.S., M.F., C.W., D.R.Z., O.S., W.H.O., and J.A.T.; Project Administration, M.S., M.F., W.H.O., and P.F.
ACKNOWLEDGMENTS
Locke, A.E., Kahali, B., Berndt, S.I., Justice, A.E., Pers, T.H., Day, F.R., Powell, C., Vedantam, S., Buchkovich, M.L., Yang, J., et al.; LifeLines Cohort Study; ADIPOGen Consortium; AGEN-BMI Working Group; CARDIOGRAMplusC4D Consortium; CKDGen Consortium; GLGC; ICBP; MAGIC Investigators; MuTHER Consortium; MIGen Consortium; PAGE Consortium; ReproGen Consortium; GENIE Consortium; International Endogene Consortium (2015). Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206.
Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550.
Maller, J.B., McVean, G., Byrnes, J., Vukcevic, D., Palin, K., Su, Z., Howson, J.M., Auton, A., Myers, S., Morris, A., et al.; Wellcome Trust Case Control Consortium (2012). Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301.
Manning, A.K., Hivert, M.-F., Scott, R.A., Grimsby, J.L., Bouatia-Naji, N., Chen, H., Rybin, D., Liu, C.-T., Bielak, L.F., Prokopenko, I., et al.; DIAbetes Genetics
Peters, J.E., Lyons, P.A., Lee, J.C., Richard, A.C., Fortune, M.D., Newcombe, P.J., Richardson, S., and Smith, K.G.C. (2016). Insight into genotype-phenotype associations through eQTL mapping in multiple cell types in health and immune-mediated disease. PLoS Genet. 12, e1005908.
Rajagopal, N., Srinivasan, S., Kooshesh, K., Guo, Y., Edwards, M.D., Banerjee, B., Syed, T., Emons, B.J.M., Gifford, D.K., and Sherwood, R.I. (2016). High-throughput mapping of regulatory DNA. Nat. Biotechnol. 34, 167–174.
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47.
Sahlén, P., Abdullayev, I., Ramsköld, D., Matskova, L., Rilakovic, N., Lötstedt, B., Albert, T.J., Lundeberg, J., and Sandberg, R. (2015). Genome-wide mapping of promoter-anchored interactions with close to single-enhancer resolution. Genome Biol. 16, 156.
Sanyal, A., Lajoie, B.R., Jain, G., and Dekker, J. (2012). The long-range interaction landscape of gene promoters. Nature 489, 109–113.
As Lead Contact, Mikhail Spivakov is responsible for all reagent and resource requests. Please contact Mikhail Spivakov at mikhail.spivakov@babraham.ac.uk
with requests and inquiries. Raw data are shared under managed access in accordance with the ethical
consent signed by the volunteers. Recall of Cambridge BioResource volunteers is by application. Processed data have been made
publicly available as described below.
Human primary blood cells were obtained from either a single healthy donor (Mon, Neu, Mφ0 (2/3 reps), Mφ1, Mφ2 (1/3 reps), Ery,
EndP, nCD4 (1/4 reps), tCD4, tCD8 (2/3 reps), tB, FetT) or pooled from multiple healthy donors (MK, Mφ0 (1/3 reps), Mφ2 (2/3 reps),
METHOD DETAILS
Cell Fixation
8 × 10⁷ cells per library were resuspended in 30.625 ml of DMEM supplemented with 10% FBS, and 4.375 ml of formaldehyde was
added (16% stock solution; 2% final concentration). The fixation reaction continued for 10 min at room temperature with mixing and
was then quenched by the addition of 5 ml of 1 M glycine (125 mM final concentration). Cells were incubated at room temperature for
5 min and then on ice for 15 min. Cells were pelleted by centrifugation at 400 × g for 10 min at 4°C, and the supernatant was discarded.
The pellet was washed briefly in cold PBS, and samples were centrifuged again to pellet the cells. The supernatant was removed, and
the cell pellets were flash frozen in liquid nitrogen and stored at −80°C.
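The final concentrations quoted above follow from simple C₁V₁ = C₂V₂ dilution arithmetic. The short Python check below is an illustrative sketch rather than part of the original protocol. The helper name `final_concentration` is hypothetical, and the check assumes that reagent volumes simply add, with no volume change on mixing.

```python
def final_concentration(stock_conc, stock_vol_ml, total_vol_ml):
    """Concentration of a stock after dilution into total_vol_ml (C1*V1 = C2*V2)."""
    return stock_conc * stock_vol_ml / total_vol_ml

# Fixation: 4.375 ml of 16% formaldehyde added to 30.625 ml of medium.
fix_volume_ml = 30.625 + 4.375
formaldehyde_pct = final_concentration(16.0, 4.375, fix_volume_ml)

# Quench: 5 ml of 1 M (1000 mM) glycine added to the fixation reaction.
quench_volume_ml = fix_volume_ml + 5.0
glycine_mM = final_concentration(1000.0, 5.0, quench_volume_ml)

print(f"formaldehyde: {formaldehyde_pct:.2f}% in {fix_volume_ml} ml")  # formaldehyde: 2.00% in 35.0 ml
print(f"glycine: {glycine_mM:.1f} mM in {quench_volume_ml} ml")        # glycine: 125.0 mM in 40.0 ml
```

Both results match the stated 2% formaldehyde and 125 mM glycine.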
PCHi-C
Capture Hi-C of promoters was carried out with SureSelect target enrichment, using the custom-designed biotinylated RNA bait
library and custom paired-end blockers according to the manufacturer’s instructions (Agilent Technologies). After library enrich-
ment, a post-capture PCR amplification step was carried out using PE PCR 1.0 and PE PCR 2.0 primers with 4 PCR amplification
cycles.
Sequencing
Hi-C and PCHi-C libraries were sequenced on the Illumina HiSeq 2500 platform, using three sequencing lanes per PCHi-C library and one
sequencing lane per Hi-C library.
where the weights $d_{c,i}$ are distances between cell type $c$ and cell types $i$, calculated using the complete dataset (e.g., CHiCAGO
interaction scores for all interactions or expression values for all genes; distances calculated using the Euclidean distance
metric). The distance weights are introduced to account for imbalances in the distances between cell types. For example,
among the cell types considered here are three types of macrophages that are likely to have very similar profiles of the
measured property compared with other analyzed cell types (and so the distances between macrophage samples will also
be smaller than between macrophages and other cell types). The distance weights focus the calculation of $s_c$ on cell types
that are relatively more distant from cell type $c$. In this example, therefore, they will result in the calculation of $s_c$ for each
type of macrophage placing relatively little weight on the other types of macrophages. Without this weighting, specificity scores
for macrophages would be smaller on average simply because macrophages are over-represented among the cell types
considered.
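The effect of the distance weights can be illustrated with a toy example. This sketch assumes that $s_c$ is the distance-weighted mean of the differences $x_c - x_i$ over the other cell types $i$, that is, $s_c = \sum_{i \neq c} d_{c,i}(x_c - x_i) / \sum_{i \neq c} d_{c,i}$. That form is an assumption made for illustration. The definitive formula is the one under "Definition of specificity scores". The function names and the three-cell-type toy profiles are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def distance_weights(profiles):
    """profiles: {cell_type: values over the complete dataset}; returns d[c][i]."""
    return {c: {i: euclidean(profiles[c], profiles[i]) for i in profiles}
            for c in profiles}

def specificity_scores(x, d):
    """x: {cell_type: value of one interaction}; d: distance weights.
    Assumed form: distance-weighted mean of x[c] - x[i] over i != c."""
    scores = {}
    for c in x:
        others = [i for i in x if i != c]
        total_weight = sum(d[c][i] for i in others)
        scores[c] = sum(d[c][i] * (x[c] - x[i]) for i in others) / total_weight
    return scores

# Hypothetical toy profiles: two similar macrophage types and one distant cell type.
profiles = {"Mphi0": [0.0, 0.0], "Mphi1": [0.0, 1.0], "Ery": [3.0, 4.0]}
d = distance_weights(profiles)   # d(Mphi0,Mphi1)=1, d(Mphi0,Ery)=5, d(Mphi1,Ery)=sqrt(18)

x = {"Mphi0": 1.0, "Mphi1": 1.0, "Ery": 0.0}   # interaction present only in macrophages
s = specificity_scores(x, d)
print(s)
```

With these toy values, the interaction scores 5/6 for Mφ0. An unweighted mean of the same differences would give only 0.5, because the nearly identical Mφ1 profile would count as much as the distant Ery profile.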
Calculation and Clustering of Gene Specificity Scores (Interactions with Active Enhancers)
We quantified the cell type-specificity of each gene’s interactions with active enhancers through calculation of gene specificity
scores. This analysis was restricted to the eight cell types for which BLUEPRINT expression and histone modification data were
available. The original set of high-confidence interactions was filtered to (i) only contain baits that mapped exclusively to a unique
protein-coding gene promoter and (ii) only contain interactions for which at least one of the eight cell types has both a CHiCAGO
score ≥ 5 and an active enhancer. For this analysis, PIRs were considered as “active enhancers” if they contained proximal/distal
enhancer or transcription start site features (based on the Ensembl Regulatory Build) that were found to be in the active state based
on ChromHMM segmentations of the histone modification data in the corresponding cell type. This resulted in a set of 139,835
interactions and 7,004 unique baits. To focus the analysis on active enhancers, for each interaction, CHiCAGO scores were set to zero
for cell types where the enhancer had an inactive status. Finally, to avoid large CHiCAGO scores dominating the specificity analysis,
scores were asinh-transformed and values larger than a threshold of 4.3 (equivalent to a raw score of ≈36.8) were set to 4.3. We refer to
these scores as “processed CHiCAGO scores.”
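The filtering and score-processing steps above can be sketched as follows. This is a minimal Python illustration. It assumes that raw CHiCAGO scores and per-cell-type enhancer activity flags for one interaction are already available as parallel lists. The function names are hypothetical. Because asinh(36.8) ≈ 4.299, a cap of 4.3 on the transformed scale corresponds to a raw score of about 36.8.

```python
import math

ASINH_CAP = 4.3  # cap on the asinh scale, as stated in the text

def passes_filter(scores, enhancer_active, threshold=5.0):
    """Keep an interaction if at least one cell type has both a CHiCAGO
    score >= threshold and an active enhancer at the PIR."""
    return any(s >= threshold and active
               for s, active in zip(scores, enhancer_active))

def processed_chicago_scores(scores, enhancer_active, cap=ASINH_CAP):
    """Set scores to zero where the enhancer is inactive, asinh-transform,
    then cap the transformed values at `cap`."""
    return [min(math.asinh(s if active else 0.0), cap)
            for s, active in zip(scores, enhancer_active)]

print(round(math.asinh(36.8), 3))                       # 4.299
scores, active = [2.0, 40.0], [True, True]
print(passes_filter(scores, active))                    # True
print([round(v, 3) for v in processed_chicago_scores(scores, active)])  # [1.444, 4.3]
```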
For each enhancer-promoter interaction, specificity scores $s_c$ for each cell type $c$ were calculated as described above (see “Definition
of specificity scores” and the equation therein), with $x_i$ defined as the processed CHiCAGO score for cell type $i$. The distance
weights $d_{c,i}$ were calculated based on the full set of CHiCAGO interaction scores (asinh-transformed with an upper threshold
of 4.3). Now consider a single gene (protein-coding gene promoter) $g$. Let $n_g$ denote the number of enhancer interactions this gene
has among the set of 139,835 interactions. The gene then has $n_g$ specificity scores $s_c$ for cell type $c$, one for each interaction. These $n_g$
scores are averaged to obtain the interaction-based gene specificity score for cell type $c$, $s^g_c$. The heatmap in Figure 4B shows these
scores for eight cell types and 7,004 genes.
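Averaging per gene to obtain $s^g_c$ is a simple group-by operation. The sketch below assumes that each enhancer-promoter interaction has already been assigned to its baited protein-coding gene and given a specificity score per cell type. The input layout and the function name are hypothetical.

```python
from collections import defaultdict

def gene_specificity_scores(interactions):
    """interactions: iterable of (gene, {cell_type: s_c}) pairs, one per
    enhancer interaction of that gene's promoter.
    Returns {gene: {cell_type: mean of s_c over the gene's n_g interactions}}."""
    grouped = defaultdict(list)
    for gene, scores in interactions:
        grouped[gene].append(scores)
    return {
        gene: {c: sum(r[c] for r in rows) / len(rows) for c in rows[0]}
        for gene, rows in grouped.items()
    }

interactions = [
    ("A", {"Mon": 1.0, "Ery": 0.0}),
    ("A", {"Mon": 3.0, "Ery": 2.0}),
    ("B", {"Mon": 5.0, "Ery": 5.0}),
]
print(gene_specificity_scores(interactions))
# {'A': {'Mon': 2.0, 'Ery': 1.0}, 'B': {'Mon': 5.0, 'Ery': 5.0}}
```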
eQTL Analysis
To evaluate the number of lead eQTLs in monocytes and B cells (Fairfax et al., 2012) that physically contact their target gene promoters,
we performed association tests using LIMIX (Lippert et al., 2014) within 2 Mb windows around the gene bodies. For each
gene expression probe, at most one lead eQTL SNP was considered at FDR < 10%. We then counted cases where the lead
eQTL or at least one SNP in LD with it (r² ≥ 0.8, based on the 1000 Genomes EUR cohort (Auton et al., 2015)) overlapped a PIR
for the eQTL-associated gene. The same strategy was used to evaluate the number of PIRs detected in at least one of the 17
cell types overlapping cis-eQTLs (FDR < 10%) for the PIR target genes reported in the whole-blood meta-analysis study (Westra
et al., 2013).
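The counting step can be sketched as follows. This sketch assumes that the lead eQTL per gene, its LD proxies (r² ≥ 0.8) and each gene's PIR coordinates have already been computed. It does not run LIMIX or calculate LD. The data layout and the function name are hypothetical.

```python
def count_contacting_eqtls(lead_eqtls, ld_proxies, pirs):
    """lead_eqtls: {gene: (chrom, pos)} lead eQTL SNP per gene.
    ld_proxies: {(chrom, pos): [(chrom, pos), ...]} SNPs in LD (r2 >= 0.8) with a lead SNP.
    pirs: {gene: [(chrom, start, end), ...]} PIRs interacting with the gene's promoter.
    Returns the number of genes whose lead SNP, or an LD proxy, lies in one of its PIRs."""
    count = 0
    for gene, lead in lead_eqtls.items():
        candidates = [lead] + ld_proxies.get(lead, [])
        regions = pirs.get(gene, [])
        if any(snp_chrom == chrom and start <= pos <= end
               for snp_chrom, pos in candidates
               for chrom, start, end in regions):
            count += 1
    return count

# Hypothetical toy data: G1 overlaps directly, G2 via an LD proxy, G3 not at all.
lead_eqtls = {"G1": ("chr1", 100), "G2": ("chr1", 500), "G3": ("chr2", 50)}
ld_proxies = {("chr1", 500): [("chr1", 1200)]}
pirs = {"G1": [("chr1", 90, 110)], "G2": [("chr1", 1000, 1300)], "G3": [("chr1", 40, 60)]}
print(count_contacting_eqtls(lead_eqtls, ld_proxies, pirs))  # 2
```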
To compute the enrichment of eQTLs at PIRs in the monocyte and B cell data (Fairfax et al., 2012), we used LIMIX to perform as-
sociation tests between each SNP overlapping each PIR and the expression of the respective PIR-connected gene probe. The same
analysis was performed at random regions (‘‘randomised PIRs’’) generated in a manner maintaining the distribution of distances and
spatial interdependencies of the observed PIRs and accounting for the strand directionality of the genes. Specifically, the bait posi-
tion of all PIRs of a given gene was shifted to the bait position of another randomly selected gene. This procedure was performed for
all genes over 1000 permutations. If the randomly selected gene was on the opposite strand compared to the gene of origin, the set
of interactions was mirrored around the bait position. Enrichment was assessed by comparing a) proportions of SNPs that are eQTLs
for the PIR-connected target gene (Figures 5A and 5B) and b) proportions of PIR-connected genes with at least one significant
association (Figures S5A and S5B) at the observed and randomized PIRs over binned distances between the PIRs and the target
gene TSS. The p values were adjusted for all tests across variants and genes in each distance bin.
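The randomization described above can be sketched in Python under some stated assumptions. Each gene is described with a hypothetical layout: chromosome, a single bait coordinate, strand and PIR intervals. Each gene's PIRs are moved to the bait of one other randomly chosen gene, and their offsets from the bait are preserved. The offsets are mirrored when the two genes lie on opposite strands. This sketch draws the target gene independently for each source gene, which is one reading of "another randomly selected gene". It returns only the shifted coordinates and does not include the association testing.

```python
import random

def randomize_pirs(genes, rng):
    """genes: {name: {"chrom": str, "bait": int, "strand": "+" or "-", "pirs": [(start, end), ...]}}.
    Returns {name: (target_gene, chrom, shifted_pirs)} for one permutation."""
    names = list(genes)
    result = {}
    for name in names:
        src = genes[name]
        target_name = rng.choice([n for n in names if n != name])
        tgt = genes[target_name]
        mirror = src["strand"] != tgt["strand"]
        shifted = []
        for start, end in src["pirs"]:
            lo, hi = start - src["bait"], end - src["bait"]   # offsets from the source bait
            if mirror:
                lo, hi = -hi, -lo                             # reflect around the bait, keep lo <= hi
            shifted.append((tgt["bait"] + lo, tgt["bait"] + hi))
        result[name] = (target_name, tgt["chrom"], shifted)
    return result

genes = {
    "A": {"chrom": "chr1", "bait": 1000, "strand": "+", "pirs": [(1100, 1200)]},
    "B": {"chrom": "chr2", "bait": 5000, "strand": "-", "pirs": [(4000, 4100)]},
}
rng = random.Random(1)
permutations = [randomize_pirs(genes, rng) for _ in range(1000)]  # 1000 permutations, as in the text
print(permutations[0]["A"])   # ('B', 'chr2', [(4800, 4900)])
```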
For the examples of SNPs in PIRs, associations of PIRs (plus an extra 500 bp on either side of them) with the connected gene’s expression
were tested for each gene, and the p values were corrected globally for all tests across all variants and genes. Significant
associations were reported at FDR < 10%.
To assess the enrichment of whole-blood cis-eQTLs at the PIRs of their target genes (Figure S5C), we randomized PIRs in the same
way as for the monocyte and B cell analysis presented above, and compared the overlap of observed versus randomized PIRs with
the lead eQTL SNPs for the PIR-connected genes or SNPs in LD with them.
Integration of GWAS Summary Statistics with Tissue Specific PCHi-C and Functional Information
In order to prioritize genes, traits and tissues for further study we developed the COGS algorithm to compute tissue specific gene
scores for each GWAS trait, taking into account linkage disequilibrium, interactions and functional SNP annotation. For each
GWAS trait, and for each SNP in a given recombination block, we used Wakefield’s synthesis (Wakefield, 2009) to compute approx-
imate Bayes factors and thus the posterior probability for that SNP being causal for that trait assuming at most one causal variant in
the recombination block (Maller et al., 2012). For each combination of gene annotation with at least one high-confidence interaction
(CHiCAGO score ≥ 5) and recombination block, we compute a block gene score that is composed of the contributions of three
components: (1) coding SNPs in the annotated gene as computed by VEP (McLaren et al., 2010), (2) promoter SNPs, which we define
as SNPs that overlap a region encompassing the bait and flanking HindIII fragments and not any coding SNPs, (3) SNPs that overlap
PIRs for a tissue or set of tissues that do not overlap coding SNPs. Thus for a given target gene, recombination block and trait we can
derive a block ‘‘genescore’’ that is the sum of the posterior probabilities (as computed by PMI) of SNPs overlapping each component.
We assume statistical independence between blocks, so that we can combine block genescores to get an overall ‘‘genescore’’:
$$\text{genescore} = 1 - \prod_{\text{blocks}} \left(1 - \text{genescore}_{\text{block}}\right)$$
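A minimal Python sketch of the per-block calculation follows. It is not the COGS or PMI implementation, which is available in the CHIGP repository listed under Software. The sketch makes several assumptions. Each SNP's effect is summarized by a z-score and the variance of its effect estimate. Wakefield's approximate Bayes factor is computed with an assumed prior effect variance W = 0.04 (a prior standard deviation of 0.2). A hypothetical per-SNP prior probability of causality p1 = 1e-4 is used, with at most one causal variant per block. The boolean flags mark SNPs that overlap any of the three components (coding, promoter or PIR) for a gene, so no SNP is counted twice.

```python
import math

def log_abf(z, v, w=0.04):
    """Log approximate Bayes factor (association vs. no association), after
    Wakefield (2009). z: z-score; v: variance of the effect estimate;
    w: prior variance of the true effect (0.04 is an assumed value)."""
    r = w / (v + w)
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def block_posteriors(zs, vs, p1=1e-4, w=0.04):
    """Posterior probability that each SNP in one recombination block is causal,
    assuming at most one causal SNP and a prior p1 per SNP (hypothetical value)."""
    labfs = [log_abf(z, v, w) for z, v in zip(zs, vs)]
    m = max(labfs + [0.0])                          # shift to avoid overflow
    alt = [p1 * math.exp(l - m) for l in labfs]     # one term per "SNP i is causal"
    null = (1.0 - p1 * len(zs)) * math.exp(-m)      # "no causal SNP in the block"
    total = null + sum(alt)
    return [a / total for a in alt]

def block_genescore(posteriors, overlaps_component):
    """Sum of posteriors for SNPs overlapping the gene's coding, promoter or PIR components."""
    return sum(p for p, hit in zip(posteriors, overlaps_component) if hit)

def genescore(block_scores):
    """genescore = 1 - prod over blocks of (1 - block genescore); blocks assumed independent."""
    product = 1.0
    for b in block_scores:
        product *= 1.0 - b
    return 1.0 - product

# Hypothetical toy block of three SNPs (z-scores and effect variances).
pp = block_posteriors([0.0, 5.0, 1.0], [0.01, 0.01, 0.01])
block = block_genescore(pp, [False, True, True])   # SNPs 2 and 3 fall in a PIR
print(round(genescore([block, 0.5]), 3))
```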
TAD-Based Prioritization
To compare COGS with ‘‘brute-force’’ TAD-based prioritization, we computed TAD-level scores for eight autoimmune traits across
eight cell types. Briefly, for each TAD in each cell type, we subdivided and summed posterior probabilities for each trait (excluding the
MHC region) by overlap with 0.1cM recombination blocks to obtain block TAD scores, removing coding SNPs, and computed an
overall TAD score such that:
$$\text{TADscore} = 1 - \prod_{\text{blocks}} \left(1 - \text{TADscore}_{\text{block}}\right)$$
A TAD score was assigned to each gene mapping within the respective TAD in each tissue, and the maximum score across all eight
cell types was selected.
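The TAD-level combination mirrors the gene-score formula. The Python sketch below assumes that the block TAD scores have already been computed for each TAD in each cell type. These scores are the summed posterior probabilities per 0.1 cM block, with coding SNPs and the MHC region removed. The nested input layout and the function names are hypothetical.

```python
def combine_blocks(block_scores):
    """1 - prod(1 - score) over blocks, treating blocks as independent."""
    product = 1.0
    for b in block_scores:
        product *= 1.0 - b
    return 1.0 - product

def tad_gene_scores(tads_by_cell_type):
    """tads_by_cell_type: {cell_type: [(genes_in_tad, block_tad_scores), ...]}.
    Each gene receives its TAD's score in each cell type; the maximum across
    cell types is kept."""
    best = {}
    for tads in tads_by_cell_type.values():
        for genes_in_tad, block_scores in tads:
            score = combine_blocks(block_scores)
            for gene in genes_in_tad:
                best[gene] = max(best.get(gene, 0.0), score)
    return best

# Hypothetical toy input for two cell types.
example = {
    "Mon": [(["A", "B"], [0.5]), (["C"], [0.1, 0.1])],
    "Ery": [(["A"], [0.5, 0.5])],
}
print(tad_gene_scores(example))
```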
Software
Scripts to compute specificity scores are available at https://github.com/Steven-M-Hill/PCHiC-specificity-score-analysis. Implementations
of the PMI, blockshifter and COGS algorithms, along with supporting documentation, are available at https://github.com/ollyburren/CHIGP.
Data Resources
The accession number for the raw sequencing reads reported in this paper that were deposited to EGA (https://www.ebi.ac.uk/ega)
is EGAS00001001911. Lists of PCHi-C-detected significant interactions, detected interactions between active promoters and active
enhancers, and a comparison of interaction scores between PCHi-C and reciprocal capture Hi-C experiments are available as part
of the Data S1 archive. High-confidence interactions (CHiCAGO score ≥ 5 in at least one cell type) are available via the CHiCP
browser (Schofield et al., 2016), where they can be visualized alongside GWAS data (https://www.chicp.org) and as custom tracks
for the Ensembl browser (ftp://ftp.ebi.ac.uk/pub/contrib/pchic/CHiCAGO). The regulatory build annotations and segmentations of
the BLUEPRINT datasets are available as a track hub for the Ensembl browser (ftp://ftp.ebi.ac.uk/pub/contrib/pchic/hub.txt). Further
processed datasets, including TAD definitions, regulatory region annotations, specificity scores and gene prioritization data, are
available via Open Science Framework (https://osf.io/u8tzp).
[Figure S1 (caption not recovered). Recoverable panel content: (A) number of baits against the fraction of within-TAD interactions per bait for observed and randomized TADs in Mon, nCD4 and Ery; sDI tracks with baits, PIRs and TAD boundaries on chr11 and chr3; heatmaps of log2 enrichment; (C) chromosome 1, genomic position (Mb), across cell types; (D) heatmap of pairwise values (color key 0.8–1) between cell types.]
[Figure S2 panels: (A) cumulative density of asinh-transformed CHiCAGO scores in the reciprocal capture system for fragment pairs detected as HCI in PCHi-C versus other pairs; further panels show read counts (N) in 0.5 Mb windows around TRPC3 and TES.]
Figure S2. Validation of Promoter Interactions Using Reciprocal Capture Hi-C, Related to Figure 1
(A) Cumulative density plots showing the distributions of asinh-transformed CHiCAGO interaction scores for promoter-containing reciprocal capture Hi-C
fragment pairs that are detected as high-confidence interactions (HCI) in the PCHi-C analyses in the respective cell types (blue line, HCI; CHiCAGO score ≥ 5)
[Figure S3 (caption not recovered). Recoverable panel content: number of baits against the variance of the specificity score across interactions of the same bait (observed vs expected; invariant, lymphoid and myeloid); regulatory build annotation, HindIII fragments, ChromHMM activity and PCHi-C interactions in Ery, nCD8 and Mon across the β-globin LCR region (Ensembl Homo sapiens version 83.37 (GRCh37.p13), Chromosome 11: 5,015,756–5,934,932).]
[Figure S4 panels: (A) residual gene expression against the number of PIRs; (B) mean gene specificity score (expression data) against mean gene specificity score (interactions with active enhancers) for Mon, Mφ0, Mφ1 and Mφ2; (C) gene specificity scores (interactions with active enhancers) by cluster ID for the top 100 Mon-specific genes (based on expression) in nCD4, Mφ0–2, Mon, MK, Ery and Neu.]
Figure S4. Additional Evidence of the Link between Promoter Interactions and Gene Expression, Related to Figure 4
(A) Partial residual plot of log2 gene expression as a function of the number of PIRs interacting with the respective baited region in the cell types, where the
promoter is active in all analyzed cell types. The trendline is from a linear regression using iteratively reweighted least squares (see Quantification and Statistical
Analysis).
(B) Mean gene specificity score (based on interactions with active enhancers) for each of the clusters in Figure 4B is plotted against analogous mean gene
specificity scores based on expression data for monocytes (Mon) and macrophages M0, M1, M2 (Mφ0–2). Error bars indicate ± SD. Plots for nCD4, MK, Ery and
Neu are shown in Figure 4C. See Quantification and Statistical Analysis for details.
(C) A subset of the heatmap in Figure 4B, showing interaction-based gene specificity scores for the top 100 monocyte-specifically expressed genes (obtained by
ranking genes according to their monocyte (Mon) expression-based specificity scores), together with cluster IDs.
[Figure S5 panels: (A–C) enrichment of eQTLs at PIRs in monocytes, total B cells and whole blood by lead eQTL SNP location (distance bins, kb); (D) total B cells, AURKA and rs3817995 on chr20, with gene expression by genotype and −log10 p in a 100 kb window around rs3817995; (E) monocytes, NCOA4 with rs4948673 and rs10821610 on chr10, with gene expression by genotype and −log10 p in a 100 kb window around both SNPs.]
Figure S5. Further Details on the Enrichment of eQTLs at Promoter-Interacting Regions, Related to Figure 5
(A and B) The proportion of genes with at least one eQTL SNP per gene expression probe located within PIRs compared with the equivalent proportion of eQTL
SNPs located within matched random regions (‘‘randomised PIRs’’) in monocytes (A) and total B cells (B). See Quantification and Statistical Analysis for details on
the randomization strategy. Asterisks represent the significance of enrichment at observed versus randomized PIRs (permutation test *p < 0.05; **p < 0.01; ***p <
0.001).
(C) Number of lead cis-eQTLs in whole blood (FDR < 10%) physically contacting regulated gene promoters (accounting for linkage disequilibrium). Results
obtained with randomized PIRs are shown as controls. Asterisks represent the significance of enrichment at observed versus randomized PIRs (permutation test
*p < 0.05; **p < 0.01; ***p < 0.001).
(D) An example of an extremely long-range eQTL association between rs3817995 and AURKA expression in total B cells, with the SNP located > 30 Mb away from
AURKA transcription start site (TSS). The gray dashed line represents the significance threshold.
(E) An example of two independent eQTL signals detected for NCOA4 in monocytes, with the primary eQTL SNP (rs4948673) located > 5 Mb away from the TSS.
The second, independent eQTL SNP (rs10821610) is located close (< 20kb) to the NCOA4 TSS. The gray dashed line represents the significance threshold.
[Figure S6 panels: (A) blockshifter scheme (observed vs block-shifted HindIII fragment labels; posterior probability); (B) COGS vs TAD-based gene scores (max across 8 AI diseases), with quadrant counts 219, 538, 12,490 and 3,203; (C) odds ratios of differential expression in IBD for Crohn's disease and ulcerative colitis, COGS vs TAD-based; (D–G) −log10 p for eQTL and GWAS tests for SLE (SLC15A4, BLK) and RA (GIN1, RASGRP1) in tB.]
Figure S6. Colocalization of GWAS and eQTL Signals at Prioritized Candidate Genes, Related to Figure 6
(A) A schematic of the permutation strategy implemented in blockshifter. GWAS summary statistics are converted to posterior probabilities for a given SNP to be
causal (red dots depict SNPs likely to be causal, blue dots depict other SNPs). Blocks of adjacent PIRs found in either test (purple) or control (cyan) tissue sets,
separated by two or more non-PIR HindIII fragments (gray), are then defined. Labels of HindIII fragments within each block are then rotated (‘block-shifted’) to
generate test sets for estimating the empirical variance of the test statistic under the null while accounting for genomic structure.
(B) Comparison of COGS prioritization scores with those obtained using a “brute-force” algorithm based on shared TADs for eight autoimmune (AI) diseases (see
Quantification and Statistical Analysis for details). Quadrants correspond to genes not exceeding the score cutoff of 0.5 with both methods, and exceeding it with
just one or both methods. Counts of genes in each quadrant are shown.
(C) Odds ratios of differential expression in the immune cells of inflammatory bowel disease (IBD) patients (FDR < 5%) (Peters et al., 2016) for genes prioritized for
Crohn’s disease (purple) and ulcerative colitis (blue) by the PCHi-C-based COGS or a TAD-based algorithm (score > 0.5).
(D–G) 2-Mb windows around the genes prioritized by the GWAS/PCHi-C-based algorithm in rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE)
were overlapped with eQTLs for the same genes in B cells. In five cases, high LD (r2 > 0.8) was detected between the GWAS lead SNP and the eQTL lead SNP in the
2-Mb regions. Shown are Manhattan plots for two SLE-prioritized genes (SLC15A4, panel D; BLK, panel E) and two RA-prioritized genes (GIN1, panel F;
RASGRP1, panel G), for which high LD (r2 > 0.8) was detected between the GWAS lead SNP and the eQTL lead SNP, providing evidence for colocalization of the
GWAS and eQTL signals in these regions.
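Panel (A) describes the block-shifting permutation used by blockshifter. The sketch below illustrates the two steps as the legend states them: define blocks of PIR fragments separated by two or more non-PIR fragments, then circularly rotate fragment labels within each block. It is not the blockshifter code; the label encoding and the uniform random rotation offset are assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

Label = Optional[str]  # "test", "control", or None for a non-PIR fragment


def define_blocks(labels: Sequence[Label], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Half-open (start, end) index ranges of PIR blocks along one chromosome.

    Blocks begin and end on PIR fragments; a run of at least `min_gap`
    consecutive non-PIR fragments closes a block, while shorter gaps are
    absorbed into it.
    """
    blocks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_pir = -1
    for i, label in enumerate(labels):
        if label is None:
            continue
        if start is None:
            start = i
        elif i - last_pir - 1 >= min_gap:
            # the non-PIR gap since the previous PIR is long enough: close block
            blocks.append((start, last_pir + 1))
            start = i
        last_pir = i
    if start is not None:
        blocks.append((start, last_pir + 1))
    return blocks


def block_shift(labels: Sequence[Label], rng: random.Random,
                min_gap: int = 2) -> List[Label]:
    """One null relabelling: rotate labels circularly within each block."""
    shifted = list(labels)
    for start, end in define_blocks(labels, min_gap):
        block = list(labels[start:end])
        k = rng.randrange(len(block))  # random rotation offset for this block
        shifted[start:end] = block[k:] + block[:k]
    return shifted
```

Recomputing the test statistic on many `block_shift` outputs gives an empirical null distribution. Because labels only move within blocks, the number of test and control fragments per block and the fragments outside blocks are preserved, which is how the scheme retains local genomic structure.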
Resource
Correspondence
dhg@mednet.ucla.edu (D.H.G.),
prabhakars@gis.a-star.edu.sg (S.P.)
In Brief
As part of the IHEC consortium, this study
characterized histone acetylation
patterns in brain samples from patients
with autism spectrum disorder (ASD),
uncovering a distinct epigenetic
signature in ASD and providing a rich
resource for future molecular analyses of
ASD patients. Explore the Cell Press IHEC
web portal at http://www.cell.com/consortium/IHEC.
Highlights
• Histone acetylation population study of ASD and control brain samples
Cell 167, 1385–1397, November 17, 2016 © 2016 Elsevier Inc.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
cerebellar dysfunction observed in some animal models of ASD (Abrahams and Geschwind, 2010; de la Torre-Ubieta et al., 2016). H3K27ac was selected as the representative acetylation mark because it highlights active enhancers and promoters (Wang et al., 2008; Heintzman et al., 2009; Creyghton et al., 2010) and is also correlated with gene expression and transcription factor binding (Kumar et al., 2013). We used the data to define aberrantly acetylated enhancers and promoters in ASD brain and thereby characterize commonly altered pathways, upstream regulatory factors, and developmental dynamics of affected loci. In addition, we used chromatin immunoprecipitation sequencing (ChIP-seq) reads to call SNPs within enhancers and promoters. We then used the genotype-independent signal correlation and imbalance (G-SCI) test (del Rosario et al., 2015) to detect haQTLs in regulatory regions and assessed their relationship to known psychiatric disease-associated variants. This dataset from post-mortem human brains will provide a rich resource for future molecular analyses of ASD and serve as proof of principle for the HAWAS approach, which can be applied to a wide variety of human diseases.

RESULTS

Data Generation, Processing, and Differential Acetylation Analysis
In total, we performed 257 H3K27ac ChIP-seq assays on tissue samples from PFC, TC, and CB, in 94 individuals aged 10 years and above (45 ASD, 49 control; Figure 1A; Table S1). Forty-eight ChIP-seq profiles were discarded based on data quality, resulting in a final acetylome set comprising 209 profiles (STAR Methods; Table S2): 81 from PFC (41 ASD, 40 control), 66 from TC (30 ASD, 36 control), and 62 from CB (31 ASD, 31 control).
We used DFilter (Kumar et al., 2013) to call peaks in each of the 209 ChIP-seq profiles and then defined two consensus peak sets: 56,503 cortical peaks (union of PFC and TC) and 38,069 CB peaks. Each consensus ChIP-seq peak defined a region of focal histone acetylation and thus represented a putative promoter or enhancer region.
The heights (aggregate read counts) of consensus peaks represent acetylation levels of cortical and cerebellar regulatory regions in each sample. We normalized these peak heights for GC content (Figure S1) and distributional skews and then controlled for confounders by regressing out multiple biological covariates, such as age, sex, and proportion of neurons, as well as multiple technical covariates (STAR Methods; Figure S2). Corrected peak heights were used to define an initial set of differentially acetylated (DA) loci between ASD and control in each brain region (Wilcoxon rank-sum test; Q ≤ 0.05, fold change ≥ 1.3). Based on acetylation levels at these DA loci, we measured inter-sample divergence and found that a small number of atypical ASD samples showed greater similarity to control samples, and vice versa (Figure S3). This was not surprising, given the tremendous etiological heterogeneity of ASD and previous findings from transcriptomic analysis (Voineagu et al., 2011). Nevertheless, in the majority of cases, ASD acetylomes resembled each other more than they resembled control, and vice versa (Figure S3).
[Figure 1 panels: principal component plots for each brain region; (E) log2 fold change in idiopathic ASD versus log2 fold change in dup15q over the union of DA peaks, PFC R = 0.88, TC R = 0.87 (P ≈ 0); (F) acetylation-based expression fold change (EFilter) versus RNA-seq fold change, PFC R = 0.33 (P = 0.016), TC R = 0.38 (P = 0.0012).]
In order to define the core set of chromatin aberrations in typical ASD cases, we employed a systematic mathematical criterion to exclude atypical samples (STAR Methods). Because the number of excluded samples was relatively small (5%–20% of total cases and 14%–20% of dup15q), acetylation fold changes were not substantially altered (PFC: R = 0.94, TC: R = 0.90, CB: R = 0.98; Figure S4). We then used the remaining (typical) samples to define the final set of DA peaks for each brain region.
Strikingly, we detected 5,153 DA peaks in PFC and 7,009 in TC, indicating widespread, systematic histone acetylation changes in ASD cerebral cortex (Figures 1B and 1C). In contrast, only 247 DA peaks were detected in CB. The limited molecular pathology of ASD cerebellum is consistent with results from transcriptomic studies (Voineagu et al., 2011; Parikshak et al., 2016).
To evaluate the likelihood of false-positive DA peaks, we repeated the entire procedure (initial DA peaks, discarding atypical samples, final DA peaks) after randomly permuting ASD and control labels. At the same false discovery rate (FDR) threshold (Q ≤ 0.05), permuted datasets generated fewer than 100 DA peaks, on average. Moreover, after 1,000 tries, none of the permuted datasets generated as many DA peaks as the true dataset (Figure 1B). Thus, the chromatin changes we detected in ASD samples were far in excess of what would be expected by chance.
To further characterize the overall consistency of the DA peak sets, we examined their overlaps. Over 45% of ASD-upregulated regions in PFC overlapped ASD-Up peaks in TC (p ≈ 0; Figure 1D). The same was true of ASD-downregulated peaks. Moreover, the ASD-versus-control acetylation fold change was highly correlated between PFC and TC (R = 0.86; p ≈ 0). Thus, the chromatin dysregulation signature of ASD was highly consistent between the two cortical regions. Cerebellar DA peaks, on the other hand, showed only 5% overlap with same-direction cortical DA peaks.
Of the 45 cases, 7 had a monogenic form of ASD, duplication 15q syndrome (dup15q; Figure 1A), while the others had no detectable structural variants and were therefore idiopathic (Parikshak et al., 2016). It is possible that individuals with syndromic ASD could have unique chromatin aberrations. We therefore defined DA peaks separately for syndromic and idiopathic ASD, relative to the same set of controls (STAR Methods). Remarkably, acetylation changes were highly concordant genome-wide between the two forms of ASD (Figure 1E; R = 0.88 in PFC, R = 0.87 in TC). To maximize statistical power, we therefore retained the original set of DA peaks based on all ASD samples (syndromic plus idiopathic).
PFC and TC gene expression levels have been comprehensively measured using RNA sequencing (RNA-seq) in a parallel study on the same cohort (Parikshak et al., 2016). To investigate the consistency between chromatin aberrations and gene expression changes in ASD, we focused on promoter regions of differentially expressed (DE) genes (FDR ≤ 0.05; linear mixed model) (Parikshak et al., 2016). We used the EFilter tool (Kumar et al., 2013) to convert promoter histone acetylation profiles into expression estimates and then identified the subset of DE genes whose acetylation-based expression estimates were significantly divergent between ASD and control (Q ≤ 0.05; Wilcoxon test; Benjamini-Hochberg correction), after controlling for covariates as before. At these gene loci, promoter acetylation changes were significantly correlated with expression fold change, both in PFC (R = 0.33; p = 0.016) and in TC (R = 0.38; p = 0.0012) (Figure 1F). This analysis was not extended to CB, due to the lack of detectable DE genes in that tissue. Thus, while measurement noise and biological differences between chromatin variation and expression variation may have
[Figure panels: −log10(P value) bar plots; (B) ASD-Up acetylation peaks in PFC and TC at CACNA1C (chr12: 2,200,000–2,900,000) and GRIN2B (chr12: 13,500,000–14,300,000); (D) ASD-Down peaks in PFC and TC at GRB10 (chr7: 50,700,000–50,850,000) and HDAC4 (chr2: 240,000,000–240,300,000).]

Developmental Stage Specificity of Epigenetically Dysregulated Loci
It has been shown that genes upregulated during early postnatal development are often differentially expressed in adolescent and adult ASD brain (Parikshak et al., 2013). We therefore asked whether early postnatal genes might also be enriched for the ASD-related acetylation changes we detected in older subjects (≥10 years old). Using a database of human RNA-seq profiles (BrainSpan, 2015), we defined the 2,000 genes most upregulated at each developmental stage (fold change relative to median expression across all stages). We then tested for enrichment of DA peaks near each such gene set. This analysis was performed separately for PFC and TC, using expression profiles from the corresponding regions of the developing human brain. As expected, ASD-Up DA peaks in the adult (more precisely, ≥10 year) brain were significantly overrepresented near adult-upregulated genes (Figure 4). Surprisingly, however, we found even greater enrichment of ASD-Up DA peaks near genes upregulated at 10–12 months after birth, which corresponds to the stage of synapse formation and neuronal maturation. In contrast, ASD-Down DA peaks did not show stage specificity. Thus, although chromatin aberrations in ASD affect genes with a broad variety of developmental specificities, genes upregulated at or near 12 months after birth are particularly strongly associated with increased acetylation in ASD cortex.

(B) Transcription factor motifs enriched in DA peaks (motif logos not reproduced):
Peak class | Motif name | Protein | P value | Q value | Fold enrichment
PFC up | V$E4BP4_01 | PAR-BZIP (E4BP4, HLF, TEF, DBP) | 9e-8 | <1e-4 | 1.70
PFC down | V$SPIB_01 | ETS (ETS family M01204) | 2e-6 | 7e-4 | 1.34

Figure 4. (A) ASD-Up DA peaks in PFC are most significantly enriched near genes upregulated 1 year after birth. Bar height indicates enrichment Q value (FDR); numbers above bars indicate fold enrichment (Q ≤ 0.05). (B) Similar plot, TC. [Bar plots of −log10(FDR) across BrainSpan developmental stages, from 8–9 pcw to 8–40 yrs.]

Histone Acetylation QTLs in Human Brain Regions
Noncoding genetic variants that affect disease susceptibility potentially act via a gene regulatory mechanism (Boyle et al., 2012; Maurano et al., 2012). Because histone acetylation serves as a measure of gene regulatory function, such variants are also likely to influence acetylation levels. It is therefore instructive to identify histone acetylation QTLs (haQTLs), which are defined as genetic variants that correlate with population variation in histone acetylation (del Rosario et al., 2015). As we and others have previously shown (del Rosario et al., 2015; Grubert et al., 2015), haQTLs can be used to prioritize causal variants within disease-associated loci.
To identify haQTLs in the three human brain regions, we used the G-SCI pipeline that was previously validated on lymphoblastoid cell lines (del Rosario et al., 2015). The pipeline uses ChIP-seq reads to call DNA sequence variants in active regulatory regions, followed by filtering to remove low-confidence variants (STAR Methods). By analogy to exome sequencing, this stage of the pipeline can be termed “regulome sequencing.” A unique aspect of the G-SCI method is that called variants need not be explicitly genotyped. Rather, counts and base qualities of reference- and alternative-allele ChIP-seq reads are used to infer genotype likelihoods. These likelihoods are then used to compute the haQTL p value of the variant using the G-SCI test, which maximizes statistical power by combining information from peak height variability and allelic imbalance across all individuals within the cohort. In order to separate the cis effect of regulatory SNPs from more general disease effects, we first adjusted ChIP-seq peak heights by regressing out the diagnosis variable (ASD versus control). We then applied the G-SCI test to called SNPs and identified 2,000 haQTLs in each of the three brain regions (Figure 5A; Table S7). Note that these haQTLs are not specific to ASD. Rather, they represent region-specific regulatory variation in the general population.
GWAS analyses have not so far uncovered statistically significant ASD-associated variants that have been replicated in independent analysis at a genome-wide level. We therefore intersected the haQTL set with genome-wide significant (p ≤ 5e-8) variants known to be associated with shared aspects of five psychiatric disorders: schizophrenia, bipolar disorder, major depressive disorder, ASD, and attention-deficit/hyperactivity disorder (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013). While this GWAS set was too small to test for statistical enrichment near haQTLs, we did uncover two instances where brain haQTLs were strongly linked (R2 ≥ 0.8) to disease-associated variants (Table S7). Most notably, an haQTL (rs4765905) in an intron of the syndromic ASD gene CACNA1C
was linked to multiple psychiatric disease-associated SNPs within the locus (Figure 5B). Based on Hi-C data from GM12878 cells (Jin et al., 2013), the putative enhancer containing this haQTL SNP was predicted to form a long-range loop to the CACNA1C promoter, suggesting that it could exert its influence on psychiatric disease by modulating the chromatin state of CACNA1C. In addition, we intersected haQTLs with 128 SNPs associated with schizophrenia in a recent large-scale meta-analysis of schizophrenia (Ripke et al., 2014). This analysis revealed two additional haQTLs strongly linked to psychiatric disease-associated variants (Table S7). For example, we found that the haQTL SNP rs8054791 was linked to the schizophrenia-associated variant rs9922678 in an intron of GRIN2A, a glutamate receptor gene that has also been associated with ASD (Figure 5C).

DISCUSSION

Despite etiological heterogeneity, our results indicate that shared aberrations in histone acetylation are widespread in ASD cerebral cortex: over 5,000 enhancer or promoter loci were systematically shifted up or down (Figure 1B). The fact that histone acetylation changes were broadly similar between PFC and TC indicates similarity in ASD mechanisms across cortical regions and also suggests that our results on differential acetylation are unlikely to represent methodological artifacts. Note that, as expected for a complex disorder with highly heterogeneous etiology, this global signature of chromatin alteration is not shared by all ASD samples (Figure 1C). An earlier transcriptomic study revealed a similar pattern of changes shared by many, but not all, ASD cases (Voineagu et al., 2011). Nevertheless, the fact that the majority of patients conform to a single global epigenomic pattern indicates that the diverse causal mechanisms of ASD have shared downstream effects on the acetylome. In contrast to cerebral cortex, only 247 loci were found to be perturbed in cerebellum, indicating that the former is affected to a much greater degree. This disparity between ASD cerebrum and cerebellum has also been observed at the transcriptomic level (Voineagu et al., 2011). Syndromic dup15q cases showed acetylome alterations that were highly correlated with those observed in idiopathic ASD (R ≥ 0.87 in cerebral cortex), suggesting that most chromatin aberrations are shared between idiopathic ASD and this syndromic form.
To examine the genetic basis of the epigenomic aberrations detected in ASD, we tested all high-coverage SNPs within DA peaks for genetic differentiation between patients and controls (chi-square test). The distribution of genetic differentiation p values was close to uniform (data not shown), suggesting that genetic variation in cis SNPs is not a major contributor to case-control acetylation differences at DA peaks. It is thus likely that ASD-specific differential acetylation is driven mostly by other factors, such as environmental influences, SNPs in trans (at a different locus), indels, and larger chromosomal variants (Krumm et al., 2015).
Overall, acetylation changes in ASD cerebral cortex were significantly correlated with differential gene expression, consistent with the known functional consequences of these alterations in chromatin structure (Figure 1F). However, the majority of DA peaks did not lie next to DE genes. This is consistent with previous studies; we and others have shown that differences in chromatin state between two sample types are only moderately correlated with differential expression (Kumar et al., 2013; Yen and Kellis, 2015). Differences in the sensitivity of ChIP-seq and RNA-seq at various loci could provide one explanation for this phenomenon. For example, post-mortem RNA degradation or low steady-state mRNA levels could reduce the detectability of DE genes in some cases, while low read mappability or occlusion of the acetylated epitope could limit the sensitivity of DA peak analysis at other loci. Moreover, noise levels could vary between the mRNA and chromatin readouts at individual loci, resulting in differential statistical power. Finally, although histone acetylation and gene expression are correlated in general, post-transcriptional regulation, other histone modifications, DNA methylation status, and the influence of additional regulatory elements within the same locus could all contribute to genuine biological differences between mRNA fold change and acetylation shifts. Thus, case-control chromatin profiling could serve as a valuable complement to the more common strategy of transcriptomic profiling by highlighting novel disease mechanisms.
We found evidence for shared pathways and functional themes among DA loci in ASD cerebral cortex (Figure 2). Among loci with increased H3K27ac, there was strong enrichment for genes related to ion channels, synaptic function, and epilepsy/neuronal excitability, all of which have previously been shown to be dysregulated in this disorder (Voineagu et al., 2011; Bourgeron, 2015). Moreover, these adult DA loci were strongly enriched for genes developmentally upregulated at or around 12 months of life (Figure 4), which coincides with the peak of early experience-dependent synaptogenesis. A similar temporal enrichment has also been observed for cerebral DE genes in
ASD (Parikshak et al., 2013). Loci with decreased acetylation in ASD also converged on shared functional categories, such as digestive tract morphogenesis, chemokine signaling, HDAC activity, and immune processes related to microglia. Note that it is possible for functional categories to appear systematically enriched in DA peaks merely because of the contribution of a single highly enriched “jackpot” gene. However, our results are likely to be robust to such artifacts, because we discarded functional terms that had fewer than five genes near DA peaks and then manually inspected the remaining top hits (shown in Figure 2) for jackpot effects. While the primary causes of ASD are highly heterogeneous, it appears that they nevertheless converge on shared downstream epigenomic changes associated with specific functions. It is possible that these shared chromatin alterations could in turn drive some of the shared symptoms of ASD.
[Figure 5C: GRIN2A locus (chr16: 9,940,000–9,950,000) around haQTL rs8054791; reference and non-reference read counts and peak heights for individuals with AA, AG, and GG genotypes.]
The above functional enrichments have intriguing links to ASD epidemiology and results from model organisms. In addition to the well-studied roles of synaptic, ion-channel, and glutamate-pathway genes in ASD (Schmunk and Gargus, 2013; Parikshak et al., 2015), exposure to HDAC inhibitors in utero has been linked to ASD and ASD-like symptoms in humans and social deficits in rodents (Chomiak et al., 2013; Christensen et al., 2013). HDAC suppression could thus be a common epigenomic feature of ASD. Chemokine pathway changes in ASD are also plausible. Suppression of the chemokine receptor gene CX3CR1, which flanks eight downregulated peaks in TC (p = 1.4e-8, Table 1), causes microglial activation (Wolf et al., 2013). Moreover, CX3CR1 knockout mice have two phenotypes observed in autism: impaired social interaction and increased repetitive behavior (Zhan et al., 2014). Finally, the enrichment of downregulated DA peaks near digestive tract morphogenesis genes could point to the existence of pleiotropic loci potentially contributing to the comorbidity of gastrointestinal problems with ASD (McElhanon et al., 2014).
In addition to pathway-level chromatin aberrations, we found strong enrichment of DA peaks near individual genes. The chemokine pathway genes CCL3L1/CCL3L3 (p = 3.1e-9) and CX3CR1 (p = 1.4e-8) were both among the top five genome-wide for enrichment in downregulated TC peaks (Tables 1 and S6). The top-ranked gene in the same downregulated peak list
Further information and requests for reagents may be directed to, and will be fulfilled by, the Lead Contact, Shyam Prabhakar
(prabhakars@gis.a-star.edu.sg).
Human Subjects
Brain samples from 45 ASD and 49 control individuals were acquired from the Autism Tissue Program (ATP) at the Harvard Brain
Tissue Resource Center, the University of Maryland Brain and Tissue Bank, and the Oxford Brain Bank. Sample acquisition protocols
were followed for each brain bank, and samples were de-identified prior to acquisition. Sample identities were verified by independent
genotyping to rule out sample swaps. Brain sample and individual-level metadata are provided in Table S1.
METHOD DETAILS
Motif Analysis
For motif enrichment analysis, we used the findMotifsGenome.pl script of the HOMER ChIP-seq pipeline with the “-mknown” option
(Heinz et al., 2010). Motif models were drawn from the TRANSFAC vertebrate database (Matys et al., 2006), and the analysis was
performed separately on Up and Down DA peaks from each of the three brain regions (six DA peak sets in total), with all peaks from
the same brain region as background. Motifs were classified as enriched based on fold enrichment (≥1.3), FDR (≤0.01), and number
of foreground peaks with a motif match (≥20). The list of enriched motifs was almost identical when we used the JASPAR
database (Mathelier et al., 2016) instead of TRANSFAC (data not shown).
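The three thresholds used to call enriched motifs can be expressed as a single filter. This minimal sketch assumes the enrichment results have already been parsed into records; the `MotifHit` field names are hypothetical, and parsing of HOMER's output files is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MotifHit:
    """One row of a known-motif enrichment table (hypothetical field names)."""
    name: str
    fold_enrichment: float  # foreground match rate / background match rate
    fdr: float              # multiple-testing-corrected q value
    n_foreground: int       # foreground peaks with at least one motif match


def enriched_motifs(hits: Iterable[MotifHit],
                    min_fold: float = 1.3,
                    max_fdr: float = 0.01,
                    min_peaks: int = 20) -> List[MotifHit]:
    """Keep motifs that pass all three thresholds from the Motif Analysis text."""
    return [h for h in hits
            if h.fold_enrichment >= min_fold
            and h.fdr <= max_fdr
            and h.n_foreground >= min_peaks]
```

Because a motif must pass all three cutoffs, a strongly enriched motif matched in only a handful of peaks is still excluded by the peak-count threshold.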
SNP-Calling Pipeline
ChIP-seq reads were aggregated across all three brain regions for each individual and then passed to the multi-sample SNP-calling
pipeline. Reads used for SNP calling were de-duplicated and retained only if they were mapped to the genome in the correct orientation.
We performed indel realignment, base-quality-score recalibration, and SNP calling using GATK version 3.2-2 (DePristo et al.,
2011). Using GATK’s HaplotypeCaller at a SNP quality threshold of 50, we called 1,297,168 SNPs within peaks across all three brain regions.
SNP calls were then removed if they met any of the following criteria: MQ0Fraction > 0.001, QD < 4.3, within 6 bp of an indel, more
than seven SNPs within a 100-bp region, Mapping Quality < 45, Homopolymer Run > 10, MQ0 > 9.5, Dels > 0.255. Moreover, only
SNP calls covered by at least 5 non-reference reads across all libraries and 3 or more non-reference reads in at least one library were
retained. SNPs that violated Hardy-Weinberg equilibrium (binomial test, P-value threshold 1 × 10⁻³) were discarded. To eliminate mapping
artifacts, SNPs in highly paralogous regions of the genome, as indicated by the “Self Chain” track on the UCSC Genome Browser (Kent
et al., 2002) (normalized score ≥ 90), were filtered out. Finally, we obtained a high-confidence set of 821,606 SNPs within PFC and TC peaks and
560,972 SNPs within CB peaks. Note that we did not perform genotype calling, since the G-SCI test does not
require prior knowledge of genotypes. Rather, it integrates over the likelihoods of all three genotypes for each individual, given the
data (del Rosario et al., 2015).
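To illustrate what integrating over genotype likelihoods means, the sketch below computes, for one individual at one SNP, the likelihood of each genotype from reference and alternative reads and their base qualities, a posterior under a Hardy-Weinberg prior, and the expected alternative-allele dosage. It uses a standard simplified read model and assumes the population allele frequency is known. It is not the G-SCI test itself, whose statistic is defined in del Rosario et al. (2015).

```python
import math
from typing import Dict, List, Sequence, Tuple

# Genotypes paired with the number of alternative alleles they carry.
GENOTYPES: List[Tuple[str, int]] = [("RR", 0), ("RA", 1), ("AA", 2)]
Read = Tuple[bool, int]  # (read shows the alternative allele?, Phred base quality)


def genotype_log_likelihoods(reads: Sequence[Read]) -> Dict[str, float]:
    """log P(reads | genotype) for one individual at one biallelic SNP.

    Each read comes from the alternative haplotype with probability
    n_alt / 2; its base is miscalled with probability e = 10^(-Q/10)
    (two-allele simplification: an error always shows the other allele).
    """
    out: Dict[str, float] = {}
    for name, n_alt in GENOTYPES:
        frac_alt = n_alt / 2
        ll = 0.0
        for is_alt, qual in reads:
            # very low qualities are capped at e = 0.5, i.e. an uninformative read
            e = min(10 ** (-qual / 10), 0.5)
            p_alt = frac_alt * (1 - e) + (1 - frac_alt) * e
            ll += math.log(p_alt if is_alt else 1 - p_alt)
        out[name] = ll
    return out


def genotype_posteriors(reads: Sequence[Read], alt_freq: float) -> Dict[str, float]:
    """Posterior over the three genotypes under a Hardy-Weinberg prior."""
    if not 0 < alt_freq < 1:
        raise ValueError("alt_freq must lie strictly between 0 and 1")
    prior = {"RR": (1 - alt_freq) ** 2,
             "RA": 2 * alt_freq * (1 - alt_freq),
             "AA": alt_freq ** 2}
    logs = {g: ll + math.log(prior[g])
            for g, ll in genotype_log_likelihoods(reads).items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {g: math.exp(v - top) for g, v in logs.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def expected_dosage(reads: Sequence[Read], alt_freq: float) -> float:
    """Alternative-allele dosage averaged over genotypes instead of a hard call."""
    post = genotype_posteriors(reads, alt_freq)
    return sum(post[name] * n_alt for name, n_alt in GENOTYPES)
```

A downstream association test can use the expected dosage (or the full posterior) in place of a called genotype, so individuals with few reads contribute uncertain rather than wrong genotypes.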
haQTL Calling
haQTLs were called in the 84 Caucasian samples using the G-SCI test (del Rosario et al., 2015). Diagnosis status and the top principal components (PCs)
accounting for more than 5% of the variance were regressed out of peak heights before haQTL calling. We then performed the G-SCI test
on each of the 821,606 SNPs within PFC and TC peaks and on each of the 560,972 SNPs within CB peaks. For each SNP, an
Data Resources
The accession number for the ChIP-seq data reported in this paper is Synapse: syn4587616.
[Figure S1 panels (A) PFC, (B) TC, (C) CB: mean GC distribution, fraction of reads versus GC content (%).]
Figure S1. GC Content Distribution of Samples in Three Brain Regions, Related to Figure 1
(A) GC content distributions of 81 samples in PFC were normalized to the mean GC distribution in PFC.
(B) GC content distributions of 66 samples in TC were normalized to the mean GC distribution in TC.
(C) GC content distributions of 62 samples in CB were normalized to the mean GC distribution in CB.
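One simple way to normalize each sample's GC content distribution to the regional mean, as the legend describes, is to reweight reads by GC bin. This sketch assumes reads have already been assigned to GC-content bins; the reweighting scheme is an illustration and may differ from the normalization given in STAR Methods.

```python
from typing import List, Sequence


def normalise(hist: Sequence[float]) -> List[float]:
    """Turn per-GC-bin read counts into a distribution that sums to 1."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram has no reads")
    return [h / total for h in hist]


def mean_distribution(hists: Sequence[Sequence[float]]) -> List[float]:
    """Average of the per-sample GC distributions (the regional reference curve)."""
    dists = [normalise(h) for h in hists]
    n_bins = len(dists[0])
    return [sum(d[b] for d in dists) / len(dists) for b in range(n_bins)]


def gc_weights(sample_hist: Sequence[float], target: Sequence[float]) -> List[float]:
    """Per-bin read weights that map a sample's GC distribution onto `target`.

    Bins in which the sample has no reads cannot be rescued and get weight 0.
    """
    dist = normalise(sample_hist)
    return [t / d if d > 0 else 0.0 for d, t in zip(dist, target)]


def weighted_peak_height(read_gc_bins: Sequence[int], weights: Sequence[float]) -> float:
    """GC-corrected peak height: the sum of the weights of the reads in the peak."""
    return sum(weights[b] for b in read_gc_bins)
```

After reweighting, every sample's read distribution over GC bins matches the regional mean, so samples with GC-biased libraries no longer inflate or deflate peaks of a particular GC composition.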
Figure S2. Correlation between Top 5 Principal Components and Covariates in Three Brain Regions before and after Regression, Related to Figure 1
(A) PFC.
(B) TC.
(C) CB.
Pearson correlation coefficient is shown at each grid point. After regressing out correlated confounding factors, the top 5 PCs correlated with none of the co-
variates except diagnosis. InsertSize: fragment median insert size; Dup: percentage of duplicated reads; Reads: sequencing depth; Peaks: number of peaks;
Neuron: neuronal cell fraction.
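Regressing covariates out of peak heights amounts to replacing each peak's values across samples with residuals from an ordinary least-squares fit. The dependency-free sketch below orthogonalizes an intercept plus covariate columns by Gram-Schmidt, then checks that the residuals are uncorrelated with the covariates. Because every residual vector is orthogonal to the covariates, any linear combination of them, including principal-component sample scores, is uncorrelated with those covariates, which matches the property reported in the legend. Which covariates to pass is a choice at the call site: per the legend, diagnosis is retained for the differential acetylation analysis, whereas the haQTL Calling section regresses it out.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(columns: Sequence[Sequence[float]],
                      tol: float = 1e-12) -> List[List[float]]:
    """Modified Gram-Schmidt; (near-)collinear columns are dropped."""
    basis: List[List[float]] = []
    for col in columns:
        v = [float(x) for x in col]
        for q in basis:
            c = _dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(_dot(v, v))
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return basis


def regress_out(y: Sequence[float], covariates: Sequence[Sequence[float]]) -> List[float]:
    """Residuals of y (one peak across samples) after OLS on intercept + covariates."""
    n = len(y)
    basis = orthonormal_basis([[1.0] * n] + [list(c) for c in covariates])
    r = [float(v) for v in y]
    for q in basis:
        c = _dot(q, r)  # remove the component of r along each basis vector
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    den = math.sqrt(_dot(da, da) * _dot(db, db))
    return _dot(da, db) / den if den > 0 else 0.0
```

In practice `regress_out` would be applied to every row of the peak-by-sample matrix before computing principal components.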
[Figure S3 panels (A) PFC, (B) TC, (C) CB: median distance of each sample to ASD samples versus median distance to control samples.]
[Figure S4 panels: acetylation log2 fold change computed with all samples (y axis) versus typical samples only (x axis); (B) TC, R = 0.90, P ≈ 0; (C) CB, R = 0.98, P ≈ 0.]
Figure S4. Acetylation Fold Change between ASD and Control, Calculated Using All Samples Displayed on the Y Axis or Using Only Typical
Samples Displayed on the X Axis, Related to Figure 1
(A) PFC. The P-value of the fold-change correlation was calculated assuming a t-distributed Pearson correlation coefficient.
(B) Similar plot, TC.
(C) Similar plot, CB.
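The legend states that the p value of each fold-change correlation assumes a t-distributed Pearson correlation coefficient. This sketch implements that standard test, t = R·sqrt((n − 2)/(1 − R²)) with n − 2 degrees of freedom, evaluating the two-sided t tail through the regularized incomplete beta function so that no statistics library is needed. The helper names are illustrative.

```python
import math
from typing import Sequence, Tuple


def _beta_cf(a: float, b: float, x: float,
             max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the recurrence
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the recurrence
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p value of Student's t: I_{df/(df+t^2)}(df/2, 1/2)."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def pearson_r_and_p(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Pearson R and its two-sided p value under the t approximation."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    r = max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))
    if 1.0 - abs(r) < 1e-12:
        return r, 0.0  # perfect correlation: t is infinite
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return r, t_two_sided_p(t, df)
```

With thousands of DA peaks, n − 2 is very large, so even moderate correlations give p values that underflow toward zero, consistent with the "P ≈ 0" annotations in these panels.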
Correspondence
tomi.pastinen@mcgill.ca (T.P.),
ns6@sanger.ac.uk (N.S.)
In Brief
As part of the IHEC consortium, this study
integrates genetic, epigenetic, and
transcriptomic profiling in three immune
cell types from nearly 200 people to
characterize the distinct and cooperative
contributions of diverse genomic inputs
to transcriptional variation. Explore the
Cell Press IHEC web portal at http://www.cell.com/consortium/IHEC.
1Department of Human Genetics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge
CB10 1HH, UK
2Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
3Human Genetics, McGill University, 740 Dr. Penfield, Montreal, QC H3A 0G1, Canada
4European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge
CB10 1SD, UK
5Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology,
Cell 167, 1398–1414, November 17, 2016 © 2016 The Authors. Published by Elsevier Inc.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Irina Colgiu,17 Frederik O. Bagger,2,4,18 Paul Flicek,9 Ehsan Habibi,15 Valentina Iotchkova,1,11 Eva Janssen-Megens,15
Bowon Kim,15 Hans Lehrach,14 Ernesto Lowy,9 Amit Mandoli,15 Filomena Matarese,15 Matthew T. Maurano,19
John A. Morris,3 Vera Pancaldi,7 Farzin Pourfarzad,20 Karola Rehnstrom,2,18 Augusto Rendon,2,21 Thomas Risch,14
Nilofar Sharifi,15 Marie-Michelle Simon,3 Marc Sultan,14 Alfonso Valencia,7 Klaudia Walter,1 Shuang-Yin Wang,15
Mattia Frontini,2,18,22 Stylianos E. Antonarakis,12 Laura Clarke,9 Marie-Laure Yaspo,14 Stephan Beck,8 Roderic Guigo,5,6,23
Daniel Rico,7,24 Joost H.A. Martens,15 Willem H. Ouwehand,1,2,18,22,25 Taco W. Kuijpers,2,20,26 Dirk S. Paul,8,27
Hendrik G. Stunnenberg,15 Oliver Stegle,4 Kate Downes,2,18 Tomi Pastinen,3,* and Nicole Soranzo1,2,22,25,29,*
7Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernandez Almagro, 3,
Madrid 28029, Spain
8UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
9Vertebrate Genomics, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus,
the Netherlands
16Molecular Developmental Biology, Radboud Institute for Life Sciences, Radboud University, P.O. Box 9101, Nijmegen 6500 HB, the
Netherlands
17Human Genetics Informatics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK
18National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
19Institute for Systems Genetics, New York University Langone Medical Center, ACLS West, Room 511, 430 East 29th Street, New York,
NY 10016, USA
20Blood Cell Research, Sanquin Research and Landsteiner Laboratory, Plesmanlaan 125, Amsterdam 1066CX, the Netherlands
21Bioinformatics, Genomics England, Charterhouse Square, London EC1M 6BQ, UK
22British Heart Foundation Centre of Excellence, Division of Cardiovascular Medicine, Addenbrooke’s Hospital, Hills Road,
Cambridge, Strangeways Research Laboratory, University of Cambridge, Wort’s Causeway, Cambridge CB1 8RN, UK
26Emma Children’s Hospital, Academic Medical Center (AMC), University of Amsterdam, Location H7-230, Meibergdreef 9,
disease-associated genetic variants may alter expression levels through changes in chromatin state (Grubert et al., 2015; Waszak et al., 2015). Extending these integrated investigations to primary human cells in disease-relevant contexts is the necessary next step to unravel the cell- and context-specific regulatory effects of complex disease variants.
Here, we report an integrated analysis of genetic, epigenetic, and transcriptomic datasets in three major cell types of the human immune system, namely CD14+ monocytes, CD16+ neutrophils, and CD4+ naive T cells. Monocytes contribute to maintenance of the resident macrophage pool under steady-state conditions, and they migrate to sites of infection in the tissues, where they divide and differentiate into macrophages and dendritic cells to elicit an immune response. Neutrophil granulocytes (neutrophils) are primary blood cells of the innate immune and inflammatory response system that form a first line of organismal response to bacterial and fungal infection, migrating within minutes to sites of infection, attracted by local tissue factors and resident macrophages during the acute phase of inflammation. Finally, CD4+ naive T cells are part of the adaptive immune response system, representing mature helper T cells that have not yet encountered their cognate antigen.
We generated high-resolution whole-genome sequence, transcriptome, DNA methylation, and histone modification datasets in up to 197 individuals selected from a population-based sample and applied variance decomposition, QTL, and allelic imbalance analyses to investigate genetic and epigenetic influences on transcription and RNA splicing in the three primary immune cells. We demonstrate colocalization of molecular trait QTLs with 345 unique genetic variants predisposing to seven human autoimmune diseases, involving all data layers. Overall, the data and results deepen our understanding of genetic and epigenetic regulation of the transcriptional machinery in three primary cells of the immune system and inform the formulation and testing of functional hypotheses for human complex disease.
[Figure residue (study design): monocytes (CD14+ CD16−) and T cells (CD4+ CD45RA+), N = 200; molecular layers methylation, H3K4me1, H3K27ac, and splicing events (alternative 3′ acceptor, alternative 5′ donor); methylation M value (logit of beta value) by annotation (intergenic, TSS1500, TSS200, UTR, 1st exon, gene body).]
RESULTS
Study Design
As part of the BLUEPRINT epigenome project, we recruited an initial set of 200 blood donors from a local blood donor population, ascertained to be free of disease and representative of the United Kingdom (UK) population at large (54% females, mean age 55 years) (Figure 1; Table S1). We used a multi-step purification strategy (Figure S1) to isolate, for each donor, cell subsets corresponding to classical monocytes (CD14+CD16−) and neutrophils (CD66b+CD16+). Subsequently, through a collaboration with the Epigenome Mapping Centre at McGill University, we were able to extend the study to a third cell type ("phenotypically naive" CD4+CD45RA+ T cells, henceforth referred to as CD4+ T cells or T cells for simplicity) for 169 of the 200 donors.
For each individual, we performed whole-genome sequencing (WGS) (mean read depth, approximately 7×) (Figure S2; Table S1) and profiled the transcriptome (RNA sequencing [RNA-seq] at 80 million reads per sample) (Figure S3), genome-wide DNA methylation (Illumina 450K arrays) (Figure S2), and two histone modification marks for active and poised enhancers and active promoters (H3K4me1 and H3K27ac; chromatin immunoprecipitation sequencing [ChIP-seq] at ≥30 million reads per sample) (Figure S4). Molecular assays for monocytes and neutrophils were distributed across four laboratories, and assays on T cells were done at McGill (Figure S1). We carefully assessed and adjusted for possible sequencing artifacts arising from differences in protocols between centers, applying stringent quality filters where needed. We confirmed that our approach avoided significant center effects by profiling a subset of the same individuals at each respective center ("cross-over experiments") (Figures S1, S2, S3, and S4; Table S2). Overall, the project generated 116,310 million QC-pass reads across all datasets, with 80% of donors passing ten or more assays and 56 donors having complete data across all cell types and molecular assays (Table S1).
Decomposition of Transcriptional Variance into Genetic and Epigenetic Components
Matched genetic, epigenetic, and gene expression profiles from multiple donors in this study provide a unique opportunity to characterize the relationship between hierarchies of gene regulation and how these regulatory links ultimately affect human phenotypic variation. Detailed understanding of this relationship is necessary for the correct interpretation of the contribution of epigenetic variation to organismal traits and disease.
We first sought to quantify the relative impact of genetic and epigenetic factors on transcriptional variance. Associations between epigenetic and RNA traits may arise from two potential causes: (1) local epigenetic changes that correlate with RNA level but are themselves due to DNA sequence variation (Figure 2A), and (2) epigenetic changes that are correlated with RNA level and not associated with cis genetic variation.
To quantify the relative contribution of genetic and epigenetic
[Figure residue: counts by gene biotype (protein-coding, pseudogene, lincRNA, antisense, processed transcript, other non-coding); % cell-specific QTLs and effect-size correlations for gene, methylation (Meth), H3K27ac, and H3K4me1 QTLs in monocytes (M), neutrophils (N), and T cells (T), with pairwise comparisons MvN, MvT, and NvT; fold enrichment by chromatin state (Transcription, H3K36me3 high/low; Heterochromatin, H3K9me3; Repressed Polycomb, H3K27me3 high/low; Low signal).]
(C) Percentage of phenotypes that are cell-type-specific (top) and genome-wide patterns of QTL sharing (π1 statistics) among cell types (bottom).
(D) Correlation (Pearson) between effect sizes for QTLs shared between different cell types.
(E) Percentage of eSNPs also associated (r² ≥ 0.8) with H3K27ac and H3K4me1 (left) or methylation levels (right).
(F) Correlation (Pearson) between effect size of expression and other molecular trait QTLs at overlapping signals (LD r² ≥ 0.8).
(G) Fold enrichment of eQTLs, hQTLs, and meQTLs in different chromatin segmentation states.
See also Figures S2, S3, and S4 and Tables S2, S3, and S4.
[Figure residue: proportion of SNPs by measured allelic ratio (<1.5× to >9.0×) for AS+/QTL categories of H3K4me1 and H3K27ac; allelic correlations between RNA-seq and H3K27ac/H3K4me1 A/(A+B) allele ratios; locus plots (hg19) with RNA-seq eQTL and H3K27ac CHT mapping results and NHGRI GWAS Catalog SNPs at the ARID5B, IL2RA, and MTAP loci.]
deviated strongly from the null distribution (χ² = 71, 2 degrees of freedom [df]) (based on orientation of genes tested), where strand sharing was more common (+12%) and the "tail-to-tail" configuration was depleted (−31%). This can partly be explained by an overall 5′ bias of regulatory variants giving rise to, for example, bi-directional promoter variants (Figure 6A, top). We also identified strong locally correlated allelic effects spanning multiple independent annotations, extending to the chromatin layer and to multiple genes on both strands (Figure 6B, middle). Perhaps the most intriguing associations were those where only one of the transcripts in the pair showed an allelic expression effect with a mapped local QTL, whereas other transcripts showed equal allelic expression for the same eSNP (Figure 6C, bottom). This could indicate local trans-acting activity of the verified cis variant. This hypothesis was supported in follow-up analyses, where we tested 342 "cis-eQTLs" showing potential local cis and trans effects (the latter showing no allelic bias despite high allelic informativity) for genome-wide trans-associations and compared them to a control set of 678 lead eSNPs (matched by mapping significance and distance from TSS). The candidate local
Further information and requests for reagents may be directed to the corresponding author and lead contact, Nicole Soranzo (ns6@sanger.ac.uk).
Human Subjects
Blood was obtained from donors who were members of the NIHR Cambridge BioResource (http://www.cambridgebioresource.org.uk/) with informed consent (REC 12/EE/0040) at NHS Blood and Transplant, Cambridge. Details of donor characteristics (gender, past and present smoking status, and age bin), identification (ID) code, and donation date are listed in Table S1. Blood collection is described in the STAR Methods.
METHOD DETAILS
BAM Processing. After BAM files were created from the sequenced lanes, base qualities were recalibrated (Abnizova et al., 2010) and reads were mapped to the human reference genome (GRCh37/hg19) with BWA. BAM files were sorted and duplicates were marked using Picard (v1.98). BAMs were then realigned around known and discovered INDELs using GATK (v3.4) (DePristo et al., 2011) and recalibrated with GATK.
Variant Calling. SNP and INDEL calls were made using SAMtools/bcftools (Li, 2011) by pooling the alignments from 200 individual low-coverage BAM files. Genotype likelihood files (BCF) covering all samples and all sites were created with SAMtools mpileup on chunked chromosomes. The resulting VCFs were merged, and Variant Quality Score Recalibration (VQSR) (DePristo et al., 2011) was performed on the chunks, independently for SNPs and INDELs. GATK was run independently for SNPs and INDELs, producing a VCF file containing variant quality score log-odds ratio (VQSLOD) scores for each site. The VQSR filter was then applied to the SAMtools calls.
Variant Quality Control and Filtering. We filtered out INDELs within 10 bp of another INDEL and SNPs within 3 bp of an INDEL. Additionally, variants were filtered out if their VQSLOD score was below the score necessary to recover 96% of truth sites. For SNPs this cut-off was a minimum VQSLOD score of 1.0078, and for INDELs a score of 0.91. Missing and low-confidence genotypes in the filtered VCFs were filled in with BEAGLE r1398 (Browning and Browning, 2007). Additional filtering was then applied to generate a final dataset containing variants with (i) allelic R-squared (AR2) ≥ 0.8 (AR2 is the estimated squared correlation between the most likely allele dosage and the true allele dosage); (ii) Hardy-Weinberg equilibrium (HWE) p ≥ 1 × 10⁻³; and (iii) allele count (AC) > 4.
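The site-level and post-imputation filters above can be expressed as a single keep/drop predicate. The following is a minimal Python sketch: the `Variant` record and its field names are hypothetical, distances to neighboring INDELs are assumed to be precomputed rather than parsed from a VCF, and "within 10 bp" is read as inclusive (an assumption).

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds as stated in the Methods text.
SNP_VQSLOD_MIN = 1.0078    # VQSLOD cut-off recovering 96% of truth sites (SNPs)
INDEL_VQSLOD_MIN = 0.91    # same cut-off for INDELs
AR2_MIN = 0.8              # BEAGLE allelic R-squared
HWE_P_MIN = 1e-3           # Hardy-Weinberg equilibrium p value
AC_MIN_EXCLUSIVE = 4       # allele count must be > 4


@dataclass
class Variant:
    """Hypothetical per-site record; field names are illustrative only."""
    is_indel: bool
    vqslod: float
    dist_to_nearest_indel: Optional[int]  # bp to nearest *other* INDEL; None if none
    ar2: float
    hwe_p: float
    allele_count: int


def passes_site_filters(v: Variant) -> bool:
    """Pre-imputation filters: INDEL proximity and VQSLOD."""
    d = v.dist_to_nearest_indel
    if d is not None:
        # Assumption: "within N bp" is treated as an inclusive distance.
        if v.is_indel and d <= 10:
            return False
        if not v.is_indel and d <= 3:
            return False
    cutoff = INDEL_VQSLOD_MIN if v.is_indel else SNP_VQSLOD_MIN
    return v.vqslod >= cutoff


def passes_final_filters(v: Variant) -> bool:
    """Post-imputation filters: AR2, HWE and allele count."""
    return (v.ar2 >= AR2_MIN
            and v.hwe_p >= HWE_P_MIN
            and v.allele_count > AC_MIN_EXCLUSIVE)


def keep(v: Variant) -> bool:
    return passes_site_filters(v) and passes_final_filters(v)
```

In practice these filters would be applied in two passes, site filters before BEAGLE imputation and the AR2/HWE/AC filters afterwards; combining them in `keep` is only for illustration.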
Data QC. A set of 154,222 robustly QC'd autosomal SNPs, extracted from a total of 7,009,917, was used for sample quality control: principal components analysis (PCA) to identify ethnic outliers and identity-by-descent (IBD) analysis to identify duplicate samples. The SNPs used for sample quality control were bi-allelic variants with minor allele frequency (MAF) ≥ 0.05, Hardy-Weinberg p value ≥ 10⁻⁴, and genotype missingness < 3%. In addition, a pairwise r² threshold of 0.2 was used to select unlinked SNPs, using the indep-pairwise function of PLINK v1.9 (Purcell et al., 2007) with a moving window of 1000bp. Ethnicity was evaluated by merging the BLUEPRINT samples with the 14 populations present in the 1000 Genomes Project data. PCA was performed and the first three principal components were plotted to identify possible ethnic outliers (see Figure S2A). A threshold of 0.018 on PC2 scores was used to separate samples of European origin (GBR, CEU, TSI, FIN, IBS) from the rest. In total, 3 outliers were identified and excluded as being of mixed ethnic origin. The proportion of alleles shared IBD was estimated pairwise for all samples using the PLINK method-of-moments function. The probability of sharing zero alleles IBD (Z0) was between 0.91 and 1 for all pairwise estimates; therefore, all individuals in the data were considered unrelated. Other metrics for the complete variant call set, such as the number of variants per sample and allele frequency, as well as depth of coverage and Ts/Tv ratio, are shown in Figures S2B–S2H.
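The LD-pruning step can be illustrated with a greedy sliding-window procedure. This numpy sketch is a simplified analogue of PLINK's `--indep-pairwise`, not its exact algorithm: the window is measured in base pairs on one chromosome, and when a pair exceeds the r² threshold the later variant is dropped (PLINK's own tie-breaking differs).

```python
import numpy as np


def ld_prune(dosages: np.ndarray, positions: np.ndarray,
             window_bp: int = 1000, r2_max: float = 0.2) -> np.ndarray:
    """Greedy LD pruning (simplified analogue of PLINK --indep-pairwise).

    dosages: (n_samples, n_variants) allele dosages, variants sorted by position.
    positions: (n_variants,) base-pair coordinates on one chromosome.
    Returns a boolean mask of retained variants.
    """
    n_samples, n_var = dosages.shape
    keep = np.ones(n_var, dtype=bool)
    # Standardize each variant; monomorphic variants become zero vectors.
    centered = dosages - dosages.mean(axis=0)
    sd = centered.std(axis=0)
    z = np.divide(centered, sd,
                  out=np.zeros_like(centered, dtype=float), where=sd > 0)
    for j in range(n_var):
        if not keep[j]:
            continue
        # Compare variant j with later, still-retained variants in the window.
        k = j + 1
        while k < n_var and positions[k] - positions[j] <= window_bp:
            if keep[k]:
                r = float(z[:, j] @ z[:, k]) / n_samples  # Pearson correlation
                if r * r > r2_max:
                    keep[k] = False  # assumption: drop the later variant
            k += 1
    return keep
```

The PC2-based ancestry threshold and the IBD check would then be run on the variants retained by this mask.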
RNA-Sequencing Sample Preparation
RNA sequencing (RNA-seq) sample preparation and library creation at McGill University (naive CD4+ T cells) and the Max Planck Institute for Molecular Genetics (MPIMG; monocytes and neutrophils) were performed using identical methods. Following purification, cells were lysed in TRIZOL reagent (Life Technologies) at a concentration of approximately 2.5 million cells/ml. RNA was extracted according to the manufacturer's instructions, resuspended in ultra-pure water, and quantified (Qubit, Invitrogen) prior to library preparation.
Data Generation
Library preparation. Sequencing libraries were prepared from 200 ng RNA using an Illumina TruSeq Stranded Total RNA Kit with Ribo-Zero Gold (Illumina). Adaptor-ligated libraries were amplified and indexed via PCR.
RNA Sequencing. For monocytes and neutrophils, up to six libraries were multiplexed per lane and sequenced at MPIMG using 100 bp single-end (SE) protocols according to the manufacturer's instructions (V3 chemistry, HiSeq 2000, Illumina). On average, each sample generated 9.18 Gb of raw data (median 9.32 Gb, SD 1.15 Gb). For naive CD4+ T cells, libraries were prepared in the same way and sequenced at McGill University using 100 bp paired-end (PE) reads, generating on average 11.74 Gb of raw data (median 10.83 Gb, SD 3.38 Gb).
Data Processing
Pre-alignment QC. Prior to alignment, reads from each RNA-seq library were subjected to a quality control step using FastQC (v0.10.1); outliers identified on the basis of duplication rates and gene coverage were discarded from further analysis. Reads from monocytes, neutrophils, and naive CD4+ T cells were trimmed of both PCR and sequencing adapters using Trim Galore (v0.32).
Alignment. Trimmed reads were aligned to the human genome using STAR (v2.4.0k) (Dobin et al., 2013). STAR default settings were used because they are optimized for 100 bp reads in human. For STAR runs, annotated splice junctions retrieved from GENCODE 15 were used to guide the alignment step.
Quantification of Gene Expression
To quantify and normalize gene expression, we used DESeq2 (v1.4.5) (Love et al., 2014) to obtain read counts for each gene annotated in GENCODE 15.
The majority of the neutrophil samples were immunoprecipitated at WTSI but sequenced independently at WTSI and NCMLS. For these samples only, we aligned each raw FASTQ file from the different sequencing centers to the reference genome and merged the aligned BAMs into a single BAM per neutrophil sample. For MACS2 peak calling on these merged samples, we used the WTSI ChIP input, as all of these samples were immunoprecipitated at WTSI. For 55 T cell H3K4me1 donors, we merged the aligned BAMs of duplicates from the same donor to amplify the signal, as a single BAM alone had poor amplification for these donors. For a complete overview of data production, refer to Figure S1.
Data Quality. We removed ChIP samples that had a fraction of reads in peaks (FRiP) score < 0.01, relative strand correlation (RSC) < 0.8, and normalized strand correlation (NSC) < 1.05. FRiP was calculated using the reference peak set generated as described in the next section. We identified highly successful ChIPs as those with FRiP > 0.01, RSC > 0.8, and NSC > 1.05. For the remaining samples, we used genome browser tracks to confirm ChIP quality visually before including them in the final dataset. Figure S4 shows quality control metrics and the corresponding principal components, with no batch effects after PEER correction using K = 10 factors.
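Our reading of the three-way ChIP QC rule above can be written as a minimal Python sketch. FRiP is the fraction of reads falling in peaks; RSC and NSC are assumed to be precomputed (for example, by a strand cross-correlation tool). The function names and the "remove"/"pass"/"inspect" labels are hypothetical.

```python
def frip(reads_in_peaks: int, total_reads: int) -> float:
    """Fraction of reads in peaks (FRiP)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_in_peaks / total_reads


def classify_chip(frip_score: float, rsc: float, nsc: float) -> str:
    """Return 'remove', 'pass', or 'inspect' for one ChIP-seq sample.

    Assumption: the text's "and" is conjunctive, so removal requires all
    three metrics below threshold, a clear pass requires all three above,
    and everything in between is checked visually in a genome browser.
    """
    if frip_score < 0.01 and rsc < 0.8 and nsc < 1.05:
        return "remove"
    if frip_score > 0.01 and rsc > 0.8 and nsc > 1.05:
        return "pass"
    return "inspect"
```

For example, a sample with FRiP 0.05, RSC 0.7, and NSC 1.10 falls into the "inspect" group under this reading.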
Normalized Read Count in the Reference Peak Set. For each histone modification marker, we generated one reference peak set for all cell types to provide an unbiased cross-cell comparison of peak-based counts. For each marker, we took the union of significant peaks (1% FDR) across all donors and all three cell types, merged overlapping regions (BEDOPS --merge, v2.4.14), and removed peaks found within ENCODE blacklisted regions. This process created one reference peak set per histone modification marker. Note that the merging process introduces some very wide peaks (≥100 kb), but these make up a small proportion: less than 1% and 5% for H3K27ac and H3K4me1, respectively. The reference peak set was further filtered on read counts as described below.
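The union-and-merge step that produces the reference peak set can be sketched as a standard sort-and-sweep interval merge, similar in spirit to `bedops --merge`. This Python sketch assumes half-open BED coordinates on a single chromosome, merges intervals that touch as well as those that overlap, and treats blacklist removal as dropping any merged peak that overlaps a blacklisted interval; all three choices are assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def merge_intervals(peaks: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals after sorting by start."""
    merged: List[Interval] = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous merged interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def reference_peak_set(per_donor_peaks: List[List[Interval]],
                       blacklist: List[Interval]) -> List[Interval]:
    """Union peaks across donors and cell types, merge, drop blacklisted peaks."""
    union = [p for peaks in per_donor_peaks for p in peaks]
    return [p for p in merge_intervals(union)
            if not any(overlaps(p, b) for b in blacklist)]
```

The very wide peaks mentioned in the text arise naturally in this scheme: a chain of partially overlapping peaks from different donors collapses into one long interval.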
Next, we generated a ChIP-seq quantification signal for each donor. Here, we considered only read counts under the peaks, as regions outside peaks are more likely to reflect noise or background signal than true ChIP enrichment. For each donor, we generated a vector of log2 reads per million (log2RPM) per peak in the reference peak set by counting the reads overlapping each peak (BEDOPS bedmap --count) and normalizing the counts by the total number of reads in the library.
Because a single reference peak set is used for all three cell types, some peaks have no signal in one cell type but high signal in another. Hence, for the QTL association analysis carried out per cell type and for any downstream cell-specific analyses, we further filtered the reference peak set to peaks with log2RPM > 0 in at least 50% of the donors in a given cell type, corrected for 10 PEER factors, and applied quantile normalization across donors.
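A minimal numpy sketch of the per-donor log2RPM computation and the per-cell-type peak filter described above. It assumes a peaks-by-donors count matrix has already been produced (for example, with `bedmap --count`) and adds no pseudocount (an assumption), so zero counts become -inf and simply fail the log2RPM > 0 test. PEER correction and quantile normalization are not shown.

```python
import numpy as np


def log2_rpm(counts: np.ndarray, library_sizes: np.ndarray) -> np.ndarray:
    """log2 reads per million.

    counts: (n_peaks, n_donors) reads under each reference peak.
    library_sizes: (n_donors,) total reads per library.
    Zero counts map to -inf (no pseudocount is added).
    """
    rpm = counts / library_sizes[np.newaxis, :] * 1e6
    with np.errstate(divide="ignore"):
        return np.log2(rpm)


def peaks_to_keep(log2rpm: np.ndarray, min_fraction: float = 0.5) -> np.ndarray:
    """Mask of peaks with log2RPM > 0 in at least `min_fraction` of donors."""
    return (log2rpm > 0).mean(axis=1) >= min_fraction
```

Since log2RPM > 0 is equivalent to RPM > 1, the filter keeps peaks with more than one read per million in at least half of the donors of a cell type.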
Additional Quality Control to Estimate Cross-Center and Cross-Sample Identity
Batch Correction. Within the study, sequencing data were generated at different sequencing centers (Figure S1). We performed the following steps to correct for possible batch effects.
For RNA-seq gene-level quantification, we first quantified gene expression with DESeq2 as read counts for single-end RNA-seq samples and fragment (pair) counts for paired-end RNA-seq samples. Differences in sequencing depth between samples were then corrected using DESeq2 library size factors. We used 15 cross-over samples to assess the impact of the different sequencing protocols, specifically how the quantifications of single-end and paired-end samples from the same donor correlated between the two centers. By PCA, the cross-over samples deviated from the main clusters before ComBat correction and clustered within their corresponding cell types after ComBat (Figure S3A). In Table S2 and Figure S3B, we assessed the correlation in gene expression for the 15 cross-over samples at different stages of the analysis (raw data, before batch correction with ComBat, after batch correction, and finally after PEER correction). The correlation coefficient was already high for raw data (mean 0.85). ComBat further corrected the sequencing center effect and improved the correlation coefficient (mean 0.96), suggesting that the quantifications of single-end and paired-end RNA-seq were highly comparable. Lowly expressed genes tended to be less well correlated. Therefore, in the QTL analysis, we further required genes to have more than 10 reads in 50% of the samples. Finally, we applied PEER to infer and correct for 10 hidden factors.
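DESeq2's default library size factors use the median-of-ratios method (Love et al., 2014). The numpy sketch below reimplements that idea, together with the expression filter used for the QTL analysis; it is an illustration, not DESeq2 itself, and it reads "more than 10 reads in 50% of the samples" as "in at least 50%" (an assumption). ComBat and PEER steps are not shown.

```python
import numpy as np


def size_factors(counts: np.ndarray) -> np.ndarray:
    """Median-of-ratios size factors.

    counts: (n_genes, n_samples) raw read (or fragment) counts.
    Genes with a zero count in any sample are excluded from the
    geometric-mean reference.
    """
    positive = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[positive])
    log_geo_means = log_counts.mean(axis=1)           # per-gene reference
    log_ratios = log_counts - log_geo_means[:, None]  # sample vs. reference
    return np.exp(np.median(log_ratios, axis=0))


def normalize(counts: np.ndarray) -> np.ndarray:
    """Divide each sample's counts by its size factor."""
    return counts / size_factors(counts)[None, :]


def qtl_gene_mask(counts: np.ndarray, min_reads: int = 10,
                  min_fraction: float = 0.5) -> np.ndarray:
    """Genes with more than `min_reads` reads in at least `min_fraction` of samples."""
    return (counts > min_reads).mean(axis=1) >= min_fraction
```

If one library is simply a deeper copy of another (every count doubled), its size factor doubles and the normalized counts coincide, which is the property the depth correction relies on.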
For PSI quantification, we found the cross-over samples (see below) to display the greatest differences at low quantification values (PSI from 0 to 0.1), with low overall correlation in pairwise comparisons (mean 0.556). We therefore requested PSI quantification to
Statistical Analyses
Variance Component Modeling of Gene Expression
To investigate the contributions of different proximal molecular features to gene expression variability, we considered several variance component models fitted using LIMIX (Casale et al., 2015; Lippert et al., 2014):
$$y \sim \mathcal{N}\left(\mathbf{1}\mu,\ \sigma_l^2 K_l + \sigma_g^2 K_g + \sigma_h^2 K_h + \sigma_e^2 I\right), \qquad (1)$$
where $y$ denotes the gene expression profile across individuals, $\mathbf{1}\mu$ an offset term, $K_l$ a local relatedness matrix built using all features from one of the four molecular layers (genetic, methylation, H3K4me1, or H3K27ac data) within 1 Mb of the gene body, $K_g$ the realized relatedness matrix (Lee et al., 2010), $K_h$ the expression heterogeneity term, and $\sigma_e^2 I$ the noise term. Specifically, the local relatedness matrix for each feature type was estimated as a linear kernel from all cis features of that type (after standardization).
The variance parameters $\sigma_l^2$, $\sigma_g^2$, $\sigma_h^2$, and $\sigma_e^2$ were fitted using restricted maximum likelihood, independently for each of 16,549, 14,985, and 17,082 genes in monocytes, neutrophils, and naive CD4+ T cells, respectively. The log restricted marginal likelihood was optimized using a gradient-based optimization algorithm (BFGS) (Morales and Nocedal, 2011). The proportion of variance explained by individual components was then estimated analogously to classical (narrow-sense) heritability analysis (Yang et al., 2011):
$$h = \frac{\sigma_l^2}{\sigma_l^2 + \sigma_g^2 + \sigma_h^2 + \sigma_e^2}$$
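To make the kernel construction and the variance ratio concrete: a linear kernel from standardized cis features is $K_l = XX^\top/p$ for an individuals-by-features matrix $X$ with $p$ columns. The Python sketch below builds such a kernel and evaluates the proportion of variance from given variance parameters. The $1/p$ scaling (which makes the mean diagonal equal to 1) is an assumption, since LIMIX may normalize kernels differently, and the REML fit itself is not shown.

```python
import numpy as np


def linear_kernel(features: np.ndarray) -> np.ndarray:
    """Local relatedness matrix from cis features.

    features: (n_individuals, n_features). Constant features are dropped,
    the rest are standardized to mean 0 and unit variance, and
    K = X X^T / p (assumed scaling), so the mean of diag(K) is 1.
    """
    sd = features.std(axis=0)
    x = features[:, sd > 0]
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    return x @ x.T / x.shape[1]


def variance_proportion(s2_l: float, s2_g: float, s2_h: float, s2_e: float) -> float:
    """h = s2_l / (s2_l + s2_g + s2_h + s2_e), the proportion-of-variance formula."""
    return s2_l / (s2_l + s2_g + s2_h + s2_e)
```

With fitted variances of, say, 0.2, 0.1, 0.3, and 0.4 for the local, global, heterogeneity, and noise terms, the local layer would explain 20% of expression variance.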
When comparing variance component estimates of the model in (1) with a model that does not account for expression heterogeneity, we found that accounting for expression heterogeneity yielded substantially lower epigenome variance estimates, whereas the genetic variance estimates were unaffected (Figure S5J). Consequently, we used a model that accounts for expression heterogeneity in all subsequent analyses. We also considered alternative window sizes (100 kb and 1 Mb), finding that the results were most robust and the overall variance explained was slightly higher with the 1 Mb window (Figure S5K).
Accounting for Local Genetic Effects. To account for cis common genetic variation, we first corrected epigenetic features for local genetic effects. To do so, we fitted a separate variance component model for each individual epigenetic feature, using a local relatedness matrix based on all SNPs within 100 kb of the epigenetic mark. The effect of local genetic variants was estimated using the best linear unbiased predictor (BLUP), and the residuals of this model were then used as an estimate of the non-genetic component of the epigenetic marks (G-corrected marks). Additionally, we introduced a random effect in the model to account for genetic effects on gene expression from variants within 1 Mb of the gene body. Specifically, for each gene we considered the model
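$$y \sim \mathcal{N}\left(\mathbf{1}\mu,\ \sigma_l^2 K_l + \sigma_{\mathrm{geno}}^2 K_{\mathrm{geno}} + \sigma_g^2 K_g + \sigma_h^2 K_h + \sigma_e^2 I\right). \qquad (2)$$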
Here, $K_{\mathrm{geno}}$ is a local realized relatedness matrix built from all genetic variants within 1 Mb of the gene body, and $K_l$ is a local relatedness matrix built from all features of one of the three epigenetic layers (methylation, H3K4me1, or H3K27ac data) within 1 Mb of the gene body. This model was used to estimate the proportion of variance explained by methylation, H3K4me1, and H3K27ac data while accounting for underlying genetic effects.
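The G-correction described in this subsection relies on the BLUP of the local genetic effect. As a sketch under stated assumptions, for a single mark $m$ modeled as $m \sim \mathcal{N}(\mathbf{1}\mu, \sigma_{\mathrm{geno}}^2 K + \sigma_e^2 I)$, the BLUP is $\hat{u} = \sigma_{\mathrm{geno}}^2 K V^{-1}(m - \mathbf{1}\mu)$ with $V = \sigma_{\mathrm{geno}}^2 K + \sigma_e^2 I$, and the G-corrected mark is $m - \mathbf{1}\mu - \hat{u}$. The numpy code takes the variance components as known inputs (the study estimated them per feature) and estimates $\mu$ by the sample mean rather than by GLS; the function name is hypothetical.

```python
import numpy as np


def g_correct(mark: np.ndarray, k_geno: np.ndarray,
              s2_geno: float, s2_e: float) -> np.ndarray:
    """Remove the local genetic BLUP from one epigenetic mark.

    mark: (n,) epigenetic feature across individuals.
    k_geno: (n, n) local relatedness from SNPs near the mark.
    s2_geno, s2_e: variance components, assumed already estimated.
    Returns mark - mean - BLUP(local genetic effect).
    """
    n = mark.shape[0]
    centered = mark - mark.mean()  # simplification: sample mean instead of GLS
    v = s2_geno * k_geno + s2_e * np.eye(n)
    # BLUP of the random genetic effect: s2_geno * K @ V^{-1} (m - mu)
    u_hat = s2_geno * k_geno @ np.linalg.solve(v, centered)
    return centered - u_hat
```

With no genetic variance the mark is returned unchanged (up to centering), and with an identity kernel and equal variance components exactly half of the centered signal is attributed to local genetics.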
The cumulative distribution of the proportion of variance explained by local genetics (model (1)) and by each of the three epigenetic layers, either accounting (model (2)) or not accounting (model (1)) for local genetic effects, is shown in Figure 2B for monocytes, Figure S5A for neutrophils, and Figure S5B for T cells.
Joint Variance Component Model.
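For each gene, we also considered variance component estimates obtained from a joint model across all four molecular layers (genetics, methylation, H3K4me1, and H3K27ac),
$$y \sim \mathcal{N}\left(\mathbf{1}\mu,\ \sigma_{\mathrm{geno}}^2 K_{\mathrm{geno}} + \sigma_{\mathrm{meth}}^2 K_{\mathrm{meth}} + \sigma_{\mathrm{K4me1}}^2 K_{\mathrm{K4me1}} + \sigma_{\mathrm{K27ac}}^2 K_{\mathrm{K27ac}} + \sigma_g^2 K_g + \sigma_h^2 K_h + \sigma_e^2 I\right),$$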
and tested for $\sigma_{\mathrm{geno}}^2 > 0$. To test for cis contributions from methylation, H3K4me1 peaks, and H3K27ac peaks that are independent of cis common genetic variation, we used the model in (2), where the local relatedness matrix $K_l$ was built from methylation, H3K4me1, or H3K27ac features (again within 1 Mb of the gene body) after correction for local genetic effects, and tested for $\sigma_l^2 > 0$. We used the log likelihood ratio (LLR) as the test statistic and obtained p values using permutations, similar to the approach in Casale et al. (2015) and Lippert et al. (2014). Specifically, we considered 30 permutations for each test and gene and combined null LLRs across all genes. This resulted in a total of 600,000 permutation LLRs for each epigenetic layer and cell type, which we used to estimate empirical p values (minimum p ≈ 1.7 × 10⁻⁶). Empirical p values were corrected for multiple testing using the Benjamini-Hochberg procedure. Significant associations with gene expression levels were reported at an overall FDR of 5%. Results from the variance component tests are shown in Figures 2C–2E for monocytes, Figure S5G for neutrophils, and Figure S5H for T cells.
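The permutation scheme pools null LLRs across genes, so each observed LLR is compared with one shared null distribution, and the resulting p values are then adjusted with Benjamini-Hochberg. A minimal numpy sketch follows. With 600,000 null values the smallest attainable p value is 1/600,000 ≈ 1.7 × 10⁻⁶, which matches the minimum stated in the text, so the sketch floors p values at 1/N; the exact estimator used in the study is an assumption.

```python
import numpy as np


def empirical_pvalues(observed: np.ndarray, null_pool: np.ndarray) -> np.ndarray:
    """Fraction of pooled null LLRs >= each observed LLR, floored at 1/N."""
    null_sorted = np.sort(null_pool)
    n = null_sorted.size
    # Index of the first null value >= observed; everything after it counts.
    n_ge = n - np.searchsorted(null_sorted, observed, side="left")
    return np.maximum(n_ge, 1) / n


def benjamini_hochberg(pvals: np.ndarray) -> np.ndarray:
    """BH-adjusted p values (step-up), returned in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p value downwards.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out
```

A variance component would then be called significant when its BH-adjusted p value is below 0.05, corresponding to the 5% FDR used in the text.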
Epigenome-wide Association Analysis of Gene Expression
To distinguish epigenetic associations with gene expression that are due to underlying local genetic variation from associations that are independent of genetic effects, we also carried out classical single-feature association tests, with and without adjusting for genetic factors in the model. For both models, we considered associations between gene expression level and all epigenetic features within 1 Mb of the gene body.
Uncorrected EWAS Model. To test for association between gene expression and epigenetic features (methylation and histone modifications) within 1 Mb of the gene body, we considered the following linear mixed model:
$$y \sim \mathcal{N}\left(\mathbf{1}\mu + e\beta,\ \sigma_g^2 K_g + \sigma_h^2 K_h + \sigma_e^2 I\right)$$
Here, $y$ denotes the gene expression profile across individuals for gene $g$, $\mathbf{1}\mu$ an offset term, $e$ the specific epigenetic feature of interest with effect size $\beta$, $K_g$ the realized relatedness matrix (Lee et al., 2010), $K_h$ the expression heterogeneity term, and $\sigma_e^2 I$ the residual variance. All epigenetic features and gene expression levels were quantile-normalized to a unit-variance Gaussian distribution prior to association testing.
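Quantile normalization to a unit-variance Gaussian is commonly implemented as a rank-based inverse normal transform. The sketch below uses Python's standard-library `statistics.NormalDist` and maps ranks to quantiles of (rank − 0.5)/n; both the offset and the tie handling (average ranks) are assumptions, since the text does not specify them.

```python
from statistics import NormalDist
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def inverse_normal_transform(values: Sequence[float]) -> List[float]:
    """Map values to standard-normal quantiles of (rank - 0.5) / n."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]
```

The transform preserves the ordering of the input while removing its scale and skew, so a single outlying donor cannot dominate an association test.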
G-Corrected Model. Proceeding as in the variance component analysis, we considered the model:
$$y \sim \mathcal{N}\left(\mathbf{1}\mu + e'\beta,\ \sigma_{\mathrm{geno}}^2 K_{\mathrm{geno}} + \sigma_g^2 K_g + \sigma_h^2 K_h + \sigma_e^2 I\right),$$
where $e'$ is the G-corrected epigenetic feature being tested and $K_{\mathrm{geno}}$ is a local realized relatedness matrix built from all variants within 1 Mb of the gene body. G-corrected epigenetic features were also quantile-normalized to a normal distribution prior to association testing.
Association testing was performed using LIMIX (Casale et al., 2015; Lippert et al., 2014). For both models, variance components were estimated under the null model, and only the total variance was updated during association testing (Kang et al., 2008b). For multiple hypothesis correction, we used a two-step procedure (Battle et al., 2014): we first obtained a gene-level p value as the minimum nominal p value (Bonferroni-corrected for multiple testing across cis features) and then used the q-value procedure (Storey and Tibshirani, 2003) to correct for multiple testing across genes. We called genes with significant epigenetic association at FDR < 5%.
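The two-step correction (Battle et al., 2014) can be sketched as follows: Bonferroni-adjust each gene's minimum cis p value by its number of tested features, then convert the gene-level p values to q-values. Storey's procedure estimates the null proportion π0 by smoothing over a grid of λ values; this sketch uses a single λ = 0.5 instead, a simplification that makes it only an approximation of the q-value method.

```python
import numpy as np
from typing import Sequence


def gene_level_pvalues(per_gene_pvals: Sequence[np.ndarray]) -> np.ndarray:
    """Bonferroni-corrected minimum p value per gene, capped at 1."""
    return np.array([min(1.0, float(np.min(p)) * len(p)) for p in per_gene_pvals])


def qvalues(pvals: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Storey-style q-values with a fixed-lambda estimate of pi0.

    Simplification: a single lambda; real implementations smooth pi0 over
    several lambdas, which also avoids pi0 = 0 on very small inputs.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    pi0 = min(1.0, float(np.mean(p > lam)) / (1.0 - lam))
    order = np.argsort(p)
    ranked = pi0 * p[order] * m / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

When the estimated π0 equals 1, the q-values reduce to Benjamini-Hochberg adjusted p values; smaller π0 estimates make the procedure less conservative.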
QTL-Mapping
Gene, Methylation, and Histone Modification QTL Mapping. Cis-acting QTL mapping was done using the LIMIX package. We considered genetic variants within 1 Mb (on each side) of each tested feature and tested their association with gene expression, splicing (percent spliced in, PSI), methylation levels, and histone modification peaks (H3K27ac and H3K4me1).
Linear regression models were fitted between the genotypes and trait quantifications, including a random effect term accounting for polygenic signal and sample relatedness (as in the variance component models above, we used the realized relatedness matrix to capture sample relatedness). As in the variance decomposition analysis, we used quantile-normalized PEER residuals for this analysis. From the linear regression, we obtained the effect size and p value for each tested association.
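For a single molecular feature, the fixed-effect part of the cis scan reduces to regressing the quantile-normalized phenotype on each variant's dosage. The numpy sketch below omits the random effect for polygenic signal and relatedness that the study included, and it approximates the t distribution with a normal to obtain p values; both are simplifications, reasonable at sample sizes near 200 but not the study's exact model.

```python
import math
import numpy as np


def cis_scan(phenotype: np.ndarray, dosages: np.ndarray):
    """Simple linear regression of one phenotype on each cis variant.

    phenotype: (n,) values across individuals (e.g., normalized residuals).
    dosages: (n, n_variants) allele dosages (0, 1, 2).
    Returns (betas, pvalues); monomorphic variants get beta = 0, p = 1.
    """
    n, n_var = dosages.shape
    y = phenotype - phenotype.mean()
    betas = np.zeros(n_var)
    pvals = np.ones(n_var)
    for j in range(n_var):
        x = dosages[:, j] - dosages[:, j].mean()
        sxx = float(x @ x)
        if sxx == 0.0:
            continue  # monomorphic variant: nothing to test
        beta = float(x @ y) / sxx
        resid = y - beta * x
        se = math.sqrt(float(resid @ resid) / (n - 2) / sxx)
        # Two-sided p value; normal approximation to t with n - 2 df.
        pvals[j] = math.erfc(abs(beta / se) / math.sqrt(2)) if se > 0 else 0.0
        betas[j] = beta
    return betas, pvals
```

The per-variant p values from such a scan are the inputs to the two-step Bonferroni and q-value correction described next.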
To correct for multiple hypothesis testing, we used a two-step procedure (LRVM) (Battle et al., 2014): first, we corrected for multiple testing across variants for each molecular outcome using Bonferroni correction; second, we adjusted the resulting p values for multiple testing across phenotypes within each layer using the q-value procedure (Storey and Tibshirani, 2003), and considered QTLs significant at 5% FDR.
Data Resources
The full QTL summary statistics from this study can be accessed from http://blueprint-dev.bioinfo.cnio.es/WP10/qtls. The accession
numbers for the alignment data reported in this paper are European Genome-phenome Archive (EGA): EGAD00001002663 (WGS),
EGAD00001002671/EGAD00001002674/EGAD00001002675 (RNA), EGAD00001002670/EGAD00001002672/EGAD00001002673
ADDITIONAL RESOURCES
YRI
CEU
1 1.5
GBR
FIN
IBS
0.00
1
TSI
ASW 0.5
CLM
0.5
MXL
−0.02
PUR
0 0
1−5 5−10 10−20 20−30 30−40 40−50 50−60 60−70 70−80 80−90 90−100 1−5 5−10 10−20 20−30 30−40 40−50 50−60 60−70 70−80 80−90
−0.02 0.00 0.02 0.04 0.06
PC1
allele frequency (%) allele frequency (%)
D E F
50 4.0 14
deletions insertions number of variants (x 1,000,000)
number of INDELs (x 10,000)
12
40
3.5
counts per sample
depth of coverage
10
counts (x 1,000)
30
8
3.0
20 6
4
2.5
10
2
0 2.0 0
G H 20
2.6
Ts/Tv ratio
Het/HomAlt ratio
2.4
15
2.2
percentage
ratio
2.0 10
1.8
5
1.6
1.4 0
A>C A>G A>T C>A C>G C>T G>A G>C G>T T>A T>C T>G
0 50 100 150 200
I J K
0.25
0.25
0.25
0.20
[Figure residue: panels for monocytes, neutrophils, and T cells showing density plots (log2 rpm), a principal component plot (proportion of variance; PC 2, 13.51%), and the percentage of individuals by million reads, FRiP, RSC, and NSC, by ChIP center (WTSI, NCMLS, McGill); number of peaks versus FRiP; H3K4me1 and H3K27ac Z-scores compared between McGill and NCMLS (r = 0.96, r = 0.88).]
[Figure residue: fold-enrichment and probability of being the most significant SNP for ASE, QTL, ASE/QTL, CHT, and SEC associations, plotted from −100 kb to +100 kb around genes (TSS to TES) and across chromatin states E1–E11; counts of associations in monocytes, neutrophils, and CD4+ T cells and their overlaps.]
Correspondence
jd292@medschl.cam.ac.uk (J.D.),
david.roberts@ndcls.ox.ac.uk (D.J.R.),
who1000@cam.ac.uk (W.H.O.),
asb38@medschl.cam.ac.uk (A.S.B.),
ns6@sanger.ac.uk (N.S.)
In Brief
As part of the IHEC Consortium, this
study probes the allelic architecture and
regulatory landscape of cellular complex
traits with power to identify causal
pathways and links to diseases such as
schizophrenia. Explore the Cell Press
IHEC web portal at http://www.cell.com/
consortium/IHEC.
Highlights
• Genome-wide association study interrogates 36 traits across the hematopoietic system
1Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
2National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
3Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Forvie Site,
Cambridge, University of Cambridge, Strangeways Research Laboratory, Wort’s Causeway, Cambridge CB1 8RN, UK
7Blood Research Group, NHS Blood and Transplant, John Radcliffe Hospital, Headley Way, Headington, Oxford OX3 9BQ, UK
8European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton,
Cell 167, 1415–1429, November 17, 2016 © 2016 Elsevier Inc.
Shuang-Yin Wang,10 Eleanor Wheeler,5 Steven P. Wilder,8 Valentina Iotchkova,5,8 Carmel Moore,4
Jennifer Sambrook,1,2,4 Hendrik G. Stunnenberg,10 Emanuele Di Angelantonio,4,6,19 Stephen Kaptoge,4,6
Taco W. Kuijpers,20,21 Enrique Carrillo-de-Santa-Pau,22 David Juan,22 Daniel Rico,22,23 Alfonso Valencia,22
Lu Chen,1,5 Bing Ge,24 Louella Vasquez,5 Tony Kwan,24 Diego Garrido-Martín,25,26 Stephen Watt,5 Ying Yang,5
Roderic Guigo,25,26,27 Stephan Beck,28 Dirk S. Paul,4,28 Tomi Pastinen,24 David Bujold,24 Guillaume Bourque,24
Mattia Frontini,1,2,19 John Danesh,4,5,6,12,19,* David J. Roberts,29,30,* Willem H. Ouwehand,1,2,5,6,19,*
Adam S. Butterworth,4,6,19,* and Nicole Soranzo1,5,6,19,32,*
20Emma Children’s Hospital, Academic Medical Center (AMC), University of Amsterdam, Location H7-230, Meibergdreef 9,
Amsterdam 1105AZ, the Netherlands
21Blood Cell Research, Sanquin Research and Landsteiner Laboratory, Plesmanlaan 125, Amsterdam, 1066CX, the Netherlands
22Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3,
disease-associated genetic variants (Hindorff et al., 2009). The development of clinically useful applications of these discoveries, such as disease prediction algorithms, identification of etiological mechanisms (Ferreira et al., 2013; Voight et al., 2012), and prioritization of new targets for drug discovery (Lopez, 2008), has lagged behind. This is due partly to the characteristics of the disease-associated variants, which are predominantly common (minor allele frequency [MAF] ≥5%), tend to be associated with small differences in disease risk, and often lie in regulatory regions of the genome, hindering the identification of causal alleles, genes, and disease mechanisms.

Examples of low-frequency (MAF = 1%–5%) and rare variant (MAF <1%) associations are beginning to emerge from the application of massively parallel whole genome and exome sequencing to human populations (Polfus et al., 2016). Associated rare variants tend to be easier to link to genes, as they map predominantly in or near coding regions and have fewer correlated variants. Furthermore, they can have larger phenotypic effect sizes and are more likely to act through interpretable mechanisms such as disruption of protein function. These features also enhance their clinical and scientific usefulness. For instance, rare loss-of-function alleles can be used to assess the likely consequences of modulating a pathway pharmacologically to prevent disease (Plenge et al., 2013). However, very large studies are required for power to detect rare variant associations, and consequently the sequencing approach is still relatively limited by cost.

Genotype imputation of large population cohorts (i.e., the systematic genome-wide statistical inference of unmeasured genotypes using exogenous reference panels of sequenced individuals) (Howie et al., 2011) is fast becoming a viable strategy to explore rare and low-frequency variant associations. Increasingly large whole-genome sequencing (WGS) reference panels are being created. Larger panels include rare alleles from more variants and better capture the between-variant correlation structure of study populations (1000 Genomes Project Consortium et al., 2015; Iotchkova et al., 2016b; Loh et al., 2016; UK10K Consortium et al., 2015). Here, we exploit the recent improvements in the quality of imputation to carry out association analyses of rare and low-frequency genetic variants with 36 different blood cell indices.

Blood cells make essential contributions to oxygen transport, hemostasis, and innate and acquired immune responses (Jenne et al., 2013; Jensen, 2009; Varol et al., 2015) and participate in many other functions such as iron homeostasis, the clearance of apoptotic cells and toxins, vascular and endothelial cell function, and response to systemic stress (Buttari et al., 2015). Qualitative or quantitative abnormalities of blood cell formation, and of their physiological and functional properties, have been associated with predisposition to cancer and with many severe congenital disorders, including anemias, bleeding and thrombotic disorders, and immunodeficiencies (Routes et al., 2014; Schneider et al., 2015). Furthermore, variations in the properties of many blood-cell subtypes have been associated with a wide variety of systemic diseases. However, the causal relationships between blood indices and disease risks are unclear, and this hinders their potential value for informing new treatments.

We report over 2,500 variants independently associated with variation in the 36 indices. We examine the genetic architecture of the associated variants and use them to reveal causal relationships with autoimmune, cardiovascular, and psychiatric diseases. Overall, this study expands the repertoire of genes and regulatory mechanisms governing hematopoietic development in humans and opens potential avenues for targeting key pathways involved in abnormal or dysregulated hematopoiesis.

[Figure residue: panel labeled "Meta-analysis"; hematopoietic lineage labels (Ba, Ne, Eo, Mo, T, B).]
HSC = hematopoietic stem cell; MPP = multipotent progenitor; LMPP = lymphomyeloid-restricted progenitors; CMP = common myeloid progenitor; CLP = common lymphoid progenitor; MEP = megakaryocyte and erythroblast progenitor; GMP = granulocyte macrophage progenitor; P = platelet; R = red cell; Ba = basophil; Ne = neutrophil; Eo = eosinophil; Mo = monocyte; Ma = macrophage; APC = antigen presenting cell; T = T-lymphocyte; B = B-lymphocyte.

RESULTS

Genetic Discoveries
To identify genetic variants associated with 36 blood cell indices with increased resolution and statistical power, we studied a total of 173,480 European ancestry individuals from three large-scale UK studies: INTERVAL (Moore et al., 2014), approved by the Cambridge (East) Research Ethics Committee; and UK Biobank (Sudlow et al., 2015) and UK BiLEVE (a selected subset of the UK Biobank cohort) (Wain et al., 2015), both approved by the North West Multi-centre Research Ethics Committee (Figures 1, S1, and S2; Tables S1 and S2). We tested univariate associations of 36 indices with 29.5 million imputed variants passing quality control filters (MAF >0.01%, Figure S3) and used stepwise multiple regression to identify a parsimonious subset of genetic variants explaining the genome-wide significant (p value < 8.31 × 10⁻⁹) associations for each trait (Xu et al., 2014) (STAR Methods). We identified 6,736 conditionally independent index-variant associations and clustered these variants into 2,706 high linkage disequilibrium (LD) groups, each represented by a sentinel variant (between-sentinel pairwise LD r² < 0.8) (Figure 2; Tables S3 and S4). We confirmed the accurate imputation of variants at the rare end of the allelic spectrum by genotype comparisons with high read-depth (>50×) whole exome sequencing data from overlapping individuals, which showed 92.95% concordance and 94.97% precision for rare alleles (STAR Methods). Of the sentinel variants, 283 were correlated (r² ≥ 0.8) with previously reported variants (Table S5), validating most blood trait associations reported in populations of European ancestry (Gieger et al., 2011; van der Harst et al., 2012; Vasquez et al., 2016).

The sentinel variants included an unprecedented number of low-frequency (n = 210) and rare (n = 130) alleles (Figure 3A). The genetic associations were almost completely cell-type-specific (Figure 3B), with 900 sentinels (33%) associated exclusively
with red blood cell traits, 1,040 (38%) exclusively with white cell traits, and 570 (21%) exclusively with platelet traits. Only five common variants (at ZFP36L2/THADA, SH2B3, HBS1L, PRTFDC1, and GCKR) were associated with traits across all six trait classes defined in Table S1.

Properties and Biological Significance of Associated Variants
To evaluate the representation of classes of genetic variants across the allele frequency spectrum, we annotated variants with their most severe consequence on GENCODE transcripts using VEP (McLaren et al., 2016). Variants predicted to have severe consequences (missense, frameshift, stop gained, start lost variants; Table S4) were highly enriched in the rare and low-frequency ranges, consistent with observations from large-scale sequencing projects (UK10K Consortium et al., 2015) and negative selection against variants affecting protein function (Figure 3C). Phenotypic effect sizes (the absolute additive change in trait mean measured in SD per allele) decreased with decreasing severity of the variant consequence (p = 2.2 × 10⁻¹⁶, Jonckheere-Terpstra test for trend in absolute value of effect size with VEP impact; Figure 3D). For instance, missense changes were overrepresented in the rare frequency range (p = 9.8 × 10⁻²⁹, Pearson's χ² test) and displayed larger absolute effect sizes compared to non-missense variants (median 0.063 SD versus 0.035 SD, p = 2.5 × 10⁻¹⁶, Mann-Whitney-Wilcoxon test). There were also significant differences in median phenotypic effect sizes between variants mapping to five distinct regulatory states inferred from genome segmentations based on six histone marks in matched cells. Variants mapping to enhancer and promoter regions had larger median effect sizes than those mapping to other regulatory classes (Figure 3E).

Curated genes known to cause rare inherited Mendelian blood disorders (Greene et al., 2016; Westbury et al., 2015) were enriched among genes containing conditionally significant associations between variants altering protein sequence (missense, frameshift, stop gained, start lost variants) and blood indices of cell types matched to the disorders. For instance, we detected a 21.3-fold enrichment (FE; 95% confidence interval [CI]: 5.8–52.0) of Mendelian genes for bleeding, thrombotic, and platelet disorders among the platelet-associated genes, a 34.0 FE (95% CI: 11.4–72.1) of genes carrying mutations for Mendelian diseases of the red blood cells among red cell genes, and a 6.8 FE (95% CI: 2.2–15.6) of Mendelian genes for primary immune disorders among myeloid white cell genes. The enrichment overlaps included a known pathogenic missense variant (Landrum et al., 2016) in MPO (myeloperoxidase deficiency) (Romano et al., 1997), and we identified additional known pathogenic variants in uncurated genes, including CX3CR1 (HIV progression) (Faure et al., 2000) and HFE (hemochromatosis type 1) (Adams et al., 2005) (Table S4). We also found rare missense variants in Mendelian disorder genes that had not previously been associated with blood cell indices (Table S3) and/or where no pathogenic variants have been recorded in ClinVar. For example, missense variants in GMPR, TMC8, and RIOK3 were associated with reticulocyte counts.

More generally, the 158 variants predicted to alter protein sequence (Table S4) are of interest because of their potential medical value. We focused on rare (MAF < 1%) protein-altering variants because they can be more reliably linked to causal genes. For red blood cell indices, we found 14 missense variants and one frameshift variant (in SPHK1), only one of which (rs116100695) was previously identified as pathogenic. rs116100695 is a rare missense variant in PKLR causing red cell pyruvate kinase deficiency, a common cause of hereditary nonspherocytic hemolytic anemia (Kanno and Miwa, 1991). Some of the other variants are in genes previously associated with hereditary anemias. For example, a rare missense variant (rs201514157) in SPTA1 was associated with reticulocyte count, and a rare missense variant (rs202099525) in PIEZO1 was associated with mean corpuscular hemoglobin concentration. Similarly, we identified 11 rare protein-altering variants associated with platelet indices, ten of which were missense variants and one a nonsense variant (in KALRN). These include variants from regions previously identified to contain common weak-effect variants (IQGAP2, JAK2, SH2B3, and TUBB1) but also from three gene regions not previously identified by GWAS (CKAP2L, PLEK, and TNFRSF13B).

We identified 11 rare protein-altering variants associated with white cell traits, including ten missense variants in regions previously associated (CEBPE, CXCR2, IL17RA, S1PR4), as well as in novel genes not previously known to play a role in hematological processes. These findings demonstrate roles in leukocyte formation and/or function for ALOX15, AMICA1, and PLEK. Finally, some rare missense variants had pleiotropic effects across cell types. For instance, the rare missense variant in TNFRSF13B (rs72553883) causing common variable immunodeficiency and selective immunoglobulin A deficiency (Castigli et al., 2005) was associated with platelet, myeloid white cell, and lymphoid white cell indices (Table S4).
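The fold enrichments reported above compare how often curated Mendelian disease genes appear among trait-associated genes with how often they would appear by chance. The sketch below is a minimal illustration, assuming a simple observed/expected definition against a hypothetical background gene set, with a one-sided hypergeometric tail probability as one common way to attach a p-value. The gene counts are invented, the function names are hypothetical, and the confidence-interval method used in the study is not reproduced here.

```python
from math import comb


def fold_enrichment(n_background: int, n_mendelian: int,
                    n_associated: int, n_overlap: int) -> float:
    """Observed / expected overlap between associated genes and Mendelian genes.

    Expected overlap if associated genes were drawn at random from the
    background (the hypergeometric mean) is n_associated * n_mendelian / n_background.
    """
    if min(n_background, n_mendelian, n_associated) <= 0:
        raise ValueError("gene-set sizes must be positive")
    expected = n_associated * n_mendelian / n_background
    return n_overlap / expected


def hypergeom_sf(k: int, n_background: int, n_mendelian: int, n_associated: int) -> float:
    """P(overlap >= k) when n_associated genes are drawn at random from the background."""
    upper = min(n_mendelian, n_associated)
    hits = sum(comb(n_mendelian, i) * comb(n_background - n_mendelian, n_associated - i)
               for i in range(k, upper + 1))
    return hits / comb(n_background, n_associated)  # exact big-integer ratio


# Hypothetical counts (not from the study): 20,000 background genes,
# 100 curated Mendelian genes, 200 associated genes, 10 of them Mendelian.
print(f"FE = {fold_enrichment(20_000, 100, 200, 10):.1f}")  # expected overlap is 1.0, so FE = 10.0
print(f"P(overlap >= 10) = {hypergeom_sf(10, 20_000, 100, 200):.2e}")
```

With these made-up counts, one Mendelian gene is expected among the associated genes by chance, so an observed overlap of ten gives a tenfold enrichment.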
Overall, these results expand our knowledge of the genes and regulatory regions controlling blood cell biology and function. For rare variants, there were too few minor allele homozygotes to estimate genotypic effects on phenotype precisely, even across >170,000 individuals. However, the magnitude of some rare heterozygote effects suggests that the corresponding homozygote effects could be clinically relevant. Indeed, it is possible that effects of some homozygotes are more than double those of corresponding heterozygotes, depending on the degree of loss or gain of function, possible compensatory pathways, and stress or demand for adaptation in response to injury or insult.

Allelic Architecture of Hematological Indices
The comprehensive nature of this study allows us to draw more general inferences about the allelic architecture of hematological indices as an exemplar class of complex human traits. Our analysis had at least 80% power to detect associations explaining 0.0265% of trait variance, which could be attained by a per-allele additive effect as small as 0.023 phenotypic SD for common (MAF ≥5%) variants and 1.154 SD for variants at the lower limit of the frequency range we considered (MAF = 0.01%). No common or low-frequency variant had an estimated absolute effect size >0.5 SD, suggesting an upper boundary on phenotypic effect sizes for variants in these frequency classes. The relationship between allele frequency and the absolute value of the estimated effect size for the sentinel variants could in principle be explained by differential winner's curse by allele frequency (Figure 4A). However, the strength of the signal strongly suggests natural selection against variants with large effects. Conversely, associations with large phenotypic effects were overrepresented among rare variants (p value = 1.58 × 10⁻⁷⁷, Pearson's χ² test), with 21 rare sentinel variants having an estimated effect size >0.5 SD (median MAF = 0.09%), five of which had effects greater than 1 SD (Table S4). These correspond to effects on traits of 2.73 g/dl, 3.77 fL (femtoliters), 51 × 10⁹/L, and 1.37 × 10⁹/L for hemoglobin concentration (HGB), mean corpuscular volume (MCV), and platelet and neutrophil counts, respectively. The effect sizes seen in heterozygotes are sufficiently large to cause disease when carried in homozygosis.

Using the LD score regression (Finucane et al., 2015) approach to polygenic modeling, we estimated that common autosomal genotypes explained between 18% and 30% of variance in platelet indices, between 10% and 28% of variance in red cell indices, and between 5% and 21% of variance in white cell indices (Figure 4B). Conditionally significant coding variants
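The per-allele effects in the power statement above are consistent with the standard additive-model relation between the fraction of trait variance v explained by a biallelic variant, its minor allele frequency p, and its per-allele effect β in phenotypic SD units: v = 2p(1 − p)β². We assume this relation here (the text does not state it). Solving for β with v = 0.0265% gives about 0.023 SD at MAF = 0.5 and about 1.15 SD at MAF = 0.01%; the small gap from the quoted 1.154 SD presumably reflects rounding of the variance figure. The function name is hypothetical.

```python
import math


def effect_for_variance(variance_explained: float, maf: float) -> float:
    """Per-allele additive effect (phenotypic SD) that explains `variance_explained`
    of trait variance, from v = 2 p (1 - p) beta^2 (assumed additive model)."""
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must lie in (0, 0.5]")
    return math.sqrt(variance_explained / (2.0 * maf * (1.0 - maf)))


v = 0.0265 / 100  # 0.0265% of trait variance, as a fraction
for maf in (0.5, 0.05, 0.0001):
    print(f"MAF = {maf:<7} beta = {effect_for_variance(v, maf):.3f} SD")
```

The same variance threshold therefore demands an effect roughly 50 times larger at MAF = 0.01% than at MAF = 50%, which is why only large-effect rare variants are detectable.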
We conducted a multivariable MR analysis to reassess epidemiological correlations between blood cell indices and a range of human complex diseases and to identify shared causal pathways. The multivariable approach is advantageous because it ensures that results for one index are conditional on (i.e., control for covariation in) all other indices. For this analysis, we retrieved publicly available summary statistics for six autoimmune, three cardiometabolic, and five neuropsychiatric diseases (STAR Methods) and used genetic variants associated with 13 main hematological indices. For each index-disease pair, we estimated the unconfounded increase in the odds ratio of disease per unit change (in SD) in the index. We applied a multiple testing correction for 182 disease-index comparisons (14 diseases × 13 indices) (Figure 7).
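The multivariable estimate described above can be sketched with the inverse-variance weighted (IVW) multivariable regression commonly applied to summary statistics: SNP-outcome associations are regressed jointly on the SNP-exposure associations for all indices, with no intercept and weights equal to the inverse variance of the SNP-outcome estimates. This is an assumption about the estimator, not a transcription of the study's analysis; the simulated inputs are hypothetical, and flooring the residual scale at 1 (a multiplicative random-effects convention) is our choice. The final lines show the Bonferroni threshold implied by 182 comparisons, assuming a family-wise α of 0.05, since the text does not name the correction used.

```python
import numpy as np


def mvmr_ivw(beta_exposure, beta_outcome, se_outcome):
    """Multivariable IVW Mendelian randomization on summary statistics.

    beta_exposure: (n_snps, n_exposures) SNP-exposure effects (SD units)
    beta_outcome:  (n_snps,) SNP-outcome effects (log odds)
    se_outcome:    (n_snps,) standard errors of beta_outcome
    Returns log-OR per SD of each exposure (conditional on the others) and SEs.
    """
    sw = 1.0 / np.asarray(se_outcome, dtype=float)     # sqrt of IVW weights
    X = np.asarray(beta_exposure, dtype=float) * sw[:, None]
    y = np.asarray(beta_outcome, dtype=float) * sw
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)       # weighted least squares, no intercept
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = max(1.0, float(resid @ resid) / (n - k))  # residual scale floored at 1
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)) * sigma2)
    return coef, se


# Simulated, hypothetical data: 200 SNPs, 3 blood indices.
rng = np.random.default_rng(0)
bx = rng.normal(0.0, 0.05, size=(200, 3))
true_log_or = np.array([0.5, 0.0, -0.2])
se_y = np.full(200, 0.01)
by = bx @ true_log_or + rng.normal(0.0, 0.01, size=200)
est, se = mvmr_ivw(bx, by, se_y)
print("OR per SD:", np.round(np.exp(est), 2))

n_tests = (6 + 3 + 5) * 13          # diseases x indices = 182
print(f"Bonferroni threshold: {0.05 / n_tests:.2e}")  # 2.75e-04
```

Because each coefficient comes from one joint regression, the estimate for, say, eosinophil count is adjusted for the variants' effects on the other indices.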
We detected significant associations between white blood cell indices and autoimmune diseases (Figure 7C). The strongest was a positive association between eosinophilic indices and asthma (asthma odds ratio [OR] per SD increase in eosinophil count = 1.71; 95% CI: 1.53–1.95; p = 4.0 × 10⁻²²). This finding corroborates evidence from known associations with eosinophil counts at confirmed asthma loci, such as IL5, IL33, and IL1R1, as well as our discovery that the region around TSLP (another known asthma locus) contains three independent signals associated with eosinophil count (Table S4). There was weaker evidence of a positive association between asthma and neutrophil indices (p = 2.74 × 10⁻⁵), as well as inverse associations with monocyte (p = 1.24 × 10⁻⁴) and lymphocyte (p = 7.56 × 10⁻⁵) counts. There was also strong evidence for a positive association between eosinophilic indices and rheumatoid arthritis (OR = 2.34, 95% CI: 2.01–2.74; p = 1.84 × 10⁻²⁷), a signal that was robust to a range of sensitivity analyses, including removal of the MHC region (Table S7). Other loci containing alleles robustly associated with higher eosinophil count and increased risk of rheumatoid arthritis were COG6, SPRED2, RUNX1, and the highly pleiotropic ATXN2/SH2B3/BRAP region (Table S4). As with eosinophils, we saw directionally discordant disease associations with lymphocyte count, which had positive associations with schizophrenia (OR = 1.17, 95% CI: 1.10–1.24; p = 1.1 × 10⁻⁷), multiple sclerosis (OR = 1.28, 95% CI: 1.14–1.45; p = 6.6 × 10⁻⁵), and coronary heart disease (CHD) (OR = 1.10, 95% CI: 1.04–1.15; p = 1.8 × 10⁻⁴), as well as inverse associations with asthma (OR = 0.81, 95% CI: 0.73–0.90; p = 7.6 × 10⁻⁵) and celiac disease (OR = 0.75, 95% CI: 0.64–0.87; p = 2.6 × 10⁻⁴). However, only the associations with multiple sclerosis and celiac disease were robust to removal of the MHC region, suggesting that genes within the MHC predominantly drive the links between lymphocyte count and schizophrenia, CHD, and asthma. Finally, there was a weak positive association of CHD risk with reticulocyte indices (OR = 1.12; 95% CI: 1.07–1.17; p value = 1.7 × 10⁻⁶) and a weak inverse association of CHD risk with MPV (OR = 0.92; 95% CI: 0.88–0.96; p = 8.1 × 10⁻⁵), both of which were robust to all sensitivity analyses (Figure 7).

[Figure residue: forest plots of odds ratios (log scale) per SD of blood cell indices, including NEUT#, LYMPH#, and EO#, for diseases including MS and RA.]
Asthma (AST), celiac disease (CEL), inflammatory bowel disease (IBD), multiple sclerosis (MS), rheumatoid arthritis (RA) and type 1 diabetes (T1D). Chronic kidney disease (CKD), coronary heart disease (CHD) and type 2 diabetes (T2D). Alzheimer's disease (AD), bipolar disorder (BpD), cross disorder (CrD), major depressive disorder (MDD) and schizophrenia (SCZ).

These analyses have suggested a weak but significant positive association between hemolysis (for which reticulocyte indices serve as a marker) and CHD risk. This may prompt re-evaluation of the risk of arterial thrombosis for patients with ongoing hemolysis, as has been done for venous thrombosis. Perhaps most strikingly, the association between eosinophil count and rheumatoid arthritis may trigger more detailed genetic and clinico-epidemiological studies to dissect the provoking and perpetuating pathology of this inflammatory disease.

DISCUSSION

The molecular programs that control hematopoietic stem cell differentiation and proliferation are incompletely understood (Notta et al., 2016; Paul et al., 2015). Clues to these molecular pathways have traditionally come from discoveries of highly penetrant mutations associated with inherited disorders of the hematopoietic system, from somatic mutations underlying blood cell cancers, and from functional screens in model organisms (Boatman et al., 2013; Ganz and Nemeth, 2012). More recently, such studies have been complemented by high-throughput molecular and genetic analyses of common biological variation (Vasquez et al., 2016). Our study benefited from a substantial increase in statistical power compared to previous GWAS, driven by improvements in study design and data capture, including the use of dense WGS-imputation panels and the accurate adjustment of phenotypes for biological and technical covariates.

The new associations, including a large number of rare and low-frequency coding variants, define a detailed atlas of genes and regulatory regions influencing blood cell indices with cell-type-specific effects. There were several rare variants in
Further information may be directed to the Lead Contact, Nicole Soranzo (ns6@sanger.ac.uk). Results, including genome-wide
univariable summary statistics, are available from http://www.bloodcellgenetics.org.
We analyzed data from three large population studies with measurements of blood cell indices and imputed genome-wide genotypes: the UK Biobank study, the UK BiLEVE study (a selected subset of UK Biobank), and the INTERVAL study. Although the UK BiLEVE study is a subset of the UK Biobank study, we often refer to the UK BiLEVE study separately, since we conducted association analyses of UK BiLEVE participants as a distinct dataset due to their selected nature and a slightly different genotyping array.
METHOD DETAILS
Genotyping
For all three studies, aliquots were shipped to Affymetrix in 96-well barcoded plates with two empty wells for Affymetrix controls.
Samples were quantified using a PicoGreen-based method to identify plates with high numbers of low concentration samples, which
could be replaced prior to genotyping. Genotyping was performed on the Affymetrix GeneTitan Multi-Channel (MC) Instrument
according to the Affymetrix Axiom 2.0 Assay Automated Workflow. Genotypes were then called in batches of approximately 50 plates
(4800 samples) using the Affymetrix Power Tools software to implement the Axiom GT1 algorithm. For the UK Biobank and UK
BiLEVE studies, rare variants (i.e., those with fewer than six minor alleles in a genotyping batch) were recalled using variant-specific
priors to improve performance.
Where:
m is an index for each of the 15 PCs provided by UK Biobank,
E_m represents the eigenvalue corresponding to PC m (i.e., the genetic variance explained by PC m),
P_im represents the score of individual i on PC m,
C_m represents the median score on PC m of participants with self-reported White ancestry (defined as "British," "Irish," "White," or "Any other White background").
We used a threshold of genetic distance > 50 to identify non-Europeans, which resulted in the exclusion of 7,848 non-European
samples.
To implement further QC steps (heterozygosity analysis, PCA, and identification of duplicate samples), a robust set of variants was derived using the same methods as UK Biobank, i.e., selecting autosomal variants on both arrays that had passed variant QC in all 33 batches, had MAF ≥ 2.5% and missingness ≤ 1.5%, were not indels, were not C/G or A/T variants, and were not within 23 regions of known long-range linkage disequilibrium (LD). These variants were then LD-pruned (r² < 0.1) to obtain an uncorrelated set of variants. The first fifty PCs were estimated using flashpca (Abraham and Inouye, 2014). The heterozygosity analysis, which was carried out in parallel with the ethnic outlier identification using PLINK v1.9 (Chang et al., 2015), identified 3,030 samples that had autosomal heterozygosity more than three SD from the mean, 2,667 of which were also ethnic outliers. To identify duplicate samples, we performed identity-by-descent (IBD) analysis using the PLINK Method-of-Moments approach (http://pngu.mgh.harvard.edu/purcell/plink/ibdibs.shtml), which identified 19 pairs of duplicates/monozygotic twins (pi_hat ≥ 0.9). All 38 samples were excluded from the analysis dataset.
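The two sample-level filters in this paragraph, heterozygosity more than three SD from the mean and duplicate pairs with pi_hat ≥ 0.9, reduce to threshold rules once per-sample heterozygosity rates and pairwise pi_hat values have been computed (for example, by PLINK). A minimal sketch with hypothetical inputs and function names; it does not compute heterozygosity or IBD itself.

```python
from statistics import mean, stdev


def heterozygosity_outliers(het_rate: dict[str, float], n_sd: float = 3.0) -> set[str]:
    """Samples whose autosomal heterozygosity lies more than n_sd SDs from the mean."""
    mu = mean(het_rate.values())
    sd = stdev(het_rate.values())
    return {s for s, h in het_rate.items() if abs(h - mu) > n_sd * sd}


def duplicate_samples(pi_hat: dict[tuple[str, str], float], cutoff: float = 0.9) -> set[str]:
    """Both members of every pair with pi_hat >= cutoff (duplicates or monozygotic twins)."""
    return {s for pair, value in pi_hat.items() if value >= cutoff for s in pair}


het = {"s1": 0.30, "s2": 0.31, "s3": 0.30, "s4": 0.31}
pairs = {("s1", "s2"): 0.97, ("s3", "s4"): 0.02}
print(duplicate_samples(pairs))  # both members of the s1/s2 pair are flagged
```

Removing both members of each flagged pair matches the text, where 19 duplicate pairs led to 38 exclusions.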
Quality control (QC) of INTERVAL Genotype Data
In total, 48,813 INTERVAL samples were genotyped in ten batches. Following standard Affymetrix QC exclusions, within-batch sample and variant QC was performed. Non-best probesets were excluded to leave a single probeset per variant. As visual inspection of cluster plots had identified that some variants, particularly rare variants, had minor allele homozygotes incorrectly called due to the presence of an extreme intensity outlier, we failed variants from a batch if:
• the variant had fewer than ten called minor allele homozygotes;
• the cluster plot contained at least one sample with an intensity at least twice as far from the origin as the next most extreme sample;
• the outlying sample(s) had an extreme polar angle (< 15° or > 75°) in the direction of the minor allele.
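The three conditions above can be expressed as a per-batch check on a variant's cluster plot. This sketch assumes that all three conditions must hold for a variant to fail, that intensities are (x, y) coordinates with the minor allele's signal on a known axis, that the polar angle is measured in degrees from the x axis, and that only the single most extreme sample is examined. These conventions and the function name are hypothetical, not taken from the INTERVAL pipeline.

```python
import math


def fails_outlier_check(intensities, genotypes, minor_on_x: bool) -> bool:
    """intensities: list of (x, y) per sample; genotypes: minor-allele counts (0/1/2).

    Returns True if the variant should be failed in this batch.
    """
    if sum(1 for g in genotypes if g == 2) >= 10:   # condition 1: fewer than ten minor homozygotes
        return False
    if len(intensities) < 2:
        return False
    radii = [math.hypot(x, y) for x, y in intensities]
    order = sorted(range(len(radii)), key=radii.__getitem__, reverse=True)
    top, second = order[0], order[1]
    if radii[top] < 2 * radii[second]:              # condition 2: at least twice as far as the next sample
        return False
    x, y = intensities[top]
    angle = math.degrees(math.atan2(y, x))          # polar angle of the outlying sample
    # condition 3: extreme angle on the minor allele's side (assumed axis convention)
    return angle < 15 if minor_on_x else angle > 75


cloud = [(1.0, 1.0)] * 49
calls = [0] * 49 + [2]
print(fails_outlier_check(cloud + [(10.0, 0.5)], calls, minor_on_x=True))  # True
```

The single far-out sample sits at about 3 degrees on the assumed minor-allele axis, so it would be flagged as a spurious minor homozygote.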
Prior to further QC of variants within each batch, we excluded duplicate samples and samples that were clearly not of European
ancestry using a set of high-quality autosomal variants, defined as those with:
Duplicate samples were defined as those with pi_hat ≥ 0.9 using the PLINK Method-of-Moments IBD approach, and non-Europeans were defined as those with scores on PC1 or PC2 < 0 following a PCA including INTERVAL samples with 1000 Genomes major ancestry populations (1000 Genomes Project Consortium et al., 2015).
Variants were then excluded from a batch if they strongly deviated from HWE (p value < 5 × 10⁻⁶), following a Fisher's exact test for low-frequency and rare variants (defined as those with a maximum MAF < 0.05 across all ten batches) or a χ² test for common variants. Similarly, variants were excluded from a batch if they had a within-batch call rate < 0.97. Finally, variants were dropped from all batches if they failed in at least four of the batches due to deviation from HWE, low call rate, or Affymetrix variant exclusion criteria.
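The per-batch HWE and call-rate rules can be sketched as follows. For the exact test we use the recurrence of Wigginton, Cutler and Abecasis, which we assume corresponds to the Fisher's exact test named above; the χ² version has one degree of freedom. Affymetrix exclusion criteria are not modeled, and all function names are hypothetical.

```python
import math

HWE_P_THRESHOLD = 5e-6   # per-batch HWE threshold from the text
MIN_CALL_RATE = 0.97     # per-batch call-rate threshold from the text


def hwe_exact(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Exact HWE p-value via a recurrence over possible heterozygote counts."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))
    n = n_het + n_homr + n_homc
    if n == 0:
        return 1.0
    rare = 2 * n_homr + n_het                    # copies of the rarer allele
    probs = [0.0] * (rare + 1)
    mid = rare * (2 * n - rare) // (2 * n)       # start near the expected het count
    if mid % 2 != rare % 2:                      # het count shares parity with rare
        mid += 1
    probs[mid] = 1.0
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets >= 2:                             # walk down two hets at a time
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:                      # walk up two hets at a time
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    p_obs = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= p_obs) / sum(probs))


def hwe_chisq(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Pearson chi-square HWE p-value with one degree of freedom."""
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        return 1.0
    p = (2 * n_hom1 + n_het) / (2 * n)
    q = 1.0 - p
    observed = (n_hom1, n_het, n_hom2)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))        # upper tail of chi-square(1)


def fails_in_batch(n_het, n_hom1, n_hom2, call_rate, max_maf_across_batches) -> bool:
    """Exact test when max MAF across batches < 0.05, otherwise chi-square."""
    test = hwe_exact if max_maf_across_batches < 0.05 else hwe_chisq
    return test(n_het, n_hom1, n_hom2) < HWE_P_THRESHOLD or call_rate < MIN_CALL_RATE


def drop_from_all_batches(n_failed_batches: int) -> bool:
    return n_failed_batches >= 4


print(fails_in_batch(0, 50, 50, call_rate=0.99, max_maf_across_batches=0.4))  # True: no hets
```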
After merging passing samples and variants across the ten batches, we estimated the level of sample contamination using the method described by Jun et al. (2012), which examines the relationship between allele frequency and probeset intensity. We excluded samples with more than 10% contamination, as well as those who had both 3%–10% contamination and ten or more first- or second-degree relatives (defined as pi_hat ≥ 0.1875). Heterozygosity outliers (heterozygosity more than three standard deviations away from the mean), samples with missing phenotypic sex, and samples with sex mismatches were then also removed, as were variants with a MAF range greater than 0.05 across all batches, variants that were monomorphic in one or more batches but had MAF > 0.01 in another batch, and variants that had different minor alleles between batches (only for variants with maximum MAF < 0.475 across batches).
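The post-merge exclusion rules in this paragraph are threshold checks. The sketch below encodes them for one sample and one variant, assuming contamination is expressed as a fraction, that per-batch MAFs and minor alleles are already available, and that "MAF range" means the maximum minus the minimum per-batch MAF. Function and argument names are hypothetical.

```python
def exclude_sample(contamination: float, n_close_relatives: int) -> bool:
    """>10% contamination, or 3%-10% together with >= 10 first/second-degree
    relatives (pi_hat >= 0.1875)."""
    if contamination > 0.10:
        return True
    return 0.03 <= contamination <= 0.10 and n_close_relatives >= 10


def exclude_variant(batch_maf: list[float], batch_minor: list[str]) -> bool:
    """Cross-batch consistency checks on per-batch MAFs and minor alleles."""
    max_maf, min_maf = max(batch_maf), min(batch_maf)
    if max_maf - min_maf > 0.05:                        # MAF range > 0.05
        return True
    if min_maf == 0 and max_maf > 0.01:                 # monomorphic in a batch, MAF > 0.01 elsewhere
        return True
    if max_maf < 0.475 and len(set(batch_minor)) > 1:   # minor allele differs between batches
        return True
    return False


print(exclude_sample(0.05, 12), exclude_variant([0.10, 0.12], ["A", "G"]))  # True True
```

The minor-allele check is skipped near MAF = 0.5, where sampling noise alone can flip which allele is minor.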
For IBD analysis and PCA, another set of 100,000 high quality variants was selected using the same criteria described above for
the UK Biobank QC (Figure S3). The global IBD analysis (performed using PLINK Method-of-Moments approach) revealed 69 pairs of
across-batch duplicates (or monozygotic twins), who were removed from the dataset. A between-study IBD analysis, including the
INTERVAL, UK Biobank and UK BiLEVE studies revealed a further 1100 participants who were in both the INTERVAL and combined
Variant Imputation
UK Biobank and UK BiLEVE
The pre-imputation variant QC, phasing and imputation conducted on the combined UK Biobank and UK BiLEVE dataset has been
described in detail (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=157020). Briefly, sample and variant QC was performed as
described above, then variants were additionally removed if they:
• were only on the UK BiLEVE array and had failed in more than one (of 11) UK BiLEVE batches;
• were only on the UK Biobank array and had failed in more than two (of 22) UK Biobank batches;
• were on both arrays and had failed in three or more of the 33 total batches.
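The three array-specific removal rules above depend only on which arrays a variant is on and how many batches it failed in. A minimal sketch; the function and argument names are hypothetical.

```python
def remove_before_imputation(on_bileve: bool, on_biobank: bool,
                             failed_bileve: int, failed_biobank: int) -> bool:
    """Batch-failure rules: 11 UK BiLEVE batches + 22 UK Biobank batches = 33."""
    if on_bileve and not on_biobank:
        return failed_bileve > 1                      # more than one of 11
    if on_biobank and not on_bileve:
        return failed_biobank > 2                     # more than two of 22
    if on_bileve and on_biobank:
        return failed_bileve + failed_biobank >= 3    # three or more of 33
    raise ValueError("variant must be on at least one array")


print(remove_before_imputation(True, True, 1, 2))  # True: three of 33 batches failed
```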
Multiallelic variants and variants with MAF < 0.01 were then removed, as were non-autosomal variants. The UK Biobank and UK BiLEVE study samples were then jointly phased and imputed using a combined 1000 Genomes Phase 3-UK10K panel. Phasing was conducted using SHAPEIT3 (O'Connell et al., 2016), a modified version of SHAPEIT2 (Delaneau et al., 2013), in chunks of 5,000 variants with an overlap of 250 variants between chunks. Imputation was performed using IMPUTE3, a modified version of the IMPUTE2 software (Howie et al., 2011), in chunks of 2 Mb with a 250 kb buffer region. Post-imputation, variants with MAF < 0.00001 (1 in 100,000) were filtered from the dataset using QCTOOL (http://www.well.ox.ac.uk/gav/qctool/), leaving 72,355,667 variants for analysis in the dataset.
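Both phasing and imputation were run on overlapping chunks. The sketch below generates boundaries for the two schemes described: 5,000-variant chunks overlapping by 250 variants, and 2 Mb chunks with a 250 kb buffer. How buffers are placed at chunk edges is our assumption (added on both sides, clipped at chromosome ends); the functions are hypothetical and are not SHAPEIT3 or IMPUTE3 options.

```python
def variant_chunks(n_variants: int, size: int = 5000, overlap: int = 250):
    """Half-open index ranges [start, end); consecutive chunks share `overlap` variants."""
    step = size - overlap
    chunks, start = [], 0
    while True:
        end = min(start + size, n_variants)
        chunks.append((start, end))
        if end == n_variants:
            return chunks
        start += step


def region_chunks(chrom_length: int, size: int = 2_000_000, buffer: int = 250_000):
    """(core_start, core_end, buffered_start, buffered_end) in base pairs, half-open."""
    out = []
    for core_start in range(0, chrom_length, size):
        core_end = min(core_start + size, chrom_length)
        out.append((core_start, core_end,
                    max(0, core_start - buffer), min(chrom_length, core_end + buffer)))
    return out


print(variant_chunks(12_000))  # [(0, 5000), (4750, 9750), (9500, 12000)]
```

Only the core of each imputation chunk would typically be kept, with the buffer supplying flanking haplotype context.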
INTERVAL
Prior to imputation, additional variant QC steps were performed to establish a high-quality imputation scaffold. We imposed a global HWE filter of p value < 5 × 10⁻⁶, a call rate filter of 99% over the batches in which a variant had not failed, and a global call rate filter of 75% (effectively ensuring a variant passed in at least eight of the ten batches). Finally, we removed all monomorphic variants. Non-autosomal and multi-allelic variants were removed as part of the QC process, and the dataset was then phased using SHAPEIT3 with the same criteria used for UK Biobank (chunks of 5,000 variants with an overlap of 250 variants between chunks) and subsequently imputed using the same combined 1000 Genomes Phase 3-UK10K imputation panel described above. Imputation was performed on the Sanger Imputation Server (https://imputation.sanger.ac.uk), which uses the PBWT imputation algorithm (Durbin, 2014) and analyzes whole chromosomes. No imputation quality or variant frequency filters were applied, resulting in 87,696,910 imputed variants in the dataset.
Using whole-exome sequencing (WES) data for 3,976 INTERVAL study participants who were also in our post-QC imputation
dataset, we were able to assess imputation accuracy. We adapted two metrics (Linderman et al., 2014) to compare genotype
data to sequencing data for these purposes. The first was non-reference concordance, which considers all heterozygotes and minor
allele homozygotes in the WES dataset and calculates the proportion seen in the imputed dataset. The second was precision, which
considers all the heterozygous and minor allele homozygotes in the imputed dataset, and calculates what proportion of these calls
was correct according to the WES dataset. For 146 missense, loss-of-function or rare high-impact (beta > 0.5SD) variants that
passed QC in the WES dataset, we observed a median non-reference concordance of 98.6%, 98.8% and 93.9% in common
(MAF > 0.05), low-frequency (MAF > 0.01 and MAF ≤ 0.05) and rare variants (MAF < 0.01), respectively, and median precision of
99.5%, 99.3% and 98.5% in common, low-frequency and rare variants respectively.
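The two accuracy metrics can be computed from matched hard genotype calls. This sketch assumes genotypes are coded as minor-allele counts (0, 1, 2), that imputed dosages have already been converted to hard calls, and that a WES non-reference call counts as "seen" only when the imputed genotype is identical; those choices and the function names are assumptions, not taken from Linderman et al. (2014).

```python
import numpy as np


def _nonref_match_rate(reference_calls, other_calls):
    """Among sites where reference_calls is non-reference (1 or 2 minor alleles),
    return the fraction at which other_calls has the identical genotype."""
    ref = np.asarray(reference_calls)
    other = np.asarray(other_calls)
    mask = ref > 0
    if not mask.any():
        return float("nan")
    return float(np.mean(other[mask] == ref[mask]))


def nonref_concordance(wes_calls, imputed_calls):
    # Denominator: heterozygous and minor-allele homozygous calls in the WES data
    return _nonref_match_rate(wes_calls, imputed_calls)


def precision(wes_calls, imputed_calls):
    # Denominator: heterozygous and minor-allele homozygous calls in the imputed data
    return _nonref_match_rate(imputed_calls, wes_calls)


wes = [0, 1, 2, 1, 0]
imputed = [0, 1, 1, 1, 1]
print(nonref_concordance(wes, imputed), precision(wes, imputed))  # 0.6666666666666666 0.5
```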
For each blood cell index, we used the central part of the data (the data differing from the median by less than 3.5 median absolute
deviations on the adjustment scale) to estimate the effects on the mean of the index (transformed to the adjustment scale) of within-machine
time-dependent drift, the delay between venipuncture and measurement, day of the week and time of year. We restricted the
model fit to the central part of the data to minimize the influence of outlying data points. After fitting the regression model,
we computed model residuals for the full dataset and used these residuals to compute index values adjusted for technical effects.
Specifically, we used the R package mgcv (https://cran.r-project.org/package=mgcv) (R Core Team, 2014; Wood, 2011) to fit a
generalized additive model (GAM) with the following regression equation:
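$$
\mathbb{E}\left[a(y_i)\right] = s\left[t(i) \otimes m(i)\right] + s\left[t_{\mathrm{day}}(i), t_{\mathrm{ven}}(i) \otimes m(i)\right] + \sum_{w \in \{\mathrm{mon}, \ldots, \mathrm{sun}\}} \mathbb{1}_{w(i) = w} + c\left[t_{\mathrm{year}}(i)\right] + \sum_{m} \mathbb{1}_{m(i) = m}
$$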
Here:
• a denotes a function transforming the measured index data y to the adjustment scale.
• m(i) denotes the instrument used to acquire measurement i.
• w(i) denotes the day of the week on which measurement y_i was acquired.
• t(i) denotes the time difference between the time of measurement of observation i and midnight at the start of the first day of the study.
• t_year(i) represents the difference between the time of measurement of observation i and midnight at the start of 1 January of the year in which observation i was measured.
• t_day(i) represents the difference between the time of measurement of observation i and midnight at the start of the day of measurement.
• t_ven(i) represents the difference between midnight at the start of the day observation i was measured and the time of venipuncture.
• Each term in square brackets represents a contribution to the linear predictor.
• s[ ] indicates a smoothing term. For the univariate terms we smooth using P-splines, while for bivariate smooths we use thin plate splines.
• c[ ] indicates a cyclic smoothing term, used here to model seasonal variation on a circle representing time of year.
• We use the symbol ⊗ to indicate an interaction between the smooth and the categorical variable to its right (in both cases here, the instrument id m(i)).
• a(y_i) represents the trait data for observation i on the adjustment scale, after correction for drift using the GAM.
• d(i) denotes the day on which the index measurement for i was acquired.
• Measurements acquired on day-instrument pairs with fewer than 10 data points or for which z_{d,m} > 8 were excluded from further analysis.
After making these exclusions we refitted the GAM for drift described above to obtain measured index values that are adjusted for
drift effects without the influence of data from aberrant days. We then recomputed the derived indices from the measured indices. For
some indices, the power gained from the adjustments for technical effects alone is equivalent to thousands of additional samples
(Figure S1).
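The two ideas behind the technical adjustment (fit the model on the central part of the data only, then compute residuals for every observation) can be illustrated without mgcv. In this sketch ordinary least squares stands in for the GAM, a linear term stands in for the drift smooth, and a Fourier basis is swapped in for the cyclic smooth c[t_year(i)] (in mgcv itself a cyclic smooth would typically use a cyclic spline basis such as `bs = "cc"`). The median absolute deviation is left unscaled, an assumption because the text does not say whether the usual 1.4826 factor is applied; all function names are hypothetical.

```python
import numpy as np


def central_mask(values, k=3.5):
    """True for points within k (unscaled) median absolute deviations of the median."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    return np.abs(values - med) < k * mad


def cyclic_basis(t_year_days, n_harmonics=2, period=365.25):
    """Fourier columns sin/cos(2*pi*h*t/period) for h = 1..n_harmonics (periodic in t)."""
    t = np.asarray(t_year_days, dtype=float)
    cols = []
    for h in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * h * t / period
        cols.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(cols)


def adjust_for_technical_effects(y, design, k=3.5):
    """Fit least squares on the central data only, then return residuals for ALL points."""
    y = np.asarray(y, dtype=float)
    keep = central_mask(y, k)
    coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
    return y - design @ coef


# Illustrative data: linear drift plus a seasonal wave, with one gross outlier
t = np.arange(0.0, 730.0, 7.0)  # days since the start of the study
y = 5 + 0.01 * t + 0.3 * np.sin(2 * np.pi * t / 365.25)
y[3] += 100.0
design = np.column_stack([np.ones_like(t), t, cyclic_basis(t % 365.25)])
resid = adjust_for_technical_effects(y, design)
print(round(resid[3], 6))  # 100.0
```

Because the outlier is excluded from the fit, it cannot pull the drift and seasonal estimates towards itself, yet it still receives a residual and can be handled by the later outlier steps.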
Exclusions Based on Phenotypes and Covariates
We sought to exclude individuals with blood cancers or major blood disorders from the UK Biobank study on the grounds that, if
included, their noisy blood counts may reduce the power to detect genetic associations. Using data from the baseline health assessment
self-report, the linked cancer registry and linked hospital inpatient record summaries, we identified and removed individuals suffering
from blood cancers or other blood disorders. Specifically, we excluded participants who had a self-report or medical history containing
a record of myelofibrosis, lymphoma, leukemia, malignant lymphoma, multiple myeloma, multiple myelofibrosis or myelodysplasia,
chronic lymphocytic leukemia, chronic myeloid leukemia, acute myeloid leukemia, polycythemia vera, polycythemia, a myeloproliferative
disorder, essential thrombocytosis, a hematological cancer histology report, an unspecified lymphatic or general hematological
neoplasm, a myelodysplastic syndrome, an unspecified heme malignancy, monoclonal gammopathy, an unspecified hereditary
hematological disorder, hemochromatosis, thalassemia, hemophilia, sickle cell anemia, neutropenia, lymphopenia or pancytopenia. In
aggregate, this excluded 5,045 participants from the UK Biobank phenotype dataset, of whom 1,611 had measured genotypes.
Since we had no access to detailed health record data on the INTERVAL participants, we did not make any similar exclusion for
INTERVAL. However, participants in the INTERVAL study are generally healthier than those in UK Biobank and are active whole blood
donors; therefore, the incidence of blood disorders is likely to be substantially lower. Hematologists screened the baseline full blood
counts of INTERVAL participants and very few probable cases of leukemia were identified.
Non-seasonal Environmental and Variance Explained by Sex Differences
We adjusted all indices for environmental and sex differences using a GAM, again solely using the central part of the data (the data
after adjustment for technical effects, differing from the median by less than 3.5 median absolute deviations on the adjustment scale)
to fit the model. The measured environmental covariates differ between the INTERVAL and UK Biobank studies and consequently the
models we fitted differed slightly.
For the INTERVAL study dataset we fit a model with the following terms:
• A univariate smooth (30 knots) for age at venipuncture, with an interaction with a categorical variable describing menopausal status with the following levels: male, female-premenopausal, female-postmenopausal, female-had-hysterectomy, no-answer, unsure
• A bivariate smooth (30 knots) for log-height and log-weight (which implicitly adjusts for body-mass index [BMI]), with the same categorical interaction variable as for age
• A univariate smooth for pack-years of smoking
• A categorical variable describing current smoking habits with levels: never, special-occasions, rarely, occasional, most-days, every-day, no-answer
• A categorical variable describing alcohol drinking status with levels: never, previous, current, no-answer
• A categorical variable describing current alcohol drinking habits with levels: never, special-occasions, 1-3-times-monthly, 1-2-times-weekly, 3-5-times-weekly, most-days, no-answer
For the UK Biobank study dataset we fit a model with the following terms:
• A univariate smooth (30 knots) for age at venipuncture, with an interaction with a categorical variable describing menopausal status with the following levels: male, female-premenopausal, female-postmenopausal, female-had-hysterectomy, unsure
For both datasets, where data-points were missing for a covariate, we imputed them by the mean covariate value and included a
dummy variable to allow the mean of the index value for individuals with missing data to differ from the mean index value for individ-
uals with non-missing data.
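For a continuous covariate, the missing-data device described above pairs a mean-imputed column with a 0/1 missingness indicator, and both enter the regression. A minimal sketch, assuming missing values are stored as NaN; the function name is hypothetical.

```python
import numpy as np


def mean_impute_with_indicator(covariate):
    """Return (filled, missing_indicator) for one covariate column.

    Missing values (NaN) are replaced by the mean of the observed values. The
    0/1 indicator is added to the model as a dummy variable, so individuals
    with missing data can have a different mean index value.
    """
    x = np.asarray(covariate, dtype=float)
    missing = np.isnan(x)
    if missing.all():
        raise ValueError("covariate has no observed values")
    filled = x.copy()
    filled[missing] = x[~missing].mean()
    return filled, missing.astype(float)


filled, indicator = mean_impute_with_indicator([1.0, np.nan, 3.0])
print(filled, indicator)  # [1. 2. 3.] [0. 1. 0.]
```

Categorical covariates in this analysis instead carry an explicit no-answer level, which plays the same role as the indicator.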
Removal of Outliers and Normalization
We removed observations by index for which there was a large difference between the raw measured index value and the adjusted
index value. Specifically, we removed a data point if the difference, on the adjustment scale, between the original raw measured data
and the adjusted data was more than 3.5 median absolute deviations from the median of the distribution of such differences for the
relevant index.
We removed outliers from the phenotype data. We first considered outliers in each marginal univariate distribution. For each index
on the adjustment scale, we removed all data-points lying more than 4.5 median absolute deviations from the median index value on
the adjustment-scale. We then grouped the indices as follows:
After standardizing the variables on the adjustment scale, we performed a principal component analysis for each group and
computed the sum of squares of the leading d PC scores, where d is the number of independent measurements required to compute
the variables in each group. We compared the sum of squares to a χ² distribution with d degrees of freedom and removed outliers
falling into the upper 10⁻⁷ tail probability.
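The multivariate outlier step can be sketched as a Mahalanobis-type test in principal component space. This assumes each group's indices are supplied as the columns of a matrix on the adjustment scale and that the leading PC scores are rescaled to unit variance, which is needed for their sum of squares to follow an approximate χ² distribution with d degrees of freedom; the text does not state this detail, and the grouping itself is not reproduced here. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def pca_chi2_outliers(X, d, tail_prob=1e-7):
    """Flag rows whose leading-d PC scores are jointly extreme.

    X: (n_samples, n_variables) array of the indices in one group.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)       # standardize each variable
    evals, evecs = np.linalg.eigh(np.cov(Z, rowvar=False))  # eigenvalues in ascending order
    lead = np.argsort(evals)[::-1][:d]                      # the d largest components
    scores = (Z @ evecs[:, lead]) / np.sqrt(evals[lead])    # unit-variance PC scores
    sum_sq = np.sum(scores ** 2, axis=1)
    return sum_sq > chi2.isf(tail_prob, df=d)               # upper-tail cut-off


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
X[0] = [50.0, -50.0, 50.0]  # one grossly discordant sample
flags = pca_chi2_outliers(X, d=3)
print(flags[0], int(flags.sum()))
```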
Finally, within each study we quantile-inverse-normal transformed the trait data within each level of a categorical variable formed
by crossing a categorical variable indexing the hematology analyzer with a categorical variable with the levels male,
female-premenopausal, female-postmenopausal, female-had-hysterectomy, no-answer, unsure.
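A rank-based inverse normal transformation applied separately within each analyzer-by-sex/menopause stratum might look like the sketch below. The offset of 0.5 (mapping rank r in a stratum of size n to (r − 0.5)/n) is an assumption, as is the use of averaged ranks for ties; Blom's offset of 3/8 is another common choice. Group labels such as `"A|male"` and the function name are hypothetical.

```python
import numpy as np
from scipy.stats import norm, rankdata


def inverse_normal_within_groups(values, groups, offset=0.5):
    """Rank-based inverse normal transform applied separately within each group.

    Ranks r (ties averaged) in a group of size n are mapped to
    norm.ppf((r - offset) / (n - 2 * offset + 1)).
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(values)
    for g in np.unique(groups):
        idx = groups == g
        r = rankdata(values[idx])
        n = idx.sum()
        out[idx] = norm.ppf((r - offset) / (n - 2 * offset + 1))
    return out


z = inverse_normal_within_groups([3.0, 1.0, 2.0, 10.0, 20.0],
                                 ["A|male"] * 3 + ["B|female-postmenopausal"] * 2)
print(np.round(z, 3))
```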
The final number of participants passing phenotype and genotype QC from each of the studies is shown in Table S2, along with
summary statistics for each blood cell index.
• population-genotype interactions (i.e., true allelic effect size differences between studies),
• variation in LD between study populations,
• study-specific quantile-inverse-normal transformations, when there are differences in the adjustment of phenotypes for covariates between studies,
• differences in genotyping measurement error between studies (when independent of phenotype, such errors tend to bias associations toward the null) and
• differences in phenotyping measurement techniques between studies, none of which are necessarily reasons to regard an observed population association as spurious.
Due to the high power of the present analysis, we found that common variants showing directionally concordant evidence for
association across the three studies were often removed when we filtered variants by thresholding a statistic measuring evidence
for quantitative heterogeneity in effect size (Cochran’s Q). Consequently, we devised an alternative (generalized) statistic to detect
heterogeneity in effect size that we regard as implausible for genuine population associations. The three-dimensional plot (Figure S2E)
illustrates our approach.
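For reference, Cochran's Q is the inverse-variance weighted sum of squared deviations of the study-specific estimates from their fixed-effect mean, referred to a χ² distribution with k − 1 degrees of freedom. The sketch below shows why it can be too strict in a highly powered analysis: with small standard errors, directionally concordant estimates can still yield a very small p value. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def cochran_q(betas, ses):
    """Cochran's Q and its p value for k study-specific effect estimates."""
    b = np.asarray(betas, dtype=float)
    w = 1.0 / np.asarray(ses, dtype=float) ** 2   # inverse-variance weights
    beta_fixed = np.sum(w * b) / np.sum(w)        # fixed-effect (weighted) mean
    q = float(np.sum(w * (b - beta_fixed) ** 2))
    return q, float(chi2.sf(q, df=b.size - 1))


# Directionally concordant estimates with very small standard errors
print(cochran_q([0.10, 0.12, 0.14], [0.005, 0.005, 0.005]))
```

Here estimates of 0.10, 0.12 and 0.14 with standard errors of 0.005 give Q = 32 on 2 degrees of freedom (p ≈ 1.1×10⁻⁷), even though all three studies agree on direction.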
Model Selection by Stepwise Multiple Regression
Many of our observed associations likely reflect the same underlying causal signal due to LD between the variants. For each blood
index, we therefore sought to identify a parsimonious set of genetic variants explaining the genome-wide significant associations by
stepwise multiple linear regression, using the fastLm implementation in the R package RcppEigen. We first partitioned the blood
index-specific genome-wide significant variants into the unique minimal set of blocks such that no block could be further partitioned
into subsets of variants separated by at least 10Mb. We then performed a block and blood index-specific bidirectional stepwise
model selection procedure, combining the individual level data from all three studies. Every regression model we assessed included
the covariates used in the original marginal analyses (i.e., study-specific principal component scores and dummy variables for
recruitment center). Additionally, we included dummy variables to absorb between-study blood index variation, an adjustment which
was intrinsic to the meta-analyses of marginal associations.
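The block definition above (split wherever consecutive significant variants on a chromosome are at least 10 Mb apart) can be sketched as a single pass over sorted positions. This assumes variants are given as (chromosome, base-pair position) pairs with integer autosome labels; the function name is hypothetical.

```python
def partition_into_blocks(variants, min_gap=10_000_000):
    """Split (chromosome, position) pairs into blocks.

    A new block starts at a chromosome change or wherever consecutive variants
    are at least `min_gap` bp apart, so no block can be split further at a gap
    of at least min_gap.
    """
    blocks = []
    for chrom, pos in sorted(variants):
        if blocks:
            prev_chrom, prev_pos = blocks[-1][-1]
            if chrom == prev_chrom and pos - prev_pos < min_gap:
                blocks[-1].append((chrom, pos))
                continue
        blocks.append([(chrom, pos)])
    return blocks


hits = [(1, 5_000_000), (1, 100), (1, 20_000_000), (2, 100)]
print(partition_into_blocks(hits))
# [[(1, 100), (1, 5000000)], [(1, 20000000)], [(2, 100)]]
```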
The stepwise procedure started with the ‘empty’ model, containing only covariates as predictors. At each step we adjusted the
model by:
1. adding the unmodeled variant with the smallest p value for association with the residuals of the current model, provided that
such a p value was below the genome-wide significance level (8.31×10⁻⁹);
2. iteratively pruning variants from the model when the p value comparing the current model with the sparser model was greater
than the genome-wide significance level, with the variant corresponding to the largest such p value being pruned at each
iteration.
When neither step 1 nor step 2 was possible, the procedure terminated. We modeled only the additive effects of the imputed allele dosages.
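The add-then-prune loop can be sketched with NumPy least squares in place of RcppEigen's fastLm. Assumptions: candidates are tested for marginal association with the current residuals using `scipy.stats.linregress`; pruning uses a one-degree-of-freedom F-test of the current model against the model without each selected variant; covariates (including an intercept column) form the matrix C; and the `max_steps` guard is an addition to stop the loop if adding and pruning were ever to cycle. Function names are hypothetical.

```python
import numpy as np
from scipy import stats


def _rss(X, y):
    """Residuals and residual sum of squares of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid, float(resid @ resid)


def _drop_p(C, G_sel, y, j):
    """F-test p value comparing the current model with the model lacking variant j."""
    full = np.column_stack([C, G_sel])
    reduced = np.column_stack([C, np.delete(G_sel, j, axis=1)])
    _, rss_full = _rss(full, y)
    _, rss_red = _rss(reduced, y)
    df = len(y) - np.linalg.matrix_rank(full)
    f_stat = (rss_red - rss_full) / (rss_full / df)
    return float(stats.f.sf(f_stat, 1, df))


def stepwise_select(C, G, y, alpha=8.31e-9, max_steps=1000):
    """Bidirectional stepwise selection of columns of G (allele dosages); C = covariates."""
    selected = []
    for _ in range(max_steps):
        changed = False
        # 1. Add the unmodeled variant most associated with the current residuals.
        resid, _ = _rss(np.column_stack([C, G[:, selected]]), y)
        candidates = [j for j in range(G.shape[1]) if j not in selected]
        if candidates:
            pvals = [stats.linregress(G[:, j], resid).pvalue for j in candidates]
            best = int(np.argmin(pvals))
            if pvals[best] < alpha:
                selected.append(candidates[best])
                changed = True
        # 2. Iteratively prune the variant with the largest p value above alpha.
        while selected:
            pvals = [_drop_p(C, G[:, selected], y, k) for k in range(len(selected))]
            worst = int(np.argmax(pvals))
            if pvals[worst] <= alpha:
                break
            selected.pop(worst)
            changed = True
        if not changed:
            return sorted(selected)
    raise RuntimeError("stepwise selection did not converge")


rng = np.random.default_rng(1)
n = 500
C = np.ones((n, 1))                                  # intercept only
G = rng.binomial(2, 0.3, size=(n, 5)).astype(float)  # five simulated variants
y = 1.0 * G[:, 0] + 0.8 * G[:, 3] + 0.5 * rng.normal(size=n)
print(stepwise_select(C, G, y))
```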
After identifying a terminal set of variants for each block, we merged the variants for each blood index across blocks and ran the
same stepwise procedure but on the merged set of variants for each index, starting with the saturated model. This ensured selection
of a set of variants for each index that were mutually conditionally significant at the genome-wide level, accounting for any residual LD
over 10Mb. Although the stepwise procedure made no adjustment of p values to account for the model search, it also ignored addi-
tional strong evidence for associations from the apposition of distinct signals. Our genome-wide significance level is conservative, so
the selected variants for each index are likely to represent causally distinct signals, except in regions where imputation is imprecise
(where multiple variants may tag a single causal signal).
We report univariable and multivariable summary association statistics for the variants with conditionally significant associations in
Table S4.
Consensus Set of Variants over Blood Indices
Because we performed a distinct model selection procedure for each blood cell index, a locus that was associated with
multiple indices could be represented by different sentinel variants. To identify conditional variants reflecting the same signals, we
clumped the selected set of variants from all indices using pairwise LD. First, we identified the set of variants considered conditionally
where each λ corresponds to a genomic control inflation factor (Table S2), to approximately undo the effect of our genomic control
adjustments.
To systematically measure the statistical significance of the overlaps between our blood cell index-associated variants and
BLUEPRINT epigenetic data, we used GARFIELD (Iotchkova et al., 2016b), a novel enrichment analysis approach that uses
genome-wide association summary statistics to calculate odds ratios for association between annotation overlap and disease status at given
genome-wide statistical significance thresholds. Tests for significance are implemented via a generalized linear modeling framework
accounting for LD, minor allele frequency (MAF) and local gene density. LD (r²) was calculated with PLINK v1.9 using variants from
the combined UK10K and 1000 Genomes Phase 3 European cohorts in 1 Mb windows. Overlap of blood cell index-associated
variants with BLUEPRINT annotations was based on genomic position overlap or LD tagging (r² ≥ 0.8). Variants significantly associated
with blood cell indices were ‘greedily’ pruned by sequentially retaining the most significant variant and pruning around it (LD r² ≥ 0.1)
until no significant variants remained. This approach tries to ensure independence of variants in the enrichment tests while ensuring
that we retain the most significantly associated variants. We tested for enrichment all variants with MAF ≥ 1% reaching a p value of
1×10⁻⁸ and performed multiple testing correction based on the number of traits, segmentation states and cell types used.
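GARFIELD's own implementation is not reproduced here, but the greedy pruning step described above can be sketched as follows, assuming a precomputed matrix of pairwise r² (in the text, LD is computed with PLINK v1.9 in 1 Mb windows) and that the MAF ≥ 1% filter has already been applied. The function name and argument layout are hypothetical.

```python
import numpy as np


def greedy_ld_prune(pvalues, r2, p_threshold=1e-8, r2_threshold=0.1):
    """Keep the most significant variant, drop variants in LD with it, repeat.

    pvalues: 1-D array of association p values.
    r2: square matrix of pairwise LD r^2 between the same variants.
    Returns indices of retained variants, most significant first.
    """
    p = np.asarray(pvalues, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    remaining = set(np.flatnonzero(p <= p_threshold).tolist())  # significant variants only
    kept = []
    while remaining:
        lead = min(remaining, key=lambda i: p[i])  # most significant remaining variant
        kept.append(lead)
        # remove the lead variant and every variant with r^2 >= r2_threshold to it
        remaining = {j for j in remaining if j != lead and r2[lead, j] < r2_threshold}
    return kept


p = [1e-10, 1e-12, 1e-9, 0.5]
r2 = np.array([[1.0, 0.5, 0.05, 0.0],
               [0.5, 1.0, 0.0, 0.0],
               [0.05, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])
print(greedy_ld_prune(p, r2))  # [1, 2]
```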
Integration with BLUEPRINT Molecular QTL Data
Many of the common variants we discovered were non-coding (i.e., intronic, intergenic, in 5′ or 3′ untranslated regions, or just
upstream or downstream of genes), suggesting they may act through regulatory mechanisms. To investigate this, we tested
colocalization of the 29.5 million variants we included in our GWAS of blood indices with BLUEPRINT molecular QTL data (Table S6) using
the software SMR (Summary data-based Mendelian Randomization) (Zhu et al., 2016). The BLUEPRINT QTL data consist of
expression QTL (eQTL), splicing QTL (sQTL) and QTL for the histone mark H3K4me1 (hQTL), which identifies sites of active or poised
enhancers, in 200 European samples (Chen et al., 2016). Data were available for monocytes, neutrophils and T cells; hence, we restricted
our annotation to loci that were associated with myeloid or lymphoid cell indices. SMR takes the variant with the most statistically
significant association with each QTL (defined as p < 5×10⁻⁸), then tests whether the ratio of that variant’s effect size with the QTL
against its effect size with each myeloid or lymphoid index is significant (p < 0.001). Having established the presence of a QTL and a
blood cell index association at the same locus, the software then tests whether this apparently colocalized signal is the result of linkage
(i.e., two independent signals in the same genomic region) or causality/pleiotropy (i.e., the same causal variant affects both the QTL
and the blood cell index). This is performed via the Heterogeneity In Dependent Instruments (HEIDI) test statistic, which assesses the
homogeneity of the ratio across variants in the region, with p > 0.05 indicating colocalization (Figure 6).
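The SMR step rests on a simple approximate statistic described by Zhu et al. (2016): with z_zx and z_zy the z-scores of the top QTL variant for the molecular trait and for the blood index, the effect estimate is b_xy = b_zy / b_zx and T_SMR = z_zx² z_zy² / (z_zx² + z_zy²) is referred to a χ² distribution with one degree of freedom. The sketch below computes only this statistic; it is not the SMR software, it omits the HEIDI test, and the function name is hypothetical.

```python
from scipy.stats import chi2


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Approximate SMR test for one top QTL variant z.

    b_zx, se_zx: effect of z on the molecular trait (QTL).
    b_zy, se_zy: effect of z on the blood cell index.
    Returns (b_xy, T_SMR, p).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    return b_zy / b_zx, t_smr, float(chi2.sf(t_smr, df=1))


b_xy, t_smr, p = smr_test(1.0, 0.1, 0.5, 0.05)
print(b_xy, t_smr, p < 0.001)  # 0.5 50.0 True
```

Because T_SMR is never larger than the smaller of z_zx² and z_zy², a significant SMR result requires strong evidence for both the QTL and the blood index association.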
Figure S1. Adjustment for Technical Covariates Affecting Full Blood Count Measurements, Related to Figure 1, Tables S1 and S2, and the
STAR Methods
(A) Day averaged measurements of MCV taken from a single instrument over the course of UK Biobank baseline recruitment. The discontinuities may have been
generated by calibration of the machine against a variable deterministically related to MCV. Continuous drift is visible within some of the piecewise continuous
segments. The left plot is obtained using the raw data while the right plot is obtained using the technically adjusted trait, showing elimination of discontinuities and
drift.
(B) The effect of the time of day of acquisition on the average measurement of MONO%. Data are taken from a single Coulter instrument over the full UK Biobank
baseline recruitment period. The left plot is obtained using the raw data while the right plot is obtained using the technically adjusted trait, showing elimination of
the dependence of the mean of MONO% on time of day.
(C) Example of the effect of time delay between venipuncture and acquisition on the measurement of the mean white blood cell count. Each point gives the
average WBC# for samples acquired during baseline UK Biobank recruitment on a single Coulter instrument during a fifteen-minute delay interval. The boundaries
of the shaded region interpolate the 95% confidence intervals of the means. The left plot is obtained using the raw data while the right plot is obtained using the
WBC# trait data that has been adjusted for the technical covariates. The dependence of the mean cell count on delay time has been eliminated.
(D) Percentages of the variance of each UK Biobank measured variable explained by the adjustment for technical covariates and seasonal drift on the relevant
adjustment scale. Integer labels show the effective number of additional samples gained from making the technical adjustments, meaning the expected number
of additional samples that would be required to obtain equivalent p values in a GWAS for the trait if the adjustment were not made.
(E) As for (D) except for INTERVAL.
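One simple way to translate variance removed into "additional samples" assumes that a variant's association χ² statistic scales approximately as N·β²·Var(G)/σ², where σ² is the residual phenotypic variance. Removing a fraction f of σ² that is independent of genotype then multiplies the statistic by 1/(1 − f), which is equivalent to a sample of size N/(1 − f). This back-of-envelope sketch is an assumption for illustration, not necessarily the calculation behind panels (D) and (E); the function name and the 2% figure in the usage line are hypothetical.

```python
def effective_extra_samples(n_samples: float, frac_variance_removed: float) -> float:
    """Extra samples N*f/(1-f) under the approximation described above.

    Assumes the removed variance is independent of genotype and that the
    association chi-square statistic is inversely proportional to the
    residual variance.
    """
    if not 0.0 <= frac_variance_removed < 1.0:
        raise ValueError("frac_variance_removed must lie in [0, 1)")
    return n_samples * frac_variance_removed / (1.0 - frac_variance_removed)


# e.g. a hypothetical 2% of variance removed in a study of 87,265 participants
print(round(effective_extra_samples(87_265, 0.02)))  # 1781
```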
Figure S2. Adjustments for Sex and for Biological and Environmental Covariates Affecting Full Blood Count Measurements, Related to
Figure 1, Tables S1 and S2, and the STAR Methods
(A) The dependence of mean neutrophil count on sex and menopause status in the UK Biobank data adjusted for technical effects. The top plot is obtained using
the raw data while the bottom plot is obtained adjusting the data for menopause and sex effects showing the elimination of the variance these covariates explain.
(B) Day averaged measurements of neutrophil count taken from a single instrument over the course of the UK Biobank baseline recruitment. There is a long run
upward drift in the average count over time. Seasonal oscillation in the average counts is also visible. The top plot is obtained using the raw data while the bottom
plot is obtained using the technically adjusted data, showing the elimination of drift and seasonal oscillation.
(C) Percentage of variance of UK Biobank traits explained (on the relevant adjustment scale) by sex and covariates affecting full blood counts, including age,
menopausal status, smoking and alcohol variables.
(D) As for (C) except for INTERVAL traits.
(E) Illustration of the method used to determine the weight of evidence that heterogeneity in effect sizes across the three studies exceeded a tolerance criterion.
The axes represent effect sizes in UK Biobank, INTERVAL and UK BiLEVE. The black dot represents the vector of study-specific effect size estimates (β̂_UK Biobank, β̂_INTERVAL, β̂_UK BiLEVE) for a variant. If the dot lies inside the infinite yellow double-pyramid (defined by three planes intersecting the origin, each normal to one of n1 = (1, −1/4, −1/4), n2 = (−1/4, 1, −1/4), n3 = (−1/4, −1/4, 1)), we consider that there is no evidence of between-study heterogeneity. If the black dot lies outside the
yellow double-pyramid we measure the strength of evidence for heterogeneity as the distance between the black dot and the nearest point on the surface of
the pyramid (red dot), with distances scaled to account for the standard errors of the study specific estimators. The nearest point on the pyramid is thus defined as
the point in the smallest confidence surface for the estimators that intersects the pyramid (blue ellipsoid). We thresholded the distance score at 5.2 and filtered all
variant-blood index pairs exceeding the score from further analysis.
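The heterogeneity score in (E) can be sketched as a constrained least-squares problem: find the point of the double pyramid closest to the estimate vector in standard-error-scaled distance, and report that distance. The sketch assumes independent study-specific estimates (so the confidence ellipsoid is axis-aligned), assumes the plane normals are n1 = (1, −1/4, −1/4), n2 = (−1/4, 1, −1/4) and n3 = (−1/4, −1/4, 1), and solves each half of the double pyramid with SciPy's SLSQP optimizer; these choices and the function name are assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed plane normals (rows) defining the pyramid {x : NORMALS @ x >= 0}
NORMALS = np.array([[1.0, -0.25, -0.25],
                    [-0.25, 1.0, -0.25],
                    [-0.25, -0.25, 1.0]])


def heterogeneity_distance(beta, se, normals=NORMALS):
    """SE-scaled distance from the effect vector to the double pyramid (0 if inside)."""
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    proj = normals @ beta
    if np.all(proj >= 0) or np.all(proj <= 0):
        return 0.0  # inside one half of the double pyramid: no heterogeneity
    best = np.inf
    for sign in (1.0, -1.0):  # the two halves of the double pyramid
        cons = {"type": "ineq", "fun": lambda x, s=sign: s * (normals @ x)}
        res = minimize(lambda x: np.sum(((beta - x) / se) ** 2), x0=np.zeros(3),
                       method="SLSQP", constraints=[cons])
        if np.all(sign * (normals @ res.x) >= -1e-8):  # keep feasible solutions only
            best = min(best, float(res.fun))
    return float(np.sqrt(best))


d = heterogeneity_distance([1.0, 0.0, 0.0], [0.1, 0.1, 0.1])
print(round(d, 2), d > 5.2)  # approximately 4.26, below the 5.2 threshold
```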
[Figure: genotype QC and imputation flowcharts. (A) INTERVAL: post-QC dataset → pre-imputation variant exclusions (HWE p value < 5×10⁻⁶, call rate < 99% in passed batches, call rate < 75% across all 10 batches, monomorphic variants) → dataset for imputation (43,059 samples, 654,966 variants) → imputation (87,696,910 variants) → post-imputation variant exclusions (info score < 0.4, MAF < 0.0001, failed these filters in UK BiLEVE or UK Biobank) → association analysis (~29.5M variants). (B) UK Biobank + UK BiLEVE: raw genotype data (153,293 samples in 33 batches, 820,967 variants) → batch-specific variant exclusions (duplicate probesets, standard Affymetrix criteria fails, non-autosomal, multiallelic, call rate < 95%, within-batch plate effects, HWE deviation) and batch-specific sample exclusions (sex mismatches, duplicates, heterozygosity outliers, high missingness) → additional variant exclusions (batch effects, …) → imputed data release (151,733 samples, 72,355,667 variants) → analysis dataset (132,959 samples: UK Biobank 87,265; UK BiLEVE 45,694) → post-imputation variant exclusions (info score < 0.4, MAF < 0.0001, failed these filters in either of the other two studies) → association analysis (~29.5M variants each).]
[Figure: SnapShot schematic of (1) bisulfite sequencing, in which 5mC and its oxidative derivatives (e.g., 5hmC) are measured genome-wide by bisulfite conversion (quantitative for 5mC, unable to distinguish 5hmC) or antibody enrichment (qualitative for 5mC and 5hmC), followed by library construction, clonal sequencing and alignment of bisulfite-converted reads with specialized algorithms; and (2) ChIP-seq, in which histones are liberated by sonication (after chemical cross-linking), enzymatic digestion or transposon insertion (not shown), modified histones are enriched by immuno-absorption, and the associated DNA is purified, sequenced and aligned to a reference genome.]
Massively parallel sequencing technologies provided the foundation from which the field of epigenomics has been built. This SnapShot depicts key sequencing-based
methods used in the analysis of epigenomes, including (1) bisulfite sequencing, (2) chromatin immunoprecipitation sequencing (ChIP-seq), (3) determination of open chromatin,
and (4) 3D chromatin capture.
Bisulfite Sequencing
Genome-wide 5-methyl-cytosine (5mC) is measured by enrichment-based methods and direct sequencing of bisulfite-converted DNA. Enrichment methods provide
qualitative measures of 5mC (Down et al., 2008), are capable of distinguishing oxidative derivatives (e.g., 5hmC), and can be combined with methyl-restriction-based assays to
improve detection of unmethylated cytosines (Maunakea et al., 2010). Enrichment-based methods require 50–100 million sequence reads per sample. Direct sequencing of
bisulfite-treated DNA provides quantitative measurements of 5mC genome-wide but is unable to distinguish oxidized derivatives. Two main strategies for library construction
have emerged for bisulfite sequencing. In the first method (shown), a genomic library with adapters added is subjected to bisulfite treatment (Lister et al., 2009). In the second
(not shown), genomic DNA is first bisulfite-converted and then subjected to library construction (Miura et al., 2012). For conversion-based methods, 1 billion 100-nt sequence reads
are generated for each sample, and library construction methods introduce distinct library-specific biases in genome coverage.
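The quantitative read-out of bisulfite sequencing comes from comparing C and T calls at each reference cytosine. The toy sketch below assumes a single strand, complete conversion and error-free reads; the function names are hypothetical. Both 5mC and 5hmC are protected from conversion and read as C, which is why this read-out cannot separate them.

```python
def bisulfite_convert(sequence, methylated_positions):
    """In silico bisulfite conversion of one strand.

    Unmethylated cytosines are read as T after conversion and amplification;
    cytosines at `methylated_positions` (0-based) are protected and read as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence.upper())
    )


def methylation_level(c_reads, t_reads):
    """Fraction methylated at one reference cytosine: C / (C + T) among covering reads."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this cytosine")
    return c_reads / total


print(bisulfite_convert("ACGTCCG", {1}))  # ACGTTTG
print(methylation_level(3, 1))            # 0.75
```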
ChIP-Seq
Genomic locations of modified histones are detected genome wide by ChIP-seq (Barski et al., 2007). Typically, 25–50 million immunoprecipitated fragments are sequenced
for a histone mark. The two main strategies that have emerged to release nucleosomes from chromatin associated with 100–300 bp genomic fragments are shown. In addi-
tion, strategies that utilize transposon-based fragmentation of chromatin have recently become available (Schmidl et al., 2015).
REFERENCES
Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G., Chepelev, I., and Zhao, K. (2007). Cell 129, 823–837.
Bonev, B., and Cavalli, G. (2016). Nat. Rev. Genet. 17, 661–678.
Boyle, A.P., Davis, S., Shulha, H.P., Meltzer, P., Margulies, E.H., Weng, Z., Furey, T.S., and Crawford, G.E. (2008). Cell 132, 311–322.
Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y., and Greenleaf, W.J. (2013). Nat. Methods 10, 1213–1218.
Crawford, G.E., Holt, I.E., Whittle, J., Webb, B.D., Tai, D., Davis, S., Margulies, E.H., Chen, Y., Bernat, J.A., Ginsburg, D., et al. (2006). Genome Res. 16, 123–131.
Down, T.A., Rakyan, V.K., Turner, D.J., Flicek, P., Li, H., Kulesha, E., Gräf, S., Johnson, N., Herrero, J., Tomazou, E.M., et al. (2008). Nat. Biotechnol. 26, 779–785.
Lister, R., Pelizzola, M., Dowen, R.H., Hawkins, R.D., Hon, G., Tonti-Filippini, J., Nery, J.R., Lee, L., Ye, Z., Ngo, Q.-M., et al. (2009). Nature 462, 315–322.
Maunakea, A.K., Nagarajan, R.P., Bilenky, M., Ballinger, T.J., D’Souza, C., Fouse, S.D., Johnson, B.E., Hong, C., Nielsen, C., Zhao, Y., et al. (2010). Nature 466, 253–257.
Miura, F., Enomoto, Y., Dairiki, R., and Ito, T. (2012). Nucleic Acids Res. 40, e136.
Schmidl, C., Rendeiro, A.F., Sheffield, N.C., and Bock, C. (2015). Nat. Methods 12, 963–965.
1430.e1 Cell 167, November 17, 2016 © 2016 Elsevier Inc. DOI http://dx.doi.org/10.1016/j.cell.2016.11.015