Cell - 17 November 2016

Leading Edge
Editorial
A Cornucopia of Advances in Human Epigenomics

In 21 papers published this week across Cell Press journals, including 5 in this issue of Cell, the International Human Epigenome Consortium (IHEC) pushes forward our understanding of nucleosome positioning and modifications to DNA and histones in primary human cells. Collectively, these papers illuminate how changes to chromatin contribute to cell-type-specific biology, development, variation between individuals, and disease.

The editors of Cell Press are pleased to present these papers to readers in a unified way. The insights and impact of the collection of papers are truly more than the sum of the individual parts, and we hope that contemporaneous publication facilitates appreciation of the synergies between the different comprehensive datasets, innovative methodological approaches, and rich biological observations. While the papers treat a relatively broad range of topics in terms of biological processes, diseases, and tissues, conceptually they converge on several themes highlighting how epigenomics research is forging ahead to provide us with a deeper understanding of what all these “marks” and their positions in a genome really mean.

To begin with, while the ability to determine the genetic sequence of single cells was a sterling accomplishment, achieving single-cell resolution for epigenomic data presents different and, in many ways, greater challenges. Several of the IHEC papers make significant progress in this area by presenting approaches and high-resolution data that disentangle heterogeneity both within different cell types of a tissue and between regulatory elements in a clonal population of cells.

Next, an important aim of collecting and analyzing epigenome data is to reveal aspects of the underlying biology and its relevant molecular mechanisms. A set of papers in the IHEC collection capitalize on epigenomic analysis combined with other datasets to identify transcription factors that play key roles in cell fate determination and oncogenesis. In a different twist, and underscoring the potential clinical relevance of this kind of integrative analysis, a pair of papers reveals the role of certain metabolites and pathogen cues in eliciting epigenetic changes in the context of an immune response. One of these cues was indeed shown to boost the innate immune response.

Still other papers in the package tackle the challenge of pinning down the effects of noncoding variants identified in genome-wide association studies. Do they affect gene expression? If so, which genes? The IHEC studies apply epigenomic information from primary cells to address these questions using all or a combination of genetic, epigenomic, transcriptional, and chromosome contact data. Integrating multilayered data generates compelling hypotheses regarding the molecular outcomes of genetic variants, including whether the variants are likely to cause alterations in splicing or gene expression and what their target “regulatees” are.

Finally, as we’ve seen time and time again, new ways of obtaining, parsing, and visualizing data move fields forward. Advances in this area represented by the papers in the Cell Press/IHEC package include applying soft X-ray tomography to the visualization of changes in chromatin compaction and distribution during the course of development and an approach to analyzing epigenome-wide association studies that identifies cell-type-specific, disease-relevant signals. To facilitate discovery, sharing, integration, and analysis of data from the IHEC project, two web-based platforms have been developed and are described in separate papers. Related to data sharing, a commentary in this issue discusses different approaches geared toward striking a balance between ease of data access to support scientific progress with personal privacy concerns and proper credit attribution for those generating the experimental data.

For a more detailed description of the papers illustrating the concepts above and for references to the specific papers that present the different findings, please read the commentary prepared by the IHEC Consortium authors in this issue. And for a perspective on the consortium’s endeavors and future directions from one of its members, check out the audio interview on the Cell homepage. In addition, we’ve created a special interactive web portal (www.cell.com/consortium/ihec) where you can both browse the entire collection and zero in on specific areas of interest. The site illustrates the extensive international collaborations that are behind this body of work. Additionally, there’s a two-volume digital edition that you can download from this site that collates all of the research articles and associated commentaries published across Cell Press journals. And lastly, for an editorial perspective on consortium science and arranging coordinated publication, check out the November 17th blog post at http://crosstalk.cell.com. We hope you enjoy the collection!
The Cell editorial team

http://dx.doi.org/10.1016/j.cell.2016.11.001
Cell 167, November 17, 2016 © 2016 Published by Elsevier Inc. 1139
Leading Edge
Analysis
A Scientist and a Journalist Walk into a Bar...

Who are science journalists, and how can journalists and research scientists work together to improve science communication?

In 2011, science journalist Mark Johnson and his colleagues from the Milwaukee Journal Sentinel won a Pulitzer Prize for their series about a 4-year-old boy with a mysterious and life-threatening intestinal disease.

The journalists described how doctors advocated to have the boy’s exome sequenced in order to diagnose him and how the results showed a previously unknown, single genetic mutation. For which Pulitzer category did the journalists win? The one for explaining stuff. Because it ends up that accurately and clearly explaining stuff is difficult.

Science journalists and science writers regularly do just that: they accurately and clearly explain stuff about science in a compelling way for the general public.

“We need to have a way to talk about both the fear and the promise [of scientific technology], without people forming into factions. The only way that can happen is if lots of people are talking about it all the time.”

Who Are Science Writers?
Science writers don’t have one specific type of educational background; they arrive at science writing from a variety of paths and educational backgrounds, but as a group, they tend to be generalists. They’re curious and want to know how the world works. In high school, they might have read Scientific American and may have adored Stephen Jay Gould. Many studied science in college, while others studied journalism or English. Some discovered that they felt clumsy and out of place in the lab or didn’t have the patience for repetitious lab work. One science writer said she was a “lethal menace” in the lab; another had nightmares about pipetting.

Many science writers feel compelled to write, and most have questions about the world that they need to answer. One science writer calls herself a “professional nerd and question-asker.” Someone who “writes about science” could be a grant writer, an educational curriculum developer, or a public information officer writing press releases for universities and research organizations, but we’ll focus on science writers who write for the general public in newspapers, magazines, books, and blogs.

When science writers first discover that they can combine their two loves of science and writing, many are surprised and thrilled and consider it a “happy marriage,” the “best job in the world.” Their science background has taught them to focus on an area of interest and to ask key questions. Their writing skills enable them to tell engaging stories, using vivid details, emotion, and drama. Their journalism training teaches them to look at topics from multiple angles and to place today’s scientific discovery into a wider body of knowledge. Overall, they work to bridge the gap between the non-scientist and the scientist.

Robin Marantz Henig, the immediate past president of the National Association of Science Writers (NASW), says that “not every [scientific] study is definitive; in fact, no one study is definitive,” and sometimes, one study contradicts a previous study. “That’s just the way science operates. [But] when there’s a back-and-forth kind of thing, the public throws up its hands and says science doesn’t know anything.”

It’s the job of science writers to explain this incremental, back-and-forth way that science operates and to explain many of the other basics of science, such as how experiments are designed, what peer review is and isn’t, and how theories are evaluated based on their explanatory power.

“This is a country where half of the people don’t believe in evolution,” says Michael Specter, a science writer for The New Yorker. “It doesn’t matter how good [the] science is if the people ... don’t believe it, don’t accept, don’t think it’s important.”

The public funds science through their taxes and science writers would like to bring those taxpayers into the conversation by explaining what a new scientific discovery means and why it matters. Ellen Ruppel Shell, director of Boston University’s master’s program in science journalism, says, “Most of [our students] are very thoughtful people who come into this because they want to do good in the world; they want to see positive change. [They] ... are really concerned with the public understanding of science and want to contribute in a positive way.”

Journalists need to tell the public “why science is important, interesting, and relevant to their lives. I’d say that’s the number one job we have,” says Shell.

Specter, who is writing a book about CRISPR and other approaches to gene editing, spent three months at the Broad Institute this summer working directly with geneticists. He’s worried that there are not enough public conversations about science. “Technology moves faster than our ability to deal with it,” says Specter, “and now we’re ... on the verge of being capable of doing really freaky things with genetics. Those freaky things are exciting, but they’re also scary. We need to have a way to talk about both the fear and the promise, without people ... forming into factions. The only way that can happen is if lots of people are talking about it all the time.”

Explaining how science operates, describing relevant new findings, and bringing everyone into the conversation is a big job. Unfortunately, journalists regularly get the science wrong or they overdramatize an incremental discovery
or they leave out the larger meaning of a discovery. Clearly, science writers can’t do this job alone. So, how can research scientists work with journalists to improve science communication to the general public? Here are some key ways that scientists can help.

1. Spend Time with Science Writers
One way for scientists to engage in larger, public conversations is to simply spend time with science journalists. They can look for science writers who cover their area of research and connect with them by commenting on their articles publicly or by contacting them privately. To find science journalists, scientists can read STAT, Quanta Magazine, The Open Notebook, The Last Word on Nothing (a group blog), or Mosaic, a science magazine published by the Wellcome Trust. Scientists can offer to discuss their research specifically or the world of science more generally.

Scientists who see themselves as part of the larger, public project of science can make a big difference. “If you want to encourage [public] funding, obviously, you have to reach out in partnership with the taxpayers,” says Shell. “By being available to journalists and by being open to these conversations, you can minimize the misinformation that gets out [as well as] the anti-science sentiment, which some people think is growing.”

In addition to connecting with individual journalists, scientists could contact a university’s science journalism program and offer to speak to a class or mentor a journalist, or they could contact a professional organization, such as the NASW or the Council for the Advancement of Science Writing (CASW). These two organizations hold a joint meeting each year to discuss the craft of writing and to learn about new advances in science. CASW’s “New Horizons in Science” program consists of briefings from top scientists on near-future research, especially those topics that are expected to stir up controversy society-wide. The goal “is to have a really sustained interaction between scientists and journalists in a setting where they can spend some time together, have a lot of conversations, and have formal presentations on new areas of research,” says Rosalind Reid, executive director of CASW. “Our sessions are highly interactive, and many times, the scientists say, ‘This is one of the most fascinating speaking engagements I’ve had.’ ... We encourage the scientists to stick around for a couple of hours after they give a talk so that there can be lots and lots of questions.”

Closer to home, scientists can attend events hosted by regional groups, such as the Northern California Science Writers Association, the New England Association of Science Writers, or the Philadelphia-area Science Writers Association. What better way to connect informally with journalists than to go on a museum tour or a nature hike with them or attend a book talk and have a beer with them?

Another place where scientists and journalists are learning from each other is the Kavli Conversations on Science Communication, hosted by New York University’s Science, Health and Environmental Reporting Program. In a series of live conversations, a scientist and a journalist discuss a specific scientific topic and how best to communicate that topic to the public. Recent topics include the microbiome, genetic engineering, animal cognition, and, on a lighter note, the humor in science.

Because science moves so quickly and is so technically complicated, science writers struggle to keep up. Having a scientist colleague as a behind-the-scenes resource would allow a journalist to feel comfortable asking “dumb” questions in a nonjudgmental environment.

The Marine Biological Laboratory’s Logan Science Journalism Program creates just this sort of nonjudgmental environment. The program allows working journalists to immerse themselves in a scientific community, where they “get unfiltered, unrestricted access to scientists,” says Brad Shuster, biologist at New Mexico State University, who has taught the ten-day biomedical research course for 6 years.

“We’re together for 12–14 hours a day, and we have informal discussions about anything they want: politics, the ethics of funding, the state of science journalism,” says Shuster. Their day starts with coffee and a short lesson at the white board and then lab work all day. The journalists use sea urchins to study early development, and they perform genetic screens on yeast, looking for interesting mutations. The scientists and journalists eat meals together, attend community-wide talks, and return to the lab after dinner.

Photo caption: Journalists in the 2016 Marine Biological Laboratory hands-on research course learn to pipette. Photo courtesy of Brad Shuster.

Phong Tran, biologist at the Perelman School of Medicine at the University of Pennsylvania and the Institut Curie, has taught the biomedical course for three years. His goal is to “get the students to think like a scientist” by designing their own projects and answering novel research questions. Tran has discovered that journalists, like scientists, have “an inherent curiosity to know things” and to get to the bottom of an issue by conducting extensive, detailed research. He started teaching the course because he “wanted the
experience of interacting with journalists, people outside of [his] immediate circle [which is] mostly scientists.”

One of the participating journalists, Sudhi Oberoi, a science writer at the Indian Institute of Science in Bangalore, believes that for communication to be effective between a science writer and a scientist, it “must be free and open” and that the MBL course organizers and instructors “fully embodied this ethos.”

2. Learn Some Storytelling Skills
Another way research scientists can collaborate with journalists to improve science communication is to learn the craft of storytelling themselves. Scientists can use storytelling tools (narration, a compelling conflict, a personal anecdote, and everyday language) to describe the excitement of a scientific discovery or the frustration of cancer cells “winning” a battle against experimental drugs to make the story memorable.

One storytelling tool is the metaphor: a concrete, familiar example to describe a more abstract, unfamiliar concept. For example, the phrase “unzipping like a zipper” is often used to describe what happens when a DNA strand undergoes the first step of DNA replication. And when neurobiologist Michael Graziano describes what happens in the brain during the mental act of attention, he borrows imagery from a familiar scene: “Neurons act like candidates in an election, each one shouting and trying to suppress its fellows.” Scientists should come up with their own helpful metaphors to describe their research and use these metaphors when talking with journalists and the public.

Another tool journalists use is concrete, everyday language without jargon. Scientists can practice using this tool by imagining describing their research to a young person. To motivate scientists to do just that, the Alan Alda Center for Communicating Science at the Stony Brook University School of Journalism hosts an annual competition called “The Flame Challenge: Explaining Science to an 11-Year-Old.” The contest, which started in 2012, asks scientists to answer a basic scientific question in an understandable and engaging way. It’s named after the first year’s challenge question: “What is a flame?” Scientists submit their answers as a video or short article, and, yes, the entries are actually judged by 11-year-olds!

The Alan Alda Center provides unusual courses and workshops (which they also take on the road), including improvisational acting classes, to teach scientists how to use passion and energy to better connect with the public when talking about science.

Another organization that helps scientists with storytelling skills is ComSciCon, founded by science graduate students for science graduate students. Their national conferences include lively workshops, panel discussions with journalists and educators, and one-on-one mentoring. Before arriving, attendees prepare a draft article of their research for a general audience, a poster abstract, a lesson plan for school children, and a 1-minute “pop talk,” a short description of their research, told in an engaging way.

During the conference, attendees give their pop talk to the group. Behind them, a large screen displays a ticking bomb counting down the seconds. Whenever the speaker uses a technical term, audience members hold up an orange “jargon” sign. Whenever the speaker uses simple, understandable language and displays energy and passion, audience members hold up a green “awesome” sign. “You get real-time feedback” on your talk, says Steven C. Pan, a PhD candidate in psychology at the University of California, San Diego, who attended the 2015 ComSciCon conference. “It was really fun.”

Photo caption: Attendees at ComSciCon 2013 give feedback to a speaker. Photo courtesy of ComSciCon.

During the conference, Pan worked on an article about his research and later submitted it to Scientific American. He was thrilled to have it published 2 months after the conference. “I found that the pace of writing this 1,000-word piece was much slower than writing standard manuscripts for academic journals,” says Pan. In academic writing, Pan says he can rely on all the standard terminology of the field, terms that are “tremendous time-savers” because the academic audience easily recognizes them. “But when I’m writing for Scientific American, you cannot assume that someone will know that particular jargony term, so everything has to be explained in plain language. It’s really such a different style of writing.”

Another tool that can help scientists communicate is practicing looking at the world through the eyes of a non-scientist. Dan Fagin, director of NYU’s master’s program in science journalism and director of communication workshops for NYU scientists, says that one of the biggest challenges for scientists writing for the public is cultivating a “basic empathy” for the reader. “We teach [scientists] in our science communication workshops ... to try to get outside yourself as the communicator and think carefully about the experience of the person who is being communicated to. That’s not easy to do, but it’s really important.”

Picturing one’s reader and tailoring one’s writing to the needs of that reader is essential for all types of communication. Having a profound empathy for the reader or the listener or the viewer, and wanting to make their experience enjoyable, can result in creative and engaging communication.

3. Speak Directly to the Public
To improve public communication of science, scientists can also speak directly
to the public. Specter insists that all scientists should discuss their work with the public on a regular basis, even if it’s basic research. If the research is publicly funded, there needs to be much more openness, he says. “I regularly visit places like the Broad [Institute], Stanford, and NIH, where researchers are making remarkable discoveries, and [scientists] say, ‘Isn’t it enough that we’re doing these amazing things? Do we also have to explain to people why amazing things are good for them?’ And unfortunately, the answer is, ‘Yes, you do! It’s the same reason you have to explain why vaccines work.’”

Pan agrees that speaking to the public is an important aspect of a scientist’s role, even if it’s not valued within the academy. “In promotion and tenure decisions, it’s still more about the papers that you publish in academic journals,” says Pan. “Outreach efforts are still considered sort of gravy on top of that. ... But from the perspective of public service and making a real impact, it’s essential.”

One of the remarkable results of scientists hanging out with journalists is that scientists sometimes get “infected with the desire to write, themselves,” says Reid of CASW. “I remember a physicist who came to the 2012 New Horizons [meeting] and just decided to stay for 3 or 4 days, and [he] basically made a lot of friends among the writers.” With their encouragement, he explored Twitter. “He started tweeting and they were re-tweeting his stuff, so it kind of took off immediately.”

Some options for scientists to explore include: tweeting about their own research; volunteering to tell their story at The Story Collider, a weekly podcast of “true personal stories about science”; participating in the Secret Science Club in Brooklyn, a live series of lectures and performances based on science; listening to and promoting Inquiring Minds, another weekly podcast, hosted by a neuroscientist and a science educator, which explores scientific topics affecting politics and society; and attending or starting a Science Café (also called “Science on Tap”), informal gatherings in bars, restaurants, cafés, or bookstores, where anyone can come listen to a guest scientist and participate in the discussion. There’s a myriad of ways that scientists can use their creativity and talents to reach out to the public.

Once upon a time, a scientist and a journalist walked into a bar, started a dialog, and created a magazine article and a lecture series and an improv class and a podcast and a journalism workshop and a newsletter and a conference and a website and a ...
Susan Matheson
Cambridge, MA, USA

Leading Edge
Bench to Bedside
Exon Skipping Therapy

Courtney S. Young1 and April D. Pyle1,2,*
1Molecular Biology Interdepartmental Program, Center for Duchenne Muscular Dystrophy, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095
2Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095
*Correspondence: apyle@mednet.ucla.edu
NAME: Exondys 51 (Eteplirsen)
APPROVED FOR: Patients with Duchenne muscular dystrophy who have a confirmed mutation applicable to exon 51 skipping
TYPE: Phosphorodiamidate morpholino oligomer (PMO)
MOLECULAR TARGETS: RNA transcript of DMD, exon 51
CELLULAR TARGETS: Skeletal muscles expressing DMD transcript
EFFECTS ON TARGETS: Causes DMD transcript exon 51 to be skipped, putting the RNA back in frame and creating an internally deleted but somewhat functional dystrophin protein
DEVELOPED BY: Sarepta Therapeutics

Exondys 51 is the first therapy for Duchenne muscular dystrophy (DMD) to have been granted accelerated approval by the FDA. Approval was granted based on a <1% increase in dystrophin expression as a surrogate marker. Exondys 51 targets DMD exon 51 for skipping to restore the reading frame for 13% of Duchenne patients.

Figure (skeletal muscle fiber): The DMD transcript with a patient mutation lacking exon 50 contains exons 48, 49, 51, 52, 53, and 54. The Exondys 51 PMO targets exon 51. With exon 51 skipped, the DMD transcript reading frame is restored, leaving exons 48, 49, 52, 53, and 54.

DMD by the Numbers
Duchenne muscular dystrophy affects 1 in 5,000 male births
Average lifespan: 27 years
Exondys 51 is applicable for 13% of Duchenne boys

After 4 Years of Exondys 51 Treatment
Boys still walking: external control 15%, Exondys 51 83%
Exondys 51 treated patients walk 165 meters further in 6 minutes

Timeline
1987: Cloning of the DMD gene
1988: Out-of-frame hypothesis
1996: First demonstration of exon skipping
2003: PMO exon skipping in Duchenne model
2012: Sarepta’s Phase IIB 48-week results
2014: Sarepta’s Phase III trial ongoing
2016: Accelerated approval granted for Duchenne
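
The reading-frame logic behind exon skipping comes down to simple arithmetic: codons are three nucleotides long, so a transcript stays in frame only if the total number of missing nucleotides is a multiple of three. The Python sketch below is illustrative and not part of the original article. The exon lengths it uses (109 nt for exon 50, 233 nt for exon 51) are assumed values that should be checked against a reference annotation, and the function name `frame_preserved` is hypothetical.

```python
# Sketch: does removing a set of exons keep the downstream reading frame?
# Exon lengths below are ASSUMED values for illustration only; check them
# against a reference annotation of the DMD gene before relying on them.
ASSUMED_EXON_LENGTHS_NT = {50: 109, 51: 233}


def frame_preserved(missing_exons, lengths=ASSUMED_EXON_LENGTHS_NT):
    """Return True if the total missing length is a multiple of 3 nucleotides."""
    missing_nt = sum(lengths[exon] for exon in missing_exons)
    return missing_nt % 3 == 0


# Patient transcript lacking exon 50 only.
deletion_only = frame_preserved([50])
# Same transcript with exon 51 also skipped.
deletion_plus_skip = frame_preserved([50, 51])
print(deletion_only, deletion_plus_skip)
```

Under these assumed lengths, losing exon 50 alone removes 109 nucleotides and shifts the frame, while losing exons 50 and 51 together removes 342 nucleotides, which is divisible by three. That matches the outcome described above: an internally deleted but in-frame transcript.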
References for further reading are available with this article online: www.cell.com/cell/fulltext/S0092-8674(16)31511-2

Leading Edge
Essay
The International Human Epigenome Consortium:

A Blueprint for Scientific Collaboration and Discovery
Hendrik G. Stunnenberg,1,* The International Human Epigenome Consortium,4 and Martin Hirst2,3,*
1Department of Molecular Biology, Faculties of Science and Medicine, Radboud University, Nijmegen, 6525AG, the Netherlands
2Department of Microbiology and Immunology, Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
3Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC V5Z 4S6, Canada
4http://ihec-epigenomes.org/
*Correspondence: h.stunnenberg@ncmls.ru.nl (H.G.S.), mhirst@bcgsc.ca (M.H.)

The International Human Epigenome Consortium (IHEC) coordinates the generation of a catalog of
high-resolution reference epigenomes of major primary human cell types. The studies now pre-
sented (see the Cell Press IHEC web portal at http://www.cell.com/consortium/IHEC) highlight
the coordinated achievements of IHEC teams to gather and interpret comprehensive epigenomic
datasets to gain insights in the epigenetic control of cell states relevant for human health and
disease.
One of the great mysteries in developmental biology is how the same genome can be read by cellular machinery to generate the plethora of different cell types required for eukaryotic life. As appreciation grew for the central roles of transcriptional and epigenetic mechanisms in specification of cellular fates and functions, researchers around the world encouraged scientific funding agencies to develop an organized and standardized effort to exploit epigenomic assays to shed additional light on this process (Beck et al., 1999; Jones and Martienssen, 2005; American Association for Cancer Research Human Epigenome Task Force; European Union, Network of Excellence, Scientific Advisory Board 2008).

In March 2009, leading scientists and international health research funding agency representatives were invited to a meeting in Bethesda, Maryland (US), to gauge the level of interest in an international epigenomics project and to identify potential areas of focus. This meeting, and a subsequent conference in January 2010 in Paris (France), ultimately led to the creation of the International Human Epigenome Consortium (IHEC).

The primary goals of IHEC are to coordinate the production of reference maps of human epigenomes for key cellular states relevant to health and diseases, to facilitate rapid distribution of the data to the research community, and to accelerate translation of this new knowledge to improve human health. A critical component of IHEC is to coordinate the development of common bioinformatics standards, data models and analytical tools to organize, integrate and display the epigenomic data generated.

IHEC members all contribute to these primary goals, but they also have individual complementary goals such as developing new and improved ways to monitor or manipulate the epigenome, discovering new epigenomic mechanisms, training the next generation of epigenome researchers, exploring epigenomic features associated with disease states, and translating epigenomic discoveries into improvements to human health. This is in keeping with the larger overarching vision of IHEC, which is to help address fundamental questions in how the genome and environment interact during development and aging, and how the epigenome influences health and disease.

There are many strengths to a consortium model, bringing together research expertise and knowledge from across the world. These include the ability to implement and monitor high-quality data and assay standards and to maximize coverage of human cells and tissues while avoiding unnecessary duplication. Additionally, this model helps harmonize data collection, management, and analysis, to facilitate sharing and retrieval across countries and provides open access to data and results. IHEC provides a mechanism to facilitate communication among members and provides a forum for coordination with the objective of maximizing efficiency among researchers working to understand, treat, and prevent diseases. Current full members of IHEC include: AMED CREST/IHEC Team Japan; DLR-PT for BMBF German Epigenome Programme DEEP; CIHR Canadian Epigenetics, Environment, and Health Research Consortium (CEEHRC); European Union FP7 BLUEPRINT Project; Hong Kong Epigenomics Project; KNIH Korea Epigenome Project; NHGRI ENCODE; the NIH Roadmap Epigenomics Program; and the Singapore Epigenome Project (http://ihec-epigenomes.org/). In the subsequent sections, we overview experimental and computational tools developed by IHEC members and highlight key findings from a collection of recent publications from IHEC members.

Identifying Heterogeneity in Epigenomic Measurements
Cellular and allelic heterogeneity provides a significant challenge in the interpretation of epigenomic signatures that are typically derived from heterogeneous populations of millions of individual cells. To address this challenge, we have developed a series of molecular and computational approaches to deconvolute epigenomic signatures from heterogeneous populations. Three independent strategies are presented to explore the heterogeneity at bivalent domains, a ‘‘poised
Cell 167, November 17, 2016 ª 2016 Elsevier Inc. 1145

state’’ marking important developmental inferring epigenomic co-localization net- nomic states in both normal and patho-
genes characterized by an active (histone works (Juan et al., 2016), for program- logical tissues (Arts et al., 2016; Holland
H3 lysine 4 trimethylation, H3K4me3) and matic data access and filtering (Albrecht et al., 2016; Durek et al., 2016; Novakovic
a repressive (H3K27me3) mark on the et al., 2016), for analyzing the results et al., 2016). Memory of such external ex-
same histone, and reveal that this combi- of epigenome-wide association studies posures, coordinated at the chromatin
natory epigenetic signature is both lost (Breeze et al., 2016), for detecting ChIP- level, can influence future behavior of the
and gained at key regulatory genes during seq peaks (Hocking et al., 2016), and for cell and susceptibility to disease under
development (Lorzadeh et al., 2016; predicting transcription factor binding stress conditions.
Kinkley, et al., 2016; Weiner et al., 2016). (Schmidt et al., 2016). As part of IHEC’s Epigenomic profiles of normal cell
Further, these methods define previ- mission to develop quality standards types also provide a valuable comparator
ously undescribed co-occurrence pat- for epigenomic data, we have validated for their counterparts in diseased tissues.
terns of histone modifications on single the accuracy of epigenome assays Such comparisons have been performed
nucleosomes and in relationship with and proposed widely used quality stan- in solid tumors such as breast cancer
enzyme accessibility of chromatin. To dards for epigenome mapping (http:// (Pellacani et al., 2016) and extra-cranial
access the molecular information within ihec-epigenomes.org/). In addition, we malignant rhabdoid tumors (Chun et al.,
a diversity of interacting cell types in investigated the effect of sequencing 2016), hematological neoplasms such as
complex tissues we developed in silico depth on the accuracy of whole-genome mantle cell lymphoma (Queirós et al.,
deconvolution methods that provide bisulfite sequencing (Libertini et al., 2016), and chronic lymphocytic leukemia
estimates of genomic CpG methylation 2016a, Libertini et al., 2016b) and con- (Rendeiro et al., 2016). These analyses
and gene transcription within complex ducted a community-wide benchmarking have not only provided unprecedented in-
tissues, including solid tumors (Onuchic study comparing locus-specific DNA sights into disease pathogenesis but have
et al., 2016) and hematological neo- methylation assays across 18 labora- also enabled the stratification of diseases
plasms (Queirós et al., 2016). Finally, tories in seven different countries, estab- into novel clinico-biological subtypes.
a meta-epigenomic approach that com- lishing that DNA methylation profiling is On the one hand, pathological tissues
bines low-input and single-cell DNA accurate and robust enough for use as and cells exhibit epigenetic imprints
methylation sequencing gave rise to a a clinical biomarker (Bock et al., 2016). of the developmental or differentiation
comprehensive map of the DNA methyl- Finally, two studies have started to con- stages from which they originate and, on
ation dynamics of human hematopoietic nect epigenome regulation to the 3D the other hand, they acquire disease-
stem cell differentiation, experimentally structure of the nucleus, using high-reso- specific epigenetic alterations. Exciting
and bioinformatically accounting for epi- lution imaging (Le Gros et al., 2016) as well outcomes of these comparisons are the
genomic heterogeneity (Farlik et al., as computational methods for integrative identification of disease-specific regula-
2016). data analysis (Pancaldi et al., 2016). tors and distant enhancers regulating
oncogenes, the functional character-
New Computational Tools Bolster Epigenome Analysis Identifies ization of mutated/aberrantly expressed
the Utility of Epigenome Data for Pathways Involved in Cell Fate chromatin and transcriptional regulators,
Biology and Medicine Determination and Disease and how these might be profitably tar-
As of today, IHEC has generated over Recent technical advances allow the gen- geted by novel (Franci et al., 2016) as
7,000 datasets, which are publicly avail- eration of genome-wide signatures for pri- well as existing therapies (Nebbioso
able through several channels. For mary human cell types of increasingly et al., 2016; Chun et al., 2016; Mandoli
specialized analyses, the raw data files narrowly defined biological properties. et al., 2016).
containing personally identifiable data This provides new insights into the epige- These insights, together with the under-
can be obtained under the controlled netic and transcriptional basis of their dif- standing of how immune cells alter their
access scheme from dbGaP (NIH) and ferentiation capabilities, their responses epigenomes in reaction to or to contribute
EGA (EBI). For common analyses not to specific stimuli, and how these are to a diseased environment (De Simone
using any personally identifiable informa- altered in pathological conditions. et al., 2016; Paul et al., 2016; Galindo-
tion, pre-processed data can be obtained Exciting new information can be Albarrán et al., 2016; Novakovic et al.,
from the unrestricted GEO (NIH) and retrieved from epigenomic differences 2016), and how the epigenomic changes
ArrayExpress (EBI) repositories. To guide between developmentally linked cell are established by environmental cues
new users, IHEC has made a substantial types, their inferred relationships, and (Holland et al., 2016), will likely lead to
investment into dedicated data access the likely identity of chromatin and tran- new biomarkers for a better diagnosis
tools. The IHEC Data Portal (http:// scriptional regulators of their differentia- and estimation of prognosis, as well as
epigenomesportal.ca/ihec/) provides a tion and developmental states (Hamada improved epi-drug based treatments
comprehensive overview and single point et al., 2016; Pellacani et al., 2016; Durek and outcomes for a plethora of disease
of entry for accessing all IHEC reference et al., 2016; Galindo-Albarrán et al., states. A present example of epigenomic
epigenome data (Bujold et al., 2016). 2016; Schuyler et al., 2016; Wallner analysis that may lead to testable clinical
This portal is complemented by tools et al., 2016). Analysis of cells subjected intervention is the reversal of endotoxin-
for comparing epigenome data between to specific external stimuli shed new light induced tolerance in macrophages (No-
cell types (Fernández et al., 2016), for on how environmental cues alter epige- vakovic et al., 2016).
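
As a concrete illustration of the in silico deconvolution of complex tissues mentioned above, the following is a minimal Python sketch. It is not the method of Onuchic et al. (2016) or Queirós et al. (2016). It assumes that reference CpG methylation profiles for each constituent cell type are already known, models a bulk profile as their proportion-weighted average, and estimates the proportions by projected gradient descent. The function names and all numbers are hypothetical and chosen only for illustration.

```python
def project_to_simplex(values):
    """Euclidean projection onto {p : p_k >= 0, sum(p) = 1}."""
    ordered = sorted(values, reverse=True)
    running_sum, shift = 0.0, 0.0
    for count, value in enumerate(ordered, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / count
        if value - candidate > 0:
            shift = candidate  # last candidate that keeps this entry positive
    return [max(v - shift, 0.0) for v in values]


def estimate_proportions(reference, bulk, steps=2000):
    """Estimate cell-type proportions from a bulk methylation profile.

    reference: one row per CpG, one column per cell type (values in 0..1).
    bulk: observed methylation per CpG in the mixed sample.
    Minimizes 0.5 * ||reference @ p - bulk||^2 subject to p on the simplex.
    """
    n_types = len(reference[0])
    # Conservative step size: 1 / (squared Frobenius norm of the reference).
    step_size = 1.0 / sum(r * r for row in reference for r in row)
    props = [1.0 / n_types] * n_types
    for _ in range(steps):
        residual = [sum(r * p for r, p in zip(row, props)) - b
                    for row, b in zip(reference, bulk)]
        gradient = [sum(row[k] * res for row, res in zip(reference, residual))
                    for k in range(n_types)]
        props = project_to_simplex(
            [p - step_size * g for p, g in zip(props, gradient)])
    return props


# Hypothetical reference profiles: 5 CpGs x 3 cell types.
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.5],
    [0.3, 0.7, 0.2],
]
true_props = [0.5, 0.3, 0.2]
# Noise-free bulk profile simulated from the assumed mixture.
bulk = [sum(r * p for r, p in zip(row, true_props)) for row in reference]

estimated = estimate_proportions(reference, bulk)
print([round(p, 3) for p in estimated])  # close to the simulated proportions
```

With noise-free simulated data, the estimate recovers the assumed mixture. Real tissue data would additionally require handling of measurement noise, unknown or incomplete reference profiles, and CpG selection, none of which is covered here.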

The IHEC consortium is confident that the comprehensive analysis of epigenomes in health and disease will lead to a better understanding of how differentiation and stability of cellular phenotypes are controlled at the molecular level. By identification of novel biomarkers as well as targets for therapy, this will likely lead to improved treatment and outcomes in a variety of diseases.

Epigenetic Marks Illuminate Effects of Noncoding DNA Variants in Disease
A major challenge following the identification of DNA variants associated with different diseases is pinning down their effects, especially when they lie in noncoding regions of the genome. A common mechanistic hypothesis is that the genetic variant affects the function of a cis-regulatory element and thereby the expression of a gene, which then influences the disease phenotype. To confirm such a hypothesis, it is important to characterize the molecular phenotypes that mediate the effect of genotype on disease. The IHEC studies capitalize on epigenomic information to address these questions, and several papers in the package take on the question of DNA variants in disease directly. For example, a study of population variation in epigenetic states and gene expression in three human blood cell types showed that these molecular traits were often influenced by the same genetic variants in a coordinated manner, and underpinned hundreds of previously reported autoimmune disease associations (Chen et al., 2016). Moving one step further along the path from genotype to phenotype, a related study cataloged population variation in cellular traits (36 blood-cell parameters) in a cohort of 173,480 individuals and again detected correlations with genetic variation (Astle et al., 2016). Notably, genetic loci associated with blood cell traits were frequently linked to epigenetic and transcriptomic traits and also to autoimmune conditions, schizophrenia, and heart disease, potentially implying an etiological role for blood cell parameters. Along similar lines, correlations between genotype and histone acetylation variation in specific brain regions, termed histone acetylation QTLs (haQTLs), provided candidate regulatory variants at multiple loci associated with psychiatric diseases (Sun et al., 2016).

The three-dimensional structure of chromosomes within the nucleus constitutes a key layer of epigenetic information, since it can generate diverse readouts from a constant genome sequence. From a practical standpoint, one can use maps of long-range loops between enhancers and promoters to determine which gene is regulated by a disease-associated noncoding variant. For example, maps of long-range contacts in 17 primary human blood cell types exhibited systemic variation across cell types and identified over 2,500 potential disease genes when combined with a database of disease-associated variants (Javierre et al., 2016). Similarly, chromatin contact maps in 21 primary human tissues and cell types yielded a large compendium of candidate genes when combined with known disease-associated noncoding variants and also revealed thousands of frequently interacting regions (FIREs) with unusually high levels of long-range chromatin contacts (Schmitt et al., 2016). Together, the studies in this section play a crucial role in using epigenetics to fill in the gaps between genotype and disease phenotype.

Further Exploration
A challenge faced by international consortia working with human data is the need to efficiently and openly share their data while sufficiently protecting the identity of participant donors from potential re-identification. The response of the community has been to develop a “controlled access” governance framework to provide an additional level of privacy and security protection to the sharing of sensitive data. Our commentary (Joly et al., 2016) presents the advantages and limitations associated with controlled access, and introduces other, less demanding, data protection and security models including registered access, open consent, and privacy-enhancing technologies. Following a critical review of each of these alternative models, we conclude that, while all present specific advantages, none of them is currently ready to replace “controlled access.” However, as we become more familiar with data sharing, including its risks and benefits, it is hoped that the amount of procedural scrutiny around data sharing can be simplified. In this context, the lighter protection and security models we describe here will take on growing importance for data-intensive health research.

IHEC Looking Forward
Epigenomic assays have revealed that selected subsets of regulatory elements in our genomic blueprints are read differently by gene expression machinery to maintain expression of the suites of genes needed for cellular functions. Genome-wide epigenomic data for a diverse set of human cells and tissues also have great utility for generating hypotheses about the regulatory elements associated with complex human diseases. These hypotheses can be tested by disease experts in the broad scientific community, for instance using CRISPR-based profiles to function (P2F) approaches for epigenome editing and screening (Stricker et al., 2016).

Although IHEC is well on its way toward accomplishing its primary goals of generating high-quality reference epigenomes and making them available to the scientific community, much more remains to be done. As IHEC itself further develops, we anticipate shifting our focus toward a number of possible new directions. These include extensions of the previous goals as well as new opportunities to drive toward the overarching vision of improving human health, including the integration of information from the environment and aging in the interpretation of cellular states. Advances in technology will allow investigation of epigenomic changes in single cells rather than populations and the characterization of tissue/disease-linked heterogeneity. Understanding natural and disease-linked variation in human epigenomes has already begun through IHEC, and will be expanded upon. Targeted editing of the epigenome to functionally validate regulatory mechanisms has been gaining interest. Deeper investigation of epigenomic changes during critical developmental periods and upon environmental exposure is a natural extension of current work. Integration of epigenomic and other -omic approaches (such as proteomics, metabolomics, transcriptomics, and analyses of the microbiome) is already underway in several countries. In particular, there is considerable interest in integrating epigenomic, transcription factor binding and expression data with chromatin conformation and sub-nuclear imaging information to develop a unified understanding of the 3D organization and regulatory dynamics of the nucleus. There have been considerable new and exciting insights in the fields of cancer and inflammation in recent years, revealing primary epigenomic alterations associated with disease pathology. A key interest moving forward is to translate the knowledge gained through basic epigenomic investigations and resource-generating consortia such as the IHEC to improve disease diagnosis, stratification, and treatment through the continued development of epigenomic-based biomarkers and small molecule epigenetic therapeutics. These could be investigated in longitudinal and well-controlled intervention studies of epigenomics in relation to disease, aging, and environmental exposure.

While not an exhaustive list, the above directions illustrate the wide range of potential opportunities provided by a coordinated, comprehensive assessment of epigenomic function. Future directions of the IHEC consortium will depend on the specific interests of the member projects, and an ongoing assessment of the best areas to continue to add value in epigenomic investigations.

SUPPLEMENTAL INFORMATION

Supplemental Information includes International Human Epigenome Consortium members and affiliations and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.11.007.
An audio PaperClip is available at http://dx.doi.org/10.1016/j.cell.2016.11.007#mmc2.

ACKNOWLEDGMENTS

We would like to thank the trainees and research and administrative assistants who are not listed in this overview but without whom the IHEC would not have achieved this work. We also thank collaborators who are not members of IHEC for their valuable contributions. The views expressed in this article are solely those of the authors and may not necessarily reflect those of the NIH (USA) and the European Commission. Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use that might be made of the presented information.

REFERENCES

Albrecht, F., List, M., Bock, C., and Lengauer, T. (2016). DeepBlue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets. Nucleic Acids Res. 44(W1), W581–W586.
American Association for Cancer Research Human Epigenome Task Force; European Union, Network of Excellence, Scientific Advisory Board (2008). Moving AHEAD with an international human epigenome project. Nature 454, 711–715.
Arts, R.J.W., Novakovic, B., ter Horst, R., Carvalho, A., Bekkering, S., Lachmandas, E., Rodrigues, F., Silvestre, R., Cheng, S.-H., Wang, S.-Y., et al. (2016). Glutaminolysis and fumarate accumulation integrate immunometabolic and epigenetic programs in trained immunity. Cell Metab. 24. http://dx.doi.org/10.1016/j.cmet.2016.10.008.
Astle, W.J., Elding, H., Jiang, T., Allen, D., Ruklisa, D., Mann, A.L., Mead, D., Bouman, H., Riveros-Mckay, F., Kostadima, M.A., et al. (2016). The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, this issue, 1415–1429.
Beck, S., Olek, A., and Walter, J. (1999). From genomics to epigenomics: a loftier view of life. Nat. Biotechnol. 17, 1144. http://dx.doi.org/10.1038/70651.
Bock, C., Halbritter, F., Carmona, F.J., Tierling, S., Datlinger, P., Assenov, Y., Berdasco, M., Bergmann, A.K., Booher, K., Busato, F., et al.; BLUEPRINT consortium (2016). Quantitative comparison of DNA methylation assays for biomarker development and clinical applications. Nat. Biotechnol. 34, 726–737.
Breeze, C.E., Paul, D.S., van Dongen, J., Butcher, L.M., Ambrose, J.C., Barrett, J.E., Lowe, R., Rakyan, V.K., Iotchkova, V., Frontini, M., et al. (2016). eFORGE: a tool for identifying cell type-specific signal in epigenomic data. Cell Rep. 17. http://dx.doi.org/10.1016/j.celrep.2016.10.059.
Bujold, D., Anderson de Lima Morais, D., Gauthier, C., Côté, C., Caron, M., Kwan, T., Chung Chen, K., Laperle, J., Nordell Markovits, A., Pastinen, T., et al. (2016). The International Human Epigenome Consortium (IHEC) Data Portal. Cell Syst. 3. http://dx.doi.org/10.1016/j.cels.2016.10.019.
Chen, L., Ge, B., Casale, F.P., Vasquez, L., Kwan, T., Garrido-Martín, D., Watt, S., Yang, Y., Kundu, K., Ecker, S., et al. (2016). Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, this issue, 1398–1414.
Chun, H.J., Lim, E.L., Heravi-Moussavi, A., Saberi, S., Mungall, K.L., Bilenky, M., Carles, A., Tse, K., Shlafman, I., Zhu, K., et al. (2016). Genome-Wide Profiles of Extra-cranial Malignant Rhabdoid Tumors Reveal Heterogeneity and Dysregulated Developmental Pathways. Cancer Cell 29, 394–406.
De Simone, M., Arrigoni, A., Rossetti, G., Gruarin, P., Ranzani, V., Politano, C., Bonnal, R.J.P., Provasi, E., Sarnicola, M.L., Panzeri, I., Moro, M., et al. (2016). Transcriptional landscape of human tissue lymphocytes unveils uniqueness of tumor-infiltrating T regulatory cells. Immunity 45. http://dx.doi.org/10.1016/j.immuni.2016.10.021.
Durek, P., Nordström, K., Gasparoni, G., Salhab, A., Kressler, C., de Almeida, M., Bassler, K., Ulas, T., Schmidt, F., Xiong, J., et al. (2016). Epigenomic profiling of human CD4+ T cells supports a linear differentiation model and highlights molecular regulators of memory development. Immunity 45. http://dx.doi.org/10.1016/j.immuni.2016.10.022.
Farlik, M., Halbritter, F., Müller, F., Choudry, F.A., Ebert, P., Klughammer, J., Farrow, S., Santoro, A., Ciaurro, V., Mathur, A., et al. (2016). DNA methylation dynamics of human hematopoietic stem cell differentiation. Cell Stem Cell 19. http://dx.doi.org/10.1016/j.stem.2016.10.019.
Fernández, J.M., de la Torre, V., Richardson, D., Royo, R., Puiggròs, M., Moncunill, V., Fragkogianni, S., Clarke, L., Flicek, P., Rico, D., et al.; BLUEPRINT consortium (2016). EPICO platform: a reference cyber-infrastructure for comparative epigenomics. The BLUEPRINT Data Analysis Portal as a practical case. Cell Syst. 3. http://dx.doi.org/10.1016/j.cels.2016.10.021.
Franci, G., Sarno, F., Nebbioso, A., and Altucci, L. (2016). Identification and characterization of PKF118-310 as a KDM4A inhibitor. Epigenetics, 0. Published online October 21, 2016. http://dx.doi.org/10.1080/15592294.2016.1249089.
Galindo-Albarrán, A.O., López-Portales, O.H., Gutiérrez-Reyna, D.Y., Rodríguez-Jorge, O., Sánchez-Villanueva, J.A., Ramírez-Pliego, O., Bergon, A., Lorio, B., Holota, H., Imbert, J., et al. (2016). CD8+ T cells from human neonates are biased towards an innate immune response. Cell Rep. 17. http://dx.doi.org/10.1016/j.celrep.2016.10.056.
Hamada, H., Okae, H., Toh, H., Chiba, H., Hiura, H., Shirane, K., Sato, T., Suyama, M., Yaegashi, N., Sasaki, H., and Arima, T. (2016). Allele-specific methylome and transcriptome analysis reveals widespread imprinting in the human placenta. Am. J. Hum. Genet. 99, 1045–1058.
Hocking, T.D., Goerner-Potvin, P., Morin, A., Shao, X., Pastinen, T., and Bourque, G. (2016). Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning. Bioinformatics, btw672. Published online October 24, 2016. http://dx.doi.org/10.1093/bioinformatics/btw672.
Holland, M.L., Lowe, R., Caton, P.W., Gemma, C., Carbajosa, G., Danson, A.F., Carpenter, A.A., Loche, E., Ozanne, S.E., and Rakyan, V.K. (2016). Early-life nutrition modulates the epigenetic state of specific rDNA genetic variants in mice. Science 353, 495–498.
Javierre, B.M., Burren, O.S., Wilder, S.P., Kreuzhuber, R., Hill, S.M., Sewitz, S., Cairns, J., Wingett, S.W., Várnai, C., Thiecke, M.J., et al. (2016). Lineage-specific genome architecture links disease variants to target genes. Cell 167, this issue, 1369–1384.
Joly, Y., Dyke, S.M.O., Knoppers, B.M., and Pastinen, T. (2016). Are Data Sharing and Privacy Protection Mutually Exclusive? Cell 167, this issue, 1150–1154.
Jones, P.A., and Martienssen, R. (2005). A blueprint for a Human Epigenome Project: the AACR Human Epigenome Workshop. Cancer Res. 65, 11241–11246.

Juan, D., Perner, J., Carrillo de Santa Pau, E., Marsili, S., Ochoa, D., Chung, H.R., Vingron, M., Rico, D., and Valencia, A. (2016). Epigenomic Co-localization and Co-evolution Reveal a Key Role for 5hmC as a Communication Hub in the Chromatin Network of ESCs. Cell Rep. 14, 1246–1257.
Kinkley, S., Helmuth, J., Polansky, J.K., Dunkel, I., Gasparoni, G., Fröhler, S., Chen, W., Walter, J., Hamann, A., and Chung, H.R. (2016). reChIP-seq reveals widespread bivalency of H3K4me3 and H3K27me3 in CD4(+) memory T cells. Nat. Commun. 7, 12514.
Le Gros, M.A., Clowney, E.J., Magklara, A., Yen, A., Markenscoff-Papadimitriou, E., Colquitt, B., Myllys, M., Kellis, M., Lomvardas, S., and Larabell, C.A. (2016). Soft X-ray tomography reveals gradual chromatin compaction and reorganization during neurogenesis in vivo. Cell Rep. 17. http://dx.doi.org/10.1016/j.celrep.2016.10.060.
Libertini, E., Heath, S.C., Hamoudi, R.A., Gut, M., Ziller, M.J., Czyz, A., Ruotti, V., Stunnenberg, H.G., Frontini, M., Ouwehand, W.H., et al. (2016a). Information recovery from low coverage whole-genome bisulfite sequencing. Nat. Commun. 7, 11306.
Libertini, E., Heath, S.C., Hamoudi, R.A., Gut, M., Ziller, M.J., Herrero, J., Czyz, A., Ruotti, V., Stunnenberg, H.G., Frontini, M., et al. (2016b). Saturation analysis for whole-genome bisulfite sequencing data. Nat. Biotechnol. http://dx.doi.org/10.1038/nbt.3524.
Lorzadeh, A., Bilenky, M., Hammond, C., Knapp, D.J.H.F., Li, L., Miller, P.H., Carles, A., Heravi-Moussavi, A., Gakkhar, S., Moksa, M., et al. (2016). Nucleosome density ChIP-seq identifies distinct chromatin modification signatures of promoters associated with MNase accessibility. Cell Rep. 17. http://dx.doi.org/10.1016/j.celrep.2016.10.055.
Mandoli, A., Singh, A.A., Prange, K.H.M., Tijchon, E., Oerlemans, M., Dirks, R., Ter Huurne, M., Wierenga, A.T.J., Janssen-Megens, E.M., Berentsen, K., et al. (2016). The hematopoietic transcription factors RUNX1 and ERG prevent AML1-ETO oncogene overexpression and onset of the apoptosis program in t(8;21) AMLs. Cell Rep. 17. http://dx.doi.org/10.1016/j.celrep.2016.08.082.
Nebbioso, A., Carafa, V., Conte, M., Tambaro, F.P., Abbondanza, C., Martens, J.H.A., Nees, M., Benedetti, R., Pallavicini, I., Minucci, S., et al. (2016). c-Myc modulation & acetylation is a key HDAC inhibitor target in cancer. Clin. Cancer Res. clincanres.2388.2015. http://dx.doi.org/10.1158/1078-0432.CCR-15-2388.
Novakovic, B., Habibi, E., Wang, S.-Y., Arts, R.J.W., Davar, R., Megchelenbrink, W., Kim, B., Kuznetsova, T., Kox, M., Zwaag, J., et al. (2016). β-glucan reverses the epigenetic state of LPS induced immunological tolerance. Cell 167, this issue, 1354–1368.
Onuchic, V., Hartmaier, R.J., Boone, D.J., Samuels, M.L., Patel, R.Y., White, W.M., Garovic, V.S., Oesterreich, S., Roth, M.E., Lee, A.V., and Milosavljevic, A. (2016). Epigenomic Deconvolution reveals pervasive epithelial-stromal metabolic coupling within human breast tumors. Cell Rep. 17. http://dx.doi.org/10.1016/j.celrep.2016.10.057.
Pancaldi, V., Carrillo-de-Santa-Pau, E., Javierre, B.M., Juan, D., Fraser, P., Spivakov, M., Valencia, A., and Rico, D. (2016). Integrating epigenomic data and 3D genomic structure with a new measure of chromatin assortativity. Genome Biol. 17, 152. http://dx.doi.org/10.1186/s13059-016-1003-3.
Paul, D.S., Teschendorff, A.E., Dang, M.A.N., Lowe, R., Hawa, M.I., Ecker, S., Cunningham, S., Fouts, A.R., Ramelius, A., Burden, F., et al. (2016). Increased DNA methylation variability in type 1 diabetes across three immune effector cell types. Nat. Commun. http://dx.doi.org/10.1038/ncomms13555.
Pellacani, D., Bilenky, M., Kannan, N., Heravi-Moussavi, A., Knapp, D.J.H.F., Gakkhar, S., Moksa, M., Carles, A., Moore, R., Mungall, A.J., et al. (2016). Analysis of normal human mammary epigenomes reveals cell-specific active enhancer states and associated transcription factor networks. Cell Rep. 17. http://dx.doi.org/10.1016/j.celrep.2016.10.058.
Queirós, A.C., Beekman, R., Vilarrasa-Blasi, R., Duran-Ferrer, M., Clot, G., Merkel, A., Raineri, E., Russiñol, N., Castellano, G., Beà, S., et al. (2016). Decoding the DNA methylome of mantle cell lymphoma in the light of the entire B-cell lineage. Cancer Cell 30. http://dx.doi.org/10.1016/j.ccell.2016.09.014.
Rendeiro, A.F., Schmidl, C., Strefford, J.C., Walewska, R., Davis, Z., Farlik, M., Oscier, D., and Bock, C. (2016). Chromatin accessibility maps of chronic lymphocytic leukaemia identify subtype-specific epigenome signatures and transcription regulatory networks. Nat. Commun. 7, 11938.
Schmidt, F., Gasparoni, N., Gasparoni, G., Gianmoena, K., Cadenas, C., Polansky, J.K., Ebert, P., Nordstroem, K., Barann, M., Sinha, A., et al. (2016). Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. BioRxiv. http://dx.doi.org/10.1101/081935.
Schmitt, A.D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C.L., Li, Y., Lin, S., Lin, Y., Barr, C.L., and Ren, B. (2016). A Compendium of Chromatin Contact Maps Reveal Spatially Active Regions in the Human Genome. Cell Rep. 17. http://dx.doi.org/10.1016/j.celrep.2016.10.061.
Schuyler, R.P., Merkel, A., Raineri, E., Altucci, L., Vellenga, E., Martens, J.H.A., Pourfarzad, F., Kuijpers, T.W., Burden, F., Farrow, S., et al. (2016). Distinct trends of DNA methylation patterning in the innate and adaptive immune systems. Cell Rep. 17. http://dx.doi.org/10.1016/j.celrep.2016.10.054.
Stricker, S.H., Köferle, A., and Beck, S. (2016). From profiles to function in epigenomics. Nat. Rev. Genet. http://dx.doi.org/10.1038/nrg.2016.138.
Sun, W., Poschmann, J., Cruz-Herrera del Rosario, R., Parikshak, N.N., Hajan, H.S., Vibhor Kumar, V., Ramasamy, R., Belgard, T.G., Elanggovan, B., Wong, C.C.Y., et al. (2016). Histone Acetylome-wide Association Study of Autism Spectrum Disorder. Cell 167, this issue, 1385–1397.
Wallner, S., Schröder, C., Leitão, E., Berulava, T., Haak, C., Beißer, D., Rahmann, S., Richter, A.S., Manke, T., Bönisch, U., et al. (2016). Epigenetic dynamics of monocyte-to-macrophage differentiation. Epigenetics Chromatin 9, 33.
Weiner, A., Lara-Astiaso, D., Krupalnik, V., Gafni, O., David, E., Winter, D.R., Hanna, J.H., and Amit, I. (2016). Co-ChIP enables genome-wide mapping of histone mark co-occurrence at single-molecule resolution. Nat. Biotechnol. 34, 953–961.

Leading Edge
Commentary
Are Data Sharing and Privacy Protection Mutually Exclusive?
Yann Joly,1,* Stephanie O.M. Dyke,1 Bartha M. Knoppers,1 and Tomi Pastinen2
1Centre of Genomics and Policy, McGill University, Montreal, QC H3A 0G1, Canada
2Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada
*Correspondence: yann.joly@mcgill.ca
We review emerging strategies to protect the privacy of research participants in international epi-
genome research: open consent, genome donation, registered access, automated procedures, and
privacy-enhancing technologies.
With the advent of the Human Genome Project and the emergence of next-generation sequencing (NGS) technologies, open data sharing has become increasingly popular within the scientific community. This model is now the norm for large-scale “OMICS” research projects. In the context of epigenomic research, open science will facilitate the association of rich epigenetic datasets (such as DNA methylation, RNA expression, chromatin states, and conformation) with data on participants’ medical and environmental exposure. These will provide a reference for studies of genetic and epigenetic events that underlie human development, diversity, and disease. The independent evaluation of robustness of analytical strategies and conclusions from experiments, a critical part of the scientific process, hinges on access to materials, which, in this case, are raw sequence files and associated metadata.

Although the open science model has made significant contributions to the progress of “OMICS” research, it has also met with resistance from some stakeholders. This resistance includes concerns from the private sector over intellectual property rights and from data producers over attribution and recognition of their work. Another important critique comes from privacy advocates due to the inherently identifying nature of genetic information and the possibility for data misuse by third parties. A daunting challenge is the inevitable variation among the large number of data producers, most operating outside large international consortia, now having access to NGS technologies in all branches of life sciences. Researchers producing valuable data with public funding still lack familiarity with recent scientific norms applicable to governing, curating, and sharing big data. This can lead to delays in data sharing, as well as incomplete quality control.

The initial response from policymakers has been to propose an intermediate approach involving “controlled access” to potentially identifying human genomic data and associated metadata. This process requires, at a minimum, that researchers requesting access to the data complete an access agreement providing personal and institutional identification. Researchers are also required to describe the purpose of their research and commit to a number of good privacy and security practices for the processing of controlled data.

The controlled access approach is not a standalone, comprehensive method to ensure the complete protection of potentially identifying health data. Rather, controlled access should be deployed as part of an overall data privacy protection framework that includes state-of-the-art physical, administrative, and technical security safeguards working in conjunction with national and international privacy norms. It should also include the development of an effective compliance and accountability framework.

Yet, controlled access has been criticized by some members of the scientific community who believe that it represents a strong impediment to open science research. In an attempt to address this critique, other potential models have been proposed and implemented, such as open consent, genome donation, registered access, and the use of diverse privacy-enhancing technologies. However, these approaches have yet to gain broad acceptance from the international research community.

Based on the authors’ experience as members of the International Cancer Genome Consortium (ICGC), the International Human Epigenome Consortium (IHEC), and the Global Alliance for Genomics and Health (GA4GH), this Commentary provides an overview of the trade-offs involved in controlled-access approaches for epigenomics research and assesses the potential of various proposed alternatives.

It is particularly timely to discuss this in the field of epigenetics, where research is shifting from animal models to human participants. Epigenomic datasets are even richer and more informative than genomic data with sequence variation only, enhancing the benefits and exacerbating the challenges of open data sharing and making concerns about ethics and data security all the more relevant in this field. Members of IHEC have adopted a tiered strategy for sharing data, using a completely open access policy approach as a default and a controlled access approach where the sensitivity of the data requires greater care. The IHEC Bioethics Workgroup conducts research to support IHEC’s data sharing policies and regularly evaluates the risks and benefits of evolving data sharing strategies (Dyke et al., 2016b).

The Controlled Access Approach
According to the 2009 Toronto Statement on Data Sharing, “Data about human subjects participating in genetic and epidemiological research require particularly
1150 Cell 167, November 17, 2016 © 2016 Elsevier Inc.

careful consideration owing to privacy- on information from collaborators, and researchers from a variety of disciplines
protection issues and the potential harms on occasional spot checks. This system have been increasingly experimenting
that could arise from misuse. [F]or these generally works but is not impervious to with other models for securely sharing
reasons, it is important to develop and occasional setbacks or governance risks. data, and some of their proposed solu-
implement robust governance models The main argument against controlled tions deserve a closer look.
and procedures for human subjects’ data access involves the low number of
early in a project’’ (Toronto International researchers using controlled databases Open Consent and Genome
Data Release Workshop Authors et al., in comparison with the number of re- Donation
2009). In order to address this concern, searchers accessing entirely open data Open consent requires that research par-
especially at a time when sophisticated portals. Lower levels of access could in ticipants consent to unrestricted disclo-
bioinformatics methods of data re-identifi- turn result in poor data quality given the sure of their data and of any information
cation are published regularly in scientific more limited peer review of the controlled that may emerge from future unspecified
journals (Erlich and Narayanan, 2014), data (Kaye, 2012). Are researchers resis- research based on that data. No promises
large-scale research projects such as tant to the controlled access model and of anonymity, privacy, or confidentiality
ICGC implemented the controlled access consequently unwilling to spend the time are made to them. To avoid terminological
approach to data sharing (Milius et al., to complete data access request appli- confusion and needlessly capitalizing
2014). Controlled access uses an access cations? Or are access committees on the popularity of ‘‘open source,’’ only
agreement, overseen by an access com- too restrictive in formulating their access projects whose data can be accessed
mittee, to make data availability condi- criteria and inefficient in implementing the by researchers completely openly—i.e.,
tional upon the researchers identifying authorization process? The truth probably without having to identify themselves or
themselves and agreeing to a number lies somewhere in the middle: there is agree to any limitations or contractual
of conditions on data usage. Because great cultural resistance to the model agreement—should qualify as open con-
controlled access is regarded as a form from a scientific community already overre- sent. Several projects are currently
of open access, these conditions should gulated with the multiplication of ethics described by researchers as open con-
be kept to the minimum necessary to and oversight committees, privacy laws sent, although they actually require
ensure that participants’ data are reason- and arrangements, and overly complex completing various procedures prior to
ably well protected from re-identification. data and material transfer agreements gaining access to the data.
Further, limited conditions aimed at pre- (Greenbaum et al., 2011). The added According to its proponents, the lead-
venting parasite patenting and at securing administrative burden imposed by distinct, ing moral principle behind the open con-
proper recognition of the work for data sometimes overly legalistic—and contra- sent approach is veracity, or truthfulness,
producers and curators are also generally dictory—controlled access agreements, which is necessary to the respect of per-
included. It is a given that controlled ac- as well as overprotective access com- sonal autonomy (Lunshof et al., 2008).
cess does not permit researchers to curtail mittees in some research fields, can The open consent model is currently
national legal and ethical policies on pri- present an unpleasant surprise that may used by projects such as the Personal
vacy and data protection. Controlled ac- discourage many researchers from ac- Genome Project (PGP) and in epige-
cess enables data sharing for projects cessing otherwise valuable data. However, nomic research by the PGP-UK and
that meet these standards and have it is worth noting that the impediments by the Encyclopedia of DNA Elements
received relevant ethics approval. associated with controlled access are, in (ENCODE) Consortium. More recently,
A well-implemented controlled access theory, remediable. The simplicity of the genome donation has been suggested
framework can provide a number of ad- process can be improved, and this, along by PGP as a potential strategy enabling
vantages for data producers, such as pro- with proper education and engagement of participants in studies using a standard
moting users’ compliance with privacy the scientific community on the topic of consent and controlled access model
laws and ethics policies, the recognition controlled access, possibly through users’ to request their data and donate them
of the work of data producers and cura- satisfaction survey, could optimize the to an approved open access project.
tors, providing a minimum level of ethics process. The work of GA4GH in harmo- The major benefit of open consent for
oversight over access requests, and sub- nizing data sharing policies and standards scientists is an obvious one. The data
stantially reducing possible data misuse will be of key importance in promoting the of consenting participants are made
(Milius et al., 2014; Joly et al., 2011). How- integration and quality of the controlled completely free for use by members
ever, it should be noted that operational access process (The Global Alliance for of the scientific community without any
constraints can limit access committees’ Genomics and Health, 2016). It remains burdensome access formalities. How-
ability to provide some of these advan- to be seen whether this would provide ever, the potential widespread use of
tages. For example, a small committee sufficient impetus for the larger scientific this model does raise some legal and
working on a part-time basis cannot community to more frequently apply for ethical challenges. National privacy regu-
hope to successfully police all the avail- access to controlled datasets. lations of many countries, outside the
able new research to ensure that the con- United States, Austria, and the United
tributions of data producers are always Alternatives Kingdom, where it is currently piloted,
properly recognized. Instead, committees Is controlled access truly needed for could prevent their researchers from us-
will often rely on the good faith of users, epigenomic data sharing? In recent years, ing open consent to share the personal

health data (including potentially identifi- Given the substantial legal and ethical and controlled access as an intermediate
able genetic data) of their research partic- challenges associated with open con- data access tier.
ipants (Rothstein et al., 2016). Genome sent, it does not seem probable that it Innovative projects have begun ex-
donation, i.e., research participants re- will become the default approach to perimenting with novel data access
questing to obtain their data from tradi- sharing human epigenomic datasets in registration models, such as Sage Bio-
tional biobanking projects to donate it to the near future. Given the more stringent networks mPower Parkinson’s disease
a completely open access repository, consent requirements associated with study. Within GA4GH, standards and
could permit researchers from some of children research participants, open con- guidance for the registered access model
these countries to circumvent this sent is unlikely to be permitted for these are currently under development. It is
obstacle, but the full legal and ethical ram- populations. However, infant and child- envisaged that this data access tier
ifications of this novel strategy remain to hood developmental changes in the epi- could be used to provide tiered access
be carefully assessed. PGP-UK, who pio- genome are key to understanding how to several different categories of data
neered the genome donation approach the environment can interact with the users (e.g., researchers and clinicians)
and has been using it with ethics approval genome and potentially lead to later and be based on a simplified online
since June 2015, has addressed this point life diseases (a central hypothesis in agreement between data custodians and
by subjecting donors to an entrance exam translational epigenomics).In countries registered users (Dyke et al., 2016a). An
demonstrating that any risks have been that do authorize it (on the adult popula- example of the kind of permissions and
disclosed and understood. tion), open consent remains an inter- restrictions on data use that may need
More conceptually, the undeniable esting alternative to promote uncompli- to be agreed to by data users in the
need for truthfulness in the informed con- cated data sharing practices and broad registered access process are those
sent process does not seem to justify or data usage by the research community. stemming from the consent provided by
necessitate abandoning other possible This strategy seems most suited to the research participants whose data are
means of protecting participants’ data. It context of fundamental, resource-gen- shared. The benefits of both standard-
is perfectly truthful and ethical to tell a eration-focused epigenomics research izing categories of consent-based condi-
participant that, although complete ano- programs where data are only associ- tions and facilitating the understanding
nymity is not possible in OMICS research, ated with a minimal amount of phenotype of such limitations on the scope of data
the project is designed to provide partici- information, making it less vulnerable use have led researchers involved in
pants’ data with a reasonably high level to privacy and re-identification attacks GA4GH to develop Consent Codes to
of security and privacy—for example, than translational research initiatives, describe these restrictions on data use
through controlled access or sample and which require linkage with multiple (Dyke et al., 2016b). Standard conditions
data anonymization. clinical and environmental data fields. A of registration, including a commitment
Although some argue that it is a moral well-designed communication strategy to best research or care practices,
imperative that informed participants to ensure participants are thoroughly along with standard descriptions of other
who desire to openly share their samples informed of the implications of partici- conditions on data use, such as those
and data with the scientific community pating in open consent projects would included in the GA4GH Consent Codes,
be allowed to do so, this type of reasoning also be in order. will enable the federation of registered
does not resolve the question of what access systems in different parts of the
should be done with the remainder Registered Access and Automated world.
of the population. For this potentially Procedures The registered access model has two
large group of individuals, open consent The ‘‘registered access’’ model offers an significant advantages over controlled
could raise two types of adverse conse- attractive option for sharing data while access: (1) its application process is
quences: (1) it could discourage people maintaining a degree of privacy protection considerably simpler and less burden-
that would otherwise be interested and establishing other terms and condi- some for both researchers and data
in participating in epigenomics studies tions of data use. In a registered data custodians; and (2) it is also easier
from doing so for fear of discrimination access model, applicants are approved to automate, in large part because its
or other types of data misuse; and (2) if for access to shared data by providing approval process is less demanding
the number of open epigenomics projects details of their identity for authentication (e.g., it does not rely on the review of
were to increase significantly compared and agreeing to terms and conditions of descriptions of research proposals). The
to the number of controlled-access data use during the registration process. latter advantage means that registered
(or similar) epigenomics projects, partici- Such agreements can address issues access requires much less manual pro-
pants who desire a higher level of privacy such as acceptable scope of data use, cessing, thereby cutting costs and poten-
protection could become marginalized albeit requiring that data custodians tially increasing reliability. While the level
and excluded from the epigenomics clearly describe permitted uses of data of protection provided through registered
research sphere altogether. While both and relinquish the process of reviewing access mechanisms remains to be deter-
scenarios are certainly possible, due to individual data use proposals, which is mined and will impact how much data are
the paucity of empirical data, it remains central to many controlled access sys- shared this way, it will likely prove a valu-
difficult to accurately assess their actual tems. It could therefore be said that regis- able tool in the context of epigenomic
likelihood. tered access falls between open access data sharing.

Given the substantial advantages of effectively anonymize shared data (for Stakeholders should therefore continue
automating data access mechanisms, methylation data see Dyke et al., 2015), improving on the conceptual and proce-
many of these may eventually be auto- (2) securely pool individual participant dural aspects of controlled access and
mated. For example, the Broad Institute data from multiple studies to answer spe- eliminating inconsistent requirements so
is developing a ‘‘Data Use Oversight cific research questions, or (3) securely as to make this approach a more efficient
System’’ (DUOS) to manage the use of perform meta-analyses of aggregate and attractive option for the scientific
controlled access datasets. This system, data derived from study-specific ana- community.
which is currently in its early pilot phase, lyses. These methods usually involve The potential alternatives to controlled
could eventually come to support the either cryptography (including homomorphic access presented all show potential to
formal controlled access process or a encryption), data suppression, data replace or complement it to different degrees.
substantial part of it. obfuscation, or the creation of synthetic Open consent, in countries where
It would seem both inevitable and data (Erlich and Narayanan, 2014; Kuiper it is permitted and conditional upon very
beneficial that a substantial part of the et al., 2015; Wolfson et al., 2010). carefully designed protocols and consent
controlled access system also be auto- In theory, privacy-enhancing technolo- forms, could replace controlled access as
mated and registered in the near future. gies for data analyses promise to enable the default method for fundamental epi-
The precise extent to which data pro- researchers to perform complex statisti- genomics research using limited data-
ducers and funders will be able to dele- cal analysis on distributed data without sets. Registered access will come to
gate the work currently undertaken by compromising the personal data protec- play an increasingly important role in the
controlled access committees to com- tion of research participants. However, next few years and perhaps even replace
puters and the trade-offs this would most of these methods are currently in controlled access for much epigenomic
imply remain to be determined. The their pilot phase or suffer from technical, data sharing. However, it will take some
outlook of registered access/automated legal, financial, and political challenges time before we can assess its full potential
access models for the future of epige- that have prevented their broad imple- for data sharing. Privacy-enhancing tech-
nomic data sharing seems very prom- mentation. The few that may be ready nologies for data analysis will likely be
ising, as they could provide simple for implementation on a larger scale still used as alternatives to controlled access
cost-effective solutions to the data ac- require a broad user base ready to adopt within small research groups and consor-
cess quandary. harmonized security protocols and dis- tia. However, it will be a significant chal-
ease and phenotype ontologies and to lenge for a single model to achieve broad
Privacy-Enhancing Technologies use a common analysis model. Given usage by the research community and
The fields of information technology these serious limitations, these privacy- remain current over a significant period
security and bioinformatics have also enhancing control technologies may not of time.
contributed data sharing solutions with be ready for broad adoption in the next Beyond all these data sharing methods,
the potential to increase privacy protection few years. However, research and discus- it should also not be forgotten that privacy
of OMICS research participants. Good in- sion on these methods should continue in as an ethical concept and as a funda-
formation technology security practices, the hope that more interesting models mental human right is not static. The
including firewalls, virus patches, encryp- may emerge from the pack and gather privacy concerns and expectations of
tion, data de-identification techniques, the political consensus needed for their research participants are likely to evolve
authentication schemes, and password validation and adoption by the scientific in the coming years as the implications
protections, are now common in bio- community. of data-intensive health research and the
banking (Westfall et al., 2012). The bene- computerization of health data become
fits of adopting current information tech- Conclusion better understood by stakeholders. In
nology security practices are significant, Our brief overview of the main trends in particular, in the future, open sharing
and they should be a given in any reposi- secure, efficient, and ethical data sharing of epigenomics data for medical research
tory storing personal health data. did not identify a model capable of replac- may eventually become quite com-
In addition to these best practices, ing controlled access in the near future. mon and considered a low-risk activity
information technology security and bio- Currently, this model offers a satisfactory deserving only a limited amount of pro-
informatics experts have addressed level of governance and oversight for hu- cedural scrutiny.
the challenge raised by the sharing man epigenomics datasets that could be
of ‘‘OMICS’’ data and accompanying used to re-identify research participants. ACKNOWLEDGMENTS
phenotype information by looking at it Given that OMICS and other medical
from a different perspective than the one research data will increasingly be made The authors would like to thank Ms. Katie Saulnier
taken by policymakers. These experts available through cloud computing infra- from McGill Centre of Genomics and Policy for
have, in recent years, proposed a number structure operated by third parties and editorial assistance on the manuscript. We also
of methods (e.g., Data Shield, the UK data subject to complex contractual terms, gratefully acknowledge the financial support of
CIHR through awards EP1-120608 and EP2-
service Secure Lab, the Beacon Project, the importance of adopting sophisticated
120609, Genome Quebec, Genome Canada, the
ViPAR, partial derivatives meta-analysis, and responsive governance approaches Government of Canada, the Ministère de l’Écono-
the Hybrid Synthetic Microdata Platform) to the sharing of sensitive data should mie, Innovation et Exportation du Québec (Can-
that permit users to either (1) more be considered an ethical imperative. SHARE grant 141210), FRQ-S Chercheur Boursier

Junior 2- 30719, and the Canada Research Chair in
Law and Medicine.

REFERENCES

Dyke, S.O.M., Cheung, W.A., Joly, Y., Ammerpohl, O., Lutsik, P., Rothstein, M.A., Caron, M., Busche, S., Bourque, G., Rönnblom, L., et al. (2015). Genome Biol. 16, 142.

Dyke, S.O.M., Kirby, E., Shabani, M., Thorogood, A., Kato, K., and Knoppers, B.M. (2016a). Eur. J. Hum. Genet. 28. advance online publication. http://dx.doi.org/10.1038/ejhg.2016.115.

Dyke, S.O.M., Philippakis, A.A., Rambla De Argila, J., Paltoo, D.N., Luetkemeier, E.S., Knoppers, B.M., Brookes, A.J., Spalding, J.D., Thompson, M., Roos, M., et al. (2016b). PLoS Genet. 12, e1005772.

Erlich, Y., and Narayanan, A. (2014). Nat. Rev. Genet. 15, 409–421.

Greenbaum, D., Sboner, A., Mu, X.J., and Gerstein, M. (2011). PLoS Comput. Biol. 7, e1002278.

Joly, Y., Zeps, N., and Knoppers, B.M. (2011). Hum. Genet. 130, 441–449.

Kaye, J. (2012). Annu. Rev. Genomics Hum. Genet. 13, 415–431.

Kuiper, J., van den Heuvel, E.R., and Swertz, M.A. (2015). Biopreserv. Biobank. 13, 178–182.

Lunshof, J.E., Chadwick, R., Vorhaus, D.B., and Church, G.M. (2008). Nat. Rev. Genet. 9, 406–411.

Milius, D., Dove, E.S., Chalmers, D., Dyke, S.O.M., Kato, K., Nicolás, P., Ouellette, B.F., Ozenberger, B., Rodriguez, L.L., Zeps, N., and Joly, Y. (2014). Nat. Biotechnol. 32, 519–523.

Rothstein, M.A., Knoppers, B.M., and Harrell, H.L. (2016). J. Law Med. Ethics 44, 161–172.

The Global Alliance for Genomics and Health (2016). Science 352, 1278–1280.

Toronto International Data Release Workshop Authors, Birney, E., Hudson, T.J., Green, E.D., Gunter, C., Eddy, S., Rogers, J., Harris, J.R., Ehrlich, S.D., Apweiler, R., Austin, C.P., et al. (2009). Prepublication data sharing. Nature 461, 168–170.

Westfall, J.E., Kim, C.M., and Ma, A.Y. (2012). Int. J. Inf. Manage. 32, 419–430.

Wolfson, M., Wallace, S.E., Masca, N., Rowe, G., Sheehan, N.A., Ferretti, V., LaFlamme, P., Tobin, M.D., Macleod, J., Little, J., et al. (2010). Int. J. Epidemiol. 39, 1372–1382.

Leading Edge
Commentary
Building Bridges through Scientific Conferences

Juleen R. Zierath1,2,3,*
1Department of Molecular Medicine and Surgery, Integrative Physiology, Karolinska Institutet, 171 77 Stockholm, Sweden
2Department of Physiology and Pharmacology, Integrative Physiology, Karolinska Institutet, 171 77 Stockholm, Sweden
3Section of Integrative Physiology, The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical
Science, University of Copenhagen, 2200 Copenhagen, Denmark

*Correspondence: juleen.zierath@ki.se
Getting together to exchange ideas, forge collaborations, and disseminate knowledge is a long-
standing tradition of scientific communities. How conferences are serving the community, what
their current challenges are, and what is in store for the future of conferences are the topics covered
in this Commentary.
Conferences are a big part of scientists’ ings provide an excellent opportunity later, I still have great appreciation for
lives. They provide an important forum to bring people together with different that experience and how invaluable it
to bring a research community together views and advance a field of study. was at such an early stage of my career.
and serve as a platform to disseminate Throughout my career, I have had the Presenting my unpublished work gave
knowledge and forge collaborations. opportunity to participate in various sci- me confidence, as it allowed me to test
However, we are now in an era of rapid entific conferences as attendee, speaker, new ideas and to get constructive feed-
technological changes that influence the and organizer. I have also had the privi- back on the project. Moreover, it opened
way in which the research community lege of working with nonprofit organiza- a network that has enriched my career.
interacts and communicates science. tions such as Keystone Symposia as I have since learned that, for many scien-
With so many emerging forms of digital Chair of the Board of Directors and pro- tists, this is not an uncommon experience,
communication and social media, one fessional societies such as the European and I perceive it to be one of the values of
can question whether the face-to-face Association for the Study of Diabetes as face-to-face meetings.
meeting is still relevant. The globalization President and Member of the Executive
of science brings many stakeholders Committee. Collectively, these experi- Not All Conferences Are Born Equal
from diverse regions of the world to the ences have given me insight from several Every week, numerous scientific conferences
table, and with this comes a need to find different directions into the future of scientific are taking place around the world.
a common meeting ground to discuss conferences. With this short Com- These meetings come in many different
emerging biology. This is compounded mentary, I will share thoughts, ideas, and flavors. But, scientists do not have limit-
by the fact that, today, individual sci- personal reflections on the importance of less amounts of time and money to go to
entists are facing tighter budgets, and scientific conferences, focusing on ques- every meeting. Therefore, it is vital to pri-
consequently, many investigators are cut- tions related to how they serve the com- oritize how resources are spent. Different
ting travel expenses to preserve precious munity, what the current challenges are, meetings serve different purposes, and
funding to maintain research activities. and the outlook for the future. one should choose carefully. Depending
Thus, with the potential risk that funds on your career stage, you may choose
to support research conferences may Breaking into the Business one type of conference over another. For
shrink, there may be fewer opportunities Each one of us has our own unique view of example, young trainees may be better
to participate in vital face-to-face meet- the importance of scientific conferences, off going to national meetings in which
ings. But, in order to make progress and this view is likely to shift during they can meet their local community
in research and education, scientists various stages of our career. One of my to learn more about what their peers are
are dependent on an open exchange earliest experiences of a scientific confer- doing, what techniques are available in
of knowledge and the ability to come ence was as a graduate student, where I nearby laboratories, and to inspire poten-
together to discuss, debate, and tackle had the honor of presenting my first short tial collaborations with the people readily
emerging issues related to their field of talk at a large society meeting. The mem- available to them. Post-doctoral fellows
study. ory of this experience is still quite vivid. and early-stage investigators may be bet-
The quest for new knowledge has been I practiced the talk numerous times and ter served by going to larger, international
a driving force for centuries. Even early discussed with my colleagues for hours meetings to ‘‘break into the business’’ and
Greek philosophers, including Socrates, in anticipation of possible questions I develop new contacts or collaborations.
Plato, and Aristotle, understood the might receive from the delegates. The More senior scientists may focus on
need to present new ideas, question final publication was greatly enhanced selective, invitation-only meetings (think-
dogma, and participate in vigorous by the input received during the presenta- tank style) in which the future direction
debate. In this regard, face-to-face meet- tion (Kaiserauer et al., 1989). Decades of the field may be set, as this can be

the place where they consolidate their terprise. For example, various societies nity to make new contacts. Don’t under-
worth and the value of their personal are developing mentorship programs, estimate the value of ‘‘down-time’’ during
‘‘brand.’’ With only so much time and young academies, and fellowship pro- a meeting. I have found that spontaneous
money for individual scientists to spend, grams that are specifically tailored to fos- conversations with my fellow meeting
more discipline may be necessary to opti- ter the next generation of talent. Some participants during the coffee breaks,
mize the deployment of these resources. organizations, such as The Keystone poster sessions, or even happenstance
No matter how you slice it, the exchange Symposia, offer a Fellows Program to meetings at the gym during my morning
of knowledge is critical for the advance- address the need for a program specif- work-out session have led to produc-
ment of science. Scientific conferences ically focused on the early career devel- tive collaborations that might not have
are not simply social get-togethers for opment of life scientists from both under- occurred had I not been in the right place
scientists to catch up and have a drink. represented and minority backgrounds at the right time. If you are in an environ-
They are important points of contact (King, 2012). The Women in Cell Biology ment where the main focus is on exciting
for networking, knowledge transfer, and committee of the American Society biology, there is a very good chance
career development. Because of this, for Cell Biology organizes career dis- you will come away with new ideas
funding for scientific conferences should cussion and mentoring roundtables, that change the way you do science.
be highly positioned on the research childcare awards, a Mentoring Theater, The face-to-face format allows for new
agenda for national funding agencies, pri- career-related panels and workshops, professional relationships to be forged.
vate foundations, and philanthropists and career recognition award in conjunc- A successful meeting is not only one
dedicated to the advancement of science tion with their scientific conferences (Ma- where you have learned new knowledge,
and education. sur, 2013). Today, scientific conferences but also one where a solid collaboration
offer much more than a meeting place has been formed.
Networking and Career to deliver oral and poster presentations
Development chock-full of your latest data. They offer Log in or Log out?
Each year, countless numbers of stu- countless opportunities for both planned My own bias is that there is an increasing
dents, post-doctoral fellows, and early/ and spontaneous interactions, where need for scientists to come together for
mid-stage career scientists participate one can make key contacts and discuss face-to-face interactions at scientific con-
in scientist meetings and present their important issues that may impact your ferences, but there is also room for a
research. These opportunities are an career trajectory. virtual meeting experience. This can take
important component of research training the form of a one-off digital conference
and education that greatly contribute to Every Minute Matters of a single lecture on a specific topic or
the advancement of the next generation Science is moving at an extraordinary part of a larger offering of an annual
of future scientific leaders. Scientific con- pace, and the information flow can be meeting from a professional society. The
ferences provide an excellent forum to overwhelming with thousands of new pa- digital conference experience can add to
network with both peers and leaders in pers published each month. I often find one’s effort to keep up with science. For
the field, all on an even playing ground that one of the best ways to get updated example, many larger conferences often
due to the fact that the pursuit of new on the latest breakthroughs is through run several parallel sessions, making it
knowledge is a common denominator. In participation in a scientific conference. It impossible for even the most energetic
addition to scientific sessions, many con- is also a way to get introduced to an person to be in two or more session halls
ferences also provide targeted workshop emerging field of science. Even though at the same time. Thus, one can choose to
opportunities for students and early-stage the speed of scientific publishing has attend a particular session in ‘‘real time’’
scientists to gain insight into career devel- dramatically increased, largely due to and follow up online with the other parallel
opment, access to mentorship, informa- the digital age, sometimes the fastest sessions of interest at a more convenient
tion related to job openings in academia way to get insight into the latest discovery time ‘‘online.’’ Access to virtual meeting
and industry, and hints on publishing in your field may be from hearing a pre- programs can also save time and
and grant writing to benefit career pro- sentation of unpublished data at a scienti- money. For example, spending some
gression (King, 2013). While some of fic conference. One question to think hours viewing a recent digital account of
these issues can be addressed in univer- about is whether attendance at a scientific talks from a meeting and then dialing in
sities and colleges, often the experience conference taught you something that for a live question and answer session
of coming together with like-minded saved you time or money, and if so, how can be a great time and money saver
individuals, but from diverse profes- much? You might learn about a new and add to, but not replace, the face-to-
sional environments, adds an additional assay, become aware of or gain access face conference experience. Virtual meet-
perspective to enrich career develop- to a key reagent, or benefit from insight ings may also give you the chance to keep
ment. Scientific conferences have an into exciting biology that changes the up with a peripheral interest. The good
important role to play in attracting and way you address your own science. Every news is that there are so many more
energizing new talent into a field. One of meeting should have a purpose that leads new ways to gain access to live and
the challenges for conference organizers to an outcome. Minimally, you should join achieved scientific conferences. But,
is to better integrate students and early- in the conversation and participate in the one cannot underestimate the face-to-
stage scientists into all aspects of the en- debate. Take advantage of the opportu- face experience. By being present in
1156 Cell 167, November 17, 2016

Figure 1. The Pathogenesis of Type 2 Diabetes Is Not Fully Understood
Scientific conferences are vital for creating a forum to bring together students and investigators from a variety of backgrounds to challenge dogma, test new
ideas, and bring clarity to research problems. Complex biology, where different aspects of a problem are investigated by independent groups, is unlikely to be
resolved by scientists working in isolation. Adapted from De Meyts (1993) and reproduced with permission from Professor Pierre De Meyts.
person, you can form a stronger network than you could by attending a conference online. You never know who you might meet at a scientific conference: it could be the head of a new research institute, the founder of an up-and-coming biotech company, or the next Nobel Laureate. My own view is that virtual meetings will never replace face-to-face versions because of this unpredictable human element.

Solving Complex Problems

Throughout the years, some of the strongest scientific partnerships have been formed through interactions at scientific conferences. Taking an example from my own field of diabetology, scientific conferences have provided a forum for long-standing debates on the pathogenesis of type 2 diabetes. Given the complexity of this metabolic disease, the contributions of countless investigators have been vital to progress, and the synthesis of different ideas and approaches that happens at meetings (Figure 1) has provided a greater understanding of the etiology and pathophysiology of the disease, with multiple treatment options now being available. Despite this, there is no cure for diabetes, and the need for scientific exchange remains. Broader expertise from many disciplines may be required to tackle the complex etiology of a disease like type 2 diabetes. Progress will not be made in isolation. Many fields evolve in large part because of the debates and discussions that occur at scientific meetings.

On the Horizon

So how can more people be brought to the table to address the vast research questions that we as scientists face? One could imagine the “future” scientific conference where people gather for a face-to-face meeting where new and unpublished research is presented. This could be interfaced with a virtual offering of the various presentations, supported by an online discussion or question and answer forum to bring in a wider community. This would leave a public record of the pioneering work presented at the face-to-face meeting, while also creating a forum for continued dialog in which that community could become actively engaged. The hope would be that the online comments and dialog might help improve the work before it finds its way to the official scientific record in the form of a journal publication. Journals don’t consider presentations at meetings to be previous publications that undermine the novelty of a set of findings. Thus, presentations at scientific conferences could be brought into the historical record by becoming more formally integrated with the eventual journal publication of the work. How future meetings can take advantage of this attribute to help the dissemination of scientific ideas while, at the same time, increasing their relevance and their value is an ongoing concern for many professional societies and associations.

Building Bridges and Filling in the Gaps

In order to tackle the complexity of the many communicable and non-communicable diseases facing humankind, clinical and experimental scientists in inter- and cross-disciplinary areas will need to continue to come together as a community in face-to-face interactive exchanges to share knowledge. Thus, in the foreseeable future, while digital and social media can play an important role, scientific conferences will remain vital to serve the community by offering an arena to advance dialog, debate, and discussion. Face-to-face meetings give you the

chance to expand your knowledge base by learning what is going on outside of your own region and meeting collaborators from around the world. Even as a senior investigator with an interest in life-long learning, rarely do I come away from a meeting without gaining new knowledge or meeting someone who has changed the way I think about my own work. Often that new kernel of information comes from interacting with an early-stage investigator who is developing a new assay or approach to address a particular question. In this regard, scientific conferences are important for bridging communities and helping us all to fill in the knowledge gaps.

ACKNOWLEDGMENTS

I am particularly grateful to Pierre De Meyts, Curtis C. Harris, Anna Krook, Juan Carlos Lopez, and Jane L. Peterson for valuable discussions and input during the writing of this text. It is worth noting that I met all of these individuals through my participation in different meetings over the years, reinforcing the fact that the human element of scientific conferences cannot be underestimated.

REFERENCES

De Meyts, P. (1993). Adv. Exp. Med. Biol. 334, 89–100.
Kaiserauer, S., Snyder, A.C., Sleeper, M., and Zierath, J. (1989). Med. Sci. Sports Exerc. 21, 120–125.
King, L. (2012). Trends Mol. Med. 18, 699–701.
King, L. (2013). Trends Biochem. Sci. 38, 373–375.
Masur, S.K. (2013). Mol. Biol. Cell 24, 57–60.

Leading Edge
Previews
A New Path through the Nuclear Pore

Alejandro Gozalo1 and Maya Capelson1,*
1Department of Cell and Developmental Biology, Epigenetics Program, Perelman School of Medicine, University of Pennsylvania,
Philadelphia, PA 19104, USA

*Correspondence: capelson@mail.med.upenn.edu
Knowing the configuration of the nuclear pore is essential for appreciating the underlying mecha-
nisms of nucleo-cytoplasmic communication. Now, Fernandez-Martinez et al. present a high-
resolution structure of the cytoplasmic nuclear pore-mRNA export holo-complex, challenging our
textbook depiction of this massive membrane-embedded complex.
Often referred to as the gateway to the genome, the nuclear pore complex (NPC) functions as a selectively permeable channel that mediates nucleo-cytoplasmic transport, such as import of proteins and export of newly generated mRNAs. While the general picture of transport across the NPC has been drawn, molecular understanding of the functional interplay between the NPC and the transporting complexes remains incomplete, largely because of the difficulty of solving the structure of such an enormous multi-subunit complex. The 65–125 MDa NPC consists of multiple copies of 30 different components, termed nucleoporins (Nups). Various unstructured domains containing stretches of phenylalanine-glycine (FG) repeats are present within many Nups (D’Angelo and Hetzer, 2008). FG-rich Nups make up the NPC sub-complexes that interface with mRNA export machinery and play key roles in mRNA export (Knockenhauer and Schwartz, 2016). In this issue of Cell, Fernandez-Martinez et al. (2016) present a sub-nanometer-resolution structure of the yeast Nup82 holo-complex, which not only provides critical information for understanding the molecular underpinnings of mRNA export but also reveals that our generally accepted depiction of the nuclear pore has been incorrect for decades.

Export of mRNA is a multi-step process involving several key interactions with the NPC (Folkmann et al., 2011; Knockenhauer and Schwartz, 2016). After post-transcriptional modifications and processing, mRNA associates with ribonucleoproteins (RNPs) to form mRNP particles. With the assistance of the export receptor proteins Mex67 and Mtr2, mRNPs traverse the central transport channel of the NPC, which is lined with FG-rich Nups. At the cytoplasmic face of the NPC, the RNA helicase Dbp5 and its binding partner Gle1 remodel the mRNP, removing export receptors, which ensures the directionality of mRNP transport and results in release of the mRNA into the cytoplasm for translation. The remodeling of mRNPs occurs in association with the Nup82 holo-complex, which is composed of Nup159, Nup82, Nsp1, and Dyn2 and acts as a binding hub for the mRNP remodelers Gle1 and Dbp5. The last step of remodeling has been proposed to occur on cytoplasmic fibrils, composed of the Nup82 complexes, which are thought to protrude away from the NPC scaffold into the cytoplasm (Figure 1). This step is one of the less understood transitions in the mRNA export process because, from the earlier proposed structure of the NPC, it is unclear how mRNPs would “jump” from the inner channel to the distally located cytoplasmic fibrils and how the process of mRNP transport is coupled to its final remodeling (Folkmann et al., 2011; Knockenhauer and Schwartz, 2016).

In the current study, the authors base their approach on previously developed modeling methods, which integrate diverse sets of biochemical and biophysical data to assemble structures of large complexes, including the recently solved structure of the scaffold complexes of the NPC (Shi et al., 2015; von Appen et al., 2015). The extensive array of high-resolution methods and information used here is remarkable. It includes (1) quantitative mass spectrometry, such as QconCAT-MS, on the purified Nup82 complex to determine its stoichiometry; (2) negative stain electron microscopy (EM) to determine the average morphology and dimensions of the complex; (3) integration of available and similar atomic structures of individual domains and components; (4) NMR data on disordered FG repeat domains; and (5) chemical crosslinking, followed by mass spectrometry (CX-MS), to define protein residues that are in spatial proximity. Through this highly integrative approach, the authors were able to generate a model structure of the Nup82 complex that satisfied all of the input constraints at a final resolution of 9.0 Å. The solved structure demonstrates that the Nup82 holo-complex assembles into an asymmetric “D”-shaped particle formed by compositionally identical subunits, each consisting of Nup82, Nup159, and Nsp1, which bind to a Dyn2 dimer. One of the subunits forms the “rod” while the other forms the “loop” of the D-shaped holo-complex.

With a structure of the Nup82 holo-complex in place, the authors used CX-MS to determine how the holo-complex associates with the rest of the NPC. The majority of the identified crosslinks connected the Nup82 complex to the NPC scaffold via the Nup84 Y-shaped complex. Combining these data with previously determined density maps and crystallographic data on the yeast Nup84 complex (Alber et al., 2007; Kelley et al., 2015), the authors put forward a structural model for the entire Nup82-Nup84 complex assembly. Unexpectedly, the model revealed that the Nup82 holo-complex is not extended away from the NPC scaffold, as has been proposed and drawn for decades, but is instead positioned right over the scaffold ring of the NPC and faces downward into the central transport

channel of the NPC, with its FG regions projecting similarly into the channel (Figure 1).

Figure 1. Previous and Revised Models of the NPC and Their Implications for Mechanisms of mRNA Export
The newly obtained structure of the Nup82-Nup84 cytoplasmic complex assembly demonstrates that the previously proposed extended cytoplasmic filaments, formed by the Nup82 complex, are in fact more like struts oriented inward toward the NPC transport channel. The Nup82 complex FG domains, which interact with transport receptors, are thus positioned in close proximity to the FG domains of the transport channel, forming an “FG continuum.” This model reveals a streamlined mRNA export process, with an easy route for the traveling mRNP particles from the inner transport channel to the remodeling proteins bound by the Nup82 “struts.”

This structural information has a far-reaching impact on our understanding of mRNA export. It demonstrates that, instead of distant cytoplasmic fibrils, the Nup82 holo-complexes form “struts” that hover immediately over the exit point of the traversing mRNP particles (Figure 1). The FG domains, projected from the “struts,” thus form an FG continuum with the underlying FG domains of the central channel Nups, providing a straightforward path for the traveling mRNPs to the remodeling proteins bound to the Nup82 complex. In this manner, the mRNP particles could make efficient contact with Dbp5 and Gle1 to undergo final remodeling, no “jumping” required.

This model of mRNA export was further validated by the identification of crosslinks of the Nup82 holo-complex to mRNA export machinery, which demonstrated that Gle1 and its Dbp5-interacting domain are similarly oriented by the Nup82-Nup84 scaffold downward, toward the inner channel (Fernandez-Martinez et al., 2016). Furthermore, to obtain functional in vivo validation of the structure, the authors analyzed a collection of truncation mutants for the Nup84 sub-complex components. Remarkably, mutations in the Nup84 complex components that resulted in an mRNA export defect mapped largely to the Nup85-Seh1 arm of the Nup84 complex “Y,” which is precisely the interface that the structure predicts to interact with the Nup82 holo-complex. Consistent with this, truncation mutants of Nup85 exhibited loss of Nup82 from the nuclear pore, as assessed with a Nup82-GFP reporter, reinforcing the notion that the mRNA export defects exhibited by Nup84 complex mutants are explained by its interaction with the Nup82 holo-complex.

The study further demonstrated that this structure of the yeast Nup82-Nup84 complex assembly aligns well with the available cryo-EM density map of the human NPC (von Appen et al., 2015), suggesting that the unexpected configuration of the cytoplasmic portion of the NPC and likely the proposed model of mRNA export are evolutionarily conserved. This is particularly exciting since components of the human homolog of the Nup82 holo-complex, the Nup88-Nup214 complex, have been associated with a variety of human pathologies. For instance, both mis-expression of Nup88 and oncogenic translocation fusions of Nup214 are strongly linked to several types of cancer (Xu and Powers, 2009), while mutations in the human homolog of Gle1 are implicated in congenital human disorders and in amyotrophic lateral sclerosis (ALS) (Kaneb et al., 2015). In particular, the disease-associated mutations in human Gle1 map to the predicted sites of interaction between Gle1 and the Nup82 holo-complex (Fernandez-Martinez et al., 2016), supporting the potential of this structure to inform us about the molecular underpinnings of these diseases. It is foreseeable that future structure-function investigations inspired by this work will advance our understanding of other NPC-associated processes, including protein import, developmental regulation, and RNA processing. The field is closing in on the full structure of the nuclear pore, an enormous achievement in structural biology and a gateway to understanding this multi-functional supercomplex.

REFERENCES

Alber, F., Dokudovskaya, S., Veenhoff, L.M., Zhang, W., Kipper, J., Devos, D., Suprapto, A., Karni-Schmidt, O., Williams, R., Chait, B.T., et al. (2007). Nature 450, 695–701.
D’Angelo, M.A., and Hetzer, M.W. (2008). Trends Cell Biol. 18, 456–466.
Fernandez-Martinez, J., Kim, S.J., Shi, Y., Upla, P., Pellarin, R., Gagnon, M., Chemmama, I.E., Wang, J., Nudelman, I., Zhang, W., et al. (2016). Cell 167, this issue, 1215–1228.
Folkmann, A.W., Noble, K.N., Cole, C.N., and Wente, S.R. (2011). Nucleus 2, 540–548.
Kaneb, H.M., Folkmann, A.W., Belzil, V.V., Jao, L.E., Leblond, C.S., Girard, S.L., Daoud, H., Noreau, A., Rochefort, D., Hince, P., et al. (2015). Hum. Mol. Genet. 24, 1363–1373.
Kelley, K., Knockenhauer, K.E., Kabachinski, G., and Schwartz, T.U. (2015). Nat. Struct. Mol. Biol. 22, 425–431.
Knockenhauer, K.E., and Schwartz, T.U. (2016). Cell 164, 1162–1171.
Shi, Y., Pellarin, R., Fridy, P.C., Fernandez-Martinez, J., Thompson, M.K., Li, Y., Wang, Q.J., Sali, A., Rout, M.P., and Chait, B.T. (2015). Nat. Methods 12, 1135–1138.
von Appen, A., Kosinski, J., Sparks, L., Ori, A., DiGuilio, A.L., Vollmer, B., Mackmull, M.T., Banterle, N., Parca, L., Kastritis, P., et al. (2015). Nature 526, 140–143.
Xu, S., and Powers, M.A. (2009). Semin. Cell Dev. Biol. 20, 620–630.

Leading Edge
Previews
Veggies and Intact Grains a Day Keep the Pathogens Away
Francesca S. Gazzaniga1 and Dennis L. Kasper1,*
1Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
*Correspondence: dennis_kasper@hms.harvard.edu
In this issue of Cell, Desai et al. compare how dietary fiber affects the gut microbiota and suscep-
tibility to disease. They find that a fiber-free diet promotes mucus-degrading bacteria and suscep-
tibility to Citrobacter rodentium infection.
The Western diet, characterized by increased fat and sugar intake and decreased fiber intake, has been implicated in a wide variety of diseases, including cancer, type 2 diabetes, and cardiovascular disease (Cordain et al., 2005). Furthermore, the Western diet is associated with changes in the bacteria in our gut, i.e., our gut microbiota (Turnbaugh et al., 2008). These associations have led to the premise that diet influences the gut microbiota, which in turn influences health and disease. Generally, the Western diet is less diverse than more traditional diets (Turnbaugh et al., 2008). Many researchers hope that, if we determine which of the bacteria necessary to promote a healthy gut are missing from our diet, we will be able to design a remedial probiotic (bacterial) or prebiotic (a supplement that feeds specific bacteria) or to change our diet to counter disease.

To understand the mechanisms by which the Western diet influences the microbiota and disease, many studies of mice have focused on high-fat diets (Devkota et al., 2012; Mahana et al., 2016; Turnbaugh et al., 2008). It is commonly believed that increased fat intake is the cause of obesity-associated diseases, yet low-fat diets have failed to yield major health benefits (Taubes, 2001). Instead, other aspects of the Western diet may be responsible for gut microbiota changes and disease. In fact, recent studies show that dietary fiber promotes the growth of symbiotic bacteria that increase the production of short-chain fatty acids and protect the host from a variety of diseases (Sonnenburg and Sonnenburg, 2014). Little is known, however, about the mechanisms by which reduced fiber influences the microbiota and disease.

In a study in this issue of Cell, Desai et al. (2016) designed a synthetic microbiota (SM) to investigate how dietary fiber affects the composition of the gut microbiota and protection from disease. This SM is composed of 14 commensal species that represent the five dominant bacterial phyla present in a human gut. To characterize the metabolic properties of each of these species, the authors evaluated its growth in vitro with 42 different plant- and animal-derived mono- and polysaccharides. Germ-free (GF) mice colonized with the SM were fed fiber-rich (FR; 15% fiber from minimally processed grains and plants), fiber-free (FF), and prebiotic (Pre; purified soluble glycans found in prebiotics) diets. To imitate the fluctuating human diet, some groups of mice were alternately fed FR and FF or Pre and FF diets, switching every other day or every 4 days. The numbers of two mucus-degrading bacterial species (Akkermansia muciniphila and Bacteroides caccae) increased on the FF diet, whereas the growth of two fiber-metabolizing species (Bacteroides ovatus and Eubacterium rectale) decreased. These changes also occurred when mice were fed alternating diets. The Pre diet affected the microbial community in a manner similar to the FF diet; this observation suggests that eating foods containing prebiotics does not have the same beneficial effect as actually eating dietary fiber. Transcriptional analysis of the bacteria confirmed these findings, showing an increase in fiber-metabolizing enzymes on the FR diet and in mucus-degrading enzymes on the FF diet (Desai et al., 2016).

Further analysis of the intestines of these mice showed a thicker mucus layer in FR mice than in FF mice. This difference suggests that, in the absence of dietary fiber, mucus-degrading bacteria outcompete fiber-metabolizing bacteria by degrading the host mucus lining. The mice on the alternating FR and FF diets had a mucus layer of intermediate thickness (Desai et al., 2016).

To investigate whether the FF diet had any negative health effects, GF and SM mice fed FR or FF diets were challenged with the mouse pathogen Citrobacter rodentium. SM mice fed the FF diet became much sicker than mice fed the FR diet. GF mice on either diet did not display disease symptoms, a result suggesting that a synergistic effect between the thin mucus layer and mucus-degrading bacteria promotes C. rodentium disease (Figure 1). These data suggest that an FF diet promotes outgrowth of mucus-degrading bacteria, which, in conjunction with a thinner mucus layer, increases susceptibility to the pathogen.

Using a synthetic, well-characterized microbiota, this paper tracks how dietary fiber influences gut bacterial composition and how the bacteria studied affect gut health. Even though GF mice were gavaged with the same set of bacteria, dietary fiber influenced the ratio of these bacteria in the gut. These results help explain why, although gut bacteria have been shown to play a major role in health and disease, studies of supplemental probiotics have found an underwhelming impact. The present study suggests that, if probiotics are given but the

host’s diet does not promote their growth, other bacteria will outcompete them and any health benefits will be lost. In addition, this investigation shows that current prebiotics fail to exert the same effects as a fiber-rich diet. Future studies exploring how to prevent the growth of mucus-degrading bacteria, determining which prebiotics can promote the growth of beneficial bacteria, and identifying which combinations of bacteria are ideal for host health are on the horizon. In the meantime, it looks as if we should eat our vegetables ... at least every other day.
Figure 1. Effects of Dietary Fiber and Gut Microbiota on the Colonic Mucus Barrier and Pathogen Susceptibility
Mice colonized with a synthetic microbiota and fed a fiber-rich diet have more fiber-degrading bacteria and a thick mucus lining and are protected from Citrobacter rodentium infection. In contrast, mice fed a fiber-free diet have an outgrowth of mucus-degrading bacteria and a thin mucus layer and are susceptible to C. rodentium infection. Germ-free mice on either a fiber-rich or a fiber-free diet have a thin mucus layer but are protected from C. rodentium infection. This study suggests that, on a fiber-free diet, mucus-degrading bacteria outcompete fiber-degrading bacteria, erode the mucus layer, and promote susceptibility to C. rodentium infection.

REFERENCES

Cordain, L., Eaton, S.B., Sebastian, A., Mann, N., Lindeberg, S., Watkins, B.A., O’Keefe, J.H., and Brand-Miller, J. (2005). Am. J. Clin. Nutr. 81, 341–354.
Desai, M.S., Seekatz, A.M., Koropatkin, N.M., Kamada, N., Hickey, C.A., Wolter, M., Pudlo, N.A., Kitamoto, S., Terrapon, N., Muller, A., et al. (2016). Cell 167, this issue, 1339–1353.
Devkota, S., Wang, Y., Musch, M.W., Leone, V., Fehlner-Peach, H., Nadimpalli, A., Antonopoulos, D.A., Jabri, B., and Chang, E.B. (2012). Nature 487, 104–108.
Mahana, D., Trent, C.M., Kurtz, Z.D., Bokulich, N.A., Battaglia, T., Chung, J., Müller, C.L., Li, H., Bonneau, R.A., and Blaser, M.J. (2016). Genome Med. 8, 48.
Sonnenburg, E.D., and Sonnenburg, J.L. (2014). Cell Metab. 20, 779–786.
Taubes, G. (2001). Science 291, 2536–2545.
Turnbaugh, P.J., Bäckhed, F., Fulton, L., and Gordon, J.I. (2008). Cell Host Microbe 3, 213–223.

Leading Edge
Minireview
The Ties That Bind: Mapping the Dynamic Enhancer-Promoter Interactome
Cailyn H. Spurrell,1 Diane E. Dickel,1 and Axel Visel1,2,3,*
1MS 84-171, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
2U.S. Department of Energy Joint Genome Institute, Walnut Creek, CA 94598, USA
3School of Natural Sciences, University of California, Merced, CA 95343, USA
*Correspondence: avisel@lbl.gov
Coupling chromosome conformation capture to molecular enrichment for promoter-containing DNA fragments enables the systematic mapping of interactions between individual distal regulatory sequences and their target genes. In this Minireview, we describe recent progress in the application of this technique and related complementary approaches to gain insight into the lineage- and cell-type-specific dynamics of interactions between regulators and gene promoters.
Distal regulatory elements, such as enhancers, play a central role in controlling gene expression in mammalian genomes. Enhancer sequences act as substrates for binding of tissue-specific transcription factors and drive transcription through physical interaction with gene promoters (Spitz and Furlong, 2012). Recent chromatin profiling studies reveal the exceptional cell type and temporal specificity of enhancer activity, which exceeds that of other classes of gene regulatory sequences (Ernst and Kellis, 2010; Nord et al., 2013). This stunning specificity, alongside advances in sequencing technologies and the increasingly recognized importance of non-coding sequences in human development and disease, has driven large-scale efforts to annotate regulatory elements and gene transcription in the human genome under a wide variety of conditions. The International Human Epigenome Consortium (IHEC) (Bae, 2013) connects many of these projects, with the goal of characterizing 1,000 epigenomes from different human cell types at diverse developmental stages and disease states.

New studies published in this issue of Cell and in Cell Reports, described in greater detail throughout the following sections of this Minireview, build upon IHEC efforts to explore the role of cell-type-specific regulation and begin to address several important challenges in the field (Schmitt et al., 2016; Javierre et al., 2016; Breeze et al., 2016; Pellacani et al., 2016). In brief, Pellacani et al. (2016) tackle the question of cell type specificity of enhancers across the individual cell types that make up heterogeneous tissues. The authors use chromatin profiling methods to identify regulatory elements active in the distinct cell populations that make up mammary tissue. While chromatin profiling is powerful for identifying predicted enhancer sequences, it is limited in its ability to elucidate the gene target(s) of the predicted enhancers. To address this challenge, Javierre et al. (2016) and Schmitt et al. (2016) use cutting-edge chromosome conformation capture techniques to map enhancer-promoter interactions in a variety of human tissues and primary cell types. Finally, disease-associated variants identified in genome-wide association studies (GWAS) are overwhelmingly non-coding (Altshuler et al., 2010; Visel et al., 2009) and are enriched in non-coding loci harboring regulatory functions (Maurano et al., 2012), but specific examples of non-coding sequence variants conclusively and mechanistically linked to disease remain limited. The functional genome annotations from the series of new papers (Schmitt et al., 2016; Javierre et al., 2016; Pellacani et al., 2016), along with a computational algorithm capable of integrating epigenomic findings described in Breeze et al. (2016), provide useful tools for addressing the gap between disease-associated non-coding variants and their regulatory gene targets. Using these complementary techniques to explore the regulatory landscape in human tissues and isolated primary cell populations, these studies report insights and resources that will be instrumental in linking variants with causal mechanisms of disease.

Insights into Cell-Type-Specific Regulation

Histone ChIP-seq has now become a standard method to identify regulatory regions genome-wide (Park, 2009). ChIP-seq combines chromatin immunoprecipitation of modified histones with high-throughput sequencing to identify active enhancers and other regulatory features. While the underlying DNA sequence does not vary between cell types, histone modifications mark regions that are active or repressed in vivo in a tissue-specific manner. When paired with technologies for capturing specific cell types, ChIP-seq can be used to identify differential regulation in cell populations derived from heterogeneous tissue. An elegant example of this approach is provided by Pellacani et al. (2016), who generate histone ChIP-seq, DNA methylation, and gene expression data to identify cell-type-specific regulatory elements in primary human mammary tissue. Consistent with previous findings (Gascard et al., 2015), their results show widespread differences among the different cell types isolated from this heterogeneous tissue and relative to previous results from immortalized mammary cell lines. The biological relevance of these observations is reinforced by the findings that differential enhancer utilization in mammary cell types is consistent with cell-specific gene expression and that cell-type-specific enhancers are enriched for unique

transcription factor binding sites. This view of enhancer activity mirrors results from previous chromatin profiling studies, and these data allow the authors to derive insights into the cells that make up a complex tissue.

3D Chromatin Structure Links Enhancers to Genes
While ChIP-seq can identify differential activity of regulatory elements across tissues and cell types, it does not provide evidence that formally links individual distal regulatory elements to their respective target genes. Tools based on chromosome conformation capture (3C) enable the identification of genomic regions that can be far apart in the linear genome sequence but are proximate in three-dimensional (3D) space within the nucleus. Hi-C, one variant of 3C, identifies these distal yet interacting partners on a global genomic scale by digesting cross-linked chromatin and ligating physically interacting fragments together (Lieberman-Aiden et al., 2009). The resulting libraries are sequenced without further molecular enrichment for marks associated with any particular functional class of genomic elements, thereby creating a largely unbiased genome-wide map of chromatin architecture. The high complexity of these libraries requires deep sequencing to identify statistically significant interactions. Thus, the approach was initially used to identify megabase-scale topologically associated domains (TADs) of chromosome organization (Dixon et al., 2012). This high-level architecture tends to be conserved across cell types and mammalian species, but initial datasets yielded limited insight into intra-TAD interactions. Efforts to create higher-resolution maps require an order of magnitude more sequencing but are able to provide kilobase-resolution views of such interactions (Rao et al., 2014).
A new paper by Schmitt et al. (2016) reports Hi-C analysis of 14 primary human tissues and describes computational methods to identify new features of genomic architecture. The authors designed an algorithm to normalize sequencing depth variation across tissues, which allows them to identify both TADs and cell-specific interactions. Consistent with the results from previous cell-based studies, the authors observed that TAD structure is stable across different human tissues. Beyond the resolution of TADs, however, high-resolution chromatin loops have been described to partition the genome into smaller domains within the TAD structure (Rao et al., 2014). Reinforcing these previous observations, a subset of the interactions reported by Schmitt et al. represents a distinct set of sub-TAD regulatory networks. The chromatin interactions within TADs show a remarkable degree of tissue specificity; 40% of interactions are unique to one tissue type. These tissue-specific interaction regions tend to be located near genes with tissue-specific expression, and they are enriched for marks of active enhancers. These findings can begin to be used to directly link genes with some of their non-coding regulatory elements, and they further demonstrate the diverse regulatory landscape across human tissues.
A second paper, by Javierre et al. (2016), defines even more specific chromatin interaction architecture using a variant of Hi-C that employs biotinylated RNA baits to enrich for interactions involving promoter sequences (Schoenfelder et al., 2015). This promoter capture Hi-C (PCHi-C) technology results in libraries with far lower complexity than standard Hi-C, greatly reducing the amount of sequencing required and resulting in high-resolution maps showing interactions between promoters and other loci. Javierre et al. applied this method to 17 primary human cell types from the hematopoietic lineage to further characterize the types of loci that interact with promoters and to understand how long-range interactions between promoters and other loci evolve during cell differentiation.
The observed interactions anchored on promoters span a median distance of 300 kb, and the distal interacting partners do not always link to the closest gene by linear distance. Consistent with the Schmitt et al. (2016) study, these distal regions identified as interacting with promoters are enriched for chromatin marks associated with active enhancers. Javierre et al. (2016) further investigate the biological role of promoter-interacting regions by comparing them to previously reported expression quantitative trait loci (eQTLs). Expression QTLs are identified by measuring gene expression in a population of cells and linking expression differences to alleles of a sequence variant (Cookson et al., 2009). Using published eQTL data from several cell types, the authors observe an enrichment for eQTLs in the promoter-interacting regions from the same cell types. In particular, distal regions are enriched for eQTLs that associate with the same interacting gene. This result supports the idea that promoter-interacting regions have a functional regulatory role and that variation within promoter-interacting regions can be connected to potential gene targets.
One important finding from Javierre et al. (2016) is that, in the hematopoietic lineage, chromatin architecture is highly dynamic and lineage-specific interactions delineate the myeloid and lymphoid regulatory landscape. The regulatory complexities of the promoter-interacting regions are schematically outlined in Figure 1. The first column is an example of an invariant interaction between a single promoter and multiple enhancers across all cell types. While invariant interactions are abundant, many interactions vary by cell type. Clustering the promoter-enhancer interactions shows a general divergence between interactions found in the myeloid and lymphoid lineages. Schematic examples of myeloid- and lymphoid-specific interactions are represented in columns 2 and 3 of Figure 1. These interactions are invariant within each lineage but divergent between the two cell lineages. Column 4 shows a CD4+ T cell-specific interaction, representative of cell-type-specific interactions, which were also observed in other individual cell types examined. Surprisingly, 80% of promoters had lineage- or cell-type-specific interactions. Further showing the complexity of the regulatory network, in cells of the myeloid and lymphoid lineages the same promoter may be regulated through different enhancer interactions (column 5), and one enhancer can interact with different promoters in a lineage-specific manner (column 6). Javierre et al. (2016) cluster these highly specific interactions to create a detailed lineage tree of all 17 hematopoietic cell types that recapitulates the known relationships between different cell populations. Consistent with this, promoter-associated enhancers are predicted to be active in a manner that mirrors the cell type specificity of expression of the interacting gene. The authors combined their chromatin interaction data
1164 Cell 167, November 17, 2016

Figure 1. Lineage-Specific Interactions of Promoters with Non-coding Regulatory Sequences
(Left) Javierre et al. (2016) used promoter capture Hi-C (PCHi-C) to systematically map interactions between promoters and promoter-interacting regions (PIRs)
across 17 primary human hematopoietic cell types (10 representative examples shown). (Right) Comparison of PIR-promoter interactions across cell types
reveals that some interactions are invariant across all cell types examined, while others are specific to major lineages or individual cell types. Importantly, some
promoters interact with different sets of PIRs depending on cell type, and vice versa, some PIRs interact with different sets of promoters in a cell-type-specific
manner.
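The categories named in this caption (invariant, lineage-specific, and cell-type-specific interactions) can be made concrete with a small sketch. Everything in it is an illustrative assumption: the six cell type names, the lineage map, the gene and PIR labels, and the rule-based labeling itself. It is not the procedure of Javierre et al. (2016), who are described here as clustering interactions across all 17 cell types; it only shows how presence or absence patterns map onto the categories.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical lineage assignment for a handful of cell types (illustrative only).
LINEAGE: Dict[str, str] = {
    "monocyte": "myeloid",
    "neutrophil": "myeloid",
    "macrophage": "myeloid",
    "naive_CD4": "lymphoid",
    "naive_CD8": "lymphoid",
    "naive_B": "lymphoid",
}


def classify_interaction(present_in: Set[str],
                         lineage: Dict[str, str] = LINEAGE) -> str:
    """Label one promoter-PIR interaction by the cell types where it was detected."""
    all_cells = set(lineage)
    unknown = present_in - all_cells
    if unknown:
        raise ValueError(f"unknown cell types: {sorted(unknown)}")
    if not present_in:
        return "absent"
    if present_in == all_cells:
        return "invariant"
    if len(present_in) == 1:
        return "cell-type-specific"
    lineages_hit = {lineage[cell] for cell in present_in}
    if len(lineages_hit) == 1:
        (lin,) = lineages_hit
        members = {cell for cell, l in lineage.items() if l == lin}
        # Lineage-specific: seen in every cell type of one lineage and nowhere else.
        if present_in == members:
            return f"{lin}-specific"
    return "other"


# Made-up (promoter, PIR) pairs with the cell types in which each was detected.
interactions: Dict[Tuple[str, str], Set[str]] = {
    ("GENE_A", "PIR_1"): set(LINEAGE),
    ("GENE_B", "PIR_2"): {"monocyte", "neutrophil", "macrophage"},
    ("GENE_C", "PIR_3"): {"naive_CD4"},
    ("GENE_D", "PIR_4"): {"monocyte", "naive_B"},
}

labels = {pair: classify_interaction(cells) for pair, cells in interactions.items()}
summary = Counter(labels.values())
for pair, label in labels.items():
    print(pair, label)
```

Real PCHi-C data carry a continuous interaction score per cell type, so any such labeling depends on the detection threshold chosen; the binary presence sets above sidestep that choice for clarity.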
with enhancer annotations and clustered genes according to enhancer specificity for each cell type. This analysis identifies sets of genes that are dynamically regulated in different cell types across the hematopoietic tree. The correlation between cell-type-specific enhancer activity and gene expression supports a functional role for these interactions in regulating cell fate and differentiation.

Interpretation of Genetic and Epigenetic Variation in Disease
Elucidating the mechanistic role of non-coding sequence variation in human disease remains an unmet challenge. Tissue- and cell-type-specific annotations of regulatory elements generated by ChIP-seq are now widely available through the work of the IHEC members and individual investigators. These efforts represent an important first step in bridging this gap, and work is now being done to integrate these diverse maps together into high-confidence enhancer annotations to identify which disease-associated variants are most likely to impact gene regulatory sequences (Dickel et al., 2016). Chromosome conformation capture techniques complement these datasets by linking tissue-specific enhancers with candidate gene targets, and such approaches are increasingly being used to interpret non-coding disease-associated variation (Martin et al., 2015; Won et al., 2016). Most studies thus far have focused on one specific cell type or tissue to prioritize GWAS variants. In contrast, Javierre et al. (2016) and Schmitt et al. (2016) analyze genome interactions across many tissue types or cell populations, further facilitating the prioritization of regulatory candidates. The papers show that lineage- and cell-type-specific regulatory regions are enriched for genetic variation from association studies of phenotypes with similar cell specificity. Javierre et al. (2016) also use lineage-specific interactions elucidated by PCHi-C to create a prioritized list of genes that may be implicated in disease through interactions with disease-associated non-coding regions identified by GWAS. One type of interaction diagrammed in Figure 1 is “lineage-specific promoter interactions.” Hypothetically, the presence of a phenotype-associated variant in an enhancer that interacts with two promoters in a relevant cell lineage would prioritize these genes

over other nearby candidates, thereby helping to narrow down the list of genes whose misregulation might underlie the phenotype. Javierre et al. (2016) outline how this strategy based on PCHi-C data can be used to complement eQTL-based approaches, which require variants to have detectable effects on gene expression in order to link a regulatory sequence to a target gene (Guo et al., 2015). Their results highlight the strength of using physical interaction data to link disease-relevant genes and enhancers.
Complementary to GWAS, epigenome-wide association studies (EWAS) identify changes in the epigenome that are associated with disease susceptibility. For example, previous EWAS studies have found associations between specific changes in DNA methylation and phenotypic status (Liu et al., 2013). Building upon the success of the FORGE software (Dunham et al., 2014), which intersects GWAS results with maps of DNase-hypersensitive sites to determine which disease-associated variants fall into regulatory sequences, a new paper (Breeze et al., 2016) describes eFORGE, software designed to perform similar analyses for EWAS results. The new tool maps regions of differential methylation that have been implicated in disease through EWAS to regulatory regions genome-wide. Thus, eFORGE identifies potential mechanistic links between cell-type-specific distal regulation and epigenome-wide association studies, information that could aid in the development of disease treatments.
The compelling new studies presented here use epigenomic data to assess the regulatory architecture across an impressive range of primary human cells and tissues. Their findings emphasize the cell type specificity of regulatory interactions and the dynamic nature of regulatory networks, and this information will be valuable for the interpretation of human disease findings. While this Minireview focused on assessing non-coding variants from GWAS, cell-type-specific interactions can also be used to interpret rare non-coding variation from whole-genome sequencing studies (Weedon et al., 2014), a technology that is being adopted with increasing frequency for human disease studies. The computational and experimental resources from these epigenomic studies will be valuable for understanding chromatin structure, as well as for facing the considerable challenge of linking non-coding variation with cell-specific mechanisms of disease.

ACKNOWLEDGMENTS

This work was supported by National Institutes of Health grants R01HG003988, U54HG006997, U01DE024427, R24HL123879, and UM1HL098166. Research conducted at the E.O. Lawrence Berkeley National Laboratory was performed under Department of Energy Contract DE-AC02-05CH11231, University of California.

REFERENCES

Altshuler, D., Lander, E., and Ambrogio, L.A. (2010). Nature 467, 1061–1073.
Bae, J.-B. (2013). Genomics Inform. 11, 7–14.
Breeze, C.E., Paul, D.S., van Dongen, J., Butcher, L.M., Ambrose, J.C., Barrett, J.E., Lowe, R., Rakyan, V.K., Iotchkova, V., Frontini, F., et al. (2016). Cell Rep. 17. Published online November 15, 2016. http://dx.doi.org/10.1016/j.celrep.2016.10.059.
Cookson, W., Liang, L., Abecasis, G., Moffatt, M., and Lathrop, M. (2009). Nat. Rev. Genet. 10, 184–194.
Dickel, D.E., Barozzi, I., Zhu, Y., Fukuda-Yuzawa, Y., Osterwalder, M., Mannion, B.J., May, D., Spurrell, C.H., Plajzer-Frick, I., Pickle, C.S., et al. (2016). Nat. Commun. 7, 12923.
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and Ren, B. (2012). Nature 485, 376–380.
Dunham, I., Kulesha, E., Iotchkova, V., Morganella, S., and Birney, E. (2014). bioRxiv. http://dx.doi.org/10.1101/013045.
Ernst, J., and Kellis, M. (2010). Nat. Biotechnol. 28, 817–825.
Gascard, P., Bilenky, M., Sigaroudinia, M., Zhao, J., Li, L., Carles, A., Delaney, A., Tam, A., Kamoh, B., Cho, S., et al. (2015). Nat. Commun. 6, 6351.
Guo, H., Fortune, M.D., Burren, O.S., Schofield, E., Todd, J.A., and Wallace, C. (2015). Hum. Mol. Genet. 24, 3305–3313.
Javierre, B.M., Burren, O.S., Wilder, S.P., Kreuzhuber, R., Hill, S.M., Sewitz, S., Cairns, J., Wingett, S.W., Várnai, C., Thiecke, M.J., et al. (2016). Cell 167, this issue, 1369–1384.
Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., et al. (2009). Science 326, 289–293.
Liu, Y., Aryee, M.J., Padyukov, L., Fallin, M.D., Hesselberg, E., Runarsson, A., Reinius, L., Acevedo, N., Taub, M., Ronninger, M., et al. (2013). Nat. Biotechnol. 31, 142–147.
Martin, P., McGovern, A., Orozco, G., Duffus, K., Yarwood, A., Schoenfelder, S., Cooper, N.J., Barton, A., Wallace, C., Fraser, P., et al. (2015). Nat. Commun. 6, 10069.
Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J., et al. (2012). Science 337, 1190–1195.
Nord, A.S., Blow, M.J., Attanasio, C., Akiyama, J.A., Holt, A., Hosseini, R., Phouanenavong, S., Plajzer-Frick, I., Shoukry, M., Afzal, V., et al. (2013). Cell 155, 1521–1531.
Park, P.J. (2009). Nat. Rev. Genet. 10, 669–680.
Pellacani, D., Bilenky, M., Kannan, N., Heravi-Moussavi, A., Knapp, D.J.H.F., Gakkhar, S., Moksa, M., Carles, A., Moore, R., Mungall, A.J., et al. (2016). Cell Rep. 17. Published online November 15, 2016. http://dx.doi.org/10.1016/j.celrep.2016.10.058.
Rao, S.S.P., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., and Aiden, E.L. (2014). Cell 159, 1665–1680.
Schmitt, A.D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C.L., Li, Y., Lin, S., Lin, Y., Barr, C.L., and Ren, B. (2016). Cell Rep. 17. Published online November 15, 2016. http://dx.doi.org/10.1016/j.celrep.2016.10.061.
Schoenfelder, S., Furlan-Magaril, M., Mifsud, B., Tavares-Cadete, F., Sugar, R., Javierre, B.-M., Nagano, T., Katsman, Y., Sakthidevi, M., Wingett, S.W., et al. (2015). Genome Res. 25, 582–597.
Spitz, F., and Furlong, E.E.M. (2012). Nat. Rev. Genet. 13, 613–626.
Visel, A., Rubin, E.M., and Pennacchio, L.A. (2009). Nature 461, 199–205.
Weedon, M.N., Cebola, I., Patch, A.-M., Flanagan, S.E., De Franco, E., Caswell, R., Rodríguez-Seguí, S.A., Shaw-Smith, C., Cho, C.H.-H., Lango Allen, H., et al.; International Pancreatic Agenesis Consortium (2014). Nat. Genet. 46, 61–64.
Won, H., de la Torre-Ubieta, L., Stein, J.L., Parikshak, N.N., Huang, J., Opland, C.K., Gandal, M.J., Sutton, G.J., Hormozdiari, F., Lu, D., et al. (2016). Nature 538, 523–527.

Leading Edge
Minireview
Concerted Genetic Function in Blood Traits

Sarah Kim-Hellmuth1,2 and Tuuli Lappalainen1,2,*
1New York Genome Center, New York, NY 10013, USA
2Department of Systems Biology, Columbia University, New York, NY 10027, USA
*Correspondence: tlappalainen@nygenome.org
The hematopoietic system plays a major role in human health. Two studies by Astle et al. and
Chen et al. published in this issue of Cell use genome-wide association and functional genomics
approaches to provide deep insights into the role of genetic variants in hematological traits.
We discuss these discoveries and future strategies toward completing our understanding of the
genetic basis for variation in human traits.
Introduction

Blood plays a key role in a variety of vital functions, such as transporting oxygen, nutrients, and metabolic waste, defending the body against invading microbes, as well as healing wounds via hemostasis. Disruption of normal hematological phenotypes (such as type, size, or number of blood cells) has direct links to multiple diseases. Since blood is one of the most accessible tissues of the human body, it has been extensively used for diverse surveys into human biology, including measurements of clinical biomarkers, cell type diversity, gene expression, and genetic associations with all these traits.
In this issue of Cell, Chen et al. (2016) and Astle et al. (2016) describe large genetic association studies of molecular and physiological phenotypes in blood, respectively. These studies not only build an extensive catalog of genetic variants associated with diverse traits but also propose hierarchical causal mechanisms of how molecular changes lead to changes at the physiological and disease level. A classical problem with genome-wide association studies (GWAS) has been understanding biological mechanisms that mediate the associations because the associations typically have small effect sizes and are located in non-coding regions with unclear function. One of the approaches to bridge this gap has been mapping and characterization of genetic associations with gene expression to discover expression quantitative trait loci (eQTLs), as well as associations with other molecular traits, such as transcription factor binding, chromatin states, alternative splicing, and protein abundance (reviewed by Albert and Kruglyak, 2015; Pai et al., 2015). In the papers published in this issue, the genetic associations with molecular traits in Chen et al. (2016) provide insight not only into genome function but also into cellular changes underlying disease associations. The massive GWAS by Astle et al. (2016) describes intriguing links between rare and common hematological traits and diseases, and integrates their data with those from Chen et al. (2016) and disease GWAS to shed light on biological mechanisms of blood traits, which can further predispose to diverse diseases.
The study by Chen et al. is the largest survey of multiple molecular traits in several primary cell types. They analyzed genetic, epigenetic, and transcriptomic variation in three major immune cell types: CD14+ monocytes, CD16+ neutrophils, and CD4+ naive T cells. In the framework of the IHEC consortium, they performed whole-genome sequencing, RNA-sequencing (RNA-seq), DNA methylation array, and H3K4me1 and H3K27ac ChIP-seq to identify regulatory effects of functional genetic variants in up to 197 healthy individuals. These data are used to analyze genetic versus epigenetic effects on gene expression, molecular QTLs and their cell-type specificity, and GWAS mechanisms.
During the past 10 years, epigenome-wide association studies (EWAS) have sought to measure environmentally acquired epigenetic effects on phenotypes (Birney et al., 2016). However, it is now increasingly understood that much of the epigenome variation is genetic. Chen et al. (2016) analyzed this by measuring the relative contribution of genetic and epigenetic factors to transcriptional variance. An association analysis between epigenetic traits and gene expression with and without adjusting for cis-genetic effects showed that more than 50% of epigenome-transcriptome correlations could be explained by underlying cis-genetic effects. This was supported by a joint variance component model showing that genetic effects explained the largest proportion of gene-expression variance. While these results from gene-expression analysis do not necessarily fully reflect variance components of complex traits, these results emphasize the importance of including QTLs in epigenome-wide association studies between epigenome variation and disease or other high-level traits. However, Chen et al. (2016) also identified numerous genes, particularly in immune pathways, where epigenetic regulation is correlated with transcription in a manner that cannot be explained by common variants. As the authors indicate, while it would be tempting to assume that, in these loci, epigenetic variation causes changes in gene expression, the causal direction of such correlations remains unknown (Birney et al., 2016).
Mapping genetic associations with molecular traits has become a standard approach for understanding functional effects of genetic variants in cis. Furthermore, genetic variation can provide a causality anchor to describe causal relationships between different molecular changes. Chen et al. (2016) mapped genetic effects on gene expression (eQTLs), splicing (sQTLs), DNA methylation (meQTLs), and histone modifications (hQTLs), finding associations for 11.5% to 40% of tested features. These numbers were further increased by incorporating allele-specific expression data in the eQTL and hQTL mapping. Analysis of
Cell 167, November 17, 2016 © 2016 Published by Elsevier Inc. 1167
LD patterns between eQTL, meQTL, and hQTL variants showed that almost half of the eQTL variants were also associated with an epigenomic QTL. This suggests highly coordinated genetic influences on gene expression, DNA methylation, and chromatin binding, although proving causal links between different molecular changes remains challenging, as the authors point out. In contrast, sQTLs and eQTLs reflect predominantly independent genetic influences on splicing and expression, similar to what has been described in previous studies (Lappalainen et al., 2013; Li et al., 2016).
The largest eQTL studies so far have been performed in whole blood due to its easy availability (Battle et al., 2014; Westra et al., 2013; Wright et al., 2014), but the recent focus has been on analyzing eQTL and other molecular QTLs in different tissues (Aguet et al., 2016), computationally deconvoluted cell types (Westra et al., 2015; Zhernakova et al., 2015), and purified cell types (Fairfax et al., 2012; Naranbhai et al., 2015; Raj et al., 2014; Chen et al., 2016). The epigenomic QTL data by Chen et al. (2016) provide additional insights into cell-type specificity, which is often analyzed only at the eQTL level. They show that sharing of genetic associations across monocytes, neutrophils, and naive T cells was greatest for meQTLs, followed by eQTLs and H3K4me1 and H3K27ac hQTLs, which is consistent with cell-type-specific roles of enhancers. Given that much of GWAS heritability lies in enhancers (Finucane et al., 2015), analysis of specific cell types may be a key approach for better understanding the regulatory function of GWAS loci.
Finally, to link disease-associated loci to their causative genes and identify putative regulatory mechanisms, the authors performed colocalization analysis of GWAS data from six autoimmune diseases and their molecular QTLs. Interestingly, out of the 115 loci with colocalized associations, a large number involve chromatin or splicing changes without a detectable corresponding eQTL, highlighting the importance of studying diverse molecular traits. While additional experiments are needed to prove that these molecular events are causal to disease, these analyses provide a list of disease loci with a solid hypothesis of the molecular mechanism.
GWASs have become an essential tool for characterizing the genetic contribution to human disease, and large, comprehensive GWASs analyzed with statistical sophistication can provide a wealth of novel information on the mechanisms of biological processes underlying disease etiology. The study by Astle et al. (2016) is a prime example of such work. They performed a genome-wide association analysis in 173,480 individuals of European ancestry, testing 29.5 million polymorphic DNA sequence variants for association with 36 hematological traits. This led to the discovery of 2,706 independent genetic variants associated with red cell, white cell, and platelet indices, which is a nearly 10-fold increase in the number of known associations (Vasquez et al., 2016). More than 10% of the associated variants are rare or low frequency, exemplifying the importance of high-resolution imputation based on whole-genome sequencing.
In line with previous GWAS results, the majority of associated variants are common, located in non-coding regions of the genome, and with small effects. However, Astle et al. (2016) also identify a large number of rare variants that are enriched for high effect sizes and effects on protein sequence. Nine of the associated variants are classified as pathogenic in the ClinVar database, and coding-associated variants are strongly enriched among Mendelian rare disease genes, demonstrating overlap between Mendelian disease mutations and the GWAS discoveries described here. The study also provides a large catalog of new rare and low-frequency variants associated with diverse hematological traits, including associations with a plausible biological hypothesis and possible medical importance. These results indicate how well-powered GWAS reaches into the classical domain of Mendelian genetics and how these previously largely separate fields inform each other.
The genetic associations with red cells, white cells, and platelets were observed to be predominantly different, which is consistent with the different biological roles of these cells. In order to analyze functional underpinnings of the discovered GWAS loci, Astle et al. (2016) use epigenetic reference maps and genetic associations with molecular traits described by Chen et al. (2016) from trait-matched primary blood cells. Heritability analyses show the largest proportions of GWAS associations driven by enhancer elements, as well as transcribed regions, of which the latter includes both coding effects and proximal, as well as post-transcriptional, regulatory effects (Gaffney et al., 2012). The analysis of GWAS and QTL associations revealed 198 GWAS loci with a colocalized molecular QTL, pinpointing likely causal genes and biological mechanisms for the hematological trait associations.
Most hematopoietic traits can be considered intermediate phenotypes rather than diseases themselves. Furthermore, while hematological traits are associated with many diseases, it has usually remained unclear whether the blood trait changes are a cause or consequence of the disease. Genetic associations provide an opportunity to decipher the causal relationship, and Astle et al. (2016) use this to investigate causal links of hematopoietic traits to multiple common complex diseases, in particular autoimmune and cardiovascular diseases. This results in a number of cases where hematological changes appear to be causal to diseases, including both expected and entirely novel associations. This is of major value in understanding multiple layers of biological processes underlying diseases. Overall, these findings enhance our understanding of genetic effects on hematopoietic processes and will help to identify novel therapeutic targets for hematological and other diseases.
Altogether, these two studies build comprehensive catalogs of genetic effects on proximal molecular traits of the chromatin and gene expression, as well as physiological traits of the hematopoietic system. Using available GWAS summary statistics, they link their data to disease, thus building a comprehensive chain of associations at multiple levels. These large-scale studies demonstrate how well-coordinated cohort datasets and increasingly affordable assays (both for physiological and molecular phenotyping and for genome sequencing) can provide biological and medical insights. A further scale-up and expansion of these approaches holds major promise for the future. Increasing sample sizes and sequencing resolution are particularly important to reliably discover trans-acting regulatory variants (Bonder et al., 2015; Jo et al., 2016; Westra et al., 2013), which account for more than half of the genetically explained variance in gene expression (Battle and Montgomery, 2014),

and rare disease-associated variants that may be of particular medical and pharmacological importance. Another key area for future work is the analysis of increasingly diverse molecular traits across multiple biologically relevant cell types and under environmental stimuli (e.g., immune activation) (Barreiro et al., 2012; Fairfax et al., 2014; Kim et al., 2014; Lee et al., 2014; Ye et al., 2014). The increasing scale of diversity of datasets must be complemented with increasingly sophisticated statistical approaches. Finally, while observational molecular data can be used to form solid hypotheses of causal mechanisms of human traits, experimental validation using genome-editing technology and high-throughput multiplexed functional assays (Gasperini et al., 2016) will be crucial to read the regulatory code of the genome and understand the mechanistic relationships of human traits.

ACKNOWLEDGMENTS
S.K.-H. is supported by a research fellowship of the DFG. T.L. is supported by NIH grants R01MH106842, UM1HG008901, and HSN2682010000029C.

REFERENCES
Aguet, F., Brown, A.A., Castel, S., Davis, J.R., Mohammadi, P., Segre, A.V., Zappala, Z., Abell, N.S., Fresard, L., Gamazon, E.R., et al. (2016). bioRxiv, http://biorxiv.org/content/early/2016/09/09/074450
Albert, F.W., and Kruglyak, L. (2015). Nat. Rev. Genet. 16, 197–212.
Astle, W.J., Elding, H., Jiang, T., Allen, D., Ruklisa, D., Mann, A.L., Mead, D., Bouman, H., Riveros-Mckay, F., Kostadima, M.A., et al. (2016). Cell 167. http://dx.doi.org/10.1371/journal.pbio.0000051.
Barreiro, L.B., Tailleux, L., Pai, A.A., Gicquel, B., Marioni, J.C., and Gilad, Y. (2012). Proc. Natl. Acad. Sci. USA 109, 1204–1209.
Battle, A., and Montgomery, S.B. (2014). Hum. Genet. 133, 727–735.
Battle, A., Mostafavi, S., Zhu, X., Potash, J.B., Weissman, M.M., McCormick, C., Haudenschild, C.D., Beckman, K.B., Shi, J., Mei, R., et al. (2014). Genome Res. 24, 14–24.
Birney, E., Smith, G.D., and Greally, J.M. (2016). PLoS Genet. 12, e1006105.
Bonder, M.J., Luijk, R., Zhernakova, D., Moed, M., Deelen, P., Vermaat, M., van Iterson, M., van Dijk, F., van Galen, M., Bot, J., et al. (2015). bioRxiv, http://biorxiv.org/content/early/2015/12/01/033084.1
Chen, L., Ge, B., Casale, F.P., Vasquez, L., Kwan, T., Garrido-Martín, D., Watt, S., Yang, Y., Kundu, K., Ecker, S., et al. (2016). Cell 167. http://dx.doi.org/10.1371/journal.pbio.0000051.
Fairfax, B.P., Humburg, P., Makino, S., Naranbhai, V., Wong, D., Lau, E., Jostins, L., Plant, K., Andrews, R., McGee, C., and Knight, J.C. (2014). Science 343, 1246949.
Fairfax, B.P., Makino, S., Radhakrishnan, J., Plant, K., Leslie, S., Dilthey, A., Ellis, P., Langford, C., Vannberg, F.O., and Knight, J.C. (2012). Nat. Genet. 44, 502–510.
Finucane, H.K., Bulik-Sullivan, B., Gusev, A., Trynka, G., Reshef, Y., Loh, P.-R., Anttila, V., Xu, H., Zang, C., Farh, K., et al.; ReproGen Consortium; Schizophrenia Working Group of the Psychiatric Genomics Consortium; RACI Consortium (2015). Nat. Genet. 47, 1228–1235.
Gaffney, D.J., Veyrieras, J.-B., Degner, J.F., Pique-Regi, R., Pai, A.A., Crawford, G.E., Stephens, M., Gilad, Y., and Pritchard, J.K. (2012). Genome Biol. 13, R7.
Gasperini, M., Starita, L., and Shendure, J. (2016). Nat. Protoc. 11, 1782–1787.
Jo, B., He, Y., Strober, B.J., Parsana, P., Aguet, F., Brown, A.A., Castel, S.E., Gamazon, E.R., Gewirtz, A., Gliner, G., et al. (2016). bioRxiv, http://biorxiv.org/content/early/2016/09/09/074419
Kim, S., Becker, J., Bechheim, M., Kaiser, V., Noursadeghi, M., Fricker, N., Beier, E., Klaschik, S., Boor, P., Hess, T., et al. (2014). Nat. Commun. 5, 5236.
Lappalainen, T., Sammeth, M., Friedländer, M.R., ’t Hoen, P.A., Monlong, J., Rivas, M.A., Gonzàlez-Porta, M., Kurbatova, N., Griebel, T., Ferreira, P.G., et al.; Geuvadis Consortium (2013). Nature 501, 506–511.
Lee, M.N., Ye, C., Villani, A.C., Raj, T., Li, W., Eisenhaure, T.M., Imboywa, S.H., Chipendo, P.I., Ran, F.A., Slowikowski, K., et al. (2014). Science 343, 1246980.
Li, Y.I., van de Geijn, B., Raj, A., Knowles, D.A., Petti, A.A., Golan, D., Gilad, Y., and Pritchard, J.K. (2016). Science 352, 600–604.
Naranbhai, V., Fairfax, B.P., Makino, S., Humburg, P., Wong, D., Ng, E., Hill, A.V.S., and Knight, J.C. (2015). Nat. Commun. 6, 7545.
Pai, A.A., Pritchard, J.K., and Gilad, Y. (2015). PLoS Genet. 11, e1004857–e1004858.
Raj, T., Rothamel, K., Mostafavi, S., Ye, C., Lee, M.N., Replogle, J.M., Feng, T., Lee, M., Asinovski, N., Frohlich, I., et al. (2014). Science 344, 519–523.
Vasquez, L.J., Mann, A.L., Chen, L., and Soranzo, N. (2016). ISBT Sci. Ser. 11(Suppl 1), 211–219.
Westra, H.-J., Arends, D., Esko, T., Peters, M.J., Schurmann, C., Schramm, K., Kettunen, J., Yaghootkar, H., Fairfax, B.P., Andiappan, A.K., et al. (2015). PLoS Genet. 11, e1005223–e17.
Westra, H.-J., Peters, M.J., Esko, T., Yaghootkar, H., Schurmann, C., Kettunen, J., Christiansen, M.W., Fairfax, B.P., Schramm, K., Powell, J.E., et al. (2013). Nat. Genet. 45, 1238–1243.
Wright, F.A., Sullivan, P.F., Brooks, A.I., Zou, F., Sun, W., Xia, K., Madar, V., Jansen, R., Chung, W., Zhou, Y.-H., et al. (2014). Nat. Genet. 46, 430–437.
Ye, C.J., Feng, T., Kwon, H.K., Raj, T., Wilson, M.T., Asinovski, N., McCabe, C., Lee, M.H., Frohlich, I., Paik, H.I., et al. (2014). Science 345, 1254665.
Zhernakova, D., Deelen, P., Vermaat, M., van Iterson, M., van Galen, M., and Arindrarto, W. van t Hof, P., Mei, H., van Dijk, F., Westra, H.-J., et al. (2015). bioRxiv, http://biorxiv.org/content/early/2015/11/30/033217

Leading Edge
Review
Ever-Changing Landscapes: Transcriptional Enhancers in Development and Evolution
Hannah K. Long,1,2,5 Sara L. Prescott,1,3,5,6 and Joanna Wysocka1,2,3,4,*
1Department of Chemical and Systems Biology
2Institute of Stem Cell Biology and Regenerative Medicine
3Department of Developmental Biology
4Howard Hughes Medical Institute
Stanford School of Medicine, Stanford University, Stanford, CA 94305, USA

5Co-first author
6Present address: Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
*Correspondence: wysocka@stanford.edu
A class of cis-regulatory elements, called enhancers, plays a central role in orchestrating spatiotemporally precise gene-expression programs during development. Consequently, divergence in enhancer sequence and activity is thought to be an important mediator of inter- and intra-species phenotypic variation. Here, we give an overview of emerging principles of enhancer function, current models of enhancer architecture, genomic substrates from which enhancers emerge during evolution, and the influence of three-dimensional genome organization on long-range gene regulation. We discuss intricate relationships between distinct elements within complex regulatory landscapes and consider their potential impact on specificity and robustness of transcriptional regulation.
Introduction
The accurate, precise, and robust regulation of gene expression during development is a cornerstone for complex biological life. Much of this information is encoded by cis-regulatory elements called enhancers, canonically defined as short (100–1,000 bp) noncoding DNA sequences that act to drive transcription independent of their relative distance, location, or orientation to their cognate promoter (for a historical perspective on the discovery of enhancers, see Schaffner, 2015). Although enhancers share many features with other classes of cis-regulatory elements, especially promoters (reviewed in Kim and Shiekhattar, 2015), it is their ability to activate transcription over long genomic distances that sets them apart. This feature allows a gene to be regulated by multiple distal enhancers with different spatiotemporal activities, facilitating enormous combinatorial complexity of gene-expression repertoires (and, in turn, a vast array of cellular states) using a relatively limited set of genes. In addition to their central role in developmental gene regulation, enhancers are fertile targets for evolutionary change, as they are both cell-type specific (allowing for modulation of target gene expression in specific tissue context without affecting other pleiotropic gene functions) and commonly exist in groups of redundant elements (facilitating accumulation of genetic variation by buffering the risk of lethality) (Levine, 2010; Wittkopp and Kalay, 2012).
For decades, most of the key insights into enhancer function and divergence have come from Drosophila genetics. However, recent years have seen a renewed interest in long-range gene regulation in many species, including mammals, fueled by the increasing availability of genome sequences, great technological progress in the fields of genomics, genome editing, and imaging, as well as the realization that disease- and trait-associated genetic variants frequently map to enhancers. Insights into genome folding have revealed that most metazoan chromosomes are partitioned into self-associating topological domains that restrict enhancer search space for target promoters. Furthermore, comparative epigenomics from both closely related and more distant species have afforded us a better glimpse into cis-regulatory evolution and constraint. Here, we discuss these new developments and give an overview of our current understanding of the general principles of enhancer function and evolution.

General Principles of Enhancer Function
Enhancers as Clusters of TF Binding Sites
Functional enhancers are composed of concentrated clusters of transcription factor (TF) recognition motifs. Generally, enhancer activation requires the binding of multiple TFs, often including both lineage-specific factors and sequence-dependent effectors of signal transduction pathways, thereby ensuring integration of intrinsic and extrinsic signaling cues at these elements (reviewed in Buecker and Wysocka, 2012) (Figure 1). One reason why the action of multiple TFs is essential is that regulatory element sequences have high intrinsic affinity for histone octamers (Tillo et al., 2010), creating a strong barrier for access of TFs to the underlying DNA. Cooperative binding of multiple factors in close proximity (i.e., less than one nucleosome length) is thought to play a major role in overcoming the energetic barrier of nucleosome eviction, thus facilitating downstream effector binding and enhancer activation (Spitz and Furlong, 2012).
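The idea that several TFs acting together can overcome the nucleosome barrier can be illustrated with a toy two-state thermodynamic calculation. The sketch below is an illustration under stated assumptions, not a model taken from the works cited here: it supposes that the region is either nucleosome-occupied or open, that TFs bind only the open state, and that all sites are independent and equivalent. The function name and the parameters `nucleosome_bias`, `tf_conc`, and `kd` are hypothetical, and the numerical values are arbitrary.

```python
def open_probability(n_sites, tf_conc, kd, nucleosome_bias):
    """Probability that a region is nucleosome-free in a toy two-state model.

    Assumptions (illustrative only):
      * the region is either nucleosome-occupied or open;
      * TFs bind only the open state, each site independently;
      * nucleosome_bias is the ratio [occupied]/[open] with no TF present,
        so a large value means a strong barrier.
    """
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    # Statistical weight of the open state summed over all TF occupancy patterns.
    open_weight = (1.0 + tf_conc / kd) ** n_sites
    # The occupied state has a single configuration with weight nucleosome_bias.
    return open_weight / (nucleosome_bias + open_weight)


if __name__ == "__main__":
    # Arbitrary illustrative values: each TF alone half-saturates its site,
    # and the nucleosome is favored 100:1 in the absence of TFs.
    for n in range(8):
        p = open_probability(n, tf_conc=1.0, kd=1.0, nucleosome_bias=100.0)
        print(f"{n} sites -> P(open) = {p:.3f}")
```

Under these assumptions a single site barely shifts the balance, whereas each additional site multiplies the weight of the open state, so the probability of eviction rises steeply with the number of clustered motifs even though the TFs never contact one another.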


Such cooperativity can rely on the direct physical association between TFs before or concurrent with DNA binding, called “direct cooperativity” (Figure 1A, left). Importantly, cooperative binding of TFs to DNA can also occur in the absence of direct protein-protein interactions through a process referred to as “indirect cooperativity” or “collaborative competition,” in which a cohort of TFs collectively competes with the same histone octamer for access to underlying DNA, as demonstrated experimentally (Miller and Widom, 2003) and predicted by in silico modeling (Mirny, 2010) (Figure 1A, middle). Thus, as a general rule, increased numbers of TF motifs at an enhancer should be positively correlated with increased nucleosomal eviction, DNA binding, and, consequently, elevated gene-expression output. Numerous studies looking both at endogenous enhancers (Burz et al., 1998) and using synthetic reporters (Erceg et al., 2014; Smith et al., 2013a) support this view but also highlight an often-complex interplay between different regulatory sequences, as discussed in more detail below. In addition, at some developmental enhancers, there is evidence for stepwise licensing by lineage-determining TFs, also known as master regulators, or pioneer factors, which directly bind nucleosomal DNA to prime enhancers for activation (Figure 1A, right). These factors can recruit chromatin remodeling activities, which then facilitate removal and post-translational modification of histones, meaning that subsequent TF and coactivator binding is less strictly dependent on direct competition between TFs and nucleosomes (Zaret and Carroll, 2011).
Motif Organization, Enhancer Grammar, and Models of Enhancer Architecture
Because of this complex interplay and hierarchical logic between DNA-binding regulators at enhancers, the underlying organization of consensus motifs can have dramatic effects on enhancer activity and robustness. The universal principles of this organization, referred to as enhancer “grammar,” are quite complex and still an area of active research. The enhancer lexicon incorporates the type, binding affinity, number, spacing, orientation, order, and local DNA shape of TF motifs, all of which can affect the functional output of any given enhancer and are subject to varying degrees of selective constraint during evolution (Figure 1B). From these principles, two main distinct models for describing enhancer architecture have emerged, with the first “enhanceosome” model requiring rigid motif organization and spacing for function, while the alternative “billboard” model allows greater flexibility of binding, with the presence of a specific set of TFs being more important than the defined order or orientation of their underlying motifs (Figure 1C).
Several examples of enhancers falling at both ends of this architectural spectrum have been described. The archetypal model of an enhanceosome is the mammalian viral-inducible interferon-β (IFNβ) enhancer (Thanos and Maniatis, 1995), which requires the cooperative binding of eight TFs to create a composite surface for DNA binding. While documented examples of enhanceosomes are rare, simulations of TF binding have suggested that this mode of enhancer regulation may be in fact relatively common in mammalian genomes (Guturu et al., 2013). On the other end of the spectrum, so-called “billboard” enhancers preserve function by maintaining transcription factor binding site (TFBS) content but with significant flexibility as to the order or spacing of those sites within the larger enhancer, likely by relying on indirect TF cooperativity for activation and also allowing for independent contribution of multiple sub-elements to gene expression (Arnosti and Kulkarni, 2005). Interestingly, these sorts of enhancers often contain suboptimal binding sites that not only tolerate rapid motif turnover across species but that also may serve an important role in promoting enhancer specificity (Crocker et al., 2015; Farley et al., 2015). For example, the neural plate-specific Otx-a enhancer in Ciona contains GATA and ETS DNA sequence motifs that are imperfect matches to the consensus sequence (Farley et al., 2015). Strikingly, optimization of motif sequence or spacing causes stronger and ectopic enhancer activity patterns, as presumably the enhancer is now responsive to lower levels of the GATA or ETS-domain TFs that recognize these motifs. Thus, combinatorial deployment of multiple suboptimal enhancers may help to promote specificity of regulation without sacrificing signal strength. In addition, often a number of TFs are localized to enhancers without underlying consensus binding sites and are instead recruited through protein-protein interactions, as proposed for the “transcription factor collective” model of enhancer organization (Figure 1C) (Junion et al., 2012).
Both large-scale comparisons of enhancer conservation across species (Taher et al., 2011), as well as synthetic enhancer reporter studies (Smith et al., 2013a), have broadly supported the more flexible organizational models whereby a collection of TF-binding events can drive reporter activity in different orders or orientations. However, reexamination of the prevalence of more rigid motif grammar may be warranted, as recent work has challenged the widely held assumption that direct cooperativity of TFs should be evident on the DNA level by the presence of a composite motif (i.e., complete motifs of both TFs with a defined spacing). In fact, SELEX analysis using consecutive affinity purification to interrogate pairwise TF-binding specificities (CAP-SELEX [Jolma et al., 2015]) has demonstrated that
Figure 1. Nucleosome Eviction, Enhancer Grammar, and Models of Enhancer Architecture

(A) TF mechanisms for overcoming the energetic barrier to nucleosomal eviction and underlying motif requirements.
(B) Parameters of motif grammar at enhancers.
(C) Models of enhancer architecture and their primary mechanism of TF binding, flexibility of motif organization, and selective constraint across evolution.
(D) Coactivator binding and chromatin signatures at active enhancers. Presence of cell-type-specific and broadly expressed TFs, RNA Polymerase II (RNAPII)
and associated enhancer RNAs (eRNAs), coactivator complexes such as Mediator (Med), nucleosome remodeling complexes (NRCs), histone acetyltransferases
(CBP/p300), and methyltransferases (MLL3/4) at tissue-specific enhancer elements is schematically depicted for a midbrain (top) or limb (bottom) specific
enhancer. Select modifications of neighboring nucleosomes associated with active enhancer states are highlighted. An overlapping set of protein complexes and
modifications is also present at promoters, with the distinction that enhancers have a high ratio of H3K4me1 to H3K4me3, while the reverse is true at promoters.
Both active enhancers and promoters are characterized by DNA hypomethylation, with the methylation status of enhancers being more dynamic and tracking
with their cell-type specificity.
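The contrast between the architectural models in Figure 1C can be made concrete with a small sketch. The code below is purely illustrative and rests on assumptions that are not in the text: motif hits are supplied as hypothetical (TF name, position) pairs, the billboard test asks only whether every required TF has at least one site, and the enhanceosome test additionally demands a fixed order and a fixed spacing within a tolerance. The TF names, coordinates, and spacing values are made up for the example.

```python
from typing import Iterable, List, Sequence, Tuple

Hit = Tuple[str, int]  # (TF name, start position of its motif in bp)


def billboard_active(hits: Iterable[Hit], required_tfs: Iterable[str]) -> bool:
    """Billboard-style rule: only motif content matters, not order or spacing."""
    present = {tf for tf, _ in hits}
    return set(required_tfs) <= present


def enhanceosome_active(hits: Sequence[Hit], ordered_tfs: Sequence[str],
                        spacing: int, tolerance: int = 0) -> bool:
    """Enhanceosome-style rule: exact motif order and near-fixed spacing."""
    by_position: List[Hit] = sorted(hits, key=lambda hit: hit[1])
    if [tf for tf, _ in by_position] != list(ordered_tfs):
        return False
    gaps = [b[1] - a[1] for a, b in zip(by_position, by_position[1:])]
    return all(abs(gap - spacing) <= tolerance for gap in gaps)


if __name__ == "__main__":
    # Hypothetical enhancers with the same motif content but different layouts.
    native = [("GATA", 10), ("ETS", 22), ("SOX", 34)]
    shuffled = [("ETS", 10), ("SOX", 40), ("GATA", 55)]
    for name, hits in (("native", native), ("shuffled", shuffled)):
        print(name,
              billboard_active(hits, {"GATA", "ETS", "SOX"}),
              enhanceosome_active(hits, ["GATA", "ETS", "SOX"], spacing=12, tolerance=1))
```

In this toy setting both layouts pass the billboard rule, whereas only the layout that preserves order and spacing passes the enhanceosome rule, which mirrors the differing tolerance of the two models to motif reshuffling.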

the majority of TF-TF interactions have a novel consensus motif, bound only weakly by each TF in isolation. Importantly, these novel long motifs detected by SELEX analysis are enriched in mammalian chromatin immunoprecipitation sequencing (ChIP-seq) TF clusters, suggesting that this sort of cooperative binding is commonplace throughout the genome. Therefore, it was likely the inability to recognize sites of direct cooperativity between TFs, rather than their true scarcity, that contributed to the notion that most mammalian enhancers are devoid of defined TF motif spacing. Indeed, evidence is beginning to emerge that spacing and orientation constraints on pairwise TF binding are more prevalent than initially anticipated (Ng et al., 2014). Thus, in reality, the architecture of most enhancers likely falls into a spectrum between the rigid motif organization of the enhanceosome and flexible organization of the billboard and TF collective, with defined spacing and orientation being required for some motifs, but not others, within a single enhancer (see “Mixed modes within one enhancer,” Figure 1C).
Implications of Enhancer Architecture on Regulatory Landscape Evolution
It is interesting to consider what these different models of enhancer architecture could mean during regulatory landscape evolution. At enhanceosomes, any binding site mutation will preclude binding of the entire complex, demanding deep conservation at the DNA-sequence level to maintain function. Such strictly conserved enhancer sequences do indeed exist and have been studied for quite some time (reviewed in Visel et al., 2007). The question remains, however, whether there is anything unique about these deeply conserved enhancers. One hypothesis is that enhanceosomes may have potential roles in mediating switch-like transcriptional activation of genes requiring strict regulatory inputs from multiple signaling pathways (Spitz and Furlong, 2012). Consistent with this theory, deeply conserved enhancers tend to fall as clusters near genes encoding developmentally important transcription factors (see Boffelli et al. [2004] and references within). Because of their developmental importance and strict structural requirements, mutations at conserved regulatory sites are also associated with human disease, as seen for the deeply conserved Shh enhancer that drives preaxial polydactyly upon disruption (Lettice et al., 2003).
Despite their association with developmental TFs and human disease, deeply conserved enhancers are an exception rather than the rule. Instead, de novo gains and losses of enhancers appear to be relatively common across evolution (discussed in more detail below), and furthermore, even when an ancestral function is conserved, these enhancers often retain no recognizable sequence similarity or have only very weak conservation signatures. For example, only 24% of murine human heart enhancers with p300 occupancy showed any detectable conservation with other vertebrates, yet these poorly conserved enhancers still demonstrated heart-specific activity in transgenic mouse embryo reporter assays regardless of their level of sequence conservation (Blow et al., 2010; May et al., 2011). Similarly, poised developmental enhancers identified in human embryonic stem cells drove cell type- and stage-specific expression when introduced to zebrafish embryos, even in the absence of detectable sequence conservation between the two species (Rada-Iglesias et al., 2011). One mechanism that could explain this conservation of ancestral function, but not of sequence, is compensatory TFBS turnover, which is tolerated at enhancers following more flexible organizational principles. A classic example is the even-skipped stripe 2 enhancer (S2E) in the Drosophila genus, which has undergone dramatic reshuffling of TFBSs across different species, yet directs the same expression pattern in transgenic reporter assays (Ludwig et al., 2000; Arnosti et al., 1996). Chimeric enhancer assays provided strong evidence for stabilizing selection at this site, as swapping the 3′ and 5′ halves of the native S2E enhancers from different species did not recapitulate the endogenous activity patterns (Ludwig et al., 2000). A more systematic enhancer discovery strategy, called STARR-seq, confirms that compensatory motif turnover in functionally conserved enhancers is common even between closely related species (Arnold et al., 2014). More generally, TFBS loss or gain without concomitant alteration of enhancer function appears to be a common event at enhancers that employ a more flexible mode of motif grammar over evolutionary timescales.
Impact of Sequences that Affect DNA Shape
In addition to the core TF motif recognition, emerging evidence supports the role of local DNA shape in influencing binding affinity, site accessibility, and binding site choice, known as “shape readout” (Slattery et al., 2014). The DNA sequence at sites flanking core TF-binding sites can thus profoundly influence binding by TFs that are sensitive to DNA shape (Gordân et al., 2013; Levo et al., 2015). Similarly, changes in protein structure that affect shape readout can dramatically affect binding preference. For example, altering amino acids involved in DNA shape recognition by the Scr protein abrogated preferential binding to a narrowed minor groove DNA structure, while introduction of these shape-recognizing residues into another Hox factor, Antp, caused a switch in its binding specificity in vitro and transcriptional targets in vivo to that of Scr (Abe et al., 2015). Thus, at least in some defined cases, local DNA shape readout influences the binding preference of TF family members. Consequently, mutations falling outside core TFBS may affect local DNA shape and contribute to modulation or loss of enhancer activity in both an evolutionary and disease context.
Role of Coactivators in Enhancer Activity
The ability of TFs to activate transcription on chromatin templates is dependent on the recruitment of coactivator proteins (reviewed in Roeder [2005] and Weake and Workman [2010]). Coactivators are defined as factors that are “required for the function of DNA-binding activators, but not for basal transcription per se, and do not show site-specific binding by themselves” (Malik and Roeder, 2010). Coactivators typically act through modifying and remodeling the chromatin context of enhancers and include histone acetyltransferases (such as p300/CBP, SAGA complex, MOF, TIP60, and others), histone methyltransferases (e.g., MLL3/4, CARM1), chromatin remodeling factors (e.g., Brg1, CHD7), and factors that promote crosstalk with the basal transcriptional machinery at promoters (e.g., Mediator complex) (Krasnov et al., 2016; Malik and Roeder, 2010; Taatjes et al., 2004). Coactivator complexes are often broadly and highly expressed and can be recruited to chromatin by a diverse range

of TFs. Consequently, coactivators tend to be associated with most active enhancers in a given cell type, regardless of the examined tissue (see Figure 1D). This property has been widely utilized for annotations of putative enhancers in a myriad of cellular contexts, where occupancy of coactivators (e.g., p300, Mediator, BRG1) or their enzymatic products (e.g., H3K27ac, catalyzed by p300/CBP or H3K4me1, catalyzed by MLL3/4) can be mapped using ChIP-seq (for reviews on enhancer chromatin signatures and epigenomic annotation strategies, see Calo and Wysocka [2013]; Sakabe and Nobrega [2013]; and Whitaker et al. [2015]).
Consistent with the idea that coactivators are the workhorses behind enhancer-mediated transcriptional activation, RNA-guided recruitment of the acetyltransferase domain of p300 fused to dCas9 is sufficient to bypass the requirement for TFs during enhancer activation in an endogenous chromosomal context (Hilton et al., 2015). Conversely, RNA-guided recruitment of the H3K4me1/2 demethylase LSD1 is sufficient to repress enhancer activity and gene transcription (Kearns et al., 2015). A comprehensive study in Drosophila used GAL4 DNA-binding domain (DBD) fusions of 338 coactivators and corepressors in enhancer complementation assays to test their capacity to bypass the requirement for key TFs by replacing their motifs with GAL4 recognition sites in 24 distinct enhancer contexts. Interestingly, most (80%) of the examined coactivators and corepressors activated or repressed, respectively, transcription in at least one enhancer context, whereas some coactivators, including p300, Mediator subunits MED15 and MED25, and Lpt1 (a fly homolog of the N-terminal part of MLL3/4), were sufficient to activate transcription in all examined contexts (Stampfel et al., 2015). While not without caveats, taken together, these results suggest that the main role of TFs is to provide a specific genomic address for coactivator complexes, which in turn enable activation of transcription by influencing the activity of RNA polymerase and facilitating establishment of a transcriptionally permissive chromatin environment.
Nonetheless, from an evolutionary perspective, alterations of regulatory sequences themselves are the major driver of change in enhancer activity, as enhancer elements are subject to high rates of turnover (Arnold et al., 2014; Villar et al., 2015) while both coactivator-TF interfaces and TF DNA-binding specificities appear to be well conserved over evolutionary timescales (Minezaki et al., 2006; Nitta et al., 2015). It is therefore important to consider the genomic substrates from which enhancers emerge over evolutionary time and how the availability of such substrates may either act to drive or limit the rates of enhancer genesis.

Genomic Substrates for Evolving New Enhancers
The birth of new enhancers can be mediated by a number of mechanisms, which will be discussed here in turn.
Regulatory Innovation following Genomic Duplication
Genomic duplication is an important event driving evolutionary change, as it generates new gene loci with the potential to evolve divergent functions and drives the formation of new regulatory programs (Taylor and Raes, 2004). Following a duplication event, often one copy of a pair of duplicated genes is simply lost due to lack of selective pressure to maintain both paralogs. However, subfunctionalization of enhancer activity (the loss of activity of a subset of enhancers leading to different expression domains of gene paralogs), distributed gene dosage (Lan and Pritchard, 2015) or rapid divergence in the expression domain of one or both of the gene paralogs by repurposing existing enhancer(s) for novel regulatory functions (Goode et al., 2011) can result in retention of both duplicated genes (Figure 2Ai–iii). For example, teleost fish experienced a relatively recent whole-genome duplication 350 million years ago, with many paralogous duplicated genes retaining an equivalent protein function but showing divergent expression domains that arose through changes in their local cis-regulatory landscape (for example, the duplicated pax6a/b in zebrafish [Kleinjan et al., 2008]). Smaller duplications of non-coding DNA can also generate copies of enhancer elements in cis to the original enhancer and promoter pair (Figure 2B). While in many cases these duplicated enhancers are lost due to negative selection or genetic drift, in some instances locally duplicated enhancers are retained, for example enhancers at the human apoE/C-I/C-I′/C-IV/C-II gene cluster (Allan et al., 1995). Finally, genomic duplication of previously non-regulatory sequences can also provide new raw material from which primordial regulatory sequences can emerge through neutral drift, as discussed below.
Repurposing of Ancestral Regulatory or Coding Sequences
With or without genomic duplication events, novel regulatory function can arise from functional ancestral DNA sequences through the repurposing of older enhancer elements (Rebeiz et al., 2011). The evolutionary barrier to the formation of a novel enhancer is potentially lower at these sites because they already contain motifs for DNA binding factors, which can act cooperatively with newly emerging sites. Recent genome-wide mapping of DNase hypersensitive sites (DHSs) (Vierstra et al., 2014) revealed that DHSs at orthologous loci between human and mouse were functionally active in distinct tissues, revealing frequent regulatory element repurposing across 90 million years of evolution. Importantly, modulation of existing enhancers can drive regulatory innovation and evolution of morphological change, such as morphogenesis of trichomes in Drosophila larvae (Frankel et al., 2011) and limb length in bats (Cretekos et al., 2008).
Interestingly, enhancer function can also arise within protein-coding sequences; for example, in mouse liver, 6% of epigenomically defined enhancers overlap exons (enhancer exons, or eExons), a number of which have been implicated in regulation of nearby liver-specific genes (Birnbaum et al., 2012). A further extensive investigation of this phenomenon in 81 cell types found that 15% of coding exons have DNase hypersensitivity footprints, which is suggestive that they are dual-use codons with both protein-coding potential and TF-binding capacity, or “duons” (Stergachis et al., 2013). However, whether duons are subject to selective constraint beyond that associated with their protein-coding potential remains controversial (Agoglia and Fraser, 2016; Xing and He, 2015).
Spontaneous Emergence of Enhancer Sequences from Non-regulatory DNA
Attempts to reverse engineer enhancers (Smith et al., 2013b) have revealed that short (i.e., 15-mer) random sequences are

Figure 2. The Birth and Death of Enhancers during Evolution
(A) Following whole-genome or local duplication events, enhancers and associated genes can become duplicated (left) and subsequently undergo (i) loss of
enhancer function and subfunctionalization of associated gene activity, (ii) reduction in enhancer activity and gene dosage sharing between alleles, or (iii)
enhancer repurposing with novel tissue or developmental stage expression for the duplicated gene.
(B) Local duplication events can lead to increased enhancer copy-number.
(C) Novel enhancer activity can emerge from ancestral DNA through genetic drift and spontaneous appearance of TFBSs and protoenhancer activity.

often sufficient to drive tissue-specific expression, suggesting (Figure 2D). As endogenous retroviruses are remnants of ancient
that enhancer activity can emerge spontaneously from neutral retroviral infections, one can speculate that the ancestral evolu-
DNA sequences purely by random genetic drift. Modeling theo- tionary history of viral infections and endogenization can and
retical rates of enhancer genesis in Drosophila concluded that it will continue to influence the future cis-regulatory evolution of a
should take 0.5–10 million years to evolve novel anterior-poste- species.
rior (A-P) patterning enhancer elements from neutral DNA DNA Methylation and Biased Gene Conversion in
sequence (Duque and Sinha, 2015). These predictions match Regulatory Evolution
well with observations from across the Drosophila genus The frequency with which TFBSs emerge by chance from ances-
whereby hundreds of species-specific enhancers have arisen tral DNA sequences and TEs is greatly influenced by local muta-
over similar timescales (Arnold et al., 2014) and provide an initial tional rate. Certain processes can act to increase or to suppress
framework for understanding emergence of new enhancers from the basal mutagenesis rate and have been implicated in rapid
the genetic void. Recent studies emphasize that indeed most evolution of sequences and emergence of novel regulatory func-
species-specific enhancers evolve de novo from previously tion. For example, the majority of vertebrate genomes are meth-
non-regulatory non-coding DNA (Figure 2C). For example, using ylated at cytosine residues in the context of CpG dinucleotides
comparative epigenomic analysis of enhancers from livers of 20 (Jones, 2012). Mutation of methylated cytosines to thymine
diverse mammalian species, Villar et al. (2015) demonstrated due to spontaneous deamination is subject to error-prone repair
that the majority (52%–77%) of species-specific enhancers (Duncan and Miller, 1980). Thus, C to T mutation is the most
arose through exaptation of ancestral DNA sequences (i.e., pre- common mutation among all base substitutions (Sved and
sent in the genomes of the ancestral species for at least Bird, 1990), favoring de novo genesis of TFBSs that contain
100 million years), with the remainder being derived primarily TpG dinucleotides. For example, there is evidence that sponta-
from transposable elements (Villar et al., 2015). neous deamination has helped to generate thousands of p53
Co-option of Transposable Elements binding sites genome-wide (Zemojtel et al., 2009).Therefore, in
A large proportion of eukaryotic genomes, including almost half addition to its role in genome silencing, DNA methylation may
the human genome, are composed of repetitive transposable play a role in regulatory evolution. A second mechanism causing
elements (TEs) (Lander et al., 2001). TEs are a rich source of reg- rapid change of local nucleotide composition is known as GC-
ulatory innovation, as they are disseminated throughout the biased gene conversion (gBGC), in which mismatch repair pro-
genome via TE expansion, carrying with them cis-regulatory cesses following meiotic recombination favor G or C alleles
sequences that exploit the host trans environment for their over A or T alleles, thus leading to increase in GC content at these
transcription (Feschotte, 2008) and have the potential to acquire regions (Galtier and Duret, 2007). gBGC might have played an
mutations that allow for their tissue-specific exaptation for host important role in recent human evolution, as around 20% of
gene regulation (Bourque et al., 2008). Indeed, evidence is sequences which are conserved across vertebrates but have
beginning to emerge that TEs play a central role in rewiring undergone rapid change in the human lineage (e.g., human
gene regulatory networks and can facilitate rapid evolution of accelerated regions [HARs]) can be explained by gBGC alone
ecologically relevant traits, such as sweet perception in primates (Kostka et al., 2012).
(Ting et al., 1992) or the diversification of tissues at the boundary Remodeling of the Chromosomal Context
between mother and fetus in eutherian mammals (Lynch et al., Novel enhancer function can arise not only through change of the
2011). enhancer sequence itself, but also through the rearrangement of
Indeed, according to some estimations, the majority of pri- the neighboring chromosomal context (Figure 2E). For example,
mate-specific regulatory sequences are derived from TEs (Jac- in the beetle Tribolium castaneum, the expression of the ladybird
ques et al., 2013). For example, Alu elements, which comprise gene is absent from the dorsal mesoderm as compared to the
10% of the human genome, contain suboptimal TFBSs and honeybee Apis mellifera and fruit fly Drosophila melanogaster
can acquire enhancer-like features with few mutations (Su and is replaced by expression of C15, a neighboring gene. This
et al., 2014), while ERV1 elements are enriched at sites of cis- switch in expression from ladybird to C15 appears to have arisen
regulatory divergence in primate neural crest cells (Prescott from a genomic inversion redirecting a conserved ladybird 30
et al., 2015), suggesting that many enhancers that changed enhancer to regulate C15 (Cande et al., 2009). Thus, despite
their activity since the separation of humans and chimpanzees minimal alteration to the enhancer itself, change in its genomic
did so through exaptation of pre-existing TEs. Importantly, location can rapidly introduce a new gene or enhancer into a
direct evidence for LTR-mediated gene regulation has been pre-existing gene-regulatory network in one step. Recently,
demonstrated for human innate immunity genes (Chuong et al., much progress has been made toward understanding how
2016). This is consistent with the emerging idea that motif-rich three-dimensional genome organization influences enhancer
LTRs of endogenous retroviruses are particularly fertile sub- function, poising the field to uncover the frequency with which
strates for evolving new enhancers, with many LTRs acquiring structural genomic variation can drive evolutionary rewiring of
inducible or cell type-specific enhancer function during evolution enhancer targets.
(D) Enhancer elements can be exapted from transposable elements. For example, following endogenization and unequal homologous recombination, the long-
terminal repeats (LTRs) of endogenous retroviral (ERV) elements can gain tissue-specific regulatory activity through accumulation of mutations and emergence
of TFBSs.
(E) Enhancer activity can be transferred to a new gene target, for example, through a genomic inversion event.
1176 Cell 167, November 17, 2016
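The route from methylated CpG dinucleotides to new TpG-containing TFBSs, described under "DNA Methylation and Biased Gene Conversion in Regulatory Evolution," can be illustrated with a toy sequence model. The sketch below is only an illustration under stated assumptions: the ancestral sequence and the 4-mer motif "CATG" are hypothetical stand-ins (not a real p53 position weight matrix), every CpG is assumed to be methylated, only one strand is modeled, and matching is by exact string identity.

```python
import re


def cpg_positions(seq: str) -> list[int]:
    """Return the 0-based index of every C that is directly followed by G."""
    return [m.start() for m in re.finditer(r"C(?=G)", seq)]


def deaminate(seq: str, position: int) -> str:
    """Model deamination of a methylated C at `position`: CpG becomes TpG."""
    if seq[position:position + 2] != "CG":
        raise ValueError("position is not the C of a CpG dinucleotide")
    return seq[:position] + "T" + seq[position + 1:]


def single_step_gains(seq: str, motif: str) -> list[int]:
    """CpG positions whose deamination adds an exact match to `motif`."""
    before = seq.count(motif)
    return [
        p for p in cpg_positions(seq)
        if deaminate(seq, p).count(motif) > before
    ]


# Hypothetical ancestral sequence and toy TpG-containing motif (assumptions).
ancestral = "TTCACGTTGGCGAA"
motif = "CATG"

print(cpg_positions(ancestral))             # [4, 10]
print(deaminate(ancestral, 4))              # TTCATGTTGGCGAA
print(single_step_gains(ancestral, motif))  # [4]
```

In this toy case only one of the two CpG sites yields the motif after a single C to T change, which mirrors the point in the text that an elevated mutation rate at CpGs biases which binding sites can appear by chance, without guaranteeing that any given site becomes functional.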

Topological Constraints on Enhancer Function and Relationships between Enhancers within Complex
Regulatory Landscapes

Topological-Associated Domains as Structural Elements of Chromatin Organization
A defining feature of enhancers is their ability to activate transcription at a distance. Indeed,
in mammalian genomes, such distances can be over a megabase long, as exemplified by the Shh limb
enhancer (Lettice et al., 2003), raising the question as to what constrains enhancer search space
such that high regulatory precision is preserved. A key insight into this question came from
studies of chromosome folding using techniques such as Hi-C and 5C, which are chromosome
conformation capture (3C)-based assays and rely on the principle that digestion and re-ligation of
chromatin can capture the three-dimensional proximity of DNA sequences in the nucleus (extensively
reviewed elsewhere, see de Wit and de Laat [2012]). These studies have revealed that the
chromosomes of many eukaryotic genomes are segmented into self-interacting domains reported as
physical domains in Drosophila (Hou et al., 2012; Sexton et al., 2012) and topological-associated
domains (TADs) in mammalian genomes (Dixon et al., 2012; Nora et al., 2012) (Figures 3A and 3B).
High-resolution mapping has estimated TAD median length at 185 kb in mammals, although importantly
a subset of domains are over a megabase in size (Rao et al., 2014). Importantly,
compartmentalization of the genome in this manner partitions the chromosome into “regulatory
neighborhoods” by limiting the activity of cis-regulatory elements to genes that fall within the
same TAD and by preventing spreading of heterochromatic marks (Dixon et al., 2012; Nora et al.,
2012) (Figures 3B and 3C, also discussed in more detail below).
Strikingly, TAD boundaries appear to be largely invariant between cell types and are conserved
across even distantly related species, suggesting that TADs represent a fixed structural unit of
chromatin organization within which tissue-specific regulatory interactions can occur and evolve
(Dixon et al., 2012; Vietri Rudan et al., 2015; Harmston et al., 2016). Such an organizational role
for TADs is supported by the demonstration of a one-to-one correspondence between TADs and bands on
polytene chromosomes from Drosophila larval salivary glands, with interbands corresponding to
highly transcribed TAD boundary regions (Eagen et al., 2015) (Figure 3D). Endoreplicated polytene
chromosomes have long served as a cytological model for understanding the relationship between
chromatin structure and function, with highly stereotyped banding patterns observed across cells
within a given cell type and only minor variations seen between tissues (Bridges, 1935;
Mavragani-Tsipidou et al., 1990). This recent observation further suggests that TADs and their
boundaries represent fundamental features of chromatin organization. Notably, the invariance of
TADs across cell states is not due to a passive maintenance from one cell division to the next, as
TAD structures are lost during mitosis to accommodate an alternative conformational state of
metaphase chromosomes (Naumova et al., 2013) (Figure 3E). This implies that with each cell
division, TADs must reproducibly refold, but the mechanisms that drive their reformation and
maintenance are still under investigation.

Figure 3. Organization of Chromatin into Topologically Associated Domains
(A) Hi-C or 5C heatmaps visualize three-dimensional interactions or compartmentalization of chromosomes into TADs, visible as triangular blocks of increased
interaction frequencies.
(B) TAD boundaries restrict the influence of regulatory elements to genes within a given TAD and limit the spread of chromatin modifications. The boundary
regions between TADs have been observed to be associated with CTCF binding sites, housekeeping genes, SINE elements, and tRNA genes.
(C) A model of chromosome folding corresponding to (B), with convergent CTCF sites and cohesin depicted at loop anchors.
(D) Correspondence between TADs and cytological bands on polytene chromosomes and TAD boundaries with decondensed interbands.
(E) During mitosis, TAD structure is lost.

Conservation of Topological Domain Structures across the Eukaryotic Tree of Life
Given the importance of TAD boundaries in demarcating the limits of gene-regulatory domains,
changes to domain architecture would represent a means for saltatory (or non-gradual) change in
gene regulation over evolutionary time, as shifting of domain boundaries would expose multiple
genes to a novel regulatory environment. However, this sort of restructuring of TADs appears to be
rare during evolution, as TADs tend to overlap with highly syntenic genomic blocks of conserved
enhancers and non-coding elements (Harmston et al., 2016). Interestingly, structural rearrangements
between mammalian genomes appear instead to occur at the boundaries between adjacent domains
(Farré et al., 2015; Vietri Rudan et al., 2015) (Figure 4Ai–ii).
In the context of high conservation of TAD boundaries in vertebrates and their ostensible role in
organizing regulatory landscapes from Drosophila to man, it is surprising that some multicellular
organisms appear to lack conventional TAD organization. For example, in C. elegans, TAD-like
structures seem to predominantly occur on the X chromosomes of XX hermaphrodites and are driven by
the dosage compensation complex (DCC), whereas autosomes are largely devoid of long-range
chromosomal interactions (Crane et al., 2015). This could perhaps be rationalized by the fact that
C. elegans has a compact genome, with most of the cis-regulatory information usually contained
within 10 kb from the TSS, thus alleviating the need for long-range gene-regulatory domains. In
plants, which appear to have long-range cell-type-specific enhancers (Zhu et al., 2015), TAD
structures are also not detectable by Hi-C, although both boundary-like elements and
self-interacting heterochromatic regions have been reported (Feng et al., 2014; Grob et al., 2014).
Interestingly, neither C. elegans nor A. thaliana encodes a CTCF homolog (Heger et al., 2012),
suggesting that these species exploit alternative mechanisms of genome organization and functional
segmentation (the role of CTCF in mammalian TAD formation is discussed below). Interestingly, the
inactive mammalian X chromosome also mostly lacks TADs and is instead divided into two megadomains
(Giorgetti et al., 2016). Therefore, in addition to being absent in a number of eukaryotic species,
TAD-like structures are not a required feature of mammalian interphase chromosome folding.
It is unclear when and how TAD-like structures first appeared during evolution. However, recent
studies from both bacteria and yeast suggest that self-interacting domains may be an ancient
feature of chromosome organization. Hi-C experiments in Caulobacter cells found evidence of
independent spatial domains called “chromosomally interacting domains” (CIDs), ranging in length
from 30 to 420 kb, which are proposed to correspond to supercoiled DNA loops, or plectonemes,
arranged in a bottle-brush-like fiber and flanked by highly transcribed genes at CID boundaries
(Le et al., 2013). In the fission yeast S. pombe, the cohesin complex has been shown to be
important for the formation of globule structures at the 50–100 kb scale (Mizuguchi et al., 2014).
Further explorations in S. cerevisiae also found CIDs, which are much shorter self-interacting
regions than those detected in other organisms, on the order of around 2–10 kb in size, and
bounded once again by highly transcribed genes (Hsieh et al., 2015). Interestingly, these
self-associating topological domains in S. cerevisiae encompass 1–5 genes, approximately the same
order of gene number as mammalian TADs, suggesting that the size of genes and intergenic spacing
may influence the size and formation of topological domains. Further interrogation of the
topological landscapes at high resolution across the eukaryotic tree of life will be necessary to
understand the evolution and functional role of topological domain architecture.

Figure 4. Topologically Associated Domains Define Discrete Units of Gene Regulation
(A) TADs appear to be highly conserved across mammals, therefore structural changes between species tend to occur at TAD boundaries, for example (i) insertion
of a new TAD or (ii) chromosomal breaks.
(B) Disruption of a TAD boundary (red box) can impact gene expression. (i) Boundary deletion fuses two adjacent TADs facilitating de novo regulation of genes
from one TAD by the enhancers in another. (ii) Boundary inversion can translocate genes or enhancers into an adjacent TAD, where they are then incorporated into
the regulatory environment of the new TAD. (iii) TAD boundary duplication can create a new TAD and potentially expose any duplicated genes to a novel regulatory
environment.
(C) Aberrant DNA methylation can abrogate CTCF binding, causing TAD boundary defects and misregulation of gene expression.
(D) Reporter constructs transposed into different regions of a TAD containing a developmental enhancer, but not into genomic regions outside the TAD, can
recapitulate the gene-expression pattern of the endogenous gene (Gene X) controlled by this enhancer.

Formation of TAD Boundaries and CTCF-Mediated Loops
Since TAD boundaries appear to restrict both enhancer function (Dowen et al., 2014; Flavahan
et al., 2016; Guo et al., 2015; Lupiáñez et al., 2015) and spreading of chromatin marks (Narendra
et al., 2015), they fulfill a canonical definition of insulator elements. Indeed, TAD boundaries
are enriched for insulator-binding proteins, such as CP190, CTCF, and BEAF-32 in Drosophila (Van
Bortle et al., 2014; Sexton et al., 2012; Phillips-Cremins et al., 2013) and are enriched for CTCF
and cohesin in mammals (Dowen et al., 2014; Sofueva et al., 2013). In fact, high-resolution Hi-C
maps in mammalian cell lines revealed that 86% of roughly 10,000 long-range contact peaks,
interpreted as anchors for chromatin loops, are associated with CTCF (Rao et al., 2014). Moreover,
CTCF sites that are engaged
in these contacts are characterized by a unique motif orientation, with convergent (inward-facing)
CTCF motifs found in >90% of loop anchors (Figure 3B) (Rao et al., 2014; Vietri Rudan et al.,
2015). Subsequent work has led to an as-yet experimentally untested “loop extrusion model” of TAD
formation whereby a chromosomal loop is randomly initiated then continuously fed through an
extrusion complex containing a cohesin ring until convergent CTCF binding sites are encountered, at
which point the loop structure is stabilized, likely via CTCF dimerization (Fudenberg et al., 2015;
Sanborn et al., 2015). Although this hypothetical model is attractive, at present it remains
unclear to what extent convergent CTCF-cohesin sites are the major driver of genome segmentation
into TADs. Interestingly, neighboring topological domains do not completely collapse or merge upon
deletion of domain boundary regions nor when cohesin is depleted in non-cycling cells (Seitan
et al., 2013; Sofueva et al., 2013; Zuin et al., 2014). This can be rationalized by inherent
interactions within a TAD mediating TAD structure organization by preventing interactions with a
neighboring TAD and thus contributing to the presence of a boundary between them (Giorgetti et al.,
2014; Ulianov et al., 2016). Taken together, current results suggest that multiple parallel
mechanisms may contribute to separating the genome into highly reproducible TAD structures.

Consequences of TAD Organization on Enhancer Function and Gene Regulation
Regardless of the specific mechanisms, genome segmentation into TADs has profound implications on
how cis-regulatory landscapes are organized and evolve by limiting the genomic search space within
which enhancers can act. As such, perturbations of TAD boundary elements, either through structural
variation (Figure 4B) or aberrant DNA methylation resulting in loss of CTCF binding (Figure 4C),
can lead to misregulation of genes by exposing them to the regulatory influence of enhancers in the
adjacent TAD, in some cases resulting in congenital malformations or cancer (Nora et al., 2012;
Gómez-Marín et al., 2015; Guo et al., 2015; Flavahan et al., 2016; Lupiáñez et al., 2015).
Importantly, genes lying within the same topological domain are often co-regulated (Nora et al.,
2012; Gómez-Marín et al., 2015), suggesting that enhancers may sample multiple, if not all,
promoters confined within a TAD. This idea is supported by mouse reporter assays, in which
transposed LacZ reporter sensors maintain a similar expression readout across a range of distances
from a known enhancer until they cross a TAD boundary (Symmons et al., 2014) (Figure 4D). However,
given that in many cases genes in the same TAD in fact exhibit distinct expression patterns, this
apparent capacity of enhancers to activate transcription across their designated topological domain
raises an issue of enhancer-promoter specificity. More quantitative analysis of genomic space
sampling by enhancers is warranted, as some genomic positions within a topological domain may be
favored targets due to preferential intra-domain chromatin folds. Moreover, additional mechanisms,
such as promoter-specific repression, complex synergistic or competitive relationships between
simultaneously active enhancers within the same TAD, or biochemical compatibility between enhancers
and promoters, may all contribute to achieving greater regulatory specificity of enhancer-promoter
interactions within a topological domain.
What is also becoming clear is that, while there are certainly well-documented examples of
enhancer-promoter loops, typical enhancer-promoter contacts are likely less stable and/or less
frequent than structural loops mediated by CTCF that are readily detectable by Hi-C methods. For
example, most enhancers and promoters active in a given cell type are not detected as peak points
(a.k.a. loop anchors) on high-resolution Hi-C maps (Rao et al., 2014). Instead, that study focused
on a subset of promoters and enhancers that coincide with stable loop anchor sites. However, the
elements in this subset are distinct from most enhancers and promoters, as they are bound by CTCF
and therefore may only represent the minority of enhancer-promoter pairs, which serve dual roles as
both structural and regulatory elements.
Intriguingly, recent live-imaging studies in Drosophila uncovered the capacity of a single enhancer
to drive synchronized transcriptional bursting from two equidistant promoters (Fukaya et al.,
2016), raising the question of whether or how chromatin topology regulates this behavior. Another
recent study supports a highly dynamic view of enhancer-promoter contacts in mammalian cells and
shows that enforcing promoter-enhancer looping can increase transcriptional burst frequency, but
does not affect burst size (Bartman et al., 2016). One should also consider that alternative
mechanisms of communication may be at play, such as the previously proposed tracking model (Zhu
et al., 2007), lateral propagation of torsional strain, indirect association in transcriptional
hotspots or “factories” (Kolovos et al., 2012), enhancer-mediated trapping of activator complexes
in a “reaction vessel,” or other yet-to-be-conceived mechanisms. With carefully measured parameters
of transcriptional bursting in relation to changes in nuclear positions of relevant regulatory
elements and intervening sequences in living cells, it may soon become possible to gain insights
into the role of chromatin topology in long-range gene regulation by enhancers.

Regulatory Relationships among Simultaneously Active Enhancers
With our greater appreciation of the complexity of genomic organization, it is becoming apparent
that cis-regulatory elements do not function in isolation. Many (if not most) developmental genes
are regulated by multiple enhancers with both overlapping and distinct spatiotemporal activities.
Indeed, key lineage genes in a given cell type tend to be associated with dense cluster(s) of
highly active enhancers, often referred to as super-enhancers (Hnisz et al., 2013; Whyte et al.,
2013). These observations raise two questions: what are the regulatory relationships among
simultaneously active enhancers? What are the consequences of multiple enhancers regulating one
gene?
The classic mode of enhancer behavior is described as autonomous, modular, and additive, a feature
that has been suggested to confer evolvability to enhancers over other functional parts of the
genome (Carroll, 2008) (Figure 5A). Many demonstrations of the modular and additive behavior of
enhancers exist, both for elements controlling the same locus across different tissues and for
simultaneously active individual elements within a super-enhancer (for example, Hay et al., 2016;
Maeda and Karch, 2011; Visel et al., 2009). This relative modularity is thought to confer highly
tissue-specific phenotypes of enhancer deletions near pleiotropically expressed genes.
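The loop extrusion model described above, in which a loop grows until convergent CTCF sites are encountered, can be made concrete with a deliberately minimal sketch. Everything here is an assumption made for illustration rather than the formulation of Fudenberg et al. (2015) or Sanborn et al. (2015): the chromosome is a one-dimensional lattice of `length` bins, the extruder loads at a single bin, both legs step outward one bin at a time, the left leg stalls only at a CTCF site whose motif points right (">"), the right leg stalls only at a site whose motif points left ("<"), and there is no unloading, stochasticity, or CTCF dimerization.

```python
def extrude(ctcf: dict[int, str], load: int, length: int) -> tuple[int, int]:
    """Return the (left, right) anchors of a loop extruded from bin `load`.

    `ctcf` maps a bin index to motif orientation: ">" points right, "<" points
    left. Each leg also stalls at the end of the lattice.
    """
    left = right = load
    left_stalled = right_stalled = False
    while not (left_stalled and right_stalled):
        if not left_stalled:
            # The left-moving leg is blocked only by a right-pointing motif.
            if left == 0 or ctcf.get(left) == ">":
                left_stalled = True
            else:
                left -= 1
        if not right_stalled:
            # The right-moving leg is blocked only by a left-pointing motif.
            if right == length - 1 or ctcf.get(right) == "<":
                right_stalled = True
            else:
                right += 1
    return left, right


# Hypothetical site layout: the "<" site at bin 20 faces away from the
# left-moving leg, so it is bypassed and the loop is anchored at bins 10 and 40.
sites = {10: ">", 20: "<", 40: "<"}
print(extrude(sites, load=30, length=60))  # (10, 40)
```

Under these assumptions the resulting loop always has inward-facing motifs at its two anchors, which is the orientation bias reported for loop anchors in the text. The sketch says nothing about whether extrusion is the actual mechanism, which the text notes was experimentally untested at the time of writing.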

Figure 5. Regulatory Crosstalk between Enhancers Operating within the Same Cis-Regulatory Landscape
(A–E) Potential modes of enhancer relationships and their effect on transcriptional output under wild-type or mutant conditions.

In other cases, however, the overall expression pattern of a given gene is different from the sum
of the individual activities of each enhancer element. While the pervasiveness of such non-additive
interactions between enhancers acting within regulatory landscapes is still unclear, some studies
suggest that such interactions may in fact be extremely common. For example, comparison of
individual enhancer activity with endogenous gene-expression patterns estimates that the activity
of up to 37% of enhancers may be modulated by their extended endogenous loci in Drosophila (Kvon
et al., 2014).
The first appreciation of this regulatory complexity came from detailed analysis of the endo16
regulatory region in sea urchins. Using a quantitative reporter and in situ hybridizations, these
studies discovered that some elements within the broader regulatory region were required for the
activity of other elements, or could linearly amplify the output of another element by a factor of
4 (Yuh and Davidson, 1996; Yuh et al., 1998). Since this early proof of principle, multiple studies
in other species, including mammals, have documented similar multiplicative or
greater-than-multiplicative effects of enhancers acting together to boost expression (for example,
Maekawa et al., 1989; Stine et al., 2011) (Figure 5B).
Importantly, enhancers acting together can drive not only quantitative but also qualitative changes
in spatiotemporal activity patterns in vivo. A number of reports using transgenes in Drosophila and
mice have demonstrated that the spatial expression patterns of multiple enhancers tested together
can be distinct from the sum of each enhancer tested individually and that this interplay is
required to recapitulate the endogenous expression pattern of their target genes (examples include
Dib et al., 2011; Dunipace et al., 2011; Prazak et al., 2010). Interestingly, one recurrent
observation from such studies is that often the effect of multiple enhancers acting together is not
to generate novel activity domains but to prevent ectopic expression outside of the proper context.
Several careful examples of this in Drosophila have been described, one being the restriction of
gap-gene-expression patterns to proper contexts by specific enhancers near hunchback and knirps
(Perry et al., 2011). Similarly, synergistic enhancers restrict aberrant activity domains at the
murine Fgf8 locus (Marinic et al., 2013). One suggested mechanism for this behavior is through the
recruitment of long-range repressors, which suppress the activity of the neighboring enhancer in
ectopic regions (Dunipace et al., 2011; Perry et al., 2011) (Figure 5C). Thus, it appears that
fine-tuning of gene activation patterns in vivo is in part mediated by repressive influences
between neighboring regulatory elements.
In some cases, cis-regulatory landscapes appear to follow a more complex hierarchical logic. A
classic example of this is seen at the Drosophila bithorax complex (BX-C), responsible for the
restricted expression of Ubx, abd-A, or Abd-B along the fly’s A-P axis. A shared 300 kb regulatory
region for the BX-C is divisible into nine parasegment-specific chromosomal domains, with each
domain shown to control the activation of one of the three BX-C homeotic genes in a pattern
appropriate for that segment (reviewed in Maeda and Karch, 2011).
Interestingly, while each domain contains several regulatory ele- and shadow enhancers at the knirps and Kruppel loci suggests
ments, generally only one or two of these enhancers is limited in that they act additively at the center of an expression domain
activity along the A-P axis to their correct parasegment, while the but can become dominantly repressive in the posterior regions,
rest drive activity in tissues along the entire axis when tested facilitating establishment of a sharp expression boundary (El-
alone. These elements with limited activity were termed ‘‘initi- Sherif and Levine, 2016). Therefore, enhancer competition can
ator’’ elements, as they were proposed to read the appropriate buffer against fluctuations in gene expression levels in the ‘‘on’’
parasegmental address and determine if the surrounding regula- state, yet impart sharp on/off boundaries at the edges of expres-
tory domain is to be altogether active or silenced (Iampietro et al., sion domains when combined with tissue-specific recruitment of
2010; Mihaly et al., 2006). Importantly, when initiator elements repressors.
are switched between domains, it results in a homeotic transfor- The ability of enhancers to cooperate or compete adds an
mation (Iampietro et al., 2010), demonstrating that the initiator is additional layer of complexity to understanding how regula-
sufficient to coordinate the various enhancers within the domain. tory landscapes evolve. Redundant ‘‘shadow’’ enhancers, for
While such a well-characterized domain-restricted system example, may facilitate accumulation of neutral mutations, which
hasn’t been discovered in mammals, some examples of hierar- may then be unmasked during times of environmental stress,
chical logic at enhancers have been described, such as a condi- conferring ‘‘evolvability’’ to the species by increasing the pheno-
tional relationship between two enhancers near the PU.1 locus in typic diversity to allow more rapid selection to a changing fitness
mouse myeloid cells where an upstream regulatory element landscape. In contrast, cooperative enhancers may experience
(URE) directly initiates the activity of a nearby enhancer (the weak negative selection to co-evolve across evolution, with po-
‘‘-12 kb enhancer’’), possibly through a chromatin-mediated tential positional constraints to remain within the same topolog-
mechanism (Leddin et al., 2011) (Figure 5D). ical domain or to retain enough space to minimize short-range
Enhancer Redundancy and Competition cross-repressive interactions (Dunipace et al., 2011). Conse-
Another mode of non-additive behavior in cis-regulatory regions quently, changes in synergistic relationships may be a source
is enhancer redundancy, with functionally redundant enhancers of evolutionary innovation that cannot be easily captured by con-
referred to as ‘‘shadow enhancers’’ (reviewed in Barolo [2012]), ventional sequence comparisons and will require more complex
as coined by Mike Levine and colleagues to describe ‘‘remote functional analyses to detect systematically, such as genetic
secondary enhancers mapping far from the target gene and mediating activities overlapping the
primary enhancer’’ (Hong et al., 2008). Importantly, under this definition, shadow enhancers are
redundant only in that they have overlapping activity patterns and are not necessarily
functionally identical. For example, despite having overlapping activity, shadow enhancers may
have different temporal dynamics or may serve to fine-tune temporal or spatial gene-expression
boundaries (see below). Interestingly, shadow enhancers may also confer robustness against
environmental or genetic variability (also known as canalization), since mutation of a shadow
enhancer can remain cryptic under normal conditions but be revealed under stress conditions
(Frankel et al., 2010).
A potential mechanism for enhancer redundancy would be a competition model, where two enhancers
compete for association with a shared promoter, buffering individual enhancer activity to
facilitate a constant transcriptional output (Figure 5E). In this model, strongly activated
enhancers would function sub-additively, while weaker enhancers would be expected to function in
a more additive manner as a consequence of lower rates of promoter competition. Live monitoring
of transcription at the hunchback and snail loci in Drosophila suggests that enhancer pairs
operate in exactly this sub-additive manner when they are strongly activated due to competition
for the target promoter (Figure 5E) (Bothma et al., 2015). This model could explain phenotypic
robustness (or canalization) in a population, as loss-of-function polymorphisms in one enhancer
would be compensated for by increased frequency of promoter contacts of the remaining enhancer,
resulting in maintenance of normal expression levels of the target gene. In addition, this
enhancer competition model could also create sharp boundaries of gene expression during
development through the recruitment of long-range repressors. For example, interrogation of the primary

perturbation screens to manipulate function of enhancers in their native chromosomal context.
Concluding Remarks and Future Perspective
As ever, progress in our mechanistic understanding of gene regulation has thrown into relief the
remaining gaps in our knowledge and opened up many new questions regarding enhancer function.
First, despite identification of a huge number of putative enhancer sequences, our understanding
of the importance of orientation, spacing, and copy-number of TFBSs for enhancer function remains
rudimentary. Efforts to engineer synthetic enhancers have made strides toward unpicking the
enhancer lexicon; however, thus far, they have had limited combinatorial power, were often
performed in an episomal context, and were designed without much appreciation for the distinct
sequence preferences of TFs in cooperative versus individual binding contexts. Given that (1) the
sampling space for potential enhancer sequences is so large, (2) multiple distinct sequences may
confer equivalent activities, and (3) the local regulatory context may alter enhancer output,
uncovering a universally predictive set of rules for enhancer grammar, activity, and specificity
in different tissues will be a major challenge for future research.
Chromosomal conformation studies (3C and derivatives) have recently provided key insights into
long-range regulation by showing that most metazoan genomes are partitioned into TADs, which
delimit boundaries of self-interacting chromatin and thus organize regulatory landscapes.
However, 3C-type approaches also have significant limitations, as they involve chromatin
crosslinking, are typically performed at the population level, and pose challenges in normalizing
the data to reflect realistic background models. As a result, not only may the extent to which
enhancers and promoters form “loops” have been overestimated, but also the kinetic
information underlying enhancer activation at a single-cell level has been lost. Orthogonal
approaches, including high-resolution imaging of fixed cells, have already provided some insight
into the dynamics of enhancer-promoter interactions (Giorgetti et al., 2014; Fabre et al., 2015;
Williamson et al., 2016). We anticipate that future live-cell imaging approaches will further
help to address how the information from enhancers is dynamically transduced to promoters to
allow for precise activation of transcription.
It has recently become clear that the majority of genetic variants associated with human complex
disease or normal range trait variation map to the non-coding parts of the genome (reviewed in
Tak and Farnham [2015]). A substantial fraction of such variation is thought to modulate
cis-regulatory element function, and a number of examples of disease-associated non-coding
mutations that ablate or change enhancer function or impact the folding of topological domains
have been identified (reviewed in Scacheri and Scacheri [2015]). Indeed, genetic variation within
the human population coupled with quantitative epigenomic approaches can be leveraged to link
sequence changes to chromatin-state divergence both locally and distally within interacting
chromosomal regions, providing an avenue for the interpretation of GWAS studies and future
investigation of mechanisms underlying disease traits (Grubert et al., 2015; Waszak et al.,
2015). Similarly, evidence has accumulated that enhancer sequence changes mediate morphological
divergence between species, and “cellular anthropology” approaches utilizing pluripotent stem
cells from great apes have enabled investigation of recent hominid cis-regulatory landscape
evolution in developmentally and evolutionarily relevant cell types (Prescott et al., 2015;
Gallego Romero et al., 2015). Further understanding of the contribution of enhancer sequence
variation to human disease susceptibility, normal range variation, and evolutionary innovation
will likely soon come from human genetics and follow-up functional studies in cells and model
organisms. A key challenge will be to understand how combinations of such regulatory variants
make us uniquely human and uniquely individual.

ACKNOWLEDGMENTS

We thank Wysocka lab members and anonymous reviewers for insightful comments on the manuscript.
We apologize to colleagues whose important primary studies we were unable to cite due to space
constraints. This work was supported by the Howard Hughes Medical Institute, NIH R01 GM112720-01
(J.W.) and the Wellcome Trust, Sir Henry Wellcome Postdoctoral Fellowship 106051/Z/14/Z (H.K.L.).

REFERENCES

Abe, N., Dror, I., Yang, L., Slattery, M., Zhou, T., Bussemaker, H.J., Rohs, R., and Mann, R.S. (2015). Deconvolving the recognition of DNA shape from sequence. Cell 161, 307–318.
Agoglia, R.M., and Fraser, H.B. (2016). Disentangling sources of selection on exonic transcriptional enhancers. Mol. Biol. Evol. 33, 585–590.
Allan, C.M., Walker, D., and Taylor, J.M. (1995). Evolutionary duplication of a hepatic control region in the human apolipoprotein E gene locus. Identification of a second region that confers high level and liver-specific expression of the human apolipoprotein E gene in transgenic mice. J. Biol. Chem. 270, 26278–26281.
Arnold, C.D., Gerlach, D., Spies, D., Matts, J.A., Sytnikova, Y.A., Pagani, M., Lau, N.C., and Stark, A. (2014). Quantitative genome-wide enhancer activity maps for five Drosophila species show functional enhancer conservation and turnover during cis-regulatory evolution. Nat. Genet. 46, 685–692.
Arnosti, D.N., and Kulkarni, M.M. (2005). Transcriptional enhancers: Intelligent enhanceosomes or flexible billboards? J. Cell. Biochem. 94, 890–898.
Arnosti, D.N., Barolo, S., Levine, M., and Small, S. (1996). The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development 122, 205–214.
Barolo, S. (2012). Shadow enhancers: frequently asked questions about distributed cis-regulatory information and enhancer redundancy. BioEssays 34, 135–141.
Bartman, C.R., Hsu, S.C., Hsiung, C.C.S., Raj, A., and Blobel, G.A. (2016). Enhancer Regulation of Transcriptional Bursting Parameters Revealed by Forced Chromatin Looping. Mol. Cell 62, 237–247.
Birnbaum, R.Y., Clowney, E.J., Agamy, O., Kim, M.J., Zhao, J., Yamanaka, T., Pappalardo, Z., Clarke, S.L., Wenger, A.M., Nguyen, L., et al. (2012). Coding exons function as tissue-specific enhancers of nearby genes. Genome Res. 22, 1059–1068.
Blow, M.J., McCulley, D.J., Li, Z., Zhang, T., Akiyama, J.A., Holt, A., Plajzer-Frick, I., Shoukry, M., Wright, C., Chen, F., et al. (2010). ChIP-Seq identification of weakly conserved heart enhancers. Nat. Genet. 42, 806–810.
Boffelli, D., Nobrega, M.A., and Rubin, E.M. (2004). Comparative genomics at the vertebrate extremes. Nat. Rev. Genet. 5, 456–465.
Bothma, J.P., Garcia, H.G., Ng, S., Perry, M.W., Gregor, T., and Levine, M. (2015). Enhancer additivity and non-additivity are determined by enhancer strength in the Drosophila embryo. eLife 4, 1–14.
Bourque, G., Leong, B., Vega, V.B., Chen, X., Lee, Y.L., Srinivasan, K.G., Chew, J., Ruan, Y., Wei, C., Ng, H.H., et al. (2008). Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res. 18, 1752–1762.
Bridges, C.B. (1935). Salivary chromosome maps with a key to the banding of the chromosomes of Drosophila melanogaster. J. Hered. 26, 60–64.
Buecker, C., and Wysocka, J. (2012). Enhancers as information integration hubs in development: lessons from genomics. Trends Genet. 28, 276–284.
Burz, D.S., Rivera-Pomar, R., Jäckle, H., and Hanes, S.D. (1998). Cooperative DNA-binding by Bicoid provides a mechanism for threshold-dependent gene activation in the Drosophila embryo. EMBO J. 17, 5998–6009.
Calo, E., and Wysocka, J. (2013). Modification of enhancer chromatin: what, how, and why? Mol. Cell 49, 825–837.
Cande, J.D., Chopra, V.S., and Levine, M. (2009). Evolving enhancer-promoter interactions within the tinman complex of the flour beetle, Tribolium castaneum. Development 136, 3153–3160.
Carroll, S.B. (2008). Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134, 25–36.
Chuong, E.B., Elde, N.C., and Feschotte, C. (2016). Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087.
Crane, E., Bian, Q., McCord, R.P., Lajoie, B.R., Wheeler, B.S., Ralston, E.J., Uzawa, S., Dekker, J., and Meyer, B.J. (2015). Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244.
Cretekos, C.J., Wang, Y., Green, E.D., Martin, J.F., Rasweiler, J.J., 4th, and Behringer, R.R. (2008). Regulatory divergence modifies limb length between mammals. Genes Dev. 22, 141–151.
Crocker, J., Abe, N., Rinaldi, L., McGregor, A.P., Frankel, N., Wang, S., Alsawadi, A., Valenti, P., Plaza, S., Payre, F., et al. (2015). Low affinity binding site clusters confer hox specificity and regulatory robustness. Cell 160, 191–203.
de Wit, E., and de Laat, W. (2012). A decade of 3C technologies: insights into nuclear organization. Genes Dev. 26, 11–24.
Dib, S., Denarier, E., Dionne, N., Beaudoin, M., Friedman, H.H., and Peterson, A.C. (2011). Regulatory modules function in a non-autonomous manner to control transcription of the mbp gene. Nucleic Acids Res. 39, 2548–2558.
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380.
Dowen, J.M., Fan, Z.P., Hnisz, D., Ren, G., Abraham, B.J., Zhang, L.N., Weintraub, A.S., Schuijers, J., Lee, T.I., Zhao, K., and Young, R.A. (2014). Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159, 374–387.
Duncan, B.K., and Miller, J.H. (1980). Mutagenic deamination of cytosine residues in DNA. Nature 287, 560–561.
Dunipace, L., Ozdemir, A., and Stathopoulos, A. (2011). Complex interactions between cis-regulatory modules in native conformation are critical for Drosophila snail expression. Development 138, 4075–4084.
Duque, T., and Sinha, S. (2015). What does it take to evolve an enhancer? A simulation-based study of factors influencing the emergence of combinatorial regulation. Genome Biol. Evol. 7, 1415–1431.
Eagen, K.P., Hartl, T.A., and Kornberg, R.D. (2015). Stable Chromosome Condensation Revealed by Chromosome Conformation Capture. Cell 163, 934–946.
El-Sherif, E., and Levine, M. (2016). Shadow enhancers mediate dynamic shifts of gap gene expression in the Drosophila embryo. Curr. Biol. 26, 1164–1169.
Erceg, J., Saunders, T.E., Girardot, C., Devos, D.P., Hufnagel, L., and Furlong, E.E. (2014). Subtle changes in motif positioning cause tissue-specific effects on robustness of an enhancer’s activity. PLoS Genet. 10, e1004060.
Fabre, P.J., Benke, A., Joye, E., Nguyen Huynh, T.H., Manley, S., and Duboule, D. (2015). Nanoscale spatial organization of the HoxD gene cluster in distinct transcriptional states. Proc. Natl. Acad. Sci. USA 112, 13964–13969.
Farley, E.K., Olson, K.M., Zhang, W., Brandt, A.J., Rokhsar, D.S., and Levine, M.S. (2015). Suboptimization of developmental enhancers. Science 350, 325–328.
Farré, M., Robinson, T.J., and Ruiz-Herrera, A. (2015). An Integrative Breakage Model of genome architecture, reshuffling and evolution: The Integrative Breakage Model of genome evolution, a novel multidisciplinary hypothesis for the study of genome plasticity. BioEssays 37, 479–488.
Feng, S., Cokus, S.J., Schubert, V., Zhai, J., Pellegrini, M., and Jacobsen, S.E. (2014). Genome-wide Hi-C analyses in wild-type and mutants reveal high-resolution chromatin interactions in Arabidopsis. Mol. Cell 55, 694–707.
Feschotte, C. (2008). Transposable elements and the evolution of regulatory networks. Nat. Rev. Genet. 9, 397–405.
Flavahan, W.A., Drier, Y., Liau, B.B., Gillespie, S.M., Venteicher, A.S., Stemmer-Rachamimov, A.O., Suvà, M.L., and Bernstein, B.E. (2016). Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature 529, 110–114.
Frankel, N., Davis, G.K., Vargas, D., Wang, S., Payre, F., and Stern, D.L. (2010). Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature 466, 490–493.
Frankel, N., Erezyilmaz, D.F., McGregor, A.P., Wang, S., Payre, F., and Stern, D.L. (2011). Morphological evolution caused by many subtle-effect substitutions in regulatory DNA. Nature 474, 598–603.
Fudenberg, G., Imakaev, M., Lu, C., Goloborodko, A., Abdennur, N., and Mirny, L.A. (2015). Formation of Chromosomal Domains by Loop Extrusion. Cell Rep. 15, 2038–2049.
Fukaya, T., Lim, B., and Levine, M. (2016). Enhancer Control of Transcriptional Bursting. Cell 166, 358–368.
Gallego Romero, I., Pavlovic, B.J., Hernando-Herraez, I., Zhou, X., Ward, M.C., Banovich, N.E., Kagan, C.L., Burnett, J.E., Huang, C.H., Mitrano, A., et al. (2015). A panel of induced pluripotent stem cells from chimpanzees: a resource for comparative functional genomics. eLife 4, e07103.
Galtier, N., and Duret, L. (2007). Adaptation or biased gene conversion? Extending the null hypothesis of molecular evolution. Trends Genet. 23, 273–277.
Giorgetti, L., Galupa, R., Nora, E.P., Piolot, T., Lam, F., Dekker, J., Tiana, G., and Heard, E. (2014). Predictive polymer modeling reveals coupled fluctuations in chromosome conformation and transcription. Cell 157, 950–963.
Giorgetti, L., Lajoie, B.R., Carter, A.C., Attia, M., Zhan, Y., Xu, J., Chen, C.J., Kaplan, N., Chang, H.Y., Heard, E., and Dekker, J. (2016). Structural organization of the inactive X chromosome in the mouse. Nature 535, 575–579.
Gómez-Marín, C., Tena, J.J., Acemel, R.D., López-Mayorga, M., Naranjo, S., de la Calle-Mustienes, E., Maeso, I., Beccari, L., Aneas, I., Vielmas, E., et al. (2015). Evolutionary comparison reveals that diverging CTCF sites are signatures of ancestral topological associating domains borders. Proc. Natl. Acad. Sci. USA 112, 7542–7547.
Goode, D.K., Callaway, H.A., Cerda, G.A., Lewis, K.E., and Elgar, G. (2011). Minor change, major difference: divergent functions of highly conserved cis-regulatory elements subsequent to whole genome duplication events. Development 138, 879–884.
Gordân, R., Shen, N., Dror, I., Zhou, T., Horton, J., Rohs, R., and Bulyk, M.L. (2013). Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep. 3, 1093–1104.
Grob, S., Schmid, M.W., and Grossniklaus, U. (2014). Hi-C analysis in Arabidopsis identifies the KNOT, a structure with similarities to the flamenco locus of Drosophila. Mol. Cell 55, 678–693.
Grubert, F., Zaugg, J.B., Kasowski, M., Ursu, O., Spacek, D.V., Martin, A.R., Greenside, P., Srivas, R., Phanstiel, D.H., Pekowska, A., et al. (2015). Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions. Cell 162, 1051–1065.
Guo, Y., Xu, Q., Canzio, D., Shou, J., Li, J., Gorkin, D.U., Jung, I., Wu, H., Zhai, Y., Tang, Y., et al. (2015). CRISPR Inversion of CTCF Sites Alters Genome Topology and Enhancer/Promoter Function. Cell 162, 900–910.
Guturu, H., Doxey, A.C., Wenger, A.M., and Bejerano, G. (2013). Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements. Philos. Trans. R. Soc. Lond. B Biol. Sci. 368, 20130029.
Harmston, N., Ing-Simmons, E., Tan, G., Perry, M., Merkenschlager, M., and Lenhard, B. (2016). Topologically associated domains are ancient features that coincide with Metazoan clusters of extreme noncoding conservation. bioRxiv, http://dx.doi.org/10.1101/042952.
Hay, D., Hughes, J.R., Babbs, C., Davies, J.O., Graham, B.J., Hanssen, L.L., Kassouf, M.T., Oudelaar, A.M., Sharpe, J.A., Suciu, M.C., et al. (2016). Genetic dissection of the α-globin super-enhancer in vivo. Nat. Genet. 48, 895–903.
Heger, P., Marin, B., Bartkuhn, M., Schierenberg, E., and Wiehe, T. (2012). The chromatin insulator CTCF and the emergence of metazoan diversity. Proc. Natl. Acad. Sci. USA 109, 17507–17512.
Hilton, I.B., D’Ippolito, A.M., Vockley, C.M., Thakore, P.I., Crawford, G.E., Reddy, T.E., and Gersbach, C.A. (2015). Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 33, 510–517.
Hnisz, D., Abraham, B.J., Lee, T.I., Lau, A., Saint-André, V., Sigova, A.A., Hoke, H.A., and Young, R.A. (2013). Super-enhancers in the control of cell identity and disease. Cell 155, 934–947.
Hong, J.-W., Hendrix, D.A., and Levine, M.S. (2008). Shadow enhancers as a source of evolutionary novelty. Science 321, 1314.
Hsieh, T.-H.S., Weiner, A., Lajoie, B., Dekker, J., Friedman, N., and Rando, O.J. (2015). Mapping Nucleosome Resolution Chromosome Folding in Yeast by Micro-C. Cell 162, 108–119.
Hou, C., Li, L., Qin, Z.S., and Corces, V.G. (2012). Gene density, transcription, and insulators contribute to the partition of the Drosophila genome into physical domains. Mol. Cell 48, 471–484.
Iampietro, C., Gummalla, M., Mutero, A., Karch, F., and Maeda, R.K. (2010). Initiator elements function to determine the activity state of BX-C enhancers. PLoS Genet. 6, e1001260.
Jacques, P.-É., Jeyakani, J., and Bourque, G. (2013). The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet. 9, e1003504.
Jolma, A., Yin, Y., Nitta, K.R., Dave, K., Popov, A., Taipale, M., Enge, M., Kivioja, T., Morgunova, E., and Taipale, J. (2015). DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 527, 384–388.
Jones, P.A. (2012). Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat. Rev. Genet. 13, 484–492.
Junion, G., Spivakov, M., Girardot, C., Braun, M., Gustafson, E.H., Birney, E., and Furlong, E.E.M. (2012). A transcription factor collective defines cardiac cell fate and reflects lineage history. Cell 148, 473–486.
Kearns, N.A., Pham, H., Tabak, B., Genga, R.M., Silverstein, N.J., Garber, M., and Maehr, R. (2015). Functional annotation of native enhancers with a Cas9-histone demethylase fusion. Nat. Methods 12, 401–403.
Kim, T.-K., and Shiekhattar, R. (2015). Architectural and Functional Commonalities between Enhancers and Promoters. Cell 162, 948–959.
Kleinjan, D.A., Bancewicz, R.M., Gautier, P., Dahm, R., Schonthaler, H.B., Damante, G., Seawright, A., Hever, A.M., Yeyati, P.L., van Heyningen, V., and Coutinho, P. (2008). Subfunctionalization of duplicated zebrafish pax6 genes by cis-regulatory divergence. PLoS Genet. 4, e29.
Kolovos, P., Knoch, T.A., Grosveld, F.G., Cook, P.R., and Papantonis, A. (2012). Enhancers and silencers: an integrated and simple model for their function. Epigenetics Chromatin 5, 1.
Kostka, D., Hubisz, M.J., Siepel, A., and Pollard, K.S. (2012). The role of GC-biased gene conversion in shaping the fastest evolving regions of the human genome. Mol. Biol. Evol. 29, 1047–1057.
Krasnov, A.N., Mazina, M.Y., Nikolenko, J.V., and Vorobyeva, N.E. (2016). On the way of revealing coactivator complexes cross-talk during transcriptional activation. Cell Biosci. 6, 15.
Kvon, E.Z., Kazmar, T., Stampfel, G., Yáñez-Cuna, J.O., Pagani, M., Schernhuber, K., Dickson, B.J., and Stark, A. (2014). Genome-scale functional characterization of Drosophila developmental enhancers in vivo. Nature 512, 91–95.
Lan, X., and Pritchard, J.K. (2015). Coregulation of tandem duplicate genes slows evolution of subfunctionalization in mammals. Science 352, 1009–1013.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Le, T.B., Imakaev, M.V., Mirny, L.A., and Laub, M.T. (2013). High-resolution mapping of the spatial organization of a bacterial chromosome. Science 342, 731–734.
Leddin, M., Perrod, C., Hoogenkamp, M., Ghani, S., Assi, S., Heinz, S., Wilson, N.K., Follows, G., Schönheit, J., Vockentanz, L., et al. (2011). Two distinct auto-regulatory loops operate at the PU.1 locus in B cells and myeloid cells. Blood 117, 2827–2838.
Lettice, L.A., Heaney, S.J.H., Purdie, L.A., Li, L., de Beer, P., Oostra, B.A., Goode, D., Elgar, G., Hill, R.E., and de Graaff, E. (2003). A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735.
Levine, M. (2010). Transcriptional enhancers in animal development and evolution. Curr. Biol. 20, R754–R763.
Levo, M., Zalckvar, E., Sharon, E., Dantas Machado, A.C., Kalma, Y., Lotam-Pompan, M., Weinberger, A., Yakhini, Z., Rohs, R., and Segal, E. (2015). Unraveling determinants of transcription factor binding outside the core binding site. Genome Res. 25, 1018–1029.
Ludwig, M.Z., Bergman, C., Patel, N.H., and Kreitman, M. (2000). Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403, 564–567.
Lupiáñez, D.G., Kraft, K., Heinrich, V., Krawitz, P., Brancati, F., Klopocki, E., Horn, D., Kayserili, H., Opitz, J.M., Laxova, R., et al. (2015). Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161, 1012–1025.
Lynch, V.J., Leclerc, R.D., May, G., and Wagner, G.P. (2011). Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nat. Genet. 43, 1154–1159.
Maeda, R.K., and Karch, F. (2011). Gene expression in time and space: additive vs hierarchical organization of cis-regulatory regions. Curr. Opin. Genet. Dev. 21, 187–193.
Maekawa, T., Imamoto, F., Merlino, G.T., Pastan, I., and Ishii, S. (1989). Cooperative function of two separate enhancers of the human epidermal growth factor receptor proto-oncogene. J. Biol. Chem. 264, 5488–5494.
Malik, S., and Roeder, R.G. (2010). The metazoan Mediator co-activator complex as an integrative hub for transcriptional regulation. Nat. Rev. Genet. 11, 761–772.
Marinić, M., Aktas, T., Ruf, S., and Spitz, F. (2013). An integrated holo-enhancer unit defines tissue and gene specificity of the Fgf8 regulatory landscape. Dev. Cell 24, 530–542.
Mavragani-Tsipidou, P., Scouras, Z.G., and Kastritsis, C.D. (1990). Comparison of the polytene chromosomes of the salivary gland, the fat body and the midgut nuclei of Drosophila auraria. Genetica 81, 99–108.
May, D., Blow, M.J., Kaplan, T., McCulley, D.J., Jensen, B.C., Akiyama, J.A., Holt, A., Plajzer-Frick, I., Shoukry, M., Wright, C., et al. (2011). Large-scale discovery of enhancers from human heart tissue. Nat. Genet. 44, 89–93.
Mihaly, J., Barges, S., Sipos, L., Maeda, R., Cléard, F., Hogga, I., Bender, W., Gyurkovics, H., and Karch, F. (2006). Dissecting the regulatory landscape of the Abd-B gene of the bithorax complex. Development 133, 2983–2993.
Miller, J.A., and Widom, J. (2003). Collaborative Competition Mechanism for Gene Activation In Vivo. Mol. Cell Biol. 23, 1623–1632.
Minezaki, Y., Homma, K., Kinjo, A.R., and Nishikawa, K. (2006). Human transcription factors contain a high fraction of intrinsically disordered regions essential for transcriptional regulation. J. Mol. Biol. 359, 1137–1149.
Mirny, L.A. (2010). Nucleosome-mediated cooperativity between transcription factors. Proc. Natl. Acad. Sci. USA 107, 22534–22539.
Mizuguchi, T., Fudenberg, G., Mehta, S., Belton, J., Taneja, N., and Folco, H.D. (2014). Cohesin-dependent globules and heterochromatin shape 3D genome architecture in S. pombe. Nature 516, 432–435.
Narendra, V., Rocha, P.P., An, D., Raviram, R., Skok, J.A., Mazzoni, E.O., and Reinberg, D. (2015). CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science 347, 1017–1021.
Naumova, N., Imakaev, M., Fudenberg, G., Zhan, Y., Lajoie, B.R., Mirny, L.A., and Dekker, J. (2013). Organization of the mitotic chromosome. Science 342, 948–953.
Ng, F.S., Schütte, J., Ruau, D., Diamanti, E., Hannah, R., Kinston, S.J., and Göttgens, B. (2014). Constrained transcription factor spacing is prevalent and important for transcriptional control of mouse blood cells. Nucleic Acids Res. 42, 13513–13524.
Nitta, K.R., Jolma, A., Yin, Y., Morgunova, E., Kivioja, T., Akhtar, J., Hens, K., Toivonen, J., Deplancke, B., Furlong, E.E.M., and Taipale, J. (2015). Conservation of transcription factor binding specificities across 600 million years of bilateria evolution. eLife 4, 1–20.
Nora, E.P., Lajoie, B.R., Schulz, E.G., Giorgetti, L., Okamoto, I., Servant, N., Piolot, T., van Berkum, N.L., Meisig, J., Sedat, J., et al. (2012). Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385.
Perry, M.W., Boettiger, A.N., and Levine, M. (2011). Multiple enhancers ensure precision of gap gene-expression patterns in the Drosophila embryo. Proc. Natl. Acad. Sci. USA 108, 13570–13575.
Phillips-Cremins, J.E., Sauria, M.E.G., Sanyal, A., Gerasimova, T.I., Lajoie, B.R., Bell, J.S.K., Ong, C.T., Hookway, T.A., Guo, C., Sun, Y., et al. (2013). Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153, 1281–1295.
Prazak, L., Fujioka, M., and Gergen, J.P. (2010). Non-additive interactions involving two distinct elements mediate sloppy-paired regulation by pair-rule transcription factors. Dev. Biol. 344, 1048–1059.
Prescott, S.L., Srinivasan, R., Marchetto, M.C., Grishina, I., Narvaiza, I., Selleri, L., Gage, F.H., Swigut, T., and Wysocka, J. (2015). Enhancer divergence and cis-regulatory evolution in the human and chimp neural crest. Cell 163, 68–83.
Rada-Iglesias, A., Bajpai, R., Swigut, T., Brugmann, S.A., Flynn, R.A., and Wysocka, J. (2011). A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283.
Rao, S.S.P., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., and Aiden, E.L. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680.
Rebeiz, M., Jikomes, N., Kassner, V.A., and Carroll, S.B. (2011). Evolutionary origin of a novel gene expression pattern through co-option of the latent activities of existing regulatory sequences. Proc. Natl. Acad. Sci. USA 108, 10036–10043.
Roeder, R.G. (2005). Transcriptional regulation and the role of diverse coactivators in animal cells. FEBS Lett. 579, 909–915.
Sakabe, N.J., and Nobrega, M.A. (2013). Beyond the ENCODE project: using genomics and epigenomics strategies to study enhancer evolution. Philos. Trans. R. Soc. Lond. B Biol. Sci. 368, 20130022.
Sanborn, A.L., Rao, S.S., Huang, S.-C., Durand, N.C., Huntley, M.H., Jewett, A.I., Bochkov, I.D., Chinnappan, D., Cutkosky, A., Li, J., et al. (2015). Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl. Acad. Sci. USA 112, E6456–E6465.
Scacheri, C.A., and Scacheri, P.C. (2015). Mutations in the noncoding genome. Curr. Opin. Pediatr. 27, 659–664.
Schaffner, W. (2015). Enhancers, enhancers - from their discovery to today’s universe of transcription enhancers. Biol. Chem. 396, 311–327.
Seitan, V.C., Faure, A.J., Zhan, Y., McCord, R.P., Lajoie, B.R., Ing-Simmons, E., Lenhard, B., Giorgetti, L., Heard, E., Fisher, A.G., et al. (2013). Cohesin-based chromatin interactions enable regulated gene expression within preexisting architectural compartments. Genome Res. 23, 2066–2077.
Sexton, T., Yaffe, E., Kenigsberg, E., Bantignies, F., Leblanc, B., Hoichman, M., Parrinello, H., Tanay, A., and Cavalli, G. (2012). Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148, 458–472.
Slattery, M., Zhou, T., Yang, L., Dantas Machado, A.C., Gordân, R., and Rohs, R. (2014). Absence of a simple code: how transcription factors read the genome. Trends Biochem. Sci. 39, 381–399.
Smith, R.P., Taher, L., Patwardhan, R.P., Kim, M.J., Inoue, F., Shendure, J., Ovcharenko, I., and Ahituv, N. (2013a). Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model. Nat. Genet. 45, 1021–1028.
Smith, R.P., Riesenfeld, S.J., Holloway, A.K., Li, Q., Murphy, K.K., Feliciano, N.M., Orecchia, L., Oksenberg, N., Pollard, K.S., and Ahituv, N. (2013b). A compact, in vivo screen of all 6-mers reveals drivers of tissue-specific expression and guides synthetic regulatory element design. Genome Biol. 14, R72.
Sofueva, S., Yaffe, E., Chan, W.-C., Georgopoulou, D., Vietri Rudan, M., Mira-Bontenbal, H., Pollard, S.M., Schroth, G.P., Tanay, A., and Hadjur, S. (2013). Cohesin-mediated interactions organize chromosomal domain architecture. EMBO J. 32, 3119–3129.
Spitz, F., and Furlong, E.E.M. (2012). Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 13, 613–626.
Stampfel, G., Kazmar, T., Frank, O., Wienerroither, S., Reiter, F., and Stark, A. (2015). Transcriptional regulators form diverse groups with context-dependent regulatory functions. Nature 528, 147–151.
Stergachis, A.B., Haugen, E., Shafer, A., Fu, W., Vernot, B., Reynolds, A., Raubitschek, A., Ziegler, S., LeProust, E.M., Akey, J.M., and Stamatoyannopoulos, J.A. (2013). Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342, 1367–1372.
Stine, Z.E., McGaughey, D.M., Bessling, S.L., Li, S., and McCallion, A.S. (2011). Steroid hormone modulation of RET through two estrogen responsive enhancers in breast cancer. Hum. Mol. Genet. 20, 3746–3756.
Su, M., Han, D., Boyd-Kirkup, J., Yu, X., and Han, J.D.J. (2014). Evolution of Alu elements toward enhancers. Cell Rep. 7, 376–385.
Sved, J., and Bird, A. (1990). The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model. Proc. Natl. Acad. Sci. USA 87, 4692–4696.
Symmons, O., Uslu, V.V., Tsujimura, T., Ruf, S., Nassari, S., Schwarzer, W., Ettwiller, L., and Spitz, F. (2014). Functional and topological characteristics of mammalian regulatory domains. Genome Res. 24, 390–400.
Taatjes, D.J., Marr, M.T., and Tjian, R. (2004). Regulatory diversity among metazoan co-activator complexes. Nat. Rev. Mol. Cell Biol. 5, 403–410.
Taher, L., McGaughey, D.M., Maragh, S., Aneas, I., Bessling, S.L., Miller, W., Nobrega, M.A., McCallion, A.S., and Ovcharenko, I. (2011). Genome-wide identification of conserved regulatory function in diverged sequences. Genome Res. 21, 1139–1149.
Tak, Y.G., and Farnham, P.J. (2015). Making sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of SNPs in non-coding regions of the human genome. Epigenetics Chromatin 8, 57.
Taylor, J.S., and Raes, J. (2004). Duplication and divergence: the evolution of new genes and old ideas. Annu. Rev. Genet. 38, 615–643.
Thanos, D., and Maniatis, T. (1995). Virus induction of human IFN beta gene expression requires the assembly of an enhanceosome. Cell 83, 1091–1100.
Tillo, D., Kaplan, N., Moore, I.K., Fondufe-Mittendorf, Y., Gossett, A.J., Field, Y., Lieb, J.D., Widom, J., Segal, E., and Hughes, T.R. (2010). High nucleosome occupancy is encoded at human regulatory sequences. PLoS One 5, e9129.
Ting, C.N., Rosenberg, M.P., Snow, C.M., Samuelson, L.C., and Meisler, M.H. (1992). Endogenous retroviral sequences are required for tissue-specific expression of a human salivary amylase gene. Genes Dev. 6, 1457–1465.
Ulianov, S.V., Khrameeva, E.E., Gavrilov, A.A., Flyamer, I.M., Kos, P., Mikhaleva, E.A., Penin, A.A., Logacheva, M.D., Imakaev, M.V., Chertovich, A., et al. (2016). Active chromatin and transcription play a key role in chromosome partitioning into topologically associating domains. Genome Res. 26, 70–84.
Van Bortle, K., Nichols, M.H., Li, L., Ong, C.-T., Takenaka, N., Qin, Z.S., and Corces, V.G. (2014). Insulator function and topological domain border strength scale with architectural protein occupancy. Genome Biol. 15, R82.
Vierstra, J., Rynes, E., Sandstrom, R., Zhang, M., Canfield, T., Hansen, R.S., Stehling-Sun, S., Sabo, P.J., Byron, R., Humbert, R., et al. (2014). Mouse regulatory DNA landscapes reveal global principles of cis-regulatory evolution. Science 346, 1007–1013.
Vietri Rudan, M., Barrington, C., Henderson, S., Ernst, C., Odom, D.T., Tanay, A., and Hadjur, S. (2015). Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 10, 1297–1309.
Villar, D., Berthelot, C., Aldridge, S., Rayner, T.F., Lukk, M., Pignatelli, M., Park, T.J., Deaville, R., Erichsen, J.T., Jasinska, A.J., et al. (2015). Enhancer evolution across 20 mammalian species. Cell 160, 554–566.
Visel, A., Bristow, J., and Pennacchio, L.A. (2007). Enhancer identification through comparative genomics. Semin. Cell Dev. Biol. 18, 140–152.
Visel, A., Akiyama, J.A., Shoukry, M., Afzal, V., Rubin, E.M., and Pennacchio, L.A. (2009). Functional autonomy of distant-acting human enhancers. Genomics 93, 509–513.
Waszak, S.M., Delaneau, O., Gschwind, A.R., Kilpinen, H., Raghav, S.K., Witwicki, R.M., Orioli, A., Wiederkehr, M., Panousis, N.I., Yurovsky, A., et al. (2015). Population Variation and Genetic Control of Modular Chromatin Architecture in Humans. Cell 162, 1039–1050.
Weake, V.M., and Workman, J.L. (2010). Inducible gene expression: diverse regulatory mechanisms. Nat. Rev. Genet. 11, 426–437.
Whitaker, J.W., Nguyen, T.T., Zhu, Y., Wildberg, A., and Wang, W. (2015). Computational schemes for the prediction and annotation of enhancers from epigenomic assays. Methods 72, 86–94.
Whyte, W.A., Orlando, D.A., Hnisz, D., Abraham, B.J., Lin, C.Y., Kagey, M.H., Rahl, P.B., Lee, T.I., and Young, R.A. (2013). Master transcription factors
1186 Cell 167, November 17, 2016

and mediator establish super-enhancers at key cell identity genes. Cell 153, Zaret, K.S., and Carroll, J.S. (2011). Pioneer transcription factors: establishing
307–319. competence for gene expression. Genes Dev. 25, 2227–2241.
Williamson, W.I., Lettice, L.A., Hill, R., and Bickmore, W.A. (2016). Shh and Zemojtel, T., Kielbasa, S.M., Arndt, P.F., Chung, H.-R., and Vingron, M. (2009).
ZRS enhancer co-localisation is specific to the zone of polarizing activity. Methylation and deamination of CpGs generate p53-binding sites on a
Development 143, 2994–3001. genomic scale. Trends Genet. 25, 63–66.
Wittkopp, P.J., and Kalay, G. (2012). Cis-regulatory elements: molecular Zhu, X., Ling, J., Zhang, L., Pi, W., Wu, M., and Tuan, D. (2007). A facilitated
mechanisms and evolutionary processes underlying divergence. Nat. Rev. tracking and transcription mechanism of long-range enhancer function.
Genet. 13, 59–69. Nucleic Acids Res. 35, 5532–5544.
Xing, K., and He, X. (2015). Reassessing the ‘‘duon’’ hypothesis of protein evo- Zhu, B., Zhang, W., Zhang, T., Liu, B., and Jiang, J. (2015). Genome-Wide Pre-
lution. Mol. Biol. Evol. 32, 1056–1062. diction and Validation of Intergenic Enhancers in Arabidopsis Using Open
Yuh, C.H., and Davidson, E.H. (1996). Modular cis-regulatory organization of Chromatin Signatures. Plant Cell 27, 2415–2426.
Endo16, a gut-specific gene of the sea urchin embryo. Development 122, Zuin, J., Dixon, J.R., van der Reijden, M.I.J., Ye, Z., Kolovos, P., Brouwer,
1069–1082. R.W., van de Corput, M.P., van de Werken, H.J., Knoch, T.A., van IJcken,
Yuh, C.H., Bolouri, H., and Davidson, E.H. (1998). Genomic cis-regulatory W.F., et al. (2014). Cohesin and CTCF differentially affect chromatin architec-
logic: experimental and computational analysis of a sea urchin gene. Science ture and gene expression in human cells. Proc. Natl. Acad. Sci. USA 111, 996–
279, 1896–1902. 1001.
Cell 167, November 17, 2016 1187

Leading Edge
Review
Insulated Neighborhoods: Structural and Functional Units of Mammalian Gene Control
Denes Hnisz,1,3,* Daniel S. Day,1,3,* and Richard A. Young1,2,*
1Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
2Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
3Co-first author
*Correspondence: hnisz@wi.mit.edu (D.H.), dsday@wi.mit.edu (D.S.D.), young@wi.mit.edu (R.A.Y.)

Understanding how transcriptional enhancers control over 20,000 protein-coding genes to maintain cell-type-specific gene expression programs in all human cells is a fundamental challenge in regulatory biology. Recent studies suggest that gene regulatory elements and their target genes generally occur within insulated neighborhoods, which are chromosomal loop structures formed by the interaction of two DNA sites bound by the CTCF protein and occupied by the cohesin complex. Here, we review evidence that insulated neighborhoods provide for specific enhancer-gene interactions, are essential for both normal gene activation and repression, form a chromosome scaffold that is largely preserved throughout development, and are perturbed by genetic and epigenetic factors in disease. Insulated neighborhoods are a powerful paradigm for gene control that provides new insights into development and disease.
Introduction
Many recent reports describe evidence that specific chromosome structures play important roles in gene control. A core principle that has emerged from these studies is that genes and their regulatory elements typically occur together within specific DNA loop structures, which we have called “insulated neighborhoods.” Here, we review evidence that insulated neighborhoods are structural and functional units of gene control, and we explain how they are used during development to control the diverse cell identities that contribute to complex animals. We explain how insulated neighborhoods form the mechanistic basis of higher-order chromosome structures, such as topologically associating domains (TADs), we discuss how genetic and epigenetic perturbations of neighborhood boundaries contribute to disease, and we outline how further study of neighborhood structure and function will lead to additional insights into development and disease. There are other excellent reviews that provide historical perspective and summarize key insights into chromosome structure (Bickmore and van Steensel, 2013; Cavalli and Misteli, 2013; de Laat and Duboule, 2013; Dekker and Heard, 2015; Dekker and Mirny, 2016; Gibcus and Dekker, 2013; Gorkin et al., 2014; Merkenschlager and Nora, 2016; Phillips and Corces, 2009; Phillips-Cremins and Corces, 2013); here, we focus on the insulated neighborhood as a model for further exploration of the principles that underpin gene control in mammalian systems.

The Enhancer-Gene-Specificity Conundrum
Cell-type-specific gene expression programs in humans are generally controlled by gene regulatory elements called enhancers (Buecker and Wysocka, 2012; Heinz et al., 2015; Levine et al., 2014; Ong and Corces, 2011; Ren and Yue, 2015). Enhancers, first described over 30 years ago (Banerji et al., 1981; Benoist and Chambon, 1981; Gruss et al., 1981), are segments of DNA that are typically a few hundred base pairs in length and are occupied by multiple transcription factors that recruit co-activators and RNA polymerase II to target genes (Bulger and Groudine, 2011; Spitz and Furlong, 2012; Tjian and Maniatis, 1994). Tens of thousands of enhancers are estimated to be active in any given human cell type (ENCODE Project Consortium, 2012; Roadmap Epigenomics et al., 2015). Enhancers and their associated factors can regulate expression of genes located far upstream or downstream by looping to the promoters of these genes, so the features that cause enhancers to regulate only specific genes, generally on their own chromosomes, have been something of a mystery for several decades (Figure 1A). This mystery, which we will call the enhancer-gene-specificity conundrum, is important to solve because the majority of disease-associated non-coding variation occurs in the vicinity of enhancers and, thus, likely impacts these enhancers’ target genes (Ernst et al., 2011; Farh et al., 2015; Hnisz et al., 2013; Maurano et al., 2012).

Some of the specificity of enhancer-gene interactions may be due to the interaction of DNA-binding transcription factors at enhancers with specific partner transcription factors at promoters (Butler and Kadonaga, 2001; Choi and Engel, 1988; Ohtsuki et al., 1998). Each cell type expresses hundreds of different transcription factors, and these bind to DNA sequences in enhancers and in promoter-proximal regions. Diverse factors bound at these two sites interact with large cofactor complexes and could, in principle, interact with one another to produce some degree of enhancer-gene specificity (Zabidi et al., 2015). It is not clear to what extent this mechanism contributes to specific enhancer-gene interactions throughout the human genome.

Figure 1. The Enhancer-Gene-Specificity Conundrum
(A) Model of a genomic region encompassing an enhancer and two genes. The features that cause an enhancer to regulate only specific genes are still not fully understood, which we refer to as the enhancer-gene-specificity conundrum.
(B) Model of a genomic region encompassing an enhancer and two genes with the transcription factor CTCF bound in between. CTCF is a component of enhancer-blocking insulators, but which CTCF-bound sites function as an insulator in vivo is still unclear.

Another potential solution to the enhancer-gene-specificity conundrum lies in insulators, which are regulatory elements that can block the ability of an enhancer to activate a gene when located between them (Chung et al., 1993; Geyer and Corces, 1992; Kellum and Schedl, 1991; Udvardy et al., 1985). Insulators are bound by the transcription factor CTCF (Bell et al., 1999), but only a minority of CTCF sites function as insulators (Liu et al., 2015). The features that distinguish the subset of CTCF sites that function as insulators are not understood, so the extent to which insulators provide a solution to the enhancer-gene-specificity conundrum has not been clear (Figure 1B).

Chromosome Structure Constrains Enhancer-Gene Interactions
The idea that chromosome structures can influence phenotypic traits is nearly as old as the chromosome theory of inheritance (Boveri, 1909), but only recently have studies of chromosome structure suggested how enhancers might be constrained to interact with specific genes (Figure 2A). In situ hybridization techniques and microscopy have revealed that individual interphase chromosomes tend to occupy small portions of the nucleus, called “chromosome territories,” rather than spreading throughout this organelle (Cremer and Cremer, 2010); interactions between chromosomes would be minimized in this manner. Furthermore, individual chromosomes are partitioned into megabase-sized TADs, regions with relatively high intradomain DNA interaction frequencies as measured by Hi-C chromosome conformation capture data (Dixon et al., 2012; Nora et al., 2012). These TADs, which have similar boundaries in all human cell types examined, have been proposed to constrain enhancer-gene interactions because most DNA contacts occur within the TADs (Dixon et al., 2012, 2015). This structuring of the genome helps explain why enhancer-gene interactions rarely occur between chromosomes and tend to be constrained within megabase-sized domains. However, TADs provide only limited insight into the molecular mechanisms that engender specific enhancer-gene interactions within them; a TAD contains, on average, about eight genes whose expression is weakly correlated.

Further understanding of the mechanisms that engender specific enhancer-gene interactions has come from genome-wide maps of the proteins that bind enhancers, promoters, and insulators, together with knowledge of the physical contacts that occur between these elements (Chepelev et al., 2012; DeMare et al., 2013; Dowen et al., 2014; Fullwood et al., 2009; Handoko et al., 2011; Phillips-Cremins et al., 2013; Tang et al., 2015). In the models that emerge from these data, each chromosome contains thousands of DNA loops, formed by the interaction of two CTCF molecules bound to different sites and reinforced by a cohesin molecule (Figure 2A). Enhancer-bound proteins are constrained such that they tend to interact only with genes within these CTCF-CTCF loops. As described below, the subset of CTCF sites that form these “loop anchors” thus function to insulate enhancers and genes within the loop from enhancers and genes outside the loop. For these and other reasons, these CTCF-CTCF DNA loops have been called “insulated neighborhoods.”

Insulated Neighborhoods
Insulated neighborhoods have been defined as chromatin loops that are formed by a CTCF-CTCF homodimer, co-bound with cohesin, and contain at least one gene (Dowen et al., 2014; Ji et al., 2016). In human embryonic stem cells (ESCs), there are 13,000 insulated neighborhoods, which range from 25 kb to 940 kb in size and contain from 1 to 10 genes (Figure 2B) (Dowen et al., 2014; Ji et al., 2016). The median insulated neighborhood is 190 kb and contains three genes. These numbers will vary depending on assumptions made for filtering genomic data, as described below, but they provide an initial description of genomic loops that is useful for further analysis. We describe below evidence that insulated neighborhood loop anchors have insulating properties, that they are largely maintained during development, and that the subset of CTCF sites that form neighborhood loop anchors are especially conserved in the human germline and in primates.

Evidence for Insulation
Three lines of evidence argue that insulated neighborhood structures have insulating boundaries. The majority of enhancer-gene interactions occur within the insulated neighborhoods (Figure 2C). Perturbation of insulated neighborhood anchor sequences leads to local gene dysregulation (Figure 2D). Somatic mutations in multiple tumor types alter insulated neighborhood anchor sequences and thereby activate oncogenes (Figure 2E). These lines of evidence, described in more detail below, indicate that the insulating function of the neighborhood loop anchors is generally necessary for normal gene activation and repression.

The vast majority of enhancer-gene interactions occur within insulated neighborhoods (Dowen et al., 2014; Hnisz et al., 2016; Ji et al., 2016; Phillips-Cremins et al., 2013). For example, in the insulated neighborhoods of human ESCs, 90% of enhancer-promoter loops are fully contained within the neighborhood boundaries (Figure 2C) (Ji et al., 2016). Similarly, in the

Figure 2. Insulated Neighborhoods
(A) Hierarchy of chromosome structures: chromosome territories, TADs, and insulated neighborhoods. Anchor refers to the CTCF-bound site interacting with
another CTCF-bound site, both co-bound by a cohesin ring.
(B) Features of insulated neighborhoods in human embryonic stem cells (ESCs). The values displayed for the size range and number of genes represent the middle
95% of the data range.
(C) Evidence for insulation of insulated neighborhoods: 90% of enhancer-gene interactions occur within insulated neighborhoods in human ESCs.
(D) Evidence for insulation of insulated neighborhoods: deletion of insulated neighborhood anchors leads to gene misregulation.
(E) Evidence for insulation of insulated neighborhoods: mutations of insulated neighborhood anchors in tumor cells lead to oncogene activation.
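The containment statistic in panel (C), and the per-neighborhood insulation score used in this section (the percentage of enhancer-promoter interactions that are fully contained within a neighborhood), can be made concrete with a short sketch. The Python below is illustrative only: the interval representation, the function names, the example coordinates, and the rule that an interaction is counted for a neighborhood when at least one of its ends overlaps that neighborhood are assumptions, not the procedure used in the cited studies.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: half-open (start, end) intervals on one chromosome.
Interval = Tuple[int, int]
# An enhancer-promoter interaction as (enhancer interval, promoter interval).
Interaction = Tuple[Interval, Interval]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def contained(inner: Interval, outer: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def insulation_score(neighborhood: Interval,
                     interactions: List[Interaction]) -> Optional[float]:
    """Percentage of counted interactions with both ends inside the neighborhood.

    Assumption: an interaction is counted if either end overlaps the
    neighborhood. Returns None when no interaction touches the neighborhood.
    """
    touching = [(e, p) for e, p in interactions
                if overlaps(e, neighborhood) or overlaps(p, neighborhood)]
    if not touching:
        return None
    inside = sum(1 for e, p in touching
                 if contained(e, neighborhood) and contained(p, neighborhood))
    return 100.0 * inside / len(touching)


# Made-up example: three contained loops, one crossing the boundary,
# and one entirely outside (ignored).
neighborhood = (100_000, 300_000)
interactions = [
    ((110_000, 111_000), (290_000, 291_000)),
    ((120_000, 121_000), (200_000, 201_000)),
    ((150_000, 151_000), (280_000, 281_000)),
    ((250_000, 251_000), (350_000, 351_000)),
    ((400_000, 401_000), (450_000, 451_000)),
]
print(insulation_score(neighborhood, interactions))  # 75.0
```

Under this reading, a score of 100% means that no counted enhancer-promoter interaction crosses the neighborhood boundary.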
insulated neighborhoods of human T cells, 90% of enhancer-promoter loops are fully contained within the neighborhood boundaries (Hnisz et al., 2016). It is also possible to estimate each neighborhood’s insulation efficacy using an “insulation score.” The insulation score of a neighborhood is calculated as the percentage of enhancer-promoter interactions that are fully contained within the neighborhood. In human ESCs, 59% of insulated neighborhoods have an insulation score of 100%.

Genetic perturbation of neighborhood anchor sequences has provided evidence for their structural and functional roles as insulators (Dowen et al., 2014; Flavahan et al., 2016; Hnisz et al., 2016; Ji et al., 2016; Narendra et al., 2015). In a dozen loci and in multiple cell types, CRISPR/Cas9 deletion of CTCF binding sites at the anchors of insulated neighborhoods has been shown to produce changes in the expression of genes within the neighborhoods and immediately adjacent to the deleted neighborhood boundary. For example, the miR-290–295 miRNA gene cluster, which plays important roles in ESC pluripotency, occurs within an insulated neighborhood together with a super-enhancer; when a CTCF loop anchor site of this neighborhood was deleted, there was a reduction in expression of the miRNA precursor and activation of an adjacent gene outside of the neighborhood concomitant with looping of the super-enhancer to this outside gene (Figure 2D). Furthermore, when genes occur within multiple nested insulated neighborhoods, deletion of multiple boundary sites was required to observe changes in gene expression (Dowen et al., 2014). Thus, insulated neighborhood boundaries constrain the activity of enhancers to genes within the neighborhood. Insulated neighborhood boundaries are also necessary to maintain repression of genes within the neighborhood; deletion of a CTCF anchor of an insulated neighborhood containing a Polycomb repressed gene led to the activation of that gene (Dowen et al., 2014).

The finding that cancer cells can activate oncogenes through somatic mutations or epigenetic modifications that disrupt insulated neighborhood boundaries provides additional evidence that neighborhood loop anchors have functional insulating properties (Figure 2E) (Flavahan et al., 2016; Hnisz et al., 2016; Katainen et al., 2015). Silent proto-oncogenes typically occur within insulated neighborhoods, and genetic modification of the neighborhood loop anchors can cause activation of these oncogenes (Flavahan et al., 2016; Hnisz et al., 2016). Somatic mutations occur frequently and recurrently in the loop anchors of oncogene-containing insulated neighborhoods in a variety of cancer cells (Figure 2E). Indeed, the CTCF DNA-binding motif in loop anchor regions is among the most-altered human-transcription-factor-binding sequences in cancer cells (Ji et al., 2016). These observations are consistent with the idea that mutations that alter the loop anchor sites of oncogene-containing insulated neighborhoods make an important contribution to the misregulation of gene expression that is inherent to the cancer state (Flavahan et al., 2016; Hnisz et al., 2016; Katainen et al., 2015).

Maintenance of Loop Anchors during Development
The majority of insulated neighborhoods that have been mapped in human ESCs appear to be maintained during development because the experimental evidence indicates that CTCF binding

Figure 3. Insulated Neighborhoods in Development
Cell-specific enhancer-gene interactions occur within insulated neighborhoods that are generally maintained in different cell types. The left side displays a linear model of a genomic region encompassing a gene associated with cell-type-specific enhancers; the right side displays the insulated neighborhood model of the locus.
and CTCF-CTCF loop structures are very similar in many other human cells (Ji et al., 2016). This constitutive behavior is consistent with the observation that CTCF is expressed in all cell types examined (Phillips and Corces, 2009). While different cell types share very similar insulated neighborhood boundaries, the enhancer-gene interactions that occur within these neighborhoods are cell-type specific because enhancer activity is cell-type specific (Figure 3) (Ji et al., 2016; Smith et al., 2016).

Evolutionary Conservation
The CTCF sites that form insulated neighborhood boundaries are evolutionarily conserved. Human germline variation is rare in CTCF binding sites at insulated neighborhood boundaries, and few GWAS variants occur in these sites (Ji et al., 2016). Analysis of CTCF-binding sites across primates indicates that the DNA sequence in anchor regions of insulated neighborhoods is far more conserved in primates than in regions bound by CTCF that do not participate in neighborhood loops (55% of CTCF binding sites in the human genome do not appear to participate in insulated neighborhood loops) (Ji et al., 2016).

A Subset of CTCF-CTCF Loops Connects Enhancers and Promoters, while Others Contribute to Recombination
Although most CTCF-CTCF loops form insulated neighborhoods (Figure 2C), a subset of CTCF-CTCF loops (19% in hESCs) occur at enhancer-promoter interaction sites. We infer that these interactions facilitate gene activation; previous studies have noted that some genes interact with their enhancers by this mechanism (Ong and Corces, 2014).

CTCF- and cohesin-associated loops also play essential roles in V(D)J recombination of the immunoglobulin heavy chain in developing lymphocytes. Recombinase-assisted rearrangements of DNA segments encoding regions of antigen-binding receptors occur during the development of cells of the adaptive immune system. CTCF-CTCF looping has been implicated in bringing these segments into spatial proximity and also in constraining the off-target effects of the recombinase (Dong et al., 2015; Hu et al., 2015). Thus, a subset of CTCF-CTCF loops have evolved to control DNA recombination. It is possible that these CTCF-CTCF loops and the enhancer-promoter CTCF-CTCF loops may also act as insulated neighborhoods.

Insulated Neighborhoods Are the Mechanistic Basis of TADs
TADs are megabase-sized domains with relatively high DNA interaction frequencies and are identified using a hidden Markov model-based analysis of Hi-C chromosome conformation capture data (Dixon et al., 2012; Nora et al., 2012). Two observations argue that TADs are generally composed of, and likely structured by, insulated neighborhoods.

Cohesin ChIA-PET data were used to identify insulated neighborhoods and the enhancer-promoter interactions that occur within them because cohesin occupies both CTCF-CTCF insulators and enhancer-promoter interaction sites (Kagey et al., 2010). These ChIA-PET DNA interaction data are biased: they are enriched for interaction sites where cohesin is present. Hi-C interaction data do not have this bias: they identify interactions that should be independent of the functions of any one protein. Nonetheless, the TADs identified by processing Hi-C data with the hidden Markov model (Dixon et al., 2012; Nora et al., 2012) can also be identified when this algorithm is used to process cohesin ChIA-PET data (Ji et al., 2016). Murine and human ESC ChIA-PET data, processed with the same hidden Markov model, capture most TAD boundaries derived from Hi-C data in murine and human ESCs (Figure 4A). These results suggest that insulated neighborhoods are a major structuring component of TADs.

ChIA-PET data revealed that ≥50% of TADs have TAD-spanning CTCF-CTCF loops (Ji et al., 2016) and are thus insulated neighborhoods. Because the existing ChIA-PET data are not saturating, this is a minimal estimate; it is possible that the majority of TADs have TAD-spanning CTCF-CTCF loops. Some TADs appear to be a single insulated neighborhood, while others consist of multiple nested or multiple independent insulated neighborhoods (Figures 4B–4D). TADs were originally discovered using Hi-C data that had 40 kb resolution, and improvements of the experimental and analytical aspects of Hi-C methods revealed that many TADs are composed of smaller TAD-like domains at higher resolution (Schmitt et al., 2016). It is thus possible that all high-resolution TADs are insulated neighborhoods and vice versa, and so the CTCF-CTCF loops that encompass insulated neighborhoods, together with the enhancer-promoter loops within them, likely form the mechanistic basis of most interphase chromosome structures.

Relationships between Insulated Neighborhoods and Other DNA Loop Models
Mammalian chromosome loop structures have been reported in multiple studies, which have used different descriptors for these

Figure 4. Insulated Neighborhoods Are the Mechanistic Basis of TADs
(A) Hi-C and cohesin ChIA-PET identify similar TADs. Bars indicate the TADs identified using Hi-C and ChIA-PET in human ESCs at the genomic region whose coordinates are indicated at the bottom.
(B) Model of a TAD that consists of an insulated neighborhood.
(C) Model of a TAD that consists of nested insulated neighborhoods.
(D) Model of a TAD that consists of two insulated neighborhoods, nested within a TAD-spanning CTCF-CTCF loop.
loops, including sub-TADs, loop domains, and CTCF-contact domains (Phillips-Cremins et al., 2013; Rao et al., 2014; Tang et al., 2015). An analysis of the structures described in these studies suggests that they generally represent the same structural unit as the insulated neighborhoods described here (Figure 5).

Pioneering studies using 5C technology first described TAD subtopologies, termed “sub-TADs,” together with the structuring proteins CTCF and cohesin at seven genomic loci in murine ESCs (Phillips-Cremins et al., 2013). Sub-TADs were found to be constitutively present in multiple cell types, and cell-type-specific enhancer-promoter contacts occurred within sub-TAD boundaries (Phillips-Cremins et al., 2013). Examination of several of the sub-TAD loop structures (e.g., at the Nanog and Olig1-2 loci) reveals that they are among the insulated neighborhoods described for murine ESCs (Dowen et al., 2014; Phillips-Cremins et al., 2013). Although these early studies of sub-TAD structures did not test the insulating properties of the sub-TADs, this may be one of the earliest descriptions of what we now term insulated neighborhoods. Another study used high-resolution Hi-C technology to identify 5,000 chromatin loops whose boundaries are occupied by CTCF and cohesin in multiple human cell types; these were termed “loop domains” (Rao et al., 2014). This and other studies have noted that the DNA binding motif of CTCF is asymmetric and thus directional, and the CTCF anchors of >90% of the loop domains occur in a convergent orientation (de Wit et al., 2015; Gómez-Marín et al., 2015; Guo et al., 2015; Ji et al., 2016; Vietri Rudan et al., 2015). The convergent orientation of CTCF motifs in the anchors is also a general feature of insulated neighborhoods (Ji et al., 2016). A recent study mapped CTCF-associated contacts genome wide using CTCF ChIA-PET and revealed 2,000 “CTCF-contact domains (CCDs)” in human cells (Tang et al., 2015). CCDs are clusters of CTCF-associated chromatin loops that appear separated from other CTCF-associated loops. The vast majority of RNA-polymerase-II-associated interactions (e.g., enhancer-promoter loops) were found to occur within the boundaries of CTCF-contact domains. The anchor sites of CTCF contact domains are bound by CTCF and cohesin, and the CTCF anchors of 90% of the CTCF contact domains occur in a convergent orientation (Tang et al., 2015). These features suggest that CTCF contact domains are either insulated neighborhoods or clusters of insulated neighborhoods.

Comparison of insulated neighborhoods with loop domains and CTCF contact domains in the same cell type suggests extensive overlap between these structures. For example, 70% of loop domains and 54% of CTCF contact domains have the same boundaries as an insulated neighborhood in human lymphoblastoid cells (Rao et al., 2014; Tang et al., 2015). Differences in experimental and analytical methods can explain many of the differences in loop structures reported by various studies; indeed, similarities among loop structures are more evident when data are analyzed with increasing stringency (Figure 5).
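The convergent-orientation rule noted above for loop domains, CTCF contact domains, and insulated neighborhoods can be expressed as a simple check on motif strands. In this sketch, a loop is called convergent when the CTCF motif at the upstream (left) anchor lies on the forward strand and the motif at the downstream (right) anchor lies on the reverse strand, so that the two motifs point toward each other. The `Loop` record, the strand encoding, and the example loops are hypothetical and serve only to illustrate the classification.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Loop:
    """Hypothetical CTCF-CTCF loop; assumes left_anchor < right_anchor."""
    left_anchor: int    # coordinate of the upstream anchor motif
    right_anchor: int   # coordinate of the downstream anchor motif
    left_strand: str    # '+' (forward) or '-' (reverse)
    right_strand: str   # '+' (forward) or '-' (reverse)


# Map the (left, right) strand pair to an orientation class.
_ORIENTATION = {
    "+-": "convergent",  # motifs point toward each other
    "-+": "divergent",   # motifs point away from each other
    "++": "tandem",
    "--": "tandem",
}


def orientation(loop: Loop) -> str:
    """Classify a loop by the relative orientation of its anchor motifs."""
    return _ORIENTATION[loop.left_strand + loop.right_strand]


def percent_convergent(loops: Iterable[Loop]) -> float:
    """Percentage of loops whose anchors are in convergent orientation."""
    loops = list(loops)
    if not loops:
        raise ValueError("at least one loop is required")
    n = sum(1 for loop in loops if orientation(loop) == "convergent")
    return 100.0 * n / len(loops)


# Made-up example: three convergent loops and one tandem loop.
example_loops = [
    Loop(100_000, 200_000, "+", "-"),
    Loop(300_000, 500_000, "+", "-"),
    Loop(600_000, 900_000, "+", "-"),
    Loop(1_000_000, 1_200_000, "+", "+"),
]
print(percent_convergent(example_loops))  # 75.0
```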

Figure 5. Relationships between Insulated Neighborhoods and Other DNA Loop Models
DNA loops at the EYA1 genomic locus generated
using three different types of chromatin contact
data in lymphoblastoid cells. Displayed are the
cohesin ChIA-PET interactions (Heidari et al.,
2014) used to identify insulated neighborhoods,
Hi-C data (Rao et al., 2014) used to identify ‘‘Loop
Domains,’’ and CTCF ChIA-PET data (Tang et al.,
2015) used to identify ‘‘CTCF contact domains.’’
Increased stringency filtering of the CTCF ChIA-
PET data reveals a chromosome structure similar
to the insulated neighborhoods and loop domain.
The coordinates of the genomic region are dis-
played at the bottom.
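Comparisons such as the one in this figure come down to asking how many domains called from one type of contact data share both boundaries with a domain called from another. A minimal sketch of that comparison follows. The boundary tolerance, the function names, and the coordinates are hypothetical; they are not taken from the cited studies, which may define matching boundaries differently.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) of a domain on one chromosome.
Domain = Tuple[int, int]


def same_boundaries(a: Domain, b: Domain, tolerance: int) -> bool:
    """True if both boundaries of `a` lie within `tolerance` bp of those of `b`."""
    return abs(a[0] - b[0]) <= tolerance and abs(a[1] - b[1]) <= tolerance


def percent_shared(query: List[Domain], reference: List[Domain],
                   tolerance: int = 10_000) -> float:
    """Percentage of query domains that share boundaries with any reference domain."""
    if not query:
        raise ValueError("at least one query domain is required")
    shared = sum(
        1 for q in query
        if any(same_boundaries(q, r, tolerance) for r in reference)
    )
    return 100.0 * shared / len(query)


# Made-up coordinates on a single chromosome.
loop_domains = [
    (100_000, 300_000),
    (500_000, 900_000),
    (1_200_000, 1_500_000),
    (2_000_000, 2_400_000),
]
neighborhoods = [
    (105_000, 298_000),
    (500_000, 905_000),
    (1_200_000, 1_350_000),
]
print(percent_shared(loop_domains, neighborhoods))  # 50.0
```

Tightening or loosening `tolerance` changes the reported agreement, which is one reason overlap percentages depend on the analytical choices made in each study.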
Insulated Neighborhoods, Gene Regulation, and Disease
Gene Regulation
Studies of gene control at imprinted loci were among the first to reveal the importance of CTCF loops in gene control and the role of DNA methylation in control of CTCF-associated loops. Parent-of-origin specific gene activity at the imprinted IGF2/H19 locus is controlled by allele-specific CTCF-CTCF interactions that constrain enhancer-gene contacts in a DNA-methylation-dependent manner (Figure 6A) (Kurukuti et al., 2006; Murrell et al., 2004). An insulated neighborhood on the maternal allele allows an enhancer-promoter interaction that activates the H19 gene, but not the IGF2 gene, which is excluded from the neighborhood. A larger insulated neighborhood is formed on the paternal allele to allow an enhancer-promoter interaction that activates the IGF2 gene. Paternal allele-specific DNA methylation of a CTCF site in the H19 promoter region abrogates CTCF binding, thus causing differential CTCF-CTCF loop formation while silencing H19 expression (Bell and Felsenfeld, 2000; Hark et al., 2000; Kanduri et al., 2000; Szabó et al., 2000). Individuals who lose these allele-specific insulated neighborhoods develop Beckwith-Wiedemann syndrome (when both alleles have the paternal type of insulated neighborhood; Figure 6B) or Silver-Russell syndrome (when both alleles have the maternal type of insulated neighborhood; Figure 6C) (Nativio et al., 2011).

Early studies of gene control at the beta-globin locus also demonstrated the importance of CTCF and its looping interactions in developmental control (Hou et al., 2008; Splinter et al., 2006; Tolhuis et al., 2002). In vertebrates, the beta-globin locus contains a cluster of fetal and adult globin genes, and the developmental control of these genes is exerted by an upstream regulatory element called the locus control region (LCR) (Figure 6D). In erythroid cells expressing globin genes, a large CTCF-CTCF loop encompasses the beta-globin genes and the locus control region (LCR), consistent with the organization of the locus in an insulated neighborhood. In fetal brain cells, CTCF binding to one of the beta-globin loop anchor regions is absent, and CTCF-CTCF loop formation is not detected (Splinter et al., 2006; Tolhuis et al., 2002), which suggests that tissue-specific CTCF-CTCF loops participate in developmental gene control.

Recent studies further support the view that the loop anchors of insulated neighborhoods play key roles in gene control. As described above, the vast majority of enhancer-promoter interactions occur within insulated neighborhoods in embryonic stem cells (ESCs), and genetic perturbation of insulated neighborhood anchors leads to misregulation of local genes (Dowen et al., 2014; Ji et al., 2016). Positional information in the developing embryo depends on the precise expression of Homeobox (Hox) genes, and CTCF sites located within a Hox gene cluster play a critical role in proper expression of Hox genes (Narendra et al., 2015); some of these critical CTCF sites form insulated neighborhood anchors in ESCs. Hereditary mutations that invert or delete a TAD boundary at the EPHA4 locus have recently been linked to limb malformations in humans (Lupiáñez et al., 2015), and this TAD has a TAD-spanning CTCF-CTCF loop in ESCs, indicating that it is an insulated neighborhood. Inversion of a CTCF anchor has also been shown to cause altered enhancer-promoter contacts at the protocadherin locus (Guo et al., 2015). The effect of insulated neighborhoods on signal-responsive gene expression also supports the concept of insulation; gene activation by NOTCH signaling in T cells was found to be restricted to genes that occur within the same CTCF-CTCF loops as NOTCH-dependent enhancers (Wang et al., 2014).

Altered Neighborhoods in Cancer
Recent studies have revealed that mutations that alter the loop anchor sites of oncogene-containing insulated neighborhoods make an important contribution to the misregulation of gene expression that is inherent to the cancer state (Flavahan et al., 2016; Hnisz et al., 2016; Katainen et al., 2015). Somatic mutations occur frequently and recurrently in loop anchors of oncogene-containing insulated neighborhoods in a variety of cancer cells, and the CTCF DNA-binding motif in loop anchor regions is among the most altered human-transcription-factor-binding sequences in cancer cells (Flavahan et al., 2016; Hnisz et al., 2016; Katainen et al., 2015). DNA hypermethylation occurs in some cancer cells, and tumor-specific DNA methylation has recently been implicated in the disruption of CTCF binding,

alteration of chromosome structure, and dysregulation of oncogene expression (Flavahan et al., 2016). Furthermore, chromosomal rearrangements such as translocations or deletions, which activate oncogenes, also disrupt insulated neighborhoods around those genes without altering the sequence of the gene itself (Gröschel et al., 2014; Hnisz et al., 2016). Cancer genome sequencing has revealed that somatic mutations occur in CTCF and cohesin coding sequences in various solid tumors and leukemias (Lawrence et al., 2014), and it seems likely that these mutations contribute to oncogenesis by altering insulated neighborhoods.

Figure 6. Insulated Neighborhoods at the IGF2/H19 and β-globin Loci
(A) Insulated neighborhood model at the maternal and paternal alleles of the imprinted IGF2/H19 locus. On the maternal allele, CTCF binding at the imprint control region upstream of the H19 gene creates an insulated neighborhood around H19 and an enhancer, which prevents the enhancer from activating the IGF2 gene. On the paternal allele, the imprint control region is methylated, which leads to repression of the H19 gene and prevention of CTCF binding. On this allele, a large insulated neighborhood is formed, allowing the downstream enhancer to activate the IGF2 gene. Black lollipops indicate DNA methylation. The insulated neighborhood models are displayed on the right.
(B) Methylation at the imprint control region upstream of H19 and the presence of the large insulated neighborhood on the maternal IGF2/H19 allele occur in patients with Beckwith-Wiedemann syndrome.
(C) Lack of methylation at the imprint control region upstream of H19 and the presence of the small insulated neighborhood on the paternal IGF2/H19 allele occur in patients with Silver-Russell syndrome.
(D) Insulated neighborhood model at the β-globin locus containing a cluster of globin genes and an upstream LCR.

Disease-Associated Variation in Loop Anchors
Genetic variants occur rarely in insulated neighborhood anchors. However, allelic non-coding variants in CTCF loop anchors have been shown to correlate with allele-specific enhancer-promoter interactions (Tang et al., 2015). Among these, one variant, associated with asthma, disrupts CTCF binding and CTCF loop formation (Tang et al., 2015). A recent human population genetics study showed that several genetic variants linked with an individual’s lipid profile (e.g., LDL, HDL) and present in at least 1% of the population lie within CTCF binding sites (UK10K Consortium et al., 2015), and it is possible that these variants disrupt CTCF binding at insulated neighborhood boundaries. With the new knowledge of CTCF loop anchors in human cells, geneticists will likely identify additional genetic variants that contribute to non-cancer disease through disruption of insulators.

Target Genes of Disease-Associated Enhancer Variation
Insulated neighborhood models provide a new approach to identify the target genes of disease-associated enhancer variation. Tens of thousands of non-coding genetic variants have been linked with various human diseases and traits in genome-wide association studies (GWASs), and the majority of these variants

occur in enhancers (Ernst et al., 2011; Farh et al., 2015; Hnisz et al., 2013; Maurano et al., 2012). The identification of the target genes of these variants is challenging because proximity-based assignment has proven, in some cases, to be inaccurate. Mapping interactions between enhancers and promoters in disease-relevant cells improves the accuracy of the assignment (Grubert et al., 2015; McGeachie et al., 2016; Pomerantz et al., 2009), but this is not always feasible. Because insulated neighborhoods tend to be shared by different cell types, existing maps of insulated neighborhoods should allow investigators to develop a hypothesis regarding the potential target genes of enhancer-associated variation (Figure 7A). For example, a recent study revealed that a genetic variant associated with obesity and previously assigned to the FTO gene in fact has no impact on FTO but affects the IRX3 and IRX5 genes (Claussnitzer et al., 2016). Although the variant is located in an intronic enhancer within FTO, both IRX3 and IRX5 are located in the same insulated neighborhood as the variant (Figure 7B). Similarly, functional investigation of a genetic variant associated with type 2 diabetes, and previously assigned to the CDC123 and CAMK1D genes based on proximity, revealed that the variant affects the distal CAMK1D gene and not CDC123 (Fogarty et al., 2014; GTEx Consortium, 2015). Examination of insulated neighborhood structures reveals that CAMK1D is located in the same neighborhood as the variant, whereas CDC123 is not (Figure 7C). These examples suggest that insulated neighborhood maps can facilitate the identification of genes affected by non-coding genetic variants.

Figure 7. Insulated Neighborhoods as a Method to Identify Target Genes of Disease-Associated Enhancer Variation
(A) Top: assignment of an enhancer-associated single-nucleotide polymorphism (SNP) to a gene based on linear proximity. Bottom: assignment of a SNP to a gene based on the insulated neighborhood model.
(B) Model of the insulated neighborhood organization at the FTO-IRX3-IRX5 locus.
(C) Model of the insulated neighborhood organization at the CDC123-CAMK1D locus.

Epigenetic Editing of Insulated Neighborhood Structures
The CTCF binding site in insulated neighborhood loop anchors is hypomethylated (Ji et al., 2016); DNA methylation abrogates CTCF DNA binding (Bell and Felsenfeld, 2000; Hark et al., 2000; Kanduri et al., 2000; Szabó et al., 2000). This suggests that site-specific methylation and demethylation of a neighborhood anchor can alter neighborhood structures. Indeed, targeted methylation of a neighborhood anchor site with a dCas9-DNA-methyltransferase-3 fusion protein has been shown to disrupt the neighborhood (Figure 8A) (Liu et al., 2016). Similarly, targeted demethylation with a dCas9-TET fusion protein has been demonstrated (Amabile et al., 2016; Liu et al., 2016), and this strategy could be used to restore an insulated neighborhood whose anchor site is disrupted by aberrant DNA methylation (Figure 8B). These tools might evolve to be useful for therapeutic purposes.

Figure 8. Neighborhood Perturbation and Repair through Site-Specific DNA Methylation
(A) Targeting a dCas9-DNA-methyltransferase 3a/3l (Dnmt3a/3l) fusion to an insulated neighborhood anchor leads to DNA methylation, abrogation of CTCF binding, and loss of neighborhood integrity. Black lollipops indicate DNA methylation.
(B) Targeting a dCas9-TET (Ten-eleven translocation) fusion to an aberrantly methylated insulated neighborhood anchor leads to DNA demethylation and restoration of neighborhood integrity.
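The contrast between proximity-based and neighborhood-based assignment of a variant to candidate target genes (Figure 7A) can be illustrated with a minimal sketch. It is not the procedure used in the studies cited above. The `Region` class, the function names, the rule of taking the smallest neighborhood that contains the variant, and all coordinates and gene names are hypothetical and chosen only for illustration; real analyses would start from published neighborhood maps and gene annotations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """A named genomic interval (hypothetical toy representation)."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def distance_to(region: Region, pos: int) -> int:
    """Linear distance from a position to a region (0 if the position is inside)."""
    if pos < region.start:
        return region.start - pos
    if pos >= region.end:
        return pos - region.end + 1
    return 0


def nearest_gene(chrom: str, pos: int, genes: List[Region]) -> Optional[Region]:
    """Proximity-based assignment: the gene closest to the variant on the linear genome."""
    candidates = [g for g in genes if g.chrom == chrom]
    return min(candidates, key=lambda g: distance_to(g, pos), default=None)


def genes_in_same_neighborhood(
    chrom: str, pos: int, neighborhoods: List[Region], genes: List[Region]
) -> List[Region]:
    """Neighborhood-based assignment: genes lying entirely inside the smallest
    neighborhood that contains the variant (an assumption made for this sketch)."""
    containing = [
        n for n in neighborhoods
        if n.chrom == chrom and n.start <= pos < n.end
    ]
    if not containing:
        return []
    hood = min(containing, key=lambda n: n.end - n.start)
    return [
        g for g in genes
        if g.chrom == chrom and hood.start <= g.start and g.end <= hood.end
    ]


# Toy data: two adjacent neighborhoods and one gene in each (all values invented).
neighborhoods = [
    Region("hood_1", "chr1", 100, 500),
    Region("hood_2", "chr1", 500, 900),
]
genes = [
    Region("GENE_A", "chr1", 150, 200),
    Region("GENE_B", "chr1", 520, 560),
]

# The variant sits near the right edge of hood_1, close to GENE_B on the linear genome.
by_proximity = nearest_gene("chr1", 490, genes)
by_neighborhood = genes_in_same_neighborhood("chr1", 490, neighborhoods, genes)

print("nearest gene:", by_proximity.name)
print("genes in same neighborhood:", [g.name for g in by_neighborhood])
```

In this toy layout the variant is linearly closest to GENE_B but shares a neighborhood only with GENE_A, so the two rules nominate different candidates. Such a list is a hypothesis for follow-up, not evidence that the variant affects a given gene.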

Challenges
How Dynamic and Heterogeneous Are Insulated Neighborhood Loops?
The example of CTCF-CTCF loop and gene control at the imprinted IGF2/H19 locus suggests that the neighborhoods are sufficiently stable to prevent development of the diseases associated with neighborhood dysregulation. Furthermore, the striking similarity of TAD boundaries across cell types (Dixon et al., 2012, 2015), which we argue is produced largely by insulated neighborhood structures (Figure 4A), suggests that these neighborhoods are rather stable. However, the dynamics of the loop structures that form insulated neighborhoods, and the enhancer-promoter interactions within them, are not yet understood. Similarly, the cell-to-cell heterogeneity of DNA loop structures is not understood, and the extent to which allele-specific loops occur is not clear. The fraction of time that CTCF is bound to a loop anchor site, the fraction of time that it spends in a dimerized state, and the extent to which CTCF switches its dimeric partner are three of the elements that factor into a potential solution to these questions. The experimental approaches used thus far to identify CTCF loops generally depend on the study of populations of cells, and thus, the present data are inadequate to address questions of dynamics. Improvements in single-cell technologies will be needed to reveal the dynamics of insulated neighborhoods and enhancer-promoter interactions.

Computational simulations of chromosome loops have led to the suggestion that only a subset of insulated neighborhoods occur in each cell within a population at any given time and have led to the hypothesis that an extrusion model can facilitate enhancer-promoter interactions and lead to insulation in all cells (Doyle et al., 2014; Fudenberg et al., 2016; Giorgetti et al., 2014; Sanborn et al., 2015). This model postulates that chromosome loops are formed by the extrusion of chromatin by an “extrusion complex,” an entity that acts as a molecular motor to draw DNA through a cohesin complex (reviewed in Dekker and Mirny, 2016). In this model, the extruded DNA loop forms a CTCF-CTCF loop when the cohesin-containing extrusion complex is blocked by a pair of convergently oriented CTCF molecules bound to two sites in the extruded DNA. Because transcription initiation by RNA polymerase II includes cohesin loading at the enhancer-promoter junction (Kagey et al., 2010), and two RNA polymerase II molecules can transcribe bidirectionally from promoters and enhancers (Core et al., 2008; Seila et al., 2008; Sigova et al., 2013), it is possible that RNA polymerase II plays a role in this postulated extrusion. Condensin II, which is loaded onto DNA at sites of transcription initiation together with cohesin (Dowen et al., 2013), is another candidate “extrusion complex” factor. These extrusion models have yet to be tested experimentally.

To What Extent Are Insulated Neighborhoods Shared across Cell Types?
Most genomic data are inherently noisy and filtered to provide an interpretation at some arbitrarily chosen confidence interval. The experimental approaches used to determine DNA interactions, which include Hi-C and ChIA-PET technologies, produce especially noisy data. Furthermore, DNA interaction data can be sparse, especially when using small numbers of cells. These features of the data make it challenging to provide good estimates of the extent to which all the DNA loop structures of any one cell type are shared by another cell type, but comparisons can be made for the set of loops that meet high-confidence criteria in similar experimental data from two or more cell types. Studies on TADs have estimated that most TAD boundaries are shared by any two cell types (Dixon et al., 2012, 2015). Studies on insulated neighborhoods have estimated that 80% of neighborhood boundaries are shared by any two cell types (Hnisz et al., 2016; Ji et al., 2016). Given the sparsity of data and the noise in these datasets, it is possible that the vast majority of TADs and insulated neighborhoods are shared by most cell types, although there is some evidence that CTCF binding and CTCF-CTCF loop formation can be cell-type specific (Narendra et al., 2015; Splinter et al., 2006; Tolhuis et al., 2002; Wang et al., 2012). A broader survey of cell types will be needed to determine the extent to which cell-type-specific insulated neighborhoods exist in human cells.

How Are Insulated Neighborhood Loop Anchors Regulated?
DNA methylation plays a key role in CTCF-CTCF loop anchor control and gene control, as illustrated in the imprinted IGF2/H19 locus (Figure 6), but the regulatory mechanisms that produce site-specific methylation of loop anchors are not well understood. In Drosophila, a number of proteins have been identified that influence CTCF binding and insulator function (Phillips-Cremins and Corces, 2013), but it is not clear whether similar proteins might contribute to regulation of mammalian loop anchors. CTCF binding to DNA can also be modulated by post-translational modifications such as poly-ADP ribosylation (Ong et al., 2013). Non-coding RNA has also been implicated in regulation of CTCF binding to DNA at certain loci (Saldaña-Meyer et al., 2014). Future studies will need to address the extent to which CTCF modifications and RNA modulate neighborhood loop anchors.

How Does a Loop Insulate?
With the two-dimensional representations of loops shown here, it is reasonable to ask how an insulated neighborhood boundary suppresses enhancer-promoter loop formation across the boundaries. An element in a 2D chromatin loop should be able to contact elements in other loops. Additional structuring of the neighborhood, such as condensing the looped chromatin into a compact ball, would reduce the opportunity to interact with other neighborhoods. One candidate for such a factor is condensin II, which is loaded onto active promoters together with cohesin (Dowen et al., 2013). Condensin is known to inhibit transvection in Drosophila polytene chromosomes, where it presumably prevents interactions between two alleles (Hartl et al., 2008). Other proteins, such as the Polycomb repressive complex (Francis et al., 2004), may help to condense a silent insulated neighborhood.

Do Loop Anchors Vary in Insulation Strength?
Early efforts to quantify insulation on a genome-wide scale suggest that differences in insulation strength at CTCF anchors might occur and perhaps correlate with certain genomic features (Phillips-Cremins and Corces, 2013). Insulated neighborhoods can have boundaries with multiple CTCF binding sites and can be nested within larger insulated neighborhoods. These features appear to produce a higher “insulation score” for

enhancers and genes within the neighborhood and may thus represent a safeguard against perturbation. For example, the β-globin genes are located in nested neighborhoods, and perturbation of the inside neighborhood anchors has little effect on globin gene expression (Bender et al., 2006). In ESCs, deletion of multiple boundary sites was required to observe changes in gene expression at certain loci (Dowen et al., 2014). Interestingly, a recent study found that a set of adjacent neighborhoods shows evidence of “merging” together during the differentiation of germinal center B cells (Bunting et al., 2016), indicating that the insulating properties of some neighborhoods may be under developmental control. Additional study is necessary to understand the structural and mechanistic features that contribute to insulator strength.

What Additional Mechanisms Contribute to Enhancer-Gene Specificity?
The insulated neighborhood model can explain how enhancer-promoter specificity is obtained when a single gene occurs together with its regulatory elements within the neighborhood, but it does not fully explain enhancer-promoter specificity when multiple genes are present. We estimate that in neighborhoods with two genes, the activity of the two is coherent in 60% of cases (both are active or both are silent). The tendency for these two-gene neighborhoods to have coherent on or off activities suggests that genes in these neighborhoods may often be co-regulated. Indeed, recent evidence in Drosophila suggests that an enhancer can target all genes within an insulated chromatin structure (Fukaya et al., 2016). Further regulation may occur post-transcriptionally (e.g., microRNAs), which could account for differential transcript accumulation. As noted above, it is also possible that some degree of enhancer-gene specificity is obtained through the interaction of specific factors bound at the enhancer and promoters (Zabidi et al., 2015).

Future Perspective
Evidence that proper activation and repression of genes is dependent on the integrity of insulated neighborhoods argues that these are structural and functional units of mammalian gene control. Insulated neighborhoods provide a new framework for investigating gene control and interpreting the effects of non-coding genetic variation. A fuller understanding of the normal and abnormal control of any gene will require consideration of the potential contribution of any regulatory elements within its neighborhood and the possibility of loop anchor regulation. New insights into the role of genome structure in selective gene control in development and disease will be accelerated with improvements in technologies to map chromosome structures at improved resolution, ideally in an allele-specific fashion in single cells.

Note on Data Availability
Maps of insulated neighborhoods in human ESCs are available in Table S3 in Ji et al. (2016). The dataset described for primed hESCs was used for the quantitative analyses described here. Maps and features of insulated neighborhoods in human and murine ESCs are also found online at http://younglab.wi.mit.edu/insulatedneighborhoods.htm.

ACKNOWLEDGMENTS
Many ideas discussed in this perspective emerged from conversations with Brian Abraham, Frederick Alt, Jay Bradner, Daniel Dadon, Eric Guo, Rudolf Jaenisch, Xiong Ji, Tony Lee, Bryan Lajoie, Charles H. Li, Stuart Levine, Thoru Pedersen, Ana Pombo, Robert Roeder, Ben Sabari, Jurian Schuijers, Anne-Laure Valton, Robert Weinberg, Abraham Weintraub, Alicia Zamudio, Len Zon, and Thomas Zwaka. We are particularly grateful to Brad Bernstein, Victor Corces, Job Dekker, Edith Heard, Danny Reinberg, Bing Ren, Yijun Ruan, and Phillip Sharp for comments on the manuscript. We thank Jennifer Cook-Chrysos for helping with the graphical illustrations. The work was supported by an NIH Grant HG002668 (R.A.Y.), an Erwin Schrödinger Fellowship (J3490) from the Austrian Science Fund (D.H.), a Margaret and Herman Sokol Postdoctoral Award (D.H.), and an American Cancer Society-New England Division Postdoctoral Fellowship (PF-16-146-01-DMC) (D.S.D.). R.A.Y. is a founder of Syros Pharmaceuticals and Marauder Therapeutics.

REFERENCES
Amabile, A., Migliara, A., Capasso, P., Biffi, M., Cittaro, D., Naldini, L., and Lombardo, A. (2016). Inheritable silencing of endogenous genes by hit-and-run targeted epigenetic editing. Cell 167, 219–232.
Banerji, J., Rusconi, S., and Schaffner, W. (1981). Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences. Cell 27, 299–308.
Bell, A.C., and Felsenfeld, G. (2000). Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature 405, 482–485.
Bell, A.C., West, A.G., and Felsenfeld, G. (1999). The protein CTCF is required for the enhancer blocking activity of vertebrate insulators. Cell 98, 387–396.
Bender, M.A., Byron, R., Ragoczy, T., Telling, A., Bulger, M., and Groudine, M. (2006). Flanking HS-62.5 and 3′ HS1, and regions upstream of the LCR, are not required for beta-globin transcription. Blood 108, 1395–1401.
Benoist, C., and Chambon, P. (1981). In vivo sequence requirements of the SV40 early promotor region. Nature 290, 304–310.
Bickmore, W.A., and van Steensel, B. (2013). Genome architecture: domain organization of interphase chromosomes. Cell 152, 1270–1284.
Boveri, T. (1909). Die Blastomerenkerne von Ascaris megalocephala und die Theorie der Chromosomenindividualität. Arch Zellforsch, 181–268.
Buecker, C., and Wysocka, J. (2012). Enhancers as information integration hubs in development: lessons from genomics. Trends Genet. 28, 276–284.
Bulger, M., and Groudine, M. (2011). Functional and mechanistic diversity of distal transcription enhancers. Cell 144, 327–339.
Bunting, K.L., Soong, T.D., Singh, R., Jiang, Y., Béguelin, W., Poloway, D.W., Swed, B.L., Hatzi, K., Reisacher, W., Teater, M., et al. (2016). Multi-tiered Reorganization of the Genome during B Cell Affinity Maturation Anchored by a Germinal Center-Specific Locus Control Region. Immunity 45, 497–512.
Butler, J.E., and Kadonaga, J.T. (2001). Enhancer-promoter specificity mediated by DPE or TATA core promoter motifs. Genes Dev. 15, 2515–2519.
Cavalli, G., and Misteli, T. (2013). Functional implications of genome topology. Nat. Struct. Mol. Biol. 20, 290–299.
Chepelev, I., Wei, G., Wangsa, D., Tang, Q., and Zhao, K. (2012). Characterization of genome-wide enhancer-promoter interactions reveals co-expression of interacting genes and modes of higher order chromatin organization. Cell Res. 22, 490–503.
Choi, O.R., and Engel, J.D. (1988). Developmental regulation of beta-globin gene switching. Cell 55, 17–26.
Chung, J.H., Whiteley, M., and Felsenfeld, G. (1993). A 5′ element of the chicken beta-globin domain serves as an insulator in human erythroid cells and protects against position effect in Drosophila. Cell 74, 505–514.
Claussnitzer, M., Hui, C.C., and Kellis, M. (2016). FTO Obesity Variant and Adipocyte Browning in Humans. N. Engl. J. Med. 374, 192–193.
Core, L.J., Waterfall, J.J., and Lis, J.T. (2008). Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848.

Cremer, T., and Cremer, M. (2010). Chromosome territories. Cold Spring Harb. Perspect. Biol. 2, a003889.
de Laat, W., and Duboule, D. (2013). Topology of mammalian developmental enhancers and their regulatory landscapes. Nature 502, 499–506.
de Wit, E., Vos, E.S., Holwerda, S.J., Valdes-Quezada, C., Verstegen, M.J., Teunissen, H., Splinter, E., Wijchers, P.J., Krijger, P.H., and de Laat, W. (2015). CTCF Binding Polarity Determines Chromatin Looping. Mol. Cell 60, 676–684.
Dekker, J., and Heard, E. (2015). Structural and functional diversity of Topologically Associating Domains. FEBS Lett. 589(20 Pt A), 2877–2884.
Dekker, J., and Mirny, L. (2016). The 3D Genome as Moderator of Chromosomal Communication. Cell 164, 1110–1121.
DeMare, L.E., Leng, J., Cotney, J., Reilly, S.K., Yin, J., Sarro, R., and Noonan, J.P. (2013). The genomic landscape of cohesin-associated chromatin interactions. Genome Res. 23, 1224–1234.
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380.
Dixon, J.R., Jung, I., Selvaraj, S., Shen, Y., Antosiewicz-Bourget, J.E., Lee, A.Y., Ye, Z., Kim, A., Rajagopal, N., Xie, W., et al. (2015). Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331–336.
Dong, J., Panchakshari, R.A., Zhang, T., Zhang, Y., Hu, J., Volpi, S.A., Meyers, R.M., Ho, Y.J., Du, Z., Robbiani, D.F., et al. (2015). Orientation-specific joining of AID-initiated DNA breaks promotes antibody class switching. Nature 525, 134–139.
Dowen, J.M., Bilodeau, S., Orlando, D.A., Hübner, M.R., Abraham, B.J., Spector, D.L., and Young, R.A. (2013). Multiple structural maintenance of chromosome complexes at transcriptional regulatory elements. Stem Cell Reports 1, 371–378.
Dowen, J.M., Fan, Z.P., Hnisz, D., Ren, G., Abraham, B.J., Zhang, L.N., Weintraub, A.S., Schuijers, J., Lee, T.I., Zhao, K., and Young, R.A. (2014). Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159, 374–387.
Doyle, B., Fudenberg, G., Imakaev, M., and Mirny, L.A. (2014). Chromatin loops as allosteric modulators of enhancer-promoter interactions. PLoS Comput. Biol. 10, e1003867.
ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74.
Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X., Wang, L., Issner, R., Coyne, M., et al. (2011). Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49.
Farh, K.K., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W.J., Beik, S., Shoresh, N., Whitton, H., Ryan, R.J., Shishkin, A.A., et al. (2015). Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343.
Flavahan, W.A., Drier, Y., Liau, B.B., Gillespie, S.M., Venteicher, A.S., Stemmer-Rachamimov, A.O., Suvà, M.L., and Bernstein, B.E. (2016). Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature 529, 110–114.
Fogarty, M.P., Cannon, M.E., Vadlamudi, S., Gaulton, K.J., and Mohlke, K.L. (2014). Identification of a regulatory variant that binds FOXA1 and FOXA2 at the CDC123/CAMK1D type 2 diabetes GWAS locus. PLoS Genet. 10, e1004633.
Francis, N.J., Kingston, R.E., and Woodcock, C.L. (2004). Chromatin compaction by a polycomb group protein complex. Science 306, 1574–1577.
Fudenberg, G., Imakaev, M., Lu, C., Goloborodko, A., Abdennur, N., and Mirny, L.A. (2016). Formation of Chromosomal Domains by Loop Extrusion. Cell Rep. 15, 2038–2049.
Fukaya, T., Lim, B., and Levine, M. (2016). Enhancer Control of Transcriptional Bursting. Cell 166, 358–368.
Fullwood, M.J., Liu, M.H., Pan, Y.F., Liu, J., Xu, H., Mohamed, Y.B., Orlov, Y.L., Velkov, S., Ho, A., Mei, P.H., et al. (2009). An oestrogen-receptor-alpha-bound human chromatin interactome. Nature 462, 58–64.
Geyer, P.K., and Corces, V.G. (1992). DNA position-specific repression of transcription by a Drosophila zinc finger protein. Genes Dev. 6, 1865–1873.
Gibcus, J.H., and Dekker, J. (2013). The hierarchy of the 3D genome. Mol. Cell 49, 773–782.
Giorgetti, L., Galupa, R., Nora, E.P., Piolot, T., Lam, F., Dekker, J., Tiana, G., and Heard, E. (2014). Predictive polymer modeling reveals coupled fluctuations in chromosome conformation and transcription. Cell 157, 950–963.
Gómez-Marín, C., Tena, J.J., Acemel, R.D., López-Mayorga, M., Naranjo, S., de la Calle-Mustienes, E., Maeso, I., Beccari, L., Aneas, I., Vielmas, E., et al. (2015). Evolutionary comparison reveals that diverging CTCF sites are signatures of ancestral topological associating domains borders. Proc. Natl. Acad. Sci. USA 112, 7542–7547.
Gorkin, D.U., Leung, D., and Ren, B. (2014). The 3D genome in transcriptional regulation and pluripotency. Cell Stem Cell 14, 762–775.
Gröschel, S., Sanders, M.A., Hoogenboezem, R., de Wit, E., Bouwman, B.A., Erpelinck, C., van der Velden, V.H., Havermans, M., Avellino, R., van Lom, K., et al. (2014). A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in leukemia. Cell 157, 369–381.
Grubert, F., Zaugg, J.B., Kasowski, M., Ursu, O., Spacek, D.V., Martin, A.R., Greenside, P., Srivas, R., Phanstiel, D.H., Pekowska, A., et al. (2015). Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions. Cell 162, 1051–1065.
Gruss, P., Dhar, R., and Khoury, G. (1981). Simian virus 40 tandem repeated sequences as an element of the early promoter. Proc. Natl. Acad. Sci. USA 78, 943–947.
GTEx Consortium (2015). Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660.
Guo, Y., Xu, Q., Canzio, D., Shou, J., Li, J., Gorkin, D.U., Jung, I., Wu, H., Zhai, Y., Tang, Y., et al. (2015). CRISPR Inversion of CTCF Sites Alters Genome Topology and Enhancer/Promoter Function. Cell 162, 900–910.
Handoko, L., Xu, H., Li, G., Ngan, C.Y., Chew, E., Schnapp, M., Lee, C.W., Ye, C., Ping, J.L., Mulawadi, F., et al. (2011). CTCF-mediated functional chromatin interactome in pluripotent cells. Nat. Genet. 43, 630–638.
Hark, A.T., Schoenherr, C.J., Katz, D.J., Ingram, R.S., Levorse, J.M., and Tilghman, S.M. (2000). CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 405, 486–489.
Hartl, T.A., Smith, H.F., and Bosco, G. (2008). Chromosome alignment and transvection are antagonized by condensin II. Science 322, 1384–1387.
Heidari, N., Phanstiel, D.H., He, C., Grubert, F., Jahanbani, F., Kasowski, M., Zhang, M.Q., and Snyder, M.P. (2014). Genome-wide map of regulatory interactions in the human genome. Genome Res. 24, 1905–1917.
Heinz, S., Romanoski, C.E., Benner, C., and Glass, C.K. (2015). The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol. 16, 144–154.
Hnisz, D., Abraham, B.J., Lee, T.I., Lau, A., Saint-André, V., Sigova, A.A., Hoke, H.A., and Young, R.A. (2013). Super-enhancers in the control of cell identity and disease. Cell 155, 934–947.
Hnisz, D., Weintraub, A.S., Day, D.S., Valton, A.L., Bak, R.O., Li, C.H., Goldmann, J., Lajoie, B.R., Fan, Z.P., Sigova, A.A., et al. (2016). Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science 351, 1454–1458.
Hou, C., Zhao, H., Tanimoto, K., and Dean, A. (2008). CTCF-dependent enhancer-blocking by alternative chromatin loop formation. Proc. Natl. Acad. Sci. USA 105, 20398–20403.
Hu, J., Zhang, Y., Zhao, L., Frock, R.L., Du, Z., Meyers, R.M., Meng, F.L., Schatz, D.G., and Alt, F.W. (2015). Chromosomal Loop Domains Direct the Recombination of Antigen Receptor Genes. Cell 163, 947–959.

Ji, X., Dadon, D.B., Powell, B.E., Fan, Z.P., Borges-Rivera, D., Shachar, S., Weintraub, A.S., Hnisz, D., Pegoraro, G., Lee, T.I., et al. (2016). 3D Chromosome Regulatory Landscape of Human Pluripotent Cells. Cell Stem Cell 18, 262–275.
Kagey, M.H., Newman, J.J., Bilodeau, S., Zhan, Y., Orlando, D.A., van Berkum, N.L., Ebmeier, C.C., Goossens, J., Rahl, P.B., Levine, S.S., et al. (2010). Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435.
Kanduri, C., Pant, V., Loukinov, D., Pugacheva, E., Qi, C.F., Wolffe, A., Ohlsson, R., and Lobanenkov, V.V. (2000). Functional association of CTCF with the insulator upstream of the H19 gene is parent of origin-specific and methylation-sensitive. Curr. Biol. 10, 853–856.
Katainen, R., Dave, K., Pitkänen, E., Palin, K., Kivioja, T., Välimäki, N., Gylfe, A.E., Ristolainen, H., Hänninen, U.A., Cajuso, T., et al. (2015). CTCF/cohesin-binding sites are frequently mutated in cancer. Nat. Genet. 47, 818–821.
Kellum, R., and Schedl, P. (1991). A position-effect assay for boundaries of higher order chromosomal domains. Cell 64, 941–950.
Roadmap Epigenomics Consortium, Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M.J., et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330.
Kurukuti, S., Tiwari, V.K., Tavoosidana, G., Pugacheva, E., Murrell, A., Zhao, Z., Lobanenkov, V., Reik, W., and Ohlsson, R. (2006). CTCF binding at the H19 imprinting control region mediates maternally inherited higher-order chromatin conformation to restrict enhancer access to Igf2. Proc. Natl. Acad. Sci. USA 103, 10684–10689.
Lawrence, M.S., Stojanov, P., Mermel, C.H., Robinson, J.T., Garraway, L.A., Golub, T.R., Meyerson, M., Gabriel, S.B., Lander, E.S., and Getz, G. (2014). Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501.

Beckwith-Wiedemann syndrome and Silver-Russell syndrome. Hum. Mol. Genet. 20, 1363–1374.
Nora, E.P., Lajoie, B.R., Schulz, E.G., Giorgetti, L., Okamoto, I., Servant, N., Piolot, T., van Berkum, N.L., Meisig, J., Sedat, J., et al. (2012). Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385.
Ohtsuki, S., Levine, M., and Cai, H.N. (1998). Different core promoters possess distinct regulatory activities in the Drosophila embryo. Genes Dev. 12, 547–556.
Ong, C.T., and Corces, V.G. (2011). Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat. Rev. Genet. 12, 283–293.
Ong, C.T., and Corces, V.G. (2014). CTCF: an architectural protein bridging genome topology and function. Nat. Rev. Genet. 15, 234–246.
Ong, C.T., Van Bortle, K., Ramos, E., and Corces, V.G. (2013). Poly(ADP-ribosyl)ation regulates insulator function and intrachromosomal interactions in Drosophila. Cell 155, 148–159.
Phillips, J.E., and Corces, V.G. (2009). CTCF: master weaver of the genome. Cell 137, 1194–1211.
Phillips-Cremins, J.E., and Corces, V.G. (2013). Chromatin insulators: linking genome organization to cellular function. Mol. Cell 50, 461–474.
Phillips-Cremins, J.E., Sauria, M.E., Sanyal, A., Gerasimova, T.I., Lajoie, B.R., Bell, J.S., Ong, C.T., Hookway, T.A., Guo, C., Sun, Y., et al. (2013). Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153, 1281–1295.
Pomerantz, M.M., Ahmadiyeh, N., Jia, L., Herman, P., Verzi, M.P., Doddapaneni, H., Beckwith, C.A., Chan, J.A., Hills, A., Davis, M., et al. (2009). The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat. Genet. 41, 882–884.
Levine, M., Cattoglio, C., and Tjian, R. (2014). Looping back to leap forward: Rao, S.S., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Rob-
transcription enters a new era. Cell 157, 13–25. inson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., and Aiden, E.L.
(2014). A 3D map of the human genome at kilobase resolution reveals princi-
Liu, M., Maurano, M.T., Wang, H., Qi, H., Song, C.Z., Navas, P.A., Emery,
ples of chromatin looping. Cell 159, 1665–1680.
D.W., Stamatoyannopoulos, J.A., and Stamatoyannopoulos, G. (2015).
Genomic discovery of potent chromatin insulators for human gene therapy. Ren, B., and Yue, F. (2015). Transcriptional enhancers: bridging the genome
Nat. Biotechnol. 33, 198–203. and phenome. Cold Spring Harb. Symp. Quant. Biol. 80, 17–26.
Liu, X.S., Wu, H., Ji, X., Stelzer, Y., Wu, X., Czauderna, S., Shu, J., Dadon, D., Saldaña-Meyer, R., González-Buendı́a, E., Guerrero, G., Narendra, V., Bona-
Young, R.A., and Jaenisch, R. (2016). Editing DNA methylation in the mamma- sio, R., Recillas-Targa, F., and Reinberg, D. (2014). CTCF regulates the human
lian genome. Cell 167, 233–247. p53 gene through direct interaction with its natural antisense transcript,
Lupiáñez, D.G., Kraft, K., Heinrich, V., Krawitz, P., Brancati, F., Klopocki, E., Wrap53. Genes Dev. 28, 723–734.
Horn, D., Kayserili, H., Opitz, J.M., Laxova, R., et al. (2015). Disruptions of Sanborn, A.L., Rao, S.S., Huang, S.C., Durand, N.C., Huntley, M.H., Jewett,
topological chromatin domains cause pathogenic rewiring of gene-enhancer A.I., Bochkov, I.D., Chinnappan, D., Cutkosky, A., Li, J., et al. (2015). Chro-
interactions. Cell 161, 1012–1025. matin extrusion explains key features of loop and domain formation in wild-
Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., type and engineered genomes. Proc. Natl. Acad. Sci. USA 112, E6456–E6465.
Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J., et al. (2012). Systematic Schmitt, A.D., Hu, M., and Ren, B. (2016). Genome-wide mapping and analysis
localization of common disease-associated variation in regulatory DNA. Sci- of chromosome architecture. Nat. Rev. Mol. Cell Biol. Published online
ence 337, 1190–1195. September 1, 2016. http://dx.doi.org/10.1038/nrm.2016.104.
McGeachie, M.J., Yates, K.P., Zhou, X., Guo, F., Sternberg, A.L., Van Natta,
Seila, A.C., Calabrese, J.M., Levine, S.S., Yeo, G.W., Rahl, P.B., Flynn, R.A.,
M.L., Wise, R.A., Szefler, S.J., Sharma, S., Kho, A.T., et al.; CAMP Research
Young, R.A., and Sharp, P.A. (2008). Divergent transcription from active pro-
Group (2016). Genetics and genomics of longitudinal lung function patterns
moters. Science 322, 1849–1851.
in asthmatics. Am. J. Respir. Crit. Care Med. Published online July 1, 2016.
http://dx.doi.org/10.1164/rccm.201602-0250OC. Sigova, A.A., Mullen, A.C., Molinie, B., Gupta, S., Orlando, D.A., Guenther,
M.G., Almada, A.E., Lin, C., Sharp, P.A., Giallourakis, C.C., and Young, R.A.
Merkenschlager, M., and Nora, E.P. (2016). CTCF and Cohesin in Genome
(2013). Divergent transcription of long noncoding RNA/mRNA gene pairs in
Folding and Transcriptional Gene Regulation. Annu. Rev. Genomics Hum.
embryonic stem cells. Proc. Natl. Acad. Sci. USA 110, 2876–2881.
Genet. 17, 17–43.
Murrell, A., Heeson, S., and Reik, W. (2004). Interaction between differentially Smith, E.M., Lajoie, B.R., Jain, G., and Dekker, J. (2016). Invariant TAD Bound-
methylated regions partitions the imprinted genes Igf2 and H19 into parent- aries Constrain Cell-Type-Specific Looping Interactions between Promoters
specific chromatin loops. Nat. Genet. 36, 889–893. and Distal Elements around the CFTR Locus. Am. J. Hum. Genet. 98, 185–201.
Narendra, V., Rocha, P.P., An, D., Raviram, R., Skok, J.A., Mazzoni, E.O., and Spitz, F., and Furlong, E.E. (2012). Transcription factors: from enhancer bind-
Reinberg, D. (2015). CTCF establishes discrete functional chromatin domains ing to developmental control. Nat. Rev. Genet. 13, 613–626.
at the Hox clusters during differentiation. Science 347, 1017–1021. Splinter, E., Heath, H., Kooren, J., Palstra, R.J., Klous, P., Grosveld, F., Galjart,
Nativio, R., Sparago, A., Ito, Y., Weksberg, R., Riccio, A., and Murrell, A. (2011). N., and de Laat, W. (2006). CTCF mediates long-range chromatin looping and
Disruption of genomic neighbourhood at the imprinted IGF2-H19 locus in local histone modification in the beta-globin locus. Genes Dev. 20, 2349–2354.
Cell 167, November 17, 2016 1199

Szabó, P., Tang, S.H., Rentsendorj, A., Pfeifer, G.P., and Mann, J.R. (2000). Udvardy, A., Maine, E., and Schedl, P. (1985). The 87A7 chromomere. Identi-
Maternal-specific footprints at putative CTCF sites in the H19 imprinting con- fication of novel chromatin structures flanking the heat shock locus that may
trol region give evidence for insulator function. Curr. Biol. 10, 607–610. define the boundaries of higher order domains. J. Mol. Biol. 185, 341–358.
Tang, Z., Luo, O.J., Li, X., Zheng, M., Zhu, J.J., Szalaj, P., Trzaskoma, P., Mag- Vietri Rudan, M., Barrington, C., Henderson, S., Ernst, C., Odom, D.T., Tanay,
alska, A., Wlodarczyk, J., Ruszczycki, B., et al. (2015). CTCF-Mediated Human A., and Hadjur, S. (2015). Comparative Hi-C reveals that CTCF underlies evo-
3D Genome Architecture Reveals Chromatin Topology for Transcription. Cell lution of chromosomal domain architecture. Cell Rep. 10, 1297–1309.
163, 1611–1627. Wang, H., Maurano, M.T., Qu, H., Varley, K.E., Gertz, J., Pauli, F., Lee, K., Can-
field, T., Weaver, M., Sandstrom, R., et al. (2012). Widespread plasticity in
Tjian, R., and Maniatis, T. (1994). Transcriptional activation: a complex puzzle
CTCF occupancy linked to DNA methylation. Genome Res. 22, 1680–1688.
with few easy pieces. Cell 77, 5–8.
Wang, H., Zang, C., Taing, L., Arnett, K.L., Wong, Y.J., Pear, W.S., Blacklow,
Tolhuis, B., Palstra, R.J., Splinter, E., Grosveld, F., and de Laat, W. (2002). S.C., Liu, X.S., and Aster, J.C. (2014). NOTCH1-RBPJ complexes drive target
Looping and interaction between hypersensitive sites in the active beta-globin gene expression through dynamic interactions with superenhancers. Proc.
locus. Mol. Cell 10, 1453–1465. Natl. Acad. Sci. USA 111, 705–710.
UK10K Consortium, Walter, K., Min, J.L., Huang, J., Crooks, L., Memari, Y., Zabidi, M.A., Arnold, C.D., Schernhuber, K., Pagani, M., Rath, M., Frank, O.,
McCarthy, S., Perry, J.R., Xu, C., Futema, M., et al. (2015). The UK10K project and Stark, A. (2015). Enhancer-core-promoter specificity separates develop-
identifies rare variants in health and disease. Nature 526, 82–90. mental and housekeeping gene regulation. Nature 518, 556–559.
1200 Cell 167, November 17, 2016

Article
The Chromatin Remodeler ISW1 Is a Quality Control Factor that Surveys Nuclear mRNP Biogenesis

Authors
Anna Babour, Qingtang Shen, Julien Dos-Santos, ..., Domenico Libri, Jane Mellor, Catherine Dargemont

Correspondence
anna.babour@inserm.fr

In Brief
A chromatin remodeling complex retains premature mRNPs in proximity to their transcription site, ensuring an accurate surveillance mechanism that proofreads the efficiency of mRNA biogenesis.

Highlights
• The chromatin remodeling complex ISW1 controls nuclear poly(A) RNA accumulation
• Inactivation of ISW1 rescues nuclear export of improper mRNPs and resulting genetic instability
• Isw1 participates in mRNP surveillance
• Isw1 interacts with nuclear mRNPs

Babour et al., 2016, Cell 167, 1201–1214
November 17, 2016 © 2016 Elsevier Inc.
Anna Babour,1,5,* Qingtang Shen,1 Julien Dos-Santos,1 Struan Murray,2 Alexandre Gay,1 Drice Challal,3 Milo Fasken,4 Benoît Palancade,3 Anita Corbett,4 Domenico Libri,3 Jane Mellor,2 and Catherine Dargemont1
1Université Paris Diderot, Sorbonne Paris Cité, INSERM UMR944, CNRS UMR7212, Hôpital St. Louis 1, Avenue Claude Vellefaux, 75475 Paris Cedex, France
2Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK
3Institut Jacques Monod, Université Paris Diderot, Sorbonne Paris Cité, CNRS, Bâtiment Buffon, 15 rue Hélène Brion, 75205 Paris Cedex, France
4Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
5Lead Contact
*Correspondence: anna.babour@inserm.fr
SUMMARY

Chromatin dynamics play an essential role in regulating DNA transaction processes, but it is unclear whether transcription-associated chromatin modifications control the mRNA ribonucleoparticle (mRNP) pipeline from synthesis to nuclear exit. Here, we identify the yeast ISW1 chromatin remodeling complex as an unanticipated mRNP nuclear export surveillance factor that retains export-incompetent transcripts near their transcription site. This tethering activity of ISW1 requires chromatin binding and is independent of nucleosome sliding activity or changes in RNA polymerase II processivity. Combination of in vivo UV-crosslinking and genome-wide RNA immunoprecipitation assays shows that Isw1 and its cofactors interact directly with premature mRNPs. Our results highlight that the concerted action of Isw1 and the nuclear exosome ensures an accurate surveillance mechanism that proofreads the efficiency of mRNA biogenesis.

INTRODUCTION

The harmonious production of nuclear export-competent mRNA ribonucleoparticles (mRNPs) is linked to precisely orchestrated transcription and processing events. Factors involved in mRNA processing are therefore recruited co-transcriptionally on chromatin, while their transfer onto mRNA is synchronized with the processing, quality control (QC), and release of the transcript from the transcription site (Bentley, 2014).

Once fully matured and packaged, mRNPs are transported into the cytoplasm by the hetero-dimeric mRNA export receptor Mex67-Mtr2 in yeast (NXF1 in metazoans). Mex67 requires adaptor protein(s) to interact stably with mRNPs (Nino et al., 2013). These include the SR protein Npl3, the poly(A) RNA binding protein Nab2, the TREX complex (THO complex, Sub2, Yra1), and the NPC-associated TREX-2 complex. Many of them are also important actors in other mRNA biogenesis steps, allowing optimal coupling of transcription elongation with mRNP packaging and 3′ processing steps (Babour et al., 2012; Tutucci and Stutz, 2011).

In addition, the accuracy of mRNP biogenesis is monitored by quality control (QC) checkpoints acting at each step of mRNP maturation, both in the nucleus and in the cytoplasm. However, anomalous mRNPs are not recognized as such by dedicated sensors. Instead, mRNP fitness results from a competition between the opposing processes of biogenesis and degradation. As a result, transcripts that do not reach full maturation within a certain time frame are targeted for degradation (Jensen et al., 2003). In the yeast nucleus, degradation is mainly achieved by a multiprotein complex, the nuclear exosome, whose exonuclease activity is provided by two hydrolytic RNases, Dis3 and Rrp6 (Dziembowski et al., 2007; Mitchell et al., 1997; Torchet et al., 2002). Rrp6 is a key factor in nuclear mRNP QC as it also participates in the nuclear retention of immature transcripts at the site of transcription (Hilleren et al., 2001). Transcript retention in nuclear foci was observed in almost all yeast mutants defective in mRNA maturation/packaging/export (Hilleren et al., 2001; Iglesias et al., 2010). Recently, these foci have been proposed to serve a protective function and to extend the window of opportunity for mRNPs to mature (Kallehauge et al., 2012). Although this retention is widespread, the molecular mechanism responsible for tethering transcripts to chromatin remains uncertain.

Co-transcriptional mRNP processing occurs in a chromatin environment, and the influence of chromatin dynamics on gene expression recently emerged as essential. Histone marks were shown to contribute to exon definition, therefore favoring the recruitment of splicing factors and/or regulating alternative splicing by modulating RNA polymerase II (Pol II) processivity (Gunderson et al., 2011; Herissant et al., 2014; Luco et al., 2011; Saint-André et al., 2011; Sims et al., 2007). In contrast to splicing, few studies connect chromatin dynamics to mRNA biogenesis and export. The histone chaperones Spt6 and FACT were proposed to participate in the recruitment of mRNA export factors to chromatin, thereby influencing mRNA export (Hautbergue et al., 2009; Yoh et al., 2007). We also reported that H2B ubiquitylation controls the recruitment of the export machinery to the mRNP (Vitaliano-Prunier et al., 2012). This
Figure 1. The Chromatin-Associated ISW1 Complex Controls Nuclear Accumulation of Poly(A) RNA
(A) Co-immunoprecipitation of Isw1-13Myc and Mex67. Mock, pre-immune serum.
(B) Inactivation of the ISW1 complex does not affect poly(A) RNA localization as observed by oligo dT FISH analyses (n = 3, mean ± SD).
(C) Deletion of ISW1 rescues the growth of the mex67ΔUBA mutant. The indicated strains containing pRS316-MEX67 were grown at 30°C on 5-FOA plates to counter-select pRS316-MEX67.
(D) Inactivation of the ISW1 complex rescues the poly(A) RNA nuclear accumulation defect of the mex67ΔUBA mutant. FISH analysis and quantification were performed as in (B) in the indicated strains grown for 2 hr at 30°C.

raises the possibility that chromatin dynamics might participate in more steps of mRNP formation than previously anticipated.

Chromatin dynamics is controlled by histone modifiers and chromatin remodeling complexes that use the energy of ATP to slide or evict nucleosomes (Fazzio and Tsukiyama, 2003; Lomvardas and Thanos, 2001). The ISWI family predominantly functions to slide nucleosomes laterally and position them over coding regions (Krajewski, 2013; Yen et al., 2012). Isw1, the catalytic subunit of the yeast ISW1 complex (ISW1), associates with the non-essential subunits Ioc3 or Ioc2 and Ioc4 to form two subcomplexes (Isw1a and Isw1b, respectively) thought to allow the targeting of Isw1 to distinct genomic locations (Mellor and Morillon, 2004; Morillon et al., 2003; Vary et al., 2003). Chromatin perturbations associated with loss of Isw1 are not correlated with changes in mRNA abundance; isw1Δ cells exhibit a modest derepression of relatively few genes (Lenstra et al., 2011; Vary et al., 2003). Instead, Isw1 was proposed to maintain chromatin integrity during transcription elongation by RNA Pol II by preventing intragenic cryptic transcription (Gkikopoulos et al., 2011; Smolle et al., 2012; Tirosh et al., 2010; Yen et al., 2012).

Here, we show that in Saccharomyces cerevisiae, ISW1 is a crucial actor in nuclear mRNP surveillance that retains premature mRNPs in proximity to their transcription site. This mRNP retention activity of ISW1 requires its recruitment onto chromatin but not its nucleosome sliding activity. Isw1 cooperates with the ribonucleolytic activity of Rrp6 for optimal quality control of mRNPs close to the site of transcription. This unanticipated function of ISW1 is consistent with its ability to interact with mRNAs and mRNPs. We propose that the concerted action of Isw1 and Rrp6 ensures an accurate surveillance mechanism that proofreads the efficiency of mRNA biogenesis.

Table 1. ISW1 Inactivation Rescues the Growth of a Category of RNA Export Mutants

mRNA Export Mutant      Rescue upon ISW1 Inactivation
mex67ΔUBA               +
npl3-1                  +
mft1Δ (THO)             +
GFP-yra1-8 (TREX)       +
ΔN-nab2                 +
pap1-1                  −
fip1-206                −
thp1Δ (TREX-2)          −
rna15-58 (CPF)          −
ref2Δ (CPF)             −
sen1-1                  −
nup159-1                −

Genetic interactions were scored as (+) for rescue and (−) for no effect or synthetic lethality.

RESULTS

The Chromatin Remodeling Complex ISW1 Controls Nuclear Poly(A) RNA Accumulation of Some mRNA Biogenesis Mutants
We fortuitously identified an interaction between the mRNA export receptor Mex67 and the chromatin remodeling complex ISW1 in a previously reported two-hybrid screen (Gwizdek et al., 2006) (unpublished data). Co-immunoprecipitation assays confirmed an interaction between Mex67 and the catalytic subunit of the complex, Isw1, which was partially sensitive to RNase treatment (Figures 1A and S1A). In contrast, no co-immunoprecipitation between Mex67 and Isw2, the second yeast member of the ISWI family of chromatin remodelers, could be detected (Figure S1A). These results prompted us to investigate the putative role of this chromatin remodeler in mRNP biogenesis and export. Deletion of ISW1 or its accessory subunits IOC2, IOC3, and IOC4 affected neither growth (Figure S1B) nor the steady-state subcellular localization of poly(A) RNA (Figure 1B). In contrast, combining the deletion of ISW1 with a previously described mex67ΔUBA mutation that strongly affects mRNA nuclear export (compared to Figure 1B) and cell growth (Gwizdek et al., 2006) led to a partial rescue of the growth defect at 30°C (Figure 1C) that correlated with a significant reduction of the percentage of cells accumulating nuclear poly(A) RNA (Figure 1D). A similar rescue was observed upon deletion of the other ISW1 subunits.

To test whether these genetic and functional interactions were specific to the mex67ΔUBA mutant, we combined the inactivation of ISW1 with the well-characterized thermosensitive npl3-1 allele. At restrictive temperature (30°C), npl3-1 cells grew poorly (Figure 1E) and displayed nuclear poly(A) RNA accumulation (Figure 1F). Deletion of ISW1 partially restored cell growth and reduced the nuclear poly(A) RNA accumulation of the npl3-1 mutant (Figures 1E and 1F). Of note, inactivation of each subunit of the ISW1 complex gave rise to comparable phenotypes, irrespective of whether the subunit belongs to the Isw1a or Isw1b subcomplex. Deletion of ISW1 could also restore cell growth and RNA nuclear export defects in mutants of other Mex67 RNA-binding adaptors (THO/TREX, Nab2, Npl3) but not in mutants of the TREX-2 complex (Thp1), the polyadenylation machinery (Pap1), the 3′ end processing machinery (Ref2, Rna15, Fip1), the transcription termination Nrd1-Nab3-Sen1 complex, or the nuclear pore complex (Nup159) (Figure S1C; Table 1). Consistently, deletion of ISW1 did not affect the length of the poly(A) tail in wild-type (WT) or npl3-1 cells (Figure S1D), indicating that the phenotypes
(E) Deletion of each subunit of the ISW1 complex rescues the growth of the npl3-1 mutant at 30°C on YPD.
(F) Inactivation of the ISW1 complex reduces the poly(A) RNA nuclear accumulation defect of the npl3-1 mutant. Subcellular localization of poly(A) RNA was analyzed as in (B) in the different strains grown overnight at 25°C in YPD and shifted for 3 hr to 30°C prior to fixation. Scale bar, 5 μm.
(G) Overexpression of Isw1 inhibits the growth of npl3-1 cells. Left: serial dilutions of strains grown on selective media. Right: total protein extracts from WT ISW1-3FL or npl3-1 ISW1-3FL cells transformed with pRS415GPD or pRS415GPD3FL-ISW1 were analyzed by western blot with anti-FLAG and anti-Mex67 (loading control) antibodies.
See also Figure S1.

Figure 2. The Function of ISW1 in Nuclear mRNP Biogenesis Depends on Its Recruitment onto Chromatin, but Not on Its Nucleosome Sliding Activity
(A) Deletion of ISW1 but not of other chromatin remodeling complexes rescues the growth of the npl3-1 mutant at 30°C. Growth of single mutants is shown in Figure S2A.
(B) Schematic representation of the domain organization of Isw1. The SANT and SLIDE domains of Isw1 facilitate its interaction with chromatin via Set1-dependent H3K4 methylation, whereas Ioc4 is recruited by the interaction of its PWWP domain with Set2-dependent H3K36 methylation.
(C–F) Preventing ISW1 complex association with chromatin (D–F), but not inactivating its catalytic activity (C), recapitulates the effect of its inactivation on the growth of the npl3-1 mutant. ISW1 chromatin association was prevented by deleting the Isw1 SANT and SLIDE domains (D) or by inhibiting H3K4me3 (E) or H3K36me3 (F). Expression of Isw1 mutant proteins is shown in Figure S2B.
See also Figure S2.

observed upon ISW1 inactivation are unlikely to be mediated by a major effect on mRNA 3′ end processing. Ultimately, overexpression of Isw1 increased its recruitment to transcribed genes and inhibited the growth of npl3-1 but not WT cells (Figures 1G and S1E).

Taken together, these results suggest a previously unanticipated function for the ISW1 chromatin-remodeling complex in controlling the nuclear accumulation of mRNPs.

The Function of ISW1 in Nuclear mRNP Biogenesis Depends on Its Recruitment onto Chromatin, but Not on Its Nucleosome Sliding Activity
Chromatin remodeling is achieved by four families of chromatin remodelers classified based on shared structural or functional domains (Becker and Workman, 2013; Swygert and Peterson, 2014). Representative members of these four families were inactivated in the npl3-1 mutant and the growth of the resulting double mutants was analyzed. We found that only the ISW1 deletion was able to suppress the growth defect of the npl3-1 mutant at 30°C (Figures 2A and S2A). This indicates that contributing to mRNP nuclear retention is not a common property of chromatin remodelers but is instead specific to the ISW1 complex.

To determine whether this role of ISW1 is related to its function as a chromatin remodeler, we first analyzed a catalytically inactive version of Isw1 with a single amino acid substitution (K227R) within its ATP binding site (Tsukiyama et al., 1999) (Figure 2B). This mutation did not recapitulate the effects of ISW1 inactivation (Figures 2C and S2B), indicating that the nucleosome sliding activity of Isw1 does not significantly contribute to its function in nuclear mRNA accumulation. In contrast, deleting the SANT or SLIDE domains required for the optimal association of Isw1 with chromatin (Clapier and Cairns, 2012; Mellor and Morillon, 2004; Pinskaya et al., 2009) (Figure 2B) phenocopied its deletion (Figures 2D and S2B). Tri-methylation at H3K4, a mark deposited by the Set1 methyltransferase, significantly contributes to the recruitment of Isw1 onto chromatin (Santos-Rosa et al., 2003). Abolishing H3K4 methylation (set1Δ) or even H3K4 trimethylation alone (spp1Δ) restored growth of the npl3-1 mutant (Figure 2E). Combined ISW1 and SET1 deletions did not display an additive effect, further supporting a role for chromatin recruitment of Isw1 in its mRNA biogenesis function (Figure 2E). Set2-mediated H3K36 methylation recruits the Ioc4 subunit of the complex to chromatin via an interaction with its PWWP domain (Smolle et al., 2012) (Figure 2B). The growth defect of the npl3-1 mutant at 30°C was also rescued when Ioc4 recruitment was impaired (set2Δ), and an additive effect was observed when both SET1 and SET2 were deleted, a condition that prevents recruitment of Isw1 onto chromatin either directly or indirectly via Ioc4 (Figure 2F).

Collectively, these findings indicate that the chromatin remodeling complex ISW1 participates in an mRNP nuclear biogenesis step that is independent of its catalytic activity but requires its recruitment onto chromatin.

Inactivation of ISW1 Does Not Alter the Transcription Elongation Rate
In yeast, inactivation of ISW1 only results in a modest derepression of relatively few genes (Lenstra et al., 2011; Vary et al., 2003). Nevertheless, mutations that reduce the transcription

Figure 3. Inactivation of ISW1 Rescues Nuclear Export of Improper mRNPs and Resulting Genetic Instability
(A and B) Deletion of ISW1 does not affect transcription elongation. Serial dilutions of the indicated strains were grown with or without MPA.
(C) LYS2 transcription shut-off experimental setting: npl3-1 and npl3-1 isw1Δ cells were shifted from 25°C to 30°C for 1 hr and transcription was blocked by addition of phenanthroline (t = 0). Samples were collected for analysis at t = 0, 30′, and 60′.
(D) Similar CTD recruitment to LYS2 in npl3-1 and npl3-1 isw1Δ cells analyzed by chromatin immunoprecipitation (ChIP) and normalized to the value at t = 0 (n = 3, mean ± SD). See Figure S3C for non-normalized values.

rate were previously reported to suppress mRNA export-related phenotypes (Jensen et al., 2004). We did not detect any significant effect of ISW1 deletion on the recruitment of Pol II CTD along two model genes (data not shown). We then examined the effect of DST1 inactivation on the growth of npl3-1 at 30°C. Dst1 is a general transcription elongation factor (TFIIS) whose inactivation rescues some mRNP biogenesis mutants (Jensen et al., 2004). We observed no significant rescue of the npl3-1 mutant upon DST1 inactivation (Figures 3A and S3A). Mutants impaired in transcription elongation are sensitive to the elongation inhibitor mycophenolic acid (MPA), which depletes the cellular pool of GTP. In contrast to dst1Δ cells, the growth of isw1Δ cells was not affected by MPA (Figure S3B). Moreover, MPA was unable to suppress the growth defect of npl3-1 cells (Figure 3B). Finally, whereas DST1 inactivation increased the sensitivity of npl3-1 to MPA at 25°C, ISW1 inactivation had the opposite effect. The growth rescue of mRNP biogenesis mutants observed upon ISW1 inactivation is thus unlikely to result from an altered transcription elongation rate.

Inactivation of ISW1 Releases Nuclear-Retained mRNP
To further investigate the function of ISW1 in mRNP biogenesis, we performed "transcriptional pulse-chase" experiments in which the transcription of the constitutive LYS2 (Figure 3C) and the inducible IMD2 (Figure S3D) genes was turned off by addition of the transcriptional inhibitor phenanthroline or by removal of the inducer (6-azauracil), respectively. These experimental settings permit examination of the fate of transcripts synthesized prior to transcription inhibition in npl3-1 and npl3-1 isw1Δ cells. During the chase period, we examined the recruitment of RNA Pol II to the LYS2 and IMD2 genes, as well as the total amount and the subcellular localization of their transcripts. No significant difference in the recruitment of RNA Pol II to the LYS2 (Figures 3D and S3C) and IMD2 (Figure S3E) genes was observed between npl3-1 and npl3-1 isw1Δ cells. Likewise, npl3-1 and npl3-1 isw1Δ cells showed comparable levels of LYS2 (Figure 3E) and IMD2 (Figure S3F) transcripts during the course of the experiments, and the stability of IMD2 was equivalent between both strains (Figure S3F). The function of Isw1 in mRNP biogenesis is thus independent of a role in the control of transcription or mRNA degradation. In contrast, the percentage of cells displaying accumulation of transcripts in a nuclear dot decreased faster in npl3-1 isw1Δ than in npl3-1 cells (Figures 3F and S3G), indicating that the transcripts that are retained on chromatin in the npl3-1 mutant were released upon ISW1 deletion. Therefore, ISW1 deletion counteracts nuclear retention of transcripts at chromatin, likely allowing their export to the cytoplasm. Importantly, this release results in a rescue of cell fitness, indicating that the mRNAs produced in the npl3-1 mutant may not be deleterious per se but rather improperly packaged and therefore incompetent for export.

Inactivation of ISW1 Rescues the Genetic Instability Caused by Improper mRNP Assembly
RNA packaging is crucial to the maintenance of genome integrity by counteracting the appearance of DNA damage (Santos-Pereira and Aguilera, 2015). Improper mRNP assembly would favor the formation of RNA::DNA hybrids (or R loops) between the nascent mRNA and its template, which in turn disturbs replication fork progression, thereby generating DNA double-strand breaks and unwanted recombination events. In this respect, npl3Δ cells were recently reported to display R-loop-dependent genetic instability (Santos-Pereira et al., 2013). To investigate whether ISW1 inactivation would also rescue this phenotype, we first monitored the formation of Rad52-YFP foci, which serve as a proxy for recombination repair centers (Lisby et al., 2001). The percentage of cells exhibiting Rad52 foci was significantly higher in npl3-1 than in WT cells, particularly in unbudded cells (G1), whereas ISW1 deletion alone had no significant effect (Figure 3G). Deletion of ISW1 led to a partial but significant reduction of the number of Rad52 foci in npl3-1 unbudded cells. Transcription-dependent hyper-recombination events were then analyzed using previously described plasmid-based reporter systems. These plasmids bear truncated repeats of the LEU2 gene separated (pLYΔN) or not (pL) by a long transcribed intervening sequence and therefore allow an assessment of recombination (pL) or transcription-dependent recombination (pLYΔN), based on the frequency of Leu+ colonies (Figure 3H, left panel). Transcription-dependent hyper-recombination was slightly but significantly increased in isw1Δ cells compared to WT, suggesting that loss of Isw1 creates some level of genetic instability, as recently reported for the loss of SNF2H, the ISW1 mammalian homolog (Toiber et al., 2013). A severe transcription-dependent hyper-recombination phenotype was observed in npl3-1 cells shifted for 3 hr to 30°C (Figure 3H), and this effect was significantly reduced upon ISW1 deletion (Figure 3H). Therefore, in RNA biogenesis mutants, inactivation of ISW1 releases nuclear-retained transcripts from chromatin, which correlates with a rescue of the associated genetic instability.

Isw1 Is a Nuclear mRNP Quality Control Factor that Cooperates with Rrp6
Our results support the hypothesis that ISW1 preferentially tethers improperly packaged and export-incompetent mRNPs
(E) npl3-1 and npl3-1 isw1Δ show similar LYS2 mRNA levels as analyzed by qRT-PCR and normalized to ACT1 mRNA expression (n = 5, mean ± SD).
(F) Deletion of ISW1 releases the LYS2 transcripts accumulated in a nuclear dot of npl3-1 cells. The subcellular localization of the LYS2 transcript after blocking
transcription with phenanthroline in npl3-1 and npl3-1 isw1Δ cells was analyzed by FISH using Quasar570-LYS2 probes. For each time point, the percentage of
cells showing a nuclear dot was scored (n = 3, mean ± SD). White arrows point to nuclear-localized LYS2 transcripts. Scale bar, 5 μm.
(G) ISW1 inactivation reduces the number of spontaneous Rad52 foci in npl3-1 cells grown for 3 hr at 30°C. Fluorescence microscopic examination of the
indicated cells transformed with a pRS415-Rad52-YFP plasmid. White arrows highlight Rad52 foci in npl3-1 unbudded cells. For each cell type, an average of 300
budded and unbudded cells were examined (n = 3, mean ± SD).
(H) ISW1 inactivation partially rescues the hyperrecombination phenotype of the npl3-1 mutant. Recombination was analyzed in the indicated strains carrying
pL or pLYΔN plasmids, grown for 3 hr at 30°C and plated at 25°C. Average and standard deviation of three fluctuation tests consisting of the median value of
12 independent colonies for each condition are shown.
See also Figure S3.
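
The summary statistic described in (H) can be illustrated with a minimal sketch. It assumes that each fluctuation test yields one recombination frequency per colony, that the per-test value is the median over colonies, and that the reported spread is the sample standard deviation of those medians. The function name and the example numbers are hypothetical and are not data from this study.

```python
from statistics import mean, median, stdev


def summarize_fluctuation_tests(tests):
    """Return (mean, sample SD) of the per-test median frequencies.

    `tests` is a list of fluctuation tests; each test is a list of
    recombination frequencies, one per independent colony.
    """
    if len(tests) < 2:
        raise ValueError("at least two fluctuation tests are needed for an SD")
    # One median per fluctuation test (12 colonies each in the caption).
    medians = [median(colonies) for colonies in tests]
    return mean(medians), stdev(medians)


# Hypothetical frequencies (x 10^-4) for three tests of 12 colonies each.
example_tests = [
    [3, 5, 4, 6, 5, 4, 7, 5, 3, 6, 4, 5],
    [4, 6, 5, 5, 7, 6, 4, 5, 6, 8, 5, 6],
    [2, 4, 5, 3, 4, 6, 4, 5, 3, 4, 5, 4],
]
avg, sd = summarize_fluctuation_tests(example_tests)
print(f"mean of medians = {avg:.2f}, SD = {sd:.2f}")
```

Taking the median within each test limits the influence of occasional "jackpot" colonies, which is one common reason for summarizing fluctuation tests this way.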

to chromatin, a property reminiscent of an mRNP quality control
factor. Intriguingly, isw1Δ cells were reported to exhibit
high levels of stress granules (Buchan et al., 2013), which
are stress-inducible cytoplasmic aggregates of ribonucleoprotein
complexes containing untranslated mRNA. Inactivation
of ISW1 in WT cells would lead to the export of immature
mRNPs that are ultimately stored in these cytoplasmic stress
structures. To challenge this hypothesis, ISW1 inactivation was
combined with mutations in other mRNA quality control factors,
and synthetic effects were analyzed. The deletion of both ISW1
and XRN1, which encodes a 5′–3′ exonuclease involved in
cytoplasmic RNA decay, resulted in severe growth defects
compared to single mutants (Figure 4A). The perinuclear and
NPC-associated Mlp proteins actively retain unspliced transcripts
at the nuclear side of the NPC (Galy et al., 2004; Palancade
et al., 2005; Vinciguerra et al., 2005). Combining MLP1,
MLP2, and ISW1 deletions also resulted in a synthetic growth
defect (Figure 4A). However, no genetic interaction between
ISW1 and RRP6, which encodes a catalytic subunit of the nuclear
exosome, was observed (Figures 4A and S4A), indicating
that both proteins are either involved in unrelated functions
or alternatively are implicated in the same cellular process. To
distinguish between these possibilities, we analyzed the effect
of RRP6 deletion in a context where mRNA export is compromised.
As previously reported, RRP6 deletion impaired the
growth of npl3-1 cells at 25°C (Figures 4B and S4B), a temperature
at which the mutant displays neither growth nor mRNA export
defects (Burkard and Butler, 2000). In striking contrast, inactivation
of RRP6, like ISW1, partially rescued the growth of the
npl3-1 cells at 30°C, a phenotype reversed by the ectopic
expression of WT Rrp6 but not by rrp6D238A, a catalytically
inactive mutant (Figures 4B, 4C, S4B, and S4C). Remarkably,
co-deletion of ISW1 and RRP6 further increased overall cell
fitness, indicating that Isw1 and Rrp6 likely achieve different
molecular functions in the same pathway (Figure 4B). Deletion
of TRF4, a polyA polymerase of the TRAMP complex that facilitates
exosome access to its substrates (LaCava et al., 2005),
also partially rescued the growth of the npl3-1 mutant at 30°C
and showed an additive effect with ISW1 deletion (Figures S4D
and S4E). Together, these results indicate that Isw1 cooperates
with the ribonucleolytic activity of Rrp6 for optimal nuclear
quality control.
The microscopic inspection of the subcellular localization of
the LYS2 transcripts by FISH (Figure 4D) revealed that npl3-1
cells grown overnight at 30°C accumulated LYS2 transcripts
in large nuclear foci. Upon ISW1 or RRP6 deletion, LYS2 transcripts
were detected in the cytoplasm and were even more
abundant in npl3-1 isw1Δ rrp6Δ cells than in WT cells, indicating
a restored mRNA nuclear export (Figure 4D). This functional
cooperation between the ISW1 complex and the nuclear
exosome was accompanied by their molecular interaction (Figure 4E).
In marked contrast, no co-immunoprecipitation between
Isw2 and Rrp6 could be detected (Figure S4G). The
co-immunoprecipitation of Isw1 and Rrp6 was
RNase-insensitive (Figure S4F) and significantly enhanced in
npl3-1 cells compared to WT (Figure 4E). Similarly, the co-localization
of Isw1 and Rrp6 increased significantly in npl3-1
compared to WT (Figure S4H). This reflects delocalization of
Rrp6 (Figure S4I), possibly resulting from disassembly of the
nucleolus in mRNP export mutants (Thomsen et al., 2008).
Together, these results support cooperation between Isw1
and Rrp6 to guarantee proper nuclear mRNA surveillance.

Inactivation of ISW1 Rescues the Nuclear Export of a
Compromised Transcript
To reinforce this conclusion, we took advantage of a previously
described experimental model in which nuclear export of a specific
transcript is compromised, as opposed to general impairment
of the mRNA export pathway. The lys2-370 strain bears
two consecutive point mutations in the LYS2 gene that prevent
export of its mRNA to the cytoplasm, thereby affecting cell
growth in lysine-free medium (Figure 5A). Nuclear retention, as
well as the Rrp6-mediated nuclear degradation, of the lys2-370
transcripts has been clearly established (Das et al., 2006). Deletion
of RRP6 or ISW1 rescued growth of the lys2-370 mutant
cells in lysine-free medium, and deletion of both factors showed
a synergistic effect (Figures 5A and S5A). While ISW1 deletion
had no effect on the total lys2-370 transcript levels, it potentiated
the effect of RRP6 deletion (Figure 5B). Moreover, single or
combined deletions of ISW1 and RRP6 reduced the nuclear
accumulation of lys2-370 transcripts and increased their cytoplasmic
localization (Figures 5C and S5B). These results
strengthen the model whereby ISW1 tethers improper transcripts
to chromatin and cooperates with the nuclear exosome
to ensure an appropriate quality control. To exert this function,
chromatin-associated ISW1 is predicted to interact, at least
transiently, with nuclear mRNPs. We thus performed RNA
immunoprecipitation assays followed by qRT-PCR (RIP) in WT
or lys2-370 strains expressing an endogenously PrA-tagged
version of Isw1. As a control for the RIP, a TAP-tagged version
of the nuclear cap-binding protein Cbp20 was used as it readily
recovers mRNAs (Figure S5C).
Whereas the same weak but significant amount of ACT1 transcript
could be co-precipitated with Isw1 both in WT and the
lys2-370 strains, the association of Isw1 with the LYS2 transcript
was significantly increased in the lys2-370 strain compared to
WT (Figure 5D). Isw1 thus appears to preferentially interact with
export-deficient transcripts, likely contributing to their nuclear
retention.

ISW1 Interacts with mRNP
We then analyzed the mRNA-binding profile of Isw1 in WT and
npl3-1 cells expressing an endogenously PrA-tagged version
of Isw1 using the same approach. The mRNAs immunoprecipitated
with Isw1 were analyzed by RNA sequencing (RNA-seq)
to generate lists of transcripts that were statistically enriched
(RIP targets) or depleted in the immunoprecipitated samples relative
to input (Table S1). We identified 309 Isw1 RIP targets in WT
cells and 1,669 targets in npl3-1 cells, a group encompassing
95.1% of the transcripts that associated with Isw1 in WT cells
(Figure 6A). This increase in Isw1 association in npl3-1 cells is
consistent with a role of Isw1 in binding export-defective transcripts.
Of note, IOC2 and IOC3 transcripts were identified
among the enriched RNAs in the Isw1 immunoprecipitates. Interestingly,
many RNA-binding proteins were reported to associate
with their own transcripts (Hogan et al., 2008), a feature that can

Figure 4. Isw1 Participates in mRNP Surveillance

(A) ISW1 deletion displays synthetic negative genetic interaction with XRN1 and MLP1 MLP2 but is neutral toward RRP6.
(B) ISW1 and RRP6 deletions rescue the growth of the npl3-1 mutant and show additive effects.
(C) A catalytically inactive version of Rrp6 rescues the growth of npl3-1 cells at 30°C. For (C), serial dilutions of the npl3-1, npl3-1 rrp6Δ, and npl3-1 rrp6Δ isw1Δ
strains transformed with an empty pRS415 (−), with pRS415-RRP6 or pRS415-rrp6D238A.
(D) Deletions of ISW1 and RRP6 synergistically restore the cytoplasmic localization of the LYS2 transcript in npl3-1 cells grown overnight at 30°C as analyzed by
FISH using Quasar570-LYS2 probes. Notably, while npl3-1 cells are misshapen, inactivation of ISW1 or RRP6 partially restores their shape and npl3-1 isw1Δ
rrp6Δ cells are indistinguishable from WT. Scale bar, 5 μm. For each strain, the number of cells containing only one nuclear dot (black box), no signal (white), less
(gray), or more (hatched) than four cytoplasmic dots was quantified on at least 300 cells (n = 2, mean ± SD).
(E) Isw1 interacts with the Rrp6 and Rrp4 subunits of the nuclear exosome in WT cells and this interaction is fostered in npl3-1 cells shifted for 3 hr at 30°C.
Cell lysates (Input) and immunoprecipitates (IP) were analyzed by immunoblotting with anti-tags or anti-Mex67 antibodies. (−), HA, untagged strain. The ratio of
co-immunoprecipitated Isw1-13myc relative to immunoprecipitated Rrp4-3HA is shown (n = 3, mean ± SD).
See also Figure S4.

Figure 5. Inactivation of ISW1 Rescues the
Nuclear Export of the lys2-370 Transcript
(A) ISW1 and RRP6 deletions rescue the growth of
the lys2-370 mutant on DO-LYS and have additive
effects.
(B) Effect of ISW1 and RRP6 deletions on the total
level of the LYS2/lys2-370 transcript, analyzed
by qRT-PCR and normalized by the expression of
ACT1 transcript (n = 3, mean ± SD).
(C) FISH analysis of the subcellular localization of the
LYS2 transcript, using Quasar570-LYS2 probes in
strains grown overnight at 30°C in YPD. Scale bar,
5 μm. For each strain, the number of cells containing
only one nuclear dot (black box), less (white), or
more (hatched) than four cytoplasmic dots was
quantified on at least 300 cells (n = 2, mean ± SD).
(D) The lys2-370 transcript interacts with Isw1. RNA
immunoprecipitation experiments were performed
with PrA-tagged Isw1 in LYS2 and lys2-370 strains.
The ratio of co-immunoprecipitated ACT1 or LYS2
RNA relative to the total RNA present in each
strain quantified by qRT-PCR is represented (n = 3,
mean ± SD). Untagged WT cells were used as
negative (−) control. Every IP is significant (p < 0.01)
compared to the untagged strain.
See also Figure S5.
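
The quantity plotted in (D), the co-immunoprecipitated RNA relative to the total RNA in each strain, can be sketched as a simple ratio summarized over replicates. This is an illustrative sketch only: it assumes that each replicate provides one qRT-PCR-derived amount for the immunoprecipitate and one for the input, and that the spread is the sample standard deviation. The function names and numbers are hypothetical and do not reproduce the measurements of this study.

```python
from statistics import mean, stdev


def rip_ratio(ip_amount, input_amount):
    """Fraction of the input RNA recovered in the immunoprecipitate."""
    if input_amount <= 0:
        raise ValueError("input amount must be positive")
    return ip_amount / input_amount


def summarize_rip(replicates):
    """Return (mean, sample SD) of IP/input ratios.

    `replicates` is a list of (ip_amount, input_amount) pairs,
    one pair per biological replicate (n = 3 in the caption).
    """
    ratios = [rip_ratio(ip, total) for ip, total in replicates]
    return mean(ratios), stdev(ratios)


# Hypothetical arbitrary units for one transcript in one strain.
example = [(1.2, 100.0), (1.5, 100.0), (0.9, 100.0)]
avg, sd = summarize_rip(example)
print(f"IP/input = {avg:.4f} ± {sd:.4f}")
```

Comparing such ratios between a tagged strain and the untagged control is what allows an enrichment to be called significant.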
favor the co-translational assembly of multi-protein complexes
(Halbach et al., 2009).
We validated these genome-wide enrichments by performing
RIP followed by qRT-PCR analysis on selected transcripts. To
take two examples, IOC3 transcripts co-immunoprecipitated
with Isw1-Tap, while PMA1 transcripts were not significantly
enriched in the Isw1 immunoprecipitate. In contrast, both
IOC3 and PMA1 transcripts were co-immunoprecipitated with
Cbp20-Tap (Figure S6A). Most importantly, for all genes studied,
the interaction between Isw1 and its RIP targets was
increased significantly in the npl3-1 mutant compared to
WT (Figures 6B and S6B).
Interestingly, the genes encoding Isw1 RIP targets show lower
average levels of sense and antisense transcription in WT cells
when compared with all genes (Churchman and Weissman,
2011) (Figures 6C and S6C). They also have a more restricted
nucleosome-depleted region at the promoter
(Kaplan et al., 2009) (Figure 6D),
lower average levels of TBP binding (Venters
and Pugh, 2009) (Figure S6D), and
H3K4 acetylation (Guillemette et al., 2011)
(Figure S6E). Moreover, Isw1 RIP targets
in WT are enriched for much longer transcripts
when compared to the genome-wide
distribution (Figure 6E). Importantly,
these features are specific for transcripts
co-immunoprecipitated with Isw1, as they
were not observed for Pab1 RIP targets
(Costello et al., 2015) (not shown). Finally,
we did not observe any correlation between
Isw1/Ioc3-enriched genes and
Isw1 RIP targets, supporting distinct roles
for Isw1 in chromatin remodeling and RNA interactions (Smolle
et al., 2012) (Figures 6F and S6F).
To determine whether Isw1 could interact directly with
mRNA, we then performed in vivo UV cross-linking in strains expressing
endogenously C-terminal HTP(HIS6-TEV-PrA)-tagged
versions of Isw1 or Ioc2. After purification of the complexes,
we assessed the presence of cross-linked protein-RNA adducts
by radioactively labeling the nucleic acids associated with the
purified proteins and analyzing the pattern of radioactive species
(Granneman et al., 2009). Analysis of the WT strain expressing
Isw1-HTP revealed a radioactive band that migrated at the
expected molecular mass for Isw1-His6 and was absent from
the untagged control (Figure 6G). This signal increased upon
UV cross-linking and disappeared upon RNase treatment,
indicating that it represents Isw1 cross-linked to RNA. Similar results
were obtained when Ioc2-HTP strains were examined,

Figure 6. Isw1 Interacts with Nuclear mRNPs
(A) Venn diagram for transcripts statistically enriched in WT, npl3-1, and no tag strains.
(B) The interaction between Isw1 and its mRNA targets is increased in the npl3-1 mutant compared to WT. RNA immunoprecipitation experiments were performed
with PrA-tagged Isw1 in WT and npl3-1 cells grown for 3 hr at 30°C. The ratio of five co-immunoprecipitated mRNA targets identified in the genome-wide
analysis (IOC2, IOC3, INO80, MDN1, HAP1) relative to the total RNA present in each strain was quantified by qRT-PCR (n = 3, mean ± SD). One representative,
non-normalized experiment is shown in Figure S6C.
(C, D, and F) Average levels of sense transcript (C), nucleosome occupancy (D), and Isw1 (F) at those genes whose transcripts are statistically enriched in an
Isw1-PrA immunoprecipitate. Shown are the classes of genes enriched in WT (blue) and in npl3-1 (red) strains, compared to all protein-coding, non-dubious
genes in the yeast genome (green).
(E) Histograms showing length distributions of transcripts enriched in WT (blue) and in npl3-1 (red) strains, compared to all protein-coding transcripts in the yeast
genome (green). See also Figure S6 and Table S1.
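
The overlap summarized by the Venn diagram in (A) amounts to a set comparison. The sketch below shows how such a percentage could be computed from two lists of enriched transcripts, assuming each list is a set of gene identifiers (for instance taken from a table such as Table S1). The function name and the toy identifiers are hypothetical. With 309 WT targets, a reported overlap of 95.1% would correspond to roughly 294 shared transcripts.

```python
def overlap_fraction(reference_targets, other_targets):
    """Fraction of `reference_targets` that is also in `other_targets`."""
    reference = set(reference_targets)
    if not reference:
        raise ValueError("reference target list is empty")
    shared = reference & set(other_targets)
    return len(shared) / len(reference)


# Toy example with placeholder identifiers (not real RIP targets).
wt_targets = {"geneA", "geneB", "geneC", "geneD"}
mutant_targets = {"geneA", "geneB", "geneC", "geneE", "geneF"}
fraction = overlap_fraction(wt_targets, mutant_targets)
print(f"{fraction:.1%} of WT targets are also mutant targets")
```

The fraction is deliberately asymmetric: it is expressed relative to the WT list, as in the text, rather than relative to the union of both lists.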

whereas no RNA could be cross-linked to Rpb3 (Figure S6G),
indicating that Isw1 and Ioc2 interact specifically and directly
with RNA.
Together, these data indicate that ISW1 directly interacts
with mRNAs and that Isw1 RIP targets are weakly transcribed
and long. This length bias is consistent with the hypothesis of
an increased residency time of Isw1 RIP targets in the vicinity
of their transcription site that could favor their direct interaction
with the ISW1 complex.

DISCUSSION

Here, we report that the chromatin remodeling complex ISW1 is
a crucial actor in nuclear mRNP biogenesis surveillance that
directly interacts with mRNAs in vivo, tethering abnormal mRNPs
close to their site of transcription and preventing their untimely
export to the cytoplasm. To date, this tethering activity was
only attributed to Rrp6, although it was unclear how the defective
mRNP remained attached to chromatin. We propose that
chromatin-localized ISW1 guarantees this connection, acting as a
molecular hook for inappropriate or premature mRNPs.
ISW1 inactivation was able to rescue mRNA export mutants
deficient for bona fide export factors that bind mRNAs
throughout their entire length but not mutants for factors involved
in polyadenylation or in the 3′ end processing of mRNAs (Baejen
et al., 2014; Tuck and Tollervey, 2013). This suggests that
the interaction between ISW1 and transcripts produced in these
mutants could be favored by their overall impaired packaging.
Mutants of TREX-2, which influence transcription and mRNA
export (Fischer et al., 2002), were not rescued upon inactivation
of ISW1. It is conceivable that the nuclear retention of mRNPs
generated in biogenesis/export mutants that are not rescued
by ISW1 inactivation is favored by interactions with alternate
chromatin-binding factors.
Proteins involved in chromatin regulation were previously
recognized to have the ability to bind RNA (Muchardt et al.,
2002), but only rare examples provided a functional role for
such interactions (Keller et al., 2012). In this regard, we found
that chromatin-bound ISW1 was able to support its surveillance
activity. We can speculate that the affinity of chromatin-bound
ISW1 for mRNA is higher than that of free ISW1. In support of
this hypothesis, two H3K9me binding chromodomain proteins
of Schizosaccharomyces pombe were recently reported to
exhibit stronger RNA binding activity in their chromatin-bound
form than in their free form (Ishida et al., 2012). Hence, newly
synthesized export-incompetent mRNPs or premature mRNPs
that are still in the vicinity of chromatin would be more likely
to associate with chromatin-bound ISW1 than with a free pool of
ISW1. In agreement, a small but significant fraction of transcripts
already interact with ISW1 in WT conditions. These Isw1 RIP
targets are enriched in long and weakly expressed mRNAs,
which might limit their ability to mature at the same rate as
non-Isw1 targets. In mRNP biogenesis mutants, the pool of
ISW1 targets as well as the interaction of Isw1 with the nuclear
exosome increases. The catalytic activity of Rrp6 combined with
the tethering action of ISW1 may therefore concurrently ensure
an accurate quality control of nuclear mRNP biogenesis. Taken
together, these data reinforce the emerging concept that chromatin
factors can regulate the progression of mRNPs from synthesis
to nuclear exit.
An intriguing aspect of our results is that the release of defective
or premature mRNPs followed by their export to the cytoplasm
is beneficial to cells, indicating that their mere nuclear
retention, as opposed to an intrinsic deficiency, impairs cell
fitness. Consistently, deletion of RRP6 rescues the biogenesis,
nuclear mRNA accumulation, and growth defects of rna14.1, a
mutant of the cleavage and polyadenylation factor I (Torchet
et al., 2002). This is also reminiscent of the CFTRΔF508 variant
of CFTR, a well-known substrate of protein quality control
responsible for cystic fibrosis. Protein QC retains CFTRΔF508
in the endoplasmic reticulum, preventing its plasma membrane
localization and thus proper cellular function. However,
restoring the correct localization of the mutant protein is sufficient
to partially restore cell fitness (Denning et al., 1992).
Hence, although the beneficial consequences of inactivating a
QC pathway might a priori seem paradoxical, it is not an exception.
Interestingly, inactivation of ISW1 was reported to increase
the number of cytoplasmic stress granules (Buchan
et al., 2013). Therefore, although the immature mRNPs that
are exported upon ISW1 inactivation improve the phenotypes
of biogenesis mutants, they are also likely to dramatically
increase the cytoplasmic stress granule content. As a
result, the fitness of these mutants might rely on a subtle balance
between the beneficial and detrimental consequences
of ISW1 loss.
Even though this novel function of ISW1 was evidenced in
the context of mRNP biogenesis mutants, it probably also operates
in WT cells, not only on the identified ISW1 mRNA targets
but also on an array of transcripts that may become targets
upon certain environmental conditions. In this respect, the surveillance
activity of ISW1 might be regulated, most likely by a
modulation of its chromatin or RNA association. Recent studies
exemplified that nuclear retention of mRNA is a widespread
strategy used by mammalian cells to buffer transcriptional noise
(Bahar Halpern et al., 2015; Battich et al., 2015). Our results open
the possibility that direct interactions of mRNA with ISW1
homologs or other chromatin-localized proteins might represent
a general means used by cells to achieve this nuclear retention.

STAR★METHODS

Detailed methods are provided in the online version of this paper
and include the following:

• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCES SHARING
(G) Isw1 UV cross-links to RNA in vivo. HTP-tagged Isw1 was cross-linked (+) or not (−) and purified from cell extracts. A total of 2.5% of the nickel eluate was
resolved by SDS-PAGE after (lanes 5–6) or not (lanes 3–4) RNase treatment and detected by autoradiography (upper panel) or anti-HIS western blot (lower panel).
An untagged strain was used as a control (lanes 1–2). The red asterisk indicates a contaminant band.
See also Figure S6.

• EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ◦ Yeast Strains and Growth
  ◦ Plasmids
• METHOD DETAILS
  ◦ Fluorescence In Situ Hybridization
  ◦ Fluorescence Microscopy
  ◦ Co-immunoprecipitation Experiments
  ◦ Chromatin Immunoprecipitation
  ◦ RNA Immunoprecipitation, RTqPCR, and Deep Sequencing
  ◦ In Vivo Crosslinking Assay
  ◦ Next-Generation Sequence Analysis
  ◦ Poly(A) Tail Length Analysis
• QUANTIFICATION AND STATISTICAL ANALYSIS
• DATA AND SOFTWARE AVAILABILITY

SUPPLEMENTAL INFORMATION

Supplemental Information includes six figures and one table and can be found
with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.048.

AUTHOR CONTRIBUTIONS

Conceptualization, A.B. and C.D.; Investigation, A.B., J.D.-S., Q.S., S.M., A.G.,
D.C., M.F., and B.P.; Software, S.M.; Writing – Original Draft, A.B. and C.D.;
Writing – Review & Editing, A.B., A.C., B.P., C.D., D.L., J.M., and S.M.; Resources,
A.B., A.C., B.P., C.D., D.L., and J.M.; Supervision, A.B., A.C., B.P.,
D.L., C.D., and J.M.

ACKNOWLEDGMENTS

We thank T. Tsukiyama, B. Das, E. Fabre, P. Hieter, V. Géli, A. Aguilera, R.
Rothstein, T.H. Jensen, and F. Stutz for reagents and strains, A. Silvain for
technical help and C. Antoniewski for advice, members of the C.D. laboratory,
and V. Géli and J. Weitzman for helpful discussions. This work was supported
by the Who am I? laboratory of excellence (ANR-11-LABX-0071) funded by the
‘‘Investments for the Future’’ program operated by The French National
Research Agency (ANR-11-IDEX-0005-01, ANR 2010-BLAN-1227-01), the
Association de Recherche contre le Cancer, and the Ligue Nationale contre
le Cancer. J.D.-S. was supported by the University Paris Diderot and Q.S.
by the Fondation pour la Recherche Médicale.

Received: June 2, 2016
Revised: September 12, 2016
Accepted: October 27, 2016
Published: November 17, 2016

REFERENCES

Apponi, L.H., Leung, S.W., Williams, K.R., Valentini, S.R., Corbett, A.H., and Pavlath, G.K. (2010). Loss of nuclear poly(A)-binding protein 1 causes defects in myogenesis and mRNA biogenesis. Hum. Mol. Genet. 19, 1058–1065.
Babour, A., Dargemont, C., and Stutz, F. (2012). Ubiquitin and assembly of export competent mRNP. Biochim. Biophys. Acta 1819, 521–530.
Baejen, C., Torkler, P., Gressel, S., Essig, K., Söding, J., and Cramer, P. (2014). Transcriptome maps of mRNP biogenesis factors define pre-mRNA recognition. Mol. Cell 55, 745–757.
Bahar Halpern, K., Caspi, I., Lemze, D., Levy, M., Landen, S., Elinav, E., Ulitsky, I., and Itzkovitz, S. (2015). Nuclear Retention of mRNA in Mammalian Tissues. Cell Rep. 13, 2653–2662.
Battich, N., Stoeger, T., and Pelkmans, L. (2015). Control of transcript variability in single mammalian cells. Cell 163, 1596–1610.
Becker, P.B., and Workman, J.L. (2013). Nucleosome remodeling and epigenetics. Cold Spring Harb. Perspect. Biol. 5, a017905.
Bentley, D.L. (2014). Coupling mRNA processing with transcription in time and space. Nat. Rev. Genet. 15, 163–175.
Bohnsack, M.T., Tollervey, D., and Granneman, S. (2012). Identification of RNA helicase target sites by UV cross-linking and analysis of cDNA. Methods Enzymol. 511, 275–288.
Buchan, J.R., Kolaitis, R.M., Taylor, J.P., and Parker, R. (2013). Eukaryotic stress granules are cleared by autophagy and Cdc48/VCP function. Cell 153, 1461–1474.
Burkard, K.T., and Butler, J.S. (2000). A nuclear 3′-5′ exonuclease involved in mRNA degradation interacts with Poly(A) polymerase and the hnRNA protein Npl3p. Mol. Cell. Biol. 20, 604–616.
Churchman, L.S., and Weissman, J.S. (2011). Nascent transcript sequencing visualizes transcription at nucleotide resolution. Nature 469, 368–373.
Clapier, C.R., and Cairns, B.R. (2012). Regulation of ISWI involves inhibitory modules antagonized by nucleosomal epitopes. Nature 492, 280–284.
Costello, J., Castelli, L.M., Rowe, W., Kershaw, C.J., Talavera, D., Mohammad-Qureshi, S.S., Sims, P.F., Grant, C.M., Pavitt, G.D., Hubbard, S.J., and Ashe, M.P. (2015). Global mRNA selection mechanisms for translation initiation. Genome Biol. 16, 10.
Das, B., Das, S., and Sherman, F. (2006). Mutant LYS2 mRNAs retained and degraded in the nucleus of Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 103, 10871–10876.
Denning, G.M., Anderson, M.P., Amara, J.F., Marshall, J., Smith, A.E., and Welsh, M.J. (1992). Processing of mutant cystic fibrosis transmembrane conductance regulator is temperature-sensitive. Nature 358, 761–764.
Dziembowski, A., Lorentzen, E., Conti, E., and Séraphin, B. (2007). A single subunit, Dis3, is essentially responsible for yeast exosome core activity. Nat. Struct. Mol. Biol. 14, 15–22.
Fazzio, T.G., and Tsukiyama, T. (2003). Chromatin remodeling in vivo: evidence for a nucleosome sliding mechanism. Mol. Cell 12, 1333–1340.
Fischer, T., Strässer, K., Rácz, A., Rodriguez-Navarro, S., Oppizzi, M., Ihrig, P., Lechner, J., and Hurt, E. (2002). The mRNA export machinery requires the novel Sac3p-Thp1p complex to dock at the nucleoplasmic entrance of the nuclear pores. EMBO J. 21, 5843–5852.
Galy, V., Gadal, O., Fromont-Racine, M., Romano, A., Jacquier, A., and Nehrbass, U. (2004). Nuclear retention of unspliced mRNAs in yeast is mediated by perinuclear Mlp1. Cell 116, 63–73.
Gkikopoulos, T., Schofield, P., Singh, V., Pinskaya, M., Mellor, J., Smolle, M., Workman, J.L., Barton, G.J., and Owen-Hughes, T. (2011). A role for Snf2-related nucleosome-spacing enzymes in genome-wide nucleosome organization. Science 333, 1758–1760.
Granneman, S., Kudla, G., Petfalski, E., and Tollervey, D. (2009). Identification of protein binding sites on U3 snoRNA and pre-rRNA by UV cross-linking and high-throughput analysis of cDNAs. Proc. Natl. Acad. Sci. USA 106, 9613–9618.
Guillemette, B., Drogaris, P., Lin, H.H., Armstrong, H., Hiragami-Hamada, K., Imhof, A., Bonneil, E., Thibault, P., Verreault, A., and Festenstein, R.J. (2011). H3 lysine 4 is acetylated at active gene promoters and is regulated by H3 lysine 4 methylation. PLoS Genet. 7, e1001354.
Gunderson, F.Q., Merkhofer, E.C., and Johnson, T.L. (2011). Dynamic histone acetylation is critical for cotranscriptional spliceosome assembly and spliceosomal rearrangements. Proc. Natl. Acad. Sci. USA 108, 2004–2009.
Gwizdek, C., Iglesias, N., Rodriguez, M.S., Ossareh-Nazari, B., Hobeika, M., Divita, G., Stutz, F., and Dargemont, C. (2006). Ubiquitin-associated domain of Mex67 synchronizes recruitment of the mRNA export machinery with transcription. Proc. Natl. Acad. Sci. USA 103, 16376–16381.
Halbach, A., Zhang, H., Wengi, A., Jablonska, Z., Gruber, I.M., Halbeisen, R.E., Dehé, P.M., Kemmeren, P., Holstege, F., Géli, V., et al. (2009). Cotranslational assembly of the yeast SET1C histone methyltransferase complex. EMBO J. 28, 2959–2970.

Hautbergue, G.M., Hung, M.L., Walsh, M.J., Snijders, A.P., Chang, C.T., Mitchell, P., Petfalski, E., Shevchenko, A., Mann, M., and Tollervey, D. (1997).
Jones, R., Ponting, C.P., Dickman, M.J., and Wilson, S.A. (2009). UIF, a New The exosome: a conserved eukaryotic RNA processing complex containing
mRNA export adaptor that works together with REF/ALY, requires FACT for multiple 3′→5′ exoribonucleases. Cell 91, 457–466.
recruitment to mRNA. Curr. Biol. 19, 1918–1924. Morillon, A., Karabetsou, N., O’Sullivan, J., Kent, N., Proudfoot, N., and Mellor,
Herissant, L., Moehle, E.A., Bertaccini, D., Van Dorsselaer, A., Schaeffer-Re- J. (2003). Isw1 chromatin remodeling ATPase coordinates transcription elon-
iss, C., Guthrie, C., and Dargemont, C. (2014). H2B ubiquitylation modulates gation and termination by RNA polymerase II. Cell 115, 425–435.
spliceosome assembly and function in budding yeast. Biol. Cell 106, 126–138. Muchardt, C., Guilleme, M., Seeler, J.S., Trouche, D., Dejean, A., and Yaniv, M.
Hilleren, P., McCarthy, T., Rosbash, M., Parker, R., and Jensen, T.H. (2001). (2002). Coordinated methyl and RNA binding is required for heterochromatin
Quality control of mRNA 3′-end processing is linked to the nuclear exosome. localization of mammalian HP1alpha. EMBO Rep. 3, 975–981.
Nature 413, 538–542. Murray, S.C., Serra Barros, A., Brown, D.A., Dudek, P., Ayling, J., and Mellor, J.
Hogan, D.J., Riordan, D.P., Gerber, A.P., Herschlag, D., and Brown, P.O. (2012). A pre-initiation complex at the 3′-end of genes drives antisense tran-
(2008). Diverse RNA-binding proteins interact with functionally related sets scription independent of divergent sense transcription. Nucleic Acids Res.
of RNAs, suggesting an extensive regulatory system. PLoS Biol. 6, e255. 40, 2432–2444.
Nino, C.A., Herissant, L., Babour, A., and Dargemont, C. (2013). mRNA nuclear
Iglesias, N., Tutucci, E., Gwizdek, C., Vinciguerra, P., Von Dach, E., Corbett,
export in yeast. Chem Rev. 113, 8523–8545.
A.H., Dargemont, C., and Stutz, F. (2010). Ubiquitin-mediated mRNP dy-
namics and surveillance prior to budding yeast mRNA export. Genes Dev. Oeffinger, M., Wei, K.E., Rogers, R., DeGrasse, J.A., Chait, B.T., Aitchison,
24, 1927–1938. J.D., and Rout, M.P. (2007). Comprehensive analysis of diverse ribonucleopro-
tein complexes. Nat. Methods 4, 951–956.
Ishida, M., Shimojo, H., Hayashi, A., Kawaguchi, R., Ohtani, Y., Uegaki, K.,
Nishimura, Y., and Nakayama, J. (2012). Intrinsic nucleic acid-binding activity Palancade, B., Zuccolo, M., Loeillet, S., Nicolas, A., and Doye, V. (2005).
of Chp1 chromodomain is required for heterochromatic gene silencing. Mol. Pml39, a novel protein of the nuclear periphery required for nuclear retention
Cell 47, 228–241. of improper messenger ribonucleoparticles. Mol. Biol. Cell 16, 5258–5268.
Jensen, T.H., Dower, K., Libri, D., and Rosbash, M. (2003). Early formation of Pinskaya, M., Nair, A., Clynes, D., Morillon, A., and Mellor, J. (2009). Nucleo-
mRNP: license for export or quality control? Mol. Cell 11, 1129–1138. some remodeling and transcriptional repression are distinct functions of
Isw1 in Saccharomyces cerevisiae. Mol. Cell. Biol. 29, 2419–2430.
Jensen, T.H., Boulay, J., Olesen, J.R., Colin, J., Weyler, M., and Libri, D. (2004).
Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bio-
Modulation of transcription affects mRNP quality. Mol. Cell 16, 235–244.
conductor package for differential expression analysis of digital gene expres-
Kallehauge, T.B., Robert, M.C., Bertrand, E., and Jensen, T.H. (2012). Nuclear sion data. Bioinformatics 26, 139–140.
retention prevents premature cytoplasmic appearance of mRNA. Mol. Cell 48,
Saint-André, V., Batsché, E., Rachez, C., and Muchardt, C. (2011). Histone H3
145–152.
lysine 9 trimethylation and HP1γ favor inclusion of alternative exons. Nat.
Kaplan, N., Moore, I.K., Fondufe-Mittendorf, Y., Gossett, A.J., Tillo, D., Field, Struct. Mol. Biol. 18, 337–344.
Y., LeProust, E.M., Hughes, T.R., Lieb, J.D., Widom, J., and Segal, E. (2009). Santos-Pereira, J.M., and Aguilera, A. (2015). R loops: new modulators of
The DNA-encoded nucleosome organization of a eukaryotic genome. Nature genome dynamics and function. Nat. Rev. Genet. 16, 583–597.
458, 362–366.
Santos-Pereira, J.M., Herrero, A.B., Garcı́a-Rubio, M.L., Marı́n, A., Moreno, S.,
Keller, C., Adaixo, R., Stunnenberg, R., Woolcock, K.J., Hiller, S., and Bühler, and Aguilera, A. (2013). The Npl3 hnRNP prevents R-loop-mediated transcrip-
M. (2012). HP1(Swi6) mediates the recognition and destruction of heterochro- tion-replication conflicts and genome instability. Genes Dev. 27, 2445–2458.
matic RNA transcripts. Mol. Cell 47, 215–227.
Santos-Rosa, H., Schneider, R., Bernstein, B.E., Karabetsou, N., Morillon, A.,
Krajewski, W.A. (2013). Comparison of the Isw1a, Isw1b, and Isw2 nucleo- Weise, C., Schreiber, S.L., Mellor, J., and Kouzarides, T. (2003). Methylation of
some disrupting activities. Biochemistry 52, 6940–6949. histone H3 K4 mediates association of the Isw1p ATPase with chromatin. Mol.
LaCava, J., Houseley, J., Saveanu, C., Petfalski, E., Thompson, E., Jacquier, Cell 12, 1325–1332.
A., and Tollervey, D. (2005). RNA degradation by the exosome is promoted Sims, R.J., 3rd, Millhouse, S., Chen, C.F., Lewis, B.A., Erdjument-Bromage,
by a nuclear polyadenylation complex. Cell 121, 713–724. H., Tempst, P., Manley, J.L., and Reinberg, D. (2007). Recognition of trimethy-
lated histone H3 lysine 4 facilitates the recruitment of transcription postinitia-
Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and
memory-efficient alignment of short DNA sequences to the human genome. tion factors and pre-mRNA splicing. Mol. Cell 28, 665–676.
Genome Biol. 10, R25. Smolle, M., Venkatesh, S., Gogol, M.M., Li, H., Zhang, Y., Florens, L., Wash-
burn, M.P., and Workman, J.L. (2012). Chromatin remodelers Isw1 and
Lenstra, T.L., Benschop, J.J., Kim, T., Schulze, J.M., Brabers, N.A., Margaritis,
Chd1 maintain chromatin structure during transcription by preventing histone
T., van de Pasch, L.A., van Heesch, S.A., Brok, M.O., Groot Koerkamp, M.J.,
exchange. Nat. Struct. Mol. Biol. 19, 884–892.
et al. (2011). The specificity and topology of chromatin interaction pathways in
yeast. Mol. Cell 42, 536–549. Swygert, S.G., and Peterson, C.L. (2014). Chromatin dynamics: interplay
between remodeling enzymes and histone modifications. Biochim. Biophys.
Lisby, M., Rothstein, R., and Mortensen, U.H. (2001). Rad52 forms DNA repair
Acta 1839, 728–736.
and recombination centers during S phase. Proc. Natl. Acad. Sci. USA 98,
8276–8282. Thomsen, R., Saguez, C., Nasser, T., and Jensen, T.H. (2008). General, rapid,
and transcription-dependent fragmentation of nucleolar antigens in S. cerevi-
Lomvardas, S., and Thanos, D. (2001). Nucleosome sliding via TBP DNA bind- siae mRNA export mutants. RNA 14, 706–716.
ing in vivo. Cell 106, 685–696.
Tirosh, I., Sigal, N., and Barkai, N. (2010). Widespread remodeling of mid-cod-
Longtine, M.S., McKenzie, A., 3rd, Demarini, D.J., Shah, N.G., Wach, A., Bra- ing sequence nucleosomes by Isw1. Genome Biol. 11, R49.
chat, A., Philippsen, P., and Pringle, J.R. (1998). Additional modules for
Toiber, D., Erdel, F., Bouazoune, K., Silberman, D.M., Zhong, L., Mulligan, P.,
versatile and economical PCR-based gene deletion and modification in
Sebastian, C., Cosentino, C., Martinez-Pastor, B., Giacosa, S., et al. (2013).
Saccharomyces cerevisiae. Yeast 14, 953–961.
SIRT6 recruits SNF2H to DNA break sites, preventing genomic instability
Luco, R.F., Allo, M., Schor, I.E., Kornblihtt, A.R., and Misteli, T. (2011). Epige- through chromatin remodeling. Mol. Cell 51, 454–468.
netics in alternative pre-mRNA splicing. Cell 144, 16–26.
Torchet, C., Bousquet-Antonelli, C., Milligan, L., Thompson, E., Kufel, J., and
Mellor, J., and Morillon, A. (2004). ISWI complexes in Saccharomyces cerevi- Tollervey, D. (2002). Processing of 30 -extended read-through transcripts by the
siae. Biochim. Biophys. Acta 1677, 100–112. exosome can generate functional mRNAs. Mol. Cell 9, 1285–1296.
Tsukiyama, T., Palmer, J., Landel, C.C., Shiloach, J., and Wu, C. (1999). Characterization of the imitation switch subfamily of ATP-dependent chromatin-remodeling factors in Saccharomyces cerevisiae. Genes Dev. 13, 686–697.
Tuck, A.C., and Tollervey, D. (2013). A transcriptome-wide atlas of RNP composition reveals diverse classes of mRNAs and lncRNAs. Cell 154, 996–1009.
Tutucci, E., and Stutz, F. (2011). Keeping mRNPs in check during assembly and nuclear export. Nat. Rev. Mol. Cell Biol. 12, 377–384.
Vary, J.C., Jr., Gangaraju, V.K., Qin, J., Landel, C.C., Kooperberg, C., Bartholomew, B., and Tsukiyama, T. (2003). Yeast Isw1p forms two separable complexes in vivo. Mol. Cell. Biol. 23, 80–91.
Venters, B.J., and Pugh, B.F. (2009). A canonical promoter organization of the transcription machinery and its regulators in the Saccharomyces genome. Genome Res. 19, 360–371.
Vinciguerra, P., Iglesias, N., Camblong, J., Zenklusen, D., and Stutz, F. (2005). Perinuclear Mlp proteins downregulate gene expression in response to a defect in mRNA export. EMBO J. 24, 813–823.
Vitaliano-Prunier, A., Babour, A., Hérissant, L., Apponi, L., Margaritis, T., Holstege, F.C., Corbett, A.H., Gwizdek, C., and Dargemont, C. (2012). H2B ubiquitylation controls the formation of export-competent mRNP. Mol. Cell 45, 132–139.
Yen, K., Vinayachandran, V., Batta, K., Koerber, R.T., and Pugh, B.F. (2012). Genome-wide nucleosome specificity and directionality of chromatin remodelers. Cell 149, 1461–1473.
Yoh, S.M., Cho, H., Pickle, L., Evans, R.M., and Jones, K.A. (2007). The Spt6 SH2 domain binds Ser2-P RNAPII to direct Iws1-dependent mRNA splicing and export. Genes Dev. 21, 160–174.
Zenklusen, D., and Singer, R.H. (2010). Analyzing mRNA expression using single mRNA resolution fluorescent in situ hybridization. Methods Enzymol. 470, 641–659.
STAR★METHODS
KEY RESOURCES TABLE
REAGENT or RESOURCE | SOURCE | IDENTIFIER

Antibodies
Anti-CTD (8WG16) | BioLegend | Cat# 920102, RRID: AB_2565318
Anti-HA.11 (16B12) | BioLegend | Cat# 901503, RRID: AB_2565005
Anti-Myc (9E10) | Roche | Cat# 11667149001, RRID: AB_390912
Anti FLAG-M2 | SIGMA-ALDRICH | Cat# F3165, RRID: AB_259529
Anti Mex67 | Dargemont laboratory | N/A
Anti Rrp6 | T. H. Jensen | N/A
Peroxidase AffiniPure Goat Anti-Mouse IgG (H+L) | Jackson ImmunoResearch | Cat# 115-035-146, RRID: SCR_010488
Peroxidase AffiniPure Goat Anti-Rabbit IgG (H+L) | Jackson ImmunoResearch | Cat# 111-035-144, RRID: SCR_010488
EZview Red anti-HA beads | SIGMA-ALDRICH | Cat# E6779, RRID: AB_10109562

Chemicals, Peptides, and Recombinant Proteins
Mycophenolic acid | SIGMA-ALDRICH | M5255
Phenanthroline | SIGMA-ALDRICH | 131377
6-Azauracil | SIGMA-ALDRICH | A1757
Hoechst 33258 | SIGMA-ALDRICH | 861405
cOmplete, EDTA-free Protease Inhibitor Cocktail | Roche | 05056489001
5-fluoroorotic acid (5-FoA) | EUROMEDEX | 1555-A
Dynabeads Protein G | Novex Life Technologies | 10004D
Protein G Sepharose 4 Fast Flow | GE Healthcare Life Sciences | 17-0618-01
SuperScript II Reverse Transcriptase | Invitrogen | 18064014

Critical Commercial Assays
LightCycler 480 SYBR Green I Master | Roche | 04887352001
TruSeq Stranded mRNA Library Prep Kit | Illumina | RS-122-2103

Deposited Data
Raw Pab1 RIP seq | (Costello et al., 2015) | ArrayExpress: E-MTAB-2464

Experimental Models: Organisms/Strains
S. cerevisiae: Mat a, ura3-52, leu2-1 | A. Corbett | WT (S288C)
S. cerevisiae: Mat a, ura3-52, leu2-1, trp1-1, npl3-1 | A. Corbett | npl3-1
S. cerevisiae: Mat a, ura3-52, leu2-1, isw1::KanMx | This study | npl3-1 isw1Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, isw2::KanMx | This study | npl3-1 isw2Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, chd1::KanMx | This study | npl3-1 chd1Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, ino80::KanMx | This study | npl3-1 ino80Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, rsc2::KanMx | This study | npl3-1 rsc2Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, snf2::KanMx | This study | npl3-1 snf2Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, ioc2::KanMx | This study | npl3-1 ioc2Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, ioc3::Hph | This study | npl3-1 ioc3Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, ioc4::KanMx | This study | npl3-1 ioc4Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, isw1::KanMx, ioc2::Hph | This study | npl3-1 isw1Δ ioc2Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, isw1::KanMx, ioc3::Hph | This study | npl3-1 isw1Δ ioc3Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, mft1::Hph | This study | mft1Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, mft1::Hph, isw1::KanMx | This study | mft1Δ isw1Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, mft1::HPH, ioc2::KanMx | This study | mft1Δ ioc2Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, thp1::KanMX | This study | thp1Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, thp1::KanMX, isw1::Hph | This study | thp1Δ isw1Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, thp1::KanMX, ioc2::Hph | This study | thp1Δ ioc2Δ
S. cerevisiae: Mata, ura3Δ0, leu2Δ0, lys2Δ0, rna15-58::KanMx | P. Hieter | rna15-58
S. cerevisiae: Mata, ura3Δ0, leu2Δ0, lys2Δ0, rna15-58::KanMx, isw1::Hph | This study | rna15-58 isw1Δ
S. cerevisiae: Mata, ura3Δ0, leu2Δ0, lys2Δ0, rna15-58::KanMx, ioc2::Hph | This study | rna15-58 ioc2Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, ref2::KanMx | This study | ref2Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, ref2::KanMx, isw1::Hph | This study | ref2Δ isw1Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, ref2::KanMx, ioc2::Hph | This study | ref2Δ ioc2Δ
S. cerevisiae: Mata, pap1-1, ade2, his3, leu2, trp1, ura3 (W303) | T. H. Jensen | pap1-1
S. cerevisiae: Mata, pap1-1, ade2, his3, leu2, trp1, ura3, isw1::KanMx | This study | pap1-1 isw1Δ
S. cerevisiae: Mat a ade2, his3, leu2, trp1, ura3, fip1::LEU2 (CEN/TRP fip1-206) | T. H. Jensen | fip1-206
S. cerevisiae: Mat a ade2, his3, leu2, trp1, ura3, fip1::LEU2 (CEN/TRP fip1-206) isw1::KanMx | This study | fip1-206 isw1Δ
S. cerevisiae: MATa ura3Δ0 leu2Δ0 his3Δ0 sen1-1::KanMX | P. Hieter | sen1-1
S. cerevisiae: MATa ura3Δ0 leu2Δ0 his3Δ0 sen1-1::KanMX, isw1::Hph | This study | sen1-1 isw1Δ
S. cerevisiae: MATa ura3Δ0 leu2Δ0 his3Δ0 sen1-1::KanMX, ioc2::Hph | This study | sen1-1 ioc2Δ
S. cerevisiae: Mat a, leu2Δ1, his3Δ200, ura3-52, rat7-1 | C. Cole | nup159-1
S. cerevisiae: Mat a, leu2Δ1, his3Δ200, ura3-52, rat7-1, isw1::KanMx | This study | nup159-1 isw1Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, mex67::HIS3 (pRS316-URA3–MEX67) | E. Hurt | MEX67 shuffle
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, mex67::HIS3 (pRS316-URA3–MEX67) isw1::KanMx | This study | MEX67 shuffle, isw1Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, mex67::HIS3 (pRS316-URA3–MEX67), ioc2::KanMx | This study | MEX67 shuffle, ioc2Δ
(pRS316-URA3–MEX67) ioc3::KanMx
(pRS316-URA3–MEX67), ioc4::KanMx
S. cerevisiae: MATa, ade2, leu2, his3, ura3, yra1::HIS3, <pURA3-YRA1> | F. Stutz | YRA1 shuffle
S. cerevisiae: MATa, ade2, leu2, his3, ura3, yra1::HIS3, <pURA3-YRA1>, isw1::KanMx | This study | YRA1 shuffle, isw1Δ
S. cerevisiae: MATa, ade2, leu2, his3, ura3, yra1::HIS3, <pURA3-YRA1>, ioc2::KanMx | This study | YRA1 shuffle, ioc2Δ
S. cerevisiae: MATa, leu2, his3, trp1, ura3, nab2::HIS3, <pURA3-NAB2> | A. Corbett | NAB2 shuffle
S. cerevisiae: MATa, leu2, his3, trp1, ura3, nab2::HIS3, <pURA3-NAB2>, isw1::KanMx | This study | NAB2 shuffle, isw1Δ
S. cerevisiae: MATa, leu2, his3, trp1, ura3, nab2::HIS3, <pURA3-NAB2>, ioc2::KanMx | This study | NAB2 shuffle, ioc2Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, dst1::KanMx | This study | dst1Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, npl3-1::TRP, dst1::KanMx | This study | npl3-1 dst1Δ
S. cerevisiae: Mat a, ura3-52::pRS306, leu2-1, npl3-1::TRP | This study | npl3-1 URA
S. cerevisiae: Mat a, ura3-52::pRS306, leu2-1, isw1::KanMx | This study | npl3-1 isw1Δ URA
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, xrn1::Hph | This study | xrn1Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, xrn1::Hph, isw1::KanMx | This study | xrn1Δ isw1Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, mlp1::NatMx, mlp2::Hph | This study | mlp1Δ mlp2Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, mlp1::NatMx, mlp2::Hph, isw1::KanMx | This study | mlp1Δ mlp2Δ isw1Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, rrp6::Hph | This study | rrp6Δ
S. cerevisiae: Mat a, ade2, his3, leu2, trp1, ura3, rrp6::Hph, isw1::KanMx | This study | rrp6Δ isw1Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, npl3-1::TRP, rrp6::Hph | This study | npl3-1 rrp6Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, npl3-1::TRP, rrp6::Hph, isw1::KanMx | This study | npl3-1 isw1Δ rrp6Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, trf4::KanMx | This study | trf4Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, trf4::Hph, isw1::KanMx | This study | trf4Δ isw1Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, npl3-1::TRP, trf4::KanMx | This study | npl3-1 trf4Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, npl3-1::TRP, trf4::Hph, isw1::KanMx | This study | npl3-1 trf4Δ isw1Δ
S. cerevisiae: Mata, cyc1-512, lys2-187, leu2-1, his4-166, ura3-52, met8-1 | J. Sherman | lys2-370
S. cerevisiae: Mata, cyc1-512, lys2-187, leu2-1, his4-166, ura3-52, met8-1, rrp6::URA | J. Sherman | lys2-370 rrp6Δ
S. cerevisiae: Mata, cyc1-512, lys2-187, leu2-1, his4-166, ura3-52, met8-1, isw1::KanMx | This study | lys2-370 isw1Δ
S. cerevisiae: Mata, cyc1-512, lys2-187, leu2-1, his4-166, ura3-52, met8-1, rrp6::URA, isw1::KanMx | This study | lys2-370 isw1Δ rrp6Δ
S. cerevisiae: Mat a, ura3-52, leu2-1, RRP6-13MYC::KAN | This study | RRP6-13MYC
S. cerevisiae: Mat a, ura3-52, leu2-1, ISW1-13MYC::KAN | This study | ISW1-13MYC
S. cerevisiae: Mat a, ura3-52, leu2-1, ISW1-3HA::NatMx | | ISW1-3HA
S. cerevisiae: Mat a, ura3-52, leu2-1, RRP6-13MYC::KAN, ISW1-3HA::NatMx | This study | RRP6-13Myc, ISW1-3HA
S. cerevisiae: Mat a, ura3-52, leu2-1, RRP4-3HA::NAT ISW1-13MYC::KAN | This study | RRP4-3HA, ISW1-13MYC
S. cerevisiae: Mat a, ura3-52, leu2-1, npl3-1 RRP4-3HA::NAT ISW1-13MYC::KAN | This study | RRP4-3HA, ISW1-13MYC, npl3-1
S. cerevisiae: Mat a, ura3-52, leu2-1, ISW1-GFP::KanMx, RRP6-RFP::Hph | This study | ISW1-GFP, RRP6-mRFP
S. cerevisiae: Mat a, ura3-52, leu2-1, npl3-1::TRP, ISW1-GFP::KanMx, RRP6-RFP::Hph | This study | ISW1-GFP, RRP6-mRFP, npl3-1
S. cerevisiae: Mat a, ura3-52, leu2-1, CBP20-TAP::KanMx | This study | CBP20-TAP
S. cerevisiae: Mat a, ura3-52, leu2-1, CBP20-TAP::KanMx, ISW1-3HA::NatMx | This study | CBP20-TAP, ISW1-3HA
S. cerevisiae: Mat a, ura3-52, leu2-1, CBP20-TAP::KanMx, ISW2-3HA::NatMx | This study | CBP20-TAP, ISW2-3HA
S. cerevisiae: Mat a, ura3-52, leu2-1, npl3-1, CBP20-TAP::KanMx | This study | CBP20-TAP npl3-1

S. cerevisiae: Mat a, ura3-52, leu2-1, npl3-1, CBP20-TAP::KanMx, ISW1-3HA::NatMx | This study | CBP20-TAP, ISW1-3HA, npl3-1
S. cerevisiae: Mat a, ura3-52, leu2-1, ISW1-ProtA::KanMx | This study | ISW1-ProtA
S. cerevisiae: Mat a, ura3-52, leu2-1, npl3-1::TRP, ISW1-ProtA::KanMx | This study | ISW1-ProtA, npl3-1
S. cerevisiae: Mata cyc1-512, ura3-52, trp2-1, ISW1-ProtA::KanMx | This study | LYS2 ISW1-ProtA
S. cerevisiae: Mata, cyc1-512, lys2-187, leu2-1, his4-166, ura3-52, met8-1, ISW1-ProtA::KanMx | This study | lys2-370 ISW1-ProtA
S. cerevisiae: Mat a, ura3-52, leu2-1, ISW1-HTTP::URA | This study | ISW1-HTP
S. cerevisiae: Mat a, ura3-52, leu2-1, IOC2-HTTP::URA | This study | IOC2-HTP

Recombinant DNA
pRS314-MEX67-3HA | Dargemont laboratory | N/A
pRS314-mex64ΔUBA-3HA | Dargemont laboratory | N/A
pGFP-YRA1 (CEN, LEU) | F. Stutz (pFS2555) | N/A
pGFP-yra1-8 (CEN, LEU) | F. Stutz (pFS2554) | N/A
pRS315-NAB2 | A. Corbett (pAC717) | N/A
pRS315-ΔN-nab2 | A. Corbett (pAC1152) | N/A
pRS416–ISW1-2FL | T. Tsukiyama | N/A
pRS416-isw1K227R-2FL | This study | N/A
pRS416–isw1ΔSANT-2FL | This study | N/A
pRS416–isw1ΔSLIDE-2FL | This study | N/A
pRS415-RRP6 | D. Libri | N/A
pRS415-rrp6D238A | A. Corbett | N/A
pRS415-Rad52-YFP | V. Géli | N/A
pRS316L | A. Aguilera | N/A
pRS316LYΔNS | A. Aguilera | N/A
pRS415GPD-3FL-ISW1 | This study | N/A

Sequence-Based Reagents
Stellaris FISH Probes, Custom Assay with Fluorescein Dye; entire gene | Biosearch Technologies | LYS2
Stellaris FISH Probes, Custom Assay with Fluorescein Dye; entire gene | Biosearch Technologies | IMD2
qPCR primers | SIGMA-ALDRICH | Table S2

Software and Algorithms
Bowtie version 1 | Langmead et al., 2009 | Bowtie, RRID:SCR_005476; http://bowtie-bio.sourceforge.net/index.shtml
EdgeR | Robinson et al., 2010 | edgeR, RRID:SCR_012802; http://bioconductor.org
ImageJ 1.47v | NIH | ImageJ, RRID:SCR_003070; https://imagej.nih.gov/ij/notes.html
MATLAB | MathWorks, Inc. | http://www.mathworks.com/products/matlab/

Other
NucleoSpin RNA II | MACHEREY-NAGEL | 740955
RNace-IT Ribonuclease Cocktail | Agilent | 400720
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for reagents may be directed to, and will be fulfilled by the Lead Contact, Anna Babour
(anna.babour@inserm.fr).

EXPERIMENTAL MODEL AND SUBJECT DETAILS
Yeast Strains and Growth

S. cerevisiae strains used in this study were grown using standard methods and are listed in the Key Resources Table. npl3-1 cells and corresponding controls were grown overnight at 25°C and shifted for 3 hr to 30°C. Yeast genome manipulations (gene deletions and tagging) were performed using a one-step PCR-mediated technique (Longtine et al., 1998). Each presented growth assay, corresponding to fivefold serial dilutions of strains, is a representative example of at least 3 biological replicates.
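As an illustration of the fivefold serial dilutions used in the growth assays, the relative cell density of each spot can be sketched as follows. The starting density and the number of spots are hypothetical values chosen for the example; they are not stated in the text.

```python
def serial_dilution(start_density, fold=5, n_spots=5):
    """Return the relative cell density of each spot in a serial dilution.

    start_density and n_spots are illustrative assumptions; only the
    fivefold dilution factor comes from the text.
    """
    return [start_density / fold ** i for i in range(n_spots)]


if __name__ == "__main__":
    # Each spot is five times more dilute than the previous one.
    for i, density in enumerate(serial_dilution(1.0)):
        print(f"spot {i + 1}: {density:g}")
```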
Plasmids
The isw1K227R mutation was amplified from strain YTT1223 (T. Tsukiyama) and subcloned into pRS416-ISW1-2FL using the BamHI (853) and XbaI (1555) restriction sites. An isw1ΔSANT fragment was amplified from the isw1Δ2641-2799 strain (J. Mellor collection, strain MP2) and subcloned into pRS416–ISW1-2FL using the AgeI-NheI restriction enzymes to generate pRS416–isw1ΔSANT-2FL. Similarly, an isw1ΔSLIDE fragment was amplified from the isw1Δ2961-3150 strain (J. Mellor collection, strain MP3) using a downstream primer bearing two copies of the Flag sequence followed by a termination codon and a PstI restriction site. An AgeI-PstI fragment was subcloned into pRS416–ISW1-2FL, generating the pRS416–isw1ΔSLIDE-2FL plasmid. pRS415GPD-3FL-ISW1 was built as follows: the ISW1 ORF was cloned between the BamHI-PstI sites of pRS415GPD, and a 3FL tag was then inserted at the BamHI site. All constructs were verified by sequencing. Plasmids used in this study are reported in the Key Resources Table.
METHOD DETAILS
Fluorescence In Situ Hybridization

FISH was performed as previously described (Vitaliano-Prunier et al., 2012; Zenklusen and Singer, 2010). Cells grown in the appropriate media were fixed by adding paraformaldehyde to the media at a final concentration of 4%. Fixation was performed for 45 min at room temperature (RT) on a rotating wheel. After 3 washes in wash buffer (1.2 M sorbitol, 100 mM KHPO4, pH 7.5), cells were pelleted and resuspended in spheroplast buffer (1.2 M sorbitol, 100 mM KHPO4, pH 7.5) supplemented with 100T zymolyase at a final concentration of 0.5 mg/mL. Digestion was performed for approximately 15 min at 30°C (until the cell wall was digested in > 80% of the cells). Spheroplasts were carefully washed twice in cold spheroplast buffer and attached to poly-L-lysine-coated coverslips. Unadhered cells were washed off and coverslips were stored in 70% ethanol at −20°C for at least 2 hr (up to a few weeks). After cell rehydration by incubation of coverslips in 2× SSC for 5 min at RT, prehybridization was performed for 30 min at 37°C in hybridization buffer (50% formamide, 10% dextran sulfate, 4× SSC, 1× Denhardt's, 125 µg/ml E. coli tRNA, 500 µg/ml salmon sperm DNA). Hybridization was performed in hybridization buffer supplemented with 10 ng of oligo(dT)-Cy3 probe or 2 ng of LYS2-Quasar 570 and IMD2-Quasar 570 probes for 12 hr at 37°C in the dark. Oligo(dT)-stained coverslips were washed by three consecutive incubations: 2× SSC at RT for 1 hr, 1× SSC at RT for 30 min, and 0.5× SSC at 37°C for 15 min. Quasar 570-stained coverslips were washed once in 40% formamide/2× SSC for 15 min at 37°C, once in 2× SSC with 0.1% Triton X-100 at RT for 15 min, and once in 1× SSC at RT for 15 min. All coverslips were finally washed in 1× PBS plus Hoechst and mounted in ProLong antifade mounting medium (Molecular Probes, P36937).
Poly(A) localization was analyzed using a Cy3-labeled oligo(dT) probe in the different strains grown overnight at 25°C in YPD and shifted for 3 hr to 30°C prior to fixation. The LYS2 and IMD2 FISH probes were purchased from Biosearch Technologies and contain a blend of 48 oligos labeled with Quasar 570. Probe specificity was verified against a lys2Δ strain (for the LYS2 probe) or under uninduced conditions (for the essential IMD2 gene). For analysis of the subcellular localization of LYS2 or IMD2 mRNAs, the number of nuclear or cytoplasmic "dots" was manually counted for each individual cell (with 200–300 cells per condition) on maximum intensity projections.
Fluorescence Microscopy
Three-dimensional stacks with a 0.2-µm step were acquired by 3D deconvolution microscopy adapted on a DMR upright microscope (Leica) equipped with a CoolSNAP HQ2 charge-coupled device (CCD) camera (Photometrics) and using a 100× Plan Apochromat HCX oil immersion objective (NA = 1.4) controlled in the z axis by a piezoelectric motor (LVDT; Physik Instrumente). For each condition, 200–300 Hoechst-stained cells were examined in 3 biologically independent experiments. Deconvolution, when applied, was performed automatically on batches of image stacks using an iterative algorithm (Gold-Meinel) based on a measured point spread function (PSF). Identical processing parameters and numbers of iterations were used. Maximum intensity projections were performed using ImageJ software.
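To make the projection step concrete, a maximum intensity projection keeps, for every pixel position, the brightest value found across the planes of a z-stack. The following is a minimal pure-Python sketch of that operation, not the ImageJ implementation; the function name and the nested-list image representation are assumptions made for the example.

```python
def max_intensity_projection(stack):
    """Collapse a z-stack into a single 2D image.

    stack is a list of planes; each plane is a list of rows of pixel
    intensities, and all planes must share the same dimensions.
    The result holds, for each (row, column), the maximum over planes.
    """
    n_rows = len(stack[0])
    n_cols = len(stack[0][0])
    return [
        [max(plane[r][c] for plane in stack) for c in range(n_cols)]
        for r in range(n_rows)
    ]


if __name__ == "__main__":
    # Two hypothetical 2x2 planes of a z-stack.
    z_stack = [
        [[1, 5], [0, 2]],
        [[3, 4], [7, 1]],
    ]
    print(max_intensity_projection(z_stack))
```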
Co-immunoprecipitation Experiments
For co-IP experiments between Isw1 and Rrp4, WT and npl3-1 log phase cells were shifted for 3 hr to 30°C. For all co-IPs, cells were harvested by centrifugation at 4°C and rapidly frozen in liquid nitrogen before cryolysis. Frozen cell grindates were prepared as previously described (Oeffinger et al., 2007) and thawed in IP buffer (20 mM HEPES pH 7.4; 150 mM NaCl; 5 mM MgCl2; 10% glycerol; 0.5% Triton X-100; protease inhibitors (Roche)). The cleared lysate was incubated for 90 min at 4°C with EZview Red anti-HA beads (Sigma) or Dynabeads Protein G (Novex Life Technologies). Samples were then washed 5 times with IP buffer for 5 min at 4°C. Immunoprecipitated proteins were eluted from beads by boiling samples for 5 min at 95°C and analyzed by western blot. Input lanes correspond to 10% of the total input engaged in the IP.
Chromatin Immunoprecipitation
Cell cultures were cross-linked for 10 min with 1.2% formaldehyde (37%; Sigma), which was quenched with 360 mM glycine. Cell pellets (40 OD) were resuspended in 1 mL of lysis buffer (50 mM HEPES-KOH pH 7.5; 140 mM NaCl; 1 mM EDTA pH 8.0; 1% Triton X-100; 0.1% sodium deoxycholate; 0.05% SDS; protease inhibitors). Cells were lysed using the MagNA Lyser (Roche) and the lysates were sonicated for two rounds of 10 min (Diagenode). Sheared chromatin (∼300 bp DNA fragments) was isolated from cellular debris by centrifugation for 20 min at 13,000 rpm at 4°C. IP was performed by rotating samples overnight at 4°C after addition of 5 µg anti-CTD (8WG16; Covance) to 0.5 mg of proteins from chromatin extracts (500 µL final). 50 µL of pre-washed protein G Sepharose beads (GE Healthcare) were added to each sample and incubated for 2 hr at room temperature. Beads were successively washed for 5 min in 1 mL of the following buffers: lysis buffer, 500 mM NaCl buffer (50 mM HEPES-KOH pH 7.5; 500 mM NaCl; 1 mM EDTA pH 8.0; 1% Triton X-100; 0.1% sodium deoxycholate), buffer III (10 mM Tris-HCl pH 8.0; 1 mM EDTA pH 8.0; 250 mM LiCl; 1% NP-40; 1% sodium deoxycholate), and TE pH 8.0 (10 mM Tris-HCl pH 8.0; 1 mM EDTA pH 8.0). Elution was performed in 100 µL elution buffer (50 mM Tris-HCl pH 7.5; 1% SDS; 10 mM EDTA pH 8.0) for 20 min at 65°C. IP and input samples were reverse cross-linked overnight at 65°C with prior addition of 1 mg/mL proteinase K. Genomic DNA was purified using the QIAGEN PCR purification kit and quantified by qPCR (SYBR Green I Master; LightCycler 480; Roche).
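The text does not state how the qPCR signals were converted into recruitment values. One common convention is to express the IP signal as a percentage of input; the sketch below assumes that convention. The function name, the input fraction, the amplification efficiency of 2, and the Ct values are all hypothetical and serve only as an illustration.

```python
def percent_input(ct_ip, ct_input, input_fraction, efficiency=2.0):
    """Express a ChIP-qPCR signal as a percentage of the input chromatin.

    ct_ip          -- threshold cycle measured for the IP sample
    ct_input       -- threshold cycle measured for the input sample
    input_fraction -- fraction of the chromatin saved as input (assumed)
    efficiency     -- amplification factor per cycle (2.0 = perfect doubling)
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # A lower Ct means more template: each cycle of difference
    # corresponds to one factor of `efficiency`.
    return 100.0 * input_fraction * efficiency ** (ct_input - ct_ip)


if __name__ == "__main__":
    # Hypothetical values: 10% input, IP detected 3 cycles after the input.
    print(f"{percent_input(ct_ip=27.0, ct_input=24.0, input_fraction=0.1):.2f}% of input")
```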
RNA Immunoprecipitation, RTqPCR, and Deep Sequencing

RNA immunoprecipitations were performed from 2 g of frozen cell grindates prepared as previously described (Oeffinger et al., 2007). Immunoprecipitated RNA was isolated from proteins by treating samples with 40 µg Proteinase K (Roche) and 0.1% SDS for 30 min at 30°C. RNA was extracted with the NucleoSpin RNA II kit (Macherey-Nagel) and reverse transcribed using SuperScript II reverse transcriptase (Invitrogen). cDNA was quantified by qPCR (SYBR Green I Master; LightCycler 480; Roche). Alternatively, polyA+ RNAs were purified from 500 ng of total RNA from total or immunoprecipitated extracts using oligo(dT). Libraries were prepared using the strand-specific TruSeq Stranded mRNA library preparation kit (Illumina). Libraries were multiplexed by 9 on 2 flowcell lanes. Sequencing of 50 bp reads was performed on a HiSeq 1500 device (Illumina). A mean of 22.9 ± 2.6 million reads passing the Illumina quality filter was obtained for each of the 18 samples. Library preparation and Illumina sequencing were performed at the Ecole Normale Supérieure Genomic Platform (Paris, France). The accession number for the RIP seq data reported in this paper is ArrayExpress: E-MTAB-4826.
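As a quick consistency check on the sequencing figures above, the reported per-sample mean can be turned into an approximate total yield. This sketch assumes that "multiplexed by 9 on 2 flowcell lanes" means 9 libraries per lane; the function name and the derived quantities are illustrative and are not values reported by the authors.

```python
def sequencing_summary(n_samples, n_lanes, mean_reads, sd_reads):
    """Summarize sequencing depth from per-sample statistics.

    Returns samples per lane, approximate total and per-lane read counts,
    and the coefficient of variation of the per-sample depth.
    """
    if n_samples % n_lanes:
        raise ValueError("samples do not divide evenly across lanes")
    total = n_samples * mean_reads
    return {
        "samples_per_lane": n_samples // n_lanes,
        "total_reads": total,
        "reads_per_lane": total / n_lanes,
        "cv_percent": 100.0 * sd_reads / mean_reads,
    }


if __name__ == "__main__":
    # Values taken from the text: 18 samples, 2 lanes, 22.9 ± 2.6 million reads.
    summary = sequencing_summary(18, 2, 22.9e6, 2.6e6)
    for key, value in summary.items():
        print(f"{key}: {value:,.1f}")
```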
In Vivo Crosslinking Assay

In vivo crosslinking was performed according to Bohnsack et al. (2012) for the CRAC procedure, using 2 L of cells expressing genomically HTP-tagged proteins grown to OD 0.6 and crosslinked using a Megatron (UVO3) for 50 s. RNase digestion was performed using 1 µl of a 1/50 dilution of RNace-IT (Agilent Technologies).
Next-Generation Sequence Analysis

Reads were aligned to the S. cerevisiae genome build SacCer3 using Bowtie version 1 (Langmead et al., 2009). Read counts were then determined for 5,398 non-dubious protein-coding genes, using transcript boundaries defined previously (Murray et al., 2012). We used EdgeR (Robinson et al., 2010) to determine which genes were statistically enriched in Isw1 IPs relative to input, using the GLM functionality to test for differential enrichment with a quasi-likelihood F-test. This gave a list of 374 transcripts that were statistically enriched for Isw1 in WT, 1,753 in npl3-1, and 86 in the No Tag control. Transcripts that were enriched in the WT and npl3-1 strains were excluded from downstream analysis if they were also enriched in the No Tag strain. Genome-wide levels of sense transcription, antisense transcription, nucleosome occupancy, TBP, H3K4 acetylation, H3K36me3, H3K79me3, Isw1, and Ioc3 were obtained from sources described in the main text. Three groups were considered: the Isw1 target RNAs in the WT strain, the Isw1 target RNAs in the npl3-1 strain, and all 5,398 non-dubious protein-coding genes. All genes in a given group were aligned by their ATG start codon, and the mean level of a given factor was calculated at each base pair relative to this point.
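The two downstream steps described above, removing transcripts that are also enriched in the No Tag control and averaging a genome-wide signal over genes aligned at their ATG, can be sketched as follows. The authors' analysis used custom MATLAB code that is not shown in the text, so this Python version is only an illustration; the function names, the data layout (one list of per-base values per chromosome), the strand handling, and the window sizes are assumptions.

```python
def filter_targets(enriched, no_tag_enriched):
    """Drop transcripts that are also enriched in the No Tag control."""
    return sorted(set(enriched) - set(no_tag_enriched))


def metagene_profile(signal, genes, upstream, downstream):
    """Mean signal at each base pair relative to the ATG.

    signal -- dict mapping chromosome name to a list of per-base values
    genes  -- list of (chromosome, atg_position, strand) tuples,
              with 0-based positions and strand "+" or "-"
    Returns a list covering offsets -upstream .. +downstream.
    """
    width = upstream + downstream + 1
    sums = [0.0] * width
    counts = [0] * width
    for chrom, atg, strand in genes:
        track = signal[chrom]
        for offset in range(-upstream, downstream + 1):
            # On the minus strand, "downstream" runs toward lower coordinates.
            pos = atg + offset if strand == "+" else atg - offset
            if 0 <= pos < len(track):
                sums[offset + upstream] += track[pos]
                counts[offset + upstream] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


if __name__ == "__main__":
    # Toy data: one short chromosome and two hypothetical genes.
    toy_signal = {"chrI": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
    toy_genes = [("chrI", 3, "+"), ("chrI", 6, "-")]
    print(metagene_profile(toy_signal, toy_genes, upstream=1, downstream=2))
    print(filter_targets(["YFG1", "YFG2", "YFG3"], ["YFG2"]))
```

Summing and counting per offset, rather than averaging whole windows, lets genes near chromosome ends contribute only the positions they actually cover.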
Poly(A) Tail Length Analysis

Bulk poly(A) tails were analyzed as described previously (Apponi et al., 2010). To 3′ end label RNA with [32P]-pCp, 3 µl total RNA (10 µg) was mixed with 1.5 µl Ligation Cocktail [for 15 µl Ligation Cocktail (10 reactions): 6 µl Ligation Buffer (250 mM HEPES, pH 8.3; 50 mM MgCl2; 50% DMSO; 17.5 mM dithiothreitol; 62.5 mM ATP); 5 µl [32P]-pCp (cytidine 3′,5′-bisphosphate; 3000 Ci/mmol; 10 mCi/ml; Perkin Elmer); 2 µl RNasin (40 U/µl; Promega); 2 µl T4 RNA ligase (20 U/µl; New England Biolabs)] and incubated overnight at 4°C. To terminate the RNA labeling reaction and digest non-poly(A) RNA, 50 µl of Stop Solution [0.5 M NaCl; 10 mM EDTA, pH 8] and 90 µl of RNase Cocktail [for 100 µl RNase Cocktail: 0.5 M NaCl; 10 mM Tris-HCl, pH 8; 1 mM MgCl2; 10 µl tRNA (10 mg/ml; Sigma-Aldrich); 0.1 µl RNase A (20 mg/ml; Invitrogen); 3 µl RNase T1 (100 U/µl; Roche)] were added to the RNA, and the sample was incubated at 37°C for 1 hr. Protein was extracted from the RNA sample by addition of 100 µl phenol/chloroform/isoamyl alcohol (25:24:1), vortexing for 30 s, and centrifugation at 12,000 × g for 5 min at 4°C. The aqueous phase (110 µl) was transferred to a new tube, and RNA was precipitated with 400 µl 100% ethanol and centrifugation at 12,000 × g for 15 min at 4°C. The RNA pellet was washed with 75% ethanol, air-dried, and resuspended in 20 µl loading buffer [20 mM sodium citrate, pH 5; 7 M urea; 1 mM EDTA, pH 8; 0.25 mg/ml bromophenol blue; 0.25 mg/ml xylene cyanol]. RNA samples (4 µl) were separated on a TBE-urea (90 mM Tris-borate, 2 mM EDTA, 8 M urea) 10% polyacrylamide gel at 90 W for 2.5–3 hr, transferred to Whatman paper, dried at 80°C for 2 hr, exposed to a phosphor screen, and imaged using a Typhoon FLA 7000 phosphorimager (GE Healthcare).
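The Ligation Cocktail recipe is given for 10 reactions, with 1.5 units of volume used per reaction. The sketch below scales that recipe to an arbitrary number of reactions, reading the volumes as microliters. The function name and the optional pipetting excess are assumptions added for illustration; only the component volumes come from the text.

```python
# Component volumes of the Ligation Cocktail for 10 reactions (from the text),
# read here as microliters.
LIGATION_COCKTAIL_10RXN = {
    "Ligation Buffer": 6.0,
    "[32P]-pCp": 5.0,
    "RNasin": 2.0,
    "T4 RNA ligase": 2.0,
}


def scale_cocktail(recipe, recipe_reactions, n_reactions, excess=0.0):
    """Scale a master-mix recipe to n_reactions.

    excess is an optional fractional overage (e.g. 0.1 for 10% extra)
    to compensate for pipetting losses; it is an assumption, not part
    of the published protocol.
    """
    factor = n_reactions * (1.0 + excess) / recipe_reactions
    return {name: volume * factor for name, volume in recipe.items()}


if __name__ == "__main__":
    # The four components add up to the stated cocktail volume for 10 reactions.
    print("total for 10 reactions:", sum(LIGATION_COCKTAIL_10RXN.values()))
    # Hypothetical example: 24 samples with 10% overage.
    for name, volume in scale_cocktail(LIGATION_COCKTAIL_10RXN, 10, 24, 0.1).items():
        print(f"{name}: {volume:.2f}")
```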
QUANTIFICATION AND STATISTICAL ANALYSIS
The number of independent experimental replicates and the definitions of center and precision measures are reported in the figure legends (n, mean ± sd). Significance of the observed differences was evaluated using Student's t test (*p = 0.01–0.05; **p = 0.001–0.01; ***p < 0.001).
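The star notation above maps a p value onto three significance bands. A minimal sketch of that mapping is shown below; the function name is hypothetical, and the treatment of values falling exactly on a boundary (0.01 or 0.001) is an assumption, since the text does not specify it.

```python
def significance_stars(p_value):
    """Return the star annotation for a p value.

    Bands follow the text: * for 0.01-0.05, ** for 0.001-0.01,
    *** for p < 0.001. Boundary values are assigned to the less
    significant band here, which is an assumption.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


if __name__ == "__main__":
    for p in (0.2, 0.03, 0.004, 0.0002):
        print(p, repr(significance_stars(p)))
```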
For quantification of FISH experiments, the percentage of cells with nuclear accumulation of poly(A) or bearing a nuclear "dot" was measured from 100 to 300 Hoechst-stained cells of each genotype in 3 independent experiments (mean ± sd).
For measurement of the hyperrecombination phenotype, recombination frequencies were calculated as the median value of 3
fluctuation tests, each one performed with twelve independent colonies for each transformant studied. The mean frequencies are
represented (n = 3, mean ± sd).
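One way to read the procedure above is that each fluctuation test yields the median frequency of its twelve colonies, and the three medians are then summarized as mean ± sd. The sketch below follows that reading, which is an interpretation rather than the authors' stated computation; the function name and the example frequencies are hypothetical.

```python
from statistics import mean, median, stdev


def recombination_summary(fluctuation_tests):
    """Summarize recombination frequencies across fluctuation tests.

    fluctuation_tests -- list of tests, each a list of per-colony
                         recombination frequencies
    Returns (mean of the per-test medians, sample sd of the medians).
    """
    if len(fluctuation_tests) < 2:
        raise ValueError("at least two tests are needed to compute an sd")
    medians = [median(colonies) for colonies in fluctuation_tests]
    return mean(medians), stdev(medians)


if __name__ == "__main__":
    # Hypothetical frequencies (x 10^-4), three colonies per test for brevity;
    # the protocol in the text uses twelve colonies per test.
    tests = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
    center, spread = recombination_summary(tests)
    print(f"{center:.2f} +/- {spread:.2f}")
```

Using the median within each test limits the influence of "jackpot" colonies, in which an early recombination event inflates the measured frequency.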
DATA AND SOFTWARE AVAILABILITY
The accession number for the RIP seq data reported in this paper is ArrayExpress: E-MTAB-4826. Downstream bioinformatics analysis was carried out using custom code written in MATLAB to compute average protein levels (TBP, Isw1, etc.) in different gene classes. Code is available on request.
Autoradiograms were quantified using ImageJ.

Supplemental Figures

Figure S1. Related to Figure 1
(A) Left: The interaction between Isw1 and Mex67 is partially sensitive to RNase. Co-immunoprecipitation of Isw1-13Myc and Mex67 was assayed from lysates treated or not with 200 µg/mL RNase A. * indicates the position of the IgG heavy chain. Right: Isw2-HA does not co-precipitate with Mex67. Mock = pre-immune serum. (B) Deletion of each subunit of the ISW1 complex does not affect cell growth at 30°C. Fivefold serial dilutions of strains grown for 2 days on YPD. (C) Inactivation of the ISW1 complex restores the growth of some mRNA export mutants. Fivefold serial dilutions of strains grown at the indicated temperatures on YPD (mft1Δ, thp1Δ, rna15-58, pap1-1, fip1-206, sen1-1, ref2Δ, and nup159-1) or DO-LEU (GFP-yra1-8, ΔN-nab2). (D) ISW1 deletion does not affect bulk poly(A) tail length. Bulk poly(A) tail length was analyzed in the indicated strains grown for 3 hr at 30°C. (E) Upper panel: Fivefold serial dilutions of strains grown for 4 days on selective media at 25°C. Lower panel: ChIP analysis of Isw1-3FL recruitment to PMA1 in the indicated strains.
(A) Effect of the inactivation of one subunit of various chromatin remodelers on cell growth. Fivefold serial dilutions of strains grown for 2 days on YPD. (B) The WT and mutant forms of Isw1 have comparable expression levels. Total protein extracts from npl3-1 isw1Δ cells transformed with pRS416, pRS416-ISW1-2FL, pRS416-isw1K227R-2FL, pRS416-isw1ΔSANT-2FL, or pRS416-isw1ΔSLIDE-2FL were analyzed by western blot with anti-FLAG and anti-Mex67 (loading control) antibodies.
(A) Deletion of ISW1 or DST1 does not affect cell growth at 30°C. (B) Mycophenolic acid (MPA) does not affect the growth of isw1Δ cells. Fivefold serial dilutions of the indicated strains grown on YPD plates (A) or on MPA-containing YPD plates (B). (C) Similar CTD recruitment to LYS2 in npl3-1 and npl3-1 isw1Δ cells analyzed by ChIP (n = 3, mean ± sd). (D) IMD2 induction/chase experimental setting: the IMD2 gene was induced in npl3-1 and npl3-1 isw1Δ cells grown for 1 hr at 30°C in SC-URA by addition of 75 µg/mL 6-Azauracil (+6AU) for 1 hr (Induction). At t = 0, cells were washed in SC media free of 6AU and samples were collected for analysis (Repression). (E) Similar CTD recruitment to IMD2 in npl3-1 and npl3-1 isw1Δ cells. Error bars, sd of n = 5 independent biological repeats. (F) Similar IMD2 mRNA level in npl3-1 and npl3-1 isw1Δ cells as analyzed by RT-qPCR and normalized to ACT1 mRNA expression (n = 3, mean ± sd). (G) The IMD2 transcript is retained in the nucleus in npl3-1 cells and exported upon ISW1 deletion. The fate of the IMD2 transcript was compared between npl3-1 and npl3-1 isw1Δ cells by analyzing the subcellular localization of the IMD2 transcripts by FISH with an IMD2-Quasar 570 probe. Scale bar, 5 µm. For each time point, the percentage of cells showing a nuclear dot was scored. At least 200 Hoechst-stained cells per condition were examined in 3 independent experiments. Error bars, sd.
(A) ISW1 deletion is neutral toward RRP6 deletion at all tested temperatures. (B) RRP6 inactivation impairs the growth of npl3-1 cells at 25°C but rescues their
growth at 30°C. (C) The WT and mutant forms of Rrp6 have comparable expression levels. Total protein extracts from npl3-1, npl3-1 rrp6Δ cells transformed with
pRS415, pRS415-RRP6, pRS415-rrp6D238A, were analyzed by western blot with anti-Rrp6 and anti-Mex67 (loading control) antibodies. (D) Effect of TRF4 and
ISW1 inactivation on the growth of WT cells. (E) ISW1 and TRF4 deletions rescue the growth at 30°C of the npl3-1 mutant and show additive effects. Opposite
effects are observed at 25°C. (A, B, D, E) Cell growth of indicated strains was analyzed at indicated temperatures using fivefold serial dilutions. (F) The Isw1-Rrp4
interaction is insensitive to RNase. The interaction between Rrp4-HA and Isw1-13Myc or Mex67 was analyzed by co-immunoprecipitation as in Figure 4E with or
without treatment with 200 µg/mL RNase A. (G) Isw2-HA does not co-immunoprecipitate with Rrp6. Mock = unrelated Ab. (H) Increased co-localization of Isw1
and Rrp6 in npl3-1 cells. Localization of genomically tagged Isw1-GFP and Rrp6-mRFP was analyzed in WT and npl3-1 cells shifted for 3 hr at 30°C. Bar, 5 µm. For
quantification of the Isw1-GFP/Rrp6-mRFP co-localization, the overlapping area between the Isw1-GFP and Rrp6-mRFP signals was determined using the
‘‘image calculator’’ function of the ImageJ software on binary images and measured with the ‘‘Analyze particles’’ function. Sixteen size classes of overlapping area
were arbitrarily defined. The frequency of cells per class was calculated for 5 independent experiments in each of which 75 to 150 cells were analyzed. Error bars,
sem. (I) Rrp6 delocalizes from the nucleolus in npl3-1 cells. Localization of genomically tagged Isw1-mRFP and Nop1-GFP (left panel) or Rrp6-mRFP and Nop1-GFP
was analyzed in WT and npl3-1 cells shifted to 30°C for 3 hr.
(A) ISW1 deletion restores the growth of the lys2-370 mutant on DO-LYS. (B) Effect of ISW1 and RRP6 deletions on the distribution of lys2-370 transcripts as shown
in Figure 5C. Quantification of the number of LYS2 transcripts for one representative experiment: for each cell type, the percentage of cells containing only a
nuclear dot (N) or 1 to more than 10 cytoplasmic transcripts was scored. (C) The RIP assay detects the interaction of Cbp20 with ACT1 and LYS2 transcripts.
RNA immunoprecipitation experiments were performed with CBP20-TAP or untagged strains. The ratio of co-immunoprecipitated ACT1 or LYS2 RNA relative to
the total RNA present in each strain quantified by RT-qPCR is represented as a mean of 3 replicates (mean ± sd). Significance of the observed differences was
evaluated using Student’s t test (*p = 0.01–0.05; **p = 0.001–0.01; ***p < 0.001).
(A) Isw1 specifically interacts with its identified targets. RNA immunoprecipitation experiments were performed with PrA-tagged Isw1 and TAP-tagged Cbp20
cells. The ratio of co-immunoprecipitated PMA1 (non Isw1 target) or IOC3 (Isw1 target) RNA relative to the total RNA present in each strain quantified by RT-qPCR
is represented as a mean of 3 biological replicates (mean ± sd). Untagged WT cells (No Tag) were used as negative control. Significance of the observed differences
was evaluated using Student’s t test (*p = 0.01–0.05; **p = 0.001–0.01; ***p < 0.001). (B) The interaction between Isw1 and its mRNA targets is increased in
the npl3-1 mutant compared to WT. RNA immunoprecipitation experiments were performed with PrA-tagged Isw1 in WT and npl3-1 cells grown for 3 hr at 30°C.
The ratio of 5 co-immunoprecipitated mRNA targets identified in the genome-wide analysis (IOC2, IOC3, INO80, MDN1, HAP1) relative to the total RNA present in
each strain was quantified by RT-qPCR. A representative experiment is shown. (C–F) Average levels of X at those genes whose transcripts are statistically
enriched in an Isw1-PrA immunoprecipitate. Shown are the classes of genes enriched in WT (blue) and in npl3-1 (red) strains, compared to all protein-coding,
non-dubious genes in the yeast genome (green). (G) Ioc2 but not Rpb3 UV cross-links to RNA in vivo. Left: HTP-tagged Ioc2 was cross-linked (+) or not (-) and
purified from cell extracts. 2.5% of the nickel eluates were resolved by SDS-PAGE after (lanes 3-4) or not (lanes 1-2) RNase treatment and detected by autoradiography
(upper panel) or anti-HIS western blot (lower panel). The red asterisk indicates a contaminant band. Right: HTP-tagged Rpb3 (1) was cross-linked and
purified from cell extracts. 2.5% of the nickel eluates were resolved by SDS-PAGE and detected by autoradiography (upper panel) or anti-Rpb3 western blot
(lower panel). An untagged WT strain served as a negative control (2).
Article
Structure and Function of the Nuclear Pore Complex Cytoplasmic mRNA Export Platform
Javier Fernandez-Martinez,
Seung Joong Kim, Yi Shi, ..., Brian T. Chait,
Andrej Sali, Michael P. Rout
Correspondence
chait@rockefeller.edu (B.T.C.),
sali@salilab.org (A.S.),
rout@rockefeller.edu (M.P.R.)
In Brief
mRNAs escape the nucleus with help
from a nuclear pore subcomplex that sits
directly over the transport channel in the
cytoplasm.
Highlights
• Integrative structure at 9 Å precision of the endogenous Nup82 holo-complex
• Molecular architecture of the conserved 1.8-MDa cytoplasmic mRNA export platform
• Structural framework to understand the mRNP remodeling and export processes
• mRNP remodeling machinery is positioned over the NPC’s central channel, not in filaments
Fernandez-Martinez et al., 2016, Cell 167, 1215–1228

Javier Fernandez-Martinez,1,8 Seung Joong Kim,2,8 Yi Shi,3,8 Paula Upla,4,8 Riccardo Pellarin,2,7,8 Michael Gagnon,5
Ilan E. Chemmama,2 Junjie Wang,3 Ilona Nudelman,1 Wenzhu Zhang,3 Rosemary Williams,1 William J. Rice,6
David L. Stokes,4 Daniel Zenklusen,5 Brian T. Chait,3,* Andrej Sali,2,* and Michael P. Rout1,9,*
1Laboratory of Cellular and Structural Biology, The Rockefeller University, New York, NY 10065, USA
2Departments of Bioengineering and Therapeutic Sciences and Pharmaceutical Chemistry, California Institute for Quantitative Biosciences,
University of California, San Francisco, San Francisco, CA 94158, USA
3Laboratory of Mass Spectrometry and Gaseous Ion Chemistry, The Rockefeller University, New York, NY 10065, USA
4Skirball Institute of Biomolecular Medicine, Department of Cell Biology, New York University School of Medicine, New York, NY 10016, USA
5Département de Biochimie et Médecine Moléculaire, University of Montréal, Montréal, QC H3C3J7, Canada
6Simons Electron Microscopy Center at New York Structural Biology Center, New York, NY 10027, USA
7Structural Bioinformatics Unit, Institut Pasteur, CNRS UMR 3528, 75015 Paris, France
8Co-first author
9Lead Contact
*Correspondence: chait@rockefeller.edu (B.T.C.), sali@salilab.org (A.S.), rout@rockefeller.edu (M.P.R.)

SUMMARY

The last steps in mRNA export and remodeling are performed by the Nup82 complex, a large conserved assembly at the cytoplasmic face of the nuclear pore complex (NPC). By integrating diverse structural data, we have determined the molecular architecture of the native Nup82 complex at subnanometer precision. The complex consists of two compositionally identical multiprotein subunits that adopt different configurations. The Nup82 complex fits into the NPC through the outer ring Nup84 complex. Our map shows that this entire 14-MDa Nup82-Nup84 complex assembly positions the cytoplasmic mRNA export factor docking sites and messenger ribonucleoprotein (mRNP) remodeling machinery right over the NPC’s central channel rather than on distal cytoplasmic filaments, as previously supposed. We suggest that this configuration efficiently captures and remodels exporting mRNP particles immediately upon reaching the cytoplasmic side of the NPC.

INTRODUCTION

The nuclear pore complex (NPC) is a large cylindrical structure with eight symmetrically arranged spokes embedded in the nuclear envelope (NE) and is composed of multiple copies of ~30 different nucleoporins (Nups). Discrete Nup subcomplexes associate to form the different substructures of the NPC, consisting of coaxial outer, inner, and membrane rings surrounding a central channel and linked to peripheral components such as the nuclear basket. Approximately one-third of all Nups, termed FG Nups, contain intrinsically disordered domains comprising multiple Phe-Gly (FG) repeats between hydrophilic spacers. These FG repeat regions populate the NPC central channel and, through their specific interaction with cargo-carrying transport factors, mediate transport (Knockenhauer and Schwartz, 2016).
Although much of transport across the NPC is mediated by the karyopherin family of transport factors, the export of mRNAs follows a different mechanism that requires a special platform located at the cytoplasmic face of the NPC, called the Nup82 complex in budding yeast (Oeffinger and Zenklusen, 2012), which in turn associates with Dyn2, Nup116, Gle2, and Gle1 (Folkmann et al., 2011). The central role of this complex is underscored by the fact that its mammalian homolog, the Nup88 complex, is a nexus for disease-associated mutations (Kaneb et al., 2015; Nousiainen et al., 2008). The Nup82 complex and its associated proteins have proven challenging for structural analyses due to their flexibility and the presence of intrinsically disordered domains. The core of the Nup82 complex is composed of the proteins Nup82, Nup159, and Nsp1. Fragments of each have been solved crystallographically (Chug et al., 2015; Stuwe et al., 2015a; Yoshida et al., 2011), and negative stain electron microscopy (EM) revealed this complex to have an overall ‘‘P’’-shaped morphology (Gaik et al., 2015), but no structures exist for either the whole complex or how it interacts with its associated proteins and the NPC.
mRNA export is achieved in several stages. First, mRNAs, packaged into export-competent messenger ribonucleoprotein (mRNP) particles, are docked into the nuclear basket; the mRNP particle then travels across the NPC through interaction of the non-karyopherin transport factors Mex67-Mtr2 with FG repeats that fill the NPC’s central channel (Oeffinger and Zenklusen, 2012). Once the mRNP particle reaches the cytoplasmic face of the NPC, the coordinated action of the DEAD-box RNA helicase Dbp5, the nucleoporin Gle1, and the N-terminal β-propeller of Nup159 leads to active remodeling of the mRNP (Folkmann et al., 2011; Montpetit et al., 2011). Mex67-Mtr2 and other transport factors are removed during remodeling (Lund and Guthrie, 2005), preventing the mRNA from traveling back to the
nucleus. In the final stage, the remodeled mRNA is released into the cytoplasm for translation.
Unfortunately, the precise coordination of these processes at the molecular scale has not been elucidated, in large part due to the lack of sufficiently detailed information on the spatial arrangement of transport and remodeling components relative to each other and the NPC. Localization studies have led to the proposal that the Nup82 complex forms filaments that project orthogonally from the cytoplasmic face of the NPC; such a location would imply that exporting mRNPs must first transit the central channel of the NPC before being transferred out to these peripheral cytoplasmic filaments, where the final stages of mRNP remodeling and export would occur distally from the central channel of the NPC (reviewed in Folkmann et al., 2011; Knockenhauer and Schwartz, 2016; Oeffinger and Zenklusen, 2012). However, exactly how this transfer would be accomplished, and how central channel transit and mRNP processing could be coordinated, remained unclear.
To understand these processes, we solved the structure of the endogenous Nup82 complex by using an integrative approach that relies on multiple structural and proteomic data sources (Alber et al., 2007b; Shi et al., 2014). We also determined how the Nup82 complex is anchored to the cytoplasmic face of the NPC via the Nup84 complex, a seven-member assembly forming the outer rings. In addition, we used a combined structural and functional mapping analysis to elucidate the major mechanism responsible for mRNA export defects affecting Nup84 complex components. Finally, we integrate our data into a detailed map of the whole cytoplasmic mRNA export and remodeling machinery. We show that, surprisingly, the Nup82 complex positions the cytoplasmic FG repeats and mRNP remodeling machinery right over the NPC’s central channel rather than on distal cytoplasmic filaments, as previously supposed.

RESULTS

Solving the Structure of the Endogenous Nup82 Holo-complex
We solved the structure of the endogenous native Nup82 holo-complex (Figure 1) using an integrative modeling approach that has previously allowed us and others to successfully determine the molecular architecture of numerous other large native assemblies (Sali et al., 2015). Such integrative strategies have proven to be suited for the structural analysis of large endogenous complexes that are by nature flexible, contain unstructured regions, and are conformationally heterogeneous (Shi et al., 2014; Shi et al., 2015).
We measured the native stoichiometry of the purified Nup82 holo-complex by a combination of QConCAT-MS (Pratt et al., 2006) and classical Siegel and Monty biophysical measurements (Figure S1; STAR Methods). The consensus of our analyses results in a stoichiometry of 2:2:2:2 (Nup159:Nup82:Nsp1:Dyn2), consistent with that previously measured (Gaik et al., 2015) for a truncated overexpressed version of the complex, with the exception of the Dyn2 dimer, a labile component that, unless overexpressed (Figure S1E), is present as a single dimer in the average native complex. The morphology and dimensions of the complex were determined by negative stain EM, where 4,266 particles were classified into 23 class averages (Figure S2C); a majority of these (21) showed what appears to be a single dimer of Dyn2, in agreement with a previous study (Gaik et al., 2015) and with our stoichiometry (see above), and were thus included in the calculation. Interestingly, two of the class averages seemingly presented two consecutive dimers of Dyn2 (Figure S2C, arrowheads), underscoring the previously observed heterogeneity of the complex in vivo (Gaik et al., 2015). Instead of using a highly uncertain 3D map computed via single-particle reconstruction based on a heterogeneous set of images, we relied on much more robustly computed 2D class averages, following a previously demonstrated procedure (Shi et al., 2014). Only the structured portions of the complex were constrained by the EM data, because we showed that the unstructured FG repeats are not revealed by negative stain EM (Figure S2D).
All components of the complex were used in the final calculation, including FG repeats to account for their excluded volume and emanating points. Protein representations were derived from the atomic structures in the Protein Data Bank, where available, or comparative models were built with MODELLER 9.13 (Sali and Blundell, 1993) based on the closest homolog with a known structure detected by HHPred (Söding, 2005) (Figure S3; Table S1); disordered FG-repeat-containing regions were modeled as flexible strings of beads, guided by our recent nuclear magnetic resonance (NMR) data (Hough et al., 2015). Finally, the residue-specific spatial proximity and orientation of the different subunits were determined by a comprehensive chemical cross-linking with mass spectrometry readout (CX-MS) method, using two complementary cross-linkers (Figures 2A and S2A) (Shi et al., 2014). To reduce the intrinsic ambiguity of cross-link data arising from the presence of two copies of each protein, we also analyzed a strain expressing an exogenous homolog of Nup82 (skNup82) from the yeast Saccharomyces kudriavzevii (Borneman et al., 2012) (Figure S2A; STAR Methods), whose distinct protein sequence allows cross-links to it to be distinguished from the endogenous Nup82. We identified a total of 1,131 cross-links (Table S2) that include 662 unique disuccinimidyl suberate (DSS) and 126 unique 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC) cross-links from the wild-type yeast strain and 343 unique DSS cross-links from the skNup82-containing complex (Figure S2A). The majority of the identified inter-molecular cross-links mapped to the coiled-coil, C-terminal regions of Nup159 and Nsp1 and the whole Nup82 and Dyn2 proteins. Few inter-molecular cross-links were found to connect to the FG regions of Nup159 or Nsp1 and none connected to the β-propeller domain of Nup159, strongly indicating that those domains are dynamic, peripheral, and not located in proximity to the core of the complex (Gaik et al., 2015).
We computed the structure of the Nup82 complex (Figure 1) through our integrative modeling approach as implemented in the Integrative Modeling Platform (IMP) program (Russel et al., 2012) using the data described above. A detailed assessment of the input data and the resulting model are shown in Table 1 and STAR Methods. In summary, the 463 best-scoring solutions satisfy within stringent tolerances the data used to compute them. The clustering analysis of the best-scoring solutions identified a single dominant cluster of 370 similar structures.
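To make the clustering step more concrete, the following minimal Python sketch shows one way a set of best-scoring solutions could be grouped by structural similarity and a cluster precision computed. It is an illustration only and not the IMP implementation: the function names, the greedy threshold clustering, the assumption that all models are already superposed in a common frame, and the use of mean pairwise RMSD as the precision measure are all assumptions introduced here.

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation between two (n_beads, 3) coordinate arrays.

    Assumes both models are already superposed in a common reference frame.
    """
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def threshold_clusters(models, cutoff):
    """Greedy clustering: a model joins the first cluster whose seed is within cutoff."""
    clusters = []  # each cluster is a list of model indices; the first index is the seed
    for i, model in enumerate(models):
        for members in clusters:
            if rmsd(model, models[members[0]]) <= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])  # no existing seed was close enough
    return sorted(clusters, key=len, reverse=True)

def cluster_precision(models, members):
    """Mean pairwise RMSD among cluster members (one possible definition of precision)."""
    values = [
        rmsd(models[i], models[j])
        for k, i in enumerate(members)
        for j in members[k + 1:]
    ]
    return float(np.mean(values)) if values else 0.0
```

Under these assumptions, a dominant cluster would appear as the first entry returned by `threshold_clusters`, and `cluster_precision` would report its spread in the same length units as the input coordinates.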
1216 Cell 167, 1215–1228, November 17, 2016

Figure 1. Structure of the Core Nup82 Holo-complex
(A) Three views of the localization probability density map corresponding to the Nup82 holo-complex ensemble are shown (light gray), with a single representative
ribbon structure embedded; the proteins, subunits, and different structural features of the complex are indicated. Subunit assignment is indicated with a superscript
‘‘s1’’ (subunit 1) or ‘‘s2’’ (subunit 2). In all views, the components of each subunit are colored in tones of red (subunit 1) or blue (subunit 2) (see also B).
(B) Exploded view of the Nup82 holo-complex subunits and protein components, with the whole complex shown in the center and the two subunits and the
different components shown on the right (subunit 1, colored in red tones) or the left side (subunit 2, colored in blue tones). CCS, coiled-coil segment (as described
in the main text).
See also Figures S1, S2, and S3 and Tables S1 and S2.
The corresponding localization probability density map represents the probability of any volume element being occupied by a given protein (Figure 1). The 9.0 Å precision of the core structured region is sufficiently high to pinpoint the locations and orientations of the constituent proteins and domains, demonstrating the quality of the input data, including the cross-links and EM 2D class averages (Figure S4; Table 1).
Our structure is validated by seven considerations as follows. First, the EDC and DSS cross-links are highly consistent with each other, despite different chemistries, and there is significant, highly non-random clustering of both EDC and DSS cross-links into equivalent ‘‘cliques’’ (Figure 2A). These represent immediately adjacent regions in the complex, as validated by those cliques that coincide with known crystallographic interface regions, such as Nup159:Dyn2 (PDB: 4DS1) (Romes et al., 2012) and Nup159:Nup82 (PDB: 3PBP) (Yoshida et al., 2011) (Figure 2B); indeed, in our final calculated structure these cliques represent immediately adjacent regions in the complex. Second, those few cross-links in violation of strict distance limits in our structure are nevertheless right next to one of the cliques; they are thus consistent with the structure when locally limited flexibility is taken into account (Figures 2A and S4D). Third, mass tagging of our structure is consistent with the localization of GFP tags on both the Nup82 and Nup159 C termini (Figure 2C).
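The notion of a localization probability density, the probability that a volume element is occupied by a given protein across an ensemble, can be sketched as follows. This is a simplified illustration, not the procedure used to compute the map in Figure 1: beads are treated as points without radii, and the function name, grid parameters, and input layout are assumptions made here.

```python
import numpy as np

def localization_density(models, origin, voxel_size, shape):
    """Fraction of models in which each voxel contains at least one bead.

    models     : list of (n_beads, 3) arrays, one per structure in the ensemble
    origin     : (x, y, z) of the grid corner
    voxel_size : edge length of a cubic voxel
    shape      : (nx, ny, nz) number of voxels along each axis
    """
    counts = np.zeros(shape, dtype=float)
    for coords in models:
        # Map each bead to the integer index of the voxel that contains it.
        idx = np.floor((np.asarray(coords) - np.asarray(origin)) / voxel_size).astype(int)
        inside = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
        occupied = np.zeros(shape, dtype=bool)
        occupied[tuple(idx[inside].T)] = True  # a voxel counts once per model
        counts += occupied
    return counts / len(models)
```

A voxel value of 1.0 then means that every model in the ensemble places the protein there, whereas lower values mark regions that vary between models.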

Figure 2. Nup82 Holo-complex Structure Validation
(A) Circos-XL plots showing the distribution of all DSS (top plot) or EDC (bottom plot) cross-links mapping within the core of the Nup82 holo-complex. Each protein is represented as a colored segment, with the amino acid residue indicated on the outside of the plot and relevant domains indicated inside each segment; regions without reliable fold assignment are identified by lighter shading. Inter-molecular cross-links are depicted as purple lines and intra-molecular cross-links as gray lines. The internal circles include bars representing the density of cross-links per ten residues in DSS and EDC (blue and light blue color for inter-molecular cross-links and intra-molecular cross-links, respectively) and the density of lysines in DSS (orange and light orange bars for cross-linked and uncross-linked residues, respectively) or the density of lysine/carboxylic acid in EDC (pink and light pink bars for cross-linked and uncross-linked residues, respectively).
(B) Structure of the Nup82 holo-complex showing the cross-links falling within the expected Cα-Cα maximum distance threshold (blue) or outside of that threshold (orange). Below the structure, a bar graph shows the Cα-Cα distance distribution of all DSS or EDC cross-links in the structure. DSS threshold = 35 Å; EDC threshold = 30 Å.
(C) GFP mass-tagging analysis of the Nup82 holo-complex. Analyses of a Nup82-GFP tagged version (top diagram) or a Nup159-GFP tagged version (bottom diagram) of the holo-complex are shown. For each diagram, a view of the native Nup82 holo-complex structure is shown (wild-type [WT]), and the tagged version of the structure shown on the right side. The top panels show a representative negative stain 2D class average of the native complex (left) and the tagged version (right; green arrowhead, GFP). The bottom panels show 2D projections of the native structure (left) and the calculated GFP-tagged version (right; green arrowhead, GFP). ccc, cross correlation coefficient. Scale bar, 10 nm.
(D) SAXS analysis of the Nup82 (572–690) fragment, showing two views of the computed ab initio shape (gray envelope), with ribbon representations of the equivalent Nup82 fragments in the conformation they adopt within the Nup82 holo-complex; subunits 1 (red) and 2 (blue) are indicated. See also Figures S5D–S5F and Table S4.
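The distance check summarized in (B) can be illustrated with a short Python sketch that counts how many cross-links fall within the quoted Cα-Cα thresholds (35 Å for DSS, 30 Å for EDC) in a single structure. The residue keys, the input layout, and the function name are hypothetical and chosen only for this example; how satisfaction was evaluated over the ensemble of structures is described in the STAR Methods and is not reproduced here.

```python
import numpy as np

# Maximum Cα-Cα distances (in Å) quoted in the Figure 2B legend.
THRESHOLDS = {"DSS": 35.0, "EDC": 30.0}

def fraction_satisfied(ca_coords, crosslinks):
    """Fraction of cross-links whose Cα-Cα distance is within the linker threshold.

    ca_coords  : dict mapping a residue key, e.g. ("Nup82", 600), to (x, y, z) in Å
    crosslinks : list of (residue_key_a, residue_key_b, linker) with linker "DSS" or "EDC"
    """
    satisfied = 0
    for res_a, res_b, linker in crosslinks:
        distance = np.linalg.norm(
            np.asarray(ca_coords[res_a], dtype=float)
            - np.asarray(ca_coords[res_b], dtype=float)
        )
        if distance <= THRESHOLDS[linker]:
            satisfied += 1
    return satisfied / len(crosslinks)
```

Cross-links that exceed the threshold by a small margin would be reported as violated by this sketch, even though, as noted in the main text, such cases can remain consistent with the structure once local flexibility is considered.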
Fourth, our structure is consistent with the previously published data, including an independent negative stain 3D density map (Figure S5A) (Gaik et al., 2015). Fifth, the trimeric coiled-coil structure is recapitulated even when computed using the chemical cross-linking data alone (Figure S5C). Sixth, our structure is in agreement with small angle X-ray scattering (SAXS) profiles and ab initio shapes of Nup82 constructs spanning residues 4–220, 4–452, and 572–690 (Figures 2D and S5D–S5F; Table S4). Notably, the Nup82 coiled-coil (572–690) forms a kinked structure, and the corresponding SAXS profile shows a monotonic increase in the Kratky plot (Figures 2D and S5F), indicating a high degree of flexibility between coiled-coil segments in solution, as would be expected for coiled-coils that form two different conformers seen in the final structure. Finally, our structure is also validated by the non-random and clustered distribution of cross-links

Table 1. Summary of Integrative Structure Determination of the Nup82 Complex
Modeling Programs | Python Modeling Interface (PMI), version c7411c3; Integrative Modeling Platform (IMP), version 2.5; MODELLER 9.13
Homology Detection and Structure Prediction | HHPred, PSIPRED, DISOPRED, DomPred, COILS/PCOILS, and Multicoil2 (see also Figure S3 and Table S1)
Spatial Restraints | Chemical cross-links, electron microscopy 2D, excluded volume, sequence connectivity, and five homo-dimer cross-links restraints (see also STAR Methods)
Sampling Method | Replica exchange Gibbs sampling, based on the Metropolis Monte Carlo algorithm; 8–16 replicas were used through 270 (initial step) and 80 (refinement step) independent runs, at the temperature range of 1.0–2.5
Monte Carlo Moves | Random translation and rotation of rigid bodies (up to 2 Å and 0.04 radians, respectively); random translation of individual beads in the flexible segments (up to 3 Å)
Number of Structures Generated | 1,350,000 (initial step) and 10,000 (refinement step) structures; 463 top-scoring structures were subjected to the clustering analysis
Clustering Analysis | 2 clusters of 370 (80%) and 93 (20%) structures (see also Figures S4 and S5)
Sampling Exhaustiveness | p = 0.972
Precision of the Clusters | 9.0 Å (cluster 1: 370 structures) / 16.3 Å (cluster 2: 93 structures)
Stoichiometry | 2:2:2:2 (Nup82:Nup159:Nsp1:Dyn2; see also Figure S1)
Chemical Cross-links Satisfied in the Cluster | 88.5% combined (93.3% DSS and 74.1% EDC within 35 and 30 Å distances, respectively; see also Figures 2B and S4D)
EM 2D Class Averages | Average ccc for 21 class averages is 0.931 (see also Figures 2C and S2C)
GFP Mass-Tagging EM 2D Class Averages | ccc = 0.932 (GFP mass-tagging at the Nup159 C termini); ccc = 0.953 (GFP mass-tagging at the Nup82 C termini) (see also Figure 2C)
Small Angle X-Ray Scattering (SAXS) | χ = 1.66 (Nup82 4–220), 2.55 (Nup82 4–452), and 6.47 (Nup82 572–690) (see also Figures 2D and S5D–S5F and Table S4)
Human NPC cryo-EM Map | ccc = 0.72 (wild-type) and 0.81 (mutant) (see also Figures 5 and S6)
Visualization and Plotting | UCSF Chimera 1.10, CX-Circos, matplotlib, and GNUPLOT
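For readers unfamiliar with the sampling method named in Table 1, the following Python sketch illustrates the general idea of replica exchange built on Metropolis Monte Carlo moves. It is a toy one-dimensional example under stated assumptions and not the IMP or PMI implementation: the scoring function, the step size, the swap schedule, and all function names are hypothetical; only the temperature range (1.0–2.5) and the replica count (8) are taken from the table.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    """Accept a move with probability min(1, exp(-delta_e / T))."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature)

def swap_accept(e_i, e_j, t_i, t_j, rng):
    """Accept a replica swap with probability min(1, exp((1/T_i - 1/T_j) * (E_i - E_j)))."""
    log_p = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return log_p >= 0 or rng.random() < math.exp(log_p)

def replica_exchange(score, x0, temperatures, n_steps, max_step, seed=0):
    """Sample a 1D score with one Metropolis walker per temperature and neighbor swaps."""
    rng = random.Random(seed)
    xs = [x0] * len(temperatures)
    es = [score(x0)] * len(temperatures)
    best_x, best_e = x0, es[0]
    for _ in range(n_steps):
        # One random translation move per replica, accepted by the Metropolis rule.
        for k, t in enumerate(temperatures):
            trial = xs[k] + rng.uniform(-max_step, max_step)
            e_trial = score(trial)
            if metropolis_accept(e_trial - es[k], t, rng):
                xs[k], es[k] = trial, e_trial
                if e_trial < best_e:
                    best_x, best_e = trial, e_trial
        # Attempt to exchange the states of one randomly chosen pair of neighbors.
        k = rng.randrange(len(temperatures) - 1)
        if swap_accept(es[k], es[k + 1], temperatures[k], temperatures[k + 1], rng):
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
            es[k], es[k + 1] = es[k + 1], es[k]
    return best_x, best_e

# Eight replicas spanning the temperature range listed in Table 1.
TEMPERATURES = [1.0 + 1.5 * k / 7 for k in range(8)]
```

The higher-temperature replicas accept uphill moves more readily and can therefore cross barriers in the scoring landscape, while swaps pass promising configurations down to the lower-temperature replicas for refinement.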
connecting the Nup82 holo-complex to other parts of the NPC, revealing interaction sites, as described below.

Features of the Nup82 Holo-complex
The C termini of Nup82, Nup159, and Nsp1 share a common domain arrangement, formed by consecutive helical coiled-coil regions of different length, connected by flexible linkers. They assemble (together with Dyn2) to form the Nup82 holo-complex, a roughly ‘‘P’’-shaped particle, which is formed by the asymmetric assembly of two compositionally identical subunits (termed subunit 1 [s1] and subunit 2 [s2] in Figure 1). Each subunit consists mainly of parallel, three-stranded, hetero-trimeric coiled-coils connected by flexible linkers, consisting of a single copy of the C termini of Nup82, Nup159, and Nsp1. However, the two subunits adopt different configurations, mainly due to the different degree of flexion of the hinges between hetero-trimeric coiled-coil segments (termed CCSs) and the relative position of the Nup82 β-propellers. Subunit 1 mainly forms the ‘‘rod,’’ while subunit 2 forms the ‘‘loop’’ of the holo-complex, with both subunits contributing to the spurs (Figure 1). The CCS1^s2 and CCS2^s2 trimers constitute the extended loop that can be observed in certain orientations of the particle (Figure 1A, left and center). The denser region of the complex is formed by trimeric parallel CCS domains that form the slightly bent, elongated central rod. Both Nup82 β-propellers are located side by side on top of the rod formed by subunit s1, with Nup82 β-propeller^s2 located in trans in a distal position from the CCS1-2^s2 loop. The two ends of the central rod are each formed by the C-terminal (spur-1) and the N-terminal (spur-2) bundles of the CCS domains. Two copies of Dyn2 form a dimer that is perpendicular to spur-2 and seems to help lock the two subunits into their asymmetric arrangement. Dyn2 also helps to orient the two Nup159 copies, so that their FG regions emanate in parallel from that end of the complex. Interestingly, the FG regions of Nsp1 also project from spur-2, forming, together with the Nup159 FGs, an intrinsically disordered plume. In agreement with prior work, the hump formed by the Nup82 β-propellers helps to lock down the C termini of Nup159 and form the attachment site for two Nup116 copies (Yoshida et al., 2011) (see below).

Structure of the Nup82-Nup84 Complex Assembly and the Cytoplasmic mRNA Export Platform
To understand how the Nup82 holo-complex is associated with the whole NPC, we isolated it under conditions that preserved its interaction with other Nups (Fernandez-Martinez et al., 2012). CX-MS was used to analyze those proteins proximally associated with each of the Nup82 holo-complex’s components (Table S3). Notably, most of the identified cross-links connected the spur-1 region of the Nup82 holo-complex to components of the Nup84 complex hub (Figure 3; Table S3) (Shi et al., 2014); indeed, a direct physical connection between the Nup82 and Nup84 complexes was recently demonstrated in Chaetomium thermophilum (Kellner et al., 2016). Our data, together with our prior map of the Nup84 complex (Shi et al., 2014),

Figure 3. Molecular Architecture of the Cytoplasmic mRNA Export and Remodeling Platform
(A) Structure of the Nup82-Nup84 complex assembly. Three views of the structural arrangement formed by the Y-shaped Nup84 complex (light gray density) and
the Nup82 holo-complex (light blue density) calculated using CX-MS data. Each component and structural feature of the different complexes are labeled and
shown as a density with fitted ribbon representations of their component Nups. A Circos plot shows the distribution of cross-links (dashed, light blue lines)
identified between components of the Nup82 complex, Nup84 complex, and mRNP export/remodeling machinery, used for the calculation of the assembly and
the map described in (B).
(B) Molecular architecture of the cytoplasmic mRNA export and remodeling platform. An exploded view of the different platform components is presented (solid
blue lines, covalent attachment; dashed blue lines, CX-MS-identified associations). When available, components are represented as crystal structures (Dbp5,

crystallographic data on the Nup84 complex (Kelley et al., 2015; Stuwe et al., 2015b), and the previous map of the entire NPC (Alber et al., 2007b), were sufficient to allow us to dock the two complexes together to generate a map of the entire 1.3-MDa, 15-protein, Nup82-Nup84 complex assembly (Figure 3A). All our solutions were similar, differing only in the degree of rotation along the Nup82 complex long axis relative to the Nup84 complex (Figure S6). The Nup82 holo-complex body associates through its spur-1 region with the Nup85/Seh1 arm on the Y-complex hub and the N-terminal side of Nup145C (Figure 3A), with the two complexes oriented orthogonally with respect to their long axis (Figure 3A). Our arrangement is supported by the tight clustering of cross-links between the Nup82 and Nup84 complexes mainly to two discrete locations, one on spur-1 and the other on a single region of the Nup85-Seh1 arm, respectively.
It has been previously shown that the Nup84 complex long-axis orientation is approximately parallel to the plane of the NE in the NPC’s outer ring (Alber et al., 2007a; Bui et al., 2013). Consequently, our structure reveals that the Nup82 holo-complex long axis is orthogonal to that of the Nup84 complex, forming a potential linker between the outer and inner ring. The coiled-coil bundles of the Nup82 holo-complex body form a scaffold, and their downward orientation makes it so that the FG plume in spur-2 projects from the bottom of the complex. The FG regions of Nsp1 and Nup159 would thus face the central transport channel and be adjacent to the Nsp1 FG regions emanating from the inner-ring Nic96 complex (Figure 3).
Our CX-MS analysis of the higher-order assembly also identified cross-links connecting other known components of the mRNA export machinery (Gle1, Nup42, and Nup116) to the

spur-2, and its C-terminal and the Dbp5-interacting domain facing downward toward the NPC central channel (Figure 3). Through its interaction with Gle1, Nup42 is also seemingly localized toward the central channel, in agreement with a recent report that showed how the FG region of Nup42 is fully functional if fused to the Gle1 C terminus (Adams et al., 2014). Thus, in our map, both the core of the Nup82 holo-complex and the Nup84 complex form a flexible scaffold, which organizes and properly orients the two functional ends (FG regions and enzymatic activities) of the cytoplasmic mRNA export machinery.

Functional Relationship between the Nup82 Holo-complex and the Nup84 Complex
To functionally annotate our Nup82-Nup84 complex assembly structure, we sought to investigate its relationship to mRNA export. Mutations affecting both Nup84 and Nup82 complex components have previously been shown to display characteristic mRNA export defects (Fabre and Hurt, 1997). Although the direct involvement of components of the Nup82 holo-complex in mRNA export has been long established (Fabre and Hurt, 1997), until now, the association of mRNA defects with the Nup84 complex has remained unclear. Thus, to identify regions of the Nup84 complex that are most relevant for mRNA export, we analyzed a collection of truncation mutants (Fernandez-Martinez et al., 2012). The mRNA export defect of each mutant was quantified and heat-mapped into the Nup84 complex structure (Figures 4 and S7). We detected a clear hotspot mapping to the Nup85/Seh1 arm (Figure 4), different from those determined for other Nup84 complex phenotypes (Fernandez-Martinez et al., 2012). Notably, this hotspot maps to where
Nup82 holo-complex (Figure 3; Table S3). The identified cross- the Nup85-Seh1 arm connects to the Nup82 holo-complex (Fig-
links are fully consistent with previous work showing physical ure 3). This significant structure-function correlation supports
connections between some of these components, such as the the idea that the mRNA export phenotype, focused to this part
C-termini of Gle1 and Nup42 (Strahm et al., 1999) and the C ter- of the Nup84 complex, is largely associated with a defective
minus of Nup116 to Nup82 (Yoshida et al., 2011), indicating that incorporation of the Nup82 complex into the NPC. To test
our CX-MS analysis is targeting bona fide physical connections this idea, we analyzed the in vivo localization of Nup82-
within the mRNA export machinery. In combination with pub- GFP in several Nup84 complex truncation mutants affecting
lished crystal structures of labile components of this machinery different parts of the Y-shaped complex. As shown in Figure 4,
(Montpetit et al., 2011; Ren et al., 2010), our data allowed us to the Nup82-GFP construct is indeed significantly mislocalized
assemble a physical map of the whole cytoplasmic mRNA export to the cytoplasm only in mutations affecting the Nup85/Seh1
platform comprising 16 different proteins (some in multiple arm, while a control Nup49-GFP reporter did not show similar
copies, so comprising 24 subunits) with a mass of 1.8 MDa behavior (Fernandez-Martinez et al., 2012). Thus, we conclude
(Figure 3B). The organization of the assembly reveals that the that the mRNA export phenotype found in Nup84 complex mu-
components actively involved in the mRNP remodeling process tants is mainly the consequence of a defective or weakened
(Dbp5, Gle1, and Nup159 N terminus) and associated FG regions incorporation of the Nup82 holo-complex into the NPC.
(Nup42, Nup116, Nup159, and Nsp1) are localized around the
Nup82 holo-complex and the short arms of the Nup84 complex. Conservation of the Cytoplasmic mRNA Export Platform
Remarkably, we identified ten cross-links connecting Gle1 to the in Opisthokonts
Nup82 holo-complex, delineating for the first time the position We tested whether our current structure was consistent with
and orientation of Gle1 in the NPC, adjacent to the Nup82 previous maps of the whole NPC. When the Nup82-Nup84
holo-complex and oriented with its N terminus toward the complex assembly is docked into our yeast NPC map (Alber
holo-complex hump, its middle region running parallel to et al., 2007b), the arrangement of their common components is
Gle1, and Nup159 N termini; PDB: 3RRM; Montpetit et al., 2011; Gle2/RAE1; PDB: 3MMY; Ren et al., 2010; Nup116 C termini; PDB: 3PBP; Yoshida et al., 2011;
and 3NF5; Sampathkumar et al., 2012). The Gle1 N terminus is represented with a homology model of its predicted coiled-coil region as a red ribbon inside a light
gray density of the approximate expected size for the domain.
See also Table S3.

Figure 4. mRNA Export Phenotype in Nup84 Complex Mutants Is Associated with Defective Incorporation of the Nup82 Holo-complex into
the NPC
(A) The mRNA export defect phenotype was quantified and plotted (mean value; n = 4) for each Nup84 complex component mutant in order of increasing level of
nuclear poly(A) mRNA accumulation as observed by fluorescence in situ hybridization (FISH) (see STAR Methods and Figure S7 for details) and assigned to five
divisions of increasing level of accumulation (white to dark purple) (Fernandez-Martinez et al., 2012). Representative examples of strains included in each division
are shown on the top. AU, arbitrary units. Error bars represent SEM. Scale bar, 5 μm.
(B) Mapping of the color code described in (A) into the Nup84 complex components. Horizontal lines represent the amino acid residue length of each protein and
truncated version; amino acid residue positions are shown on top of the lines.
(C) The severity of nuclear mRNA accumulation phenotype (detailed in A and B) for specific truncations of the Nup84 complex components are shown mapped
into the Nup82-Nup84 complex assembly. The color code is the same as the one described in (A). The Nup82 holo-complex density is shown in light blue.
(D) Subcellular localization of Nup82-GFP in Nup84 complex truncation mutants. Top: diagrams representing the Nup84 complex, with the corresponding
truncated region of the complex shown. Middle: localization of the genomically tagged Nup82-GFP reporter as determined by fluorescence microscopy. Bottom:
differential interference contrast (DIC) image of the same cells. Scale bar, 5 μm.
See also Figure S7.
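As an illustration of the quantification summarized in (A), the following Python sketch computes a mean and SEM from replicate measurements and assigns the mean to one of five divisions. It is a minimal sketch and not the analysis code used for this figure: the function names, the equal-width binning, and the lower and upper bounds passed to it are assumptions introduced here for illustration. The actual procedure is described in STAR Methods and Figure S7.

```python
import math
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for an SEM")
    # SEM = sample standard deviation / sqrt(number of replicates)
    return mean(values), stdev(values) / math.sqrt(len(values))


def assign_division(value, lower, upper, n_divisions=5):
    """Map a value onto divisions 1..n_divisions of increasing accumulation.

    Hypothetical scheme: equal-width bins between `lower` and `upper`.
    Values outside the range are clamped to the first or last division.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    fraction = (value - lower) / (upper - lower)
    index = int(fraction * n_divisions) + 1
    return max(1, min(n_divisions, index))


# Usage with made-up numbers (n = 4 replicates, arbitrary units).
replicates = [1.0, 2.0, 3.0, 4.0]
average, sem = mean_and_sem(replicates)
division = assign_division(average, lower=0.0, upper=10.0)
```

Equal-width bins are only one possible choice; quantile-based divisions would change which mutants share a color without changing their rank order.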
fully consistent, as shown in Figure 5A. The Nup82 holo-complex overlaps with the localization density of Nup82, facing down into the central channel, and is in close proximity to the Nup85 arm of the Nup84 complex.
Previous attempts to align a single EM envelope for the yeast Nup82 complex to a human cryo-EM NPC map (Bui et al., 2013) led to divergent and ambiguous results (Gaik et al., 2015). However, we were able to unambiguously dock the yeast Nup82-Nup84 complex assembly into the available human cryo-EM maps (Bui et al., 2013; von Appen et al., 2015). When the Nup84 complex was aligned to the corresponding inner copy of its homolog (the Nup107-160 complex), the Nup82 holo-complex aligned with a density projecting only from the cytoplasmic ring, pointing toward the central channel (Figures 5B and 5C). It has been suggested that this protrusion might indeed represent some aspect of the Nup88-Nup214 complex, the vertebrate counterpart to the Nup82 holo-complex (Bui et al., 2013). The yeast and human alignments both support an overall conservation for certain major features of NPC architecture between fungi and metazoa and provide further

Figure 5. Position of the Nup82-Nup84
Complex Assembly within the NPC
(A) Fitting to the yeast NPC map. Two views of
the optimized alignment of two S. cerevisiae
Nup82-Nup84 complex assemblies into the
S. cerevisiae NPC localization probability density
map (transparent gray), together with a side view of
the detailed alignment (Alber et al., 2007b); Nup85
(green), Nup133 (red), and two Nup82 units (blue
and orange) are indicated. Scale bars, 100 Å.
(B) Comparison with the human NPC tomographic
cryo-EM map (EMDB: 2444) (Bui et al., 2013).
Two views of the optimized alignment of two
S. cerevisiae Nup82-Nup84 complex assemblies
(pink and blue) into the human NPC map (CCC =
0.72). One suggested localization for the human
Nup214/Nup88 complex is colored in yellow.
(C) Comparison with the mutant human NPC
tomographic cryo-EM map (EMDB: 3104) (von
Appen et al., 2015), lacking an outer cytoplasmic
Y-complex ring (CCC = 0.81).
See also Figure S6.
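Assuming that CCC in (B) and (C) denotes a cross-correlation coefficient between the docked assembly and the cryo-EM map, the following Python sketch shows one common way such a score can be computed for two maps sampled on the same grid. It is a sketch under stated assumptions, not the procedure used for this figure: the function name, the flattened-list representation of the maps, and the `about_mean` option are hypothetical, and the fitting software and exact definition are not given in this caption.

```python
import math


def cross_correlation(map_a, map_b, about_mean=True):
    """Normalized cross-correlation of two equally sized density maps.

    Maps are given as flat sequences of voxel values on the same grid
    (an assumption made for this sketch). With about_mean=True the mean
    of each map is subtracted first, giving a Pearson-style coefficient.
    """
    if len(map_a) != len(map_b) or len(map_a) == 0:
        raise ValueError("maps must be non-empty and of equal size")
    a, b = list(map_a), list(map_b)
    if about_mean:
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        a = [x - mean_a for x in a]
        b = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if denominator == 0.0:
        raise ValueError("correlation is undefined for a constant map")
    return numerator / denominator


# Usage with toy values: identical shapes at different scales correlate fully.
score = cross_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Under this definition the score ranges from -1 to 1, so values such as 0.72 and 0.81 would indicate substantial but not perfect agreement between the two densities.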
independent validation of our Nup82-Nup84 complex assembly structure. Importantly, the position of the Nup82 holo-complex FG repeat regions with respect to the whole NPC is suggestive of an organized arrangement of transport factor docking sites (see Discussion).

DISCUSSION

Structure and Evolution of the Nup82 Holo-complex
We present the structure of the Nup82 holo-complex and show how it assembles with the Nup84 complex and other proteins to form the 24-subunit, 1.8-MDa cytoplasmic mRNA export platform in the NPC. Our structural analysis therefore covers close to one-third of the yeast NPC mass (Alber et al., 2007b), which is now mapped in molecular detail. Unexpectedly, the Nup82 holo-complex and its associated machinery do not form any kind of cytoplasmic filament, in contrast to how it has been pictured in the literature. On the contrary, it forms a strut that faces the central channel. The Nup82 holo-complex exhibits an unusual architecture, with two compositionally identical trimers forming an asymmetric structure. Hinges in coiled-coils allow flexibility to convert two otherwise identically arranged subunits into two similar but morphologically distinct subunits. This structural arrangement, with flexibility in the subunits permitting alternate assemblies, is reminiscent of how vesicle-coating proteins form variable architectures within the same coat complex, such as the hexagonal versus pentagonal architectures observed in clathrin-coated vesicles (Cheng et al., 2007). Perhaps this variability is another echo of the evolutionary origin of the NPC in an ancient coating complex (Devos et al., 2004), and it may

also contribute to the observed flexibility of the NPC as a whole.
Another feature shared by the NPC and its related coating complexes is the presence of compositionally distinct but structurally and evolutionarily related modules within the entire assembly that arose from ancient duplication events (Alber et al., 2007b; Devos et al., 2004; Fernandez-Martinez et al., 2012). Indeed, there is another NPC subcomplex that also uses a trimeric bundle and appears to be homologous and evolutionarily related to the Nup82 holo-complex. We discovered this relationship through a homolog detection search using HHpred (Söding, 2005), aiming to find structures comparable to the coiled-coil regions of the three core Nup82 complex components. Remarkably, the top and highly significant hit (HHpred p = 4.5E-60, 3.3E-9, and 0.0053 for Nup82, Nsp1, and Nup159, respectively) was another complex from the NPC also containing a heterotrimer of coiled-coils: the Xenopus laevis Nup93:Nup62:Nup58:Nup54 complex (Chug et al., 2015) and its Chaetomium thermophilum Nic96:Nsp1:Nup57:Nup49 complex homolog (Stuwe et al., 2015a) (Figure S3). This similarity aided in generating high-confidence comparative models for our calculations (STAR Methods). The C termini of both complexes share a common domain arrangement, formed by three consecutive helical coiled-coil regions of different lengths, connected by flexible linkers (Figure 1), and both complexes share a common component, Nsp1. Collectively, these observations further support the idea that both complexes evolved from a single common precursor structure, providing yet another example of an ancient duplication now generating diverse modules within the NPC, as postulated by our original protocoatomer hypothesis (Devos et al., 2004).

Spatial Organization of the FG Repeats
A common architecture and evolutionary origin might also imply a degree of shared functionality. In the case of the Nup82 holo-complex, the coiled-coil region serves as a strut to position various transport factor docking sites out from the core scaffold and toward the central channel of the NPC, where nucleocytoplasmic exchange is mediated (Figure 6). We therefore suggest that the coiled-coil trimeric region of the homologous Nic96:Nsp1:Nup57:Nup49 complex and that of the Nup82 holo-complex perform analogous functions, namely to serve as struts for the correct positioning of transport factor docking sites along the nucleocytoplasmic axis of the central transport channel (Figure 6). Being intrinsically disordered, the FG repeat regions themselves cannot form ordered structures to span the central channel. However, by providing a semi-rigid support, the coiled-coil regions of the two complexes may act as flexible struts, placing the FG Nup docking sites so that they efficiently occupy the central channel to form an effective selective barrier, perhaps such that the struts plus FG repeats together comprise the observed “central transporter” (Yang et al., 1998). Indeed, space-filling models based on size data for FG repeats (Yamada et al., 2010) (Figure 6) show that the FG regions would project from the Nup82 holo-complex in such a manner as to essentially span the NPC’s central channel and form the top, cytoplasmic part of the central transporter.

The Nup82 Complex Projects into the NPC’s Central Channel to Coordinate Efficient Export and Remodeling of mRNPs
The FG repeats associated with the Nup82 holo-complex project from the end of the complex adjacent to the Nic96 complex, toward the midplane of the central channel (Figure 6); there, they would neighbor the Nsp1, Nup57, and Nup49 FG repeats at the equator of the NPC (Kosinski et al., 2016; Lin et al., 2016; Stuwe et al., 2015a). It is known that the relative position of FG repeats in the Nup82 holo-complex is crucial (Adams et al., 2014) and that the Mex67/Mtr2 dimer mediating mRNA export directly engages the FG repeats associated with the Nup82 holo-complex (Strässer et al., 2000; Trahan and Oeffinger, 2016) (Figure 3). Collectively, these results suggest that the type and position of FG repeats in the Nup82 holo-complex are key for an efficient mRNA export mechanism.
Surprisingly, we show that the Nup82 holo-complex does not project outward from the cytoplasmic face of the NPC, as previously assumed. Instead, it projects inward, both radially and vertically. This arrangement has several important functional consequences. First, based on the organization of the Nup82 holo-complex, this places the associated cytoplasmically disposed FG repeat regions in intimate contact with the symmetrically positioned FG repeat regions in the central channel, forming a continuous conduit of transport factor docking sites from the nuclear to cytoplasmic sides of the NPC. Second, this arrangement also places the mRNP remodeling machineries at the immediate cytoplasmic end of this channel (Figures 5 and 6). We suggest that the Nup82 holo-complex and Nup84 complex position these cytoplasmic docking and remodeling sites right over the central channel to efficiently capture exporting mRNP particles immediately upon reaching the cytoplasmic end of the central channel; once captured, they can be directly processed by the proximally tethered Gle1/Dbp5/Nup159N remodeling machinery rather than requiring a transfer mechanism to previously supposed distal processing sites on cytoplasmic filaments. Third, the transport factors released during remodeling are also potentially well positioned to be recycled back into the nucleus, while the now translationally primed mRNP exits to the cytoplasm (Figure 6). Our molecular architecture is fully consistent with proposed mRNP remodeling models (Folkmann et al., 2011; Montpetit et al., 2011), as well as with the observation that cytoplasmic release, but not translocation, is a rate-limiting step during mRNA export (Oeffinger and Zenklusen, 2012). When translated into the overall NPC architecture, the presence of eight remodeling hubs surrounding the central channel ensures a highly efficient system consistent with the fast mRNA export rates observed in vivo (Grünwald and Singer, 2010; Mor et al., 2010; Smith et al., 2015). Other types of ribonucleoproteins are also actively exported through the NPC, using pathways and components that largely overlap with those of mRNA export (Nerurkar et al., 2015). It is thus reasonable to expect that our structural analysis would also serve as a framework for revealing the mechanisms governing their transit and maturation through the NPC.
While the Nup82 holo-complex is a major nexus for RNA export and remodeling processes, its human homolog when

Figure 6. The Nup82-Nup84 Complex Assembly Acts as a Scaffold to Organize the FG Region and mRNP Remodeling Sites in the NPC
Top: model for the arrangement of the FG regions associated with the Nup82 holo-complex. FG regions were modeled using molecular dynamics. The position of
the Nup116 FG regions is based on the position of their C termini (PDB: 3PBP; Yoshida et al., 2011) but could vary significantly, depending on the orientation of the
unstructured region connecting the FG domains (dotted blue line). N termini of Nup159 can interact with Dbp5 during mRNP remodeling, as indicated by the
dashed blue line. Sequential mRNP export and remodeling steps associated with each region of the complex are shown on the left. Bottom left: mapping of
disease-associated Gle1 mutations into our model for the mRNA export platform. The yeast Gle1 regions equivalent to those where disease-related mutations have been
found in human Gle1 are colored in purple (lethal congenital contracture syndrome 1 [LCCS-1]), gold (lethal arthrogryposis with anterior horn cell disease
[LAAHD]), and cyan (amyotrophic lateral sclerosis [ALS]), based on data described previously (Folkmann et al., 2014; Kaneb et al., 2015; Kendirgi et al., 2003).
Proteins are represented as described in Figure 3B. Dashed blue lines indicate identified protein-protein associations. Bottom right: schematic representation
comparing the previous view (left) of the Nup82 complex as components of cytoplasmically oriented filaments, with the new view (right) of how it instead forms
struts projecting toward the NPC central channel, positions the FG regions to fill the channel, and forms the top part of the central transporter region.
See also Table S3.

altered is also a major nexus for numerous diseases, as underscored by the fact that the mammalian orthologs of Nup82 (Nup88), Nup159 (Nup214), and Nup116 (Nup98) represent the Nups most prevalent in cancer and developmental diseases (Simon and Rout, 2014). Hence, our structure may also help rationalize the modifications in this machinery that lead to severe human diseases. For example, mutations in the human homolog of Gle1 are associated with lethal congenital contracture syndrome 1, lethal arthrogryposis with anterior horn cell disease (Nousiainen et al., 2008), and amyotrophic lateral sclerosis (Kaneb et al., 2015). We have been able to localize and orient Gle1 at the hump and spur-2 region of the Nup82 holo-complex, facing the NPC central channel (Figure 3). Our data (Table S3) also indicate that the C terminus of Nup42 is associated with the C terminus of Gle1 (Strahm et al., 1999), where the Nup159 N-terminal β-propellers are dynamically associated, with the Nup159 FG regions oriented toward the arms of the Nup84 complex (Figure 6) and Dbp5 physically associated with its ATPase cycle modulators Gle1 and the Nup159 β-propeller (Montpetit et al., 2011; Noble et al., 2011). Strikingly, the residues equivalent to those causing disease states in human Gle1 (Folkmann et al., 2014) all map to sites that anchor the yeast protein to either the Nup82 holo-complex or to Nup42 and Dbp5-Nup159N (Figure 6). These results, taken together with our structural and functional analyses, underscore the importance of the Nup82 complex as a hub for anchoring the mRNA transport and processing machineries into the heart of the NPC itself and help explain why this complex is a focus for so many developmental, oncogenic, and viral diseases.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ◦ Yeast Strains
• METHODS DETAILS
  ◦ Affinity Purification of Protein Complexes
  ◦ Stoichiometry of the Nup82 Holo-complex
  ◦ Chemical Cross-linking and Mass Spectrometry
  ◦ Chemical Cross-linking and Mass Spectrometry Analysis of the S. cerevisiae/S. kudriavzevii Nup82 Holo-complex
  ◦ Negative Stain Electron Microscopy
  ◦ Fluorescence In Situ Hybridization
  ◦ Fluorescence Microscopy
  ◦ Integrative Structure Determination
  ◦ Clustering
  ◦ Convergence of Sampling
  ◦ Estimating Structure Precision Based on Variability in the Ensemble of Good-Scoring Structures
  ◦ Fit to Input Information
  ◦ Satisfaction of Data that Were Not Used to Compute Structures
  ◦ GFP Mass-Tagging Electron Microscopy
  ◦ Docking of the Nup82 Holo-complex and the Y-Shape Nup84 Complex
• QUANTIFICATION AND STATISTICAL ANALYSIS
• DATA AND SOFTWARE AVAILABILITY
  ◦ Software
  ◦ Data Resources

SUPPLEMENTAL INFORMATION

Supplemental Information includes seven figures, four tables, and two movies and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.028.

AUTHOR CONTRIBUTIONS

Conceptualization, J.F.-M., S.J.K., Y.S., R.P., B.T.C., A.S., and M.P.R.; Investigation, J.F.-M., S.J.K., P.U., Y.S., R.W., I.N., W.Z., W.J.R., M.G., and D.Z.; Formal Analysis, S.J.K., R.P., J.W., I.E.C., and A.S.; Writing, J.F.-M., S.J.K., A.S., B.T.C., and M.P.R.; Funding Acquisition, D.L.S., D.Z., A.S., B.T.C., and M.P.R.; Supervision, D.L.S., D.Z., A.S., B.T.C., and M.P.R.

ACKNOWLEDGMENTS

We would like to thank S.R. Wente, R. Sadeh, and A. Krutchinsky for sharing yeast strains and plasmids; K. Uryu and the EMRC Resource Center at The Rockefeller University for assistance with negative stain EM; NYSGRC for providing samples for SAXS; T. Matsui and T.M. Weiss at SSRL, SLAC National Accelerator Laboratory for assistance with collecting SAXS data; and B. Raveh at UCSF for computing FG Nup models. Support was provided by the Simons Foundation grant 349247 (Simons Electron Microscopy Center, NYSBC), the NSERC, Canadian Institutes of Health Research grant MOP-232642, and the Canadian Foundation for Innovation (D.Z.), as well as NSF graduate research fellowship 1650113 (I.E.C.) and NIH grants U54 GM103511 (B.T.C., A.S., and M.P.R.), R01 GM112108 (M.P.R.), P41 GM109824 (M.P.R., A.S., and B.T.C.), P41 GM103314 (B.T.C.), and R01 GM083960 (A.S.).

Received: April 13, 2016
Revised: July 20, 2016
Accepted: October 14, 2016
Published: November 10, 2016

REFERENCES

Adams, R.L., Terry, L.J., and Wente, S.R. (2014). Nucleoporin FG domains facilitate mRNP remodeling at the cytoplasmic face of the nuclear pore complex. Genetics 197, 1213–1224.
Alber, F., Dokudovskaya, S., Veenhoff, L.M., Zhang, W., Kipper, J., Devos, D., Suprapto, A., Karni-Schmidt, O., Williams, R., Chait, B.T., et al. (2007a). Determining the architectures of macromolecular assemblies. Nature 450, 683–694.
Alber, F., Dokudovskaya, S., Veenhoff, L.M., Zhang, W., Kipper, J., Devos, D., Suprapto, A., Karni-Schmidt, O., Williams, R., Chait, B.T., et al. (2007b). The molecular architecture of the nuclear pore complex. Nature 450, 695–701.
Borneman, A.R., Desany, B.A., Riches, D., Affourtit, J.P., Forgan, A.H., Pretorius, I.S., Egholm, M., and Chambers, P.J. (2012). The genome sequence of the wine yeast VIN7 reveals an allotriploid hybrid genome with Saccharomyces cerevisiae and Saccharomyces kudriavzevii origins. FEMS Yeast Res. 12, 88–96.
Bui, K.H., von Appen, A., DiGuilio, A.L., Ori, A., Sparks, L., Mackmull, M.T., Bock, T., Hagen, W., Andrés-Pons, A., Glavy, J.S., and Beck, M. (2013). Integrated structural analysis of the human nuclear pore complex scaffold. Cell 155, 1233–1243.
Cheng, Y., Boll, W., Kirchhausen, T., Harrison, S.C., and Walz, T. (2007). Cryo-electron tomography of clathrin-coated vesicles: structural implications for coat assembly. J. Mol. Biol. 365, 892–899.

Chug, H., Trakhanov, S., Hülsmann, B.B., Pleiner, T., and Görlich, D. (2015). Crystal structure of the metazoan Nup62•Nup58•Nup54 nucleoporin complex. Science 350, 106–110.
Cox, J., and Mann, M. (2008). MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372.
Degiacomi, M.T., Iacovache, I., Pernot, L., Chami, M., Kudryashev, M., Stahlberg, H., van der Goot, F.G., and Dal Peraro, M. (2013). Molecular assembly of the aerolysin pore reveals a swirling membrane-insertion mechanism. Nat. Chem. Biol. 9, 623–629.
Devos, D., Dokudovskaya, S., Alber, F., Williams, R., Chait, B.T., Sali, A., and Rout, M.P. (2004). Components of coated vesicles and nuclear pore complexes share a common molecular architecture. PLoS Biol. 2, e380.
Ding, C., Li, Y., Kim, B.J., Malovannaya, A., Jung, S.Y., Wang, Y., and Qin, J. (2011). Quantitative analysis of cohesin complex stoichiometry and SMC3 modification-dependent protein interactions. J. Proteome Res. 10, 3652–3659.
Erickson, H.P. (2009). Size and shape of protein molecules at the nanometer level determined by sedimentation, gel filtration, and electron microscopy. Biol. Proced. Online 11, 32–51.
Fabre, E., and Hurt, E. (1997). Yeast genetics to dissect the nuclear pore complex and nucleocytoplasmic trafficking. Annu. Rev. Genet. 31, 277–313.
Fernandez-Martinez, J., Phillips, J., Sekedat, M.D., Diaz-Avalos, R., Velazquez-Muriel, J., Franke, J.D., Williams, R., Stokes, D.L., Chait, B.T., Sali, A., and Rout, M.P. (2012). Structure-function mapping of a heptameric module in the nuclear pore complex. J. Cell Biol. 196, 419–434.
Folkmann, A.W., Noble, K.N., Cole, C.N., and Wente, S.R. (2011). Dbp5, Gle1-IP6 and Nup159: a working model for mRNP export. Nucleus 2, 540–548.
Folkmann, A.W., Dawson, T.R., and Wente, S.R. (2014). Insights into mRNA export-linked molecular mechanisms of human disease through a Gle1 structure-function analysis. Adv. Biol. Regul. 54, 74–91.
Frank, J., Radermacher, M., Penczek, P., Zhu, J., Li, Y., Ladjadj, M., and Leith, A. (1996). SPIDER and WEB: processing and visualization of images in 3D electron microscopy and related fields. J. Struct. Biol. 116, 190–199.
Gaik, M., Flemming, D., von Appen, A., Kastritis, P., Mücke, N., Fischer, J., Stelter, P., Ori, A., Bui, K.H., Baßler, J., et al. (2015). Structural basis for assembly and function of the Nup82 complex in the nuclear pore scaffold. J. Cell Biol. 208, 283–297.
Gouy, M., Guindon, S., and Gascuel, O. (2010). SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 27, 221–224.
Griffith, O.M. (1994). Techniques of Preparative, Zonal, and Continuous Flow Ultracentrifugation, S.D. Applications Research Department (Beckman Instruments, Inc.).
Grünwald, D., and Singer, R.H. (2010). In vivo imaging of labelled endogenous β-actin mRNA during nucleocytoplasmic transport. Nature 467, 604–607.
Hough, L.E., Dutta, K., Sparks, S., Temel, D.B., Kamal, A., Tetenbaum-Novatt, J., Rout, M.P., and Cowburn, D. (2015). The molecular mechanism of nuclear transport revealed by atomic-scale measurements. eLife 4, e10027.
Jones, D.T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195–202.
Kaneb, H.M., Folkmann, A.W., Belzil, V.V., Jao, L.E., Leblond, C.S., Girard, S.L., Daoud, H., Noreau, A., Rochefort, D., Hince, P., et al. (2015). Deleterious mutations in the essential mRNA metabolism factor, hGle1, in amyotrophic lateral sclerosis. Hum. Mol. Genet. 24, 1363–1373.
Kelley, K., Knockenhauer, K.E., Kabachinski, G., and Schwartz, T.U. (2015). Atomic structure of the Y complex of the nuclear pore. Nat. Struct. Mol. Biol. 22, 425–431.
Kellner, N., Schwarz, J., Sturm, M., Fernandez-Martinez, J., Griesel, S., Zhang, W., Chait, B.T., Rout, M.P., Kück, U., and Hurt, E. (2016). Developing genetic tools to exploit Chaetomium thermophilum for biochemical analyses of eukaryotic macromolecular assemblies. Sci. Rep. 6, 20937.
Kendirgi, F., Barry, D.M., Griffis, E.R., Powers, M.A., and Wente, S.R. (2003). An essential role for hGle1 nucleocytoplasmic shuttling in mRNA export. J. Cell Biol. 160, 1029–1040.
Knockenhauer, K.E., and Schwartz, T.U. (2016). The Nuclear Pore Complex as a Flexible and Dynamic Gate. Cell 164, 1162–1171.
Kosinski, J., Mosalaganti, S., von Appen, A., Teimer, R., DiGuilio, A.L., Wan, W., Bui, K.H., Hagen, W.J., Briggs, J.A., Glavy, J.S., et al. (2016). Molecular architecture of the inner ring scaffold of the human nuclear pore complex. Science 352, 363–365.
Lin, D.H., Stuwe, T., Schilbach, S., Rundlet, E.J., Perriches, T., Mobbs, G., Fan, Y., Thierbach, K., Huber, F.M., Collins, L.N., et al. (2016). Architecture of the symmetric core of the nuclear pore. Science 352, aaf1015.
Ludtke, S.J., Baldwin, P.R., and Chiu, W. (1999). EMAN: semiautomated software for high-resolution single-particle reconstructions. J. Struct. Biol. 128, 82–97.
Lund, M.K., and Guthrie, C. (2005). The DEAD-box protein Dbp5p is required to dissociate Mex67p from exported mRNPs at the nuclear rim. Mol. Cell 20, 645–651.
Lupas, A., Van Dyke, M., and Stock, J. (1991). Predicting coiled coils from protein sequences. Science 252, 1162–1164.
McDonald, J.H. (2014). Handbook of Biological Statistics, Third Edition (Sparky House Publishing).
Montpetit, B., Thomsen, N.D., Helmke, K.J., Seeliger, M.A., Berger, J.M., and Weis, K. (2011). A conserved mechanism of DEAD-box ATPase activation by nucleoporins and InsP6 in mRNA export. Nature 472, 238–242.
Mor, A., Suliman, S., Ben-Yishay, R., Yunger, S., Brody, Y., and Shav-Tal, Y. (2010). Dynamics of single mRNP nucleocytoplasmic transport and export through the nuclear pore in living cells. Nat. Cell Biol. 12, 543–552.
Nerurkar, P., Altvater, M., Gerhardy, S., Schütz, S., Fischer, U., Weirich, C., and Panse, V.G. (2015). Eukaryotic ribosome assembly and nuclear export. Int. Rev. Cell Mol. Biol. 319, 107–140.
Noble, K.N., Tran, E.J., Alcázar-Román, A.R., Hodge, C.A., Cole, C.N., and Wente, S.R. (2011). The Dbp5 cycle at the nuclear pore complex during mRNA export II: nucleotide cycling and mRNP remodeling by Dbp5 are controlled by Nup159 and Gle1. Genes Dev. 25, 1065–1077.
Nousiainen, H.O., Kestilä, M., Pakkasjärvi, N., Honkala, H., Kuure, S., Tallila, J., Vuopala, K., Ignatius, J., Herva, R., and Peltonen, L. (2008). Mutations in mRNA export mediator GLE1 result in a fetal motoneuron disease. Nat. Genet. 40, 155–157.
Oeffinger, M., and Zenklusen, D. (2012). To the pore and through the pore: a story of mRNA export kinetics. Biochim. Biophys. Acta 1819, 494–506.
Petoukhov, M.V., Franke, D., Shkumatov, A.V., Tria, G., Kikhney, A.G., Gajda, M., Gorba, C., Mertens, H.D., Konarev, P.V., and Vergun, D.I. (2012). New developments in the ATSAS program package for small-angle scattering data analysis. J. Appl. Crystallogr. 45, 342–350.
Pratt, J.M., Simpson, D.M., Doherty, M.K., Rivers, J., Gaskell, S.J., and Beynon, R.J. (2006). Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Nat. Protoc. 1, 1029–1043.
Ren, Y., Seo, H.S., Blobel, G., and Hoelz, A. (2010). Structural and functional analysis of the interaction between the nucleoporin Nup98 and the mRNA export factor Rae1. Proc. Natl. Acad. Sci. USA 107, 10406–10411.
Romes, E.M., Tripathy, A., and Slep, K.C. (2012). Structure of a yeast Dyn2-Nup159 complex and molecular basis for dynein light chain-nuclear pore interaction. J. Biol. Chem. 287, 15862–15873.
Rout, M.P., Aitchison, J.D., Suprapto, A., Hjertaas, K., Zhao, Y., and Chait, B.T. (2000). The yeast nuclear pore complex: composition, architecture, and transport mechanism. J. Cell Biol. 148, 635–651.
Russel, D., Lasker, K., Webb, B., Velázquez-Muriel, J., Tjioe, E., Schneidman-Duhovny, D., Peterson, B., and Sali, A. (2012). Putting the pieces together: integrative modeling platform software for structure determination of macromolecular assemblies. PLoS Biol. 10, e1001244.
Cell 167, 1215–1228, November 17, 2016 1227

Sali, A., and Blundell, T.L. (1993). Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815.
Sali, A., Berman, H.M., Schwede, T., Trewhella, J., Kleywegt, G., Burley, S.K., Markley, J., Nakamura, H., Adams, P., Bonvin, A.M., et al. (2015). Outcome of the first wwPDB Hybrid/Integrative Methods Task Force Workshop. Structure 23, 1156–1167.
Sampathkumar, P., Kim, S.J., Manglicmot, D., Bain, K.T., Gilmore, J., Gheyi, T., Phillips, J., Pieper, U., Fernandez-Martinez, J., Franke, J.D., et al. (2012). Atomic structure of the nuclear pore complex targeting domain of a Nup116 homologue from the yeast, Candida glabrata. Proteins 80, 2110–2116.
Schneidman-Duhovny, D., Hammel, M., and Sali, A. (2010). FoXS: a web server for rapid computation and fitting of SAXS profiles. Nucleic Acids Res. 38, W540–W544.
Schneidman-Duhovny, D., Pellarin, R., and Sali, A. (2014). Uncertainty in integrative structural modeling. Curr. Opin. Struct. Biol. 28, 96–104.
Shi, Y., Fernandez-Martinez, J., Tjioe, E., Pellarin, R., Kim, S.J., Williams, R., Schneidman-Duhovny, D., Sali, A., Rout, M.P., and Chait, B.T. (2014). Structural characterization by cross-linking reveals the detailed architecture of a coatomer-related heptameric module from the nuclear pore complex. Mol. Cell. Proteomics 13, 2927–2943.
Shi, Y., Pellarin, R., Fridy, P.C., Fernandez-Martinez, J., Thompson, M.K., Li, Y., Wang, Q.J., Sali, A., Rout, M.P., and Chait, B.T. (2015). A strategy for dissecting the architectures of native macromolecular assemblies. Nat. Methods 12, 1135–1138.
Simon, D.N., and Rout, M.P. (2014). Cancer and the nuclear pore complex. Adv. Exp. Med. Biol. 773, 285–307.
Smith, C., Lari, A., Derrer, C.P., Ouwehand, A., Rossouw, A., Huisman, M., Dange, T., Hopman, M., Joseph, A., Zenklusen, D., et al. (2015). In vivo single-particle imaging of nuclear mRNA export in budding yeast demonstrates an essential role for Mex67p. J. Cell Biol. 211, 1121–1130.
Söding, J. (2005). Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951–960.
Strahm, Y., Fahrenkrog, B., Zenklusen, D., Rychner, E., Kantor, J., Rosbach, M., and Stutz, F. (1999). The RNA export factor Gle1p is located on the cytoplasmic fibrils of the NPC and physically interacts with the FG-nucleoporin Rip1p, the DEAD-box protein Rat8p/Dbp5p and a new protein Ymr 255p. EMBO J. 18, 5761–5777.
Strässer, K., Bassler, J., and Hurt, E. (2000). Binding of the Mex67p/Mtr2p heterodimer to FXFG, GLFG, and FG repeat nucleoporins is essential for nuclear mRNA export. J. Cell Biol. 150, 695–706.
Strawn, L.A., Shen, T., Shulga, N., Goldfarb, D.S., and Wente, S.R. (2004). Minimal nuclear pore complexes define FG repeat domains essential for transport. Nat. Cell Biol. 6, 197–206.
Stuwe, T., Bley, C.J., Thierbach, K., Petrovic, S., Schilbach, S., Mayo, D.J., Perriches, T., Rundlet, E.J., Jeon, Y.E., Collins, L.N., et al. (2015a). Architecture of the fungal nuclear pore inner ring complex. Science 350, 56–64.
Stuwe, T., Correia, A.R., Lin, D.H., Paduch, M., Lu, V.T., Kossiakoff, A.A., and Hoelz, A. (2015b). Nuclear pores. Architecture of the nuclear pore complex coat. Science 347, 1148–1152.
Trahan, C., and Oeffinger, M. (2016). Targeted cross-linking-mass spectrometry determines vicinal interactomes within heterogeneous RNP complexes. Nucleic Acids Res. 44, 1354–1369.
Trigg, J., Gutwin, K., Keating, A.E., and Berger, B. (2011). Multicoil2: predicting coiled coils and their oligomerization states from sequence in the twilight zone. PLoS ONE 6, e23519.
von Appen, A., Kosinski, J., Sparks, L., Ori, A., DiGuilio, A.L., Vollmer, B., Mackmull, M.T., Banterle, N., Parca, L., Kastritis, P., et al. (2015). In situ structural analysis of the human nuclear pore complex. Nature 526, 140–143.
Ward, J.J., McGuffin, L.J., Bryson, K., Buxton, B.F., and Jones, D.T. (2004). The DISOPRED server for the prediction of protein disorder. Bioinformatics 20, 2138–2139.
Weirich, C.S., Erzberger, J.P., Berger, J.M., and Weis, K. (2004). The N-terminal domain of Nup159 forms a beta-propeller that functions in mRNA export by tethering the helicase Dbp5 to the nuclear pore. Mol. Cell 16, 749–760.
Yamada, J., Phillips, J.L., Patel, S., Goldfien, G., Calestagne-Morelli, A., Huang, H., Reza, R., Acheson, J., Krishnan, V.V., Newsam, S., et al. (2010). A bimodal distribution of two distinct categories of intrinsically disordered structures with separate functions in FG nucleoporins. Mol. Cell. Proteomics 9, 2205–2224.
Yang, Q., Rout, M.P., and Akey, C.W. (1998). Three-dimensional architecture of the isolated yeast nuclear pore complex: functional and evolutionary implications. Mol. Cell 1, 223–234.
Yang, B., Wu, Y.J., Zhu, M., Fan, S.B., Lin, J., Zhang, K., Li, S., Chi, H., Li, Y.X., Chen, H.F., et al. (2012a). Identification of cross-linked peptides from complex samples. Nat. Methods 9, 904–906.
Yang, Z., Fang, J., Chittuluru, J., Asturias, F.J., and Penczek, P.A. (2012b). Iterative stable alignment and clustering of 2D transmission electron microscope images. Structure 20, 237–247.
Yoshida, K., Seo, H.S., Debler, E.W., Blobel, G., and Hoelz, A. (2011). Structural and functional analysis of an essential nucleoporin heterotrimer on the cytoplasmic face of the nuclear pore complex. Proc. Natl. Acad. Sci. USA 108, 16571–16576.

STAR★METHODS
KEY RESOURCES TABLE

Antibodies
Rabbit IgG Protein A Purified Innovative Research Cat.# IR-RB-GF
Uranyl formate Electron Microscopy Sciences Cat. #22451 Cas. #16984-59-1
PreScission protease GE Healthcare Life Sciences Cat.# 27-0843-01
Coomassie R250 MP Biomedicals Cat.# 190682
GelCode Blue Stain Reagent Thermo Fisher Scientific Cat.# 24592
Trypsin Sequencing Grade, modified Roche Cat.# 11418033001
Endoproteinase Lys-C Sequencing Grade Roche Cat.# 11047825001
DSS (DiSuccinimidylSuberate)-H12/D12 Creative Molecules Cat.# 001S
EDC (1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride) Thermo Fisher Scientific Cat.# PI22980
Sulfo-NHS (N-hydroxysulfosuccinimide) Thermo Fisher Scientific Cat.# P124510
Iodoacetamide Sigma Cat.# I6125-10 g
L-arginine:HCl 13C6 Cambridge Isotope Laboratories Inc. Cat.# CNLM-539-H-
L-lysine:2HCl 13C6 Cambridge Isotope Laboratories Inc. Cat.# CNLM-291-H
TCEP Thermo Fisher Scientific Cat.# PI20491
Nupage LDS Sample buffer Life Technologies Cat.# NP0007
Poly-L-lysine Solution Sigma-Aldrich Cat.# P8920
Formamide Sigma-Aldrich Cat.# F9037
Lyticase from Arthrobacter luteus Sigma-Aldrich Cat.# L2524
tRNA from E. coli MRE 600 Roche Cat.# 10109541001
32% Paraformaldehyde (formaldehyde) aqueous solution Electron Microscopy Sciences Cat.# 15714
Ribonucleoside-vanadyl complex New England Biolabs Cat.# S1402S
Ultrapure Salmon Sperm DNA Solution Invitrogen Cat.# 15632011
Prolong Gold Invitrogen Cat.# P36935
Dynabeads M270 Epoxy Thermo Fisher Scientific Cat # 143.02D
SYPRO Ruby Protein Gel Stain Thermo Fisher Scientific Cat.# S12000
Gel Filtration HMW Calibration Kit GE Healthcare Life Sciences Cat.# 28-4038-42
BugBuster Extraction Reagent EMD Millipore Cat# 70921-4
Glutathione Sepharose 4b GE Healthcare Bioscience Cat# 17-0756-05
His-Trap HP GE Healthcare Biosciences Cat# 17-5247-01
Deposited Data
Chemical Cross-linking with Mass Spectrometry readout datasets Chorus https://chorusproject.org/pages/index.html
Files containing the input data, scripts, and output structures N/A https://salilab.org/nup82 https://github.com/salilab/nup82
Experimental Models: Cell Lines

MATa/MATα ade2-1/ade2-1 ura3-1/ura3-1 his3-11,15/his3-11,15 trp1-1/trp1-1 leu2-3,112/leu2-3,112 can1-100/can1-100 WT W303
MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 WT W303a
MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 WT W303a
MATa/MATα ura3-52/ura3-52 his3-Δ200/his3-Δ200 trp1-1/trp1-1 leu2-3,112/leu2-3,112 lys2-801/lys2-801 WT DF5
MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 NUP82-PPX-ProteinA::HIS5 This study Nup82-PPX-PrA
MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 NUP159-PPX-ProteinA::HIS5 This study Nup159-PPX-PrA
MATa/MATα ura3-52/ura3-52 his3-Δ200/his3-Δ200 trp1-1/trp1-1 leu2-3,112/leu2-3,112 lys2-801/lys2-801 NUP82/NUP82-ProteinA::HIS5 This study Nup82-PrA 2n
MATa/MATα ura3-52/ura3-52 his3-Δ200/his3-Δ200 trp1-1/trp1-1 leu2-3,112/leu2-3,112 lys2-801/lys2-801 NUP159/NUP159-ProteinA::HIS5 Rout et al., 2000 Nup159-PrA 2n
MATa/MATα ura3-52/ura3-52 his3-Δ200/his3-Δ200 trp1-1/trp1-1 leu2-3,112/leu2-3,112 lys2-801/lys2-801 NSP1/NSP1-ProteinA::HIS5 Rout et al., 2000 Nsp1-PrA 2n
MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 NUP82-PPX-ProteinA::HIS5 NUP159-GFP-3xFlag-6xHis::klURA3 This study Nup82-PPX-PrA/Nup159-GFP3xF6xH
MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 NUP159-PPX-ProteinA::HIS5 NUP82-GFP-3xFlag-6xHis::klURA3 This study Nup159-PPX-PrA/Nup82-GFP3xF6xH
MATa ade2-1 ura3-1 his3-11,15 trp1-1 lys2 leu2-3,112 can1-100 Flag-LoxP-nsp1ΔFXFG-ΔFG NUP82-PPX-ProteinA::HIS5 This study and Strawn et al., 2004 Nup82-PPX-PrA/Nsp1ΔFXFG-ΔFG
MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 LoxP-NUP120Δ(aa1-396)-ProteinA::HIS5 This study Nup120(397-1037)/Nup82-GFP3xF6xH
MATa ura3-52 his3-Δ200 trp1-1 leu2-3,112 lys2-801 LoxP-NUP85Δ(aa1-232)-ProteinA::HIS5 NUP82-GFP-3xFlag-6xHis::klURA3 This study Nup85(233-744)/Nup82-GFP3xF6xH
MATa ura3-52 his3-Δ200 trp1-1 leu2-3,112 lys2-801 NUP85-ProteinA::HIS5 NUP82-GFP-3xFlag-6xHis::klURA3 This study Nup85wt-PrA/Nup82-GFP3xF6xH
MATa ura3-52 his3-Δ200 trp1-1 leu2-3,112 lys2-801 NUP84Δ(aa574-726)-ProteinA::HIS5 NUP82-GFP-3xFlag-6xHis::klURA3 This study Nup84(1-573)/Nup82-GFP3xF6xH
MATa ura3-52 his3-Δ200 trp1-1 leu2-3,112 lys2-801 NUP145cΔ(aa317-326)-ProteinA::HIS5 NUP82-GFP-3xFlag-6xHis::klURA3 This study Nup145c(1-316)-(327-712)/Nup82-GFP3xF6xH
MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 NUP84-PPX-ProteinA::HIS5 Fernandez-Martinez et al., 2012 Nup84wt-PPX-PrA
Saccharomyces kudriavzevii American Type Culture Collection (ATCC) ATCC2601
Recombinant DNA
Nup82 QconCAT (synthetic concatamer of tryptic peptides from Nup82 complex components used as an internal standard for quantitative mass spectrometry) Integrated DNA Technologies N/A
pGEX6p-1 (Ding et al., 2011) N/A
p424-Gal1-Dyn2 This study S. cerevisiae Dyn2 UniProt Q02647
pProtA/HIS5 (Fernandez-Martinez et al., 2012) N/A
pAG305GPD-ccdb-EGFP Addgene #14186

pAG305-skNup82ppx-EGFP This study skNup82 GenBank: EHN01740.1
Oligo skN82Prom-F (see STAR Methods for sequence) This study N/A
Oligo skN82GTW_R2 (see STAR Methods for sequence) This study N/A
Oligo dT probe (see STAR Methods for sequence) Exiqon N/A
EMAN Ludtke et al., 1999 http://blake.bcm.edu/emanwiki/EMAN1
Iterative Stable Alignment and Clustering (ISAC) Yang et al., 2012b http://sparx-em.org/sparxwiki/sxisac
Spider Frank et al., 1996 http://spider.wadsworth.org/spider_doc/spider/docs/spider.html
CX-Circos N/A http://cx-circos.net
MaxQuant (version 1.2.2.5) Cox and Mann, 2008 http://www.coxdocs.org/doku.php?id=maxquant:start
ImageJ NIH https://imagej.nih.gov/ij/
pLink Yang et al., 2012a http://pfind.ict.ac.cn/software/pLink/
Openlab Perkin Elmer http://cellularimaging.perkinelmer.com/support/openlab_resources/
Integrative Modeling Platform (IMP), version 2.5 and Python Modeling Interface (PMI), version c7411c3 Russel et al., 2012 https://integrativemodeling.org
MODELLER 9.13 Sali and Blundell, 1993 https://salilab.org/modeller/
HHPred Söding, 2005 https://toolkit.tuebingen.mpg.de/hhpred
PSIPRED Jones, 1999 http://bioinf.cs.ucl.ac.uk/psipred/
DISOPRED Ward et al., 2004 http://bioinf.cs.ucl.ac.uk/psipred/?disopred=1
DomPred http://bioinf.cs.ucl.ac.uk/psipred/?dompred=1
COILS/PCOILS Lupas et al., 1991 https://toolkit.tuebingen.mpg.de/pcoils
Multicoil2 Trigg et al., 2011 http://groups.csail.mit.edu/cb/multicoil2/cgi-bin/multicoil2.cgi
SeaView, version 4.6 Gouy et al., 2010 http://doua.prabi.fr/software/seaview
UCSF Chimera, version 1.10 https://www.cgl.ucsf.edu/chimera/
Matplotlib, version 1.5 http://matplotlib.org/
GNUPLOT, version 4.8 Open software maintained by the developer community http://www.gnuplot.info/
FoXS Schneidman-Duhovny et al., 2010 https://modbase.compbio.ucsf.edu/foxs/index.html
ATSAS package (DAMMIF/DAMMIN/DAMAVER/PRIMUS), version 2.6 Petoukhov et al., 2012 https://www.embl-hamburg.de/biosaxs/software.html
SASTOOL, version 0.9.5.3 SSRL beamline 4-2 at SLAC http://ssrl.slac.stanford.edu/saxs/analysis/sastool.htm
Scatter SIBYLS beamline 12.3.1 at LBNL https://bl1231.als.lbl.gov/scatter/
SAXS MOW, version 1.0 SAXS beam line at the Brazilian Synchrotron Light National Laboratory http://www.if.sc.usp.br/saxs/obsolete/saxsmow.html

Other
Superose 6 GL 30/100 GE Healthcare Life Sciences Cat.# 17-5172-01
Orbitrap Fusion Mass Spectrometer Thermo Fisher Scientific N/A
Easy-nLC 1000 HPLC Thermo Fisher Scientific N/A
Easy-Spray electrospray source Thermo Fisher Scientific N/A
NuPage 4-12% Bis-Tris Gel 1.0mm x 10 well Thermo Fisher Scientific Cat.# NP0321Box
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for reagents may be directed to and will be fulfilled by the Lead Contact author Michael P. Rout
(rout@rockefeller.edu).
Yeast Strains
All Saccharomyces cerevisiae strains used in this study are listed in the Key Resources Table, with the exception of the Nup84 complex
truncation mutants, which were described in detail in Fernandez-Martinez et al. (2012). The Nup82 complex tagged strains were
constructed in a W303 (MATa/alpha ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100) background. Unless otherwise stated, strains
were grown at 30°C in YPD media (1% yeast extract, 2% bactopeptone, and 2% glucose). The Saccharomyces kudriavzevii strain
was obtained from the American Type Culture Collection (ATCC 2601) and grown under the same conditions as described above for
S. cerevisiae.
METHOD DETAILS
Affinity Purification of Protein Complexes

To purify the native Nup82 complex, hereafter termed the Nup82 holo-complex (as it includes all its intact, full-length endogenous
components), we constructed strains in which the NUP-encoding gene was genomically tagged with a variant of the Staphylococcus
aureus Protein A, preceded by the human rhinovirus 3C protease (ppx) target sequence (GLEVLFQGPS). The sequence
was introduced by PCR amplification of the transformation cassette from the plasmid pProtA/HIS5. Yeast cells grown in
YPD at 30°C to mid-log phase were harvested, frozen in liquid nitrogen, and cryogenically lysed in a Retsch PM100 planetary ball mill
(http://lab.rockefeller.edu/rout/protocols). A total of 10-20 g of frozen cell powder was resuspended in 9 volumes of IP buffer (20 mM HEPES
pH 7.4, 300 mM NaCl, 2 mM MgCl2, 0.1% Tween 20, 1 mM DTT). The cell lysate was clarified by centrifugation at 20,000 × g for 10 min.
IgG antibody-conjugated magnetic beads (Invitrogen), at 50 µL of slurry per g of frozen powder, were added to the clarified cell lysate
and incubated for 30 min at 4°C. Beads were washed three times with 1 mL of IP buffer without protease inhibitors. The native complex
was released from the affinity matrix by PreScission protease digestion in the same buffer. The recovered sample was then
centrifuged at 20,000 × g for 10 min. The supernatant (50-100 µL) was loaded on top of a 5%–20% sucrose gradient made in IP buffer
without Tween 20 plus 1/1000 of protease inhibitors. Gradients were ultracentrifuged in a SW55 Ti rotor (Beckman) at 42,000 rpm
and 5°C for 17 hr. Gradients were manually unloaded from the top in 12 fractions of 410 µL. Fractions were analyzed by SDS-PAGE
and R250 Coomassie or Sypro Ruby staining.
A higher order complex, containing the Nup84 complex plus several other nups, including the Nup82 holo-complex components,
was identified previously (Fernandez-Martinez et al., 2012). The complex was affinity purified from a Nup84-ppx-PrA strain (STAR
Methods; Key Resources Table) as described above using as IP buffer 20mM HEPES pH 7.4, 20mM NaCl, 150mM potassium ac-
etate, 2mM MgCl2, 0.5% Triton X-100, 0.1% Tween 20, 1mM DTT, and processed for cross-linking and mass spectrometry analysis.
Stoichiometry of the Nup82 Holo-complex

Diploid strains carrying one wild-type and one Protein A-tagged version of each of the major Nup82 holo-complex components were
analyzed by affinity purification as described above, and the identity of the bands was verified by mass spectrometry (Figure S1). To
determine the Stokes radius (Rs) of the Nup82 holo-complex, the natively eluted complex was run through a calibrated Superose 6 GL
30/100 column in 20 mM HEPES pH 7.4, 150 mM NaCl, 0.1% Tween 20 buffer, and the results were plotted against reference protein
standards (Ovalbumin, Rs: 3.05 nm; Aldolase, Rs: 4.81 nm; Ferritin, Rs: 6.1 nm; Thyroglobulin, Rs: 8.5 nm). The sedimentation coefficient
(S20,w) of the Nup82 holo-complex was estimated from the peak of the complex banded in sucrose gradients, run as described above, using
the formula $S_{20,w} = \Delta I/(\omega^{2} t)$, where ΔI is the time integral, ω is the angular velocity (s⁻¹), and t is time (s) (see also Griffith,
1994). The mass of the holo-complex was then calculated using the Siegel-Monty equation (Figures S1A and S1B) (Erickson, 2009).
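
The mass estimate described above can be sketched in a few lines. This is a minimal illustration, not the authors' script: it assumes the standard form of the Siegel-Monty relation, M = 6πηN·Rs·s/(1 − v̄ρ), with rounded solvent constants for water at 20°C and a typical protein partial specific volume of 0.73 cm³/g (all assumptions, not values stated in the text). With these constants the relation reduces to roughly M ≈ 4200·S·Rs, with S in Svedberg units and Rs in nm. The function names and the example inputs are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23   # mol^-1
ETA = 0.01                 # poise (g cm^-1 s^-1); rounded viscosity of water at 20 degC (assumption)
RHO = 1.0                  # g/cm^3; rounded density of water at 20 degC (assumption)
V_BAR = 0.73               # cm^3/g; typical protein partial specific volume (assumption)


def omega_squared_t(rpm, hours):
    """Return the run integral omega^2 * t (rad^2/s) for a constant-speed spin."""
    omega = 2.0 * math.pi * rpm / 60.0   # angular velocity in rad/s
    return omega ** 2 * hours * 3600.0


def siegel_monty_mass(s_svedberg, rs_nm, eta=ETA, rho=RHO, v_bar=V_BAR):
    """Estimate molecular mass (Da) from S20,w (Svedberg) and Stokes radius (nm)."""
    s = s_svedberg * 1e-13   # Svedberg -> seconds
    a = rs_nm * 1e-7         # nm -> cm
    return 6.0 * math.pi * eta * AVOGADRO * a * s / (1.0 - v_bar * rho)


if __name__ == "__main__":
    # Run conditions from the text: 42,000 rpm for 17 hr.
    print(f"omega^2 t = {omega_squared_t(42000, 17):.3e} rad^2/s")
    # Hypothetical S and Rs values, for illustration only (not results from the paper).
    print(f"mass = {siegel_monty_mass(10.0, 6.0) / 1000:.0f} kDa")
```

Because both S and Rs enter linearly, the relative uncertainty in the mass is approximately the sum of the relative uncertainties in the two measurements.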

Quantification of the relative amounts of each protein in the purified complex was performed using a synthetic concatamer
of tryptic peptides, or QconCAT (Pratt et al., 2006), based on the Nup82 complex components (Figure S1D). Quantotypic
peptides for each of the four nucleoporins of the Nup82 complex were selected based on their mass spectrometric
behavior (Nup82: 7-LSALPIFQASLSASQSPR-24, 636-NQILQFNSFVHSQK-649; Nup159: 301-TNAFDFGSSSFGSGFSK-717,
948-TSESAFDTTANEEIPK-963; Nsp1: 779-TTNIDINNEDENIQLIK-795, 806-SLDDNSTSLEK-816; Dyn2: 64-NFGSYVTHEK-73,
53-YGNTWHVIVGK-63). A synthetic gene (called Nup82 QconCAT) was designed by concatenation of the sequences encoding
the referred peptides and addition of a C-terminal 6xHis tag:
(MKEIRNQILQFNSFVHSQKTNAFDFGSSSFGSGFSKNFGSYVTHEKTTNIDINNEDENIQLIKLSALPIFQASLSASQSPRTSESAFDTTANEEIPKYGNTWHVIVGKSLDDNSTSLEKQINSIKHHHHHH).
The E. coli codon-optimized sequence was cloned into plasmid pGEX6p-1, resulting in the expression of a protein with an N-terminal
GST tag that was used both as a purification tag and sacrificial peptide (Ding et al., 2011). The Nup82 QconCAT protein was
expressed by growing 300 ml of BL21 E. coli cells at 37°C to OD600 = 0.6 in minimal M9 media (Pratt et al., 2006) supplemented with
heavy arginine and lysine (L-arginine:HCl 13C6; L-lysine:2HCl 13C6, Cambridge Isotope Laboratories Inc.). IPTG (1 mM) was used to
induce expression of the construct for 3 hr at 37°C. Harvested cells were processed using BugBuster Extraction Reagent (Novagen)
as indicated by the manufacturer. The full-length Nup82 QconCAT was then purified using a two-step method that ensures a full-length
final product by consecutive purification via the N- and C-terminal tags: i) Clarified soluble material was incubated with 500 µL of
glutathione Sepharose 4b (GE Healthcare) at room temperature for 1 hr at 4°C, and the retained proteins were eluted using 2x 1 ml of elution
buffer (20 mM HEPES pH 7.4, 150 mM NaCl, 45 mM imidazole, 6 M guanidinium hydrochloride, 1 mM TCEP, 1/500 protease inhibitor
cocktail (PIC) (Sigma)). ii) The elution volume was then passed through an equilibrated His-Trap HP (GE Healthcare) at room temperature.
The retained Nup82 QconCAT was then eluted in 20 mM HEPES pH 7.4, 500 mM imidazole, 150 mM NaCl, 6 M guanidinium
hydrochloride, 1 mM TCEP, 1/500 PIC. The resulting elution was analyzed by SDS-PAGE to ensure the presence of a full-length, pure
protein.
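
As a consistency check on the QconCAT design, the concatamer sequence quoted above can be digested in silico and compared with the eight quantotypic peptides. The sketch below is illustrative only: it assumes the common trypsin rule (cleave C-terminal to K or R unless the next residue is P), and the function and variable names are hypothetical.

```python
# Nup82 QconCAT protein sequence as quoted in the text (C-terminal 6xHis tag included).
QCONCAT = (
    "MKEIRNQILQFNSFVHSQKTNAFDFGSSSFGSGFSKNFGSYVTHEKTTNIDINNEDENIQLIK"
    "LSALPIFQASLSASQSPRTSESAFDTTANEEIPKYGNTWHVIVGKSLDDNSTSLEKQINSIKHHHHHH"
)

# Quantotypic peptides listed in the text, grouped by protein.
QUANTOTYPIC = {
    "Nup82": ["LSALPIFQASLSASQSPR", "NQILQFNSFVHSQK"],
    "Nup159": ["TNAFDFGSSSFGSGFSK", "TSESAFDTTANEEIPK"],
    "Nsp1": ["TTNIDINNEDENIQLIK", "SLDDNSTSLEK"],
    "Dyn2": ["NFGSYVTHEK", "YGNTWHVIVGK"],
}


def tryptic_digest(sequence):
    """Split a protein after every K or R that is not followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])   # C-terminal remainder (here the His tag)
    return peptides


if __name__ == "__main__":
    digest = tryptic_digest(QCONCAT)
    for protein, targets in QUANTOTYPIC.items():
        for peptide in targets:
            status = "released" if peptide in digest else "MISSING"
            print(f"{protein:7s} {peptide:20s} {status}")
```

Under this rule every listed peptide is released intact, and the remaining fragments are the short flanking sequences and the His tag.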
For the MS analysis, the Nup82 holo-complex was purified as described above. The gradient fractions containing the complex
were collected and concentrated by centrifugation at 355,000 × g for 6 hr in a TLA 120.1 rotor at 4°C. The concentrated complex
was then resuspended in 1x NuPAGE LDS Sample buffer (Thermo Fisher Scientific), 10 mM TCEP (Thermo Fisher Scientific).
The Nup82 QconCAT was ethanol precipitated and washed to eliminate the guanidinium hydrochloride and resuspended in 1x NuPAGE
LDS Sample buffer, 10 mM TCEP. Approximately equimolar amounts of complex and Nup82 QconCAT were combined to give a final
protein amount of 1 µg. The combined sample was heated at 72°C for 10 min and then alkylated using a final concentration of 30 mM
iodoacetamide (Sigma). The sample was then loaded onto an in-house prepared 4% (37.5:1) acrylamide stacking SDS-PAGE gel. The
resulting band, containing a mixture of Nup82 complex and stable-isotope-labeled Nup82 QconCAT proteins, was excised and
sequentially digested in-gel with endoproteinase LysC (Roche) and trypsin (Roche), followed by LC-MS analyses to determine
the light/heavy (L/H) ratio of the standard peptides. LC-MS analyses were performed on an Orbitrap Fusion Mass Spectrometer (Thermo Scientific),
with an Easy-nLC 1000 HPLC (Thermo Scientific) and an Easy-Spray electrospray source (Thermo Scientific). L/H ratios of standard
peptides were determined using the MaxQuant software (version 1.2.2.5) (Cox and Mann, 2008).
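
Because every peptide in a QconCAT standard is present at the same molar amount, the L/H ratio of each peptide is proportional to the amount of its parent protein in the native complex. One simple way to turn such ratios into a relative stoichiometry is sketched below; the averaging and normalization scheme is an assumption for illustration (the text does not specify the exact calculation), and the ratio values in the example are invented placeholders, not data from the paper.

```python
from statistics import mean


def relative_stoichiometry(lh_ratios, reference):
    """Average the L/H ratios per protein and normalize to a reference protein.

    lh_ratios: dict mapping protein name -> list of L/H ratios, one per peptide.
    reference: protein whose mean ratio is set to 1.0.
    """
    means = {protein: mean(ratios) for protein, ratios in lh_ratios.items()}
    ref = means[reference]
    return {protein: value / ref for protein, value in means.items()}


if __name__ == "__main__":
    # Placeholder L/H ratios (hypothetical numbers, two peptides per protein).
    example = {
        "ProteinA": [1.9, 2.1],
        "ProteinB": [1.0, 1.0],
    }
    for protein, copies in relative_stoichiometry(example, "ProteinB").items():
        print(f"{protein}: {copies:.2f} copies relative to ProteinB")
```

Using two peptides per protein, as in the design above, gives a basic check that the ratio is not skewed by incomplete digestion or modification of one peptide.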
Overexpression of Dyn2 was performed mimicking the conditions described in Gaik et al. (2015): the S. cerevisiae Dyn2 coding
sequence was cloned into the 2-micron plasmid p424-Gal1, under the control of the GAL1 promoter. Overexpression was achieved
by growing the transformed yeast cells in yeast synthetic minimal media supplemented with 2% glucose, 1% raffinose, harvesting the
cells in mid-log phase, washing them with ddH2O, and then transferring them to yeast synthetic minimal media supplemented with 2%
galactose, 1% raffinose for 3 hr at 30°C. Cells were then harvested and cryo-milled, and the endogenous Nup82 holo-complex was
purified as described above using Nup82-PrA as the handle. Purified complexes were run on SDS-PAGE gels and stained with Sypro
Ruby (Thermo Fisher Scientific), and the relative intensities of the different bands were quantified using ImageJ (http://imagej.net).
Chemical Cross-linking and Mass Spectrometry

The natively eluted complex (250 µL, in buffer 1: 20 mM HEPES pH 7.4, 300 mM NaCl, 0.1% Tween, 2 mM MgCl2, 1 mM DTT) was
cross-linked by the addition of DSS-H12/D12 (DiSuccinimidylSuberate) cross-linker (Creative Molecules) to a final concentration of
0.25 mM and incubated for 45 min at 25°C with gentle agitation in a shaker (900 rpm). The reaction was then quenched with
50 mM ammonium bicarbonate. For cross-linking with the EDC reagent (Pierce), the sample was equilibrated and natively
eluted in EDC cross-linking buffer (10 mM BisTris pH 6.5, 100 mM NaCl, 2 mM MgCl2, 0.1% Tween, 1 mM DTT). EDC (20 mM) and
N-hydroxysulfosuccinimide (0.4 mM, i.e., a 2% molar ratio with respect to EDC) were then added to cross-link the sample. The sample
was incubated for 45 min at 25°C with gentle agitation. After cross-linking, Tris-HCl pH 8.0 (50 mM) and β-mercaptoethanol (20 mM)
were added to quench the reaction. After cysteine reduction and alkylation, cross-linked samples were
separated on a 4%–12% NuPAGE SDS-PAGE gel (Invitrogen). Gels were briefly stained with GelCode Blue Stain Reagent (Thermo Fisher
Scientific) to visualize the cross-linked protein complexes. The cross-linked complexes were then digested in-gel
with trypsin or chymotrypsin to generate cross-linked peptides as previously described (Shi et al., 2014). After in-gel digestion, the
cross-linked peptide mixtures were fractionated by peptide SEC (Superdex Peptide PC 3.2/30, GE Healthcare) on an offline HPLC
(Agilent Technologies). Two or three SEC fractions covering the molecular mass range of 2.5 kDa to 10 kDa were subsequently
collected and analyzed by LC/MS. For cross-link identification, the purified peptides were dissolved in the sample loading buffer
(5% MeOH, 0.2% FA) and analyzed on an LTQ Velos Orbitrap Pro mass spectrometer or an Orbitrap Q Exactive (QE) Plus mass
spectrometer (Thermo Fisher). For the analysis on the Velos Orbitrap mass spectrometer, the dissolved peptides were pressure
loaded onto a self-packed PicoFrit column with an integrated electrospray ionization emitter tip (360 µm O.D., 75 µm I.D., with a 15 µm tip;
New Objective). The column was packed with 10 cm of reverse-phase C18 material (3 µm porous silica, 200 Å pore size, Dr. Maisch
GmbH). Mobile phase A consisted of 0.5% acetic acid and mobile phase B of 70% ACN with 0.5% acetic acid. The peptides
were eluted in a 120 or 140 min LC gradient (8% B to 50% B, 0-93 min, followed by 50% B to 100% B, 93-110 min, and equilibrated
with 100% A until 120 or 140 min) using an HPLC system (Agilent), and analyzed with an LTQ Velos Orbitrap Pro mass spectrometer. The
flow rate was 200-250 nL/min. The spray voltage was set at 1.9-2.3 kV. The instrument was operated in the data-dependent mode,
where the eight most abundant ions were fragmented by higher energy collisional dissociation (HCD) (normalized collisional
energy 27-29) and analyzed in the Orbitrap mass analyzer. The target resolution was 60,000 for MS1 and 7,500 for MS2. The QE
instrument was directly coupled to an EASY-nLC 1200 System (Thermo Fisher), and experimental parameters were similar to those of
the Velos Orbitrap. The cross-linked peptides were loaded onto an Easy-Spray column heated at 35°C (C18, 3 µm particle size, 200 Å
pore size, and 50 µm × 15 cm, Thermo Fisher). The top 8 or 10 most abundant ions (with charge states of 3-7) were selected for
fragmentation by HCD. The raw data were searched with pLink (Yang et al., 2012a) using a FASTA database containing protein sequences
of the complexes. An initial MS1 search window of 5 Da was allowed to cover all isotopic peaks of the cross-linked peptides. The data
were automatically filtered using a mass accuracy of MS1 ≤ 10 ppm (parts per million) and MS2 ≤ 20 ppm of the theoretical
monoisotopic (A0) and other isotopic masses (A+1, A+2, A+3, and A+4) as specified in the software. Other search parameters included
cysteine carbamidomethylation as a fixed modification and methionine oxidation as a variable modification. A maximum of two trypsin
missed-cleavage sites was allowed. The initial search results were obtained using a default 5% false discovery rate (FDR), as estimated
by a target-decoy search strategy. All spectra were manually verified. 94% of the cross-link identifications have an MS1 mass
accuracy within 6 ppm. The cross-link data were visualized and analyzed with the CX-Circos software (manuscript in preparation).
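
The mass-accuracy filter described above (a ppm tolerance applied against the monoisotopic mass A0 and the A+1 to A+4 isotopic masses) can be illustrated with a short sketch. This is not pLink's implementation; it only shows the arithmetic, assuming neutral masses and an isotope spacing equal to the 13C−12C mass difference. The function names and example masses are hypothetical.

```python
C13_C12_DELTA = 1.0033548  # Da, mass difference between 13C and 12C


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed_mass, monoisotopic_mass, tol_ppm, max_isotope=4):
    """True if the observed mass matches A0 or any of A+1..A+max_isotope within tol_ppm."""
    for k in range(max_isotope + 1):
        isotopic_mass = monoisotopic_mass + k * C13_C12_DELTA
        if abs(ppm_error(observed_mass, isotopic_mass)) <= tol_ppm:
            return True
    return False


if __name__ == "__main__":
    theoretical = 3000.0000                      # hypothetical cross-linked peptide mass (Da)
    observed = theoretical + 2 * C13_C12_DELTA + 0.012
    print(f"error vs A+2: {ppm_error(observed, theoretical + 2 * C13_C12_DELTA):.2f} ppm")
    print("passes MS1 filter (10 ppm):", within_tolerance(observed, theoretical, 10.0))
```

Checking the isotopic masses as well as A0 matters for large cross-linked peptides, where the instrument often selects an isotopic peak other than the monoisotopic one; the 5 Da initial search window mentioned in the text serves the same purpose.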
Chemical Cross-linking and Mass Spectrometry Analysis of the S. cerevisiae/S. kudriavzevii Nup82 Holo-complex
To define the relative orientation of the two copies of Nup82 present in the Nup82 holo-complex, we expressed an exogenous copy of
Nup82 from the yeast Saccharomyces kudriavzevii (hereafter skNup82). We selected S. kudriavzevii because it is a closely
related species that forms natural hybrids with S. cerevisiae, some of them used in winemaking (Borneman et al., 2012), and because the
level of conservation at the amino acid level between the two species is high enough to ensure functionality of the skNup82 version
while leaving enough sequence variation to identify the peptides specific to each species' version of the protein. The S. kudriavzevii strain
was obtained from ATCC (ATCC 2601) and genomic DNA was prepared using standard methods. The 3′ UTR and open reading frame of skNup82
were amplified and sequenced to account for potential mutations in the sequence available in the public database
(GenBank: EHN01740.1). The wild-type verified skNup82 sequence was found to encode a 716 amino acid protein with 75% identity
to the scNup82 primary sequence (alignment available upon request). The upstream 190-nucleotide (promoter) region and
the gene sequence were amplified using primers skN82Prom-F (5′-CACCGAAAGTTTATAGATTCAT-3′) and skN82GTW_R2
(5′-GCTGGGCCCCTGGAACAGAACTTCCAGGCCGTTTTTTGGCTGAGTATTAGTG-3′), the latter introducing an in-frame PreScission
protease cleavage site at the end of the skNup82 coding sequence. The PCR product was cloned using the pENTR/D-TOPO Cloning Kit
(Thermo Fisher Scientific) and then transferred to a modified pAG305GPD-ccdb-EGFP plasmid (Addgene), in which the GPD promoter
had been eliminated through SacI-XbaI (New England Biolabs) cleavage and refill. The resulting integrative plasmid,
pAG305-skNup82ppx-EGFP, was linearized using ClaI (New England Biolabs) and transformed into a diploid W303 S. cerevisiae strain.
Successful integrations were assessed by PCR; correct expression and localization of the skNup82-EGFP construct were confirmed by
western blot and fluorescence microscopy, which showed the characteristic nuclear rim staining of a properly localized nucleoporin.
Affinity purification of the Nup82 complex using skNup82-EGFP as a handle recovered all the components of the native Nup82
complex, including a substoichiometric amount of scNup82, showing correct incorporation of the construct into the native Nup82
complex. The isolated, purified complex (see above for details on purification) was analyzed by CX-MS (see above).
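
The statement that primer skN82GTW_R2 introduces an in-frame PreScission site can be checked directly from the primer sequence: the reverse complement of its 5′ 30 nucleotides should translate to the ppx target sequence GLEVLFQGPS given in the affinity purification section. The sketch below uses the standard genetic code; the helper names are hypothetical.

```python
from itertools import product

# Reverse primer as quoted in the text (5' -> 3').
SKN82GTW_R2 = "GCTGGGCCCCTGGAACAGAACTTCCAGGCCGTTTTTTGGCTGAGTATTAGTG"

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def translate(dna):
    """Translate a DNA sequence in frame 1, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


if __name__ == "__main__":
    sense = reverse_complement(SKN82GTW_R2)   # coding-strand view of the primer
    site = translate(sense[-30:])             # last 10 codons, appended after the skNup82 ORF
    print(sense[-30:], "->", site)
```

The last 30 nucleotides of the sense strand correspond to the non-annealing 5′ tail of the primer; the preceding 22 nucleotides anneal to the 3′ end of the skNup82 coding sequence.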
Negative Stain Electron Microscopy

Purified endogenous Nup82 complex samples were applied to glow-discharged carbon-coated copper grids and stained with 1%
uranyl formate. Images were collected on a Tecnai F20 (FEI Inc., USA) transmission electron microscope operating at an acceleration
voltage of 80 kV at 50,000x magnification and 1.5 µm underfocus. Images were recorded on a Tietz F224 4096x4096 CCD camera
(15 µm pixels) at 2x binning. The pixel size at the specimen level was 3.23 Å. Particles were selected using Boxer from EMAN (Ludtke
et al., 1999). The contrast transfer function (CTF) of the normalized images was determined using ctfit from EMAN, and the phases
were flipped accordingly. The particles were then subjected to the Iterative Stable Alignment and Clustering (ISAC; Yang et al.,
2012b) technique. A pixel error of 2O3 was used for the stability threshold. For comparison, the Nup82 holo-complex class averages
were aligned and paired with Nsp1-FGΔ class averages or with GFP-tagged Nup82 complex class averages using the modified
Spider 'AP SH' operation. The Nsp1-FGΔ class averages were then subtracted from the Nup82 holo-complex class averages, and the
Nup82 holo-complex class averages were subtracted from the GFP-tagged Nup82 complex class averages, to generate difference
maps.
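
The imaging parameters above are linked by a simple relation: the pixel size at the specimen equals the physical detector pixel size times the binning factor, divided by the effective magnification at the detector. The sketch below rearranges this to recover the effective magnification implied by the stated 15 µm pixels, 2x binning, and 3.23 Å specimen pixel size. The function name is hypothetical, and the result is a derived illustration rather than a value reported by the authors; it need not equal the nominal 50,000x setting.

```python
def effective_magnification(detector_pixel_um, binning, specimen_pixel_angstrom):
    """Magnification at the detector implied by the detector and specimen pixel sizes."""
    binned_pixel_m = detector_pixel_um * 1e-6 * binning    # binned detector pixel, in metres
    specimen_pixel_m = specimen_pixel_angstrom * 1e-10     # specimen-level pixel, in metres
    return binned_pixel_m / specimen_pixel_m


if __name__ == "__main__":
    mag = effective_magnification(detector_pixel_um=15.0, binning=2, specimen_pixel_angstrom=3.23)
    print(f"effective magnification at the detector: {mag:,.0f}x")
```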
Fluorescence In Situ Hybridization

FISH on wild-type and Nup84 complex truncation mutant strains was performed in 96-well plates. A 35-nucleotide-long oligo dT probe
(synthesized by Exiqon), labeled post-synthesis with Cy5, was used to detect poly A+ RNA
[TT+TTT+TTTT+TTT+TTT+TTTT+TTT+TTT+TTT+TTT+TTTT; T+ represents a locked nucleic acid (LNA)]. Cells were grown in SD complete at 25°C to OD600 =
0.5-0.6 and fixed by the addition of para-formaldehyde at a final concentration of 4% for 45 min at room temperature. Cells were
washed 3x with buffer B (1.2 M sorbitol, 100 mM KHPO4 pH 7.5), suspended in spheroplast buffer [1.2 M sorbitol, 100 mM KHPO4
pH 7.5, 20 mM ribonucleoside-vanadyl complex (NEB #S1402S), 20 mM β-mercaptoethanol, 25 U lyticase / 1 OD600 of cells (Sigma
cat # L2524)] and incubated at 37°C until cell walls were digested. Digested cells were washed 2x with cold buffer B, attached to
polyA lysine (0.01%) treated 96 glass bottom MicroWell plate (MGB096-1-2-LG-L #0325289L2L) and stored in 70% ethanol
at −20°C. For hybridization, cells were washed twice with 2× saline sodium citrate (SSC) and once with 35% formamide/2× SSC. 20 ng
of labeled dT LNA probe was resuspended in 35% (v/v) formamide, 2× SSC, 1 mg ml⁻¹ BSA, 10 mM ribonucleoside vanadyl
complex (NEB #S1402S), 5 mM NaHPO4, pH 7.5, 0.5 mg ml⁻¹ Escherichia coli tRNA and 0.5 mg ml⁻¹ single-stranded DNA and denatured
at 95°C for 3 min, and cells were hybridized overnight in the dark at 37°C. Cells were then washed in 35% formamide/2× SSC at 37°C
2x 30 min, followed by a 1 min wash in 1× PBS at room temperature and by the addition of DAPI-containing mounting medium
to each well (Prolong Gold - Invitrogen #P36935). Images were acquired using a Zeiss Z1 inverted microscope, a 100x 1.43 NA oil
objective, an AxioCam mRm CCD camera, and the following filter sets: Zeiss 488050-9901-000 (Cy5), Zeiss 488049-9901-000
(DAPI). Three-dimensional datasets were generated by acquiring multiple 200 nm z stacks spanning the entire volume of cells, and the 3D
datasets were reduced to 2D datasets by applying a maximum projection function in Fiji. The polyA accumulation phenotype was
quantified by determining the fraction of cells showing strong nuclear polyA accumulation. For each strain, at least 200 cells from at least
3 different fields were quantified.
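The quantification described above can be illustrated with a minimal Python sketch that reduces a z-stack to 2D by maximum projection and computes the fraction of cells scored as showing nuclear polyA accumulation. This is not the Fiji implementation used in the study; the function names, the nested-list image layout, and the per-cell boolean scores are assumptions made only for illustration.

```python
from typing import List, Sequence

Image2D = List[List[float]]


def max_projection(z_stack: Sequence[Image2D]) -> Image2D:
    """Collapse a z-stack (equally sized 2D planes) into one 2D image by
    keeping, for every pixel, the maximum intensity found across planes."""
    if not z_stack:
        raise ValueError("z_stack must contain at least one plane")
    rows, cols = len(z_stack[0]), len(z_stack[0][0])
    return [
        [max(plane[r][c] for plane in z_stack) for c in range(cols)]
        for r in range(rows)
    ]


def accumulation_fraction(cell_scores: Sequence[bool]) -> float:
    """Fraction of scored cells flagged as showing strong nuclear accumulation.
    How each cell is flagged (hypothetical boolean input) is left to the user."""
    if not cell_scores:
        raise ValueError("no cells were scored")
    return sum(1 for flagged in cell_scores if flagged) / len(cell_scores)
```

For example, `accumulation_fraction` would be called once per strain on the scores pooled from all quantified fields.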
Fluorescence Microscopy
Nup82 was genomically tagged with GFP in selected Nup84 complex truncation yeast mutant strains using standard techniques.
Cells were grown in YPD media at 30°C and visualized with a 63x 1.4 numerical aperture plan-apochromat objective using a Carl
Zeiss Axioplan 2 microscope equipped with a Hamamatsu Orca ER-cooled CCD camera. The system was controlled with Openlab
imaging software (Perkin Elmer). Images were processed with ImageJ (http://imagej.net/Welcome) and Adobe Photoshop (Adobe)
software.
Integrative Structure Determination

Our integrative structure determination of the Nup82 holo-complex proceeded through four stages (Figure S3D) (Alber et al., 2007a;
Alber et al., 2007b): (1) gathering of data, (2) representation of subunits and translation of the data into spatial restraints, (3) config-
urational sampling to produce an ensemble of structures that satisfies the restraints, and (4) analysis and validation of the ensemble
structures. The modeling protocol (i.e., stages 2, 3, and 4) was scripted using the Python Modeling Interface (PMI), version c7411c3, a
library for modeling macromolecular complexes based on our open-source Integrative Modeling Platform (IMP) package, version 2.5
(https://integrativemodeling.org) (Russel et al., 2012). Further details of the integrative modeling procedures are provided in Table 1,
as well as previous publications (Shi et al., 2014). Files containing the input data, scripts, and output structures are available online
(https://salilab.org/nup82; https://github.com/salilab/nup82).
Stage 1: Gathering of Data
The stoichiometry was determined via biochemical quantitation of the density-gradient purified Nup82 complex (Figure S1). 1,131
cross-links were identified via mass spectrometry (Figure 2A; Table S2). The atomic structures for some of the yeast Nup82 complex
components had been previously determined via X-ray crystallography (Table S1) (Romes et al., 2012; Sampathkumar et al., 2012;
Weirich et al., 2004; Yoshida et al., 2011). Their close homologs were identified by HHPred (Table S1) (Söding, 2005). Secondary
structure and disordered regions were predicted by PSIPRED (Jones, 1999) and DISOPRED (Ward et al., 2004), respectively (Table
S1). Coiled-coil regions of Nup82, Nsp1, and Nup159 were predicted by COILS/PCOILS (Lupas et al., 1991) and Multicoil2 (Trigg
et al., 2011) (Table S1). 21 EM class averages (Figure S2C) and 3 SAXS profiles (Figures S5D–S5F) were obtained as described in
STAR Methods and Table S4.
Stage 2: Representation of Subunits and Translation of the Data into Spatial Restraints
The domains of the Nup82 complex subunits were coarse-grained using beads of varying sizes representing either a rigid body or a
flexible string, based on the available crystallographic structures and comparative models (Table S1). In a rigid body, the beads have
their relative distances constrained during configurational sampling, whereas in a flexible string the beads are restrained by the
sequence connectivity (Shi et al., 2014). The residues in the rigid bodies and flexible strings corresponded to 37.3% and 62.7%
of the Nup82 complex, respectively. To maximize computational efficiency while avoiding using too coarse a representation, we
represented the Nup82 complex in a multi-scale fashion, as follows.
First, the crystallographic structures of each Nup82 complex domain were coarse-grained using two categories of resolution,
where beads represented either individual residues or segments of up to 10 residues. For the one-residue bead representation,
the coordinates of a bead were those of the corresponding Cα atoms. For the 10-residue bead representation, the coordinates of
a bead were the center of mass of all atoms in the corresponding consecutive residues (each residue was in one bead only). The
crystallographic structures covered 25.6% of the residues in the Nup82 complex.
Second, for predicted non-disordered domains of the remaining sequences, comparative models were built with MODELLER 9.13
(Sali and Blundell, 1993) based on the closest known structure detected by HHPred (Söding, 2005) and the literature (Table S1) (Chug
et al., 2015; Stuwe et al., 2015a). Notably, structurally defined remote homologs (PDB: 5C3L and 5CWS) (Chug et al., 2015; Stuwe
et al., 2015a) were detected for the C-terminal coiled-coil regions of Nup82, Nup159, and Nsp1 (Figure S3; Table S1). Similarly to the
X-ray structures, the modeled regions were also coarse-grained using two categories of resolution, resulting in the 1-residue and
10-residue bead representations. The comparative models covered 11.7% of the residues in the Nup82 complex.
Finally, the remaining regions without a crystallographic structure or a comparative model (i.e., regions predicted to be disordered
without a known homolog) were represented by a flexible string of beads corresponding to up to 100 residues each. We used the
low-resolution representation (100 residues per bead) only for the unstructured FG repeats, whose structure is ‘‘decoupled’’ from
the configurations of the core of the Nup82 holo-complex (Alber et al., 2007a). The residues in these beads corresponded to
62.7% of the Nup82 complex.
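The bead placement rules described above (one bead per residue at the Cα position, or one bead per segment of consecutive residues at the center of mass of all its atoms) can be sketched as follows. This is a minimal illustration and not the PMI code used in the study; the function names and the input layout (per-residue lists of mass and coordinate pairs) are assumptions.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Atom = Tuple[float, Coord]  # (mass, (x, y, z)); hypothetical input layout


def one_residue_beads(ca_coords: Sequence[Coord]) -> List[Coord]:
    """One bead per residue, placed at that residue's C-alpha coordinates."""
    return list(ca_coords)


def segment_beads(residues: Sequence[Sequence[Atom]],
                  residues_per_bead: int = 10) -> List[Coord]:
    """Group consecutive residues into beads of up to `residues_per_bead`
    residues and place each bead at the center of mass of all its atoms.
    Each residue ends up in exactly one bead."""
    beads: List[Coord] = []
    for start in range(0, len(residues), residues_per_bead):
        segment = residues[start:start + residues_per_bead]
        atoms = [atom for residue in segment for atom in residue]
        total_mass = sum(mass for mass, _ in atoms)
        center = tuple(
            sum(mass * xyz[axis] for mass, xyz in atoms) / total_mass
            for axis in range(3)
        )
        beads.append(center)
    return beads
```

Calling `segment_beads(residues, residues_per_bead=100)` would give the coarsest level mentioned in the text, although the flexible beads in the study have no atomic coordinates to average, so that call is only a formal analogy.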
To improve the accuracy and precision of the structure ensemble obtained through the satisfaction of spatial restraints (below), we also
imposed constraints based on crystallographically defined interfaces: Dyn2(7–92)-Nup159(1117–1126) (PDB: 4DS1) (Romes et al., 2012) and
Nup82(7–452)-Nup159(1429–1456)-Nup116(966–1111) (PDB: 3PBP) (Yoshida et al., 2011). The latter interface of ScNup116(966–1111) was compared
with the structure of CgNup116(882–1034) (PDB: 3NF5) (Sampathkumar et al., 2012), leading to the conclusion that the Nup116 interfaces
are consistent among different species. Subcomplexes including these interfaces were simply represented as rigid bodies.
With this representation in hand, we next encoded the spatial restraints into a Bayesian scoring function (Shi et al., 2014) based on
the information gathered in Stage 1, as follows.
First, the collected DSS and EDC cross-links were used to construct the Bayesian scoring function that restrained the distances
spanned by the cross-linked residues (Shi et al., 2014), taking into account the ambiguity due to multiple copies of identical subunits;
the ambiguous cross-link restraint considers all possible pairwise assignments in multiple copies of identical subunits, weighting
more the least violated distance(s).
Second, the excluded volume restraints were applied to each bead in 10-residue (or the closest) bead representations, using the
statistical relationship between the volume and the number of residues that it covered (Alber et al., 2007a).
Third, we applied the sequence connectivity restraint, using a harmonic upper bound on the distance between consecutive beads
in a subunit, with a threshold distance equal to four times the sum of the radii of the two connected beads. The bead radius was calcu-
lated from the excluded volume of the corresponding bead, assuming standard protein density (Alber et al., 2007a; Shi et al., 2014).
Fourth, 5 homo-dimer DSS cross-links between Nup159 residues of 1384-1384, 1387-1387, 1414-1414, 1417-1417, and 1432-
1432 as well as one homo-dimer DSS cross-link between Nup82 residues of 517-517 were transformed to upper-harmonic distance
restraints (up to 30 Å), enforcing the homo-dimer formation of the helices.
Finally, the EM 2D restraint (Shi et al., 2014) was imposed on the highest resolution representation of each subunit, using a negative
logarithm of the cross-correlation coefficient between the EM class average density and the best-matching density projection of the
structure as the em2D score (Stage 3). For sufficient precision, 100 projections were generated by uniform sampling of the unit sphere
(Shi et al., 2014). The pixel size of the resulting projection image was equal to the pixel size of the class average (3.23 Å). The relative
weight of the final EM 2D restraint in the total score of a structure was set to 10⁴, so that the scale of the em2D score matched those of
the other restraint types.
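Two of the scoring terms described above lend themselves to a short sketch: the harmonic upper bound used for sequence connectivity (threshold of four times the sum of the two bead radii) and the em2D score (negative logarithm of the best cross-correlation coefficient over the sampled projections). The code below is an illustration under stated assumptions, not the IMP implementation: the quadratic form 0.5·k·x², the spring constant `k`, and all function names are hypothetical, and the restraint weight is left as a required argument.

```python
import math
from typing import Sequence


def harmonic_upper_bound(distance: float, threshold: float, k: float = 1.0) -> float:
    """Zero penalty up to `threshold`, quadratic penalty beyond it.
    The 0.5 * k * x**2 form and the value of k are assumptions."""
    excess = distance - threshold
    return 0.5 * k * excess * excess if excess > 0.0 else 0.0


def connectivity_score(distance: float, radius_a: float, radius_b: float,
                       k: float = 1.0) -> float:
    """Connectivity restraint between consecutive beads: harmonic upper
    bound with a threshold of four times the sum of the two bead radii."""
    return harmonic_upper_bound(distance, 4.0 * (radius_a + radius_b), k)


def em2d_score(ccc_per_projection: Sequence[float], weight: float) -> float:
    """Weighted negative log of the best cross-correlation coefficient
    between one class average and the projections of a structure."""
    best = max(ccc_per_projection)
    if best <= 0.0:
        raise ValueError("best cross-correlation must be positive")
    return -weight * math.log(best)
```

Because the em2D term is close to zero for well-fitting structures, a large weight is what brings it onto the same scale as the other restraints, as noted in the text.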
Most of the remaining information (stoichiometry, crystallographic structures of the subunits, their homologs, and the two crystal-
lographic interfaces) is included in the representation, whereas the SAXS profiles, immuno-EM class averages, and the density map
from single-particle EM reconstruction (Gaik et al., 2015) were used only for validating our final structures. See the IMP scripts for
details (https://salilab.org/nup82; https://github.com/salilab/nup82).
Stage 3: Conformational Sampling
Structural models of the Nup82 complex were computed using Replica Exchange Gibbs sampling, based on the Metropolis Monte
Carlo algorithm (Shi et al., 2014). The Monte Carlo moves included random translation and rotation of rigid bodies (up to 2 Å and 0.04
radians, respectively) and random translation of individual beads in the flexible segments (up to 3 Å). 8 to 16 replicas were used for
each run, with temperatures ranging between 1.0 and 2.5 (Table 1). A structure model was saved every 10 Gibbs sampling steps,
each consisting of a cycle of Monte Carlo steps that moved every rigid body and flexible bead once. The entire sampling procedure
(Steps 1 to 3) took 4 weeks on a cluster of 5,000 cores.
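The acceptance rules behind this kind of sampling can be sketched in a few lines of Python. This is a generic textbook illustration of the Metropolis criterion and the replica-exchange swap criterion, not the IMP sampler itself; the function names are hypothetical, and the geometric spacing of temperatures between 1.0 and 2.5 is an assumption (the source gives only the range).

```python
import math
import random
from typing import List


def temperature_ladder(n_replicas: int, t_min: float = 1.0,
                       t_max: float = 2.5) -> List[float]:
    """Geometrically spaced temperatures (the spacing is an assumption)."""
    if n_replicas < 2:
        return [t_min]
    ratio = t_max / t_min
    return [t_min * ratio ** (i / (n_replicas - 1)) for i in range(n_replicas)]


def metropolis_accept(old_score: float, new_score: float, temperature: float,
                      rng: random.Random) -> bool:
    """Accept a Monte Carlo move if the score improves, otherwise with
    probability exp(-(new - old) / T)."""
    if new_score <= old_score:
        return True
    return rng.random() < math.exp(-(new_score - old_score) / temperature)


def swap_probability(score_i: float, temp_i: float,
                     score_j: float, temp_j: float) -> float:
    """Probability of exchanging the configurations of replicas i and j:
    min(1, exp[(1/T_i - 1/T_j) * (E_i - E_j)])."""
    delta = (1.0 / temp_i - 1.0 / temp_j) * (score_i - score_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)
```

A swap is always accepted when it moves the lower-scoring configuration to the colder replica, which is how good solutions found at high temperature reach the replica at temperature 1.0.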
Step 1—Initial modeling against each corresponding EM 2D class
21 subsets of independent sampling runs were performed, each sampling run starting with a random initial configuration and
sampled against the EM 2D restraint of the corresponding class. The calculations were repeated 10 to 20 times per subset, producing
a total of 1,350,000 structures through the 270 independent runs.
Step 2—Application of the EM 2D filter
From the 1,350,000 structures from Step 1, we selected 650 structures whose em2D cross-correlation coefficient was at least 0.89
for at least 10 of the 21 class averages (Figure S4B).
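The filter in Step 2 amounts to a simple count per structure, sketched below. The function name and the input layout (one cross-correlation coefficient per class average) are assumptions for illustration; the thresholds are those stated in the text.

```python
from typing import Sequence


def passes_em2d_filter(ccc_per_class: Sequence[float],
                       min_ccc: float = 0.89,
                       min_classes: int = 10) -> bool:
    """Keep a structure if its em2D cross-correlation coefficient reaches
    `min_ccc` for at least `min_classes` of the class averages."""
    n_good = sum(1 for ccc in ccc_per_class if ccc >= min_ccc)
    return n_good >= min_classes
```

Applied to the full set, this would be `[s for s, cccs in structures if passes_em2d_filter(cccs)]`, where `structures` is a hypothetical list of (structure, coefficients) pairs.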
Step 3—Refinement against all 21 EM 2D class averages
80 independent refinement runs were performed, each one starting with one of the 650 structures from Step 2. The scoring function
included em2D scores for all 21 class averages as well as other restraints listed above. The sampling produced a total of 10,000
structures. 463 top-scoring structures from Step 3 were subjected to the subsequent analysis in Stage 4.
Stage 4: Analysis and Validation of the Ensemble Structures
Input information and output structures need to be analyzed to estimate structure precision and accuracy, detect inconsistent and
missing information, and to suggest more informative future experiments. Assessment begins with structural clustering of the
modeled structures produced by sampling, followed by assessment of the thoroughness of structural sampling, estimating structure
precision based on variability in the ensemble of good-scoring structures, quantification of the structure fit to the input information,
structure assessment by cross-validation, and structure assessment by data not used to compute it. These validations are based on
the nascent wwPDB effort on archival, validation, and dissemination of integrative structure models, which we lead (Sali et al., 2015).
We now discuss each one of these points in turn.
Clustering
A prerequisite for structure analysis is the clustering of the structures generated by satisfying the input data (Alber et al., 2007b; Shi
et al., 2014). We used Cα root-mean-square deviation (RMSD) quality-threshold clustering (Shi et al., 2014). In general, there are three
possible modeling outcomes, based on the number of clusters of models and consistency between the models and information (Shi
et al., 2014). First, if only a single model (or a cluster of similar models) satisfies all restraints and all input information, there is likely
sufficient information for determining the structure (with the precision corresponding to the variability within the cluster). Second, if
two or more different models are consistent with the input restraints, the information is insufficient to define the single state or there
are multiple significantly populated states. If the number of distinct models is small, structural differences between models may sug-
gest additional experiments to narrow down the number of possible solutions. Third, if no model satisfies all input information, the
information or its interpretation in terms of the inferred spatial restraints is incorrect, in which case the representation needs to be
modified to include additional degrees of freedom, and/or sampling needs to be improved.
In the case of the Nup82 complex, the clustering analysis identified a single dominant cluster of 370 similar structures (Figures S4A
and S5B), corresponding to the most favorable outcome of the three possibilities described above. The average RMSD between the
major (370 structures) and minor clusters (93 structures) is relatively low at approximately 20 Å, considering the resolution of the data,
the resolution of the coarse-grained molecular representation, and the variation within each cluster (Shi et al., 2014) (Figure S4A). As a
result, localization of all components is effectively identical between the major and minor clusters, differing only in the orientation of
the Nup82 β-propeller (Figure S5B). Most importantly, our functional interpretation of the structure is completely robust with regard to
the differences between the means of the two clusters.
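The clustering step can be illustrated with a simplified quality-threshold scheme operating on a precomputed pairwise RMSD matrix. This sketch is an assumption-laden stand-in, not the procedure of Shi et al. (2014): it uses the common simplification in which each candidate cluster is the set of structures within the threshold of a seed, and the threshold value itself is a hypothetical parameter that the text does not specify.

```python
from typing import List, Sequence


def qt_cluster(rmsd: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Simplified quality-threshold clustering.

    Repeatedly pick the seed whose neighborhood (all remaining structures
    within `threshold` RMSD of the seed) is largest, emit that neighborhood
    as a cluster, remove its members, and continue until none remain."""
    remaining = set(range(len(rmsd)))
    clusters: List[List[int]] = []
    while remaining:
        best: List[int] = []
        for seed in sorted(remaining):
            members = [j for j in sorted(remaining) if rmsd[seed][j] <= threshold]
            if len(members) > len(best):
                best = members
        clusters.append(best)
        remaining -= set(best)
    return clusters
```

Clusters are returned largest first, so the first entry plays the role of the dominant cluster.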
Convergence of Sampling
Any structure determination or computational modeling exercise can be described as a structural sampling process, guided by a
scoring function (Alber et al., 2007a). Generally, good-scoring structures need to be found by a sampling, optimization, or enumer-
ation scheme. Unless structures are enumerated, the very first test needs to estimate the thoroughness of structural sampling or opti-
mization (Shi et al., 2014), which is often stochastic (e.g., Monte Carlo and Molecular Dynamics simulations). For stochastic methods,
thoroughness of sampling can be assessed by showing that two independent runs (e.g., using random starting configurations or
different random number generator seeds) do not result in significantly different solutions (Alber et al., 2007a; Fernandez-Martinez
et al., 2012; Shi et al., 2014). Given two or more sets of structures from independent runs, we first cluster structures from all sets
together, followed by assessing whether or not the runs contribute evenly to the population of each cluster, using the p value
from the χ-square contingency test for homogeneity of proportions (McDonald, 2014).
For the Nup82 complex, the high p value of 0.972 (Table 1) showed no significant difference between the independent runs, indicating that our Monte Carlo algorithm sampled all top-
scoring solutions at a resolution better than the precision of the dominant cluster. The caveat is that passing this sampling test is not
absolute evidence of thorough sampling; a positive outcome of the test may be misleading if, for example, the landscape contains
only a narrow, and thus difficult to find, pathway to the pronounced minimum corresponding to the correct structure.
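The homogeneity test can be sketched in pure Python for a table whose rows are independent runs and whose columns are clusters. The counts in the usage note are invented for illustration and are not data from this study; the function names are hypothetical, and the p value helper covers only the one-degree-of-freedom case (two runs, two clusters), for which a closed form exists.

```python
import math
from typing import Sequence, Tuple


def chi_square_homogeneity(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Chi-square statistic and degrees of freedom for a contingency table
    (rows = independent runs, columns = clusters). Assumes no row or
    column total is zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def p_value_one_dof(statistic: float) -> float:
    """Upper-tail p value of the chi-square distribution with exactly one
    degree of freedom: P(Z**2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(statistic / 2.0))
```

With hypothetical counts such as `[[20, 10], [10, 20]]`, the runs populate the two clusters unevenly and the p value is small; a p value near 1 means the runs contribute to each cluster in nearly identical proportions.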
Estimating Structure Precision Based on Variability in the Ensemble of Good-Scoring Structures

The ensemble of the top-scoring structures is analyzed in terms of the precision of its structural features (Alber et al., 2007a,
2007b). In general, commonly-used features include particle positions, distances, and contacts. Precision is defined by the feature
variability in the ensemble with a measure similar to the crystallographic isotropic temperature factor (Biso) (Figure S4C), and
likely provides the lower bound on its accuracy. Of particular interest are features present in most configurations in the ensemble
that have a single maximum in their probability distribution. The spread around the maximum describes how precisely the feature
is determined from the input information. The precision of component position is quantified as the average root-mean-square
fluctuation (RMSF) across all pairs of structures in the cluster, after least-squares superposition onto the centroid structure (Shi
et al., 2014).
For the Nup82 complex, the 9.0 Å precision of the core structured region in the dominant cluster was sufficiently high to pinpoint the
locations and orientations of the constituent proteins and domains (Figures 1 and S4C; Table 1), demonstrating the quality of the data
including the cross-links and EM 2D class averages. The localization probability density maps of every Nup82 subunit as well as the
whole complex were computed from the dominant cluster of the 370 solutions (Figures 1 and S4A).
Fit to Input Information

An accurate structure needs to satisfy the input information used to compute it. The ensemble of solutions was assessed in terms
of how well they satisfied information from which they were computed, including the cross-links, the excluded volume, sequence
connectivity, and the EM two-dimensional restraints.

First, the dominant cluster satisfied 88.5% of all combined cross-links (93.3% and 74.1% of the DSS and EDC cross-links, respectively)
(Figures 2B and S4D; Table 1); a cross-link restraint was satisfied by the cluster ensemble if the median Cα-Cα distance
of the corresponding residue pairs (considering restraint ambiguity) was < 35 Å and < 30 Å for the DSS and the EDC cross-links,
respectively. Our cross-link data (10 DSS and 1 EDC cross-links) are in complete agreement with the crystal structure of
Nup82(7–452)-Nup159(1429–1456)-Nup116(966–1111) (PDB: 3PBP) (Figure 3).
Second, the EDC and DSS cross-links are highly consistent with each other, despite different chemistries, and there is significant,
highly non-random clustering of both EDC and DSS cross-links into equivalent ‘‘cliques’’ (Figure 2A). These represent adjacencies, as
validated by those cliques that coincide with known crystallographic interface regions, such as Nup159:Dyn2 (PDB: 4DS1) (Romes
et al., 2012) and Nup159:Nup82 (PDB: 3PBP) (Yoshida et al., 2011); indeed, in our final calculated structure these cliques represent
immediately adjacent regions in the complex (Figure 2B).
Third, considering the more abundant DSS cross-links, as can be seen from Figure S4D (left), relatively few cross-links (< 7%)
remain unsatisfied by our structures. Of those that are not satisfied, most involve relatively modest distance violations that can clearly
be rationalized by locally limited flexibility of the proteins, as shown in Figure 2B (cross-link distance distributions). Moreover, those
few cross-links in violation of strict distance limits in our structure are nevertheless right next to one of the cliques; they are thus
consistent with the structure when locally limited flexibility is taken into account (Figures 2A and S4D) (Shi et al., 2014).
Fourth, the solutions also fit the EM class averages, with an average cross-correlation coefficient of 0.931 (Figure 2C; Table 1).
Finally, 99% of the top 463 solutions satisfied the excluded volume and sequence connectivity restraints under the combined score
threshold of 500.
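The cross-link satisfaction criterion in the first point above (median Cα-Cα distance across the cluster below 35 Å for DSS or 30 Å for EDC) can be sketched as follows. Treating ambiguity by taking the shortest distance over all copy assignments in each structure is an assumption made here for illustration, as are the function names and the input layout.

```python
from statistics import median
from typing import Dict, Sequence

# Distance thresholds in angstroms, as stated in the text.
THRESHOLDS: Dict[str, float] = {"DSS": 35.0, "EDC": 30.0}


def crosslink_satisfied(distances_per_structure: Sequence[Sequence[float]],
                        linker: str) -> bool:
    """Decide whether one cross-link is satisfied by an ensemble.

    `distances_per_structure[s]` holds the C-alpha distances for every
    possible assignment of the cross-link to subunit copies in structure s.
    Assumption: ambiguity is resolved by the shortest distance per structure."""
    best_per_structure = [min(d) for d in distances_per_structure]
    return median(best_per_structure) < THRESHOLDS[linker]


def fraction_satisfied(crosslinks: Sequence[Sequence[Sequence[float]]],
                       linker: str) -> float:
    """Fraction of cross-links of one linker type satisfied by the ensemble."""
    hits = sum(1 for xl in crosslinks if crosslink_satisfied(xl, linker))
    return hits / len(crosslinks)
```

Using the median rather than the minimum over the ensemble means a cross-link counts as satisfied only if at least half of the structures in the cluster place the residues within range.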
Satisfaction of Data that Were Not Used to Compute Structures

In principle, our Bayesian modeling already effectively includes cross-validation via its Bayesian scoring function and sampling (Shi
et al., 2014). However, the most direct test of a modeled structure is by comparing it to the data that were not used to compute it (a
generalization of cross-validation). A structure can be validated directly against experimental data deliberately omitted from the
structural model calculation (Degiacomi et al., 2013). This goal is achieved by excluding a subset of the experimental data from struc-
ture calculation, followed by evaluation of the resulting structures against the omitted subset of data. This procedure is analogous to
the one used for calculating the crystallographic Rfree parameter and can be used to assess both the structure and the input data.
First, mass tagging of our structure is consistent with the localization of GFP tags on both the Nup82 and Nup159 C-termini (see
‘‘GFP mass-tagging analysis of the Nup82 holo-complex by immuno-EM’’ below and Figure 2C).
Second, our structure is consistent with the previously published data, including an independent negative stain 3D density map
(Figure S5A) (Gaik et al., 2015). Our asymmetric 19 nm long structure bears a general resemblance to the Nup82 complex class
averages by Gaik et al., except for having mostly one Dyn2 dimer at its end instead of five dimers (Gaik et al., 2015).
Third, the trimeric coiled-coil structure between the helical Nup82-Nup159-Nsp1 regions is recapitulated even when computed
using the chemical cross-linking data alone, without using the EM class averages (Figure S5C). We modeled the trimer using the avail-
able crystallographic structures, the helical regions predicted by PSIPRED (Jones, 1999), and the cross-links. All crystallographic
structures and predicted helical regions were kept rigid. We used an ideal helix template to construct the coordinates of the predicted
helical regions. We adopted the same multi-scale approach used to represent the entire Nup82 complex described above. The 500
best-scoring solutions satisfied all cross-links. The structural clustering of the 500 best-scoring solutions revealed that regions
Nup82(522–612)-Nup159(1211–1321)-Nsp1(637–727) were consistently arranged into a trimeric helical bundle.
Fourth, our structure is in agreement with SAXS profiles and ab initio shapes of Nup82 constructs spanning residues 4-220, 4-452,
and 572-690 (Figures 2D and S5D–S5F; Table S4). Notably, the Nup82 coiled-coil (572-690) forms a kinked structure and the corresponding
SAXS profile shows a monotonic increase in the Kratky plot (Figure S5F), indicating a high degree of flexibility
between coiled-coil segments in solution, as would be expected for coiled-coils that form two different conformers as seen in the final
structure.
Finally, our structure is also validated by the non-random and clustered distribution of cross-links connecting the Nup82 holo-com-
plex to other parts of the NPC, revealing interaction sites, as described in ‘‘Docking of the Nup82 holo-complex and the Y-shape
Nup84 complex’’ below.
GFP Mass-Tagging Electron Microscopy

Two different types of GFP-tagged structures of the Nup82 holo-complex were generated by attaching a rigid-body GFP structure
(PDB: 1GFL) to either the Nup82 or Nup159 C-termini via the 14 linker residues of DPLALPVATPGIPM. For the Nup82 complex, the
best-scoring structure was used. The configuration of the GFP tags was optimized using the replica exchange Gibbs sampling as
described above using IMP. In summary, 10 independent sampling runs were performed, each run starting with a random initial
configuration of the GFP tags. 4 replicas were used for each run, with temperatures ranging between 1.0 and 2.5. We produced a
total of 50,000 structures each for the Nup82 and Nup159 GFP tags, using the EM 2D restraint of the corresponding immuno-EM
class average. As a result, the best-scoring model structures are consistent with the localization of the GFP tags on both the
Nup82 (ccc = 0.953) and Nup159 (ccc = 0.932) C-termini (Figure 2C).

Docking of the Nup82 Holo-complex and the Y-Shape Nup84 Complex
A structure of the Nup82 holo-complex interacting with the Y-shape Nup84 complex was obtained by rigid-body docking restrained
by 9 chemical cross-links identified at the interface (Table S3), using the replica exchange Gibbs sampling using IMP, as described
above. For the Nup82 complex, the best-scoring structure was used. For the Nup84 complex, our previous structure (Shi et al., 2014)
was refined by using new crystallographic structures of the complex subunits (PDB: 4XMM and 4YCZ) (Kelley et al., 2015; Stuwe
et al., 2015b). Next, 20 independent sampling runs were performed, each run starting with a random initial configuration. 6 replicas
were used for each run, with temperatures ranging between 1.0 and 2.5. We produced a total of 100,000 structures using the cross-
link restraints spanning the interface between the Nup82 holo-complex and the Nup84 complex. Subsequently, 200 top-scoring
structures were subjected to the clustering analysis, identifying 3 clusters (clusters A, B, and C; 86, 70, and 44 structures, respec-
tively) of solution structures (Figure S6A). At least 7 out of the 9 chemical cross-links were satisfied by the 200 top-scoring structures,
within the distance threshold of 35 Å. All our solutions were similar, differing only in the degree of the Nup82 complex rotation along its
long axis, relative to the Nup84 complex (Figure S6B). Precisions of the Nup82 holo-complex in the 3 clusters were 30.2, 11.0, and
39.0 Å, respectively.
Among the three clusters, only cluster C satisfied the cross-links used to compute them (Table S3) and the S. cerevisiae NPC local-
ization probability density map (fit score by overlapping volume = 0.46, Figures 5A and S6C) (Alber et al., 2007b). Notably, this cluster
of solutions is also the only one that aligns with the wild-type human NPC tomographic cryo-EM map (Figures 5B and S6D, EMDB
2444) (Bui et al., 2013) and the mutant one lacking an outer cytoplasmic Y-complex ring (Figure 5C, EMDB 3104) (von Appen et al.,
2015). The cross-correlation coefficients between the Nup82 holo-complex structure and the human NPC tomographic cryo-EM
maps are 0.72 (wild-type, Figures 5B and S6D) and 0.81 (mutant, Figure 5C) in cluster C (Table 1). The cross-correlation coefficients
were calculated using the measure correlation command in the UCSF Chimera software (https://www.cgl.ucsf.edu/chimera/).
See METHODS DETAILS for details on the statistical analyses.
Software
The modeling protocol (i.e., stages 2, 3, and 4) was scripted using the Python Modeling Interface (PMI), version c7411c3, a library for
modeling macromolecular complexes based on our open-source Integrative Modeling Platform (IMP) package, version 2.5
(https://integrativemodeling.org) (Russel et al., 2012).
To display the CX-MS data we used the software CX-Circos (http://cx-circos.net).
Data Resources
The chemical cross-linking with mass spectrometric readout data used in this study was deposited in the Chorus database
(https://chorusproject.org/pages/index.html).
Files containing the input data, modeling scripts, and output structures are available online (https://salilab.org/nup82;
https://github.com/salilab/nup82).

Figure S1. Stoichiometry of the Endogenous Nup82 Holo-complex, Related to Figure 1

(A) Affinity-purified Nup82 holo-complex was loaded into 5%–20% sucrose density gradients. The resulting fractions were analyzed by Sypro Ruby stained SDS-
PAGE. A representative example is shown. The resulting sedimentation coefficient (S20,w) value corresponding to the main fractions was estimated from n = 4
gradients.
(B) Size-exclusion chromatography was used to estimate the Stokes radius (Rs) value for the affinity-purified Nup82 holo-complex. Sypro Ruby stained SDS-
PAGE gel of a representative experiment is shown. The resulting mass (M) of the holo-complex was calculated using the Siegel-Monte equation (Erickson, 2009).
(C) Sypro-Ruby stained SDS-PAGE gel showing the affinity purified Nup82 holo-complex from diploid strains containing one PrA tagged copy of the indicated
nucleoporin. Colored dots indicate the bands identified by mass spectrometry, with the protein ID indicated below. Blue indicates the tagged protein, and green
indicates components of the Nic96 complex co-purifying with Nsp1-PrA. Molecular weight standards are shown on the left.
(D) Sucrose density gradient purified Nup82 holocomplex was analyzed by quantitative proteomics using an internal standard (QconCat) as described in STAR
Methods. The relative stoichiometry was normalized to Nup82. Error bars represent the standard error of the mean for 2 in 2 biological and technical replicas.
(E) Affinity purified native Nup82 holo-complex from a strain carrying an empty plasmid (wild-type) or a Dyn2 overexpression plasmid (Dyn2 overex.) were
analyzed by SDS-PAGE and stained with Sypro-Ruby. The intensity of the resulting bands was quantified and normalized to the abundance of Nup159. The
relative amount of each protein between the wt and Dyn2 overex. was obtained and plotted. Error bars represent the standard error of the mean for n = 6.
Figure S2. Cross-linking-MS and Negative Stain Electron Microscopy Analyses of the Nup82 Holo-complex, Related to Figure 1
(A) Circos-XL plots showing the distribution of all DSS (left plot) and EDC (middle plot) cross-links identified within the native Nup82 holo-complex and to the
substoichiometric component Nup116. On the right side, a similar plot showing the DSS cross-links identified on the exogenous-skNup82-containing complex
(see STAR Methods) is shown. Each protein is represented as a colored segment, with the amino acid residue indicated on the outside of the plot and relevant
domains indicated inside each segment; regions without clear fold assignment are identified by clear tone colors. Inter-molecular cross-links are depicted as
purple lines and intra-molecular cross-links as gray lines. The internal circles include histograms representing the density of cross-links per 10 residues in DSS
and EDC (blue and light blue color for inter-molecular cross-links and intra-molecular cross-links, respectively) and the density of lysines in DSS (orange and light
orange bars for cross-linked and uncross-linked residues, respectively) or the density of lysine/carboxylic acid in EDC (pink and light pink bars for cross-linked
and uncross-linked residues, respectively).
(B) An example of a cross-link MS/MS spectrum (mass = 9,264 Da, z = 6) is shown. The corresponding b and y ion series and their charge states are assigned.

(C) Negative stain EM 2D class averages of the endogenous Nup82 holo-complex. 4,266 single particles were classified in 23 class averages using ISAC (Yang
et al., 2012b). The number of particles per class is indicated in the upper-left corner of each panel. The two class averages where a double Dyn2 dimer was
observed are indicated with an arrow. Scale bar, 10 nm.
(D) Comparison between the class averages of the Nup82 holo-complex (WT, left panels) and a variant of the complex with Nsp1 without the FG and FxFG regions
(Nsp1ΔFG, middle panels) (Strawn et al., 2004). To obtain the difference images on the right panels, each Nsp1ΔFG class average was aligned and paired with the
best matching WT class average in (C), followed by subtracting the Nsp1ΔFG class average from the WT class average. As a result, the same WT class average
could be used more than once. WT class averages taken from (C) are reproduced in (D) for clarity. Scale bar, 10 nm.
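The histograms in (A) report cross-link density per 10 residues. A minimal sketch of such binning follows. The function name and the input format (a list of 1-based residue numbers) are assumptions, not the authors' code.

```python
from collections import Counter


def crosslink_density(residues, protein_length, window=10):
    """Count cross-linked residues in consecutive windows of `window` residues.

    `residues` holds 1-based residue numbers; positions outside the
    protein are ignored. The last window may be shorter than `window`.
    """
    n_bins = (protein_length + window - 1) // window
    counts = Counter(
        (r - 1) // window for r in residues if 1 <= r <= protein_length
    )
    return [counts.get(i, 0) for i in range(n_bins)]
```

The same function could be applied separately to inter-molecular and intra-molecular cross-links to obtain the two histogram series mentioned in the legend.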
Figure S3. Structural and Evolutionary Relationship between the Nup82 and Nic96 Complexes and Four-Stage Scheme for Integrative
Structure Determination of the Nup82 Holo-complex, Related to Figure 1
Closest homologs of the Saccharomyces cerevisiae Nsp1 (A), Nup159 (B), and Nup82 (C) coiled-coil regions were detected by HHPred (Söding, 2005) (Table S1).
The multiple sequence alignment was visualized using SeaView 4.6 (Gouy et al., 2010), and numbering above alignment is relative to S. cerevisiae. Remarkably,
the top and highly significant hit is another complex from the NPC, also containing a heterotrimer of coiled-coils: the Xenopus laevis Nup93:Nup62:Nup58:Nup54
complex (PDB: 5C3L) (Chug et al., 2015) and its Chaetomium thermophilum Nic96:Nsp1:Nup57:Nup49 complex homolog (PDB: 5CWS) (Stuwe et al., 2015a). The
C-termini of both complexes share a common domain arrangement, formed by three consecutive helical coiled-coil regions of different lengths, connected by
flexible linkers (Figure 1), and both complexes even share a common component, Nsp1.
(D) Our integrative structure determination proceeds through four stages: (1) gathering of data, (2) representation of subunits and translation of the data into
spatial restraints, (3) configurational sampling to produce an ensemble of structures that satisfies the restraints, and (4) analysis and validation of the ensemble
structures. Further details are provided in Table 1, as well as STAR Methods. Files containing the input data, scripts, and output structures are available online
(https://salilab.org/nup82; https://github.com/salilab/nup82).
Figure S4. Validation of the Nup82 Holo-complex Structure: I, Related to Figure 2
(A) Clustering based on the RMSD distance matrix identified a single dominant cluster containing 370 of the 463 refined top-scoring models. The RMSD values are
colored from dark blue (0 Å) to dark red (30 Å).

(B) Representative em2d score distributions of initial structures show the cross-correlation coefficient ranging from 0.76 to 0.91 for the EM 2D class averages 4
(blue filled circle) and 19 (red filled square). We filtered structures above a cross-correlation threshold of 0.89 (black dotted line) for refinement. The final set of 650
filtered structures satisfies at least 10 class averages above the threshold.
(C) The positional precisions for each component of the Nup82 holo-complex were calculated as average RMSF across all pairs of structures in the cluster, after
least-squares superposition onto the centroid structure (Shi et al., 2014). The 9.0 Å precision of the core structured region in the dominant cluster was sufficiently
high to pinpoint the locations and orientations of the constituent proteins and domains, demonstrating the quality of the data.
(D) We assessed the DSS (left plot) and EDC (right plot) chemical cross-links in the dominant cluster; a cross-link restraint is satisfied by the cluster ensemble if the
median Cα-Cα distance of the corresponding residue pairs (considering restraint ambiguity) is < 35 Å for the DSS cross-links and < 30 Å for the EDC cross-links.
Satisfied cross-links (93.3% DSS and 74.1% EDC) are represented by blue filled circles and violated cross-links by blue empty circles. Same-residue
cross-links between two copies of the same protein are represented by red triangles.
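The satisfaction criterion in (D) can be sketched as follows. The thresholds come from the legend; the function names and data layout are hypothetical. Taking the shortest distance over all copies is only one assumed way of handling restraint ambiguity, which the legend does not spell out.

```python
from math import dist
from statistics import median

# Distance thresholds in Å, as stated in the legend.
THRESHOLDS = {"DSS": 35.0, "EDC": 30.0}


def ambiguous_distance(coords_a, coords_b):
    """Shortest Cα-Cα distance over all copies of the two residues.

    Assumption: ambiguity is resolved by taking the minimum distance.
    """
    return min(dist(a, b) for a in coords_a for b in coords_b)


def is_satisfied(distances_per_model, linker):
    """True if the median distance across the ensemble is below the threshold."""
    return median(distances_per_model) < THRESHOLDS[linker]


def fraction_satisfied(crosslinks, linker):
    """Fraction of cross-links of one linker type satisfied by the ensemble.

    `crosslinks` is a list of (linker, distances_per_model) pairs.
    """
    selected = [d for kind, d in crosslinks if kind == linker]
    if not selected:
        raise ValueError(f"no cross-links of type {linker}")
    return sum(is_satisfied(d, linker) for d in selected) / len(selected)
```

Using the median rather than the mean makes the criterion insensitive to a few outlier models in the cluster.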
Figure S5. Validation of the Nup82 Holo-complex Structure: II, Related to Figure 2
(A) Comparison of the localization probability density computed from our structure of the Nup82 holo-complex (left, light blue), with the previously published
negative stain EM tomography map of a truncated version of the Nup82 holo-complex (right, darker blue) (Gaik et al., 2015). The common and specific structural
features are indicated. Scale bar, 20 Å.
(B) Comparison between the major (370 structures) and minor (93 structures) cluster ensembles of the Nup82 holo-complex solutions. The average RMSD
between the major and minor clusters is relatively low at approximately 20 Å, considering the resolution of the data, the resolution of the coarse-grained molecular
representation, and the variation within each cluster (Schneidman-Duhovny et al., 2014) (Figure S4A). As a result, localization of all components is effectively
identical between the major and minor clusters, differing only in the orientation of the Nup82 β-propeller. Most importantly, our functional interpretation of the
structure is completely robust with regard to the differences between the means of the two clusters.
(C) Trimeric coiled-coil-like structure predicted between the helical regions Nup82 (562-612) (dark blue), Nsp1 (667-722) (cyan) and Nup159 (1283-1327) (navy
blue). The model is computed using the chemical cross-linking data, crystallographic structures of domains, secondary structure predictions, and assuming a
1:1:1 stoichiometry of the complex. The shown ribbon is the backbone structure of a representative model chosen from the best scoring cluster of solutions. The
localization densities are calculated for the three helical regions on the best scoring cluster.
(D), (E), and (F) SAXS analyses of the recombinantly expressed Nup82 (4-220) (D), Nup82 (4-452) (E), and Nup82 (572-690) (F) constructs.
(LEFT) The experimental SAXS profiles (black dots) and the profiles calculated using FoXS (red lines) (Schneidman-Duhovny et al., 2010) are shown. The lower left plot
presents the residuals (calculated intensity/experimental intensity) of the corresponding SAXS sample.
(MIDDLE) Upper-middle inset shows the SAXS profiles in the Guinier plot with the calculated Rg fit value in Å. The linear behavior of the Guinier plots confirms a
high degree of homogeneity for all Nup82 SAXS samples in solution. Lower-middle inset shows the corresponding Kratky plot. The extrapolation curves (red lines)
are added to the Kratky plots. The Kratky plots are used to visually depict the level of macromolecular flexibility. A sample with a high degree of flexibility shows a
monotonic increase in the Kratky curve, such as Nup82 (572-690) (F). In contrast, Nup82 (4-220) (D) and Nup82 (4-452) (E) show well-defined “bell-shaped”
curves, indicating folded structures with less flexibility.
(RIGHT) Shown is a view of the ab initio shape (represented as a transparent envelope) computed from the experimental SAXS profile, with the best fit of a ribbon
representation of each construct. In (F), two ribbon representations of the equivalent Nup82 fragments are shown in the conformation they adopt within the
Nup82 holo-complex structure subunits 1 (red) and 2 (blue).
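The Rg values in the Guinier plots come from a linear fit. In the Guinier approximation, ln I(q) = ln I(0) - (Rg^2/3) q^2 at low q, so the slope of ln I against q^2 gives Rg. The sketch below is a plain least-squares fit in standard-library Python. It is not the software used in the study, and the function name is hypothetical.

```python
from math import exp, log, sqrt


def guinier_fit(q, intensity):
    """Fit ln I(q) = ln I0 - (Rg**2 / 3) * q**2 and return (Rg, I0).

    `q` is in 1/Å, so Rg is returned in Å. Only low-q points should be
    passed in; a common rule of thumb is q * Rg below about 1.3.
    """
    x = [qi * qi for qi in q]
    y = [log(i) for i in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; Rg is undefined")
    return sqrt(-3.0 * slope), exp(intercept)
```

A curved rather than linear ln I versus q^2 plot in this range typically indicates aggregation or interparticle effects, which is why linearity is read as a sign of sample homogeneity.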
Figure S6. Validation of the Nup82-Nup84 Complex Assembly, Related to Figures 3, 5, and 6
(A) Clustering based on the RMSD distance matrix identified three clusters containing 86, 70, and 44 structures of the 200 top-scoring structures, respectively.
The RMSD values are colored from dark blue (0 Å) to dark red (135 Å).
(B) Comparison among the three cluster ensembles of the Nup82-Nup84 complex assembly. The localization probability density map for each of the three
clusters is shown as a transparent envelope. All our solutions were similar, differing only in the degree of the Nup82 complex rotation along its long axis, relative
to the Nup84 complex. Precisions of the Nup82 holo-complex in the 3 clusters were 30.2, 11.0, and 39.0 Å, respectively.

(C) Fitting of the Nup82-Nup84 complex assemblies to the S. cerevisiae NPC localization probability density map. Two views of the optimized alignment of two
S. cerevisiae Nup82-Nup84 complex assemblies into the S. cerevisiae NPC map (transparent gray), together with a side view of the detailed alignment (Alber et al.,
2007b); Nup85 (green), Nup133 (red), and two Nup82 units (blue and orange) are indicated. Among the three clusters, only cluster C satisfied both the crosslinks
used to compute them (Table S3) and the S. cerevisiae NPC localization probability density map (fit score by overlapping volume = 0.46).
(D) Comparison of the Nup82-Nup84 complex assemblies with the human NPC tomographic cryo-EM map (EMDB 2444) (Bui et al., 2013). Two views of the
optimized alignment of two S. cerevisiae Nup82-Nup84 complex assemblies into the human NPC map. Cluster C is the only one that aligns to the wild-type human
NPC tomographic cryo-EM map (CCC = 0.72).
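The fit in (D) is scored with a cross-correlation coefficient (CCC). One common definition is the Pearson correlation between the voxel values of two maps sampled on the same grid. The sketch below uses that definition as an assumption; the fitting software used in the study may define CCC differently.

```python
from math import sqrt


def cross_correlation(map_a, map_b):
    """Pearson-style correlation between two flattened, equally sized maps."""
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and of equal size")
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant map has no defined correlation")
    return cov / sqrt(var_a * var_b)
```

Values near 1 indicate that the two densities rise and fall together across the grid, and values near 0 indicate no linear relationship.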
Figure S7. Fluorescence In Situ Hybridization Analysis of mRNA Export Defects on Nup84 Complex Truncation Mutants, Related to Figure 4
The upper image of each row shows representative images of the localization of polyA mRNA by FISH (red) for each of the analyzed Nup84 truncation
mutants (Fernandez-Martinez et al., 2012). The lower image of each row shows the merged localization of polyA mRNA (red) and DNA stained with DAPI (blue).
Bar, 5 μm.
Article
Decoding Mammalian Ribosome-mRNA States by Translational GTPase Complexes
Sichen Shao, Jason Murray, Alan Brown, Jack Taunton, V. Ramakrishnan, Ramanujan S. Hegde
Correspondence
ramak@mrc-lmb.cam.ac.uk (V.R.),
rhegde@mrc-lmb.cam.ac.uk (R.S.H.)
In Brief
The individual decoding factor·GTPase complexes involved in protein synthesis differentially remodel local protein and RNA elements on ribosomes to ensure translation fidelity.
Highlights
• Cryo-EM structures of elongating, terminating, and stalled mammalian ribosomes
• Eukaryotic-specific elements contribute to stringent sense and stop codon decoding
• Pelota engages stalled ribosomes by destabilizing mRNA in the mRNA channel
• Decoding complexes communicate recognition to GTPase activation in different ways
Data Resources
5LZS
5LZT
5LZU
5LZV
5LZW
5LZX
5LZY
5LZZ
Shao et al., 2016, Cell 167, 1229–1240
November 17, 2016 © 2016 MRC Laboratory of Molecular Biology.
Published by Elsevier Inc.
Article
Decoding Mammalian Ribosome-mRNA States by Translational GTPase Complexes
Sichen Shao,1,3,4 Jason Murray,1,3 Alan Brown,1,3 Jack Taunton,2 V. Ramakrishnan,1,* and Ramanujan S. Hegde1,5,*
1MRC-LMB, Francis Crick Avenue, Cambridge CB2 0QH, UK
2Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
3Co-first author
4Present address: Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
5Lead Contact
*Correspondence: ramak@mrc-lmb.cam.ac.uk (V.R.), rhegde@mrc-lmb.cam.ac.uk (R.S.H.)

SUMMARY

In eukaryotes, accurate protein synthesis relies on a family of translational GTPases that pair with specific decoding factors to decipher the mRNA code on ribosomes. We present structures of the mammalian ribosome engaged with decoding factor·GTPase complexes representing intermediates of translation elongation (aminoacyl-tRNA·eEF1A), termination (eRF1·eRF3), and ribosome rescue (Pelota·Hbs1l). Comparative analyses reveal that each decoding factor exploits the plasticity of the ribosomal decoding center to differentially remodel ribosomal proteins and rRNA. This leads to varying degrees of large-scale ribosome movements and implies distinct mechanisms for communicating information from the decoding center to each GTPase. Additional structural snapshots of the translation termination pathway reveal the conformational changes that choreograph the accommodation of decoding factors into the peptidyl transferase center. Our results provide a structural framework for how different states of the mammalian ribosome are selectively recognized by the appropriate decoding factor·GTPase complex to ensure translational fidelity.

INTRODUCTION

Successful protein synthesis by ribosomes requires amino acids to be incorporated correctly during polypeptide elongation, translation to terminate at precise points, and quality control pathways to be engaged when translation is interrupted (Dever and Green, 2012). In eukaryotes, each of these events is mediated by specific factors (collectively termed as decoding factors in this study) that are delivered to the A site of the ribosome by a specialized member of a subfamily of translational GTPases. Members of this GTPase subfamily are structurally homologous but have non-redundant functions (Dever and Green, 2012): eEF1A delivers aminoacyl (aa)-tRNAs to sense codons; eRF3 delivers eRF1 to stop codons; and Hbs1l delivers Pelota (Dom34 in yeast) to stalled ribosomes. After delivery, the specificity of each decoding factor is inspected at the ribosomal decoding center before being accepted into the catalytic peptidyl transferase center (PTC) of the ribosome. Acceptance of each decoding factor by the ribosome has distinct and irreversible consequences: amino acid addition by aa-tRNA, translation termination by eRF1, and the initiation of mRNA and protein quality-control pathways by Pelota. Therefore, accurate decoding of the transcriptome and maintenance of protein homeostasis relies on decoding factor·GTPase complexes recognizing the appropriate ribosome-mRNA complex.
Our mechanistic understanding of decoding derives primarily from functional and structural studies of sense codon recognition by aa-tRNAs and the bacterial eEF1A homolog, EF-Tu (Voorhees and Ramakrishnan, 2013). The accuracy of accepting the correct aa-tRNA is enhanced by a two-step mechanism that exploits the interactions at the decoding center twice. GTP hydrolysis by EF-Tu irreversibly separates an initial selection step from a secondary kinetic proofreading step (Blanchard et al., 2004). During initial selection, aa-tRNA in complex with EF-Tu·GTP samples ribosomes in a configuration in which the aminoacyl group of the aa-tRNA is held by EF-Tu to prevent premature engagement with the PTC (Schmeing et al., 2009). Cognate interactions between aa-tRNA and mRNA at the ribosomal decoding center are communicated to EF-Tu to activate GTP hydrolysis (Pape et al., 1998; Ogle et al., 2001, 2002), which ultimately leads to the dissociation of EF-Tu·GDP from the ribosomal complex (Schmeing et al., 2009). This frees the aa-tRNA to “accommodate” into the ribosomal PTC, a rate-limiting step that relies on the stability of the codon-anticodon interactions at the ribosomal decoding center (Pape et al., 1998).
Important differences from the paradigm established by aa-tRNA·EF-Tu probably exist for eukaryotic decoding factor·translational GTPase complexes to account for higher translation accuracy (Kramer et al., 2010), the evolutionary divergence of the mammalian ribosome, and the eukaryotic expansion of the translational GTPase family to deliver non-tRNA factors to the ribosomal A site (Atkinson et al., 2008). Biochemical studies and moderate-resolution structures of several eukaryotic decoding complexes have revealed insights into conserved and distinct features of eukaryotic decoding complexes (Becker et al., 2011; Dever and Green, 2012; Shoemaker and Green, 2012; Taylor et al.,
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
2012; des Georges et al., 2014; Preis et al., 2014). However, the molecular interactions that accompany initial selection, communicate information from the decoding center to each GTPase, and mediate decoding factor accommodation in each case remain incompletely understood. Using high-resolution electron cryo-microscopy (cryo-EM), we analyze the molecular basis of specificity at the decoding center for each mammalian decoding factor·translational GTPase complex, compare potential GTPase activation mechanisms, and describe the conformational changes governing the accommodation of decoding factors. These results provide new insights into how these related complexes are able to make discriminatory interactions to recognize the appropriate ribosome-mRNA substrates to maintain overall translational fidelity.

RESULTS AND DISCUSSION

Cryo-EM Structures of Eukaryotic Translational Decoding Complexes
Translational decoding complexes (here defined as the elongation complex, 80S·aa-tRNA·eEF1A; the termination complex, 80S·eRF1·eRF3; and the rescue complex, 80S·Pelota·Hbs1l) are transient states that either rapidly dissociate or progress to an accommodated state upon codon recognition. We therefore developed methods to trap or assemble these complexes (Figure S1 and STAR Methods). To prepare the elongation complex, ongoing in vitro translation reactions in rabbit reticulocyte lysate of an N-terminally tagged protein were inhibited by the elongation inhibitor didemnin B (Rinehart et al., 1981), and the ribosome-nascent chains (RNCs) were affinity purified via the partially synthesized nascent polypeptide. To generate the termination complex, we programmed and affinity purified RNCs with a UGA stop codon in the A site that were reconstituted with eRF1, eRF3, and the nonhydrolyzable GTP analog GMPPCP. Rescue complexes were prepared similarly to produce RNCs containing an empty A site (generated with a truncated mRNA), or an A site occupied by either a stop codon or an AAA codon within a polyadenylated (poly(A)) tail, that were reconstituted with Pelota, Hbs1l, and GMPPCP. The structure of each complex was solved by cryo-EM to between 3.3 and 3.8 Å resolution (Figure S2; Tables S1 and S2).
Each complex represents an unrotated ribosome containing canonical P- and E-site tRNAs (Figures 1, 2, 3, and S2). The GTPase (G) domain and domains 2 and 3 of each GTPase (Figure S3A) were well resolved, while the highly divergent N-terminal extensions of Hbs1l and eRF3 were not visualized, presumably due to their flexibility. Each decoding factor (Figure S3B) assumes a pre-accommodated conformation: the tRNA acceptor arm or the homologous M-C domains of eRF1 or Pelota interacts with the GTPase, and the tRNA anticodon stem loop or structurally distinct N domain of eRF1 or Pelota occupies the decoding center (Figures 1, 2, and 3).

Decoding Factor Interactions at the Ribosomal Decoding Center
Sense Codon Decoding in Eukaryotes
As the ribosomes in the elongation complex (Figure 1A) are stalled at different codons by didemnin B, the density for the mRNA, aa-tRNAs, and the nascent chain are averages of the species captured. Despite this, the density at the decoding center is well defined, revealing that decoding in eukaryotes shares many features with that in bacteria (Ogle et al., 2001). In particular, the decoding nucleotides A1824 and A1825 (A1492 and A1493 in bacteria) are flipped out of helix 44 (h44) of 18S rRNA. Together with G626 (G530 in bacteria) in the anti-conformation, these bases inspect the geometry of the minor groove of the codon-anticodon helix (Figure 1B) and help stabilize the A-site tRNA via hydrogen bonding. These interactions monitor Watson-Crick base-pairing at the first two codon positions (+1 and +2) while providing tolerance at the +3 wobble position.
As in bacteria (Ogle et al., 2001), the ribosomal protein uS12 projects a loop into the decoding center (Figures 1C and S4A). Gln61 (Lys44 in E. coli) at the apex of the loop indirectly hydrogen bonds with A1824 in its flipped-out position and with the +2 nucleotide. Pro62 adopts a conserved cis-peptide conformation (Noeske et al., 2015) that allows its backbone carbonyl to form a water- or metal-mediated hydrogen bond with the +3 nucleotide (Figures 1C and S4A). Additional hydrogen bonds may be introduced by environmental condition-dependent hydroxylation of Pro62 (Loenarz et al., 2014; Noeske et al., 2015). Notably, these hydrogen bonds are only with the mRNA backbone, allowing for wobble base-pairing at the +3 position.
Relative to bacterial decoding, the eukaryotic-specific ribosomal protein eS30 may enhance the stability of a correct codon-anticodon interaction. In the presence of a cognate aa-tRNA, the N terminus of eS30 becomes ordered, allowing a conserved histidine (His76) to reach into a groove between the phosphate backbone of the anticodon +1 position and the two flipped-out decoding bases to form potentially stabilizing contacts (Figures 1B and 1D). Because this groove depends on the flipped nucleotides that accompany canonical codon-anticodon base-pairing, this interaction may preferentially stabilize cognate tRNAs to enhance discrimination.
The A- and P-site tRNAs also appear to stabilize 15 residues at the C terminus of uS19 that interacts with the phosphate backbone of the P-site tRNA and may make electrostatic interactions with the A-site tRNA (Figure 1E). Similar tRNA-dependent transitions in ribosomal proteins are observed in bacteria, with the C terminus of uS13 instead of uS19 threading between the anticodon stem loops of the A- and P-site tRNAs in bacteria (Jenner et al., 2010). Deletion of the uS13 C terminus in bacteria is associated with a reduced rate of translation and less efficient tRNA selection (Faxén et al., 1994). Thus, the contacts formed by uS19, and especially by eS30, which is dependent on a cognate aa-tRNA, could increase the stability of aa-tRNAs during initial selection and accommodation, thereby reducing erroneous ejection of cognate aa-tRNAs during kinetic proofreading.
Stop Codon Decoding by eRF1
Unlike translation elongation, the factors and mechanisms mediating translation termination are not conserved between prokaryotes and eukaryotes (Dever and Green, 2012). This includes the mechanism of stop codon recognition, as well as the role of termination-associated GTPases. Recent cryo-EM structures have revealed how accommodated eRF1 interacts with stop codons (Brown et al., 2015b; Matheisl et al., 2015). However,

Figure 1. Structure of the Mammalian Elongation Complex
(A) Overview of the elongation complex comprising the large (60S) and small (40S) ribosomal subunits, P- (green) and E-site (gold) tRNAs, mRNA (slate), aminoacyl-tRNA in the A/T state (aa-tRNA; purple), and eEF1A (red).
(B) Decoding center of the elongation complex. eS30 (teal) and the decoding nucleotides of 18S rRNA (yellow) are indicated.
(C) EM map density and models of the interactions within the decoding center of the elongation complex. Decoding nucleotides of 18S rRNA (yellow), aa-tRNA (purple), the A-site codon (+1 to +3) of mRNA (slate), and uS12 (orange) are indicated.
(D) Density and models of the interaction between His76 of the N terminus of eS30 (teal) within the decoding center of the elongation complex. In panels (C) and (D), density for mRNA, tRNA, and rRNA is contoured at 9σ; density for uS12 and eS30 is contoured at 5σ.
(E) The C termini of uS19 (bronze) and uS13 (brown) of the mammalian (80S) elongation complex compared to the homologous proteins in a 70S bacterial elongation complex (gray, PDB: 4V51), showing the potential interactions of the C terminus of uS19 in mammals or uS13 in bacteria with the anticodon stem loops of A/T aa-tRNA (purple) and P-site tRNA (green).
See also Figures S1, S2, S3, and S4.

the mechanism of stop codon recognition during the initial eRF1·eRF3 interaction with 80S ribosomes was unclear, as earlier structures had only visualized this complex at moderate resolution (Taylor et al., 2012; des Georges et al., 2014; Preis et al., 2014; Muhs et al., 2015). To address this problem, programmed RNCs with a UGA stop codon in the A site were used to isolate three intermediate states along the canonical termination pathway: (1) delivery of eRF1 to the stop codon by eRF3; (2) accommodated eRF1; and (3) accommodated eRF1 after ABCE1 recruitment (Figures 2A, S1, S2, and S4B–S4D) (Brown et al., 2015b).
The structures show that the stop codon maintains the same compacted geometry and interactions with the eRF1 N domain (Brown et al., 2015b; Matheisl et al., 2015) throughout the termination pathway (Figures 2B and S4B–S4D), despite large rearrangements of the M and C domains of eRF1 (see below). In this configuration, the +2 and +3 stop codon bases stack with a flipped-out A1825, and the base following the stop codon (+4) stacks with G626 in the anti-conformation (Figures 2B, 2C, and S4E) (Brown et al., 2015b; Matheisl et al., 2015). Improved density for the mRNA further reveals that the +5 base can stack with nucleotide C1698 of 18S rRNA, which protrudes into the mRNA channel (Figures 2B and 2C). The increased stability imparted by this additional stacking interaction explains why a +5 purine can increase the effectiveness of a “weak” stop codon with a +4 pyrimidine (McCaughan et al., 1995).
Recognition of Stalled Translation Complexes by Pelota
Pelota has been reported to bind stalled ribosomes with an empty A site as well as those with an mRNA-occupied A site without sequence preference (Shoemaker et al., 2010). To determine the basis for this sequence-independent engagement by the rescue complex, we utilized our reconstitution method to assemble 80S·Pelota·Hbs1l complexes with an A site that lacked mRNA (assembled on a truncated mRNA), or that contained either the UGA stop codon or the AAA sense codon (due to translation stalling within a poly(A) tail) (Figures 3A, S1, and S2). The complex assembled on a truncated mRNA shows that the β3′-β4′ loop of Pelota extends from the N domain to protrude into the empty mRNA channel, following the path normally taken by mRNA (Figures 3B and S4F). A similar path is taken by the shorter β3′-β4′ loop of yeast Dom34 as observed

Figure 2. Structure of the Mammalian Termination Complex
(A) Overview of the termination complex assembled with eRF1 (purple) and eRF3 (orange).
(B) Decoding center of the termination complex.
(C) EM map density (contoured at 6σ) and model showing interactions of the mRNA containing the UGA stop codon (slate) with rRNA elements of the decoding center (yellow).

at moderate resolution (Becker et al., 2011). However, the higher-resolution information in our map allows the details of this interaction to be analyzed. The highly conserved residue (Arg45) at the top of the β3′-β4′ loop appears to play an anchoring role in the complex. Arg45 can hydrogen bond with His100, which is part of a conserved (Y/F/H)HT sequence on β6′ that interacts with 18S rRNA (Figure 3C). Arg45 is also part of a wider hydrogen-bonding network that includes the decoding nucleotide G626 in the anti-conformation (Figure 3C). Residues 60-61 prevent the decoding nucleotide A1824 from flipping out of h44, while A1825 is flipped out and interacts with Arg62. Together, these and other potential interactions with uS3 and uS5 probably stabilize the otherwise flexible and poorly conserved loop (Kobayashi et al., 2010). Thus, the β3′-β4′ loop is well positioned to sense A site occupancy.
Surprisingly, in both reconstructions containing mRNA sequence downstream of the P site, the conformation of the β3′-β4′ loop in the mRNA channel is unchanged, and we observe little to no density for the mRNA in the A site, while the mRNA upstream of the A site is also noticeably more disordered (Figures 3D and 3E). The high occupancy of Pelota·Hbs1l in these datasets (26%), the purity of our biochemically isolated complexes, and no evidence of endonucleolytic mRNA cleavage in our samples suggest that Pelota·Hbs1l is not recognizing a minor population of ribosomes that do not contain mRNA in the A site. Instead, we favor a mechanism by which the Pelota β3′-β4′ loop is able to bind a variety of mRNA substrates and, in doing so, destabilizes the mRNA within the channel. In support of this, the moderate-resolution structure of Dom34·Hbs1 bound to ribosomes stalled by mRNA secondary structure (Becker et al., 2011) also noted poor density within the mRNA channel.
Distinct Molecular Interactions Govern Decoding Factor Selection
Comparisons of the overall architectures (Figures 1A, 2A, and 3A) and the decoding centers of our structures (Figures 1B, 2B, and 3B) suggest that the mammalian ribosome does not display translational status-specific cues to favor engagement by a particular decoding factor·translational GTPase complex. Instead, successful recognition relies on decoding factors exploiting the inherent plasticity of the mRNA and the ribosomal decoding center, with sampling preference being biased by the overall abundance and local concentrations of each complex.
Highly specific interactions form between decoding factors and mRNA sequences during elongation and termination. In particular, the ribosomal protein eS30 may contribute to increasing the stringency of sense codon decoding in eukaryotes relative to bacteria. By contrast, the β3′-β4′ loop of Pelota invariably inserts into the mRNA channel and follows the path normally taken by mRNA, regardless of the mRNA substrate (Figure 3). Having to compete with mRNA for the channel may mean that Pelota·Hbs1l undergoes more futile attempts to engage the ribosome than other decoding complexes. This barrier and the relatively low abundance of Pelota and Hbs1l (Geiger et al., 2012) probably renders Pelota·Hbs1l a poor competitor for elongating or terminating ribosomes. Only during protracted periods of stalling, or with a truncated mRNA, would the likelihood of the β3′-β4′ loop engaging the ribosomal A site increase.
Once inserted, the loop maintains the mRNA in a less stable state that may facilitate subsequent endonucleolytic cleavage and/or ribosome splitting. Although endogenous substrates of Pelota·Hbs1l remain poorly characterized, this model is

Figure 3. Structure of the Mammalian Rescue Complex
(A) Overview of the rescue complex assembled with Pelota (pink) and Hbs1l (brown).
(B) Decoding center of the rescue complex.
(C) Hydrogen-bonding interactions between the β3′-β4′ loop of Pelota (pink) and 18S rRNA nucleotides (yellow).
(D and E) Density corresponding to mRNA in the (D) termination or (E) rescue complexes both assembled on the same (NC-stop) mRNA stalled with the UGA stop codon in the A site. The ribosomal small subunit, P- and E-site tRNAs, and eRF1 or Pelota are indicated.

peptidyl-tRNA and an unoccupied A site and GTPase-associated center.
In the mammalian elongation complex, there is a pronounced rotation of the shoulder of the SSU toward the intersubunit interface (Figures 4A and 4B) that resembles domain closure in bacteria (Ogle et al., 2001, 2002). This movement raises the rRNA of the SSU platform by 3–4 Å to closely contact domain 2 of eEF1A. For an accurate comparison and to avoid the possible influences of crystal contacts, we re-analyzed the conformational changes that occur in bacteria using high-resolution cryo-EM structures of E. coli ribosomes with (Fischer et al., 2015) and without (Bischoff et al., 2014) an A-site tRNA (Figure 4C). This shows that domain closure in bacteria and mammals is broadly conserved, although the rotation of the shoulder
supported by in vitro studies showing that Pelota,Hbs1l is more around rRNA h44 is slightly more pronounced in the bacterial
effective at mediating the recycling of ribosomes stalled on structure.
mRNAs with shorter lengths extending 30 of the P site (Pisareva Domain closure appears to be a specific response to aa-tRNA
et al., 2011; Shoemaker and Green, 2011) and suggests that ri- selection and does not occur in the presence of either eRF1 or
bosomes on more flexible mRNA (for example, mRNA that has Pelota (Figures 4D and 4E). However, subtle conformational
already been cleaved) or that are not engaged in active transla- changes can be observed, particularly in the rescue complex
tion are better substrates for Pelota,Hbs1l (van den Elzen et al., where displacement of the head of the SSU (Figure 4E) may
2010; Guydosh and Green, 2014). help the A site to accommodate the N domain of Pelota. The ex-
clusivity of domain closure to elongation suggests that the pre-
Implications of Specialized GTPase Complexes for cise positioning of elements within the decoding center is crucial
Eukaryotic Translation for this large-scale movement. Only in the elongation complex
Ribosomal Movements upon Decoding Complex are both decoding nucleotides A1824 and A1825 flipped out of
Engagement h44 (Figures 1B, 2B, and 3B). This configuration may work with
Cognate codon-anticodon recognition in the decoding center of G626 and neighboring proteins, particularly uS12, to tether the
bacterial ribosomes induces a subtle but large-scale conforma- interactions of the decoding center to propagate movement.
tional change in the small subunit (SSU), referred to as domain Consistent with this, our structures reveal considerable differ-
closure (Ogle et al., 2001, 2002) (Figure 4A). This movement ences in the position of uS12 relative to the mRNA and decoding
has been proposed to induce a tighter fit around the codon-anti- nucleotides in each complex (Figures S4A, S4E, and S4F). Direct
codon helix and to help activate the translational GTPase. To interactions between uS12, the mRNA, and the flipped-out
determine how the mammalian ribosome responds to recogni- A1824 nucleotide occur only in the presence of a cognate aa-
tion by different decoding factors, we compared each decoding tRNA. In eukaryotes, mutations in uS12 influence translation
complex to an unrotated rabbit ribosome containing a P-site fidelity (Alksne et al., 1993; Loenarz et al., 2014), similar to the

restrictive mutations in bacteria (Ogle et al., 2002), supporting a role for uS12 in stabilizing the conformation induced by codon recognition. The same architecture may be induced with near-cognate tRNAs during crystallization (Demeshkina et al., 2012). However, we believe this suggests that non-cognate tRNAs have to go through the same activated state as cognate tRNAs in order to be selected, rather than implying that domain closure is not an intrinsic part of decoding. In physiological conditions, the probability of reaching the activated state is likely much more favored for cognate interactions than for non-cognate ones. Consistent with this, mutations expected to impede domain closure are associated with hyperaccurate phenotypes but a corresponding loss of translational efficiency (Andersson et al., 1986; Ogle and Ramakrishnan, 2005).

Figure 4. Conformational Responses of the Ribosome to Decoding Complexes
(A) EM map of the elongation complex (colored) superposed on a ribosome with an empty A site (gray small subunit), demonstrating the movement corresponding to domain closure (illustrated by the arrow). The shoulder region of the small subunit moves toward the large subunit, which maximizes the contacts between a translational GTPase and the ribosome, particularly with the GTPase center.
(B–E) Worm diagrams colored by pairwise root-mean-square deviation (RMSD) of the small subunits of (B) the elongation complex relative to a ribosome with an empty A site, (C) of a bacterial elongation complex (PDB: 5AFI) relative to an empty ribosome (PDB: 4UY8), and of the (D) termination and (E) rescue complexes relative to the same reference as in (B). The directions of movements are indicated by arrows. The A site is indicated with a purple dot.
[Panel labels: head, beak, shoulder, h44, 40S, 60S, eEF1A; RMSD color scale 0–5 Å.]

Pre-accommodation Decoding Factor·GTPase Interactions
The absence of domain closure in the termination and rescue complexes suggests that these decoding factors may directly communicate signals from the decoding center to the GTPase. Decoding factors bound to translational GTPases adopt a pre-accommodated conformation on the ribosome that prevents the decoding factor from engaging the PTC. For aa-tRNAs, this pre-accommodated state is referred to as the A/T state, which acts as a paradigm for understanding the role of this conformation during decoding. In the pre-accommodated state, the acceptor- and T-stems of the A/T aa-tRNA run parallel to, and interact with, the adjoined β-barrel domains of eEF1A at the interface with the G domain (Figures S5A and S5B), similar to recognition of aa-tRNAs by EF-Tu (Schmeing et al., 2009). Despite the aa-tRNA representing a mixture of species, the density for the 3′ CCA is well defined (Figure S5C). The aminoacylated terminal adenosine (A76) packs against the outside of the domain 2 β-barrel in a pocket formed by two protruding loops (β7-β8 and β10-β11; Figure S5C), while the aminoacyl group is oriented into a spacious cavity between domain 2 and the G domain that can accommodate all 20 amino acids.

The M domains of eRF1 and Pelota bind their respective translational GTPase in the same cleft between the G domain and domain 2 (Figures S5D and S5E) (Becker et al., 2011; Taylor et al., 2012). In both structures, the β7-α5 tip of the M domain (which harbors the catalytic GGQ motif in eRF1) follows the path of the 3′ CCA of the aa-tRNA but does not extend as far as the staggered pockets in eEF1A that bind A76 and the variable aminoacyl group. Although these pockets exist in eRF3 and Hbs1l, the lining residues are not conserved; indeed, the characteristics of the interface between each decoding factor and GTPase partner differ considerably (Figures S5B, S5F, and S5G). Compared to eEF1A, both eRF3 and Hbs1l contain a more electronegative cleft to bind the positively charged region around the β7-α5 tip, which is needed to interact with the phosphate backbone of rRNA in the PTC after accommodation. Thus, prior to GTP hydrolysis, high-affinity binding sites in translational GTPases maintain decoding factors in an unproductive conformation, in which the 3′ CCA of aa-tRNA or catalytic GGQ motif of eRF1 is held over 80 Å from the P-site tRNA ester bond in the PTC.

Comparison with the crystal structures of GTP-bound ternary complexes (Kobayashi et al., 2010, 2012) reveals that the N domains of the decoding factors are oriented differently on the ribosome to engage the decoding center (Figures S5H and S5I). This may propagate conformational changes through the factor and establish additional interactions between the M domain and the GTPase, particularly with the G domain, which harbors three

functionally important motifs (the P loop, and the switch 1 and switch 2 loops) that are thought to form productive contacts with the sarcin-ricin loop (SRL) of the ribosome to activate GTP hydrolysis (Voorhees and Ramakrishnan, 2013). In the termination complex, the β7-α5 loop of eRF1 contacts the end of the switch 1 loop of eRF3 (residues 279–281) (Figure S5J). Similarly, the equivalent region of Pelota interacts with the α3-α4 loop of the Hbs1l switch 1 region, while the conserved PGF motif in the β8-α6 loop (Lee et al., 2007) together with His244 of Pelota recognize part of the Hbs1l switch 2 loop (Figure S5K). Hence, these interactions may permit the decoding factor to directly facilitate the precise positioning of the GTPase G domain for productive GTP hydrolysis.

While direct communication via the decoding factor may be particularly important in the termination and rescue complexes where domain closure was not observed (Figure 4), the phosphate backbone of the acceptor arm of A/T aa-tRNA also makes potential electrostatic interactions with both switch regions of eEF1A (Figure S5L). This is consistent with observations that an intact aa-tRNA is necessary to trigger EF-Tu hydrolysis (Piepenburg et al., 2000), and that tRNA mutations can increase GTPase activation rates (Cochella and Green, 2005).

Didemnin B Prevents eEF1A Dissociation
The structure of the elongation complex shows that didemnin B, a naturally occurring branched cyclic depsipeptide protein synthesis inhibitor (Rinehart et al., 1981; Li et al., 1984) (Figures 5A and 5B), traps eEF1A in a post-hydrolysis GDP-bound state (Figure S6A) by occupying a cleft between the G domain and domain 3 of eEF1A that is 20 Å from the GTPase active site (Figure 5C). Didemnin B binding appears to be predominantly stabilized by hydrophobic interactions, with the Leu and methylleucine (MeLeu) moieties occupying a hydrophobic pocket on the surface of the domain 3 β-barrel (Figure 5D). Didemnin B is further held in place by a solvent-exposed β-hairpin insertion (β15-β16, residues 375–391) of the domain 3 β-barrel, which is absent in bacterial EF-Tu. Compared to the crystal structure of rabbit eEF1A in its non-ribosome-bound state (Crepin et al., 2014), the tip of this hairpin is displaced by 4 Å to pinch didemnin B against the G domain (Figure 5C). The conserved loop residues 381–383 potentially form hydrogen bonds with the backbone of the branch and Thr moiety of didemnin B (Figure 5E) that may not occur with the shorter branches in didemnin A or C, possibly explaining the greater potency of didemnin B (Rinehart et al., 1981).

Figure 5. The Didemnin B Binding Site
(A) Chemical structure of didemnin B.
(B) Fit of the model of didemnin B (blue) to the EM map density contoured at 5σ.
(C) Didemnin B binds at the interface between the G domain (red) and domain 3 (yellow) of eEF1A. Domain 2 is shown in orange. Relative to the eEF1A crystal structure (PDB: 4C0S; gray), the β15-β16 hairpin packs against didemnin B.
(D) Didemnin B occupies a hydrophobic pocket of eEF1A (orientated as in C), which corresponds to the binding site for kirromycin (green) on EF-Tu.
(E) Hydrogen-bonding interactions between didemnin B (blue) and eEF1A (pink).
See also Figures S5 and S6.

The didemnin B binding site partially overlaps with that of the linear polyketide kirromycin on EF-Tu, despite the different chemical structures of the two compounds (Schmeing et al., 2009) (Figure 5D). Based on this observation, we propose a conserved mechanism (Schmeing et al., 2009) that didemnin B serves to increase the effective number of contacts between the GDP-bound G domain and domain 3 to prevent the inter-domain rotation that is necessary for eEF1A to release aa-tRNA and dissociate from the ribosome. Recently, didemnin B and ansatrienin B have been shown to compete with ternatin for binding to eEF1A (Carelli et al., 2015), and mutations in eEF1A Ala399, adjacent to the β15-β16 hairpin, were found to confer decreased sensitivity to didemnin B, ternatin, and another structurally unrelated natural product, nannocystin A (Carelli et al., 2015; Krastel et al., 2015). This suggests that these chemically diverse natural products share similar mechanisms of activity.

However, unlike bacterial elongation complexes trapped with kirromycin (Fischer et al., 2015; Schmeing et al., 2009), the switch 1 loop of eEF1A is ordered in the GDP-bound elongation complex (Figures 6A and 6B). This was surprising, as the switch 1 loop is thought to universally facilitate the gating function of GTPases by transitioning from an ordered to a disordered state upon GTP hydrolysis (Vetter and Wittinghofer, 2001; Voorhees and Ramakrishnan, 2013) and has so far only been observed in an ordered state in the presence of nonhydrolyzable GMPPCP (Voorhees et al., 2010). This suggests that disordering of the switch 1 loop in mammalian translational GTPases either occurs as an independent step not immediately linked to Pi release, or is stabilized as an indirect consequence of didemnin B binding.

Activated State of Eukaryotic Translational GTPases
Eukaryotic GTPases possess a short insertion (14 residues) relative to bacterial EF-Tu immediately preceding the switch 1

loop (Figure 6C). In our complexes, this insertion forms an amphipathic α helix (α2; Figures 6A–6C) connected by a short loop to a helical turn before adopting the same conformation observed in the EF-Tu·GMPPCP complex (Voorhees et al., 2010) (Figure 6A). In the elongation complex, the α2 helix lies across the surface of eEF1A to bury the hydrophobic face, while the polar residues on the other side interact with the ribosome. At the top of the α2 helix, Arg37 stacks with nucleotide A464 from h14 of SSU rRNA. The C-terminal part of the α2 helix and the following loop (residues 48–53) make multiple interactions with uL14: Glu48 of eEF1A potentially forms a salt bridge with Arg131 of uL14, and contacts between the eEF1A loop and Arg6 and Gly7 of uL14 appear to stabilize the usually disordered N terminus of uL14. Additional contacts occur between Ser53 and nucleotide G4600 of the SRL (Figure 6B). A similar network of interactions is seen for the α2 helix of eRF3 and Hbs1l. Together, these eukaryotic-specific interactions may help to stabilize the switch 1 region, perhaps explaining why it is not disordered despite loss of the γ-phosphate in the didemnin B-stalled elongation complex.

Figure 6. Interactions between the GTPase and the Ribosome
(A) Comparison of the switch 1 loop (red) of eEF1A (pink) in the elongation complex with the EF-Tu switch 1 loop (teal) in the presence of GMPPCP (PDB: 4V5L) (left). The switch 1 (Sw1) loop interacts with proteins and rRNA from both the large (blue) and small (yellow) subunits of the ribosome (right).
(B) EM map density and model of the interactions between the eEF1A switch 1 loop (red) with rRNA and proteins of the large (blue) and small (yellow) subunit. Density for rRNA is contoured at 9σ; density for eEF1A and uL14 is contoured at 5σ.
(C) Sequence alignment of the switch 1 loop region of selected translational GTPases.

An effect of the ordered switch 1 region in the elongation complex is that the eEF1A catalytic residues adopt the same conformation as seen in the “activated” state of EF-Tu trapped on the bacterial ribosome by GMPPCP (Figure S6B) (Voorhees et al., 2010). In this conformation, the eEF1A catalytic histidine His95 on the switch 2 loop is coordinated by the phosphate backbone of nucleotide A4607 of the SRL (His84 and A2662, respectively, in E. coli), and the hydrophobic gate formed by residues Val16 and Ile71 (Val20 and Ile61 in EF-Tu) appears to be in an open conformation. Similar configurations were observed in the termination and rescue complexes, which were reconstituted with GMPPCP (Figure S6B). Notably, the G domain of Hbs1l is further from the SRL, and the catalytic histidine (His348) is less strongly coordinated. This could increase the length of time Pelota·Hbs1l needs to be associated with the ribosome before hydrolysis occurs, thereby increasing the stringency for a productive encounter.

Specialization of Translational GTPases Regulates Initial Selection and Activation
Although the three translational GTPase partners share considerable structural similarity and superpose with root-mean-square deviation (RMSD) values between 1.4 and 1.9 Å, they cannot complement each other (Wallrapp et al., 1998) and possess divergent interfaces specialized to interact with their respective decoding factor (Figures S5B, S5F, and S5G). Similar sub-functionalization has not occurred in archaea, where aEF1α plays an omnipotent role to deliver aa-tRNA, aRF1, and aPelota to ribosomes (Saito et al., 2010).

Our structures suggest several advantages of having a dedicated translational GTPase for each decoding factor in maintaining overall translational fidelity. First, improved affinity between decoding factors and individual GTPases (Figures S5A–S5G),

Figure 7. Conformational Changes during Accommodation
(A) Structures of ribosomal complexes representing intermediates along the eukaryotic translation termination pathway.
(B) The accommodated M domain (purple) of eRF1 is rotated by 140° relative to the pre-accommodated state (yellow). Gln185 of the catalytic GGQ motif, P-site tRNA (green), the N domains in both states (pink), the C domain (pale blue) in the accommodated state, and the axis of M domain rotation (blue) are shown.
(C) Comparison of eRF1 (purple) in a pre-accommodated state (left) with an accommodated (right) conformation, showing straightening of α8 and α9 (blue) into a continuous helix upon accommodation.
(D) Comparison of Pelota (pink) in a pre-accommodated state (left) with Dom34 (pink) in an accommodated state (right; PDB: 3IZQ), revealing straightening of α8 and α9 (blue).
See also Figure S7.
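
The 140° rotation quoted in (B) relates two orientations of the same domain about a single axis. The text does not state how this angle and axis were measured, so the following is only a minimal sketch of one common approach. It assumes the two states have already been related by a 3×3 rotation matrix (for example, from a least-squares superposition); the angle then follows from the trace of the matrix and the axis from its antisymmetric part. All function names are illustrative and are not taken from any software used in the study.

```python
import math


def rotation_matrix(axis, angle_deg):
    """Rodrigues' formula: 3x3 matrix rotating by angle_deg about axis."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]


def rotation_angle_and_axis(rot):
    """Return (angle in degrees, unit axis) for a proper rotation matrix."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    theta = math.acos(cos_theta)
    denom = 2.0 * math.sin(theta)
    if abs(denom) < 1e-9:
        raise ValueError("axis is ill-defined for rotations of 0 or 180 degrees")
    axis = (
        (rot[2][1] - rot[1][2]) / denom,
        (rot[0][2] - rot[2][0]) / denom,
        (rot[1][0] - rot[0][1]) / denom,
    )
    return math.degrees(theta), axis


def rotate(rot, point):
    """Apply a 3x3 rotation matrix to one (x, y, z) point."""
    return tuple(sum(rot[i][j] * point[j] for j in range(3)) for i in range(3))


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired coordinates (no refitting)."""
    total = sum(
        (a[k] - b[k]) ** 2 for a, b in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(total / len(coords_a))


# Toy data (hypothetical): two points rotated by 140 degrees about the z axis.
before = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
rot = rotation_matrix((0.0, 0.0, 1.0), 140.0)
after = [rotate(rot, p) for p in before]
angle, axis = rotation_angle_and_axis(rot)
print(round(angle, 3), tuple(round(c, 3) for c in axis), round(rmsd(before, after), 3))
```

The same `rmsd` helper illustrates the quantity used for the worm diagrams and GTPase superpositions elsewhere in the text, although real comparisons would first superpose the structures on a common reference and use atomic coordinates rather than toy points.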
combined with distinct temporal and spatial distribution patterns, probably contribute to higher selectivity during decoding. Second, non-redundant pairing may allow for distinct mechanisms for communicating decoding events to the GTPase (e.g., Figure 4), possibly via direct interactions between the decoding factor and motifs needed for GTP hydrolysis (Figures S5J and S5L). Finally, specialized complexes may have different dissociation constants and basal activation barriers to GTP hydrolysis that could alter the general competitiveness of each decoding complex (Figures 4 and S6B).

Conformational Changes Coordinate Decoding Factor Accommodation
After GTP hydrolysis and GTPase dissociation, the decoding factor needs to accommodate fully into the PTC without dissociating from the ribosome. Our structural snapshots of the translation termination pathway reveal the conformational effects of accommodation on eRF1 and the ribosome (Figure 7A). After eRF3 dissociates, the M and C domains of eRF1 undergo large interdependent rotations relative to the static N domain. The pre-accommodated and accommodated M domains are related

by a 140° rotation around Asp142 in the linker between the N and M domains (Figure 7B). However, the driving force for this rearrangement may derive from the hinge (centered on residue 276) between helices α8 and α9 connecting the M and C domains, which are held at an acute kink (70°) by eRF3. Accommodation relieves this conformational strain by allowing α8 and α9 to straighten into a continuous α helix (Figure 7C). Comparing pre-accommodated Pelota with accommodated Dom34 (Becker et al., 2012) reveals a similar transition (Figure 7D). The confined environment around the decoding factors suggests that, as demonstrated for aa-tRNAs (Whitford et al., 2010), the accommodation pathway of eRF1 and Pelota is likely complex, comprising multiple steps. This may slow the rate of accommodation and provide more opportunities for the decoding factor to dissociate when interactions at the decoding center are suboptimal.

A key structural difference between Pelota and eRF1 is a “minidomain” insertion in the eRF1 C domain (residues 328–373) (Figure S3B). The minidomain adopts different orientations during the termination pathway, although its movement is restricted by a stacking interaction between Arg330 at the top of the minidomain and Trp377 of the C domain (Figure S7A). In the recognition complex, the minidomain interacts with the N terminus of eS31 that wraps around the flipped-out G1508 nucleotide of SSU rRNA, which may facilitate initial binding of the ternary complex to the ribosome. During accommodation, the minidomain switches subunit partners: the contacts with eS31 are disrupted and new contacts with uL11 form, primarily via an interaction with the C-terminal tail of eRF1 (Figure S7B). Together, these may stabilize both the eRF1 C domain and the L7/L12 stalk base to facilitate ABCE1 binding (Brown et al., 2015b), which displaces the minidomain by another 1.5 Å and further stabilizes the interactions with uL11.

Conclusions
Collectively, our structures suggest that specialization of eukaryotic decoding factor·translational GTPase complexes enhances overall translation fidelity and efficiency by allowing for distinct mechanisms of decoding (Figures 1, 2, and 3), activation (Figure 4), and accommodation (Figure 7). Our results also highlight fundamental differences from the bacterial system, including eukaryotic-specific elements that increase the stringency of sense codon decoding (Figure 1) and the absence of domain closure in certain decoding complexes (Figure 4). This implicates novel mechanisms for communicating information from the decoding center to eukaryotic translational GTPases, and subtle but important variations in the rates of GTPase activation and accommodation of eukaryotic decoding complexes. Together, these distinctions likely translate into decisive differences in the competitive advantage of each decoding complex for different ribosome-mRNA substrates.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

● KEY RESOURCES TABLE
● CONTACT FOR REAGENT AND RESOURCE SHARING
● EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ○ Cell Lines
● METHOD DETAILS
  ○ Constructs
  ○ Purification of recombinant proteins
  ○ In vitro transcription and translation reactions
  ○ Sample Preparations
  ○ Cryo-EM grid formation
  ○ Miscellaneous biochemistry
  ○ Data collection
  ○ Image Processing
  ○ Model building
  ○ Model refinement and validation
  ○ Molecular graphics
● QUANTIFICATION AND STATISTICAL ANALYSIS
● DATA AND SOFTWARE AVAILABILITY
  ○ Data Resources

SUPPLEMENTAL INFORMATION

Supplemental Information includes seven figures and three tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.046.

AUTHOR CONTRIBUTIONS

Conceptualization, S.S., J.M., A.B., J.T., V.R., and R.S.H.; Investigation and Initial Draft, S.S., J.M., and A.B.; Editing, All Authors.

ACKNOWLEDGMENTS

We thank J. Grimmett, T. Darling, S. Chen, and C. Savva for technical support, D. Barford and A. Leslie for discussions, and K. Yanagitani for constructs. This work was supported by the UK Medical Research Council (MC_UP_A022_1007 to R.S.H. and MC_U105184332 to V.R.), a St John’s College Title A fellowship (S.S.), and a Wellcome Trust Senior Investigator award (WT096570), the Agouron Institute, and the Louis-Jeantet Foundation (V.R.). J.M. thanks T. Dever and the NIH Oxford-Cambridge Scholars’ Program for support.

Received: July 13, 2016
Revised: October 3, 2016

REFERENCES

Adams, P.D., Afonine, P.V., Bunkóczi, G., Chen, V.B., Davis, I.W., Echols, N., Headd, J.J., Hung, L.-W., Kapral, G.J., Grosse-Kunstleve, R.W., et al. (2010). PHENIX: A comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr. 66, 213–221.
Alksne, L.E., Anthony, R.A., Liebman, S.W., and Warner, J.R. (1993). An accuracy center in the ribosome conserved over 2 billion years. Proc. Natl. Acad. Sci. USA 90, 9538–9541.
Amunts, A., Brown, A., Bai, X.-C., Llácer, J.L., Hussain, T., Emsley, P., Long, F., Murshudov, G., Scheres, S.H.W., and Ramakrishnan, V. (2014). Structure of the yeast mitochondrial large ribosomal subunit. Science 343, 1485–1489.
Andersen, G.R., Pedersen, L., Valente, L., Chatterjee, I., Kinzy, T.G., Kjeldgaard, M., and Nyborg, J. (2000). Structural basis for nucleotide exchange and competition with tRNA in the yeast elongation factor complex eEF1A:eEF1Balpha. Mol. Cell 6, 1261–1266.
Andersson, D.I., van Verseveld, H.W., Stouthamer, A.H., and Kurland, C.G. (1986). Suboptimal growth with hyper-accurate ribosomes. Arch. Microbiol. 144, 96–101.

Atkinson, G.C., Baldauf, S.L., and Hauryliuk, V. (2008). Evolution of nonstop, no-go and nonsense-mediated mRNA decay and their termination factor-derived components. BMC Evol. Biol. 8, 290.
Bai, X.-C., Rajendra, E., Yang, G., Shi, Y., and Scheres, S.H. (2015). Sampling the conformational space of the catalytic subunit of human γ-secretase. eLife 4, e11182.
Becker, T., Armache, J.-P., Jarasch, A., Anger, A.M., Villa, E., Sieber, H., Motaal, B.A., Mielke, T., Berninghausen, O., and Beckmann, R. (2011). Structure of the no-go mRNA decay complex Dom34-Hbs1 bound to a stalled 80S ribosome. Nat. Struct. Mol. Biol. 18, 715–720.
Becker, T., Franckenberg, S., Wickles, S., Shoemaker, C.J., Anger, A.M., Armache, J.-P., Sieber, H., Ungewickell, C., Berninghausen, O., Daberkow, I., et al. (2012). Structural basis of highly conserved ribosome recycling in eukaryotes and archaea. Nature 482, 501–506.
Bischoff, L., Berninghausen, O., and Beckmann, R. (2014). Molecular basis for the ribosome functioning as an L-tryptophan sensor. Cell Rep. 9, 469–475.
Blanchard, S.C., Gonzalez, R.L., Kim, H.D., Chu, S., and Puglisi, J.D. (2004). tRNA selection and kinetic proofreading in translation. Nat. Struct. Mol. Biol. 11, 1008–1014.
Brown, A., Long, F., Nicholls, R.A., Toots, J., Emsley, P., and Murshudov, G. (2015a). Tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions. Acta Crystallogr. D Biol. Crystallogr. 71, 136–153.
Brown, A., Shao, S., Murray, J., Hegde, R.S., and Ramakrishnan, V. (2015b). Structural basis for stop codon recognition in eukaryotes. Nature 524, 493–496.
Bruno, I.J., Cole, J.C., Kessler, M., Luo, J., Motherwell, W.D., Purkis, L.H., Smith, B.R., Taylor, R., Cooper, R.I., Harris, S.E., and Orpen, A.G. (2004). Retrieval of crystallographically-derived molecular geometry information. J. Chem. Inf. Comput. Sci. 44, 2133–2144.
Carelli, J.D., Sethofer, S.G., Smith, G.A., Miller, H.R., Simard, J.L., Merrick, W.C., Jain, R.K., Ross, N.T., and Taunton, J. (2015). Ternatin and improved synthetic variants kill cancer cells by targeting the elongation factor-1A ternary complex. eLife 4, e10222.
Chauvin, C., Salhi, S., Le Goff, C., Viranaicken, W., Diop, D., and Jean-Jean, O. (2005). Involvement of human release factors eRF3a and eRF3b in translation termination and regulation of the termination complex formation. Mol. Cell. Biol. 25, 5801–5811.
Chen, V.B., Arendall, W.B., 3rd, Headd, J.J., Keedy, D.A., Immormino, R.M., Kapral, G.J., Murray, L.W., Richardson, J.S., and Richardson, D.C. (2010). MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 66, 12–21.
Chen, S., McMullan, G., Faruqi, A.R., Murshudov, G.N., Short, J.M., Scheres, S.H.W., and Henderson, R. (2013). High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy. Ultramicroscopy 135, 24–35.
Cheng, Z., Saito, K., Pisarev, A.V., Wada, M., Pisareva, V.P., Pestova, T.V., Gajda, M., Round, A., Kong, C., Lim, M., et al. (2009). Structural insights into eRF3 and stop codon recognition by eRF1. Genes Dev. 23, 1106–1118.
Cochella, L., and Green, R. (2005). An active role for tRNA in decoding beyond codon:anticodon pairing. Science 308, 1178–1180.
Crepin, T., Shalak, V.F., Yaremchuk, A.D., Vlasenko, D.O., McCarthy, A., Negrutskii, B.S., Tukalo, M.A., and El’skaya, A.V. (2014). Mammalian translation elongation factor eEF1A2: X-ray structure and new features of GDP/GTP exchange mechanism in higher eukaryotes. Nucleic Acids Res. 42, 12939–12948.
Crews, C.M., Collins, J.L., Lane, W.S., Snapper, M.L., and Schreiber, S.L. (1994). GTP-dependent binding of the antiproliferative agent didemnin to elongation factor 1 alpha. J. Biol. Chem. 269, 15411–15414.
Demeshkina, N., Jenner, L., Westhof, E., Yusupov, M., and Yusupova, G. (2012). A new understanding of the decoding principle on the ribosome. Nature 484, 256–259.
Dever, T.E., and Green, R. (2012). The elongation, termination, and recycling phases of translation in eukaryotes. Cold Spring Harb. Perspect. Biol. 4, a013706.
Emsley, P., Lohkamp, B., Scott, W.G., and Cowtan, K. (2010). Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501.
Faxén, M., Walles-Granberg, A., and Isaksson, L.A. (1994). Antisuppression by a mutation in rpsM(S13) giving a shortened ribosomal protein S13. Biochim. Biophys. Acta 1218, 27–34.
Fischer, N., Neumann, P., Konevega, A.L., Bock, L.V., Ficner, R., Rodnina, M.V., and Stark, H. (2015). Structure of the E. coli ribosome-EF-Tu complex at <3 Å resolution by Cs-corrected cryo-EM. Nature 520, 567–570.
Geiger, T., Wehner, A., Schaab, C., Cox, J., and Mann, M. (2012). Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins. Mol. Cell Proteomics 11, M111.014050.
des Georges, A., Hashem, Y., Unbehaun, A., Grassucci, R.A., Taylor, D., Hellen, C.U.T., Pestova, T.V., and Frank, J. (2014). Structure of the mammalian ribosomal pre-termination complex associated with eRF1*eRF3*GDPNP. Nucleic Acids Res. 42, 3409–3418.
Guydosh, N.R., and Green, R. (2014). Dom34 rescues ribosomes in 3′ untranslated regions. Cell 156, 950–962.
Hossain, M.B., van der Helm, D., Antel, J., Sheldrick, G.M., Sanduja, S.K., and Weinheimer, A.J. (1988). Crystal and molecular structure of didemnin B, an antiviral and cytotoxic depsipeptide. Proc. Natl. Acad. Sci. USA 85, 4118–4122.
Jenner, L.B., Demeshkina, N., Yusupova, G., and Yusupov, M. (2010). Structural aspects of messenger RNA reading frame maintenance by the ribosome. Nat. Struct. Mol. Biol. 17, 555–560.
Klink, B.U., Goody, R.S., and Scheidig, A.J. (2006). A newly designed microspectrofluorometer for kinetic studies on protein crystals in combination with x-ray diffraction. Biophys. J. 91, 981–992.
Kobayashi, K., Kikuno, I., Kuroha, K., Saito, K., Ito, K., Ishitani, R., Inada, T., and Nureki, O. (2010). Structural basis for mRNA surveillance by archaeal Pelota and GTP-bound EF1α complex. Proc. Natl. Acad. Sci. USA 107, 17575–17579.
Kobayashi, K., Saito, K., Ishitani, R., Ito, K., and Nureki, O. (2012). Structural basis for translation termination by archaeal RF1 and GTP-bound EF1α complex. Nucleic Acids Res. 40, 9319–9328.
Kramer, E.B., Vallabhaneni, H., Mayer, L.M., and Farabaugh, P.J. (2010). A comprehensive analysis of translational missense errors in the yeast Saccharomyces cerevisiae. RNA 16, 1797–1808.
Krastel, P., Roggo, S., Schirle, M., Ross, N.T., Perruccio, F., Aspesi, P., Jr., Aust, T., Buntin, K., Estoppey, D., Liechty, B., et al. (2015). Nannocystin A: An Elongation Factor 1 inhibitor from Myxobacteria with differential anti-cancer properties. Angew. Chem. Int. Ed. Engl. 54, 10149–10154.
Kucukelbir, A., Sigworth, F.J., and Tagare, H.D. (2014). Quantifying the local resolution of cryo-EM density maps. Nat. Methods 11, 63–65.
Lee, H.H., Kim, Y.-S., Kim, K.H., Heo, I., Kim, S.K., Kim, O., Kim, H.K., Yoon, J.Y., Kim, H.S., Kim, D.J., et al. (2007). Structural and functional insights into Dom34, a key component of no-go mRNA decay. Mol. Cell 27, 938–950.
Li, L.H., Timmins, L.G., Wallace, T.L., Krueger, W.C., Prairie, M.D., and Im, W.B. (1984). Mechanism of action of didemnin B, a depsipeptide from the sea. Cancer Lett. 23, 279–288.
Li, X., Mooney, P., Zheng, S., Booth, C.R., Braunfeld, M.B., Gubbens, S., Agard, D.A., and Cheng, Y. (2013). Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM. Nat. Methods 10, 584–590.
Loenarz, C., Sekirnik, R., Thalhammer, A., Ge, W., Spivakovsky, E., Mackeen, M.M., McDonough, M.A., Cockman, M.E., Kessler, B.M., Ratcliffe, P.J., et al. (2014). Hydroxylation of the eukaryotic ribosomal decoding center affects translational accuracy. Proc. Natl. Acad. Sci. USA 111, 4019–4024.
Matheisl, S., Berninghausen, O., Becker, T., and Beckmann, R. (2015). Structure of a human translation termination complex. Nucleic Acids Res. 43, 8615–8626.
Cell 167, 1229–1240, November 17, 2016 1239

McCaughan, K.K., Brown, C.M., Dalphin, M.E., Berry, M.J., and Tate, W.P. (1995). Translational termination efficiency in mammals is influenced by the base following the stop codon. Proc. Natl. Acad. Sci. USA 92, 5431–5435.
Muhs, M., Hilal, T., Mielke, T., Skabkin, M.A., Sanbonmatsu, K.Y., Pestova, T.V., and Spahn, C.M.T. (2015). Cryo-EM of ribosomal 80S complexes with termination factors reveals the translocated cricket paralysis virus IRES. Mol. Cell 57, 422–432.
Murshudov, G.N., Skubák, P., Lebedev, A.A., Pannu, N.S., Steiner, R.A., Nicholls, R.A., Winn, M.D., Long, F., and Vagin, A.A. (2011). REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr. D Biol. Crystallogr. 67, 355–367.
Noeske, J., Wasserman, M.R., Terry, D.S., Altman, R.B., Blanchard, S.C., and Cate, J.H.D. (2015). High-resolution structure of the Escherichia coli ribosome. Nat. Struct. Mol. Biol. 22, 336–341.
Ogle, J.M., and Ramakrishnan, V. (2005). Structural insights into translational fidelity. Annu. Rev. Biochem. 74, 129–177.
Ogle, J.M., Brodersen, D.E., Clemons, W.M., Jr., Tarry, M.J., Carter, A.P., and Ramakrishnan, V. (2001). Recognition of cognate transfer RNA by the 30S ribosomal subunit. Science 292, 897–902.
Ogle, J.M., Murphy, F.V., Tarry, M.J., and Ramakrishnan, V. (2002). Selection of tRNA by the ribosome requires a transition from an open to a closed form. Cell 111, 721–732.
Pape, T., Wintermeyer, W., and Rodnina, M.V. (1998). Complete kinetic mechanism of elongation factor Tu-dependent binding of aminoacyl-tRNA to the A site of the E. coli ribosome. EMBO J. 17, 7490–7497.
Parmeggiani, A., Krab, I.M., Okamura, S., Nielsen, R.C., Nyborg, J., and Nissen, P. (2006). Structural basis of the action of pulvomycin and GE2270 A on elongation factor Tu. Biochemistry 45, 6846–6857.
Pasqualato, S., and Cherfils, J. (2005). Crystallographic evidence for substrate-assisted GTP hydrolysis by a small GTP binding protein. Structure 13, 533–540.
Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., and Ferrin, T.E. (2004). UCSF Chimera—A visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612.
Piepenburg, O., Pape, T., Pleiss, J.A., Wintermeyer, W., Uhlenbeck, O.C., and Rodnina, M.V. (2000). Intact aminoacyl-tRNA is required to trigger GTP hydrolysis by elongation factor Tu on the ribosome. Biochemistry 39, 1734–1738.
Pisareva, V.P., Skabkin, M.A., Hellen, C.U.T., Pestova, T.V., and Pisarev, A.V. (2011). Dissociation by Pelota, Hbs1 and ABCE1 of mammalian vacant 80S ribosomes and stalled elongation complexes. EMBO J. 30, 1804–1817.
Preis, A., Heuer, A., Barrio-Garcia, C., Hauser, A., Eyler, D.E., Berninghausen, O., Green, R., Becker, T., and Beckmann, R. (2014). Cryoelectron microscopic structures of eukaryotic translation termination complexes containing eRF1-eRF3 or eRF1-ABCE1. Cell Rep. 8, 59–65.
Rinehart, K.L., Jr., Gloer, J.B., Hughes, R.G., Jr., Renis, H.E., McGovren, J.P., Swynenberg, E.B., Stringfellow, D.A., Kuentzel, S.L., and Li, L.H. (1981). Didemnins: Antiviral and antitumor depsipeptides from a caribbean tunicate. Science 212, 933–935.
Rosenthal, P.B., and Henderson, R. (2003). Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J. Mol. Biol. 333, 721–745.
Saito, K., Kobayashi, K., Wada, M., Kikuno, I., Takusagawa, A., Mochizuki, M., Uchiumi, T., Ishitani, R., Nureki, O., and Ito, K. (2010). Omnipotent role of archaeal elongation factor 1 alpha (EF1α) in translational elongation and termination, and quality control of protein synthesis. Proc. Natl. Acad. Sci. USA 107, 19242–19247.
Scheres, S.H. (2014). Beam-induced motion correction for sub-megadalton cryo-EM particles. eLife 3, e03665.
Scheres, S.H.W. (2015). Semi-automated selection of cryo-EM particles in RELION-1.3. J. Struct. Biol. 189, 114–122.
Schmeing, T.M., Voorhees, R.M., Kelley, A.C., Gao, Y.-G.G., Murphy, F.V., 4th, Weir, J.R., and Ramakrishnan, V. (2009). The crystal structure of the ribosome bound to EF-Tu and aminoacyl-tRNA. Science 326, 688–694.
Shao, S., von der Malsburg, K., and Hegde, R.S. (2013). Listerin-dependent nascent protein ubiquitination relies on ribosome subunit dissociation. Mol. Cell 50, 637–648.
Sharma, A., Mariappan, M., Appathurai, S., and Hegde, R.S. (2010). In vitro dissection of protein translocation into the mammalian endoplasmic reticulum. Methods Mol. Biol. 619, 339–363.
Shoemaker, C.J., and Green, R. (2011). Kinetic analysis reveals the ordered coupling of translation termination and ribosome recycling in yeast. Proc. Natl. Acad. Sci. USA 108, E1392–E1398.
Shoemaker, C.J., and Green, R. (2012). Translation drives mRNA quality control. Nat. Struct. Mol. Biol. 19, 594–601.
Shoemaker, C.J., Eyler, D.E., and Green, R. (2010). Dom34:Hbs1 promotes subunit dissociation and peptidyl-tRNA drop-off to initiate no-go decay. Science 330, 369–372.
Tang, G., Peng, L., Baldwin, P.R., Mann, D.S., Jiang, W., Rees, I., and Ludtke, S.J.; EMAN2 (2007). EMAN2: An extensible image processing suite for electron microscopy. J. Struct. Biol. 157, 38–46.
Taylor, D., Unbehaun, A., Li, W., Das, S., Lei, J., Liao, H.Y., Grassucci, R.A., Pestova, T.V., and Frank, J. (2012). Cryo-EM structure of the mammalian eukaryotic release factor eRF1-eRF3-associated termination complex. Proc. Natl. Acad. Sci. USA 109, 18413–18418.
van den Elzen, A.M.G., Henri, J., Lazar, N., Gas, M.E., Durand, D., Lacroute, F., Nicaise, M., van Tilbeurgh, H., Séraphin, B., and Graille, M. (2010). Dissection of Dom34-Hbs1 reveals independent functions in two RNA quality control pathways. Nat. Struct. Mol. Biol. 17, 1446–1452.
Vetter, I.R., and Wittinghofer, A. (2001). The guanine nucleotide-binding switch in three dimensions. Science 294, 1299–1304.
Voorhees, R.M., and Ramakrishnan, V. (2013). Structural basis of the translational elongation cycle. Annu. Rev. Biochem. 82, 203–236.
Voorhees, R.M., Schmeing, T.M., Kelley, A.C., and Ramakrishnan, V. (2010). The mechanism for activation of GTP hydrolysis on the ribosome. Science 330, 835–838.
Wallrapp, C., Verrier, S.B., Zhouravleva, G., Philippe, H., Philippe, M., Gress, T.M., and Jean-Jean, O. (1998). The product of the mammalian orthologue of the Saccharomyces cerevisiae HBS1 gene is phylogenetically related to eukaryotic release factor 3 (eRF3) but does not carry eRF3-like activity. FEBS Lett. 440, 387–392.
Whitford, P.C., Geggier, P., Altman, R.B., Blanchard, S.C., Onuchic, J.N., and Sanbonmatsu, K.Y. (2010). Accommodation of aminoacyl-tRNA into the ribosome involves reversible excursions along multiple pathways. RNA 16, 1196–1204.
Zhang, K. (2016). Gctf: Real-time CTF determination and correction. J. Struct. Biol. 193, 1–12.

STAR★METHODS
KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
Rabbit monoclonal anti-eEF1A1 antibody | Abcam | Cat. #ab140632
Rabbit polyclonal anti-uL6 antibody | Santa Cruz | Cat. #102085; RRID: AB_2182219
Rabbit polyclonal anti-uS9 antibody | Santa Cruz | Cat. #sc-102087; RRID: AB_2269633
3XFlag-TEV WT Hbs1l (human) | Shao et al., 2013 | N/A
3XFlag-TEV H348A Hbs1l-DN (human) | Shao et al., 2013 | N/A
3XFlag-TEV eRF3a (human) | This study | N/A
WT eRF1 (human) | This study | N/A
eRF1(AAQ) (human) | Brown et al., 2015b | N/A
His-Pelota (human) | Shao et al., 2013 | N/A
E. coli Poly(A) polymerase | New England Biolabs | Cat. #M0276L
3X Flag peptide | Sigma-Aldrich | Cat. #F4799
Anti-Flag M2 affinity resin | Sigma-Aldrich | Cat. #A2220
Ni-NTA agarose | QIAGEN | Cat. #30210
Didemnin B | This study (Jack Taunton) | CAS #77327-05-0
Cycloheximide | Sigma-Aldrich | Cat. #C4859; CAS #66-81-9
Emetine | Calbiochem | Cat. #324693; CAS #316-42-7
Anisomycin | Sigma-Aldrich | Cat. #A9789; CAS #22862-76-6
EasyTag L-[35S]-Methionine | Perkin Elmer | Cat. #NEG709A005MC
CAP (diguanosine triphosphate cap) | New England Biolabs | Cat. #S1404L
RNasin | Promega | Cat. #N251
SP6 polymerase | New England Biolabs | Cat. #M0207L
Creatine kinase | Roche | Cat. #127566
Creatine phosphate | Roche | Cat. #621714
Amino acid kit | Sigma | Cat. #09416
Deposited Data
80S•empty A site density map | This study | EMDB: 4129
80S•aa-tRNA•eEF1A density map | This study | EMDB: 4130
80S•eRF1•eRF3 density map | This study | EMDB: 4131
80S•eRF1 density map | This study | EMDB: 4132
80S•eRF1•ABCE1 (combined) density map | This study | EMDB: 4133
80S•Pelota•Hbs1l (truncated mRNA) density map | This study | EMDB: 4134
80S•Pelota•Hbs1l (stop mRNA) density map | This study | EMDB: 4135
80S•Pelota•Hbs1l (polyA mRNA) density map | This study | EMDB: 4136
80S•Pelota•Hbs1l (combined) density map | This study | EMDB: 4137
80S•aa-tRNA•eEF1A atomic model | This study | PDB: 5LZS
80S•eRF1•eRF3 atomic model | This study | PDB: 5LZT
80S•eRF1 atomic model | This study | PDB: 5LZU
80S•eRF1•ABCE1 (combined) atomic model | This study | PDB: 5LZV
80S•Pelota•Hbs1l (truncated mRNA) atomic model | This study | PDB: 5LZW
80S•Pelota•Hbs1l (stop mRNA) atomic model | This study | PDB: 5LZX
80S•Pelota•Hbs1l (polyA mRNA) atomic model | This study | PDB: 5LZY
80S•Pelota•Hbs1l (combined) atomic model | This study | PDB: 5LZZ

HEK293T | ATCC | CRL-3216
E. coli BL21 (DE3) | Thermo Fisher | C600003
E. coli BL21 (DE3) pLysS | Thermo Fisher | C606003
Recombinant DNA
pcDNA 3XFlag-TEV WT Hbs1l | Shao et al., 2013 | N/A
pcDNA 3XFlag-TEV H348A Hbs1l | Shao et al., 2013 | N/A
pcDNA 3XFlag-TEV eRF3a | This study | N/A
pRSETA 6XHis-TEV eRF1 | This study | N/A
pRSETA 6XHis-TEV eRF1(AAQ) | Brown et al., 2015b | N/A
pSP64 3XFlag VHP Sec61-UGA(68) | Brown et al., 2015b | N/A
pSP64 3XFlag VHP Sec61-68 | Shao et al., 2013 | N/A
pSP64 3XFlag KRas | This study | N/A
Primer: SP64 5′ Fwd: TCATACACATACGATTTAGG | Sharma et al., 2010 | N/A
Primer: SP64 Rev: CAATACGCAAACCGCCTC | Sharma et al., 2010 | N/A
Primer: Val68 Rev: AACTTTGAGCCCAGGTGAATC | Shao et al., 2013 | N/A
EPU software | FEI | https://www.fei.com/software/epu/
Motioncorr | Li et al., 2013 | http://cryoem.ucsf.edu/software/driftcorr.html
Gctf v0.5 | Zhang, 2016 | http://www.mrc-lmb.cam.ac.uk/kzhang/Gctf/
RELION v1.4 | Scheres, 2015 | http://www2.mrc-lmb.cam.ac.uk/relion
ResMap v1.1.4 | Kucukelbir et al., 2014 | http://resmap.sourceforge.net/
Coot v0.8 | Emsley et al., 2010 | http://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/
REFMAC v5.8 | Murshudov et al., 2011 | https://www2.mrc-lmb.cam.ac.uk/groups/murshudov/content/refmac/refmac.html
MolProbity v4.3 | Chen et al., 2010 | http://molprobity.biochem.duke.edu/
Phenix.elbow dev-2499 | Adams et al., 2010 | http://www.phenix-online.org/documentation/reference/elbow.html
UCSF Chimera v1.10.2 | Pettersen et al., 2004 | https://www.cgl.ucsf.edu/chimera/
PyMOL v1.7 | Schrödinger, LLC | http://www.pymol.org
Other
RRL in vitro translation mix | Sharma et al., 2010 | N/A
TransIT 293 | Mirus | MIR 2705
Requests for reagents may be directed to Lead Contact Ramanujan S. Hegde (rhegde@mrc-lmb.cam.ac.uk).
Cell Lines
HEK293T cells used for protein expression were maintained in DMEM (high glucose, GlutaMAX, pyruvate) with 10% fetal bovine
serum.
METHOD DETAILS
Constructs
An SP64-based plasmid encoding 3X Flag-tagged Sec61β containing the autonomously folding villin headpiece (VHP) domain was
used to generate transcripts truncated after the Val68 codon of Sec61β (Shao et al., 2013). For termination complexes, the same
construct was modified to include the UGA stop codon after the Val68 codon (Brown et al., 2015b). To generate elongation
complexes, the open reading frame of KRas was cloned after a 3X Flag tag in an SP64-based plasmid using conventional techniques.
In vitro transcription reactions were performed using PCR products generated with primers that amplify from the SP6 promoter to
either the 3′ UTR of the SP64 vector (Sharma et al., 2010) or to directly after Val68 of Sec61β (Shao et al., 2013).
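The primers named in the Key Resources Table can be sanity-checked with a few lines of code. The sketch below reports GC content, a rough melting temperature, and the reverse complement. The Wallace rule (2°C per A/T, 4°C per G/C) and all helper names are assumptions introduced here for illustration; the text does not state how the primers were designed.

```python
# Primer sequences copied from the Key Resources Table.
PRIMERS = {
    "SP64 5' Fwd": "TCATACACATACGATTTAGG",
    "SP64 Rev": "CAATACGCAAACCGCCTC",
    "Val68 Rev": "AACTTTGAGCCCAGGTGAATC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C (Wallace rule, an assumption here)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), f"GC={gc_fraction(seq):.2f}", f"Tm~{wallace_tm(seq)}C")
    # The sense-strand read of the Val68 reverse primer ends on a GTT (Val) codon.
    print(reverse_complement(PRIMERS["Val68 Rev"]))
```

The reverse complement of the Val68 primer ends in GTT, which matches the statement that this PCR product stops directly after the Val68 codon.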
The open reading frame of wild-type eRF1 (Brown et al., 2015b) was inserted after an N-terminal 6X His tag and a TEV cleavage site,
and the Pelota open reading frame (Shao et al., 2013) was inserted before a C-terminal TEV cleavage site and 6X His tag in the
pRSETA vector using conventional techniques. Point mutations in eRF1 were generated using Phusion mutagenesis (Brown
et al., 2015b). The open reading frames of human Hbs1l (Shao et al., 2013) and eRF3a (Origene) were cloned after a 3X Flag tag
in a pcDNA3-based vector using conventional procedures.
Purification of recombinant proteins

Wild-type and mutant eRF1 (eRF1(AAQ)) were expressed in, and purified from, Escherichia coli BL21(DE3) cells (Brown et al., 2015b).
His-tagged Pelota was expressed and purified from Escherichia coli BL21(DE3) pLysS cells (Shao et al., 2013). Transformed cells
were induced at A600 = 0.4-0.6 with 0.2 mM IPTG for 2 hr at 37°C and lysed with a microfluidizer in lysis buffer (1X PBS, pH 7.5,
250 mM NaCl, 10 mM imidazole, 1 mM DTT) containing 1X protease inhibitor cocktail (Roche). Lysates were clarified by centrifugation
and the supernatant passed over a NiNTA column. After washing with 25 column volumes of lysis buffer, elutions were carried out
with 250 mM imidazole in lysis buffer. Peak fractions were pooled, dialyzed overnight against 50 mM HEPES, pH 7.4, 150 mM KOAc,
5 mM Mg(OAc)2, 10 mM imidazole, 10% glycerol, 1 mM DTT. TEV protease was included during dialysis of eRF1 proteins. TEV pro-
tease and cleaved His tag were removed by passage over a NiNTA column.
Flag-tagged recombinant eRF3a and Hbs1l were purified from HEK293T cells (Shao et al., 2013). eRF3a was used for structural
analysis as it is the primary release factor isoform used to terminate translation in mammalian cells, with eRF3b expression restricted
to the brain (Chauvin et al., 2005). Cells were transfected using TransIT (Mirus) according to the manufacturer’s instructions.
Cells were harvested after 3 days and lysed in 50 mM HEPES, pH 7.4, 100 mM KOAc, 5 mM Mg(OAc)2, 1% Triton X-100, 1 mM DTT, and
1X protease inhibitor cocktail (Roche). The post-nuclear supernatant lysate was incubated with anti-Flag (M2) agarose beads (Sigma) at 4°C
for 1-1.5 hr. The resin was washed with 6 mL lysis buffer, followed by 6 mL 50 mM HEPES, pH 7.4, 250 mM KOAc, 5 mM Mg(OAc)2,
1% Triton X-100, 1 mM DTT, followed by 6 mL elution buffer (50 mM HEPES, pH 7.4, 100 mM KOAc, 5 mM Mg(OAc)2, 1 mM DTT).
Elution was carried out with two sequential incubations of one column volume of 0.1 mg/mL 3X Flag peptide (Sigma) in elution buffer
for 25 min each at room temperature. The elutions were combined, flash frozen, and directly used for downstream assays.
In vitro transcription and translation reactions

Transcription reactions were conducted with 5-20 ng/µL purified PCR product in 40 mM HEPES pH 7.4, 6 mM MgCl2, 20 mM spermidine
(Sigma), 10 mM DTT, 0.5 mM ATP, 0.5 mM UTP, 0.5 mM CTP, 0.1 mM GTP (Roche), 0.5 mM CAP (NEB), 0.4-0.8 U/µL rRNasin
(Promega), and 0.4 U/µL SP6 polymerase (NEB) at 37°C for 60 min (Sharma et al., 2010). In vitro translation reactions were
performed in a homemade rabbit reticulocyte lysate (RRL) system containing 1/20 volume of transcription reaction, 0.5 µCi/µL
35S-methionine (Perkin Elmer EasyTag), nuclease-treated crude rabbit reticulocyte lysate (Green Hectares), 20 mM HEPES, 10 mM KOH,
40 µg/mL creatine kinase (Roche), 20 µg/mL pig liver tRNA, 12 mM creatine phosphate (Roche), 1 mM ATP (Roche), 1 mM GTP (Roche),
50 mM KOAc, 2 mM MgCl2, 1 mM glutathione, 0.3 mM spermidine, and 40 µM of each amino acid except for methionine (Sigma), at 32°C
for 25 min unless otherwise indicated (Shao et al., 2013; Sharma et al., 2010).
Sample Preparations
Elongation complex
A transcript encoding 3X Flag-tagged KRas was translated in vitro. A final concentration of 50 µM didemnin B was added after 7 min
to stall ribosome-nascent chain complexes (RNCs) at the stage of tRNA delivery by eEF1A, and the reaction was allowed to proceed to
25 min. The 4 mL translation reaction was directly incubated with 100 µL (packed volume) of anti-Flag M2 beads (Sigma) for 1 hr at 4°C
with gentle mixing. The beads were washed sequentially with 6 mL 50 mM HEPES, pH 7.4, 100 mM KOAc, 5 mM Mg(OAc)2, 0.1%
Triton X-100, 1 mM DTT; 6 mL 50 mM HEPES, pH 7.4, 250 mM KOAc, 5 mM Mg(OAc)2, 0.5% Triton X-100, 1 mM DTT; and 6 mL RNC
buffer (50 mM HEPES, pH 7.4, 100 mM KOAc, 5 mM Mg(OAc)2, 1 mM DTT). Two sequential elutions were carried out with 100 µL
0.1 mg/mL 3X Flag peptide (Sigma) in RNC buffer at room temperature for 25 min. The elutions were combined and centrifuged
at 100,000 rpm at 4°C for 40 min in a TLA120.2 rotor (Beckman Coulter) before resuspension of the ribosomal pellet in RNC buffer
containing 5 µM didemnin B. The resuspended RNCs were adjusted to 120 nM and directly frozen on grids for cryo-EM analysis.
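Centrifugation steps here are given in rpm for specific rotors. A minimal sketch of the standard conversion to relative centrifugal force (RCF = 1.118e-5 × radius in cm × rpm²) follows. The radius used below is an illustrative assumption, not a value from the text; the real figure should be taken from the rotor manual.

```python
def rcf_from_rpm(rpm: float, radius_mm: float) -> float:
    """Relative centrifugal force (x g) at a given radius from the rotor axis."""
    radius_cm = radius_mm / 10.0
    return 1.118e-5 * radius_cm * rpm ** 2


# ASSUMPTION: 38.9 mm is an illustrative maximum radius for a small
# fixed-angle ultracentrifuge rotor; it is not stated in the text.
ASSUMED_RMAX_MM = 38.9

if __name__ == "__main__":
    rcf = rcf_from_rpm(100_000, ASSUMED_RMAX_MM)
    print(f"100,000 rpm at r = {ASSUMED_RMAX_MM} mm is about {rcf:,.0f} x g")
```

Reporting the force alongside rpm makes the pelleting step easier to reproduce on a different rotor.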
Termination complexes
3X Flag-tagged Sec61β containing the autonomously folding villin headpiece domain with a UGA stop codon was translated in vitro
with 0.5 µM eRF1(AAQ) to trap termination complexes (Brown et al., 2015b). After 25 min, translation reactions were adjusted to
750 mM KOAc, 15 mM Mg(OAc)2 and spun on a 0.5 M sucrose cushion containing 50 mM HEPES, pH 7.4, 750 mM KOAc, 15 mM
Mg(OAc)2 at 100,000 rpm for 1 hr at 4°C in a TLA100.3 rotor (Beckman Coulter). The ribosome pellets from 4 mL translation reactions
were resuspended in RNC buffer and incubated with 100 µL (packed volume) of anti-Flag M2 beads (Sigma) for 1-1.5 hr at 4°C with
gentle mixing. The beads were washed sequentially with 6 mL 50 mM HEPES, pH 7.4, 100 mM KOAc, 5 mM Mg(OAc)2, 0.1% Triton
X-100, 1 mM DTT; 6 mL 50 mM HEPES, pH 7.4, 250 mM KOAc, 5 mM Mg(OAc)2, 0.5% Triton X-100, 1 mM DTT; and 6 mL RNC buffer.
Two sequential elutions were carried out with 100 µL 0.1 mg/mL 3X Flag peptide (Sigma) in RNC buffer at room temperature for
25 min. The elutions were combined and incubated with wild-type eRF1, wild-type eRF3, and 0.5 mM GMPPCP to generate the
eRF1-eRF3 complex, or with wild-type eRF1, wild-type eRF3, and 0.5 mM GTP to generate the accommodated eRF1 complex.
The reactions were centrifuged at 100,000 rpm at 4°C for 40 min in a TLA120.2 rotor (Beckman Coulter) before resuspension of
the ribosomal pellet in RNC buffer containing 600 nM of recombinant eRF1 and eRF3 with 1 mM GMPPCP or GTP. Complexes
containing eRF1(AAQ) and ABCE1 were prepared as previously described (Brown et al., 2015b).
Rescue complexes
3X Flag-tagged Sec61β containing the autonomously folding villin headpiece domain truncated after Val68 of Sec61β, without or with a
polyA tail, was translated in vitro as previously described (Shao et al., 2013). After 7 min, an excess of dominant negative Hbs1l was
added and the translation reaction was allowed to proceed to 25 min before being isolated through a high salt cushion and affinity purified
via the Flag tag as described above. The combined elutions in RNC buffer were incubated with Pelota, wild-type Hbs1l, and 0.5 mM
GMPPCP to assemble stall-recognition complexes. The reactions were then centrifuged at 100,000 rpm at 4°C for 40 min in a TLA120.2
rotor (Beckman Coulter) before resuspension of the ribosomal pellet in RNC buffer containing 600 nM of recombinant Pelota and WT
Hbs1l with 1 mM GMPPCP. The same strategy was used to assemble the rescue complex on a stop codon-containing substrate,
except that the substrate contained a UGA stop codon after Val68 and was translated in the presence of eRF1(AAQ).
Reference table for the biological composition of final complexes used for cryo-EM
Complex | Ribosome (120 nM) | mRNA substrate (see Figure S1) | Recombinant proteins (600 nM each) | Other
80S•aa-tRNA•eEF1A | Rabbit (RRL) | Long NC | None | 5 µM didemnin B
80S•eRF1•eRF3 | Rabbit (RRL) | NC-stop (UGA) | Human WT eRF1, Human WT eRF3 | 1 mM GMPPCP
80S•Pelota•Hbs1l | Rabbit (RRL) | Trunc. NC OR polyA NC OR NC-stop (UGA) | Human WT Pelota, Human WT Hbs1l | 1 mM GMPPCP
80S•eRF1 | Rabbit (RRL) | NC-stop (UGA) | Human WT eRF1, Human WT eRF3 | 1 mM GTP
80S•eRF1•ABCE1 (Brown et al., 2015b) | Rabbit (RRL) | NC-stop | eRF1(AAQ) | None
Cryo-EM grid formation

R2/2 cryo-EM grids (Quantifoil) were covered with continuous carbon (estimated to be 50 Å thick) and glow discharged to increase
hydrophilicity. The grids were transferred to a Vitrobot MKIII (FEI) with the chamber set at 4°C and 100% ambient humidity. Aliquots of
purified RNCs (3 µL, 120 nM concentration in 50 mM HEPES pH 7.4, 100 mM KOAc, 5 mM Mg(OAc)2, 1 mM DTT plus any additions
as detailed in the Reference table above) were applied to the grid and incubated for 30 s, before blotting for 3 s to remove excess
solution, and vitrified in liquid ethane.
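To put the grid loading in perspective, the sketch below converts an aliquot volume and concentration into picomoles and particle count, and computes the molar excess of recombinant factor over ribosome. It assumes the applied aliquot is 3 microliters at 120 nM and that factors are present at 600 nM, as in the reference table; the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def applied_amount(volume_ul: float, conc_nm: float) -> tuple[float, float]:
    """Return (picomoles, number of particles) in an aliquot."""
    moles = (volume_ul * 1e-6) * (conc_nm * 1e-9)  # litres x mol/litre
    return moles * 1e12, moles * AVOGADRO


def molar_excess(factor_nm: float, ribosome_nm: float) -> float:
    """Fold excess of a recombinant factor over ribosomes."""
    return factor_nm / ribosome_nm


if __name__ == "__main__":
    pmol, particles = applied_amount(3.0, 120.0)
    print(f"{pmol:.2f} pmol, about {particles:.2e} ribosomes before blotting")
    print(f"factor excess: {molar_excess(600.0, 120.0):.0f}-fold")
```

Most of the applied material is removed by blotting, so the particle count is an upper bound on what remains on the grid.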
Miscellaneous biochemistry
SDS-PAGE was performed with 10% or 12% Tris-tricine polyacrylamide gels run at 100 V for 85-90 min. For autoradiography and direct
visualization of protein bands, gels were fixed and stained with Coomassie R250, destained and directly imaged, or dried and exposed on
MR film (Kodak Carestream BioMax). For immunoblotting, gels were transferred to 0.2 µm nitrocellulose membrane (Bio-Rad) in a wet
transfer system at 100 V for 50 min. Blots were blocked and incubated with primary and secondary antibodies in 5% milk in PBS +
0.1% Tween. Antibodies were used at the following dilutions: 1:4000 αHbs1l, 1:4000 αABCE1, 1:1000 αeRF1, 1:1000 αeEF1A,
1:100 αuL6, and 1:100 αuS9. Secondary antibodies were used at 1:2500 or 1:5000.
Functional assays were conducted with 35S-methionine-labeled RNCs isolated under high salt conditions and affinity purified via
the Flag tag exactly as described for cryo-EM grid preparation. The radiolabeled RNCs were then incubated with the recombinant
proteins, 1 mM puromycin, or 0.5 mM GTP or GMPPCP at 32°C for 15 min before analysis by SDS-PAGE and autoradiography.
To sequence 28S rRNA, ribosomes were isolated from crude RRL from two rabbits under high salt conditions, and the RNA ex-
tracted using the RNeasy system (QIAGEN). Electrophoresis on 5% TBE-acrylamide gels and toluidine blue staining verified high re-
covery of 28S and 18S rRNA bands. The RNA sample was reverse transcribed with ArrayScript reverse transcriptase (Thermo Fisher)
according to the manufacturer’s instructions and used for PCR reactions to amplify and sequence portions of the 28S sequence with
Sanger sequencing. This revealed some rabbit-to-rabbit variability, and allowed for certain portions (but not all) of the 28S rRNA
sequence to be determined with high confidence based on alignments with highly conserved regions. These regions were incorpo-
rated into the final model (see below).

Data collection
Details of the data collection for each complex are presented in Table S1. All micrographs were taken using quasi-automated data
collection (EPU software, FEI) on a Titan Krios microscope equipped with an XFEG electron source using 300 kV acceleration voltage.
Images were recorded on a Falcon II direct electron detector (FEI). For the termination and rescue complexes, a dose rate of 30
electrons per Å² per second was used at a calibrated magnification of 104,478×, resulting in a pixel size of 1.34 Å. Movie frames
were collected at a rate of 16 s⁻¹, with total exposures of 1.0-1.1 s. For the elongation complex and the comparative complex
with an empty A site (Figure S2), a higher magnification (134,615×, resulting in a pixel size of 1.04 Å) and a higher dose rate
(40 electrons per Å² per second) were used. In total, 13 independent data collections were used to collect 17,681 micrographs,
from which nine structures were solved at resolutions ranging from 3.1 to 4.0 Å.
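The relation between calibrated magnification, pixel size, and dose can be checked with a short calculation. The sketch below assumes a physical detector pixel of 14 µm, which is not stated in the text but reproduces both reported pixel sizes; the function names are hypothetical.

```python
# ASSUMPTION: physical detector pixel size in micrometres (not given in the text).
DETECTOR_PIXEL_UM = 14.0


def pixel_size_angstrom(magnification: float,
                        detector_pixel_um: float = DETECTOR_PIXEL_UM) -> float:
    """Specimen-level pixel size in Å: detector pixel divided by magnification."""
    return detector_pixel_um * 1e4 / magnification  # 1 µm = 1e4 Å


def exposure_summary(dose_rate: float, frame_rate_hz: float,
                     exposure_s: float) -> tuple[float, float, int]:
    """Return (total dose, dose per frame, frame count); doses in e/Å²."""
    total = dose_rate * exposure_s
    per_frame = dose_rate / frame_rate_hz
    frames = round(frame_rate_hz * exposure_s)
    return total, per_frame, frames


if __name__ == "__main__":
    for mag in (104_478, 134_615):
        print(mag, f"{pixel_size_angstrom(mag):.2f} Å/pixel")
    print(exposure_summary(dose_rate=30.0, frame_rate_hz=16.0, exposure_s=1.0))
```

With these inputs a 1.0 s exposure at 30 electrons per Å² per second corresponds to 16 frames of roughly 1.9 electrons per Å² each.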
Image Processing
Details for the processing of each complex are presented in Table S1. Movie frames were aligned using whole-image motion
correction (Li et al., 2013) to reduce beam-induced blurring of the images. Micrographs that displayed evidence of astigmatism, charging,
contamination, or poor contrast were excluded. Parameters of the contrast transfer function for each motion-corrected micrograph
were obtained using Gctf (Zhang, 2016). Ribosome particles were selected from the images using the interactive semi-automatic
swarm tool in the e2boxer.py program of EMAN2 (Tang et al., 2007) or with semi-automated particle picking implemented in RELION
1.4 (Scheres, 2015). Reference-free two-dimensional class averaging was used to discard non-ribosomal particles, with those picked
using RELION subjected to an additional sorting step (Scheres, 2015).
Particles retained after two-dimensional classification underwent an initial three-dimensional refinement using a 30 Å low-pass
filtered cryo-EM reconstruction of a rabbit ribosome (EMDB 3039) as an initial model. After refinement, statistical particle-based
movie correction was performed in RELION 1.4 (Scheres, 2015) that included a resolution and dose-dependent model for the radi-
ation damage, in which each frame is B-factor weighted as estimated from single-frame reconstructions (Scheres, 2014).
The resulting ‘shiny’ particles were then subjected to three-dimensional classification to separate different compositions and
conformations of the ribosome complexes and isolate particles with high occupancy of the desired factors. This step was omitted for the
80S•aa-tRNA•eEF1A complex. Particles retained after three-dimensional classification were subjected to focused classification with
signal subtraction (FCwSS) (Bai et al., 2015) to further isolate particles containing the desired factor. After FCwSS, an additional round
of 3D classification and refinement was used to obtain the final maps.
Reported resolutions are based on the Fourier shell correlation (FSC) 0.143 criterion (Rosenthal and Henderson, 2003). High-res-
olution noise substitution was used to correct for the effects of a soft mask on FSC curves (Chen et al., 2013). Before visualization,
density maps were corrected for the modulation transfer function of the Falcon II detector and then sharpened by applying a negative
B-factor that was estimated using automated procedures (Rosenthal and Henderson, 2003). Local resolution was quantified using
ResMap (Kucukelbir et al., 2014).
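The resolution estimate described above can be illustrated on a tabulated FSC curve. The sketch below applies the noise-substitution correction in the form (FSC_t - FSC_n) / (1 - FSC_n) to shells beyond the phase-randomization cutoff, then finds where the curve first drops below 0.143 by linear interpolation. The function names, the `first_randomized_shell` parameter, and the toy numbers are assumptions for illustration; this is not the RELION implementation.

```python
from typing import Optional, Sequence


def corrected_fsc(fsc_masked: Sequence[float], fsc_randomized: Sequence[float],
                  first_randomized_shell: int) -> list[float]:
    """Correct a masked FSC curve for mask-induced correlation.

    Shells below first_randomized_shell are returned unchanged; beyond it,
    each shell becomes (FSC_t - FSC_n) / (1 - FSC_n).
    """
    out = []
    for i, (t, n) in enumerate(zip(fsc_masked, fsc_randomized)):
        if i < first_randomized_shell or n >= 1.0:
            out.append(t)
        else:
            out.append((t - n) / (1.0 - n))
    return out


def resolution_at_threshold(spatial_freqs: Sequence[float], fsc: Sequence[float],
                            threshold: float = 0.143) -> Optional[float]:
    """Resolution in Å where FSC first falls below threshold, or None."""
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            f0, f1 = spatial_freqs[i - 1], spatial_freqs[i]
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            return 1.0 / (f0 + frac * (f1 - f0))
    return None


if __name__ == "__main__":
    freqs = [0.1, 0.2, 0.3, 0.4]        # spatial frequency, 1/Å (toy values)
    masked = [1.0, 0.9, 0.243, 0.043]   # toy FSC curve, not data from the paper
    print(resolution_at_threshold(freqs, masked))
```

Linear interpolation between shells is a simplification; reporting the first shell below the threshold is an equally common convention.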
Model building
Ribosome
Both subunits of the mammalian ribosome (PDB accession code 3JAH) (Brown et al., 2015b) were individually docked into the map
with Chimera (Pettersen et al., 2004). The atomic models of the ribosomal proteins and 18S rRNA were modified in Coot v0.8 to agree
with the rabbit sequences and optimized for fit to density using rigid body fitting followed by real-space refinement in Coot (Brown
et al., 2015a; Emsley et al., 2010). Where possible, the atomic model of 28S rRNA was modified to reflect the rabbit sequence
(OryCun2.0 GCA_000003625.1). However, since this sequence had insufficient coverage, we also attempted to sequence the 28S
rRNA directly from ribosomes extracted from RRL (see above for experimental procedures). The model was then modified to agree
with regions with high sequencing confidence (bases 725-965, 1271-2888, 3584-3867) or, in well-conserved areas, to better match
the complete 28S rRNA sequences from human (NCBI accession NR_003287.2) and rat (a closely related rodent; NCBI accession
NR_046246.1). Human numbering is used for the rRNA (NCBI accession NR_003287.2 for 28S, and X03205.1 for 18S). See Table
S3 for the numbering and sequence in the ribosome model aligned with the human reference.
Elongation complex
Because our structure represents a mixture of species, the starting models for the P- and E-site tRNAs and the mRNA were
taken from our previous structure (PDB accession code 3JAH) (Brown et al., 2015b). P-site tRNAVal was also used as an initial
model for the A-site tRNA. The fit of the tRNAs and mRNA to the density was optimized using rigid body fitting and real space
refinement. The crystal structure of yeast eEF1A (PDB accession code 1F60) (Andersen et al., 2000) was docked into density at
the GAC. The switch I loop region was taken from the structure of EF-Tu bound to GMPPNP (PDB accession code 2C78) (Par-
meggiani et al., 2006). The model of eEF1A was modified to the rabbit sequence (UniProt ID: P68105) and manually fit to
density.
The small molecule crystal structure of didemnin B (Hossain et al., 1988) was docked into empty density near eEF1A and adjusted
in Coot using real space refinement with chemical restraints generated using Phenix.elbow (Adams et al., 2010). The geometry of
the didemnin B model was analyzed using Mogul, a molecular-geometry library derived from the Cambridge Structural Database
(CSD) (Bruno et al., 2004). Some of the restraints generated from Phenix.elbow were adjusted to match the median angles and dis-
tances identified by Mogul. These modified restraints were then applied during refinement in Coot and REFMAC.

Distinguishing the nucleotide status of eEF1A
As the elongation complex was isolated directly from lysate, the nucleotide status of eEF1A is undefined. However, the observed
density and the nucleotide coordination environment are consistent with the cryo-EM structure of EF-Tu•GDP (Fischer et al., 2015)
and the high-resolution crystal structure of HRas•GDP (Klink et al., 2006). Thus, the bound nucleotide can be confidently assigned
as a GDP with a Mg ion coordinated to the β-phosphate (Figure S6A). The presence of GDP is consistent with data showing that
didemnin B does not inhibit the GTPase activity of eEF1A (Crews et al., 1994), and with didemnin B sharing a mechanism, as well as a
binding site, with kirromycin, which traps the GDP-bound state of EF-Tu (Schmeing et al., 2009).
The presence of GTP can be excluded, as the density is insufficient to account for a γ-phosphate and a coordinated Mg ion, which
bind together in translational GTPases (Voorhees et al., 2010) (Figure S6). We can also exclude the possibility that didemnin B has
trapped the state after GTP hydrolysis but prior to release of inorganic phosphate (Pi), as related GTPases in the GDP+Pi state also
coordinate a Mg ion (Pasqualato and Cherfils, 2005) (Figure S6A).
Termination complex
The model for human eRF1 (UniProt: P62495) was taken from our previous structure of the 80S•eRF1•ABCE1 complex bound to a
UGA stop codon (PDB accession code 3JAI) (Brown et al., 2015b) and fitted to the 80S•eRF1•eRF3 and 80S•eRF1 maps. The
individual domains of eRF1 were moved manually and rigid-body fitted in Coot to fit the pre-accommodated state in the
80S•eRF1•eRF3 map, while only minor modifications were necessary to model accommodated eRF1 in the 80S•eRF1 map. The
AAQ sequence was mutated to GGQ in the 80S•eRF1•eRF3 model to reflect that this complex was reconstructed using wild-type
eRF1 and not with catalytically inactive eRF1(AAQ).
A model for eRF3a (UniProt: P15170) was constructed using the crystal structure of the human eRF1•eRF3 complex (PDB accession
code 3E1Y) (Cheng et al., 2009) and the moderate-resolution cryo-EM structure of the mammalian 80S•eRF1•eRF3 termination
complex (PDB accession code 3J5Y) (des Georges et al., 2014) as templates. GMPPCP was modeled into the active site of the
eRF3 G domain.
Real space refinement was performed to optimize the fit of all eRF1 and eRF3 sidechains, as well as changes to the ribosome at the
binding interfaces and decoding center.
Rescue complex
Models for human Pelota (UniProt: Q9BRX2) and Hbs1l (UniProt: Q9Y450) were built using the deposited models for the
moderate-resolution reconstruction of Dom34•Hbs1 (the yeast homologs of Pelota•Hbs1l) bound to a ribosome stalled by a synthetic stem loop
(PDB accession code 3IZQ) (Becker et al., 2011) as a template. In this reconstruction, additional density was observed at the entrance
to the mRNA channel that was assigned to the N-terminal domain of Hbs1. This interaction is absent in our reconstructions, which
may reflect differences in the mRNA substrates used to program stalling, or between the N-terminal domains, which in mammals and
yeast share little sequence identity. Therefore, only the G domain and domains 2 and 3 of Hbs1l were modeled. GMPPCP was
modeled into the active site of the Hbs1l G domain.
In the 80S•Pelota•Hbs1l complex formed with a truncated mRNA, additional density at the ‘latch’ between h18 in the body and h34
and uS3 in the neck of the SSU appears to correspond to a bound GMPPCP molecule that is probably an artifact of reconstituting the
complex in the presence of 0.5 mM GMPPCP.
Real space refinement was performed to optimize the fit of all Pelota and Hbs1l sidechains, as well as changes to the ribosome at
the binding interfaces and decoding center.
Model refinement and validation

Models were refined with REFMAC v5.8 utilizing external restraints generated by ProSMART and LIBG (Brown et al., 2015a). Model
statistics were obtained using MolProbity (Chen et al., 2010). Cross-validation was calculated as previously described (Amunts et al.,
2014; Brown et al., 2015a).
Molecular graphics
All figures were generated with Chimera (Pettersen et al., 2004) or PyMOL (Schrödinger, LLC).
All reported resolutions are based on the Fourier shell correlation (FSC) 0.143 criterion (Rosenthal and Henderson, 2003).
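As an illustration of how a nominal resolution follows from the FSC = 0.143 criterion, the minimal Python sketch below interpolates the spatial frequency at which an FSC curve first falls below a chosen threshold. The function name, the linear interpolation between shells, and the example curve are assumptions made for illustration only; they are not the software or data used for the reported maps.

```python
def resolution_at_threshold(spatial_freq, fsc, threshold=0.143):
    """Resolution (in Å) at which an FSC curve first drops below `threshold`.

    spatial_freq: strictly increasing, positive spatial frequencies (1/Å).
    fsc: FSC value for each frequency shell.
    Returns None if the curve never drops below the threshold.
    """
    if len(spatial_freq) != len(fsc):
        raise ValueError("spatial_freq and fsc must have the same length")
    for i, value in enumerate(fsc):
        if value < threshold:
            if i == 0:
                # Already below threshold in the first shell: nothing to bracket.
                return 1.0 / spatial_freq[0]
            f0, f1 = spatial_freq[i - 1], spatial_freq[i]
            c0, c1 = fsc[i - 1], value
            # Linear interpolation between the two bracketing shells.
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None


# Hypothetical FSC curve (not measured data).
freq = [0.1, 0.2, 0.3, 0.4]
curve = [1.0, 0.8, 0.2, 0.0]
print(resolution_at_threshold(freq, curve, threshold=0.5))
print(resolution_at_threshold(freq, curve, threshold=0.143))
```

Because the 0.143 threshold is crossed at a higher spatial frequency than the 0.5 threshold, the same curve yields a numerically smaller (better) resolution under the 0.143 criterion.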
Data Resources
Nine maps have been deposited with the EMDB with accession codes EMDB: 4129, EMDB: 4130, EMDB: 4131, EMDB: 4132, EMDB:
4133, EMDB: 4134, EMDB: 4135, EMDB: 4136, and EMDB: 4137. Atomic coordinates have been deposited with the Protein Data
Bank under accession codes PDB: 5LZS, PDB: 5LZT, PDB: 5LZU, PDB: 5LZV, PDB: 5LZW, PDB: 5LZX, PDB: 5LZY, and PDB: 5LZZ.

Figure S1. Isolation of Translational Decoding Complexes for Cryo-EM, Related to Figure 1
(A) Schematic of the mRNA constructs used for in vitro translation and isolation of ribosome-nascent chain complexes (RNCs). The start codon (AUG), stop codon
(UAG or UGA), and coding regions for the 3X Flag tag (green), the autonomously-folding villin headpiece (VHP) domain (blue), the cytosolic portion of Sec61β
(orange), and KRas (purple) are indicated.
(B) Experimental strategies for isolating the indicated RNCs from in vitro translation (IVT) reactions.
(C) SDS-PAGE and Coomassie staining of isolated RNCs representing the elongation complex (80S•aa-tRNA•eEF1A); pre-accommodated (80S•eRF1•eRF3) or
accommodated (80S•eRF1) termination complexes; and rescue complex (80S•Pelota•Hbs1l) reconstituted with a truncated mRNA (see panel A). Copurified,
exogenously-added, and ribosomal (ribo. prot.) proteins are indicated.
(D) The long NC construct (see panel A) was translated in vitro in rabbit reticulocyte lysate (RRL) with the indicated translational inhibitors added at the following
concentrations: 50 μg/mL cycloheximide (CHX), 10 μM anisomycin, 200 μM emetine, and 50 μM didemnin B. The translation reactions were affinity purified via the
3X Flag tag on the nascent chain. The elutions and inputs were analyzed by SDS-PAGE and immunoblotting for the indicated proteins, revealing that didemnin B
specifically traps eEF1A on the isolated RNCs.

(E) The NC-stop construct was translated in vitro in RRL in the presence of 35S-methionine and mutant eRF1(AAQ) to trap RNCs with the UGA stop codon in the A
site. The RNCs were isolated under high salt conditions and subjected to affinity purification via the 3X Flag tag on the nascent chain. The isolated RNCs were
incubated with 1 mM puromycin or recombinant wild-type eRF1, wild-type eRF3, and 0.5 mM GMPPCP or GTP as indicated, and then directly analyzed by SDS-
PAGE and autoradiography. The bands corresponding to ribosome-associated nascent chain-tRNA (NC-tRNA) and released nascent chains (NC) are indicated.
This demonstrates the functionality of the components of the reconstituted termination complex in mediating the release of the nascent chain, which is inhibited
by the nonhydrolyzable GTP analog, GMPPCP.
[Figure S2 panels. Top row: 80S (empty A site); 80S•aa-tRNA•eEF1A; 80S•eRF1•eRF3; 80S•eRF1; 80S•eRF1(AAQ)•ABCE1, with nominal
resolutions at FSC = 0.143 printed in that order as 3.1, 3.4, 3.7, 3.8, and 3.4 Å. Bottom row: 80S•Pelota•Hbs1l with truncated mRNA,
with UGA stop codon, with polyA mRNA, and combined, printed in that order as 3.5, 3.7, 4.0, and 3.5 Å. Plots show FSC versus
resolution (Å); maps are colored by local resolution (Å).]
Figure S2. Quality of Cryo-EM Maps and Models, Related to Figure 1

The EM map for each isolated RNC complex is shown colored according to individual factors (top row) or by local resolution (second row). Below each local
resolution map are Fourier shell correlation (FSC) curves calculated between independent half maps (black), and calculated between the refined model and final
map (purple), and with the self (blue) and cross-validated (magenta) correlations for each complex. The nominal resolution estimated from the map-to-map
correlation at FSC = 0.143 is reported and agrees well with the model-to-map correlation at FSC = 0.5. The 80S•eRF1(AAQ)•ABCE1 map was generated by
combining all of the datasets from (Brown et al., 2015b) to analyze eRF1 conformational changes during the termination pathway (see Figures 7 and S7).
Figure S3. Secondary Structure Topology Diagrams of Translational GTPases and Decoding Proteins, Related to Figure 1
(A) Topology diagram of the homologous regions of translational GTPases (e.g., eEF1A, eRF3, and Hbs1l), showing the G domain (red) and the two β-barrel
domains (orange and yellow). The motifs important for GTP hydrolysis (Switch 1, Switch 2 (Sw2), and P loop) are highlighted.
(B) Topology diagrams of eRF1 and Pelota, showing the divergent N domains and homologous M and C domains. The locations of the loop harboring the catalytic
GGQ motif (blue) and the minidomain (mini) in eRF1 are indicated.
Figure S4. Decoding Center Interactions, Related to Figure 1
(A) Decoding center interactions of A/T aa-tRNA (purple) in the elongation complex, demonstrating how Gln61 and cis-Pro62 on a loop of uS12 (orange) can
interact, via a water molecule or metal ion, with the mRNA (slate) backbone. Decoding nucleotides of 18S rRNA (yellow) are indicated.
(B–D) EM map density and model showing that the interactions between eRF1 (purple) and stop codon mRNA (slate) remain unchanged in the (B)
pre-accommodated (contoured at 8σ), (C) accommodated (contoured at 7σ), and (D) ABCE1-bound complexes (contoured at 8σ).
(E and F) Decoding center interactions of (E) eRF1 (purple) in the termination complex and of (F) Pelota (pink), viewed as in panel (A).
Figure S5. Details of Pre-accommodation Architectures, Related to Figure 5
(A) The acceptor stem of aa-tRNA (purple) binds in a cleft between the G domain (red) and domains 2 (orange) and 3 (yellow) of eEF1A.
(B) Surface model of eEF1A colored by electrostatic potential (same view as panel A).
(C) EM map density contoured at 7σ and models of the interactions between the 3′ end of aa-tRNA (purple) and domain 2 (orange) and G domain (red) of eEF1A.
(D and E) The M domains of (D) eRF1 and (E) Pelota bind their respective GTPase partners in a cleft analogous to where aa-tRNA binds eEF1A. Structures are
aligned as in panel (A).
(F and G) Surface model colored by electrostatic potential of (F) eRF3, and (G) Hbs1l.
(H and I) Superposition of (H) the crystal structure of aRF1•aEF1A•GTP (gray) on ribosome-bound eRF1•eRF3•GMPPCP or of (I) the crystal structure of
aPelota•aEF1A•GTP (gray) on ribosome-bound Pelota•Hbs1l•GMPPCP via domains 2 and 3 of the GTPase. Upon ribosome binding, the N domain of the
decoding factor is reoriented, while the M domain forms additional contacts with the G domain of the GTPase.

(J and K) Interactions between the M domains of (J) eRF1 or of (K) Pelota with the G domain of the respective GTPase. The β7-α5 loop, which harbors the GGQ
motif of eRF1, makes interactions with the Switch 1 (Sw1, red) motif, and additional interactions are formed with the Switch 2 (Sw2, teal) motif harboring the
catalytic histidine.
(L) The backbone and CCA end of A/T aa-tRNA also interacts with catalytically important motifs of the G domain of eEF1A.
Figure S6. GTPase Active Sites, Related to Figure 5
(A) EM map density and model for GDP and GTP analogs in the indicated structures. eEF1A-bound GDP density is contoured at 7σ; Hbs1l-bound GMPPCP
density is contoured at 6σ. Coordinating residues (pink) and magnesium ions (green) are indicated.
(B) Interactions of the sarcin-ricin loop (SRL) with the catalytic histidine (teal) of the indicated GTPase. The residues of the hydrophobic gate are indicated in
yellow.
[Figure S7 panels: (A) pre-accommodated eRF1; (B) accommodated eRF1. Labels: L7/L12 stalk, uL11, eS31, 18S rRNA G1507, Trp377, Arg330, and eRF1 domains N, M, and C.]
Figure S7. Conformational Changes during Decoding Factor Accommodation, Related to Figure 7
(A) The minidomain of pre-accommodated eRF1 (colored by domains) forms an interaction (circled) with eS31 (yellow) that is stabilized by G1507 of 18S rRNA.
(B) Upon accommodation, the M (purple) and C (pale blue) domains of eRF1, and the L7/L12 rRNA stalk base (blue) supporting uL11 (light cyan) undergo
conformational changes to establish new interactions (circled) between the eRF1 minidomain with uL11 and the L7/L12 stalk base. Arrows indicate the direction
and magnitude of movement of the minidomain and uL11 from the pre-accommodated state.
Article
EGFR Dynamics Change during Activation in Native Membranes as Revealed by NMR
Mohammed Kaplan,
Siddarth Narasimhan, Cecilia de Heus, ...,
Simone Lemeer,
Paul M.P. van Bergen en Henegouwen,
Marc Baldus
Correspondence
p.vanbergen@uu.nl (P.M.P.v.B.e.H.),
m.baldus@uu.nl (M.B.)
In Brief
An NMR approach shows how receptors
move in native membranes at high
resolution, revealing that, while the
intracellular domain of EGFR is rigid, the
extracellular domain is highly dynamic
until bound by ligand.
Highlights
• NMR can be applied to study activation of full-length EGFR in native membranes
• Solid-state NMR provides insight into structure and mobility of full-length EGFR
• Data identify conformational selection as a key factor for receptor activation
Kaplan et al., 2016, Cell 167, 1241–1251

Mohammed Kaplan,1,5 Siddarth Narasimhan,1 Cecilia de Heus,2,6 Deni Mance,1 Sander van Doorn,3 Klaartje Houben,1
Dušan Popov-Čeleketić,2 Reinier Damman,1 Eugene A. Katrukha,2 Purvi Jain,2 Willie J.C. Geerts,4 Albert J.R. Heck,3
Gert E. Folkers,1 Lukas C. Kapitein,2 Simone Lemeer,3 Paul M.P. van Bergen en Henegouwen,2,* and Marc Baldus1,7,*
1NMR Spectroscopy, Bijvoet Center for Biomolecular Research
2Cell Biology, Department of Biology, Faculty of Science
3Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical
Sciences
4Biomolecular Imaging, Bijvoet Center for Biomolecular Research
Utrecht University, 3584 CH Utrecht, the Netherlands

5Present address: Physical Biology Center for Ultrafast Science and Technology, Arthur Amos Noyes Laboratory of Chemical Physics,
California Institute of Technology, Pasadena, CA 91125, USA

6Present address: University Medical Center Utrecht, Cell Biology, Heidelberglaan 100, 3584CX Utrecht, the Netherlands
7Lead Contact
*Correspondence: p.vanbergen@uu.nl (P.M.P.v.B.e.H.), m.baldus@uu.nl (M.B.)

SUMMARY

The epidermal growth factor receptor (EGFR) represents one of the most common target proteins in anti-cancer therapy. To
directly examine the structural and dynamical properties of EGFR activation by the epidermal growth factor (EGF) in native
membranes, we have developed a solid-state nuclear magnetic resonance (ssNMR)-based approach supported by dynamic nuclear
polarization (DNP). In contrast to previous crystallographic results, our experiments show that the ligand-free state of the
extracellular domain (ECD) is highly dynamic, while the intracellular kinase domain (KD) is rigid. Ligand binding restricts the
overall and local motion of EGFR domains, including the ECD and the C-terminal region. We propose that the reduction in
conformational entropy of the ECD by ligand binding favors the cooperative binding required for receptor dimerization, causing
allosteric activation of the intracellular tyrosine kinase.

INTRODUCTION

The epidermal growth factor receptor (EGFR, Her1, or ErbB1) is one of the four members of the Her (ErbB) family of receptor
tyrosine kinases that serves as a cell-surface receptor for peptide ligands and plays a crucial role in regulating cell
proliferation, migration, and differentiation (Arteaga and Engelman, 2014; Ferguson et al., 2003; Yarden, 2001). ErbB proteins
are linked to the development of different tumors (e.g., colorectal carcinoma, head and neck cancer, and gliomas) and represent
a successful target for anti-cancer therapies using antibodies or small-molecule inhibitors (Tebbutt et al., 2013).
EGFR consists of an extracellular domain (ECD) formed by domains I to IV, a single-pass transmembrane (TM) domain, a
juxtamembrane (JM) region, a tyrosine kinase domain (KD), and a C-terminal region (CT), in which multiple potential tyrosine
kinase substrate residues are located. Detailed structural information has been obtained for various EGFR segments, such as
the ECD (Ferguson et al., 2003; Garrett et al., 2002; Ogiso et al., 2002) and the KD (Jura et al., 2009; Stamos et al., 2002),
or for constructs containing the TM and JM domains in membrane mimetics (Endres et al., 2013; Lu et al., 2010; Stamos
et al., 2002). Crystal structures have furthermore suggested that the non-liganded ECD can adopt a closed conformation
that is stabilized by an intramolecular tether between domains II and IV (Ferguson et al., 2003; Ogiso et al., 2002). On the
other hand, the liganded ECD was found in an open, extended conformation leading to the intracellular active KD. Significant
progress has been made in deciphering which interaction sites are involved in the stabilization of the EGFR dimer. They include
the dimerization loops in domain II (Dawson et al., 2005), the GxxxG or GG4 motifs in the TM (Lu et al., 2010), the antiparallel
coiled coils in the JM (Doerner et al., 2015), and the KD (Jura et al., 2009). These multiple interactions contribute to the
cooperative formation and stabilization of the dimer of the wild-type (Dawson et al., 2005) and, at least partially, of
tumor-related constitutively active EGFR mutants (Valley et al., 2015).
In spite of these studies, a unified structural view that describes the ligand-induced functional coupling between the
ECD and the intracellular domain of the full-length receptor in a native membrane environment has remained elusive (Bessman
et al., 2014; Kovacs et al., 2015). FRET, as well as molecular dynamics (MD) studies using EGFR in synthetic lipid bilayers,
suggest that the unliganded EGFR ECD is located close to the membrane in the closed, tethered conformation (Arkhipov et al., 2013;
Kaszuba et al., 2015; Ziomkiewicz et al., 2013). However, recent MD data from the ECD without membrane show a highly flexible
fold (Arkhipov et al., 2013), which would explain the dynamic existence of predimers on cell surfaces (Low-Nam et al., 2011). MD
studies furthermore suggest that the dimerized EGFR ectodomain is lying flat on the membrane, thereby possibly explaining
Figure 1. Preparation of EGFR-Rich Membrane Vesicles from A431 Cells
Schematic presentation for the preparation of A431 membrane vesicles. For MS/EM/dSTORM/gSTED studies, cells were grown on DMEM medium, while for
ssNMR studies, A431 cells were cultured in [13C, 15N]-labeled DMEM medium (20 plates were required for one sample). Cells were scraped from the plates and
vesiculated by passing them through a syringe 10 times. After removal of the unbroken cells and cell nuclei by spinning at low speed, the membrane vesicles were
spun down at high speed and loaded into an ssNMR rotor. Note that all methods can also be used to study whole cells.
the negative cooperativity of ligand binding (Arkhipov et al., 2014). On the other hand, recent FRET studies speak in favor
of an increased distance between domain I and the membrane after ligand binding in line with an upright position of the
dimerized ECDs (Valley et al., 2015; Ziomkiewicz et al., 2013).
These studies, together with previous work highlighting the influence of native membrane lipids such as cholesterol (den
Hartigh et al., 1992) or gangliosides (Coskun et al., 2011; Miljan and Bremer, 2002), as well as receptor glycosylation (Liu
et al., 2011) for receptor activation and internalization, underline the notion that a comprehensive understanding of receptor
activation requires the study of the full-length EGFR in its native environment. To address this aspect, we describe in the
following the development and application of a solid-state NMR (ssNMR)-based approach to directly examine structural and
dynamical properties of full-length EGFR in native membrane vesicles before and after activation. Unlike solution-state NMR,
where small membrane proteins such as the transmembrane region of EGFR can be studied in membrane mimetics (Endres et al.,
2013), ssNMR can give detailed structural insight into the role of the bilayered membrane for protein structure in synthetic
(Matsushita et al., 2013) or native bacterial membranes (Kaplan et al., 2015) largely irrespective of their size and mobility.
In addition, ssNMR can probe changes in local or overall protein dynamics at ambient temperature by the reduction in signal
intensity in dipolar-based experiments due to the presence of motion (Etzkorn et al., 2010; Hong et al., 2012; Schneider et al.,
2010) and by tracking ssNMR line width variations due to backbone fluctuations at low temperatures (Koers et al., 2014).
Importantly, the latter studies are fully compatible with sensitivity enhancement methods such as dynamic nuclear polarization
(DNP) that results in NMR signal enhancements by one to two orders of magnitude (Ni et al., 2013). The combination of this
high-sensitivity technique with tailored amino-acid labeling allows for the study of local protein structure even in complex
molecular environments (Kaplan et al., 2015).
To investigate EGFR in its native membrane environment by ssNMR, we utilized A431 cells to extract EGFR-enriched membrane
vesicles amenable for NMR studies. For reference, we characterized these membrane vesicle preparations by electron
microscopy, super-resolution light microscopy, and mass spectrometry (MS). Using previous structural information obtained
for EGFR domains (Ferguson et al., 2003; Lu et al., 2010; Ogiso et al., 2002; Stamos et al., 2002) and assuming the C-terminal
domain to be unstructured, we monitored EGFR structure and dynamics at global and residue-specific levels. Taken
together, our NMR data reveal dynamics of specific EGFR regions in the unliganded state, which are strongly reduced by
ligand binding, suggesting that a reduction in conformational entropy contributes to the free energy of EGFR dimerization.

RESULTS

Isolation and Characterization of EGFR-Rich A431 Membrane Vesicles
To investigate EGFR in its native membrane environment, we used A431 cells known to exhibit a high (1–2 × 10⁶ receptors
per cell) expression level of EGFR (Haigler et al., 1978) to produce EGFR-containing membrane vesicles amenable for our
multi-technique approach (Figure 1). Confocal microscopy of A431 cells and EGFR negative cells confirmed high-level expression
of EGFR (Figure 2A). In addition, super-resolution light microscopy (dSTORM) experiments using anti-EGFR nanobodies
and cryo-electron microscopy revealed the isolation of vesicles with a size of 50–250 nm, with EGFR localized to the membrane
(Figures 2B and 2C). To determine the orientation of EGFR in these vesicles, we treated the vesicles with Proteinase K for
15 min at 4°C and analyzed the samples by western blotting using an antibody specific for the intracellular domain. Comparison
of the EGFR protein band intensity using densitometry with the remaining intracellular domain band of 65 kDa suggests that
approximately 85% ± 6.4% (mean ± SEM, n = 3) of EGFR is in the right outside-out orientation (Figure 2D). This was confirmed
by three-color gated stimulated emission depletion (gSTED) microscopy, where the vesicles were stained with the lipophilic
membrane stain DiI, EGF-A488, and an anti-EGFR nanobody conjugated to A647 (NB-A647). Almost all vesicles showed
EGF binding (Figure 2E). Fluorescence intensity analysis of colocalized vesicles shows a high degree of correlation between the
EGF-A488 and NB-A647 (Pearson correlation coefficient: r = 0.577, N = 84, p < 0.001) (Figure 2F). In addition, we observed
ligand-induced phosphorylation (Figure 2G), confirming high-level expression of functionally active EGFR in the isolated
membrane vesicles. To probe the level of EGFR expression, we
1242 Cell 167, 1241–1251, November 17, 2016

conducted MS experiments on A431 cells and the isolated membrane vesicles (Figure 3). Using an accepted semiquantitative
approach based on summed ion intensities over all detected peptides, we found actin to be the most abundant protein in
whole A431 cells, followed by other abundant soluble molecules including heat shock and histone proteins. While the EGFR
expression level was lower than these proteins, EGFR still represented the most abundant membrane protein in our cells in line
with previous findings (Haigler et al., 1978). When moving to isolated membrane vesicles, we found EGFR highly enriched by a
factor of 5.5 (Figure 3), making EGFR, together with actin, the most abundant protein in our membrane vesicles. As membrane
proteins, such as EGFR, are typically less detectable by MS than soluble proteins (Santoni et al., 2000), such as actin, we argue
that the MS-based estimation of EGFR levels is at the lower limit. In summary, our results shown in Figures 2 and 3 confirmed the
presence of high levels of functional EGFR in our isolated membrane vesicles. Such preparations are also advantageous with
respect to the amount of protein in our ssNMR experiments, and we consequently prepared vesicles amenable for ssNMR
studies by growing A431 cells on a medium containing [13C, 15N]-labeled algae mixture (Figure 1).

Solid-State NMR Experiments on [13C, 15N]-Labeled A431 Membrane Vesicles at Ambient Temperatures Suggest a Dynamic
Extracellular Domain and Rigid Kinase Domain
Using [13C, 15N]-labeled A431 vesicles, we examined the effect of the addition of EGF (Figures 4 and S1) and of variations in
temperature (253 and 285 K, Figures S1 and S2) on the resulting 1D and 2D ssNMR spectra. While signal intensities in frozen
samples in the absence or presence of EGF were very comparable (Figure S1C), ssNMR intensities differed at ambient temperature,
with a clear increase in signal intensity after addition of EGF (Figures S1D and 4), indicative of a ligand-induced structural
stabilization of EGFR. Moreover, the overall 2D correlation pattern seen at lower temperatures (Figure S2) correlated with
chemical-shift predictions on the basis of previous EGFR domain structures (Ferguson et al., 2003; Garrett et al., 2003; Lu et al.,
2010; Ogiso et al., 2002; Stamos et al., 2002) and assuming the C-terminal (CT) domain to be unstructured. These observations
confirmed the dominance of folded EGFR in our spectra (Figure S2). ssNMR signals from our vesicular samples remained
constant during extended measurement periods, consistent with intact protein preparations (Figure S3). Next to the folded protein
signals, we also observed mobile random-coil signals from unstructured protein regions (such as the EGFR CT) and other small
molecules, including lipids and sugars (Figure S4).
Spectral overlap precluded an analysis of the entire 1,186-amino-acid receptor by conventional ssNMR. However, 2D
(13C,13C) double-quantum/single-quantum experiment (DQSQ) (Figure 4A) spectra at ambient temperatures provided sufficient
spectral resolution to investigate changes in ssNMR signal intensities and peak positions in different 2D segments, such as the
Ser and Thr spectral region (Figures 4B and 4C), as well as regions containing Ala (Figure 4D) and Pro (Figure 4E) residues.
Using standard secondary chemical-shift values (Wang and Jardetzky, 2002), we distinguished spectral regions characteristic
for α-helical (red boxes), random-coil (rc, black boxes), and β strand (blue boxes) for backbone Cα (dashed line) and Cβ (solid
line) resonances, and we estimated, based on the EGFR amino-acid sequence and the available structures, the relative
contribution (equivalent to the total expected NMR signal intensity) of the major receptor segments, i.e., ECD, KD, and CT. Such an
analysis was also performed for actin (Figure S5).
The observation that EGF induces spectral changes (Figure 4) already provided a strong indication that the ssNMR data at high
temperatures were dominated by signals of the EGFR receptor, where actin monomers may be too mobile to be detected. This
notion was further confirmed by analyzing ssNMR correlations for specific residue types and secondary structure elements
presented in the following (see Figure S5 for a statistical analysis of protein secondary structure and amino-acid distributions for
EGFR and actin). Examining the Ser region before EGF binding, we observed Ser signals mostly in α-helical conformations and
additional intensity matching Ser in random-coil conformations (Figure 4B). A dominant α-helical signal before addition of EGF
can only be explained by the EGFR KD domain (Figure 4B), and the significant increase in β strand and random-coil signals
would be compatible with an increasing contribution of ECD (for both β strand and random-coil conformations) and CT
(random-coil conformations) EGFR domains after EGF binding. Interestingly, such a notion correlates with signal changes in the Thr
(Figure 4C) and Ala region (Figure 4D), where α-helical correlations are dominant and β strand/random-coil contributions appear
after EGF binding. A similar effect would explain the strong signal increase in the Pro signals, which are most abundant in EGFR
ECD and CT, after EGF binding. Taken together, these observations (which we confirmed by repeating experiments on different
sample batches) suggest that the ssNMR data (Figure 4) are dominated by a rigid KD domain of EGFR in the resting state
and the appearance of a rigid ECD domain and possibly the CT domain of EGFR after EGF binding. Note, however, that such a
global analysis of our ssNMR spectra does not allow us to unambiguously identify structural changes in the CT domain and to
draw general conclusions about the much-smaller TM and JM domains or individual residues.

DNP-Supported Solid-State NMR Experiments on Specifically [13C, 15N]-Labeled A431 Vesicles Detect a Reduction in Local
Protein Dynamics after Ligand Binding
In order to obtain site-specific information on the different domains of EGFR, we produced specifically [13C, 15N]-labeled
A431 membrane vesicles with 13C-Met, 13C-Phe, 15N-Thr, and 15N-Leu (referred to henceforth as MFTL-labeled EGFR). As
shown before (Kaplan et al., 2015), this strategy introduced atomic probes that lead to residue-specific sequential correlations
in inter-residue ssNMR, the so-called NCOCX, experiments. This approach generates, in total, 12 sequential residue
pairs in the two most-abundant proteins in our samples, namely, EGFR and actin. Nine of these sequential correlations are
distributed in EGFR extracellular domains D1–D3, the intracellular tyrosine KD, and the CT (Figure 5A), while the remaining three
correlations result from actin (PDB: 1D4X, Figure S6A). Due to a limited signal-to-noise ratio at higher temperatures, we resorted to DNP

Figure 2. Structural and Functional Characterization of A431 Cells and Membrane Vesicles
(A) Confocal microscopy of A431 cells (bottom) and NIH 3T3 clone 2.2 cells (EGFR negative) (top) incubated with Alexa488-tagged EGF (in green). Blue represents
DAPI staining of nuclei.
(B) dSTORM reconstruction of A431 vesicles stained with anti-EGFR, Alexa647-conjugated nanobody (left), and two magnified 3-μm² areas (right). Scale bars
indicate 1 μm and 250 nm, respectively.
(C) Cryo-EM of A431 membrane vesicles. The sample was observed without chemical fixation or contrast.
(D) Western blot analysis of proteinase-K-treated vesicles to determine EGFR topology in the membrane vesicles. Freeze/thaw-disrupted membrane vesicles
were used to confirm proteinase K activity. EGFR_FL: full-length EGFR; EGFR-ΔECD: EGFR lacking the ECD.
(E) Three-color gSTED imaging of A431 vesicles labeled with a membrane dye (DiI), EGF-A488, and an anti-EGFR nanobody conjugated to Alexa647. Arrows
indicate DiI-stained vesicles not labeled with EGF or anti-EGFR NB. Scale bar indicates 1 μm.
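The text reports a Pearson correlation between EGF-A488 and NB-A647 intensities for these vesicles (r = 0.577, N = 84, p < 0.001). The minimal Python sketch below shows how such a coefficient can be computed and how the quoted significance level can be checked from r and N alone. The function names are hypothetical, and the Fisher z-transform with a normal approximation is an assumption chosen because it needs only the standard library; the test actually used by the authors is not stated.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def fisher_z_p_value(r, n):
    """Two-sided p value from the Fisher z-transform (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values quoted in the text.
r, n = 0.577, 84
print(t_statistic(r, n))
print(fisher_z_p_value(r, n) < 0.001)
```

Under these assumptions the reported r and N are comfortably consistent with p < 0.001, although r = 0.577 corresponds to a shared variance of only about one third (r² ≈ 0.33).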

Figure 3. Relative Abundance of EGFR in A431 Cells and Membrane Vesicles Assessed by Mass Spectrometry
Normalized intensities of the 20 most-abundant proteins in intact A431 cells (light blue) and corresponding intensities in membrane vesicles (dark
blue). Intensities were calculated by summing the intensities over all peptides detected in the tryptic digests of the cells and vesicles for the
annotated proteins. Intensities were normalized to the sum of peptide intensities detected in both vesicles and the whole-cell lysates. From the
normalized intensities, enrichment factors (labels) were calculated, clearly revealing that of the 20 most-abundant proteins in cells, only EGFR
is highly enriched in the membrane vesicles.
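The summed-intensity approach described in this legend can be sketched in a few lines of Python. The sketch assumes that each protein's intensity is the sum of its peptide intensities, that each sample is normalized to its own total, and that the enrichment factor is the ratio of the normalized vesicle value to the normalized whole-cell value; the exact normalization used for Figure 3 may differ. All function names and numbers are hypothetical and are not the measured data.

```python
def normalized_intensities(sample):
    """Map protein -> summed peptide intensity divided by the sample total.

    sample: dict of protein name -> list of peptide intensities.
    """
    totals = {protein: sum(peptides) for protein, peptides in sample.items()}
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("sample has no signal")
    return {protein: total / grand_total for protein, total in totals.items()}


def enrichment_factors(cells, vesicles):
    """Ratio of normalized vesicle intensity to normalized whole-cell intensity."""
    norm_cells = normalized_intensities(cells)
    norm_vesicles = normalized_intensities(vesicles)
    return {
        protein: norm_vesicles[protein] / norm_cells[protein]
        for protein in norm_cells
        if protein in norm_vesicles and norm_cells[protein] > 0
    }


# Hypothetical peptide intensities (arbitrary units), for illustration only.
cells = {"EGFR": [10.0, 10.0], "actin": [50.0, 30.0]}
vesicles = {"EGFR": [30.0, 20.0], "actin": [25.0, 25.0]}
factors = enrichment_factors(cells, vesicles)
print(factors)
```

A factor above 1 indicates a protein that makes up a larger share of the vesicle signal than of the whole-cell signal, which is the sense in which EGFR is described as enriched.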
experiments, which significantly increase NMR signal intensity via electron polarization (Ni et al., 2013). The increased sensitivity (with a DNP enhancement factor ε ≈ 20 at 800 MHz and 80 at 400 MHz) allowed us to perform 2D and 3D NCOCX experiments at 400 (Figures 5B, 5D, and 5E) and 800 (Figure S6B) MHz DNP conditions, as well as a 2D 15N-edited 13C-13C experiment (Baker et al., 2015) at 400 MHz DNP conditions (Figure 5C). Again, we made use of standard spectral regions expected for α-helix (red), β strand (blue), and random-coil (black) ssNMR frequencies for both 13C and 15N dimensions. These spectral regions are indicated for expected Phe and Met correlations by solid and dashed lines in Figures 5B–5E, respectively.

In general, the addition of EGF can lead to chemical-shift or line-width changes in ssNMR data of EGFR due to local alterations in protein structure and dynamics or due to the presence of a nearby ligand. Interestingly, the addition of EGF significantly increased spectral resolution both in NC (Figures 5B, 5D, and 5E and S6), as well as in 15N-edited CC (Figure 5C) experiments, indicative of a reduction in local backbone and side-chain fluctuations that reduce line broadening at low temperatures (Koers et al., 2014) and a dominant contribution of EGFR correlations to the spectrum. In line with the latter conclusions, we found the most-dominant signals in Phe α-helical and random-coil regions (Wang and Jardetzky, 2002) in full analogy to the expected three correlations for EGFR in domain I and in domain III, as well as in the CT, respectively (Figure 5A). In addition, we found at an 15N chemical shift of 124 ppm (Figure 5B, NCOCX experiment), which is characteristic for Leu residues in β strand or random-coil conformations, and a clear β strand Phe correlation in our 15N-edited 13C-13C experiment (Figure 5C) that can only stem from EGFR, namely, the sequential pair 380FL381 in domain III (Figure 5A, denoted Pheβ(1,0) in Figure 5B). In the crystal structure (Ogiso et al., 2002), the 380FL381 pair is located close to the EGF binding site (see Figures 6A and 6B), which would readily explain the observed chemical-shift changes for the tentatively identified correlation in Figure 5C. We also observe a clear shift and spectral changes after EGF binding in the random-coil Phe Cβ region, which strongly suggests that these signals stem from 357FT358 (D3), as well as from the two sequential correlations in the CT (as indicated in Figure 5C).

In full accordance with a dominant contribution of EGFR to our spectra, we did not observe methionine correlations in β strand conformations (indicated Metβ(0,0) in Figure 5C). Instead, we detected Met correlations (which can be discriminated on the basis of their characteristic Cβ shifts) in α-helix and random-coil conformations, in line with EGFR and actin predictions. In summary, our spectra at DNP temperatures (100 K) suggested the dominant role of EGFR signals also in our LT-DNP spectra. For both EGF-free and EGF-bound conformations, our observed correlations globally matched with expectations from previous X-ray structures of the corresponding EGFR subdomains, and we could tentatively assign chemical-shift changes to a residue pair located close to the EGF binding site previously seen in protein crystals. In addition, our ssNMR data suggested a significant reduction in local backbone and side-chain fluctuations that would give rise to structural disorder at low temperatures before EGF binding.

DISCUSSION

Increasing evidence suggests that a comprehensive view of EGFR activation requires the study of structure and dynamics of the full-length receptor in its native cell membrane setting (Bessman et al., 2014; Kovacs et al., 2015). NMR has, for a long time, contributed to obtaining such information for molecules that tumble rapidly under in vitro (Arkhipov et al., 2013; Kern and Zuiderweg, 2003; Kerns et al., 2015; Nygaard et al., 2013) and, more recently, under in-cell conditions (Banci et al., 2013; Serber et al., 2001; Smith et al., 2015). On the other hand, ssNMR provides increasing possibilities to conduct such
(F) Scatterplot of integrated fluorescence intensities of individual A431 membrane vesicles in EGF-A488 and anti-EGFR NB-647 channels.
(G) Phosphorylation assay of A431 plasma membrane vesicles to detect phosphorylated EGFR (pEGFR) with anti-P1068 antibody. A431 cells were incubated at 37 °C with (+) or without (−) EGF for 10 min. For membrane vesicle samples, either A431 cells were incubated at 37 °C for 10 min with EGF (+), followed by vesicle preparation, or vesicles were first prepared from A431 cells, after which they were incubated at 37 °C without (−) or with (+) EGF.

Figure 4. EGF-Induced Alterations in Dynamics of Fully [13C, 15N]-Labeled EGFR as Seen by 2D ssNMR at Ambient Temperatures
(A) DQSQ of fully [13C, 15N]-labeled A431 vesicles without EGF (orange) and with EGF (cyan) at 285 K.
(B–E) Zoom-in of spectral regions comprising serine, threonine, alanine, and proline resonances. Solid lines and dashed boxes represent the Cβ and Cα chemical-shift regions in α-helix (red), β sheet (blue), and random-coil (black) conformations, respectively. Scale bars (normalized for each amino acid type) reflect the number of residues expected to occur in the three considered backbone structural folds (B–D) or the total number of residues (E) in the ECD, KD, and CT domain.
The analysis was performed using the known structures of different EGFR domains.
See also Figures S1, S2, S3, S4, and S5.
studies on large, possibly membrane-embedded, protein complexes in their natural cell environment (Chow et al., 2014; Frederick et al., 2015; Kaplan et al., 2015; Renault et al., 2012). Here, we have shown how to extend such studies to examine large eukaryotic protein receptors in their native membrane setting by isolating fully and specifically [13C, 15N]-labeled membrane vesicles that express the functional receptor of interest to high levels. Combining 2D ssNMR data at ambient temperatures with DNP studies of specifically labeled membrane vesicles allowed us to examine the overall structure and dynamics of the full-length EGFR before and after ligand binding in situ.

Taken together, our ssNMR analysis suggests that the observed spectroscopic changes due to EGF binding are largely due to alterations in receptor dynamics. Before activation, our data are in accordance with a highly dynamic ECD and CT and a rigid KD, in line with earlier studies suggesting autoinhibitory interactions of the KD and the N-terminal portions of the intracellular JM region with the intracellular membrane surface (Endres et al., 2013; Sengupta et al., 2009) (Figures 6A and 6B). The fluctuations (local and global) of the ECD in the absence of EGF preclude strong interactions of the ECD with the membrane, which is in disagreement with recent MD studies on the EGFR ECD domain (Arkhipov et al., 2013). Rather, our experimental results are in line with experimental results (Coskun et al., 2011; den Hartigh et al., 1992; Liu et al., 2011; Miljan and Bremer, 2002) and recent computational studies (Kaszuba et al., 2015) suggesting a key role for the natural composition of the cell membrane and receptor glycosylation for receptor dynamics. Indeed, we observed in our ssNMR experiments additional mobility in other endogenous cellular components, including lipids and sugars, suggesting the presence of receptor dynamics at different timescales in our samples (Figure S4).

Our results suggest a model for EGFR activation in which the ECD is present on the cell surface of resting cells as an ensemble of different conformers. Both the closed, tethered conformation can be expected, as well as the open conformation in which the autoinhibitory tether between domain II and IV is released (Figure 6A). Based upon a previously suggested ΔG of 1 to 2 kcal/mole of the domain II/IV interaction, 80%–97% of the ECD was expected in the closed conformation (Ferguson et al., 2003). In this framework, global and local dynamics probed in our ssNMR studies would be most compatible with the presence of global domain motions of large portions of the ECD combined with local backbone fluctuations (detected by DNP-ssNMR for the EGF binding region, as well as the dimerization interface, Figure 6B) that can lead to the open conformation, enabling the ectodomain to form inactive (pre)dimers previously detected for very short time periods (Low-Nam et al., 2011). The highly dynamic nature of the unliganded EGFR also explains the ligand-independent dimerization and activation of EGFR at higher expression levels of EGFR in the plasma membrane of different cancer cells. The ECD dynamics result in the presence of the ECD in the extended conformation, which is prone to form dimers. Since the percentage of extended conformations will not change, the number of EGFRs in the extended conformation is higher, resulting in a higher probability for predimer formation. Similarly, it explains the observation that the ECD in active EGFR mutants can lead to enhanced ligand-independent dimerization (Valley et al., 2015). Deletion of parts of the ECD as has occurred in viral

Figure 5. DNP-ssNMR Experiments on MFTL-Labeled A431 Vesicles Reveal Ligand-Induced Protein Stabilization on the Level of Individual
Residues
(A) Schematic view of EGFR domains, highlighting the sequential correlations expected in a 13C-[F, M] and 15N-[L, T] labeled sample. Color-coding stands for
specific backbone conformations as described in the main text.
(B, D, and E) 2D planes of 3D NCOCX experiments before (D, orange) and after (B and E, cyan) addition of EGF for the indicated 15N chemical shifts.
(C) 2D 15N-edited 13C-13C experiment of MFTL A431 membrane obtained with (cyan) and without (orange) EGF. In (B–E), red, blue, and black boxes represent the chemical-shift ranges expected for Phe (solid lines) and Met (dotted lines) Cα and Cβ correlations in α-helical, random-coil, and β strand conformations.

Figure 6. A Model of EGFR Dynamics and Structural Changes in the Free and EGF-Bound Forms
(A) Generic model of EGFR activation via conformational selection in the ECD.
(B) At high temperatures (285 K), the unbound receptor exhibits dynamics in both the ECD and CT. Upon binding to the ligand EGF (shown in yellow), the receptor
dimerizes and exhibits less dynamics, both on a global and local scale. Residues probed by ssNMR in the MFTL sample are highlighted in orange (MT and FT
residue pairs) and magenta (ML and FL pairs), and zoom-ins show local protein structure. 13C-labeled residues (M and F) contain side-chains in stick representation, and the 15N-labeled residues (T and L) are represented as spheres on the backbone nitrogen.
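The closed-state population quoted in the Discussion for the resting ECD (80%–97% for a ΔG of 1 to 2 kcal/mole of the domain II/IV interaction) is the kind of number a two-state Boltzmann estimate gives. The sketch below is only an illustration under stated assumptions: a strict two-state (closed/open) equilibrium, a temperature of 298.15 K, and ΔG taken as the free energy by which the closed state is favored. The function names are hypothetical and none of this is taken from the authors' methods.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def closed_fraction(delta_g_kcal_per_mol, temperature_k=298.15):
    """Equilibrium fraction of closed ECD in a two-state (closed/open) model.

    delta_g_kcal_per_mol is the free energy by which the closed state is
    favored (entered as a positive number). The temperature is an assumption.
    """
    k_closed = math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))
    return k_closed / (1.0 + k_closed)


def open_receptor_count(total_receptors, delta_g_kcal_per_mol, temperature_k=298.15):
    """Expected number of receptors in the open (extended) conformation."""
    open_fraction = 1.0 - closed_fraction(delta_g_kcal_per_mol, temperature_k)
    return total_receptors * open_fraction


for delta_g in (1.0, 2.0):
    print(f"dG = {delta_g:.0f} kcal/mol -> closed fraction {closed_fraction(delta_g):.2f}")
```

Under these assumptions the closed fraction comes out at roughly 84% and 97%, in the same range as the quoted interval; the exact lower bound depends on the temperature and on the ΔG convention used. Because the open fraction is fixed by ΔG, `open_receptor_count` scales linearly with the number of receptors, which mirrors the argument that higher expression levels yield more extended receptors and therefore more predimers.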
and oncogenic variants such as v-ErbB and EGFRvIII releases the closed conformation, resulting in a larger number of less-stable ligand-independent dimers. As a consequence, basal kinase activity levels are higher, but still lower than EGF-induced kinase activity.

We hypothesize that EGF binding can occur to all conformers, including the open conformation. In this model, EGF does not induce a conformational change of the receptor but rather stabilizes the open conformation, which precedes receptor dimerization. The reduced dynamics of the liganded EGFR result in a rigid conformation with reduced conformational entropy, which contributes to the binding of the multiple low-affinity interaction motifs that are present not only in the ECD (Dawson et al., 2005) but in the entire EGFR. This cooperative binding of two rigid EGFR monomers involves the tether in domain II, GxxxG, or GG4 motifs in the TM, the anti-parallel α helices in the JM, as well as the KD domain (Doerner et al., 2015; Ferguson et al., 2003; Jura et al., 2009; Lu et al., 2010). In this way, the reduction in global, as well as local, dynamics contributes to the
Connecting lines track experimentally observed Phe (solid lines, annotated by Phe [X,Y]) and Met (dashed lines, annotated by Met [X,Y]) correlations in α-helical, β strand, and random-coil conformations, respectively. X,Y stand for the number of predicted sequential correlations for X = EGFR and Y = Actin on the basis of Figure 5A (EGFR) and Figure S6A (Actin). In (C), tentative assignments for the 380FL381 pair, as well as for the spectral correlations consistent with signals stemming from 357FT358 (DIII) and from the two sequential correlations in the CT, are indicated. All experiments were conducted at 400 MHz DNP conditions.

cooperative binding of EGFR monomers by an entropy-enthalpy compensation mechanism.

Analogous to emerging signal-transduction mechanisms across cell membranes (Nygaard et al., 2013), the concept of an allosteric regulation in which a reduction in receptor dynamics may be sufficient to shift the conformational equilibrium from inactive monomers and inactive predimers to EGF-activated EGFR populations may also help to understand ligand-induced dimerization of other receptor tyrosine kinases. Our presented ssNMR approach may furthermore aid the refinement of structure and dynamics of such membrane-embedded EGFR populations, including the domain IV region containing glycosylated sites critical for ligand binding (Whitson et al., 2005) and the C-terminal domain of EGFR. Such studies may provide critical insight into the role of the plasma membrane and receptor dynamics in related eukaryotic growth factor receptor tyrosine kinases that play key roles in regulating cellular processes such as proliferation, differentiation, or cell survival (Bessman et al., 2014; Kovacs et al., 2015).

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
• METHOD DETAILS
  ◦ Phosphorylation assay
  ◦ Preparation of a [13C, 15N]-labeled medium to label eukaryotic cells
  ◦ 13C, 15N labeling of eukaryotic cells
  ◦ Digestion of EGFR by Proteinase K enzyme
  ◦ Membrane vesicle preparation
  ◦ Preparation of [13C, 15N] A431 vesicles with EGF for NMR
  ◦ Nanobody labeling and dSTORM and gSTED imaging
  ◦ Cryo-electron microscopy
  ◦ Mass spectrometry
  ◦ Heatmaps of conformation-dependent amino acid distributions in EGFR
  ◦ Solid-state NMR and DNP experiments
• QUANTIFICATION AND STATISTICAL ANALYSIS
  ◦ Quantification of EGFR extracellular domain cleavage
  ◦ Fluorescence intensity analysis of A431 membrane vesicles

SUPPLEMENTAL INFORMATION

Supplemental Information includes six figures and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.038.

AUTHOR CONTRIBUTIONS

M.K., P.B.H., and M.B. designed experiments. M.K. and S.N. prepared isolated labeled EGFR membrane vesicles. M.K. and K.H. conducted ssNMR experiments. C.d.H. performed the phosphorylation assay. M.K. and C.d.H. performed confocal microscopy. P.J. and W.J.C.G. performed EM. E.A.K., D.P.-C., P.J., and L.C.K. designed and performed the dSTORM and gSTED experiments. D.M. conducted DNP experiments and was supported by M.K. M.K. and R.D. prepared unlabeled A431 vesicles for MS experiments. S.D., S.L., and A.J.R.H. performed MS experiments. M.K., S.N., P.B.H., K.H., G.E.F., and M.B. analyzed data. M.K., P.B.H., and M.B. prepared the manuscript, and all authors edited it.

ACKNOWLEDGMENTS

We thank Willem Kegel and Markus Weingarth for helpful discussions and Johan van der Zwan for technical support. This work was funded in part by Netherlands Organization for Scientific Research (NWO) (grants 700.26.121 and 700.10.443 to M.B., STW12152 to P.B.H., and a VIDI grant 723.013.008 to S.L.) and iNEXT (project number 653706), a Horizon 2020 program of the European Union. In addition, S.v.D., S.L., and A.J.R.H. are supported by the project Proteins At Work (project 184.032.201), a program of the Netherlands Proteomics Centre financed by NWO as part of the National Roadmap Large-scale Research Facilities of the Netherlands. The NMR experiments were supported in part by uNMR-NL, an NWO-funded National Roadmap Large-Scale Facility of the Netherlands. We are indebted to Paul Tordo and Olivier Ouari (Marseille) for providing AMUPol.

Received: May 4, 2016
Revised: August 8, 2016
Published: November 10, 2016

REFERENCES

Andronesi, O.C., Becker, S., Seidel, K., Heise, H., Young, H.S., and Baldus, M. (2005). Determination of membrane protein structure and dynamics by magic-angle-spinning solid-state NMR spectroscopy. J. Am. Chem. Soc. 127, 12965–12974.
Arkhipov, A., Shan, Y., Das, R., Endres, N.F., Eastwood, M.P., Wemmer, D.E., Kuriyan, J., and Shaw, D.E. (2013). Architecture and membrane interactions of the EGF receptor. Cell 152, 557–569.
Arkhipov, A., Shan, Y., Kim, E.T., and Shaw, D.E. (2014). Membrane interaction of bound ligands contributes to the negative binding cooperativity of the EGF receptor. PLoS Comput. Biol. 10, e1003742.
Arteaga, C.L., and Engelman, J.A. (2014). ERBB receptors: from oncogene discovery to basic science to mechanism-based cancer therapeutics. Cancer Cell 25, 282–303.
Baker, L.A., Daniëls, M., van der Cruijsen, E.A.W., Folkers, G.E., and Baldus, M. (2015). Efficient cellular solid-state NMR of membrane proteins by targeted protein labeling. J. Biomol. NMR 62, 199–208.
Baldus, M., Petkova, A.T., Herzfeld, J., and Griffin, R.G. (1998). Cross polarization in the tilted frame: assignment and spectral simplification in heteronuclear spin systems. Mol. Physics 95, 1197–1207.
Banci, L., Barbieri, L., Bertini, I., Luchinat, E., Secci, E., Zhao, Y., and Aricescu, A.R. (2013). Atomic-resolution monitoring of protein maturation in live human cells by NMR. Nat. Chem. Biol. 9, 297–299.
Bessman, N.J., Freed, D.M., and Lemmon, M.A. (2014). Putting together structures of epidermal growth factor receptors. Curr. Opin. Struct. Biol. 29, 95–101.
Chow, W.Y., Rajan, R., Muller, K.H., Reid, D.G., Skepper, J.N., Wong, W.C., Brooks, R.A., Green, M., Bihan, D., Farndale, R.W., et al. (2014). NMR spectroscopy of native and in vitro tissues implicates polyADP ribose in biomineralization. Science 344, 742–746.
Coskun, Ü., Grzybek, M., Drechsel, D., and Simons, K. (2011). Regulation of human EGF receptor by lipids. Proc. Natl. Acad. Sci. USA 108, 9044–9048.
Cox, J., and Mann, M. (2008). MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372.

Dawson, J.P., Berger, M.B., Lin, C.-C., Schlessinger, J., Lemmon, M.A., and Ferguson, K.M. (2005). Epidermal growth factor receptor dimerization and activation require ligand-induced conformational changes in the dimer interface. Mol. Cell. Biol. 25, 7734–7742.
Doerner, A., Scheck, R., and Schepartz, A. (2015). Growth Factor Identity Is Encoded by Discrete Coiled-Coil Rotamers in the EGFR Juxtamembrane Region. Chem. Biol. 22, 776–784.
Edelstein, A., Amodaj, N., Hoover, K., Vale, R., and Stuurman, N. (2010). Computer control of microscopes using μManager. (Curr. Protoc. Mol. Biol.) Chapter 14, Unit14.20.
Endres, N.F., Das, R., Smith, A.W., Arkhipov, A., Kovacs, E., Huang, Y., Pelton, J.G., Shan, Y., Shaw, D.E., Wemmer, D.E., et al. (2013). Conformational coupling across the plasma membrane in activation of the EGF receptor. Cell 152, 543–556.
Etzkorn, M., Seidel, K., Li, L., Martell, S., Geyer, M., Engelhard, M., and Baldus, M. (2010). Complex formation and light activation in membrane-embedded sensory rhodopsin II as seen by solid-state NMR spectroscopy. Structure 18, 293–300.
Ferguson, K.M., Berger, M.B., Mendrola, J.M., Cho, H.S., Leahy, D.J., and Lemmon, M.A. (2003). EGF activates its receptor by removing interactions that autoinhibit ectodomain dimerization. Mol. Cell 11, 507–517.
Frederick, K.K., Michaelis, V.K., Corzilius, B., Ong, T.-C., Jacavone, A.C., Griffin, R.G., and Lindquist, S. (2015). Sensitivity-enhanced NMR reveals alterations in protein structure by cellular milieus. Cell 163, 620–628.
Frishman, D., and Argos, P. (1995). Knowledge-based protein secondary structure assignment. Proteins 23, 566–579.
Garrett, T.P.J., McKern, N.M., Lou, M., Elleman, T.C., Adams, T.E., Lovrecz, G.O., Zhu, H.-J., Walker, F., Frenkel, M.J., Hoyne, P.A., et al. (2002). Crystal structure of a truncated epidermal growth factor receptor extracellular domain bound to transforming growth factor alpha. Cell 110, 763–773.
Garrett, T.P.J., McKern, N.M., Lou, M., Elleman, T.C., Adams, T.E., Lovrecz, G.O., Kofler, M., Jorissen, R.N., Nice, E.C., Burgess, A.W., and Ward, C.W. (2003). The crystal structure of a truncated ErbB2 ectodomain reveals an active conformation, poised to interact with other ErbB receptors. Mol. Cell 11, 495–505.
Gradmann, S., Ader, C., Heinrich, I., Nand, D., Dittmann, M., Cukkemane, A., van Dijk, M., Bonvin, A.M.J.J., Engelhard, M., and Baldus, M. (2012). Rapid prediction of multi-dimensional NMR data sets. J. Biomol. NMR 54, 377–387.
Haigler, H., Ash, J.F., Singer, S.J., and Cohen, S. (1978). Visualization by fluorescence of the binding and internalization of epidermal growth factor in human carcinoma cells A-431. Proc. Natl. Acad. Sci. USA 75, 3317–3321.
den Hartigh, J.C., van Bergen en Henegouwen, P.M., Verkleij, A.J., and Boonstra, J. (1992). The EGF receptor is an actin-binding protein. J. Cell Biol. 119, 349–355.
Hoffman, D.B., Pearson, C.G., Yen, T.J., Howell, B.J., and Salmon, E.D. (2001). Microtubule-dependent changes in assembly of microtubule motor proteins and mitotic spindle checkpoint proteins at PtK1 kinetochores. Mol. Biol. Cell 12, 1995–2009.
Hong, M., Zhang, Y., and Hu, F. (2012). Membrane protein structure and dynamics from NMR spectroscopy. Annu. Rev. Phys. Chem. 63, 1–24.
Jura, N., Endres, N.F., Engel, K., Deindl, S., Das, R., Lamers, M.H., Wemmer, D.E., Zhang, X., and Kuriyan, J. (2009). Mechanism for activation of the EGF receptor catalytic domain by the juxtamembrane segment. Cell 137, 1293–1307.
Kaplan, M., Cukkemane, A., van Zundert, G.C.P., Narasimhan, S., Daniëls, M., Mance, D., Waksman, G., Bonvin, A.M.J.J., Fronzes, R., Folkers, G.E., and Baldus, M. (2015). Probing a cell-embedded megadalton protein complex by DNP-supported solid-state NMR. Nat. Methods 12, 649–652.
Kaszuba, K., Grzybek, M., Orłowski, A., Danne, R., Róg, T., Simons, K., Coskun, Ü., and Vattulainen, I. (2015). N-Glycosylation as determinant of epidermal growth factor receptor conformation in membranes. Proc. Natl. Acad. Sci. USA 112, 4334–4339.
Kern, D., and Zuiderweg, E.R. (2003). The role of dynamics in allosteric regulation. Curr. Opin. Struct. Biol. 13, 748–757.
Kerns, S.J., Agafonov, R.V., Cho, Y.-J., Pontiggia, F., Otten, R., Pachov, D.V., Kutter, S., Phung, L.A., Murphy, P.N., Thai, V., et al. (2015). The energy landscape of adenylate kinase during catalysis. Nat. Struct. Mol. Biol. 22, 124–131.
Koers, E.J., van der Cruijsen, E.A.W., Rosay, M., Weingarth, M., Prokofyev, A., Sauvée, C., Ouari, O., van der Zwan, J., Pongs, O., Tordo, P., et al. (2014). NMR-based structural biology enhanced by dynamic nuclear polarization at high magnetic field. J. Biomol. NMR 60, 157–168.
Kovacs, E., Zorn, J.A., Huang, Y., Barros, T., and Kuriyan, J. (2015). A structural perspective on the regulation of the epidermal growth factor receptor. Annu. Rev. Biochem. 84, 739–764.
Kremer, J.R., Mastronarde, D.N., and McIntosh, J.R. (1996). Computer visualization of three-dimensional image data using IMOD. J. Struct. Biol. 116, 71–76.
Liu, Y.C., Yen, H.Y., Chen, C.Y., and Chen, C.H. (2011). Sialylation and fucosylation of epidermal growth factor receptor suppress its dimerization and activation in lung cancer cells. Proc. Natl. Acad. Sci. USA 108, 11332–11337.
Low-Nam, S.T., Lidke, K.A., Cutler, P.J., Roovers, R.C., van Bergen en Henegouwen, P.M., Wilson, B.S., and Lidke, D.S. (2011). ErbB1 dimerization is promoted by domain co-confinement and stabilized by ligand binding. Nat. Struct. Mol. Biol. 18, 1244–1249.
Lu, C., Mi, L.-Z., Grey, M.J., Zhu, J., Graef, E., Yokoyama, S., and Springer, T.A. (2010). Structural evidence for loose linkage between ligand binding and kinase activation in the epidermal growth factor receptor. Mol. Cell. Biol. 30, 5432–5443.
Matsushita, C., Tamagaki, H., Miyazawa, Y., Aimoto, S., Smith, S.O., and Sato, T. (2013). Transmembrane helix orientation influences membrane binding of the intracellular juxtamembrane domain in Neu receptor peptides. Proc. Natl. Acad. Sci. USA 110, 1646–1651.
Mikhaylova, M., Cloin, B.M.C., Finan, K., van den Berg, R., Teeuw, J., Kijanka, M.M., Sokolowski, M., Katrukha, E.A., Maidorn, M., Opazo, F., et al. (2015). Resolving bundled microtubules using anti-tubulin nanobodies. Nat. Commun. 6, 7933.
Miljan, E.A., and Bremer, E.G. (2002). Regulation of growth factor receptors by gangliosides. Sci. STKE 2002, re15.
Morris, G.A., and Freeman, R. (1979). Enhancement of Nuclear Magnetic-Resonance Signals by Polarization Transfer. J. Am. Chem. Soc. 101, 760–762.
Ni, Q.Z., Daviso, E., Can, T.V., Markhasin, E., Jawla, S.K., Swager, T.M., Temkin, R.J., Herzfeld, J., and Griffin, R.G. (2013). High frequency dynamic nuclear polarization. Acc. Chem. Res. 46, 1933–1941.
Nygaard, R., Zou, Y., Dror, R.O., Mildorf, T.J., Arlow, D.H., Manglik, A., Pan, A.C., Liu, C.W., Fung, J.J., Bokoch, M.P., et al. (2013). The dynamic process of β(2)-adrenergic receptor activation. Cell 152, 532–542.
Ogiso, H., Ishitani, R., Nureki, O., Fukai, S., Yamanaka, M., Kim, J.-H., Saito, K., Sakamoto, A., Inoue, M., Shirouzu, M., and Yokoyama, S. (2002). Crystal structure of the complex of human epidermal growth factor and receptor extracellular domains. Cell 110, 775–787.
Pines, A., Gibby, M.G., and Waugh, J.S. (1973). Proton-Enhanced NMR of Dilute Spins in Solids. J. Chem. Phys. 59, 15–19.
Renault, M., Tommassen-van Boxtel, R., Bos, M.P., Post, J.A., Tommassen, J., and Baldus, M. (2012). Cellular solid-state nuclear magnetic resonance spectroscopy. Proc. Natl. Acad. Sci. USA 109, 4863–4868.
Santoni, V., Molloy, M., and Rabilloud, T. (2000). Membrane proteins and proteomics: un amour impossible? Electrophoresis 21, 1054–1070.
Sauvée, C., Rosay, M., Casano, G., Aussenac, F., Weber, R.T., Ouari, O., and Tordo, P. (2013). Highly efficient, water-soluble polarizing agents for dynamic nuclear polarization at high frequency. Angew. Chem. Int. Ed. Engl. 52, 10858–10861.
Schneider, R., Seidel, K., Etzkorn, M., Lange, A., Becker, S., and Baldus, M. (2010). Probing molecular motion by double-quantum (13C,13C) solid-state NMR spectroscopy: application to ubiquitin. J. Am. Chem. Soc. 132, 223–233.

Sengupta, P., Bosis, E., Nachliel, E., Gutman, M., Smith, S.O., Mihályné, G., Zaitseva, I., and McLaughlin, S. (2009). EGFR juxtamembrane domain, membranes, and calmodulin: kinetics of their interaction. Biophys. J. 96, 4887–4895.
Serber, Z., Keatinge-Clay, A.T., Ledwidge, R., Kelly, A.E., Miller, S.M., and Dötsch, V. (2001). High-resolution macromolecular NMR spectroscopy inside living cells. J. Am. Chem. Soc. 123, 2446–2447.
Shaka, A.J., Barker, P.B., and Freeman, R. (1985). Computer-optimized decoupling scheme for wideband applications and low-level operation. J. Magn. Reson. 64, 547–552.
Smith, M.J., Marshall, C.B., Theillet, F.-X., Binolfi, A., Selenko, P., and Ikura, M. (2015). Real-time NMR monitoring of biological activities in complex physiological environments. Curr. Opin. Struct. Biol. 32, 39–47.
Stamos, J., Sliwkowski, M.X., and Eigenbrot, C. (2002). Structure of the epidermal growth factor receptor kinase domain alone and in complex with a 4-anilinoquinazoline inhibitor. J. Biol. Chem. 277, 46265–46272.
Tebbutt, N., Pedersen, M.W., and Johns, T.G. (2013). Targeting the ERBB family in cancer: couples therapy. Nat. Rev. Cancer 13, 663–673.
Valley, C.C., Arndt-Jovin, D.J., Karedla, N., Steinkamp, M.P., Chizhik, A.I., Hlavacek, W.S., Wilson, B.S., Lidke, K.A., and Lidke, D.S. (2015). Enhanced dimerization drives ligand-independent activity of mutant epidermal growth factor receptor in lung cancer. Mol. Biol. Cell 26, 4087–4099.
Wang, Y., and Jardetzky, O. (2002). Probability-based protein secondary structure identification using combined NMR chemical-shift data. Protein Sci. 11, 852–861.
Weingarth, M., Demco, D.E., Bodenhausen, G., and Tekely, P. (2009). Improved magnetization transfer in solid-state NMR with fast magic angle spinning. Chem. Phys. Lett. 469, 342–348.
Whitson, K.B., Whitson, S.R., Red-Brewer, M.L., McCoy, A.J., Vitali, A.A., Walker, F., Johns, T.G., Beth, A.H., and Staros, J.V. (2005). Functional effects of glycosylation at Asn-579 of the epidermal growth factor receptor. Biochemistry 44, 14920–14931.
Yarden, Y. (2001). The EGFR family and its ligands in human cancer: signalling mechanisms and therapeutic opportunities. Eur. J. Cancer 37 (Suppl 4), S3–S8.
Ziomkiewicz, I., Loman, A., Klement, R., Fritsch, C., Klymchenko, A.S., Bunt, G., Jovin, T.M., and Arndt-Jovin, D.J. (2013). Dynamic conformational transitions of the EGF receptor in living mammalian cells determined by FRET and fluorescence lifetime imaging microscopy. Cytometry A 83, 794–805.

STAR★METHODS
KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Antibodies
Rabbit polyclonal anti-phosphoEGFR (Y1068) | Cell Signaling Technology | Cat#2234S
Rabbit monoclonal anti-EGFR (C74B9) | Cell Signaling Technology | Cat#2646S
Mouse anti-Actin (clone C4) | MP Biochemicals | Cat#691001; RRID: AB_2336056
Goat anti-mouse700 | Li-Cor | Cat#926-68170; RRID: AB_10956589
Goat anti-rabbit800 | Li-Cor | Cat#925-32211

Chemicals, Peptides, and Recombinant Proteins
[13C, 15N] algal amino acids mixture | Cortecnet | Cat#CCN070P1
[13C] L-Methionine | Cambridge Isotope Laboratories | Cat#CLM-893-H-PK
[13C] L-Phenylalanine | Cambridge Isotope Laboratories | Cat#CLM-2250-H-PK
[15N] L-Threonine | Cambridge Isotope Laboratories | Cat#NLM-742-PK
[15N] L-Leucine | Cambridge Isotope Laboratories | Cat#NLM-142-PK
EGF | R&D Systems | Cat#236-EG-01M
EGF-A488 | ThermoFisher Scientific | Cat#E-13345
AMUPol | Sauvée et al., 2013 | N/A

Experimental Models: Cell Lines
Human: A431 | ATCC | Cat#CRL-1555; RRID: CVCL-0037
Mouse: NIH 3T3 clone 2.2 | ATCC | Cat#CRL-1658; RRID: CVCL-0594

Software and Algorithms
DoM_Utrecht | Mikhaylova et al., 2015 | https://github.com/ekatrukha/DoM_Utrecht
Topspin | Bruker Biospin | N/A
Sparky | T. D. Goddard and D. G. Kneller, SPARKY 3, University of California, San Francisco | https://www.cgl.ucsf.edu/home/sparky/
FANDAS | Gradmann et al., 2012 | N/A
Odyssey Application Software 2.0 | Li-Cor | N/A
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for reagents may be directed to Paul van Bergen en Henegouwen (p.vanbergen@uu.nl).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
A431 cells obtained from ATCC (CRL-1555, LGC Standards, Germany) and EGFR-negative NIH 3T3 clone 2.2 murine fibroblasts were cultured in Dulbecco's modified Eagle's medium (DMEM; Gibco, Invitrogen, Paisley, UK) containing 10% (v/v) fetal calf serum (FCS), L-glutamine, penicillin, and streptomycin at 37 °C in an atmosphere containing 5% CO2.
METHOD DETAILS
Phosphorylation assay
Phosphorylation of EGFR was induced either by adding 8 nM EGF to the cells in medium before membrane vesicles were prepared or to membrane vesicles in a phosphorylation buffer for 10 min at 37 °C. Proteins were separated by SDS-PAGE and blotted onto PVDF membrane. The membrane was incubated with R-α-phosphoEGFR (Y1068) (Cell Signaling Technology, Danvers, Massachusetts) and M-α-Actin followed by G-α-R800 (Li-Cor) and G-α-mouse700 (Li-Cor). To detect EGFR, the blot was first stripped with stripping buffer and then blocked and incubated with R-α-EGFR (C74B9, Cell Signaling Technology) followed by G-α-R800. The detection was performed with the Odyssey imaging system (Li-Cor), and bands were quantified using Odyssey software.

Preparation of a [13C, 15N]-labeled medium to label eukaryotic cells
For isotope labeling, we adapted published procedures using a combination of dialyzed fetal calf serum and labeled amino acid mixtures obtained from algae extracts to produce a [13C, 15N] enriched medium. 1 L of DMEM without amino acids was supplemented with 2 g/L glucose and 1 g of a [13C, 15N] algal amino-acid mixture (Cortecnet). Due to the absence of certain amino acids in this mixture, unlabeled Trp (16 mg/L), Cys (62 mg/L) and Gln (2 mM) were added. In addition, 10% of dialyzed fetal calf serum was added to the medium. 1 g of labeled algal mixture contained the following amino acids: ASX: 8.8%, THR: 3.2%, SER: 4.6%, GLU: 8.2%, PRO: 4.0%, GLY: 8.3%, ALA: 11.9%, VAL: 7.0%, MET: 2.0%, ILE: 6.0%, LEU: 12.2%, TYR: 3.9%, PHE: 5.4%, HIS: 1.2%, LYS: 5.8%, TRP: 0.0%, ARG: 5.9%, CYS: 0.0%.
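As a quick bookkeeping aid, the composition listed above can be turned into the amount of each labeled amino acid supplied per liter of medium. The sketch below is a hypothetical helper, not part of the published protocol; it assumes the percentages are weight percent of the mixture, which the text does not state explicitly.

```python
# Composition of the labeled algal mixture as listed above (percent).
ALGAL_MIX_PERCENT = {
    "ASX": 8.8, "THR": 3.2, "SER": 4.6, "GLU": 8.2, "PRO": 4.0, "GLY": 8.3,
    "ALA": 11.9, "VAL": 7.0, "MET": 2.0, "ILE": 6.0, "LEU": 12.2, "TYR": 3.9,
    "PHE": 5.4, "HIS": 1.2, "LYS": 5.8, "TRP": 0.0, "ARG": 5.9, "CYS": 0.0,
}


def labeled_mg_per_liter(mix_grams_per_liter=1.0):
    """Labeled amino acid per liter of medium, assuming weight percent."""
    return {
        amino_acid: percent / 100.0 * mix_grams_per_liter * 1000.0
        for amino_acid, percent in ALGAL_MIX_PERCENT.items()
    }


def missing_from_mix():
    """Amino acids listed at 0.0%, which must be supplemented separately."""
    return sorted(aa for aa, percent in ALGAL_MIX_PERCENT.items() if percent == 0.0)


total_percent = sum(ALGAL_MIX_PERCENT.values())
per_liter = labeled_mg_per_liter()
```

The listed percentages sum to slightly less than 100%, and the two entries at 0.0% (Trp and Cys) are the amino acids that the protocol adds in unlabeled form.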
13C, 15N labeling of eukaryotic cells
A431 cells were cultured in the labeled medium described above on Corning cell culture dishes (150 mm × 25 mm). Cells cultured in the first week (2-3 passages) in the labeled medium were not used to prepare the samples to ensure full incorporation of labeled substance in the cells. Once the plates were 80%–90% confluent, cells were incubated with PBS containing 2 mM EGTA at 37 °C for 15 min, after which they were scraped. Subsequently, cells were spun at 500 × g for 10 min at 4 °C. The cell pellet was resuspended in PBS and spun again at 500 × g for 10 min at 4 °C and used to prepare the membrane vesicles as described below. Approximately 20 plates (150 mm × 25 mm) were used to fill a 3.2 mm rotor with [13C, 15N] labeled A431 membrane vesicles.
Digestion of EGFR by Proteinase K enzyme

Freshly prepared A431 membrane vesicles were incubated with 200 µg/mL of Proteinase K for 15 min on ice. Proteinase K was
inactivated by adding 2 nM PMSF at a 1:1 ratio to the digestion mixture. The proteins were separated by SDS-PAGE and blotted onto a
PVDF membrane. The membrane was incubated overnight with R-α-EGFR (C74B9, Cell Signaling Technology) and M-α-Actin. The
protein quantities were determined by incubation with G-α-R800 (Li-Cor) and G-α-mouse700 (Li-Cor), followed by detection on
the Odyssey imaging system (Li-Cor).
Membrane vesicle preparation

Cells were resuspended in homogenization buffer (10 mM Tris pH 7.4, 250 mM sucrose, 1 mM EDTA). Phosphatase inhibitors
(100 mM sodium orthovanadate) and protease inhibitors (Complete, Roche) were added freshly, and cells were vesiculated by passing
them 10 times through a syringe (21 G × 1.5; 0.2 × 40 mm). Subsequently, cells were spun at 1,000 × g at 4°C for 10 min to remove
unbroken cells, nuclei, and cell debris from the supernatant. This was repeated until no pellet was visible. The supernatant
was subsequently spun at 150,000 × g for 30 min at 4°C to collect membrane vesicles. Vesicles were resuspended in 10 mM HEPES
pH 7.4, supplemented with protease and phosphatase inhibitors.
Preparation of [13C, 15N] A431 vesicles with EGF for NMR

Isolated A431 membrane vesicles were spun down at 124,000 × g for 25 min at 4°C, and the pellet was resuspended in phosphorylation
buffer (20 mM HEPES pH 7, 10 mM MgCl2, 3 mM MnCl2, 1 mM DTT, with protease and phosphatase inhibitors). To this buffer,
1 mM ATP was added. The vesicles were incubated with 8 nM EGF at 37°C for 10 min. Subsequently, the vesicles were washed three
times with 10 mM HEPES pH 7.4 buffer containing protease and phosphatase inhibitors. For DNP samples, the sample was washed
once with 10 mM HEPES pH 7.4 (containing protease and phosphatase inhibitors) and then twice with DNP buffer: 20 µl AMUPol5
(in D2O), 20 µl H2O, 20 µl D2O, 40 mg glycerol-d8 (equivalent to 30 µl), and 10 µl 100 mM HEPES. For each washing step, 50 µl of the
buffer was used.
Nanobody labeling and dSTORM and gSTED imaging

Vesicles derived from A431 cells were labeled in suspension with 10 nM anti-EGFR (7D12) nanobodies conjugated to Alexa647, 8 nM
EGF conjugated to Alexa488, and 10 µM DiI (1,1′-Dioctadecyl-3,3,3′,3′-Tetramethylindocarbocyanine Perchlorate) for 1 hr at 4°C.
Unbound ligands were removed by centrifugation at 75,000 × g for 40 min, and vesicles were attached to glass slides for microscopic
analysis.
dSTORM microscopy was performed on a Nikon Ti microscope equipped with a 100x Apo TIRF oil objective (NA. 1.49), a Per-
fect Focus System and an additional 2.5x Optovar to achieve an effective pixel size of 64 nm (Mikhaylova et al., 2015). Evanescent
laser illumination was achieved using a custom illumination pathway with a 15 mW 405 nm diode laser (Power Technology) and a
40 mW 640 nm diode laser (Power Technology). Fluorescence was detected using a water-cooled Andor DU-897D EMCCD camera
and an ET series Cy5 filter (Chroma Technology). All components were controlled by µManager software (Edelstein et al., 2010).
For dSTORM imaging of Alexa Fluor 647, the sample was continuously illuminated with 640 nm. In addition, the sample was illu-
minated with 405 nm light at increasing intensity to keep the number of fluorophores in the fluorescent state constant. Between
5000 and 10000 frames were recorded per acquisition with exposure time of 40 ms. Purified vesicles from A431 (EGFR-positive)
and 3T3 fibroblasts (EGFR-negative) were incubated with 10 nM of anti-EGFR or non-specific nanobodies fused to Alexa647 for
1 hr at 4°C. Four flow chambers with an approximate volume of 5 µL each were made with strips of double-sided tape between
a plasma-cleaned 22 × 22 mm coverslip and the microscope slide. These chambers were filled with four consecutive 10-fold
dilutions of labeled vesicles and incubated for 5 min at RT. Each chamber was washed with 25 µL of imaging buffer to remove
non-attached vesicles and sealed using vacuum grease. The composition of the imaging buffer was 100 mM MEA, 5% w/v
glucose, 700 µg/ml glucose oxidase, and 40 µg/ml catalase in PBS buffer. The dilution containing the optimal density of vesicles
on the coverslip was chosen for imaging (20–30 vesicles per 100 µm²). The signals from 3T3 fibroblast vesicles labeled
with anti-EGFR nanobody and A431 vesicles labeled with non-specific nanobody were not distinguishable from the noise. Analysis
of the dSTORM localization and rendering was performed using a custom written ImageJ plugin for the single-molecule localiza-
tion (https://github.com/ekatrukha/DoM_Utrecht). Each spot was fitted with asymmetric two-dimensional Gaussian PSF and only
fits with a calculated width within ± 30% of the measured PSF’s SD were accepted. Localizations within one pixel distance in a
number of successive frames were considered to arise from the same molecule. In this case the weighted mean was calculated for
each coordinate, where weights were equal to inverse squared localization errors. The resulting table with molecule coordinates
was used to render the final localization image with 5 nm pixel size. Each molecule was plotted as a 2D Gaussian of the integrated
intensity equal to one and with SD in x and y equal to the localization precision. The noise arising from non-specific localization
was suppressed using local density based filtering. Only particles having more than 150 neighbors in the circle of 250 nm radius
were kept in the reconstruction.
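The two post-processing steps described above (merging repeated localizations by an inverse-variance weighted mean, then local-density filtering) can be illustrated as follows. This is a minimal sketch, not the code of the DoM_Utrecht plugin: the function names, the tuple layout, and the brute-force neighbor search are assumptions made for illustration, with the 250 nm radius and 150-neighbor threshold taken from the text.

```python
import math


def merge_localizations(points):
    """Merge repeated localizations of one molecule.

    points: list of (x, y, sigma_x, sigma_y) in nm. Each coordinate is
    averaged with weights equal to the inverse squared localization error.
    """
    wx = [1.0 / sx ** 2 for _, _, sx, _ in points]
    wy = [1.0 / sy ** 2 for _, _, _, sy in points]
    x = sum(w * p[0] for w, p in zip(wx, points)) / sum(wx)
    y = sum(w * p[1] for w, p in zip(wy, points)) / sum(wy)
    return x, y


def density_filter(coords, radius_nm=250.0, min_neighbors=150):
    """Keep points with more than min_neighbors other points within radius_nm.

    Brute-force O(n^2) search; adequate only for small illustrative inputs.
    """
    kept = []
    for i, (xi, yi) in enumerate(coords):
        neighbors = sum(
            1
            for j, (xj, yj) in enumerate(coords)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius_nm
        )
        if neighbors > min_neighbors:
            kept.append((xi, yi))
    return kept


merged = merge_localizations([(100.0, 200.0, 10.0, 10.0), (110.0, 200.0, 20.0, 20.0)])
cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0), (10000.0, 10000.0)]
filtered = density_filter(cluster, radius_nm=250.0, min_neighbors=3)
```

In the usage lines, the more precise localization (10 nm error) dominates the merged position, and the isolated point far from the cluster is discarded by the density filter. The lowered threshold of 3 neighbors is used only to keep the toy example small.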
Gated STED imaging of A431 vesicles was performed with a Leica TCS SP8 STED 3X microscope using HC PL APO 100x/1.4 oil
STED WHITE objective. Alexa488 EGF was excited with the 488 nm line of a pulsed white-light laser (80 MHz) and depleted with a CW
592 nm STED laser. DiI was excited with 561 nm line and depleted with CW 660 nm line. Alexa647 conjugated anti-EGFR nanobody
was imaged with 633 nm excitation with white laser and depleted with 775 nm pulsed laser. Images were acquired in 2D STED mode
with vortex phase mask. Depletion laser power was equal to 10%–30% of maximum power and we used an internal Leica GaAsP
HyD hybrid detector with a time gate of 0.3 ≤ tg ≤ 6 ns. Confocal three-color imaging was performed on the same setup using
the same white laser excitation and emission settings from LAS X controlling software library.
Cryo-electron microscopy
For the preparation of thin vitrified specimens of the A431 vesicles, a 3 µl drop of sample was placed on the surface of a glow-discharged
Quantifoil micromachined holey carbon (R 2/2) TEM grid (Quantifoil Micro Tools GmbH, Jena, Germany) held by
the Vitrobot Mark IV tweezer (FEI, Eindhoven, the Netherlands). Before introducing the sample into the Vitrobot, the environmental
chamber of the Vitrobot was equilibrated at room temperature (22°C) and humidity was set at 100%. Blotting conditions were chosen
so that a 10–500 nm liquid specimen film spanning the R 2/2 µm holes of the Quantifoil grid was formed when excess sample was removed
by the blotting filter paper in the Vitrobot. The specimen was released and fell through the opening shutter into liquid ethane at
its freezing point, where the thin specimen films were vitrified. The vitreous specimen was transferred under liquid nitrogen into a
Gatan 626 single-tilt liquid nitrogen cryo holder (Gatan GmbH, Munich, Germany) and into a Tecnai20 LaB6 electron microscope
(FEI, Eindhoven, the Netherlands), where the specimen temperature was maintained below −165°C. An Eagle 4k × 4k CCD camera
(FEI, Eindhoven, the Netherlands) was used under normal and low-dose conditions to record micrographs of the vesicles
in TIFF format with a nominal underfocus of 3 µm. Vesicle diameter was measured using the IMOD software package
(Kremer et al., 1996).
Mass spectrometry
A431 vesicles and cells were lysed in 50 mM ammonium bicarbonate, 1% SDC, 10 mM TCEP, 100 mM Tris, 40 mM chloroacetamide,
and Complete protease inhibitor cocktail (Roche) and boiled for 5 min at 95°C. The supernatant was diluted 10-fold and digested
overnight using LysC (1:75) and trypsin (1:50). SDC was removed by acidifying the samples with formic acid and spinning down. The
supernatant was desalted using C18 SepPak (Waters) cartridges, vacuum-dried, and stored at −80°C for further analysis. Peptide
mixtures were reconstituted in 10% formic acid, and 1 µg of protein digest of each sample was analyzed by nano-LC-MS/MS on
an Orbitrap Q-Exactive Plus (ThermoFisher Scientific, Bremen). The digest was trapped on an in-house-made trap column (Reprosil
pur C18, Dr. Maisch, 100 µm × 2 cm, 3 µm) by loading for 10 min with A (A: 0.1% formic acid) and separated on an analytical column
(Poroshell 120 EC C18, Agilent Technologies, 50 µm × 50 cm, 2.7 µm) using a 2 hr linear gradient from 13% to 40% B (B: 0.1% formic
acid, 80% ACN). During each scan cycle, the 10 most intense peptide precursors were selected for higher-energy collisional dissociation
(HCD). Raw data files were processed with MaxQuant version 1.5.3.30. The data were searched against the Human UniProt
database (February 2016, 151,869 entries). The false discovery rate was set to 1% at the protein and peptide levels. Peptide intensities were
normalized to total peptide intensities in each LC-MS run. For relative quantification, intensities of all unique and razor peptides of a
protein were summed (Cox and Mann, 2008).
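The normalization and summation described above can be sketched in a few lines. This is an illustration under stated assumptions, not MaxQuant's implementation: each peptide is assumed to be already assigned to a single protein (as unique or razor peptide), and the function names and the tuple layout are hypothetical.

```python
from collections import defaultdict


def normalize_run(peptides):
    """Divide every peptide intensity by the total peptide intensity of the run.

    peptides: list of (protein_id, peptide_sequence, intensity) for one LC-MS run.
    """
    total = sum(intensity for _, _, intensity in peptides)
    return [(prot, pep, intensity / total) for prot, pep, intensity in peptides]


def protein_intensities(peptides):
    """Sum the (normalized) intensities of all peptides assigned to each protein."""
    sums = defaultdict(float)
    for prot, _, intensity in peptides:
        sums[prot] += intensity
    return dict(sums)


# Toy run with made-up intensities, for illustration only.
run = [("EGFR", "AAA", 300.0), ("EGFR", "BBB", 100.0), ("ACTB", "CCC", 600.0)]
relative = protein_intensities(normalize_run(run))
```

Because each run is scaled by its own total, the resulting protein values are fractions of the run's total peptide signal and can be compared between vesicle and whole-cell samples.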
Heatmaps of conformation-dependent amino acid distributions in EGFR

The available structures were first split according to individual domains (DI-IV, TM, KD, and CT). Domain segments that were elusive
from the available structures were built in random-coil conformation using PyMol. Secondary structure assignments were made for
these structures by supplying the .pdb files to the software: STRuctural IDEntification (STRIDE, Frishman and Argos, 1995), which
uses the phi and psi angles of the residues to assign secondary structures. The secondary structure assignments were sorted
into three categories based on their similarity: α-helix (simplified from α-helix and 3₁₀ helix), β sheet (simplified from strand and bridge),
and random coil (for turn and coil). These data were subsequently used to calculate the amino acid distribution per secondary struc-
ture over the whole protein.
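The grouping of STRIDE assignments into three categories and the per-residue counting can be sketched as below. The one-letter codes follow STRIDE's usual output (H, G, E, B/b, T, C); the input format, the function name, and the handling of any other code (for example the π-helix code I, which the text does not mention) as "other" are assumptions made for illustration.

```python
from collections import Counter

# Mapping of STRIDE one-letter codes to the three simplified categories.
CATEGORY = {
    "H": "alpha-helix",   # alpha helix
    "G": "alpha-helix",   # 3-10 helix
    "E": "beta-sheet",    # strand
    "B": "beta-sheet",    # isolated bridge
    "b": "beta-sheet",    # isolated bridge (alternative code)
    "T": "random coil",   # turn
    "C": "random coil",   # coil
}


def residue_distribution(assignments):
    """Count residues per (amino acid, simplified secondary structure).

    assignments: iterable of (three_letter_residue, stride_code) pairs.
    Codes not covered by the mapping are counted under "other".
    """
    counts = Counter()
    for residue, code in assignments:
        counts[(residue, CATEGORY.get(code, "other"))] += 1
    return counts


example = [("ALA", "H"), ("ALA", "G"), ("SER", "E"), ("PRO", "T"), ("THR", "C"), ("SER", "B")]
counts = residue_distribution(example)
```

A table of such counts for Ala, Pro, Ser, and Thr is the kind of input a heatmap per secondary-structure element would be drawn from.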

Solid-state NMR and DNP experiments
NMR experiments were conducted using a standard-bore 700 MHz system as well as wide-bore 800 MHz/527 GHz DNP and 400 MHz/263
GHz DNP systems (Bruker Biospin). Fully [13C, 15N]- or MFTL-labeled A431 membrane vesicles obtained from around 20
(150 mm × 25 mm) plates were packed into standard 3.2 mm rotors. For all DNP measurements, samples were cooled to 100 K using
3.2 mm sapphire rotors. DNP samples were prepared using AMUPol (Sauvée et al., 2013) and buffers as described above. The
DNP enhancement was measured by overlaying HC and HN CP/MAS spectra recorded with and without microwave irradiation.
Two- and three-dimensional NC correlation spectra were recorded using SPECIFIC-CP N-13C transfers (Baldus et al., 1998). Homonuclear
(13C,13C) transfers were established using PARIS (Weingarth et al., 2009) or spin-diffusion mixing blocks. 1H decoupling using
SPINAL64 was employed during evolution and detection periods, except in HC HETCOR (Figure S4), where GARP (Shaka et al., 1985)
decoupling was employed at 10 kHz. The processing parameters for the NMR experiments displayed in the main or supplemental
figures are given below.
Processing parameters of DQSQ in Figure 4 A-F

| Field strength | CP time [ms] | SPC5 mixing time [ms] | Temperature [K] | MAS [kHz] |
| --- | --- | --- | --- | --- |
| 700 MHz | 500 | 2.3 | 285 | 9 |

The DQSQ CC 2D datasets (with and without EGF) in Figure 4 were acquired using 110 t1 points with a spectral window of
46656.176 Hz in t1. The spectra were processed using an EM function with a line broadening of 100 Hz in t2 and t1 and with 2k and 1k
zero filling in t2 and t1, respectively, with 8 linear prediction coefficients in t1. Note that the experiment with EGF was multiplied by
a factor of 1.4 to compensate for the sample amount compared to the sample without EGF.
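To make the processing terms concrete, the sketch below applies an exponential-multiplication (EM) window and zero filling to the indirect (t1) dimension, using the 110 points, 46656.176 Hz spectral window, 100 Hz line broadening, and 1k size quoted above. It assumes the common convention exp(−π·LB·t) for the EM window and a dwell time of 1/SW; linear prediction and the Fourier transform are left out, and the function names are illustrative rather than those of any NMR software.

```python
import math


def em_window(n_points, spectral_width_hz, lb_hz):
    """Exponential-multiplication weights exp(-pi * LB * t) for each time point.

    Assumes a dwell time of 1 / spectral_width_hz between points.
    """
    dwell = 1.0 / spectral_width_hz
    return [math.exp(-math.pi * lb_hz * k * dwell) for k in range(n_points)]


def apodize_and_zero_fill(fid, spectral_width_hz, lb_hz, size):
    """Apply the EM window to a time-domain signal and pad it with zeros to `size`."""
    window = em_window(len(fid), spectral_width_hz, lb_hz)
    weighted = [s * w for s, w in zip(fid, window)]
    return weighted + [0.0] * (size - len(weighted))


# Toy t1 signal of constant amplitude, used only to show the effect of the window.
fid_t1 = [1.0] * 110
weights = em_window(110, 46656.176, 100.0)
processed = apodize_and_zero_fill(fid_t1, 46656.176, 100.0, 1024)
```

With these parameters the window decays to roughly half of its initial value by the last acquired t1 point, which trades resolution for sensitivity before the zero-filled signal is Fourier transformed.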
Processing parameters of NCOCX data in Figure 5B, D and E

| Spectrum | HN CP time [ms] | SPECIFIC CP mixing [ms] | PDSD mixing time [ms] | Temperature [K] | MAS [kHz] | Total acquisition time (days) |
| --- | --- | --- | --- | --- | --- | --- |
| NCOCX 400 MHz DNP, with EGF | 350 | 4.6 | 20 | 100 | 8 | 3.5 |
| NCOCX 400 MHz DNP, no EGF | 200 | 4.8 | 20 | 100 | 8 | 3.5 |

The NCOCX experiments in Figure 5 (400 MHz DNP) were acquired using 8 points in t1 and 13 points in t2 with spectral widths of
3012.048 Hz and 1620.745 Hz in t2 and t1, respectively. The spectra were processed using a squared sine bell function (SSB = 3) in t3, t2, and t1
with 4k zero filling in t3. In t2 and t1, 128 points of zero filling were used.
Processing parameters of 15N-edited CC spectrum in Figure 5C

| Spectrum | HN CP time [ms] | SPECIFIC CP mixing [ms] | PDSD mixing time [ms] | Temperature [K] | MAS [kHz] | Total acquisition time (days) |
| --- | --- | --- | --- | --- | --- | --- |
| N-edited CC 400 MHz DNP, with EGF | 350 | 4.6 | 20 | 100 | 8 | 2.5 |
| N-edited CC 400 MHz DNP, no EGF | 350 | 4.8 | 20 | 100 | 8 | 2.5 |

The 15N-edited CC experiments in Figure 5C were acquired using 50 points in t1 with a spectral width of 12569.131 Hz in t1. The
spectra were processed using a squared sine bell function (SSB = 2) in t2 and t1 with 2k zero filling in t2. In t1, 128 points of zero filling
were used.
Processing parameters of DQSQ data in Figure S1 C

| Field strength | CP time [ms] | SPC5 mixing time [ms] | Temperature [K] | MAS [kHz] |
| --- | --- | --- | --- | --- |
| 700 MHz | 500 | 2.3 | 253 | 9 |

The DQSQ CC 2D datasets in Figure S1C were acquired using 110 t1 points with a spectral window of 46656.176 Hz in t1. The
spectrum was processed using an EM function with a line broadening of 100 Hz in t2 and t1 and with 2k and 1k zero filling in t2 and t1, respectively,
with 8 linear prediction coefficients in t1.

Processing parameters of CC PARIS in Figure S2
| Field strength | CP time [ms] | PARIS mixing time [ms] | Temperature [K] | MAS [kHz] |
| --- | --- | --- | --- | --- |
| 700 MHz | 500 | 30 | 253 | 9 |

The PARIS CC 2D dataset in Figure S2 was acquired using 221 t1 points with a spectral window of 36982.246 Hz in t1. The spectrum
was processed using an EM function with a line broadening of 100 Hz in both t1 and t2. Zero filling of 1k and 2k was used in t1 and t2, respectively,
with 4 linear prediction coefficients in t1.
Processing parameters of HC HETCOR in Figure S4

| Field strength | Temperature [K] | MAS [kHz] |
| --- | --- | --- |
| 700 MHz | 285 | 9 |

The HC HETCOR 2D dataset in Figure S4 was acquired using 92 t1 points with a spectral window of 7142.86 Hz in t1. The spectrum
was processed using a squared sine function (SSB = 2) in both t1 and t2. Zero filling of 1k and 2k was used in t1 and t2, respectively,
with 40 linear prediction coefficients in t1.
Processing parameters of NCOCX data in Figure S6B

| Spectrum | HN CP time [ms] | SPECIFIC CP time [ms] | PDSD mixing [ms] | Temperature [K] | MAS [kHz] |
| --- | --- | --- | --- | --- | --- |
| NCOCX 800 MHz DNP, with EGF | 400 | 3.6 | 20 | 100 | 8 |
| NCOCX 800 MHz DNP, no EGF | 350 | 4.0 | 20 | 100 | 8 |

The NCOCX experiments in Figure S6B (800 MHz DNP) were acquired using 15 t1 points with a spectral width of 3333.33 Hz in t1.
The spectra were processed using a squared sine function (SSB = 2.5) in both t1 and t2 with 4k and 1k zero-filling points in t2 and t1, respectively.
Quantification of EGFR extracellular domain cleavage

The intensities of EGFR- and actin-positive bands on western blots were quantified using the Odyssey Application software. The total
amount of EGFR was corrected for the loading control. The percentage of EGFR extracellular domain cleavage was calculated as the
EGFR degradation product relative to the sum of full-length EGFR and the EGFR degradation product.
Fluorescence intensity analysis of A431 membrane vesicles

Integrated fluorescence intensity without background was measured similarly to the method described previously
(Hoffman et al., 2001) using the custom-written ImageJ plugin DoM_Utrecht (https://github.com/ekatrukha/DoM_Utrecht). We counted the
raw integrated intensity I_R of a square 13 × 13 pixel region of area S_R that was centered on the maximum-intensity pixel of a fluorescent
spot. The raw integrated intensity of the background I_B was equal to the integrated counts of a 14 × 14 pixel region of area S_B minus
I_R. The final integrated fluorescence intensity (without background) I_F was equal to:

$$I_F = I_R - I_B \frac{S_R}{S_B}$$
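The background correction I_F = I_R − I_B · S_R / S_B can be written as a one-line function. The sketch below is illustrative only: the function name and the example numbers are made up, and it assumes that S_B is the area over which the background counts I_B were integrated, which the text leaves somewhat open.

```python
def background_corrected_intensity(i_r, i_b, s_r, s_b):
    """Return I_F = I_R - I_B * S_R / S_B.

    i_r: raw integrated intensity of the spot region
    i_b: raw integrated intensity of the background region
    s_r: area of the spot region (pixels)
    s_b: area over which i_b was integrated (pixels); an assumption, see lead-in
    """
    return i_r - i_b * s_r / s_b


# Made-up counts: a 13 x 13 spot region (169 pixels) and a 27-pixel background frame.
i_f = background_corrected_intensity(i_r=1000.0, i_b=54.0, s_r=169.0, s_b=27.0)
```

The ratio S_R / S_B rescales the background counts to the area of the spot region, so a uniform background contributes nothing to I_F.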

Figure S1. Temperature Dependence of One- and Two-Dimensional ssNMR Experiments Using [13C, 15N]-Labeled A431 Plasma Membrane
Vesicles with and without EGF, Related to Figure 4
(A) 13C CP (cross polarization, which probes the rigid parts of the sample; Pines et al., 1973) experiment on [13C, 15N]-labeled A431 plasma membrane vesicles
without EGF at 253 K (blue) and 285 K (orange).
(B) INEPT-based experiment (see Morris and Freeman, 1979), which probes the mobile parts of the sample, on [13C, 15N]-labeled A431 plasma membrane vesicles
without EGF at 253 K (blue) and 285 K (orange).
(C) 2D (13C,13C) double-quantum/single-quantum (DQSQ) experiment with (red) and without (blue) EGF performed at 253 K.
(D) First increment of 2D NCα of [13C, 15N]-labeled A431 plasma membrane vesicles without EGF (blue at 253 K and orange at 285 K) and with EGF (red at 253 K
and green at 285 K).
Figure S2. Comparison of ssNMR Spectra of [13C, 15N]-Labeled A431 Membrane Vesicles at Low Temperatures to EGFR Chemical-Shift
Predictions, Related to Figure 4
The 2D (13C,13C) PARIS experiment was performed at 253 K. Black crosses represent FANDAS (Gradmann et al., 2012) predictions for EGFR based on the
different available structures and assuming random-coil chemical shifts for the C-terminal region (CT). Note that the peaks at 70 ppm stem from lipids.
As mentioned in the section Materials and Methods, EGFR samples were prepared using unlabeled glutamine, tryptophan, and cysteine; these residues were,
correspondingly, not included in the FANDAS correlation map. FANDAS predictions were made based on the following structures: 1NQL (extracellular
inactive), 2M20 (transmembrane domain), 2M20 (juxtamembrane), 1M14 (kinase domain), 1M14 (part of the C-terminal tail).
Figure S3. ssNMR Signal Patterns for Extended Measurement Periods, Related to Figure 4
1D 13C CP and INEPT on [13C, 15N]-labeled A431 vesicles with and without EGF performed during the course of 2D experiments. At the end of measurements (day
16), both samples showed the same profile as in the beginning of the measurements. Data were recorded on a 700 MHz NMR instrument.
Figure S4. Mobile Molecules Appear at Higher Temperature in 2D ssNMR Data, Related to Figure 4
2D INEPT experiment (see Andronesi et al., 2005) on [13C, 15N]-labeled A431 membrane vesicles without EGF performed at 285 K, showing mobile molecular
components.
Figure S5. Secondary-Structure Analysis of EGFR, Actin, and EGFR Domains, Related to Figures 4 and 5
(A) Comparison of the distribution of Ser, Thr, Pro, and Ala residues in different secondary structures between EGFR (red) and Actin (blue). The y axis represents the
number of each amino acid in the corresponding secondary structure.
(B) Heatmaps of the distribution of Ala, Pro, Ser, and Thr residues in EGFR for the three secondary structure elements (α-helix, β strand, and random coil). Red and
green stand for the highest and lowest numbers of occurrence, respectively.
Figure S6. Sequential Correlations Predicted for Actin in the MFTL-Labeled A431 Membrane Vesicles and High-Field DNP Data, Related to
Figure 5
(A) The three expected correlations of Actin in the MFTL-labeled A431 membrane vesicles are highlighted.
(B) 2D NCOCX of MFTL-labeled A431 vesicles with (cyan) and without (orange) EGF performed on an 800 MHz DNP machine (Koers et al., 2014). Dotted lines
connect the Cβ region of Phe in both spectra.
Article
The C. elegans Taste Receptor Homolog LITE-1 Is a Photoreceptor
Jianke Gong, Yiyuan Yuan, Alex Ward, ...,
Zhaoyang Feng, Jianfeng Liu,
X.Z. Shawn Xu
Correspondence
jfliu@mail.hust.edu.cn (J.L.),
shawnxu@umich.edu (X.Z.S.X.)
In Brief
A taste receptor homolog absorbs UV
light and mediates avoidance behavior in
C. elegans in response to light exposure.
Highlights
• LITE-1, a taste receptor homolog, is a bona fide photoreceptor that senses UV light
• LITE-1 has a high efficiency of photon capturing
• Photoabsorption by LITE-1 relies on its conformation and requires two Trp residues
• Introducing such a Trp residue into a related protein promotes photosensitivity
Gong et al., 2016, Cell 167, 1252–1263

Article
The C. elegans Taste Receptor Homolog LITE-1 Is a Photoreceptor
Jianke Gong,1,2 Yiyuan Yuan,2,4 Alex Ward,2 Lijun Kang,2,5 Bi Zhang,1,2 Zhiping Wu,3 Junmin Peng,3 Zhaoyang Feng,4
Jianfeng Liu,1,* and X.Z. Shawn Xu2,6,*
1College of Life Science and Technology, Collaborative Innovation Center for Brain Science, and Key Laboratory of Molecular Biophysics
of MOE, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
2Life Sciences Institute and Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, MI 48109, USA
3Departments of Structural Biology and Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
4Department of Pharmacology, Case Western Reserve University, Cleveland, OH 44106, USA
5Present address: Institute of Neuroscience, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
6Lead Contact
*Correspondence: jfliu@mail.hust.edu.cn (J.L.), shawnxu@umich.edu (X.Z.S.X.)

Cell 167, 1252–1263, November 17, 2016 © 2016 Elsevier Inc.

SUMMARY

Many animal tissues/cells are photosensitive, yet only two types of photoreceptors (i.e., opsins and cryptochromes) have been discovered in metazoans. The question arises as to whether unknown types of photoreceptors exist in the animal kingdom. LITE-1, a seven-transmembrane gustatory receptor (GR) homolog, mediates UV-light-induced avoidance behavior in C. elegans. However, it is not known whether LITE-1 functions as a chemoreceptor or photoreceptor. Here, we show that LITE-1 directly absorbs both UVA and UVB light with an extinction coefficient 10–100 times that of opsins and cryptochromes, indicating that LITE-1 is highly efficient in capturing photons. Unlike typical photoreceptors employing a prosthetic chromophore to capture photons, LITE-1 strictly depends on its protein conformation for photon absorption. We have further identified two tryptophan residues critical for LITE-1 function. Interestingly, unlike GPCRs, LITE-1 adopts a reversed membrane topology. Thus, LITE-1, a taste receptor homolog, represents a distinct type of photoreceptor in the animal kingdom.

INTRODUCTION

Light sensation is critical for all phyla of life, ranging from bacteria to humans (Wang and Montell, 2007; Yau and Hardie, 2009). Organisms have evolved various types of photoreceptor proteins (hereinafter referred to as photoreceptors) to detect light (Falciatore and Bowler, 2005; Wang and Montell, 2007; Yau and Hardie, 2009). These photoreceptors show different spectral properties, with some sensing blue and others detecting green and red, covering a wide spectrum of light (Falciatore and Bowler, 2005; Wang and Montell, 2007; Yau and Hardie, 2009). Photoreceptors are typically composed of two moieties: a host protein and a prosthetic chromophore (e.g., retinal), the latter of which is responsible for light absorption (Wang and Montell, 2007; Yau and Hardie, 2009). In addition to image-forming photoreceptor cells in the retina, a growing list of non-image-forming photosensitive cells/tissues has been identified in a wide range of animal species (Wang and Montell, 2007; Yau and Hardie, 2009). For example, a subset of ganglion and horizontal cells in the vertebrate retina are photosensitive (Yau and Hardie, 2009). Photosensitive cells are also found in the skin (e.g., keratinocytes and melanocytes) of mammals, the pupil of most vertebrates, the pineal of non-mammalian vertebrates, the hypothalamus of birds, and the body surface of insects (Bellono et al., 2013; Foster and Soni, 1998; Moore et al., 2013; Xiang et al., 2010; Yau and Hardie, 2009). However, in contrast to microbes and plants, which express many types of photoreceptors, only two such groups of proteins have been identified in the animal kingdom: opsins and cryptochromes (Wang and Montell, 2007; Yau and Hardie, 2009). The question thus arises as to whether unknown types of photoreceptors exist in metazoans.
The nematode C. elegans detects and responds to a wide variety of sensory cues such as mechanical forces (e.g., touch and stretch), chemicals (e.g., odorants and tastants), and temperature, representing a popular genetic model organism for the study of sensory perception (de Bono and Maricq, 2005). Despite the lack of eyes, C. elegans also responds to light (Edwards et al., 2008; Ward et al., 2008). Specifically, short wavelengths of light, particularly UV light, induce avoidance behavior (negative phototaxis) in C. elegans, which is mediated by a group of photosensory neurons, providing a protective mechanism for the worm to avoid lethal doses of UV in the sunlight (Liu et al., 2010; Ward et al., 2008). LITE-1, a member of the invertebrate seven-transmembrane (7-TM) gustatory receptor (GR) family, is required for UV-light-induced avoidance behavior (Edwards et al., 2008; Liu et al., 2010). Ectopic expression of LITE-1 can confer photo-sensitivity to photo-insensitive cells (Edwards et al., 2008; Liu et al., 2010). Despite such indirect evidence suggesting LITE-1 as a candidate photoreceptor, other possibilities remain. For example, unlike long wavelengths of light, UV illumination produces reactive oxygen species (ROS) such as H2O2, which in turn can evoke an avoidance behavioral response similar to that induced by UV light (Bhatla and Horvitz, 2015). Given that LITE-1 is a member of the gustatory receptor (GR) family, it has thus been suggested that LITE-1 may function
as a chemoreceptor (Yau and Hardie, 2009). In this case, LITE-1 would sense light-produced chemicals, but not light per se.
To address this conundrum, here we purified LITE-1 protein from worm lysate and found that it directly absorbs UVA and UVB light. This property of LITE-1, together with its capacity in producing light-evoked functional outputs in vivo, indicates that LITE-1 is a photoreceptor. LITE-1 bears a number of unique features that distinguish it from other photoreceptors. These include an exceptionally high efficiency in photoabsorption, an ability to sense both UVA and UVB light, a strict dependence on conformation for photoabsorption, a strong resistance to bleaching by UV light, and a reversed membrane topology compared to opsins. These results identify LITE-1, a taste receptor homolog, as a unique photoreceptor, with features not seen in any known photoreceptors. Thus, novel types of photoreceptors are present in the animal kingdom. Furthermore, we identified two tryptophan residues in LITE-1 that are critical for photoabsorption. Remarkably, introducing such a tryptophan residue into another GR family member promotes photosensitivity, opening up the intriguing prospect that it might be possible to genetically engineer new photoreceptors.

Figure 1. LITE-1 Adopts an Unusual Membrane Topology, with Its C Terminus Facing Extracellularly and N Terminus Located Intracellularly
(A) A schematic of LITE-1 membrane topology. Antibodies were raised against the N-terminal (αN-LITE-1) and C-terminal (αC-LITE-1) peptide (15 aa) of the LITE-1 long isoform.
(B) LITE-1 displays a distinct membrane topology, with its C terminus facing extracellularly and its N terminus located in the cytoplasm. Shown are confocal images from immunofluorescence staining. LITE-1 was co-expressed with GFP as a transgene in muscles under the myo-3 promoter. Staining was performed on primary cultured cells under non-permeabilizing conditions for surface staining or under permeabilizing conditions to stain the entire cell. αN-LITE-1 and αC-LITE-1 detect the N- and C-terminal end of LITE-1, respectively. αN-Myc stains the Myc tag fused to the N-terminal end of LITE-1. See Figure S1 for controls. Scale bar, 2 µm.
(C) BiFC images showing that the N terminus of LITE-1 is located in the cytoplasm. Shown on the left are schematics describing the design of the BiFC approach. Shown on the right are fluorescence images. N-YFP::ZIP::LITE-1 was expressed as a transgene in muscles using the myo-3 promoter. C-YFP::ZIP (or C-YFP::ΔZIP that lacks a zipper domain) and DsRed were co-expressed as a separate transgene in muscles using the same promoter. Two transgenes were crossed together to examine reconstitution of YFP fluorescence in muscles. Only if the N terminus of LITE-1 is located intracellularly would one be able to detect YFP fluorescence. Muscles were acutely dissected out from transgenic worms using a protocol described previously (Liu et al., 2013). Scale bar, 100 µm.
Also see Figure S1.

RESULTS

LITE-1 Adopts a Membrane Topology Opposite to Conventional 7-TM Receptors
As a first step, we considered whether LITE-1 is related to any known photoreceptors. LITE-1 is predicted to contain 7-TM domains (Figure 1A). The only known 7-TM photoreceptors in metazoans are opsins, but LITE-1 has no significant homology with opsins at the sequence level (Edwards et al., 2008; Liu et al., 2010). As both insect OR (olfactory receptors) and GR (gustatory receptors) members were shown to possess a membrane topology opposite to conventional 7-TM receptors (Benton et al., 2006; Zhang et al., 2011), we thus questioned whether LITE-1 and opsins are even related at the membrane topology level.

To probe the membrane topology of LITE-1, we raised antibodies against the N and C termini of LITE-1 (Figure 1A). Immunostaining with these antibodies did not reveal consistent LITE-1 expression in worm tissues (A.W. and X.Z.S.X., unpublished data), suggesting that LITE-1 is expressed at a very low level in vivo. We therefore generated transgenic animals expressing LITE-1 in muscle cells using a muscle-specific promoter, as LITE-1 can be functionally expressed in these cells at a higher level, though it remains possible that recombinant LITE-1 may not fully preserve all the functional properties of native proteins (Edwards et al., 2008; Liu et al., 2010) (also see below). We found that our LITE-1 antibodies can detect LITE-1 protein in primary cultured muscle cells (Figure 1B). Surprisingly, the C-terminal end of LITE-1 appears to be extracellular, as antibodies against LITE-1’s C terminus can detect LITE-1 when applied extracellularly under non-permeabilizing conditions (Figure 1B). This staining is specific for LITE-1 since no signal was observed in control muscle cells (Figure S1). By contrast, the same protocol failed to detect LITE-1 with antibodies against its N-terminal end, though the protein was clearly expressed in these cells, as shown under permeabilizing conditions (Figure 1B). To provide additional evidence, we fused a Myc tag to the N-terminal end of LITE-1 and obtained the same result (Figure 1B). This suggests that the N-terminal end of LITE-1 is intracellular.

To collect further evidence, we employed the BiFC (bimolecular fluorescence complementation) approach (Hu et al., 2002). In this approach, the N- and C-terminal fragments of YFP are each fused to a leucine zipper domain to generate N-YFP::ZIP and C-YFP::ZIP, respectively (Figure 1C). The zipper domains then bring the two YFP fragments together to reconstitute a fluorescent YFP protein (Figure 1C). We attached N-YFP::ZIP to the N terminus of LITE-1 and found that this N-YFP::ZIP::LITE-1 fusion complemented with C-YFP::ZIP to reconstitute YFP fluorescence in live muscle cells acutely dissected from the animal, but not with C-YFP::ΔZIP that lacked the zipper domain (Figure 1C). This observation further demonstrates that the N terminus of LITE-1 is located intracellularly. We conclude that LITE-1 adopts a reversed membrane topology compared to opsins. Thus, LITE-1 does not seem to be closely related to any known photoreceptors at the sequence or structural levels.

Purification of LITE-1 Protein from Worm Lysate
Is LITE-1 a photoreceptor? A lack of clear similarity to known photoreceptors does not necessarily disqualify LITE-1 as a photoreceptor. To address this question, a simple yet definitive approach is to examine whether purified LITE-1 protein can capture photons by spectrophotometry (Wang and Montell, 2007; Yau and Hardie, 2009). All known photoreceptors were verified by this approach (Figure 2I). To this end, we searched for an expression system that would allow us to purify a sufficient amount of LITE-1 protein for spectrophotometric analysis. Muscle cells thus came to our attention, as they constitute a major mass of worm tissues and have been successfully utilized as a heterologous system to functionally express receptors and channels (Salom et al., 2012; Wang et al., 2012). Importantly, it has been shown that LITE-1 can be functionally expressed in muscles, as its expression can confer photo-sensitivity to these otherwise photo-insensitive cells (Edwards et al., 2008; Liu et al., 2010), though it remains unclear whether such photosensitivity results from light or light-produced chemicals. Indeed, as previously reported (Edwards et al., 2008; Liu et al., 2010), UV light can induce the contraction of body-wall muscles ectopically expressing LITE-1, leading to body paralysis (Figures 2A and S2 and Movies S1 and S2). To provide more direct and quantitative evidence, we recorded the response of muscle cells to UV light by calcium imaging using the genetically encoded calcium sensor RCaMP. We found that UV illumination induced robust calcium transients in muscle cells ectopically expressing LITE-1, but not in control muscle cells (Figures 2B–2D). These experiments show that LITE-1 was functionally expressed in muscle cells. They also show that LITE-1 can indeed confer photo-sensitivity to photo-insensitive cells, demonstrating that it can be potentially used as an optogenetic tool.

Since our LITE-1 antibodies are not suitable for affinity purification, we then tested a number of monoclonal antibodies against small affinity tags such as Myc, FLAG, and 1D4 and found that 1D4 antibody worked most efficiently. Using this antibody, we were able to affinity-purify LITE-1, a membrane protein, to homogeneity, as determined by SDS-PAGE followed by Coomassie staining (Figure 2E) and by western blot (Figure 2F). This result was also verified by silver staining (data not shown).

Purified LITE-1 Protein Absorbs Photons
By subjecting purified LITE-1 protein to spectrophotometric analysis, we found that it exhibited strong absorption of UV light, with two absorbance peaks at 280 and 320 nm (Figure 2G). Thus, LITE-1 can capture both UVB and UVA light (WHO definition of UVB: 280–315 nm; UVA: 315–400 nm). As a comparison, at the same concentration (0.4 μM), BSA showed no such absorption (Figure 2G). In addition, bacterial rhodopsin (bRho), which is a commercial product obtained from Sigma Co., exhibited minimal absorption at its signature peak of 568 nm (Figure 2H). Only at 10× concentration (4 μM) were we able to detect modest light absorption in bacterial rhodopsin (bRho), which was still much weaker than that found in LITE-1 (Figure 2H). It should be noted that, though bRho exhibited weaker photoabsorption compared to LITE-1, its extinction coefficient (62,000 in Figure 2H versus 63,000 in Figure 2I), as well as its spectral properties, were both in line with those reported in the literature (Figures 2H and 2I), indicating that the quality of bRho samples was reliable.

The extinction coefficient of both absorbance peaks of LITE-1 is > 10⁶ M⁻¹cm⁻¹, which is 10–100 times that of all known photoreceptors (Figure 2I). Thus, LITE-1 has a high efficiency in capturing photons.

To make a more direct comparison, we purified bovine rhodopsin (Rho) ectopically expressed in worm muscles (Salom et al., 2012) and did so side by side with LITE-1 under the same conditions (Figures S3A and S3B). Compared to LITE-1, purified bovine rhodopsin (Rho) also showed much weaker photoabsorption at its signature peak (Figures S3C and S3D), providing additional evidence demonstrating that LITE-1 is highly efficient in capturing photons. The relatively weak photoabsorption by bovine rhodopsin (Rho) was not because our purified Rho samples were of low quality, as the extinction coefficient of the purified Rho was in fact very similar to that reported in the literature (Figure S3D versus Figure 2I). In addition, the signature

Figure 2. LITE-1 Absorbs UVA and UVB Light, and Ectopic Expression of LITE-1 Confers Photo-Sensitivity to Photo-Insensitive Cells
(A) Transgenic expression of LITE-1 in muscle cells confers photosensitivity shown by behavioral assays. LITE-1 was expressed as a transgene in muscle cells under the myo-3 promoter. WT (wild-type) and LITE-1 transgenic worms were exposed to a 20 s pulse of UVA light (350 ± 20 nm, 0.8 mW/mm²). Animals showing muscle contraction-induced paralysis during light illumination were scored positive. n = 20. Error bars represent SEM. ***p < 0.00001 (ANOVA with Bonferroni test).
(B–D) Transgenic expression of LITE-1 in muscle cells confers photosensitivity shown by calcium imaging. RCaMP was expressed as a transgene in muscle cells under the myo-3 promoter. YFP was expressed as a transgene under the same promoter to enable ratiometric imaging. A 5 s pulse of UVA light (340 ± 20 nm, 0.7 mW/mm²) was applied to muscles to elicit calcium transients. Shades along the traces in (B) and (C) represent SEM. (D) Bar graph. n ≥ 7. *p < 0.0001 (Student’s t test).
(E and F) Purification of LITE-1. Worm lysate, flow through, and purified LITE-1 were loaded. Shown in (E) is an SDS-PAGE gel stained with Coomassie blue. Shown in (F) is a western blot probed with anti-1D4 that recognizes the 1D4 tag attached to the C-terminal end of LITE-1. The amount of each sample loaded in (F) was one-tenth of that in (E). Samples for SDS-PAGE and western were prepared at room temperature under non-reducing conditions (free of β-ME and DTT) to avoid aggregation of LITE-1.
(G) LITE-1 shows strong absorption of UVA and UVB light, while BSA does not. The same concentration of purified LITE-1 and BSA (0.4 μM) was subjected to UV-visible spectrophotometric analysis. The extinction coefficient (ε) for both peaks of LITE-1 was noted (ε280 = 3.25 × 10⁶; ε320 = 1.75 × 10⁶). Unit: M⁻¹cm⁻¹. Note: these numbers only represent the LITE-1 sample shown here and those in Figure 3, as they were from the same batch of purification. See (I) for averaged data for LITE-1 from different batches of purification.
(H) Bacterial rhodopsin (bRho) shows much weaker absorption of light compared to LITE-1 (ε568 = 6.2 × 10⁴). The results from low (0.4 μM) and high (4 μM) concentrations of bRho were shown. bRho was purchased from Sigma.
(I) LITE-1 is far more efficient in photon absorption than cryptochromes and opsins. The extinction coefficients for LITE-1 were averaged from samples from seven independent purifications. “±” represents SEM. The numbers for cryptochromes and opsins were from published literature: cryptochrome (Thompson and Sancar, 2002), bacterial rhodopsin (Oesterhelt and Hess, 1973), rhodopsin (Okano et al., 1992), melanopsin (Matsuyama et al., 2012), UV opsin (Insinna et al., 2012), blue opsin (Vought et al., 1999), green opsin, and red opsin (Kolesnikov et al., 2014).

Photoreceptor         Wavelength (nm)   Extinction coefficient (M⁻¹cm⁻¹)   Species
LITE-1                280               3,760,000 ± 597,000                C. elegans
LITE-1                320               2,240,000 ± 696,000                C. elegans
Cryptochrome          450               36,000                             Drosophila
Bacterial rhodopsin   568               63,000                             Halobacterium
Rhodopsin             500               41,200                             Bovinae
Melanopsin            467               33,000                             Mus musculus
UV opsin              357               41,760                             Mus musculus
Blue opsin            425               39,400                             Xenopus laevis
Green opsin           510               40,800                             G. gallus
Red opsin             560               47,200                             G. gallus

Axes in (G) and (H): Absorbance (O.D.) versus Wavelength (nm), 250–700 nm.
See also Figures S2 and S3 and Movies S1 and S2.
absorbance peak of the purified Rho was 500 nm, which was identical to that published in the literature (Figure S3D versus Figure 2I). This set of control experiments also validated our experimental system, including protein expression, purification, concentration determination, and spectral analysis.

In another control experiment, we purified mammalian adenosine A2A receptor (A2AR) ectopically expressed in worm muscles (Salom et al., 2012) (Figures S3A and S3B). Like LITE-1 and opsins, A2AR is also a 7-TM receptor but is not expected to be photosensitive. Indeed, we found that, as predicted, this receptor did not absorb light when purified and tested side by side with LITE-1 and Rho (Figure S3C). Thus, multiple control experiments support that LITE-1 absorbs photons and does so at a high efficiency. This property of LITE-1, together with its capacity to produce various light-induced functional outputs (e.g., light-induced muscle contraction, calcium transients, and avoidance behavior; Figures 2A–2D and S2 and Movies S1 and S2), indicates that LITE-1 is a photoreceptor. LITE-1 is also the only photoreceptor that shows strong absorption of both UVA and UVB light.
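
The extinction coefficients compared in this section follow from the Beer–Lambert relation A = ε · c · l. As a minimal sketch, the snippet below back-calculates ε from an absorbance reading and expresses the averaged LITE-1 values from Figure 2I as a fold difference over the literature values in the same table. The 1 cm path length, the example absorbance of 1.3, and the function names are assumptions made for illustration; they are not details reported in the paper.

```python
# Beer-Lambert law: A = epsilon * c * l
# Assumption: a 1 cm cuvette path length (not stated in the text).

def extinction_coefficient(absorbance, conc_molar, path_cm=1.0):
    """Return the molar extinction coefficient in M^-1 cm^-1."""
    if conc_molar <= 0 or path_cm <= 0:
        raise ValueError("concentration and path length must be positive")
    return absorbance / (conc_molar * path_cm)

# Averaged LITE-1 values from Figure 2I (M^-1 cm^-1), keyed by wavelength in nm.
LITE1 = {280: 3_760_000, 320: 2_240_000}

# Literature values listed in Figure 2I (M^-1 cm^-1).
OTHERS = {
    "cryptochrome": 36_000,
    "bacterial rhodopsin": 63_000,
    "rhodopsin": 41_200,
    "melanopsin": 33_000,
    "UV opsin": 41_760,
    "blue opsin": 39_400,
    "green opsin": 40_800,
    "red opsin": 47_200,
}

def fold_over(lite1_eps, others=OTHERS):
    """Ratio of one LITE-1 extinction coefficient to each reference value."""
    return {name: lite1_eps / eps for name, eps in others.items()}

if __name__ == "__main__":
    # Hypothetical reading: absorbance 1.3 at 0.4 uM in a 1 cm cuvette.
    eps = extinction_coefficient(1.3, 0.4e-6)
    print(f"epsilon = {eps:.3g} M^-1 cm^-1")
    for wavelength, lite1_eps in LITE1.items():
        ratios = fold_over(lite1_eps)
        low, high = min(ratios.values()), max(ratios.values())
        print(f"{wavelength} nm: {low:.0f}x to {high:.0f}x the reference values")
```

Keeping the reference values in a dictionary makes it easy to rerun the comparison with single-batch numbers, such as those annotated in Figure 2G, instead of the averaged ones.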

Figure 3. Photoabsorption by LITE-1 Relies on Its Conformation
(A) Denaturing LITE-1 with urea abolishes its photoabsorption. Shown are spectral data for mock- and urea-treated LITE-1. LITE-1 was treated with urea (4 M) for 5 min at room temperature prior to spectral analysis.
(B) Denaturing bacterial rhodopsin (bRho) with urea does not eliminate its photoabsorption. Urea treatment shifts bRho’s 568 nm absorbance peak to 370 nm. bRho was treated with urea (4 M) for 5 min at room temperature prior to spectral analysis.
(C) Denaturing LITE-1 with NaOH abolishes its photoabsorption. LITE-1 was treated with NaOH (0.1 M) for 5 min at room temperature prior to spectral analysis.
(D) Denaturing bacterial rhodopsin (bRho) with NaOH does not eliminate its photoabsorption. NaOH treatment shifts bRho’s 568 nm absorbance peak to 370 nm. bRho was treated with NaOH (0.1 M) for 5 min at room temperature prior to spectral analysis.
LITE-1 concentration: 0.4 μM. bRho concentration: 4 μM.
Axes in all panels: Absorbance (O.D.) versus Wavelength (nm), 250–700 nm.
See also Figure S4.
LITE-1 Strictly Depends on Its Conformation for Photoabsorption
We next sought to characterize the photoabsorption of LITE-1. A photoreceptor is usually composed of two moieties: a host protein and a prosthetic chromophore (Falciatore and Bowler, 2005; Wang and Montell, 2007; Yau and Hardie, 2009). The spectral properties of a photoreceptor are certainly affected by the host protein. However, the absolute ability of a photoreceptor to absorb light does not rely on the host protein, as light absorption is mediated by the chromophore (e.g., retinal, flavin, bilin, and p-coumaric acid) (Falciatore and Bowler, 2005; Marti et al., 1991; Radding and Wald, 1956). Consequently, denaturing a photoreceptor usually shifts its absorbance peaks to different wavelengths but does not eliminate them, as they are mediated by the associated chromophore (Dutta et al., 2010; Hagins, 1973; Hubbard, 1969; Maglova et al., 1989). This, surprisingly, does not appear to be the case for LITE-1. Denaturing LITE-1 with urea abolished the light absorption by LITE-1, eliminating both the 280 and 320 nm peaks (Figure 3A). As a comparison, the same urea treatment failed to abolish the light absorption by bacterial rhodopsin (bRho) but instead shifted its absorbance peak from 568 nm to 370 nm (Figure 3B), the latter of which is the signature peak of free retinal, the chromophore of bRho (Sperling and Rafferty, 1969). A similar phenomenon was observed with our purified bovine rhodopsin (Rho) (Figures S3E and S3F). It is notable that the 280 nm peak of denatured bRho remained unchanged (Figure 3B), consistent with the notion that this peak was mediated by the intrinsic light absorption by tryptophan residues of the bRho protein. This peak was not that distinct in denatured LITE-1 in Figure 3A since the concentration of LITE-1 used was one-tenth that of bRho. We also treated LITE-1 using other denaturing agents such as NaOH and observed a similar phenomenon (Figures 3C and 3D). These observations demonstrate that, unlike typical photoreceptors, LITE-1 strictly depends on its conformation for photoabsorption.

We also tested H₂O₂. Interestingly, H₂O₂ treatment abolished LITE-1’s photoabsorption (Figure S4A). As an oxidizing agent, H₂O₂ can damage the function of proteins, lipids, and nucleic acids (Fridovich, 2013). Oxidization of LITE-1 may affect the conformation of LITE-1, which is required for its absorption of light. Similarly, H₂O₂ treatment also destroyed the spectral fingerprint of bRho by shifting its absorbance peak from 568 to 370 nm (Figure S4B). Thus, H₂O₂ appears to inhibit the photoabsorption of both LITE-1 and bRho in vitro. Nevertheless, as it is difficult to estimate the endogenous concentration of H₂O₂, whether and how H₂O₂ affects LITE-1 function in vivo remains to be determined.

Genetic Screens Identify Residues Critical for LITE-1 Function
To obtain a better understanding of LITE-1 photoabsorption, we attempted to identify residues critical for LITE-1 function. In a genetic screen for mutant animals defective in UV-light-induced avoidance behavior, we isolated several lite-1 mutant alleles (Liu et al., 2010). We hypothesized that mutations in transmembrane domains are more likely to affect the photoabsorption of LITE-1 rather than its coupling to downstream signaling molecules. Two mutants, lite-1(xu8) and lite-1(xu10), thus came to our attention, as the residues mutated (A332V and S226F, respectively) reside in putative transmembrane domains (Figure 7I). The objective was to purify these mutant forms of LITE-1 protein and then characterize their photoabsorption in vitro. We first tested their role in vivo and found that, as

Figure 4. Residues S226 and A332 in LITE-1 Are Critical for Its Sensitivity to UVA Light In Vivo
(A) S226F and A332V mutations disrupt the function of LITE-1 in vivo shown by behavioral assays. LITE-1 harboring S226F or A332V was expressed as a transgene in muscles under the myo-3 promoter. WT (wild-type) and transgenic worms were exposed to a 20 s pulse of UVA light (350 ± 20 nm, 0.8 mW/mm²). Animals showing muscle contraction-induced paralysis during light illumination were scored positive. Some genotypes had all data points as zero, and thus no statistical analysis was performed on them. n = 20. Error bars: SEM. ***p < 0.00001 (ANOVA with Bonferroni test).
(B–E) S226F and A332V mutations disrupt the function of LITE-1 in vivo, shown by calcium imaging. The experiments were done as described in Figure 2B. A 5 s pulse of UVA light (340 ± 20 nm, 0.7 mW/mm²) was applied to muscles to elicit calcium transients. Shades along the traces in (B–D) represent SEM. (E) Bar graph. n ≥ 7. ***p < 0.00001 (ANOVA with Bonferroni test).
(F and G) Purification of mutant forms of LITE-1. Shown in (F) is an SDS-PAGE gel stained with Coomassie blue. Shown in (G) is a western blot probed with anti-1D4 that recognizes the 1D4 tag attached to the C terminus of LITE-1 variants. The amount of each sample loaded in (G) was 1/10 of that in (F). Samples for SDS-PAGE and western were prepared at room temperature under non-reducing conditions (free of β-ME and DTT) to avoid aggregation of LITE-1.
See also Figure S5.
expected, A332V and S226F mutations disrupted LITE-1 function in vivo. Specifically, worms ectopically expressing LITE-1 harboring either mutation were no longer sensitive to UVA light in behavioral assays (Figures 4A and S5A). In addition, these two point mutations nearly abolished UVA-light-evoked calcium transients in muscle cells ectopically expressing LITE-1 (Figures 4B–4E). We successfully purified LITE-1A332V and LITE-1S226F proteins to homogeneity (Figures 4F and 4G). LITE-1A332V and LITE-1S226F displayed an absorbance spectrum distinct from wild-type LITE-1: they both lost the 320 nm peak but retained normal absorption at 280 nm (Figures 5A and 5B). Thus, the two mutations disrupted LITE-1’s absorption of UVA but not UVB light. This is consistent with the fact that our genetic screen was targeted for isolating mutants defective in responding to UVA but not UVB light, since the optical system of the microscope used to evoke and assay phototaxis behavior did not transmit UVB light (Liu et al., 2010).

Given that LITE-1A332V and LITE-1S226F proteins retained normal absorption of UVB light in vitro, one would predict that these two mutant forms of LITE-1 should preserve the sensitivity to UVB light in vivo. To test this idea, we set up an optical path through which UV light was directed to the worm directly. Indeed, though transgenic worms expressing these two mutant forms of LITE-1 were insensitive to UVA light (Figures 4A and S5A), they were nevertheless sensitive to UVB light (Figures 5C and S5B). In addition, as was the case with wild-type LITE-1, UVB light also induced robust calcium transients in muscle cells ectopically expressing these two mutant forms of LITE-1 (Figures 5D–5H). These results are in line with the data from spectral analysis (Figures 5A and 5B). Thus, it appears that the absorption of UVA and UVB light by LITE-1 can be separated, providing further evidence demonstrating the specificity of LITE-1 photoabsorption.

LITE-1 Absorption of UVB but Not UVA Light Shows Resistance to Photobleaching
Prolonged light illumination bleaches photoreceptors (Wang and Montell, 2007; Yau and Hardie, 2009). We tested this property of LITE-1 and found that pre-exposure to UV light can readily bleach LITE-1’s ability to absorb UVA light by eliminating its 320 nm peak (Figure 5I). Surprisingly, such treatment spared the 280 nm peak (Figure 5I), indicating that the ability for LITE-1 to capture UVB light was more stable and relatively resistant to photobleaching. This experiment reveals an additional feature that distinguishes LITE-1 absorption of UVA and UVB light.
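
The distinction between UVA and UVB absorption drawn in these sections rests on the WHO band definitions quoted earlier (UVB: 280–315 nm; UVA: 315–400 nm). A minimal sketch of that classification, applied to the absorbance peaks reported for wild-type LITE-1 and for the S226F and A332V mutants, might look as follows. The function names and the choice to assign the shared 315 nm boundary to UVA are assumptions made for illustration.

```python
def uv_band(wavelength_nm):
    """Classify a wavelength using the WHO bands quoted in the text.

    Assumption: the shared 315 nm boundary is assigned to UVA.
    """
    if 280 <= wavelength_nm < 315:
        return "UVB"
    if 315 <= wavelength_nm <= 400:
        return "UVA"
    return "outside UVA/UVB"

def bands_absorbed(peaks_nm):
    """Return the sorted set of bands covered by a list of absorbance peaks."""
    return sorted({uv_band(p) for p in peaks_nm})

# Peaks reported in the text: wild-type LITE-1 has both; the S226F and
# A332V mutants lose the 320 nm peak but keep the 280 nm peak.
WILD_TYPE_PEAKS = [280, 320]
MUTANT_PEAKS = [280]

if __name__ == "__main__":
    print("wild type:", bands_absorbed(WILD_TYPE_PEAKS))
    print("S226F / A332V:", bands_absorbed(MUTANT_PEAKS))
    print("bRho 568 nm peak:", uv_band(568))
```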

Figure 5. Residues S226 and A332 in LITE-1 Are Required for Its Absorption of UVA but Not UVB Light In Vitro
(A and B) S226F and A332V mutations disrupt LITE-1’s absorption of UVA but not UVB light in vitro. The extinction coefficients at 280 nm for LITE-1A332V and LITE-1S226F are 4.0 × 10⁶ M⁻¹cm⁻¹ and 3.75 × 10⁶ M⁻¹cm⁻¹, respectively, which are similar to wild-type LITE-1 (Figure 2I). Both samples were measured at 0.4 μM.
(C) S226F and A332V mutations do not disrupt the sensitivity of LITE-1 to UVB light in vivo, shown by behavioral assays. LITE-1 harboring S226F or A332V was expressed as a transgene in muscle cells under the myo-3 promoter. WT (wild-type) and transgenic worms were exposed to a 20 s pulse of UVB light (280 ± 10 nm, 0.03 mW/mm²). Animals showing muscle contraction-induced paralysis during light illumination were scored positive. n = 20. Error bars represent SEM. ***p < 0.00001 (ANOVA with Bonferroni test).
(D–H) S226F and A332V mutations do not disrupt the sensitivity of LITE-1 to UVB light in vivo, shown by calcium imaging. The experiments were done as described in Figure 2B. A 5 s pulse of UVB light (280 ± 10 nm, 0.02 mW/mm²) was applied to muscles to elicit calcium transients. Shades along the traces in (D–G) represent SEM. (H) Bar graph. n ≥ 10. ***p < 0.00001 (ANOVA with Bonferroni test).
(I) LITE-1 absorption of UVB but not UVA light shows resistance to photobleaching. LITE-1 (0.18 μM) was pre-exposed to UV light for 5 min (17 mW/mm², 302 nm) at room temperature prior to spectrophotometric analysis. Pre-exposure to UV light for 30 min still did not notably affect the UVB photoabsorption. The photoabsorption at 280 nm was eventually lost after 1 hr of pre-exposure, probably because LITE-1 was denatured. As a direct comparison, bRho, when tested under the same condition, showed photobleaching of its 568 nm peak after pre-exposure to ambient light for <5 min, and such photobleaching became complete at 10 min.
Axes in (A), (B), and (I): Absorbance (O.D.) versus Wavelength (nm), 250–700 nm.
See also Figure S5.
Two Tryptophan Residues Are Required for LITE-1 Function
Our success in identifying residues critical for LITE-1’s absorption of UVA light encouraged us to explore what may underlie its absorption of UVB light. Tryptophan residues show intrinsic absorption of UVB light, peaking at 280 nm. It is also known that light absorption by tryptophan is quite resistant to photobleaching (Wu et al., 2008). These two features together led us to question whether tryptophan residues in LITE-1 play a role in mediating its absorption of UVB light. Six tryptophan residues are found in LITE-1 (Figure 7I). However, should any of these tryptophan residues be important for LITE-1 function, they would not be expected to be picked up by our genetic screen, as the mutagen (EMS) used in the screen would typically mutate a tryptophan residue to a stop codon rather than generate a missense mutation.

Therefore, to test the above hypothesis, we mutated each of the six tryptophan residues to alanine through site-directed mutagenesis and expressed the corresponding mutant form of LITE-1 as a transgene in muscle cells. We first examined their function in vivo. Two tryptophan residues, W77 and W328, when mutated to alanine, abolished the sensitivity of LITE-1 to UVA light in vivo in behavioral assays (Figures 6A and S6A), whereas mutating the other four tryptophan residues did not elicit a notable effect (Figure 6A). We obtained a similar result when mutating W77 and W328 to F (phenylalanine) (Figure 6A). Furthermore, the two tryptophan mutations W77F and W328F nearly eliminated UVA-light-induced calcium transients in muscle cells ectopically expressing LITE-1 (Figures 6C–6F). These data identify a critical role for W77 and W328 in LITE-1 function in vivo.

Lastly, we purified the two mutant forms of LITE-1, LITE-1W77F and LITE-1W328F, to homogeneity (Figures 4F and 4G) and examined their photoabsorption in vitro. Strikingly, W77F and W328F mutations not only abolished LITE-1’s absorption of UVA light at 320 nm, but also nearly eliminated its absorption of UVB light at

Figure 6. The Two Tryptophan Residues W77 and W328 in LITE-1 Are Required for LITE-1 Function Both In Vivo and In Vitro
(A and B) Mutating W77 and W328 but not the other four W residues disrupts the sensitivity of LITE-1 to both UVA and UVB light in vivo, shown by behavioral assays. LITE-1 variants harboring mutations in each W residue were expressed as a transgene in muscle cells. Wild-type (WT) and transgenic worms were exposed to a 20 s pulse of UVA light (A) or UVB light (B). Animals showing muscle contraction-induced paralysis during light illumination were scored positive. n = 20. Error bars represent SEM. ***p < 0.00001 (ANOVA with Bonferroni test).
(C–F) W77F and W328F mutations disrupt the sensitivity of LITE-1 to UVA light in vivo, shown by calcium imaging. The experiments were done as described in Figure 2B. A 5 s pulse of UVA light (340 ± 20 nm, 0.7 mW/mm²) was applied to muscles to elicit calcium transients. Shades along the traces in (C–E) represent SEM. (F) Bar graph. n ≥ 6. ***p < 0.00001 (ANOVA with Bonferroni test).
(G–J) W77F and W328F mutations disrupt the sensitivity of LITE-1 to UVB light in vivo, shown by calcium imaging. A 5 s pulse of UVB light (280 ± 10 nm, 0.02 mW/mm²) was applied to muscles to elicit calcium transients. Shades along the traces in (G–I) represent SEM. (J) Bar graph. n ≥ 10. ***p < 0.00001 (ANOVA with Bonferroni test).
(K and L) W77F and W328F mutations disrupt LITE-1’s absorption of both UVA and UVB light in vitro. LITE-1W77F and LITE-1W328F were each measured at 0.4 μM.
Axes in (K) and (L): Absorbance (O.D.) versus Wavelength (nm), 250–700 nm.
See also Figure S6.
280 nm (Figures 6K and 6L). Consistent with these spectral data, we found that these two tryptophan mutations abolished the sensitivity of LITE-1 to UVB light in vivo in behavioral assays (Figures 6B and S6B). In addition, UVB light elicited little if any calcium transients in muscle cells ectopically expressing these two mutant forms of LITE-1 (Figures 6G–6J). The residual calcium response evoked by UVA and UVB light in these muscle cells arose from the other tryptophan residue, as mutating both tryptophan residues eliminated the response (J.G. and X.Z.S.X., unpublished data). Thus, the two tryptophan residues W77 and W328 are critical for LITE-1 function both in vivo and in vitro. These experiments identify key molecular determinants required for LITE-1 function in vivo and in vitro.

Genetic Engineering of Photoreceptors
To provide further evidence supporting a critical role for the two tryptophan residues in mediating photoabsorption, we wondered whether introducing such tryptophan residues into another protein would promote photoabsorption. On the other hand, tryptophan residues alone are unlikely to underpin the high photoabsorption capacity of LITE-1, and other parts of LITE-1 must be involved, which may serve as a “backbone” to support the function of the two tryptophan residues in capturing photons. We thus reasoned that those proteins related to LITE-1, such as other GR genes, may possess such a backbone structure and thereby would have a higher likelihood to be engineered as a photoreceptor. The C. elegans GR family contains five members. With the exception of LITE-1, no other GR genes have both tryptophan residues at the corresponding positions (Figure S7A). We noticed that, although GUR-3 is not that similar to LITE-1 at the sequence level (40% sequence identity with LITE-1), it has one tryptophan residue in place, which corresponds to W328 in LITE-1 (Figure S7A). GUR-3 was suggested to function as a chemoreceptor (Bhatla and Horvitz, 2015). Ectopic expression of GUR-3 in muscle cells did not promote their sensitivity to UV light in behavioral assays (Figures 7A, S7B, S7F, and S7G), suggesting that GUR-3 has little or no photosensitivity. Indeed, calcium imaging revealed that UV light evoked only a small calcium response in muscle cells ectopically expressing

Figure 7. Genetic Engineering of a Photoreceptor by Introducing a Tryptophan Residue into Another GR Family Member, GUR-3
(A) Mutating Y79 to W in GUR-3 promotes photosensitivity in vivo shown by behavioral assays. GUR-3Y79W and GUR-3 were expressed as a transgene in muscle cells. Worms were exposed to a 20 s pulse of UVB light (280 ± 10 nm, 0.03 mW/mm²), and those showing muscle contraction-induced paralysis during light illumination were scored positive. n = 50. Error bars represent SEM. ***p < 0.00001 (Student’s t test).
(B–D) Mutating Y79 to W in GUR-3 promotes photosensitivity in vivo, shown by calcium imaging. The experiments were done as described in Figure 2B. A 5 s pulse of UVB light (280 ± 10 nm, 0.02 mW/mm²) was applied to muscles to elicit calcium transients. Shades along the traces in (B) and (C) represent SEM. (D) Bar graph. n = 20. ***p < 0.00001 (Student’s t test).
(E and F) Purification of GUR-3Y79W and GUR-3. Shown in (E) is an SDS-PAGE gel stained with Coomassie blue. Shown in (F) is a western blot probed with anti-1D4 that recognizes the 1D4 tag attached to the C terminus of GUR-3Y79W and GUR-3, as well as LITE-1. LITE-1 was purified side by side as a reference. As predicted, GUR-3 showed a slightly larger molecular weight than LITE-1. The amount of each sample loaded in (F) was one-tenth of that in (E). Samples for SDS-PAGE and western were prepared at room temperature under non-reducing conditions (free of β-ME and DTT) to avoid aggregation of LITE-1.
(G and H) Mutating Y79 to W in GUR-3 greatly potentiates the absorption of UVB light (280 nm) in vitro. GUR-3Y79W was measured at 0.35 μM (G) and GUR-3 at 0.25 μM (H). Axes: Absorbance (O.D.) versus Wavelength (nm), 250–700 nm.
(I) A schematic model denoting LITE-1 membrane topology and the position of residues investigated in this study. Residues labeled in the schematic: W41, W77, W191, S226, W319, W328, A332, and W419.
See also Figure S7.
GUR-3 (Figures 7C, 7D and S7C–S7E). We then mutated residue Y79 in GUR-3, which corresponds to W77 in LITE-1 (Figure S7A), to W, generating GUR-3(Y79W). Strikingly, worms ectopically expressing the tryptophan-bearing GUR-3(Y79W) became very sensitive to UVB light (Figures 7A and S7G). UVA light was not as effective on these worms (Figures S7B–S7F). This result was expected, as UVA absorption by LITE-1 apparently requires additional key elements such as residues S226 and A332 and perhaps others (Figures 4A–4E). We also examined UVB-light-evoked calcium transients in muscle cells and found that ectopic expression of the tryptophan-bearing GUR-3(Y79W) greatly potentiated the UVB-light-induced calcium response in these cells (Figures 7B–7D). Thus, introducing a tryptophan residue into GUR-3 promotes photosensitivity.
Having characterized the photosensitivity of GUR-3(Y79W) and GUR-3 in vivo, we then purified both proteins to homogeneity (Figures 7E and 7F) and examined their photoabsorption in vitro (Figures 7G and 7H). As expected, GUR-3 showed little absorption of UVB light (Figure 7H). By contrast, strong absorption of UVB light at 280 nm was observed in GUR-3(Y79W) (Figure 7G). The extinction coefficient of this tryptophan-bearing GUR-3(Y79W) protein reached the level of 10⁶ M⁻¹cm⁻¹ (1.03 × 10⁶ M⁻¹cm⁻¹), which is about one-third of that found for LITE-1. These data provide a biochemical basis for the observed photosensitivity of GUR-3(Y79W). This set of experiments also raises the intriguing prospect that it might be possible to genetically engineer new photoreceptors.

DISCUSSION

In summary, our results demonstrate that the C. elegans taste receptor homolog LITE-1 is a bona fide photoreceptor. As some photoreceptors are multifunctional (for example, Drosophila rhodopsin also responds to heat; Shen et al., 2011), it remains
possible that LITE-1 might also be able to sense additional cues, including chemical cues such as H2O2 (Bhatla and Horvitz, 2015). Several features distinguish LITE-1 from known photoreceptors, including an exceptionally high efficiency in photoabsorption, an ability to sense both UVA and UVB light, a strict dependence on protein conformation for photoabsorption, a strong resistance to photobleaching, and a reversed membrane topology compared to opsins. LITE-1 also bears no sequence homology with the two known metazoan photoreceptors (i.e., opsins and cryptochromes) or any other photoreceptors in microbes and plants. Apparently, LITE-1 represents a distinct type of photoreceptor in nature.
While it is easy to appreciate the requirement of tryptophan residues for the absorption of UVB light at 280 nm, it is a bit surprising that the absorption of UVA light at 320 nm also depends on the same tryptophan residues. On the other hand, some mutations (e.g., S226F and A332V) only affect the absorption of UVA but not UVB light (Figures 5A and 5B). As such, we suggest that the two tryptophan residues W77 and W328 regulate the absorption of both UVB and UVA light, while the absorption of UVA light requires additional residues such as S226 and A332. The W77F and W328F data, together with the unusually strict dependence of LITE-1 photoabsorption on its protein conformation, raise the intriguing possibility that LITE-1 may not have a prosthetic chromophore. Interestingly, the plant-specific protein UVR8, a soluble protein that is completely unrelated to LITE-1, also requires tryptophan residues for UVB light detection and lacks a prosthetic chromophore (Christie et al., 2012; Rizzini et al., 2011; Wu et al., 2012). This prompted us to speculate that the two tryptophan residues W77 and W328 may contribute to the formation of the chromophore of LITE-1, which may underlie its high photon-capturing efficiency. Though the definitive answer must await the determination of the atomic structure of LITE-1, our finding that introducing such a tryptophan residue into another GR family protein can promote photosensitivity lends support to this model. This experiment also raises the intriguing prospect that it might be possible to genetically engineer new photoreceptors.
LITE-1 is a member of the invertebrate GR gene family, which contains five homologs in worms and more than 60 members in insects (Clyne et al., 2000; Liu et al., 2010; Scott et al., 2001). Some of them in fact do not act as chemoreceptors (Thorne and Amrein, 2008). For example, Drosophila Gr28b(d) encodes a thermosensor (Ni et al., 2013), while another Gr28b isoform has been implicated in UV-light-induced avoidance behavior (Xiang et al., 2010). The inverted membrane topology makes it unlikely for LITE-1 to function as a GPCR. Interestingly, LITE-1 can functionally interact with G protein signaling (Liu et al., 2010); but given the atypical topology of LITE-1, its interaction with G protein signaling is likely to be indirect (Liu et al., 2010). It is also unclear whether LITE-1 possesses ion channel activity like some OR and GR members. At the sequence level, no clear mammalian LITE-1 homologs could be identified. This, however, does not necessarily imply a lack of LITE-1 orthologs in mammals, as 7-TM receptors tend to share limited homologies even among those within the same subfamilies. In fact, 7-TM receptors with a reversed membrane topology are present in the mammalian genome (Iwabu et al., 2010). For example, the 7-TM adiponectin receptors AdipoR1 and AdipoR2, which play a pivotal role in diabetes, obesity, and insulin resistance in mammals, also bear a membrane topology opposite to classical GPCRs (Iwabu et al., 2010). Some mammalian tissues/cells (e.g., skin keratinocytes and melanocytes) are sensitive to UV light, but the underlying photoreceptors have not been definitively identified (Bellono et al., 2013; Moore et al., 2013). It is conceivable that some receptor proteins functionally related to LITE-1 may sense UV light in these skin cells.
Ectopic expression of LITE-1 can confer photosensitivity to photo-insensitive cells by triggering neuronal excitation and muscle contraction (Edwards et al., 2008; Liu et al., 2010). LITE-1's exceptionally high efficiency in photon absorption, sensitivity to both UVA and UVB light, and strong resistance to photobleaching make it a promising candidate as an optogenetic tool. These features also demonstrate the potential for developing LITE-1 as an organic additive to sunscreens for skin protection against harmful UV in sunlight. The current study provides an entry point to characterize this interesting photoreceptor.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

● KEY RESOURCES TABLE
● CONTACT FOR REAGENT AND RESOURCE SHARING
● EXPERIMENTAL MODEL AND SUBJECT DETAILS
● METHODS DETAILS
  ○ Immunostaining to determine the membrane topology of LITE-1
  ○ Purification and spectrophotometric analysis of LITE-1 and control proteins
  ○ Behavior assays to quantify LITE-1 function
  ○ Calcium imaging to quantify LITE-1 function
  ○ Molecular biology
● QUANTIFICATION AND STATISTICAL ANALYSIS

SUPPLEMENTAL INFORMATION

Supplemental Information includes seven figures and two movies and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.053.

AUTHOR CONTRIBUTIONS

J.G. performed the experiments and analyzed the data. Y.Y., B.Z., Z.W., J.P., and Z.F. assisted J.G. in performing the experiments. A.W. initiated the project and generated reagents. L.K. performed immunostaining on primary cultured muscle cells and analyzed the data. J.G., J.L., and X.Z.S.X. wrote the paper.

ACKNOWLEDGMENTS

We thank Tom Kerppola for BiFC plasmids; David Salom and Kris Palczewski for technical assistance and providing strains; Zhaohui Xu for helpful discussions; and Wenyuang Zhang, Jiejun Zhou, John Tesmer, Frederick Stull, and James Bardwell for technical assistance. Some strains were obtained from the CGC. This work utilized the Core Center for Vision Research funded by P30 EY007003 from the NEI. A.W. was supported by a predoctoral training grant from the NEI (T32EY013934). This work was supported by the NSFC (31130028, 31225011, and 31420103909 to J.L.), the Program of Introducing
Talents of Discipline to Universities from the Ministry of Education of China (B08029 to J.L.), Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT: IRT13016), and grants from the NIGMS (GM083241) and NEI (EY022315).

Received: December 20, 2015
Revised: September 5, 2016

REFERENCES

Bellono, N.W., Kammel, L.G., Zimmerman, A.L., and Oancea, E. (2013). UV light phototransduction activates transient receptor potential A1 ion channels in human melanocytes. Proc. Natl. Acad. Sci. USA 110, 2383–2388.
Benton, R., Sachse, S., Michnick, S.W., and Vosshall, L.B. (2006). Atypical membrane topology and heteromeric function of Drosophila odorant receptors in vivo. PLoS Biol. 4, e20.
Bhatla, N., and Horvitz, H.R. (2015). Light and hydrogen peroxide inhibit C. elegans feeding through gustatory receptor orthologs and pharyngeal neurons. Neuron 85, 804–818.
Christensen, M., Estevez, A., Yin, X., Fox, R., Morrison, R., McDonnell, M., Gleason, C., Miller, D.M., 3rd, and Strange, K. (2002). A primary culture system for functional analysis of C. elegans neurons and muscle cells. Neuron 33, 503–514.
Christie, J.M., Arvai, A.S., Baxter, K.J., Heilmann, M., Pratt, A.J., O'Hara, A., Kelly, S.M., Hothorn, M., Smith, B.O., Hitomi, K., et al. (2012). Plant UVR8 photoreceptor senses UV-B by tryptophan-mediated disruption of cross-dimer salt bridges. Science 335, 1492–1496.
Clyne, P.J., Warr, C.G., and Carlson, J.R. (2000). Candidate taste receptors in Drosophila. Science 287, 1830–1834.
de Bono, M., and Maricq, A.V. (2005). Neuronal substrates of complex behaviors in C. elegans. Annu. Rev. Neurosci. 28, 451–501.
Dutta, A., Kim, T.Y., Moeller, M., Wu, J., Alexiev, U., and Klein-Seetharaman, J. (2010). Characterization of membrane protein non-native states. 2. The SDS-unfolded states of rhodopsin. Biochemistry 49, 6329–6340.
Edwards, S.L., Charlie, N.K., Milfort, M.C., Brown, B.S., Gravlin, C.N., Knecht, J.E., and Miller, K.G. (2008). A novel molecular solution for ultraviolet light detection in Caenorhabditis elegans. PLoS Biol. 6, e198.
Falciatore, A., and Bowler, C. (2005). The evolution and function of blue and red light photoreceptors. Curr. Top. Dev. Biol. 68, 317–350.
Foster, R.G., and Soni, B.G. (1998). Extraretinal photoreceptors and their regulation of temporal physiology. Rev. Reprod. 3, 145–150.
Fridovich, I. (2013). Oxygen: how do we stand it? Med. Princ. Pract. 22, 131–137.
Hagins, F.M. (1973). Purification and partial characterization of the protein component of squid rhodopsin. J. Biol. Chem. 248, 3298–3304.
Hu, C.D., Chinenov, Y., and Kerppola, T.K. (2002). Visualization of interactions among bZIP and Rel family proteins in living cells using bimolecular fluorescence complementation. Mol. Cell 9, 789–798.
Hubbard, R. (1969). Absorption spectrum of rhodopsin: 500 nm absorption band. Nature 221, 432–435.
Insinna, C., Daniele, L.L., Davis, J.A., Larsen, D.D., Kuemmel, C., Wang, J., Nikonov, S.S., Knox, B.E., and Pugh, E.N., Jr. (2012). An S-opsin knock-in mouse (F81Y) reveals a role for the native ligand 11-cis-retinal in cone opsin biosynthesis. J. Neurosci. 32, 8094–8104.
Iwabu, M., Yamauchi, T., Okada-Iwabu, M., Sato, K., Nakagawa, T., Funata, M., Yamaguchi, M., Namiki, S., Nakayama, R., Tabata, M., et al. (2010). Adiponectin and AdipoR1 regulate PGC-1alpha and mitochondria by Ca(2+) and AMPK/SIRT1. Nature 464, 1313–1319.
Kolesnikov, A.V., Kisselev, O.G., and Kefalov, V.J. (2014). Signaling by Rod and Cone Photoreceptors: Opsin Properties, G-protein Assembly, and Mechanisms of Activation. In G Protein Signaling Mechanisms in the Retina, K.A. Martemyanov and A.P. Sampath, eds. (Springer).
Li, Z., Liu, J., Zheng, M., and Xu, X.Z. (2014). Encoding of both analog- and digital-like behavioral outputs by one C. elegans interneuron. Cell 159, 751–765.
Liu, J., Ward, A., Gao, J., Dong, Y., Nishio, N., Inada, H., Kang, L., Yu, Y., Ma, D., Xu, T., et al. (2010). C. elegans phototransduction requires a G protein-dependent cGMP pathway and a taste receptor homolog. Nat. Neurosci. 13, 715–722.
Liu, J., Zhang, B., Lei, H., Feng, Z., Liu, J., Hsu, A.L., and Xu, X.Z. (2013). Functional aging in the nervous system contributes to age-dependent motor activity decline in C. elegans. Cell Metab. 18, 392–402.
Maglova, L., Atanasov, B., and Keszthelyi, L. (1989). Unfolding of monomeric bacteriorhodopsin in water-urea solution. Biochim. Biophys. Acta 975, 271–276.
Marti, T., Rösselet, S.J., Otto, H., Heyn, M.P., and Khorana, H.G. (1991). The retinylidene Schiff base counterion in bacteriorhodopsin. J. Biol. Chem. 266, 18674–18683.
Matsuyama, T., Yamashita, T., Imamoto, Y., and Shichida, Y. (2012). Photochemical properties of mammalian melanopsin. Biochemistry 51, 5454–5462.
Moore, C., Cevikbas, F., Pasolli, H.A., Chen, Y., Kong, W., Kempkes, C., Parekh, P., Lee, S.H., Kontchou, N.A., Yeh, I., et al. (2013). UVB radiation generates sunburn pain and affects skin by activating epidermal TRPV4 ion channels and triggering endothelin-1 signaling. Proc. Natl. Acad. Sci. USA 110, E3225–E3234.
Ni, L., Bronk, P., Chang, E.C., Lowell, A.M., Flam, J.O., Panzano, V.C., Theobald, D.L., Griffith, L.C., and Garrity, P.A. (2013). A gustatory receptor paralogue controls rapid warmth avoidance in Drosophila. Nature 500, 580–584.
Oesterhelt, D., and Hess, B. (1973). Reversible photolysis of the purple complex in the purple membrane of Halobacterium halobium. Eur. J. Biochem. 37, 316–326.
Okano, T., Fukada, Y., Shichida, Y., and Yoshizawa, T. (1992). Photosensitivities of iodopsin and rhodopsins. Photochem. Photobiol. 56, 995–1001.
Radding, C.M., and Wald, G. (1956). Acid-base properties of rhodopsin and opsin. J. Gen. Physiol. 39, 909–922.
Rizzini, L., Favory, J.J., Cloix, C., Faggionato, D., O'Hara, A., Kaiserli, E., Baumeister, R., Schäfer, E., Nagy, F., Jenkins, G.I., and Ulm, R. (2011). Perception of UV-B by the Arabidopsis UVR8 protein. Science 332, 103–106.
Salom, D., Cao, P., Sun, W., Kramp, K., Jastrzebska, B., Jin, H., Feng, Z., and Palczewski, K. (2012). Heterologous expression of functional G-protein-coupled receptors in Caenorhabditis elegans. FASEB J. 26, 492–502.
Scott, K., Brady, R., Jr., Cravchik, A., Morozov, P., Rzhetsky, A., Zuker, C., and Axel, R. (2001). A chemosensory gene family encoding candidate gustatory and olfactory receptors in Drosophila. Cell 104, 661–673.
Shen, W.L., Kwon, Y., Adegbola, A.A., Luo, J., Chess, A., and Montell, C. (2011). Function of rhodopsin in temperature discrimination in Drosophila. Science 331, 1333–1336.
Sperling, W., and Rafferty, C.N. (1969). Relationship between absorption spectrum and molecular conformations of 11-cis-retinal. Nature 224, 590–594.
Thompson, C.L., and Sancar, A. (2002). Photolyase/cryptochrome blue-light photoreceptors use photon energy to repair DNA and reset the circadian clock. Oncogene 21, 9043–9056.
Thorne, N., and Amrein, H. (2008). Atypical expression of Drosophila gustatory receptor genes in sensory and central neurons. J. Comp. Neurol. 506, 548–568.
Vought, B.W., Dukkipatti, A., Max, M., Knox, B.E., and Birge, R.R. (1999). Photochemistry of the primary event in short-wavelength visual opsins at low temperature. Biochemistry 38, 11287–11297.
Wang, T., and Montell, C. (2007). Phototransduction and retinal degeneration in Drosophila. Pflugers Arch. 454, 821–847.
Wang, R., Mellem, J.E., Jensen, M., Brockie, P.J., Walker, C.S., Hoerndli, F.J., Hauth, L., Madsen, D.M., and Maricq, A.V. (2012). The SOL-2/Neto auxiliary protein modulates the function of AMPA-subtype ionotropic glutamate receptors. Neuron 75, 838–850.
Ward, A., Liu, J., Feng, Z., and Xu, X.Z. (2008). Light-sensitive neurons and channels mediate phototaxis in C. elegans. Nat. Neurosci. 11, 916–922.
Wu, L.Z., Sheng, Y.B., Xie, J.B., and Wang, W. (2008). Photoexcitation of tryptophan groups induced reduction of disulfide bonds in hen egg white lysozyme. J. Mol. Struct. 882, 101–106.
Wu, D., Hu, Q., Yan, Z., Chen, W., Yan, C., Huang, X., Zhang, J., Yang, P., Deng, H., Wang, J., et al. (2012). Structural basis of ultraviolet-B perception by UVR8. Nature 484, 214–219.
Xiang, Y., Yuan, Q., Vogt, N., Looger, L.L., Jan, L.Y., and Jan, Y.N. (2010). Light-avoidance-mediating photoreceptors tile the Drosophila larval body wall. Nature 468, 921–926.
Xiao, R., Zhang, B., Dong, Y., Gong, J., Xu, T., Liu, J., and Xu, X.Z.S. (2013). A genetic program promotes C. elegans longevity at cold temperatures via a thermosensitive TRP channel. Cell 152, 806–817.
Yau, K.W., and Hardie, R.C. (2009). Phototransduction motifs and variations. Cell 139, 246–264.
Zhang, H.J., Anderson, A.R., Trowell, S.C., Luo, A.R., Xiang, Z.H., and Xia, Q.Y. (2011). Topological and functional characterization of an insect gustatory receptor. PLoS ONE 6, e24111.

STAR★METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Antibodies
Mouse monoclonal anti-1D4 | Polgenix | N/A
Rabbit monoclonal anti-C-LITE-1 | This paper | N/A
Rabbit monoclonal anti-N-LITE-1 | This paper | N/A
Mouse monoclonal anti-N-Myc | ThermoFisher | CAT# MA1-16638; RRID: AB_2235735

Chemicals, Peptides, and Recombinant Proteins
bis-tris-propane | Sigma | CAT# 79-97-0
proteinase inhibitor cocktail | Roche | CAT# 11836170001
n-dodecyl-β-D-maltopyranoside | Affymetrix | CAT# 69227-93-6
9-cis-retinal | Toronto Research Chemicals | CAT# 514-85-2
Hydrogen peroxide solution | Sigma | CAT# 7722-84-1
Urea | Sigma | CAT# 57-31-6
1D4 peptide | Genscript | Lot# 89521380001

Critical Commercial Assays
Bradford protein assay | Bio-Rad | CAT# 5000001
ECL western blotting kit | ThermoFisher | CAT# 35050

Experimental Models: Organisms/Strains
C. elegans: lite-1(xu7) | Ward et al., 2008; Caenorhabditis Genetics Center | TQ800
C. elegans: N2 | Caenorhabditis Genetics Center | WormBase: N2
C. elegans: xuIs98 [Pmyo-3::lite-1::1D4::SL2::YFP] | This paper | TQ2518
C. elegans: xuIs397 [Pmyo-3::lite-1(W328F)::1D4::SL2::YFP] | This paper | TQ6448
C. elegans: xuIs399 [Pmyo-3::lite-1(W77F)::1D4::SL2::YFP] | This paper | TQ6450
C. elegans: xuIs403 [Pmyo-3::lite-1(A332V)::1D4::SL2::YFP] | This paper | TQ6454
C. elegans: xuIs404 [Pmyo-3::lite-1(S226F)::1D4::SL2::YFP] | This paper | TQ6455
C. elegans: xuEx1623 [Pmyo-3::lite-1(W77A)::1D4::SL2::YFP] | This paper | TQ5143
C. elegans: xuEx2430 [Pmyo-3::bJun::N-YFP::lite-1::SL2::DsRed] | This paper | TQ6688
C. elegans: xuEx2431 [Pmyo-3::bFos::C-YFP] | This paper | TQ6689
C. elegans: xuEx2432 [Pmyo-3::ΔbFos::C-YFP] | This paper | TQ6690
C. elegans: xuEx386 [Pmyo-3::myc::lite-1::SL2::YFP] | This paper | TQ1353
C. elegans: xuIs32 [Pmyo-3::lite-1::SL2::YFP] | This paper | TQ1230
C. elegans: xuIs442 [Pmyo-3::GUR-3(Y79W)::1D4::SL2::YFP] | This paper | TQ7405
C. elegans: xuIs441 [Pmyo-3::GUR-3::1D4::SL2::YFP] | This paper | TQ7404
C. elegans: xuIs444 [Pmyo-3::RCaMP]; lite-1(xu7) | This paper | TQ7428

Recombinant DNA
Pmyo-3::lite-1::1D4::SL2::YFP | This paper | pSX1580
Pmyo-3::lite-1(W328F)::1D4::SL2::YFP | This paper | pSX1712
Pmyo-3::lite-1(W77F)::1D4::SL2::YFP | This paper | pSX1713
Pmyo-3::lite-1(A332V)::1D4::SL2::YFP | This paper | pSX1710
Pmyo-3::lite-1(S226F)::1D4::SL2::YFP | This paper | pSX1711
Pmyo-3::lite-1(W77A)::1D4::SL2::YFP | This paper | pSX1714
Pmyo-3::bJun::N-YFP::lite-1::SL2::DsRed | This paper | pSX1740
Pmyo-3::bFos::C-YFP | This paper | pSX1741
Pmyo-3::ΔbFos::C-YFP | This paper | pSX1742
Pmyo-3::myc::lite-1::SL2::YFP | This paper | pSX153
Pmyo-3::GUR-3(Y79W)::1D4::SL2::YFP | This paper | pSX1902
Pmyo-3::GUR-3::1D4::SL2::YFP | This paper | pSX1903
Pmyo-3::RCaMP | This paper | pSX1904

Software and Algorithms
Wormlab system | MBF Bioscience | N/A
MetaFluor | Molecular Devices | N/A
CONTACT FOR REAGENT AND RESOURCE SHARING
Requests for reagents and resources may be directed to the Lead Contact, X.Z. Shawn Xu (shawnxu@umich.edu).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
C. elegans strains were maintained at 20°C on nematode growth medium (NGM) plates seeded with OP50 bacteria. Liquid culture was used to produce large quantities of worms for protein purification (see Methods Details). Transgenic lines were generated by injecting plasmid DNA directly into the hermaphrodite gonad. Integrated transgenic strains were outcrossed at least six times before being used for protein purification.
METHODS DETAILS
Immunostaining to determine the membrane topology of LITE-1

Immunostaining was performed on primary cultured cells using standard protocols (Christensen et al., 2002). Muscle cells co-expressed LITE-1 and GFP or expressed GFP alone as a transgene driven by the muscle-specific promoter myo-3. Gravid hermaphrodites were lysed to release eggs, and embryos were dissociated by chitinase treatment and trituration, filtered through a 5 μm membrane, plated on cover glasses coated with peanut lectin, and cultured in L15 with 10% serum (340–345 mOsm) at 20°C. To perform non-permeabilized surface staining, live cells were first blocked with 3% BSA and 5% normal goat serum (NGS) in PBS for 30 min, and then incubated with primary antibodies (1 mg/ml) for one hour in PBS (1.5% BSA) at room temperature. Following three washes with PBS, cells were fixed for 10 min with 1.5% paraformaldehyde (PFA) in PBS, followed by three washes with PBS and a one-hour incubation with secondary antibodies (1:2000, Cy3 conjugated). After five washes with PBS, cover glasses were mounted for imaging analysis. To perform permeabilized staining, cells were first fixed with 1.5% PFA in PBS for 10 min at room temperature, rinsed three times with PBS, and permeabilized with 0.5% Triton X-100 in PBS for 5 min. After three washes with PBS, cells were blocked with BSA and NGS, incubated with primary antibodies, and washed five times. Following a one-hour incubation with secondary antibodies, cover glasses were rinsed five times before mounting. The N- and C-terminal end peptides (15 residues) were used to immunize rabbits to generate LITE-1 antibodies, which were affinity-purified before use for staining (YenZym Antibodies).
Purification and spectrophotometric analysis of LITE-1 and control proteins

Worms were cultured in the dark. They were first cultured on NGM plates and then transferred to 10 liters of S medium for liquid culture using a fermenter (New Brunswick, 20°C, 50% dissolved oxygen, 300 rpm agitation, pH 7.2) supplemented with concentrated HB101 bacteria. After 2 generations (about 7–8 days) in the fermenter, worms were harvested and suspended in 80 mL of 25 mM bis-tris-propane (BTP) buffer (pH 7.2) supplemented with proteinase inhibitor cocktail (Complete Mini, EDTA-free). All purification steps were carried out in the dark. A microfluidizer (Microfluidics) was used to break the worms (120 psi, 5 cycles). After removing the debris by low-speed centrifugation at 1,000 × g for 10 min at 4°C, the supernatant was collected and centrifuged again at high speed (100,000 × g) for 1 hr at 4°C to pellet cell membranes, which were solubilized with 20 mM n-dodecyl-β-D-maltopyranoside (DDM; Affymetrix) in BTP buffer (pH 7.2) containing 500 mM NaCl. After removing unsolubilized materials by centrifugation at 40,000 × g for 30 min, we loaded the extract onto an α1D4 affinity column. Note: we attached a 1D4 tag to the C terminus of LITE-1 and GUR-3 expressed as a transgene in the worm muscle, as described for the A2A receptor (Salom et al., 2012). Bovine rhodopsin (Rho) has this tag sequence at its C terminus. After washing with the washing buffer (10 mM DDM in 25 mM BTP buffer [pH 7.2] and 500 mM NaCl), we eluted LITE-1 with 1.5 mg/ml of 1D4 peptide diluted in this buffer. Purified LITE-1 was loaded onto a molecular size separation column (GE Healthcare Bio-Sciences) to remove 1D4 peptide before spectrophotometric analysis. When purifying bovine rhodopsin (Rho), 2 mM 9-cis-retinal was used to resuspend pelleted cell membranes, which were then incubated for 30 min prior to solubilization with DDM. This treatment was not performed when purifying LITE-1, GUR-3, or A2A receptor. Purified protein samples used for SDS-PAGE were prepared under non-reducing conditions at room temperature (no heating) to avoid aggregation.
The concentration of purified proteins was first determined by the Bradford assay (Bio-Rad), and then verified by SDS-PAGE followed by Coomassie staining using rhodopsin as a standard. The concentration data were also independently verified by silver staining following SDS-PAGE using rhodopsin as a standard.
Spectrophotometric analysis was performed on a UV-Vis spectrophotometer (Varian Cary 50) in a quartz cuvette. Samples and reference blanks were all diluted in the same washing buffer. Note: 1D4 peptide was removed from samples prior to spectrophotometric analysis (see above). For those experiments involving treatment with denaturing agents or H2O2, LITE-1 was incubated with these agents for 5 min at room temperature prior to spectrophotometric analysis. All the assays were carried out in the dark.
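The relationship between a measured absorbance, the protein concentration, and the extinction coefficients quoted in this paper is the Beer-Lambert law, A = ε · c · l. The following Python sketch illustrates that arithmetic only; it is not the authors' analysis code. The function names are hypothetical, and the 1 cm path length is an assumption, since the cuvette path length is not stated in the text.

```python
def extinction_coefficient(absorbance, concentration_molar, path_length_cm=1.0):
    """Solve the Beer-Lambert law (A = epsilon * c * l) for epsilon, in M^-1 cm^-1.

    path_length_cm defaults to 1.0, which is an assumption (not given in the paper).
    """
    if concentration_molar <= 0 or path_length_cm <= 0:
        raise ValueError("concentration and path length must be positive")
    return absorbance / (concentration_molar * path_length_cm)


def expected_absorbance(epsilon, concentration_molar, path_length_cm=1.0):
    """Absorbance predicted by the Beer-Lambert law for a given epsilon."""
    return epsilon * concentration_molar * path_length_cm


# Values quoted in the paper: epsilon(280 nm) = 4.0e6 M^-1 cm^-1 for LITE-1 at 0.5 uM,
# and 1.03e6 M^-1 cm^-1 for GUR-3(Y79W) at 0.35 uM.
lite1_a280 = expected_absorbance(4.0e6, 0.5e-6)
gur3_y79w_a280 = expected_absorbance(1.03e6, 0.35e-6)
print(f"LITE-1 A280: {lite1_a280:.2f}")
print(f"GUR-3(Y79W) A280: {gur3_y79w_a280:.2f}")
```

Under the assumed 1 cm path length, these values fall within the absorbance axis ranges shown for the corresponding spectra (up to 2 O.D. for LITE-1 and up to 0.4 O.D. for GUR-3(Y79W)).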
Behavior assays to quantify LITE-1 function

The body paralysis assay was performed on day 1 gravid adult hermaphrodites, which were raised on NGM plates, under a Zeiss fluorescence dissection scope (Zeiss Discovery) coupled with an M2Bio lens system from Kramer Scientifics. The assay was done on NGM plates without OP50 using a protocol similar to that for assaying phototaxis behavior (Liu et al., 2010; Ward et al., 2008). UVA light pulses (350 ± 20 nm, 0.8 mW/mm², up to 20 s) were delivered from an arc lamp (X-Cite 120) to the worm through a 10× lens in combination with 2.5× zoom. To deliver UVB light, we attached a 280 ± 10 nm excitation filter (from Semrock, 0.03 mW/mm²) to the end of the liquid light guide of the lamp, which was then pointed directly at the worm using a micromanipulator. We manually moved the dish to keep the worm in the field of view. In another assay, we quantified body paralysis by monitoring the decrease in locomotion speed over time using the Wormlab system (MBF Bioscience). UVA and UVB light was directed to the worm using a liquid light guide as described above. To minimize the effect of the endogenous lite-1 gene on locomotion speed under UV light (Liu et al., 2010), this assay was performed in the lite-1(xu7) mutant background for all genotypes. A total of 20–50 animals were assayed for each genotype in each experiment unless otherwise indicated. The sample size of each assay was found to be adequate after running a power analysis (power > 0.8). Each worm was assayed five times, and once the worm was paralyzed, we stopped the assay to let it recover for the next round of testing.
Calcium imaging to quantify LITE-1 function

Calcium imaging of muscle cells was performed on an inverted microscope (Olympus IX73) under a 60× lens as previously described (Li et al., 2014; Xiao et al., 2013). RCaMP was expressed as a transgene in muscle cells using the myo-3 promoter. YFP was also expressed as a transgene under the same promoter to enable ratiometric imaging. Transgenic worms expressing LITE-1 or control worms were glued on an agarose pad and bathed in solution (10 mM HEPES [pH 7.4], 5 mM KCl, 145 mM NaCl, 1.2 mM MgCl2, 2.5 mM CaCl2, and 10 mM glucose). UV light (UVA: 340 ± 20 nm, 0.7 mW/mm²; UVB: 280 ± 10 nm, 0.02 mW/mm²; 5 s) was directly projected to the worm through a liquid light guide mounted on a micromanipulator. Images were acquired with a Roper CoolSnap CCD camera and processed with MetaFluor software (Molecular Devices). To minimize the contribution from the endogenous photosensation system, all genotypes, including WT, carried the lite-1(xu7) mutation in the background (Liu et al., 2010). The peak percentage change in the ratio of RCaMP/YFP fluorescence was quantified.
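The readout described above, the peak percentage change in the RCaMP/YFP ratio, can be sketched as follows. This is a minimal illustration rather than the authors' MetaFluor workflow. The function name, the use of the mean pre-stimulus ratio as the baseline, and the `baseline_frames` parameter are assumptions introduced here, and background subtraction is omitted.

```python
def peak_ratio_change_percent(rcamp, yfp, baseline_frames):
    """Peak percentage change of the RCaMP/YFP ratio relative to baseline.

    rcamp, yfp: fluorescence intensities per frame (same length).
    baseline_frames: number of initial pre-stimulus frames averaged to form
    the baseline ratio (an assumed convention, not specified in the paper).
    """
    if len(rcamp) != len(yfp):
        raise ValueError("rcamp and yfp must have the same number of frames")
    if not 0 < baseline_frames < len(rcamp):
        raise ValueError("baseline_frames must leave at least one response frame")

    # Frame-by-frame ratio of the calcium indicator to the reference fluorophore.
    ratios = [r / y for r, y in zip(rcamp, yfp)]
    baseline = sum(ratios[:baseline_frames]) / baseline_frames
    # Peak is taken over the frames after the baseline window.
    peak = max(ratios[baseline_frames:])
    return 100.0 * (peak - baseline) / baseline


# Hypothetical trace: two baseline frames, then a transient that doubles the ratio.
rcamp_trace = [100.0, 100.0, 150.0, 200.0, 120.0]
yfp_trace = [50.0, 50.0, 50.0, 50.0, 50.0]
print(peak_ratio_change_percent(rcamp_trace, yfp_trace, baseline_frames=2))
```

Dividing by the YFP signal normalizes for expression level and for movement of the muscle during contraction, which is the purpose of ratiometric imaging.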
Molecular biology
All the plasmids are listed in the Key Resources Table. All the LITE-1 and GUR-3 constructs carry a 1D4 tag at the C terminus, with the exception of Figure 1, where no such tag was included in LITE-1. A Myc tag was only included in the construct used in Figure 1B. As listed in the Key Resources Table, some plasmids contain an SL2::YFP fragment, which directs expression of YFP as a separate transcript under the control of the same promoter as its upstream gene in an operon-like fashion. This enables expression of YFP as a co-expression marker in muscle cells under the control of the same muscle-specific myo-3 promoter that drives expression of LITE-1.
QUANTIFICATION AND STATISTICAL ANALYSIS
Quantification and statistical parameters are indicated in the legend of each figure, including error bars (SEM), n numbers, and p values. For those involving multiple group comparisons, we applied ANOVA followed by a post hoc test. We considered p values of < 0.05 significant.
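As an illustration of the quantities named above, the sketch below computes the standard error of the mean and the F statistic of a one-way ANOVA using only the Python standard library. It is a hedged example, not the authors' analysis: the function names and the sample data are hypothetical, the post hoc test is not specified in the text and is not implemented, and converting F to a p value requires the F distribution, available from a statistics library such as SciPy.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group size times squared deviation of
    # each group mean from the grand mean.
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: squared deviation of each value from its
    # own group mean.
    ss_within = sum(
        (x - statistics.fmean(g)) ** 2 for g in groups for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three genotypes.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
print(f"SEM of first group = {sem(groups[0]):.3f}")
```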

[Figure S1 image: GFP, Ab staining, and Merge panels for α C-LITE-1, α N-LITE-1, and α N-Myc, under non-permeabilized live cell staining and permeabilized staining.]
Figure S1. Control Images for Figure 1B, Related to Figure 1
Immunostaining was conducted as described in Figure 1B. Primary cultured cells were derived from transgenic worms expressing GFP only but not LITE-1 in muscle cells under the myo-3 promoter. No LITE-1 signal was detected, showing that LITE-1 staining seen in Figure 1B is specific.
[Figure S2 plot: locomotion speed (mm/s) over time under UVA for WT and lite-1 transgene worms; scale bar, 5 s.]
Figure S2. Ectopic Expression of LITE-1 as a Transgene in Muscle Cells Confers Photosensitivity, Related to Figure 2
LITE-1 was expressed as a transgene in muscle cells under the myo-3 promoter. Worm locomotion speed was monitored and quantified by the WormLab system (MBF Bioscience). UVA light (350 ± 20 nm, 0.8 mW/mm²) was directed to worms, which induced muscle contraction in lite-1 transgenic worms but not in WT worms, leading to the paralysis of the former (locomotion speed reduced to zero), but not the latter. To minimize the effect of the endogenous lite-1 gene on locomotion speed under UV light, the experiments were done in the lite-1(xu7) mutant background (i.e., both genotypes carried the lite-1(xu7) mutation). Shades along the traces denote error bars (SEM). n = 25.
[Figure S3 image: (A) Coomassie-stained gel and (B) western blot (blot with α1D4), with molecular weight markers at 191, 97, 64, 51, 39, 28, and 19 kDa; (C–H) absorbance (O.D.) spectra from 250 to 700 nm. Panel annotations: LITE-1 (0.5 μM), ε280 = 4.0 × 10⁶ and ε320 = 2.4 × 10⁶; A2AR (0.5 μM); Rho (2.7 μM and 0.5 μM), ε500 = 4.34 × 10⁴; treatments in (E–H): mock, urea (4 M), and NaOH (0.1 M).]
Figure S3. Comparison of the Spectral Properties of LITE-1, Bovine Rhodopsin, and Adenosine A2A Receptor Purified from Worm Muscles, Related to Figure 2
(A and B) LITE-1, Rho, and A2AR were purified side-by-side from transgenic worms under the same conditions. All transgenes have a 1D4 tag at the C terminus. (A) Coomassie staining. (B) Western.
(C) LITE-1 shows strong photoabsorption at 0.5 μM, whereas A2AR does not.
(D) Rho shows minimal photoabsorption at 0.5 μM, and only shows modest photoabsorption at a higher concentration (2.7 μM). Note: the y axis scales in (C) and (D) are different.
(E and F) Denaturing LITE-1 with urea abolishes its photoabsorption (E), whereas the same treatment does not eliminate the photoabsorption of Rho and instead shifts its 500 nm absorbance peak to 370 nm (F).
(G and H) Denaturing LITE-1 with NaOH abolishes its photoabsorption (G), whereas the same treatment on Rho does not and instead shifts its 500 nm absorbance peak to 370 nm (H).
[Figure S4 plots: absorbance (O.D.) spectra from 250 to 700 nm for (A) LITE-1 (0.4 μM) and (B) bRho (4 μM), each with mock or H2O2 (0.1 mM) treatment.]
Figure S4. The Impact of H2O2 on LITE-1 Photoabsorption, Related to Figure 3
(A) H2O2 treatment abolishes the light absorption of LITE-1. LITE-1 was treated with H2O2 (0.1 mM) for 5 min at room temperature prior to spectral analysis.
(B) H2O2 treatment does not abolish the photosensitivity of bacterial rhodopsin (bRho) but shifts its absorbance peak from 568 nm to 370 nm. bRho was treated with H2O2 (0.1 mM) for 5 min at room temperature prior to spectral analysis. A similar phenomenon was observed with purified bovine rhodopsin (not shown).
[Figure S5, panels A and B: traces (y axis 0–0.2; per the legend, locomotion speed) under UVA (A) and UVB (B) for worms carrying the lite-1, lite-1(A332V), and lite-1(S226F) transgenes; panel B also shows WT. Scale bar, 5 s.]
Figure S5. Residues S226 and A332 in LITE-1 Are Critical for Its Sensitivity to UVA but Not UVB Light In Vivo, Related to Figures 4 and 5
(A and B) LITE-1(S226F) and LITE-1(A332V) were expressed as a transgene in muscle cells under the myo-3 promoter. Worm locomotion speed was monitored and
quantified with the WormLab system (MBF Bioscience). UVA (350 ± 20 nm, 0.8 mW/mm²) (A) or UVB (280 ± 10 nm, 0.03 mW/mm²) (B) light was directed to the worm,
which induced muscle contraction, leading to paralysis of the worm (locomotion speed reduced to zero). To minimize the effect of the endogenous lite-1 gene on
locomotion speed under UV light, the experiments were done in the lite-1(xu7) mutant background (i.e., all genotypes carried the lite-1(xu7) mutation). Shading along the
traces denotes error bars (SEM). n = 25.
[Figure S6, panels A and B: traces (y axis 0–0.2; per the legend, locomotion speed). (A) UVA: lite-1 transgene and lite-1(W77F) transgene. (B) UVB: lite-1 transgene. Scale bar, 5 s.]
Figure S6. The Two Tryptophan Residues W77 and W328 in LITE-1 Are Required for Its Sensitivity to Both UVA and UVB Light In Vivo, Related
to Figure 6
(A and B) LITE-1(W77F) and LITE-1(W328F) were expressed as a transgene in muscle cells under the myo-3 promoter. Worm locomotion speed was monitored and
quantified with the WormLab system (MBF Bioscience). UVA (350 ± 20 nm, 0.8 mW/mm²) (A) or UVB (280 ± 10 nm, 0.03 mW/mm²) (B) light was directed to the worm.
The two tryptophan mutations disrupted the ability of LITE-1 to mediate UVA- and UVB-light-induced paralysis caused by muscle contraction (locomotion
speed reduced to zero). To minimize the effect of the endogenous lite-1 gene on locomotion speed under UV light, the experiments were done in the lite-1(xu7) mutant
background (i.e., all genotypes carried the lite-1(xu7) mutation). Shading along the traces denotes error bars (SEM). n = 25.
A *
LITE-1 - 74- I Y S W L V F C L L L F T T L R K F N Q V G V R P N G T R E N - L Q - E F F A N - 111
GUR-3 - 76- I Y N Y L T L A I L T A A T I R R I S Q I K Q K S A T N E E K D A A - F H V L N - 114
GUR-1 -114- L F L F R L L A I F P A T T D R K S R R - - - - - K R N H R S I I K L I L Y V N - 148
GUR-4 - 52- - - - - - - - - - L R - - - I D L - - - - - - R K P G A K R N I - - - - - - - N - 66
GUR-5 - 45- - - - - - - - - - L R - - - L D F V - - - - - N S D G W A R K I - - - - - - - N - 60
*
LITE-1 - 314- A Q - S I C W S E V V S I V I W I V N A I L V L L L F S L P A F M I N - 374
GUR-3 - 316- N G I Q A D M A E T F S V A I W L T N T M L A L M L F S I P A F M I A - 350
GUR-1 - 394- V H V K I C W A A Y Q V - - - - - V M A I L H I I I I C S T G M M T N - 423
GUR-4 - 304- Y D L I L C M P - - - - - - - - - - - - - - - T I G L C A F S F F A V - 323
GUR-5 - 297- T D F L I C M P - - - - - - - - - - - - - - - F I L F C T C A F C S V - 316
[Figure S7, panels B–G. (B) Body paralysis (%) under UVA. (C–E) RCaMP ∆R/R (%) under UVA for the gur-3 and gur-3(Y79W) transgenes; scale bar, 10 s; ns, not significant. (F and G) Traces (y axis 0–0.2) under UVA (F) and UVB (G) for the lite-1, gur-3(Y79W), and gur-3 transgenes; scale bar, 5 s.]
Figure S7. Sequence Alignment of C. elegans GR Family Proteins and Additional Data Related to GUR-3, Related to Figure 7
(A) The two tryptophan residues W77 and W328 in LITE-1 are marked with an asterisk in red. W77 is not conserved in any other GR members. W328 is only found
in GUR-3. The sequences between residues 112-313 in LITE-1 are not shown, as there is limited homology in this large segment between LITE-1 and other GRs.
(B) Mutating Y79 to W in GUR-3 does not promote its sensitivity to UVA light in vivo, as shown by paralysis assay. GUR-3(Y79W) and GUR-3 were expressed as a
transgene in muscle cells. Worms were exposed to a 20 s pulse of UVA light (350 ± 20 nm, 0.8 mW/mm²), and those showing muscle contraction-induced
paralysis during light illumination were scored positive. n = 50. Error bars: SEM. p = 0.153 (t test).
(C–E) Mutating Y79 to W in GUR-3 does not promote its sensitivity to UVA light in vivo, as shown by calcium imaging. A 5 s pulse of UVA light (340 ± 20 nm,
0.7 mW/mm²) was directed to the worm. (C) and (D) Imaging traces. (E) Bar graph. n = 20. p = 0.779 (t test).
(F and G) Mutating Y79 to W in GUR-3 promotes its sensitivity to UVB but not UVA light in vivo, as shown by locomotion assay. The assay was done as in Figure S6. The
lite-1 transgene traces were duplicated from Figure S6 and are included for comparison.
Article
DNA Damage Signaling Instructs Polyploid Macrophage Fate in Granulomas
Laura Herrtwich, Indrajit Nanda, Konstantinos Evangelou, ..., Andreas Diefenbach, Philipp Henneke, Antigoni Triantafyllopoulou
Correspondence
diefenbach@uni-mainz.de (A.D.), antigoni.triantafyllopoulou@uniklinik-freiburg.de (A.T.)
In Brief
Polyploid macrophages develop in response to chronic inflammatory signaling from toll-like receptors via replication stress and activation of the DNA damage response.
Highlights
• Polyploid macrophage fate is controlled by persistent inflammatory stimuli
• Polyploid granuloma macrophages form by modified cell divisions and mitotic defects
• Polyploid macrophages grow by overcoming p53-dependent barriers to their proliferation
• Myc and the DNA Damage Response promote polyploid macrophage differentiation
Herrtwich et al., 2016, Cell 167, 1264–1280

Article
DNA Damage Signaling Instructs Polyploid Macrophage Fate in Granulomas
Laura Herrtwich,1,2,29 Indrajit Nanda,3,29 Konstantinos Evangelou,4,29 Teodora Nikolova,5,29 Veronika Horn,1,29 Sagar,6
Daniel Erny,7 Jonathan Stefanowski,8 Leif Rogell,6,9,10 Claudius Klein,11 Kourosh Gharun,2 Marie Follo,11
Maximilian Seidl,12 Bernhard Kremer,2 Nikolas Münke,2 Julia Senges,2 Manfred Fliegauf,2 Tom Aschman,1
Dietmar Pfeifer,11 Sandrine Sarrazin,13 Michael H. Sieweke,13,14 Dirk Wagner,2,15 Christine Dierks,11
Thomas Haaf,3 Thomas Ness,16 Mario M. Zaiss,17 Reinhard E. Voll,1 Sachin D. Deshmukh,18 Marco Prinz,7,19
Torsten Goldmann,20 Christoph Hölscher,21,22,23 Anja E. Hauser,8 Andres J. Lopez-Contreras,24 Dominic Grün,6
Vassilis Gorgoulis,4,25,26,27 Andreas Diefenbach,9,10,* Philipp Henneke,2,28 and Antigoni Triantafyllopoulou1,2,30,*
1Department of Rheumatology and Clinical Immunology, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
2Center of Chronic Immunodeficiency, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
3Institute of Human Genetics, Biozentrum, Am Hubland, 97074 Würzburg, Germany
4Molecular Carcinogenesis Group, Department of Histology and Embryology, School of Medicine, National and Kapodistrian University of Athens, 115 27 Athens, Greece
5Institute of Toxicology, University Medical Center Mainz, 55131 Mainz, Germany
6Max Planck Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
7Institute of Neuropathology, Medical Center, University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
8Immune Dynamics, Charité Universitätsmedizin and Deutsches Rheumaforschungszentrum, 10117 Berlin, Germany
9Institute of Medical Microbiology and Hygiene, University of Mainz Medical Center, 55131 Mainz, Germany
10Research Center for Immunology and Immunotherapy, University of Mainz Medical Center, 55131 Mainz, Germany
11Department of Medicine I, Medical Center, University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
12Department of Pathology, Medical Center, University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
13Aix-Marseille Univ, CNRS, INSERM, CIML, 13288 Marseille, France
14Max-Delbrück-Centrum für Molekulare Medizin in der Helmholtzgemeinschaft (MDC), 13125 Berlin, Germany
15Division of Infectious Diseases, Department of Internal Medicine 2, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
16Eye Center, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
17Department of Internal Medicine 3, Rheumatology and Immunology, University of Erlangen-Nuremberg, 91054 Erlangen, Germany
18Center for Sepsis Control and Care, Jena University Hospital, 07747 Jena, Germany
19BIOSS Centre for Biological Signalling Studies, University of Freiburg, 79106 Freiburg, Germany
20Department of Pathology, Schleswig-Holstein University Hospital, Campus Lübeck and Research Center Borstel, 23845 Borstel, Germany
21Division of Infection Immunology, Research Center Borstel, 23845 Borstel, Germany
22Cluster of Excellence, Inflammation at Interfaces (Borstel-Kiel-Lübeck-Plön), 24118 Kiel, Germany
23German Centre for Infection Research, 23845 Borstel, Germany
24Center for Chromosome Stability, Department of Cellular and Molecular Medicine, Panum Institute, University of Copenhagen, 2200 Copenhagen N, Denmark
25Faculty Institute of Cancer Sciences, Manchester Academic Health Sciences Centre, University of Manchester, Manchester M20 4QL, UK
26Biomedical Research Foundation, Academy of Athens, 115 27 Athens, Greece
27Department of Pathophysiology School of Medicine, National and Kapodistrian University of Athens, 115 27 Athens, Greece
28Center for Pediatrics and Adolescent Medicine, Medical Center, University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
29Co-first author
30Lead Contact
*Correspondence: diefenbach@uni-mainz.de (A.D.), antigoni.triantafyllopoulou@uniklinik-freiburg.de (A.T.)

SUMMARY

Granulomas are immune cell aggregates formed in response to persistent inflammatory stimuli. Granuloma macrophage subsets are diverse and carry varying copy numbers of their genomic information. The molecular programs that control the differentiation of such macrophage populations in response to a chronic stimulus, though critical for disease outcome, have not been defined. Here, we delineate a macrophage differentiation pathway by which a persistent Toll-like receptor (TLR) 2 signal instructs polyploid macrophage fate by inducing replication stress and activating the DNA damage response. Polyploid granuloma-resident macrophages formed via modified cell divisions and mitotic defects and not, as previously thought, by cell-to-cell fusion. TLR2 signaling promoted macrophage polyploidy and suppressed genomic instability by regulating Myc and ATR. We propose that, in the presence of persistent inflammatory stimuli, pathways previously linked to oncogene-initiated carcinogenesis instruct a long-lived granuloma-resident macrophage differentiation program that regulates granulomatous tissue remodeling.

INTRODUCTION

Granulomatous diseases of infectious, autoinflammatory, allergic, and malignant etiologies, such as mycobacterial disease, vasculitis, inflammatory bowel disease, and sarcoidosis affect millions of people worldwide. Their common hallmark is
formation of a granuloma, a compact and often highly ordered aggregate of immune cells that forms in response to a persistent inflammatory stimulus. At its core, the granuloma consists of different macrophage (MF) subsets displaying a range of morphologies, such as epithelioid MF, foam cells (i.e., MF loaded with lipid droplets), mononuclear (MoNucl), binuclear (BiNucl), and multinuclear (MultiNucl) MF (or MMF), and Langhans giant cells (Ramakrishnan, 2012; Williams and Williams, 1983). The molecular programs that control the differentiation of such MF populations in response to a chronic stimulus are likely critical to disease outcome. A prominent example is tuberculosis, a pandemic infectious disease caused by Mycobacterium (M.) tuberculosis. M. tuberculosis infects approximately a third of the world’s population and is the leading cause of death from a bacterial infection worldwide (Nathan, 2009). In tuberculosis, distinct spectra of MF differentiation determine disease outcome. On one end of the spectrum, microbicidal MF kill intracellular bacteria. On the other end, permissive MF provide mycobacteria with a replicative niche. These spectra are characterized by distinct metabolic and effector profiles. Thus, microbicidal MF produce reactive nitrogen species, MF characterized by lipid accumulation (foam cells) are associated with granuloma necrosis and bacterial persistence (Peyron et al., 2008; Russell et al., 2009), and MF expressing extracellular matrix (ECM)-remodeling molecules, such as matrix metalloproteinase 9 (MMP9) are crucial for granuloma formation and bacterial spread (Taylor et al., 2006; Volkman et al., 2010). It is currently unknown if the various phenotypically distinct MF subsets contained in granulomas exhibit distinct metabolic or functional profiles. Thus, the mechanisms controlling MF differentiation and the functional profiles of the various MF subsets contained in granulomas are key to identifying novel strategies to promote host resistance.
Within the scope of understanding MF cell-fate decisions in granulomas, an important and unresolved question relates to MF polyploidization. It is generally believed that the formation of polyploid giant cells can be explained by cell-to-cell fusion (Helming and Gordon, 2009). While this has been well documented for RANKL-induced osteoclasts and for the generation of MMF using stimulation of non-cycling progenitors with mycobacteria or bacterial lipoproteins (BLPs) in vitro (Puissegur et al., 2007), direct evidence for cell-to-cell fusion as the leading process for the genesis of the various polyploid MF subsets found in granulomatous diseases is lacking. The fact that MF in granulomas carry varying copy numbers of their genomic information poses a series of significant basic questions that have not been addressed to date. Does the formation of polyploid MF pose a threat to their genomic stability? What is the role of the DNA damage response in MF differentiation into polyploid subsets? Do polyploid MF constitute a distinct fate that contributes to the pathogenesis of granulomatous diseases?
Here, using an array of techniques, we delineate a MF differentiation pathway in response to persistent inflammatory stimuli. Sensing of BLP controlled the differentiation of proliferating MF precursors into polyploid MF expressing distinct metabolic and ECM remodeling gene expression signatures. Toll-like receptor (TLR)2 signaling via MyD88 promoted MF genome duplications via mitotic defects but not by cell-to-cell fusion. BLP-induced polyploid MF grew further by re-entering the cell cycle and overcoming p53-dependent barriers to their proliferation. TLR2 signaling promoted MF polyploidy and alleviated genomic instability, by regulating Myc and the DNA damage response (DDR). Therefore, we have unlocked a previously unknown and unique role of growth and DDR signals in determining MF differentiation in the presence of persisting inflammatory stimuli.

RESULTS

MF with Varying Numbers of Nuclei Co-localize with Proliferating F4/80+ Precursors in Small Mycobacterial Granulomas
To identify mechanisms by which persistent inflammatory signals instruct MF differentiation in granulomas, we used infection with M. bovis Bacillus Calmette-Guerin (BCG). BCG induced the formation of BiNucl MF and MMF in liver granulomas (Figures S1A–S1B) and increased the numbers of proliferating (Ki67+) F4/80+ cells (Figures S1C and S1D). In smaller granulomas, an organized topographical arrangement of Ki67+F4/80+ precursors and BiNucl MF or MMF emerged: the former were positioned in the outside (Figure S1E), while BiNucl MF and MMF were found primarily in the center of the granuloma (Figure S1F). In more mature granulomas, Ki67+F4/80+ cells were fewer and their location within granulomas was ill defined (Figure S1G). These data raised the possibility that during BCG infection
Cell 167, 1264–1280, November 17, 2016 1265

Figure 1. BLPs Mediate Non-canonical MF Differentiation

(A) Experimental setup for differentiation of MMF using proliferating MF precursors.
(B) Examples of MMF.
(C) Numbers of MMF. Mean ± SD from three independent experiments.
(D) Heatmap of differentially expressed genes, selected based on involvement in MF function or differentiation programs. Gene array was performed with four
independent biological replicates per group.

proliferating precursor cells differentiate into MF with varying numbers of nuclei and potentially varying copy numbers of their genome.

TLR2 and MyD88-Dependent Differentiation of MF Precursors into MMF
To explore the molecular programs that drive the differentiation of MF precursors into polyploid MF, we compared the ability of uncommitted hematopoietic progenitors, common MF dendritic cell progenitors (MDPs), and mature monocytes to differentiate into MMF following stimulation with M-CSF and FSL-1 (a synthetic BLP acting as a ligand for the TLR2/6 complex) (Figure S1H). Intriguingly, only MDP formed numerous large MMF (Figures S1I–S1J). Similar to MDP, CD115high MF precursors (Figure S1K), enriched by culturing bone marrow (BM) cells first for 4–5 days in low concentrations of M-CSF and then re-seeded and stimulated with BCG or FSL-1, differentiated very efficiently into MMF (Figures 1A and 1B), a process that required signaling via the adaptor protein MyD88 (Figures 1B and 1C). MMF started to form after 3 days, and their numbers continuously increased until day 6 of stimulation (Figure S2A). When such CD115high MMF precursors were cultured with M-CSF alone, they uniformly differentiated into conventional BM-derived MF (Figure 1B). These data indicated that the cycling capacity and differentiation potential of precursor cells may be an essential determinant of their ability to form MultiNucl progeny in response to BLP.
Ligands of TLR2/1 (Pam3CSK4) and TLR2/6 (FSL-1) receptor complexes were more efficient in inducing differentiation of MMF than the TLR4 ligand LPS (Figure S2B). Poly(I:C), which signals via TLR3, RIG-I, and MDA5, did not induce MMF formation (Figure S2B). We asked whether cytokines that are known to be crucial for granuloma formation can induce MMF. Tumor necrosis factor (TNF) induced differentiation of MMF as efficiently as FSL-1 (Figure S2C), but interferon (IFN)-γ and interleukin (IL)-1β did not (Figure S2D). Do BLPs induce MMF indirectly by inducing a secreted cytokine, such as TNF? To this end, we mixed CD45.1+Myd88+/+ with congenic CD45.2+Myd88−/− MF precursors in vitro prior to stimulation with FSL-1 (Figures S2E and S2F). Within the same well, only CD45.1+Myd88+/+ cells formed MMF, whereas CD45.2+Myd88−/− cells did not, demonstrating that cell-autonomous MyD88 signaling was required for BLP-induced MMF formation (Figures S2E and S2F). In addition, Tnf−/− MF precursors were not impaired in forming MMF when stimulated with FSL-1 (Figures S2G and S2H). These data indicated that TLR2 and TNF pathways can independently instruct MMF fate.

A Non-canonical MF Differentiation Program Instructs Polyploid MF Fate
MMF contain several nuclei and clearly display a distinct morphology (Figure 1B), suggesting that BLPs induce a non-canonical MF differentiation program, which differs from that induced by M-CSF stimulation alone. To explore this, we performed genome-wide transcriptome analysis of MF precursors with and without stimulation with BLP. When comparing gene expression signatures of MF precursors stimulated with M-CSF and those stimulated with M-CSF and BLP, we found a significant downregulation of MF cell-fate-determining transcription factors (Geissmann et al., 2010), including Mafb and Irf8 (Figure 1D), which was confirmed by qRT-PCR and immunoblot (IB) (Figures 1E–1G).
Since only a fraction of cells are MMF after culture in FSL-1, we performed single-cell (sc) RNA sequencing (RNA-seq) on highly purified FSL-1-stimulated cells with a DNA content of 2c (F2c), >4c (F>4c), and control 2c MF (Ctrl) (Figures S3A and S3B). scRNA-seq analysis using the RaceID2 algorithm (Grün et al., 2016) revealed considerable heterogeneity among Ctrl MF, but also among F2c and F>4c MF (Figures S3C and S3D). Differential gene expression analysis within each of these groups identified a number of genes differentially expressed between F2c and F>4c MF (e.g., Cdk1, Tubb5, Actg1, Mmp9) (Figure S3E). Moreover, Apoe and Csf1r were further downregulated in F>4c compared to F2c MF in clusters 5 and 9, respectively (Figure S3E). These findings indicate that, although F>4c MF cluster together with F2c MF based on whole-transcriptome similarity, they exhibit additional expression changes of various MMF differentiation signature genes indicative of their advanced stage of differentiation. The presence of F>4c MF in different clusters also suggests the existence of multiple differentiation endpoints after FSL-1 stimulation.
While MF differentiated in M-CSF alone expressed robust levels of Mafb, Mafb expression was abruptly shut down in F>4c, and Mafb was already downregulated in F2c MF (Figure 1H). Enforced expression of MafB in MF precursors suppressed the formation of MMF (Figures 1I and 1J). These data suggest that MF precursor differentiation into MMF has transcriptional requirements that are distinct from canonical MF differentiation and may represent a distinct differentiation program driven by a persisting inflammatory stimulus.

Metabolic and Tissue-Remodeling Gene Signatures of MMF
Next, we assessed transcripts that differed at least 2-fold between FSL1-stimulated MF and controls. FSL1-stimulated MF
(E–J) MafB negatively regulates MMF formation. (E) qRT-PCR of Mafb mRNA expression. Mean ± SD of triplicate determinants from three independent ex-
periments. (F and G) Immunoblotting (IB) for MafB and IRF8. Example of two independent experiments. (H) Violin plots comparing expression of Mafb, by scRNA-
seq. (I and J) MF precursors transduced with empty retroviral vector pMX-IRES-EGFP (EV) or pMX-Mafb-IRES-EGFP (MafbV) prior to stimulation. (I) IB for MafB.
(J) Numbers of MMFs. Mean ± SD from three independent experiments.
(K and L) Metabolic gene signatures in BLP-induced MMF. (K) Violin plots comparing expression of Apoe; Abca1 was analyzed by scRNA-seq. (L) IF for Nile Red
is shown.
(M) Heatmap of selected genes differentially expressed in Kupffer cells, granuloma F4/80hi MF, and F4/80low BiNucl and MMF. Means of duplicate determinants
from five to nine independent biological replicates per group.
(E and M) qRT-PCR data normalized relative to Gapdh mRNA expression.
(H and K) y axis, log2 (normalized count+0.1) expression levels; black point, mean of expression level.
*p < 0.05, **p < 0.01; scale bars, 10 μm. See also Figures S1, S2, S3, and S4.
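The expression scale given for (H and K), log2(normalized count + 0.1), and the "at least 2-fold" criterion applied to transcripts in the text can be illustrated with a short sketch. The input values below are made-up placeholders, not data from the study, and the helper names are hypothetical.

```python
import math

PSEUDOCOUNT = 0.1  # offset stated in the legend; keeps log2 defined at zero counts


def log2_expression(normalized_count: float) -> float:
    """Expression level on the violin-plot scale: log2(normalized count + 0.1)."""
    return math.log2(normalized_count + PSEUDOCOUNT)


def differs_at_least_twofold(a: float, b: float) -> bool:
    """True when two positive expression values differ by a factor of 2 or more."""
    return abs(math.log2(a / b)) >= 1.0


# Placeholder inputs (illustrative only, not measurements).
undetected = log2_expression(0.0)   # log2(0.1), the floor of the y axis
detected = log2_expression(15.9)    # log2(16.0)

print(f"undetected transcript: {undetected:.2f}")
print(f"detected transcript:   {detected:.2f}")
print(differs_at_least_twofold(8.0, 4.0), differs_at_least_twofold(3.0, 2.0))
```

On this scale a cell with zero normalized counts sits at log2(0.1), about -3.32, rather than at negative infinity, and a 2-fold difference corresponds to one unit on a log2 axis.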

significantly downregulated expression of genes regulating cholesterol efflux, such as Abcg, Abca, and Apoe (Figure 1D), which was verified by scRNA-seq (Figure 1K). Reactome pathway analysis revealed that F2c MF were highly enriched in genes involved in cholesterol biosynthesis while F>4c MF downregulate lipid metabolizing gene signatures (Figure S4). Accordingly, Nile Red immunofluorescence (IF) revealed lipid body accumulation in FSL-1-stimulated MoNucl MF and MMF (Figure 1L), indicating that the BLP-controlled MMF differentiation program leads to functional changes in cholesterol metabolism.
In FSL1-stimulated MF, transcripts for MF-attracting chemokines (i.e., Ccl2, Ccl7, and Ccl12) were 7- to 12-fold upregulated (Figure 1D). scRNA-seq revealed that these chemokines are already upregulated in F2c and remain highly expressed in F>4c MF (Figure S3F). BLPs induced increased expression of genes encoding for ECM remodeling proteases, such as Mmp8, Mmp9, Chi3l3, Ctsk, and Lox (Figures 1D and S3F), and ECM remodeling gene pathways were highly enriched in both F2c and F>4c MF (Figure S3E, cluster 13, and Figure S4). This is an intriguing finding because ECM remodeling enzymes such as MMP9 are crucial for MF recruitment and granuloma formation (Taylor et al., 2006; Volkman et al., 2010).
Finally, we used laser capture microdissection (LCM) to isolate F4/80hi MF and F4/80low BiNucl and MMF from BCG liver granulomas and control liver-resident MF to confirm that the gene signatures observed in vitro can also be found in MMF from mycobacterial granulomas in vivo. qRT-PCR-based quantification of selected transcripts revealed that cholesterol transporters were downregulated, whereas genes associated with ECM remodeling were strongly upregulated in MMF directly isolated from mycobacterial granuloma (Figures 1M and S3G–S3I). Thus, MMF differentiating in the presence of a persistent TLR2 stimulus show alterations in cholesterol and lipid metabolism and ECM remodeling gene expression signatures.

MMF Form by a Mechanism Distinct from Cell-to-Cell Fusion
We attempted to quantify cell-to-cell fusion by mixing CD45.1+ with congenic CD45.2+ MF precursors, followed by stimulation with BLP (Figure 2A). As a positive control, we used RANKL, a cytokine known to induce osteoclast formation by cell-to-cell fusion (Vignery, 2005). While RANKL-stimulated MF precursors differentiated into MultiNucl osteoclast-like CD45.1+ CD45.2+ cells indicating cell-to-cell fusion, BLP stimulation did not lead to the generation of MMF co-expressing CD45.1 and CD45.2 (Figure 2A). To explore whether fusion may occur in vivo, we generated mixed BM chimeras by transplanting lethally irradiated CD45.2+ mice with a mixture of BM cells obtained from CD45.1+ and CD45.2+ donors (Figure S5A). BM chimeric mice were infected with BCG (Figure 2B) and, if the cell-to-cell fusion paradigm was true, we expected roughly 50% of polyploid MF to co-express CD45.1 and CD45.2. However, we did not identify any BiNucl MF or MMF that expressed both CD45.1 and CD45.2 in BCG granulomas (Figures 2B, 2C, and S5B). As a control for our staining procedure, we employed BM from (CD45.1xCD45.2) F1 mice, leukocytes of which were double positive for CD45.1 and CD45.2. Thus, IF staining for CD45.1 and CD45.2 faithfully detects double positive cells, if they exist (Figure S5C). These data demonstrate that differentiation of MMF in the presence of a persistent inflammatory stimulus occurs by a previously unappreciated process that does not involve cell-to-cell fusion.

Persistent Exposure to BLP Regulates MF Ploidy via Modified Cell Division
In mammalian cells, other than MF, polyploidization via modified cell division has been described (i.e., endoreplication, cytokinesis failure) (Figures S5D and S5E) (Davoli et al., 2010). To explore this possibility, we analyzed the nuclear area and nuclear DNA content of single nuclei in an unbiased, high-content fashion using a quantitative image-based cytometry (QIBC) protocol, allowing us to discriminate tetraploid cells with two diploid nuclei (indicating cytokinesis failure) from mononucleated 4c cells (indicating endoreplication) (Carvalho et al., 2011; Toledo et al., 2013). Both BCG and BLP stimulation of MF precursors induced a significant, MyD88-dependent increase in nuclear area and in DNA content per single nucleus (Figures 2D–2F). Fluorescence in situ hybridization (FISH) in interphase nuclei confirmed increased DNA content per nucleus (Figures 2G and 2H).
QIBC analysis of the DNA content of single nuclei was combined with IF for β-tubulin to accurately identify single cells and revealed that the majority (74.8%) of polyploid nuclei were contained within BiNucl cells, whereas 16% of polyploid nuclei were contained within MoNucl cells (Figure 2I). These data show that chronic stimulation with BLP promoted recurrent cytokinesis failure and, to a lesser extent, endoreplication (Figures S5D and S5G) leading to the formation of polyploid MMF and polyploid MoNucl MF, respectively.
In vivo, polyploid nuclei were identifiable in liver granulomas induced by BCG (Figure S5F), albeit in small numbers. Infection
Figure 2. MMF Formation from MF Precursors Does Not Involve Cell-to-Cell Fusion
(A) IF for CD45.1, CD45.2 in stimulated MF precursors. White arrows, CD45.1+CD45.2– MMF (middle), CD45.1+CD45.2+ osteoclast-like MMF (bottom).
(B and C) IF for CD45.1, CD45.2 on liver granuloma cryosections from BCG-infected CD45.1:CD45.2 chimeras. (B) Example of a CD45.1+CD45.2– MMF.
(C) Numbers of BiNucl and MMF with the indicated phenotype; n = 5 chimeras, N.D., not detectable.
(D–K) BLPs and mycobacteria regulate nuclear ploidy. (D–F and I) QIBC. (G, H, J, and K) FISH. (D) Nuclear area per single nucleus. Black line, mean nuclear area.
One representative experiment of three independent experiments. (E and F) DNA content per nucleus. (E) Representative histograms. Red line, cutoff for DNA
content >4c. (F) Percentage of polyploid (>4c) nuclei, as in (E). Mean ± SD from three independent experiments. (G and H) FISH for chromosomes 2, 11, X, 16
in vitro. (G) Representative images. (H) Percentage of total nuclei with the indicated number of FISH signals. 155–212 nuclei per condition were analyzed. Mean ±
SD from three independent experiments. (I) Distribution of polyploid nuclei in BLP-stimulated MoNucl, BiNucl, and MultiNucl cells. (J and K) FISH for chromo-
somes 2, 11 in lung cryosections from M. tuberculosis-infected WT and Il13Tg mice. (J) Representative images. (K) Numbers of polyploid (FISH signals R2;2 per
nucleus) nuclei from 25 visual fields in MF-rich granuloma areas.
*p < 0.05, **p < 0.01, ***p < 0.001; scale bars, 10 μm. See also Figure S5.
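The expectation stated in the text, that roughly 50% of polyploid MF in the mixed chimeras would co-express CD45.1 and CD45.2 if they arose by fusion, follows from a simple probability argument. The sketch below assumes random fusion among precursors, an equal mixture of CD45.1+ and CD45.2+ cells, and independent draws of fusion partners; these modeling assumptions and the function name are illustrative and are not taken from the source.

```python
def double_positive_fraction(p_cd45_1: float, n_cells_fused: int = 2) -> float:
    """Probability that a fusion product carries both CD45.1 and CD45.2.

    Assumes each fused cell is drawn independently: CD45.1+ with probability
    p_cd45_1, CD45.2+ otherwise (a simplifying assumption).
    """
    if not 0.0 <= p_cd45_1 <= 1.0:
        raise ValueError("p_cd45_1 must lie between 0 and 1")
    if n_cells_fused < 2:
        raise ValueError("fusion needs at least two cells")
    q = 1.0 - p_cd45_1
    # Complement of the two single-marker outcomes (all CD45.1 or all CD45.2).
    return 1.0 - p_cd45_1 ** n_cells_fused - q ** n_cells_fused


binucleate = double_positive_fraction(0.5, 2)    # two fused cells
trinucleate = double_positive_fraction(0.5, 3)   # three fused cells

print(f"two cells fused:   {binucleate:.2f}")
print(f"three cells fused: {trinucleate:.2f}")
```

Under this model a 1:1 mixture gives 0.5 for fusion of two cells and a higher fraction for more fusion partners, so the absence of any double-positive BiNucl MF or MMF reported in (C) is hard to reconcile with fusion as the main route.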
Figure 3. TLR2-Induced MMF Formation by Cytokinesis Failure

(A–H) Time-lapse imaging of MF precursors after 72 hr of stimulation. (A) Percentage of MoNucl, BiNucl, and MultiNucl MF at the beginning of the imaging
session. Mean ± SD from three independent experiments.

of B6 mice with M. tuberculosis does not induce the formation of multinucleated giant cells or granuloma necrosis, both important features of human tuberculosis. In contrast, IL-13-overexpressing (Il13Tg) mice form pulmonary necrotizing granulomas with numerous MMF, foam cells, and TNF-producing MF (Heitmann et al., 2014). Strikingly, the presence of polyploid nuclei was increased in MF-rich areas of M. tuberculosis-infected Il13Tg lung granulomas as shown by FISH for chromosomes 2 and 11 (Figures 2J and 2K). Collectively, these data demonstrate that Mycobacteria and BLP controlled the DNA content of MF in a process that required signaling via the adaptor protein MyD88.

Continuous Stimulation with BLP Introduces Mitotic Defects Leading to MMF Formation
To directly assess whether the formation of MMF may occur by modifying cytokinesis, we employed time-lapse live-cell imaging (LCI) of MF precursors transduced with a retrovirus encoding the histone protein 2B (H2B) fused with GFP. At the beginning of the LCI session, more than 80% of the cells were MoNucl MF (Figure 3A). MoNucl MF precursors that were stimulated with M-CSF only were small and underwent predominantly successful divisions (Figure 3B; Movie S1). In contrast, MoNucl cells stimulated with BLP appeared large (Figures 3C–3E) and underwent cytokinesis failure with significantly increased frequency, producing predominantly BiNucl progeny (Figures 3C–3G; Movies S2 and S3). Surprisingly, LCI analysis of all mitotic events revealed that BiNucl cells re-entered mitosis (Figure 3C; Movie S4), and the proportion of BiNucl cells entering mitosis was increased by BLP (Figure 3H). In addition, BLP significantly increased the rate of cytokinesis failure within mitotic events of BiNucl parent cells (Figure 3F).
We examined whether MMF formed following failed division of MoNucl versus BiNucl parent cells (Figures 3I–3J). To address this, we fate mapped 307 single mitotic events by determining the number of nuclei in parent and daughter cells in each single mitotic event and the outcome of cytokinesis (success versus failure). MultiNucl cells were generated more often following failed cytokinesis of BiNucl, rather than MoNucl, parent cells (Figure 3G), indicating that at least one round of cytokinesis failure preceded multinucleation. MMF formation from MoNucl parent cells also occurred (Figure 3E; Movie S5), but with less frequency (Figure 3G).
How do BLPs cause cytokinesis failure? Cytokinesis failure can result from perturbations in cleavage furrow formation, cleavage furrow stabilization, midbody formation, or abscission (Normand and King, 2010). Tracking single mitotic events by LCI revealed that cytokinesis failed by cleavage furrow regression (Figure 3C; Movie S2). Chromatin persisting in the midzone is an important cause of cleavage furrow regression and cytokinesis failure in cells with lagging chromosomes or acentric chromosome fragments (Davoli and de Lange, 2011). Indeed, lagging chromosomes at the cleavage furrow were present in BLP-stimulated MF precursors (Figures 3D and 3E, bottom panels; Movies S6 and S7). Lagging chromosomes or chromatid fragments that fail to be included in the daughter nuclei following cytokinesis failure are eventually enclosed by a nuclear membrane, forming a micronucleus (MN) (Fenech et al., 2011). We quantified MN by QIBC of interphase MF. This analysis revealed that BLP induced a significant increase in the numbers of MF containing MN (Figure 3K). Importantly, we also identified lagging chromosomes at anaphases of dividing MF in BCG-induced granulomas, indicating that such mitotic defects also occurred in granuloma-associated MMF in vivo (Figure 3L). Collectively, the data demonstrated that BLP promoted the formation of polyploid MF via mitotic defects.

TLR2 Signaling Confers a Proliferation Advantage to Polyploid Macrophage Progeny
We interrogated cell cycle-related gene expression signatures in BLP-stimulated MF. BLP stimulation downregulated the expression of cell-cycle genes and proteins at the onset of MMF formation (day 3), as shown by gene set enrichment analysis (GSEA) (Figure 4A), qRT-PCR, and IB for cyclin D1 and D2 (Figures S6A and S6B). In contrast, following MMF formation (day 6), cell-cycle genes were significantly enriched among the genes that were upregulated by BLP (Figure 4A). scRNA-seq at the same time point (day 6) revealed upregulated expression of the mitotic regulators Ccnb1 and Cdk1 as well as increased expression of the DNA replication licensing factor Mcm6 and the DNA synthesis-promoting genes Rrm1 and Rrm2 in F>4c MF (Figure 4B). In addition, gene signatures of DNA replication and cell-cycle progression were highly enriched in FSL-1-stimulated MF (Figure S4). These data suggested that BLP initially suppressed the proliferation of MF precursors (day 3), while at a later time
(B–E) Examples of still images from selected time points. (B) Successful cell divisions in medium without FSL-1. The corresponding movie is Movie S1.
(C–E) Examples of cytokinesis failure outcomes in FSL-1-stimulated MF precursors.
(C) Cytokinesis failure leads to a BiNucl daughter cell (top), which re-enters mitosis and re-fails cytokinesis, generating again a BiNucl daughter cell (bottom). The
corresponding movies are Movies S2 (top) and S4 (bottom).
(D) Cytokinesis failure leads to a BiNucl daughter cell (top). Lagging chromosome (yellow arrow) visualized at the cleavage furrow, cleavage furrow regression,
and formation of a BiNucl daughter cell containing a micronucleus (MN, bottom). The corresponding movies are Movies S3 (top) and S6 (bottom).
(E) MoNucl parent cell undergoes a tripolar mitosis and fails cytokinesis producing a MultiNucl daughter cell (top). Yellow arrows, lagging chromosomes and MN
(bottom). The corresponding movies are Movies S5 (top) and S7 (bottom).
(F) Outcome of single-cell divisions from MoNucl and BiNucl parent cells. Mean ± SD from three independent experiments.
(G) Percentage of MoNucl, BiNucl, and MultiNucl daughter cells per 100 mitoses.
(H) Percentage of MoNucl and BiNucl cells entering mitosis during the live-cell imaging session.
(I and J) MMF formation via endoreplication and cytokinesis failure (I) or recurrent cytokinesis failure (J).
(K) Percentage of cells containing MN by QIBC. Mean ± SD from three independent experiments.
(L) Lagging chromosomes in a mitotic macrophage in BCG liver granuloma.
(G and H) n > 300 mitotic events, representative of two independent experiments. *p < 0.05; scale bars, 10 μm; timescale, hours:minutes. See also Movies S1, S2,
S3, S4, S5, S6, and S7.
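The fate-mapping tally summarized in panels (F) and (G) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the event encoding (a parent nuclei count plus a list of daughter nuclei counts), the rule that a mitosis yielding a single daughter cell counts as cytokinesis failure, and the names `classify_nuclei`, `tally_fates`, and `failure_rate` are all assumptions made for the example, and the events are toy values rather than data from the study.

```python
from collections import Counter

def classify_nuclei(n_nuclei):
    """Label a cell as MoNucl (1), BiNucl (2), or MultiNucl (>2 nuclei)."""
    if n_nuclei < 1:
        raise ValueError("a cell needs at least one nucleus")
    if n_nuclei == 1:
        return "MoNucl"
    if n_nuclei == 2:
        return "BiNucl"
    return "MultiNucl"

def tally_fates(events):
    """Group mitotic events by parent class.

    events: iterable of (parent_nuclei, [daughter_nuclei, ...]) tuples.
    Assumption: one daughter cell means cytokinesis failed, two means success.
    """
    summary = {}
    for parent_nuclei, daughters in events:
        entry = summary.setdefault(
            classify_nuclei(parent_nuclei),
            {"mitoses": 0, "failed": 0, "daughters": Counter()},
        )
        entry["mitoses"] += 1
        if len(daughters) == 1:
            entry["failed"] += 1
        entry["daughters"].update(classify_nuclei(d) for d in daughters)
    return summary

def failure_rate(summary, parent_class):
    """Percentage of mitoses of one parent class that failed cytokinesis."""
    entry = summary[parent_class]
    return 100.0 * entry["failed"] / entry["mitoses"]

# Toy events (hypothetical, not measurements from the paper).
toy_events = [
    (1, [1, 1]),  # MoNucl parent, successful division
    (1, [2]),     # MoNucl parent, failed cytokinesis, BiNucl daughter
    (2, [2]),     # BiNucl parent, failed cytokinesis, BiNucl daughter
    (2, [4]),     # BiNucl parent, failed cytokinesis, MultiNucl daughter
]
toy_summary = tally_fates(toy_events)
print(failure_rate(toy_summary, "MoNucl"))
print(dict(toy_summary["BiNucl"]["daughters"]))
```

Keeping the daughter classes in a per-parent `Counter` makes it straightforward to express daughter outcomes per 100 mitoses, as in panel (G), by dividing each count by the number of mitoses for that parent class.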

Figure 4. TLR2 Signaling Confers a Proliferation Advantage to Polyploid MF Progeny

(A) GSEA for cell-cycle genes. Gene array was performed with four independent biological replicates per group.
(B) Violin plots comparing expression of Ccnd1, Ccnb1, Cdk1, Mcm6, Rrm1, Rrm2, by scRNA-seq. y axis, log2 (normalized count+0.1) expression levels; black
point, mean of expression level.
(C) Examples of metaphase spreads.
(D) Percentage of metaphases with 2c versus ≥4c ploidy. Metaphases with 38–40 chromosomes were grouped as “2c”; metaphases with 78–80 or >80
chromosomes were grouped as “≥4c”. Mean ± SD from three independent experiments. n = 22–101 metaphases per condition.
(E and F) Spectral karyotyping (SKY). Ten metaphases per condition were analyzed. (E) Example of control metaphase with a 2c chromosome count. (F) Example
of metaphase with a 4c chromosome count after FSL-1 stimulation.
(G and H) BLPs induce chromosomal aberrations (CA). (G) Example of metaphase spread with CA: triradial chromatid exchange (red arrow), dicentric chro-
mosome (green arrow), acentric fragments (black arrow), Robertsonian translocation (brown arrow), and complex rearrangements (three chromosomes, purple
arrow). (H) Percentage of all metaphases with CA. Mean ± SD from five independent experiments.
**p < 0.01, ****p < 0.0001. See also Figure S6.
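The grouping rule stated for panel (D) can be written as a small helper. The sketch below is illustrative only: the function names are hypothetical, and returning `None` for chromosome counts that fall outside the ranges given in the legend is an assumption, since the legend does not say how such metaphases were handled.

```python
def ploidy_group(chromosome_count):
    """Bin one metaphase by chromosome count, following the legend of panel (D).

    38-40 chromosomes -> "2c"; 78 or more -> ">=4c".
    Counts outside these ranges are not covered by the legend, so this
    sketch returns None for them (an assumption, not the authors' rule).
    """
    if 38 <= chromosome_count <= 40:
        return "2c"
    if chromosome_count >= 78:
        return ">=4c"
    return None

def percent_polyploid(chromosome_counts):
    """Percentage of classified metaphases that fall in the >=4c group."""
    groups = [ploidy_group(c) for c in chromosome_counts]
    classified = [g for g in groups if g is not None]
    if not classified:
        raise ValueError("no metaphase fell into a defined ploidy bin")
    return 100.0 * classified.count(">=4c") / len(classified)

# Toy chromosome counts (hypothetical, not data from the paper).
print(percent_polyploid([40, 39, 80, 92]))
```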


point (day 6) BLP re-programmed MMF to express cell-cycle genes to promote mitosis and DNA synthesis required for polyploidy.
In agreement, BLP increased the numbers of BiNucl cells entering mitosis (Figure 3H). To confirm that BiNucl cells re-entering mitosis were indeed polyploid and did not represent diploid cells with their DNA divided into subdiploid nuclei, we performed chromosome counts and SKY karyotyping in metaphase spreads. BLP induced a significant increase in polyploid metaphases (Figures 4C–4F), supporting the conclusion that BLPs promote mitotic entry of polyploid cells. To explore the effects of BLP on DNA synthesis, we quantified bromodeoxyuridine (BrdU) incorporation per single nucleus (Figures S6C–S6E). BrdU+ nuclei with a 2c-4c DNA content (gate A) were gated separately from BrdU+ nuclei with DNA content >4c (gate B; Figure S6D). The rate of BrdU incorporation by nuclei with 2c-4c DNA content was significantly reduced in the presence of BLP in a MyD88-dependent manner (Figure S6E). In contrast, BLP increased the rate of BrdU incorporation into nuclei with DNA content >4c when compared with the few 4c nuclei contained within the control cultures (Figure S6E). Together, these results demonstrate that persistent stimulation with BLP confers a cell-cycle “advantage” to polyploid MF, allowing them to bypass proliferation barriers that normally suppress the proliferation of polyploid cells.

Continuous Stimulation with BLP Induces Replication Stress and Activates the DDR
BLP induced a significant increase in micronuclei (Figure 3K) and chromosomal aberrations (Figures 4G and 4H) indicating genomic instability (Fenech et al., 2011), and proliferation of polyploid cells can pose an additional threat to genomic stability (Ganem and Pellman, 2007). Thus, we asked by which pathways BLPs promote the polyploid cell fate via recurrent mitotic defects despite the genomic instability inherent in this process. Replication stress (RS), defined as slowing of DNA synthesis due to stalling of replication fork progression, is intrinsically linked to suppressed DNA synthesis and to activation of the RS response, primarily mediated by the DDR kinase ataxia telangiectasia and Rad3 related (ATR) (Zeman and Cimprich, 2014). To test whether BLP stimulation induced DNA damage signaling in the S phase, we measured the ATR-mediated phosphorylation of the histone variant H2AX (called gH2AX), an early histone modification induced by DNA damage, and correlated its levels with nuclear DNA content by QIBC. MF precursors differentiated in the presence of BLP displayed high levels of gH2AX (Figures 5A–5C), often with a pan-nuclear pattern of staining (Figure 5A), consistent with ATR activation and RS (Toledo et al., 2008). gH2AX levels were highest in nuclei with DNA content between 2c and 4c (Figure 5B). EdU+ nuclei of BLP-stimulated MF expressed high levels of gH2AX (Figures 5D–5F), and BLP suppressed the rate of replication fork progression in DNA combing assays (Figures 5G and 5H) and increased levels of phosphorylated (p-)RPA (Replication Protein A), both of which demonstrate RS (Figures 5I and 5J). Importantly, nuclei with high gH2AX levels also had high p-RPA levels (Figure 5K), consistent with an ongoing RS response. RS and activated DDR in BLP-stimulated MF were further confirmed by p53-binding protein 1 (53BP1) foci (Lukas et al., 2011) and p-CHK2 (Checkpoint Kinase 2) (Bartek and Lukas, 2003) (Figures 5L–5N). Together, these data demonstrated that BLPs induce RS and activate the DDR in MF precursors.

ATR-Dependent DDR Pathways Secure Genomic Stability Further Promoting MF Polyploidy
Our data indicated that TLR2 signaling induces RS and activates the DDR, initiating DNA repair processes in MF, as indicated by GSEA showing significant enrichment for cell-cycle repair genes (false discovery rate [FDR] <0.001) (Figure 5O), a finding confirmed by scRNA-seq (Figure S4). We examined the effects of pharmacological inhibition of ATR using low concentrations of a highly specific inhibitor, ETP-46464 (Toledo et al., 2011). ATR inhibition significantly suppressed the levels of gH2AX during MF differentiation in the presence of BLP (Figure 5P) and increased mitotic defects and genomic instability as demonstrated by the significantly augmented numbers of MN (Figure 5Q). These data suggest that ATR inhibition increased RS (Toledo et al., 2011). Indeed, BLP-stimulated MF showed increased cytokinesis failure after ATR inhibition as evidenced by increased numbers of BiNucl MF (Figure 5R). Thus, ATR limits RS while, on the other hand, it is known to enhance genomic stability of binucleated cells and may support their re-replication. In agreement, ATR inhibition decreased the DNA content per individual nucleus, indicating decreased re-replication (Figures 5S and 5T). Thus, ATR limits mitotic defects and
Figure 5. BLPs Induce RS and Activate the DDR

(A–F) TLR2-induced DDR in the S phase, QIBC; (A–C) IF for gH2AX, DAPI. (A) Representative images. (B) Mean gH2AX signals per nucleus versus total DAPI
intensity. Red lines, cutoff for gH2AXhi nuclei. (C) Percentage of gH2AXhi nuclei, as in (B). (D–F) IF for gH2AX levels, DAPI, and EdU incorporation. (D) Repre-
sentative images. (E) Mean EdU versus total DAPI intensity (top) and gH2AX versus total DAPI intensity (bottom). Purple line (top), cutoff for EdU positivity. Darker
dots (bottom), EdU+ nuclei. Red line (bottom), cutoff for gH2AX positivity. (F) Percentage of EdU+ nuclei that are gH2AX+, as in (E).
(G and H) Replication fork speed, by DNA combing. (G) Examples of combed DNA fibers with replication tracts: IdU (green), CldU (red). (H) Replication fork speed
distribution. Data are pooled from five independent experiments with n = 250 fibers scored per condition.
(I–K) Regulation of p-RPA levels by BLP, QIBC. (I) Representative images. (J and K) p-RPA levels per single nucleus; J: control versus FSL-1; K: FSL-1 gH2AXhi
versus FSL-1 gH2AXlo/hi.
(L and M) Regulation of 53BP1 foci by BLP, QIBC. (L) Representative images. (M) Percentage of nuclei with more than four 53BP1 foci per nucleus.
(N) IB for p-CHK2.
(O) GSEA for the REACTOME_DNA_repair gene set. Gene array was performed with four independent biological replicates per group.
(P–T) ATR mediates polyploidy and genetic stability, QIBC. (P) Percentage of gH2AXhi nuclei; cutoff for gH2AXhi expression as in (E). (Q) Percentage of cells with
MN. (R) Percentage of BiNucl and MMF. (S and T) DNA content per single nucleus. (S) Representative images. (T) Percentage of polyploid (>4c) nuclei.
Mean ± SD from three (C, F, P, Q, R, and T) or two (M) independent experiments are shown. Example from two (J and K) and three (N) independent experiments.
*p < 0.05, **p < 0.01; scale bar, 10 μm.
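The percentages reported in panels such as (C) and (F) amount to thresholding per-nucleus intensities, optionally within a parent gate (for example, gH2AX+ nuclei among EdU+ nuclei). The following Python sketch shows that calculation under stated assumptions: the dictionary keys, the cutoff values, and the function name `percent_positive` are hypothetical, the intensities are toy numbers, and the legend does not specify how the cutoffs were actually chosen.

```python
import math

def percent_positive(nuclei, marker, cutoff, parent_gate=None):
    """Percentage of nuclei whose `marker` intensity exceeds `cutoff`.

    nuclei: list of dicts of per-nucleus mean intensities (hypothetical keys).
    parent_gate: optional predicate restricting the denominator,
                 e.g. EdU+ nuclei only.
    Returns NaN when no nucleus passes the parent gate.
    """
    gated = [n for n in nuclei if parent_gate is None or parent_gate(n)]
    if not gated:
        return math.nan
    positive = sum(1 for n in gated if n[marker] > cutoff)
    return 100.0 * positive / len(gated)

# Toy intensities in arbitrary units (hypothetical, not data from the paper).
toy_nuclei = [
    {"edu": 5.0, "gh2ax": 9.0},
    {"edu": 6.0, "gh2ax": 8.0},
    {"edu": 5.5, "gh2ax": 1.0},
    {"edu": 0.2, "gh2ax": 7.0},
    {"edu": 0.1, "gh2ax": 0.5},
]
EDU_CUTOFF = 2.0    # assumed cutoff for EdU positivity
GH2AX_CUTOFF = 4.0  # assumed cutoff for gH2AX-high nuclei

# gH2AX-high nuclei among all nuclei, then among EdU+ nuclei only.
all_high = percent_positive(toy_nuclei, "gh2ax", GH2AX_CUTOFF)
edu_high = percent_positive(
    toy_nuclei, "gh2ax", GH2AX_CUTOFF,
    parent_gate=lambda n: n["edu"] > EDU_CUTOFF,
)
print(all_high, edu_high)
```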


genomic instability but, at the same time, is required to support the re-replication of BiNucl MF.

RS and Activated DDR in Human and Mouse Granulomas
To test whether MF precursor differentiation into polyploid MF occurs in the context of RS in vivo, we examined BCG-induced liver granulomas. Strikingly, 3 weeks after infection, gH2AX levels were significantly increased in granuloma-associated MF and the increased gH2AX signals co-localized with BiNucl MF and MMF at the granuloma core (Figures 6A and 6B). Furthermore, gH2AXhi MF displayed a pan-nuclear gH2AX staining pattern, consistent with RS (Figure 6A). Similarly, M. tuberculosis-induced lung granuloma MF of Il13Tg mice displayed high levels of gH2AX with a pan-nuclear staining pattern (Figures 6C and 6D) and large F4/80+ gH2AXhi mitotic figures, some of them with lagging chromosomes (Figure 6C). The data support the view that RS and mitotic defects are prominent features under inflammatory conditions in vivo that lead to severe granulomatous immunopathology and promote differentiation of polyploid MF.
Next, we analyzed ten human biopsies from M. tuberculosis-infected patients. MF in granuloma areas containing MMF showed significantly increased levels of pan-nuclear gH2AX (Figures 6E, 6F, and S7A). Upregulation of gH2AX was prominent in MMF, supporting a role for the DDR pathway in MMF differentiation in human M. tuberculosis granulomas. To confirm that increased pan-nuclear gH2AX represents RS and activated DDR, we examined p-RPA expression and numbers of 53BP1 foci. Both were prominently increased in MMF (Figures 6G, 6H, S7B, and S7C). These findings could be extended to sarcoidosis and giant cell arteritis granulomas (Figures 6E–6H). GSEA of previously published gene expression data (Kim et al., 2010; Subbian et al., 2015) from caseous human lung M. tuberculosis granulomas and normal human lung tissue revealed significant enrichment for DNA repair genes (FDR <0.001; Figure 6I). Collectively, the data suggest that RS and activated DDR are crucial determinants of MMF differentiation in a larger spectrum of human MMF-rich granulomas of infectious and non-infectious etiology.

TLR2-Induced Myc Activation Controls RS and Bypasses p53 Barriers to Proliferation
The proliferation potential of genetically unstable polyploid cells is countered by p53-dependent barriers leading to cell-cycle arrest (Andreassen et al., 2001; Bartkova et al., 2006; Fujiwara et al., 2005; Ganem and Pellman, 2007). GSEA revealed a highly significant enrichment for p53-dependent genes among those upregulated by BLP (Figure S7D). We hypothesized that activated p53 signaling poses a barrier to the differentiation of MF precursors into polyploid MF and their re-replication, a barrier that is bypassed in the presence of a chronically persisting inflammatory stimulus. To address this, we analyzed numbers of BiNucl MF and MMF and nuclear ploidy in BM-derived p53−/− MF and wild-type controls, with and without stimulation with BLP. BLP stimulation of p53−/− MF precursors generated a strikingly increased number of BiNucl MF and MMF (Figures 7A and 7B), and nuclear ploidy was significantly upregulated (Figure 7C). These data confirm that p53 poses a barrier to the formation and proliferation of polyploid MF by BLP.
We then sought to explain how BLP activated the DDR and licensed the differentiation of polyploid MF. An important target of growth factor signaling is the transcription factor Myc, a known inducer of RS as well as a master regulator of cellular proliferation and growth. Microarray, gene, and protein expression data in vitro and in vivo demonstrated that MMF formation was linked to increased expression of Myc (Figures 1D, 1M, 7D, and 7E). We asked whether the processes controlling MMF generation (i.e., RS and proliferation of polyploid cells against p53-imposed barriers) require Myc signaling. To inhibit Myc in a temporally controlled manner and without interfering with the cell cycle of MF progenitors, we used 10058-F4, a widely used and highly specific small molecule inhibitor of Myc (Yin et al., 2003). Myc inhibition abrogated high nuclear gH2AX levels (Figures 7F–7H), in particular in proliferating cells (Figures 7I–7J). In line with a role for Myc as a mediator of TLR2-induced RS signaling, Myc inhibition reversed the TLR2-induced suppression of MF precursor proliferation (Figures 7I and 7K). Furthermore, the changes in cholesterol metabolism induced by continuous stimulation of MF precursors with BLP were a Myc-dependent event (Figures 7L and 7M). Collectively, our data assign a central role to the TLR2-Myc-DDR signaling axis in the cellular cascades driving differentiation of MF precursors into MMF.

DISCUSSION

The molecular programs that control MF differentiation in the presence of a persistent inflammatory stimulus, although critical
Figure 6. RS and Activated DDR in Granulomas Enriched in MMF In Vivo

(A and B) IF for gH2AX, mouse liver cryosections. (A) Pan-nuclear staining pattern of gH2AX. Representative images. Scale bars, 10 μm. (B) Numbers of gH2AXhi
MF per granuloma. n = 3 mice per time point.
(C and D) IF for gH2AX, mouse lung cryosections. (C) Representative images. White arrow, large gH2AX+ nucleus, with pan-nuclear staining pattern. Yellow arrow,
large F4/80+ mitotic figure with lagging chromosome. Scale bars, 10 μm. (D) Numbers of gH2AXhi F4/80+ MF per granuloma field from three independent
experiments. (A and C) Examples of five independent experiments.
(E–H) RS and DDR in human lung, sarcoid skin, and giant cell arteritis granulomas. IH (E and F) and IF (G and H) in formalin-fixed paraffin-embedded sections.
(E) Examples from 10 to 15 patient biopsies per granulomatous disease. Scale bars, left: 200 μm, right: 50 μm. (F) Mean values of the number of gH2AXhi nuclei per
visual field in MF-rich granuloma areas versus adjacent uninvolved tissue per sample. A total of 500–1,000 cells (10 to 20 high-power fields) were evaluated per
sample in granuloma and adjacent unaffected regions, respectively. n = 10–17 patient biopsies per group. (G and H) IF for p-RPA2 and 53BP1. Scale
bars, 100 μm.
(I) GSEA for the REACTOME_DNA_repair gene set, using previously published gene expression data (Kim et al., 2010; Subbian et al., 2015) from caseous
human lung M. tuberculosis granulomas and normal human lung tissue.
***p < 0.001. See also Figure S7.


for disease outcome, remain unresolved. Here, we delineate a MF differentiation pathway induced by chronic stimulation with BLP. Our results unravel a role for Myc-induced DDR signaling as a critical determinant of granuloma-resident polyploid MF cell fate with unique metabolic and ECM remodeling gene expression signatures.

Inflammation-Induced Polyploid Macrophage Cell Fate via Modified Cell Divisions
Among immune cells, MF are unique in their capability to differentiate into polyploid progeny, such as osteoclasts and MMF, in a process widely believed to be cell-to-cell fusion (Helming and Gordon, 2009). How polyploid MF subsets arise in the presence of a persistent inflammatory stimulus was, however, unclear. We show that RS and mitotic defects including endoreplication and cytokinesis failure represent an alternative pathway leading to polyploid MF subsets in granulomas. Such a pathway to polyploidization was previously described in trophoblasts and hepatocytes (Davoli et al., 2010) but has not been linked to pathogen recognition, inflammatory cytokine signals, or immune cell fate. We demonstrate that TLR2 and TNF signaling can engage this pathway in proliferating MF precursors leading to the formation of polyploid MF.
An intriguing implication of our findings is that distinct cytokine or pathogen-recognition signals may govern the decision to amplify the genomic content of MF, producing polyploid progeny by two distinct cellular processes, one being cell-to-cell fusion and the other modified cell division. Thus, RANKL induced cell-to-cell fusion, while TLR2 ligands and TNF led to polyploidization via mitotic defects. Despite the obvious cell biological differences of the “cell-to-cell fusion” and “modified cell division” pathways to polyploidy, our data support the view that the two pathways share some common mechanistic pillars. Thus, MafB suppression and Myc activation are common regulators of RANKL-induced osteoclastogenesis (Kim et al., 2007; Park-Min et al., 2014) and BLP-induced MMF formation.

A TLR2-Myc-DDR Signaling Axis Instructs a Polyploid Macrophage Fate in Granulomas
BLP instructed a non-canonical MF differentiation program that led to the formation of polyploid MF with an ECM remodeling gene expression signature. Overexpression of the canonical MF transcription factor MafB suppressed MMF differentiation, denoting their diverging cell fate. The transcription factors MafB and c-Maf were recently shown to repress a MF-specific enhancer repertoire associated with self-renewal signaling networks, including Myc as a key signaling node (Soucie et al., 2016).
We identified Myc as a crucial regulator of the BLP-induced DDR response. Interestingly, recent studies reported that the DNA damage pathway was significantly enriched among genes regulated by Myc inhibitors in osteoclast precursors (Park-Min et al., 2014). It is therefore likely that Myc-DDR signaling plays a broad and important role in genomic stability during formation of polyploid MF subsets, either by RANKL-induced cell-to-cell fusion or by TLR2 or TNF-induced modified cell division. The mechanisms by which this occurs may differ in the two processes. It will be an important avenue for future research to determine distinct and common mechanisms to protect genomic integrity during MF polyploidization as this may provide insights into how to promote or suppress MMF or osteoclast differentiation, thereby modifying disease outcome.
Oncogene-induced RS is a barrier toward tetraploidy and tumorigenesis since the DDR induces cell-cycle arrest and p53-dependent cell death (Bartkova et al., 2006; Gorgoulis et al., 2005). In agreement, TLR2-Myc-mediated RS suppressed cell-cycle progression in diploid MF precursors and p53 deficiency significantly increased the numbers of polyploid MMF. Strikingly, TNF also acted as a barrier to MMF formation since Tnf−/− MF precursors formed significantly increased numbers of MMF. Whether the p53 and TNF-signaling networks cross-talk to suppress MF polyploidy is currently unknown. The role of Myc in MF polyploidization is in line with recent findings that growth factor signaling may be one route to overcome p53-imposed cell-cycle arrest in tetraploid cells (Ganem et al., 2014). Our findings support the view that, in non-malignant inflammatory microenvironments, overcoming inflammation-induced p53 barriers to the proliferation of tetraploid cells may be a rate-limiting step for the formation of polyploid MF and immune-mediated pathology in chronic granulomatous diseases. Thus, pathways employed by developing cancer cells surprisingly instruct a polyploid MF fate in the presence of chronically persisting inflammatory stimuli.
Figure 7. BLPs Activate the DDR and Induce MF Polyploidy via Myc
(A–C) p53 suppresses BLP-induced polyploid MF, QIBC. (A) Representative images; (B) percentage of BiNucl and MMF; (C) percentage of polyploid (>4c) nuclei.
(B and C) Mean ± SD from averages of triplicate replicates from two independent experiments.
(D) qRT-PCR of Myc mRNA expression, normalized relative to Gapdh mRNA expression. Mean ± SD of triplicate determinations pooled from three independent
experiments.
(E) IB of nuclear lysates for Myc. Example of two independent experiments.
(F–H) Myc regulates S phase TLR2-DDR signaling, QIBC. (F) Representative images. (G) Mean gH2AX intensity per nucleus. Red lines, cutoff for gH2AXhi
expression. Black lines, mean values. (H) Percentage of gH2AXhi nuclei, as in (G).
(I) Mean EdU versus total DAPI intensity (top) and gH2AX versus total DAPI intensity (bottom). Purple line, cutoff for EdU positivity. Darker dots (bottom), EdU+
nuclei. Red line (bottom), cutoff for gH2AX positivity.
(J) Percentage of gH2AX+ nuclei, among EdU+ nuclei, as in (I).
(K) Percentage of EdU+ nuclei, as in (I).
(L and M) Myc regulates BLP-induced lipid droplet accumulation. QIBC of cytoplasmic lipid droplet accumulation. (L) Nile Red IF. (M) Percentage of nuclei
associated with Nile Red+ cytoplasmic droplets.
(H, J, K, and M) Mean ± SD from three independent experiments. *p < 0.05, **p < 0.01, ***p < 0.001; scale bars, 10 μm.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ◦ Human specimens
  ◦ Mice
• METHOD DETAILS
  ◦ Generation of MMFs from MF precursors
  ◦ Bacterial culture and infections
  ◦ Retroviral transductions
  ◦ FISH
  ◦ SKY
  ◦ Immunofluorescence and immunohistochemistry in mouse tissues
  ◦ DNA Fiber Assay
  ◦ Immunohistochemistry (IHC) and Indirect Immunofluorescence (IF) in human samples
  ◦ Time-lapse live-cell imaging
  ◦ Quantitative Image-Based Cytometry (QIBC)
  ◦ qRT-PCR and microarray analysis
  ◦ Single-cell RNA library preparation
  ◦ Quantification of Transcript Abundance
  ◦ Single-Cell RNA Sequencing Data Analysis
  ◦ Laser microdissection and gene expression analysis
  ◦ Immunoblotting
• QUANTIFICATION AND STATISTICAL ANALYSIS
• DATA AVAILABILITY

SUPPLEMENTAL INFORMATION

Supplemental Information includes seven figures, one table, and seven movies […] 2016.09.054.

AUTHOR CONTRIBUTIONS

A.T., L.H., and V.H. designed, performed, and analyzed the majority of the experiments with help from K.G., J. Senges, N.M., and T.A. The indicated experiments were performed and analyzed by I.N., T.H. (FISH, SKY, cytogenetics), T.N. (DNA fiber assays, metaphase analysis), S., D.G. (scRNA-seq), K.E., V.G., T.N., R.E.V., and T.G. (human studies), D.E. and M.P. (LCM-MF gene expression), S.D.D. (microscopy), and J. Stefanowski, A.E.H., M.M.Z., C.K., and C.D. (osteoclast analysis in vivo). C.H. (M. tuberculosis infections), D.P. (microarrays), B.K. (LCI), M.S. (pathology), and M. Follo (QIBC) helped with experiments. D.W., M. Fliegauf, S.S., M.S., and A.J.L.-C. provided critical reagents; L.R. and A.D. analyzed gene array data; S.S., M.H.S., and S.D.D. provided intellectual input. A.J.L.-C. and V.G. provided intellectual input on RS and DDR. V.G. directed the human studies. A.D. co-directed research and revised the manuscript. P.H. oversaw initial experiments. A.T. directed research and wrote the manuscript with input from co-authors.

ACKNOWLEDGMENTS

We thank L. Ivashkiv, Y. Tanriver, M. Lenardo, E. Trompouki, and P. Heun for helpful discussions and R. Rzepka, J. Volz, A. Hölscher, K. Schrenk, M. Vavra, A. Imm, and the Advanced Medical Bioimaging Core Facility of the Charité for excellent technical assistance; C. Blattner and M. Oren for p53−/− mice. The work was supported by the European Union’s Seventh Framework Programme (Marie Curie IRG 268390 to A.T. and project INsPiRE to V.G.), the Research Committee of the Medical Faculty, University of Freiburg, an EKFK NAKSYS Fellowship (to A.T.), the DFG (SFB 1160, project 12 to A.T. and R.E.V.; FOR 2165, HA5354/6-1 to A.E.H.; FOR 2033 NicHem, project B1 to C.D.), the German Centre for Infection Research (to C.H.), the BMBF (FKZ01E01502 to S.D.D.), the European Research Council (ERC-2012-StG-311377 to A.D. and ERC-2015-StG-679068 to A.J.L.-C.), the Danish Council for Independent Research (Sapere Aude, DFF-Starting Grant 2014 to A.J.L.-C.), the Greek GSRT program of Excellence II (Aristeia II to V.G.). M.H.S. is a BIH-Einstein fellow and INSERM-Helmholtz group leader; he was supported by Agence Nationale de la Recherche (ANR-11-BSV3-0026), Fondation pour la Recherche Médicale (DEQ. 20110421320), and InCA (13-10/405/AB-LC-HS).

Received: January 21, 2016
Revised: July 26, 2016
Accepted: September 28, 2016
Published: October 27, 2016

REFERENCES

Anders, S., and Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biol. 11, R106.
Andreassen, P.R., Lohez, O.D., Lacroix, F.B., and Margolis, R.L. (2001). Tetraploid state induces p53-dependent arrest of nontransformed mammalian cells in G1. Mol. Biol. Cell 12, 1315–1328.
Baker, S.C., Bauer, S.R., Beyer, R.P., Brenton, J.D., Bromley, B., Burrill, J., Causton, H., Conley, M.P., Elespuru, R., Fero, M., et al.; External RNA Controls Consortium (2005). The External RNA Controls Consortium: A progress report. Nat. Methods 2, 731–734.
Bartek, J., and Lukas, J. (2003). Chk1 and Chk2 kinases in checkpoint control and cancer. Cancer Cell 3, 421–429.
Bartkova, J., Rezaei, N., Liontos, M., Karakaidos, P., Kletsas, D., Issaeva, N., Vassiliou, L.V., Kolettas, E., Niforou, K., Zoumpourlis, V.C., et al. (2006). Oncogene-induced senescence is part of the tumorigenesis barrier imposed by DNA damage checkpoints. Nature 444, 633–637.
Berte, N., Piee-Staffa, A., Piecha, N., Wang, M., Borgmann, K., Kaina, B., and Nikolova, T. (2016). Targeting homologous recombination by pharmacological inhibitors enhances the killing response of glioblastoma cells treated with alkylating drugs. Mol. Cancer Ther. Published online July 29, 2016 molcanther.0176.2016.
Carvalho, C.R., Clarindo, W.R., and Abreu, I.S. (2011). Image cytometry: Nuclear and chromosomal DNA quantification. Methods Mol. Biol. 689, 51–68.
Coschi, C.H., Martens, A.L., Ritchie, K., Francis, S.M., Chakrabarti, S., Berube, N.G., and Dick, F.A. (2010). Mitotic chromosome condensation mediated by the retinoblastoma protein is tumor-suppressive. Genes Dev. 24, 1351–1363.
Davoli, T., and de Lange, T. (2011). The causes and consequences of polyploidy in normal development and cancer. Annu. Rev. Cell Dev. Biol. 27, 585–610.
Davoli, T., Denchi, E.L., and de Lange, T. (2010). Persistent telomere damage induces bypass of mitosis and tetraploidy. Cell 141, 81–93.
Emson, C.L., Bell, S.E., Jones, A., Wisden, W., and McKenzie, A.N. (1998). Interleukin (IL)-4-independent induction of immunoglobulin (Ig)E, and perturbation of T cell development in transgenic mice expressing IL-13. J. Exp. Med. 188, 399–404.
Fenech, M., Kirsch-Volders, M., Natarajan, A.T., Surralles, J., Crott, J.W., Parry, J., Norppa, H., Eastmond, D.A., Tucker, J.D., and Thomas, P. (2011). Molecular mechanisms of micronucleus, nucleoplasmic bridge and nuclear bud formation in mammalian and human cells. Mutagenesis 26, 125–132.
Fujiwara, T., Bandi, M., Nitta, M., Ivanova, E.V., Bronson, R.T., and Pellman, D. (2005). Cytokinesis failure generating tetraploids promotes tumorigenesis in p53-null cells. Nature 437, 1043–1047.
Ganem, N.J., and Pellman, D. (2007). Limiting the proliferation of polyploid cells. Cell 131, 437–440.
Cell 167, 1264–1280, November 17, 2016 1279

Ganem, N.J., Cornils, H., Chiu, S.Y., O’Rourke, K.P., Arnaud, J., Yimlamai, D., Théry, M., Camargo, F.D., and Pellman, D. (2014). Cytokinesis failure triggers hippo tumor suppressor pathway activation. Cell 158, 833–848.
Geissmann, F., Manz, M.G., Jung, S., Sieweke, M.H., Merad, M., and Ley, K. (2010). Development of monocytes, macrophages, and dendritic cells. Science 327, 656–661.
Gorgoulis, V.G., Vassiliou, L.V., Karakaidos, P., Zacharatos, P., Kotsinas, A., Liloglou, T., Venere, M., Ditullio, R.A., Jr., Kastrinakis, N.G., Levy, B., et al. (2005). Activation of the DNA damage checkpoint and genomic instability in human precancerous lesions. Nature 434, 907–913.
Grün, D., Kester, L., and van Oudenaarden, A. (2014). Validation of noise models for single-cell transcriptomics. Nat. Methods 11, 637–640.
Grün, D., Muraro, M.J., Boisset, J.C., Wiebrands, K., Lyubimova, A., Dharmadhikari, G., van den Born, M., van Es, J., Jansen, E., Clevers, H., et al. (2016). De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell 19, 266–277.
Hashimshony, T., Senderovich, N., Avital, G., Klochendler, A., de Leeuw, Y., Anavy, L., Gennert, D., Li, S., Livak, K.J., Rozenblatt-Rosen, O., et al. (2016). CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. 17, 77.
Heitmann, L., Abad Dar, M., Schreiber, T., Erdmann, H., Behrends, J., Mckenzie, A.N., Brombacher, F., Ehlers, S., and Hölscher, C. (2014). The IL-13/IL-4Rα axis is involved in tuberculosis-associated pathology. J. Pathol. 234, 338–350.
Helming, L., and Gordon, S. (2009). Molecular mediators of macrophage fusion. Trends Cell Biol. 19, 514–522.
Hölscher, C., Reiling, N., Schaible, U.E., Hölscher, A., Bathmann, C., Korbel, D., Lenz, I., Sonntag, T., Kröger, S., Akira, S., et al. (2008). Containment of aerogenic Mycobacterium tuberculosis infection in mice does not require MyD88 adaptor function for TLR2, -4 and -9. Eur. J. Immunol. 38, 680–694.
Kim, K., Kim, J.H., Lee, J., Jin, H.M., Kook, H., Kim, K.K., Lee, S.Y., and Kim, N. (2007). MafB negatively regulates RANKL-mediated osteoclast differentiation. Blood 109, 3253–3259.
Kim, M.J., Wainwright, H.C., Locketz, M., Bekker, L.G., Walther, G.B., Dittrich, C., Visser, A., Wang, W., Hsu, F.F., Wiehart, U., et al. (2010). Caseation of human tuberculosis granulomas correlates with elevated host lipid metabolism. EMBO Mol. Med. 2, 258–274.
Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595.
Lukas, C., Savic, V., Bekker-Jensen, S., Doil, C., Neumann, B., Pedersen, R.S., Grøfte, M., Chan, K.L., Hickson, I.D., Bartek, J., and Lukas, J. (2011). 53BP1 nuclear bodies form around DNA lesions generated by mitotic transmission of chromosomes under replication stress. Nat. Cell Biol. 13, 243–253.
Meyer, L.R., Zweig, A.S., Hinrichs, A.S., Karolchik, D., Kuhn, R.M., Wong, M., Sloan, C.A., Rosenbloom, K.R., Roe, G., Rhead, B., et al. (2013). The UCSC Genome Browser database: Extensions and updates 2013. Nucleic Acids Res. 41, D64–D69.
Nathan, C. (2009). Taming tuberculosis: A challenge for science and society. Cell Host Microbe 5, 220–224.
Normand, G., and King, R.W. (2010). Understanding cytokinesis failure. Adv. Exp. Med. Biol. 676, 27–55.
Park-Min, K.H., Lim, E., Lee, M.J., Park, S.H., Giannopoulou, E., Yarilina, A., van der Meulen, M., Zhao, B., Smithers, N., Witherington, J., et al. (2014). Inhibition of osteoclastogenesis and inflammatory bone resorption by targeting BET proteins and epigenetic regulation. Nat. Commun. 5, 5418.
Peyron, P., Vaubourgeix, J., Poquet, Y., Levillain, F., Botanch, C., Bardou, F., Daffé, M., Emile, J.F., Marchou, B., Cardona, P.J., et al. (2008). Foamy macrophages from tuberculous patients’ granulomas constitute a nutrient-rich reservoir for M. tuberculosis persistence. PLoS Pathog. 4, e1000204.
Puissegur, M.P., Lay, G., Gilleron, M., Botella, L., Nigou, J., Marrakchi, H., Mari, B., Duteyrat, J.L., Guerardel, Y., Kremer, L., et al. (2007). Mycobacterial lipomannan induces granuloma macrophage fusion via a TLR2-dependent, ADAM9- and beta1 integrin-mediated pathway. J. Immunol. 178, 3161–3169.
Ramakrishnan, L. (2012). Revisiting the role of the granuloma in tuberculosis. Nat. Rev. Immunol. 12, 352–366.
Russell, D.G., Cardona, P.J., Kim, M.J., Allain, S., and Altare, F. (2009). Foamy macrophages and the progression of the human tuberculosis granuloma. Nat. Immunol. 10, 943–948.
Soucie, E.L., Weng, Z., Geirsdóttir, L., Molawi, K., Maurizio, J., Fenouil, R., Mossadegh-Keller, N., Gimenez, G., VanHille, L., Beniazza, M., et al. (2016). Lineage-specific enhancers activate self-renewal genes in macrophages and embryonic stem cells. Science 351, aad5510.
Subbian, S., Tsenova, L., Kim, M.J., Wainwright, H.C., Visser, A., Bandyopadhyay, N., Bader, J.S., Karakousis, P.C., Murrmann, G.B., Bekker, L.G., et al. (2015). Lesion-Specific Immune Response in Granulomas of Patients with Pulmonary Tuberculosis: A Pilot Study. PLoS ONE 10, e0132249.
Taylor, J.L., Hattle, J.M., Dreitz, S.A., Troudt, J.M., Izzo, L.S., Basaraba, R.J., Orme, I.M., Matrisian, L.M., and Izzo, A.A. (2006). Role for matrix metalloproteinase 9 in granuloma formation during pulmonary Mycobacterium tuberculosis infection. Infect. Immun. 74, 6135–6144.
Toledo, L.I., Murga, M., Gutierrez-Martinez, P., Soria, R., and Fernandez-Capetillo, O. (2008). ATR signaling can drive cells into senescence in the absence of DNA breaks. Genes Dev. 22, 297–302.
Toledo, L.I., Murga, M., Zur, R., Soria, R., Rodriguez, A., Martinez, S., Oyarzabal, J., Pastor, J., Bischoff, J.R., and Fernandez-Capetillo, O. (2011). A cell-based screen identifies ATR inhibitors with synthetic lethal properties for cancer-associated mutations. Nat. Struct. Mol. Biol. 18, 721–727.
Toledo, L.I., Altmeyer, M., Rask, M.B., Lukas, C., Larsen, D.H., Povlsen, L.K., Bekker-Jensen, S., Mailand, N., Bartek, J., and Lukas, J. (2013). ATR prohibits replication catastrophe by preventing global exhaustion of RPA. Cell 155, 1088–1103.
van der Maaten, L., and Hinton, G. (2008). Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605.
Vignery, A. (2005). Macrophage fusion: The making of osteoclasts and giant cells. J. Exp. Med. 202, 337–340.
Volkman, H.E., Pozos, T.C., Zheng, J., Davis, J.M., Rawls, J.F., and Ramakrishnan, L. (2010). Tuberculous granuloma induction via interaction of a bacterial secreted protein with host epithelium. Science 327, 466–469.
Williams, G.T., and Williams, W.J. (1983). Granulomatous inflammation–a review. J. Clin. Pathol. 36, 723–733.
Yin, X., Giap, C., Lazo, J.S., and Prochownik, E.V. (2003). Low molecular weight inhibitors of Myc-Max interaction and function. Oncogene 22, 6151–6159.
Yu, G., and He, Q.Y. (2016). ReactomePA: An R/Bioconductor package for reactome pathway analysis and visualization. Mol. Biosyst. 12, 477–479.
Zehentmeier, S., Cseresnyes, Z., Escribano Navarro, J., Niesner, R.A., and Hauser, A.E. (2015). Automated quantification of hematopoietic cell–stromal cell interactions in histological images of undecalcified bone. J. Vis. Exp. (98).
Zeman, M.K., and Cimprich, K.A. (2014). Causes and consequences of replication stress. Nat. Cell Biol. 16, 2–9.

STAR+METHODS
KEY RESOURCES TABLE

Antibodies
IF: anti-CD45.1-PE (clone A20) eBioscience Cat# 12-0453
IF: anti-CD45.2-FITC (clone 104) eBioscience Cat# 11-0454
IF: anti-CD45 (clone 30-F11) eBioscience Cat# 14-0451-82
IF: anti-CD45.1- Alexa Fluor 488 (clone 104) Biolegend Cat# BLD-109816
IF: anti-CD45.2-Alexa Fluor 488 (clone A20) Biolegend Cat# BLD-110720
IF: anti-F4/80-Alexa Fluor 647 (clone CI:A3-1) Bio-Rad (AbD Serotec) Cat# MCA497A647
IF: anti-F4/80 unconjugated (clone CI:A3-1) Bio-Rad (AbD Serotec) Cat# MCA497G
IF: anti-MTB (polyclonal) Bio-Rad (AbD Serotec) Cat# OBT0947
IF: anti-γH2AX Ser139 (rabbit monoclonal) Cell Signaling Cat# 9718
IF: anti-BrdU antibody (mouse, clone B44) Becton Dickinson Cat# 374580
IF: anti-BrdU antibody (rat, clone BU/1) AbD Serotec Cat# OBT0030G
WB: anti-p38 (polyclonal) Cell Signaling Cat# 9212
IF: anti-BrdU antibody (clone BU-1) Thermo Fisher Scientific Cat# MA3-071
IF: anti-Ki67 (clone SP6) Thermo Fisher Scientific Cat# RM-9106
IF: anti-α-tubulin (clone B-5-1-2) Sigma-Aldrich Cat# T5168
IF: anti-phospho RPA32 (S4/S8) (polyclonal) Bethyl Laboratories Cat# A300-245A
IF: anti-53BP1 (polyclonal) Abcam Cat# ab21083
IF: anti-RPA2 (clone 9H8) Abcam Cat# ab2175
IF: anti-phospho RPA2 [S4/S8] (polyclonal) Abcam Cat# ab87277
WB: anti-TATA-binding protein TBP (clone 1TBP18) Abcam Cat# ab818
WB: anti-c-Myc (clone 9E10) Santa Cruz Cat# sc-47694
WB: anti-Cyclin D1 (clone A12) Santa Cruz Cat# sc-8396
WB: anti-cyclin D2 (M-20, polyclonal) Santa Cruz Cat# sc-718
WB: anti-MafB (P-20, goat polyclonal) Santa Cruz Cat# sc-10022
IF/IHC: anti-γH2AX Ser139 (clone JBW301) EMD Millipore Cat# 05-636
IF: Alexa Fluor 568 goat anti-rabbit Invitrogen Cat# A-11011
IF: Alexa Fluor 488 goat anti-mouse Invitrogen Cat# A-11001
IF: Alexa Fluor 488 goat anti-rabbit Invitrogen Cat# A-11034
IF: Alexa Fluor 546 goat anti-mouse Invitrogen Cat# A-11030
Recombinant murine M-CSF Peprotech Cat# 315-02
Recombinant murine TNF Peprotech Cat# 315-01A
Recombinant murine sRANK ligand Peprotech Cat# 315-11
Recombinant murine IL-1β Peprotech Cat# 211-11B
Recombinant murine IFN-γ Peprotech Cat# 315-05
FSL-1 Invivogen Cat# tlrl-fsl
Pam3CSK4 Invivogen Cat# tlrl-pms
LPS Invivogen Cat# tlrl-eklps
poly (I:C) Invivogen Cat# tlrl-pic
Nile Red Sigma-Aldrich Cat# N3013
RNase A from bovine pancreas Sigma-Aldrich Cat# R4642
ETP-46464 Selleckchem Cat# S8050

c-Myc Inhibitor 10058-F4 Calbiochem Cat# 475956
Polybrene (Hexadimethrine Bromide) Sigma-Aldrich Cat# H9268
FuGENE6 Transfection Reagent Roche Cat# 11 815 091 001
RNaseOUT Invitrogen Cat# 10777-019
Superscript II Invitrogen Cat# 18064-014
Second Strand Buffer Invitrogen Cat# 10812-014
E. coli DNA ligase Invitrogen Cat# 18052-019
E. coli RNaseH Invitrogen Cat# 18021-071
E. coli DNA polymerase Invitrogen Cat# 18010-025
AMPure XP beads Beckman Coulter Cat# A63880
RNAClean XP beads Beckman Coulter Cat# A63987
Click-iT EdU Alexa Fluor 488 Imaging Kit Thermo Fisher Scientific Cat# C10337
Hemacolor Staining Kit Merck Millipore Cat# 1.11661.0001
Ultravision Quanto Detection System HRP DAB kit Thermo Fisher Scientific Cat# TL-125-QHD
CalPhos Mammalian Transfection Kit Clontech Cat# 631312
RNeasy Micro Kit QIAGEN Cat# 74004
ARCTURUS PicoPure RNA Isolation Kit Thermo Fisher Scientific Cat# KIT0214
ABsolute QPCR Mix, SYBR Green Thermo Fisher Scientific Cat# 1159A
TaqMan Gene Expression Master Mix Thermo Fisher Scientific Cat# 4369016
TaqMan PreAmp Master Mix Thermo Fisher Scientific Cat# 4391128
High-Capacity RNA-to-cDNA Kit Thermo Fisher Scientific Cat# 4387406
iScript cDNA Synthesis Kit Thermo Fisher Scientific Cat# 1708871
SkyPaint DNA Kit M-10 for Mouse Chromosomes Applied Spectral Imaging Cat# FPRPR0030
MEGAscript T7 Transcription Kit Ambion Cat# AM1334
Phusion High-Fidelity PCR Master Mix with HF Buffer NEB Cat# M0531
ExoSAP-IT For PCR Product Clean-Up Affymetrix Cat# 78200
NEBNext Magnesium RNA Fragmentation Module NEB Cat# E6150S
Deposited Data
Raw microarray data This paper http://www.ebi.ac.uk/arrayexpress/; Accession Number: E-MTAB-5085
Raw scRNA-seq data This paper NCBI GEO Accession Number: GSE86929
Mouse: C57BL/6J The Jackson Laboratory Stock No: 000664
Mouse: IL-13-transgenic (tg) mice Emson et al., 1998 N/A
Mycobacterium tuberculosis H37Rv ATCC Cat# 27294
Mycobacterium bovis BCG, strain RIVM derived from strain 1173-P2 Medac, Hamburg Cat# BCG-Medac
Recombinant DNA
pMX-Mafb-IRES-Egfp This paper N/A
pMXs-IRES-Egfp Retroviral expression vector Cell Biolabs Cat# RTV-013
pBABE-H2BGFP Fred Dick Lab Coschi et al., 2010 Addgene plasmid # 26790
FISH probes
TK (11qE1) / AurKa (2qH3) red/green Kreatech Cat# KBI-30501
RAB9B (XqF1) / DSCR (16qC4) red/green Kreatech Cat# KBI-30503

qPCR Primer: Gapdh This paper N/A
Forward: TGGAGAAACCTGCCAAGTATG
Reverse: GTTGAAGTCGCAGGAGACAAC
qPCR Primer: Mafb This paper N/A
Forward: AACGGTAGTGTGGAGGAC
Reverse: TCACAGAAAGAACTGAGGA
qPCR Primer: Myc This paper N/A
Forward: AATCCTGTACCTCGTCCGAT
Reverse: TCTTCTCCACAGACACCACA
qPCR Primer: Ccnd1 This paper N/A
Forward: TGCTACCGACAACGCA
Reverse: TCAATCTGTTCCTGGCAGGC
qPCR Primer: Ccnd2 This paper N/A
Forward: CGTGTGATGCCCTGACTGAG
Reverse: GACTTAGATCCGGCGTTATG
TaqMan Gene Expression assay: Emr1 (Mm00802529_m1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Apoe (Mm01307193_g1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Nfkbiz (Mm00600522_m1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Ccl5 (Mm01302427_m1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Chi3l3 (Mm00657889_mH) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Lox (Mm00495386_m1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Ctsk (Mm00484039_m1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Mmp9 (Mm00600163_m1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Pcna (Mm00448100_g1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Ccnd2 (Mm00438070_m1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Mcm6 (Mm00484848_m1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Blm (Mm00476150_m1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Rad50 (Mm00485504_m1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Rad52 (Mm00448543_m1) Thermo Fisher Scientific Cat# 4331182
TaqMan Gene Expression assay: Myc (Mm00487804_m1) Thermo Fisher Scientific Cat# 4331182
randomhexRT primer GCCTTGGCACCCGAGAATTCCANNNNNN Custom made, Integrated DNA Technologies N/A
RNA PCR Primers (RP1, RPI1-RPI12), sequences available from Illumina Custom made, Integrated DNA Technologies N/A
192 polyT primers with unique molecular index and cell barcode, see Table S1 Custom made, Integrated DNA Technologies N/A

FISH imaging software FISHView 2.0 Applied Spectral Imaging http://www.spectral-imaging.com/products-technologies/capture-analysis/fishview
Spectral imaging Software (Vers. 2.6) Applied Spectral Imaging http://www.spectral-imaging.com/
TIBCO Spotfire Software TIBCO Software http://spotfire.tibco.com/
Scan^R Acquisition Software Olympus Life Sciences http://www.olympus-lifescience.com/de/microscopes/inverted/scanr/
Scan^R Analysis Software Olympus Life Sciences http://www.olympus-lifescience.com/de/microscopes/inverted/scanr/
GSEA (version 2.0.13) Broad Institute http://software.broadinstitute.org/gsea/index.jsp
R version 3.2.4 and RStudio The R Foundation https://www.r-project.org/; https://www.rstudio.com/
ReactomePA, R package Yu and He, 2016 http://bioconductor.org/packages/release/bioc/html/ReactomePA.html
RaceID2 algorithm Grün et al., 2016 https://github.com/dgrun/StemID
Further information and requests for reagents may be directed to, and will be fulfilled by, the corresponding author Antigoni
Triantafyllopoulou (antigoni.triantafyllopoulou@uniklinik-freiburg.de).
Human specimens
Formalin-fixed, paraffin-embedded sections from 10 M. tuberculosis lung, 15 sarcoidosis skin and 10 giant cell arteritis temporal
artery biopsies, obtained for diagnostic purposes, were analyzed. The demographics of the patients are listed below.
M. tuberculosis patients (Borstel Cohort): 6 males and 4 females, 18-72 years old at the time of the biopsy.
Skin sarcoidosis patients (Athens Cohort): 2 males and 13 females, 36-75 years old at the time of the biopsy.
Giant cell arteritis patients (Athens Cohort): 2 males and 5 females, 51-67 years old at the time of the biopsy.
Giant cell arteritis patients (Freiburg Cohort): 3 males and 7 females, 65-84 years old at the time of the biopsy.
Protocols for experimental use of clinical samples were approved by the Ethics Committees of the Medical School of Athens
(sarcoidosis and giant cell arteritis samples), the University of Freiburg (giant cell arteritis samples) and the University of Lübeck
(M. tuberculosis samples).
Mice
Conventional C57BL/6 mice were purchased from Charles River or Janvier. IL-13-transgenic (tg) mice were previously described
(Emson et al., 1998). 8-12-week-old, age- and sex-matched mice were used for all in vitro and in vivo experiments. For the generation
of bone marrow chimeras, C57BL/6 CD45.2+ mice were lethally γ-irradiated (900 rads) from a cesium source and subsequently
reconstituted with a mixture of bone marrow cells from C57BL/6 CD45.2+ and C57BL/6 CD45.1+ congenic mice. For the first 4 weeks,
mice received antibiotic-containing drinking water. Animals were allowed to reconstitute for 6-12 weeks prior to infection with
M. bovis BCG. Following reconstitution, bone marrow of chimeric mice contained roughly equal numbers of CD45.1+ and CD45.2+
leukocytes. All animal experiments were approved and performed in accordance with the guidelines of the local animal care and
use committees of the Regierungspräsidium Freiburg and Kiel.
METHOD DETAILS
Generation of MMFs from MV precursors

Bone marrow (BM) cells were flushed from the femurs of mice and cultured with murine M-CSF (20 ng/ml, Peprotech) on petri dishes in
DMEM supplemented with 10% FBS for 4-5 days. The adherent cell population (referred to as ‘macrophage precursors’) was then
recovered and plated at 2 × 10⁴ cells/ml in OptiMEM medium containing 10% FBS and 50 ng/ml M-CSF in triplicate wells on 96-well
plates. FSL-1, Pam3CSK4, LPS, poly (I:C) (all from Invivogen) or TNF, IFN-γ, IL-1β (Peprotech) were added for an additional
6 days, unless otherwise indicated. For MMF quantification, 96-well plates were stained using the Hemacolor staining kit (Merck)
or DAPI and beta-tubulin. MMFs (defined as cells containing ≥ 3 nuclei) were quantified in triplicate wells. Osteoclastogenesis using
RANKL was induced from bone marrow (BM) cells using established protocols, as previously described (Park-Min et al., 2014).
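The MMF definition above (cells containing ≥ 3 nuclei, scored in triplicate wells) can be illustrated with a minimal sketch. It assumes that nucleus counts per cell have already been scored for each well, and that the readout is the fraction of cells that are MMFs; the text does not state which summary statistic was used. The function names and example counts are hypothetical and are not taken from the study.

```python
from statistics import mean

MMF_MIN_NUCLEI = 3  # MMFs are defined in the text as cells containing >= 3 nuclei


def mmf_fraction(nuclei_per_cell):
    """Return the fraction of cells in one well that qualify as MMFs."""
    if not nuclei_per_cell:
        raise ValueError("well contains no scored cells")
    n_mmf = sum(1 for n in nuclei_per_cell if n >= MMF_MIN_NUCLEI)
    return n_mmf / len(nuclei_per_cell)


def summarize_wells(wells):
    """Average the per-well MMF fraction across replicate wells (assumed summary)."""
    return mean(mmf_fraction(well) for well in wells)


# Hypothetical nucleus counts per cell for three replicate wells
wells = [
    [1, 1, 2, 3, 5],  # 2 of 5 cells are MMFs
    [1, 3, 4, 1, 1],  # 2 of 5 cells are MMFs
    [2, 2, 1, 1, 6],  # 1 of 5 cells is an MMF
]
print(round(summarize_wells(wells), 3))
```

Averaging per-well fractions, rather than pooling all cells, keeps each replicate well as the unit of measurement; pooling would be an equally plausible choice.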
Bacterial culture and infections

Mycobacterium tuberculosis strain H37Rv and Mycobacterium bovis BCG were expanded to log phase in Middlebrook 7H9 liquid
medium supplemented with ADC (Difco), washed, aliquoted in PBS and stored at −80 °C until further use. Bacterial stocks
were quantified on 7H11 agar supplemented with OADC (Difco). For mycobacterial infections, animals were inoculated i.p. with
2 × 10⁶ Mycobacterium bovis BCG or infected via the aerosol route with a low dose of 100 CFU M. tuberculosis H37Rv, as previously
described (Hölscher et al., 2008).
Retroviral transductions
Retrovirus packaging was performed by transfecting the retroviral vectors into Phoenix cells using FuGENE6 (Roche) for the
pBABE-H2BGFP vector (pBABE-H2BGFP was a gift from Fred Dick; Addgene plasmid # 26790) and CalPhos transfection reagent (Clontech)
for the pMX-Mafb-IRES-Egfp and pMX-IRES-Egfp empty vectors. Bone marrow cells were infected with the recombinant retroviruses
in the presence of 4 μg/ml polybrene and M-CSF (20 ng/ml) for 24 hr, after which the medium was changed. After 48 hr, adherent
cells were collected, and GFP+ cells were sorted and re-plated for stimulation.
FISH
Labeled probes from four different mouse chromosomes, 2qH3 (Aurka), 11qE1 (Tlk2), 16qC4 (Rcan1), and XqF1 (Rab9b), were hybridized
to methanol-acetic acid fixed cells, according to the supplier's instructions (Kreatech). After hybridization and washing, cells with
specific hybridization signals were photographed through specific filter sets on a fluorescence microscope (Axio Imager, Zeiss)
equipped with a CCD camera, and digitized images of the FITC, CY3, and DAPI signals of the same cell were merged using the FISH
imaging software FISHView 2.0 (Applied Spectral Imaging).
SKY
Metaphase chromosomes were prepared according to standard procedures. Hybridization with mouse SKY chromosome paints
(SkyPaint, Applied Spectral Imaging) was carried out following the manufacturer’s instructions. After hybridization and washing, spectral
images were acquired using a HiSky system (SD300) and dedicated Spectral imaging Software (Vers. 2.6). Obtained SKY images
were then analyzed by the SkyView software, version 6.0 (Applied Spectral Imaging). Karyotypes depicted in the figures were prepared
from the spectrally classified pseudo-colored chromosomes.
Immunofluorescence and immunohistochemistry in mouse tissues

For immunohistochemistry, liver was fixed in 10% formalin, embedded in paraffin and cut into 3–5 μm sections. Paraffin sections
were rehydrated and heat-induced antigen retrieval was performed in citrate buffer or TRIS buffer. Immunofluorescence of mixed
chimeric livers was performed on 5 μm-thick cryosections, fixed with 4% PFA at room temperature and treated with 0.5% Triton
X-100 in PBS for 10 min. Undecalcified bone cryosections were prepared as previously described (Zehentmeier et al., 2015).
For immunofluorescence of in vitro samples, macrophage precursors were plated in 96-well plates (tissue culture-treated, BD
Falcon black plates) at 2 × 10⁴ cells/ml in OptiMEM (GIBCO) medium with 10% FBS and 50 ng/ml M-CSF and stimulated with TLR ligands,
cytokines or vehicle. At the indicated time points, the cells were fixed with 4% paraformaldehyde at room temperature for 15 min and
treated with 0.5% Triton X-100 in PBS for 10 min.
Subsequent to fixation and antigen retrieval or permeabilization, sections or cells were blocked with a solution containing 1% BSA/
10% goat serum/0.3% Triton X-100 in PBS and incubated overnight at 4 °C with the following primary antibodies, diluted in 1% BSA/
0.3% Triton X-100 in PBS: anti-CD45.1-PE (eBioscience), anti-CD45.2-FITC (eBioscience), anti-F4/80-Alexa647 or anti-F4/80
unconjugated (AbD Serotec), anti-γH2AX Ser139 (Cell Signaling), anti-Ki67 (Pierce Thermo Scientific), anti-β-tubulin (Sigma),
anti-MTB (Serotec). Next, sections or cells were incubated with highly cross-adsorbed secondary antibodies raised against mouse, rabbit
or rat and labeled with Alexa 488, Alexa 546 or Alexa 633 fluorophores (Molecular Probes, Life Technologies) for 1 hr at room
temperature. Nuclei were stained with 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI, 10 μg/ml in PBS). When Click-iT EdU
reactions (Molecular Probes, Life Technologies) were combined with antibody staining, these were performed prior to incubation with the
primary antibodies, while EdU was added to the medium prior to fixation, following the manufacturer’s protocol. For BrdU staining
and DNA content analysis, BrdU was added to the medium and incubated for 30 min prior to fixation. Cells were fixed with ice-cold
methanol for 10 min at −20 °C. Cells were slowly rehydrated with ice-cold PBS, treated with 2N HCl for 1 hr at room temperature to
denature DNA, blocked and incubated overnight at 4 °C with mouse monoclonal anti-BrdU antibody (Pierce), followed by incubation
with secondary antibody as above. Nuclear RNA was digested with DNase-free RNase A (Sigma) for 30 min at 37 °C and nuclei were
stained with propidium iodide (PI, 50 μg/ml) for 30 min at room temperature. Alternatively, staining of nuclear DNA with DAPI,
following initial fixation with 4% PFA and permeabilization in 0.5% Triton X-100 in PBS, was equivalent to PI staining for
quantitation of nuclear DNA content by image cytometry. For lipid body staining, cells were fixed for 15 min in 4% PFA in PBS,
permeabilized with 0.1% Triton X-100 for 10 min, stained with Nile Red (Sigma-Aldrich, 1:10,000 dilution, from a stock solution of
5 mg/ml in acetone) for 5 min, then washed with PBS. Image acquisition of multiple random fields was automated on a Scan^R screening
station (Olympus, Germany) and images were analyzed using Scan^R (Olympus, Germany) analysis software.
Bone cryosections were permeabilized using 0.3% Triton X-100 in PBS for 10 min and blocked with 10% goat serum, 1% BSA in
PBS for 15 min. Primary antibodies against CD45.1 (BioLegend, Clone 104, BLD-109816), CD45.2 (BioLegend, Clone A20,
BLD-110720), CD45 (eBioscience, Clone 30-F11, 14-0451-82) were diluted 1:100 in 1% BSA in PBS staining buffer and incubated for
1-1.5 hr at RT. Secondary anti-rat Alexa647 antibodies (Invitrogen, A-21208) were diluted 1:500 in staining buffer and incubated
for 1 hr at RT. Brightfield images were acquired on a Keyence Z-9000 system. Confocal fluorescent and DIC images were acquired
with a Nikon A1Rsi+ system using 405 nm, 488 nm and 640 nm laser excitation.
DNA Fiber Assay

DNA fiber assay was performed as described (Berte et al., 2016) with slight modifications. Following stimulation, cells were pulse
labeled with 25 μM 5-chloro-2′-deoxyuridine (CldU; Sigma) followed by labeling with 250 μM 5-iodo-2′-deoxyuridine (IdU; TCI
Deutschland, Eschborn, Germany) for 20 min each. Labeled cells were harvested by scraping in ice-cold PBS and lysed onto
SuperFrost slides. DNA fibers were allowed to stretch and were fixed. HCl (2.5 M)-treated fiber spreads were stained with monoclonal
rat anti-BrdU (AbD Serotec, 1:1000) followed by monoclonal mouse anti-BrdU (Becton–Dickinson, 1:1500). Primary antibodies
were detected by donkey F(ab′)2 anti-rat Cy3-coupled and anti-mouse Alexa488-coupled secondary antibodies (Jackson
ImmunoResearch, Europe, 1:500). Fibers were examined and images captured using an LSM 710 supplied with ZEN 2009 software
(Carl Zeiss, Germany). CldU-labeled and IdU-labeled tracks were measured using LSM Image Browser (Carl Zeiss, Germany) and
μm values were converted into kilobase pairs. At least 150 forks were analyzed from 3 repetitions. DNA fiber structures from 3
independent experiments were counted in ImageJ software (Version 1.44p) using the Cell Counter function.
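The conversion of measured track lengths from μm into kilobase pairs can be sketched as follows. The conversion factor is not given in the text; the value of 2.59 kb/μm used here is an assumption (a figure commonly used for stretched DNA fibers) and should be replaced by whatever factor applies to the spreading method. Dividing by the 20 min pulse duration to obtain a fork speed is likewise an illustrative extension rather than a step described above, and the function names are hypothetical.

```python
KB_PER_UM = 2.59   # ASSUMED conversion factor for stretched fibers; not stated in the text
PULSE_MIN = 20.0   # each label (CldU, IdU) was applied for 20 min


def track_length_kb(length_um, kb_per_um=KB_PER_UM):
    """Convert a measured track length in micrometers to kilobase pairs."""
    if length_um < 0:
        raise ValueError("track length must be non-negative")
    return length_um * kb_per_um


def fork_speed_kb_per_min(length_um, pulse_min=PULSE_MIN, kb_per_um=KB_PER_UM):
    """Estimate fork speed as track length (kb) divided by labeling time (min)."""
    if pulse_min <= 0:
        raise ValueError("pulse duration must be positive")
    return track_length_kb(length_um, kb_per_um) / pulse_min


# Hypothetical IdU track lengths in micrometers
for um in (8.0, 10.0, 12.5):
    print(um, track_length_kb(um), fork_speed_kb_per_min(um))
```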
Immunohistochemistry (IHC) and Indirect Immunofluorescence (IF) in human samples

Paraffin sections (4 μm thick) were deparaffinized and gradually rehydrated. Antigen retrieval was carried out in 10 mM citrate buffer
(pH 6.0) by heating the slides for 25 min in a microwave oven. For IHC, the anti-γH2AX primary antibody (05-636, EMD Millipore),
diluted 1:1000 in TBS, was applied overnight at 4 °C. Blocking and signal detection were carried out with the Ultravision Quanto
Detection System HRP DAB kit (Cat no: TL-125-QHD, Thermo Scientific) according to the manufacturer’s instructions. Hematoxylin was
used as counterstain. Stainings were observed with the Leica DM 1750M microscope equipped with a DFC 329 Leica digital camera and
image acquisition was performed with the Leica Application Suite (LAS) v4.4.0 software. A total of 500-1000 cells (10-20 high-power
fields) were evaluated in granuloma and in adjacent unaffected regions, respectively (when this was feasible given the limited
tissue). The mean number of γH2AX-positive cells in these areas (granuloma, adjacent) for each sample per disease was estimated and
depicted in histograms. Intra-observer variability was minimal (p < 0.01). Statistical tests were performed with SPSS v17.0.
For IF, the following primary antibodies were applied at 4 °C overnight: i) anti-53BP1 (ab21083, Abcam), diluted 1:200 in TBS; ii)
anti-RPA2 (ab2175 [9H8], Abcam), diluted 1:200 in TBS; and iii) anti-phospho RPA2 [S4/S8] (ab87277, Abcam), diluted 1:200 in TBS.
Slides were treated with blocking solution containing 1% BSA and 5% normal goat serum (Invitrogen, #31873). Secondary antibodies were
Alexa Fluor 488 goat anti-mouse IgG (H+L) (Invitrogen, #A-11001), diluted 1:500 in TBS, and Alexa Fluor 568 goat anti-rabbit IgG
(H+L) (Invitrogen, #A-11011), diluted 1:500 in TBS. Sections were counterstained with 100 ng/ml of 4′,6-diamidino-2-phenylindole
(DAPI). Image acquisition of multiple random fields was automated on a Scan^R screening station (Olympus, Germany) and images were
analyzed using Scan^R (Olympus, Germany) analysis software; alternatively, a Zeiss Axiolab fluorescence microscope equipped with a
Zeiss Axiocam MRm camera and Achroplan objectives was used, with image acquisition performed with AxioVision software release 4.7.1.
Samples either previously characterized (for γH2AX and 53BP1 immunoreactivity) or recommended by the manufacturer (for RPA2 and
phospho RPA2) served as positive controls, while omission of the primary antibody was performed in negative control assays.
Time-lapse live-cell imaging

Live-cell imaging was started at day 3 after stimulation of bone marrow-derived macrophage precursors with FSL-1 (20 ng/ml,
Invivogen) or medium. Cells were grown in cell culture dishes (Cellbind, Corning) in OptiMEM (GIBCO) containing 10% FBS and
50 ng/ml M-CSF. In some experiments, bone marrow macrophage precursors were transduced with retrovirus expressing
H2BGFP prior to stimulation, as described above. Similar rates of cytokinesis failure were observed with or without retroviral
transduction. Time-lapse live-cell imaging was performed using an LSM 710 confocal microscope equipped with an epifluorescence
live-cell imaging setup providing a humidified atmosphere at 37 °C with 5% CO2 (Carl Zeiss). Sample illumination was kept to a
minimum and had no adverse effects on cell division and proliferation. Differential interference contrast (DIC) and fluorescent
(GFP filter) images were acquired every 1-15 min with a 40x objective. Image analysis was performed using Zeiss ZEN software.
Quantitative Image-Based Cytometry (QIBC)

Image acquisition was performed on an Olympus IX-81 inverse microscope using a UPLSAPO 20x objective (N.A. 0.75) and Scan^R
Acquisition software, as recently described in detail (Toledo et al., 2013). Acquisition times for the different channels were adjusted to
obtain images under non-saturating conditions for all the treatments analyzed within the experiment. For QIBC analysis of single

nuclei in vitro, 25 to 96 images were acquired per well using triplicate wells per condition, containing in total 5000 to 10000 cells per
condition. After acquisition, images were processed for automated analysis using the Scan^R Analysis software. A dynamic back-
ground correction was first applied to the images. DAPI or PI signal was used to generate a mask that identified each individual
nucleus as an individual object. This mask was then applied to quantify pixel intensities in the different channels for each individual
cell/object. The watershed segmentation algorithm included in the software was applied to separate nuclear clusters. Geometrical
parameters (area, circularity, and physical position in the field of view) are calculated for each individual object. After segmentation
and pixel quantification, the desired quantified values for each nucleus (mean and total intensities, area) were extracted and data
were analyzed using the Scan^R and Spotfire softwares to quantify percentages and average values and to generate color-coded
scatter diagrams in a flow-cytometry-like fashion. Fragmented and apoptotic nuclei were excluded based on their total DAPI fluo-
rescence, circularity, and nuclear area. Single cell analysis by high content microscopy not only provides the spatial resolution of
fluorescence imaging, but also greatly exceeds flow-cytometry and immunoblotting in resolution and quantitative power (Toledo
et al., 2013). Particularly given the strong adherence of TLR2-induced polyploid macrophages to their matrix (L.H. and A.T., unpublished
data), QIBC was superior in quality and resolution to any technique that required detaching MMΦ, such as collection of cells for
flow cytometry or western blot. Furthermore, QIBC offered the unique advantage of permitting discrimination between multinucleated
polyploid macrophages with diploid nuclei and mononucleated polyploid macrophages. Moreover, QIBC allowed us to relate
DNA synthesis and DDR to single nuclei for thousands of cells with varying numbers of nuclei and nuclear DNA content, using cell-
cycle asynchronous populations.
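The exclusion and DNA-content gating described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each segmented nucleus is exported as a record with total DAPI intensity, area, and circularity. The field names, all threshold values, the position of the 2c peak, and the helper names are hypothetical and are not taken from the Scan^R workflow used in the study.

```python
from dataclasses import dataclass


@dataclass
class Nucleus:
    total_dapi: float   # integrated DAPI intensity (A.U.)
    area: float         # nuclear area (pixels)
    circularity: float  # 1.0 corresponds to a perfect circle


def is_intact(n, min_dapi=500.0, min_area=80.0, min_circularity=0.8):
    """Exclude fragmented/apoptotic nuclei (all thresholds are illustrative)."""
    return (n.total_dapi >= min_dapi
            and n.area >= min_area
            and n.circularity >= min_circularity)


def dna_content_class(n, dapi_2c=1000.0):
    """Bin a nucleus by total DAPI relative to an assumed 2c peak position.

    Nuclei in S phase fall into the nearest bin in this simplified scheme.
    """
    ratio = n.total_dapi / dapi_2c
    if ratio < 1.5:
        return "2c"
    if ratio <= 2.5:
        return "4c"
    return ">4c"


def percent_by_class(nuclei):
    """Return the percentage of intact nuclei in each DNA-content class."""
    intact = [n for n in nuclei if is_intact(n)]
    counts = {"2c": 0, "4c": 0, ">4c": 0}
    for n in intact:
        counts[dna_content_class(n)] += 1
    total = len(intact)
    return {k: (100.0 * c / total if total else 0.0) for k, c in counts.items()}
```

In practice the gates would be set per experiment from the observed DAPI histogram rather than from fixed numbers.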
For QIBC analysis of micronuclei, micronuclei were identified based on total DAPI fluorescence, circularity, and area. For QIBC anal-
ysis of nuclear area in granuloma-associated macrophage nuclei, we established the following algorithm: using surface F4/80 expres-
sion as a primary object, we gated in granulomas, defined as F4/80+ objects with a large area. A ‘large’ area of F4/80+ objects was
defined based on the presence of more than 20 F4/80+ cells within that area. Within the granuloma gate, single nuclei were identified
as secondary objects, based on total DAPI fluorescence, while fragmented nuclei were excluded based on their total DAPI fluorescence
and nuclear area. The nuclear area was then analyzed within single granuloma nuclei (Figure 1E). For QIBC analysis of Ki67 expression
in F4/80+ cells, all F4/80+ cells were included in the analysis using F4/80 positivity to define the primary object while single nuclei asso-
ciated with F4/80+ objects were identified as secondary objects, based on their total DAPI fluorescence, excluding fragmented nuclei
as above. Ki67 expression within single nuclei was then analyzed (Figure 1H), as percent of total nuclei associated with F4/80+ cells.
For QIBC analysis of lipid bodies, in order to distinguish the lipids contained within lipid bodies from those of the cell membranes,
we took advantage of the fluorescent emission spectrum properties of Nile Red, which depend upon the lipid Nile Red is associated
with, i.e., for triacylglycerol: λmax em = 590 nm, for phospholipids: λmax em = 640 nm (Molecular Probes handbook), as previously
described (Peyron et al., 2008). On high content images and using the Olympus Scan^R analysis software, the phospholipid back-
grounds of both lipid and non-lipid body laden macrophages were summed together into joint sub-objects and are displayed in
red, while the triacylglycerol-rich lipid bodies appear in white. The threshold for Nile Red positivity was set based on the total Nile
Red fluorescence intensity of cells with more than 50% of their cytoplasm stained.
qRT-PCR and microarray analysis

Total RNA was isolated using the RNeasy Micro Kit (QIAGEN) and reverse transcribed using the First Strand cDNA Synthesis kit
(Fermentas). qRT-PCR was performed in triplicate using an Eppendorf Realplex Thermal Cycler (Eppendorf), following the manufacturer’s
protocols. Relative amounts of mRNA were calculated by the ΔΔCt method and normalized to levels of Gapdh. The following
primer sequences were used: Myc, forward 5′-AATCCTGTACCTCGTCCGAT-3′, reverse 5′-TCTTCTCCACAGACACCACA-3′;
Ccnd1, forward 5′-TGCTACCGACAACGCA-3′, reverse 5′-TCAATCTGTTCCTGGCAGGC-3′; Ccnd2, forward 5′-CGTGTGATGCCCTGACTGAG-3′,
reverse 5′-GACTTAGATCCGGCGTTATG-3′; Gapdh, forward 5′-TGGAGAAACCTGCCAAGTATG-3′, reverse
5′-GTTGAAGTCGCAGGAGACAAC-3′; Mafb, forward 5′-AACGGTAGTGTGGAGGAC-3′, reverse 5′-TCACAGAAAGAACTGAGGA-3′.
For microarrays, total RNA was isolated using the RNeasy Micro Kit (QIAGEN) and samples with an RIN greater than
8 were further processed with the Ambion WT Expression kit (Ambion, USA) as described by the manufacturer. The resulting cDNAs
were fragmented and then labeled using the Affymetrix Terminal Labeling kit (Affymetrix, USA). Labeled fragments were hybridized to
Affymetrix GeneChip ST 2.0 arrays for 16 hr at 45°C with 60 rpm in an Affymetrix Hybridization oven 645. After washing and staining,
the arrays were scanned with the Affymetrix GeneChip Scanner 3000 7G. CEL files were produced from the raw data with Affymetrix
GeneChip Command Console Software Version 3.0.1. For the GSEA analysis (version 2.0.13), differentially expressed genes in each
group were compared to the indicated gene sets available on the GSEA homepage. A list of gene sets is available as an electronic file.
For the heatmap, a selection of genes based on the differences in gene expression, the involvement in immunological processes, and
the GSEA analysis was plotted.
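The comparative Ct (ΔΔCt) calculation named in this section can be illustrated with a short Python sketch. It assumes 100% amplification efficiency (a doubling per cycle) and averages technical triplicates before taking differences. The function name and the Ct values in the usage note are hypothetical and are not data from the study.

```python
from statistics import mean


def relative_expression(target_sample, ref_sample, target_control, ref_control):
    """Fold change of a target gene by the comparative Ct method.

    Each argument is a list of replicate Ct values. The reference gene here
    would be Gapdh. Assumes perfect doubling per PCR cycle.
    """
    delta_sample = mean(target_sample) - mean(ref_sample)      # ΔCt, treated
    delta_control = mean(target_control) - mean(ref_control)   # ΔCt, control
    delta_delta = delta_sample - delta_control                 # ΔΔCt
    return 2.0 ** (-delta_delta)
```

For example, with a target Ct of 24 in the treated sample and 26 in the control, and a reference Ct of 18 in both, ΔΔCt is −2, which corresponds to a 4-fold increase.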
Single-cell RNA library preparation

Single cell RNA sequencing was performed using the CEL-Seq2 method (Hashimshony et al., 2016) with several modifications. A 5-fold
volume reduction was achieved using a nanoliter-scale pipetting robot (Mosquito HTS, TTP Labtech). Single cells were sorted into
384-well plates containing 240 nL of primer mix (CEL-Seq primer sequences are listed in Table S1) and 1.2 µL of PCR encapsulation
barrier, Vapor-Lock (QIAGEN). Sorted plates were centrifuged at 2200 × g for 10 min at 4°C, snap-frozen in liquid nitrogen, and stored
at −80°C until processed. 160 nL of reverse transcription reaction mix and 2.2 µL of second strand reaction mix were used to convert
RNA into cDNA. cDNA from 96 cells was pooled before clean up and in vitro transcription, generating 4 libraries from one
384-well plate. 0.8 µL of AMPure/RNAClean XP beads (Beckman Coulter) per 1 µL of sample was used during all the purification
steps including the library cleanup. Other steps were performed as described in the original protocol. Twelve libraries (1152 single
cells) were sequenced on a single lane (paired-end multiplexing run, 100 bp read length) of an Illumina HiSeq 2500 sequencing system
generating 200 million sequence fragments.
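The plate and library numbers in this section follow from simple arithmetic, shown here as a short Python check. All inputs are the figures stated in the text. The per-cell sequencing depth is a derived average that assumes fragments are spread evenly across cells, which real libraries will not match exactly.

```python
WELLS_PER_PLATE = 384
CELLS_PER_LIBRARY = 96
N_LIBRARIES = 12
TOTAL_FRAGMENTS = 200_000_000

# One 384-well plate is pooled in groups of 96 cells.
libraries_per_plate = WELLS_PER_PLATE // CELLS_PER_LIBRARY

# Twelve libraries of 96 cells each.
total_cells = N_LIBRARIES * CELLS_PER_LIBRARY

# Number of plates needed for twelve libraries.
plates_used = N_LIBRARIES // libraries_per_plate

# Average sequencing depth per cell under the even-spread assumption.
mean_fragments_per_cell = TOTAL_FRAGMENTS / total_cells
```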
Quantification of Transcript Abundance

Paired end reads were aligned to the transcriptome using bwa (version 0.6.2-r126) with default parameters (Li and Durbin, 2010). The
transcriptome contained all RefSeq gene models based on the mouse genome release mm10 downloaded from the UCSC genome
browser comprising 31,201 isoforms derived from 23,538 gene loci (Meyer et al., 2013). All isoforms of the same gene were merged to
a single gene locus. The 65bp right mate of each read pair was mapped to the ensemble of all gene loci and to the set of 92 ERCC
spike-ins in sense direction (Baker et al., 2005). Reads mapping to multiple loci were discarded. The 25bp left read contains the bar-
code information: the first six bases corresponded to the cell specific barcode followed by six bases representing the unique molec-
ular identifier (UMI). The remainder of the left read contains a polyT stretch. The left read was not used for quantification. For each cell
barcode, the number of UMIs per transcript was counted and aggregated across all transcripts derived from the same gene locus.
Based on binomial statistics, the number of observed UMIs was converted into transcript counts (Grün et al., 2014).
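The barcode layout and the UMI-to-transcript conversion described above can be sketched as follows. This is a minimal illustration, assuming the input is a list of (left read, gene) pairs for read pairs that already mapped uniquely, and it counts UMIs per gene directly rather than per isoform first. The conversion uses the standard correction for drawing from K possible UMIs with replacement; treating it as the formula of the cited method is an assumption. All function names are hypothetical.

```python
import math
from collections import defaultdict

BARCODE_LEN = 6          # cell-specific barcode, first six bases of the left read
UMI_LEN = 6              # unique molecular identifier, next six bases
K = 4 ** UMI_LEN         # 4096 possible UMI sequences


def split_left_read(left_read):
    """Return (cell barcode, UMI) from the start of the left read."""
    barcode = left_read[:BARCODE_LEN]
    umi = left_read[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi


def count_umis(records):
    """Count distinct UMIs per (cell barcode, gene) pair."""
    seen = defaultdict(set)
    for left_read, gene in records:
        barcode, umi = split_left_read(left_read)
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


def umis_to_transcripts(k_observed, k_total=K):
    """Estimate transcript number from the number of distinct UMIs observed.

    If m transcripts each pick one of k_total UMIs at random, the expected
    number of distinct UMIs is k_total * (1 - (1 - 1/k_total) ** m).
    Solving for m gives the expression below.
    """
    if k_observed >= k_total:
        raise ValueError("UMI pool saturated; estimate is undefined")
    return math.log(1 - k_observed / k_total) / math.log(1 - 1 / k_total)
```

The correction is negligible for lowly expressed genes and grows as the number of observed UMIs approaches K.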
Single-Cell RNA Sequencing Data Analysis

Twelve libraries (1152 single cells) were sequenced and, after quality controls, data from 563 cells (199 from Ctrl, 333 from F2c, and 31 from
F > 4c) were further analyzed. We quantified 12,821 genes, and down-sampling to 3,000 transcripts per cell was used for data normalization.
Identification and visualization of different subpopulations were performed with the RaceID2 algorithm (Grün et al., 2016).
Briefly, initial clustering was performed using k-medoids clustering, followed by outlier cell identification. The t-distributed stochastic neighbor embedding
(t-SNE) algorithm was used for dimensional reduction and cell cluster visualization (van der Maaten and Hinton, 2008). For better visualization,
the t-SNE algorithm was initialized with positions in the embedded space as determined by classical multidimensional scaling.
Differentially expressed genes between two subgroups of cells were identified similarly to a previously published method (Anders and
Huber, 2010). First, negative binomial distributions reflecting the gene expression variability within each subgroup were inferred
based on the background model for the expected transcript count variability computed by RaceID2 (Grün et al., 2016). Using these
distributions, a p value for the observed difference in transcript counts between the two subgroups was calculated and corrected for multiple
testing by the Benjamini-Hochberg method. Pathway enrichment analysis was performed on genes exhibiting a minimum
of 2-fold expression difference between the two subgroups with an adjusted p value less than 0.05, using the R package
ReactomePA, based on the Reactome pathway database (Yu and He, 2016).
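Three steps from this section lend themselves to a compact Python sketch: down-sampling a cell to a fixed transcript total, Benjamini-Hochberg adjustment of p values, and the fold-change and adjusted p value filter applied before pathway enrichment. This is an illustration only and not the RaceID2 or ReactomePA code. Sampling without replacement, the fixed random seed, discarding cells below the target, and all function names are assumptions.

```python
import random


def downsample(counts, target=3000, seed=0):
    """Draw `target` transcripts at random, without replacement, from one cell.

    `counts` maps gene -> transcript count. Returns None for cells with fewer
    than `target` transcripts (assumed to be discarded).
    """
    pool = [gene for gene, c in counts.items() for _ in range(c)]
    if len(pool) < target:
        return None
    rng = random.Random(seed)
    sampled = {gene: 0 for gene in counts}
    for gene in rng.sample(pool, target):
        sampled[gene] += 1
    return sampled


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(log2_fold_changes, adjusted_p, min_abs_log2fc=1.0, alpha=0.05):
    """Keep genes with at least 2-fold change and adjusted p below alpha."""
    return [g for g in log2_fold_changes
            if abs(log2_fold_changes[g]) >= min_abs_log2fc and adjusted_p[g] < alpha]
```

The selected gene list would then be passed to a pathway enrichment tool; that step is not reproduced here.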
Laser microdissection and gene expression analysis

Microdissection of F4/80hi macrophages and F4/80low binucleated and MMΦ from M. bovis BCG liver granulomas was performed
3 weeks post infection using a Zeiss PALM MicroBeam (Zeiss) instrument. Liver tissue and Kupffer cells were isolated from PBS-
treated age- and sex-matched control mice. Fast immunochemistry of serial sections was performed with F4/80 antibody (Serotec).
Immunostained sections were counterstained with DAPI to facilitate the identification of mononucleated and multinucleated macro-
phages. RNA was isolated with the ARCTURUS PicoPure RNA Isolation Kit (Thermo Fisher) and reverse transcription, pre-amplifi-
cation and real-time PCR were performed using Thermo Fisher reagents according to the manufacturer’s recommendations. For
gene expression analysis, we used the following TaqMan Gene Expression Assays:
Gapdh (Mm03302249_g1),
Emr1 (Mm00802529_m1),
Apoe (Mm01307193_g1),
Nfkbiz (Mm00600522_m1),
Ccl5 (Mm01302427_m1),
Chi3l3 (Mm00657889_mH),
Lox (Mm00495386_m1),
Ctsk (Mm00484039_m1),
Mmp9 (Mm00600163_m1),
Pcna (Mm00448100_g1),
Ccnd2 (Mm00438070_m1),
Mcm6 (Mm00484848_m1),
Blm (Mm00476150_m1),
Rad50 (Mm00485504_m1),
Rad52 (Mm00448543_m1) and
Myc (Mm00487804_m1).

Immunoblotting
Whole cell extracts and nuclear extracts were obtained and stored at −80°C until processing for immunoblotting. Lysates of equal cell
numbers were fractionated on 12% polyacrylamide gels by SDS-PAGE, transferred to polyvinylidene fluoride membranes (Millipore),
and incubated with specific antibodies; enhanced chemiluminescence (Amersham) was used for detection. The following primary
antibodies were used: mouse monoclonal anti-c-Myc, mouse monoclonal anti-Cyclin D1, rabbit polyclonal anti-cyclin D2, goat polyclonal
anti-MafB (Santa Cruz), rabbit polyclonal anti-p38 (Cell Signaling), mouse monoclonal anti-TATA-binding protein (Tbp, EMD
Millipore and Abcam).
Data are presented as mean ± SD. Sample number (n) indicates the number of independent biological samples in each experiment.
Sample numbers and experimental repeats are indicated in figures and figure legends or in the methods sections above. p values
were determined by Student’s t test with a 95% confidence interval. All statistical tests were performed with GraphPad Prism V4
software (*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, n.s. not significant).
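The reporting conventions above (mean ± SD and asterisk thresholds) can be expressed as a small Python helper. This is a sketch only: the t test itself was run in Prism and is not reimplemented here, the use of the sample standard deviation (n − 1 denominator) is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def mean_sd(values):
    """Return (mean, sample SD) for a list of independent biological samples."""
    return mean(values), stdev(values)


def significance_label(p):
    """Map a p value to the asterisk notation used in the figure legends."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."
```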
DATA AVAILABILITY
The accession number for the gene array data reported in this paper is ArrayExpress: E-MTAB-5085. The accession number for the
scRNA-seq data reported in this paper is NCBI GEO: GSE86929.


Figure S1. MΦ Precursors Rather than Differentiated Monocytes form MMΦs, Related to Figure 1
(A–G) C57BL/6 mice were infected with Mycobacterium (M.) bovis BCG i.p. and analyzed at the indicated time points.
(A) IF for F4/80 and DAPI in liver granulomas. p.i., post infection.
(B) Quantification of BiNucl and MultiNucl cells/granuloma. Mean ± SD from 3 mice per time point are shown.
(C) IF for F4/80, Ki67, DAPI in liver granulomas. p.i., post infection. Note the small size of granuloma nuclei associated with F4/80+ cells 1 week after infection.
(D) Quantification of Ki67+ MΦ by QIBC. Mean ± SD from 1300-4000 nuclei per mouse from 3 mice per time point shown as percent of total nuclei analyzed.
(E and F) Quantitation of the location of BiNucl and MMΦs and Ki67+F4/80+ cells in granulomas. 50 granulomas pooled from 3 mice infected with M. bovis BCG
3 weeks post infection were analyzed.
(G) Position of Ki67+F4/80+ cells in larger granulomas is ill-defined. Representative images of liver sections stained with antibodies to F4/80, Ki67 and DAPI are
shown. p.i., post infection.
(H) Experimental set up related to Figures S1I and S1J.
(I) At the left, representative flow cytometric dot plots of bone marrow (BM), indicating the gates used to sort (a) whole BM, (b) uncommitted hematopoietic
progenitors (identified as Gr1−CD11b−Kit+CD115−), (c) common macrophage dendritic cell progenitors (MDPs, identified as Gr1−CD11b−Kit+CD115+), and (d)
mature monocytes (Gr1int/−CD11b+Kit−CD115+). At the right, representative pictures of sorted MDPs stimulated with FSL-1 versus medium for 6 days, then fixed
and stained with Hemacolor (Sigma).
(J) Quantification of the number of MMΦs. The bars show mean ± SD of 3 independent experiments.
(K) Experimental set-up and representative flow cytometric histograms of BM and BM-derived mononuclear phagocytes. Whole BM was stained for flow
cytometry directly (day 0) or adherent cells were harvested after 1, 3, 7 days of culture and stained for flow cytometry.
***p < 0.001, Scale Bars, 10 µm.
Figure S2. Identification of Inflammatory Signals Leading to MMΦ Formation, Related to Figure 1
(A) Quantification of MMΦ, as a timeline, following stimulation with FSL-1, BCG or medium. The bars show mean ± SD from 3 independent experiments.
(B–D) Identification of inflammatory signals leading to MMΦ formation. Quantification of MMΦ following stimulation of MΦ precursors with the indicated cytokines
or TLR ligands. Data are shown as mean ± SD from 3 independent experiments.
(E and F) Requirement for MyD88 at the single cell level for TLR2-induced generation of MMΦ. A 1:1 mixture of CD45.1+Myd88+/+ and CD45.2+Myd88−/− MΦ
precursors was stimulated with FSL-1 (20 ng/ml) or medium and MMΦ were analyzed for CD45.1 and CD45.2 expression. (E) Representative images of one out of
3 independent experiments are shown. (F) Quantification of CD45.1 and CD45.2 expression by MMΦ. Data are representative of 3 independent experiments.
(G and H) TNF is not required for TLR2-induced generation of MMΦ. Tnf+/+ and Tnf−/− MΦ precursors were stimulated with FSL-1 for 6 days. (G) Representative
images of one out of 2 independent experiments are shown. (H) Quantification of BiNucl and MMΦ in an unbiased, automated manner, by QIBC. The bars show
mean ± SD from 2 independent experiments.
***p < 0.001, Scale Bars, 10 µm.
Figure S3. Gene Signatures Characteristic of Polyploid Macrophages, Related to Figure 1
(A) Experimental set up for scRNA-seq experiments (related to Figures 1H, 1K, 4B, S3C–S3F, and S4).
(B) Sorting strategy for scRNA-seq experiments. Doublets were excluded based on Trigger Pulse Width. Control MΦ precursors with a 2c DNA content and FSL-1
stimulated MΦ precursors with 2c (diploid) and > 4c (polyploid) DNA content were single cell sorted for RNA-seq.
(C) t-SNE map representation of transcriptome similarities of control 2c, FSL-1-stimulated 2c and > 4c macrophages. RaceID2 algorithm identified 28 different
clusters including outliers, highlighted with different colors and numbers.
(D) t-SNE map showing the experimental condition for each cell.
(E) Heatmap showing the expression profile of shortlisted genes characterizing MMΦ differentiation in FSL-1-stimulated 2c and > 4c MMΦ within cluster 5, 9
and 13.
(F) Violin plots comparing expression of Ccl2, Ccl7, Ccl12 (C) and Chi3l3, Lox, Mmp8 and Mmp9 (D) in MΦ precursors stimulated with FSL-1 or medium for 6 days,
isolated based on their DNA content and analyzed by single cell RNA-seq. The y axis indicates the log2 (normalized count+0.1) expression levels. The black point
indicates the mean of expression level.
(G and H) C57BL/6 mice were infected with Mycobacterium (M.) bovis BCG i.p. At 3 weeks post infection, liver cryosections were stained with F4/80. Indicated
populations were isolated by laser capture microdissection (LCM). Representative images of liver granuloma before (G) and after (H) LCM-guided isolation
of F4/80hi and F4/80low bi- and multinucleated granuloma macrophages.
(I) qRT-PCR analysis of Emr1 mRNA expression (encoding F4/80) in granuloma MΦ populations, liver Kupffer cells and liver tissue verifying purity of isolated
populations. Data are shown as mean ± SEM of duplicate determinants from 5-9 independent biological replicates per group and are normalized relative to
Gapdh mRNA expression.
*p < 0.05, ***p < 0.001, Scale Bar, 30 µm.
[Figure S4 image: heatmap of enriched Reactome pathways (for example, Cell Cycle, DNA Replication, Cholesterol biosynthesis, Immune System, Complement cascade, and Collagen formation) for the comparisons F>4c vs Ctrl, F>4c vs F2c, and F2c vs Ctrl, shown separately for upregulated and downregulated genes.]
Figure S4. Gene Signatures Characteristic of Polyploid Macrophages-Reactome Pathway Enrichment Analysis, Related to Figure 1
MΦ precursors stimulated with FSL-1 or medium for 6 days were isolated based on their DNA content and analyzed by single cell RNA-seq. Reactome pathway
enrichment analysis was performed on genes exhibiting a minimum of 2-fold expression difference between the two subgroups with a p-adjusted value less
than 0.05.
[Figure S5 image: panels A–G, including the bone marrow chimera timeline (CD45.1:CD45.2, 50%:50%; weeks −6, 0, and 2–3), uninfected chimera and non-chimeric (CD45.1 x CD45.2) F1 controls, CD45.2/CD45.1/DAPI images, chromosome 11 FISH, and cell-cycle schematics of endoreplication and cytokinesis failure.]
Figure S5. MMΦ Formation from MΦ Precursors Does Not Involve Cell-to-Cell Fusion, Related to Figure 2
(A and B) M. bovis BCG-induced MMΦ do not require cell-to-cell fusion in vivo. (A) CD45.2 mice were lethally irradiated and reconstituted with a mixture of BM
cells from CD45.1 and CD45.2 mice. 6-12 weeks later, uninfected mice were analyzed. Representative images of bone cryosections from 2 independent experiments
showing roughly equal numbers of either CD45.1+ or CD45.2+ bone marrow leukocytes. Scale Bar, 100 µm.
(B) CD45.2 mice were lethally irradiated and reconstituted with a 1:1 mixture of bone marrow cells from CD45.1 and CD45.2 mice. 6-12 weeks later, the mice were
infected with M. bovis BCG i.p. and analyzed 2-3 weeks p.i. Liver granuloma cryosections were stained with antibodies to CD45.1, CD45.2 and F4/80.
Representative images from 3 independent experiments with n = 5 mice per experiment are shown. p.i., post infection.
(C) (CD45.1xCD45.2)F1 non-chimeric uninfected mice were analyzed as controls. Representative images of bone cryosections from 2 independent experiments
showing numerous leucocytes double positive for CD45.1 and CD45.2. Scale Bar, 100 µm.
(D) Schematic depiction of polyploidy arising via endoreplication: cycling cells undergo multiple cell cycles without entering mitosis, doubling their DNA content
within a single nucleus.
(E) Schematic depiction of polyploidy arising via cytokinesis failure: cells enter mitosis but fail to physically split into two following chromosome segregation,
giving rise to BiNucl tetraploid daughter cells.
(F) C57BL/6 wild-type mice were infected with M. bovis BCG i.p.; 2-3 weeks p.i., liver cryosections were analyzed by FISH for chromosome 11. Representative pictures
from 3 independent experiments are shown. p.i., post infection.
(G) Schematic depiction of polyploidy arising via cytokinesis failure: following a first cytokinesis failure, BiNucl tetraploid daughter cells, re-enter mitosis and re-
distribute their genetic content into two nuclei thus generating a BiNucl daughter cell with 4c DNA content in each nucleus.
[Figure S6 image: panels A–E. Relative expression of Ccnd1 and Ccnd2 on days 0, 1, 3, and 6 (Control vs FSL-1); immunoblot for cyclin D1, cyclin D2, and Tbp; BrdU and DAPI images; BrdU mean intensity per nucleus (A.U.) versus DAPI total intensity per nucleus (A.U.) with gates A and B for Myd88+/+ and Myd88-/- cells; percent of total nuclei in gates A and B.]
Figure S6. TLR2 Signaling Confers a Proliferation Advantage to Polyploid MΦ Progeny, Related to Figure 4
(A) qRT-PCR of Ccnd1 and Ccnd2 mRNA, normalized relative to Gapdh mRNA expression. Mean ± SD of triplicate determinants pooled from 3 independent
experiments.
(B) IB of nuclear lysates for Cyclins D1 and D2. Example of 2 independent experiments.
(C–E) Increased BrdU incorporation into polyploid nuclei, IF for BrdU, DAPI; QIBC. (C) Representative images. (D) Mean BrdU fluorescence versus total DAPI
intensity per single nucleus. (E) Percent of BrdU+ nuclei belonging in gate A or B, as in (D). n = 1000-5000 nuclei per condition. Mean ± SD from 3 independent
experiments.
*p < 0.05, **p < 0.01, Scale Bar, 10 µm.
Figure S7. Replication Stress and Activated DDR in Granulomas Enriched in MMΦ In Vivo, Related to Figure 6
(A–C) Representative images of uninvolved tissue adjacent to granulomatous areas from 10-15 patient biopsies per granulomatous disease are shown. (A) IH
for γH2AX. (B) IF for p-RPA2. (C) IF for 53BP1. n = 10-17 patient biopsies per disease.
(D) GSEA for p53-dependent genes in MΦ precursors stimulated for 6 days with FSL-1 versus medium. Gene array was performed with 4 independent biological
replicates per group.
Scale Bars, 100 µm.
Article
Epigenetic Activation of WNT5A Drives Glioblastoma Stem Cell Differentiation and Invasive Growth
Baoli Hu, Qianghu Wang, Y. Alan Wang, ...,
Lynda Chin, Roeland G.W. Verhaak,
Ronald A. DePinho
Correspondence
yalanwang@mdanderson.org (Y.A.W.),
rdepinho@mdanderson.org (R.A.D.)
In Brief
Epigenetic activation of WNT5A
expression contributes to glioblastoma
tumor recurrence by promoting
differentiation of glioma-derived stem
cells into endothelial cells.

• Comparisons of NSCs and derivative GSCs reveal elevated WNT5A and EC signature
• PAX6/DLX5 bidirectionally regulates WNT5A during differentiation of GSCs into GdECs
• WNT5A-mediated GdEC differentiation and EC recruitment support GSC invasive growth
• Clinical studies of peritumoral/recurrent GBM reveal increased WNT5A/GdEC expression
Accession numbers: GSE85615, GSE86624
Hu et al., 2016, Cell 167, 1281–1295

Baoli Hu,1,16 Qianghu Wang,2,3,16 Y. Alan Wang,1,16,* Sujun Hua,1 Charles-Etienne Gabriel Sauvé,1 Derrick Ong,1
Zheng D. Lan,1 Qing Chang,3,9 Yan Wing Ho,1 Marta Moreno Monasterio,1 Xin Lu,1 Yi Zhong,4 Jianhua Zhang,3,9
Pingna Deng,1 Zhi Tan,5 Guocan Wang,1 Wen-Ting Liao,1 Lynda J. Corley,6 Haiyan Yan,10 Junxia Zhang,11 Yongping You,11
Ning Liu,11 Linbo Cai,12 Gaetano Finocchiaro,13 Joanna J. Phillips,14 Mitchel S. Berger,15 Denise J. Spring,1 Jian Hu,1
Erik P. Sulman,7,8 Gregory N. Fuller,6 Lynda Chin,3 Roeland G.W. Verhaak,2,3 and Ronald A. DePinho1,17,*
1Department of Cancer Biology
2Department of Bioinformatics and Computational Biology
3Department of Genomic Medicine
4Department of Epigenetics and Molecular Carcinogenesis
5Department of Experimental Therapeutics
6Department of Pathology
7Department of Radiation Oncology
8Department of Translational Molecular Pathology
9Institute for Applied Cancer Science
University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA

10Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
11Department of Neurosurgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
12Department of Oncology, Guangdong 999 Brain Hospital, Guangzhou 510510, China
13Unit of Molecular Neuro-Oncology, Fondazione IRCCS Istituto Neurologico C. Besta, 20133 Milano, Italy
14Departments of Neurological Surgery and Pathology
15Neurological Surgery
University of California, San Francisco, San Francisco, CA 94143, USA

16Co-first author
17Lead Contact
*Correspondence: yalanwang@mdanderson.org (Y.A.W.), rdepinho@mdanderson.org (R.A.D.)

SUMMARY

Glioblastoma stem cells (GSCs) are implicated in tumor neovascularization, invasiveness, and therapeutic resistance. To illuminate
mechanisms governing these hallmark features, we developed a de novo glioblastoma multiforme (GBM) model derived from
immortalized human neural stem/progenitor cells (hNSCs) to enable precise system-level comparisons of pre-malignant and
oncogene-induced malignant states of NSCs. Integrated transcriptomic and epigenomic analyses uncovered a PAX6/DLX5
transcriptional program driving WNT5A-mediated GSC differentiation into endothelial-like cells (GdECs). GdECs recruit existing
endothelial cells to promote peritumoral satellite lesions, which serve as a niche supporting the growth of invasive glioma cells
away from the primary tumor. Clinical data reveal higher WNT5A and GdECs expression in peritumoral and recurrent GBMs relative
to matched intratumoral and primary GBMs, respectively, supporting WNT5A-mediated GSC differentiation and invasive growth in
disease recurrence. Thus, the PAX6/DLX5-WNT5A axis governs the diffuse spread of glioma cells throughout the brain parenchyma,
contributing to the lethality of GBM.

INTRODUCTION

Glioblastoma multiforme (GBM) is a highly lethal primary brain tumor characterized by robust neovascularization and glioma
cell invasiveness throughout the brain parenchyma (Dunn et al., 2012; Furnari et al., 2007). Poor prognosis relates to the
near universal recurrence of tumors despite aggressive multimodality treatment of maximal surgical resection, radiotherapy,
and chemotherapy (Wen and Kesari, 2008). Gliomagenesis is driven by genetic alterations, including those targeting components
of the TP53-ARF-MDM2 and PTEN-PI3K-AKT pathways (Cancer Genome Atlas Research Network, 2008; Brennan et al., 2013;
Ceccarelli et al., 2016) and can arise from the transformation of neural stem/progenitor cells (NSCs) (Alcantara
Llaguno et al., 2009; Zheng et al., 2008).
GBM possesses so-called glioblastoma stem cells (GSCs), which share many NSC features such as expression of stem
cell markers (e.g., Nestin, CD133), self-renewal, and multi-lineage differentiation capacity (Furnari et al., 2007; Lobo et al.,
2007; Singh et al., 2004). GSCs are associated with strong tumor initiation potential and are thought to contribute to disease
progression, recurrence and therapeutic resistance (Bao et al.,
[Figure 1 image: panels A–I. Immunoblot for pAKT (S473), pAKT (T308), total AKT, pERK, total ERK, and actin in CTRL, p53DN, and p53DN-AKT cells; soft agar colony number; survival fraction (%) over weeks for CTRL (n=5), p53DN (n=5), and p53DN-AKT (n=10), p<0.0001; tumor staining for Nestin, GFAP, Ki67, pAKT (S473), pAKT (T308), and p53DN; top 10 enriched pathways with FDR<0.25; GSEA enrichment plot for Endothelial Cell Marker CD31+ vs. CD31- UP; H3K27me3, H3K27ac, and expression heatmap for hNSC and iGSC (oncogenic activation, 85 genes).]
Figure 1. Overexpression of p53DN and myr-AKT Generates Malignant Glioma and Upregulates EC Signaling Pathway
(A) Immunoblot analysis of overexpressed oncogenes in hNSCs.
(B) Soft agar colony formation of hNSCs expressing p53DN, p53DN/myr-AKT (p53DN-AKT). Error bars represent SD of triplicate wells. **p < 0.01. Representative
images are shown.
1282 Cell 167, 1281–1295, November 17, 2016

2006; Chen et al., 2012; Zheng et al., 2008; Zhu et al., 2014). While GSCs exhibit differentiation capacity into glial and neuronal lineages, their terminal differentiation capacity is markedly impaired (Hu et al., 2013; Zheng et al., 2008), and they show trans-differentiation capacity (Cheng et al., 2013; Ricci-Vitiani et al., 2010; Soda et al., 2011; Wang et al., 2010).
The robust developmental plasticity of GSCs has also been evidenced by their capacity to differentiate into endothelial cells (ECs), which display classic EC phenotypes in vitro and have been reported to contribute to GBM vascularization in vivo (Ricci-Vitiani et al., 2010; Wang et al., 2010). The genetic and epigenetic factors driving GSC differentiation into ECs have not been elucidated; nor is it known how GdECs might contribute to the pathobiology of GBM or to clinical outcomes (Cheng et al., 2013; Rodriguez et al., 2012).
Here, we delineate mechanisms governing the aberrant developmental plasticity of GSCs and its contribution to the refractory nature of GBM. We establish a GBM model that affords a direct comparison of genome-wide histone modifications and associated gene expression alterations between parental human NSCs and their derivative oncogene-induced GSCs (hereafter iGSCs), identifying PAX6- and DLX5-regulated WNT5A as a key factor driving iGSC differentiation into GdECs. These GdECs function, in turn, to recruit host ECs to form a vascular-like niche that supports the growth of invading glioma cells in the brain parenchyma, a process known to contribute to disease recurrence in the clinic.

RESULTS

EC Signaling Pathway Enrichment in De Novo Gliomagenesis via Oncogenic Transformation of Human NSCs
Consistent with the critical roles of TP53 and PTEN-PI3K-AKT alterations in GBM pathogenesis (Cancer Genome Atlas Research Network, 2008; Brennan et al., 2013), GBM genomic and proteomic profiles from The Cancer Genome Atlas (TCGA) show significant correlation between poorer prognosis and higher levels of AKT activation in patients with TP53 mutations (Figure S1A). These results are consistent with the notion that robust AKT activation promotes disease aggressiveness (Molina et al., 2010; Phillips et al., 2006; Suzuki et al., 2010; Wang et al., 2004).
To model these pathway alterations and establish a de novo human GBM model, we employed Myc-immortalized human NSCs (hNSCs) that were documented to possess NSC-like features including self-renewal, expression of NSC markers, and multi-lineage differentiation capacity (data not shown). The hNSCs were infected with lentiviruses encoding dominant-negative p53 (p53DN) and/or a constitutively active myristoylated form of AKT (myr-AKT) (Figure 1A). The hNSCs transduced with both p53DN and myr-AKT (p53DN-AKT-hNSCs), but not p53DN or myr-AKT alone, exhibited robust soft agar colony formation (Figures 1B and S1B) and highly penetrant tumorigenic potential following intracranial injection in mice (Figure 1C). Histopathological characterization of the p53DN-AKT-hNSC-derived tumors documented classical GBM features of high cellular density, pseudopalisading necrosis, and microvascular hyperplasia (Figures 1D and 1E). These tumors showed a high proliferative index (Ki67), robust expression of glioma markers (Nestin, GFAP), strong pAKT, and p53DN expression (Figure 1F). These de novo tumors readily generated iGSCs as evidenced by (1) tumor-repopulating potential with as few as 200 implanted cells and median tumor latency of 15–35 weeks (Figure S1C); (2) robust Nestin expression; and (3) limited capacity to differentiate into astrocytic and neuronal lineages (Figure S1D). Accordingly, transduction of c-Myc, p53DN, and myr-AKT in another primary human NSC line also generated high-grade gliomas following intracranial implantation (data not shown). Thus, p53 neutralization and AKT activation cooperate to transform these hNSCs into high-grade gliomas with classical disease features.
To gain mechanistic insight into system-level differences between premalignant hNSCs and their malignant derivatives, we performed transcriptomic analysis focusing on the changes of 68 stem cell-related signaling pathways from the Molecular Signatures Database (MSigDB) (Subramanian et al., 2005). Notably, gene set enrichment analysis (GSEA) revealed upregulation of the EC signaling pathway in p53DN-AKT-induced transformation of hNSCs (Figures 1G and 1H). Furthermore, genome-wide chromatin immunoprecipitation sequencing (ChIP-seq) analysis focusing on H3K27 histone modifications in core promoter regions revealed 85 genes displaying a dynamic switch from H3K27 trimethylation (me3) to H3K27 acetylation (ac), indicating epigenetic activation during oncogenic transformation of hNSC (Figures 1I and S1E; Table S1). Interestingly, the EC signaling pathway, but not HEMATOPOIESIS_STEM_CELL_NUMBER_LARGE_VS_TINY_UP (p > 0.14), was significantly enriched (p < 0.05) in these genes, further highlighting the upregulation of the EC signaling pathway in glioma-relevant biological processes. Given the seminal finding that GSCs can differentiate into ECs and participate in tumor vascularization (Ricci-Vitiani et al., 2010; Wang et al., 2010), the above transcriptomic and epigenomic analyses prompted us to verify EC differentiation in our system experimentally. Fluorescence-activated cell sorting (FACS) analysis of EC markers revealed that 12.8% and 8.5% of iGSCs under NSC culture conditions expressed VE-Cadherin
(C) Kaplan-Meier survival analysis for oncogenic transformation of hNSC in vivo.

(D) Representative H&E image of intracranial tumor derived from p53DN-AKT-hNSCs; scale bars, 1 mm.
(E) Representative H&E image of tumor sections with necrotic area (N) and microvascular hyperplasia (black arrow). Scale bars, 50 μm.
(F) IHC staining of tumors with the indicated antibodies. Scale bars, 50 μm.
(G) Top ten signaling pathways related to hNSC oncogenic transformation were identified by GSEA analysis based on gene expression profiles of hNSCs and their
derivative cells. The normalized enrichment scores (ES) and the log transformed p values are shown.
(H) GSEA enrichment plots of genes ranked based on oncogenic transformation versus EC signaling pathway.
(I) Heatmap of histone landscape of gene transcriptional start sites (TSSs) within ±2 kb and of Log2-ratio of these gene expression levels in hNSCs and iGSCs.
See also Figure S1 and Table S1.
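As a rough illustration of the promoter-level comparison summarized in panel (I), the following minimal Python sketch flags genes whose promoter window changes from an H3K27me3-marked state to an H3K27ac-marked state between two conditions. The `PromoterSignal` record, the `switched_to_active` function, the gene labels, the signal values, and the threshold of 1.0 are all hypothetical; the calling criteria actually used in the study are not described in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PromoterSignal:
    """Toy ChIP-seq signal in a promoter window (e.g., TSS +/- 2 kb)."""
    gene: str
    me3_hnsc: float  # H3K27me3 signal before transformation (hypothetical units)
    ac_hnsc: float   # H3K27ac signal before transformation
    me3_igsc: float  # H3K27me3 signal after transformation
    ac_igsc: float   # H3K27ac signal after transformation


def switched_to_active(signals: List[PromoterSignal], threshold: float = 1.0) -> List[str]:
    """Return genes marked by me3 (not ac) before and by ac (not me3) after.

    The single shared threshold is an assumption made for illustration only.
    """
    hits = []
    for s in signals:
        repressed_before = s.me3_hnsc >= threshold and s.ac_hnsc < threshold
        active_after = s.ac_igsc >= threshold and s.me3_igsc < threshold
        if repressed_before and active_after:
            hits.append(s.gene)
    return hits


# Invented values; placeholder gene labels, not data from the paper.
toy_signals = [
    PromoterSignal("GENE_A", 5.0, 0.2, 0.3, 6.0),  # me3 -> ac switch
    PromoterSignal("GENE_B", 4.0, 0.1, 4.5, 0.2),  # stays repressed
    PromoterSignal("GENE_C", 0.2, 3.0, 0.1, 3.5),  # active in both states
]

print(switched_to_active(toy_signals))
```

In practice the signal per window would come from peak calls or normalized read counts; the sketch only shows the logic of a two-state switch.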

[Figure 2 graphic, panels A–I. Recoverable labels: FACS plots of CD133-PE versus CD144-FITC (IgG-FITC/IgG-PE control; Vector, p53DN, p53DN-AKT); fold change of CD133+/CD144+ cells (DMSO versus RAPA); relative mRNA level of EC markers in CD133+/CD144– and CD133+/CD144+ cells; DiI-AcLDL/DAPI, CD105, VEGFR2, and vWF staining in NSC or EC media; immunoblot for pAKT (Ser473), AKT, p-p70S6K (Thr389), p70S6K, pS6 (Ser235/236), S6, and Actin; percentage and fold change of CD133+/CD144+ cells in GSC lines TS543, TS576, TS586, BT112, TS603, and BT147.]
Figure 2. Activation of AKT Pathway Induces Differentiation of GSCs into ECs

(A) FACS analysis of hNSCs, p53DN-transduced hNSCs, and p53DN-AKT-hNSCs based on CD133 and CD144 expression.
(B) Fold change of percentage of CD133+/CD144+ cells by FACS analysis in p53DN-AKT-hNSCs under treatment with rapamycin (RAPA, 50 nM) for 72 hr.
(C) qRT-PCR analysis of the indicated EC markers in two sorted subpopulations from p53DN-AKT-hNSCs.
(D) IF analysis of sorted CD133+/CD144+ cells from p53DN-AKT-hNSCs cultured in NSC or EC media for 5 days, showing EC marker expression and DiI-AcLDL uptake.
Scale bar, 40 μm.
(E) Tubular network formation of sorted CD133+/CD144+ and CD133+/CD144– cells from p53DN-AKT-hNSCs cultured on Matrigel in EC media with/without
RAPA (50 nM) treatment. Scale bar, 100 μm.
(F) Immunoblot analysis of AKT/mTOR pathway activation in patient-derived GSCs.

(CD144) and PECAM-1 (CD31), respectively (Figure S1F). Moreover, these iGSCs also displayed high levels of classical EC markers and possessed functional EC features such as fluorescent acetylated-low density lipoprotein (DiI-AcLDL) uptake under EC culture conditions (Figures S1G and S1H). Consistently, we also observed GdECs in tumors derived from p53DN-AKT-hNSCs (Figure S1I). Together, these findings of an EC signature and phenotypic features establish that GSC can differentiate into EC in our model system.

AKT Activation Plays a Key Role in Endothelial Lineage Differentiation of GSC
The association of AKT activation in hNSC transformation and EC signature enrichment prompted us to directly assess the potential role of AKT in driving EC differentiation. To that end, immunofluorescence (IF) analysis showed that NSCs expressing p53DN plus myr-AKT, but not p53DN alone, expressed CD144 and CD31 (Figure S2A). Correspondingly, FACS analysis showed that p53DN-AKT-hNSCs expressed CD133 and CD144, which together are known to mark GSC-derived endothelial progenitor cells (Wang et al., 2010); in contrast, p53DN-hNSCs expressed CD133 but not CD144 (Figure 2A). Finally, pharmacological inhibition of AKT signaling with the mTOR inhibitor rapamycin decreased the percentage of CD133+/CD144+ cells in vitro (Figure 2B).
To reinforce the link between CD133+/CD144+ cells and EC biology, p53DN-AKT-hNSCs were sorted into CD133+/CD144– and CD133+/CD144+ subpopulations. Compared to CD133+/CD144– cells, CD133+/CD144+ cells showed significantly higher expression levels of CD31, CD34, TIE2, VEGFR2, and von Willebrand factor (vWF) by qRT-PCR (Figure 2C). On the functional level, culturing CD133+/CD144+ cells in EC media for 5 days resulted in DiI-AcLDL uptake in cells expressing CD105, VEGFR2, and vWF (Figures 2D and S2B), which was also inhibited by rapamycin (Figure S2C). Moreover, when grown in matrigel cultures, CD133+/CD144+ cells, but not CD133+/CD144– cells, were able to form tubular networks and displayed DiI-AcLDL uptake (Figures 2E and S2D), which was abolished by rapamycin (Figure 2E). Importantly, transcriptomic analysis revealed that the level of the EC signature from MSigDB exhibited a stepwise increase in these sorted cell fractions, from CD133–/CD144– to CD133+/CD144– to CD133+/CD144+ to CD133–/CD144+, progressing toward the signature of bona fide endothelial cells (Figure S2E; Table S2). This stepwise differentiation process was further validated in these sorted subpopulations by IF staining of VEGFR2 and endothelial nitric oxide synthase (eNOS), which play important roles in vasculature biology (Förstermann and Münzel, 2006) (Figure S2F).
We further tested whether the level of activated AKT downstream signaling in patient-derived GSCs correlated with EC differentiation. Analysis of six GSC lines showed that two lines with relatively higher pS6 expression (TS603, BT147) exhibited a higher percentage of CD133+/CD144+ cells (Figures 2F and 2G) and showed considerably greater tube-forming ability (Figure S2G). In contrast, three lines with lower levels of activated AKT downstream signaling (TS543, TS576, and TS586) had lower percentages of CD133+/CD144+ cells (Figures 2F and 2G). Enforced myr-AKT expression in these three GSC lines significantly increased the fraction of CD133+/CD144+ cells (Figure 2H). Reciprocally, rapamycin inhibition of the AKT pathway decreased the fraction of CD133+/CD144+ cells in TS603 and BT147 cells (Figure 2I). Together, these results indicate that robust AKT activation plays a key role in driving GSC differentiation with EC-like properties.

AKT Activation Upregulates WNT5A to Drive GdEC Differentiation of GSC
Given the key role of AKT activation in the transformation of hNSCs and endothelial lineage differentiation of GSC, coupled with the association of high AKT activation with poor prognosis (Suzuki et al., 2010), we identified a list of genes associated with high AKT activation from our oncogene-induced hNSC system (Table S3). To identify genes mediating AKT-induced endothelial lineage differentiation, we intersected these high AKT-associated genes with the 85 genes displaying a histone modification switch from H3K27me3 to H3K27ac, a switch known to play a pivotal role in lineage commitment and cell fate determination (Adam et al., 2015). Thus, we identified eight upregulated genes (CXCL14, DLX5, DMRT3, GPR37, MYLIP, NUDT14, TCF7, and WNT5A) that might be involved in promoting endothelial lineage differentiation.
To explore this supposition, each gene was transduced into p53DN-hNSCs and monitored for generation of CD133+/CD144+ cells. Compared to myr-AKT, only WNT5A and DLX5 overexpression generated a considerable percentage of CD133+/CD144+ cells in p53DN-hNSCs (Figures 3A and 3B). Conversely, small hairpin RNA (shRNA)-mediated knockdown of the eight genes showed that only WNT5A knockdown substantially impaired tubular network formation of CD133+/CD144+ cells sorted from p53DN-AKT-hNSCs (Figures 3C and 3D). Notably, myr-AKT also dramatically increased WNT5A expression (Figures S3A and S3B). Furthermore, the WNT5A antagonist, BOX5, significantly inhibited the production of CD133+/CD144+ cells in p53DN-hNSCs transduced with myr-AKT or WNT5A (Figures 3E, S3C, and S3D). Finally, BOX5 treatment blocked tubular network formation of CD133+/CD144+ cells sorted from p53DN-AKT-hNSCs (Figures 3F, 3G, and S3E). Together, these results indicate that AKT-mediated upregulation of WNT5A plays a pivotal role in the GdEC differentiation of GSC.

Regulation of WNT5A Expression by the Opposing Actions of DLX5 and PAX6
Chromatin landscape and transcriptome comparisons between hNSCs and iGSCs established that, in hNSCs with no WNT5A expression, the WNT5A promoter exhibited a poised (bivalent)
(G) FACS analysis of CD133+/CD144+ cells in the indicated GSCs.

(H) FACS analysis of CD133+/CD144+ cells in the indicated GSCs with myr-AKT overexpression.
(I) FACS analysis of CD133+/CD144+ cells in the indicated GSCs treated with RAPA (50 nM) for 72 hr.
Error bars represent SD of the mean of two (C and G) or three (B, H, and I) independent experiments. **p < 0.01. See also Figure S2 and Table S2.
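The fold-change summaries in panels (B), (H), and (I) can be illustrated with a short Python sketch, assuming that each treated measurement is divided by the mean of the control measurements and that the reported SD is the sample standard deviation across independent experiments. Both choices are assumptions, since the normalization is not spelled out in the legend, and the function name `fold_change_summary` and all numbers are invented for illustration.

```python
import statistics
from typing import List, Tuple


def fold_change_summary(control: List[float], treated: List[float]) -> Tuple[float, float]:
    """Return (mean, sample SD) of treated values expressed relative to the control mean.

    Assumes at least two treated replicates, because a sample SD needs n >= 2.
    """
    baseline = statistics.mean(control)
    folds = [value / baseline for value in treated]
    return statistics.mean(folds), statistics.stdev(folds)


# Hypothetical percentages of CD133+/CD144+ cells from three experiments.
control_pct = [10.0, 10.0, 10.0]  # e.g., vehicle-treated (invented values)
treated_pct = [5.0, 4.0, 6.0]     # e.g., inhibitor-treated (invented values)

mean_fc, sd_fc = fold_change_summary(control_pct, treated_pct)
print(f"fold change = {mean_fc:.2f} +/- {sd_fc:.2f}")
```

With only two or three replicates per group, as in these panels, the SD is a coarse estimate of variability.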

[Figure 3 graphic, panels A–G. Recoverable labels: FACS plots of CD133-PE versus CD144-FITC (IgG-FITC/IgG-PE control) for Vector, Myr-AKT, CXCL14, DLX5, DMRT3, GPR37, MYLIP, NUDT14, TCF7, and WNT5A; CD133+/CD144+ (%) per construct; tubular networks for Scramble, shCXCL14, shDLX5, shDMRT3, shGPR37, shMYLIP, shNUDT14, shTCF7, and shWNT5A; number of branch points per shRNA; CD133+/CD144+ (%) with DMSO or BOX5 in cells overexpressing (OE) myr-AKT or WNT5A; number of branch points with DMSO or BOX5.]
Figure 3. AKT-Driven WNT5A Upregulation in GdECs Differentiation of hNSCs

(A) FACS analysis of the percentage of CD133+/CD144+ cells in p53DN-hNSCs 7 days after infection with lentivirus carrying each of the indicated genes individually.
(B) Quantitation of the percentage of CD133+/CD144+ cells in (A) from four independent experiments.
(C) Matrigel tubular network formation of the sorted CD133+/CD144+ cells from p53DN-AKT-hNSCs with infection by lentivirus carrying pooled short hairpins
(minimum three shRNAs) targeting each indicated gene.
(D) Quantitation of the number of tubular network branch points in (C) (n = 5).
(E) FACS analysis of CD133+/CD144+ cells in p53DN-hNSCs overexpressing myr-AKT or WNT5A with BOX5 treatment (50 μM) for 72 hr (n = 3).
(F) Representative images of the tubular network of sorted CD133+/CD144+ cells from p53DN-AKT-hNSCs with BOX5 treatment. Scale bar, 100 μm.
(G) Number of branch points calculated in (F) (n = 5). Error bars represent SD of the mean; **p < 0.01.
See also Figure S3 and Tables S1 and S3.
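The candidate-selection step described in the text, in which genes associated with high AKT activation are intersected with genes showing the H3K27me3-to-H3K27ac switch, amounts to a set intersection. A minimal Python sketch follows. The eight gene symbols are those named in the text; the `PLACEHOLDER_*` entries and the function name `prioritize_candidates` are hypothetical and stand in for the remaining contents of Tables S1 and S3, which are not reproduced here.

```python
from typing import Iterable, List


def prioritize_candidates(akt_associated: Iterable[str], switch_genes: Iterable[str]) -> List[str]:
    """Return the genes present in both lists, sorted alphabetically."""
    return sorted(set(akt_associated) & set(switch_genes))


# The eight overlapping genes reported in the text.
reported_overlap = [
    "CXCL14", "DLX5", "DMRT3", "GPR37",
    "MYLIP", "NUDT14", "TCF7", "WNT5A",
]

# Placeholder entries represent genes found in only one list (hypothetical).
akt_associated_genes = reported_overlap + ["PLACEHOLDER_AKT_ONLY_1", "PLACEHOLDER_AKT_ONLY_2"]
histone_switch_genes = reported_overlap + ["PLACEHOLDER_SWITCH_ONLY_1"]

candidates = prioritize_candidates(akt_associated_genes, histone_switch_genes)
print(candidates)
```

Each candidate from such an overlap still needs functional testing, which is what the overexpression and shRNA experiments in panels (A) through (D) provide.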
chromatin status defined by both H3K4me3 and H3K27me3 marks (Bernstein et al., 2006; Figures 4A, 4B, S3A, and S3B). In contrast, the WNT5A promoter of WNT5A-expressing iGSCs exhibited an active H3K27ac mark with concomitant loss of the repressive H3K27me3 mark (Figures 4A and 4B). These patterns are consistent with the poised WNT5A promoter being epigenetically activated during transformation.
To further explore the mechanisms governing the transcriptional regulation of the WNT5A locus under AKT activation, we analyzed TCGA proteomic (RPPA) datasets, which further confirmed the correlation between WNT5A mRNA levels and the mTOR/S6K pathway (Figure S4A). We next identified a significant negative correlation between WNT5A expression and known master transcription factors of NSC self-renewal and lineage

[Figure 4 graphic, panels A–H. Recoverable labels: ChIP-seq tracks (hg19) for H3K27me3, H3K27ac, H3K4me3, and H3K4me1,2 in hNSC and iGSC at the WNT5A locus (chr3; regions R1, R2, and P), the PAX6 locus (chr11), and the DLX6/DLX5 locus (chr7); sequence logos (bits) for the PAX6 and DLX5 binding motifs; relative fold enrichment for PAX6 and DLX5 versus Input; relative WNT5A mRNA level in iGSC, TS603, and BT147 (Vector versus PAX6 OE) and in iGSC, TS543, and TS576 (Vector versus DLX5 OE).]
Figure 4. Transcriptional Activation of WNT5A by PAX6 and DLX5

(A) ChIP-seq analysis of chromatin status for WNT5A locus around TSS in hNSC and iGSC.
(B) PAX6 and DLX5 binding motifs in WNT5A regulatory regions.
(C) Chromatin modification changes from hNSC to iGSC at the PAX6 locus. The H3K27me3 peak in iGSC is highlighted in sky blue.
(D) Binding of PAX6 in WNT5A regulatory regions in hNSC by ChIP-PCR. Beta-actin locus (ACTB_exon) was used as the negative control (n = 3).
(E) Chromatin modification changes from hNSC to iGSC in DLX5-DLX6 locus.
(F) Binding of DLX5 in WNT5A regulatory regions by ChIP-PCR. PAX2 was used as the control for non-specific binding (n = 3).
(G) WNT5A expression by qRT-PCR analysis in GSCs and iGSC-overexpressing PAX6 (n = 3).
(H) WNT5A expression by qRT-PCR analysis in GSCs and iGSC-overexpressing DLX5 (n = 3). Error bars represent SD of the mean; *p < 0.05 and **p < 0.01.
See also Figure S4.
determination including Gli2, FoxG1, SOX2, PAX4/6, and HES1 in this specific context (Figure S4A). These findings indicate that downregulating the neurogenesis TFs may be necessary for EC lineage differentiation of GSC. Moreover, only the PAX subclass (PAX4 and PAX6) promoter exhibited a gain in repressive H3K27me3 mark following transition from hNSCs to iGSCs (Figures 4C and S4B–S4G). Correspondingly, the WNT5A locus possesses PAX6 binding motifs located in regulatory region 1 (R1), regulatory region 2 (R2), and promoter region (P) (Figures 4A and 4B), which were further validated by ChIP-PCR in hNSCs

(Figure 4D). Consistent with implied negative regulation of PAX6 on WNT5A expression, CD133+/CD144+ cells sorted from iGSCs showed negligible PAX6 and high WNT5A expression compared to CD133+/CD144– cells (Figure S4H).
As noted previously, enforced DLX5 expression in p53DN-hNSCs produced CD133+/CD144+ cells (Figures 3A and 3B). Notably, the WNT5A promoter possesses a DLX5 binding motif in close proximity to the PAX6 binding site (Figure 4B). Intriguingly, the locus harboring DLX5 exhibited a poised pattern in hNSCs and switched to an epigenetically activated pattern in iGSCs (Figure 4E). ChIP-PCR validated DLX5 binding to the WNT5A promoter region in these iGSCs (Figure 4F). Finally, we solidified the roles of PAX6 and DLX5 in the opposing regulation of WNT5A by demonstrating that enforced PAX6 expression reduced WNT5A mRNA and protein levels (Figures 4G and S4I), whereas enforced DLX5 expression increased WNT5A mRNA and protein levels in iGSCs and patient-derived GSCs (Figures 4H and S4J). In line with our experimental observations, analysis of TCGA GBM gene expression and proteomic profiles showed that WNT5A and DLX5 were positively associated, and PAX6 negatively associated, with activation of the mTOR/S6K pathway (Figure S4A). These results support the view that PAX6 and DLX5 are repressed and activated, respectively, in response to AKT signaling, leading to an epigenetic switch of the WNT5A locus, its transcriptional activation in GSC, and promotion of GdEC differentiation of GSC. These results also align with our observation that DLX5 silencing alone did not impair tubular network formation of GdECs (Figures 3C and 3D), indicating that the opposing actions of both DLX5 and PAX6 are necessary to regulate WNT5A-mediated GdEC differentiation of GSC. We propose that AKT activation upregulates WNT5A, which promotes EC proliferation and differentiation in neovascularization (Cheng et al., 2008; Masckauchán et al., 2006; Yang et al., 2009), thus enabling GSC aberrant developmental plasticity and differentiation into GdEC (Figures S4K and S4L).

WNT5A-Mediated Endothelial Differentiation of GSCs in Tumor Invasive Growth
To address whether WNT5A-mediated endothelial differentiation of GSCs plays a functional role in gliomagenesis in vivo, we next employed a patient-derived GSC orthotopic tumor model that would be more directly relevant to the human pathological condition. TS543 GSC-derived tumors had higher levels of PAX6 and lower levels of pS6, WNT5A, and DLX5 compared with GBMs derived from p53DN-AKT-hNSCs (Figure S5A). In the TS543 model, enforced WNT5A expression (WNT5A-TS543) generated tumors with more rapid growth and shorter latency relative to Vector-TS543 controls (Figures S5B–S5D). WNT5A-TS543 gliomas were highly hemorrhagic (Figure 5A), showed increased microvascular density (MVD), and exhibited increased expression of endothelial markers (Figures 5A, S5E, and S5F). WNT5A-TS543 gliomas were strikingly more invasive, generating many distant satellite lesions in the peritumoral brain parenchyma that were evident on histologic examination (Figure S5G) and confirmed by human-specific antigen (TRA-1-85/CD147) IF staining (Figures 5B and S5H). Finally, WNT5A promoted endothelial differentiation in vivo as evidenced by increased CD34+/TRA-1-85+ GdECs in the intratumoral and peritumoral regions of gliomas derived from WNT5A-TS543 (Figures 5C and 5D). Of note, there was a higher number of GdECs in the peritumoral regions compared to intratumoral regions (Figures 5C–5E). Thus, WNT5A drives GdEC differentiation, which is associated with an increase in tumor neovascularization and an increase in peritumoral satellite lesions, which may provide a microenvironment to promote the growth of invading glioma cells throughout the brain parenchyma.
To ascertain the tumor biological significance of these WNT5A-mediated phenotypes, the herpes simplex virus thymidine kinase gene (HSVTK)/ganciclovir (GCV) cell ablation system was used to selectively eliminate GdECs in vivo. To that end, we constructed a vector encoding an HSVTK-GFP fusion protein under control of the CD144 promoter (hereafter, pCD144-GFP) (Figure S5I). Following pCD144-GFP transduction into TS543 and WNT5A-TS543 GSCs, FACS detected 0.61% and 6.98% GFP+ cells, respectively (Figures S5J and S5K). Next, 1 week following orthotopic implantation of pCD144-GFP-transduced WNT5A-TS543 GSCs, mice were treated with GCV, resulting in increased apoptosis in pCD144-GFP+ GdECs relative to controls (Figure S5L). Tumors from GCV-treated animals showed overall reduction in intratumoral MVD detected by CD34 staining and a modest increase in mouse survival (Figures 5F, 5G, and S5M). Notably, while tumors depleted of GdECs by GCV showed similar intratumoral size, the depletion dramatically decreased satellite lesions and invasiveness in peritumoral areas (Figures 5H and S5N), supporting a key role for GdECs in tumor invasive growth.

WNT5A-Mediated GdECs Recruitment of Non-transformed ECs Promotes GSCs Self-Renewal and Invasive Growth
We next investigated the role of GdECs in peritumoral satellite lesion formation with a specific emphasis on whether these peritumoral satellite lesions might support the growth of invading glioma cells in the periphery. We observed that the higher frequency of GdECs in tumors derived from pCD144-GFP-transduced WNT5A-TS543 correlated with higher MVD. Moreover, host mouse ECs (CD34+/TRA-1-85–) were in close proximity to GdECs in peritumoral areas (Figures 6A, 6B, and S6A–S6C), raising the possibility that GdECs may recruit host ECs to form peritumoral satellite lesions. To assess this possibility, we performed transwell assays using GdECs sorted from pCD144-GFP-transduced WNT5A-TS543 and GSC TS603 (endogenous WNT5A), respectively, and demonstrated increased recruitment of human brain microvascular endothelial cells (HBMECs) compared with non-GdECs sorted from these GSCs (Figures 6C and 6D). Furthermore, WNT5A mRNA levels were higher in GdEC subpopulations than in non-GdEC subpopulations sorted from pCD144-GFP-transduced WNT5A-TS543 and GSC TS603 cultures (Figure 6E). We next determined whether WNT5A directly mediates EC recruitment. In the transwell assay, recombinant WNT5A (rWNT5A), but not rWNT3A, acted as a chemoattractant and significantly recruited HBMECs, an effect that was drastically impaired in the presence of the WNT5A antagonist BOX5 (Figure 6F). Importantly, WNT5A increased HBMEC proliferation and survival in serum-free medium (Figure S6D). Together, these results indicate that GdEC-derived WNT5A can stimulate EC recruitment and proliferation. Furthermore, in

[Figure 5 graphic, panels A–H. Recoverable labels: bright field, H&E, CD34, and WNT5A images for Vector and WNT5A OE; DAPI/TRA-1-85 and DAPI/TRA-1-85/CD34 staining in intratumoral and peritumoral areas; TRA-1-85+/CD34+ (%) for Vector versus WNT5A OE (p = 7.94e-05, p = 7.39e-05, p = 0.004); CD34 staining of Tumor-1, Tumor-2, and Tumor-3 with GCV(-) or GCV(+); CD34 microvascular density (MVD) for GCV(-) versus GCV(+) (p < 0.0001); DAPI/TRA-1-85 images for GCV(-) and GCV(+).]
Figure 5. WNT5A-Mediated Endothelial Lineage Differentiation in Tumor Neovascularization and Satellite Lesion Formation
(A) Representative images of the hemorrhagic lesion in mouse brains injected with TS543 overexpressing WNT5A (WNT5A OE) versus control (Vector).
H&E and IHC analyses of tumor sections show the microvascular hyperplasia (black arrows) and expression of CD34 and WNT5A. Scale bar, 50 μm.
(B) Representative images of the satellite lesions in peritumoral areas. Scale bar, 200 μm.
(C) Representative images of GdECs (yellow arrows) identified by co-staining with TRA-1-85 and CD34 in intratumoral and peritumoral areas. Scale bar, 25 μm.
(D) Quantitation of TRA-1-85+/CD34+ cells using Vectra software system (n = 3 tumors).
(E) High magnification of rectangle area in (C). Scale bar, 10 μm.
(F) IHC staining of CD34 in intracranial tumors derived from pCD144-GFP-infected WNT5A-TS543 following GCV treatment. Representative images of low (scale
bar, 100 μm) and high (scale bar, 50 μm) magnification.
(G) Dotplots for quantitation of MVD in tumors with/without GCV treatment (n = 4 tumors, five fields per tumor).
(H) Representative images of tumor appearance (left, scale bar, 2,000 μm) and peritumoral satellite lesions (right, scale bar, 200 μm).
See also Figure S5.

[Figure 6 graphic, panels A–H. Recoverable labels: DAPI/TRA-1-85, DAPI/pCD144-GFP, DAPI/CD34, and merged images; distance from host EC (μm) for TRA-1-85+/pCD144-GFP- versus TRA-1-85+/pCD144-GFP+ cells (p < 0.0001); transwell schematic with HBMEC; fluorescence intensity of invaded HBMECs for pCD144-GFP- versus pCD144-GFP+ cells from TS543-WNT5A and TS603; relative mRNA level of CD144 and WNT5A; fluorescence intensity of invaded HBMECs with recombinant proteins and BOX5; large and small satellite lesions; neurosphere formation of TS543 and TS603 with pCD144-GFP-, pCD144-GFP+, HBMECs, or pCD144-GFP+/HBMECs.]
Figure 6. Recruitment of Host ECs by WNT5A-Mediated GdECs Contributes to GSCs Self-Renewal and Proliferation
(A) Representative IF images showing that GdECs (green arrows), compared with tumor cells (red arrows), are in close proximity to mouse ECs (white arrow) in
tumor sections. Scale bar, 10 μm.
(B) Dotplots show the distance from mouse ECs to the nearest tumor cells and GdECs, respectively (n ≥ 15).
(C) Illustration of the transwell system to measure EC recruitment.
(D) Fluorescence intensity shows HBMEC recruitment after co-culture with GdECs for 24 hr (n ≥ 3).
(E) qRT-PCR for CD144 and WNT5A mRNA levels in sorted pCD144-GFP– and pCD144-GFP+ from TS543-WNT5A and TS603 (n = 3).
(F) Fluorescence intensity shows HBMEC recruitment after co-culture with NSC media containing rWNT5A (0.5 μg/ml) or rWNT3A (0.05 μg/ml) (n = 3).
(G) Representative images of GdECs (green arrows) and mouse ECs (white arrows) in variously sized satellite lesions. Scale bar, 20 μm.
(H) Neurosphere formation of TS543 or TS603 co-cultured with GdECs and HBMECs (n = 3). Cartoon depicting the experimental approach. Error bars represent
SD of the mean; *p < 0.05 and **p < 0.01.
See also Figure S6.
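The proximity measurement in panel (B), the distance from each host EC to the nearest cell of a given class, can be sketched as a nearest-neighbor search over cell centroids. The Python example below is a minimal illustration that assumes 2D centroid coordinates in micrometers have already been extracted from the images. The function name `nearest_distance` and all coordinates are hypothetical; the imaging software used in the study may compute distances differently.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def nearest_distance(origin: Point, candidates: Sequence[Point]) -> float:
    """Return the Euclidean distance from `origin` to the closest candidate point."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return min(math.dist(origin, c) for c in candidates)


# Invented centroids (micrometers) for one host EC and two classes of human cells.
host_ec = (0.0, 0.0)
gdec_centroids = [(3.0, 4.0), (30.0, 40.0)]         # e.g., TRA-1-85+/pCD144-GFP+ cells
tumor_cell_centroids = [(6.0, 8.0), (60.0, 80.0)]   # e.g., TRA-1-85+/pCD144-GFP- cells

print(nearest_distance(host_ec, gdec_centroids))
print(nearest_distance(host_ec, tumor_cell_centroids))
```

Repeating this for every host EC in a field yields the two distance distributions that a dot plot such as panel (B) compares.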
GBM sections, pCD144-GFP+ GdECs were consistently in close proximity to host ECs (CD34+/TRA-1-85–) in the peritumoral satellite lesions; and the larger satellite lesions possessed greater numbers of GdECs and mouse host ECs (Figure 6G). Additionally, GCV-mediated depletion of GdECs resulted in diminished satellite lesion formation (Figure 5H), although individual SOX2-positive GSCs were still present throughout the peritumoral area (data not shown). These observations suggest that GdECs are required for the maintenance and expansion of the peritumoral satellite lesions, prompting us to speculate that GdECs recruit host ECs, which may act synergistically to provide a microenvironment that supports the growth and survival of GSCs in these peritumoral areas. To test this hypothesis, we assessed tumor sphere formation as a readout of proliferation and self-renewal of GSCs in the presence of GdEC + HBMEC co-cultures. Strikingly, only GdEC/HBMEC co-cultures, but not GdEC or HBMEC cultures, increased sphere formation of GSC TS543 and TS603 (Figures 6H and S6E). These co-cultures also increased soft agar colony formation of TS543 and TS603 (Figures S6F and S6G). These observations gain added significance in light of emerging evidence for the crucial role of ECs in NSC/GSC niche formation that supports NSC/GSC growth and survival (Calabrese et al., 2007; Shen et al., 2004; Zhu et al., 2011). Together, these observations support our model that GSC differentiation into GdEC stimulates host EC recruitment via WNT5A to create a vascular-like niche supporting GSC growth and survival, thereby promoting tumor cell growth beyond the primary tumor microenvironment.

WNT5A-Mediated GdEC in Human GBM Recurrence cally stained and showed significantly higher levels of WNT5A
To investigate the clinical relevance of our findings, we asked whether the WNT5A-mediated process of GdEC biology is operative in GBM patient specimens. First, we documented that WNT5A mRNA levels were significantly higher in GBM tumors than in non-tumor brain tissues (Figure S7A). Second, we documented the presence of GdECs (SOX2+/CD31+, SOX2+/CD105+, or CD133+/CD31+) and established a significant correlation between high WNT5A expression and increasing frequency of GdECs in human GBMs (Figures 7A, S7B, and S7C). Moreover, GdECs were located close to host ECs (SOX2–/CD31+, SOX2–/CD105+, or CD133–/CD31+) (Figure 7A), which was verified by objective proximity measurements of GdECs and host ECs in tumor sections that were double-stained (immunohistochemistry [IHC]) and assessed by an automated quantitative pathology imaging system (Figures 7B and 7C). To further verify these findings in large-scale human GBM datasets, we generated a GdEC signature by integrated analyses of transcriptomic profiling from our de novo GBM model and EC signature from MSigDB, which included genes upregulated in both neoplastic and the EC signaling process (Tables S2 and S4). Based on 364 primary IDHwt GBMs from TCGA datasets, we found that both GdEC and EC signatures were positively associated with WNT5A mRNA expression (Figures 7D and S7D). Together, these human GBM data strongly align with our experimental findings of WNT5A-directed GdEC differentiation and associated host EC recruitment in GBM.

As shown in previous studies (Ricci-Vitiani et al., 2010; Wang et al., 2010), we observed that GdECs (SOX2+/CD31+) were incorporated into blood vessels in human GBM tumor sections (Figure S7E). Importantly, we observed (1) peritumoral satellite lesions in GBM patient samples, (2) GdECs (CD31+/SOX2+) in close proximity to host ECs (CD31+/SOX2–) in these structures, and (3) greater numbers of GdECs and host ECs in larger satellite lesions (Figures 7E and 7F). To investigate whether WNT5A expression is associated with peritumoral satellite lesions and patient outcome, 14 primary GBMs with progression-free survival (PFS) information were stained for WNT5A, revealing that higher levels of WNT5A were associated with an increased number of peritumoral satellite lesions and with a tendency to develop recurrent tumors with a shorter PFS (Figures 7G and S7F). Strikingly, using transcriptomic profiling from a previous study (Sottoriva et al., 2013), we found that WNT5A and GdEC signature are significantly higher in the peritumoral regions compared with matched intratumoral regions for GBM patients (Figures 7H and S7G). Using another RNA-seq dataset from a previous study (Gill et al., 2014), we observed a dramatic increase of WNT5A expression and GdEC signature in nonenhancing (NE) regions versus contrast-enhancing (CE) regions from 27 different GBM patients (Figures S7H and S7I). These findings reinforce the key role of WNT5A and GdEC in the peritumoral disease and support the mechanism of their cooperative role in disease recurrence. Furthermore, multivariable Cox analysis demonstrates that WNT5A is an independent prognostic factor for PFS in GBM patients (Figure 7I; Tables S5 and S6).

To validate WNT5A/GdEC in tumor recurrence, 14 paired primary/recurrent GBM tumor sections were immunohistochemically [...] and CD31 (marker for GdEC and EC) in recurrent tumors compared to their paired primary tumors (Figures 7J, S7J, and S7K). Furthermore, the significantly higher frequency (p = 5.5e-7) of GdECs at recurrence was systematically and accurately identified by an automated quantitative pathology imaging system (Figure 7K). Most importantly, comprehensive transcriptome analysis on 81 paired primary/recurrent IDHwt GBMs validated increased WNT5A expression and GdEC signature in recurrent GBMs compared to paired primary GBMs (Figures 7L and S7L). Pairwise comparisons also displayed the strong association of both GdEC and EC signature with WNT5A in recurrent GBMs (Figures 7M and S7M). Collectively, these data strongly support our experimental findings that WNT5A-mediated GdEC differentiation contributes to peritumoral satellite lesion formation and tumor recurrence in human GBM (Figure S7N).

DISCUSSION

In this study, we generated a de novo human GBM model enabling precise comparison of chromatin and transcriptomic changes in the malignant transformation of human NSCs into GSCs. Our efforts to understand the mechanisms governing GSC hallmark features and their contributions to GBM’s clinical properties resulted in identification of the opposing actions of DLX5 activation and PAX6 repression of WNT5A transcription, which, in turn, drives a differentiation program producing GdECs. Together with recruited host ECs, these GdECs support glioma cell growth and invasion in the surrounding brain parenchyma, tumor biological properties that are intimately associated with glioma recurrence in patients.

Transcriptional regulatory networks known to regulate stem cell plasticity and lineage determination under physiological conditions are shown here to be hijacked to mediate GdEC differentiation of GSCs in gliomagenesis. Specifically, our identification and functional validation of WNT5A in this process is consistent with previous work showing that WNT5A can promote embryonic stem cell differentiation into the EC lineage during normal vascular development, and can regulate EC proliferation, migration, and survival in angiogenesis (Cheng et al., 2008; Masckauchán et al., 2006; Yang et al., 2009). Interestingly, PAX6 can function as a tumor suppressor and inhibit angiogenesis and invasion in glioma (Mayes et al., 2006; Zhou et al., 2005). Furthermore, DLX5 has been shown to regulate WNT5A expression in CNS development and DLX5 expression has been observed in CD133+ GBM cells (Liu et al., 2009; Paina et al., 2011). Finally, AKT enhances protein stability and transcriptional activity of DLX5 (Jeong et al., 2011); and AKT activation also upregulates CCCTC binding factor, which can epigenetically repress PAX6 transcription via promoter methylation (Gao et al., 2007, 2011). Collectively, these reports, along with our findings, strongly substantiate a role of the AKT-DLX5/PAX6-WNT5A axis in regulation of aberrant developmental oncobiology, which plays a key role in GBM’s lethal pathophysiology (Figures S4K and S4L).

The frequency and function of GSC differentiation into GdEC have been a source of controversy in the GBM field (Cheng et al., 2013; Ricci-Vitiani et al., 2010; Rodriguez et al., 2012;
Cell 167, 1281–1295, November 17, 2016 1291

[Figure 7, panels A–M. Labels and statistics recoverable from the figure:]
(A) Image columns: DAPI/SOX2, DAPI/CD31, SOX2/CD31, Merge; DAPI/SOX2, DAPI/CD105, SOX2/CD105, Merge; DAPI/CD133, DAPI/CD31, CD133/CD31, Merge.
(B) Raw Image; Score Map.
(C) y axis: Distance from Host EC (SOX2-/CD31+) (μm); groups: GSC (SOX2+/CD31-) and GdEC (SOX2+/CD31+); Wilcoxon rank test p < 2.2e-16.
(D) x axis: Normalized WNT5A Expression; y axis: GdEC Signature Score; Spearman Correlation Test rho = 0.125, p = 0.0174.
(E, F) Case #15; SOX2/CD31/Hematoxylin.
(G) WNT5A staining index (%) | Highest satellite lesion count | PFS (median survival, months)
    low (n = 7) | 2/7 | 19.5
    high (n = 7) | 5/7 | 10.9 *
(H) WNT5A Expression; x axis: Paired Intratumor; y axis: Paired Peritumor; pair labels P55, P41, P4, P57, P49, P52, P54, P42, P56; boxplot groups: Intratumor, Peritumor; p = 0.0059.
(I) x axis: PFS time (months); y axis: Survival Fraction; Low WNT5A mRNA level (n = 46), High WNT5A mRNA level (n = 46); Logrank test p = 0.0168; HR = 1.52 (WNT5A High vs. Low); Median Survival: 8.4 mo, 4.9 mo.
(J) WNT5A/CD31/Hematoxylin; Case #9; Primary, Recurrent.
(K) y axis: GdEC Staining Index (%); groups: Primary, Recurrent; Wilcoxon rank test p = 5.5e-07.
(L) x axis: Normalized mRNA Level of WNT5A in Recurrent Tumors; y axis: GdEC Sig. Score in Recurrent Tumors; Spearman Correlation Test rho = 0.3449, p = 0.0024.
(M) x axis: Difference of Normalized WNT5A Expression between pri/rec Pairs; y axis: Difference of GdEC Sig. between pri/rec Pairs; quadrant labels: Lower in Rec, Higher in Rec.
    WNT5A vs. GdEC Sig. | GdEC higher in Rec | GdEC lower in Rec
    WNT5A higher in Rec | 30 | 14
    WNT5A lower in Rec | 13 | 24
    Fisher exact test p = 0.003, odds ratio = 3.884

Soda et al., 2011; Wang et al., 2010). Our work provides reinforcing evidence of this phenomenon and expands our understanding of the molecular underpinnings and tumor biological relevance of GdECs. Specifically, as a result of AKT activation, GdECs produce WNT5A, resulting in recruitment and proximal association of host ECs, which, in turn, promotes distant satellite lesion formation and glioma cell invasive growth. The sources of host ECs in our model remain to be determined and may include circulating endothelial cells and bone-marrow-derived endothelial progenitor cells (Boer et al., 2014; Folkins et al., 2009), in addition to a dense network of microvasculature within the brain. Importantly, we found that GdECs not only recruit host ECs, but also increase their proliferation in a WNT5A-dependent manner, a finding that provides a rational explanation for the previous observation of robust neovascularization yet low frequency of GdEC integration into tumor vessels (Rodriguez et al., 2012).

GSC enrichment has been observed in perivascular and hypoxic niches, which have been shown to maintain GSC multipotency and tumor initiation potential as well as tumor progression, therapeutic resistance, and recurrence (Calabrese et al., 2007; Lathia et al., 2011). However, little is known about how GSCs are maintained outside of these native niches in the peritumoral regions, which can drive disease recurrence following surgery and radiotherapy. Our in vitro and in vivo findings support a model whereby GdECs play an instructive role in establishing a vascular-like niche for GSC maintenance and growth via WNT5A-mediated recruitment of existing ECs. In particular, we noted that GdECs recruit ECs within very small cell clusters (Figures 6G and 7F), and thus the initiation of neovascularization in GBM may occur prior to the hyperplasia to neoplasia transition (Folkman et al., 1989; Hanahan and Folkman, 1996) and likely independent of hypoxia in these peritumoral areas. Importantly, we observe that GdEC + EC co-cultures are able to enhance GSC self-renewal, which supports their cooperative role in supporting distal tumor invasive growth, and hence tumor recurrence.

Clinically, disease recurrence is the sine qua non of GBM, with tumor re-emergence typically occurring within a few centimeters of the primary tumor bed following optimal multi-modality treatment (Giese et al., 2003). Based on comprehensive analysis using human GBM specimens and datasets, our study establishes a strong correlation among elevated levels of WNT5A and GdEC signature, peritumoral satellite lesions, and tumor recurrence, prompting us to speculate that WNT5A-mediated EC differentiation of GSCs and satellite lesion formation provide a nurturing tumor microenvironment in the brain parenchyma. In this light, it is worth noting that, while bevacizumab has been approved as a single agent for recurrent GBM, patients experience only transient benefit and develop highly infiltrative tumors (de Groot et al., 2010; Ferrara et al., 2004). Thus, it would be interesting to explore whether bevacizumab increases WNT5A-mediated endothelial lineage differentiation resulting in these refractory phenotypes. Notably, a previous study showed that GSC differentiation into ECs was not blocked by anti-VEGF agents and that GdECs were increased following VEGF receptor inhibitor treatment in mouse GBM (Soda et al., 2011). On the basis of these clinical and experimental observations, together with mechanistic findings of this study, we propose the therapeutic strategy of targeting WNT5A-mediated GSC differentiation into ECs and GdEC recruitment of existing ECs (Figure S7N). This strategy should improve the outcome of GBM patients undergoing anti-VEGF therapy by limiting tumor neovascularization, invasiveness, and disease recurrence.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

● KEY RESOURCES TABLE
● CONTACT FOR REAGENT AND RESOURCE SHARING
Figure 7. Correlation of WNT5A-Mediated GdEC with Peritumoral Satellite Lesion and Tumor Recurrence in GBM Patients
(A) Representative images of GdECs (yellow arrows) defined using indicated EC and GSC markers. White arrows denote host ECs. Scale bar, 20 μm.
(B) Representative images with IHC double-staining and cell segmentation obtained from Caliper InForm analysis software show the close proximity of GdEC
(SOX2+/CD31+, yellow) and host ECs (SOX2–/CD31+, green) compared with GSCs (SOX2+/CD31–, red) in tumor sections. SOX2–/CD31– cells are marked in blue
color. Scale bar, 20 μm.
(C) Boxplot of distances from host ECs to the nearest GSCs and GdECs, respectively (n = 300).
(D) The correlation between WNT5A mRNA expression and GdEC signature score. n = 364 (IDHwt GBMs); mRNA expression was normalized across genes.
(E) Representative image of H&E staining for intratumoral and peritumoral regions (black dashed line) of GBM patient’s sample. Black arrows denote peritumoral
satellite. Scale bar, 200 μm.
(F) Representative images for GdECs (black arrows) and host ECs (red arrows) in variously sized satellite lesions in IHC double-staining tumor sections. Scale bar,
25 μm.
(G) Fourteen patients’ primary tumors were divided by WNT5A staining index into two groups (low and high). Tumor sections with peritumoral satellite lesions
(more than ten) were counted as the highest score. *p = 0.04 by the log-rank test for PFS between two groups, HR = 3.45 (high versus low).
(H) Comparison of WNT5A mRNA expression between nine pairs of intratumor and peritumor regions from GBM patients. Each dot in the scatterplot represents a
pair. Boxplot summarizes the distribution of WNT5A expression in nine intratumor and peritumor regions, respectively.
(I) TCGA GBMs (IDHwt, n = 228) were used for PFS analysis. Red and blue lines show survival curves of top 20% of GBMs with highest and lowest WNT5A mRNA
expression, respectively.
(J) Representative images for WNT5A (brown) and CD31 (red) staining of paired primary/recurrent tumors from one GBM patient. Scale bar, 25 μm.
(K) Unbiased quantification of GdEC frequency in primary and recurrent GBMs (n = 150).
(L) Correlation between WNT5A expression and GdEC signature scores in recurrent GBMs. Small boxplot panel shows all 81 pairs while the big boxplot panel
shows the majority of samples.
(M) Association of differences of WNT5A mRNA expression and GdEC signature score between 81 matched primary/recurrent GBMs pairs. Each circle in the
scatterplot represents a GBM pair; mRNA expression was normalized across genes.
See also Figure S7 and Tables S2, S4, S5, and S6.
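The contingency analysis in panel M can be illustrated with a minimal Python sketch that recomputes a two-sided Fisher exact test from the four pair counts printed in the figure (30, 14, 13, and 24 of the 81 primary/recurrent pairs). The function name `fisher_exact_2x2` is hypothetical, and the code is only an assumption about how such a test can be computed with the standard library; it is not the authors' analysis code.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample_odds_ratio, p_value). The p-value sums the
    hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c)  # simple cross-product (sample) odds ratio
    return odds_ratio, min(p_value, 1.0)


# Counts as printed in Figure 7M:
# rows = WNT5A higher / lower in recurrence,
# columns = GdEC signature higher / lower in recurrence.
odds_ratio, p_value = fisher_exact_2x2(30, 14, 13, 24)
print(f"sample odds ratio = {odds_ratio:.3f}, two-sided p = {p_value:.4f}")
```

The cross-product odds ratio from these counts is 720/182, slightly above the 3.884 printed in the figure. That difference would be consistent with a conditional maximum-likelihood estimate, which some statistical packages report for this test, but the source does not state which estimator was used.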

● EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ○ Cell Lines and Cell Culture
  ○ Mice and Animal Housing
  ○ Intracranial Xenograft Tumor Models
● METHOD DETAILS
  ○ Lentivirus Production and Transduction of Target Cells
  ○ Immunoblotting (IB), Immunohistochemistry (IHC) and Immunofluorescence (IF)
  ○ Flow Cytometry and FACS Sorting
  ○ Chromatin Immunoprecipitation Sequencing (ChIP-Seq) and ChIP-qPCR
  ○ RNA Isolation, qRT-PCR and DNA Microarray
  ○ Anchorage-Independent Growth Assays, Transwell Assay and Matrigel-based Tube Formation Assay
  ○ Magnetic Resonance Imaging (MRI)
  ○ Selective Targeting of GdECs in GBM xenografts by GCV/HSVTK system
  ○ Identification of Histone H3K27 Status Switch Genes and AKT Activation Signature Genes
  ○ Clinical Datasets and Pathological Analysis
● QUANTIFICATION AND STATISTICAL ANALYSIS
● DATA AND SOFTWARE AVAILABILITY
  ○ Data Resources

SUPPLEMENTAL INFORMATION

Supplemental Information includes seven figures and seven tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.039.

AUTHOR CONTRIBUTIONS

B.H., Y.A.W., and R.A.D. designed the project and analyzed data; B.H. performed the experiments; Q.W. performed bioinformatics analysis for ChIP-seq, RNA-seq, DNA microarray, and clinical datasets; S.H. performed ChIP-seq, DNA microarray, and data analysis; R.G.W.V., Y.Z., and J.Z. provided assistance for TCGA data analysis; C.-E.G.S., D.O., M.M.M., P.D., Y.W.H., G.W., Z.T., H.Y., and W.-T.L. provided assistance in cell-culture and molecular biochemical experiments; Q.C. provided assistance in image capture; Q.W., J.Z., Y.Y., N.L., and L.C. provided assistance for analysis of GBM paired samples; Z.D.L., G.N.F., J.J.P., and M.S.B. provided TCGA GBM biospecimens; E.P.S., G.N.F., and L.J.C. provided assistance for pathological analysis on human GBM samples; L.C. provided intellectual contribution and designed the early study; S.H., C.-E.G.S., D.O., X.L., J.H., and D.J.S. provided critical intellectual contributions throughout the project; B.H., Y.A.W., and R.A.D. wrote the manuscript.

ACKNOWLEDGMENTS

The authors thank Dr. Raghu Kalluri for critical reading and comments; Dr. Keith L. Ligon for initial assistance with histopathological analysis and providing GSC lines; Drs. Colin Watts, Andrea Sottoriva, and Sara G.M. Piccirillo for providing detailed information about their published gene expression profile datasets; Verlene K. Henry and her staff for their help in mouse brain implantation; Keith A. Michel and Charles V. Kingsley for assistance with MRI imaging and analysis; Shan Jiang for excellent mouse husbandry and care; Dr. Jared K. Burks for assistance with confocal imaging and the PE Vectra system; Dr. Karen C. Dwyer and her staff for assistance with flow cytometry; and the Sequencing & Non-Coding RNA program and the Sequencing and Microarray Facility at MDACC for providing sequencing services. This research is supported by UCSF Brain Tumor SPORE Tissue Bank P50 CA097257 (J.J.P.), NIH 2P50CA127001 (Y.A.W.), 5P01CA095616 (R.A.D. and L.C.), the Ben and Catherine Ivy Foundation Research Award (2009, R.A.D. and L.C.), and Clayton Foundation (R.A.D.). The core facilities are supported by P30CA16672.

Received: March 22, 2016
Revised: August 11, 2016
Accepted: October 20, 2016

REFERENCES

Adam, R.C., Yang, H., Rockowitz, S., Larsen, S.B., Nikolova, M., Oristian, D.S., Polak, L., Kadaja, M., Asare, A., Zheng, D., and Fuchs, E. (2015). Pioneer factors govern super-enhancer dynamics in stem cell plasticity and lineage choice. Nature 521, 366–370.
Alcantara Llaguno, S., Chen, J., Kwon, C.H., Jackson, E.L., Li, Y., Burns, D.K., Alvarez-Buylla, A., and Parada, L.F. (2009). Malignant astrocytomas originate from neural stem/progenitor cells in a somatic tumor suppressor mouse model. Cancer Cell 15, 45–56.
Bao, S., Wu, Q., McLendon, R.E., Hao, Y., Shi, Q., Hjelmeland, A.B., Dewhirst, M.W., Bigner, D.D., and Rich, J.N. (2006). Glioma stem cells promote radioresistance by preferential activation of the DNA damage response. Nature 444, 756–760.
Bernstein, B.E., Mikkelsen, T.S., Xie, X., Kamal, M., Huebert, D.J., Cuff, J., Fry, B., Meissner, A., Wernig, M., Plath, K., et al. (2006). A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326.
Boer, J.C., Walenkamp, A.M., and den Dunnen, W.F. (2014). Recruitment of bone marrow derived cells during anti-angiogenic therapy in GBM: The potential of combination strategies. Crit. Rev. Oncol. Hematol. 92, 38–48.
Brennan, C.W., Verhaak, R.G., McKenna, A., Campos, B., Noushmehr, H., Salama, S.R., Zheng, S., Chakravarty, D., Sanborn, J.Z., Berman, S.H., et al.; TCGA Research Network (2013). The somatic genomic landscape of glioblastoma. Cell 155, 462–477.
Calabrese, C., Poppleton, H., Kocak, M., Hogg, T.L., Fuller, C., Hamner, B., Oh, E.Y., Gaber, M.W., Finklestein, D., Allen, M., et al. (2007). A perivascular niche for brain tumor stem cells. Cancer Cell 11, 69–82.
Cancer Genome Atlas Research Network (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068.
Ceccarelli, M., Barthel, F.P., Malta, T.M., Sabedot, T.S., Salama, S.R., Murray, B.A., Morozova, O., Newton, Y., Radenbaugh, A., Pagnotta, S.M., et al.; TCGA Research Network (2016). Molecular profiling reveals biologically discrete subsets and pathways of progression in diffuse glioma. Cell 164, 550–563.
Chen, J., Li, Y., Yu, T.S., McKay, R.M., Burns, D.K., Kernie, S.G., and Parada, L.F. (2012). A restricted cell population propagates glioblastoma growth after chemotherapy. Nature 488, 522–526.
Cheng, C.W., Yeh, J.C., Fan, T.P., Smith, S.K., and Charnock-Jones, D.S. (2008). Wnt5a-mediated non-canonical Wnt signalling regulates human endothelial cell proliferation and migration. Biochem. Biophys. Res. Commun. 365, 285–290.
Cheng, L., Huang, Z., Zhou, W., Wu, Q., Donnola, S., Liu, J.K., Fang, X., Sloan, A.E., Mao, Y., Lathia, J.D., et al. (2013). Glioblastoma stem cells generate vascular pericytes to support vessel function and tumor growth. Cell 153, 139–152.
de Groot, J.F., Fuller, G., Kumar, A.J., Piao, Y., Eterovic, K., Ji, Y., and Conrad, C.A. (2010). Tumor invasion after treatment of glioblastoma with bevacizumab: Radiographic and pathologic correlation in humans and mice. Neuro-oncol. 12, 233–242.
Dunn, G.P., Rinne, M.L., Wykosky, J., Genovese, G., Quayle, S.N., Dunn, I.F., Agarwalla, P.K., Chheda, M.G., Campos, B., Wang, A., et al. (2012). Emerging insights into the molecular and cellular basis of glioblastoma. Genes Dev. 26, 756–784.
Ferrara, N., Hillan, K.J., Gerber, H.P., and Novotny, W. (2004). Discovery and development of bevacizumab, an anti-VEGF antibody for treating cancer. Nat. Rev. Drug Discov. 3, 391–400.
Folkins, C., Shaked, Y., Man, S., Tang, T., Lee, C.R., Zhu, Z., Hoffman, R.M., and Kerbel, R.S. (2009). Glioma tumor stem-like cells promote tumor
angiogenesis and vasculogenesis via vascular endothelial growth factor and stromal-derived factor 1. Cancer Res. 69, 7243–7251.
Folkman, J., Watson, K., Ingber, D., and Hanahan, D. (1989). Induction of angiogenesis during the transition from hyperplasia to neoplasia. Nature 339, 58–61.
Förstermann, U., and Münzel, T. (2006). Endothelial nitric oxide synthase in vascular disease: From marvel to menace. Circulation 113, 1708–1714.
Furnari, F.B., Fenton, T., Bachoo, R.M., Mukasa, A., Stommel, J.M., Stegh, A., Hahn, W.C., Ligon, K.L., Louis, D.N., Brennan, C., et al. (2007). Malignant astrocytic glioma: Genetics, biology, and paths to treatment. Genes Dev. 21, 2683–2710.
Gao, J., Li, T., and Lu, L. (2007). Functional role of CCCTC binding factor in insulin-stimulated cell proliferation. Cell Prolif. 40, 795–808.
Gao, J., Wang, J., Wang, Y., Dai, W., and Lu, L. (2011). Regulation of Pax6 by CTCF during induction of mouse ES cell differentiation. PLoS ONE 6, e20954.
Giese, A., Bjerkvig, R., Berens, M.E., and Westphal, M. (2003). Cost of migration: Invasion of malignant gliomas and implications for treatment. J. Clin. Oncol. 21, 1624–1636.
Gill, B.J., Pisapia, D.J., Malone, H.R., Goldstein, H., Lei, L., Sonabend, A., Yun, J., Samanamud, J., Sims, J.S., Banu, M., et al. (2014). MRI-localized biopsies reveal subtype-specific differences in molecular and cellular composition at the margins of glioblastoma. Proc. Natl. Acad. Sci. USA 111, 12550–12555.
Hanahan, D., and Folkman, J. (1996). Patterns and emerging mechanisms of the angiogenic switch during tumorigenesis. Cell 86, 353–364.
Hu, J., Ho, A.L., Yuan, L., Hu, B., Hua, S., Hwang, S.S., Zhang, J., Hu, T., Zheng, H., Gan, B., et al. (2013). From the Cover: Neutralization of terminal differentiation in gliomagenesis. Proc. Natl. Acad. Sci. USA 110, 14520–14527.
Jeong, H.M., Jin, Y.H., Kim, Y.J., Yum, J., Choi, Y.H., Yeo, C.Y., and Lee, K.Y. (2011). Akt phosphorylates and regulates the function of Dlx5. Biochem. Biophys. Res. Commun. 409, 681–686.
Lathia, J.D., Heddleston, J.M., Venere, M., and Rich, J.N. (2011). Deadly teamwork: Neural cancer stem cells and the tumor microenvironment. Cell Stem Cell 8, 482–485.
Liu, Q., Nguyen, D.H., Dong, Q., Shitaku, P., Chung, K., Liu, O.Y., Tso, J.L., Liu, J.Y., Konkankit, V., Cloughesy, T.F., et al. (2009). Molecular properties of CD133+ glioblastoma stem cells derived from treatment-refractory recurrent brain tumors. J. Neurooncol. 94, 1–19.
Lobo, N.A., Shimono, Y., Qian, D., and Clarke, M.F. (2007). The biology of cancer stem cells. Annu. Rev. Cell Dev. Biol. 23, 675–699.
Masckauchán, T.N., Agalliu, D., Vorontchikhina, M., Ahn, A., Parmalee, N.L., Li, C.M., Khoo, A., Tycko, B., Brown, A.M., and Kitajewski, J. (2006). Wnt5a signaling induces proliferation and survival of endothelial cells in vitro and expression of MMP-1 and Tie-2. Mol. Biol. Cell 17, 5163–5172.
Mayes, D.A., Hu, Y., Teng, Y., Siegel, E., Wu, X., Panda, K., Tan, F., Yung, W.K., and Zhou, Y.H. (2006). PAX6 suppresses the invasiveness of glioblastoma cells and the expression of the matrix metalloproteinase-2 gene. Cancer Res. 66, 9809–9817.
Molina, J.R., Hayashi, Y., Stephens, C., and Georgescu, M.M. (2010). Invasive glioblastoma cells acquire stemness and increased Akt activation. Neoplasia 12, 453–463.
Paina, S., Garzotto, D., DeMarchis, S., Marino, M., Moiana, A., Conti, L., Cattaneo, E., Perera, M., Corte, G., Calautti, E., and Merlo, G.R. (2011). Wnt5a is a transcriptional target of Dlx homeogenes and promotes differentiation of interneuron progenitors in vitro and in vivo. J. Neurosci. 31, 2675–2687.
Phillips, H.S., Kharbanda, S., Chen, R., Forrest, W.F., Soriano, R.H., Wu, T.D., Misra, A., Nigro, J.M., Colman, H., Soroceanu, L., et al. (2006). Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell 9, 157–173.
Ricci-Vitiani, L., Pallini, R., Biffoni, M., Todaro, M., Invernici, G., Cenci, T., Maira, G., Parati, E.A., Stassi, G., Larocca, L.M., and De Maria, R. (2010). Tumour vascularization via endothelial differentiation of glioblastoma stem-like cells. Nature 468, 824–828.
Rodriguez, F.J., Orr, B.A., Ligon, K.L., and Eberhart, C.G. (2012). Neoplastic cells are a rare component in human glioblastoma microvasculature. Oncotarget 3, 98–106.
Shang, Y., Hu, X., DiRenzo, J., Lazar, M.A., and Brown, M. (2000). Cofactor dynamics and sufficiency in estrogen receptor-regulated transcription. Cell 103, 843–852.
Shen, Q., Goderie, S.K., Jin, L., Karanth, N., Sun, Y., Abramova, N., Vincent, P., Pumiglia, K., and Temple, S. (2004). Endothelial cells stimulate self-renewal and expand neurogenesis of neural stem cells. Science 304, 1338–1340.
Singh, S.K., Hawkins, C., Clarke, I.D., Squire, J.A., Bayani, J., Hide, T., Henkelman, R.M., Cusimano, M.D., and Dirks, P.B. (2004). Identification of human brain tumour initiating cells. Nature 432, 396–401.
Soda, Y., Marumoto, T., Friedmann-Morvinski, D., Soda, M., Liu, F., Michiue, H., Pastorino, S., Yang, M., Hoffman, R.M., Kesari, S., and Verma, I.M. (2011). Transdifferentiation of glioblastoma cells into vascular endothelial cells. Proc. Natl. Acad. Sci. USA 108, 4274–4280.
Sottoriva, A., Spiteri, I., Piccirillo, S.G., Touloumis, A., Collins, V.P., Marioni, J.C., Curtis, C., Watts, C., and Tavaré, S. (2013). Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proc. Natl. Acad. Sci. USA 110, 4009–4014.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., and Mesirov, J.P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550.
Suzuki, Y., Shirai, K., Oka, K., Mobaraki, A., Yoshida, Y., Noda, S.E., Okamoto, M., Suzuki, Y., Itoh, J., Itoh, H., et al. (2010). Higher pAkt expression predicts a significant worse prognosis in glioblastomas. J. Radiat. Res. (Tokyo) 51, 343–348.
Wang, H., Wang, H., Zhang, W., Huang, H.J., Liao, W.S., and Fuller, G.N. (2004). Analysis of the activation status of Akt, NFkappaB, and Stat3 in human diffuse gliomas. Lab. Invest. 84, 941–951.
Wang, R., Chadalavada, K., Wilshire, J., Kowalik, U., Hovinga, K.E., Geber, A., Fligelman, B., Leversha, M., Brennan, C., and Tabar, V. (2010). Glioblastoma stem-like cells give rise to tumour endothelium. Nature 468, 829–833.
Wen, P.Y., and Kesari, S. (2008). Malignant gliomas in adults. N. Engl. J. Med. 359, 492–507.
Yang, D.H., Yoon, J.Y., Lee, S.H., Bryja, V., Andersson, E.R., Arenas, E., Kwon, Y.G., and Choi, K.Y. (2009). Wnt5a is required for endothelial differentiation of embryonic stem cells and vascularization via pathways involving both Wnt/beta-catenin and protein kinase Calpha. Circ. Res. 104, 372–379.
Zhang, Y., Liu, T., Meyer, C.A., Eckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., Myers, R.M., Brown, M., Li, W., et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137.
Zheng, H., Ying, H., Yan, H., Kimmelman, A.C., Hiller, D.J., Chen, A.J., Perry, S.R., Tonon, G., Chu, G.C., Ding, Z., et al. (2008). p53 and Pten control neural and glioma stem/progenitor cell renewal and differentiation. Nature 455, 1129–1133.
Zhou, Y.H., Wu, X., Tan, F., Shi, Y.X., Glass, T., Liu, T.J., Wathen, K., Hess, K.R., Gumin, J., Lang, F., and Yung, W.K. (2005). PAX6 suppresses growth of human glioblastoma cells. J. Neurooncol. 71, 223–229.
Zhu, T.S., Costello, M.A., Talsma, C.E., Flack, C.G., Crowley, J.G., Hamm, L.L., He, X., Hervey-Jumper, S.L., Heth, J.A., Muraszko, K.M., et al. (2011). Endothelial cells create a stem cell niche in glioblastoma by providing NOTCH ligands that nurture self-renewal of cancer stem-like cells. Cancer Res. 71, 6061–6072.
Zhu, Z., Khan, M.A., Weiler, M., Blaes, J., Jestaedt, L., Geibert, M., Zou, P., Gronych, J., Bernhardt, O., Korshunov, A., et al. (2014). Targeting self-renewal in high-grade brain tumors leads to loss of brain tumor stem cells and prolonged survival. Cell Stem Cell 15, 185–198.

STAR★METHODS
KEY RESOURCES TABLE

Antibodies
Rabbit polyclonal anti-AKT Cell Signaling Technology Cat# 9272 RRID:AB_329827
Rabbit polyclonal anti-Phospho-Akt (Ser473) Cell Signaling Technology Cat# 9271 RRID:AB_329825
Rabbit monoclonal anti-Phospho-Akt Cell Signaling Technology Cat# 4056 RRID:AB_331163
(Thr308) (244F9)
Rabbit monoclonal anti-Phospho-S6 Cell Signaling Technology Cat# 4858S RRID:AB_916156
Ribosomal Protein (Ser235/236)
(D57.2.2E) XP
Mouse monoclonal anti-S6 Ribosomal Cell Signaling Technology Cat# 2317 RRID:AB_2238583
Protein (54D2)
Rabbit monoclonal anti-Phospho-p70 S6 Cell Signaling Technology Cat# 9234P RRID:AB_10121787
Kinase (Thr389) (108D2)
Rabbit monoclonal anti-p70 S6 Kinase Cell Signaling Technology Cat# 2708 RRID:AB_390722
Rabbit polyclonal anti-p44/42 MAPK (Erk1/2) Cell Signaling Technology Cat# 9102L RRID:AB_823494
Rabbit polyclonal anti-Phospho-p44/42 Cell Signaling Technology Cat# 9101 RRID:AB_331646
MAPK (Erk1/2) (Thr202/Tyr204)
Rabbit monoclonal anti -Wnt5a/b (C27E8) Cell Signaling Technology Cat# 2530S RRID:AB_2215595
Rabbit monoclonal anti-VEGFR2 (55B11) Cell Signaling Technology Cat# 2479L RRID:AB_2212507
Rabbit polyclonal anti-Phospho-CaMKII Cell Signaling Technology Cat# 3361 RRID:AB_10015209
(Thr286)
Rabbit polyclonal anti-CaMKII (pan) Cell Signaling Technology Cat# 3362 RRID:AB_2067938
Rabbit polyclonal anti-p53 (FL-393) Santa Cruz Biotechnology Cat# sc-6243 RRID:AB_653753
Goat polyclonal anti-DLX5 (ChIP) Santa Cruz Biotechnology Cat# sc-18152 RRID:AB_2090874
Mouse monoclonal anti-PAX6 (WB) Santa Cruz Biotechnology Cat# sc-81649 RRID:AB_1127044
Normal rabbit IgG antibody Santa Cruz Biotechnology Cat# sc-3888 RRID:AB_737196
Normal goat IgG antibody Santa Cruz Biotechnology Cat# sc-2028 RRID:AB_737167
CD31-APC, human (clone: AC128) Miltenyi biotec Cat#130-092-652
CD144 (VE-Cadherin)-FITC, human Miltenyi biotec Cat#130-100-742
(clone: REA199)
CD144 (VE-Cadherin)-APC, human Miltenyi biotec Cat#130-100-708
(clone: REA199)
CD133/1 (AC133)-PE, human (clone: AC133) Miltenyi biotec Cat#130-080-801
Mouse monoclonal anti-b-Actin (Clone Sigma-Aldrich Cat# A2228
AC-74)
Rabbit polyclonal anti-PAX6(IHC) Sigma-Aldrich Cat# HPA030775 RRID:AB_10601243
Rabbit polyclonal anti-DLX5(WB/IHC) Sigma-Aldrich Cat# HPA005670 RRID:AB_1078681
Mouse monoclonal anti-CD105 (Endoglin, DAKO Cat# M3527 RRID:AB_2099044
SN6h)
Rabbit polyclonal anti-Glial Fibrillary Acidic DAKO Cat# N1506 RRID:AB_10013482
Protein (GFAP)
Rabbit monoclonal anti-Ki67 Vector Laboratories Cat# VP-RM04 RRID:AB_2336545
Rabbit polyclonal anti-VEGF Receptor 2 Abcam Cat# ab39256 RRID:AB_883437
Rabbit polyclonal anti-Von Willebrand Factor Abcam Cat# ab9378 RRID:AB_307223
Rabbit polyclonal anti-CD31 Abcam Cat# ab28364 RRID:AB_726362
Mouse monoclonal anti-CD31 Abcam Cat# ab9498 RRID:AB_307284
Rabbit polyclonal anti-H3K27ac Abcam Cat# ab4729 RRID:AB_2118291

Continued
Rabbit polyclonal anti-H3K4me1 Abcam Cat# ab8895, RRID:AB_306847
Rabbit polyclonal anti-PAX6 (ChIP) Abcam Cat# ab5790 RRID:AB_305110
Rabbit monoclonal anti-SOX2 (EPR3131) Abcam Cat# ab92494 RRID:AB_10585428
Mouse monoclonal anti-Nestin (10C2) Abcam Cat# ab22035 RRID:AB_446723
Rabbit polyclonal anti-Histone H3K4me2 Active Motif Cat# 39141 RRID: AB_2614985
Rabbit polyclonal anti-H3K27me3 EMD Millipore Cat# 07-449 RRID:AB_310624
Rabbit polyclonal anti-H3K4me3 EMD Millipore Cat# 07-473 RRID:AB_1977252
Rabbit monoclonal anti-CD34 (EP373Y) GeneTex Cat# GTX61737 RRID:AB_10624965
Mouse monoclonal anti-Human TRA-1-85 R&D Systems Cat# MAB3195 RRID:AB_2066681
Rabbit monoclonal anti-Neuronal Class III Covance Research Cat# MRB-435P-100 RRID:AB_10175616
beta-Tubulin (TUJ1)
Mouse monoclonal anti-CD144 BD Biosciences Cat# 555661 RRID:AB_396015
Mouse monoclonal anti-eNOS/NOS Type III BD Biosciences Cat# 610297 RRID:AB_397691
Wnt Antagonist III, Box5 Calbiochem Cat# 681673
Wnt-5a Recombinant Protein R&D Systems Cat# 645-WN
Wnt-3a Recombinant Protein R&D Systems Cat# 5036-WN
Ganciclovir (GCV) InvivoGen CAS # 82410-32-0 Cat. Code sud-gcv
Iodonitrotetrazolium chloride Sigma-Aldrich Cat# I10406
Calcein AM BD Biosciences Cat#564061
DiI-AcLDL Uptake Assay Thermo Fisher Cat# L35353
NEBNext DNA Library Prep kit New England BioLabs Cat# E7370S
BD FluoroBlok System BD Biosciences Cat# BD351161
TumorTACS In Situ Apoptosis Detection Kit Trevigen Cat# 4815-30-K
MACH 2 Double Stain 1 Biocare Medical Cat# MRCT523G
MACH 2 Double Stain 2 Biocare Medical Cat# MRCT525G
Deposited Data
Gene expression profile NCBI Gene Expression Omnibus GEO:GSE85615
ChIP sequencing data NCBI Gene Expression Omnibus GEO: GSE86624
Myc-immortalized human neural progenitor cells (ReNcell) EMD Millipore Cat# SC007
Myc-immortalized human neural stem cells This paper N/A
Patient derived GSC lines (TS543, TS576, TS586, TS603) Laboratory of Dr. Cameron W. Brennan (MSKCC) N/A
Patient derived GSC lines (BT112, BT147) Laboratory of Dr. Keith L. Ligon (DFCI) N/A
293T packaging cells ATCC CRL-11268
Human umbilical vein endothelial cells (HUVEC) ScienCell Research Laboratories Cat#8000
Human Brain Microvascular Endothelial Cells (HBMECs) ScienCell Research Laboratories Cat#1000
Human Brain Microvascular Endothelial Cells (HBMECs) Neuromics Cat# HEC02
Mouse: ICR SCID female Taconic ICRSC-F
Recombinant DNA
pWZL-Blast-myc Addgene Cat#10674
pCMVR8.74 Addgene Cat#22036

pMD2.G Addgene Cat#12259
pLenti6.3/V5-DEST gateway Vector Thermo Fisher Cat#V53306
pLVX-ZsGreen1-N1 Clontech Cat#632565
cEF.tk-GFP Addgene Cat#33308
pLenti6.3-GFP This paper N/A
pWZ-neo-myr-AKT This paper N/A
pLenti6.3-myr-AKT This paper N/A
pLenti6.3-p53DN This paper N/A
pLenti6.3-CXCL14 This paper N/A
pLenti6.3-DLX5 This paper N/A
pLenti6.3-DMRT3 This paper N/A
pLenti6.3-GPR37 This paper N/A
pLenti6.3-MYLIP This paper N/A
pLenti6.3-NUDT14 This paper N/A
pLenti6.3-TCF7 This paper N/A
pLenti6.3-WNT5A This paper N/A
pLenti6.3-PAX6 This paper N/A
pCD144-HSVTK-GFP This paper N/A
pLKO.1 target gene set (CXCL14) Sigma-Aldrich SHCLNG-NM_004887
pLKO.1 target gene set (DLX5) Sigma-Aldrich SHCLNG-NM_005221
pLKO.1 target gene set (DMRT3) Sigma-Aldrich SHCLNG-NM_021240
pLKO.1 target gene set (GPR37) Sigma-Aldrich SHCLNG-NM_005302
pLKO.1 target gene set (MYLIP) Sigma-Aldrich SHCLNG-NM_013262
pLKO.1 target gene set (NUDT14) Sigma-Aldrich SHCLNG-NM_177533
pLKO.1 target gene set (TCF7) Sigma-Aldrich SHCLNG-NM_003202
pLKO.1 target gene set (WNT5A) Sigma-Aldrich SHCLNG-NM_003392
All primers and oligonucleotides are listed in Table S7 This paper N/A
InForm Cell Analysis Version 2.2 PerkinElmer http://www.perkinelmer.com/lab-products-and-services/resources/software-downloads.html#inForm
Flow Jo_v10 FlowJo http://www.flowjo.com/
Pannoramic Viewer 3DHISTECH Ltd. http://www.3dhistech.com/pannoramic_viewer
ImageJ National Institutes of Health https://imagej.nih.gov/ij/
Aperio ImageScope_v12 Leica Biosystems http://www.leicabiosystems.com/digital-pathology/digital-pathology-management/imagescope/
Integrative Genomics Viewer (IGV) The Broad Institute of MIT and Harvard http://software.broadinstitute.org/software/igv/
R package (Version 3.2.5) The R Project for Statistical Computing https://www.r-project.org
Model-based Analysis of ChIP-Seq (MACS) Zhang et al., 2008 https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-9-r137
Data Visualization Tools for Brain Tumor Datasets N/A http://gliovis.bioinfo.cnio.es

Further information and requests for reagents may be directed to, and will be fulfilled by the corresponding author Ronald A. DePinho
(rdepinho@mdanderson.org).
Cell Lines and Cell Culture

The c-myc-immortalized human neural progenitor cells (ReNcell) were purchased from Millipore (EMD Millipore, Billerica, MA).
A second human neural stem cell (NSC) line was derived from 18-week gestation fetal brain tissue provided by Dr. Volney
L. Sheen (BIDMC, Harvard Medical School, Boston, MA, USA) and was immortalized with c-MYC (pWZL-Blast-MYC, Addgene).
Patient-derived glioma stem cells (GSCs) were provided by Dr. Cameron W. Brennan (Memorial Sloan Kettering Cancer Center,
New York, NY, USA) and by Dr. Keith L. Ligon (Dana-Farber Cancer Institute, Boston, MA, USA). All NSCs and GSCs were cultured
in NSC proliferation media (Millipore Corporation, Billerica, MA) with 20 ng/ml EGF and 20 ng/ml bFGF. Human umbilical vein
endothelial cells (HUVECs) and human brain microvascular endothelial cells (HBMECs) were purchased from ScienCell and Neuromics
and were cultured in endothelial cell media (ECM, Cat#1001, ScienCell; MED001, Neuromics). The 293T packaging cells from
ATCC were cultured in DMEM with 10% FBS.
Mice and Animal Housing

Female ICR SCID mice at 3–4 weeks of age were purchased from Taconic Biosciences. Mice were housed in groups of 5 in large
plastic cages and were maintained under pathogen-free conditions. All animal experiments were performed with the approval of
MD Anderson Cancer Center’s Institutional Animal Care and Use Committee (IACUC).
Intracranial Xenograft Tumor Models

Female SCID mice were anesthetized and placed into a stereotactic apparatus equipped with a z axis (Stoelting). A small hole was
bored in the skull 0.5 mm anterior and 3.0 mm lateral to the bregma using a dental drill. Cells (2 × 10⁵ in Figure 1C; 200–200,000
in Figure S1C) in 5 µl Hanks Balanced Salt Solution were injected into the right caudate nucleus 3 mm below the surface of the brain
using a 10 µL Hamilton syringe with an unbeveled 30-gauge needle. Alternatively, mice were bolted before the intracranial
implantation at MD Anderson’s Brain Tumor Center Animal Core. To install the guide screw, animals were anesthetized by intraperitoneal
injection with ketamine/xylazine solution (200 mg ketamine and 20 mg xylazine in 17 mL of saline) at a dosage of 0.15 mg/10 g
body weight. The plastic screw was rotated into a small drill hole made 2.5 mm lateral and 1 mm anterior to the bregma, and the central
hole of the guide screw was closed by placing a cross-shaped stylet inside it. After one week of recovery, mice were grouped by four or
five animals for cell implantation. The cells (5 × 10⁵ in Figure S5D; 1 × 10⁴ in Figure S5M) were injected in 5 µl Hanks Balanced Salt
Solution. Animals were followed daily for the development of tumors. Mice with neurological deficits or moribund appearance were
sacrificed. Brains were removed after transcardial perfusion with 4% paraformaldehyde (PFA) and were fixed in formalin or
post-fixed in 4% PFA and processed for paraffin-embedded or OCT frozen tissue blocks.
METHOD DETAILS
Lentivirus Production and Transduction of Target Cells

The expression vectors (p53 dominant negative-p53DN, myr-AKT, CXCL14, DLX5, DMRT3, GPR37, MYLIP, NUDT14, TCF7,
WNT5A, and PAX6) were generated by cloning the respective open reading frame (ORF) into pLenti6.3 vector using Gateway Cloning
system. The pLKO.1 shRNAs were purchased from Sigma. Gene expression was validated by qRT-PCR or immunoblotting in
lentivirus-infected target cells. Lentiviruses were produced in 293T cells with a packaging system (pCMVR8.74, pMD2.G; Addgene) as per
the vendor’s instructions.
Immunoblotting (IB), Immunohistochemistry (IHC) and Immunofluorescence (IF)

For immunoblotting, cells were harvested, washed with phosphate buffered saline, lysed in RIPA buffer (150 mM NaCl, 50 mM Tris
[pH 8.0], 1.0% Igepal CA-630, 0.5% sodium deoxycholate, 0.1% sodium dodecyl sulfate [SDS]; Sigma) with protease inhibitor
cocktail tablet complete mini (Roche Diagnostics), phosphatase inhibitor cocktail 2 (Sigma) and 1 mM DTT, and centrifuged at 10,000 × g
at 4°C for 15 min. Protein lysates were subjected to SDS-polyacrylamide gel electrophoresis on 4%–12% gradient polyacrylamide
gel (NuPage, Thermo Fisher Scientific), and transferred onto nitrocellulose membranes, which were incubated with indicated primary
antibodies, washed, and probed with HRP-conjugated secondary antibodies. For IHC staining, brain sections were incubated
with indicated primary antibodies for 1 hr at room temperature (RT) or overnight at 4°C after deparaffinization, rehydration, antigen
retrieval, quenching of endogenous peroxidase and blocking. The sections were incubated with horseradish peroxidase (HRP)-conjugated
polymer (DAKO) for 40 min and then with diaminobenzidine using the Ultravision DAB Plus Substrate Detection System (Thermo
Fisher Scientific) for 1–10 min at RT, followed by hematoxylin staining. For IF staining, OCT frozen brain sections were thawed at
RT for 30 min, rinsed and rehydrated with phosphate buffered saline 3 times. After blocking with PBS buffer containing 10% FBS,
1% BSA and 0.3% Triton, the sections were incubated with indicated primary antibodies overnight at 4°C. The samples were then
incubated with species-appropriate donkey secondary antibodies coupled to AlexaFluor dyes (488, 555, 568 or 594, 647, Invitrogen)
for 1 hr at RT. VECTASHIELD with DAPI (Vector Laboratories) was used to mount coverslips. The slides were scanned using the
digital slide scanner, Pannoramic 250 Flash II (3DHISTECH, Ltd.), and images were analyzed with Pannoramic Viewer.
Flow Cytometry and FACS Sorting

Cells were harvested and suspended in ice-cold PBS with 1% BSA and 2 mM EDTA. After incubation with FcR Blocking Reagent
(Miltenyi Biotec), cells were stained with fluorescently conjugated antibodies and incubated for 10 min in the dark in the refrigerator
(2–8°C). Antibodies include CD31-APC, CD144-FITC, CD144-APC, CD133-PE, IgG-APC, IgG-FITC, and IgG-PE from Miltenyi
Biotec. The stained cells or GFP-labeled cells were analyzed in a BD Fortessa analyzer. FACS sorting was performed using the
BD FACSAria cell sorter. Data were analyzed using FlowJo software.
Chromatin Immunoprecipitation Sequencing (ChIP-Seq) and ChIP-qPCR

Chromatin Immunoprecipitation (ChIP) was performed on early passage cell lines, hNSCs and three tumor neurosphere lines derived
from hNSC transduced with p53DN and myr-AKT as previously described (Shang et al., 2000). Briefly, cells (2 × 10⁶ cells per ChIP)
were cross-linked in 1% formaldehyde solution, re-suspended, and lysed. Cell lysates were solubilized, and cross-linked chromatin
was sheared to a size range of 100 to 300 bases using a Bioruptor Sonicator (Diagenode, UCD-200). Solubilized chromatin was
diluted 10-fold in ChIP dilution buffer and incubated at 4°C with 2 µg antibodies against specific histone modification or transcription
factors. The following antibodies were used in ChIP assays: anti-H3K27me3, anti-H3K27ac, anti-H3K4me3, anti-H3K4me1, anti-
H3K4me, anti-PAX6, anti-DLX5, normal rabbit IgG and normal goat IgG. After ChIP, samples were washed, and bound complexes
were eluted and reverse cross-linked. Multiplexed and barcoded sequencing libraries for ChIPed DNA and Input DNA were gener-
ated with NEBNext Library Prep kit according to the manufacturer’s instructions, and then were sequenced by Illumina HiSeq 2000.
Histone modification peaks and transcription factor-bound regions were identified as genomic regions with a significant read enrich-
ment in ChIPed reads over the Input reads analyzed by the Model-based Analysis of ChIP-Seq (MACS) tool (Zhang et al., 2008). For
ChIP-qPCR assays, the fold enrichment of ChIPed DNA relative to input DNA at a given genomic site was determined by comparative
CT (ΔΔCT) method using Power SYBR Green PCR Master Mix (Applied Biosystems) according to the manufacturer’s protocol. An
18S rRNA genomic region was used for normalization. The primers used for ChIP-qPCR are listed in Table S7.
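The comparative CT calculation described above can be sketched as follows. This is a minimal illustration, assuming the conventional 2^(−ΔΔCT) form with roughly 100% primer efficiency; the text names the method but does not spell out the arithmetic, and the function and argument names are hypothetical.

```python
def fold_enrichment(ct_chip_target, ct_input_target, ct_chip_ref, ct_input_ref):
    """ChIP-over-input fold enrichment at a target site, normalized to a reference region.

    Assumption: conventional 2^(-ddCT) with ~100% PCR efficiency.
    The reference region would be the 18S rRNA genomic region named in the text.
    """
    # dCT at the target site: ChIPed DNA relative to input DNA
    d_ct_target = ct_chip_target - ct_input_target
    # dCT at the reference region, used for normalization
    d_ct_ref = ct_chip_ref - ct_input_ref
    # ddCT: target dCT relative to reference dCT
    dd_ct = d_ct_target - d_ct_ref
    return 2.0 ** (-dd_ct)
```

With these illustrative inputs, a ChIP sample that amplifies 3 cycles earlier than input at the target site, with no difference at the reference region, gives an 8-fold enrichment.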
RNA Isolation, qRT-PCR and DNA Microarray

RNA was isolated with RNeasy Mini Kit (QIAGEN), and then used for first-strand cDNA synthesis using random primers and
SuperScript III Reverse Transcriptase (Invitrogen). qRT-PCR was performed using Power SYBR Green PCR Master Mix (Applied
Biosystems). Primers are listed in Table S7. The relative expression of genes was normalized using ribosomal protein L39 (RPL39) as
a housekeeping gene.
Early passage cell lines, including hNSC, hNSC-p53DN, two independent lines for hNSC-p53DN-AKT, three tumor neurosphere
lines derived from hNSC-p53DN-AKT (iGSC-1, iGSC-2, and iGSC-3), and FACS-sorted cells were grown in NSC proliferation media
with EGF and bFGF for 24 hr. RNA was isolated using Trizol (Invitrogen) and the RNeasy mini kit (QIAGEN). Gene expression profiling
was performed using the Affymetrix U133 Plus 2.0 Array at DFCI and MD Anderson’s Sequencing and Microarray core facility.
Anchorage-Independent Growth Assays, Transwell Assay and Matrigel-based Tube Formation Assay
Anchorage-independent growth assays were performed in triplicate in 6-well plates or in 48-well plates. Indicated cells (2 × 10⁴ or
1 × 10³ per well) were seeded in NSC proliferation media with EGF and bFGF containing 0.4% low-melting agarose on top of a
bottom agar layer containing 1% low-melting agarose in NSC proliferation media with EGF and bFGF. After 14–21 days, colonies were
stained with Iodonitrotetrazolium chloride (Sigma) and counted.
Transwell assays were performed in BD FluoroBlok 96-multiwell insert systems (3.0 µm pore size) as per the manufacturer’s
protocol (BD Biosciences). HBMECs were seeded in transwell inserts at 1 × 10⁴ cells/well in EC media overnight. After 4 hr of starvation
in EC basal media in a 37°C, 5% CO2 incubator, the inserts were transferred into the basal chambers containing chemoattractant in
NSC media as indicated. After 24 hr of incubation, the inserts were transferred into a second 96-well plate containing 4 µg/mL
Calcein AM (BD Biosciences) in DPBS. After incubation for 1 hr at 37°C, 5% CO2, fluorescence of invaded cells was read at wavelengths
of 494/517 nm (Ex/Em) on a fluorescent plate reader. Neurosphere formation was assessed by transwell assay in 24-well plates by
culturing sorted GdECs or non-GdECs with HBMECs (1 × 10⁴ of indicated cells) in transwell inserts containing NSC media, with
GSCs cultured in the basal chamber at 1 cell per microliter (500 µl/well) in NSC media. GSC neurospheres were counted after
7 days.
EC tubular formation was assessed by growth factor reduced Matrigel assay kit (BD Biosciences) in three-dimensional (3D) culture
according to the manufacturer’s instructions. The CD133+/CD144+ cells sorted from p53DN-AKT-hNSCs were infected by lentivirus
carrying shRNA targeting the indicated genes (Figures 3C and 3D) or were treated with BOX5 (100 µM) (Figures 3F and 3G). Cells were
harvested at 48 hr post-infection or treatment and then were cultured in growth factor reduced Matrigel. Quantification was per-
formed after 8-12 hr. To quantify the tubular formation, branch points (3 or more tubular branches emanating from a point) were
analyzed with an inverted microscope at 40x magnification and counted in 5 random fields per well.

Magnetic Resonance Imaging (MRI)
MRI studies were performed on the 4.7 T Biospec USR MRI system (Bruker Biospin MRI, Billerica, MA) in MD Anderson’s Small
Animal Cancer Imaging Research Facility. Animals were anesthetized with 1.5%–5% isoflurane inhalation anesthesia. Images of
brains were acquired using T2-weighted axial and coronal Rapid Acquisition with Relaxation Enhancement (RARE) scans with
TR = 3000 ms, TE = 57 ms, RARE factor = 12, 4 averages, 156 µm in-plane resolution, 4 cm x 3 cm FOV, 0.75 mm slice thickness
and 0.25 mm slice gap. Tumor volume was measured by contouring the lesions in the T2-weighted images using ImageJ software
(National Institutes of Health, Bethesda, MD, USA). The total tumor volume was calculated as the sum of the in-plane tumor volumes plus
the tumor volume within each slice gap, which was estimated by multiplying the mean of the contoured areas on the adjacent slices by
the width of the slice gap.
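The volume estimate described above can be sketched as follows. This is a minimal illustration under two assumptions not stated explicitly in the text: each in-plane volume is taken as the contoured area multiplied by the slice thickness, and the contoured areas are supplied in slice order. The function name and defaults are hypothetical, with the defaults set to the acquisition parameters given above.

```python
def total_tumor_volume(areas_mm2, slice_thickness_mm=0.75, slice_gap_mm=0.25):
    """Estimate tumor volume (mm^3) from contoured areas on consecutive slices.

    areas_mm2: contoured tumor area on each slice, in acquisition order (mm^2).
    Assumption: in-plane volume = area x slice thickness.
    """
    # Volume contained within the imaged slices themselves
    in_plane = sum(area * slice_thickness_mm for area in areas_mm2)
    # Volume within each gap: mean of the two adjacent areas x gap width
    in_gaps = sum(
        (a + b) / 2.0 * slice_gap_mm
        for a, b in zip(areas_mm2, areas_mm2[1:])
    )
    return in_plane + in_gaps
```

For two adjacent slices with contoured areas of 2.0 and 4.0 mm², this gives 4.5 mm³ in-plane plus 0.75 mm³ in the gap.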
Selective Targeting of GdECs in GBM xenografts by GCV/HSVTK system

To generate the plasmid of CD144 (VE-Cadherin)-promoter-driven expression of HSVTK plus GFP, the original promoter in pLVX-
ZsGreen1-N1 (Clontech) was replaced by a PCR amplified 1.5 kb genomic region of human CD144 promoter. The fragment coding
HSVTK-GFP amplified from cEF.tk-GFP (Addgene) by PCR was inserted into pLVX-ZsGreen1-N1 downstream of CD144 promoter to
generate pCD144-HSVTK-GFP, in which the region of ZsGreen was removed and subsequently validated by sequencing. GSCs were
transduced with pCD144-HSVTK-GFP through lentiviral infection and then transplanted into brains of SCID mice. Tumor-bearing
animals were administered GCV (InvivoGen) at 80 mg/kg/day or PBS daily through intraperitoneal injection. The xenograft tumors were
collected for IHC and IF analyses. To detect GCV-induced apoptosis in GdECs expressing HSVTK, TUNEL assay was performed
according to the manufacturer’s instructions (Trevigen).
Identification of Histone H3K27 Status Switch Genes and AKT Activation Signature Genes
Genomic regions within 2 kilobases upstream and downstream of gene transcriptional start sites (TSSs) were examined for histone
modification peaks based on Model-based Analysis of ChIP-Seq (MACS). Histone H3K27 status switch genes were identified as a
group of genes with dynamic histone modification changes of H3K27me3 and H3K27ac in iGSCs compared with hNSCs. AKT
activation signature genes (417) were identified based on gene expression profile comparison: at least 2-fold changes for 3 independent
tumor sphere lines derived from p53DN-AKT-hNSCs (iGSC-1, iGSC-2, and iGSC-3) versus hNSCs; two independent cell lines for
p53DN-AKT-hNSCs (different levels of AKT activation) versus hNSCs; one line for p53DN-AKT-hNSCs (higher AKT levels) versus the
other line for p53DN-AKT-hNSCs (lower AKT levels).
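The TSS-window step described above can be sketched as follows. This is a minimal illustration only: the input layout (a gene-to-TSS mapping and a list of peak intervals, as might be exported from MACS) and all names are hypothetical, and the criteria the authors used to call a status switch are not detailed in the text.

```python
def overlaps_tss_window(peak_start, peak_end, tss, flank=2000):
    """True if the peak interval overlaps the window TSS - flank .. TSS + flank."""
    window_start, window_end = tss - flank, tss + flank
    return peak_start <= window_end and peak_end >= window_start


def genes_with_peaks(tss_by_gene, peaks, flank=2000):
    """Return the set of genes with at least one peak within `flank` bp of the TSS.

    tss_by_gene: {gene: (chrom, tss_position)}  (hypothetical layout)
    peaks: list of (chrom, start, end) tuples   (hypothetical layout)
    """
    hits = set()
    for gene, (chrom, tss) in tss_by_gene.items():
        for peak_chrom, start, end in peaks:
            if peak_chrom == chrom and overlaps_tss_window(start, end, tss, flank):
                hits.add(gene)
                break  # one overlapping peak is enough for this gene
    return hits
```

Running this once per histone mark and cell state would give gene sets that could then be compared between hNSCs and iGSCs, for example by set difference.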
Clinical Datasets and Pathological Analysis

TCGA GBM datasets include gene mutations, copy number, gene expression, proteomics (RPPA), tumor subtypes and patient sur-
vival information (https://tcga-data.nci.nih.gov). Preprocessed gene expression profile and annotation of TCGA GBM samples were
obtained from GlioVis. For the published datasets of human GBMs used in this study, gene expression profiles data for 9 pairs of
intratumor and peritumor regions from GBM patients were obtained from ArrayExpress Archive (accession nos. E-MTAB-1215
and E-MTAB-1129) (Sottoriva et al., 2013); each gene mRNA expression was normalized to NES in Figures 7H and S7G. RNA-Seq
data for 39 samples from contrast-enhancing (CE) regions and 36 samples from non-enhancing (NE) regions from 27 different glioma
patients were obtained from Gene Expression Omnibus (accession number GSE59612) (Gill et al., 2014); each gene mRNA
expression was normalized to NES in Figures S7H and S7I.
RNA-Seq data for 124 (81 pairs with IDHwt and pairwise profiles on the same platform for analysis in this study) paired primary and
recurrent gliomas including both TCGA and in-house datasets were provided by Dr. Roel Verhaak’s lab (MD Anderson). Frozen GBM
tissues (n = 12) were obtained from TCGA collections and 10 primary GBMs (FFPE) blocks were obtained from Dr. Erik Sulman’s lab.
The paired primary/recurrent GBM slides (FFPE) for IHC were provided by the First Affiliated Hospital of Nanjing Medical University,
Nanjing, China and Guangdong 999 Brain Hospital, Guangzhou, China. The pathological analysis of human GBMs was guided by
board-certified neuropathologists. Aperio ImageScope and InForm software were used for identification and quantification. All
human GBM tissue samples were analyzed under an IRB-approved protocol (PA16-0408).
For quantification of microvessel density (MVD), images of tumor sections with IF or IHC staining were captured by using the digital
slide scanner, Pannoramic 250 Flash II. Measurement was performed in a single area of intratumoral or peritumoral tumor
(0.178 mm² in Pannoramic view) representative of the highest microvessel density (“hot spot”). The CD34 positive cells or
microvessels were counted. Five fields in each tumor were randomly selected for MVD analysis and statistical analysis was performed
using Welch’s t test in GraphPad Prism 6.
GdECs were quantified by co-localization analysis using the Caliper Vectra Image System and InForm software. Briefly, the IF or IHC
(double staining-Wrap red and DAB) stained slides were loaded onto the Vectra slide scanner. Vectra Nuance 3.0.0 software was
used to build the spectral libraries using a single chromogen only (e.g., DAPI, AlexaFluor-488, AlexaFluor-594, DAB, Wrap red,
hematoxylin). Nuance multispectral image cubes were acquired with a 20× objective lens (0.5 micron/pixel) using a full CCD frame at
1 × 1 binning (1360 × 1024 pixels) for analysis. For GdECs in IF stained xenograft tumors (Figure 5D), at least 3 image fields from 3
tumors with intratumoral and peritumoral areas were used for automated co-localization analysis using InForm software. Statistical
analysis was performed by using unpaired Student’s t test. For GdECs in IHC stained human GBM tumor sections (Figure 7K), 150
random image fields from 5 primary or recurrent GBM tumors were used for automated co-localization analysis using InForm
software. Statistical analysis was performed by using Wilcoxon rank test.
To quantify cell distance in xenograft tumor sections (Figure 6B), the IF stained images were captured using the digital slide
scanner, Pannoramic 250 Flash II, and cell distance was measured manually using Pannoramic Viewer. GdECs (GFP+) were first located in
the peritumoral regions (low cell density) and then the nearest host EC (CD34+/TRA-1-85-) within 30 µm of each respective GdEC
was defined. The nearest tumor cell (TRA-1-85+/GFP-) to the defined host EC was then located. At least 5 fields in peritumoral areas
for each tumor (n = 3) were selected for distance measurements. Statistical analysis was performed using Welch’s t test in GraphPad
Prism 6. To quantify cell distance in human GBM specimens (Figure 7C), 300 image fields from 10 human GBM tumors with IHC
(double staining-Wrap red and DAB) staining were captured using the Caliper Vectra Image System and analyzed data were generated
using InForm software. GdECs were first located and the nearest host EC within 40 pixels (28 µm) of each respective GdEC was
defined. The nearest GSC (SOX2+/CD31-) to the defined host EC (SOX2-/CD31+) was then located for calculation using R.
Statistical analysis was performed by using Wilcoxon rank test.
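The nearest-neighbor search described above can be sketched as follows. This is a minimal illustration, assuming cell centroids are available as (x, y) coordinates in µm and that the two quantities compared are the GdEC-to-host-EC distance and the tumor-cell-to-host-EC distance; the authors performed the calculation manually or in R, and all names here are hypothetical.

```python
import math


def nearest(point, candidates, max_dist=None):
    """Return (candidate, distance) for the closest candidate, or None.

    If max_dist is given, candidates farther away than max_dist are ignored.
    """
    best = None
    for candidate in candidates:
        d = math.dist(point, candidate)
        if max_dist is not None and d > max_dist:
            continue
        if best is None or d < best[1]:
            best = (candidate, d)
    return best


def paired_distances(gdecs, host_ecs, tumor_cells, radius_um=30.0):
    """For each GdEC with a host EC within radius_um, return a tuple of
    (GdEC-to-host-EC distance, nearest-tumor-cell-to-host-EC distance) in µm."""
    results = []
    for gdec in gdecs:
        hit = nearest(gdec, host_ecs, max_dist=radius_um)
        if hit is None:
            continue  # no host EC close enough to this GdEC
        host_ec, d_gdec = hit
        tumor_hit = nearest(host_ec, tumor_cells)
        if tumor_hit is not None:
            results.append((d_gdec, tumor_hit[1]))
    return results
```

The radius would be 30 µm for the xenograft sections and 28 µm (40 pixels) for the human specimens, per the text.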
To test the significance of overlap between stem cell pathways/gene sets compiled from MSigDB v5.1 and the 85 genes
with H3K27 acetylation (epigenetic activation), a hypergeometric test was performed using R. The P value for significance
was given by 1-phyper (X, M, N, 85), where X is the number of overlapping genes, M is the number of genes in the stem cell related
pathways for testing, and N is the number of genes that are not in stem cell related pathways. Based on this formula, the pathway
HEMATOPOIESIS_STEM_CELL NUMBER_LARGE_VS_TINY_UP was not significantly enriched (p value of hypergeometric test
of overlap > 0.14); however, the EC signaling pathway was significantly enriched (p value of hypergeometric test of overlap < 0.05) in
these 85 genes (Figure 1).
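The tail probability in the formula above can be reproduced outside R with a short sketch. This is an illustration only: `phyper` below is a plain-Python stand-in that follows the argument order of R's `phyper(q, m, n, k)` (lower tail), and `overlap_p_value` is a hypothetical helper name.

```python
from math import comb


def phyper(q, m, n, k):
    """P(X <= q) for X ~ Hypergeometric, drawing k items from m 'in-pathway'
    and n 'not-in-pathway' genes (same argument order as R's phyper)."""
    total = comb(m + n, k)
    upper = min(q, m, k)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible outcomes contribute nothing to the sum.
    favorable = sum(comb(m, i) * comb(n, k - i) for i in range(0, upper + 1))
    return favorable / total


def overlap_p_value(x, m, n, k=85):
    """Upper-tail probability 1 - phyper(x, m, n, k), as written in the text."""
    return 1.0 - phyper(x, m, n, k)
```

As written, 1-phyper(X, M, N, 85) is the probability of observing more than X overlapping genes; passing X − 1 instead would give the probability of at least X. The text does not state which tail convention was intended.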
Statistical information including n, mean and statistical significance values is indicated in the text or the figure legends. Animal
survival was analyzed using the log-rank test, and cell distance and MVD were analyzed using Welch’s t test in GraphPad
Prism 6. Comparisons of cell growth, colony formation in anchorage-independent growth assays, tubular formation, transwell assay,
neurosphere formation, and gene expression by qRT-PCR were performed using the unpaired Student’s t test. Error bars in the
experiments represent standard deviation (SD) of the mean values from either independent experiments or independent samples. All
other statistical analyses were performed using R (Version 3.2.5), and detailed information about the statistical methods
is specified in the figures/tables.
Data Resources
The gene expression profile by microarray and the histone landscape by ChIP-Seq in this paper have been deposited in NCBI GEO:
GSE85615 and GSE86624.

[Figure S1 image, panels A–I. Recoverable labels:
(A) Survival fraction versus overall survival (months), Low versus High groups. Total AKT: cutoff = 0.2537, log-rank p = 0.1442, Low n = 24, High n = 13. AKT_pT308: cutoff = 0.6625, p = 0.0011, Low n = 25, High n = 12. AKT_pS473: cutoff = 0.3101, p = 0.0393, Low n = 20, High n = 17. Total S6: cutoff = -0.2576, p = 0.1732, Low n = 10, High n = 27. S6_pS235/236: cutoff = 0.1101, p = 0.0149, Low n = 20, High n = 17. S6_pS240/244: cutoff = 0.2645, p = 0.0176, Low n = 21, High n = 16.
(B) CTRL, p53DN, p53DN-AKT (rep1–rep3).
(C) Survival (%) versus weeks for 200,000, 20,000, 2,000, and 200 cells (n = 4 each).
(D) iGSC-1, iGSC-2, iGSC-3; NSC media and +1% FBS; Nestin/DAPI and Tuj1/GFAP/DAPI.
(E) Mean of score for H3K27me3 and H3K27ac from -2 to 2 kb around the TSS, hNSC versus iGSC.
(F) FACS (APC-A versus FSC-A): IgG-APC 0.2%, CD144-APC 12.8%, CD31-APC 8.5%.
(G) Relative mRNA level of CD144, VEGFR2, CD31, TIE2, vWF, and CD105 in NSC media versus EC media.
(H) DiI-AcLDL/DAPI in NSC media and EC media.
(I) DAPI/CD34/TRA-1-85 and DAPI/CD31/TRA-1-85 in Tumor-1 and Tumor-2.]
Figure S1. Characterization of EC Phenotypes in Tumor Neurospheres Derived from p53DN-AKT-hNSCs, Related to Figure 1
(A) Overall survival relative to the levels of AKT pathway activation in TCGA GBM cohorts. TCGA GBM samples with IDH wild-type, TP53 mutations, proteomic
datasets (RPPA) and clinical data were divided into two groups according to the indicated protein levels by optimal cutoff. Patient survival relative to the levels of
total AKT, AKT-pT308, AKT-pS473, S6, S6-pS235/236, and S6-pS240/244.
(B) Images for soft agar colony formation assay in 6-well plates showing transformation of hNSCs expressing p53DN alone or p53DN and myr-AKT (p53DN-AKT).

(C) Tumor neurospheres (e.g., iGSC-1, iGSC-2, and iGSC-3) were isolated for secondary implant into mouse brain. Kaplan–Meier survival analysis after intra-
cranial injection of different cell numbers of iGSC-2.
(D) Tumor cells were isolated from mouse brains and characterized by neurosphere formation and differentiation in vitro. Scale bars, 100 µm (black) and 50 µm
(white).
(E) Mean of H3K27me3 and H3K27ac ChIP-Seq scores within 2 kb upstream and downstream of TSSs for 85 genes with H3K27 residue switch between hNSC
and iGSC for oncogenic activation.
(F) Representative FACS analysis of the iGSCs expressing EC markers by CD144 and CD31 APC conjugated antibodies. FSC, forward scatter.
(G) qRT-PCR analysis of indicated EC marker expression in iGSCs under EC culture condition compared with NSC culture condition after 5 days. Error bars
represent SD of the mean. *p < 0.05 and **p < 0.01.
(H) Representative images showing DiI-AcLDL uptake using iGSCs on matrigel supplemented with either EC media or NSC media. Scale bars, 50 µm.
(I) Representative images showing GdECs (yellow arrows) in tumor sections derived from p53DN-AKT-hNSCs by IF staining using the indicated markers. Scale
bars, 40 µm.
[Figure S2 image, panels A–G. Recoverable labels: (A) CD144 and CD31 in Vector, p53DN, and p53DN-AKT; (B) CD105, VEGFR2, and vWF with DiI-AcLDL/DAPI in HUVEC; (C) CD105/DiI-AcLDL/DAPI and VEGFR2/DiI-AcLDL/DAPI with DMSO or RAPA; (D) phase contrast and DiI-AcLDL/DAPI; (E) EC signature score for HUVEC and the CD133-/CD144-, CD133+/CD144-, CD133+/CD144+, and CD133-/CD144+ fractions; (F) DAPI/eNOS/VEGFR2 in the sorted fractions; (G) TS543, TS576, TS586, TS603, BT112, BT147.]
Figure S2. AKT Activation Induces Endothelial Lineage Differentiation of hNSCs, Related to Figure 2
(A) IF staining of CD144 and CD31 in hNSCs expressing empty vector (Vector), p53DN transduced hNSCs (p53DN) and p53DN-AKT transduced hNSCs
(p53DN-AKT). Scale bars, 50 µm.
(B) Immunofluorescence analysis of HUVECs cultured with EC media for EC marker (CD105, VEGFR2, and vWF) expression and functional uptake of DiI-AcLDL.
Scale bar, 40 µm.
(C) IF staining of EC marker (CD105 and VEGFR2) expression and DiI-AcLDL uptake with Rapamycin (RAPA) treatment (50 nM) in sorted CD133+/CD144+ cells
from p53DN-AKT-hNSCs. Scale bars, 50 µm.
(D) Representative images showing tubular network formation and DiI-AcLDL uptake of sorted CD133+/CD144+ cells from p53DN-AKT-hNSCs under EC culture
conditions. Scale bars, 100 µm (top) and 50 µm (bottom).
(E) EC signature scores were calculated using the gene expression profiles of HUVEC (GSE20986) and the p53DN-AKT-hNSCs sorted cell fractions,
CD133-/CD144-, CD133+/CD144-, CD133+/CD144+, and CD133-/CD144+.
(F) Immunofluorescence analysis of sorted subpopulations from p53DN-AKT-hNSCs cultured with EC media for 3 days for EC marker (VEGFR2 and eNOS)
expression. Scale bar, 40 µm.
(G) Representative images showing the formation of the tubular network on matrigel of patient-derived GSCs (TS543, TS576, TS586, TS603, BT112 and BT147)
under EC culture conditions. Scale bars, 100 µm.
[Figure S3 image, panels A–E. Recoverable labels: (A) relative mRNA level of WNT5A in CTRL, p53DN, and p53DN-AKT; (B) immunoblot for pAKT, WNT5A, pS6 (Ser235/236), and ACTIN in CTRL, p53DN, and p53DN-AKT; (C, E) immunoblots for p-CaMKII, total CaMKII, and Actin with or without BOX5 (myr-AKT OE and WNT5A OE); (D) FACS plots of CD133-PE versus CD144-FITC for IgG, DMSO, and BOX5 in myr-AKT OE and WNT5A OE cells.]
Figure S3. AKT Upregulates WNT5A in EC Lineage Differentiation of hNSCs, Related to Figure 3
(A) qRT-PCR and (B) Immunoblotting analyses of WNT5A expression in hNSCs, p53DN-hNSCs and p53DN-AKT-hNSCs.
(C) Immunoblots showing WNT5A/CaMKII pathway in BOX5 (100 µM) treated p53DN-hNSCs with overexpressed myr-AKT or WNT5A.
(D) Representative FACS showing the percentage of CD133+/CD144+ cells in p53DN-hNSCs that overexpress myr-AKT or WNT5A under treatment with WNT5A
antagonist BOX5 (50 µM) for 72 hr.
(E) Immunoblots showing WNT5A/CaMKII pathway in CD133+/CD144+ cells sorted from p53DN-AKT-hNSCs with BOX5 treatment (100 µM).
[Figure S4 image, panels A–L. Recoverable content:
(A) Heatmap of correlation with WNT5A (color scale -0.6 to 0.6). Significance key: - Non-Sig.; * p<0.01; ** p<0.001; *** p<0.0001. Protein labels (in extraction order, not necessarily plotted order): Acetyl-α-Tubulin-Lys40, S6_pS235_S236, S6_pS240_S244, p70S6K_pT389, mTOR_pS2448, Rictor_pT1135, Transglutaminase, Rb_pS807_S811, PI3K-p110-alpha, EGFR_pY1068, EGFR_pY1173, PARP_cleaved, HER3_pY1289, HER2_pY1248, PEA15_pS116, AMPK_pT172, C-Raf_pS338, Chk1_pS345, beta-Catenin, Annexin_VII, 14-3-3_zeta, N-Cadherin, Src_pY416, Caspase-8, Cyclin_D1, Cyclin_B1, Heregulin, ER-alpha, Bap1-c-4, VEGFR2, p90RSK, ARID1A, INPP4B, p70S6K, IGFBP2, GAPDH, 4E-BP1, ERCC1, MYH11, GATA3, PREX1, Paxillin, Smad1, Notch1, PEA15, eEF2K, Raptor, 53BP1, Rad51, Rab25, mTOR, N-Ras, MIG-6, eIF4G, EGFR, FASN, HER2, eIF4E, GAB2, TFRC, ERK2, XBP1, TSC1, c-Met, ARHI, Chk1, Bcl-2, IRS1, Snail, SCD, DJ-1, VHL, SF2, Bax, Syk, p53, Bid, Akt.
Transcription factor versus WNT5A (odds ratio, significance): GLI2 269.01 ***; FOXG1 59.86 ***; SOX2 38.08 ***; PAX4 10.60 ***; PAX6 7.91 ***; HES1 4.54 **; TCF4 2.53 -; DLX5 0.25 **.
(B–G) hg19 tracks of H3K27me3, H3K27ac, H3K4me3, and H3K4me1,2 in hNSC and iGSC at GLI2 (chr2:121,750,000; 100 kb scale), FOXG1 (chr14:29,240,000; 2 kb), SOX2 (chr3:181,436,000; 5 kb), HES1 (chr3:193,859,000; 2 kb), TCF4 (chr18:53,300,000; 200 kb), and PAX4 (chr7:127,256,000; 2 kb).
(H) Relative mRNA level in CD133+/CD144- and CD133+/CD144+ cells.
(I) PAX6, WNT5A, and Actin immunoblots with or without OE PAX6 in iGSC, TS603, and BT147. (J) DLX5, WNT5A, and Actin immunoblots with or without OE DLX5 in iGSC, TS543, and TS576.
(K, L) Model cartoons, physiological versus pathological: core transcriptional networks in NSC (SOX2, TCF4, PAX6, FOXG1, DLX5); genetic events (PTEN loss, etc); Class 1, normal lineages (neurons, oligodendrocytes, astrocytes); Class 2, poised lineages; Class 3, repressed lineages (endothelial cells; WNT5A signaling, etc).]

Figure S4. Association among mTOR/S6K Pathway, WNT5A, and NSC Master Transcription Factors, Related to Figure 4
(A) Heatmap showing the correlation between WNT5A and NSC master transcription factors (TFs) in the context of the mTOR/S6K pathway. TCGA GBMs (IDH wild-type, n = 158) with both proteomic (RPPA) and transcriptomic datasets were used to calculate the correlation between gene expression and protein levels by Spearman rank correlation (red/green color indicating positive/negative correlation). The first row shows the correlation between the levels of the indicated proteins and WNT5A mRNA. Proteins with a Spearman correlation coefficient higher than 0.1 or lower than -0.1 are shown. The association between TFs and WNT5A was assessed by Fisher exact test; the odds ratios and significance levels are shown.
(B–G) Chromatin modification changes from the pre-malignant state (hNSC) to the malignant state (iGSC) for the transcription factors GLI2 (B), FOXG1 (C), SOX2 (D), HES1 (E), TCF4 (F), and PAX4 (G).
(H) qRT-PCR for CD144, WNT5A, and PAX6 mRNA levels in sorted CD133+/CD144- and CD133+/CD144+ cells from p53DN-AKT-hNSCs. Error bars represent SD of the mean (n = 3).
(I and J) Immunoblots showing PAX6 (I) and DLX5 (J) overexpression in the indicated GSCs.
(K and L) Cartoons showing models of the WNT5A transcriptional network involved in plasticity and multilineage differentiation of neural stem cells in physiological (K) and pathological (L) situations.
[Figure S5 image. Panel A: H&E, pS6, WNT5A, PAX6, and DLX5 staining of p53DN-AKT and TS543 xenografts. Panel B: MRI, Vector versus WNT5A OE. Panel C: tumor volume (mm^3), Vector versus WNT5A OE, p=0.0397. Panel D: survival (%) versus days after implantation, Vector (n=5) versus WNT5A OE (n=5). Panel E: CD31 and vWF staining, Vector versus WNT5A OE. Panel F: MVD by CD34, Vector versus WNT5A OE. Panel G: tumor edge, Vector versus WNT5A OE. Panel H: satellites in peritumoral region (mm^2), Vector versus WNT5A OE, p<0.0001. Panel I: CD144 promoter, HSVTK, GFP construct. Panel J: pCD144-GFP and white-field images of HUVEC and TS543 (Vector, WNT5A OE). Panel K: FACS plots (FL-1 versus FSC-H) with pCD144-GFP+ gates of 0.61% and 6.98%. Panel L: pCD144-GFP/DAPI, TUNEL/DAPI, and merged images, GCV(-) and GCV(+). Panel M: survival (%) versus days after implantation, GCV(-) (n=5) versus GCV(+) (n=5), p=0.012. Panel N: DAPI/TRA-1-85 staining, GCV(-) and GCV(+).]

Figure S5. Overexpression of WNT5A in Patient-Derived GSCs Increases Vascularization and Invasiveness, Related to Figure 5
(A) Representative images showing that, compared with xenograft tumors derived from p53DN-AKT-hNSCs (p53DN-AKT), xenograft tumors derived from the patient-derived GSC line TS543 display lower activation of the AKT/mTOR pathway (pS6), lower levels of WNT5A and DLX5, and higher levels of PAX6. Scale bars, 50 μm.
(B) Representative magnetic resonance images from SCID mice after intracranial injection of TS543 overexpressing WNT5A (WNT5A OE) or empty vector as control (Vector). T2 sequences demonstrate infiltrative tumors in mouse brain (yellow line).
(C) Tumor volume was measured by T2 MRI scan (n = 5).
(D) Kaplan–Meier tumor-free survival analysis. TS543 cells carrying empty vector (Vector) or overexpressing WNT5A (WNT5A OE) were implanted into SCID mouse brains. Numbers of animals are indicated; p value was calculated by log-rank test.
(E) Representative IHC images of endothelial marker expression (CD31 and vWF) in tumor sections at low (scale bars, 100 μm) and high (scale bars, 50 μm) magnification.
(F) Quantitation of MVD evaluated by CD34 staining (n = 3 tumors, 5 fields per tumor).
(G) Representative images of the tumor edge in WNT5A OE versus Vector tumors by H&E staining. Scale bars, 100 μm.
(H) Quantitation of the number of satellites (>3 nuclei close together) in peritumoral regions (0.3 mm^2) by IF staining (n = 4 tumors, 5 fields per tumor).
(I) Schematic illustration of CD144-promoter-driven expression of HSVTK and GFP (pCD144-GFP).
(J) Representative images showing GFP expression driven by the CD144 promoter only in HUVEC and in TS543 overexpressing WNT5A (WNT5A OE), compared with control (Vector). Scale bars, 100 μm.
(K) Representative FACS analysis of GFP expression in the human sphere line TS543 transduced with lentivirus carrying pCD144-GFP.
(L) TUNEL staining of apoptotic cells among GFP-positive (pCD144-GFP) cells in tumors after GCV treatment for 1 week. Scale bars, 25 μm.
(M) Kaplan–Meier tumor-free survival analysis. TS543 cells overexpressing WNT5A were implanted into SCID mouse brains, and mice were treated with or without GCV. Numbers of animals are indicated; p value was calculated by log-rank test.
(N) Representative images showing tumors in SCID mouse brains with or without GCV treatment. Tumor cells were labeled by TRA-1-85 antibody staining (red). Scale bars, 2000 μm.
[Figure S6 image. Panel A: DAPI/TRA-1-85/pCD144-GFP/CD34 staining in areas with low and high frequency of GdECs. Panel B: MVD (CD34/field) in areas with low versus high frequency of GdECs, p=1.5e-05. Panel C: DAPI/TRA-1-85/pCD144-GFP/CD34 staining in peritumoral areas P1–P4. Panel D: cell number (10e+03) of HBMECs, CTRL versus rWNT5A, at 0 and 72 hours. Panel E: images of TS543 and TS603 co-cultured with pCD144-GFP-, pCD144-GFP+, pCD144-GFP+/HBMECs, or HBMECs. Panels F and G: soft agar colony formation for TS543 (F) and TS603 (G) across combinations of pCD144-GFP-, pCD144-GFP+, GSC, and HBMEC.]

Figure S6. WNT5A-Mediated GdECs Recruit Existing ECs for GSC Growth, Related to Figure 6
(A) Representative images showing the density of existing endothelial cells (TRA-1-85-/CD34+) and GdECs (pCD144-GFP+, green arrows) in the peritumoral areas. Scale bars, 50 μm.
(B) Boxplots show the CD34-based MVD analyzed in peritumoral areas with low (less than 5%) and high (more than 5%) frequency of GdECs (n = 3 tumors, 5 fields per tumor).
(C) Representative images show the distance between mouse endothelial cells (TRA-1-85-/CD34+, white arrows) and the nearest GdECs (pCD144-GFP+, green arrows) or tumor cells (TRA-1-85+/GFP-, red arrows) in multiple peritumoral areas (P1–P4). Scale bar, 25 μm.
(D) The number of HBMECs was counted after 72 hr of treatment with or without rWNT5A at 0.5 μg/ml in serum-free EC media. Error bars represent SD of the mean, n = 3; *p < 0.05, **p < 0.01.
(E) Representative images showing neurosphere formation of TS543 and TS603 co-cultured with GdECs and/or HBMECs in transwell for 7 days. Scale bars, 200 μm.
(F and G) Soft agar colony formation assay in 48-well plates showing the anchorage-independent growth capability of GSCs co-cultured with GdECs and HBMECs for TS543 (F) and TS603 (G). Error bars represent SD of the mean for 5 wells.
[Figure S7 image. Panel A: WNT5A expression (RMA) in IDHwt GBMs (n=364) versus non-tumor (n=10). Panel B: relative mRNA level in Low (n=6) versus High (n=6) WNT5A groups. Panel C: GdEC (SOX2+/CD105+) (%) in Low versus High WNT5A groups. The p values printed for panels A–C, in the order shown, are p=8.9e-8, p=0.0002, and p<0.0001. Panel D: EC signature score versus normalized WNT5A mRNA expression; Spearman correlation test, rho=0.185, p-value=0.0004. Panel E: SOX2/CD31/hematoxylin staining of a tumor vessel, raw image and score map. Panel F: WNT5A/CD31/hematoxylin staining, intratumoral and peritumoral, for high WNT5A (case #6) and low WNT5A (case #7). Panel G: normalized GdEC signature score, paired peritumor versus paired intratumor regions (patients P4, P41, P42, P49, P52, P54, P55, P56, P57), p=0.00039. Panels H and I: normalized WNT5A expression and normalized GdEC signature score in CE versus NE regions; Wilcoxon rank test p=6.09e-05 and p=4.25e-08, in the order shown. Panel J: WNT5A/CD31/hematoxylin staining of primary and recurrent tumors, case #1 and case #14. Panel K: WNT5A staining index (%), p=0.0348, and CD31 staining index (%), p=0.0298, in Pri (n=14) versus Rec (n=14). Panel L: GdEC signature score in primary tumors versus normalized mRNA expression of WNT5A in primary tumors; Spearman correlation test, rho=0.1875, p=0.094. Panel M: difference of EC signature between pri/rec pairs versus difference of normalized WNT5A mRNA expression between pri/rec pairs, with the following table:

WNT5A vs. EC Sig.      EC Sig. higher in Rec   EC Sig. lower in Rec
WNT5A higher in Rec    30                      14
WNT5A lower in Rec     11                      26
Fisher exact test p=0.0006, odds ratio=4.953

Panel N: model of primary tumor, satellite lesion, and recurrent tumor showing host ECs, GSC-derived ECs (GdECs), glioblastoma stem cells (GSCs), and non-glioblastoma stem cells; endothelial lineage differentiation of GSCs; WNT5A-mediated recruitment; and the effects of depleting GdECs or blocking recruitment.]

Figure S7. WNT5A and GdECs Are Strongly Correlated with Tumor Recurrence in Human GBMs, Related to Figure 7
(A) WNT5A mRNA expression in TCGA IDHwt GBM tumors compared to non-tumor brain tissues. Gene expression was normalized by RMA and the p value was calculated by Wilcoxon rank test.
(B) Average WNT5A mRNA level in two groups, Low WNT5A (n = 6) and High WNT5A (n = 6), of 12 fresh GBM specimens (IDHwt) from TCGA.
(C) Quantitation of the GdEC (CD105+/SOX2+) percentage in the 12 tumors from the Low and High WNT5A groups. The p value was calculated by unpaired Student's t test between the two groups.
(D) Correlation between WNT5A mRNA expression and EC signature score (n = 364 IDHwt); mRNA expression was normalized across genes.
(E) Identification of GdECs in tumor vessels by an automated quantitative pathology imaging system. Representative images with IHC double-staining and cell segmentation obtained from Caliper InForm analysis software show tumor vessels with GdECs (SOX2+/CD31+, yellow) in close proximity to host ECs (SOX2-/CD31+, green) in GBM patient specimens. SOX2+/CD31- cells are marked in red and SOX2-/CD31- cells are marked in blue. Scale bars, 20 μm.
(F) Representative IHC images show WNT5A and CD31 staining in the primary tumors of two patients with peritumoral satellite lesions. Scale bars, 25 μm (top panel); 50 μm (bottom panel).
(G) Comparison of GdEC signature score between 9 pairs of intratumor and peritumor regions from GBM patients. Each dot represents a pair. The boxplot summarizes the distribution of GdEC signature score in the 9 intratumor and peritumor regions, respectively.
(H and I) Boxplots showing WNT5A expression and GdEC signature score in 39 samples from contrast-enhancing (CE) regions and 36 samples from non-enhancing (NE) regions from 27 different glioma patients.
(J) Representative double-stained IHC images show WNT5A and CD31 staining in paired primary and recurrent GBM from 2 patients. Scale bars, 25 μm.
(K) Quantification of WNT5A and CD31 staining index in 14 paired primary and recurrent GBMs. The p values were calculated by Wilcoxon signed-rank test.
(L) Correlation between WNT5A expression and GdEC signature scores in primary GBMs. The boxplot inset shows all 81 pairs, while the large boxplot panel shows the majority of samples (n = 69).
(M) Association of differences in WNT5A mRNA expression and EC signature score between 81 matched primary/recurrent GBM pairs. Each circle represents a GBM pair. The mRNA expression was normalized across genes.
(N) Cartoon showing the model for GSC-EC differentiation and recruitment contributing to satellite lesion formation and tumor recurrence. It may be possible to block tumor recurrence by targeting this process.
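
The 2×2 table reported with panel M (30, 14, 11, and 26 primary/recurrent pairs) can be checked with a short sketch. The Python below is an illustrative, standard-library implementation of a two-sided Fisher exact test, not the authors' analysis code; the function names are hypothetical. It also computes the simple sample odds ratio, (30 × 26) / (14 × 11), which need not match the reported 4.953 exactly because some statistics packages report a conditional maximum-likelihood estimate instead.

```python
from math import comb


def sample_odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Counts from the panel M table: rows are WNT5A higher/lower in Rec,
# columns are EC signature higher/lower in Rec.
odds = sample_odds_ratio(30, 14, 11, 26)
p_two_sided = fisher_exact_two_sided(30, 14, 11, 26)
print(f"sample odds ratio = {odds:.3f}, two-sided p = {p_two_sided:.4g}")
```

The four counts sum to 81, matching the number of matched pairs stated in the caption.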
Article
Hematopoietic Stem Cells Count and Remember Self-Renewal Divisions

Jeffrey M. Bernitz, Huen Suk Kim, Ben MacArthur, Hans Sieburg, Kateri Moore

Correspondence
kateri.moore@mssm.edu

In Brief
HSCs count and remember the number of times they have divided to limit their cell divisions, a mechanism that may underlie many phenomena associated with HSC aging.

Highlights
• A rare population of dormant LR-HSCs persists throughout adult life
• Only 1% of repopulating cells within the aging HSC compartment are LT-HSCs
• LR-HSCs divide symmetrically four times throughout adult life to accumulate with age
• HSCs count and retain a memory of their cell division events in vivo

Bernitz et al., 2016, Cell 167, 1296–1309
November 17, 2016 Published by Elsevier Inc.
Article
Hematopoietic Stem Cells Count and Remember Self-Renewal Divisions
Jeffrey M. Bernitz,1,2,3 Huen Suk Kim,1,2,3 Ben MacArthur,4,5,6 Hans Sieburg,7 and Kateri Moore1,2,8,*
1Department of Cell, Developmental and Regenerative Biology
2Black Family Stem Cell Institute
3The Graduate School of Biomedical Sciences
Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1496, New York, NY 10029, USA
4Mathematical Sciences
5Centre for Human Development, Stem Cells and Regeneration, Faculty of Medicine
6Institute for Life Sciences
University of Southampton, Southampton SO17 1BJ, UK

7Vaccine Research Institute of San Diego, San Diego, CA 92121, USA
8Lead Contact
*Correspondence: kateri.moore@mssm.edu
SUMMARY

The ability of cells to count and remember their divisions could underlie many alterations that occur during development, aging, and disease. We tracked the cumulative divisional history of slow-cycling hematopoietic stem cells (HSCs) throughout adult life. This revealed a fraction of rarely dividing HSCs that contained all the long-term HSC (LT-HSC) activity within the aging HSC compartment. During adult life, this population asynchronously completes four traceable symmetric self-renewal divisions to expand its size before entering a state of dormancy. We show that the mechanism of expansion involves progressively lengthening periods between cell divisions, with long-term regenerative potential lost upon a fifth division. Our data also show that age-related phenotypic changes within the HSC compartment are divisional history dependent. These results suggest that HSCs accumulate discrete memory stages over their divisional history and provide evidence for the role of cellular memory in HSC aging.

INTRODUCTION

Hematopoietic stem cells (HSCs) are bone marrow (BM) resident stem cells responsible for maintaining the hematopoietic system throughout life and largely reside in a quiescent state (Nakamura-Ishizu et al., 2014). Studies on HSC cycling kinetics show the presence of a rare population of dormant HSCs that divide minimally over time and retain all the serially transplantable hematopoietic regenerative potential in the BM (Foudi et al., 2009; Qiu et al., 2014; Wilson et al., 2008). These dormant HSCs are identified by their ability to retain a pulsed histone 2B-green fluorescent protein (H2BGFP) label and are referred to as label-retaining HSCs (LR-HSCs). LR-HSCs are believed to act as a reserve stem cell population, which can resist chemotherapeutic agents, but readily activate in response to stress (Essers et al., 2009; Wilson et al., 2008). LR-HSCs are known to retain labels for close to a year (van der Wath et al., 2009; Wilson et al., 2008), yet their role, or presence, throughout the entirety of adult life remains unknown.

Upon aging, HSC regenerative potential declines. Aged HSCs show reduced self-renewal (Dykstra et al., 2011), impaired homing and engraftment upon transplantation (Dykstra et al., 2011; Morrison et al., 1996), and myeloid-biased differentiation (Beerman et al., 2010; Benz et al., 2012; Cho et al., 2008; Muller-Sieburg et al., 2004; Rossi et al., 2005; Sudo et al., 2000). Paradoxically, HSCs are reported to increase in number with age, both in mice and humans (Beerman et al., 2010; Cho et al., 2008; Dykstra et al., 2011; Morrison et al., 1996; Pang et al., 2011; Sudo et al., 2000). Current models of HSC aging suggest that the increase in HSC numbers compensates for defects that naturally occur with age, but evidence for this hypothesis is lacking (Geiger et al., 2013).

Our previous work showed that as HSCs progressively divide, they gradually lose regenerative capacity (Qiu et al., 2014). Many characteristics of aged hematopoiesis can be replicated by enforcing HSC proliferation (Beerman et al., 2013; Walter et al., 2015). Thus, we hypothesized that HSC aging may be a direct result of extensive proliferative history accrued throughout life. This requires a better understanding of how HSCs naturally cycle over long periods of time.

Here, we report a population of HSCs capable of retaining H2BGFP for at least 22 months in vivo. This population of stable label-retaining HSCs (sLR-HSCs, cells capable of retaining a pulsed H2BGFP label for >10 months) contains all cells capable of robust multi-lineage engraftment in primary and secondary transplantation (hereafter referred to as long-term [LT-] HSCs) in aging BM when assayed at the clonal level. We find that LR-HSCs decline in frequency within the stem cell compartment but accumulate in absolute number in the BM with age. This accumulation process follows a model in which LR-HSCs undergo symmetric self-renewal events with progressively lengthening periods between cell divisions until LR-HSCs reach a state of complete dormancy after four traceable self-renewal events.
Figure 1. LR-HSCs Persist in BM throughout Life and Contain All LT-HSC Activity in Aging BM
(A) Schematic of long-term dox treatments. Two- to 4-month-old 34/H2BGFP mice were placed on dox for periods ranging from 3–22 months. At the end of the dox chase, BM was analyzed for the presence of LR-HSCs.
(B) Histogram of the LSKCD48–Flk2–CD150+ HSC compartment before and after a 12-month dox chase. LR-HSCs were determined by gating above the background GFP levels of single transgenic TetO-H2BGFP HSCs.

Our results show a direct link between cell division events and HSC behavior and suggest that HSCs can count and retain a memory of their cell divisions during adult life.

RESULTS

A Small Population of LR-HSCs Exists throughout Adult Life
To assess the cumulative divisional history of HSCs throughout adult life, we used a tet-off hematopoietic stem and progenitor cell (HSPC)-specific H2BGFP label-retaining system (Qiu et al., 2014). This system allows HSPCs to be continuously labeled with H2BGFP throughout development and ontogeny, ensuring thorough and robust labeling of the HSC compartment (cells defined as Lineage-Sca-1+cKit+CD48–Flk2–CD150+) at the onset of doxycycline (dox) chase (Figure 1B). At the onset of adulthood, a fully labeled HSC compartment can dilute GFP seven to eight times before reaching background levels (Figure S1A). However, after the onset of adulthood, active regulation of the hCD34 promoter to drive H2BGFP expression is specific to only a subset of the HSC compartment (Figures S1B–S1F).

We placed young adult mice on dox chases ranging from 3–22 months (Figure 1A). The percentage of H2BGFP+ cells rapidly declines initially but plateaus after 10 months and remains relatively constant for chases lasting nearly 2 years (Figure 1C). After 10–22 months on dox, sLR-HSCs represent 3.24% ± 1.26% (mean ± SD) of the HSC compartment (Figure 1D). To eliminate the possibility that sLR-HSCs are an artifact of leaky H2BGFP expression, we analyzed the HSC compartment of mice exposed to dox from conception until adulthood (Figure S2A). These mice did not express H2BGFP above background levels at adulthood (Figure S2B) or after exposure to dox for up to 1 year (Figures S2C–S2E). Together, these results show that a small population of cells within the HSC compartment divides minimally throughout the majority of adult life.

sLR-HSCs Contain All LT-HSC Activity within Aging BM
To test the function of sLR-HSCs, we performed competitive transplants with aging GFPLo, GFPHi (sLR-HSCs), and Total HSC populations sorted from 19-month-old mice chased with dox for 15 months (Figure 1E). In primary recipients, all three populations stably engrafted, but the GFPHi population showed significantly higher levels of blood chimerism compared to the GFPLo and Total populations (Figures 1F and 1G). The Total HSC population showed intermediate levels of chimerism in primary recipients. In secondary hosts, only the Total and GFPHi populations sustained secondary engraftment, while GFPLo HSCs progressively declined until exhaustion by the end of 24 weeks (Figures 1F and 1G; Table S1). At sacrifice, we analyzed regeneration of primitive HSPC compartments in the BM (Figure 1H). We found that after both transplants, GFPHi cells generated higher HSPC chimerism compared to GFPLo cells (Figures 1I and 1J). However, when compared to the Total population, GFPHi cells showed greater regeneration only in the HSC population after primary transplant, with no difference after secondary transplant. Taken together, the data show that the small proportion of GFPHi sLR-HSCs, identifiable only by their minimal divisional history, contain all of the LT-HSC potential within the aging HSC compartment.

Increased Divisional History Marks Increased Myeloid Potential
Aged HSCs show increased myeloid cell output upon transplantation (Beerman et al., 2010; Benz et al., 2012; Cho et al., 2008; Dykstra et al., 2011; Gekas and Graf, 2013; Sudo et al., 2000). To understand how proliferative history correlates with lineage reconstitution, we measured lineage output in the blood of primary and secondary transplant recipients. While all three aging HSC populations show increased myeloid output in primary transplant, similar to the blood of unmanipulated aging mice (Figure 1K), only GFPHi cells maintained this myeloid output in secondary recipients (Figure 1L). As GFPLo grafts exhausted by the end of secondary transplantation (Figure 1F; Table S1), we hypothesized that this fraction may be enriched for myeloid progenitors found within the HSC compartment lacking self-renewal capacity.

While highly enriched for HSCs, the HSC compartment is heterogeneous. CD41 has been reported to mark myeloid and megakaryocyte progenitors within the HSC compartment (Haas et al., 2015; Yamamoto et al., 2013) and its expression increases with age (Gekas and Graf, 2013). Given that myeloid-restricted repopulating cells accumulate in aged mice (Dykstra et al., 2011; Sudo et al., 2000), we hypothesized that by examining CD41 expression, we would better understand both the heterogeneity and the myeloid potential within aging HSC compartments.

We first compared the primitive stem and progenitor compartments of young and aging mice. Consistent with other reports (Beerman et al., 2010; Dykstra et al., 2011), CD150+ cells marking the HSC compartment increase with age (Figures 2A, 2B, S3A, and S3B). However, the vast majority of this expanded compartment is CD41+ (Figures 2C, 2D, S3C, and S3D). Within the HSC compartment, CD41+ cells accumulate with age (Figures 2E, 2F, S3E, and S3F). Dissecting the HSC compartment
(C) Time course of label dilution after initiation of dox chase; n = 2–15 mice per time point.
(D) Percent of HSCs that are label-retaining after 10–22 months of dox chase (sLR-HSCs). n = 42 mice from 12 independent experiments.
(E–L) HSC populations were sorted from 19-month-old mice chased with dox for 15 months into Total, GFPHi, and GFPLo HSC populations. 200 cells from each
population were competitively transplanted per mouse. (E) Gating strategy for Total, GFPHi, and GFPLo HSC fractions. (F and G) Blood chimerism of granulocytes
(F), and total white blood cells (G) during primary and secondary transplants. (H–J) Analysis of donor-derived stem and progenitor cell compartments in recipient
BM. Gating strategy (H) and quantification of donor-derived HSPCs in primary (I) and secondary (J) transplantations after 22 and 24 weeks, respectively. (K and L)
Lineage distribution of donor-derived peripheral blood in primary (K) and secondary (L) hosts at 22 and 24 weeks, respectively. n = 8–14 mice per group from two
independent experiments. Data are displayed as the mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001 by Welch’s t test.
See also Figures S1 and S2 and Table S1.
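
The caption above reports significance by Welch's t test. As a rough illustration only, the statistic and its Welch–Satterthwaite degrees of freedom can be sketched in standard-library Python. The function name and the sample values are hypothetical and are not taken from the paper; converting t to a p value additionally requires the Student t distribution, which this sketch omits.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch–Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical chimerism percentages for two groups (illustrative only).
group_lo = [1.0, 2.0, 3.0, 4.0, 5.0]
group_hi = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_lo, group_hi)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Unlike Student's t test, this form does not pool the two variances, which suits groups of unequal size or spread such as the 8–14 mice per group described above.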

Figure 2. CD41 Expression on Young, Aging, and LR-HSCs
FACS analysis and quantification of the primitive HSPC compartment from young and aging mice.
(A) CD150+ cells marking the HSC compartment in young and aging BM.
(B) Quantification of (A).
(C) The same populations in (A) displayed as a function of CD150 and CD41 expression.
(D) Quantification of (C).
(E) CD41 expression on Total, GFPLo, and GFPHi HSCs.
(F) Quantification of CD41 expression on Total HSCs.
(G) Ratio of CD41– to CD41+ cells found in Total, GFPLo, and GFPHi HSCs. Data are displayed as the mean ± SEM of 9–11 mice per group from three independent experiments. *p < 0.05, **p < 0.01, ***p < 0.001 by Welch’s t test.
further using label-retention, we find the Total and GFPLo populations show similar increases in CD41 expression with age, but young and aging GFPHi populations are depleted of CD41+ cells (Figures 2E and 2G). Cell-cycle analyses show CD41+ cells have a lower percentage of cells in G0 than CD41– cells (Figures S3I and S3J) and retain lower levels of H2BGFP over time (Figures S3K and S3L). These data show that CD41 expression on HSCs correlates with diminished quiescence and a higher rate of cycling in vivo. Examining megakaryocyte potential, we found that the Total, GFPLo, and GFPHi HSC populations from young and aging animals all generated colonies in vitro that contained large cells with megakaryocyte morphology (Figures S4E and S4G), but only the Total and GFPLo populations generated colonies exclusively containing large megakaryocytes (Figures S4D and S4G).

Clonal Analysis of Aging HSC Populations Based on Label-Retention and CD41 Expression
Lineage reconstitution, cell-surface marker, and cell-cycle analyses suggest that the expanded aging HSC compartment may be dominated by myeloid progenitors rather than HSCs. To test this, we performed limiting cell number serial transplants of aging HSC populations based on label-retention and CD41 expression. We sorted the HSC compartment from 19-month-old mice chased with dox for 17 months into four populations: GFPHiCD41–, GFPHiCD41+, GFPLoCD41–, and GFPLoCD41+ (as in Figure 2E). Fifteen cells from each population were then competitively transplanted and the donor-derived myeloid, B, and T cell contributions to peripheral blood were followed in both primary and secondary hosts. Because we were looking for any form of repopulating activity and not just stem cell activity, we considered a mouse repopulated if at any point during primary transplantation we found any donor-derived lineage to contribute to >0.1% of total circulating leukocytes (Figure 3A) (Yamamoto et al., 2013). Based on this definition of repopulation, we performed limiting dilution analysis to determine the clonality of our transplanted populations, revealing that most of the mice were repopulated at or very near clonal levels (Table S2).

As a further measure to validate clonality, we transplanted BM from a single primary host into two secondary hosts. Daughter HSCs generated from the same initial parent HSC in vivo show synchronous repopulation kinetics when transplanted into separate hosts (Müller-Sieburg et al., 2002). Thus, in secondary transplants we followed the kinetics of total donor-type (CD45.2+) repopulation in paired secondary hosts and measured the

Figure 3. Clonal Analysis of the Aging HSC Compartment Based on CD41 Expression and Label-Retention
The HSC compartment was sorted from 19-month-old mice chased with dox for 17 months into four populations based on CD41 expression and label retention,
and transplanted at a dose of 15 cells per mouse.

degree of synchronicity between them using Hamming distance type, as expected the most primitive cell types show the most
clustering (Sieburg and Müller-Sieburg, 2004). A total of 37 of the robust regeneration efficiencies, while the more developmentally
46 mice initially transplanted survived through 6 months in pri- restricted cell types showed reduced regeneration capacities.
mary hosts. Twenty-four of those showed repopulation above Interestingly, both LT- and IT-HSC patterns showed impaired
threshold in primary recipients and were used for paired second- regeneration of common lymphoid progenitors (CLPs), consis-
ary transplantation. A total of 23 of the 24 secondary transplant tent with aging HSC phenotypes (Rossi et al., 2005).
pairs were used for Hamming distance analysis, as one of the Together, these data support the conclusion that LT-HSCs
mice in the 24th pair died before the first blood analysis. From residing in aging BM are exclusively found in the rare population
the 23 pairs analyzed, 18 showed synchronous kinetics of of sLR-HSCs (3%), and the vast majority of the HSC compart-
donor-type repopulation in secondary hosts, indicative of clonal ment (97%) consists of repopulating cells with limited self-
repopulation (Figure S5). renewal and restricted differentiation potential. This ability allows
For analysis of repopulating patterns, all 24 repopulated mice us to see that the dramatic expansion of the phenotypic HSC
were considered. From the 24 repopulated mice, we observed compartment with age is primarily due to the expansion of pro-
five distinct repopulation patterns: myeloid-restricted, bipotent liferative progenitors with limited regenerative potential.
progenitor, short-term HSC (ST-HSC), intermediate-term HSC
(IT-HSC), and LT-HSC patterns (Figure 3B). We quantified the heterogeneity of repopulation patterns
within each sorted population and found LT-HSC repopulation was exclusively confined to the GFPHi
fractions (Figure 3C). GFPHi cells showed only IT- and LT-HSC repopulation patterns, which were
similarly represented regardless of CD41 expression (Figure 3C). In contrast, the GFPLo cells showed
limited self-renewal potential and greater heterogeneity of repopulating patterns. We found CD41
expression enriched for myeloid-restricted repopulating cells within the GFPLo compartment
(Figure 3C). Finally, we used the proportion of repopulating cell types found within each sorted HSC
fraction to extrapolate the representation of repopulating cell types found within the total aging
HSC compartment. Remarkably, LT-HSCs only represent 1% of the repopulating cells of the total aging
HSC compartment (Figure 3C). Instead, we found that cells with limited self-renewal represent 80% of
repopulating cells of the aging HSC compartment, with 38% myeloid progenitors (Figure 3C).
After primary transplant, we analyzed recipient BM for donor-derived regeneration of various stem
and progenitor cell compartments (Figure 3D). When comparing regeneration based on sorted cell
phenotype, GFPHi cells repopulated each compartment with higher efficiency than GFPLo cells. Notably,
both GFPHiCD41+ and GFPHiCD41– cells generated CD41– and CD41+ HSCs with similar efficiencies. Within
the GFPLo fractions, GFPLoCD41– cells more efficiently regenerated the more primitive compartments
than the GFPLoCD41+ cells, which primarily made myeloid progenitors (Figure 3D). When comparing
regeneration based on retrospectively assigned repopulating cell

Tracking GFP Label Dilution Reveals that LR-HSCs Symmetrically Self-Renew throughout Adult Life
Next, we wanted to address how functional HSC numbers expand with age. While the HSC compartment
expands dramatically with age (Beerman et al., 2010; Dykstra et al., 2011; Rossi et al., 2005; Sudo
et al., 2000), functional LT-HSC numbers within aging BM expand but to a lesser extent (Cho et al.,
2008; Morrison et al., 1996; Sudo et al., 2000). Our previous studies showed, and this study confirms
at the clonal level, that independent of cell surface marker expression all LT-HSCs are LR-HSCs
(Figure 3) (Qiu et al., 2014). Thus, we investigated the phenomenon of functional HSC expansion
further by utilizing information from our total LR-HSC populations, regardless of CD41 expression.
First, we examined H2BGFP label dilution in young mice. In the GFPHi HSC fraction, we find five
distinct subpopulations of cells associated with GFP peaks (Figure 4A). These peaks are identifiable
using the proliferation index utility in FlowJo software (Treestar), indicating that they are
associated with successive cell divisions where divisional history increases as cells progress from
peak 0 to peak 4. To test if the observed peaks represent actual cell division events, we extracted
GFP intensity data from GFPHi HSCs of young mice and analyzed the relative positions of each of the
observed GFP peaks. We found that the positions of these peaks are very well described by a simple
model in which GFP dilutes by a factor of 2 through each cell division (Figure 4C). This suggests
that these peaks are indeed marking cell division events, and we can use the GFP peaks found within
the GFPHi cells to quantify cell divisions over time in vivo.
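The halving model described above can be illustrated with a short Python sketch. It is not the authors' analysis code: the function names and the intensity values are hypothetical, and the fit assumes only that each division halves the label, so that log2 intensity drops by exactly 1 per peak.

```python
import math


def expected_intensity(peak0_intensity, divisions):
    """Intensity predicted after `divisions` halvings of the label."""
    return peak0_intensity / (2 ** divisions)


def fit_peak0_intensity(peak_intensities):
    """Least-squares estimate of the peak 0 intensity.

    With the slope fixed at -1 in log2 space, log2(I_n) = log2(I_0) - n,
    the least-squares intercept is the mean of log2(I_n) + n.
    peak_intensities[n] is the observed position of peak n.
    """
    intercepts = [math.log2(v) + n for n, v in enumerate(peak_intensities)]
    return 2 ** (sum(intercepts) / len(intercepts))


def infer_divisions(peak0_intensity, intensity):
    """Nearest whole number of halvings separating a cell from peak 0."""
    return max(round(math.log2(peak0_intensity / intensity)), 0)


# Hypothetical peak positions (arbitrary fluorescence units), not measured data.
observed = [8000.0, 4100.0, 1950.0, 1020.0, 490.0]
i0 = fit_peak0_intensity(observed)
predicted = [expected_intensity(i0, n) for n in range(len(observed))]
for n, (obs, pred) in enumerate(zip(observed, predicted)):
    print(f"peak {n}: observed {obs:.0f}, predicted {pred:.0f}")
```

If the observed positions track the predicted ones closely, as reported for Figure 4C, peak number can be read as a division count.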
(A) Reconstitution curves of donor-derived (CD45.2+) total white blood cells, myeloid, B cells, and T cells for each transplanted mouse through 24 weeks in
primary and secondary recipients. The transition from primary to secondary transplantation is marked by the x axis break. The horizontal line marks the threshold
of successful reconstitution. Secondary transplants are displayed as the mean ± SEM.
(B) Examples of the five reconstitution patterns observed. Definition of repopulation patterns: myeloid-restricted only repopulated myeloid cells; bipotent pro-
genitors gave rise to myeloid and B cells; ST-HSCs showed transient repopulation of all three lineages, with donor chimerism of at least one lineage dropping
below threshold by 24 weeks after primary transplantation; IT-HSCs repopulated all three lineages, but had at least one lineage drop below threshold by 24 weeks
after secondary transplantation; and LT-HSCs maintained repopulation in all three lineages above threshold throughout both primary and secondary
transplantation.
(C) Distribution of repopulating cell types found within each aging HSC compartment. The zoomed region represents 10% of the Total HSC compartment.
(D) Heatmaps displaying the regeneration of primitive BM populations by clonally transplanted aging HSCs after primary transplant. Transplanted cell populations
are listed above each column—initially sorted cell (left) and retrospectively categorized repopulating cell (right). Regenerated HSPC types are listed to the right.
The darker the chamber, the greater the proportion of reconstituted mice that regenerated the given cell type. Numbers within each chamber represent the
percentage in decimal format of reconstituted mice with each cell type.
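The repopulation-pattern definitions in (B) amount to a small decision procedure, sketched below in Python. The data structure and field names are hypothetical, and the sketch simplifies "maintained throughout" to the 24-week endpoint of each transplant, so it illustrates the definitions rather than reproducing the authors' scoring.

```python
from dataclasses import dataclass

LINEAGES = ("myeloid", "B", "T")


@dataclass
class LineageOutcome:
    """Donor chimerism summary for one lineage (hypothetical representation)."""
    ever_repopulated: bool   # crossed the threshold at any point after primary transplant
    primary_24wk: bool       # above threshold 24 weeks after primary transplant
    secondary_24wk: bool     # above threshold 24 weeks after secondary transplant


def classify(outcomes):
    """Map per-lineage outcomes to one of the five patterns defined in (B)."""
    repopulated = {name for name in LINEAGES if outcomes[name].ever_repopulated}
    if repopulated == {"myeloid"}:
        return "myeloid-restricted"
    if repopulated == {"myeloid", "B"}:
        return "bipotent"
    if repopulated != set(LINEAGES):
        return "unclassified"  # combinations the caption does not define
    if not all(outcomes[name].primary_24wk for name in LINEAGES):
        return "ST-HSC"
    if not all(outcomes[name].secondary_24wk for name in LINEAGES):
        return "IT-HSC"
    return "LT-HSC"


# Example: all three lineages persist in the primary recipient,
# but T cells drop below threshold in the secondary recipient.
example = {
    "myeloid": LineageOutcome(True, True, True),
    "B": LineageOutcome(True, True, True),
    "T": LineageOutcome(True, True, False),
}
print(classify(example))
```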
Cell 167, 1296–1309, November 17, 2016 1301

Next, we compared the GFP peaks from young and aging LR-HSC populations. Although LR-HSCs from young
mice display five distinct GFP peaks, 97.4% of sLR-HSCs from aging mice are found within peak 4
(Figures 4A and 4B). This suggests that the vast majority of sLR-HSCs have completed four traceable
cell divisions. We then quantified the absolute number of LR-HSCs in the leg bones of young and aging
mice. Even though LR-HSC representation decreases between young and aging mice (Figure 4D), we find
that the absolute number of LR-HSCs within the BM increases 2-fold with age (Figure 4E).
To summarize, transplantation experiments indicate that all LT-HSC activity is contained within the
LR-HSC fraction in both young and aging mice (Figures 1 and 3). Thus, our findings show that (1)
LT-HSCs are exclusively LR-HSCs regardless of age, (2) the absolute number of LR-HSCs increases over
time, (3) the majority of sLR-HSCs have completed four divisions during the span of dox chase, and
(4) H2BGFP is not re-expressed in the presence of dox. Taken together, these data suggest that
expansion of LR-HSC numbers is due to symmetric self-renewal events of LR-HSCs during the lifespan of
the mouse.
To test this, we used a simple model to estimate the expansion capacity of LR-HSCs from young mice
(Figure 4F). First, we quantified the absolute numbers of LR-HSCs found in each peak in young mice.
Because >95% of sLR-HSCs in aging mice are found in peak 4, we modeled LR-HSC expansion with age as
population doublings of each GFP peak as the cells divide to dilute their H2BGFP level from their
starting peak in young mice to peak 4. This generated a prediction of sLR-HSC numbers once every
young LR-HSC had divided enough to match the GFP intensity found in peak 4. We found that the
predicted values closely matched the experimentally acquired data from aging mice (Figure 4E). Of
note, this model assumes no loss of LR-HSC number due to cell death, differentiation, or cell
division events resulting in GFP dilution to levels below peak 4. The correlation between predicted
and experimental data suggests LR-HSCs cycle symmetrically, generating two daughter LR-HSCs with each
division, which we can visualize a maximum of four times, and LR-HSCs found in peak 4 in young mice
remain deeply dormant throughout adult life.

LR-HSC Cell-Cycle Times Extend with Divisional History until Dormancy Is Reached after Four Traceable Divisions
To investigate this age-related LR-HSC number expansion, we used LR-HSC numbers from young mice as
initial conditions for a series of mathematical models to determine whether divisional history
impacts LR-HSC cell-cycle progression (see the STAR Methods). The first model assumes that cells
divide at a constant rate regardless of divisional history (Figure 4G; Constant). Because GFP will
ultimately dilute away in all cells under this model, it cannot explain LR-HSC accumulation with age
and suggests that divisional history impacts LR-HSC cell-cycle progression time. To account for this
accumulation, we considered a revised model in which individual cells cycle at a constant rate until
they have divided four times, at which point they stop dividing. This modified model explains the
data well (Figure 4G, Step Function), but makes the strong assumption that cells cycle normally until
they have divided four times and then suddenly stop dividing. We reasoned that this was not likely,
but rather that successive cell divisions become progressively less likely to occur the more a cell
has divided previously. This prompted us to further adjust our model to understand how cell-cycle
entry alters with divisional history. Neither linear nor exponential extensions of cell-cycle entry
times with divisional history accurately predicted cell numbers in aging mice. However, a
super-exponential extension was found to fit the experimental data well (Figure 4G). This model
predicts that the times between cell-cycle events for cells in peak 0, 1, and 2 are short, but as
cells undergo further divisions, this time dramatically lengthens until the expected time to the next
cell cycle in peak 4 is
Figure 4. HSCs Count Symmetric Self-Renewal Divisions throughout Adult Life and Progress toward Dormancy
(A–C) Analysis of H2BGFP subpopulations within the GFPHi LR-HSC compartment. (A) Histogram displaying the H2BGFP peaks 0–4 visible within the LR-HSC
compartment of young (3–4 months on dox) and aging mice (14–22 months on dox). (B) Quantification of (A). n = 21 and 13 mice from six independent experiments
for young and aging mice, respectively. (C) Least-squares fitting of single cell GFP intensity data collected from LR-HSCs found within each GFP peak of young
mice. Observed experimental data are plotted as open circles, while predictions of a theoretical model in which H2BGFP concentration is reduced by a factor of 2
with each cell division are given by the dashed blue line. Experimental data were collected from 1,568 single LR-HSCs from six independent experiments.
(D) Percentage of LR-HSCs within the HSC compartment. n = 19 and 13 for young and aging mice, respectively.
(E) Absolute number of LR-HSCs per long bone in young and aging mice. Predicted LR-HSC numbers were generated by extrapolating the expansion of each
young LR-HSC data point based on the distribution of cells found in peaks 0–4 for each mouse using the model in (F), then corrected based on the average
distribution of cells found in aging mice in (B). n = 17 and 10 mice from five and three independent experiments for young and aging mice, respectively.
(F) Symmetric self-renewal expansion model of LR-HSCs. As LR-HSCs slowly divide throughout adult life they transition from peak 0 to peak 4, symmetrically self-
renewing to double their numbers with each cell division. Arrows in the histogram depict the expansion capacity of cells as they progressively divide to reach
peak 4. Numbers displayed are the average numbers of LR-HSCs in each peak per long bone from 17 young mice in five independent experiments. Boxed in red is
the summation of LR-HSCs predicted to accumulate in peak 4 with aging.
(G) Mathematical modeling of cell-cycle progression as a function of divisional history within the LR-HSC compartment. Five models were considered (see the
STAR Methods for details). Displayed are representations of cell-cycle time progressions for each model (red dashed lines), as well as the experimentally
determined (open circles) and model-predicted sLR-HSC numbers (dashed blue curves) found in each GFP peak of aging mice. Cell-cycle times for the step
function and super-exponential models are actual times predicted by the model. As the constant, linear, and exponential models do not fit the data well, their
corresponding cell-cycle times are only visual representations.
(H and I) Distribution of LR-HSCs across each GFP peak (H), and quantification of LR-HSC absolute numbers after various lengths of dox chase (I). Legends refer
to the length of dox chase. Data are representations of two to six independent experiments per group.
(J) Cell-cycle analysis of GFP Peak cells in young (5 months old, 3 month dox chase, n = 3) and aging (11 months old, 9 month dox chase, n = 2) mice. Each mouse
represents an independent experiment. Data are displayed as the mean ± SEM. **p < 0.01, ***p < 0.001 by Welch’s t test.
See also Figure S6.
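The expansion model in (F) and the contrast between the constant and step-function models in (G) can be sketched in Python as follows. This is an illustrative sketch only: the per-peak counts and division rates are invented, the compartment equations are an assumed simplification, and the actual model specification is in the STAR Methods.

```python
def predicted_peak4_total(counts_by_peak, final_peak=4):
    """Cells expected in the final peak once every cell has divided down to it.

    A cell starting in peak k doubles (final_peak - k) times, as in (F).
    """
    return sum(n * 2 ** (final_peak - k) for k, n in enumerate(counts_by_peak))


def simulate(counts_by_peak, rates_per_day, days, dt=0.1):
    """Euler integration of expected cell numbers per GFP peak.

    rates_per_day[k] is the assumed division rate of a cell in peak k.
    A division removes one cell from peak k and adds two to peak k + 1;
    daughters of the last peak fall below detection and leave the pool.
    """
    n = [float(c) for c in counts_by_peak]
    for _ in range(int(days / dt)):
        dividing = [rate * cells * dt for rate, cells in zip(rates_per_day, n)]
        for k, d in enumerate(dividing):
            n[k] -= d
            if k + 1 < len(n):
                n[k + 1] += 2 * d
    return n


# Hypothetical young-mouse counts for peaks 0-4 (not the paper's data).
young = [5, 10, 20, 40, 200]
target = predicted_peak4_total(young)  # 5*16 + 10*8 + 20*4 + 40*2 + 200 = 520

# Constant model: every peak divides at the same (made-up) rate.
constant = simulate(young, [0.02] * 5, days=2000)
# Step-function model: same rate until peak 4, which never divides.
step = simulate(young, [0.02, 0.02, 0.02, 0.02, 0.0], days=2000)

print("predicted peak 4 total:", target)
print("constant model, labeled cells left:", round(sum(constant), 3))
print("step model, labeled cells left:", round(sum(step), 1))
```

Under these assumptions the constant model loses essentially all labeled cells, while stopping division at peak 4 lets the pool accumulate to the value given by the doubling sum, which is the qualitative argument made in the text.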

1,256 days, or 3.5 years. Importantly, this model predicts, rather than assumes, that cells enter a
dormant state once they reach peak 4 and therefore explains why cells found in peak 4 in young mice
are still present in aged mice.
To test these predictions, we examined the change in GFP peak distribution and LR-HSC absolute
numbers after various lengths of dox chase. Consistent with the super-exponential model of cell-cycle
extension, we find cells in peaks 0, 1, and 2 are almost completely lost after a 3–4 month chase, and
the vast majority of LR-HSCs are found in peak 4 as soon as 9 months after the start of dox
(Figure 4H). Additionally, LR-HSC doubling is seen as early as 12 months after dox treatment and is
maintained until at least 22 months on dox, corresponding to 25 months of age (Figure 4I). If peak 4
cells were to continue to proliferate during these long chases, LR-HSC numbers should decrease over
time. Rather, we find that LR-HSC numbers are stable, suggesting that once cells reach peak 4, they
are dormant. We found one exception, where a 24-month-old mouse had significantly lower numbers of
sLR-HSCs than what would be predicted (Figure 4I, red asterisk). Upon dissection, this mouse had an
enlarged liver with multiple tumors (data not shown), indicative of a severe systemic stress, which
may have led to the activation and depletion of sLR-HSCs in this mouse. Cell-cycle profiles of young
and aging peak 4 cells showed >90% of cells in G0, suggesting that the cell cycle of these cells does
not change with time (Figures 4J and S6). Taken together, these analyses suggest that LR-HSCs double
their cell numbers with time by entering a dormant state after four traceable symmetric self-renewal
divisions, indicating that LR-HSCs count their cell divisions throughout life.

CD41+ sLR-HSCs Are Generated from CD41– LR-HSCs throughout Adult Life
In order to see if a cell division counting mechanism could underlie other changes seen in the HSC
compartment with age, we re-examined the hierarchical relationship between CD41– and CD41+ LR-HSCs.
CD41+ cells do not function as LT-HSCs at the clonal level in young adult mice (Yamamoto et al.,
2013), but in the aging HSC compartment CD41+ HSCs have been shown to reside at the top of the
hematopoietic hierarchy (Gekas and Graf, 2013). In order to understand how CD41+ cells develop LT-HSC
potential with aging, we compared the H2BGFP label dilution of CD41– and CD41+ cells within the GFPHi
LR-HSC fractions of young and aging mice. We observe that young CD41– LR-HSCs are distributed
throughout all five GFP peaks, while the vast majority of young CD41+ cells are primarily found in
peak 4, similar to the distribution of the aging compartment (Figure 5A). CD41+ LR-HSCs increase
considerably in number and frequency with age (Figures 5D, S3G, and S3H), but given that the majority
of CD41+ LR-HSCs can no longer divide and retain LT-HSC potency, it would not be possible for young
CD41+ LR-HSC to generate the expanded population of CD41+ sLR-HSCs found in aging mice. Thus, we
wondered whether CD41– LR-HSCs from young mice are responsible for generating the expanded population
of CD41+ sLR-HSCs found in aging mice. Direct comparison of CD41– and CD41+ LR-HSCs from young mice
showed that CD41– LR-HSCs have a greater proportion of cells with a minimal divisional history in GFP
peaks 0–3 than CD41+ LR-HSC (Figure 5B). However, with aging both CD41– and CD41+ sLR-HSCs have
similar GFP distributions where most cells reside in peak 4 (Figure 5B). To quantify this, we
measured GFP fluorescence intensities of each population and observed brighter GFP in young CD41–
over CD41+ LR-HSCs, but find indistinguishable intensities in aging mice (Figure 5C). Importantly,
CD41– cell GFP intensity drops with age, but the GFP intensity of the CD41+ population does not
change. As CD41+ cells more rapidly proliferate than CD41– cells (Figure S3J) and the CD41+ LR-HSC
population expands with time (Figure 5D), either CD41– LR-HSCs found in GFP peaks 0–3 directly
generate CD41+ LR-HSCs, or CD41 expression becomes dynamic with age and CD41– LR-HSCs turn on CD41
expression in a manner that does not alter their regenerative potential (Figures 3C and 3D).
To test this, we compared the change in absolute numbers of CD41– and CD41+ LR-HSCs with age to their
predicted expansion capacity from young mice (as in Figure 4F). We found that while CD41– cells
expand minimally from young to aging mice, young CD41– LR-HSCs have the capacity to expand nearly
3-fold and still retain the GFP label (Figure 5D). In contrast, CD41+ LR-HSCs show a 3-fold expansion
from young to aging mice, but contain no endogenous expansion capacity (Figure 5D). Thus, in order
for CD41+ LR-HSC numbers to expand, CD41– LR-HSCs must generate CD41+ LR-HSCs as they divide.
To precisely gauge CD41– contribution to the CD41+ LR-HSC pool over time, we modified our
super-exponential model of cell-cycle extension to allow CD41– LR-HSCs to gain CD41 expression with a
fixed probability each time they divide. This model accurately fits the data for both CD41– and CD41+
LR-HSC expansion in aging mice and estimates that in order for the CD41+ LR-HSC pool to expand,
approximately one in ten division events within the CD41– LR-HSC pool must give rise to a CD41+
LR-HSC (Figure 5E). This shows that in the context of homeostatic hematopoiesis, CD41– LR-HSCs reside
at the apex of the hematopoietic hierarchy, generating the expanded CD41+ LR-HSC population found
with age.

DISCUSSION

Cells take in and store information about previous events that influence their subsequent behavior.
In Drosophila and Aplysia, activated synapses of neurons convert CPEB to a self-propagating state
allowing specific synapses to “remember” their previous activation to facilitate long-term memory
storage (Majumdar et al., 2012; Si et al., 2003a, 2003b). B cells undergo permanent somatic
hypermutation after stimulation by foreign antigens to produce antibodies with increased antigen
affinity (MacLennan and Gray, 1986). The differentiation of cells during development from a
pluripotent to a committed state is generally considered an irreversible process that is
epigenetically recorded to prevent aberrant expression of alternative cell-type-specific genes (Ang
et al., 2011; Efroni et al., 2008). Cell division “counting” might also be an important manifestation
of cellular memory. Such a mechanism has been proposed to account for the loss of function and
senescence of cells with age, where cells “count” cell divisions via telomere length (Harley et al.,
1990). While cells may be storing information about their cumulative divisional history, technical
challenges associated with

Figure 5. Young CD41– LR-HSCs Generate CD41+ sLR-HSCs in Aging Mice
(A) Distributions of CD41– (left) and CD41+ (right) LR-HSCs in young and aging mice across GFP peaks 0–4.
(B) Representative histograms directly comparing GFP levels in CD41– and CD41+ LR-HSCs from young (left) and aging (right) mice.
(C) GFP mean fluorescence intensity (MFI) of CD41– and CD41+ LR-HSCs in young and aging mice. n = 9–11 mice per group from three independent experiments.
(D) Quantification of CD41– and CD41+ LR-HSCs in young and aging mice. Expansion capacity of young HSCs is shown as predicted by the model in Figure 4F.
n = 6–7 mice from two independent experiments.
(E) Mathematical modeling of CD41– LR-HSC contribution to the CD41+ LR-HSC compartment. Predictions of a model in which the cell-cycle time extends
super-exponentially with the number of cell divisions (see Figure 4G) and in which CD41– LR-HSCs gain CD41 expression with probability 1 − a each time they
divide are given in blue (full model details in the STAR Methods). This model most accurately fits the data when a = 0.88, suggesting that approximately one in
every ten CD41– LR-HSC divisions gives rise to a CD41+ daughter cell. Data are displayed as the mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001 by Welch’s t test
or paired t test.
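To make the conversion probability in (E) concrete, the following Python sketch tracks expected pool sizes over successive divisions. It rests on assumptions that are not taken from the paper: a converting division is taken to yield one CD41+ and one CD41– daughter, CD41+ daughters are taken not to divide again, and cell-cycle timing is ignored. The function name and the starting count are hypothetical; the fitted model is described in the STAR Methods.

```python
def expected_pools(n_negative_start, divisions, a=0.88):
    """Expected CD41- and CD41+ pool sizes after synchronous division rounds.

    Each CD41- division gives rise to a CD41+ daughter with probability
    1 - a (assumed here to be one of the two daughters); otherwise both
    daughters remain CD41-.
    """
    negative, positive = float(n_negative_start), 0.0
    for _ in range(divisions):
        converting = negative * (1 - a)           # divisions yielding a CD41+ daughter
        positive += converting                    # one CD41+ daughter each
        negative = 2 * (negative - converting) + converting
    return negative, positive


# Hypothetical starting pool of 100 CD41- LR-HSCs, with a = 0.88 as in (E).
for rounds in range(1, 5):
    neg, pos = expected_pools(100, rounds)
    print(f"after {rounds} division(s): CD41- {neg:.1f}, CD41+ {pos:.1f}")
```

With a = 0.88 the per-division conversion probability is 1 − a = 0.12, which the caption summarizes as approximately one division in ten.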
accurately documenting large numbers of cell divisions in vivo have largely precluded observations of
these phenomena. Here, we used H2BGFP label dilution to track HSC cell divisions accrued through the
process of aging and investigate their impact on regenerative potential.

Divisional History and Heterogeneity of the Aging HSC Compartment
Previous work using H2BGFP label-retention suggests that the fewer times HSCs cycle over time, the
greater their transplantable regenerative potential (Foudi et al., 2009; Qiu et al., 2014; Wilson
et al., 2008). We showed that label-retaining cells within the functionally heterogeneous LSK
population could long-term repopulate a mouse at a frequency of 1 in 2.9 cells (Qiu et al., 2014).
Here, we used label-retention to dissect the heterogeneity of the aging HSC compartment. We find that
clonal sLR-HSCs function exclusively with IT- or LT-HSC potential. On the other hand, the
proliferative non-LR-HSCs contain a diverse class of progenitors with reduced self-renewal and
differentiation potential. This non-LR-HSC population, being the product

of accumulated divisional history, may be the source of myeloid-repopulating cells described in
clonal transplantation experiments (Dykstra et al., 2011; Sudo et al., 2000; Yamamoto et al., 2013).
The degree of functional heterogeneity we find within the aging non-LR-HSC population is consistent
with evidence from young animals that cell surface-marker combinations alone cannot select for
functionally homogeneous populations when analyzed at the single-cell level (Paul et al., 2015; Perié
et al., 2015; Yamamoto et al., 2013). Additionally, the reduced regenerative capacity of the
proliferative non-LR-HSC population is also supported by studies indicating that increased cell
cycling often reduces HSC function (Bowie et al., 2006; Pietras et al., 2011). Taken together, the
accumulation of divisional history and its inferred link to myeloid potential in the aging non-LR-HSC
population may contribute to the development of mutations that lead to myelofibrosis in aging
populations.
Of note, our sLR-HSCs do not repopulate with absolute efficiency. This could be explained by a number
of defects that characterize aging HSC populations, including defects in cell-cycle entry,
replication stress, and repair of DNA damage accumulated from long-term quiescence (Beerman et al.,
2014; Flach et al., 2014; Mohrin et al., 2010; Rossi et al., 2007). However, all repopulating
sLR-HSCs repopulate with LT- or IT-HSC functional potential. Thus, we propose that label-retention is
a powerful tool for discriminating cells with HSC potential from populations of functionally
heterogeneous progenitors, especially in the context of an aged system. This could prove beneficial
when studying defects of aging HSCs by eliminating confounding results that describe defects among
the diverse HSPCs invariably contained within the HSC compartment.

Tracking HSC Divisional History throughout Adult Life Reveals HSCs Document Their Cell Divisions
The use of our label-retaining system enabled us to track and quantify single cell divisions over the
lifespan of a mouse. We find that LT-HSCs are capable of executing four traceable divisions prior to
the loss of LT-HSC potential. This can be visualized by the dilution of H2BGFP intensity over time.
This experimentally supports the mathematical models of previous work on LR-HSCs predicting that
LR-HSCs divide five times during the lifespan of a mouse (Wilson et al., 2008). In light of our data,
this fifth cell division would result in a complete loss of LT-HSC potential and most likely the
initiation of an irreversible decline of the hematopoietic system.
Our approach enabled us to make inferences about LR-HSC expansion and cell division patterns. We
conclude that age-associated expansion can be reached only if (1) LR-HSCs exclusively undergo
symmetric self-renewal divisions throughout life, and (2) they reach dormancy after completing the
fourth traceable division. LR-HSCs divide asynchronously, but regardless of when a cell completes its
fourth traceable division, we can assume it enters a dormant state because each cell found in young
mice can be accounted for mathematically in aging mice. If dormancy were not achieved after this
fourth division, LR-HSCs could not accumulate to the extent they do with age, indicating that HSCs
count and maintain a record of their divisions. Our data also allowed us to conclude that the LR-HSC
population, which contains all LT-HSC activity, cannot be contributing to homeostatic hematopoiesis.
This suggests that the populations maintaining a continuous supply of blood at steady state and the
populations that act as stem cells during transplantation are distinct. These findings are in
agreement with recent work on homeostatic hematopoiesis (Busch et al., 2015; Sun et al., 2014), as
well as on crypt cells in the gut (Buczacki et al., 2013). Our data suggest that single cell
divisions of LR-HSCs impact cell-cycle entry, and in the case of the fifth traceable division, a
complete loss of LT-HSC activity results. Thus, not only are HSCs counting their divisions, but also
single cell divisions can impact HSC behavior and regenerative potential.

A Mechanism of Cell Division Counting May Underlie Age-Related Changes to the HSC Compartment
We used our system to infer information about the hierarchical relationship of CD41+ and CD41– HSCs,
both in the context of regeneration and homeostasis. We initially thought that the expanded CD41+ HSC
compartment in aging mice exclusively contained cells with myeloid-restricted regenerative potential.
We found, in agreement with published data (Gekas and Graf, 2013), that some CD41+ cells exhibit
LT-HSC regenerative potential at the clonal level. These CD41+ LT-HSCs were found exclusively within
the sLR-HSC compartment. Analysis of BM regenerated from retrospectively identified LT-HSCs after
transplantation revealed that on the clonal level, both CD41– and CD41+ HSCs had similar propensities
to regenerate each other, suggesting that in the context of transplantation CD41 may not be an
indicator of HSC hierarchical primitiveness. However, CD41 expression within the non-LR-HSC
population did enrich for cells with myeloid-restricted regenerative potential, suggesting that CD41,
and perhaps other cell surface markers, are greater predictors of cellular function once cells have
left the label-retaining population. Interestingly, our label-retention system also allowed us to
make inferences about the hierarchical relationship of CD41-expressing cells within the LR-HSC
compartment. As GFP fluorescence intensity can only be lost over time in a cell division-dependent
manner, this enabled us to use the H2BGFP label-retaining system as a means to track the origin of
expanding cell populations. Doing so showed that CD41– LR-HSCs must give rise to the expanded CD41+
sLR-HSC population in aging mice.
It is important to state that while much of our evidence on homeostatic hematopoiesis is supportive
of recent studies (Busch et al., 2015; Sun et al., 2014), and is supported by our mathematical
models, further experiments need to be performed to fully validate these conclusions, possibly using
a dual label-retaining and lineage-tracing genetic system that identifies progeny of label-retaining
populations over time (Buczacki et al., 2013). We cannot exclude the possibility that a small number
of LR-HSCs undetectable by our experimental strategy divide beyond peak 4 to leave the LR-HSC
compartment and eventually contribute to active hematopoiesis. However, if this were the case, it
must be a comparatively rare event. Nevertheless, viewed overall, our findings permit us to suggest a
model for the aging of the HSC compartment (Figure 6) in which the dormant label-retaining
compartment that contains all LT-HSC activity decreases in frequency within the HSC compartment

Figure 6. Self-Renewal Counting Model of Hematopoietic Stem Cell Aging
The HSC pool can be segregated into two populations based on label-retention. The LR-HSC pool contains all of the transplantable LT-HSC activity, while the
non-LR-HSC pool is comprised of cells with minimal self-renewal and restricted regenerative capacity. With aging, both the LR- and non-LR-HSC pools expand.
The LR-HSC pool asynchronously undergoes four traceable symmetric self-renewal events, increasing the functional stem cell pool size over time while
simultaneously diluting the GFP label with each cell division. After a fourth traceable self-renewal event, LR-HSCs enter a state of dormancy—as a fifth cell
division would result in complete loss of LT-HSC potential—indicating that the LR-HSC population counts their cell divisions throughout life. The fact that the LR-
HSC pool exclusively undergoes symmetric cell divisions before entering a stably dormant state means that they contribute minimally, if at all, to homeostatic
hematopoiesis, unless activated to divide again by stress. The non-LR-HSC pool represents the vast majority of the stem cell pool in aging mice. Within the non-
LR-HSC pool CD41+ cells enriched for myeloid progenitor activity accumulate with time and dominate the aging HSC compartment. It is most likely that the non-
LR-HSC pool maintains active hematopoiesis during steady state conditions. In the context of regeneration, we identified five types of stem and progenitor cells
with regenerative potential after transplantation within the total HSC compartment. When analyzed as a total HSC population, the predominance of myeloid
progenitors and cells with limited self-renewal potential contributes to increased myeloid representation in regenerated peripheral blood and reduced long-term
engraftment. Image by J. Gregory (2015) Mount Sinai Health System.
but actually expands over time. This partially explains the observation that aging HSCs show impaired
function upon transplantation due to the diminished frequency of LR-HSCs within the stem cell
compartment, while also revealing how LT-HSCs increase with aging via increased absolute number of
LR-HSCs within the whole BM. This compartment symmetrically self-renews over time, precluding
contribution to homeostatic hematopoiesis. The non-LR-HSC compartment also expands over time, but to
a greater extent, and contains cells with limited self-renewal and differentiation capacity upon
transplantation. These cells are likely to support homeostatic hematopoiesis. With aging, the
non-LR-HSC compartment becomes dominated by CD41-expressing cells enriched for myeloid-restricted
repopulation potential and partially accounts for the increased propensity of the compartment as a
whole to produce greater myeloid cell output with age.
It remains uncertain whether the non-LR-HSC compartment is directly derived from the LR-HSC
compartment as a function of continuously accumulated proliferative history. As the aging non-LR-HSC
compartment shows attenuated repopulation potential and increased myeloid cell output, it would be
consistent with recent studies indicating that increased divisional history recapitulates these
hallmarks of aged HSCs (Beerman et al., 2013; Walter et al., 2015). However, it is not yet clear if
the aging LR- and non-LR-HSC compartments differ in other described phenotypes of aged HSCs including
the surrogate DNA damage marker γH2AX foci or Cdc42 localization.
It will be interesting to investigate the underlying molecular mechanisms responsible for this
cellular memory. Several studies have tracked cell division numbers in Bacillus subtilis sporulation,
Drosophila spermatogenesis, and oligodendrocyte precursor differentiation. These studies reveal that
after several

rounds of division a genetic factor accumulates, reaching a threshold to initiate a cell fate change (Dugas et al., 2007; Insco et al., 2009; Levine et al., 2012). Alternatively, one could imagine dilution of a factor wherein progressive loss of this factor extends cell-cycle progression leading to dormancy during homeostatic aging. Further studies are necessary to reveal this mechanism. Within it may lie the key to understanding the maintenance of self-renewal so long lacking in the field.

STAR+METHODS

Detailed methods are provided in the online version of this paper and include the following:
• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
• METHOD DETAILS
  ◦ Sample Preparation and Flow Cytometry
  ◦ Transplantation Assays
  ◦ Limiting Dilution Analysis
  ◦ Cell-Cycle Analysis
  ◦ In Vitro Analysis of Megakaryocyte Potential
  ◦ Hamming Distance Analysis
  ◦ Mathematical Modeling
• QUANTIFICATION AND STATISTICAL ANALYSES

SUPPLEMENTAL INFORMATION

Supplemental Information includes six figures and three tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.022.

AUTHOR CONTRIBUTIONS

J.M.B. conceived and performed experiments, acquired and analyzed data, and wrote the manuscript. B.M. and H.S. developed the models, analyzed data, and edited the manuscript. H.S.K. performed experiments. K.M. conceived and performed experiments, edited the manuscript, and managed the project.

ACKNOWLEDGMENTS

The authors wish to thank members of the Moore and Lemischka laboratories for advice and criticisms, I. Lemischka, R. Brosh, S. Ghaffari, C. Schaniel, and R. Krauss for critical reading of the manuscript, and B. Dykstra for thoughtful discussions of the work. We thank J. Qiu and X. Niu for placing mice on dox and maintaining them until J.M.B. took over the experiments. We also thank J. Gregory for her artistic contributions and the flow cytometry and animal facility shared resources at the ISMMS. K.M. was supported by NIH 2R01HL58739 and J.M.B. was supported by T32HD075735. The authors wish to dedicate this manuscript to the memory of Dr. Christa Müller-Sieburg, a dear friend, mentor, and colleague.

Received: January 26, 2015
Revised: July 8, 2016

REFERENCES

Ang, Y.S., Gaspar-Maia, A., Lemischka, I.R., and Bernstein, E. (2011). Stem cells and reprogramming: breaking the epigenetic barrier? Trends Pharmacol. Sci. 32, 394–401.
Beerman, I., Bhattacharya, D., Zandi, S., Sigvardsson, M., Weissman, I.L., Bryder, D., and Rossi, D.J. (2010). Functionally distinct hematopoietic stem cells modulate hematopoietic lineage potential during aging by a mechanism of clonal expansion. Proc. Natl. Acad. Sci. USA 107, 5465–5470.
Beerman, I., Bock, C., Garrison, B.S., Smith, Z.D., Gu, H., Meissner, A., and Rossi, D.J. (2013). Proliferation-dependent alterations of the DNA methylation landscape underlie hematopoietic stem cell aging. Cell Stem Cell 12, 413–425.
Beerman, I., Seita, J., Inlay, M.A., Weissman, I.L., and Rossi, D.J. (2014). Quiescent hematopoietic stem cells accumulate DNA damage during aging that is repaired upon entry into cell cycle. Cell Stem Cell 15, 37–50.
Benz, C., Copley, M.R., Kent, D.G., Wohrer, S., Cortes, A., Aghaeepour, N., Ma, E., Mader, H., Rowe, K., Day, C., et al. (2012). Hematopoietic stem cell subtypes expand differentially during development and display distinct lymphopoietic programs. Cell Stem Cell 10, 273–283.
Bowie, M.B., McKnight, K.D., Kent, D.G., McCaffrey, L., Hoodless, P.A., and Eaves, C.J. (2006). Hematopoietic stem cells proliferate until after birth and show a reversible phase-specific engraftment defect. J. Clin. Invest. 116, 2808–2816.
Buczacki, S.J., Zecchini, H.I., Nicholson, A.M., Russell, R., Vermeulen, L., Kemp, R., and Winton, D.J. (2013). Intestinal label-retaining cells are secretory precursors expressing Lgr5. Nature 495, 65–69.
Busch, K., Klapproth, K., Barile, M., Flossdorf, M., Holland-Letz, T., Schlenner, S.M., Reth, M., Höfer, T., and Rodewald, H.R. (2015). Fundamental properties of unperturbed haematopoiesis from stem cells in vivo. Nature 518, 542–546.
Cho, R.H., Sieburg, H.B., and Muller-Sieburg, C.E. (2008). A new mechanism for the aging of hematopoietic stem cells: aging changes the clonal composition of the stem cell compartment but not individual stem cells. Blood 111, 5553–5561.
Dugas, J.C., Ibrahim, A., and Barres, B.A. (2007). A crucial role for p57(Kip2) in the intracellular timer that controls oligodendrocyte differentiation. J. Neurosci. 27, 6185–6196.
Dykstra, B., Olthof, S., Schreuder, J., Ritsema, M., and de Haan, G. (2011). Clonal analysis reveals multiple functional defects of aged murine hematopoietic stem cells. J. Exp. Med. 208, 2691–2703.
Efroni, S., Duttagupta, R., Cheng, J., Dehghani, H., Hoeppner, D.J., Dash, C., Bazett-Jones, D.P., Le Grice, S., McKay, R.D., Buetow, K.H., et al. (2008). Global transcription in pluripotent embryonic stem cells. Cell Stem Cell 2, 437–447.
Essers, M.A., Offner, S., Blanco-Bose, W.E., Waibler, Z., Kalinke, U., Duchosal, M.A., and Trumpp, A. (2009). IFNalpha activates dormant haematopoietic stem cells in vivo. Nature 458, 904–908.
Flach, J., Bakker, S.T., Mohrin, M., Conroy, P.C., Pietras, E.M., Reynaud, D., Alvarez, S., Diolaiti, M.E., Ugarte, F., Forsberg, E.C., et al. (2014). Replication stress is a potent driver of functional decline in ageing haematopoietic stem cells. Nature 512, 198–202.
Foudi, A., Hochedlinger, K., Van Buren, D., Schindler, J.W., Jaenisch, R., Carey, V., and Hock, H. (2009). Analysis of histone 2B-GFP retention reveals slowly cycling hematopoietic stem cells. Nat. Biotechnol. 27, 84–90.
Geiger, H., de Haan, G., and Florian, M.C. (2013). The ageing haematopoietic stem cell compartment. Nat. Rev. Immunol. 13, 376–389.
Gekas, C., and Graf, T. (2013). CD41 expression marks myeloid-biased adult hematopoietic stem cells and increases with age. Blood 121, 4463–4472.
Haas, S., Hansson, J., Klimmeck, D., Loeffler, D., Velten, L., Uckelmann, H., Wurzer, S., Prendergast, A.M., Schnell, A., Hexel, K., et al. (2015). Inflammation-induced emergency megakaryopoiesis driven by hematopoietic stem cell-like megakaryocyte progenitors. Cell Stem Cell 17, 422–434.
Harley, C.B., Futcher, A.B., and Greider, C.W. (1990). Telomeres shorten during ageing of human fibroblasts. Nature 345, 458–460.
Hu, Y., and Smyth, G.K. (2009). ELDA: extreme limiting dilution analysis for comparing depleted and enriched populations in stem cell and other assays. J. Immunol. Methods 347, 70–78.
Insco, M.L., Leon, A., Tam, C.H., McKearin, D.M., and Fuller, M.T. (2009). Accumulation of a differentiation regulator specifies transit amplifying division
number in an adult stem cell lineage. Proc. Natl. Acad. Sci. USA 106, 22311–22316.
Levine, J.H., Fontes, M.E., Dworkin, J., and Elowitz, M.B. (2012). Pulsed feedback defers cellular differentiation. PLoS Biol. 10, e1001252.
MacLennan, I.C., and Gray, D. (1986). Antigen-driven selection of virgin and memory B cells. Immunol. Rev. 91, 61–85.
Majumdar, A., Cesario, W.C., White-Grindley, E., Jiang, H., Ren, F., Khan, M.R., Li, L., Choi, E.M., Kannan, K., Guo, F., et al. (2012). Critical role of amyloid-like oligomers of Drosophila Orb2 in the persistence of memory. Cell 148, 515–529.
Mohrin, M., Bourke, E., Alexander, D., Warr, M.R., Barry-Holson, K., Le Beau, M.M., Morrison, C.G., and Passegué, E. (2010). Hematopoietic stem cell quiescence promotes error-prone DNA repair and mutagenesis. Cell Stem Cell 7, 174–185.
Morrison, S.J., Wandycz, A.M., Akashi, K., Globerson, A., and Weissman, I.L. (1996). The aging of hematopoietic stem cells. Nat. Med. 2, 1011–1016.
Müller-Sieburg, C.E., Cho, R.H., Thoman, M., Adkins, B., and Sieburg, H.B. (2002). Deterministic regulation of hematopoietic stem cell self-renewal and differentiation. Blood 100, 1302–1309.
Muller-Sieburg, C.E., Cho, R.H., Karlsson, L., Huang, J.F., and Sieburg, H.B. (2004). Myeloid-biased hematopoietic stem cells have extensive self-renewal capacity but generate diminished lymphoid progeny with impaired IL-7 responsiveness. Blood 103, 4111–4118.
Nakamura-Ishizu, A., Takizawa, H., and Suda, T. (2014). The analysis, roles and regulation of quiescence in hematopoietic stem cells. Development 141, 4656–4666.
Pang, W.W., Price, E.A., Sahoo, D., Beerman, I., Maloney, W.J., Rossi, D.J., Schrier, S.L., and Weissman, I.L. (2011). Human bone marrow hematopoietic stem cells are increased in frequency and myeloid-biased with age. Proc. Natl. Acad. Sci. USA 108, 20012–20017.
Paul, F., Arkin, Y., Giladi, A., Jaitin, D.A., Kenigsberg, E., Keren-Shaul, H., Winter, D., Lara-Astiaso, D., Gury, M., Weiner, A., et al. (2015). Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell 163, 1663–1677.
Perié, L., Duffy, K.R., Kok, L., de Boer, R.J., and Schumacher, T.N. (2015). The branching point in erythro-myeloid differentiation. Cell 163, 1655–1662.
Pietras, E.M., Warr, M.R., and Passegué, E. (2011). Cell cycle regulation in hematopoietic stem cells. J. Cell Biol. 195, 709–720.
Qiu, J., Papatsenko, D., Niu, X., Schaniel, C., and Moore, K. (2014). Divisional history and hematopoietic stem cell function during homeostasis. Stem Cell Reports 2, 473–490.
Radomska, H.S., Gonzalez, D.A., Okuno, Y., Iwasaki, H., Nagy, A., Akashi, K., Tenen, D.G., and Huettner, C.S. (2002). Transgenic targeting with regulatory elements of the human CD34 gene. Blood 100, 4410–4419.
Rossi, D.J., Bryder, D., Zahn, J.M., Ahlenius, H., Sonu, R., Wagers, A.J., and Weissman, I.L. (2005). Cell intrinsic alterations underlie hematopoietic stem cell aging. Proc. Natl. Acad. Sci. USA 102, 9194–9199.
Rossi, D.J., Bryder, D., Seita, J., Nussenzweig, A., Hoeijmakers, J., and Weissman, I.L. (2007). Deficiencies in DNA damage repair limit the function of haematopoietic stem cells with age. Nature 447, 725–729.
Si, K., Giustetto, M., Etkin, A., Hsu, R., Janisiewicz, A.M., Miniaci, M.C., Kim, J.H., Zhu, H., and Kandel, E.R. (2003a). A neuronal isoform of CPEB regulates local protein synthesis and stabilizes synapse-specific long-term facilitation in aplysia. Cell 115, 893–904.
Si, K., Lindquist, S., and Kandel, E.R. (2003b). A neuronal isoform of the aplysia CPEB has prion-like properties. Cell 115, 879–891.
Sieburg, H.B., and Müller-Sieburg, C.E. (2004). Classification of short kinetics by shape. In Silico Biol. (Gedrukt) 4, 209–217.
Sudo, K., Ema, H., Morita, Y., and Nakauchi, H. (2000). Age-associated characteristics of murine hematopoietic stem cells. J. Exp. Med. 192, 1273–1280.
Sun, J., Ramos, A., Chapman, B., Johnnidis, J.B., Le, L., Ho, Y.J., Klein, A., Hofmann, O., and Camargo, F.D. (2014). Clonal dynamics of native haematopoiesis. Nature 514, 322–327.
van der Wath, R.C., Wilson, A., Laurenti, E., Trumpp, A., and Liò, P. (2009). Estimating dormant and active hematopoietic stem cell kinetics through extensive modeling of bromodeoxyuridine label-retaining cell dynamics. PLoS ONE 4, e6972.
Walter, D., Lier, A., Geiselhart, A., Thalheimer, F.B., Huntscha, S., Sobotta, M.C., Moehrle, B., Brocks, D., Bayindir, I., Kaschutnig, P., et al. (2015). Exit from dormancy provokes DNA-damage-induced attrition in haematopoietic stem cells. Nature 520, 549–552.
Wilson, A., Laurenti, E., Oser, G., van der Wath, R.C., Blanco-Bose, W., Jaworski, M., Offner, S., Dunant, C.F., Eshkind, L., Bockamp, E., et al. (2008). Hematopoietic stem cells reversibly switch from dormancy to self-renewal during homeostasis and repair. Cell 135, 1118–1129.
Yamamoto, R., Morita, Y., Ooehara, J., Hamanaka, S., Onodera, M., Rudolph, K.L., Ema, H., and Nakauchi, H. (2013). Clonal analysis unveils self-renewing lineage-restricted progenitors generated directly from hematopoietic stem cells. Cell 154, 1112–1126.
STAR+METHODS
KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
Anti-Human/Mouse B220 biotin (clone RA3-6B2) | eBioscience | Cat# 13-0452
Anti-Mouse c-Kit PE (clone 2B8) | eBioscience | Cat# 12-1178
Anti-Mouse CD11b biotin (clone M1/70) | eBioscience | Cat# 13-0112
Anti-Mouse CD34 Alexafluor700 (clone RAM34) | eBioscience | Cat# 56-0341
Anti-Mouse CD3ε biotin (clone 17A2) | eBioscience | Cat# 13-0031
Anti-Mouse CD4 PE/Cy5 (clone GK1.5) | eBioscience | Cat# 15-0041
Anti-Mouse CD41 APC (clone eBioMWReg31) | eBioscience | Cat# 17-0411
Anti-Mouse CD8a PE/Cy5 (clone 53-6.7) | eBioscience | Cat# 17-0081
Anti-Mouse FcγRIII PE/Cy7 (clone 93) | eBioscience | Cat# 25-0161
Anti-Mouse Flk2 biotin (clone A2F10) | eBioscience | Cat# 13-1351
Anti-Mouse Flk2 PE (clone A2F10) | eBioscience | Cat# 12-1351
Anti-Mouse Gr1 biotin (clone RB6-8C5) | eBioscience | Cat# 13-5931
Anti-Mouse IL-7Rα PE/Cy7 (clone A7R34) | eBioscience | Cat# 25-1271
Anti-Mouse Ter-119 biotin (clone TER-119) | eBioscience | Cat# 13-5921
Anti-Mouse/Rat Ki67 PE (clone SolA15) | eBioscience | Cat# 12-5698
Streptavidin eFluor450 | eBioscience | Cat# 48-4317
Streptavidin PE/Cy5 | eBioscience | Cat# 15-4317
Anti-Mouse/Human B220 PE/Cy7 (clone RA3-6B2) | Biolegend | Cat# 103221
Anti-Mouse c-Kit APC (clone 2B8) | Biolegend | Cat# 10-5812
Anti-Mouse c-Kit FITC (clone 2B8) | Biolegend | Cat# 10-5806
Anti-Mouse c-Kit PerCP/Cy5.5 (clone 2B8) | Biolegend | Cat# 10-5823
Anti-Mouse CD11b APC/Cy7 (clone M1/70) | Biolegend | Cat# 101226
Anti-Mouse CD150 PE/Cy7 (clone TC15-12F12.2) | Biolegend | Cat# 115914
Anti-Mouse CD45.2 Alexafluor700 (clone 104) | Biolegend | Cat# 109821
Anti-Mouse CD45.2 PE (clone 104) | Biolegend | Cat# 109808
Anti-Mouse CD48 biotin (clone HM48-1) | Biolegend | Cat# 103410
Anti-Mouse CD48 PerCP/Cy5.5 (clone HM48-1) | Biolegend | Cat# 103422
Anti-Mouse Gr1 APC/Cy7 (clone RB6-8C5) | Biolegend | Cat# 108423
Anti-Mouse Sca-1 APC/Cy7 (clone D7) | Biolegend | Cat# 108126
Anti-Mouse Sca-1 PacificBlue (clone D7) | Biolegend | Cat# 108120
Streptavidin APC/Cy7 | Biolegend | Cat# 405208
4',6-Diamidino-2-Phenylindole, Dihydrochloride (DAPI) | Sigma | Cat# D9542
Propidium Iodide (PI) | Sigma | Cat# P4170
Doxycycline hyclate (Dox) | Sigma | Cat# D9891
16% Formaldehyde Solution (w/v), Methanol-free | Thermo Scientific | Cat# 28906
Dynabeads Biotin Binder | Life Technologies | Cat# 11047
Recombinant Human TPO | R&D Systems | Cat# 288-TP
Recombinant Mouse SCF | R&D Systems | Cat# 455-MC
Recombinant Mouse IL-3 | R&D Systems | Cat# 403-ML
StemSpan SFEM | StemCell Technologies, Inc. | Cat# 09650
Fetal Bovine Serum | HyClone | Cat# SH30070
Newborn Calf Serum | GIBCO | Cat# 16010519
Mouse: C57BL/6J | The Jackson Laboratory | Stock No: 000664
Mouse: B6.SJL-Ptprca Pepcb/BoyJ | The Jackson Laboratory | Stock No: 002014
Mouse: Tg(tetO-HIST1H2BJ/GFP)47Efu/J | The Jackson Laboratory | Stock No: 005104
Mouse: HuCD34-tTA | Radomska et al., 2002 | N/A
ELDA: Extreme Limiting Dilution Analysis | Hu and Smyth, 2009 | http://bioinf.wehi.edu.au/software/elda/
CONTACT FOR REAGENT AND RESOURCE SHARING
Information and requests for reagents may be directed to the lead contact Kateri Moore (kateri.moore@mssm.edu).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Tg(tetO-HIST1H2BJ/GFP)47Efu/J (TetO-H2BGFP), hCD34-tTA (hCD34), C57BL/6 (B6), and the congenic B6.SJL-Ptprca Pepcb/BoyJ (SJL) mice were acquired and maintained as previously described (Qiu et al., 2014). Double transgenic 34/H2BGFP mice were derived from crossbreeding the single transgenic TetO-H2BGFP and hCD34 mice. F1 mice from this cross were used for all experiments, with the exception of cell cycle analysis in Figures S2I and S2J, which was performed on B6 BM. The F1 progeny of crosses from TetO-H2BGFP and B6 mice were used for background GFP gating controls in all label-dilution experiments. Dox was administered through the drinking water at 1 mg/ml to mice beginning between 2 and 4 months of age and changed twice weekly. Both male and female mice were used in all experiments. Sample sizes for experiments were determined without formal power calculations. Animal experiments were approved by the Institutional Animal Care and Use Committee and conducted in accordance with the Animal Welfare Act.
METHOD DETAILS
Sample Preparation and Flow Cytometry

BM cells were harvested from tibias, femurs, and pelvic bones by crushing with a mortar and pestle in a PBS buffer supplemented with 5% newborn calf serum (NCS, GIBCO). Cells were triturated to obtain a single-cell suspension, and bone debris was removed by filtering through a 70 μm cell strainer (BD). Red cells were lysed with an ammonium chloride lysis buffer. For staining, cells were first incubated with biotin-conjugated lineage markers anti-CD3ε, anti-B220, anti-Gr-1, anti-CD11b, and anti-Ter119, and on occasion also with biotin-conjugated anti-CD48 and anti-Flk2, followed by anti-biotin superparamagnetic beads (Dynabeads Biotin Binder, Life Technologies), and lineage marker-expressing cells were depleted via magnetic separation. The fraction enriched for Lineage–/low cells was further incubated with antibodies against CD150, CD48, c-Kit, Sca1, CD41, Flk2, IL-7Rα, CD34, FcγRIII, and fluorophore-conjugated streptavidin. Dead cells were excluded by staining with 4',6-diamidino-2-phenylindole (DAPI, Sigma) or propidium iodide (PI, Sigma). Cells were analyzed on an LSRII (Becton Dickinson) flow cytometer and sorted on an Influx (Becton Dickinson). The various stem and progenitor populations were defined as follows: HSCs (Lin–Sca-1+c-Kit+CD48–Flk2–CD150+ or Lin–Sca-1+c-Kit+CD48–CD150+), MPPs (Lin–Sca-1+c-Kit+CD48–CD150–), HPC-1 (Lin–Sca-1+c-Kit+CD48+CD150–), HPC-2 (Lin–Sca-1+c-Kit+CD48+CD150+), MkPs (Lin–Sca-1–c-Kit+CD150+CD41+), CMPs (Lin–Sca-1–c-Kit+CD34+FcγRIII–), GMPs (Lin–Sca-1–c-Kit+CD34+FcγRIII+), MEPs (Lin–Sca-1–c-Kit+CD34–FcγRIII–), and CLPs (Lin–Sca-1^mid c-Kit^mid Flk2+IL7Rα+). Label retention was defined by gating above the background GFP levels found in heterozygous single transgenic TetO-H2BGFP HSCs. Antibodies are listed in Table S3.
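The CD48/CD150 quadrant definitions above, applied within the Lin–Sca-1+c-Kit+ (LSK) gate, can be illustrated with a minimal sketch. The function names and the boolean marker calls are hypothetical and introduced only for illustration; in practice the positive/negative calls come from gates drawn on the cytometer data, and the label-retention threshold is the background GFP level measured in single transgenic TetO-H2BGFP HSCs.

```python
def classify_lsk(cd48_positive, cd150_positive):
    """Name the LSK subpopulation from its CD48/CD150 status.

    Assumes the cell has already been gated as Lin- Sca-1+ c-Kit+.
    The four quadrants follow the definitions given in the text.
    """
    if not cd48_positive and cd150_positive:
        return "HSC"
    if not cd48_positive and not cd150_positive:
        return "MPP"
    if cd48_positive and not cd150_positive:
        return "HPC-1"
    return "HPC-2"


def is_label_retaining(gfp_intensity, background_gfp):
    """Call a cell label retaining if its GFP exceeds the background gate.

    background_gfp is a hypothetical parameter standing in for the gate
    set on single transgenic TetO-H2BGFP HSCs.
    """
    return gfp_intensity > background_gfp
```

A cell that is CD48– and CD150+ is returned as "HSC"; the optional Flk2– criterion used in some experiments would be an additional gate applied before this call.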
Transplantation Assays
HSCs from 19-month-old 34/H2BGFP mice (CD45.2) that had been chased with dox for 15 or 17 months were sorted into various populations based on label retention. Sorted cells from each population were injected retro-orbitally into lethally irradiated SJL (CD45.1) mice (2 rounds of 550 rads, three hours apart) at a dosage of 200 sorted cells plus 1.3 × 10^5 cells of Lin/CD48/Flk2-depleted competitor BM (CD45.1) per mouse. Mice were bled at timed intervals post transplantation from the retro-orbital venous plexus, red blood cells were lysed, and the contribution of donor-derived CD45.2+ cells to the B cell (B220+), T cell (CD4/CD8+), and myeloid (CD11b/Gr1+) lineages was assessed. Granulocytes were identified as SSC^Hi CD11b/Gr1+ cells. Secondary transplants were performed at 24 weeks post primary transplant, and 5 × 10^6 cells of pooled whole BM from each group were transplanted into secondary hosts.

Limiting dilution transplantations were performed by transplanting 15 sorted HSCs from 19-month-old mice chased with dox for 17 months along with 2 × 10^5 cells of congenic un-manipulated competitor bone marrow (CD45.1) into lethally irradiated young SJL mice. Secondary transplants were performed 24 weeks post primary transplantation by transplanting 5 × 10^6 cells of whole BM from a single primary recipient mouse into two new hosts (CD45.1).
Limiting Dilution Analysis

Limiting dilution analysis was performed with peripheral blood reconstitution data at 24 weeks after primary transplant using the freely
available ELDA software (http://bioinf.wehi.edu.au/software/elda/).
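
As a rough illustration of what a limiting dilution estimate involves, the sketch below computes the maximum-likelihood frequency of repopulating cells under the single-hit Poisson assumption, in which a recipient that received d cells stays negative with probability exp(-f·d). This is not the ELDA implementation: ELDA fits a generalized linear model and also reports confidence intervals and goodness-of-fit tests, whereas this sketch returns only a point estimate. The function name and argument layout are hypothetical.

```python
import math


def estimate_frequency(doses, tested, negative, iters=200):
    """Point estimate of the active-cell frequency f (per transplanted cell).

    doses[i]    : number of cells transplanted per recipient in group i
    tested[i]   : number of recipients in group i
    negative[i] : number of non-reconstituted recipients in group i
    Single-hit Poisson model: P(negative | dose d) = exp(-f * d).
    """
    if sum(negative) == sum(tested):
        return 0.0  # no positive recipients: the estimate is zero
    if sum(negative) == 0:
        raise ValueError("all recipients positive: f is unbounded above")

    def score(f):
        # Derivative of the log-likelihood with respect to f;
        # it decreases monotonically and crosses zero at the estimate.
        total = 0.0
        for d, n, r in zip(doses, tested, negative):
            p_neg = math.exp(-f * d)
            total += -r * d + (n - r) * d * p_neg / (1.0 - p_neg)
        return total

    lo, hi = 1e-12, 1.0
    while score(hi) > 0:  # widen the bracket if needed
        hi *= 2.0
    for _ in range(iters):  # plain bisection on the score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a single dose, as in the 15-cell transplants described above, the estimate reduces to the closed form f = -ln(negative/tested)/dose.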
Cell-Cycle Analysis
Cells were stained and prepared as above, then fixed in 2% methanol-free paraformaldehyde diluted in PBS. Cells were then washed three times with PBS containing 5% NCS, permeabilized in 0.2% Triton X-100, and stained with anti-Ki-67 (PE, eBioscience) and DAPI prior to analysis. For cell cycle analysis of GFP peak cells, LSKCD48-Flk2-CD150+ cells were first enrichment sorted prior to fixation, subsequent staining, and cell cycle analysis.
In Vitro Analysis of Megakaryocyte Potential

Single LSKCD48-Flk2-CD150+ cells were sorted into U-shaped 96-well plates in 200 μl StemSpan SFEM (StemCell Technologies) supplemented with 10% FBS (HyClone), 1% β-mercaptoethanol (0.1 mM, Sigma-Aldrich), 1% penicillin/streptomycin (GIBCO), and cytokines mIL-3 (20 ng/ml, R&D Systems), mSCF (50 ng/ml, R&D Systems), and hTpo (50 ng/ml, R&D Systems). Colony morphology and size were evaluated on day 13. Colonies were collected for cytospin preparation and hematoxylin and eosin staining to determine the presence of megakaryocytes and other myeloid cells.
Hamming Distance Analysis

The Hamming distance was measured for pairs of leukocyte repopulation kinetics from secondary recipient mice as described
(Müller-Sieburg et al., 2002). Thereby, the numerical kinetics of total peripheral blood white blood cell chimerism from paired
secondary recipients were first transformed into symbolic dynamics based on the slopes of consecutive pairs of donor-type cell
values over time as described. Symbolic dynamics are defined as sequences of symbols from the sign of the slope (‘‘+’’ = positive
slope, ‘‘-’’ = negative slope, ‘‘’’ = close to zero slope). Next, the Hamming distances of all pairs of symbolic dynamics were deter-
mined. Pairs of secondary transplant kinetics were defined as synchronous if their Hamming distance was below an empirically set
threshold (the smallest possible Hamming distance = 0 indicated that the two kinetics were identical based on this clustering mea-
sure). All pairs of kinetics with Hamming distances above the threshold were considered asynchronous.
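
The procedure above can be sketched in a few lines. Several details here are assumptions made for illustration only: the slope is taken as the difference between consecutive chimerism values (equal spacing of time points), the tolerance `tol` for a "close to zero" slope is arbitrary, the symbol "0" stands in for the near-zero class, and the default threshold of 2 follows the cutoff quoted in the legend of Figure S5C. Function names are hypothetical.

```python
def to_symbols(values, tol=1.0):
    """Convert a chimerism time series into a string of slope symbols."""
    symbols = []
    for previous, current in zip(values, values[1:]):
        slope = current - previous
        if abs(slope) <= tol:
            symbols.append("0")  # close-to-zero slope (assumed symbol)
        elif slope > 0:
            symbols.append("+")
        else:
            symbols.append("-")
    return "".join(symbols)


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_synchronous(kinetics_a, kinetics_b, max_distance=2, tol=1.0):
    """Paired secondary recipients are synchronous if distance <= max_distance."""
    distance = hamming_distance(to_symbols(kinetics_a, tol),
                                to_symbols(kinetics_b, tol))
    return distance <= max_distance
```

Because only the sign of each slope is compared, two recipients with very different chimerism levels can still be scored as synchronous if their curves rise and fall at the same time points.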
Mathematical Modeling
Assuming that GFP dilutes by a factor of 2 with each cell division, the relative positions of the GFP peaks in the LR-HSC population are described by the following model:

$$y_n = \frac{y_0 - c}{2^n} + c,$$

where $y_n$ is the mean fluorescence intensity (MFI) at the $n$th peak and $c$ is a constant that accounts for background fluorescence. The MFI of GFP levels in HSCs of single transgenic TetO-H2BGFP mice was used to estimate $c$.
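
A minimal numerical sketch of the dilution model follows. It evaluates the expected position of the nth peak and inverts the relation to read an approximate division count from a measured intensity. The function names and the example intensities are hypothetical and are not values from the study.

```python
import math


def peak_mfi(y0, c, n):
    """Expected MFI of the nth GFP peak: (y0 - c) / 2**n + c."""
    return (y0 - c) / 2 ** n + c


def divisions_from_mfi(y, y0, c):
    """Invert the model: number of halvings separating y from y0."""
    if y <= c or y0 <= c:
        raise ValueError("intensities must exceed the background c")
    return math.log2((y0 - c) / (y - c))


# Example with made-up numbers: undivided peak at 1050, background 50.
peaks = [peak_mfi(1050.0, 50.0, n) for n in range(5)]
```

The background term matters most for late peaks: without subtracting c, successive peaks would appear closer together than a true halving predicts.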
Expansion of LR-HSC numbers during aging was described by the following model:

$$\begin{aligned}
\frac{dx_0}{dt} &= -k_0 x_0, \\
\frac{dx_1}{dt} &= 2k_0 x_0 - k_1 x_1, \\
\frac{dx_2}{dt} &= 2k_1 x_1 - k_2 x_2, \\
\frac{dx_3}{dt} &= 2k_2 x_2 - k_3 x_3, \\
\frac{dx_4}{dt} &= 2k_3 x_3 - k_4 x_4,
\end{aligned}$$

where $x_n(t)$ is the expected number of cells per long bone that have divided $n$ ($= 0, 1, 2, 3, 4$) times since dox chase, and $\ln(2)/k_n$ is the expected length of time to the next division for a cell that has previously divided $n$ times. The above system is linear and may therefore be directly integrated and easily compared with experimental data. To determine how divisional history affects cell cycle time, we fit the following functional forms for $k_n$ to experimental data using least-squares fitting.

1. Let $k_n = k$ for all $n$. In this model, cells cycle on average at the same rate regardless of how many times they have divided previously. In this case there is one free parameter $k = \ln(2)/\tau$, where $\tau$ is the expected cell cycle time.
2. Let $k_n = k$ for $n < 4$ and $k_4 = 0$. In the first model $x_n \to 0$ as $t \to \infty$ for all $n$ (i.e., GFP ultimately dilutes away in all cells). However, in this modification $x_n \to 0$ as $t \to \infty$ for $n < 4$ and $x_4 \to 16x_0(0) + 8x_1(0) + 4x_2(0) + 2x_3(0) + x_4(0)$, where $x_n(0)$ is the initial number of cells in state $n$. This model assumes the cells proliferate normally until they have undergone four divisions, at which point they adopt a permanently quiescent state.
3. Let $k_n = k_0 + nb$ for all $n$. In this model the cell cycle time grows linearly with divisional history: each cell cycle is, on average, $b$ hours longer than the last.
4. Let $k_n = k b^n$, with $b < 1$, for all $n$. In this model the expected cell cycle time grows exponentially with divisional history: each cell cycle is, on average, $1/b$ times longer than the last.
5. Let $k_0 = k$ and $k_n = k b^{2^n}$, with $b < 1$, for all $n > 0$. In this model the expected cell cycle time grows super-exponentially as the cells divide: the $(n + 1)$th cell cycle is, on average, $1/b^{2^n}$ times longer than the $n$th.
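
The division-counting system and four of the rate schedules above (models 1, 2, 4, and 5) can be explored numerically with the sketch below. It is an illustration under stated assumptions rather than the authors' code: a fixed-step fourth-order Runge-Kutta integrator stands in for direct integration, the least-squares fit to data is not shown, and all function names and parameter values are hypothetical.

```python
def rates_constant(k, n_states=5):
    """Model 1: the same division rate in every state."""
    return [k] * n_states


def rates_quiescent_last(k, n_states=5):
    """Model 2: constant rate, then permanent quiescence in the last state."""
    return [k] * (n_states - 1) + [0.0]


def rates_exponential(k, b, n_states=5):
    """Model 4: k_n = k * b**n."""
    return [k * b ** n for n in range(n_states)]


def rates_super_exponential(k, b, n_states=5):
    """Model 5: k_0 = k and k_n = k * b**(2**n) for n > 0."""
    return [k] + [k * b ** (2 ** n) for n in range(1, n_states)]


def derivative(x, k):
    """Right-hand side: dx_0/dt = -k_0 x_0, dx_n/dt = 2 k_{n-1} x_{n-1} - k_n x_n."""
    dx = [-k[0] * x[0]]
    for n in range(1, len(x)):
        dx.append(2.0 * k[n - 1] * x[n - 1] - k[n] * x[n])
    return dx


def simulate(x_start, k, t_end, dt=0.01):
    """Integrate the linear system from t = 0 to t_end with RK4 steps."""
    x = list(x_start)
    for _ in range(int(round(t_end / dt))):
        s1 = derivative(x, k)
        s2 = derivative([xi + 0.5 * dt * d for xi, d in zip(x, s1)], k)
        s3 = derivative([xi + 0.5 * dt * d for xi, d in zip(x, s2)], k)
        s4 = derivative([xi + dt * d for xi, d in zip(x, s3)], k)
        x = [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for xi, a, b, c, d in zip(x, s1, s2, s3, s4)]
    return x
```

Starting from a single undivided cell under model 2 and integrating for a long time gives a value of x_4 close to 16, matching the limit stated in the text, because the quantity formed by summing x_n / 2^n over all states is conserved whenever k_4 = 0.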
To account for the gain of CD41 expression in the LR-HSC fraction during aging, we assume that CD41 negative LR-HSCs gain
CD41 expression with probability 1 a each time they divide. The dynamics of CD41 expression within the label-retaining fraction
are then described by the following model:

$$\begin{aligned}
\frac{dx_0}{dt} &= -k_0 x_0, & \frac{dy_0}{dt} &= -k_0 y_0, \\
\frac{dx_1}{dt} &= 2(1 - a)k_0 x_0 - k_1 x_1, & \frac{dy_1}{dt} &= 2k_0 y_0 + 2a k_0 x_0 - k_1 y_1, \\
\frac{dx_2}{dt} &= 2(1 - a)k_1 x_1 - k_2 x_2, & \frac{dy_2}{dt} &= 2k_1 y_1 + 2a k_1 x_1 - k_2 y_2, \\
\frac{dx_3}{dt} &= 2(1 - a)k_2 x_2 - k_3 x_3, & \frac{dy_3}{dt} &= 2k_2 y_2 + 2a k_2 x_2 - k_3 y_3, \\
\frac{dx_4}{dt} &= 2(1 - a)k_3 x_3 - k_4 x_4, & \frac{dy_4}{dt} &= 2k_3 y_3 + 2a k_3 x_3 - k_4 y_4,
\end{aligned}$$

where $x_n(t)$ and $y_n(t)$ are the expected numbers of CD41– and CD41+ LR-HSCs per long bone that have divided $n$ ($= 0, 1, 2, 3, 4$) times since dox chase. To minimize the number of free parameters we assumed that CD41 status does not alter cell cycle progression and use model 5, above, to account for the onset of cellular quiescence. In this case the full model has three free parameters ($k$, $b$, and $a$).
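
The two-compartment system can be integrated in the same way as the single-compartment one. The sketch below follows the equations as printed, in which a fraction a of the daughters of a dividing CD41– cell enters the CD41+ pool and CD41+ cells never revert. The integrator, function names, and parameter values are assumptions chosen for illustration, not fitted values from the study.

```python
def cd41_derivative(state, k, a):
    """Right-hand side for the stacked state [x_0..x_N, y_0..y_N]."""
    n_states = len(k)
    x, y = state[:n_states], state[n_states:]
    dx = [-k[0] * x[0]]
    dy = [-k[0] * y[0]]
    for n in range(1, n_states):
        dx.append(2.0 * (1.0 - a) * k[n - 1] * x[n - 1] - k[n] * x[n])
        dy.append(2.0 * k[n - 1] * y[n - 1]
                  + 2.0 * a * k[n - 1] * x[n - 1] - k[n] * y[n])
    return dx + dy


def simulate_cd41(x_start, y_start, k, a, t_end, dt=0.01):
    """RK4 integration; returns (x, y) lists of CD41- and CD41+ counts."""
    state = list(x_start) + list(y_start)
    for _ in range(int(round(t_end / dt))):
        s1 = cd41_derivative(state, k, a)
        s2 = cd41_derivative([v + 0.5 * dt * d for v, d in zip(state, s1)], k, a)
        s3 = cd41_derivative([v + 0.5 * dt * d for v, d in zip(state, s2)], k, a)
        s4 = cd41_derivative([v + dt * d for v, d in zip(state, s3)], k, a)
        state = [v + dt / 6.0 * (p + 2.0 * q + 2.0 * r + s)
                 for v, p, q, r, s in zip(state, s1, s2, s3, s4)]
    n_states = len(k)
    return state[:n_states], state[n_states:]


# Model 5 rate schedule with made-up parameters k = 0.1 and b = 0.8.
example_rates = [0.1] + [0.1 * 0.8 ** (2 ** n) for n in range(1, 5)]
```

Because CD41 status is assumed not to alter cycling, the totals x_n + y_n follow the single-compartment model regardless of a, which gives a simple consistency check on any implementation.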
QUANTIFICATION AND STATISTICAL ANALYSES
Data are presented as mean ± SEM. The sample size for each experiment and the replicate number of experiments are included in the figure legends. Statistical significance was determined by Welch's t test, paired t test, or one-way ANOVA followed by a test for linear trend using GraphPad Prism 6 (GraphPad Software, La Jolla, CA). P values < 0.05 were considered significant. P values for each experiment are included in associated figure legends.
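
For readers who want to see the arithmetic behind two of these summaries, the sketch below computes a mean ± SEM and the Welch t statistic with its Welch-Satterthwaite degrees of freedom using only the Python standard library. The analyses in this study were run in GraphPad Prism; this is an independent illustration, the function names are hypothetical, and the conversion of t to a P value (which requires the t distribution) is left out.

```python
import math
from statistics import mean, stdev, variance


def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance test."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1)
                           + vb ** 2 / (len(sample_b) - 1))
    return t, df
```

Unlike Student's t test, the Welch form does not pool the two variances, which is why the degrees of freedom are generally not an integer.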

Figure S1. Dynamic Range of the hCD34-tTA × TetO-H2BGFP System and Specificity of the hCD34 Promoter to a Subset of HSCs during Adulthood, Related to Figure 1
(A) Dynamic range of the H2BGFP reporter system in the absence of dox chase. Vertical lines indicate one-half dilutions in fluorescence intensity of the H2BGFP
label, indicating a range of 7-8 H2BGFP dilutions prior to reaching background level.
(B) Schematic for examining H2BGFP loss over time in 34/H2BGFP animals without dox treatment.
(C) Histograms depicting H2BGFP level in the LSKCD48-CD150+ BM HSC compartment from mice of various ages that have never been exposed to dox. The
upper and lower range of GFPHi HSC frequency is displayed for each age group.
(D) Quantification of GFPHi HSC frequency from mice of various age groups never exposed to dox (n = 3-12 mice per group). Data are displayed as the mean ±
SEM. Statistical significance was assessed by one-way ANOVA followed by test for linear trend; **p < 0.01.
(E) Schematic for testing active H2BGFP labeling of the HSC compartment after dox release. Single transgenic hCD34 and H2BGFP mice were mated together to
produce double transgenic 34/H2BGFP mice that were born on dox. Progeny were raised on dox until 8 weeks (56 days) of age, at which point dox was removed.
BM was then collected at various time points after dox removal, and LSKCD48-CD150+ cells were analyzed for the presence of H2BGFP above background
levels.
(F) Time course kinetics of H2BGFP labeling after dox release. Data are displayed as the mean ± SEM (n = 3-5 mice per group from two independent experiments).
Figure S2. Leakiness of the hCD34-tTA × TetO-H2BGFP System, Related to Figure 1
(A) Experimental setup. Single transgenic hCD34-tTA and TetO-H2BGFP mice were mated while exposed to dox through the drinking water. Pups born from
these matings were maintained on dox until adulthood, at which point BM was analyzed for the presence of H2BGFP expression above background levels.
(B) Histogram showing GFP levels of LSKCD48-CD150+ cells from BM of 34/H2BGFP mice born on dox.
(C) Modified experimental timeline. Mice born on dox were analyzed after a year of continuous dox treatment.
(D) Histograms of GFP levels from three 34/H2BGFP mice born and maintained on dox for 1 year, and three single transgenic TetO-H2BGFP mice (background).
(E) Quantification of the brightest GFP intensity from each mouse displayed in (D).
Data are displayed as the mean ± SEM.
Figure S3. Quantification of Young and Aging HSC Populations, and Cell-Cycle Analysis of HSCs Based on CD41 Expression, Related to
Figure 2
(A and B) Frequency (A) and absolute number (B) of HSCs in young and aging bone marrow. n = 10-17 mice per group.
(C and D) Frequencies (C) and absolute numbers (D) of various HSPC populations (I-III) in young and aging bone marrow. n = 6-7 mice per group.
(E and F) Frequency (E) and absolute number (F) of CD41+ HSCs in young and aging bone marrow. n = 6-10 mice per group.
(G and H) Frequencies (G) and absolute number (H) of HSC populations characterized based on CD41 expression and label retention in young and aging bone
marrow. n = 6-10 mice per group from 2-3 independent experiments.
(I and J) Representative images (I) and quantification (J) of CD41– and CD41+ HSC snapshot cell cycle profiles. n = 6 mice per group from two independent
experiments.
(K) Histograms displaying the H2BGFP label retention over time of CD41– and CD41+ HSCs. Histograms are representations of young mice chased with dox for
12 weeks.
(L) Quantification of H2BGFP label retention in (K). n = 9-11 mice per group from three independent experiments. Data are displayed as the mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001 by Welch's t test (quantifications) or paired Student's t test (cell cycle).
Figure S4. Megakaryocyte Potential of HSC Compartment with Aging Based on Divisional History, Related to Figure 2
Single cells from the GFPHi, GFPLo, and Total HSC populations were sorted from young (5 months old, dox treated 3 months) and aging (11 months old, dox
treated 9 months) mice into wells of a 96 well plate and were cultured in the presence of SCF, IL-3, and Tpo.
(A–D) Images of representative colonies after 13 days in culture. Mixed cell colonies containing both small and large cells (A and B), small cell only colonies (C), and
large cell only colonies (D). Yellow arrows mark large megakaryocyte-like cells.
(E and F) Representative images of cytospun mixed (E) and small cell only colonies (F) stained with H&E. Only mixed colonies showed megakaryocytes with large
multi-lobed nuclei (black arrows). Large cell only colonies generated too few cells to be mounted on slides for staining.
(G) Quantification of colony types found from each sorted HSC population.
(H) Quantification of colony size at day 13 generated from each sorted HSC population. Data are displayed as the mean ± SEM of 64-130 single cells per group
from 4 independent experiments.
Figure S5. Synchronistic Repopulation Kinetics in Paired Secondary Transplantations, Related to Figure 3
Bone marrow from each mouse repopulated with 15 cells from aging HSC populations was transplanted into paired secondary hosts. Repopulation kinetics were
followed in both secondary recipients over 24 weeks to determine the degree of synchronicity of total white blood cell repopulation (%CD45.2+) in independent
hosts. We quantitatively defined the degree of synchronicity as the Hamming distance between pairs of time series.
(A) Repopulation curves grouped into 2 clusters based on the degree of synchronicity. The cluster boxed in gray contains curves with kinetics determined to be
synchronous, while the cluster boxed in red contains asynchronous kinetics for paired secondary hosts. The letter on the right side of each repopulation kinetic
indicates the retrospectively identified repopulating cell type. L, LT-HSC; I, IT-HSC; S, ST-HSC; B, Bipotent Progenitor; M, Myeloid Progenitor.
(B) Scaled paired secondary repopulation curves used to determine symbolic dynamics data from each secondary repopulation curve. The orange and blue
curves represent individual secondary recipients. Shaded plots indicate asynchronous repopulation behavior.
(C) Hamming distance measurements. Any two kinetics were defined to be asynchronous (red dots), if their Hamming distance was > 2. One exception occurred
where secondary repopulation was considered asynchronous with a Hamming distance of 2 due to a shorter symbolic dynamic sequence.
Figure S6. Cell-Cycle Profiles of HSCs from Each GFP Peak, Related to Figure 4
(A) Gating strategy for each HSC population. Cells were enrichment sorted on LSKCD48-Flk2-CD150+ cells, then fixed, stained for Ki67 and DAPI, then analyzed
by flow cytometry for cell cycle state. Displayed are cells sorted from a young mouse.
(B and C) Static cell cycle profile of Total, GFPHi, and Peak 0-4 cells from Young (5 months old, 3 months on dox; B) and Aging (11 months old, 9 months on dox; C)
mice. Between 5000-15000 events in the Total HSC gate were acquired for each sample. n = 3 and 2 for Young and Aging mice respectively.
Article
Epigenetic Memory Underlies Cell-Autonomous Heterogeneous Behavior of Hematopoietic Stem Cells
Vionnie W.C. Yu, Rushdia Z. Yusuf, Toshihiko Oki, ..., Charles P. Lin, Peter V. Kharchenko, David T. Scadden
Correspondence
peter.kharchenko@post.harvard.edu (P.V.K.), david_scadden@harvard.edu (D.T.S.)
In Brief
Hematopoietic stem cells display heterogeneous, stereotypic clonal behavior that is conserved under various conditions, and the differences in their epigenome, instead of niche, are responsible for this remarkable memory.
Highlights
• Clonal tracking demonstrates clone-specific functional heterogeneity in vivo
• Stereotypical functions of HSCs are preserved under stress or tissue injury
• HSC clonal behavior is associated with a distinct epigenetic pattern
• HSC is epigenetically constrained with limited plasticity in response to cues
Yu et al., 2016, Cell 167, 1310–1322

Article
Epigenetic Memory Underlies

Cell-Autonomous Heterogeneous Behavior
of Hematopoietic Stem Cells
Vionnie W.C. Yu,1,2,3 Rushdia Z. Yusuf,1,2,3 Toshihiko Oki,1,2,3 Juwell Wu,4 Borja Saez,1,2,3 Xin Wang,5 Colleen Cook,1,2,3
Ninib Baryawno,1,2,3 Michael J. Ziller,2,3,4 Eunjung Lee,5,6 Hongcang Gu,4 Alexander Meissner,2,3,4 Charles P. Lin,2,7
Peter V. Kharchenko,2,5,* and David T. Scadden1,2,3,8,*
1Center for Regenerative Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
2Harvard Stem Cell Institute, Cambridge, MA 02138, USA
3Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA
4Broad Institute of Harvard and MIT, Cambridge, MA 02138, USA
5Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
6Division of Genetics, Brigham and Women’s Hospital, Boston, MA 02115, USA
7Wellman Center for Photomedicine, Massachusetts General Hospital, Boston, MA 02114, USA
8Lead Contact
*Correspondence: peter.kharchenko@post.harvard.edu (P.V.K.), david_scadden@harvard.edu (D.T.S.)

SUMMARY

Stem cells determine homeostasis and repair of many tissues and are increasingly recognized as functionally heterogeneous. To define the extent of, and molecular basis for, heterogeneity, we overlaid functional, transcriptional, and epigenetic attributes of hematopoietic stem cells (HSCs) at a clonal level using endogenous fluorescent tagging. Endogenous HSC had clone-specific functional attributes over time in vivo. The intra-clonal behaviors were highly stereotypic, conserved under the stress of transplantation, inflammation, and genotoxic injury, and associated with distinctive transcriptional, DNA methylation, and chromatin accessibility patterns. Further, HSC function corresponded to epigenetic configuration but not always to transcriptional state. Therefore, hematopoiesis under homeostatic and stress conditions represents the integrated action of highly heterogeneous clones of HSC with epigenetically scripted behaviors. This high degree of epigenetically driven cell autonomy among HSCs implies that refinement of the concepts of stem cell plasticity and of the stem cell niche is warranted.

INTRODUCTION

Heterogeneity among cells within tissues is increasingly recognized in both normal and malignant conditions (Ding et al., 2012; Lemischka et al., 1986; Notta et al., 2011). Data in the hematopoietic system increasingly point to populations of cells being comprised of subpopulations with divergent properties. These include cells that have distinctive behaviors in terms of cell production and lineage bias (Dykstra et al., 2007; Picelli et al., 2013). Hematopoietic stem cells have been demonstrated to exhibit bias toward myeloid, lymphoid, or megakaryocytic lineage upon transplantation of single cells (Dykstra et al., 2007, 2011; Morita et al., 2010), on ex vivo barcoding and transplantation of populations (Aiuti et al., 2013; Gerrits et al., 2010; Jordan and Lemischka, 1990; Lemischka, 1993; Lemischka et al., 1986; Lu et al., 2011; Mazurier et al., 2004; Shi et al., 2002; Snodgrass and Keller, 1987), or by retrotransposon tagging of endogenous cells (Sun et al., 2014b). Further, single-cell transplant data have been coupled with single-cell gene expression analysis on different cells to resolve subpopulations with corresponding gene expression and repopulation potential (Wilson et al., 2015). Overlaying in vivo functional behavior of endogenous HSC clones with their gene expression and epigenetic characteristics represents a key unresolved challenge. The coupling of function with gene expression and chromatin state at clonal resolution is important for defining what governs stem cells; particularly for defining if HSC function is bounded by cell-autonomous epigenetic constraints. To test whether divergent HSC behaviors could be defined at a clonal level under homeostatic conditions and whether these behaviors were epigenetically determined, we created a multi-fluorescent mouse model that enables both molecular profiling and functional tracking of live cells in vivo.

RESULTS

Generation and Validation of the Multi-color Hue Mouse Model as a Clonal Tracking Tool
We took advantage of the fluorescent tagging system first developed for clonal lineage tracking in the nervous system to generate a transgenic animal bearing fluorescence protein encoding genes that could be recombined to provide a range of distinct colors (Livet et al., 2007). We created a new mouse strain (termed “HUe”) in which the fluorescent tags were driven by a ubiquitously expressed chicken actin promoter with intervening stop sequences flanked by LoxP sites followed by a fluorescent cassette containing GFP, EYFP, tDimer2, and
Figure 1. Endogenous Labeling of Individual Cells with Different Colors

(A) HUe transgene construct contains GFP, EYFP, tDimer2, mCerulean fluorescent cDNAs arranged in tandem invertible segments flanked by four LoxP sites. A
LoxP variant floxed STOP sequence was inserted in front of the fluorescent cassette, thereby prohibiting background fluorescence in the absence of Cre
recombinase.
(B) Cre-mediated excision of the STOP sequence and random inversion or excision of the fluorescent cassette generates four possible color outcomes. Color
complexity is further increased by insertion of multiple copies of transgene into the mouse genome. A HUe founder line with 20 copies of transgene inserted can
have 10^3 color combinations.
(C) Testing the efficiency of expression of fluorescent proteins by crossing the HUe mice with different strains containing a Cre-driving promoter. When the HUe
mouse was crossed to the limb mesenchyme-specific Prx1-CreER strain, we observed efficient endogenous labeling of cells in a fracture callus with various colors.
(D) Chondrocytes were labeled with color diversity when the HUe mouse was mated to a collagen-specific Cre driver, Col(II)-CreER.
(E) Hematopoietic cell labeling was assessed by crossing the Mx1-Cre strain with HUe (Mx1-Cre;HUe). When the Mx1-Cre;HUe mouse was given a pulse of pIpC,
multi-colored hematopoietic cells within the calvarial cavity could be visualized using an intra-vital fluorescent microscopy system.
(F) Bone marrow cells of the same animal could be extracted and re-visualized on glass sections with fluorescent confocal microscopy.
(G) Clonal quantification of hematopoietic sub-compartments. Flow cytometry can identify and isolate hematopoietic stem cells (HSCs), progenitor cells:
multipotent progenitors (MPPs), common lymphoid progenitors (CLPs), granulocyte macrophage progenitors (GMPs), megakaryocyte erythroid progenitors
(MEPs), and mature cells of different lineages: B cells, T cells, monocytes, granulocytes, and erythroid cells. Endogenous HUe fluorescence from these pop-
ulations is shown in the 3D graphs with x axis (tDimer2, red fluorescence), y axis (Cerulean, blue fluorescence), and z axis (EYFP, green fluorescence) representing
increasing fluorescent intensities in log scale. The panel shows that hematopoietic cells at all hierarchy can be identified and clones within each compartment can
be isolated by flow cytometry.
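The order of magnitude quoted for the color range can be checked with a short counting argument. The sketch below assumes, as a simplification not stated in the text, that each of the 20 transgene copies independently resolves to exactly one of the four fluorescent proteins and that copies are interchangeable, so a cell's hue depends only on how many copies express each protein. The function name is illustrative.

```python
from math import comb


def distinct_color_mixes(copies, colors_per_copy):
    """Number of multisets of size `copies` drawn from `colors_per_copy` colors."""
    return comb(copies + colors_per_copy - 1, colors_per_copy - 1)


# 20 copies, four possible color outcomes per copy (assumed model, see lead-in).
n_mixes = distinct_color_mixes(copies=20, colors_per_copy=4)
print(n_mixes)  # 1771
```

Under these assumptions the count is C(23, 3) = 1771, which is on the order of 10^3. The number of hues that can be resolved in practice depends on detector sensitivity and on how many copies actually recombine and express, which this model ignores.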
Cerulean intercalated by multiple LoxP pairs (Figure 1A) to enable Cre-induced stochastic recombination and expression. The design is very similar to the independently created “Confetti” mouse (Snippert et al., 2010) with the distinction that the HUe mouse has 20 tandemly integrated cassettes enabling a wider range (theoretically >10^3) of possible colors generated by random combinations, in analogy to the color range generated by a television screen using three basic color hues (red, blue, green). We crossed HUe with various promoter-driven Cres to demonstrate marking in mesenchymal or hematopoietic tissue (Figures 1C–1F).
To examine the efficiency of HUe in marking hematopoietic cells, we crossed the HUe mouse with the interferon-inducible Mx1-Cre strain (Kühn et al., 1995) (herein Mx1-Cre;HUe). We did not observe background fluorescence in the absence of Cre including in transplantation-mediated stress settings (data not shown). We activated endogenous hematopoietic cell labeling by administering polyinosinic:polycytidylic acid (pIpC) into Mx1-Cre;HUe mice and evaluated mice after an interval (>30 days) when the effects of interferon induction have been long shown to subside (Essers et al., 2009). Intra-vital imaging in live animals showed labeling of cells in the calvarial bone
Cell 167, 1310–1322, November 17, 2016 1311

marrow (Figure 1E). These cells can be harvested from the bone marrow, stained with hematopoietic cell surface markers (Table S1), isolated by flow cytometry, and re-visualized by fluorescent microscopy (Figures 1F and 1G). We confirmed HUe fluorescence fidelity upon 12 days of cell division and differentiation by single cell colony assay in vitro (Figure S1A) and transplantation in vivo (Figures S1B and S1C).
Clusters of cells exhibiting distinct color signatures were apparent in the setting of Mx1-Cre-activated HUe mice (Figure 1G). To evaluate whether such clusters represented cell clones, we transplanted single LT-HSCs of distinct colors into 214 lethally irradiated recipients to define the boundaries of clonal populations by flow cytometry (Figure S1C). At 30 weeks post-transplant, the fluorescent positive population in blood and bone marrow was evaluated. We then used similar gates to isolate cells from the bone marrow of activated Mx1-Cre;HUe mice. Single clusters of cells with immunophenotypic signature of granulocyte monocyte progenitor (GMP) were sorted and transplanted into sublethally irradiated mice. Spleens were harvested at day 11 and DNA fingerprinting performed (Figure S2), demonstrating clonal signatures distinctive for each cluster. Clusters of cells with the same color, therefore, likely represent clonal descendants. However, overlapping boundaries can hamper our ability to distinguish individual clones. To minimize that issue, our analyses only involved animals with up to 15 color clones and used statistical treatment that did not require explicit partitioning of clones (Figure S3).

Hematopoiesis Is the Composite Product of Dissimilar Clones with Stereotypical Behaviors
We used Mx1-Cre;HUe mice to examine the clonal dynamics of native hematopoiesis. Sixteen Mx1-Cre;HUe mice were injected with pIpC to induce endogenous labeling of hematopoietic cells. Bone marrow aspirates were collected at 2, 3, 5, and 10 months of age and subjected to flow cytometry to quantify the total number of existing HUe fluorescent clones and the size of each clone (Figure 2; “Statistical Analysis” in the STAR Methods).
Results revealed that the hematopoietic tissue is composed of both persistent and fluctuating clones. We identified 5–11 clones per mouse with >15 cells throughout the 10-month chase period in a total of 16 animals (Figure 2). The clones had an uneven, near-exponential distribution in size, with one to four large clones accounting for 80% of all cells in each animal (Figure S4A). Although clonal changes between months 2 and 3 were more pronounced, clonal dynamics were relatively consistent from month 3 to month 10 (Figures 2 and S4B). All animals showed one to four clones that were found at month 2 and persisted until month 10. Among the clones that persisted, some were stable in size, while others fluctuated over time. Ten out of 16 mice had one to three clones that were identified at month 2 but had disappeared by month 10. Twelve animals showed emergence of one to two new clones during the chase period, and sometimes these new clones became the dominant clones at later time points. Overall, our results show that native murine hematopoiesis is composed of a few major labeled clones that persist and others that expand, disappear, or newly emerge.

Figure 2. In Vivo Hematopoietic Dynamics under Homeostatic Conditions
[Panels: Experimental Set 1 and Experimental Set 2; rows, Mice / Clones (m1–m16); x axis, Age (months): 2, 3, 5, 10.]
To assess in vivo hematopoietic dynamics in animals under homeostatic conditions, 16 pIpC-induced Mx1-Cre;HUe mice (m1–m16) were subjected to bone marrow aspiration of hematopoietic cells at months 2, 3, 5, and 10. Each uniquely colored circle represents an individual clone in an animal. The area of the circle is proportional to the size of each clone. Bone marrow hematopoietic clonal dynamics under homeostatic conditions were tracked from month 2 to month 10. Clones <15 cells throughout the tracking period were not scored. See also Figures S3 and S4.
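The statement that a handful of large clones account for 80% of all cells can be made concrete with a small calculation. The following is a minimal sketch that assumes per-clone cell counts are already available. The counts shown are hypothetical, chosen to halve at each rank so that they mimic a near-exponential size distribution, and the function name is illustrative rather than part of the published analysis.

```python
def clones_needed(clone_sizes, fraction=0.8):
    """Smallest number of largest clones whose cells reach `fraction` of the total."""
    total = sum(clone_sizes)
    running = 0
    for rank, size in enumerate(sorted(clone_sizes, reverse=True), start=1):
        running += size
        if running >= fraction * total:
            return rank
    return len(clone_sizes)


# Hypothetical cell counts for seven clones in one mouse (each half the previous).
sizes = [6400, 3200, 1600, 800, 400, 200, 100]
print(clones_needed(sizes))  # 3
```

With these made-up counts, three clones cover 80% of the cells, which falls inside the one-to-four range described in the text.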

Figure 3. Hematopoiesis Is Composed of Dissimilar Clones with Cell-Autonomous Behavior
(A) The HUe recipient cohort—a cohort of mice with highly similar clonal fluorescence pattern. To generate a recipient cohort, Mx1-Cre;HUe mice induced with
pIpC were used as donors. Randomly labeled fluorescent bone marrow cells from multiple Mx1-Cre;HUe donors were pooled as one mixture and isolated by
fluorescence-activated cell sorting (FACS). Fluorescent HSPCs (LineageLoSca+cKit+) mixed with support cells from C57BL/6J were transplanted into each of 20
lethally irradiated C57BL/6J recipients. After 16 weeks of reconstitution, the recipients showed high consistency in clonal output including proliferation, fluo-
rescence, and lineage characteristics in all hierarchy of hematopoietic cell types. HUe clonal fluorescent patterns of B cells, monocytes, and erythroid cells in
multiple recipients are shown, illustrating consistency among the recipients and the distinction between different cell compartments.
(B) Unbiased hierarchical clustering of HUe fluorescence profiles. The distribution of HUe fluorescence in each hematopoietic cell type (e.g., B cells in Figure 1G)
represents the clonal composition of that cell type in each mouse (Ctrl 1–5). Hierarchical clustering of such HUe profiles is shown. The clustering groups the same
cell type samples from different mice together, illustrating that the pronounced pattern of proliferation and lineage bias exhibited by the individual clones is
sufficient to consistently distinguish individual cell types from multiple recipients based on their clonal composition.
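The clustering idea in (B) can be illustrated with a small, dependency-free sketch. It assumes each sample is summarized as a vector of clone fractions and uses average-linkage agglomerative clustering on Euclidean distance. The distance measure, the linkage rule, and all numbers are illustrative assumptions, not the settings or data of the published analysis.

```python
from itertools import combinations
from math import dist


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distance.

    `profiles` maps a sample name to a vector of clone fractions.
    Returns a list of sets of sample names.
    """
    clusters = [{name} for name in profiles]

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(dist(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average pairwise distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
    return clusters


# Hypothetical clone-fraction profiles (three clones) for three cell types in two mice.
profiles = {
    "B cells, mouse 1": [0.60, 0.30, 0.10],
    "B cells, mouse 2": [0.58, 0.31, 0.11],
    "Monocytes, mouse 1": [0.20, 0.25, 0.55],
    "Monocytes, mouse 2": [0.22, 0.24, 0.54],
    "Erythroid, mouse 1": [0.10, 0.70, 0.20],
    "Erythroid, mouse 2": [0.12, 0.68, 0.20],
}
for cluster in average_linkage(profiles, n_clusters=3):
    print(sorted(cluster))
```

In this toy data each cell type has a similar clonal composition in both mice, so samples group by cell type rather than by animal, which is the pattern the caption describes.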
HSC Behavior Is Highly Cell Autonomous
A major advantage of the HUe model is that we can measure and characterize the behavior of endogenous HSC in vivo, then selectively isolate live HSCs based on fluorescent tagging, transplant them into new hosts, and study their long-term behavior in competition or under varying stress conditions. This cannot be achieved by DNA barcoding or transposon insertion analyses because these methods require the destruction of cells. Transplanting equal aliquots of randomly fluorescent-tagged donor HSCs into 20–40 C57BL/6J recipients resulted in an unanticipated consistency of clonal behavior in recipients (p < 10^-16) (Figures 3A, S4C, and S5A–S5C). That is, the individual clones in the recipients behaved after transplant as they had as endogenous HSC in the donor in terms of cell proliferation (defined by clone size) and lineage commitment. Each color-defined clone behaved similarly in different recipients, consistently exhibiting cell activation, proliferation, and lineage differentiation characteristics distinct from the other clones. The individual HUe fluorescent profiles of different cell types (Figure 3A, e.g., B cells from recipient 1) collected from multiple recipients were analyzed using unbiased hierarchical clustering. Clustering of the HUe fluorescent profiles grouped the same cell types together even though they were from different recipients (Figure 3B). This demonstrates that the extent and consistency of clone-specific biases were sufficiently large to distinguish different hematopoietic cell types in recipient mice solely based on their clonal composition, as measured by the fluorescent distribution of each cell type. We termed the group of transplanted

recipients a “recipient cohort.” The consistency of behavior in a recipient cohort was striking and suggests cell autonomy governs the in vivo behavior of HSCs.

HSC Cell Autonomy Is Persistent upon Stress
Using recipient cohorts, we could then test how individual HSC clones respond to a particular stress or perturbation. We divided a recipient cohort into sub-cohorts that received either saline control or inflammatory stress (0.3 mg/kg lipopolysaccharide [LPS], intraperitoneal) (Figure 4A). Similarly, another recipient cohort was divided into sub-cohorts that received either no treatment or genotoxic stress (4.5 Gy total body irradiation) (Figure 4B). Stressed and non-stressed sub-cohorts were harvested and analyzed by flow cytometry. Notably, while individual clones behaved differently (decreased or increased in clone size) in response to stress, consistency of response for a given clone was again observed in all recipients (Figures 4C and 4D). For example, while both HSCs and progenitors responded to LPS, there was a significant reduction in the size of myeloid lineage clones at 12 hr immediately following LPS stress (Figure 4E) but an overall increase in the number of clones (Figure S5D, p < 0.036), consistent across the treated recipient cohort (Figure 4C, p < 0.001). In contrast to LPS, irradiation reduced overall hematopoietic clonal complexity (Figure S5E) and caused expansion of the remaining clones, most notable at day 44 post-radiation when the hematopoietic system had returned to homeostasis (Figure 4F). Again, we observed statistically significant consistency of clonal response at all levels of the hematopoietic hierarchy across all recipient cohorts subjected to irradiation (Figure 4D, p < 0.001).

Immunophenotypically Equivalent Stem Cell Subpopulations Have Distinct Functional Attributes that Are Associated with Distinct Transcriptional and Regulatory States
To explore the molecular mechanisms that underpin the remarkably consistent behavior observed in HSC clones, we examined epigenetic and transcriptional states of select clones in parallel with flow cytometric analysis of the functional output of these clones in terms of clonal expansion and lineage outcome. LT-HSCs (LineageLoSca+cKit+CD48-CD150+) belonging to two clones (“Cohort1.Y” and “Cohort1.R”) were picked, and their DNA methylation and transcriptome states were assessed using whole-genome bisulfite sequencing and RNA sequencing (RNA-seq) assays, respectively (Figure 5A). Analysis of clonal contributions to different lineages (Figure 5B) indicated that the exemplar Cohort1.R clone exhibited higher proliferation rates (i.e., contributed a higher than expected fraction of cells to the multipotent progenitor [MPP] compartment) (Figure 5C) and was biased toward myeloid differentiation (Figure 5D). By contrast, the Cohort1.Y clone showed lower proliferation rates and exhibited strong bias toward lymphoid production (i.e., contributed to the common lymphoid progenitor [CLP] but not common myeloid progenitor [CMP] compartment) (Figures 5C and 5D).
Comparing the DNA methylation patterns of the HSCs from the two isolated clones, we found that while both clones were in the epigenetic state of non-differentiated HSCs (Figure 6A), the Cohort1.R clone showed significantly higher DNA methylation at HSC-specific enhancers and promoters and lower transcriptional expression magnitude of such genes (Figure 6B). Consistently, the Cohort1.R clone showed higher expression of genes associated with HSC proliferation (Kittler et al., 2007; Venezia et al., 2004) and G1 phase (Oki et al., 2014) and lower expression of genes characteristic of unmobilized HSCs (Chambers et al., 2007; Forsberg et al., 2010) and G0 phase (Oki et al., 2014) compared to the Cohort1.Y clone (Figure 6C). These consistent patterns of epigenetic and transcriptional regulation closely reflected the enhanced proliferation rate observed for the dominant Cohort1.R clone by flow cytometry. The strong lymphoid bias observed for the Cohort1.Y clone was reflected in the epigenetic state of regulatory regions with significantly lower DNA methylation levels in CLP-specific enhancer regions (Figure 6D). However, no significant differences were observed in the expression magnitude of the CLP- or CMP-specific genes or the DNA methylation state of their promoters (not shown). These results suggest that physiological differences between clones, such as lineage bias, can arise due to distinct epigenetic configuration of the regulatory regions at the level of HSCs (Figure 6E). Furthermore, these differences may not manifest themselves in transcriptional differences until later stages of differentiation. Therefore, a “poised” equipotent state may be evident in the transcriptome, but the epigenome provides lineage-constraining boundaries within which lineage bias will eventually be resolved.
Given that the enhancer DNA methylation state was particularly informative about lineage bias, we also examined whether other aspects of epigenetic state, such as chromatin accessibility, are also informative about inter-clonal variation. We isolated an independent set of HSC clones (“Cohort2.G” and “Cohort2.P”) and, in addition to measuring gene expression and DNA methylation, applied assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) (Lara-Astiaso et al., 2014) to assess the genome-wide chromatin accessibility profile for each clone, in parallel with flow cytometric analysis of their functional output (Figure 7A). Analysis of clone size and clonal contribution to different lineages showed that while the Cohort2.G HSC clone was larger and contributed to both myeloid and lymphoid production, the Cohort2.P HSC clone was smaller and mostly contributed to lymphoid production (Figures 7B and 7C). Analysis of both chromatin accessibility and DNA methylation states of known enhancer regions confirmed that the regulatory state of the collected cells was more similar to that of HSCs than progenitor or effector cells (Figures 7D and 7E). The Cohort2.G clone exhibited epigenetic signatures of CMP-specific enhancers (Figure 7F), associated with a strong myeloid output as measured by flow cytometry (Figures 7B and 7C). The ATAC-seq differences were particularly prominent, revealing significantly higher chromatin accessibility of the CMP-specific enhancers in the Cohort2.G clone when compared between the two clones or to a set of CLP-specific enhancers within the Cohort2.G clone (Figure 7F). At the same time, analysis of RNA-seq data did not show significant differences in CMP/CLP transcriptional bias between the clones (data not shown).

[Figure 4 panels. (A and B) Treatment timelines for HUe recipient cohorts derived from Mx1-Cre;HUe donors: LPS versus saline control (A) and 4.5 Gy IR versus no-IR control (B), with harvest after 16 weeks of reconstitution. (C and D) Clonal difference matrices for LPS (C) and IR (D) across the SLAM, LKS, CLP, CMP, GMP, MEP, B220, CD3, Gr1, Mac1, and Ter119 compartments, with significance shading (P<0.001, P<0.01, P<0.05, P<0.10, not significant); inset density plot of the correlation of difference patterns (x axis, −1.0 to 1.0), with example Pearson R = 0.35 (ΔLKS vs. ΔLKS), R = 0.01 (ΔLKS vs. ΔCMP), and R = 0.41 (ΔCMP vs. ΔCMP). (E and F) Clone size difference (fraction of total cells) by compartment under LPS stress (E; control, 12 hr after LPS, 44 days after LPS) and IR stress (F; control, 14 days after IR, 44 days after IR).]
Figure 4. Hematopoietic Cell Autonomy Is Persistent upon Stress

(A) The experimental design to study hematopoietic response upon LPS-mediated inflammatory stress. A HUe recipient cohort was generated by transplanting
aliquots of the same mixture of randomly labeled HSPCs into 15 lethally irradiated C57BL/6J recipients. After 16 weeks of reconstitution, the cohort was divided
into three sub-cohorts. One sub-cohort received LPS injection 12 hr prior to analysis (12hrs), another sub-cohort at 44 days prior (44 day), and a third sub-cohort
received PBS treatment (control). All mice, including the control group, were sacrificed on the same day and assayed by flow cytometry.
(B) The experimental design to study the effect of genotoxic stress on hematopoietic clonal dynamics. An independent HUe recipient cohort was generated as in
(A). After 16 weeks of hematopoietic reconstitution, the cohort was divided into three sub-cohorts: a control group that received no irradiation, one group received
4.5 Gy irradiation 14 days prior to data collection (14 day), and another group received 4.5 Gy irradiation at 44 days prior to data collection (44 day). All mice,
including the control group, were harvested for data collection on the same day.
(C) To test for consistency of clonal response to LPS treatment among cohort recipients, we performed pairwise comparisons of the fluorescent clonal pattern for
each hematopoietic compartment (e.g., SLAM, LKS, CLP, etc.) and across each LPS-stressed and control animal. For all pairwise comparisons, the correlation of
the LPS-associated clonal changes was significantly higher within a compartment than between compartments. In other words, the clonal response to the LPS
insult was distinct among individual hematopoietic cell types, but was highly consistent across multiple HUe recipients (*p < 10^-7 significance when comparing a
given cell type with all other cell types).
(D) Consistency of the clonal response to irradiation treatment was assessed in the same way as in (C). For the vast majority of pairwise comparisons, the clonal
changes within a cell compartment showed significantly higher correlation than between cell compartments. Each cell type was significantly distinct when
compared against all other cell types (*p < 10^-7) but consistent across multiple recipients.
(E) LPS treatment led to reduction in output of existing clones. Illustration of the hematopoietic clonal response to LPS inflammatory stress at 12 hr and day 44 in
comparison to saline-treated controls at the stem, progenitor, and mature stages. To quantify the effect of LPS treatment, we performed pairwise comparison of
mice from LPS-treated and control groups, detecting parts of the fluorescent spectra showing statistically significant differences in cell density. The barplot
shows average change of cell numbers (measured as a fraction of the total number of cells measured) within such regions (whiskers show 95% confidence
interval). The negative values correspond to decrease in the cell counts relative to control group. The analysis shows that at 12 hr following LPS treatment, the
majority of existing clones were significantly reduced in size, accompanied by appearance of small new clones (Figure S5D). Such changes preferentially
impacted the HSC-MPP-CMP-GMP branch of hematopoiesis, and were largely attenuated at 44 days post LPS treatment.
(F) Irradiation triggered expansion of existing clones. The barplots show the prevalent direction of cell density changes at 14 or 44 days following irradiation
treatment. The changes were most pronounced at 44 days and showed widespread increase in the output of individual clones, significantly affecting most of the
hematopoietic cell types.
See also Figure S3.
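The within-versus-between comparison described in (C) and (D) can be sketched as follows. This is a simplified illustration, assuming each treated-versus-control comparison has been reduced to a vector of cell-density changes over a fixed set of color bins. The binning, the numbers, and the use of a plain Pearson coefficient are assumptions made for the example, and the published statistical treatment (Figure S3; STAR Methods) may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical treated-minus-control changes in cell density across five
# color bins, for two compartments measured in two treated/control mouse pairs.
delta = {
    ("LKS", "pair 1"): [-0.20, 0.05, 0.10, -0.02, 0.07],
    ("LKS", "pair 2"): [-0.18, 0.04, 0.12, -0.01, 0.03],
    ("CMP", "pair 1"): [0.06, -0.15, 0.01, 0.09, -0.01],
    ("CMP", "pair 2"): [0.05, -0.17, 0.02, 0.08, 0.02],
}

# Same compartment, different mouse pairs, versus different compartments.
within = pearson(delta[("LKS", "pair 1")], delta[("LKS", "pair 2")])
between = pearson(delta[("LKS", "pair 1")], delta[("CMP", "pair 1")])
print(f"within LKS: {within:.2f}, LKS vs CMP: {between:.2f}")
```

A consistently higher within-compartment correlation across all pairs is the pattern the caption reports; a full analysis would repeat this over every pair and compartment and test the difference formally.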

Figure 5. Interrogation of the Molecular Signature Associated with Distinct Functions of HSC Clones
(A) To examine molecular differences associated with phenotypically distinct HSC clones, LT-HSC cells belonging to two selected clones (Cohort1.Y and Cohort1.R)
were harvested from a HUe recipient cohort, subjected to RNA-seq (transcriptome), WGBS (DNA methylation) assays, and flow cytometric measurement
of multi-lineage reconstitution.

The analysis of molecular signatures underlying clonal biases can be limited by intra-clonal variability. To evaluate how the degree of transcriptional variability within each clone relates to the variability between clones, we performed single-cell RNA-seq analysis on HSCs associated with different clones in an independent cohort (see the STAR Methods). We found that, consistent with the bulk measurements, different clones showed statistically significant bias in their distribution within the transcriptional space. However, the extent of intra-clonal variation varied from one clone to another (Figure S6). For instance, while one of the clones was preferentially found outside of the transcriptional state with a mitotic signature, indicating overall lower cell-cycle frequency, there were cells from that clone that were also found within the mitotic state. Presence of high-coverage whole-genome bisulfite sequencing (WGBS) data also allowed us to check whether the clones isolated from each cohort showed notable genomic differences that could potentially impact their phenotype (see the STAR Methods). Despite good sensitivity of the approach, genomic copy number variation (CNV) analysis showed no difference between clones that belong to the same cohort (Figure S7), suggesting that the different HSC phenotypes we observed in these clones are a true biological phenomenon and not due to transgene-induced aberrant chromosome rearrangement. Together, these data show that HSC clones exhibit inter-clonal variation in behavior that is mirrored by the differences in their epigenetic state.

DISCUSSION

Our data demonstrate that endogenous clonal behavior can be quantitatively monitored in vivo under varying conditions and the clonal cells can be contemporaneously assessed for transcriptional and epigenetic characteristics. The results indicate that endogenous hematopoiesis is a composite of highly heterogeneous clones with very different cell kinetics, roughly balancing multipotent clones that are transient (generating cells for short intervals) with clones that provide persistent cell output. These data are consistent with the recent findings of Busch et al. (2015) where pulsed labeling of the pool of HSC was used to model cell kinetics. HSC were found to infrequently (1/110 HSC per day) generate downstream progeny producing blood cells while maintaining the HSC pool. Our data are also consistent with recent data in mouse (Verovskaya et al., 2014) and human (Biasco et al., 2016) showing that the majority of the hematopoietic population is sustained by a few major HSC clones despite the existence of smaller clones. This is in contrast to the report by Sun et al. (2014b), which suggests that murine hematopoiesis is maintained by thousands of progenitor clones rather than HSCs. This discrepancy in clone number is likely due to the limited sensitivity of our method and the inability to measure clone size in the transposon-based tagging method. We found that a few labeled clones support the production of progeny for up to 1 year, but we cannot estimate what proportion of the active HSC were labeled by our method. Of note, we did not find labeled progenitors without a corresponding labeled stem cell, but again our method does not have the ability to comprehensively scan for cell clones. Rather, we can define the attributes of a limited number of clones over time. Inferring from them, our data support a model where hematopoiesis is the composite of multipotent stem cells where some HSC persist for long intervals while others are transient. The advantages of such a composite model where clones turn cell production “on” and “off” or maintain cell production “on” seem self-evident as a means of sustaining hematopoiesis in the context of widely varying, and at times hostile, physiologic challenges.
In addition, our data point to an unanticipated stereotypical behavior of clones upon transplantation. While it has been previously reported that single isolated cells can be serially transplanted with retained behavior (Dykstra et al., 2007; Picelli et al., 2013), it is not clear that this behavior reflects the behavior of the endogenous cells. The system reported here enables comparison of endogenous and transplanted HSC. While the transient induction of interferon with pIpC used to activate the Mx-1 promoter may not be considered an unperturbed state, it is modest compared with transplantation, and it has been shown that HSC functions revert to baseline shortly after pIpC exposure (Essers et al., 2009). Overall, our data are consistent with behavioral features of individual HSC clones being established in development prior to young adulthood and persistently manifested under varying conditions.
Lineage bias and proliferative potency have been demonstrated previously (Dykstra et al., 2007; Morita et al., 2010; Muller-Sieburg et al., 2004; Picelli et al., 2013), including at the clonal level (Sun et al., 2014b). The data here indicate that multiple other characteristics including sensitivity to inflammation or radiation are also clone-specific features. The consistency of these characteristics despite clones residing in different hosts again points to rather remarkable cell autonomy. While stem/progenitors are generally thought of as relatively plastic cells with the capacity to respond variably to their specific environment, a wide range of behaviors appear to be highly constrained by cell intrinsic features. The functional characteristics of cell pools are therefore likely to reflect an ensemble phenotype of individual clones with much more bounded behaviors. These stereotyped behaviors have distinct
(B–D) Both long-term lineage contribution and clone size production of the two select LT-HSC (LineageLo Sca+ cKit+ CD48− CD150+) clones (Cohort1.Y and
Cohort1.R) to HSC, MPP, CLP, CMP, GMP, MEP, B cells, T cells, monocytes, granulocytes, and erythroid compartments were measured by flow cytometry (B
and D) and analyzed as described in Figure S3. The percentage of cells representing either Cohort1.Y or Cohort1.R among all fluorescent cells in each
hematopoietic compartment is shown. The Cohort1.R clone exhibited higher proliferation rate as it increased in size (density of cells) from HSC to MPP
compartment (C and D) and was present in all hematopoietic compartments particularly toward myelopoiesis. In contrast, the Cohort1.Y clone showed lower
proliferation rate (i.e., decreased clone density from HSC to MPP) and a strong presence in the CLP compartment (C), but reduced production in the CMP, GMP,
and downstream myeloid compartments.
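The per-compartment percentages described in this legend can be illustrated with a short calculation. The following is a minimal sketch, assuming that each clone's share is simply its cell count divided by the total count of fluorescent cells in that compartment; the actual analysis is the one described in Figure S3, and the function name and the cell counts below are hypothetical.

```python
def clone_percentages(counts_by_compartment):
    """Convert per-clone cell counts into percentages of all fluorescent cells.

    counts_by_compartment maps a compartment name (e.g. "HSC") to a dict of
    {clone_label: number_of_fluorescent_cells}. Counts here are hypothetical.
    """
    percentages = {}
    for compartment, counts in counts_by_compartment.items():
        total = sum(counts.values())
        if total == 0:
            # No fluorescent cells detected: report 0% rather than dividing by zero.
            percentages[compartment] = {clone: 0.0 for clone in counts}
        else:
            percentages[compartment] = {
                clone: 100.0 * n / total for clone, n in counts.items()
            }
    return percentages


# Hypothetical counts for two compartments (not data from the study).
example = {
    "HSC": {"Cohort1.Y": 30, "Cohort1.R": 10, "other": 60},
    "MPP": {"Cohort1.Y": 10, "Cohort1.R": 40, "other": 50},
}
shares = clone_percentages(example)
print(shares["MPP"]["Cohort1.R"])
```

A clone whose share rises from the HSC to the MPP compartment, as in the hypothetical Cohort1.R numbers above, corresponds to the "increase in clone density" that the legend uses as a readout of proliferation.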
Cell 167, 1310–1322, November 17, 2016 1317


molecular features at least as we defined by using the fluorescence system for isolation and analysis of clones with specific functions. While we assessed a limited number of cells with particular functional attributes, we envision that these data will encourage a more comprehensive analysis of clones with different lineage biases, proliferation, response to inflammation or tolerance of genotoxicity as we defined here, but include a far greater range of activities. From the data we have and the consistency of behaviors in multiple hosts, we anticipate that each of the functional clonal behaviors will have molecular signatures. Defining these signatures may provide both new insights into how in vivo HSC functions are governed and identify points where molecular manipulation can change specific in vivo outcomes.
Accomplishing a link of functional features with gene expression and epigenetic characterization is a challenging dimension of stem cell biology as noted and explored by others (Wilson et al., 2015). Single cell studies of function and gene expression have been conducted (Tsang et al., 2015; Wilson et al., 2015); however, they are generally performed on different cells within an immunophenotypically, not functionally or clonally, isolated subset. Similarly, detailed epigenetic characteristics have been assessed in populations of cells isolated by immunophenotype (Bock et al., 2012; Sun et al., 2014a). We have attempted to accomplish these analyses at a clonal level with clonal functional features defined in vivo. The result is that HSCs appear to have epigenetic features dictating how they will behave. The transcriptome was not consistently correlated with behavior. Rather, DNA methylation and chromatin accessibility data provide a clearer window into the ultimate function of specific clones. HSCs therefore appear to establish and retain a memory imposed on them prior to the testing we conducted in the young adult stage of organism development. This epigenetic memory is persistent and guides their function even with stress at times when they are presumably exposed to highly different exogenous conditions.
On their surface, the results of this study argue against the concept of the HSC as a plastic cell capable of different functions in response to particular organismal needs. Rather, our data indicate that HSC have clone-specific stereotyped behavior that is epigenetically constrained. Varied responses by the HSC pool to particular stresses may therefore reflect differential activation of clones of cells that each have their own predefined function, rather like a set of chess pieces. Achieving a nuanced, condition-specific response may then reflect differential activation of particular clones or particular combinations of clones.
The data also argue against at least some aspects of the niche hypothesis: that cells are dependent upon a specific microenvironment for their regulated self-renewal and differentiation. HSC behaviors such as lineage and proliferation outcomes were preserved even after transplantation into an independent host and, thereby, a presumably independent niche. It may be that the niche governs only fundamental aspects of HSC behavior like survival or aspects of cell state other than the ones we measured. However, the data are also consistent with a facultative model where the niche is generic and responds to stem/progenitor cells to provide the specific support functions dictated by the particular stem/progenitor cells it serves. The largely cell autonomous HSC might “condition” its own niche. An alternative model is that specific HSC clones find and localize to very specific niches that match their needs. There may be a heterogeneity among niches comparable in complexity to the HSC pool and transplantation succeeds when specific functional niches pair with specific functional HSC partners. Distinguishing between these alternatives will help define the relative importance of niche components to hematopoietic function and guide efforts to control cell production.
Figure 6. Immunophenotypically Equivalent HSCs Have Distinct Functional Attributes that Are Associated with Distinct Transcriptional and
Epigenetic Regulatory States
(A) The epigenetic state of both Cohort1.Y and Cohort1.R clones matched that expected of the HSCs. The DNA methylation state of enhancers activated
at different stages of hematopoiesis was examined in the two clones. Both clones showed equally low methylation levels at the enhancers active at the
HSC stage, with higher methylation observed at the enhancer regions activated at later MPP and CLP stages. Whiskers represent 95% confidence
interval.
(B) Higher proliferative bias of the Cohort1.R clone was apparent from its epigenetic state. Gene set enrichment analysis (GSEA) showed higher
DNA methylation of HSC-specific enhancers and lower methylation of MPP-specific enhancers in the Cohort1.R clone relative to the Cohort1.Y clone.
Similarly, Cohort1.R clone showed higher DNA methylation at HSC-specific and lower at MPP-specific promoter regions. Combined with the corre-
spondingly higher expression of MPP- and lower expression of HSC-specific genes in the Cohort1.R clone, all three types of molecular signatures reflect
higher proliferative bias of the Cohort1.R clone. In each GSEA plot, the genes (enhancer/promoters) are ranked according to their relative expression (DNA
methylation level) ratio between Cohort1.Y and Cohort1.R, with the highest Y/R ratios positioned on the left. The top plot shows rank sum statistics with
the point of maximum deviation from 0 considered to be the enrichment score of that set (red vertical line). The middle plot marks the positions of the genes
(promoters/enhancers) that belong to the set. The bottom plots show log2 fold ratio of expression (DNA methylation) magnitudes between Cohort1.Y and
Cohort1.R.
(C) GSEA showed higher expression of proliferation-associated genes and genes associated with G1 phase in the Cohort1.R clone compared to the
Cohort1.Y clone, consistent with higher relative contribution of the Cohort1.R clone to the MPP compartment observed in fluorescence data. Higher relative
expression of genes associated with unmobilized HSC and G0 phase signature was seen in the Cohort1.Y clone.
(D) Enhancer state reflected lymphoid-specific bias of the Cohort1.Y clone. Consistent with the pronounced lymphoid bias observed for the Cohort1.Y clone in
fluorescence data, the Cohort1.Y clone showed lower DNA methylation at CLP-specific enhancer elements and higher methylation at CMP-specific enhancers
relative to the Cohort1.R clone.
(E) Despite both Cohort1.R and Cohort1.Y clones having been immunophenotypically defined as HSCs, molecular profiling of their epigenetic and transcriptional
landscape revealed distinctive signatures reflective of their differential functional behavior. Consistent with its larger clone size, the Cohort1.R clone had
distinctive DNA methylation pattern at enhancer and promoter regions, as well as transcription of genes indicative of a proliferative cell state. In comparison, the
Cohort1.Y clone showed a pronounced lymphoid output and such lineage preference was manifested by lower DNA methylation of lymphoid-specific enhancer
regions, while no discernable pattern was detected in terms of promoter methylation or gene transcription.
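The rank-based statistic described in (B), where elements are ranked by their Y/R ratio and the enrichment score is the point of maximum deviation of a running sum from 0, can be sketched as follows. This is a minimal, unweighted illustration, assuming equal step sizes for set members and for non-members; it is not the authors' GSEA implementation, and the function name and the toy enhancer identifiers are hypothetical.

```python
def enrichment_score(ranked_ids, gene_set):
    """Return (score, index) for the maximum deviation of a running sum from 0.

    ranked_ids: identifiers (genes, promoters or enhancers) ordered from the
                highest to the lowest ratio between the two clones.
    gene_set:   identifiers belonging to the set being tested.
    Unweighted variant (an assumption): each member adds 1/n_hit and each
    non-member subtracts 1/n_miss, so the running sum ends at 0.
    """
    members = set(gene_set)
    n_hit = sum(1 for g in ranked_ids if g in members)
    n_miss = len(ranked_ids) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("the ranked list must contain both members and non-members")

    running, best, best_idx = 0.0, 0.0, -1
    for i, g in enumerate(ranked_ids):
        running += 1.0 / n_hit if g in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best, best_idx = running, i
    return best, best_idx


# Toy ranking of six hypothetical enhancers, highest Y/R ratio first.
ranking = ["enh1", "enh2", "enh3", "enh4", "enh5", "enh6"]
score, position = enrichment_score(ranking, {"enh1", "enh2"})
print(score, position)
```

A positive score indicates that the set is concentrated toward the left of the ranking (higher in Cohort1.Y under the legend's convention), and a negative score indicates concentration toward the right. Assessing significance would additionally require a permutation or comparable null model, which this sketch omits.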

Figure 7. Enhancer Methylation State Reflects Functional Differences between HSC Clones
(A) In a second independent experiment, LT-HSC cells belonging to two independently selected clones (Cohort2.G and Cohort2.P) were harvested from an independent HUe recipient cohort, subjected to RNA-seq (transcriptome), WGBS (DNA methylation), and ATAC-seq (chromatin accessibility) assays, as well as flow cytometric measurement of clone size and multi-lineage reconstitution.
(B and C) We assessed both long-term lineage contribution and clone size production of the Cohort2.G and Cohort2.P clones toward myeloid (Mac+) and lymphoid (B220+) lineages by flow cytometry. The Cohort2.G clone had increased clone size (density of cells) at the HSC stage (C). While it contributed moderately to lymphoid cells, it had a strong myeloid output (C), consistent with the Z score heatmap indicating statistically significant (p < 10⁻³) bias of the Cohort2.G clone toward the myeloid lineage (B). In comparison, the Cohort2.P HSC clone was smaller, contributed moderately to lymphoid cells and had reduced production in myeloid cells.
(D) Both clones exhibited chromatin methylation pattern representative of LT-HSC and ST-HSC but not progenitors at the lineage-specific enhancer regions.
(E) Average chromatin accessibility, as measured by the ATAC-seq assay, at the lineage-specific enhancer regions in the Cohort2.G and Cohort2.P clones (two replicate measurements are shown for each clone). Consistent with the DNA methylation results shown in D, ATAC-seq assay indicated highest average accessibility at enhancers associated with LT- and ST-HSC states.
(F) Analogous to (E), Cohort2.G clone showed higher accessibility of the CMP-specific enhancers (relative to CLP-specific enhancers, and relative to the CMP-specific enhancers in the Cohort2.P clone), consistent with the strong myeloid bias observed for the Cohort2.G clone in the flow cytometric measurements.
(D)–(F) Whiskers give 95% confidence interval.
See also Figures S3, S6, and S7 and Table S2.
STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

● KEY RESOURCES TABLE
● CONTACT FOR REAGENTS AND RESOURCE SHARING
● EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ○ Generation of the HUe Mouse Model
  ○ Mouse Models
● METHOD DETAILS
  ○ Flow Cytometry
  ○ Bone Marrow Aspiration
  ○ Generation of the HUe Recipient Cohort
  ○ LPS Stress Experiment
  ○ Irradiation Stress Experiment
  ○ Whole Genome Bisulfite DNA Sequencing (WGBS) and Bulk RNA-Seq
  ○ Single-cell RNA-Seq
  ○ ATAC-Seq
● QUANTIFICATION AND STATISTICAL ANALYSIS
  ○ Identifying Significant Differences between Fluorescence Patterns
  ○ Enumerating Individual Clones
  ○ Whole-Genome Bisulfite Sequencing (WGBS) Processing
  ○ CNV Analysis of WGBS Data
  ○ Computational Validation of the HSC States of the Selected Clones
  ○ Gene Set Enrichment Analysis for Testing Cell Type Bias of Clones
● DATA AND SOFTWARE AVAILABILITY

SUPPLEMENTAL INFORMATION

Supplemental Information includes seven figures and two tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.045.

AUTHOR CONTRIBUTIONS

V.W.C.Y. created the mouse model, designed concept, performed experiments, interpreted data, and wrote the manuscript. R.Z.R. contributed flow cytometric expertise and participated in flow experiments. T.O. performed gene set enrichment analysis. J.W. provided biocomputational contribution. B.S. and C.C. participated in multiple experiments in this project. N.B. participated in single-cell isolation. H.G. prepared DNA libraries for sequencing. X.W. and M.Z. contributed to DNA sequence analysis. P.V.K. designed the approach for statistical analysis of flow cytometric data, performed DNA and RNA sequence analysis, and was involved in manuscript writing. C.L. provided expertise on optical physics and intra-vital imaging. E.L. carried out CNV analysis of WGBS data. A.M. provided expertise on epigenetic experiments. D.T.S. supervised the project and was involved in concept design, data interpretation, and manuscript writing.

ACKNOWLEDGMENTS

We are grateful to Drs. Jeff W. Lichtman, Jean Livet, and Joshua R. Sanes for their kind gifts of the Brainbow fluorescent vectors. We also thank Laura Prickett, Kathryn E. Folz-Donahue, and Meredith Weglarz at the Flow Cytometry Core Facility of the Harvard Stem Cell Institute for their technical assistance. We are grateful for support from Science for Life Laboratory, the Knut and Alice Wallenberg Foundation, and the National Genomics Infrastructure, funded by the Swedish Research Council, for assistance with single-cell RNA-seq measurements. This work was supported by NIH grants 1R21HL126070-01A1 to V.W.C.Y., DK103074 and CA193461 and the Gerald and Darlene Jordan Chair to D.T.S., and the Ellison Medical Foundation AG-NS-0965-12 and NIA 5K25AG037596 to P.V.K.

Revised: August 9, 2016
Accepted: October 25, 2016
Published: November 17, 2016

REFERENCES

Aiuti, A., Biasco, L., Scaramuzza, S., Ferrua, F., Cicalese, M.P., Baricordi, C., Dionisio, F., Calabria, A., Giannelli, S., Castiello, M.C., et al. (2013). Lentiviral hematopoietic stem cell gene therapy in patients with Wiskott-Aldrich syndrome. Science 341, 1233151.
Anders, S., and Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biol. 11, R106.
Biasco, L., Pellin, D., Scala, S., Dionisio, F., Basso-Ricci, L., Leonardelli, L., Scaramuzza, S., Baricordi, C., Ferrua, F., Cicalese, M.P., et al. (2016). In vivo tracking of human hematopoiesis reveals patterns of clonal dynamics during early and steady-state reconstitution phases. Cell Stem Cell 19, 107–119.
Bock, C., Beerman, I., Lien, W.H., Smith, Z.D., Gu, H., Boyle, P., Gnirke, A., Fuchs, E., Rossi, D.J., and Meissner, A. (2012). DNA methylation dynamics during in vivo differentiation of blood and skin stem cells. Mol. Cell 47, 633–647.
Busch, K., Klapproth, K., Barile, M., Flossdorf, M., Holland-Letz, T., Schlenner, S.M., Reth, M., Höfer, T., and Rodewald, H.R. (2015). Fundamental properties of unperturbed haematopoiesis from stem cells in vivo. Nature 518, 542–546.
Chambers, S.M., Boles, N.C., Lin, K.Y., Tierney, M.P., Bowman, T.V., Bradfute, S.B., Chen, A.J., Merchant, A.A., Sirin, O., Weksberg, D.C., et al. (2007). Hematopoietic fingerprints: an expression database of stem cells and their progeny. Cell Stem Cell 1, 578–591.
Ding, L., Ley, T.J., Larson, D.E., Miller, C.A., Koboldt, D.C., Welch, J.S., Ritchey, J.K., Young, M.A., Lamprecht, T., McLellan, M.D., et al. (2012). Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510.
Dykstra, B., Kent, D., Bowie, M., McCaffrey, L., Hamilton, M., Lyons, K., Lee, S.J., Brinkman, R., and Eaves, C. (2007). Long-term propagation of distinct hematopoietic differentiation programs in vivo. Cell Stem Cell 1, 218–229.
Dykstra, B., Olthof, S., Schreuder, J., Ritsema, M., and de Haan, G. (2011). Clonal analysis reveals multiple functional defects of aged murine hematopoietic stem cells. J. Exp. Med. 208, 2691–2703.
Essers, M.A., Offner, S., Blanco-Bose, W.E., Waibler, Z., Kalinke, U., Duchosal, M.A., and Trumpp, A. (2009). IFNalpha activates dormant haematopoietic stem cells in vivo. Nature 458, 904–908.
Fan, J., Salathia, N., Liu, R., Kaeser, G.E., Yung, Y.C., Herman, J.L., Kaper, F., Fan, J.B., Zhang, K., Chun, J., and Kharchenko, P.V. (2016). Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis. Nat. Methods 13, 241–244.
Forsberg, E.C., Passegué, E., Prohaska, S.S., Wagers, A.J., Koeva, M., Stuart, J.M., and Weissman, I.L. (2010). Molecular signatures of quiescent, mobilized and leukemia-initiating hematopoietic stem cells. PLoS ONE 5, e8785.
Gerrits, A., Dykstra, B., Kalmykowa, O.J., Klauke, K., Verovskaya, E., Broekhuis, M.J., de Haan, G., and Bystrykh, L.V. (2010). Cellular barcoding tool for clonal analysis in the hematopoietic system. Blood 115, 2610–2618.
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589.
Jordan, C.T., and Lemischka, I.R. (1990). Clonal and systemic analysis of long-term hematopoiesis in the mouse. Genes Dev. 4, 220–232.
Kharchenko, P.V., Tolstorukov, M.Y., and Park, P.J. (2008). Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat. Biotechnol. 26, 1351–1359.
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36.
Kittler, R., Pelletier, L., Heninger, A.K., Slabicki, M., Theis, M., Miroslaw, L., Poser, I., Lawo, S., Grabner, H., Kozak, K., et al. (2007). Genome-scale RNAi profiling of cell division in human tissue culture cells. Nat. Cell Biol. 9, 1401–1412.
Kühn, R., Schwenk, F., Aguet, M., and Rajewsky, K. (1995). Inducible gene targeting in mice. Science 269, 1427–1429.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359.
Lara-Astiaso, D., Weiner, A., Lorenzo-Vivas, E., Zaretsky, I., Jaitin, D.A., David, E., Keren-Shaul, H., Mildner, A., Winter, D., Jung, S., et al. (2014). Immunogenetics. Chromatin state dynamics during blood formation. Science 345, 943–949.
Lemischka, I.R. (1993). Retroviral lineage studies: some principals and applications. Curr. Opin. Genet. Dev. 3, 115–118.
Lemischka, I.R., Raulet, D.H., and Mulligan, R.C. (1986). Developmental potential and dynamic behavior of hematopoietic stem cells. Cell 45, 917–927.
Livet, J., Weissman, T.A., Kang, H., Draft, R.W., Lu, J., Bennis, R.A., Sanes, J.R., and Lichtman, J.W. (2007). Transgenic strategies for combinatorial expression of fluorescent proteins in the nervous system. Nature 450, 56–62.
Lu, R., Neff, N.F., Quake, S.R., and Weissman, I.L. (2011). Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Nat. Biotechnol. 29, 928–933.
Lu, F., Liu, Y., Jiang, L., Yamaguchi, S., and Zhang, Y. (2014). Role of Tet proteins in enhancer activity and telomere elongation. Genes Dev. 28, 2103–2119.
Mazurier, F., Gan, O.I., McKenzie, J.L., Doedens, M., and Dick, J.E. (2004). Lentivector-mediated clonal tracking reveals intrinsic heterogeneity in the human hematopoietic stem cell compartment and culture-induced stem cell impairment. Blood 103, 545–552.
Morita, Y., Ema, H., and Nakauchi, H. (2010). Heterogeneity and hierarchy within the most primitive hematopoietic stem cell compartment. J. Exp. Med. 207, 1173–1182.
Muller-Sieburg, C.E., Cho, R.H., Karlsson, L., Huang, J.F., and Sieburg, H.B. (2004). Myeloid-biased hematopoietic stem cells have extensive self-renewal capacity but generate diminished lymphoid progeny with impaired IL-7 responsiveness. Blood 103, 4111–4118.
Notta, F., Mullighan, C.G., Wang, J.C., Poeppl, A., Doulatov, S., Phillips, L.A., Ma, J., Minden, M.D., Downing, J.R., and Dick, J.E. (2011). Evolution of human BCR-ABL1 lymphoblastic leukaemia-initiating cells. Nature 469, 362–367.
Oki, T., Nishimura, K., Kitaura, J., Togami, K., Maehara, A., Izawa, K., Sakaue-Sawano, A., Niida, A., Miyano, S., Aburatani, H., et al. (2014). A novel cell-cycle-indicator, mVenus-p27K-, identifies quiescent cells and visualizes G0-G1 transition. Sci. Rep. 4, 4012.
Picelli, S., Björklund, A.K., Faridani, O.R., Sagasser, S., Winberg, G., and Sandberg, R. (2013). Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098.
Shi, P.A., Hematti, P., von Kalle, C., and Dunbar, C.E. (2002). Genetic marking as an approach to studying in vivo hematopoiesis: progress in the non-human primate model. Oncogene 21, 3274–3283.
Snippert, H.J., van der Flier, L.G., Sato, T., van Es, J.H., van den Born, M., Kroon-Veenboer, C., Barker, N., Klein, A.M., van Rheenen, J., Simons, B.D., and Clevers, H. (2010). Intestinal crypt homeostasis results from neutral competition between symmetrically dividing Lgr5 stem cells. Cell 143, 134–144.
Snodgrass, R., and Keller, G. (1987). Clonal fluctuation within the haematopoietic system of mice reconstituted with retrovirus-infected stem cells. EMBO J. 6, 3955–3960.
Sun, D., Luo, M., Jeong, M., Rodriguez, B., Xia, Z., Hannah, R., Wang, H., Le, T., Faull, K.F., Chen, R., et al. (2014a). Epigenomic profiling of young and aged HSCs reveals concerted changes during aging that reinforce self-renewal. Cell Stem Cell 14, 673–688.
Sun, J., Ramos, A., Chapman, B., Johnnidis, J.B., Le, L., Ho, Y.J., Klein, A., Hofmann, O., and Camargo, F.D. (2014b). Clonal dynamics of native haematopoiesis. Nature 514, 322–327.
Tsang, J.C., Yu, Y., Burke, S., Buettner, F., Wang, C., Kolodziejczyk, A.A., Teichmann, S.A., Lu, L., and Liu, P. (2015). Single-cell transcriptomic reconstruction reveals cell cycle and multi-lineage differentiation defects in Bcl11a-deficient hematopoietic stem cells. Genome Biol. 16, 178.
Venezia, T.A., Merchant, A.A., Ramos, C.A., Whitehouse, N.L., Young, A.S., Shaw, C.A., and Goodell, M.A. (2004). Molecular signatures of proliferation and quiescence in hematopoietic stem cells. PLoS Biol. 2, e301.
Verovskaya, E., Broekhuis, M.J., Zwart, E., Weersing, E., Ritsema, M., Bosman, L.J., van Poele, T., de Haan, G., and Bystrykh, L.V. (2014). Asymmetry in skeletal distribution of mouse hematopoietic stem cell clones and their equilibration by mobilizing cytokines. J. Exp. Med. 211, 487–497.
Wilson, N.K., Kent, D.G., Buettner, F., Shehata, M., Macaulay, I.C., Calero-Nieto, F.J., Sánchez Castillo, M., Oedekoven, C.A., Diamanti, E., Schulte, R., et al. (2015). Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell 16, 712–724.
Wu, Y., Zhou, H., Fan, X., Zhang, Y., Zhang, M., Wang, Y., Xie, Z., Bai, M., Yin, Q., Liang, D., et al. (2015). Correction of a genetic disease by CRISPR-Cas9-mediated gene editing in mouse spermatogonial stem cells. Cell Res. 25, 67–79.
Xi, R., Lee, S., Xia, Y., Kim, T.M., and Park, P.J. (2016). Copy number analysis of whole-genome data using BIC-seq2 and its application to detection of cancer susceptibility variants. Nucleic Acids Res. 44, 6274–6286.

STAR★METHODS
KEY RESOURCES TABLE

Antibodies
B220-biotin Biolegend Cat#:103204; RRID: AB_312989
CD3e-biotin Biolegend Cat#:100244; RRID: AB_2563947
CD4-biotin Biolegend Cat#:100508; RRID: AB_312711
CD8a-biotin Biolegend Cat#:100704; RRID: AB_312743
CD19-biotin Biolegend Cat#:115504; RRID: AB_313639
CD11b-biotin Biolegend Cat#:101204; RRID: AB_312787
Gr1-biotin Biolegend Cat#:108404; RRID: AB_313369
Ter119-biotin Biolegend Cat#:116204; RRID: AB_313705
CD11c-biotin Biolegend Cat#:117304; RRID: AB_313773
NK1.1-biotin Biolegend Cat#:108704; RRID: AB_313391
Streptavidin-Pacific Orange Invitrogen Cat#:S32365
cKit-APC-Cy7 Biolegend Cat#:105826; RRID: AB_1626278
Sca-PE-Cy7 Biolegend Cat#:108114; RRID: AB_493596
Sca-PE-Cy5 Biolegend Cat#:108110; RRID: AB_313347
CD48-APC Biolegend Cat#:103412; RRID: AB_57199
CD150-PE-Cy5 Biolegend Cat#:115912; RRID: AB_493598
CD127-PE-Cy7 eBioscience Cat#:25-1271-82; RRID: AB_469649
CD16/32-PE-Cy7 eBioscience Cat#:25-0161-82; RRID: AB_469598
CD34-efluor660 eBioscience Cat#:50-0341-82; RRID: AB_10596826
B220-APC eBioscience Cat#:17-0452-83; RRID: AB_469396
CD3-APC Biolegend Cat#:100236; RRID: AB_2561456
Mac1-APC Biolegend Cat#:101212
Gr1-APC Biolegend Cat#:108412; RRID: AB_313377
Ter119-APC Biolegend Cat#:116212; RRID: AB_313713
Polyinosinic:polycytidylic acid Amersham Cat#:24939-03-5
Isoflurane Sigma-Aldrich Cat#:792632-250MG
Lipopolysaccharide from E. coli 055:B5 Sigma-Aldrich Cat#:L2880-10MG
Medium 199 ThermoFisher Scientific Cat#:11043023
Fetal bovine serum ThermoFisher Scientific Cat#:10082139
RNase Out ThermoFisher Scientific Cat#:10777019
Calcein AM ThermoFisher Scientific Cat#:C3099
Propidium iodide ThermoFisher Scientific Cat#:P3566
NEBNext High-Fidelity 2X PCR Master Mix New England Biolabs Cat#:M0541L
Invitrogen SYBR Green I Dye Invitrogen Cat#:S7563
4OH-tamoxifen Sigma Cat#:T-5648
Progesterone Sigma Cat#:P-3972
Smart-seq2 protocol Picelli et al., 2013 N/A
Nextera XT Illumina Cat#:FC-131-1024
NuGen’s Ovation Ultralow Methyl-Seq Library Systems NuGen Cat#:0335
QIAGEN Epitect Bisulfite kit QIAGEN Cat#:59104
Illumina Nextera DNA Preparation Kit Illumina Cat#:FC-121-1030
QIAGEN MinElute PCR purification Kit QIAGEN Cat#:28004
Deposited Data
HUe RNA-Seq data GEO datasets http://www.ncbi.nlm.nih.gov/gds GEO: GSE60101
129P2 WGBS data GEO datasets http://www.ncbi.nlm.nih.gov/gds GEO: GSE56986
BALBcJ WGBS data GEO datasets http://www.ncbi.nlm.nih.gov/gds GEO: GSE60485
129P2 sequence structure annotation files http://www.sanger.ac.uk/science/programmes/mouse-and-zebrafish-genetics 129P2
BALBcJ sequence structure annotation files http://www.sanger.ac.uk/science/programmes/mouse-and-zebrafish-genetics BALBcJ
Referenced RNA-Seq data of hematopoietic subtypes Lara-Astiaso et al., 2014 N/A
HUe mouse This paper N/A
C57BL6/J Jackson Laboratories 000664
Mx1-Cre Jackson Laboratories 003556
Prx1-CreER This paper N/A
Col(II)-CreER Jackson Laboratories 006774
Recombinant DNA
HUe fluorescent cassette Livet et al., 2007 #6 – CMV-Brainbow-2.1 ‘‘R’’ [GYRC]
pCAG promoter www.addgene.org Plasmid #13777
LoxP-STOP-LoxP-TOPO sequence www.addgene.org Plasmid #11584
LoxSL: ATAACTTCGTATA GTTCAACT TATACGAAGTTAT This paper N/A
Custom primers used for ATAC-Seq This paper Table S2
FlowJo software TreeStar Inc. N/A
PAGODA package v1.99.3 Fan et al., 2016 http://pklab.med.harvard.edu/scde
MASS R package N/A https://cran.r-project.org/
MClust R package N/A https://cran.r-project.org/
BSMap 2.7 N/A N/A
BIC-seq2 Xi et al., 2016 http://www.math.pku.edu.cn/teachers/xirb/downloads/software/BICseq2/BICseq2.html
bowtie2 Johns Hopkins University http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
tophat2 Johns Hopkins University https://ccb.jhu.edu/software/tophat/index.shtml
DESeq Anders and Huber, 2010 https://www.bioconductor.org/
Homer package Heinz et al., 2010 http://homer.salk.edu/homer/
SPP package 1.13 Kharchenko et al., 2008 http://compbio.med.harvard.edu/Supplements/ChIP-seq/

Other
BD FACSAria II Cell Sorter BD Biosciences N/A
Cesium 137 irradiator N/A N/A
Illumina NextSeq 500 instrument Illumina N/A
CONTACT FOR REAGENTS AND RESOURCE SHARING
Requests should be addressed to and will be fulfilled by Lead Contact David T. Scadden (david_scadden@harvard.edu).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Generation of the HUe Mouse Model
Development of the HUe transgenic construct utilized a fluorescent cassette that was originally adapted for the neural system (Livet
et al., 2007), with some modifications. In brief, a STOP sequence flanked by a pair of LoxP variants was inserted in front of a fluo-
rescent cassette containing GFP, EYFP, tDimer2, and Cerulean cDNA sequence interspersed by multiple LoxP sites such that no
background fluorescence was expressed in the absence of Cre recombinase (Figure S1). The whole construct was placed under
a ubiquitous chicken beta actin promoter and the linearized construct was microinjected into C57BL6/J embryos to generate trans-
genic mouse lines. Six founder lines were established and fluorescence was observed in multiple founders. Experiments performed
in this study used founder six, which has approximately 20 copies of transgene insertion.
Mouse Models
HUe, Prx1-CreER, Mx1-Cre, Col(II)-CreER, and C57BL6/J strains were used and cross-bred as needed in this study. Mouse strains
HUe and Prx1-CreER were made in-house, while B6.Cg-Tg(Mx1-cre)1Cgn/J (Mx1-Cre), FVB-Tg(Col2a1-cre/ERT)KA3Smac/J
(Col(II)-CreER), and C57BL6/J were obtained from Jackson Laboratory. To image labeled cells of limb bud mesenchyme,
Prx1-CreER was crossed with HUe to create Prx1-CreER;HUe. 2 mg of 4OH-tamoxifen and 1 mg of progesterone were injected
into < 20 g pregnant females at E18.5. Mice were sacrificed for imaging at post-natal day 1 to 1 month of age. To image cells of
labeled cartilage, Col(II)-CreER was crossed with HUe to create Col(II)-CreER;HUe. Col(II)-CreER;HUe mice at 2 months of age were
injected with 2 mg of 4OH-tamoxifen and sacrificed for imaging 2-4 weeks post-injection. To study hematopoiesis, Mx1-Cre was
crossed with HUe to create the Mx1-Cre;HUe strain. To induce hematopoietic cell labeling, Mx1-Cre;HUe mice were injected with
12.5 μg pIpC/g body weight at two weeks of age. For most transplantation studies, 6- to 8-month-old Mx1-Cre;HUe and C57BL6/J mice
were used. To track endogenous hematopoiesis, bone marrow aspirates were obtained from Mx1-Cre;HUe mice at 2, 3, 5, and
10 months of age. For all studies, age-matched littermates were used as experimental controls. All animal housing, usage, and
procedures performed were approved by the Institutional Animal Care and Use Committee of Massachusetts General Hospital.
METHOD DETAILS
Flow Cytometry
For each mouse, tibiae, femurs, iliac crests, and spines were collected for bone marrow cells. Isolation and enumeration of different
hematopoietic cell types was performed by flow cytometry. Bone marrow cells harvested from each animal were ACK lysed before
antibody staining. We routinely stain 5 × 10^7 cells per sample for the stem population, and 1 × 10^7 cells per sample for each progenitor
and mature population. Lineage cocktail consists of biotinylated B220, CD3e, CD4, CD8a, CD19, CD11b, Gr1, Ter119, CD11c, and
NK1.1 antibodies. Fluorophore-conjugated streptavidin was used to detect the lineage cocktail. Using the following antibody
combinations, we were able to identify hematopoietic subpopulations at the stem-cell level: hematopoietic stem cells (HSCs)
(Lineage-Pacific Orange, cKit-APC-Cy7, Sca-PE-Cy7, CD48-APC, CD150-PE-Cy5); at the stem/progenitor level: multipotent progenitor cells
(MPPs) (Lineage-Pacific Orange, cKit-APC-Cy7, Sca-PE-Cy5), common lymphoid progenitors (CLPs) (Lineage-Pacific Orange,
cKit-APC-Cy7, Sca-PE-Cy5, CD127-PE-Cy7), common myeloid progenitors (CMPs), granulocyte macrophage progenitors (GMPs), and
megakaryocyte erythroid progenitors (MEPs) (all three with Lineage-Pacific Orange, cKit-APC-Cy7, Sca-PE-Cy5, CD16/32-PE-Cy7,
CD34-efluor660); as well as mature lineages: B cells (B220-APC), T cells (CD3-APC), monocytes (Mac1-APC), granulocytes (Gr1-APC), and
erythroid cells (Ter119-APC). Cells were analyzed using a BD FACSAria II Cell Sorter equipped with ultraviolet, violet, blue, yellow/green, and red lasers.
Bone Marrow Aspiration

Bone marrow aspiration was performed under full body anesthesia using 3% isoflurane and 2 L/min O2. Fur was removed from the
knee joint to expose intact skin. A PBS-wetted 27-gauge needle coupled with a 1 mL syringe was inserted from the femur-tibial joint
longitudinally into the bone marrow cavity of the tibia with negative pressure applied to extract 10 μl of bone marrow. After the
surgical procedure, mice were placed under a heat lamp for 3-5 min to aid recovery from anesthesia and were monitored daily for any signs
of discomfort following the Pain Assessment Protocol published by the National Research Council (US) Committee. All animal usage
and procedures performed were approved by the Institutional Animal Care and Use Committee of Massachusetts General Hospital.
Generation of the HUe Recipient Cohort

pIpC-induced Mx1-Cre;HUe mice at 6-8 weeks old were used as donors. Bone marrow cells from 8-12 donors were pooled,
lineage-depleted, and flow sorted for LineageLoSca+cKit+ cells before transplantation. One hundred thousand flow-sorted fluorescent
LineageLoSca+cKit+ cells mixed with 500,000 Sca- C57BL/6J support cells were transplanted into each of 20 lethally irradiated C57BL/6J
recipients. Sixteen weeks were allowed for hematopoietic reconstitution. The experiment was repeated 6-8 times.
LPS Stress Experiment

A HUe recipient cohort (15 mice) was divided into three sub-cohorts: saline control (5 mice), treatment at 12 hr prior to tissue harvest
(5 mice), and 44 days prior (5 mice). Mice received an intraperitoneal injection of either PBS or 0.3 mg/kg BW LPS at the respective
time points, and all three groups of mice were euthanized for bone marrow harvest on the same day. Bone marrow cells were stained
with antibodies summarized in Table S1 to identify HSCs, CLPs, GMPs, MEPs, B cells, T cells, monocytes, granulocytes, and erythroid
cells by flow cytometry (n = 5 for each data point). The experiment was independently performed twice with different recipient cohorts.
Irradiation Stress Experiment

A HUe recipient cohort (15 mice) was divided into three sub-cohorts including no treatment control (5 mice), 4.5 Gy irradiation at
14 days prior to tissue harvest (5 mice), and 4.5 Gy irradiation 44 days prior (5 mice). All mice were euthanized for bone marrow
harvest on the same day. Bone marrow cells were stained with antibodies to identify HSCs, CLPs, GMPs, MEPs, B cells, T cells,
monocytes, granulocytes, and erythroid cells by flow cytometry (n = 5 for each data point). The experiment was independently performed twice
with different recipient cohorts.
Whole Genome Bisulfite DNA Sequencing (WGBS) and Bulk RNA-Seq

A HUe recipient cohort of 36 mice was divided into two sub-cohorts: one group for monitoring lineage output using flow cytometry,
one group for isolation of LT-HSC clones. Cells from a red and a yellow LT-HSC (LineageLoSca+cKit+CD48-CD150+) clone were flow
sorted from multiple recipient cohort mice by flow cytometry and subjected to whole genome bisulfite DNA sequencing (WGBS) and
RNA-seq. For WGBS, purified genomic DNA was fragmented to a size range of 100-400 bp, and 20-50 ng of DNA fragments from each
sample were end-repaired and ligated with indexed adapters using NuGEN's Ovation Ultralow Methyl-Seq Library Systems.
Adaptor-equipped DNA fragments were subjected to bisulfite treatment using the QIAGEN EpiTect Bisulfite Kit following the
manufacturer's recommendations with modifications. Bisulfite-converted library DNA was PCR-amplified and sequenced at the Broad Institute
Genomics Platform. For RNA-seq, RNA was extracted from each sample, reverse transcribed to cDNA, and subjected to whole
genome sequencing at Partners Healthcare Center for Personalized Genetic Medicine. Each data point represents triplicate samples.
The differential expression analysis was performed using CuffDiff 2.0, and the GSEA analysis was conducted on the resulting
log-fold change values.
Single-cell RNA-Seq
HUe mice were induced by pIpC two weeks before single-cell sort of LKS SLAM cells. Whole bone marrow was isolated from femurs
and tibiae by gently crushing bones, filtered through a 40 μm cell strainer, and resuspended in Media 199 (ThermoFisher Scientific)
supplemented with 2% fetal bovine serum (ThermoFisher Scientific) and RNase Out (ThermoFisher Scientific). Cells were stained for LKS
SLAM markers, Calcein AM (ThermoFisher Scientific), and propidium iodide for cell viability detection. Single cells were sorted using
a BD FACSAria II (BD Biosciences) into 384-well PCR plates (ThermoFisher Scientific) containing standard lysis buffer. Whole
transcriptome amplification was performed using the Smart-seq2 protocol (Picelli et al., 2013), and libraries were prepared with Nextera XT
(Illumina). Samples were pooled and sequenced on an Illumina NextSeq 500 instrument using 50 bp paired-end reads. The analysis
was carried out using the PAGODA package v1.99.3 (Fan et al., 2016).
ATAC-Seq
We used an independent cohort of HUe recipient mice to select new clones for a second set of experiments that integrated ATAC-,
DNA-, and RNA-seq analysis in correlation with HSC behavior. A new HUe recipient cohort of 32 mice was divided into two
sub-cohorts: one group for monitoring lineage output using flow cytometry, and one group for isolation of LT-HSC clones. Two new LT-HSC
(LineageLoSca+cKit+CD48-CD150+) clones (Red2 and Yellow2) were isolated from multiple recipient cohort mice by flow cytometry
and subjected to ATAC-Seq, WGBS, and RNA-seq. For ATAC-Seq, 10,000 cells of each clone were lysed in 50 μL of cold lysis buffer
(10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630) and immediately subjected to a transposition reaction at
37°C for 30 min with 2.5 μL transposase enzyme (Illumina Nextera DNA Preparation Kit). Transposed DNA was purified using the QIAGEN
MinElute PCR Purification Kit and subjected to library amplification using NEBNext High-Fidelity 2X PCR Master Mix, Invitrogen
SYBR Green I Dye, and primers (Table S2). Prior to sequencing, the ATAC-Seq library was assayed for quality using TapeStation
and BioAnalyzer instruments, and qPCR. WGBS and RNA-seq libraries were prepared as described previously. All libraries of this
second set of experiments were sequenced at the Bauer Core Facility of Harvard FAS Center for Systems Biology.
Identifying Significant Differences between Fluorescence Patterns

The fluorescent patterns between any two samples were compared based on the density of cells on a unit sphere in the space of color
composition. The uncertainty of the color composition measured for each cell was determined based on the intensity of each color,
using power law dependency between intensity and the variance. The power coefficients were fit based on technical replicates of
homeostatic condition mixtures, and were in the 0.5-0.75 range. To determine regions of statistical significance, 1000 sampling
rounds were performed, resampling the exact positions of every cell on the color composition surface using the appropriate variance
for each cell. Smoothed cell density was calculated on the surface, and compared using Student’s t test with degrees of freedom
corresponding to n-2, where n is the minimum number of cells in either sample. The color composition regions showing significant
differences (|Z-score| > 3) were identified as regions of significant differences between the samples. The change in the fractional cell
density between the samples was measured as a fraction of the total number of cells recorded in a given experiment.
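
The resampling step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian noise model, the exact form variance = scale × intensity^power, the `scale` value, and all function names are assumptions made for illustration; only the 1000 rounds and the 0.5-0.75 range of the power coefficient come from the text. The density smoothing and the t test are not shown.

```python
import math
import random

def color_composition(red, blue, green):
    """Project three non-negative channel intensities onto the unit sphere."""
    norm = math.sqrt(red * red + blue * blue + green * green)
    if norm == 0.0:
        raise ValueError("cell has no fluorescence signal")
    return (red / norm, blue / norm, green / norm)

def intensity_sd(intensity, scale=1.0, power=0.6):
    """Standard deviation under the assumed model variance = scale * intensity**power."""
    return math.sqrt(scale * max(intensity, 0.0) ** power)

def resample_cell(channels, rng, scale=1.0, power=0.6):
    """Draw one perturbed position for a cell on the color composition sphere."""
    # Gaussian noise is an assumption; values are floored to stay positive.
    noisy = [max(rng.gauss(v, intensity_sd(v, scale, power)), 1e-9) for v in channels]
    return color_composition(*noisy)

def resample_rounds(cells, rounds=1000, seed=0):
    """Return `rounds` resampled copies of all cell positions."""
    rng = random.Random(seed)
    return [[resample_cell(c, rng) for c in cells] for _ in range(rounds)]
```

Each resampled round can then be smoothed into a density on the sphere and compared between samples.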
Enumerating Individual Clones

Because of the naturally occurring intensity differences, the clones can have highly non-spherical shape, making identification and
separation of individual clones particularly challenging. To provide the initial definition of the clones, all pairwise cell-to-cell distances
were calculated in terms of the number of standard deviations (based on the intensity-variance model for each color). The distance
structure was then projected into a 2D space using non-metric multi-dimensional scaling (as implemented in MASS R package). The
clones were identified using the elliptical clustering method implemented by the MClust R package. Despite this approach, some of the
very similar clones occurring in different samples were split or merged. In such cases, the count was corrected manually.
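
As a rough illustration of a cell-to-cell distance expressed in standard deviations, the Python sketch below scales each channel difference by a variance taken from a power-law intensity model. How the per-channel uncertainties of the two cells are combined is not stated in the text; summing the two variances and taking a Euclidean norm across channels are assumptions, as are the function names and the `scale` and `power` defaults.

```python
import math

def channel_variance(intensity, scale=1.0, power=0.6):
    """Assumed intensity-variance model: variance = scale * intensity**power."""
    return scale * max(intensity, 0.0) ** power

def scaled_distance(cell_a, cell_b, scale=1.0, power=0.6):
    """Distance between two cells in units of standard deviations."""
    total = 0.0
    for a, b in zip(cell_a, cell_b):
        var = channel_variance(a, scale, power) + channel_variance(b, scale, power)
        if var == 0.0:
            if a != b:
                return math.inf
            continue
        total += (a - b) ** 2 / var
    return math.sqrt(total)

def pairwise_distances(cells, scale=1.0, power=0.6):
    """Symmetric matrix of scaled distances, suitable as input to an MDS routine."""
    n = len(cells)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = scaled_distance(cells[i], cells[j], scale, power)
            dist[i][j] = dist[j][i] = d
    return dist
```

The resulting matrix plays the role of the distance structure that is projected into two dimensions before clustering.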
Whole-Genome Bisulfite Sequencing (WGBS) Processing

To calculate DNA methylation estimates for individual CpG positions (beta values), whole genome-bisulfite sequencing libraries were
aligned to the mouse genome mm9/NCBI Build 37 using BSMap 2.7 with the following parameters: -v 10 -f 40 -q 5 -S 1. Subse-
quently, CpG methylation calls were made, excluding duplicate reads as well as the first four bases of each read, only taking into
account CpGs with quality scores ≥ 20 as well as requiring that surrounding bases exhibit quality scores ≥ 10. Here, CpG methylation
calling is defined as the computation of the number of reads overlapping a particular CpG harboring a C or a T at the cytosine
coordinate of the CpG. Let m be the number of C’s and u be the number of T’s. The value beta = m/(m + u) then gives the methylation
ratio of each CpG that was used as a basis of subsequent analysis.
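
The beta value defined above is simple to compute from the two read counts. The following Python sketch is illustrative only; the function name and the choice to return `None` for a CpG without coverage are assumptions, not part of the described pipeline.

```python
from typing import Optional

def methylation_beta(m: int, u: int) -> Optional[float]:
    """Return beta = m / (m + u) for one CpG.

    m: number of reads with a C at the cytosine coordinate
    u: number of reads with a T at the cytosine coordinate
    Returns None when no reads cover the CpG (an assumption of this sketch).
    """
    if m < 0 or u < 0:
        raise ValueError("read counts must be non-negative")
    total = m + u
    if total == 0:
        return None
    return m / total
```

For example, a CpG covered by 3 reads with C and 1 read with T has beta = 3/(3 + 1) = 0.75.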
CNV Analysis of WGBS Data

To determine whether there are significant genomic alterations that may distinguish clones within the same mouse/cohort, we used
BIC-seq2 (Xi et al., 2016), a read-depth-based CNV calling algorithm to detect copy number variation (CNVs) from the bisulfite WGBS
data of the mouse clones. Briefly, BIC-seq2 divides genomic regions into disjoint bins and counts uniquely aligned reads in each bin.
Then, it combines neighboring bins into genomic segments with similar copy numbers iteratively based on Bayesian Information
Criteria (BIC), a statistical criterion measuring both the fitness and complexity of a statistical model. BIC-seq2 provides two different
CNV calling methods: 1) paired-sample CNV calling that takes a pair of samples as input and detects genomic regions with different
copy numbers between the two samples, and 2) control-free CNV calling that takes only one sample as input and calls CNVs in the
sample. We used a bin size of 1000 bp and a lambda of 50 (a smoothing parameter for CNV segmentation). We called segments as
copy gain or loss when their log2 copy ratios were larger than 0.2 or smaller than -0.2, respectively.
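
The gain/loss rule can be written as a small Python function. This sketch assumes the loss threshold is -0.2, the symmetric counterpart of the 0.2 gain threshold; the function name and the "neutral" label are illustrative and are not taken from BIC-seq2.

```python
def classify_segment(log2_ratio: float, threshold: float = 0.2) -> str:
    """Label a CNV segment from its log2 copy ratio."""
    if log2_ratio > threshold:
        return "gain"
    if log2_ratio < -threshold:
        return "loss"
    # Segments within [-threshold, threshold] are not called.
    return "neutral"
```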
To evaluate the sensitivity of our CNV calling method for bisulfite whole genome sequencing data, we analyzed published bisulfite
whole genome sequencing data for two different mouse strains: 129P2 (GSE56986) (Lu et al., 2014) and BALBcJ (GSE60485) (Wu
et al., 2015). The CNVs for these mouse strains were annotated by the Sanger mouse genome project. We downloaded the FASTQ
files from the NCBI GEO database and mapped the reads using the same procedure as described in the previous section. We then
applied BIC-seq2 with the same parameters (bin size and lambda) as used for our own data. We also downloaded annotation files
for structural variation of these mouse strains from the Sanger mouse genome project web site
(http://www.sanger.ac.uk/science/programmes/mouse-and-zebrafish-genetics). Using annotated deletions larger than 10 kbp as a gold standard set, we
calculated the sensitivity as the fraction of the gold standard CNVs that were detected by BIC-seq2. The estimated CNV sensitivity
was 58.6% and 44.7% for 129P2 and BALBcJ, respectively.
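
The sensitivity calculation can be sketched in Python as follows. The text does not state what counts as "detected"; this sketch assumes that any overlap between a gold standard deletion and a called segment on the same chromosome is a detection. The interval representation and function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates

def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def cnv_sensitivity(gold: List[Interval], calls: List[Interval],
                    min_size: int = 10_000) -> float:
    """Fraction of gold standard deletions larger than min_size hit by any call."""
    large = [g for g in gold if g[2] - g[1] > min_size]
    if not large:
        raise ValueError("no gold standard deletions above the size cutoff")
    detected = sum(1 for g in large if any(overlaps(g, c) for c in calls))
    return detected / len(large)
```

A stricter criterion, such as a minimum reciprocal overlap, would lower the estimate.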
Computational Validation of the HSC States of the Selected Clones

To validate that the Cohort1.Y, Cohort1.R, Cohort2.G, Cohort2.P clones are at the HSC states, we compared DNA methylation levels
of enhancers activated at different hematopoietic stages. The enhancer positions and their epigenetic states within hematopoiesis
were taken from Lara-Astiaso et al. (2014). Following the described methods, we tallied H3K4me1 and H3K27ac ChIP-seq read
counts for 48,415 enhancers across 16 hematopoietic cell types. The H3K4me1 read counts were used to categorize the states of
enhancers, ‘on’, ‘off’ or ‘intermediate’, and H3K27ac read counts were used to further determine whether enhancers with ‘on’ states
are ‘active’ or ‘poised’, as described previously (Lara-Astiaso et al., 2014). We focused on early lymphopoiesis involving the ‘LT-HSC’
(long-term HSC), ‘ST-HSC’ (short-term HSC), ‘MPP’ (multipotent progenitors) and ‘CLP’ (lymphoid progenitors) stages. For each
stage, we selected a set of enhancers that were ‘on’ in that cell type as well as all downstream cell types, but ‘off’ in all upstream
cell types. For instance, for MPP enhancers we selected those that are ‘on’ in MPP and CLP, but are off in LT-HSC and ST-HSC.
Average DNA methylation levels (beta values) were calculated within each enhancer region and compared between the selected clones
(e.g., Figures 6A and 7D).
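
The stage-specific selection rule can be expressed compactly in Python. This is a sketch under the assumption that enhancer states are available as a mapping from enhancer ID to a per-stage label; the data layout and names are hypothetical, and the stage order follows the early lymphopoiesis path named in the text.

```python
STAGES = ["LT-HSC", "ST-HSC", "MPP", "CLP"]  # upstream to downstream

def stage_specific_enhancers(states, stage, stages=STAGES):
    """Select enhancers 'on' in `stage` and all downstream stages, 'off' upstream.

    states: dict mapping enhancer ID -> dict of stage name -> 'on', 'off',
            or 'intermediate'.
    """
    idx = stages.index(stage)
    upstream, downstream = stages[:idx], stages[idx:]
    selected = []
    for enhancer_id, per_stage in states.items():
        on_downstream = all(per_stage[s] == "on" for s in downstream)
        off_upstream = all(per_stage[s] == "off" for s in upstream)
        if on_downstream and off_upstream:
            selected.append(enhancer_id)
    return selected
```

Enhancers with an 'intermediate' state at any of the four stages are excluded from every set under this rule.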
For the ATAC-seq data, the signal for each enhancer was quantified as the number of ATAC-seq fragment centers that fall within
the ±1 kb region around the enhancer center, normalized by the library size (measured in millions of reads). The read counts were
based on the paired-end alignment (using bowtie2) to the mm9 genome assembly, removing duplicate reads.
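
A minimal Python sketch of this quantification is given below. It assumes fragments are already deduplicated and supplied as (start, end) pairs on the enhancer's chromosome, and that the window boundary is inclusive; these details and the function name are assumptions.

```python
def atac_signal(fragments, enhancer_center, library_size, half_window=1000):
    """Fragment centers within +/- half_window of the enhancer, per million reads.

    fragments: iterable of (start, end) coordinates on the enhancer's chromosome
    library_size: total number of reads in the library
    """
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    count = 0
    for start, end in fragments:
        center = (start + end) / 2
        if abs(center - enhancer_center) <= half_window:
            count += 1
    return count / (library_size / 1e6)
```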
Gene Set Enrichment Analysis for Testing Cell Type Bias of Clones
Gene set enrichment analysis (GSEA) was performed to investigate whether gene expression profiles of the selected clones show
bias toward a specific hematopoietic cell type. In the GSEA, the genes were ranked by the Z score corresponding to the p value
of the expression differences between the clones (using tophat2 (Kim et al., 2013) with default parameters, and DESeq (Anders
and Huber, 2010) with local fit option). GSEA was used to test gene sets obtained based on differential expression analysis of
RNA-seq data across 16 different hematopoietic cell types from Lara-Astiaso et al. (2014). Following the same steps as described
by Lara-Astiaso, the RNA-seq data (GEO: GSE60101) were aligned to the mm9 mouse genome assembly using bowtie2 (Langmead
and Salzberg, 2012) with default parameters. Read counts of genes were calculated using ‘analyzeRNA.pl’ from the Homer package
(Heinz et al., 2010). Differential gene expression analysis was performed based on DESeq between a given pair of cell types (e.g., HSC
and MPP). Genes with significantly higher expression in a given cell type (FDR-corrected p value < 0.05) were selected as a set for GSEA
(Figure 6B).
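
Ranking genes by a Z score derived from a p value can be sketched in Python with the standard library. The text does not say whether the p values are two-sided or how the sign is assigned; this sketch assumes a two-sided p value and takes the sign from the direction of the fold change. Function names are illustrative.

```python
from statistics import NormalDist

def signed_z(p_value: float, log2_fold_change: float) -> float:
    """Convert a two-sided p value to a Z score signed by the direction of change."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    # Using the lower tail keeps precision for very small p values.
    z = -NormalDist().inv_cdf(p_value / 2.0)
    return z if log2_fold_change >= 0 else -z

def rank_genes(results):
    """Order gene names from most up-regulated to most down-regulated.

    results: dict mapping gene name -> (p_value, log2_fold_change)
    """
    return sorted(results, key=lambda g: signed_z(*results[g]), reverse=True)
```

The ranked list is the form of input that a pre-ranked GSEA run expects.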
GSEA was also applied to test whether the selected clones also show significant bias in DNA methylation at promoters (within 2 kb
upstream of transcription start sites) of the identified cell-type-specific genes (Figure 6B). In the GSEA, promoters were ranked based
on the maximum likelihood estimates of log2 fold ratio of promoter methylation of the clones, calculated based on the SPP package
(Kharchenko et al., 2008).
Finally, GSEA was also employed to study potential cell type bias of the two clones in DNA methylation of enhancers (Figures 6B
and 6D). For the GSEA, enhancers were ranked by the maximum likelihood estimates of log2 fold ratio of DNA methylation of the
‘Yellow’ and ‘Red’ clones. For a given pair of cell types (HSC versus MPP in Figure 6B, or CLP versus CMP in Figure 6D), we selected the top
500 enhancers that are most ‘active’ (as determined by the H3K27ac read counts) in one cell type and are not ‘on’ (determined by
H3K4me1 counts) in the paired cell type, and vice versa, for the GSEA.
HUe RNA-seq, WGBS, and ATAC-seq data are accessible through GEO datasets (GEO: GSE87527).

[Figure S1 panels: (A) In vitro colonies from poly(I:C)-induced Mx1-Cre;HUe bone marrow. (B) 2000 sorted progenitors (CLP, CMP, GMP, or MEP) transplanted into C57BL/6J recipients; HUe fluorescence analysis 8 days post-transplant for the donor and Recipients 1 and 2. (C) One Mx1-Cre;HUe HSC plus 500,000 Sca- C57BL6 cells transplanted after 9.5 Gy irradiation; Recipients 1-4 analyzed 30 weeks post-transplantation. Plot axes: RFP, CFP, YFP/GFP.]
Figure S1. Faithful In Vivo Propagation of Fluorescence over Generations Enables Clonal Tracking, Related to Figure 1
(A) Demonstration of fluorescence fidelity in vitro. Bone marrow cells harvested from induced Mx1-Cre;HUe were plated at low concentration in
methylcellulose-containing medium to allow single-cell-derived hematopoietic colonies to emerge and were imaged under a fluorescent microscope. Uniformity in fluorescence in individual
colonies showed that color was consistent over generations of cell division in vitro.
(B) Demonstration of fluorescence fidelity in vivo. Bone marrow cells harvested from induced Mx1-Cre;HUe were flow sorted to isolate CLP, CMP, GMP, and MEP
populations. Two thousand cells of each population were intravenously transplanted into each of 5 sublethally irradiated C57BL/6J recipients. Spleens of
recipient mice were harvested 8 days post-transplantation, and cells were subjected to flow cytometric analysis of HUe fluorescence. Endogenous HUe fluo-
rescence emanating from cells was plotted in a 3 dimensional graph with x axis (tDimer2 = red fluorescence), y axis (Cerulean = blue fluorescence), and z axis
(EYFP = green fluorescence) representing increasing fluorescent intensities in log scale. Recipient mice that received the same batch of donor cells exhibited a
HUe fluorescent profile nearly identical to the donor cells injected into them.

(C) We performed single cell transplantation to further confirm color fidelity in vivo. Single HSCs (LineageLocKit+Sca+CD48-CD150+), each carrying its own unique
HUe fluorescent signature, were sorted from induced Mx1-Cre;HUe donors. Each of 214 lethally irradiated C57BL/6J mice was transplanted with a single HUe
fluorescent HSC in combination with 500,000 Sca- C57BL/6J bone marrow support cells. At 30 weeks post-transplantation, mice engrafted with fluorescent cells
revealed a tight clone emanating from the single cell transplanted.
[Figure S2 panels: (A) Southern blot probes and blot showing Founders 1, 2, 3, 4, and 6 alongside no-copy, 1-copy, and 5-copy controls (7.4 kb band). (B) DNA fingerprinting experimental design: colour-restricted Mx1-Cre;HUe GMPs (Profiles 1-3) were flow sorted after poly(I:C) induction, 5000 cells were transplanted per 4.5 Gy-irradiated recipient, and spleen DNA was harvested at day 11 post-transplantation; gel lanes show positive control, negative control, DNA ladder, and Mouse 1 to Mouse 21. Plot axes: RFP, CFP, YFP/GFP.]

Figure S2. Genetic Confirmation of HUe Fluorescence as a Clonal Marker, Related to Figure 1
(A) A schematic representation of Southern blot probe design to detect the number of transgene copies inserted into the genome. Southern blot detected multiple
copies of transgene inserted into the mouse genome in different founder lines.
(B) A schematic representation of DNA fingerprinting probe design to detect random genomic rearrangement in the presence of Cre. To genetically prove that a
fluorescent cluster of a defined size was clonal, we sorted HUe colored GMPs from induced Mx1-Cre;HUe mice using a restricted gate drawn on the fluorescent
intensity plots during cell sorting. Five thousand GMPs of restricted HUe fluorescence were transplanted into each of seven sublethally irradiated C57BL6/J mice.
We sorted GMPs of three different HUe colors and transplanted into 21 mice in total. Eleven days after transplantation, DNA was extracted from whole spleen of
each recipient and subjected to DNA fingerprinting for identification of transgene rearrangement, using a set of probes that bind to different regions of the
transgene.
[Figure S3 panels: (A) Flow cytometric measurement of HUe clonal fluorescence from bone marrow aspirates at 5 and 10 months old; highlighted clones show an increase or decrease in density. (B) Quantification of HUe clonal changes from 5 to 10 months old by custom-designed MClust R program; Month 5, difference (Z-score scale from -3 to 3), and Month 10 plots with axes Green, Blue, and Red.]

Statistically significant differences (panel B):

Clone   Month 5: #   %nAF   %T      Month 10: #   %nAF   %T
E1      9678.7       46.5   16.9    14400.7       92.6   55.1
W1      7976.7       38.3   13.9    553.8         3.6    2.1
N1      2855.1       13.7   5.0     329.8         2.1    1.3

# = total number of cells; %nAF = % of non-autofluorescent cells; %T = % of total number of cells
Figure S3. Quantification of Hematopoietic Clonal Changes under Homeostatic Conditions—Mouse #8, 5 to 10 Months Old, Related to
Figures 2, 4, 5, and 7
(A) A Mx1-Cre;HUe mouse was induced for endogenous HUe fluorescence with pIpC at one month old. Bone marrow aspirates were obtained from tibiae at 5 and
10 months old respectively. HUe fluorescence in total bone marrow cells was projected in spherical graph with x axis (tDimer2), y axis (Cerulean), and z axis
(EYFP) representing increasing fluorescent intensities in log scale. Clones that showed changes are highlighted with colors: red represents an increase in density,
while blue represents a decrease in density in comparison to the other time point.
(B) To quantify the pattern changes between the two time points, HUe fluorescence in 3D space was projected into two dimensional plots using sinusoidal
projection: 5 months old (on left) and 10 months old (on right), and the difference plot in between. In the difference plot, red represents a decrease in cell density at
the indicated HUe fluorescence at 10 months old, whereas blue represents an increase in cell density at 10 months old. White contour line indicates statistically
significant changes with a Z-score of -3 to 3. Clones that were statistically different between the two time points were summarized in the panel below. For each
clone, we scored the change in absolute number of cells, the percentage of non-autofluorescent cells, and the percentage of the total number of cells.
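
For readers unfamiliar with the sinusoidal projection mentioned in (B), the Python sketch below maps a point in three-channel fluorescence space to two dimensions. Which channel combination serves as longitude and which as latitude is not stated in the legend; the mapping used here (longitude from the first two channels, latitude from the third) and the function name are assumptions.

```python
import math

def sinusoidal_projection(x, y, z):
    """Map a 3D direction to 2D using the sinusoidal (equal-area) projection."""
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    # Clamp to guard against rounding slightly outside [-1, 1].
    latitude = math.asin(max(-1.0, min(1.0, z / norm)))
    longitude = math.atan2(y, x)
    return (longitude * math.cos(latitude), latitude)
```

Because the projection preserves area, cell densities on the sphere remain comparable after flattening.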
[Figure S4 panels: (A) Histograms of clonal size (fraction) versus frequency for Experimental Set 1 and Experimental Set 2. (B) Total clonal difference versus ages being compared (2-3, 3-5, 5-10) for Experimental Set 1 and Experimental Set 2. (C) HUe recipient cohort: poly(I:C)-induced Mx1-Cre;HUe donor cells transplanted into C57BL/6J Recipients 1-3 and analyzed at 16 weeks; plots shown for total BM, B cells, monocytes, and erythroid cells.]
Figure S4. Hematopoiesis Is a Composite of Dissimilar Clones with Stereotypic Behavior, Related to Figures 2 and 3
(A) Distribution of clone sizes (measured as fraction of total cells) is shown for the two sets of mice.
(B) A total fraction of cells affected by shifts in the clonal composition between the adjacent time points is shown. Whiskers show 95% confidence interval.
(C) Illustrated is another example of HUe recipient cohort. Endogenous fluorescence activated Mx1-Cre;HUe mice were used as donors. Bone marrow cells from
multiple Mx1-Cre;HUe donors were pooled as one mixture and flow sorted to isolate HSPCs (LineageLoSca+cKit+). HSCs with random endogenous fluorescence
were mixed with support cells from C57BL/6J and transplanted into each of 20 lethally irradiated C57BL/6J recipients. After sixteen weeks of reconstitution, the
recipients showed high consistency in clonal pattern including proliferation, fluorescence, and lineage characteristics in all hierarchy of hematopoietic cell types.
HUe clonal fluorescent patterns of B cells, monocytes, and erythroid cells in multiple recipients are shown, illustrating consistency among the recipients and the
distinction between different cell compartments.
[Figure S5 panels A and B: P-value tables. (A) Clonal consistency within each cell type across multiple mice in a HUe recipient cohort. (B) Correlation coefficients within each cell type compared against other cell types, for Cohort 1, Cohort 2, and combined.]

Cell     (A) P-value   (B) Cohort 1   (B) Cohort 2   (B) combined
B220     8.7e-08       5.0e-07        3.7e-01        9.8e-26
CD3      3.0e-08       3.3e-06        1.5e-12        5.4e-17
CLP      7.1e-07       1.2e-08        1.7e-11        2.0e-23
CMP      3.9e-04       1.1e-15        2.7e-07        4.6e-17
GMP      1.9e-02       1.5e-12        3.4e-12        2.5e-31
Gr1      1.1e-08       1.5e-10        2.2e-03        3.6e-41
LKS      1.7e-07       2.4e-15        2.1e-04        3.0e-20
Mac1     2.1e-07       1.9e-10        1.3e-02        8.4e-32
MEP      7.1e-04       4.3e-13        8.1e-08        1.3e-31
SLAM     2.9e-12       2.3e-15        1.4e-15        6.0e-43
Ter119   1.3e-12       1.0e-03        4.2e-04        9.0e-01

[Figure S5 panels C-E: (C) Triangular significance matrices for Recipient Cohort 1 and Recipient Cohort 2 across Ter119, SLAM, Mac1, B220, GMP, CMP, MEP, CD3, CLP, LKS, and Gr1; color key: P<0.001, P<0.01, P<0.05, P<0.10, not significant; asterisks mark row labels. (D) LPS treatment (Control, 12 hours, 44 days): number of non-AF clusters per cell type. (E) IR treatment (Control, 14 days, 44 days): number of non-AF clusters per cell type.]

Figure S5. Hematopoietic Functional Heterogeneity Is Preserved across All Mice in Any Given HUe Recipient Cohort, Related to Figures 3
and 4
(A) Individual hematopoietic cell populations show significant clonal consistency across members within a HUe recipient cohort. Clonal pattern of a hemato-
poietic cell type (e.g., B220) was highly similar across multiple mice in any given HUe recipient cohort. Table shows Wilcoxon test p values for each hematopoietic
cell type, by comparing correlation coefficients of the same cell type across mice within the same recipient cohort.
(B) Clonal pattern of each cell type is uniquely distinct from others. We compared correlation coefficients within each cell type against other cell types in the same
recipient cohort. Table shows Wilcoxon test p values comparing correlations of one cell type (e.g., B220) versus other cell types (e.g., CD3). Data of two in-
dependent HUe recipient cohorts are shown. The largest p value (combined Wilcoxon tests) was 6e-43.
(C) Clonal differences among cell types are consistent within a HUe recipient cohort. We interrogated the pattern differences between any two cell types (i.e. CLP
versus B220), and asked whether this change in clonal pattern was consistent among all mice within a recipient cohort. The triangular matrices summarize the
statistical significance of the clonal changes in different cell population comparisons. The asterisk on the row label indicates that when testing for fluorescence
spectrum changes of a given cell population against the others, the differences between the mice were significantly smaller than the differences between cell
population pairs. The largest p value (Wilcoxon test) was 1e-13.
(D and E) Effect of LPS and IR stress on cluster counts. The barplots show total number of clusters observed for different compartments in control and
post-perturbation mice for LPS (D) and irradiation (E) perturbations. x axis lists the groups of hematopoietic cell types analyzed at 12 hr and 44 days post-treatment
compared to mock treatment controls. y axis indicates the number of non-autofluorescent clusters per sample. While inter-mouse variation in cluster numbers is
substantial (whiskers give 95% Poisson confidence interval), analysis across different compartments using Poisson GLM shows statistically significant de-
viations. Specifically, compared to control, LPS treatment led to significant increase of cluster numbers in the stem cell/progenitor populations
(SLAM+LKS+CLP+GMP, p value 0.038 for 12 hr and 0.021 for 44 day samples). On day 44, the increase is statistically significant even when downstream effector
compartments are considered (p value 0.0361). In contrast, IR treatment shows reduction of cluster numbers (p value 0.071).
[Figure S6 panels: (A) tSNE plots (dimension 1 versus dimension 2) colored by clone (red, green, or blue clone), mitotic gene signature, and B cell gene expression (cell PC score: low, neutral, high). (B) Cluster assignment on the same tSNE layout (clusters A, C, and D labeled). (C) Cluster fraction for clusters A-D in the red, green, and blue clones, with the expected fraction and significance stars.]
Figure S6. Transcriptional Heterogeneity within and between Clones as Illustrated by Single-Cell RNA-Seq Analysis, Related to Figures 5, 6,
and 7
Single HSCs (LineageLocKit+Sca+CD48-CD150+) belonging to individual clones were flow sorted from a pIpC induced Mx1-Cre;HUe mouse and subjected to
single-cell RNA-seq analysis.
(A) tSNE visualization of transcriptional heterogeneity, with cells colored according to (from left to right plot) the HUe clone they belong to (red, green or blue
clones), intensity of the mitotic signature (orange – high mitotic expression activity, green – low), and intensity of B cell like signature. The distribution of clones
showed notable bias toward particular transcriptional states, however the cells of both large clones (red and green) can be found throughout the transcriptional
space, indicating that despite overall transcriptional and phenotypic bias, substantial intra-clonal transcriptional variability was present.
(B) The plot shows transcriptional cluster definitions. Clusters A-D describe key subpopulations, with the small erythroid-like and neutrophil-like groups in the
center omitted.
(C) Plots show tests of distribution of different HUe clones within different transcriptional clusters. Each subplot tests distribution of a particular HUe clone (red,
green, blue). The clusters were defined by x axis, and the y axis represents the fraction of each transcriptional cluster taken up by a given clone. The dashed
horizontal gray line shows the fraction of the transcriptional cluster the HUe clone was supposed to account for based on clone’s overall frequency. The bars show
observed fraction, with the whiskers providing 95% CI. The stars indicate statistical significance (*p = 0.05, **p = 0.01, ***p = 0.001).
[Figure S7 graphic. Genome-wide CNV profiles; y axis, log2 coverage ratio (−4 to 4); x axis, chromosome. Panel titles: (A) Cohort1.R: 35 CNVs; (B) Cohort1.Y: 76 CNVs; (C) Cohort1.R vs Cohort1.Y: 2 CNVs; (D) Cohort2.P: 52 CNVs; (E) Cohort2.G: 65 CNVs; (F) Cohort2.P vs Cohort2.G: 0 CNVs.]
Figure S7. Analysis of Genomic Differences between the Clones, Related to Figures 5, 6, and 7
The CNV profiles obtained from the WGBS data for each clone are shown. Black and red dots represent log2 copy ratios of bins and CNV segments, respectively. The blue lines represent log2 copy ratios of 0, −1, and +1.
(A and B) Control-free CNV profiles for the R and Y clones from Cohort1. Many CNVs were detected within each clone (see titles), though almost all occur in both clones.
(C) CNV profile resulting from direct comparison of Cohort1.R and Cohort1.Y clones. Two closely spaced CNVs reported on chromosome 4 appeared to be WGBS artifacts: the first deletion was also called in both Cohort1.R and Cohort1.Y with almost identical breakpoints by the control-free CNV calling method (A, B), indicating that the clone-specific deletion was likely a calling artifact. The adjacent large CNV (31.6 Mbp) was not called in either the Cohort1.R or the Cohort1.Y clone. Instead, the region consisted of four smaller CNV segments with similar breakpoints and log2 ratios in both clones, indicating that the large deletion was unlikely to be a genuine clone-specific deletion and was instead caused by erroneous CNV segmentation.
(D–F) Analogous control-free and direct comparison plots for Cohort2.P and Cohort2.G clones. In this case, the CNV pattern appears to be identical between clones, with no notable deviations in the direct comparison. In summary, we found no convincing evidence of clone-specific CNVs despite the reasonable sensitivity of our CNV detection method (see the STAR Methods).
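As a reading aid for the log2 scale used in these profiles, the short sketch below shows how a per-bin log2 ratio relates to coverage. It assumes the two coverage values are already normalized for sequencing depth; the function name and the numbers are hypothetical and do not come from the authors' pipeline.

```python
import math

def log2_ratio(sample_cov, reference_cov):
    """log2 ratio of depth-normalized bin coverage (sample vs reference).

    In a direct comparison such as Cohort1.R vs Cohort1.Y, one clone
    would play the role of the reference (an assumption of this sketch).
    """
    return math.log2(sample_cov / reference_cov)

# The three guide values: 0 (equal coverage), -1 (half), +1 (double).
for sample, reference in [(100.0, 100.0), (50.0, 100.0), (200.0, 100.0)]:
    print(sample, reference, log2_ratio(sample, reference))
```

On this scale, a bin with half the reference coverage sits on the −1 line and a bin with twice the coverage sits on the +1 line.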
Article
Impaired Epidermal to Dendritic T Cell Signaling

Slows Wound Repair in Aged Skin
Brice E. Keyes, Siqi Liu, Amma Asare, ...,
Maria Nikolova, Hilda Amalia Pasolli,
Elaine Fuchs
Correspondence
fuchslb@rockefeller.edu
In Brief
Loss of communication between
epithelial and immune cells in the skin
underlies the slowdown in wound healing
associated with aging.
Highlights
• Intrinsic and extrinsic defects impair wound re-epithelialization in aged skin
• Loss of DETCs at the wound edge delays wound repair in aged skin
• Epidermally expressed Skint3/9 mediate keratinocyte-DETC cross-talk
• IL6/STAT3 signaling regulates Skint expression and facilitates proper wound healing
Keyes et al., 2016, Cell 167, 1323–1338

Brice E. Keyes,1,2,3 Siqi Liu,1,2 Amma Asare,1 Shruti Naik,1 John Levorse,1 Lisa Polak,1 Catherine P. Lu,1 Maria Nikolova,1
Hilda Amalia Pasolli,1,4 and Elaine Fuchs1,5,*
1The Rockefeller University, New York, NY 10065, USA
2Co-first authors
3Present address: Calico Life Sciences, South San Francisco, CA 94080, USA
4Present address: Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA 20147, USA
5Lead Contact
*Correspondence: fuchslb@rockefeller.edu
SUMMARY

Aged skin heals wounds poorly, increasing susceptibility to infections. Restoring homeostasis after wounding requires the coordinated actions of epidermal and immune cells. Here we find that both intrinsic defects and communication with immune cells are impaired in aged keratinocytes, diminishing their efficiency in restoring the skin barrier after wounding. At the wound-edge, aged keratinocytes display reduced proliferation and migration. They also exhibit a dampened ability to transcriptionally activate epithelial-immune crosstalk regulators, including a failure to properly activate/maintain dendritic epithelial T cells (DETCs), which promote re-epithelialization following injury. Probing mechanism, we find that aged keratinocytes near the wound edge don’t efficiently upregulate Skints or activate STAT3. Notably, when epidermal Stat3, Skints, or DETCs are silenced in young skin, re-epithelialization following wounding is perturbed. These findings underscore epithelial-immune crosstalk perturbations in general, and Skints in particular, as critical mediators in the age-related decline in wound-repair.

INTRODUCTION

Although paper-thin, the epidermis acts as the key physical barrier to the external environment. It prevents dehydration, blocks damaging ultraviolet radiation, and guards against pathogens and infection. To maintain proper tissue homeostasis and barrier function, epidermis must be constantly renewed and repaired by proliferative progenitor keratinocytes that periodically exit the innermost basal layer and execute a terminal differentiation program, eventually being sloughed as dead squames from the body surface (Fuchs, 2007).
Following injury, a wound healing response is triggered to rapidly repair epidermis and restore the skin barrier. While repair is ongoing, the damaged skin is exposed to pathogens, and delays in the re-epithelialization process can lead to higher incidence of infection and chronic wound formation. Wound healing is a complex biological process that involves four independent, yet interconnected processes beginning with: (1) coagulation of platelets to form an eschar (scab) to generate a temporary barrier; (2) activation of resident T cells and infiltration of macrophages, monocytes, and neutrophils in immune surveillance; (3) local proliferation of epidermal keratinocytes at the wound edge, followed by their migration into the wound bed to repair the damaged barrier and restore homeostasis; (4) resolution of the wound through repair of underlying damaged dermis and remodeling of its extracellular matrix (Gurtner et al., 2008).
In many tissue types and organs, the ability to repair wounds declines with age (Kennedy et al., 2014; Kenyon, 2010; López-Otín et al., 2013; Oh et al., 2014). Skin is particularly vulnerable to age-related decline, exhibiting increased dryness, roughness, hair loss, impaired wound healing, and increased susceptibility to infection and chronic wounds (Ashcroft et al., 2002; Keyes et al., 2013; Nishimura et al., 2005; Velarde et al., 2015). Poor wound healing in aged adults has been documented for well over a century, and age-related declines in cutaneous wound healing contribute to a variety of health complications, and to decreased lifespan. Despite its importance, however, the molecular underpinnings for the age-related decline in wound-repair are not well understood, impeding the prospects for therapeutic advances.
Formation of a proper epithelial barrier after acute injury requires the coordinated action and cross-talk of keratinocytes and immune cells at the wound site. So-named due to their morphology and epidermal location, dendritic epithelial T cells (DETCs) inhabit the epidermis and account for a majority of T cells in the skin epidermis. Born in the developing fetal thymus and then migrating to the epidermis, 90% of DETCs express an invariant Vγ5Vδ1 (also known as Vγ3Vδ1) T cell antigen receptor (TCR) (Lewis et al., 2006). Vγ5Vδ1 DETCs reside in the innermost basal layer of the epidermis but can extend their dendrites into the suprabasal layers, a feature thought to be used for surveillance in guarding against pathogenic infection (Chodaczek et al., 2012; Hayday, 2009).
DETCs have been implicated in maintaining skin function, including epidermal homeostasis, tumor surveillance, and wound repair (Heath and Carbone, 2013). Upon injury, DETCs bordering the wound edge retract their dendrites, become rounded, and begin to produce epidermal mitogens such as FGF7/10 and IGF1, facilitating wound re-epithelialization (Jameson et al., 2002). Mice lacking the T cell receptor δ subunit (TCRδ) show pronounced delays in cutaneous wound healing (Itohara et al., 1993; Jameson et al., 2002). However, these mice lack all γδ T cells, including both epidermal DETCs and dermal Vγ4Vδ1 T cells; each secretes a different repertoire of factors and cytokines that could impact wound-repair (Gray et al., 2011; Sumaria et al., 2011).
Mice that selectively lack Vγ5Vδ1 DETCs have been described (Boyden et al., 2008; Turchinovich and Hayday, 2011) but have not been tested for possible defects in wound repair. They harbor a null mutation in selection and upkeep of intraepithelial T cells 1 (Skint1) and lack canonical Vγ5Vδ1-expressing DETCs. Skint1 is the founding member of a family (Skint1-11) of butyrophilin-like proteins containing transmembrane spanning domains and extracellular IgV and IgC domains (Mohamed et al., 2015). During development, Skint1 is expressed by thymic epithelial cells, promoting functional differentiation of DETC progenitors (Boyden et al., 2008). A number of Skint family members are also expressed in the skin epidermis and intestinal epithelium (Boyden et al., 2008). However, their functions in these adult tissues remain unexplored.
In the present study, we were drawn to DETCs and Skints through an unbiased approach in defining the age-related defects that underlie impaired re-epithelialization after skin wounding. Using mouse as a model system, we first showed that re-epithelialization to restore the skin barrier is delayed in aged mice. We found that aged skin epidermal keratinocytes are less transcriptionally dynamic after wounding and fail to regulate key processes necessary for wound-repair. Many genes facilitating interactions with immune cells weren’t activated properly in basal keratinocytes at the wound-edge of aged skin. Most notable were Skint genes. When we investigated the DETCs, we found that our unwounded aged mice harbored Vγ5Vδ1 DETCs, and hence differed from Skint1 null mice. However, the DETCs displayed an age-related, wound-specific defect in their behavior.
Our findings brought to the forefront prior speculation, never tested, that SKINTs or some other interacting ligand(s) on wound-proximal keratinocytes might function in the DETC response to injury (Havran et al., 1991; Jameson et al., 2004; Komori et al., 2012). We therefore turned to addressing whether Skints might function in adult tissue homeostasis and wound-repair, and whether perturbations in SKINTs might affect DETCs and/or their communication with epidermal cells to account for some of the age-related defects in wound healing.
Specifically, we discovered that young mice conditionally knocked down for Skint3 and Skint9 in epidermal keratinocytes display defects in wound-repair and in wound-related DETC behavior. Similarly, we found that young mice which a) lack Vγ5Vδ1-DETCs altogether, or b) display DETCs, but either lack the Skint3-4-9 gene cluster or are epidermally knocked down for individual Skints, also exhibit delays in skin re-epithelialization during wound-repair. Finally, we identified conserved STAT3 binding motifs in Skint promoters and showed that STAT3-signaling and one of its upstream activators, Interleukin-6, are diminished in aged, wounded skin. Moreover, we found that Stat3-null mice display DETC and wound repair defects which are strikingly similar to those in aged mice, and that elevating this signaling pathway can stimulate Skint expression as well as improve epidermal migration in aged skin. These findings not only demonstrate proof of principle, but in addition, offer new promise for therapeutic intervention in elderly individuals who need a boost in restoring skin barrier acquisition after injury.

RESULTS

Aged Animals Maintain a Functional Epidermis in Homeostasis
The dorsal (backskin) epidermis of young (2–4 month) mice is a stratified epithelial tissue composed of dead outer stratum corneum cells, differentiating granular and spinous layers, and an inner proliferative basal layer attached to an underlying basement membrane (Figure 1A). The corresponding epidermis of aged (22–24 month) female C57BL6/J animals also displayed these morphological features, although an approximately 20% reduction in epidermal thickness was accompanied by an equivalent dermal thinning (Figures 1B and 1C). Immunofluorescence microscopy confirmed the presence of a seemingly normal differentiation program in aged mouse skin (Figure 1D and data not shown). In all, we carried out immunostaining for basement membrane protein β4 integrin (CD104), basal keratins 5 and 14 (K5 and K14), spinous layer keratins (K10 and K1), wound-response keratins (K6 and K17), and granular layer proteins filaggrin and loricrin, and observed no obvious structural differences between aged and young skin.
To probe more deeply for differences between young and aged epidermal keratinocytes in vivo, we used fluorescence activated cell sorting (FACS) to purify basal layer keratinocytes (α6-integrin(high) CD34(negative) Sca1(high)) from young and aged mouse skin, followed by deep sequencing (RNA-seq) of their mRNAs. Comparative expression analysis of duplicates of RNA-seq data revealed 74 genes that were ±2-fold differentially expressed (q < 0.05) between young and aged keratinocytes (Figure 1E and Table S1). Overall, however, their transcripts (56 upregulated, 18 downregulated) were relatively modestly changed (1.9-fold average), indicating that under homeostatic conditions, aged animals maintain an epidermis that is architecturally and transcriptionally similar to that of young mice.

Aged Skin Is Slow to Re-epithelialize Wounds Following Injury
We next challenged the epidermis with wounding, to see if the epidermis of aged mice was able to mount an injury response comparable to that of younger mice. Six millimeter punch biopsies created full-thickness (epidermis + dermis) wounds on the animals’ backs, which typically healed by 7 days (d7). As shown in the images from representative experiments, young mice consistently closed their wounds faster than their aged counterparts, with little difference on the macroscopic level in wound contraction (Figure S1). When quantified over five independent wound studies, the biggest differences in wound area were consistently between d3 and d5 post wounding, where the rate of wound closure was always faster in young animals (Figures S1B and S1C).
1324 Cell 167, 1323–1338, November 17, 2016

[Figure 1 graphic. (A) Epidermis schematic with layers: Cornified Env.; Gran. (LOR, FLG); Spin. (K1, K10); Basal (K5, K14); B.M. (CD104). (B) Histology, Young (2–4 months) vs. Aged (22–24 months), labeled Epi, Derm, HF, SubCu Fat. (C) Quantification. (D) Immunofluorescence, Young vs. Aged: K14/CD104 and K5/K10. (E) Volcano plot, "Homeostasis young/aged"; x axis, log2 fold change (−10 to 10); y axis, −log10 p-value.]
Figure 1. Young and Aged Epidermis

(A) Schematic illustrating the differentiated layers of the epidermis.
(B) Images of semi-thin sections of young (2–4 months old) and aged (22–24 months old) skin stained with toluidine blue. Abbreviations: Epi, epidermis; Derm, dermis; HF, hair follicle; SubCu Fat, subcutaneous fat. Scale bars, 100 μm.
(C) Quantification of the thickness of epidermis and dermis of young and aged skin. n = 8. Student’s t test was used to measure statistical significance.
(D) Immunofluorescence images of young and aged skin labeled with antibodies (Abs) against keratin 14 (K14), β4-integrin (CD104), keratin 5 (K5) and keratin 10 (K10) [secondary Abs are color-coded as shown]. Sections were co-stained with DAPI (blue) to visualize nuclei. Scale bars, 25 μm.
(E) Volcano plot of in vivo RNA-seq data comparing young:aged basal keratinocyte transcripts. Vertical red lines denote fold changes greater than ±2-fold. Horizontal red line denotes p value > 0.05. Data are represented as mean ± SEM.
See also Table S1.
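The two cutoffs drawn on the volcano plot (a ±2-fold change and a significance level of 0.05) amount to a simple per-gene rule, sketched below. This is a minimal illustration, not the authors' analysis code: the function name, the sign convention, and the example values are assumptions.

```python
import math

def classify_gene(log2_fc, q_value, fc_cutoff=2.0, q_cutoff=0.05):
    """Label a gene 'up', 'down', or 'unchanged'.

    log2_fc is the log2 fold change between the two groups; which group
    is the numerator is an assumption left to the caller.
    """
    if q_value >= q_cutoff or abs(log2_fc) < math.log2(fc_cutoff):
        return "unchanged"
    return "up" if log2_fc > 0 else "down"

# Hypothetical (log2 fold change, q-value) pairs, not data from Table S1.
examples = {"geneA": (1.5, 0.01), "geneB": (-2.0, 0.20), "geneC": (-1.2, 0.03)}
for name, (lfc, q) in examples.items():
    print(name, classify_gene(lfc, q))
```

Genes passing both cutoffs fall in the upper-left or upper-right region of a volcano plot; everything else is treated as unchanged.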
To visualize the re-epithelialization process, wounded skins from mice were collected at intervals and subjected to K14 immunostaining (Figure 2A). Wounds at d1 post-wounding exhibited little or no signs of re-epithelialization, but thereafter, an epithelial tongue of migrating keratinocytes was visible underneath the eschar (Figure 2C). Beginning at d3 and culminating at d5, a marked delay in re-epithelialization was evident in the wound beds of aged mice. At d5 when re-epithelialization had closed the wound (96% ± 4%) in young animals, the epithelial tongues from opposing sides of the wounds of aged animals had migrated less than half way (41% ± 8%) into the wound site (Figure 2C). By d7, aged wounds were still not completely closed (92% ± 5%).
The delay in wound closure in aged animals corresponded well to the decreased migration of the epithelial tongue under the eschar, as quantified in Figure 2D. Moreover, signs of epidermal responsiveness, as judged by enhanced epidermal thickening and associated wound-induced keratin K17, extended further from the wound site in young than in aged mice (Figure S2).

Declines in Proliferative Capacity of Aged Keratinocytes
To understand the basis of the delayed re-epithelialization of wounds in aged animals, we assayed the functional abilities of young and aged keratinocytes to proliferate and migrate in response to injury. To this end, at intervals after wounding, mice were pulsed with 5-ethynyl-2′-deoxyuridine (EdU) for 3 hr before harvesting and analyzing their skins. As quantified both in tissue sections of wounded skin and basal keratinocytes analyzed by flow cytometry, significantly fewer EdU labeled cells were seen in aged versus young epidermis at d3 and d5 post-wound time-points (Figure 3A). This difference was not seen under homeostatic conditions, where basal levels of proliferation were similar (Figure 3B). Together, these findings suggest that the defect is rooted specifically in the wound-response. Consistent with our punch wound studies, we also observed a notable decrease in proliferation in aged versus young epidermal keratinocytes from skins that were analyzed 24 hr after waxing to depilate the skin, a procedure that stimulates basal layer proliferation analogous to punch wounds (Figure 3C).

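[Figure 2 graphic. (A) K14/DAPI immunofluorescence of young and aged wounds at d1, d3, d5, and d7 post-wounding (S, scab). (B) Schematics of epidermis (Epi) and dermis at d1, d3, d5, and d7. (C and D) Quantification plots.]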
Figure 2. Re-epithelialization of Young and Aged Cutaneous Wounds

(A) Immunofluorescence images of the temporal re-epithelialization process that occurs following skin wounding (t = 0) in young and aged mice. Sections are immunolabeled for basal epidermal keratinocytes (K14) and co-stained with DAPI. Scale bar, 500 μm. ‘‘S’’ denotes scab and yellow arrows denote wound edge.
(B) Schematics depicting progress of the re-epithelialization process at time-points indicated in young and aged animals. While wounds heal, the process is delayed in aged skin.
(C and D) Quantification of re-epithelialization in young and aged wounds (C) and of the length of the tongue of epidermal keratinocytes that migrate in from the wound edges during the wound-repair process (D). Student’s t test was used to measure statistical significance, n = 5. Data are represented as mean ± SEM.
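For readers unfamiliar with the statistics named in this legend, the sketch below computes a mean ± SEM and a pooled-variance (Student's) two-sample t statistic with the Python standard library. The function names and the measurements are hypothetical (they are not the values behind Figure 2), and the sketch stops at the t statistic and degrees of freedom rather than a p value.

```python
import math
import statistics

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance, plus df."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical percent re-epithelialization values for five wounds per group.
young = [95.0, 99.0, 92.0, 98.0, 96.0]
aged = [40.0, 52.0, 33.0, 45.0, 35.0]
print(mean_sem(young), mean_sem(aged), student_t(young, aged))
```

The t statistic would then be compared against a t distribution with the returned degrees of freedom to obtain a p value.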

Figure 3. Functional Capabilities of Young and Aged Keratinocytes
(A) Immunofluorescence images of young and aged wounds at d3 and d5 time-points with proliferating cells labeled with EdU (green). Dashed lines denote epidermal/dermal boundaries. Scale bars, 100 μm. Quantifications are of EdU incorporation from independent samples. n = 7.
(B) Proliferation of young and aged basal layer keratinocytes under homeostatic conditions. Animals were pulsed with EdU and collected after 24 hr. EdU incorporation was measured by flow cytometry. Young (7.9% ± 1.2%) and aged (7.6% ± 0.5%), p = 0.82, n = 6.
(C) Proliferation of young and aged basal layer keratinocytes after depilation. EdU was administered 24 hr prior to collection. Graph shows percentage of basal layer keratinocytes with EdU incorporation post-depilation at indicated time-points. At 24 hr post-depilation young (51.5% ± 1.5%) and aged (28.6% ± 7.0%), p = 0.018, n = 4.
(D) DIC images of explant cultures from young and aged tissue biopsies. Dashed lines denote the borders of keratinocyte outgrowth; yellow lines denote radial distances migrated. E = explant. Scale bars, 10 mm. n = 12. Quantifications are of the area and distance of outgrowth of keratinocytes in explants during a 7 day time-course.
(E–I) Scratch wound assays. (E) Migration into 600 μm scratch wounds of aged and young skin keratinocytes was measured by time-lapse video microscopy. Shown are DIC images of time-points indicated. Quantifications of wound closure are shown in the graph at right. (F) EdU incorporation during the interval of the time-lapse imaging. (G) Comparative measurements of distance migrated and velocity. (H) Migration plots of individual cells during the wound closure. (I) EdU incorporation during the interval of the time-lapse imaging. Student’s t test was used to measure statistical significance. Data are represented as mean ± SEM.
See also Figure S3 and Movies S1 and S2.
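The single-cell measurements in panels (G) and (H) separate how far and how fast a cell moves from how much of that movement is directed into the scratch. The sketch below is one way to compute such quantities from a tracked cell. It is an illustration only: the function name, the choice of the x axis as the direction into the open scratch area, and the coordinates are assumptions, not the authors' tracking method.

```python
import math

def track_metrics(points, dt_hr):
    """Summarize one cell track.

    points: list of (x, y) positions in micrometers, one per frame.
    dt_hr:  time between frames in hours.
    Assumes +x points into the open area of the scratch.
    """
    if len(points) < 2:
        raise ValueError("need at least two positions")
    steps = list(zip(points, points[1:]))
    path_length = sum(math.hypot(x2 - x1, y2 - y1)
                      for (x1, y1), (x2, y2) in steps)
    net_x = points[-1][0] - points[0][0]
    return {
        "path_length": path_length,                   # total distance migrated
        "speed": path_length / (dt_hr * len(steps)),  # micrometers per hour
        "net_x": net_x,                               # movement into the scratch
        "directionality": net_x / path_length if path_length else 0.0,
    }

# Two hypothetical tracks with equal path length and speed.
directed = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
wandering = [(0.0, 0.0), (0.0, 10.0), (0.0, 0.0)]
print(track_metrics(directed, 1.0))
print(track_metrics(wandering, 1.0))
```

Both hypothetical tracks cover the same distance at the same speed, yet only the first makes net progress along the scratch axis.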

[Figure 4 graphic, panels A–C. Panel title: "Wound response in aged vs young keratinocytes". (A) Heatmap; color scale, log10(FPKM + 1); samples: yg wound, yg epi, aged wound, aged epi. (B) Bar chart of Normalized Enrichment Score (−5.0 to 5.0) for the GO terms: anatomical structure development; post translational protein modification; regulation of transport; enzyme linked receptor protein signaling pathway; epidermal growth factor receptor signaling pathway; regulation of gene expression; cell matrix adhesion; establishment of localization; regulation of transcription; mitotic cell cycle; anatomical structure morphogenesis; response to hypoxia; actin filament based process; response to other organism; cellular homeostasis; immune response; cellular respiration; DNA catabolic process; response to oxidative stress; RNA splicing; protein folding; locomotory behavior; RNA processing; cellular biosynthetic process; translation. (C) Volcano plot; x axis, log2(fold change) (−10 to 10); y axis, −log10(p-value).]
[Figure 4 graphic, panel D. Table: Genes changed in aged vs young keratinocytes after wounding]
Immune function: Lat2 (-1.1), Ik (-1.2), Il10 (-1.3), Il7 (-1.5), Defb1 (-1.5), Cxcl12 (-1.6), Il15 (-1.9), Ccl20 (-3.1), Ccl2 (-3.2), Ccrl1 (-4.4), Il1r2 (-4.5), Il6 (-4.5), Skint2 (-7.1), Skint3 (-3.0), Skint9 (-4.0), Skint5 (-3.7), Rfx1 (2.3), Il17b (1.9), Il28ra (1.7), Il18 (1.6), Tnfrsf14 (1.7), Traf6 (1.3), CD97 (3.1)
Cell cycle: Inha (2.6), Cdk2 (2.4), Ccnd3 (2.2), Ccnd1 (2.1), Rb1 (1.9), Bub1b (1.9), Cdc25a (1.7), Pten (1.5), Ccna2 (1.4), Cdkn2c (1.4), Chek1 (1.3), Apc (1.2), Cdc25c (1.2), Cdkn2b (1.2), Pin1 (-1.3), Bccip (-1.3), Cdc16 (-1.3), Cdc6 (1.3)
Locomotive behavior: Cxcl9 (-1.3), Ccl1 (-1.3), Cklf (-1.6), Cxcl12 (-1.6), Cxcl1 (-1.6), Cxcl10 (-2.0), Ccl7 (-2.4), Ccl8 (-2.8), Ccrl2 (-2.8), Ccl20 (-3.1), Ccl2 (-3.2), Cxcl14 (-3.6), Ccrl1 (-4.4), Lamb1 (3.3), Nexn (-1.3), Cdh13 (-1.4), Myh9 (6.4), Slit1 (3.3)
Figure 4. Transcriptional Profiling of Wound Response

(A) Heatmap of differentially regulated transcripts between aged and young epidermal keratinocytes isolated from unwounded skin or the wound edge of skin and
subjected directly to RNA-seq analyses. Yg wound, young keratinocytes isolated from the wound edge; aged wound, aged keratinocytes isolated from the wound
edge; yg epi, young keratinocytes under homeostatic conditions; aged epi, aged keratinocytes under homeostatic conditions. Blue color denotes low FPKM
expression, green high FPKM expression.
(B) Negatively and positively enriched GO terms in genes that were differentially regulated between aged wound and young wound samples.

Aged Keratinocytes Exhibit Intrinsic Defects in Wound Induced Migration
The age-related reduction in wound-induced proliferation did not appear to be attributable to an increased incidence of apoptosis, as no differences in activated caspase 3 staining were noted (data not shown). However, over d7 ex vivo, both the area and distance of keratinocyte outgrowth were reduced in aged compared to young skin explants (Figure 3D). Finally, when assayed by time-lapse imaging for their abilities to migrate into the 600 μm space created by a scratch wound, cultured keratinocytes from aged skin showed a significant delay in migration and scratch closure, without a corresponding reduction in proliferation (Figures 3E and 3F; Movies S1 and S2). Additionally, when we tracked individual cells, both young and aged keratinocytes migrated at the same speed and distances (Figure 3G). Strikingly, however, young keratinocytes exhibited more robust directional migration (higher horizontal movement into the open area of the scratch) than their aged counterparts (Figures 3H and 3I).
Similar intrinsic age-related migration delays were observed using Boyden chamber assays, in which serum-starved keratinocytes in the top chamber migrated to the bottom chamber containing feeder cell conditioned media (Figure S3A). In vitro adhesion assays also revealed defects in the ability of aged keratinocytes to attach to fibronectin-, collagen-, and laminin-coated plates and reduced cell spreading on these substrates (Figures S3B–S3D). Finally, when basal keratinocytes were FACS isolated from aged and young mice and compared over longer time periods in vitro, aged keratinocytes clearly displayed reduced colony forming efficiency and formed smaller colonies (Figures S3E and S3F). These results provided compelling evidence that the delayed wound re-epithelialization in aged mice was rooted at least in part in intrinsic defects in the proliferative and migratory behavior of aged epidermal keratinocytes.

Aged Skin Keratinocytes In Vivo Display a Less Dynamic Transcriptional Response to Wounding Than Their Youthful Counterparts
In light of the prominent age-related differences in wound re-epithelialization in vivo and intrinsic differences measured in vitro, we next focused on differences in gene expression that occur in epidermal keratinocytes at the wound edge. To capture responses at peak age-related wound-repair differences, we profiled at d3 after wounding, when both young and aged keratinocytes were actively re-epithelializing their wounds. We micro-dissected an approximately 1 mm skin region surrounding the wound-site and then FACS-purified and transcriptionally profiled the basal epidermal keratinocytes by RNA-seq (Figures S4A and S4B). By comparing wounded and unwounded keratinocyte profiles, we could distinguish transcriptional differences specifically associated with age and wound-healing from those associated merely with age.
Differential transcription analysis of young and aged keratinocytes at the wound edge revealed 393 genes that were differentially regulated in replicates with a q-value < 0.05 (Table S2). Hierarchical clustering analysis showed that unwounded young and aged epidermal samples clustered together, consistent with their similar behaviors during normal homeostasis. Interestingly, however, aged keratinocytes at the wound edge grouped more closely with young and aged homeostatic (unwounded) states than did young keratinocytes at the wound edge (Figure 4A). Notably, this was even true when gene expression across the entire transcriptome was compared (Figure S4C).
Gene set enrichment analysis (GSEA) found 114 core GO terms (p value < 0.05) enriched in our age-regulated wound gene set relative to young wounded keratinocytes (Figure 4B and Table S3). Leading edge analysis identified 8 enriched core processes (cell cycle, cell death, immune system, metabolism, migration, signaling, transcription, and transport), each with multiple GO terms enriched within them (Figure S4D).
We then compared young basal keratinocytes under homeostatic conditions (young epi) to young wound-edge keratinocytes (young wound) to define a molecular signature for the wound response. In response to injury, 1,679 genes were downregulated by young keratinocytes, while 500 transcripts were elevated (Table S4). Not surprisingly, these transcripts differentially changed by ±2-fold were enriched for GO terms including transcription, cell cycle regulation, migration, metabolism, and epidermal development (Table S5).
In contrast, keratinocytes at the wound edge of older mice were much more similar to homeostatic conditions, revealing a much less dynamic transcriptional response to injury. Only 564 genes changed ±2-fold in wound-activated aged keratinocytes compared to their unwounded aged counterparts (Table S6). Of these changes, 90% of the 236 upregulated genes in aged, wound-induced epidermis were also upregulated in their younger counterparts; and a 34% overlap was seen among genes that were downregulated in response to wounding in young and old animals (Figure S4E). Taken together, these data suggest that not only are aged keratinocytes more refractory transcriptionally in their response to injury, but there are genes mis-regulated in aged keratinocytes that are not part of the normal wound response at this time-point.

Wounding Reveals Age-Related Defects in Cross-Talk Between Epidermal Cells and DETCs
Further inspection revealed that many genes associated with immune function failed to be regulated by aged keratinocytes at the wound edge (Figures 4C and 4D). This was particularly evident in the volcano plot, where immune function genes (blue dots in 4C) were often markedly under-expressed in aged versus young
(C) Volcano plot of differentially regulated genes between young wound and aged wound samples. Vertical rose-colored lines denote fold changes greater than ±2-fold. Horizontal rose line denotes p value > 0.05. Blue dots indicate genes with a GO annotation relating to immune function (note marked failure of
many of these genes to be upregulated in aged wounds).
(D) Table of selected genes for indicated GO term from RNA-seq analysis (blue, downregulated by log2 of value; green, upregulated by log2 of value). Data are
represented as mean ± SEM.
See also Figure S4 and Tables S2, S3, S4, S5, and S6.
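Because panel (D) reports changes as log2 values, a short conversion to linear fold change may help in reading the table. The sketch below assumes the parenthesized numbers are log2 fold changes in aged relative to young wound-edge keratinocytes, as the legend for (D) suggests; the function name is hypothetical and the handful of values is copied from the panel.

```python
def linear_fold(log2_value):
    """Convert a log2 fold change to a linear ratio."""
    return 2.0 ** log2_value

# log2 values as listed in panel D for selected immune-function genes.
panel_d = {"Skint2": -7.1, "Skint3": -3.0, "Skint9": -4.0,
           "Skint5": -3.7, "Il6": -4.5}

for gene, log2_value in panel_d.items():
    # For a negative log2 value, the reciprocal gives the fold reduction.
    print(f"{gene}: {1.0 / linear_fold(log2_value):.1f}-fold lower")
```

Under that assumption, a value of -3.0 corresponds to an 8-fold lower level and -7.1 to a reduction of well over 100-fold.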

Figure 5. Immune Cells and Wound Healing in Aged Skin

(A) Skins from young and aged mice either unwounded or within 1 mm of a wound-edge were subjected to flow cytometry analysis using the scheme in Figure S4A. Percentages of specific immune subclasses relative to total immune cells (CD45+) are shown. DC, dendritic cells (MHCII+ CD11c+); Macs, macrophages (CD64+ CD11b+); Mono, monocytes (Ly6c(hi) Ly6g(neg)); Neutro, neutrophils (Ly6c(neg) Ly6g(hi)); T cells (γδTCR+ TCRβ). n = 6.
(B) Quantification of skin resident T cells by flow cytometry. Data are presented as percentages of specific T cell subclasses relative to total immune cells (CD45+). n = 6.
(C) Quantifications of DETC numbers from 0–700 μm of the wound edge at indicated times after wounding. n = 5.
(D) Schematic of epidermal sheet preparation (whole mount) of a wound from a bird’s-eye view, and maximum projections of z-series images of epidermal sheets from skin adjacent to the wound edge (yellow dotted lines), with immunostaining for γδ TCR to detect DETCs. Shown are images from d3 after wounding.
(E) At right are quantifications of DETC distribution, plotted in histograms as the distance proximal (0–400 μm) and distal (1,200–1,600 μm) from the wound edge. n = 10.

wounded skin keratinocytes. This was interesting, as the visualized and quantified (Figures 5H–5J). By d0.5 post-wound-
epidermis is known to undergo various forms of cross-talk with ing, the DETCs of young skin displayed a narrow distribution of
different immune system components, which can impact their rounded cells within 100 mm of the wound edge. This band
proliferative capacity (Castellana et al., 2014; Depianto et al., was maintained at d1 post-wounding, and extended from the
2010; Glitzner et al., 2014). wound edge by d3. In contrast, rather than the tight transition
We used flow cytometry to analyze the overall immune from dendritic to rounded morphology in young skin, rounded
response prior to and following wounding. We observed a clear DETCs were scattered from the start throughout a broader range
influx of neutrophils, monocytes, and macrophages into the from the wound site of aged skin. By d3, many were lost from the
wound bed d3 after wounding. However, no differences were wound edge. Irrespective of body site, DETCs in the vicinity of
noted which could specifically explain the age-related, wound- the wound sites of aged mice showed perturbations in
related downshift in epidermal expression of immune signaling morphology and numbers, often sustaining more dendrites
genes that we unearthed at this same time-point (Figure 5A). than their younger counterparts (Figures 5G and S5E). Together,
Similarly, we did not observe age-specific differences in the these studies exposed age-related, wound-specific defects in
wound response of either αβ T cells or dermal γδ T cells at this DETC morphology and maintenance.
time (Figure 5B). In contrast, a striking wound site-specific
reduction in epidermal γδ T cells emerged that was selective to Epidermally-Expressed Skints Function in Wound
aged skin (Figure 5B). We confirmed the identity of these Repair and in Signaling to DETCs
epidermal γδ T cells as Vγ5Vδ1-DETCs based on their localiza- Since young wound-activated DETCs are known to signal to ker-
tion, dendritic morphology and high surface expression of atinocytes to produce epidermal growth promoting factors
Vγ5 (Figure S5A). (Jameson et al., 2002), it seemed plausible that intrinsic age-
In the homeostatic (unwounded) state, DETCs were present related defects in DETCs might account for delays in wound
in equivalent numbers in young and aged skin (Figure 5C). re-epithelialization. That said, the marked intrinsic differences
Following wounding, DETC numbers declined significantly in aged epidermal keratinocytes to mount a transcriptional
more in aged than in young skin; differences peaked by d3 but cascade of immune modulatory factors following injury raised
by d7, when re-epithelialization neared completion, DETC the possibility that wound-specific, epidermally-derived signals
numbers were still low in the repaired skins of aged mice. This to DETCs might be altered in aged animals.
d3–d7 time period corresponded to the time when the greatest Returning to our RNA-seq data, we were struck by the prepon-
differences were seen in aged versus young skin wound healing. derance of Skint genes, which were upregulated in young
To characterize this age- and wound-related difference, we wounded skin epidermis relative to its aged counterpart.
immunostained DETCs in sagittal sections and in whole-mount Notably, during normal homeostasis, Skint2, Skint3, Skint5,
epidermal sheets of backskins. Corroborating the flow cytome- Skint7, and Skint9 were low in basal epidermal keratinocytes
try data, DETC numbers were equivalent in young and aged regardless of age, but at the wound edge, their expression was
skin, but declined markedly in the aged skin after wounding elevated in young but not aged keratinocytes (Figure 6A). RT-
and remained low even at d7 (Figures 5D; S5B–S5E). By qPCR on independent samples of wound-induced epidermis
whole-mount microscopy, we could analyze the temporal validated the age-related differential expression of these Skint
behavior of DETCs relative to the wound edge. Quantifications transcripts (Figure 6B).
revealed a >5-fold reduction in DETCs within a 400 μm radius The functions of Skints in adult tissue homeostasis and
of skin surrounding the d3 wound, while DETC levels remained wound-repair are unknown. However, the established role of
similar at sites distal to the wound (Figure 5E). Skint1 in the selective development of Vγ5Vδ1-TCR-expressing
In wounded skin, DETCs are known to change from a dendritic DETCs in fetal thymus (Boyden et al., 2008) led us to posit that
to a rounded morphology as they enter an activated state around diminished epidermal Skint activation might account at least in
a wound edge (Chodaczek et al., 2012; Jameson et al., 2002). part for the diminished ability of aged, wounded skin to sustain
We therefore examined DETC morphology changes in wounded and/or activate DETCs. To test the functional relevance of Skints
backskin of young and aged mice. Interestingly, aged DETCs at expressed by young epidermal keratinocytes in response to
the wound site were less rounded and displayed more dendrites wounding, we engineered lentiviruses harboring shRNAs that
than their younger counterparts at d1 and d3 after wounding target Skints 1, 2, 3, and 9 and a Scrambled control (Figure
(Figures 5F and 5G). S6A). To identify transduced regions, we added a transgene en-
We also examined epidermal sheet preparations of earskin, coding fluorescently tagged histone H2B-mRFP to each lentiviral
where the characteristic dendritic morphology of DETCs is better vector (Figure S6B). Lentiviruses were injected in utero into the
(F) Sagittal immunofluorescence images of skin (wounded and unwounded), immunostained for DETCs. Dashed lines denote epidermal/dermal boundaries. Scale
bars, 25 μm. Insets show DETCs highlighted with arrows.
(G) Quantification of the numbers of dendrites per DETC. n = 5. Student’s t test was used to measure statistical significance.
(H) Whole-mount DETC immunofluorescence of ear-skin. Shown are images prior to and at d1 and d3 after wounding. Scale bars, 100 μm. n = 3. Yellow dotted
lines denote wound edge (wd).
(I) Density plots of the distribution of rounded (no dendrites) DETCs in ear-skin whole-mount preparations at times post-wounding indicated. Vertical lines
represent mean distance of rounded DETCs from wound edge (0 μm).
(J) Quantifications of DETCs at the wound site at times after injury. Data are represented as mean ± SEM. See also Figure S5.

[Figure 6 graphic, panels A–M. Recoverable labels: Skint gene family heatmap, log10(FPKM + 1); relative expression (mRNA); in utero lentivirus injection, E9.5 to adult animal; Scr, Skint3, and Skint9 shRNA sections stained for K14 and DAPI, d5 post-wounding; percent of DETCs and dendrites/DETC; C57BL6, FVBjax, and FVBtac strains; rounded DETC distribution plotted as density versus distance from wound edge (μm) at d1 and d3 post-wounding.]

amniotic sacs of living E9.5 embryos, a method that specifically Similar to aged and control mice, Vγ5Vδ1-DETCs in young
and efficiently infects the single layer of unspecified surface FVBjax animals were still in equivalent number during normal ho-
epithelial cells (Beronja et al., 2010). Within a few days, lentiviral meostasis (Barbee et al., 2011) (Figures S6H and S6I). This was
DNA integrates and is thereafter stably propagated throughout also the case for wounded skin. However, even though FVBjax
the epidermis (Figure 6C). DETCs had somewhat fewer dendrites in the unwounded state,
Mice transduced in utero with Skint shRNAs lived to adult- DETCs remaining at d1 and d3 time-points after wounding dis-
hood. qPCR confirmed the efficient knock-down of Skints in vivo played more dendrites than normal (Figures 6K and S6J–S6K).
(Figures 6D and S6D). Therefore at P55, we administered punch Additionally, rounded DETCs were atypically intermingled with
wounds and monitored the activation of Skints and the re-epithe- dendritic ones at the FVBjax wound edge (Figures 6L-6M). Over-
lialization process. Notably, wound-induced re-epithelialization all, the perturbations in DETC morphologies at the wound edge
was significantly impaired in these young mice whose epidermis of young FVBjax mice were similar to those of aged C57BL6 mice.
was transduced with Skint3 and Skint9 but not Scrambled Finally, we also examined wound-repair in FVBTac mice, which
shRNAs (Figures 6E and 6F). At d5 post-wounding, the epithelial lack Vγ5Vδ1-DETCs but which differ from the TCRδ null mice
tongues of control wounds were largely sealed (90% ± 6%) while previously studied in a wound context (Jameson et al., 2002) in
wounds from Skint9 and Skint3 knock-down skins showed as lit- that they have other γδ T cells in their epidermis (Boyden et al.,
tle as 22% ± 5% and 49% ± 5% closure, respectively. For Skint3 2008) (Figure S6L). Indeed, FVBTac mice too displayed a pro-
and Skint9, two distinct shRNAs were transduced independently nounced wound-repair defect (Figures 6I–6J). This result under-
and knockdowns gave analogous results (Figure S6D). These scored a specific role for Vγ5Vδ1-DETCs in wound-repair,
data reinforced the efficacy of the wound-impaired phenotypes extending prior data reporting defective wound healing in mice
in young mice lacking SKINTs. lacking all γδ T cells (Jameson et al., 2002).
We also assessed the numbers and morphologies of DETCs in
our individual keratinocyte-specific Skint knockdown mice. Skints Act Downstream of STAT3 Signaling, Which Is
Importantly, Skint3 and Skint9 knockdown did not appreciably Reduced at the Aged Wound Front
affect steady state DETC numbers in unwounded skin; however, To gain mechanistic insight into why Skint genes are upregulated
Skint shRNA-transduced young mice showed a modest in epidermal keratinocytes of young wounded skin and yet failed
decrease in DETCs adjacent to the wound bed relative to those to be induced in keratinocytes at the wound edge of aged mice,
in Scrambled control mice (Figure 6G). More noticeable was we analyzed the promoter sequences (4,000 bp up-stream of the
that DETCs displayed more dendrites than their control counter- start codon) for conserved motifs that may be regulating their
parts (Figure 6H). These alterations in dendrite morphology were expression. MEME software analysis of up-stream sequences
consistent with, albeit not as marked as, those we saw in aged identified three highly conserved motifs, distributed across Skint
wounded skin. gene promoters (Figures S7A and S7B). Within the conserved
To further document the role of Skints in wound repair, we took motifs were 39 transcription factor consensus-binding sites,
advantage of the natural deletion of the Skint3, Skint4, and Skint9 which when cross-referenced with our RNA-seq data, yielded
gene locus in FVBjax mice (Boyden et al., 2008) (Figure S6F). nine putative transcription factors that were expressed in basal
Interestingly, young FVBjax mice displayed re-epithelialization epidermal keratinocytes at the wound edge (Figure S7C).
delays when compared to several strains of mice that maintain Among these nine factors was the transcription factor signal
the Skint3, Skint4, and Skint9 gene locus, including closely transducer and activator of transcription 3 (STAT3) (Figure S7D).
related NON/ShiLtJ, as well as C57BL6 and CD-1 (Figures 6I STAT3 took on particular relevance given a prior report that
and 6J and S6G). The kinetics of the re-epithelialization delay Stat3−/− mice have delays in wound-healing, with reduced
in young Skint3-4-9 null mice paralleled those in aged C57BL6. keratinocyte migration in in vitro assays (Sano et al., 1999).
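The cross-referencing step described above, in which transcription factors with consensus sites in the conserved promoter motifs are retained only if they are also expressed at the wound edge, can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the function name, the FPKM cutoff, the placeholder factor names (`TF_A`, `TF_B`, `TF_C`), and all expression values are hypothetical assumptions.

```python
from typing import Dict, Iterable, List


def expressed_candidates(
    motif_tfs: Iterable[str],
    fpkm: Dict[str, float],
    min_fpkm: float = 1.0,  # hypothetical expression cutoff, not from the paper
) -> List[str]:
    """Return factors that have a motif hit and are expressed at or above the cutoff."""
    # Factors absent from the expression table are treated as not expressed.
    return sorted(tf for tf in set(motif_tfs) if fpkm.get(tf, 0.0) >= min_fpkm)


# Made-up inputs for illustration only.
motif_tfs = ["STAT3", "TF_A", "TF_B", "TF_C"]
wound_edge_fpkm = {"STAT3": 35.2, "TF_A": 0.3, "TF_B": 12.0}

candidates = expressed_candidates(motif_tfs, wound_edge_fpkm)
print(candidates)
```

In practice the cutoff and the expression measure would need to match whatever criteria define "expressed" in the RNA-seq analysis; the sketch only shows the shape of the intersection.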
Figure 6. Failure of Keratinocytes To Up-Regulate Skints Results in Impaired Wound Healing in Young Mice
(A) Heatmap of Skint gene family expression from RNA-seq data. Asterisks denote splice variants. Yg wound = young keratinocytes isolated from the wound
edge, aged wound = aged keratinocytes isolated from the wound edge, young epi = young keratinocytes under homeostatic conditions, aged epi = aged
keratinocytes under homeostatic conditions.
(B) qRT-PCR of Skint mRNAs from keratinocytes isolated from wound edges.
(C) Illustration of in utero lentiviral infections into amniotic sacs of E9.5 embryos and selective transduction of mouse skin epidermis.
(D) Knockdown efficiency of Skint shRNAs as measured by qRT-PCR of adult epidermis prior to wounding.
(E) Immunofluorescence images of d3 backskins of wounded young mice whose epidermises were transduced in utero with the Skint shRNAs indicated. Scr,
scrambled. Tissue sections are immunolabeled for K14 (green) and DAPI (blue). S, scab; arrows denote wound-edge. Scale bar, 500 μm.
(F) Wound closure at d5 of young mice transduced with the indicated shRNAs. n = 4.
(G and H) Quantification of DETC number (G) and number of dendrites per DETC (H) in sections of unwounded and wounded skins transduced as indicated. n = 4.
(I) Immunofluorescence images of backskins showing the re-epithelialization process at 5d or 3d after wounding of young mice of the strains indicated. Note that
FVBJax lacks Skints 3-4-9; FVBTac lacks Vγ5Vδ1 DETCs. Tissue sections are immunostained for K14 (green) and DAPI (blue). S, scab; arrows denote wound-edge.
Scale bar, 100 μm. n = 2.
(J) Quantification of wound closure by re-epithelialization.
(K) Quantification of dendrites per DETC from tissue sections of wounds in C57BL6 and FVBJax at time-points indicated.
(L) Whole-mount immunofluorescence and quantifications of DETCs in ear-skin of young FVBJax versus C57BL/6 mice. Scale bars, 100 μm. n = 2.
(M) Density plots of the distribution of rounded (no dendrites) DETCs in ear-skin whole-mount preparations at times post-wounding indicated. Vertical lines
represent mean distance of rounded DETCs from wound edge (0 μm). Data are represented as mean ± SEM. See also Figure S6.
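Values in these legends are reported as mean ± SEM. As a minimal sketch of that summary statistic, the snippet below computes the mean and the standard error of the mean for a small set of replicate measurements. The closure percentages are invented for illustration (n = 4 is chosen only to mirror the replicate counts in the legend), and the helper name `mean_sem` is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, SEM), where SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two replicates")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return mean, sem


# Made-up percent wound closure for four replicate animals.
closure_percent = [80.0, 90.0, 100.0, 90.0]
mean, sem = mean_sem(closure_percent)
print(f"{mean:.0f}% ± {sem:.1f}%")
```

Using the sample standard deviation (n − 1 denominator) is the usual convention for small replicate numbers; whether the original analysis followed it is an assumption here.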

Figure 7. STAT3 Signaling Regulates Skint Expression
(A) Skint mRNA levels after IL-6 treatment of primary WT keratinocytes in vitro.
(B) Relative in vivo expression of Skint genes in keratinocytes isolated from d3 wound edges.
(C) Immunofluorescence images of sagittal wounded skin sections from young and aged WT mice. Labeling is for Abs against pSTAT3 (green), K5 (red), and DAPI
(gray). Dashed line denotes epidermal/dermal boundaries, yellow arrows denote wound edges, and ‘‘S’’ denotes scab. Scale bars, 100 μm.
(D) Quantification of the epithelial tongue length at backskin wounds at times post-wound indicated. n = 4.
(E and F) Quantifications of DETC (E) numbers and (F) morphologies from d3 wound edges. n = 4.
(G) Whole-mount immunofluorescence of ear skins imaged at times indicated after wounding. Yellow dotted line denotes wound edge. Scale bar, 100 μm. n = 2.
(H) Density plots of the distribution of rounded (no dendrites) DETCs in ear-skin whole-mount preparations at times post-wounding indicated. Vertical lines
represent mean distance of rounded DETCs from wound edge (0 μm).
(I) Quantifications of DETC numbers in ear skin whole-mounts after wounding at time-points indicated.

Intriguingly, interleukin 6 (IL-6), a canonical up-stream ligand of markedly improved keratinocyte outgrowth in aged and young
STAT3 signaling, showed a 4.5-fold reduction in aged versus young explants (Figures 7J and 7K). However, the effects were more
keratinocytes after wounding (Figure 4D). However, after treat- pronounced in aged explants, which showed a 2.2-fold increase
ment of either young or aged keratinocytes with recombinant in IL-6 mediated outgrowth relative to a 1.4-fold increase in
IL-6 in vitro, phosphorylated (activated) STAT3 and nuclear young explants.
translocation occurred within 30 min of treatment (Figure S7E).
Importantly, as judged by RT-qPCR, a number of Skint tran- DISCUSSION
scripts were appreciably elevated by IL-6 (Figure 7A).
If the enhanced induction of Skints following pSTAT3-signaling Wound healing is a complex biological process. It requires the
is physiologically relevant, then the ability of keratinocytes to interaction of diverse cell types and distinct signaling pathways,
induce Skint expression after wounding should be reduced in which must be orchestrated in a spatiotemporal manner to
K14-Cre;Stat3fl/fl mice. Indeed, Stat3-null epidermis from young achieve proper re-epithelialization. Beginning with DuNouy’s ob-
mice showed significantly reduced activation of Skints following servations of delayed wound healing in older soldiers in World
wounding (Figure 7B). War I, to more rigorous experimental evidence in rats and other
If the link between STAT3 and Skint gene expression is rele- animals, studies have shown that wound healing is delayed in
vant to the age-related decline in the ability to activate DETCs, aged tissues (Goodson and Hunt, 1979; Raja et al., 2007; Reed
there should be a corresponding age-related decline in STAT3 et al., 2003). Alterations have been described in almost every
signaling upon wounding. Immunostaining for activated, phos- phase of the healing process, with delays ranging from 20% to
pho-STAT3-Tyr705 (pSTAT3) in d3 post-wound tissue sections 60% (Ashcroft et al., 2002; Gosain and DiPietro, 2004; Sgonc
revealed many fewer and much more weakly labeled keratino- and Gruber, 2013). The molecular underpinnings of why such de-
cytes in aged skin relative to their young counterparts (Figure 7C). lays are observed, how age-related physiological changes nega-
In contrast, only rare pSTAT3-positive cells were seen in tively affect wound healing, and how chronic wounds develop in
unwounded skin tissue regardless of age (Figure S7F). We elderly individuals are poorly understood.
corroborated a wound-closure delay in K14-Cre;Stat3fl/fl mice, In our study, both intrinsic and extrinsic factors contributed to
and further showed that analogous to aged mice, it is the re- impaired healing in aged skin. We detected reduced proliferation
epithelialization feature of wound-repair that is defective and and re-epithelialization at the wound sites of aged versus young
with similar slowed kinetics of healing (Figures 7D and S7G). animals, and when placed into equivalent environments in vitro,
Given our observations that first, Skints are reduced in Stat3 aged keratinocytes still displayed reductions in colony forming
cKO mice and second, that DETC behavior is perturbed in efficiency, explant outgrowth, and migration when compared
mice deficient in epidermal Skints, we focused on the DETCs with their youthful counterparts. Moreover, aged basal epidermal
in Stat3 cKO animals. In the unwounded state, DETC numbers keratinocytes isolated from the wound edge appeared to be
were unaffected by Stat3 loss. However, following wounding, more recalcitrant to activation, as judged by their markedly
DETC numbers declined dramatically in the Stat3 cKO skin (Fig- reduced transcriptional activity of genes involved in important
ure 7E). Moreover, those DETCs that remained at the wound processes of wound-repair.
site displayed more dendrites than in Stat3 Het animals (Fig- While these features pointed to the age-related intrinsic alter-
ure 7F). Whole-mount immunofluorescence microscopy of ations in epidermal keratinocytes, marked changes in the
wounded Stat3 cKO ear-skin revealed that at early times, wound extrinsic environment of the wound also surfaced in aged skin,
edges lacked the tight distribution of rounded DETCs seen in the as exemplified by the significant decline in epidermally ex-
control mice (Figures 7G and 7H). By d3 there was a marked pressed immune response genes. These age- and wound-spe-
paucity of DETCs (Figure 7I). Taken together, the loss of cific transcriptional differences in the epidermal keratinocytes
STAT3 recapitulated the defects in DETCs and wound healing were accompanied by age-related perturbations in DETC
seen in aged mice and in mice deficient for Skint3/9. behavior during wound-repair.
Recent lines of evidence have implicated the adaptive immune
IL-6 Treatment Improves Wound Repair in Aged Skin system in a number of facets of wound-repair in multiple tissue
Our collective findings pointed to the existence and functional types (Burzyn et al., 2013; Carvalho et al., 2014; Jameson and
importance of a key IL-6-STAT3-Skint connection in wound- Havran, 2007; McGee et al., 2013; Rani et al., 2015). Mice that
repair. If, as our results suggest, a decline in the ability to activate lack all γδ T cells display an ~2d delay in wound healing
this pathway is at the crux of the age-related wound-repair de- (Jameson et al., 2002). We observed similar wound-induced de-
fects, it should be possible to enhance keratinocyte migration lays in re-epithelialization in FVBTac mice, which specifically lack
in skin explants from aged mice through administration of IL-6. Vγ5Vδ1-DETCs but display other γδ T cells in the epidermis that
To test this hypothesis, we treated explants from young and are normally confined to the dermis (Barbee et al., 2011; Lewis
aged skin with 10 and 50 ng/ml IL-6 in vitro and monitored kera- et al., 2006). Our finding demonstrates that Vγ5Vδ1-DETC
tinocyte outgrowth at d5 and d7 time-points. Exposure to IL-6 loss alone is sufficient to instigate wound-related problems in
(J) DIC images (from d5 time-point) of explant cultures from young and aged tissue biopsies treated with IL-6 at concentrations indicated. Dashed lines denote the
borders of keratinocyte outgrowth; E, explant. Scale bars, 10 mm. n = 8.
(K) Quantifications are of the distance of outgrowth of keratinocytes in explants during a 7 day time-course. n = 8. Data are represented as mean ± SEM. See also
Figure S7.

re-epithelialization, lending strong support for prior studies also abate the wound healing impairment in aged mice. Given
showing that upon wounding, DETCs produce signaling factors that elevated IL-6 has been associated with inflammation in
which act in a paracrine manner to promote epidermal proliferation and aged tissue and observed in senescent cells (Franceschi and
healing (Jameson and Havran, 2007). Campisi, 2014; Kojima et al., 2013), whether these chronic pro-
In light of these data, it was particularly intriguing to unearth cesses negatively impact the acute wound healing response in
aberrations in DETC behavior in aging mice, because they are aged wounded skin was particularly intriguing, and suggests
wild-type for γδ T cells. Moreover, age-related DETC perturba- avenues for future therapeutics in accelerating healing in the
tions only surfaced following wounding, where DETCs, which elderly population.
normally participate in re-epithelialization, were not maintained
at the wound edge, resulting in delays in restoration of the skin STAR+METHODS
barrier. This particular defect took on all the more interest given
a previous report that a soluble form of Vγ5Vδ1-TCR binds to Detailed methods are provided in the online version of this paper
keratinocytes adjacent to a wound, but not to keratinocytes in and include the following:
unwounded skin (Komori et al., 2012). Such findings provide
strong support for the notion that epidermal keratinocytes at • KEY RESOURCES TABLE
the wound edge undergo a specific change that impacts their • CONTACT FOR REAGENT AND RESOURCE SHARING
recognition by DETCs (Havran and Jameson, 2010). • EXPERIMENTAL MODEL AND SUBJECT DETAILS
We traced the elusive epidermally expressed ligand(s) that can ◦ Mice and Wounding Experiments
affect wound-induced DETC behavior to SKINTs, whose func- • METHOD DETAILS
tions in adult tissues are unknown. We first discovered that epi- ◦ Wounding Study
dermally expressed Skints are selectively upregulated in the ◦ Histology and Immunofluorescence
basal epidermal keratinocytes at the wound edge of young but ◦ Cell Culture
not aged mice. We next determined that Skint expression is ◦ RNA-Seq and Analysis
directly influenced by STAT3 signaling, which, like its up-stream ◦ Lentivirus Production and Injections
activator IL-6 and its downstream Skint targets, is also dimin- ◦ Flow Cytometry
ished at the wound edge of aged mice. Finally, we unearthed ◦ RT-qPCR
wound-induced re-epithelialization delays in mice that are • QUANTIFICATION AND STATISTICAL ANALYSIS
(1) knocked down for epidermal Skint3 and Skint9, (2) deleted ◦ Data and Software Availability
for Skints 3, 4 and 9, or (3) conditionally targeted for epidermal
Stat3. Importantly, like aged mice, these various mutant mice SUPPLEMENTAL INFORMATION
were still replete with Vγ5Vδ1-DETCs but exhibited perturba-
tions in DETC behavior. Supplemental Information includes seven figures, seven tables, and two
movies and can be found with this article online at http://dx.doi.org/10.1016/
Our findings provide compelling evidence that epidermal ker-
j.cell.2016.10.052.
atinocytes are powerful not only in responding to, but also in
communicating with, their resident DETCs to achieve restoration AUTHOR CONTRIBUTIONS
of the skin barrier following injury. Moreover, they provide in-
sights into the physiological significance of SKINTs, which B.E.K. and E.F. conceptualized the study. B.E.K. and E.F. wrote the manu-
until now have been limited to SKINT1. Our findings here script. B.E.K., E.F., S.L. and S.N. designed experiments. B.E.K. characterized
expose a role for other SKINT family members in mediating ker- young and aged skin, performed wounding studies and analyzed re-epithelial-
ization of young and aged backskin wounds; performed in vitro experiments
atinocyte-DETC crosstalk during wound-repair. Since individual
and collected and analyzed in vivo RNA-seq data; and performed
Skint knockdowns showed a wound-delay phenotype and
immune cell analysis and Skint knockdown studies and analysis of Skint
SKINTs are known to homo-/hetero-dimerize (Barbee et al., expression. B.E.K. and S.L. performed Stat3 cKO wounding experiments.
2011), it will be interesting in the future to probe deeper into their S.L. performed NON/ShiLtJ, FVBjax and FVBtac young mice wounding
interactions and functions at the wound-edge. studies; aged ear skin wounding studies. Vg5 qPCR was performed by S.L.
Finally, our data also underscore a mechanistic role for IL-6 and C.P.L. S.L. and S.N. performed the IL-6 explant experiments and provided
mediated pSTAT3 signaling in driving Skint transcription during immunology expertise in characterizing immune cell populations. L.P. assisted
with wounding procedures. J.L. performed in utero lentiviral injections. M.N.
a youthful wound-response, and reveal an overall reduction in
and B.E.K. performed cell adhesion assays. H.A.P. performed ultrathin
this circuitry in aged mice. Although pSTAT3’s functions in sectioning and staining of young and aged skin; A.A. and B.E.K. provided bio-
wound-induced immune responses no doubt extend beyond informatics and quantitation/statistics expertise.
the pathway that we discovered here, the effects of epidermal
Stat3 loss of function on the ability to retain DETCs at a wound ACKNOWLEDGMENTS
site were strikingly similar to those that we observed in aged
mice. We thank Fuchs’ lab colleagues Irina Matos and E. Heller for help with micro-
When coupled with the wound-induced re-epithelialization scopy; Shijing Luo and N. Oshimori for intellectual input and suggestions;
M. Sribour and S. Hacker for technical assistance in the mouse facility;
impairment for both aged and epidermal Stat3-null skin, our
J. dela Cruz-Racelis for assistance in tissue sectioning and culture; E. Wong
data demonstrate that this circuitry is critical in skin barrier resto- for genotyping; S. Larson for assistance in tissue explants. Rockefeller Univer-
ration. Indeed our findings show that IL-6 can both enhance sity’s Comparative Bioscience Center (AAALAC accredited) provided care of
pSTAT3 and Skint expression in epidermal keratinocytes and mice in accordance with National Institutes of Health (NIH) guidelines and

Flow Cytometry facility for FACS sorting. Weill Cornell Medical School Geno- Gray, E.E., Suzuki, K., and Cyster, J.G. (2011). Cutting edge: Identification of
mics Center conducted sequencing. E.F. is an Investigator of the Howard a motile IL-17-producing gammadelta T cell population in the dermis.
Hughes Medical Institute and a Senior Investigator of the Ellison Foundation J. Immunol. 186, 6091–6095.
for Aging Research. B.E.K. was funded by the NIH/NCI (CA009673-36A1) Gurtner, G.C., Werner, S., Barrandon, Y., and Longaker, M.T. (2008). Wound
and a Postdoctoral Fellowship from AFAR. S.L. is a Jane Coffin Childs Post- repair and regeneration. Nature 453, 314–321.
doctoral Fellow and a Women & Science Fellow. A.A. is the recipient of a Merck
graduate fellowship and a Medical Scientist Training Program traineeship. S.N. Havran, W.L., and Jameson, J.M. (2010). Epidermal T cells and wound healing.
is a Damon Runyon Postdoctoral Fellow. This study was supported by grants J. Immunol. 184, 5423–5428.
from the NIH (R01-AR050452) and the Ellison Foundation (AG-SS-2965-12, Havran, W.L., Chien, Y.H., and Allison, J.P. (1991). Recognition of self antigens
E.F.). by skin-derived T cells with invariant gamma delta antigen receptors. Science
252, 1430–1432.
Received: November 4, 2015
Hayday, A.C. (2009). Gammadelta T cells and the lymphoid stress-surveillance
response. Immunity 31, 184–196.
Published: November 17, 2016 Heath, W.R., and Carbone, F.R. (2013). The skin-resident and migratory im-
mune system in steady state and memory: innate lymphocytes, dendritic cells
and T cells. Nat. Immunol. 14, 978–985.
REFERENCES
Humphries, M.J. (2009). Cell adhesion assays. Methods Mol. Biol. 522,
203–210.
Ashcroft, G.S., Mills, S.J., and Ashworth, J.J. (2002). Ageing and wound heal-
ing. Biogerontology 3, 337–345. Itohara, S., Mombaerts, P., Lafaille, J., Iacomini, J., Nelson, A., Clarke, A.R.,
Hooper, M.L., Farr, A., and Tonegawa, S. (1993). T cell receptor delta gene
Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren,
mutant mice: independent generation of alpha beta T cells and programmed
J., Li, W.W., and Noble, W.S. (2009). MEME SUITE: tools for motif discovery
rearrangements of gamma delta TCR genes. Cell 72, 337–348.
and searching. Nucleic Acids Res. 37, W202–W208.
Jameson, J., and Havran, W.L. (2007). Skin gammadelta T-cell functions in ho-
Barbee, S.D., Woodward, M.J., Turchinovich, G., Mention, J.-J., Lewis, J.M.,
meostasis and wound healing. Immunol. Rev. 215, 114–122.
Boyden, L.M., Lifton, R.P., Tigelaar, R., and Hayday, A.C. (2011). Skint-1 is a
highly specific, unique selecting component for epidermal T cells. Proc. Natl. Jameson, J., Ugarte, K., Chen, N., Yachi, P., Fuchs, E., Boismenu, R., and
Acad. Sci. USA 108, 3330–3335. Havran, W.L. (2002). A role for skin gammadelta T cells in wound repair. Sci-
ence 296, 747–749.
Beronja, S., Livshits, G., Williams, S., and Fuchs, E. (2010). Rapid functional
dissection of genetic networks via tissue-specific transduction and RNAi in Jameson, J.M., Cauvi, G., Witherden, D.A., and Havran, W.L. (2004). A kerati-
mouse embryos. Nat. Med. 16, 821–827. nocyte-responsive gamma delta TCR is necessary for dendritic epidermal
Boyden, L.M., Lewis, J.M., Barbee, S.D., Bas, A., Girardi, M., Hayday, A.C., Ti- T cell activation by damaged keratinocytes and maintenance in the epidermis.
gelaar, R.E., and Lifton, R.P. (2008). Skint1, the prototype of a newly identified J. Immunol. 172, 3573–3579.
immunoglobulin superfamily gene cluster, positively selects epidermal gam- Kennedy, B.K., Berger, S.L., Brunet, A., Campisi, J., Cuervo, A.M., Epel, E.S.,
madelta T cells. Nat. Genet. 40, 656–662. Franceschi, C., Lithgow, G.J., Morimoto, R.I., Pessin, J.E., et al. (2014). Gero-
Burzyn, D., Kuswanto, W., Kolodin, D., Shadrach, J.L., Cerletti, M., Jang, Y., science: linking aging to chronic disease. Cell 159, 709–713.
Sefik, E., Tan, T.G., Wagers, A.J., Benoist, C., and Mathis, D. (2013). A special Kenyon, C.J. (2010). The genetics of ageing. Nature 464, 504–512.
population of regulatory T cells potentiates muscle repair. Cell 155, 1282–
Keyes, B.E., Segal, J.P., Heller, E., Lien, W.-H., Chang, C.-Y., Guo, X., Oristian,
1295.
D.S., Zheng, D., and Fuchs, E. (2013). Nfatc1 orchestrates aging in hair follicle
Carvalho, L., Jacinto, A., and Matova, N. (2014). The Toll/NF-κB signaling stem cells. Proc. Natl. Acad. Sci. USA 110, E4950–E4959.
pathway is required for epidermal wound repair in Drosophila. Proc. Natl.
Kojima, H., Inoue, T., Kunimoto, H., and Nakajima, K. (2013). IL-6-STAT3
Acad. Sci. USA 111, E5373–E5382.
signaling and premature senescence. JAK-STAT 2, e25763.
Castellana, D., Paus, R., and Perez-Moreno, M. (2014). Macrophages
Komori, H.K., Witherden, D.A., Kelly, R., Sendaydiego, K., Jameson, J.M.,
contribute to the cyclic activation of adult hair follicle stem cells. PLoS Biol.
Teyton, L., and Havran, W.L. (2012). Cutting edge: dendritic epidermal gd
12, e1002002.
T cell ligands are rapidly and locally expressed by keratinocytes following
Chodaczek, G., Papanna, V., Zal, M.A., and Zal, T. (2012). Body-barrier surveil- cutaneous wounding. J. Immunol. 188, 2972–2976.
lance by epidermal gd TCRs. Nat. Immunol. 13, 272–282.
Lewis, J.M., Girardi, M., Roberts, S.J., Barbee, S.D., Hayday, A.C., and Tige-
Depianto, D., Kerns, M.L., Dlugosz, A.A., and Coulombe, P.A. (2010). Keratin laar, R.E. (2006). Selection of the cutaneous intraepithelial gammadelta+ T cell
17 promotes epithelial proliferation and tumor growth by polarizing the im- repertoire by a thymic stromal determinant. Nat. Immunol. 7, 843–850.
mune response in skin. Nat. Genet. 42, 910–914.
López-Otı́n, C., Blasco, M.A., Partridge, L., Serrano, M., and Kroemer, G.
Franceschi, C., and Campisi, J. (2014). Chronic inflammation (inflammaging) (2013). The hallmarks of aging. Cell 153, 1194–1217.
and its potential contribution to age-associated diseases. J. Gerontol.
McGee, H.M., Schmidt, B.A., Booth, C.J., Yancopoulos, G.D., Valenzuela,
A Biol. Sci. Med. Sci. 69(Suppl 1 ), S4–S9.
D.M., Murphy, A.J., Stevens, S., Flavell, R.A., and Horsley, V. (2013). IL-22 pro-
Fuchs, E. (2007). Scratching the surface of skin development. Nature 445, motes fibroblast-mediated wound repair in the skin. J. Invest. Dermatol. 133,
834–842. 1321–1329.
Glitzner, E., Korosec, A., Brunner, P.M., Drobits, B., Amberg, N., Schonthaler, Moffat, J., Grueneberg, D.A., Yang, X., Kim, S.Y., Kloepfer, A.M., Hinkle, G.,
H.B., Kopp, T., Wagner, E.F., Stingl, G., Holcmann, M., and Sibilia, M. (2014). Piqani, B., Eisenhaure, T.M., Luo, B., Grenier, J.K., et al. (2006). A lentiviral
Specific roles for dendritic cell subsets during initiation and progression of RNAi library for human and mouse genes applied to an arrayed viral high-con-
psoriasis. EMBO Mol. Med. 6, 1312–1327. tent screen. Cell 124, 1283–1298.
Goodson, W.H., 3rd, and Hunt, T.K. (1979). Wound healing and aging. Mohamed, R.H., Sutoh, Y., Itoh, Y., Otsuka, N., Miyatake, Y., Ogasawara, K.,
J. Invest. Dermatol. 73, 88–91. and Kasahara, M. (2015). The SKINT1-like gene is inactivated in hominoids but
Gosain, A., and DiPietro, L.A. (2004). Aging and wound healing. World J. Surg. not in all primate species: implications for the origin of dendritic epidermal
28, 321–326. T cells. PLoS ONE 10, e0123258.
Cell 167, 1323–1338, November 17, 2016 1337

Nishimura, E.K., Granter, S.R., and Fisher, D.E. (2005). Mechanisms of hair graying: incomplete melanocyte stem cell maintenance in the niche. Science 307, 720–724.
Nowak, J.A., and Fuchs, E. (2009). Isolation and culture of epithelial stem cells. Methods Mol. Biol. 482, 215–232.
Oh, J., Lee, Y.D., and Wagers, A.J. (2014). Stem cell aging: mechanisms, regulators and therapeutic opportunities. Nat. Med. 20, 870–880.
Raja, S., Sivamani, K., Garcia, M.S., and Isseroff, R.R. (2007). Wound re-epithelialization: modulating keratinocyte migration in wound healing. Front. Biosci. 12, 2849–2868.
Rani, M., Zhang, Q., Scherer, M.R., Cap, A.P., and Schwacha, M.G. (2015). Activated skin γδ T-cells regulate T-cell infiltration of the wound site after burn. Innate Immun. 21, 140–150.
Reed, M.J., Koike, T., and Puolakkainen, P. (2003). Wound repair in aging. A review. Methods Mol. Med. 78, 217–237.
Rheinwald, J.G., and Green, H. (1975). Serial cultivation of strains of human epidermal keratinocytes: the formation of keratinizing colonies from single cells. Cell 6, 331–343.
Sano, S., Itami, S., Takeda, K., Tarutani, M., Yamaguchi, Y., Miura, H., Yoshikawa, K., Akira, S., and Takeda, J. (1999). Keratinocyte-specific ablation of Stat3 exhibits impaired skin remodeling, but does not affect skin morphogenesis. EMBO J. 18, 4657–4668.
Sgonc, R., and Gruber, J. (2013). Age-related aspects of cutaneous wound healing: a mini-review. Gerontology 59, 159–164.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., and Mesirov, J.P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550.
Sumaria, N., Roediger, B., Ng, L.G., Qin, J., Pinto, R., Cavanagh, L.L., Shklovskaya, E., Fazekas de St Groth, B., Triccas, J.A., and Weninger, W. (2011). Cutaneous immunosurveillance by self-renewing dermal gammadelta T cells. J. Exp. Med. 208, 505–518.
Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515.
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578.
Turchinovich, G., and Hayday, A.C. (2011). Skint-1 identifies a common molecular mechanism for the development of interferon-γ-secreting versus interleukin-17-secreting γδ T cells. Immunity 35, 59–68.
Velarde, M.C., Demaria, M., Melov, S., and Campisi, J. (2015). Pleiotropic age-dependent effects of mitochondrial dysfunction on epidermal stem cells. Proc. Natl. Acad. Sci. USA 112, 10407–10412.

STAR★METHODS
KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
Guinea pig Anti-Mouse K5 | Fuchs Lab | N/A
Rabbit Anti-Mouse K14 | Fuchs Lab | N/A
Rabbit Anti-Mouse K17 | Fuchs Lab | N/A
Guinea pig Anti-Mouse K6 | Fuchs Lab | N/A
Rabbit Anti-K10 Clone: Poly19054 | Biolegend (Covance) | Cat# 905401
Rabbit Anti-Cleaved Caspase-3 Clone: 269518 | R&D | Cat# MAB835
Armenian Hamster Anti-γδ TCR Clone: GL3 AF647 | Biolegend | Cat# 118133
Armenian Hamster Anti-γδ TCR Clone: GL3 | Biolegend | Cat# 118101
Rabbit Anti-pStat3 (Tyr705) Clone: D3A7 | Cell Signaling | Cat# 9145
Rat Anti-CD104 Clone: 346-11A | Biolegend | Cat# 123602
Syrian Hamster Anti-Mouse TCR Vγ3 Clone: 536 | Biolegend | Cat# 137505
Rat Anti-Mouse CD3ε Clone: 17A2 | Biolegend | Cat# 100212
Armenian Hamster Anti-Mouse TCRβ Clone: H57-597 | Biolegend | Cat# 109201
Armenian Hamster Anti-Mouse TCRβ Clone: H57-597 PerCP/Cy5.5 | Biolegend | Cat# 109227
Donkey Anti-Rabbit AF488, conjugated secondary | Jackson ImmunoResearch | Cat# 711-545-152
Donkey Anti-Rabbit AF546, conjugated secondary | Jackson ImmunoResearch | Cat# 711-165-152
Donkey Anti-Rabbit AF647, conjugated secondary | Jackson ImmunoResearch | Cat# 711-605-152
Donkey Anti-Guinea pig AF488, conjugated secondary | Jackson ImmunoResearch | Cat# 706-545-148
Donkey Anti-Guinea pig RRX, conjugated secondary | Jackson ImmunoResearch | Cat# 706-295-148
Donkey Anti-Guinea pig AF647, conjugated secondary | Jackson ImmunoResearch | Cat# 706-605-148
Donkey Anti-Rat AF488, conjugated secondary | Jackson ImmunoResearch | Cat# 712-545-150
Donkey Anti-Rat RRX, conjugated secondary | Jackson ImmunoResearch | Cat# 712-295-150
Donkey Anti-Rat AF647, conjugated secondary | Jackson ImmunoResearch | Cat# 712-605-150
Anti-Mouse Ly6c-FITC Clone: HK1.4 | Biolegend | Cat# 128005
Anti-Mouse Ly6g-PE Clone: 1A8 | Biolegend | Cat# 127607
Anti-Mouse CD11c-PECy7 Clone: N418 | Biolegend | Cat# 117317
Anti-Mouse CD11b-PacBlue Clone: M1/70 | Biolegend | Cat# 101223
Anti-Mouse I-A/I-E-AF700 Clone: M5/114.15.2 | Biolegend | Cat# 107621
Anti-Mouse CD45-AF750 Clone: 30-F11 | Biolegend | Cat# 103153
Anti-Mouse CD64-PerCP-Cy5 Clone: X54-5/7.1 | Biolegend | Cat# 139307
Anti-Mouse CD34 eFluor 660 Clone: RAM34 | eBiosciences | Cat# 50-0341-82
Anti-Mouse Ly-6A/E (Sca-1) PerCP-Cy5.5 | eBiosciences | Cat# 45-5981-82
Anti-Human CD49f (α6-integrin) PE | BD PharMingen | Cat# 555736
OCT Compound Tissue Tek | VWR | Cat# 25608-930
4′,6-diamidino-2-phenylindole (DAPI) | Sigma-Aldrich | Cat# 28718-90-3
Prolong Gold | Invitrogen | Cat# P36930
Ammonium Thiocyanate | Sigma-Aldrich | Cat# 221988
Glutaraldehyde Solution | Sigma-Aldrich | Cat# G5882
16% Paraformaldehyde Solution | Electron Microscopy Sciences | Cat# 15700
Osmium Tetroxide | Sigma-Aldrich | Cat# 20816-12-0
Mitomycin-C | Sigma-Aldrich | Cat# M7949
Trypan Blue | Sigma-Aldrich | Cat# T8154
Rhodamine B | Sigma-Aldrich | Cat# R6626
Recombinant Murine Interleukin-6 | R&D | Cat# 406-ML-CF
TRI Reagent | Sigma | Cat# T3934
Poly-L-lysine | Sigma-Aldrich | Cat# P4707
Bovine Serum Albumin | Sigma-Aldrich | Cat# A7906
Fibronectin Human Protein, Plasma | Millipore | Cat# FC010
Corning Collagen I, Rat Tail | Corning | Cat# 354236
Matrigel Matrix (Phenol Free) | Corning | Cat# 356237
E-Media | Fuchs Lab | N/A
Trypsin-EDTA (0.25%) | GIBCO | Cat# 25200056
RPMI with L-glutamine | ThermoFisher | Cat# 11875-093
Sodium Pyruvate (100 mM) | ThermoFisher | Cat# 11360070
Acid free HEPES (1 M) | ThermoFisher | Cat# 15630080
Liberase TL Research Grade | Sigma-Aldrich | Cat# 5401020001
DNase 1 from bovine pancreas | Sigma-Aldrich | Cat# D4263
LIVE/DEAD Fixable Blue Dead Cell Stain Kit | ThermoFisher | Cat# L23105
SYBR Green PCR Master Mix | Applied Biosystems | Cat# 4367659
Click-iT EdU Alexa Fluor Imaging Kit | Life Technologies | Cat# C10337
Direct-zol RNA MiniPrep | Zymo | Cat# R2050
TruSeq RNA Library Prep Kit | Illumina | Cat# RS-122-2001
SuperScript VILO cDNA Synthesis Kit and Master Mix | ThermoFisher | Cat# 11752050
Deposited Data
Raw RNA-seq data files | This paper | NCBI Gene Expression Omnibus: GSE74283
J2 fibroblast feeder cells | Fuchs Lab | N/A
Primary Mouse Keratinocyte Cell Lines | Fuchs Lab | N/A
C57BL6 Mice | Jackson Laboratories | JAX: 000664
Aged C57BL6 Mice | NIA | N/A
K14Cre | Fuchs Lab | N/A
Stat3fl/fl | Jackson Laboratories | JAX: 016923
FVB/NJ | Jackson Laboratories | JAX: 001800
FVB/NTac | Taconic Biosciences | N/A
NON/ShiLtJ | Jackson Laboratories | JAX: 002423
Recombinant DNA
pLKO.1 TRC Cloning Vector | Moffat et al., 2006 | Addgene# 10878
TRC Mouse Genome shRNA Library | Dharmacon | Cat# RMM4013
See Table S7 | N/A
Fiji (ImageJ) | https://fiji.sc/ | N/A
Adobe Photoshop and Illustrator CS5 | Adobe.com | N/A
R | https://www.r-project.org/ | N/A
Bowtie2 | Trapnell et al., 2012 | http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
CummeRbund package in R | http://compbio.mit.edu/cummeRbund/ | N/A
Cufflinks | Trapnell et al., 2012 | http://cole-trapnell-lab.github.io/cufflinks/
Gene Set Enrichment Analysis (GSEA) | http://software.broadinstitute.org/gsea/index.jsp |
MEME software suite (including TomTom) | Bailey et al., 2009 | http://meme-suite.org/doc/cite.html?man_type=web
FACS Diva software | BD Biosciences | N/A
FlowJo Software | FlowJo | N/A
Prism | GraphPad | N/A
Other
Sterile 1.5 mm, 2 mm, 4 mm and 6 mm Biopsy punch | Integra (Miltex) | Cat# 33-31A, 33-31, 33-34, 33-36
Spectrum Tissue Pulverizer | Fisher Scientific | Cat# 189476
Further information and requests for reagents may be directed to, and will be fulfilled by, the Lead Contact, Elaine Fuchs (fuchslb@rockefeller.edu).
Mice and Wounding Experiments

Aged (22–24 months) female C57BL6 mice were obtained from the National Institute on Aging (NIA). We requested mice with ‘‘good hair coats’’ to avoid animals with clear signs of dermatitis, fighting, scratching, and inflammation. Animals with visible neoplasia were discarded. Young (2–4 months of age) C57BL6 mice were obtained from Jackson Laboratories. Stat3 cKO mice were obtained by crossing Stat3-floxed animals from Jackson Laboratories (Stock No: 016923) to K14-Cre/Rosa26-YFP (Fuchs Lab) animals. FVBjax (FVB/NJ; Stock No: 001800) and NON/ShiLtJ (Stock No: 002423) mice were obtained from Jackson Laboratories, and FVBtac (FVB/NTac) mice from Taconic Biosciences. All mice were maintained in an AAALAC-approved facility at The Rockefeller University. Procedures were performed using IACUC-approved protocols that adhere to NIH standards.
METHOD DETAILS
Wounding Study
Punch biopsies were performed on anesthetized mice in the telogen phase of the hair cycle. For backskin wounds, dorsal hairs were cut with clippers and skin was swabbed with EtOH prior to wounding. 6 mm biopsy punches (Miltex) were used to make full-thickness wounds. For ear punch biopsies, animals were anesthetized and a 2 mm biopsy punch was used to make a through-and-through wound (hole) in the center of each ear. Tissue was collected at 0.5d, 1d and 3d after wounding for immunostaining (details below). Depilation was performed as described (Keyes et al., 2013). For EdU pulse experiments, mice were injected intraperitoneally with EdU (50 μg/g; Sigma-Aldrich) at specified intervals before collection.
Histology and Immunofluorescence

Backskin tissue was embedded in OCT compound (Tissue Tek), frozen on dry ice, and cryo-sectioned (10–12 μm section thickness). Sections were fixed in 4% paraformaldehyde, rinsed with PBS, permeabilized for 10 min with 0.1% Triton X-100 (Sigma) in PBS, and then blocked for 1 hr in blocking buffer (2.5% normal donkey serum, 2.5% normal goat serum, 1% BSA, 1% gelatin, 0.3% Triton X-100). Primary antibodies (and their dilutions) used were as follows: K5 (guinea pig, 1:500, Fuchs lab), K14 (rabbit, 1:1,000, Fuchs lab), K17 (rabbit, 1:1,000, Fuchs lab), K10 (rabbit, 1:1,000, Covance), K6 (guinea pig, 1:2,000, Fuchs lab), Cleaved Caspase-3 (rabbit, 1:1,000, R&D Systems), γδTCR (Armenian hamster, 1:100, BioLegend), pSTAT3 (rabbit, 1:100, Cell Signaling), CD104 (rat, 1:500, BioLegend), APC TCR Vγ3 (Syrian hamster, 1:200, BioLegend), and AlexaFluor-488 CD3 (rat, 1:200, BioLegend). Primary antibodies were diluted in blocking buffer and incubated at 4°C overnight. After washing with PBS, secondary antibodies conjugated with Alexa488, Alexa546, or Alexa647 (Jackson ImmunoResearch) were added for 1–3 hr at room temperature (RT). Slides were washed with PBS, counterstained with 4′,6-diamidino-2-phenylindole (DAPI), and mounted in Prolong Gold (Invitrogen). EdU staining was performed using the Click-iT EdU Alexa Fluor Imaging Kit (Life Technologies) per manufacturer’s instructions. Wound images were acquired with an AxioObserver.Z1 epifluorescence microscope equipped with a Hamamatsu ORCA-ER camera and an ApoTome.2 (Carl Zeiss) slider. Tiled and stitched images of wounds were collected using a 20X objective, controlled by Zen software (Carl Zeiss).
For whole-mount epidermal sheet preparations of wounded skin, skin was excised around the wound, fat was scraped away with a scalpel, and the skin was treated with 3.8% ammonium thiocyanate (Sigma) for 30 min at 37°C. The epidermis was separated manually from dermal tissue with forceps. The epidermis was then fixed in 4% paraformaldehyde overnight and stained as above with γδTCR antibody (Armenian hamster, 1:100, BioLegend). Tissue was counterstained with 4′,6-diamidino-2-phenylindole (DAPI) and mounted onto slides in Prolong Gold (Invitrogen). Imaging was performed on a Zeiss Axioplan2 using a Plan-Apochromat 20X/0.8 air objective. Images presented are maximum projections of a z series of images. For analysis of DETC dendrites, maximum projection images were used to count dendrites on DETCs in wounded and unwounded control skin. For whole-mount ear-epidermal preparations, ears were split laterally, then incubated in 3.8% ammonium thiocyanate for 30 min at 37°C before ear epidermis was separated from dermis. Epidermal sheets were then fixed in 4% paraformaldehyde for 1 hr at room temperature before proceeding to γδTCR immunostaining. Imaging was performed as for backskin whole mounts (Zeiss Axioplan2, Plan-Apochromat 20X/0.8 air objective), and images presented are maximum projections of a z series of images.
For semi-thin sectioning and staining, samples from backskins were fixed in 2% glutaraldehyde, 4% paraformaldehyde and 2 mM CaCl2 in 0.05 M sodium cacodylate buffer, pH 7.2, at room temperature for > 1 hr. Samples were postfixed in 1% osmium tetroxide and processed for Epon embedding. Semi-thin sections (800 nm) were cut, stained with toluidine blue, and examined under a bright-field Zeiss Axioplan2 microscope. Figures were prepared using ImageJ, Adobe Photoshop and Illustrator CS5.
Cell Culture
Young and aged basal cell keratinocytes were FACS isolated from animals and plated on mitomycin-C-treated J2 fibroblast feeder cells to establish primary cell lines. Independent clones were cultured and passaged in E-media supplemented with 15% serum and a final concentration of 0.3 mM Ca2+ for 3 passages and then moved to feeder-free cell culture (Rheinwald and Green, 1975). For colony forming efficiency assays, viability of epithelial keratinocytes was determined using trypan blue (Sigma) staining on a hemocytometer after FACS isolation. Equal numbers of live cells were plated, in triplicate, onto mitomycin-C-treated dermal fibroblasts in E-media supplemented with 15% serum and 0.3 mM Ca2+. After 14 days in culture, cells were fixed and stained with 1% Rhodamine B (Sigma). Colony diameter was measured from scanned images of plates using ImageJ, and colony numbers were counted. For IL-6 treatment experiments, keratinocytes were serum starved for 24 hr and then treated with recombinant mIL-6 (R&D Systems) at 10 ng/ml for the indicated time-points. Cells were collected directly in Trizol (Invitrogen) and RNA was extracted for qRT-PCR (see below). Cell adhesion and cell spreading assays were performed as described previously (Humphries, 2009). Wells were coated with 10 μg/ml human plasma fibronectin (Millipore), 40 μg/ml rat tail collagen-I (Corning), 0.1% (w/v) poly-L-lysine (Sigma), or 1 mg/ml BSA (Sigma) for 1 hr at room temperature, washed 3 times with PBS, and used in cell adhesion assays.
For scratch migration assays, keratinocytes were plated on 6-well tissue culture dishes and allowed to reach confluency. Scratches were then created by manual scraping of the cell monolayer with a pipette tip. The dishes were then washed with PBS, replenished with E-media supplemented with 1 mM HEPES, and photographed for periods of 25–36 hr in 5% CO2 on a PerkinElmer Volocity spinning disk system equipped with a heated enclosure and gas mixer (Solent) and a 20X/0.75 CFI Plan-Apo objective. Migration of individual keratinocytes was manually tracked using ImageJ software. Transwell migration assays were performed in 6-well plates (Corning). The bottom of each well was coated with 10 μg/ml fibronectin, and fibroblast-conditioned E-media containing 0.3 mM Ca2+ was added. Young and aged keratinocytes were serum starved for 24 hr, and a total of 20,000 cells/well were plated in serum-free medium containing 0.3 mM Ca2+. At the time-points indicated, cells were washed off the top of the membrane, and cells on the bottom of the membrane were fixed, stained with hematoxylin and eosin, and counted under the microscope.
For explant assays, backskin tissue was harvested, hair was removed with Nair, and the tissue was washed with PBS. Subcutaneous fat was gently removed with a scalpel. Explants were cut out using a 1.5 mm dermal biopsy punch (Miltex), placed on fibronectin-coated 24-well tissue culture dishes, secured to the bottom of the dish with 1–2 mL Matrigel (Corning), and submerged in E-media containing 0.3 mM Ca2+. Outgrowths from explants were imaged at the indicated time-points and analyzed with ImageJ. For explants treated with IL-6, 2 mm biopsy punches were used to cut out explants, which were treated with 10 ng/ml or 50 ng/ml IL-6 in E-media containing 0.3 mM Ca2+. Images were taken at the indicated time-points and outgrowths were measured using ImageJ.
RNA-Seq and Analysis

FACS-isolated keratinocytes were sorted directly into TRI Reagent (Sigma). Three animals were pooled per condition and all experiments were performed in duplicate. RNA was purified using the Direct-zol RNA MiniPrep kit (Zymo Research) per manufacturer’s instructions. Quality of the RNA for sequencing was determined using an Agilent 2100 Bioanalyzer; all samples used had RNA integrity numbers (RIN) > 8. Library preparation using the Illumina TruSeq mRNA sample preparation kit was performed at the Weill Cornell Medical College Genomic Core facility, and RNAs were sequenced on Illumina HiSeq 2000 machines. Reads were aligned using TopHat with the mm9 build of the mouse genome. Transcript assembly and differential expression were determined using Cufflinks with RefSeq mRNAs to guide assembly (Trapnell et al., 2010). Analysis of RNA-seq data was done using the cummeRbund package in R (Trapnell et al., 2012). Differentially regulated transcripts were used in Gene Set Enrichment Analysis (GSEA) to find enriched functional GO annotations (Subramanian et al., 2005). The MEME software suite (including TomTom) was used to identify enriched motifs in Skint promoters; the JASPAR vertebrate database was used as a source for consensus transcription factor binding site sequences (Bailey et al., 2009).
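As a rough illustration of what looking for a transcription factor binding site in a promoter involves, the sketch below scans a sequence for a GAS-type consensus (TTC, any three bases, GAA) of the kind often associated with STAT binding. This is a plain consensus-pattern scan, a much simpler technique than the MEME/TomTom motif discovery and JASPAR matrix comparison described above; the consensus pattern, the function name, and the example sequence are assumptions made for illustration and are not taken from the study.

```python
import re

# Assumed GAS-type consensus (TTC, any three bases, GAA). This is an
# illustrative simplification, not the JASPAR matrix used with TomTom.
# The pattern is its own reverse complement, so scanning one strand
# also covers sites on the opposite strand.
GAS_CONSENSUS = re.compile(r"(?=(TTC[ACGT]{3}GAA))")


def find_gas_sites(promoter):
    """Return (offset, site) pairs for each consensus match.

    The offset is negative and counted back from the 3' end of the
    supplied sequence, so a promoter ending at the start codon gives
    positions in the same sense as the -4000..0 axis of a locus diagram.
    """
    seq = promoter.upper()
    return [(m.start() - len(seq), m.group(1)) for m in GAS_CONSENSUS.finditer(seq)]


# Hypothetical 15-bp sequence, for illustration only.
example = "aaaTTCCCGGAAttt"
print(find_gas_sites(example))
```

A lookahead is used so that overlapping candidate sites are all reported rather than only the first of each overlapping pair.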
Lentivirus Production and Injections

Production and concentration of lentivirus, as well as ultrasound-guided in utero injections, were performed as described elsewhere
(Beronja et al., 2010). shRNAs were obtained from the Broad Institute’s Mission TRC-1/2 mouse library.
Flow Cytometry
Preparation of adult mouse backskins for isolation of keratinocytes and staining protocols were as previously described (Nowak and Fuchs, 2009). Briefly, subcutaneous fat was removed from skins with a scalpel, and skins were placed dermis side down on trypsin (GIBCO) at 37°C for 45 min. Single-cell suspensions were obtained by scraping the skin to remove the epidermis and hair follicles from the dermis. Cells were then filtered through 70 μm, followed by 40 μm, strainers. Cell suspensions were incubated with the appropriate antibodies for 30 min on ice. The following antibodies were used for FACS: α6-integrin (BD PharMingen), CD34 (eBiosciences) and Sca-1 (eBiosciences). DAPI was used to exclude dead cells. Cell isolations were performed on FACS Aria sorters running FACS Diva software (BD Biosciences). For EdU incorporation experiments, staining was performed using the Click-iT EdU Alexa Fluor 488 Flow Cytometry Kit (Life Technologies) per manufacturer’s instructions. FACS analyses were performed using LSRII FACS Analyzers and results were analyzed with FlowJo software.
For analysis of immune cells at the wound site, wound tissue was isolated from the backskin, keeping margins as close to the wound as possible. Tissue was minced in media (RPMI with L-glutamine, sodium pyruvate, acid-free HEPES, penicillin and streptomycin), then Liberase TL (Roche) was added (25 μg/ml) and tissue was digested for 90 min at 37°C while shaking gently. The digest reaction was stopped by addition of 20 ml of 0.5 M EDTA and 1 ml of 10% DNase solution. Cells were passed through a 70 μm strainer and stained with the following antibodies from eBiosciences: Ly6c-FITC 1:100, Ly6g-PE 1:200, CD11c-PECy7 1:150, CD11b-PacBlue 1:300, MHCII-AF700 1:300, CD45-A780 1:100, CD64-PerCP-Cy5 1:200, TCRβ-PerCP 1:200, γδTCR-APC 1:400. Dead cells were excluded using a LIVE/DEAD Fixable Blue Dead Cell Stain Kit (Molecular Probes) for UV excitation. FACS analyses were performed using LSRII FACS Analyzers and results were analyzed with FlowJo software.
RT-qPCR
RNA was purified from FACS-sorted cells by sorting directly into Trizol LS (Invitrogen) followed by the Direct-zol RNA MiniPrep kit (Zymo Research). Equivalent amounts of RNA were reverse-transcribed with the SuperScript VILO cDNA Synthesis Kit (Invitrogen). cDNAs were normalized to equal amounts using primers against β-actin. cDNAs were mixed with the indicated primers and Power SYBR Green PCR Master Mix (Applied Biosystems), and quantitative PCR (qPCR) was performed on an Applied Biosystems 7900HT Fast Real-Time PCR system. Primer sequences for RT-PCR were obtained from the Roche Universal ProbeLibrary.
For Vγ5 qPCR, unwounded and wounded skin was incubated in 50 mM EDTA for 1 hr to separate epidermis from dermis. Epidermal cells were immediately frozen in liquid nitrogen. Frozen tissues were homogenized using a Bessman Tissue Pulverizer (Spectrum) and collected in Trizol (Invitrogen). RNA was extracted using the Direct-zol RNA MiniPrep kit (Zymo Research) per manufacturer’s instructions. Equivalent amounts of RNA were reverse-transcribed with the SuperScript VILO cDNA Synthesis Kit (Invitrogen). cDNAs were mixed with the indicated primers and Power SYBR Green PCR Master Mix (Applied Biosystems), and quantitative PCR (qPCR) was performed on an Applied Biosystems 7900HT Fast Real-Time PCR system.
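The text states that cDNAs were normalized using β-actin primers and, in the figure legends, that expression is reported relative to the unwounded state, but it does not spell out the arithmetic. One common way to compute such a relative value is the comparative Ct (2^-ΔΔCt) calculation, sketched below. Treating this as the calculation used here is an assumption, as is perfect doubling per cycle; the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Comparative Ct (2^-ddCt) relative expression.

    Assumes roughly 100% amplification efficiency (a doubling per cycle);
    the reference gene would be beta-actin in the context above.
    """
    delta_sample = ct_target_sample - ct_reference_sample      # dCt, e.g. wounded
    delta_control = ct_target_control - ct_reference_control   # dCt, e.g. unwounded
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier
# in the sample than in the control, with the reference gene unchanged.
print(relative_expression(24.0, 18.0, 26.0, 18.0))
```

By construction the control condition evaluates to 1, which matches a presentation in which expression is normalized to the unwounded state.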
Student’s t test was used to determine the significance between two groups with Prism5 software. Box-and-whisker plots are used to
describe the entire population without assumptions about the statistical distribution. Error bars plotted on graphs denote SEM. For all
statistical tests, the 0.05 level of confidence was accepted as a significant difference.
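For readers who want to see the quantities named above, the following sketch computes the standard error of the mean and a two-sample Student's t statistic using only the Python standard library. It assumes the classical equal-variance (pooled) form of the test, which the text does not specify, and it stops at the t statistic and degrees of freedom; converting these to a p value needs the t distribution, for which statistical software such as Prism would be used. The function names and data values are hypothetical.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def student_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t_stat, df


# Hypothetical measurements for two groups.
young = [1.0, 2.0, 3.0]
aged = [4.0, 5.0, 6.0]
print(sem(young), student_t(young, aged))
```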
Data and Software Availability

RNA-seq data have been submitted to the NCBI-GEO under the accession number GEO: GSE74283.

Figure S1. Wound Closure in Young and Aged Animals, Related to Figure 2
(A) Representative examples of skin from young (2–4 months) and aged (22–24 months) mice subjected to 6 mm punch biopsy. Representative wounds shown at
the indicated time-points after wounding.
(B) Area of wound over time-course measured from images. n = 5.
(C) One-phase exponential decay modeling of wound closure in young and aged animals by the equation $N(t) = N_0 e^{-t/\tau}$, where $\lambda$ is the exponential decay constant and $\tau = 1/\lambda$. Span is the difference between the initial size of the wound and the plateau of wound closure. Fits of the curves are significantly different (Mann-Whitney test, p < 0.0001).
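A minimal sketch of fitting the one-phase decay in (C) is given below. It estimates the initial wound area and the time constant (the reciprocal of the decay constant) by ordinary least squares on the logarithm of the area, which is a simple linearization and not necessarily the nonlinear fitting procedure used for the figure; it also omits any plateau term. The function name and the synthetic data are hypothetical.

```python
import math


def fit_exponential_decay(times, areas):
    """Fit area(t) = n0 * exp(-t / tau) by least squares on ln(area).

    Returns (n0, tau). All areas must be positive, and the data must
    actually decrease over time for tau to be meaningful.
    """
    if len(times) != len(areas) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    if any(a <= 0 for a in areas):
        raise ValueError("areas must be positive to take logarithms")
    logs = [math.log(a) for a in areas]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("areas do not decay over time")
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope


# Synthetic, noise-free data generated with n0 = 28.0 and tau = 3.0 days.
days = [0, 1, 3, 5, 7]
wound_area = [28.0 * math.exp(-d / 3.0) for d in days]
n0, tau = fit_exponential_decay(days, wound_area)
print(n0, tau, 1.0 / tau)  # initial area, time constant, decay constant
```

A slower-healing group would show up as a larger fitted time constant, that is, a smaller decay constant.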
Figure S2. Epidermal Response to Wounding, Related to Figure 2
(A) Sagittal images of young and aged backskin after punch biopsy at indicated time-points. Immunostaining for K17 and β4-integrin (CD104), as indicated by color-coded secondary antibodies. Scale bar, 200 μm. ‘‘S’’ denotes scab. n = 5.
(B) Quantification of the distance away from the wound site where upregulation of K17 can be observed in epidermal keratinocytes. Data are represented as
mean ± SEM.
Figure S3. In Vitro Keratinocyte Assays, Related to Figure 3
(A) Quantification of migration in Boyden chamber assay of young and aged keratinocytes 12 or 24 hr after seeding. Migration is expressed as the percentage of cells that reach the bottom of the chamber relative to the total number of cells seeded in the upper chamber.
(B) Cell adhesion in vitro. FACS isolated basal epidermal keratinocytes from young and aged skins were assayed for their ability to attach to tissue culture dishes
coated with the different substrates indicated (fibronectin, collagen I, Poly-Lysine, and bovine serum albumin). Representative images of young and aged
keratinocytes on fibronectin coated plate after adhesion assay. Plot of cells bound to fibronectin coated plates at 10, 30 and 60 min time-points. n = 6.
(C) Quantification of cell adhesion to different substrates indicated.
(D) Quantification of cell spreading (area of the cell attached to dish) after cell adhesion assay. n = 6.
(E) Colony forming efficiency of young and aged keratinocytes isolated by FACS. n = 15.
(F) Quantification of colony number and size of colonies for in vitro growth assays. Data are represented as mean ± SEM.
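The migration readout in (A) is a simple percentage, sketched here for concreteness. The default of 20,000 seeded cells follows the per-well number given in the Transwell methods; the function name and the example count are hypothetical.

```python
def percent_migrated(cells_on_bottom, cells_seeded=20_000):
    """Percentage of seeded cells counted on the bottom of the membrane."""
    if cells_seeded <= 0:
        raise ValueError("cells_seeded must be positive")
    return 100.0 * cells_on_bottom / cells_seeded


# Hypothetical count: 5,000 cells found on the bottom of the membrane.
print(percent_migrated(5_000))
```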
Figure S4. Global Transcriptional Analysis in Young and Aged Keratinocytes, Related to Figure 4
(A) Schematic of keratinocyte isolation from wound edge and FACS isolation strategy.
(B) RNA-seq data from isolated keratinocytes for epithelial markers Krt14/Krt16, Krt5, Trp63, Itga6, Klf5 and markers for endothelial cells (Cd31), immune cells
(Cd45), fibroblasts (Cd140a), and melanocytes (Sox10).
(C) Heatmap of unsupervised hierarchical clustering of RNA-sequencing data after k-means clustering of transcripts into 25 groups. Young wd = keratinocytes
isolated from the wound edge of a young mouse; aged wd = keratinocytes isolated from the wound edge of an aged mouse, young epi = young keratinocytes
under homeostatic conditions, aged epi = aged keratinocytes under homeostatic conditions.
(D) Venn diagrams comparing young keratinocytes versus young wounded keratinocytes, showing genes up- (500 genes) and downregulated (1679 genes) (Yg, young), and aged keratinocytes versus aged wounded keratinocytes, showing genes up- (236 genes) and downregulated (328 genes) (Ag, aged), by the fold changes indicated. Overlapping regions denote genes similarly regulated in both young and aged wounded skin.
(E) Dot plot of leading edge results in GSEA. Each dot represents a GO term grouped into the 8 core functional categories. The size of the dot denotes the GO term's −log FDR value.
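The overlap shown in the Venn diagrams of (D) amounts to set arithmetic on two lists of regulated genes. The sketch below computes the three region counts for one direction of regulation; the function name and the gene lists are hypothetical placeholders, not the gene sets from the study.

```python
def venn_counts(young_genes, aged_genes):
    """Counts for the three regions of a two-set Venn diagram."""
    young, aged = set(young_genes), set(aged_genes)
    return {
        "young_only": len(young - aged),
        "shared": len(young & aged),
        "aged_only": len(aged - young),
    }


# Hypothetical lists of wound-upregulated genes, for illustration only.
young_up = ["Krt6a", "Krt16", "Krt17"]
aged_up = ["Krt17", "Il6"]
print(venn_counts(young_up, aged_up))
```

Running the same function separately on the up- and downregulated lists would give the two diagrams described in the legend.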
[Figure S5 image. Recoverable labels: (A) Vγ5Vδ1 (TCR) mRNA levels; (B) γδTCR, DAPI, young and aged; (C) young and aged; (D) γδTCR; (E) γδTCR, DAPI at d1, d3, d5, and d7, young and aged; (F) dendrites on DETCs within 200 μm of the ear wound edge, plotted as percent of DETCs (0–100) by dendrites/DETC (0–5) for unwounded, d1, and d3 in young and aged; n.s. and p = 0.04.]
Figure S5. DETCs in Young and Aged Skin, Related to Figure 5

(A) qPCR for Vγ5 before and after wounding in skins of young and aged mice. Relative expression levels are normalized to the unwounded states.
(B) Representative images of DETCs in young and aged epidermis during normal homeostasis in backskin from mice when hair follicles are in the resting (telogen) phase. DETCs are immunolabeled for γδ T cell receptor (γδTCR) in red, DAPI in gray. Scale bar, 200 μm.
(C) Representative whole-mount epidermal sheet preparations of young and aged backskin stained for γδTCR in unwounded skin.
(D) Flow cytometry analysis of DETC numbers in young and aged skin at homeostasis. n = 5.
(E) Images of the wound edge in young and aged animals immunostained for γδTCR (DETCs) in red at time-points post-wounding indicated. n = 5.
(F) Quantification of dendrites per DETC 200 μm from the wound edge from earskin whole mounts at post-wound time-points indicated. n = 2. Data are represented as mean ± SEM.
Figure S6. Lentiviral Knock-Down of Skints in Epidermal Keratinocytes In Vivo, Related to Figure 6
(A) Table of shRNAs used for Skints knockdown. All shRNAs and nomenclatures are from the TRC mouse lentiviral library (Sigma).
(B) Stick diagram of lentivirus construct used for selective delivery of shRNA expression to the skin epidermis (Beronja et al., 2010). Representative image of
transduced sections of adult skin (H2B-RFP+) after wounding mice, which were infected at E9.5 while in utero with the lentivirus.
(C) Quantification of Skint gene knockdown with independent shRNA constructs relative to Scrambled control.
(D) Quantification of closure of 6 mm biopsy punches in animals with individual shRNA hairpins.
(E) Quantification of the number of dendrites on individual DETCs in wounded epidermis from animals which as E9.5 embryos were transduced with the individual
shRNA, as noted.
(F) RT-qPCR for Skint1, Skint3 and Skint9 in backskin from C57BL6 young and aged mice, FVBjax and FVBtac mouse strains in unwounded skin. Data presented
as relative mRNA expression.
(G) Immunofluorescence images of wounded skin d5 post-wounding in NON/ShiLtJ and CD-1 animals. Sections are immunolabeled for epidermal keratinocytes
(K14) and co-stained with DAPI. ‘‘S’’ denotes scab and yellow arrows denote wound edge. Statistical significance shown for NON/ShiLtJ versus FVBjax, FVBtac.
(H) Whole-mount epidermal sheet preparations of unwounded skin immunostained for γδTCR in C57BL6 and FVBjax skin. Inset shows enlarged image of DETCs, DAPI in blue. Scale bar, 100 μm.
(I) Quantification of DETC numbers and dendrites per DETC from whole-mount images.
(J) Quantification of DETC numbers from earskin wounding study.
(K) Images of wounded skin in C57BL6 and FVBjax animals immunostained for γδTCR (DETCs) in red at the wound edge. Inset, from area in yellow dashed lines, shows enlarged image of DETCs. Scale bar, 200 μm. Quantification of DETC numbers within 700 μm of the wound edge to the right.
(L) Skin sections from FVBjax and FVBtac immunostained with CD3 (pan T cell marker) and Vγ5 antibodies. FVBtac animals lack γδ T cell populations. Data are represented as mean ± SEM.
[Figure S7 image, panels A–H: Skint locus diagram and MEME motif logos (motif 1, p = 3.2e-66; motif 2, p = 3.1e-50; motif 3, p = 4.3e-49); TomTom transcription factor matches; STAT3 binding site logo; immunofluorescence panels for pSTAT3, K5, CD104, K14, γδTCR, and DAPI. See legend below.]
Figure S7. STAT3 Regulation of Skint Genes and Response to STAT3 Induction by Epidermal Keratinocytes In Vitro, Related to Figure 7
(A) Diagram of the Skint locus used for promoter analysis. Diagram (below) shows location of each of the 3 motifs found across all 11 Skint promoters.
(B) Top three conserved motifs found in promoter regions (−4,000 bp from start codon) in the Skint family of genes.
(C) Transcription factors found by TomTom to have binding sites in each motif. Transcription factors highlighted in red were found to be expressed with a
FPKM > 1 in our RNA-seq data.

(D) Consensus STAT3 binding motif (above) and conserved DNA motif in Skint promoters identified by MEME (below).
(E) Images of keratinocytes in vitro treated with IL-6 for indicated time-points, fixed and stained with antibodies to K5 (red), STAT3 (green) and DAPI (blue).
Quantification of pSTAT3 positive cells before and after IL-6 treatment to the right. Yg = young. Chi-square test p < 0.0001.
(F) Image of young and aged telogen backskin during normal homeostasis immunolabeled for K5 (keratin 5), β4-integrin (CD104), and pSTAT3, co-stained
with DAPI.
(G) Immunofluorescence images of WT and Stat3cKO skin taken after wounding the mice at indicated time-points. Sections are immunolabeled for K14 and co-
stained with DAPI. ‘‘S’’ denotes scab and yellow arrows denote wound edge. n = 4.
(H) Images of wounded skin in WT and Stat3cKO animals immunostained for γδTCR (DETCs) in red at the wound edge. Yellow arrows denote wound edge. Scale
bar, 100 μm. n = 4. Data are represented as mean ± SEM.
Article
A Dietary Fiber-Deprived Gut Microbiota Degrades the Colonic Mucus Barrier and Enhances Pathogen Susceptibility
Mahesh S. Desai, Anna M. Seekatz,
Nicole M. Koropatkin, ...,
Thaddeus S. Stappenbeck,
Gabriel Núñez, Eric C. Martens
Correspondence
mahesh.desai@lih.lu (M.S.D.),
emartens@umich.edu (E.C.M.)
In Brief
Regular consumption of dietary fiber
helps prevent erosion of the intestinal
mucus barrier by the gut microbiome,
blunting pathogen infection and reducing
the incidence of colitis.
Highlights
• Characterized synthetic bacterial communities enable functional insights in vivo
• Low-fiber diet promotes expansion and activity of colonic mucus-degrading bacteria
• Purified prebiotic fibers do not alleviate degradation of the mucus layer
• Fiber-deprived gut microbiota promotes aggressive colitis by an enteric pathogen
Desai et al., 2016, Cell 167, 1339–1353

Mahesh S. Desai,1,2,3,7,* Anna M. Seekatz,2 Nicole M. Koropatkin,2 Nobuhiko Kamada,2 Christina A. Hickey,4
Mathis Wolter,3 Nicholas A. Pudlo,2 Sho Kitamoto,2 Nicolas Terrapon,5 Arnaud Muller,6 Vincent B. Young,2
Bernard Henrissat,5 Paul Wilmes,1 Thaddeus S. Stappenbeck,4 Gabriel Núñez,2 and Eric C. Martens2,8,*
1Luxembourg Centre for Systems Biomedicine, Esch-sur-Alzette 4362, Luxembourg
2University of Michigan Medical School, Ann Arbor, MI 48109, USA
3Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette 4354, Luxembourg
4Washington University School of Medicine, St. Louis, MO 63110, USA
5Aix-Marseille Université, UMR 7257, 13288 Marseille, France
6Department of Oncology, Luxembourg Institute of Health, Luxembourg 1526, Luxembourg
7Present address: Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette 4354, Luxembourg
8Lead Contact
*Correspondence: mahesh.desai@lih.lu (M.S.D.), emartens@umich.edu (E.C.M.)

SUMMARY

Despite the accepted health benefits of consuming dietary fiber, little is known about the mechanisms by which fiber deprivation impacts the gut microbiota and alters disease risk. Using a gnotobiotic mouse model, in which animals were colonized with a synthetic human gut microbiota composed of fully sequenced commensal bacteria, we elucidated the functional interactions between dietary fiber, the gut microbiota, and the colonic mucus barrier, which serves as a primary defense against enteric pathogens. We show that during chronic or intermittent dietary fiber deficiency, the gut microbiota resorts to host-secreted mucus glycoproteins as a nutrient source, leading to erosion of the colonic mucus barrier. Dietary fiber deprivation, together with a fiber-deprived, mucus-eroding microbiota, promotes greater epithelial access and lethal colitis by the mucosal pathogen, Citrobacter rodentium. Our work reveals intricate pathways linking diet, the gut microbiome, and intestinal barrier dysfunction, which could be exploited to improve health using dietary therapeutics.

INTRODUCTION

The diet of industrialized nations has experienced a decrease in fiber intake, which for many is now well below the recommended daily range of 28–35 g for adults, and this deficit has been linked to several diseases (Burkitt et al., 1972; Sonnenburg and Sonnenburg, 2014). Fiber provides direct physical benefits, including increased fecal bulking and laxation (Burkitt et al., 1972). However, another feature of dietary fiber (a nutrient category that includes a broad array of polysaccharides that are not digestible by human enzymes) has also drawn it into the spotlight: it provides an important substrate to the community of microbes (microbiota) that inhabits the distal gut (Sonnenburg and Sonnenburg, 2014). Unlike humans, who produce 17 gastrointestinal enzymes to digest mostly starch, our gut microbiota produces thousands of complementary enzymes with diverse specificities, enabling them to depolymerize and ferment dietary polysaccharides into host-absorbable short-chain fatty acids (SCFAs) (El Kaoutari et al., 2013). Thus, the physiology of the gut microbiota is geared toward dietary polysaccharide metabolism. At present, relatively little is known about how a fiber-deprived gut microbiota fulfils its energy demands and how low fiber-induced microbiota changes impact our health.

Apart from dietary fiber, an alternative energy source for the microbiota is the glycoprotein-rich mucus layer that overlies the gut epithelium as a first line of defense against both commensal microbes and invading pathogens (Johansson et al., 2013; McGuckin et al., 2011). The colonic mucus layer is a dynamic and chemically complex barrier composed largely of secreted mucin-2 glycoprotein (MUC2) (Johansson et al., 2008). Goblet cells secrete MUC2 as a disulfide cross-linked network that expands to form an inner layer, which is tightly adherent to the epithelium and is poorly colonized by commensal bacteria. As bacterial and host enzymes continuously hydrolyze the luminal edge of this layer, a looser outer layer is formed that supports a more dense and metabolically distinct community (Li et al., 2015). A key nutritional aspect of the mucus layer for gut bacteria is its high polysaccharide content, with up to 80% of the mucin biomass being composed of mostly O-linked glycans (Johansson et al., 2013). However, only a distinct subset of gut microbiota species has evolved the capacity to utilize this nutrient source (Hoskins and Boulding, 1981; Png et al., 2010).

The direct impact of fiber polysaccharides on the microbiota, combined with the ability of at least one nutritional generalist
(Bacteroides thetaiotaomicron) to shift from dietary polysaccharides to mucus glycan metabolism in the absence of fiber (Sonnenburg et al., 2005), suggests a connection between diet and the status of the colonic mucus barrier. Indeed, three previous reports have correlated reduced dietary fiber with thinner colonic mucus (Brownlee et al., 2003; Earle et al., 2015; Hedemann et al., 2009). Nevertheless, the underlying mechanisms with respect to involvement of the microbiota and, perhaps more importantly, consequences for the host remain largely unknown. Such knowledge is important as it could provide explanations for why deviations or imbalances in gut microbial community membership and physiology ("dysbiosis") correlate with several negative health outcomes, including pathogen susceptibility, inflammatory bowel disease (IBD), and colon cancer (Cameron and Sperandio, 2015; Flint et al., 2012; McKenney and Pamer, 2015). Finally, such knowledge could inform therapeutic and preventative strategies to correct these conditions.

The integrity of the mucus layer is critical for health. Genetic ablation of Muc2 in mice brings bacteria into close contact with the epithelium, leading to inflammation and colon cancer (Van der Sluis et al., 2006). Additional studies have implicated reduced or abnormal mucus production or O-glycosylation in the development of intestinal inflammation (Fu et al., 2011; Larsson et al., 2011) and penetration of commensal bacteria in the inner mucus layer in murine models of colitis and ulcerative colitis patients (Johansson et al., 2014). Moreover, the mucus barrier, a reservoir of antimicrobial peptides and immunoglobulins, is the first structure that a mucosal pathogen must overcome to establish an infection (McGuckin et al., 2011). Given that the status of the mucus layer is precariously balanced between replenishment by goblet cells and degradation by gut bacteria, we hypothesized that a fiber-deprived microbiota would progressively forage on this barrier, leading to inflammation and/or increased pathogen susceptibility.

We aimed to investigate the mechanistic connections between chronic or intermittent dietary fiber deprivation on microbiota composition and physiology as well as the resulting effects on the mucus barrier. To create a model that facilitates functional interpretation, we assembled a synthetic gut microbiota from fully sequenced human gut bacteria in gnotobiotic mice. In the face of reduced dietary fiber, we examined changes in community physiology and susceptibility to Citrobacter rodentium, a murine pathogen that models human enteric E. coli infection (Collins et al., 2014). We demonstrate that a microbiota deprived of dietary fiber damages the colonic mucus barrier and promotes pathogen susceptibility. Our findings suggest a mechanism through which diet alters the activity of the gut microbiota and impacts health, which is important prerequisite knowledge for rationally designing future dietary interventions and therapeutics.

RESULTS

A Synthetic Human Gut Microbiota with Versatile Fiber Polysaccharide Degrading Capacity
Diet changes are known to rapidly affect the composition of the microbiota in humans and rodents (David et al., 2014; Faith et al., 2011; McNulty et al., 2013; Rey et al., 2013). However, the full complexity of the gut microbiota is a barrier to deriving detailed conclusions because sequence-based approaches (16S rRNA gene and meta-genomics/-transcriptomics) suffer from substantial functional uncertainty. Thus, to test our hypothesis that specific members within a fiber-deprived gut microbiota cause damage by increasingly foraging for nutrients in the protective mucus layer, we designed a synthetic microbiota (SM) containing 14 species of fully sequenced commensal human gut bacteria (Figure 1A). The selected species were chosen to represent the five dominant phyla and collectively possess important core metabolic capabilities (Figure S1A).

To provide an additional layer of functional knowledge about complex carbohydrate metabolism, we pre-evaluated our 14 species for growth in vitro on a panel of 42 plant- and animal-derived mono- and polysaccharides, including purified mucin O-glycans (MOGs) as sole carbon sources (Martens et al., 2011). These growth assays allowed us to determine that all major groups of dietary fiber and host mucosal polysaccharides could be used by one or more strains in our community as well as which bacteria target each glycan (Figures 1A, S1A, and S1B; Table S1). It is evident that the four mucin-degrading species fall into two categories: mucin specialists (A. muciniphila and B. intestinihominis), which only grow on MOGs as a sole polysaccharide source, and mucin generalists (B. thetaiotaomicron and B. caccae), which each grow on several other polysaccharides. Overall, our choice of species is physiologically and ecologically representative of the more complex native gut microbiota. Because our community is composed of bacteria with determined carbohydrate metabolic abilities, it allows us to address our central hypothesis in more precise, mechanistic detail.

To develop a gnotobiotic model, we assembled the SM in germfree mice, which were fed a standard fiber-rich (FR) laboratory diet that contains 15% dietary fiber from minimally processed grains and plants (Figures 1B and 1C). Colonized animals were maintained on the FR diet for 14 days to monitor reproducibility and stability of community assembly (Figure 1B). All of the introduced species persisted in each mouse between 6 and 54 or 66 days of colonization depending on the length of the experiment (n = 37 total, two independent experiments; analyzed by both 16S rRNA sequencing [Table S2] and qPCR approaches [Table S3]). Individual mice exhibited reproducible SM assembly irrespective of caging, mouse gender, experimental replicate, or method of analysis (Figure S2; Tables S2 and S3). In addition to 29 germfree control animals, a total of four different gnotobiotic colonization experiments (51 SM-colonized mice in total; experiments 1–4) were performed according to the timeline shown in Figure 1B.

Both Chronic and Intermittent Fiber Deficiency Promotes Enrichment of Mucus-Degrading Bacteria
Although dietary changes are known to perturb microbiota composition, the impact of diet variation, especially chronic or intermittent fiber deficiency, on the activities and abundance of mucin-degrading bacterial communities has not been studied in functional detail. After validating stable SM colonization, three groups of mice were maintained by constant feeding of one of three different diets: fiber-rich (FR), fiber-free (FF), or prebiotic
1340 Cell 167, 1339–1353, November 17, 2016

[Figure 1 image, panels A–C: (A) heatmap of normalized growth (binned from 1.0 to 0.0, or no growth) of the synthetic human gut microbiota (SM) members, grouped by bacterial phylum, on polysaccharides (mucus O-glycans through α-mannan) and monosaccharides (arabinose through xylose); (B) timeline of SM gavage into germ-free mice, confirmation of colonization by qPCR, fecal sampling, and the FR, FF, Pre, and alternating (daily or 4-day) diet groups, with host and microbial response readouts; (C) compositions of the FR, FF, and Pre diets. See legend below.]
Figure 1. Carbohydrate Utilization by the Synthetic Human Gut Microbiota Members and Gnotobiotic Mouse Treatments
(A) Heatmap showing normalized growth values of 13/14 synthetic human gut microbiota (SM) members.
(B) Schematic of the gnotobiotic mouse model illustrating the timeline of colonization, feeding strategies, and fecal sampling.
(C) Compositions of the three distinct diets employed in this study (common additives such as vitamins and minerals are not shown). The prebiotic mix contained
equal proportions of 14 host indigestible polysaccharides (see Table S1).
See also Figure S1.
(Pre). In contrast to the FR diet that contained naturally milled food ingredients with intact fiber particles, the Pre diet was designed to study the effect of adding a mixture of purified, soluble glycans, similar to those used in prebiotic formulations (Figure 2C). To imitate the fact that the human diet experiences fluctuating amounts of fiber from meal-to-meal, four other groups were alternated between the FR and FF or Pre and FF diets on a daily or 4-day basis (Figure 1B).

Fecal microbial community dynamics showed that in mice switched to the FF diet, several species rapidly and reproducibly changed in abundance (Figures 2A, 2B and S3). Four species (A. muciniphila, B. caccae, B. ovatus, and E. rectale) were highly responsive to diet change. A. muciniphila and B. caccae are able to degrade MOGs in vitro. B. ovatus and E. rectale cannot metabolize MOGs, but together can use a broad range of polysaccharides found in dietary fiber (Figure 1A). In the absence of fiber, the abundance of A. muciniphila and B. caccae increased rapidly with a corresponding decrease of the fiber-degrading species (Figure 2A). The Pre diet, which contains purified polysaccharides and is otherwise isocaloric with the FF diet, had a similar effect on community composition as the FF diet but separated slightly by PCoA ordination from FF, likely due to increased Bacteroides abundance (Figures 2A and 2B). Intriguingly, the abundances of the same four bacteria noted above fluctuated rapidly on a daily basis when the FR and FF diets were oscillated (Figures 2C and S3), corroborating their ability to respond dynamically to variations in dietary fiber. The increase in mucin-degrading species observed in fecal samples matched with cecal abundances at the end of the experiment (Figure 2D and panels to the right of plots in Figure 2A). Moreover, similar levels of mucin-degrading bacteria were quantified in the colonic lumen and mucus layer using laser capture microdissection (Figure 2E), indicating that proliferation of mucin-degrading bacteria in this model is a community-wide effect and not limited just to the mucus layer.

Many of the other bacteria (except R. intestinalis and B. intestinihominis) were sensitive to changes between the FR and FF diets on daily and 4-day bases, albeit to lower degrees (Figure S3B; Table S2). Two additional species especially sensitive to diet change were Desulfovibrio piger (increased on FF diet)

[Figure 2 image, panels A–E: (A) stream plots of fecal and cecal bacteria (relative % abundance) for the 14 SM species over days 6 to 54 on the FR, FF, and Pre diets, with cecal 16S rRNA and transcript panels; (B) PCoA of fecal bacteria, PCoA 1 (75%) versus PCoA 2 (9%); (C) change in fecal bacteria over days 8 to 54 for A. muciniphila, B. caccae, E. rectale, and B. ovatus in the FR, FF, and 1-day FR/FF groups; (D) cecal mucus-degrading bacteria by diet group; (E) colonic mucus- and fiber-degrading bacteria in lumen versus mucus for FR and FF mice. See legend below.]
Figure 2. Complex Dietary Fiber Deficiency Leads to Proliferation of Mucus-Degrading Bacteria

(A) Stream plots exhibiting fecal (over time, Figure 1B) and cecal (end point) microbial community dynamics and average abundance of total species-specific
transcripts from cecal RNA-seq transcriptome mapping at the endpoint; for transcript abundance n = 3 mice/group.
(B) Principal coordinate analysis (PCoA) based on bacterial community similarity.
(C) Changes in relative bacterial abundance over time in mice oscillated for 1-day increments between FR and FF feeding. Changes in FR and FF control groups
are shown for comparison. Asterisks (colored according to the dietary group) indicate a statistically significant difference in the change of relative abundance from
the previous day within each group. Student’s t test.
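The ordination in (B) can be illustrated with a minimal sketch of classical PCoA. The legend does not name the similarity measure, so the Bray-Curtis dissimilarity used here is an assumption, and the sample names and abundance values are hypothetical rather than data from this study.

```python
import numpy as np

def bray_curtis_matrix(abundance):
    """Pairwise Bray-Curtis dissimilarity between rows (samples)."""
    n = abundance.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            num = np.abs(abundance[i] - abundance[j]).sum()
            den = (abundance[i] + abundance[j]).sum()
            dist[i, j] = num / den
    return dist

def pcoa(dist):
    """Classical PCoA: double-center squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-9                  # drop zero and negative axes
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals[keep].sum()
    return coords, explained

# Hypothetical relative abundances: rows are samples, columns are species.
samples = ["FR_1", "FR_2", "FF_1", "FF_2"]
abundance = np.array([
    [0.10, 0.60, 0.30],
    [0.15, 0.55, 0.30],
    [0.50, 0.20, 0.30],
    [0.45, 0.20, 0.35],
])
coords, explained = pcoa(bray_curtis_matrix(abundance))
for name, row in zip(samples, coords):
    print(name, np.round(row[:2], 3))
print("fraction of variation per axis:", np.round(explained, 3))
```

In this toy table the two diet groups fall on opposite sides of the first axis, which carries most of the variation, analogous to the way axis percentages are reported on a PCoA plot.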

and Marvinbryantia formatexigens (decreased on FF diet) (Figure S3A). Population changes in the groups oscillated between Pre and FF diets were similar to the abundances observed in FF only diet regimen (Table S2). Thus, despite the pure polysaccharides contained in the Pre diet exerting a clear physiological impact on the microbiota (discussed below), the amount and composition of the purified polysaccharides in this diet exert little effect on species composition.

Community Transcriptional and Enzymatic Readouts Demonstrate Enhanced Degradation of Mucus When Fiber Is Absent
Because mucin-degrading bacteria were higher on both the FF and Pre diets that lack naturally complex plant fiber, we reasoned that this increased abundance is due to their ability to degrade mucus as an alternative nutrient. To test this, we measured changes in transcripts encoding carbohydrate active enzymes (CAZymes) that enable gut bacteria to utilize dietary fiber and mucosal polysaccharides (cecal samples from all but the 4-day oscillation groups were analyzed).

Based on new or existing genome annotations of the 14 species in our synthetic community, a total of 1,661 different degradative CAZymes belonging to 96 different families were detected (glycoside hydrolase [GH], polysaccharide lyase [PL], and carbohydrate esterase [CE] families were counted). This number is close to the total number of families (122) that was identified in a larger survey of 177 human gut bacterial reference genomes (El Kaoutari et al., 2013), indicating that our synthetic community retains much of the metabolic potential toward carbohydrates that is present in a more diverse microbiota. Of the 96 enzyme families in our community, members of 38 families, plus M60-like proteases (pfam13402), a group of enzymes previously shown to degrade mucin glycoproteins (Nakjang et al., 2012), showed variable expression in either FR/FF or Pre/FF community transcriptome comparisons (Figure 3A). These differentially abundant degradative enzymes mapped to 770 different genes contributed from all species except D. piger (Table S4).

Our transcriptomic data show that in the mice fed the FR diet, transcripts belonging to enzyme families that target dietary fiber polysaccharides were more abundant. In contrast, in mice fed the FF diet, transcripts encoding enzyme families known to release sugars from host substrates, including mucin O-glycans, were elevated (Figure 3A). In line with our in vitro growth assays (Figure 1A) and the microbial community abundance data (Figure 2), we found that B. ovatus and E. rectale contributed a majority of CAZymes specific for plant polysaccharides (Figure 3A). The four in vitro mucin degraders were the major contributors to the degradation of host glycans and mucus in vivo. Additionally, transcripts encoding M60-like proteases (pfam13402) were also more highly expressed in FF conditions (Figure 3A). Expression of these putative mucin-targeting enzymes was primarily contributed by B. caccae, an organism that possesses 16 of these genes compared to just four in A. muciniphila and five in all other species combined. This observation suggests the tantalizing possibility that B. caccae is particularly equipped via its M60-like proteases to perform a key degradative step (cleavage of glycoprotein backbones) during mucin foraging and this ability may facilitate access to mucus carbohydrate structures by other bacteria.

In the group fed the Pre diet, similar transcripts as in the FR-fed group were elevated relative to FF, albeit to lower levels (Figure 3A bottom histogram). Furthermore, transcripts for the same bacterial enzymes presumed to target mucus in the FR/FF diet comparison were observed in the Pre/FF comparison. Additional RNA sequencing (RNA-seq) analyses of cecal transcriptomes from mice oscillated between FR/FF and Pre/FF on a daily basis (collected after 1 day on FF diet) provided similar results to those obtained for the FF only diet mice (Table S4). Our transcriptomic readouts corroborate the increased abundance of mucin-degrading bacteria observed in these mice (Figure 2D) and demonstrate that even intermittent fiber deficiency has the potential to alter the microbiota and favor mucin-degrading species.

To further connect the in vivo responses of B. caccae and A. muciniphila with degradation of mucin O-glycans, we performed additional transcriptional profiling of these two species on purified MOGs from porcine gastric mucus. We have previously shown that this mixture contains 110 different structures (Hickey et al., 2015) that when metabolized by B. thetaiotaomicron stimulate a transcriptional response that overlaps substantially with genes expressed in vivo under fiber-restricted conditions (Martens et al., 2008). During growth on MOGs as the sole carbon source, B. caccae and A. muciniphila activated expression of 82 and 58 genes, respectively (Table S5). Based on a recalculation using a 5-fold cutoff of previous microarray data from growth in the same substrate, B. thetaiotaomicron activated expression of 166 genes (Martens et al., 2008). Next, we examined expression of these validated O-glycan-responsive genes (for B. caccae, A. muciniphila, and B. thetaiotaomicron) in the SM community from FF-fed mice compared to FR. In support of our hypothesis, validated B. caccae and A. muciniphila O-glycan-responsive genes were increased in the FF condition (Figure 3B). B. caccae expression was increased irrespective of normalization by reads mapped to the whole community (i.e., including increased B. caccae abundance) or to just the B. caccae genome (discounts abundance change and examines changes in expression).

Consistent with its specialization for O-glycans, A. muciniphila mostly showed increased expression of O-glycan-responsive genes proportional to its increased population size (from 20% to 40%), indicating that it does not shift its substrate utilization in comparison to the FR diet (Figure 3B; see also
(D) Additive relative abundances of four mucus-degrading bacteria (Figure 1A).

(E) Relative bacterial abundances in laser capture microdissected colonic lumen and mucus samples (images displayed on left). n = 3 mice/group.
Microbial community abundance data are based on Illumina sequencing of 16S rRNA genes (V4 region) and median values at each time point are shown; error
bars in (D) and (E) denote interquartile ranges (IQRs). Unless specified, significance was determined using Kruskal-Wallis test and n = 4 for FR and FF groups, n = 3
for all other groups. All data in (A)–(E) are from experiment 1.
See also Figures S2 and S3 and Tables S2 and S3.
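As a minimal sketch of the summary statistics named in this legend, the snippet below sums the relative abundances of the four mucus-degrading species within each mouse (the additive abundance in panel D) and then reports the group median and interquartile range. All abundance values are hypothetical, and the quartile interpolation method is an assumption, since the legend does not specify one.

```python
from statistics import median, quantiles

# Hypothetical relative abundances (%) in four mice of one diet group.
abundance = {
    "A. muciniphila":      [15, 17, 19, 21],
    "B. caccae":           [10, 11, 12, 13],
    "B. thetaiotaomicron": [4, 5, 6, 7],
    "B. intestinihominis": [1, 1, 1, 1],
    "E. rectale":          [20, 18, 16, 14],  # fiber degrader, not summed
}
MUCUS_DEGRADERS = [
    "A. muciniphila",
    "B. caccae",
    "B. thetaiotaomicron",
    "B. intestinihominis",
]

# Additive abundance: sum the four degraders within each animal first,
# then summarize across animals.
n_mice = len(abundance["A. muciniphila"])
per_mouse = [sum(abundance[s][i] for s in MUCUS_DEGRADERS) for i in range(n_mice)]

group_median = median(per_mouse)
# Quartiles by inclusive linear interpolation (an assumed convention).
q1, _, q3 = quantiles(per_mouse, n=4, method="inclusive")
print(per_mouse, group_median, (q1, q3))
```

Summing within each mouse before taking the median keeps the error bars tied to between-animal variation, rather than adding four separately computed medians.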

[Figure 3 image, panels A–D: (A) fold change in transcripts by CAZyme family (GH, PL, CE, and pfam13402) for the FR/FF (top) and Pre/FF (bottom) comparisons, with bars colored by species and labels colored by substrate target (cellulose and other β-glucans, hemicelluloses; pectins; starch and storage glycans; fungal cell wall mannan; host O- and N-linked glycans; may target multiple polysaccharides); (B) fold change of mucin O-glycan (MOG)-specific transcripts, FF/FR, normalized to community or to single species; (C) cecal microbial enzyme activity (μmol/min/mg of protein); (D) SCFAs and OA (mmol per gram of cecal contents). See legend below.]
Figure 3. Diet-Specific Changes in Carbohydrate Active Enzyme Expression Reveal a Community Shift from Fiber to Mucus Degradation
(A) Positive and negative fold-changes in transcripts encoding carbohydrate active enzymes (CAZymes) between either FR/FF (top) or Pre/FF (bottom)
comparisons. Only CAZyme families (x axis) in which >2-fold changes and p < 0.05 (Student’s t test) were observed for all of the genes in that family in RPKM-
normalized cecal community transcriptomes are shown as averages; open circles denote statistically insignificant differences. n = 3 mice/group, experiment 1.
(B) Fold-change values of empirically validated (Table S5), MOG-specific transcripts of three mucus-degrading bacteria. n = 3 mice/group, experiment 1. Data are
shown as average and error bars represent SEM. Student’s t test.
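The selection rule described in (A) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the read counts, the gene length, and the library size are all hypothetical, the p-value is passed in rather than computed, and the signed fold-change convention (negative reciprocal when expression drops) is an assumption suggested by the positive and negative fold changes in the figure.

```python
from statistics import mean

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def signed_fold_change(numerator, denominator):
    """Assumed convention: ratio when >= 1, negative reciprocal otherwise."""
    ratio = numerator / denominator
    return ratio if ratio >= 1 else -1.0 / ratio

def passes_filter(fold_change, p_value, min_fold=2.0, alpha=0.05):
    """Keep changes larger than 2-fold in either direction with p < 0.05."""
    return abs(fold_change) > min_fold and p_value < alpha

# Hypothetical counts for one 1,500 bp gene in n = 3 mice per diet group,
# with 2,000,000 mapped reads per library.
fr_rpkm = [rpkm(c, 1500, 2_000_000) for c in (300, 330, 270)]
ff_rpkm = [rpkm(c, 1500, 2_000_000) for c in (60, 66, 54)]

fr_over_ff = signed_fold_change(mean(fr_rpkm), mean(ff_rpkm))
keep = passes_filter(fr_over_ff, p_value=0.01)  # p-value is a placeholder
```

Under these made-up numbers the gene is expressed more strongly on the FR diet and would be retained by the filter; a gene with the opposite pattern would receive a negative value of the same magnitude.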
1344 Cell 167, 1339–1353, November 17, 2016

Table S6E). The idea that B. caccae is capable of broader transcriptional shifts, compared to A. muciniphila, is further supported by global analysis of its gene expression changes between the FF and FR diets and 1-day alternations (Figure S4). In response to the FF diet, B. caccae showed increased in vivo expression of 230 genes, including 27 degradative enzymes; whereas, A. muciniphila only showed increased expression of 43 genes, including two enzymes (Figure S4). In contrast to previous monoassociation data (Sonnenburg et al., 2005), B. thetaiotaomicron mostly showed unchanged or slightly decreased expression of its known O-glycan utilization genes after a shift to FF (Figure 3B; Table S6B).

Colonic mucin O-glycans contain glycosidic linkages distinct from plant fibers and also covalently linked sulfate. In further support of increased bacterial degradation of the host mucus on the FF diet, we detected significantly increased bacterial enzymes targeting mucin linkages (sulfatase and α-fucosidase) in the mice subjected to the FF diet on either a chronic or intermittent basis (Figure 3C). In contrast, enzymes targeting linkages in fiber polysaccharides (β-glucosidase) were significantly reduced in the mice fed the fiber-deficient diets, while others involved in xylan and α-galactan degradation trended similarly without significance (Figure 3C). Despite the dramatic change in microbiota species abundance and transcriptional response, there was only one significant change (succinate) in SCFA and organic acids in the FF diet fed mice (Figure 3D). Overall, the transcriptomic and enzyme analyses support the conclusion that a fiber-deprived gut microbiota synergistically and progressively expresses CAZymes, sulfatases, and proteases to attack mucus polysaccharides when the diet lacks complex plant fiber.

Fiber Deprivation Leads the Gut Microbiota to Degrade the Colonic Mucus Barrier
The mucus layer is a dynamic barrier that is constantly replenished through the secretory activity of goblet cells (Johansson et al., 2013). We rationalized that if bacterial consumption of mucin-derived nutrients exceeds new production, the integrity of this critical barrier could be compromised. To explore this possibility, we performed blinded thickness measurements of the colonic mucus layer from proximal colon to rectum in each mouse using Alcian blue-stained sections (Figure 4A). We further validated thickness of the mucus layer by immunofluorescence staining of the Muc2 mucins using α-Muc2 antibody (Figure 4B). To address the possibility that variations in thickness are directly influenced by the diets used, we measured mucus layer thickness in germfree mice fed the FR or FF diets.

Colonic mucus measurements revealed that mucus thickness was highest in the colonized group fed the FR diet (Figure 4C). In most other groups, including germfree (GF) controls, mucus thickness was significantly thinner than in colonized FR mice. The observation that GF mice have thinner colonic mucus is consistent with previous studies in gnotobiotic mice and rats (Petersson et al., 2011; Wrzosek et al., 2013), which found that microbial colonization or exposure to cues such as peptidoglycan or lipopolysaccharides is required for mucus production. Notably, mucus in SM-colonized FF diet mice was five to six times thinner than colonized mice fed the FR diet. From these data, we conclude that the mucus layer: (1) is initially thinner in GF mice regardless of diet, (2) begins expanding upon microbial colonization, but (3) is disproportionately eroded back to a thinner layer due to the increased mucus foraging activity by the microbiota in the context of the FF diet. A similar reduction in mucus thickness was observed in the Pre diet and both 4-day oscillation groups, while an intermediate thickness was observed in the 1-day FR/FF oscillation group (Figure 4C).

Next, we determined whether mucus production was altered in colonized mice fed the FF diet. We examined the abundance of transcripts encoding several key proteins involved in building and regulating the mucus barrier (Figure 4D): Muc2 and Muc5ac, two building blocks of colonic mucus; Tff1 and Tff3, goblet cell proteins that promote mucosal repair and protection; and Klf3, a transcription factor involved in barrier function. Our results show that the transcription of the major colonic mucin gene (Muc2) was slightly elevated in the colonized FF diet group, suggesting a compensatory response of the host to offset the increased bacterial mucus degradation in this group; whereas other genes (Muc5ac, Tff1, Tff3, and Klf3) remained statistically unchanged (Figure 4D). Qualitative visualization by Alcian blue (Figure 4A) staining supports the conclusion that the colonic tissue of FF-fed colonized mice contained similar numbers of goblet cells that have yet to secrete their glycoproteins.

As expected, degradation of the mucus layer by the fiber-deprived gut microbiota brought luminal bacteria closer to the intestinal epithelium (Figure 4B, inset), which could potentially trigger deleterious effects or other host compensatory responses. Histopathology (Figure S5A) and body weight measurements over time (Figure S5B) of mice from the groups with reduced mucus thickness did not reveal changes compared to the mice consuming the FR diet. However, measurements of three additional host parameters provided support for altered host responses in the face of mucus erosion: the first was fecal lipocalin (a neutrophil protein that binds bacterial siderophores and is associated with low-grade inflammation; Chassaing et al., 2015), which was increased in the group of colonized mice fed the FF diet compared to those fed FR (Figure 4E). A second readout, colon length, revealed shorter colons in colonized FF fed mice and other SM colonized groups when compared to colonized FR fed mice or GF mice on either diet (Figure 4F). Additional analysis of host cecal tissue global transcriptional responses failed to reveal large-scale changes in the host; although, some compensatory responses were suggested by pathway analysis, which illuminated several immune responses as altered in the colonized FF fed mice (Figures 4G and S5C; Table S7). Collectively, the data described above indicate that fiber-restricted,
(C) Activities of cecal enzymes determined by employing p-nitrophenyl-linked substrates. n = 4 for FR and FF groups and n = 3 for other groups, experiment 1.
Data are shown as average and error bars represent SD. One-way ANOVA, FR diet group versus other groups.
(D) Concentrations of organic acid (OA, succinate) and short-chain fatty acids (SCFA) determined from cecal contents. n = 4 mice/group; 2 mice/dietary group in
two independent experiments (#2A and 3). Middle lines indicate average of the individual measurements shown and error bars represent SEM. Student’s t test.
See also Figure S4 and Tables S4, S5, and S6.

[Figure 4 graphic. Panel A: Alcian blue-stained colonic sections, Fiber-rich (FR) diet (with SM) and Fiber-free (FF) diet (with SM). Panel B: Muc2 and DAPI immunofluorescence. Panel C: thickness of inner mucus layer (μm) by group; mice per group (in plotted order): 6, 6, 3, 3, 3, 3, 3, 6, 3; measurements per group (same order): 962, 722, 347, 261, 70, 77, 349, 845, 198. Panel D: cecal tissue transcript level by mucus gene probed (Muc2, Muc5ac, Tff1, Tff3, Klf3) for FR diet (with SM) and FF diet (with SM). Panel E: lipocalin (pg per g feces). Panel F: colon length (cm). Panel G: heatmap of log2 fold change, FF diet (with SM)/FR diet (with SM), with genes grouped as immune response of cells, tumor necrosis factor (TNF) targets, and integrin-linked kinase (ILK) signaling. Abbreviations: SM, synthetic microbiota; Pre, prebiotic. Group labels on the x axes, gene labels, and significance marks are not recoverable from the extracted text.]
Figure 4. Microbiota-Mediated Erosion of the Colonic Mucus Barrier and Host Responses
(A) Alcian blue-stained colonic sections showing the mucus layer (arrows). Scale bars, 100 μm. Opposing black arrows with shafts delineate the mucus layer that
was measured and triangular arrowheads point to pre-secretory goblet cells.
(B) Immunofluorescence images of colonic thin sections stained with α-Muc2 antibody and DAPI. Opposing white arrows with shafts delineate the mucus layer.
Inset (FF diet group) shows a higher magnification of bacteria-sized, DAPI-stained particles in closer proximity to host epithelium and even crossing this barrier.
Scale bars, 100 μm; inset, 10 μm.
(C) Blinded colonic mucus layer measurements from Alcian blue-stained sections. Mice in the FR and FF fed colonized groups (experiments 1 and 2A), and in the
FR-diet fed germfree groups are from two independent experiments; all other colonized mice are from experiment 1. Asterisk and dagger indicate that colons of
only two and one mice contained fecal masses, respectively. Data are presented as average and error bars represent SEM. Statistically significant differences are
annotated with different letters p < 0.01. One-way ANOVA with Tukey’s test.
(D) Microarray-derived transcript levels of genes involved in the production of colonic mucus (n = 4 for the FR diet group and n = 3 for the FF diet group). Data are
from two independent experiments (#2A and 3). Values are shown as average and error bars represent SEM. Student’s t test.
(E) Levels of fecal lipocalin (LCN2) measured by ELISA in the FR and FF diet fed groups (day 50, Figure S6A; experiment 2A). n = 7 mice/group. Middle lines
indicate average of the individual measurements shown and error bars represent SEM. Mann-Whitney test.
(F) Colon lengths of mice subjected to different dietary treatments. Data for the FR (with SM) and FF (with SM) are representative of three independent exper-
iments (experiments 1, 2A, and 3). Middle lines indicate average of the individual measurements shown and error bars represent SEM. One-way ANOVA, FR diet
group (with SM) versus other groups.

colonized mice experience erosion of the mucus barrier and some altered intestinal responses, albeit without overt signs of disease.

A Fiber-Deprived Gut Microbiota Promotes Heightened Pathogen Susceptibility
Because the mucus layer is a critical barrier against both commensal microbes and invading pathogens, we next hypothesized that the reduction in thickness associated with microbiota activity during low-fiber conditions would increase pathogen susceptibility. To test this idea, we chose the attaching/effacing pathogen Citrobacter rodentium (Cr) because it must traverse the mucus layer to access the epithelium and cause colitis (Collins et al., 2014). Toward this point, a previous study demonstrated that mice genetically lacking the dominant colonic mucin glycoprotein (Muc2−/−), but not wild-type mice, develop lethal colitis following infection with Cr, highlighting that the mucus layer is an important initial barrier to this pathogen (Bergstrom et al., 2010).

Therefore, we recreated the previously observed diet-modulated thick and thin mucus layer phenotypes in gnotobiotic mice and infected both groups with Cr (Figure S6A). To control for diet-specific effects on Cr pathogenesis in the absence of our SM, we infected two additional groups consisting of germfree (GF) mice fed a priori (4 weeks before infection) the same FR or FF diets. We collected fecal samples each day post Cr infection to measure changes in pathogen colonization by both selective plating for Cr and 16S rRNA gene analysis (Figures 5A, 5B, and S6A). In the two groups colonized by the synthetic microbiota, Cr levels gradually increased but were significantly higher beginning at day 2 in mice fed the FF diet and remained 10-fold higher thereafter.

The dramatic diet-specific increase of Cr levels in SM-colonized mice fed the FF diet was accompanied by weight loss that was specific to this group (Figure 5C). Importantly, both GF+Cr groups that had high pathogen levels (Figure 5A) failed to exhibit similar weight loss, illuminating that the pathogen alone is insufficient for this effect on either diet. The higher pathogen burdens in colonized mice fed the FF diet were associated with multiple signs of morbidity such as hunched posture and inactivity. Notably, by 10 days post infection, 60% of the mice from the SM colonized FF group had to be euthanized due to ≥20% loss of body weight (Figure 5D). Mice in the other three groups did not show similar morbidity.

Histological scoring of the cecal and colonic tissue revealed that the SM-colonized FF diet group experienced inflammation that covered a significantly more expansive surface area (Figures 5E–5G). An exception was the descending colon/rectum, which showed larger areas of inflamed tissue in both FR and FF groups; although, the FF group was still significantly higher and 100% affected. When the tissue was inflamed, the level of hyperplasia was similar across all four groups (Figure S6B). Importantly, there were overall lower levels of inflamed tissue in both of our GF+Cr control groups. To determine if increased mucus production post infection could explain the lower disease observed in the GF+Cr groups, we measured thickness of the colonic mucus and found that Cr triggered only a slight increase in GF mucus thickness; whereas the thick mucus layer associated with the FR diet in the context of microbiota colonization persisted (Figures S7A and S7B).

Based on the above results, we further hypothesized that the increased area of inflamed tissue in FF mice was due to earlier and increased pathogen access due to the microbiota-degraded mucus layer. To test this idea, we infected the same four treatment groups (SM-colonized or GF mice, fed either FR or FF diets) with a luciferase-expressing Cr strain (Figure 6A). At 4 days post infection, we sacrificed all mice and conducted bioluminescent imaging of the colons after flushing out the luminal contents. In support of our hypothesis, and despite having similar levels of fecal Cr in FR- and FF-fed SM mice (Figure 6B), we saw significantly higher pathogen signal adherent to the colonic tissue of SM colonized mice fed the FF diet as compared to those fed FR (Figures 6C and 6D). The higher levels of attached Cr in FF fed SM mice were further validated by transmission electron microscopy, revealing increased appearance of the attaching and effacing lesions, pedestals and loss of microvilli that is typically associated with Cr infection (Figure 6E). Notably, GF+Cr mice on either diet displayed similarly high adherent bacterial signal as the FF-fed SM mice (Figures 6D and S7C). Taken together, these results suggest that the pathogen can more quickly traverse the thin colonic mucus layers in GF mice (irrespective of diet) and SM-colonized mice fed the FF diet. However, the commensal microbiota is also required in the context of increased pathogen access to elicit more severe disease, possibly by provoking co-inflammatory responses.

DISCUSSION

The health benefits of fiber consumption have been purported for decades, yet the influence of many different chemical and physical forms of fiber polysaccharides on the gut microbiota and the ways through which gut bacteria digest, sequester, and share these chemically complex nutrients, are just now being unraveled in detail (Cuskin et al., 2015; Rakoff-Nahoum et al., 2016). Aside from loss of beneficial SCFA production, microbiota-mediated mechanisms that connect low fiber intake to poor gastrointestinal health have not been described. Using a gnotobiotic mouse model, our study provides a mechanism by which a diet deficient in complex plant fiber triggers a synthetic gut microbiota to feed on the colonic mucus layer that acts as a primary barrier against invading pathogens (Figure 7). Our findings reveal important implications regarding how our immediate diet history may modify susceptibility to some enteric diseases.

Our approach highlights the power of using a tractable synthetic human gut microbiota, in which the individual members can be characterized or manipulated to support functional
(G) Changes in the host cecal transcriptome between FR and FF diet conditions. Heatmap shows statistically significant fold changes of genes identified from
ingenuity pathway analysis (false discovery rate [FDR] < 0.05 and absolute Log2 fold-change > 0.5). n = 4 for the FR diet group and n = 3 for the FF diet group; data
are from two independent experiments (#2A and 3).
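The two thresholds stated in (G) can be illustrated with a short sketch. It assumes a Benjamini-Hochberg adjustment, which is an assumption here because the caption does not say which FDR procedure the pathway analysis applied; the gene names, log2 fold changes, and p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_genes(genes, log2_fold_changes, p_values,
                 fdr=0.05, min_abs_log2_fc=0.5):
    """Keep genes with q < fdr and |log2 fold change| > min_abs_log2_fc."""
    q_values = benjamini_hochberg(p_values)
    return [g for g, lfc, q in zip(genes, log2_fold_changes, q_values)
            if q < fdr and abs(lfc) > min_abs_log2_fc]

# Hypothetical inputs, for illustration only.
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
log2_fc = [1.2, -0.8, 0.3, 0.9]
p_vals = [0.001, 0.01, 0.02, 0.5]

q_vals = benjamini_hochberg(p_vals)
selected = select_genes(genes, log2_fc, p_vals)
```

In this toy input, gene_c is dropped because its change is too small and gene_d because its adjusted p-value is too large, which shows why both cutoffs are needed together.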

[Figure 5 graphic. Panel A: C. rodentium (Cr), log10 CFU/g feces, versus days post infection (dpi), 0 to 10. Panel B: fecal relative abundance (%) of C. rodentium (Cr) from 16S rRNA genes (Illumina platform) versus dpi; n = 5 mice/group. Groups and caging: Fiber-rich (FR) diet (synthetic microbiota (SM) + C. rodentium (Cr)), thick colonic mucus layer at day 0 (other groups, thin colonic mucus), 3 cages with 1, 1, and 3 mice; Fiber-free (FF) diet (SM+Cr), 2 cages with 1 and 4 mice; FR diet (Cr only), 2 cages with 2 and 3 mice; FF diet (Cr only), cages with 1 and 4 mice. Panel C: weight change (%) versus dpi. Panel D: survival (%) versus dpi. Panels E and F: cecum (10 dpi), FR (SM+Cr) and FF (SM+Cr). Panel G: % of affected tissue (10 dpi) in cecum, ascending colon, and rectum and descending colon for the four groups. Data points and significance marks are not recoverable from the extracted text.]
Figure 5. Fiber-Deprived Gut Microbiota Contributes to Lethal Colitis by Citrobacter rodentium

(A) Fecal C. rodentium levels over time. Data are shown as average and error bars represent SEM. Student’s t test; FR (SM+Cr) group versus FF (SM+Cr) group
(bottom statistics labels) and FR (Cr) versus FF (Cr) (top statistics labels). Data in (A)–(G) are from experiment 2B.
(B) Relative abundance of C. rodentium in fecal samples over time. Data are shown as median and error bars represent IQR. Wilcoxon test.
(C) Weight changes in the four groups of mice. Values are shown as average and error bars represent SEM. One-way ANOVA, FF diet group (with SM) versus other
groups.
(D) Survival curves for the four groups of mice. One-way ANOVA with Tukey’s test.
(E) Representative images of unflushed ceca after H&E staining highlighting major differences in hyperplasia (indicated with arrows in the FR group, where
hyperplasia is patchy and infrequent). Scale bars, 5 mm.
(F) Images of representative H&E-stained colonic thin sections depicting differences in hyperplasia between two groups. Scale bars, low power, 500 μm; high
power, 50 μm.
(G) Measurements of inflamed tissue area in different intestinal segments. n = 5 mice/group except that n = 4 mice were used for FF (SM+Cr) group. Values are
shown as mean and error bars represent SEM. Statistically significant differences are shown with letters within each intestinal segment; p < 0.0002. One-way
ANOVA with Tukey’s test.
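The quantities plotted in (C) and (D) can be related with a small worked sketch. It assumes that weight is expressed as a percentage of each mouse's pre-infection weight and uses the 20% weight-loss euthanasia criterion given in the main text; the baseline and the individual weights are hypothetical values chosen only so that three of five mice reach the endpoint.

```python
def weight_percent_of_baseline(baseline_g, current_g):
    """Body weight as a percentage of the pre-infection weight."""
    return 100.0 * current_g / baseline_g

def reached_endpoint(baseline_g, current_g, max_loss_percent=20.0):
    """True when weight loss meets or exceeds the euthanasia threshold."""
    loss_percent = 100.0 * (baseline_g - current_g) / baseline_g
    return loss_percent >= max_loss_percent

def survival_percent(endpoint_flags):
    """Percentage of mice that have not reached the endpoint."""
    surviving = sum(1 for flag in endpoint_flags if not flag)
    return 100.0 * surviving / len(endpoint_flags)

# Hypothetical group of five mice, each starting at 25 g.
baseline = 25.0
weights_at_10_dpi = [19.5, 20.0, 19.0, 22.0, 23.0]

flags = [reached_endpoint(baseline, w) for w in weights_at_10_dpi]
survival = survival_percent(flags)
```

With these invented weights, three of the five mice cross the threshold, which corresponds to 60% euthanized and 40% survival.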

[Figure 6 graphic. Panel A: experimental setup with four groups: Fiber-rich (FR) diet, n = 4 mice, thick colonic mucus (with SM); Fiber-free (FF) diet, n = 5 mice, thin colonic mucus (with SM); FR diet, n = 5 mice, thin colonic mucus (germ free); FF diet, n = 5 mice, thin colonic mucus (germ free). All groups were infected with luciferase-carrying Citrobacter rodentium (Cr); readouts (panels B–E) were taken at 4 days post infection (dpi). Panel B: C. rodentium in feces (4 dpi), log10 CFU/g feces. Panel C: flushed colons (4 dpi), radiance (p/sec/cm2/sr) × 10^7, FR diet (with SM) and FF diet (with SM). Panel D: luminescence intensity in flushed colons (4 dpi) (radiance, p/sec/cm2/sr). Panel E: transmission electron microscopy images, FR diet and FF diet. Data points and significance marks are not recoverable from the extracted text.]
Figure 6. Fiber-Deprived Gut Microbiota Promotes Faster C. rodentium Access to the Colonic Epithelium
(A) Experimental setup for luminescent C. rodentium experiment (experiment 4).
(B) Fecal burdens of C. rodentium at 4 dpi. Data are shown as averages and error bars represent SEM; statistically significant differences are shown with different
letters (p < 0.001). One-way ANOVA with Tukey’s test.
(C) Bioluminescence images of flushed colons showing the location and intensity of adherent C. rodentium colonization.
(D) Quantified bioluminescence intensities of C. rodentium from (C) and Figure S7C. Middle lines indicate average of the individual measurements shown.
Kruskal-Wallis one-way ANOVA with Dunn’s test.
(E) Transmission electron microscopy images of the representative colonic regions from flushed colons; arrowheads denote individual C. rodentium cells and ‘‘P’’
denotes epithelial pedestals in high power/FF image. Scale bars, low power views 10 μm and high power views 2 μm.
See also Figure S7.
interpretations. We demonstrate that fiber deficiency allows the subset of mucin-degrading bacteria to increase their population and express mucin-degrading CAZymes to access mucin as a nutrient. While the ability to annotate CAZyme functions is well developed vis-a-vis many other metabolic functions that are important in the microbiome (El Kaoutari et al., 2013), there are still substantial ambiguities in connecting such predictions with precise catalytic roles. Here, we not only leverage knowledge of the substrate and enzyme specificities associated with some of the well-studied species in our SM (Table S4 and references therein), but we also employ new in vitro growth and transcriptional profiling experiments for key mucus-degrading bacteria (B. caccae and A. muciniphila). Our results point out a poignant example of how this evolving "bottom up" approach

[Figure 7 graphic. Three schemes: Fiber-rich (FR) diet, labeled "Mature mucus layer: intact barrier function"; Fiber-free (FF) diet, labeled "Microbiota eroded mucus layer: barrier dysfunction"; FR/FF diet (no commensal microbiota), labeled "Immature barrier function". Key: fiber-degrading microbiota; mucus-degrading microbiota; mucosal pathogen; bacterial dietary-fiber degradation; bacterial host-secreted mucus degradation.]
Figure 7. Model of How a Fiber-Deprived Gut Microbiota Mediates Degradation of the Colonic Mucus Barrier and Heightened Pathogen Susceptibility
Schemes derived from results shown in Figures 1, 2, 3, 4, 5, 6 illustrating the balance between fiber degradation and mucus degradation in FR diet-fed mice; whereas an FF diet leads to proliferation of mucus-degrading bacteria and microbiota-mediated degradation of the colonic mucus layer. The latter results in more severe colitis by C. rodentium.

can increasingly resolve functional resolution in complex microbial systems: only very recently, the single B. ovatus GH98 enzyme that is increased in the FR diet (leftmost bar, Figure 3A, top) was shown unequivocally to be an endo-β-xylosidase (Rogowski et al., 2015). Prior to this finding, the GH98 family was only known to contain blood group antigen-cleaving endo-α-galactosidases, a function that could have confusingly been associated with mucus O-glycan metabolism instead of its proper target, the plant fiber xylan.

Our results shed important light on the nature and amount of fiber that is required for the health of the colonic mucus layer. The prebiotic diet, which contains purified soluble fibers that are similar to common prebiotics (e.g., inulin, arabinoxylan, β-glucan), could not mitigate microbial erosion of the mucus barrier, despite having a clear impact on the cecal community transcriptome. Because the FR diet contains complex plant fiber in its natural form (intact plant cell walls) and at higher concentration (15% versus 10% in the Pre diet), we cannot determine which variable (form or amount) is most important. However, the defined FF diet provides an ideal platform to which purified polysaccharides and even individual food items can be added to separately test both of these parameters for their ability to alleviate mucus degradation. Such an approach could help to design dietary therapeutics and next-generation prebiotics and will be particularly powerful given our existing knowledge of which polysaccharides the non-mucin-degrading species target (Figure 1A) and our ability to implant new species with other defined functionalities in the SM.

The present work highlights that the gut microbiota plays significant positive and negative roles in the pathogenesis of C. rodentium. Whereas we previously showed that the presence of a microbiota blocks colonization by C. rodentium unless it possesses virulence traits (Kamada et al., 2012), here, we demonstrate that diet-specific modulation of the gut microbiota can facilitate pathogen colonization and a fiber-deprived microbiota enhances disease susceptibility. Because the colonic mucus layer is an early barrier that a pathogen must transit (Collins et al., 2014), our data illustrate that a fiber-deprived microbiota has profound effects on the susceptibility to a gastrointestinal pathogen via reduction of this barrier. The contribution of mucus degradation to heightened pathogen susceptibility in our study is surprisingly parallel to a previous report that found a similar level of lethal colitis in mice with a genetically ablated (Muc2−/−) mucus layer (Bergstrom et al., 2010). From this perspective, it is striking that a dietary alteration in wild-type mice can imitate the phenotype of a mutation as severe as Muc2 loss. Given that Muc2 knockout mice experience inflammation and eventual colorectal cancer, it is reasonable to conclude that prolonged diet-driven mucus layer loss could result in similar outcomes. In this context, it is worth noting that higher levels of mucolytic bacteria have been found in IBD patients (Png et al., 2010). In light of our observations that mice subjected to intermittent (daily or 4-day) dietary fiber deprivation exhibit thinner mucus, it will be critical in future studies to investigate the impact of periodic fiber deprivation, which is more like real human dietary habits, on the status of the mucus layer and the many downstream health effects that may be connected to mucus barrier dysfunction.

Taken together, our findings support a model in which dynamic interactions between dietary fiber and metabolism of a synthetic microbiota composed of commensal bacteria influence the status of the colonic mucus layer and susceptibility to pathogens that traverse this barrier (Figure 7). The current findings are likely applicable to gut microbial communities with higher numbers of species: three previous studies (see also the Introduction) involving rats with native microbiota and mice with transplanted human gut microbiota found a correlation between fiber-deficient diets and thinner colonic mucus layer. However, it remains to be investigated whether a thin colonic mucus layer together with a complex microbiota would contribute to enhanced pathogen susceptibility or how this effect might vary between individual microbial communities. Moreover, to understand whether microbial degradation of the colonic mucus is required for its secretion by the host in order to achieve a thicker mucus layer, future experiments need to address the effects on the mucus thickness after exclusion of the four validated mucus degraders from our synthetic microbiota. Finally, because the strains used here are of human origin,

and given the significant structural overlap between human and murine mucin (including glycosylation) (Johansson et al., 2013) and that C. rodentium uses similar pathogenesis mechanisms as human pathogenic E. coli strains, it is likely that such diet-induced disease susceptibility would extend to humans. Because E. coli infections are associated with high morbidity (Kaper et al., 2004) and health-care cost, our study emphasizes the need to consider a dietary perspective in fully understanding their transmission. With this in mind, efforts to find the optimal combinations of natural or prebiotic fiber polysaccharides and the minimum intake required to restore the integrity and resilience of the colonic mucus layer should be paramount.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ◦ Gnotobiotic Mouse Model and Diet Treatments
  ◦ Citrobacter rodentium Infection
  ◦ Formulation of the Synthetic Microbiota
• METHOD DETAILS
  ◦ Experimental Design
  ◦ Sample Processing for Animal Experiments
  ◦ Purification of Mucin O-Glycans
  ◦ Bacterial Growth Assays in a Custom Carbohydrate Array
  ◦ Citrobacter rodentium quantification
  ◦ Extraction of Nucleic Acids
  ◦ Bioluminescence Imaging and Transmission Electron Microscopy
  ◦ Laser Capture Microdissection
  ◦ Illumina Sequencing and Data Analysis
  ◦ Microbial RNA-Seq and CAZyme Annotation
  ◦ p-Nitrophenyl Glycoside-Based Enzyme Assays
  ◦ Thickness Measurements of the Colonic Mucus Layer
  ◦ qPCR
  ◦ Quantification of Short-Chain Fatty Acids
  ◦ Immunofluorescence Staining
  ◦ ELISA for Fecal Lipocalin

A.M.S., and E.C.M. analyzed data. M.S.D., A.M.S., C.A.H., and E.C.M. prepared figures. M.S.D. and E.C.M. primarily wrote and edited the manuscript. M.S.D., N.M.K., and N.A.P. carried out bacterial in vitro growth assays. G.N., N.K., and S.K. assisted with C. rodentium infection and luminescence experiments. M.W. assisted with mucus measurements. C.A.H. and T.S.S. conducted blinded histology scoring. A.M. analyzed microarray data. N.T. and B.H. provided CAZy annotations. All authors discussed the results and provided comments on the manuscript.

ACKNOWLEDGMENTS

We thank Lansing C. Hoskins for critical comments on this manuscript and the germfree animal facility of the University of Michigan for expert support. We also thank Markus Ollert and Rudi Balling for their encouragement and advice. This work was supported by Luxembourg National Research Fund (FNR) INTER Mobility (13/5624108) and CORE (C15/BM/10318186) grants to M.S.D.; Luxembourg Ministry of Higher Education and Research support (DM-Muc) to M.S.D.; FNR ATTRACT (A09/03), CORE (11/1186762), and European Union Joint Programming in Neurodegenerative Diseases (INTER/JPND/12/01) grants to P.W.; NIH R01 (GM099513) grant to E.C.M.; and financial support from the University of Michigan Host Microbiome Initiative and Center for Gastrointestinal Research (DK034933).

Received: May 13, 2016
Revised: August 13, 2016
Accepted: October 21, 2016

REFERENCES

Bergstrom, K.S.B., Kissoon-Singh, V., Gibson, D.L., Ma, C., Montero, M., Sham, H.P., Ryz, N., Huang, T., Velcich, A., Finlay, B.B., et al. (2010). Muc2 protects against lethal infectious colitis by disassociating pathogenic and commensal bacteria from the colonic mucosa. PLoS Pathog. 6, e1000902.
Brownlee, I.A., Havler, M.E., Dettmar, P.W., Allen, A., and Pearson, J.P. (2003). Colonic mucus: secretion and turnover in relation to dietary fibre intake. Proc. Nutr. Soc. 62, 245–249.
Burkitt, D.P., Walker, A.R.P., and Painter, N.S. (1972). Effect of dietary fibre on stools and the transit-times, and its role in the causation of disease. Lancet 2, 1408–1412.
Cameron, E.A., and Sperandio, V. (2015). Frenemies: signaling and nutritional integration in pathogen-microbiota-host interactions. Cell Host Microbe 18, 275–284.
Chassaing, B., Koren, O., Goodrich, J.K., Poole, A.C., Srinivasan, S., Ley, R.E., and Gewirtz, A.T. (2015). Dietary emulsifiers impact the mouse gut microbiota promoting colitis and metabolic syndrome. Nature 519, 92–96.
Collins, J.W., Keeney, K.M., Crepin, V.F., Rathinam, V.A.K., Fitzgerald, K.A., Finlay, B.B., and Frankel, G. (2014). Citrobacter rodentium: infection, inflammation and the microbiota. Nat. Rev. Microbiol. 12, 612–623.
B Tissue Histology
Cuskin, F., Lowe, E.C., Temple, M.J., Zhu, Y., Cameron, E.A., Pudlo, N.A.,
B Mouse Microarray Analyses
Porter, N.T., Urs, K., Thompson, A.J., Cartmell, A., et al. (2015). Human gut
d QUANTIFICATION AND STATISTICAL ANALYSIS Bacteroidetes can utilize yeast mannan through a selfish mechanism. Nature
B Statistical Analyses 517, 165–169.
d DATA AND SOFTWARE AVAILABILITY David, L.A., Maurice, C.F., Carmody, R.N., Gootenberg, D.B., Button, J.E.,
B Accession Numbers Wolfe, B.E., Ling, A.V., Devlin, A.S., Varma, Y., Fischbach, M.A., et al.
(2014). Diet rapidly and reproducibly alters the human gut microbiome. Nature
SUPPLEMENTAL INFORMATION 505, 559–563.
Earle, K.A., Billings, G., Sigal, M., Earle, K.A., Lichtman, J.S., Hansson,
Supplemental Information includes seven figures and seven tables and can be G.C., Elias, J.E., Amieva, M.R., Huang, K.C., and Sonnenburg, J.L.
found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.043. (2015). Quantitative imaging of gut microbiota spatial resource quantita-
tive imaging of gut microbiota spatial organization. Cell Host Microbe
AUTHOR CONTRIBUTIONS 18, 478–488.
Edgar, R.C., Haas, B.J., Clemente, J.C., Quince, C., and Knight, R. (2011).
M.S.D., E.C.M., P.W., T.S.S., and G.N. conceived the study. M.S.D. and UCHIME improves sensitivity and speed of chimera detection. Bioinformatics
E.C.M. designed the study. M.S.D. performed the experiments. M.S.D., 27, 2194–2200.
Cell 167, 1339–1353, November 17, 2016 1351

El Kaoutari, A., Armougom, F., Gordon, J.I., Raoult, D., and Henrissat, B. (2013). The abundance and variety of carbohydrate-active enzymes in the human gut microbiota. Nat. Rev. Microbiol. 11, 497–504.
Faith, J.J., McNulty, N.P., Rey, F.E., and Gordon, J.I. (2011). Predicting a human gut microbiota’s response to diet in gnotobiotic mice. Science 333, 101–104.
Flint, H.J., Scott, K.P., Louis, P., and Duncan, S.H. (2012). The role of the gut microbiota in nutrition and health. Nat. Rev. Gastroenterol. Hepatol. 9, 577–589.
Fu, J., Wei, B., Wen, T., Johansson, M.E.V., Liu, X., Bradford, E., Thomsson, K.A., McGee, S., Mansour, L., Tong, M., et al. (2011). Loss of intestinal core 1-derived O-glycans causes spontaneous colitis in mice. J. Clin. Invest. 121, 1657–1666.
Gibson, G.R., Cummings, J.H., and Macfarlane, G.T. (1991). Growth and activities of sulphate-reducing bacteria in gut contents of healthy subjects and patients with ulcerative colitis. FEMS Microbiol. Lett. 86, 103–112.
Hedemann, M.S., Theil, P.K., and Bach Knudsen, K.E. (2009). The thickness of the intestinal mucous layer in the colon of rats fed various sources of non-digestible carbohydrates is positively correlated with the pool of SCFA but negatively correlated with the proportion of butyric acid in digesta. Br. J. Nutr. 102, 117–125.
Hickey, C.A., Kuhn, K.A., Donermeyer, D.L., Porter, N.T., Jin, C., Cameron, E.A., Jung, H., Kaiko, G.E., Wegorzewska, M., Malvin, N.P., et al. (2015). Colitogenic Bacteroides thetaiotaomicron antigens access host immune cells in a sulfatase-dependent manner via outer membrane vesicles. Cell Host Microbe 17, 672–680.
Hoskins, L.C., and Boulding, E.T. (1981). Mucin degradation in human colon ecosystems. Evidence for the existence and role of bacterial subpopulations producing glycosidases as extracellular enzymes. J. Clin. Invest. 67, 163–172.
Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., and Speed, T.P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264.
Johansson, M.E.V., and Hansson, G.C. (2012). Preservation of mucus in histological sections, immunostaining of mucins in fixed tissue, and localization of bacteria with FISH. Methods Mol. Biol. 842, 229–235.
Johansson, M.E.V., Phillipson, M., Petersson, J., Velcich, A., Holm, L., and Hansson, G.C. (2008). The inner of the two Muc2 mucin-dependent mucus layers in colon is devoid of bacteria. Proc. Natl. Acad. Sci. USA 105, 15064–15069.
Johansson, M.E.V., Sjövall, H., and Hansson, G.C. (2013). The gastrointestinal mucus system in health and disease. Nat. Rev. Gastroenterol. Hepatol. 10, 352–361.
Johansson, M.E.V., Gustafsson, J.K., Holmén-Larsson, J., Jabbar, K.S., Xia, L., Xu, H., Ghishan, F.K., Carvalho, F.A., Gewirtz, A.T., Sjövall, H., and Hansson, G.C. (2014). Bacteria penetrate the normally impenetrable inner colon mucus layer in both murine colitis models and patients with ulcerative colitis. Gut 63, 281–291.
Kamada, N., Kim, Y.-G., Sham, H.P., Vallance, B.A., Puente, J.L., Martens, E.C., and Núñez, G. (2012). Regulated virulence controls the ability of a pathogen to compete with the gut microbiota. Science 336, 1325–1329.
Kaper, J.B., Nataro, J.P., and Mobley, H.L.T. (2004). Pathogenic Escherichia coli. Nat. Rev. Microbiol. 2, 123–140.
Kozich, J.J., Westcott, S.L., Baxter, N.T., Highlander, S.K., and Schloss, P.D. (2013). Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Appl. Environ. Microbiol. 79, 5112–5120.
Larsson, J.M.H., Karlsson, H., Crespo, J.G., Johansson, M.E.V., Eklund, L., Sjövall, H., and Hansson, G.C. (2011). Altered O-glycosylation profile of MUC2 mucin occurs in active ulcerative colitis and is associated with increased inflammation. Inflamm. Bowel Dis. 17, 2299–2307.
Li, H., Limenitakis, J.P., Fuhrer, T., Geuking, M.B., Lawson, M.A., Wyss, M., Brugiroux, S., Keller, I., Macpherson, J.A., Rupp, S., et al. (2015). The outer mucus layer hosts a distinct intestinal microbial niche. Nat. Commun. 6, 8292.
Loubinoux, J., Bronowicki, J.-P., Pereira, I.A.C., Mougenel, J.-L., and Faou, A.E. (2002). Sulfate-reducing bacteria in human feces and their association with inflammatory bowel diseases. FEMS Microbiol. Ecol. 40, 107–112.
Martens, E.C., Chiang, H.C., and Gordon, J.I. (2008). Mucosal glycan foraging enhances fitness and transmission of a saccharolytic human gut bacterial symbiont. Cell Host Microbe 4, 447–457.
Martens, E.C., Lowe, E.C., Chiang, H., Pudlo, N.A., Wu, M., McNulty, N.P., Abbott, D.W., Henrissat, B., Gilbert, H.J., Bolam, D.N., and Gordon, J.I. (2011). Recognition and degradation of plant cell wall polysaccharides by two human gut symbionts. PLoS Biol. 9, e1001221.
McGuckin, M.A., Lindén, S.K., Sutton, P., and Florin, T.H. (2011). Mucin dynamics and enteric pathogens. Nat. Rev. Microbiol. 9, 265–278.
McKenney, P.T., and Pamer, E.G. (2015). From hype to hope: the gut microbiota in enteric infectious disease. Cell 163, 1326–1332.
McNulty, N.P., Wu, M., Erickson, A.R., Pan, C., Erickson, B.K., Martens, E.C., Pudlo, N.A., Muegge, B.D., Henrissat, B., Hettich, R.L., and Gordon, J.I. (2013). Effects of diet on resource utilization by a model human gut microbiota containing Bacteroides cellulosilyticus WH2, a symbiont with an extensive glycobiome. PLoS Biol. 11, e1001637.
Nakjang, S., Ndeh, D.A., Wipat, A., Bolam, D.N., and Hirt, R.P. (2012). A novel extracellular metallopeptidase domain shared by animal host-associated mutualistic and pathogenic microbes. PLoS ONE 7, e30287.
Petersson, J., Schreiber, O., Hansson, G.C., Gendler, S.J., Velcich, A., Lundberg, J.O., Roos, S., Holm, L., and Phillipson, M. (2011). Importance and regulation of the colonic mucus barrier in a mouse model of colitis. Am. J. Physiol. Gastrointest. Liver Physiol. 300, G327–G333.
Png, C.W., Lindén, S.K., Gilshenan, K.S., Zoetendal, E.G., McSweeney, C.S., Sly, L.I., McGuckin, M.A., and Florin, T.H.J. (2010). Mucolytic bacteria with increased prevalence in IBD mucosa augment in vitro utilization of mucin by other bacteria. Am. J. Gastroenterol. 105, 2420–2428.
Qin, J., Li, R., Raes, J., Arumugam, M., Burgdorf, K.S., Manichanh, C., Nielsen, T., Pons, N., Levenez, F., Yamada, T., et al.; MetaHIT Consortium (2010). A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65.
Rakoff-Nahoum, S., Foster, K.R., and Comstock, L.E. (2016). The evolution of cooperation within the gut microbiota. Nature 533, 255–259.
Rey, F.E., Gonzalez, M.D., Cheng, J., Wu, M., Ahern, P.P., and Gordon, J.I. (2013). Metabolic niche of a prominent sulfate-reducing human gut bacterium. Proc. Natl. Acad. Sci. USA 110, 13582–13587.
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47.
Rogowski, A., Briggs, J.A., Mortimer, J.C., Tryfona, T., Terrapon, N., Lowe, E.C., Baslé, A., Morland, C., Day, A.M., Zheng, H., et al. (2015). Glycan complexity dictates microbial resource allocation in the large intestine. Nat. Commun. 6, 7481.
Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., Lesniewski, R.A., Oakley, B.B., Parks, D.H., Robinson, C.J., et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541.
Schneider, C.A., Rasband, W.S., and Eliceiri, K.W. (2012). NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675.
Sonnenburg, E.D., and Sonnenburg, J.L. (2014). Starving our microbial self: the deleterious consequences of a diet deficient in microbiota-accessible carbohydrates. Cell Metab. 20, 779–786.

Sonnenburg, J.L., Xu, J., Leip, D.D., Chen, C.-H., Westover, B.P., Weatherford, J., Buhler, J.D., and Gordon, J.I. (2005). Glycan foraging in vivo by an intestine-adapted bacterial symbiont. Science 307, 1955–1959.
Sonnenburg, J.L., Chen, C.T.L., and Gordon, J.I. (2006). Genomic and metabolic studies of the impact of probiotics on a model gut symbiont and host. PLoS Biol. 12, e413.
Van der Sluis, M., De Koning, B.A., De Bruijn, A.C., Velcich, A., Meijerink, J.P., Van Goudoever, J.B., Büller, H.A., Dekker, J., Van Seuningen, I., Renes, I.B., and Einerhand, A.W. (2006). Muc2-deficient mice spontaneously develop colitis, indicating that MUC2 is critical for colonic protection. Gastroenterology 131, 117–129.
Wickham, H. (2011). The split-apply-combine strategy for data. J. Stat. Softw. 40, 1–29.
Wrzosek, L., Miquel, S., Noordine, M.-L., Bouet, S., Joncquel Chevalier-Curt, M., Robert, V., Philippe, C., Bridonneau, C., Cherbuy, C., Robbe-Masselot, C., et al. (2013). Bacteroides thetaiotaomicron and Faecalibacterium prausnitzii influence the production of mucus glycans and the development of goblet cells in the colonic epithelium of a gnotobiotic model rodent. BMC Biol. 11, 61.

STAR★METHODS
KEY RESOURCES TABLE

REAGENT or RESOURCE SOURCE IDENTIFIER
Antibodies
Alexa Fluor 488 goat anti-rabbit IgG Life Technologies Cat#A11008
Mucin 2 antibody (H-300) Santa Cruz Biotechnology Cat#sc-15334
4-nitrophenyl N-acetyl-β-D-glucosaminide Sigma-Aldrich Cat#N9376
4-nitrophenyl α-D-galactopyranoside Sigma-Aldrich Cat#N0877
4-nitrophenyl β-D-glucopyranoside Sigma-Aldrich Cat#N7006
4-nitrophenol Sigma-Aldrich Cat#73560
Acetic acid Acros Organics Cat#222140010
Adenine Sigma-Aldrich Cat#A2786
Alanine Sigma-Aldrich Cat#A7469
Alcian blue Sigma-Aldrich Cat#A5268
Alginate Sigma-Aldrich Cat#180947
Ammonium chloride Sigma-Aldrich Cat#A0171
Ammonium sulfate Thermo Fisher Scientific Cat#A702
Arginine Sigma-Aldrich Cat#A8094
Asparagine Sigma-Aldrich Cat#A4159
Aspartic Acid Sigma-Aldrich Cat#A93100
Barley β-glucan (Barliv Betafiber) Cargill Cat#BBF-100
Beef extract Sigma-Aldrich Cat#B4888
Biotin Sigma-Aldrich Cat#B4501
Boric acid Sigma-Aldrich Cat#B6768
Calcium chloride Sigma-Aldrich Cat#C1016
Calcium pantothenate Sigma-Aldrich Cat#P2250
Cellulose International Fiber Corporation Solka-Floc
Chloroform Sigma-Aldrich Cat#496189
Chondroitin sulfate Federal Laboratories Cat#CSP1K
cOmplete, Mini, EDTA-free Protease Inhibitor Cocktail Sigma-Aldrich Cat#000000004693159001
Copper(II) sulfate pentahydrate Sigma-Aldrich Cat#C7631
Corn starch / amylopectin Sigma-Aldrich Cat#10120
Cyanocobalamin Sigma-Aldrich Cat#V2876
Cysteine Sigma-Aldrich Cat#C7352
Cytosine Sigma-Aldrich Cat#C3506
DAPI Sigma-Aldrich Cat#D9542
Dextran Sigma-Aldrich Cat#31389
Dipotassium phosphate Thermo Fisher Scientific Cat#BP363-500
EDTA Sigma-Aldrich Cat#ED4SS
Ethanol Decon Labs Cat#2701
Fiber-Free diet Teklad/Envigo Cat#TD.130343
Folic acid Sigma-Aldrich Cat#F7876
Fructose Sigma-Aldrich Cat#F0127
Galactose Sigma-Aldrich Cat#G0625
Glucomannan Konjac Foods Konjac Glucomannan Powder
Glucose Sigma-Aldrich Cat#158968

Glutamine Sigma-Aldrich Cat#G8540
Glutaraldehyde Electron Microscopy Sciences Cat#16537
Guanine Sigma-Aldrich Cat#G11950
Guar gum galactomannan Sigma-Aldrich Cat#G4129
Glutamic Acid Sigma-Aldrich Cat#G1501
Hematin Sigma-Aldrich Cat#H3281
Histidine Sigma-Aldrich Cat#H8000
Hydrochloric Acid Sigma-Aldrich Cat#320331
Inulin Cargill Oliggo-Fiber Instant Inulin
Iron(II) sulfate heptahydrate Sigma-Aldrich Cat#215422
Isobutyric acid Alfa Aesar Cat#79-31-2
Isoleucine Sigma-Aldrich Cat#I2752
Isopropanol Sigma-Aldrich Cat#278475
Isovaleric acid Alfa Aesar Cat#503-74-2
Laboratory Autoclavable Rodent Diet - 5010 LabDiet Cat#0001326
Larch arabinogalactan Megazyme Cat#P-ARGAL
Leucine Sigma-Aldrich Cat#L8000
Lysine Sigma-Aldrich Cat#L5501
Lysozyme Thermo Fisher Scientific Cat#BP535-1
Magnesium chloride anhydrous Sigma-Aldrich Cat#M8266
Magnesium sulfate anhydrous Sigma-Aldrich Cat#M7506
Magnesium sulfate heptahydrate Sigma-Aldrich Cat#M5921
Manganese(II) sulfate monohydrate Mallinckrodt Cat#6192
Mannose Acros Organics Cat#150600
Menadione Sigma-Aldrich Cat#M5625
Methanol anhydrous Thermo Fisher Scientific Cat#A412-1
Methionine Sigma-Aldrich Cat#M9625
Mucin from porcine stomach Sigma-Aldrich Cat#M1778
N-acetyl glucosamine Sigma-Aldrich Cat#A3286
Nicotinic acid Sigma-Aldrich Cat#N4126
Osmium tetroxide Electron Microscopy Sciences Cat#19100
p-Aminobenzoic acid Sigma-Aldrich Cat#A9878
Pancreatic digest of casein BioWorld Cat#30620060-1
Phenol Sigma-Aldrich Cat#P4557
Phenol:Chloroform:Isoamyl Alcohol (pH 8.05) Thermo Fisher Scientific Cat#15593031
Phenol:Chloroform:Isoamyl Alcohol (pH 4.3) Fisher Scientific Cat#BP1754I-400
Phenylalanine Sigma-Aldrich Cat#P2126
p-nitrophenyl α-L-fucopyranoside Sigma-Aldrich Cat#N3628
p-nitrophenyl β-D-xylopyranoside Sigma-Aldrich Cat#N2132
Polygalacturonic acid Sigma-Aldrich Cat#P3850
Potassium 4-nitrophenyl sulfate Sigma-Aldrich Cat#N3877
Potassium chloride Sigma-Aldrich Cat#P9333
Potassium dihydrogen phosphate Thermo Fisher Scientific Cat#P284
Potato pectic galactan Megazyme Cat#P-PGAPT
Proline Sigma-Aldrich Cat#P5607
Propionic acid Acros Organics Cat#149300010

Propylene oxide Electron Microscopy Sciences Cat#20412
Pyridoxine HCl Sigma-Aldrich Cat#P9755
Resazurin Acros Organics Cat#418900010
Retrievagen A BD Biosciences Cat#550524
Rhamnogalacturonic acid Megazyme Cat#P-RHAM1
Riboflavin Sigma-Aldrich Cat#R7649
RNAprotect QIAGEN Cat#76506
SDS Sigma-Aldrich Cat#L3771
Serine Sigma-Aldrich Cat#84959
Sodium acetate Sigma-Aldrich Cat#S2889
Sodium bicarbonate Sigma-Aldrich Cat#S5761
Sodium chloride Sigma-Aldrich Cat#S7653
Sodium citrate Sigma-Aldrich Cat#S1804
Sodium hydroxide Sigma-Aldrich Cat#S8045
Sodium lactate Thermo Fisher Scientific Cat#S326-500
Sodium molybdate dihydrate JT Baker Chemical Company Cat#3764
Sugar beet arabinan Megazyme Cat#P-ARAB
Tamarind xyloglucan Carbomer Cat#4-00634
Thiamine HCl Sigma-Aldrich Cat#T4625
Thioctic acid Sigma-Aldrich Cat#T5625
Threonine Sigma-Aldrich Cat#T8625
Thymine Sigma-Aldrich Cat#T0895
Tris Thermo Fisher Scientific Cat#BP152
Triton X-100 Sigma-Aldrich Cat#T9284
TRIzol Invitrogen Cat#15596026
Tryptone Thermo Fisher Scientific Cat#BP1421
Tyrosine Sigma-Aldrich Cat#T3754
Uracil Sigma-Aldrich Cat#U1128
Valeric acid Alfa Aesar Cat#109-52-4
Valine Sigma-Aldrich Cat#V0500
Wheat arabinoxylan Megazyme Cat#P-WAXYL
Xylene Sigma-Aldrich Cat#296333
Xylene Substitute Sigma-Aldrich Cat#A5597
Xylose Sigma-Aldrich Cat#X3877
Yeast extract Fluka Analytical Cat#70161
Zinc sulfate heptahydrate JT Baker Chemical Company Cat#4382
AccuPrime Taq DNA Polymerase, high fidelity kit Thermo Fisher Scientific Cat#12346086
Affymetrix Mouse Gene ST 2.1 strips Affymetrix Cat#902120
Arcturus PicoPure DNA Extraction Kit Arcturus Cat#KIT0103
DNeasy Blood & Tissue Kit QIAGEN Cat#69506
epMotion 5075 TMX Eppendorf Cat#960020033
High-sensitivity DNA analysis kit Agilent Cat#5067-4626
KAPA SYBRFAST qPCR kit KAPA Biosystems Cat#KK4600
KAPA Library Quantification Kit for Illumina platforms KAPA Biosystems Cat#KK4824
Mouse Lipocalin-2/NGAL DuoSet ELISA kit R&D Systems Cat#DY1857

Pierce Microplate BCA Protein Assay Kit Thermo Fisher Scientific Cat#PI23252
PowerSoil Isolation Kit MoBio Laboratories Cat#12888
Qubit RNA Assay Kit Thermo Fisher Scientific Cat#Q32852
Ribo-Zero rRNA Removal Kits (Bacteria) Illumina Cat#MRZB12424
RNeasy Protect Bacteria Mini Kit QIAGEN Cat#74524
SequalPrep Normalization Plate Kit, 96-well Thermo Fisher Scientific Cat#A1051001
TURBO DNase kit Ambion Cat#AM1907
Deposited Data
16S rRNA gene sequences and metadata NCBI BioProjectID; NCBI SRA PRJNA300261; SRP065682
Mouse microarray data NCBI GEO GSM2084849–55
RNA-Seq data NCBI BioProjectID NCBI: SRP092534, SRP092530, SRP092478, SRP092476, SRP092461, SRP092458, SRP092453
Akkermansia muciniphila: DSM 22959, type strain DSMZ Cat#DSM 22959
Bacteroides caccae: DSM 19024, type strain DSMZ Cat#DSM 19024
Bacteroides ovatus: DSM 1896, type strain DSMZ Cat#DSM 1896
Bacteroides thetaiotaomicron: DSM 2079, type strain DSMZ Cat#DSM 2079
Bacteroides uniformis: ATCC 8492, type strain ATCC Cat#ATCC 8492
Barnesiella intestinihominis: YIT11860 DSMZ Cat#DSM 21032
Citrobacter rodentium: DBS100 David Schauer, Massachusetts Institute of Technology N/A
Citrobacter rodentium: DBS120 David Schauer, Massachusetts Institute of Technology N/A
Clostridium symbiosum: DSM 934, type strain, 2 DSMZ Cat#DSM 934
Collinsella aerofaciens: DSM 3979, type strain DSMZ Cat#DSM 3979
Desulfovibrio piger: ATCC 29098, type strain ATCC Cat#ATCC 29098
Escherichia coli HS ATCC N/A
Eubacterium rectale: DSM 17629, A1-86 DSMZ Cat#DSM 17629
Faecalibacterium prausnitzii: DSM 17677, A2-165 DSMZ Cat#DSM 17677
Marvinbryantia formatexigens: DSM 14469, type strain, I-52 DSMZ Cat#DSM 14469
Roseburia intestinalis: DSM 14610, type strain, L1-82 DSMZ Cat#DSM 14610
16S rRNA gene Illumina sequencing primers Kozich et al., 2013 Table S2B
qPCR primers This paper Table S3A
Arraystar DNAStar http://www.dnastar.com/t-sub-products-genomics-arraystar.aspx
Gen5 Biotek http://www.biotek.com/products/microplate_software/gen5_data_analysis_software.html

ImageJ Schneider et al., 2012 http://imagej.net/Welcome
Microsoft Excel Microsoft https://products.office.com/en-us/excel
Mothur v1.33.3 Schloss et al., 2009 http://www.mothur.org/
Robust multi-array average (RMA) method Irizarry et al., 2003 http://www.bioconductor.org/
Prism v5.04 GraphPad Software http://www.graphpad.com/scientific-software/prism/
QIAGEN’s Ingenuity Pathway Analysis QIAGEN http://www.ingenuity.com/products/ipa
R The R Foundation https://www.r-project.org/
R - Limma package Ritchie et al., 2015 https://bioconductor.org/packages/release/bioc/html/limma.html
R - Plyr package Wickham, 2011 http://cran.r-project.org/web/packages/plyr/index.html
UCHIME Edgar et al., 2011 http://drive5.com/usearch/manual/uchime_algo.html
Other
Acid-washed glass beads (212–300 μm) Sigma-Aldrich Cat#G1277
Anaerobic chamber Coy manufacturing Vinyl Type A + Type B
Biostack automated plate handling device Biotek Instruments BIOSTACK2WR
Bioluminescence reader Xenogen IVIS200
Breathe-Easy polyurethane membrane Diversified Biotech Cat#BEM-1
Flat bottom 96-well plates Costar
General Laboratory Homogenizer Omni International
Microdissection instrument Arcturus Veritas Microdissection Instrument
Millex-GV Syringe Filter Unit, 0.22 μm, PVDF, 4 mm, ethylene oxide sterilized EMD Millipore Cat#SLGV004SL
Mini-BeadBeater-16 Biospec Products Cat#607
Powerwave HT absorbance reader Biotek Instruments PowerWaveHT
Synergy HT absorbance reader Biotek Instruments Synergy HT
Upright fluorescence microscope Olympus BX60
Transmission electron microscope Philips Philips CM-100
CONTACT FOR REAGENT AND RESOURCE SHARING

Further information may be obtained from the Lead Contact, Eric C. Martens (email: emartens@umich.edu; address: University of
Michigan Medical School, Ann Arbor, Michigan 48109, USA).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Gnotobiotic Mouse Model and Diet Treatments

All animal experiments followed protocols approved by the University of Michigan University Committee for the Use and Care of
Animals. Germfree male and female wild-type Swiss Webster mice were colonized at 8–9 weeks of age; none of these mice
had been involved in any previous experiments or treatments. Mice were housed alone or in groups as appropriate for gender, litter, and
diet requirements and were provided ad libitum with autoclaved distilled water and the diets described below.
The identity and culture purity of each bacterial species in the synthetic gut microbiota (SM) were confirmed by sequencing its 16S rRNA
gene and comparing it to sequences in public databases. Bacteria were grown in their respective media (Table S1) for
community assembly or for in vitro growth evaluation on carbohydrates. Each member of the SM was grown anaerobically
(atmosphere 85% N2, 10% H2, 5% CO2) in its respective medium (Table S1) at 37°C to a final absorbance (600 nm)
of about 0.5 to 1.0. Bacterial cultures were mixed in equal volumes, and each individual inoculum was sealed in its own tube with
anaerobic headspace. Each mouse was gavaged with 0.2 mL of this mixture (freshly prepared each day) for three consecutive days at
nearly the same time of day.
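The inoculum arithmetic described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name and the overage factor (extra volume to cover pipetting losses) are assumptions; only the equal-volume mixing, the 0.2 mL gavage volume, and the 14-member community come from the text.

```python
def culture_volume_per_strain(n_mice, n_strains=14, gavage_ml=0.2, overage=1.2):
    """Volume (mL) of each culture to pool for one day's inoculum.

    Equal volumes of every strain are mixed and each mouse receives
    `gavage_ml` of the mixture. `overage` is a hypothetical safety factor
    for dead volume; it is not taken from the paper.
    """
    if n_mice <= 0 or n_strains <= 0:
        raise ValueError("n_mice and n_strains must be positive")
    total_ml = n_mice * gavage_ml * overage
    return total_ml / n_strains


per_strain = culture_volume_per_strain(n_mice=14)
print(f"{per_strain:.3f} mL of each culture per day")
```

Because the mixture is prepared fresh on each of the three gavage days, the same calculation would be repeated daily.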
Fiber-free (FF) and Prebiotic (Pre) diets were sterilized by gamma irradiation, and the Fiber-rich (FR) diet (LabDiet 5010; autoclavable
rodent diet) was sterilized by autoclaving. The FF diet (TD.140343) was manufactured by Teklad/Envigo (WI, USA) and, as previously
described (Kamada et al., 2012), consisted of a modified version of Harlan TD.08810 in which starch and maltodextrin
were replaced with glucose. The Pre diet was a new formulation based on the FF diet with 2.1% of a purified polysaccharide mixture
(Table S1) added along with 10% cornstarch (each replacing an equivalent amount of glucose).
On day 14 after colonization (Figure 1B), mice were randomly assigned to groups by a technician who was not aware of the details
of the treatment groups. Mice were sometimes caged separately even within individual groups. For dietary oscillations, mice
were transferred from their cage to a different cage containing the other diet. Bedding was replaced in each cage before the
mice were transferred. To minimize the potential for circadian effects, the oscillation was carried out at nearly the same time of
day (±1.0 hr between different days), and fecal samples were collected just before transfer to the cage containing the
different diet. The fecal samples were immediately stored at −20°C until further use.
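One way to picture the random, treatment-blind allocation described above is the sketch below. It is an assumption-laden illustration, not the method the technician used: the mouse IDs, the coded group labels, the round-robin dealing, and the seed are all hypothetical.

```python
import random


def assign_groups(mouse_ids, group_labels, seed=None):
    """Randomly allocate mice to groups of (near-)equal size.

    The IDs are shuffled and then dealt out round-robin. Group labels are
    opaque codes, so the person running the allocation does not need to
    know which treatment each code stands for.
    """
    rng = random.Random(seed)
    shuffled = list(mouse_ids)
    rng.shuffle(shuffled)
    groups = {label: [] for label in group_labels}
    for i, mouse in enumerate(shuffled):
        groups[group_labels[i % len(group_labels)]].append(mouse)
    return groups


mice = [f"M{i:02d}" for i in range(1, 13)]  # hypothetical mouse IDs
allocation = assign_groups(mice, ["A", "B", "C"], seed=1)
for label, members in allocation.items():
    print(label, sorted(members))
```

Fixing the seed makes the allocation reproducible for record keeping; the key linking codes to diets would be held by someone other than the person assigning animals.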
Citrobacter rodentium Infection

A kanamycin (Km)-resistant wild-type C. rodentium strain (DBS120) and a luciferase-expressing strain of C. rodentium (DBS100; resistant
to ampicillin (Amp)) were used (Kamada et al., 2012). Each mouse was gavaged with 0.2 mL of culture grown aerobically overnight
at 37°C (10^9 CFU grown in Luria-Bertani broth without antibiotics). The same culture was used to gavage all mice in a single
experiment to rule out effects of growth variation on pathogenesis. For experiments with luciferase-expressing C. rodentium, mice
were fed the FF diet for nearly the same duration (39 days instead of 42) as in the other experiment (Experiment 2B, Figure 5) before
infection with C. rodentium. For the experiment with germfree (GF) mice, two groups of mice were separately pre-fed the FR
and FF diets for 4 weeks before infection with luciferase-expressing or wild-type C. rodentium.
Formulation of the Synthetic Microbiota

We selected 12 of the 14 bacterial species (Figure 1A) from the list of the 75–89 most common species in the human gut (Qin et al.,
2010). Moreover, the selection of 14 species was based in part on carbohydrate utilization abilities of a larger pool of 350 strains that
were screened on the same platform as shown in Figure 1A (K. Urs, N.A.P., and E.C.M., unpublished data). Based on the in vitro assays,
our synthetic microbiota (SM) is not biased toward mucin-degrading bacteria, as only 4 of the 14 species possess this ability
(Figure 1A).
METHOD DETAILS
Experimental Design
A total of four gnotobiotic animal experiments (Experiments 1–4; also mentioned in figure legends) were performed; details of the
experimental replication are provided in the corresponding figure legends. Both male and female mice were used at random, depending
on the availability of animals. Gnotobiotic Experiment 1 contained 2 male mice in the Fiber-rich (FR) group, 2 male mice in the Fiber-free
(FF) group, and 1 male mouse in the Prebiotic (Pre) group; all other animals in Gnotobiotic Experiment 1 were females. Gnotobiotic
Experiments 2A and 2B had all male mice. All animals in Gnotobiotic Experiment 3 were females. Gender details of the animals in Gnotobiotic
Experiment 4 are shown in Figure 6 (both males and females were used). For infection with wild-type C. rodentium in germfree (GF)
mice, all male mice were used. Gender details of GF mice used for infection with luciferase-expressing C. rodentium are included in
Figure S7 (both males and females were used). Finally, all GF mice used for measurement of the colonic mucus layer (Figure 4C) were
females. The researchers were not blinded to the identities of the treatment groups; however, the technician who assigned individual
gnotobiotic animals to different treatment groups was not aware of the experimental details. Measurements of the colonic mucus
layer were single blinded (see details below in the relevant section). The pathologist who devised the inflammation-scoring rubric
was not blinded, whereas the pathologist who performed the histology scoring and the technician who performed electron microscopy
were blinded to the identities of the treatment groups (see below for details of the methods). No data were excluded from the final
analysis.
Sample size estimations were performed as follows in consultation with a statistician. Based on previous studies, an effect size
(ratio of mean difference to within-group standard deviation) of 3 was assumed to be reasonable for readouts such as mucus layer
measurements, enzyme assays, and measurement of transcript changes. With 3 animals in each group and a two-sided 5% significance level,
this would yield a power of 78% for the t test. Therefore, for some of the feeding groups (those alternated between
different diets), 3 animals were used. However, for other feeding groups that were more important for the central research question
of the study (e.g., constant feeding on Fiber-rich (FR) and Fiber-free (FF) diets), at least 4 animals were used to obtain higher power.
For C. rodentium infection experiments, in most cases 5 animals per group were used based on results of our previous study
(Kamada et al., 2012).
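The power figure quoted above can be checked with a short calculation. The sketch below assumes an unpaired, equal-variance two-sample t test (the text says only "the t test") and integrates the noncentral t rejection probability numerically with the standard library. The function names, the integration settings, and the hard-coded critical value 2.776 (the two-sided 5% point of Student's t with 4 degrees of freedom) are choices made for this illustration, not details from the paper.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_sample_t_power(effect_size, n_per_group, t_crit, steps=20000, s_max=8.0):
    """Power of a two-sided, equal-variance two-sample t test.

    The statistic is T = (Z + ncp) / S with Z standard normal and
    S = sqrt(chi2_df / df). Power is the expectation over S of
    P(|T| > t_crit | S), evaluated with the midpoint rule.
    """
    df = 2 * n_per_group - 2
    ncp = effect_size * math.sqrt(n_per_group / 2.0)  # noncentrality parameter
    half = df / 2.0
    # log of the normalizing constant of the density of S
    log_norm = math.log(2.0) + half * math.log(half) - math.lgamma(half)
    h = s_max / steps
    power = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        density = math.exp(log_norm + (df - 1) * math.log(s) - half * s * s)
        reject = normal_cdf(ncp - t_crit * s) + normal_cdf(-ncp - t_crit * s)
        power += reject * density * h
    return power


# Effect size 3, 3 animals per group, two-sided alpha = 0.05 (df = 4).
print(f"power = {two_sample_t_power(3.0, 3, 2.776):.3f}")
```

Under these assumptions the result lands close to the 78% stated in the text; with an effect size of 0 the same function returns the significance level, which is a quick sanity check on the integration.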
Sample Processing for Animal Experiments

All animals were killed using CO2 asphyxiation followed by cervical dislocation. The gastrointestinal tracts were quickly removed. The
colons were gently separated by cutting at the cecum-colon junction and the rectum and immediately preserved in Carnoy’s fixative
(dry methanol:chloroform:glacial acetic acid in the ratio 60:30:10) with slight modifications to a previous protocol (Johansson and
Hansson, 2012). Note that the Carnoy’s fixative was made fresh with anhydrous methanol, chloroform, and glacial acetic acid.
The colons were fixed in Carnoy’s solution for 3 hr followed by transfer to fresh Carnoy’s solution for 2–3 hr. The colons were
then washed in dry methanol for 2 hr, placed in cassettes, and stored in fresh dry methanol at 4°C until further use. Cecal contents
from each animal were divided into replicates, immediately flash-frozen in liquid nitrogen, and stored at −80°C until further use.
Immediately after the cecal contents were squeezed out, the cecal tissues were transferred to separate screw-cap tubes and
flash-frozen in liquid nitrogen, followed by storage at −80°C until further use. Lengths of colons were measured immediately
after fixation in Carnoy’s solution by photographing the colons in a reference cassette of identical size, followed by length measurement
in ImageJ.
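The 60:30:10 fixative ratio translates into component volumes as sketched below. The helper name and the 50 mL batch size are hypothetical; only the ratio and the component names come from the text.

```python
def split_by_ratio(total_ml, parts):
    """Split a total volume across components according to their parts."""
    whole = sum(parts.values())
    return {name: total_ml * share / whole for name, share in parts.items()}


# Ratio from the text: dry methanol:chloroform:glacial acetic acid = 60:30:10
CARNOY_PARTS = {
    "anhydrous methanol": 60,
    "chloroform": 30,
    "glacial acetic acid": 10,
}

for name, ml in split_by_ratio(50.0, CARNOY_PARTS).items():  # 50 mL is an example batch
    print(f"{name}: {ml:.1f} mL")
```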
Purification of Mucin O-Glycans

Mucin O-glycans were purified from porcine gastric mucus as previously described (Martens et al., 2008), with several
modifications. Porcine gastric mucin (Type III, Sigma, USA) was suspended at 2.5% w/v in 100 mM Tris (pH 7.4); the mixture was
immediately autoclaved for 5 min to increase solubility and reduce potential contaminating glycoside hydrolase and polysaccharide
lyase activity, then cooled to 55°C. Proteinase K (Invitrogen, USA) was added to a final concentration of 0.1 mg/ml and the suspension
was incubated at 55°C for 16–20 hr with slow shaking. The proteolyzed solution was subsequently centrifuged at 21,000 × g for
30 min at 4°C to remove insoluble material, and NaOH and NaBH4 were added to final concentrations of 0.1 M and 1 M, respectively.
This solution was incubated at 65°C for 18 hr to promote selective release of O-linked glycans (mucin O-glycans and GAGs) from
mucin glycopeptides by alkaline β-elimination. The pH was subsequently decreased to 7.0 with HCl, and the neutralized mixture
was centrifuged at 21,000 × g for 30 min at 4°C and then filtered through a 0.22 μm filter (Millipore) to remove remaining insoluble material.
The filtrate was exhaustively dialyzed (1 kDa cutoff) against deionized distilled H2O to remove salts and contaminating small molecules.
The collected mucosal glycans were further fractionated using anion exchange chromatography by passing them twice over a
DEAE-Sepharose (Sigma) column (325 mL bed volume; equilibrated in 50 mM Tris, pH 7.4; gravity flow). The flow-through (neutral fraction)
was collected and the column washed with 1 L of 50 mM Tris, pH 7.4. This fraction was used in all growth experiments and was further
prepared by dialysis against ddH2O (1 kDa cutoff), lyophilization, and resuspension in ddH2O at 20 mg/ml.
Bacterial Growth Assays in a Custom Carbohydrate Array

All species except Desulfovibrio piger were evaluated in a custom carbohydrate array (n = 2 replicate cultures per glycan); D. piger
failed to grow in any of the tested minimal media, but is not predicted to have extensive carbohydrate-degrading capacity based on
its small genomic complement of only 30 carbohydrate-active enzymes. The custom carbohydrate array was formulated according to
Martens et al. (2011), with a few modifications included in the following protocol. Flat-bottom 96-well plates (Costar) were used, to
which 100 μL of a 2x concentrated solution (prepared in Milli-Q water) of each of the sterilized carbohydrate stocks (Table S1) was
added. The plates were then transferred to the anaerobic chamber (10% H2, 5% CO2 and 85% N2) and allowed to equilibrate
with the anaerobic atmosphere for 3–4 hr. Growth assays for all carbohydrates were carried out in non-adjacent duplicates, and all
growth arrays contained two non-adjacent water-only negative controls that were checked to ascertain that the other medium
components, without added carbohydrates, did not yield detectable growth. The cultures for the inoculation were grown overnight in their
respective minimal media (MM)/regular growth media at 37°C under an anaerobic atmosphere (10% H2, 5% CO2 and 85% N2).
MM for some bacterial species were taken from previous studies, and for other members of the synthetic microbiota, MM with novel
formulations were devised (see Table S1 for compositions of all growth media used in this study). MM were pre-reduced in the
anaerobic chamber overnight by loosening the lids of the glass bottles containing the MM. 1 mL of the culture was centrifuged and the
pellet was recovered; note that the centrifugation was performed inside the anaerobic chamber. The pellet was washed 2 times
in the respective MM in order to remove carried-over carbohydrates from the culture media and was then resuspended in 1 mL of
2x concentrated MM without any carbohydrates. This 1 mL culture was used to inoculate 50 mL of 2x concentrated MM without
carbohydrates at a 1:50 ratio. 100 μL of the resulting cultures were then added to the individual wells containing the carbohydrate
solutions in the 96-well plates, resulting in a final volume of 200 μL. A gas-permeable, optically clear polyurethane membrane (Diversified
Biotech, USA) was then used to seal the well plates under the same anaerobic atmosphere. Next, the well plates were loaded into
a Biostack automated plate-handling device (Biotek Instruments, USA) placed inside the anaerobic chamber, which was coupled
with a Powerwave HT absorbance reader (Biotek Instruments, USA).
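The 2x stocks in the plate setup above are used so that mixing equal volumes of carbohydrate solution and inoculated MM brings both to 1x working strength. A minimal arithmetic sketch follows; the function name `final_strength` is hypothetical and is not part of the original protocol.

```python
def final_strength(stock_strength, stock_volume, total_volume):
    """Return the strength of a stock after dilution into total_volume.

    Volumes may be in any unit, as long as both use the same one.
    """
    if total_volume <= 0 or stock_volume > total_volume:
        raise ValueError("total_volume must be positive and at least stock_volume")
    return stock_strength * stock_volume / total_volume


# Equal volumes of 2x carbohydrate solution and 2x MM culture in each well:
carbohydrate = final_strength(2.0, 100, 200)  # 2x stock, 100 of 200 volume units
medium = final_strength(2.0, 100, 200)
print(carbohydrate, medium)
```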
Absorbance values were measured at 600 nm (A600) at an interval of 10 min over 96 hr for all species, except for Akkermansia
muciniphila, for which the absorbance was measured over 144 hr, owing to its relatively slow growth on mucin O-glycans
(Figure S1B). To construct the heatmap containing relative growth values (Figure 1A), absorbance data of all bacterial species
were normalized as follows: only carbohydrate growth assays for which both replicate cultures produced an increase in absorbance
of more than 0.1 were scored as positive (all other values were set to 0). Next, the maximum change in absorbance was normalized
within each individual species by setting its best growth to 1.0 and normalizing all other positive growths to this maximum value
(normalized values were thus between 0 and 1.0). Finally, growth on each substrate was normalized across species by setting the
maximum (previously normalized) growth value on that substrate to 1.0 and then adjusting the growth values for other strains on
that same substrate relative to the maximum value, yielding final normalized values between 0 and 1.0. Both raw and normalized
values are provided in Table S1.
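The three normalization steps described above (positive scoring, within-species scaling, and across-species scaling) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name, the nested-dictionary layout, and the choice to average the two replicates are all hypothetical, since the text does not say how replicate values were combined.

```python
def normalize_growth(delta_a600, threshold=0.1):
    """Normalize growth values following the three steps in the text.

    delta_a600 maps species -> substrate -> (replicate 1, replicate 2),
    where each value is the maximum increase in A600 for that culture.
    """
    # Step 1: a substrate is positive only if BOTH replicates exceed the threshold.
    # Averaging the replicates is an assumption made for this sketch.
    scored = {}
    for species, substrates in delta_a600.items():
        scored[species] = {}
        for substrate, replicates in substrates.items():
            positive = all(r > threshold for r in replicates)
            scored[species][substrate] = (
                sum(replicates) / len(replicates) if positive else 0.0
            )

    # Step 2: within each species, scale so that its best growth equals 1.0.
    for substrates in scored.values():
        best = max(substrates.values(), default=0.0)
        if best > 0:
            for substrate in substrates:
                substrates[substrate] /= best

    # Step 3: within each substrate, scale so that the best species equals 1.0.
    all_substrates = {s for substrates in scored.values() for s in substrates}
    for substrate in all_substrates:
        best = max(v.get(substrate, 0.0) for v in scored.values())
        if best > 0:
            for substrates in scored.values():
                if substrate in substrates:
                    substrates[substrate] /= best
    return scored


# Hypothetical input: two species, three substrates, two replicates each.
raw = {
    "species_A": {"glucose": (0.8, 0.8), "mucin": (0.4, 0.4), "xylan": (0.05, 0.5)},
    "species_B": {"glucose": (0.6, 0.6), "mucin": (0.15, 0.15), "xylan": (0.0, 0.0)},
}
normalized = normalize_growth(raw)
print(normalized)
```

In this made-up example, xylan is scored 0 for species_A because only one replicate exceeded 0.1, and the across-species step rescales mucin so that the better of the two species on that substrate reaches 1.0.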
To perform RNA-Seq analysis on pure cultures, A. muciniphila and Bacteroides caccae were grown anaerobically in their
respective minimal media (Table S1). A. muciniphila was grown separately on two different substrates: N-acetylglucosamine (5 mg/ml final
concentration) and purified mucin O-glycans (10 mg/ml final concentration). Cultures of A. muciniphila were grown to mid-log phase
(A600 values between 0.45 and 0.6). Cultures of B. caccae were grown separately on glucose and mucin O-glycans as the carbon sources
(10 mg/ml final concentration for both substrates) to mid-log phase (OD values between 0.7 and 0.8). Cultures of both species were treated
with RNAprotect (QIAGEN, USA) according to the manufacturer’s instructions. The RNAprotect-treated bacterial pellets were stored
at −80°C until extraction of RNA. Two replicate cultures per glycan (with closely matching ODs) were grown for each of the two
bacterial species.
Citrobacter rodentium Quantification

To determine the CFUs of C. rodentium, freshly collected fecal samples were weighed, homogenized in cold phosphate-buffered
saline and plated on LB agar plates with 50 μg/ml Km (for strain DBS120) or 200 μg/ml Amp (for strain DBS100) at serial dilutions
up to 10⁻⁹. The plates were incubated aerobically overnight at 37°C. Killing of E. coli HS, the only facultative anaerobe in our SM, by
Km and Amp was confirmed by plating it on LB agar with Km or Amp.
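For orientation, colony counts from a dilution series are conventionally converted to CFU per gram of feces as sketched below. The text does not report the plated volume or the homogenate volume, so every number in the example is a hypothetical placeholder, and the function name is likewise an assumption.

```python
def cfu_per_gram(colonies, dilution_exponent, plated_volume_ml,
                 homogenate_volume_ml, sample_mass_g):
    """Convert a plate count to CFU per gram of sample.

    colonies: colonies counted on a plate of the 10**-dilution_exponent dilution.
    """
    dilution_factor = 10 ** -dilution_exponent  # e.g. exponent 5 -> 1e-5
    cfu_per_ml = colonies / (plated_volume_ml * dilution_factor)
    return cfu_per_ml * homogenate_volume_ml / sample_mass_g


# Hypothetical plate: 42 colonies on the 10^-5 dilution, 0.1 ml plated,
# 50 mg of feces homogenized in 1 ml of PBS.
result = cfu_per_gram(42, 5, 0.1, 1.0, 0.05)
print(f"{result:.2e} CFU/g")
```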
Extraction of Nucleic Acids

DNA from fecal samples was isolated using the MoBio PowerSoil Isolation Kit (MoBio Laboratories, USA) adapted for use in the
epMotion 5075 TMX or the DNA extraction protocol used for Collinsella aerofaciens (described below). DNA was extracted from the
bacterial pure cultures using the DNeasy Blood & Tissue Kit (QIAGEN, USA), except that the following bead-beating and
phenol-chloroform extraction protocol was employed to better extract DNA from C. aerofaciens: 1–2 mL of the overnight culture was
centrifuged and the resulting pellet was combined with acid-washed glass beads (212–300 μm; Sigma-Aldrich, USA), 500 μl Buffer A
(200 mM NaCl, 200 mM Tris, 20 mM EDTA), 210 μl SDS (20% w/v, filter-sterilized) and 500 μl phenol:chloroform:isoamyl alcohol
(25:24:1, pH 8.05; Thermo Fisher Scientific, USA). A Mini-BeadBeater-16 (Biospec Products, USA) was used to disrupt the bacterial
cells for 5 min at room temperature, followed by cooling the samples for 1–2 min on wet ice. The samples were then
centrifuged and the aqueous phase was recovered. An equal volume of phenol:chloroform:isoamyl alcohol (25:24:1) was added to the
aqueous phase and mixed by gentle inversion. After centrifugation (12,000 rpm, 4°C, 3 min), the
aqueous phase was recovered. Next, 500 μl of pure chloroform was added to the aqueous phase, mixed by inversion and the tubes
were centrifuged (12,000 rpm, 4°C, 3 min). The aqueous phase was transferred into fresh tubes and 1 volume of −20°C-chilled 100%
isopropanol and 1/10 volume of 3 M sodium acetate (pH 5.2) were added. The samples were mixed by gentle
inversion and incubated at −20°C for 1 hr, centrifuged for 20 min (12,000 rpm, 4°C) and the supernatants were discarded. The pellets
were washed in 70% ethanol (v/v, prepared in nuclease-free water), air-dried and then resuspended in nuclease-free water. The
resulting DNA extracts were purified using the DNeasy Blood & Tissue Kit (QIAGEN, USA).
RNA was extracted from cecal contents using a standard phenol-chloroform method with bead beating as described previously
(Sonnenburg et al., 2006), but with a few modifications: 1 mL of RNAprotect (QIAGEN, USA) stored at room temperature was added to
200–300 mg of cecal contents stored at −80°C, followed by thawing the cecal samples on wet ice. After the cecal contents were
thawed, 250 μl of acid-washed glass beads (212–300 μm; Sigma-Aldrich, USA) were added to the samples. Next, 500 μl of a solution
of Buffer A (200 mM NaCl, 200 mM Tris, 20 mM EDTA), 210 μl of 20% SDS (filter-sterilized) and 500 μl of phenol:chloroform:isoamyl
alcohol (125:24:1, pH 4.3; Fisher Scientific, USA) were added to the samples. The mixture was then bead beaten (same instrument
as above) for 5 min and centrifuged at 4°C (3 min at 13,000 rpm). The aqueous phase was recovered and mixed with 500 μl of the
aforementioned phenol:chloroform:isoamyl alcohol solution. Afterward, the mixture was centrifuged again at 4°C (3 min at
13,000 rpm) and the aqueous phase was recovered. 1/10 volume of 3 M sodium acetate (pH 5.2) and 1 volume of −20°C-chilled
ethanol were added to the aqueous phase. The resulting solution was then mixed by gentle inversion and incubated for 20 min on ice.
Afterward, the mixture was centrifuged at 4°C (20 min at 13,000 rpm). The pellet was recovered and washed twice in 500 μl of cold
70% ethanol. The mixture was centrifuged at 4°C (5 min at 13,000 rpm) and the RNA pellet was recovered, air-dried and then
resuspended in nuclease-free water. The RNA extracts were then purified using an RNeasy Mini kit (QIAGEN, USA) according to the
manufacturer’s protocol. During extraction of RNA from the cecal contents, a portion (100–200 μl) of homogenized material was
removed immediately after bead beating and stored at −80°C for extraction of DNA. To extract DNA from these cecal
content-derived samples, the DNA extraction protocol described above (used for C. aerofaciens) was used, except that bead beating and
the inclusion of glass beads were skipped. RNA was extracted from the RNAprotect-treated cell pellets of bacterial pure cultures of B. caccae
using the RNeasy Protect Bacteria Mini Kit. For extraction of RNA from RNAprotect-treated cell pellets of A. muciniphila, the RNA
extraction protocol used for cecal contents (see above) was used, except that the samples were not treated with RNAprotect after thawing.
RNA was extracted from cecal tissue by thawing the samples in the presence of RNAprotect (as described above for cecal contents),
followed by homogenization (OMNI International) with metal beads. RNA was then extracted with Trizol (Invitrogen, USA)
according to the manufacturer’s instructions. All RNA extracts were subjected to digestion of DNA using TURBO DNase (Ambion,
USA) according to the manufacturer’s instructions.
Bioluminescence Imaging and Transmission Electron Microscopy

For bioluminescence imaging of the luciferase-expressing Citrobacter rodentium, GI tracts were removed and luminal contents were
gently flushed with a syringe by passing phosphate-buffered saline (PBS) through the colon. The GI tracts were then cut open flat and
rinsed in PBS to remove loosely attached luminal contents. Bioluminescence was visualized (identical exposure across all samples)
and photographed using the IVIS200 system (Xenogen). The colonic tissue sections showing the highest luciferase intensity (from both
Fiber-rich (FR) and Fiber-free (FF) diet-fed colonized mice) were then immediately fixed in 2.5% glutaraldehyde prepared in 0.1 M
Sorensen’s buffer (pH 7.4). Thereafter, the samples were treated with 1% osmium tetroxide in 0.1 M Sorensen’s buffer and were
sequentially dehydrated in graded alcohols and propylene oxide, followed by infiltration in Spurr’s or Epon. Ultrathin sections of
the tissue samples were made using a diamond knife, stained and visualized with a transmission electron microscope (Philips
CM-100).
Laser Capture Microdissection

To perform laser capture microdissection (LCM) on colonic thin sections, 4 and 3 fecal masses were analyzed for the Fiber-rich (FR)
and Fiber-free (FF) groups, respectively. Colonic thin sections that were deposited on microscope slides were deparaffinized in
xylene followed by dehydration in isopropanol (see details in the immunofluorescence staining protocol below). The sections
were stored overnight in a container with Drierite desiccant (Drierite, USA). LCM was carried out using a Veritas Microdissection
instrument (Arcturus, USA). DNA was extracted from the microdissected samples using the Arcturus Pico Pure DNA extraction kit and
the accompanying protocol. In order to perform Illumina sequencing, 16S rRNA genes were amplified from the LCM-derived samples
using a low biomass-optimized touchdown PCR protocol as follows: denaturation at 95°C for 2 min; a total of 20 cycles with a
touchdown program: denaturation at 95°C for 20 s, extension at 72°C for 5 min, annealing starting at 60°C for 15 s which decreased 0.3°C
per cycle; a total of 20 cycles: extension at 72°C for 5 min, annealing at 55°C for 15 s and extension at 72°C for 5 min; final extension at
72°C for 5 min. Note that a 5 min extension was used in order to reduce chimera formation. Library preparation and sequencing
were carried out using protocols similar to those described for fecal and cecal samples (see below).
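The touchdown phase of the program above lowers the annealing temperature by a fixed step each cycle. The short sketch below lists the resulting temperatures; it assumes that the first touchdown cycle anneals at the stated starting temperature and that the decrement is applied from the second cycle onward, which the text does not state explicitly. The function name is hypothetical.

```python
def touchdown_annealing_temps(start_c=60.0, step_c=0.3, cycles=20):
    """Annealing temperature (deg C) for each touchdown cycle.

    Assumes cycle 1 anneals at start_c and each later cycle is step_c lower.
    """
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


temps = touchdown_annealing_temps()
print(temps[0], temps[-1])  # first and last touchdown annealing temperatures
```

Under this assumption the last touchdown cycle anneals at 54.3 deg C, close to the fixed 55 deg C annealing step used in the 20 cycles that follow.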
Illumina Sequencing and Data Analysis

PCR and library preparation were performed by the University of Michigan Microbial Systems Molecular Biology Lab as described by
Kozich et al. (2013). The V4 region of the 16S rRNA gene was amplified using the dual-index primers described by Kozich et al. (2013)
with a few modifications to the PCR assay, which are included in the following protocol. Each of these dual-index primers contains an
Illumina adaptor, an 8-nt index sequence, a 10-nt pad sequence, a 2-nt linker, and the V4 primers F515 and R806. These primer
sequences are listed in Table S2. For the PCR assays, 5 μL of each of the 4 μM primers, 0.15 μL AccuPrime High Fidelity Taq polymerase
(Thermo Fisher Scientific, USA), 2 μL of 10x AccuPrime PCR II buffer (Thermo Fisher Scientific, USA), 11.85 μL of sterile PCR-grade
water and 1 μL of the DNA template were mixed. The PCR cycles started with 2 min of denaturation at 95°C, followed by 30 cycles
each consisting of 95°C for 20 s, 55°C for 15 s and 72°C for 5 min, followed by a final step of 72°C for 10 min. The amplicons were
normalized to the lowest concentration of the pooled plates using a SequalPrep normalization plate kit (Thermo Fisher Scientific,
USA). A KAPA Library Quantification kit for Illumina platforms (Kapa Biosystems, USA) was used to determine the library
concentration and an Agilent Bioanalyzer high-sensitivity DNA analysis kit (Agilent, USA) was employed to determine the amplicon size. The
amplicons were sequenced using an Illumina MiSeq with a MiSeq Reagent 222 kit V2 (Illumina, USA). The libraries were prepared
following the Illumina protocol for 2 nM libraries: ‘Preparing Libraries for Sequencing on the MiSeq’ (part 15039740, Rev. D).
Raw sequences were analyzed using mothur (v1.33.3) (Schloss et al., 2009). The following control samples were included: 1) DNA
extracted from the fecal samples collected from germ-free mice, 2) a mixture of extracted DNA from pure cultures of the members of
the synthetic microbiota (DNA samples from each strain were mixed in equal amounts) and 3) PBS negative controls during DNA
extraction and PCR amplification. Following sequence barcode-trimming, sequences were aligned to a custom reference database,
consisting of the V4 16S rRNA region from each of the 14 bacterial members and C. rodentium. UCHIME (Edgar et al., 2011) was used
to remove sequence chimeras. The R package ‘vegan’ was used to calculate the principal coordinates analysis (PCoA) from the Bray-
Curtis dissimilarity index based on phylotype classification of the 14 bacterial members. Standard R commands and the R package
‘plyr’ (Wickham, 2011) were used to generate median values of relative abundance or change in relative abundance over time, and the
Wilcoxon signed-rank test (two-sample comparisons) or the Kruskal-Wallis test (multiple groups) was used to determine significance,
as indicated, because relative abundance data were not normally distributed. R was used to visualize relative abundance of bacterial
members in different groups, over time as streamplots, or in heatmaps. Change in relative abundance over time was determined by
subtracting the relative abundance of the specified microbial member from the day prior within each animal, over time. Change in
relative abundance was approximately normally distributed, and Student’s t tests were used to test for significant differences in the
change of relative abundance between diet groups. The R package ‘ggplots’ was used to generate heatmaps visualizing the Percent
of Maximum Abundance (POMA) as previously described (McNulty et al., 2013). For this, the relative abundance of each
species was normalized to the maximum abundance observed for that species across all time points in the given animal.
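The POMA normalization described above amounts to dividing one species' time series in one animal by its own peak. A minimal sketch, with a hypothetical function name and made-up input values (the published analysis was done in R, not with this code):

```python
def percent_of_max_abundance(series):
    """Scale one species' relative abundances in one animal to percent of its maximum.

    series: relative abundance of the species at each time point.
    """
    peak = max(series)
    if peak == 0:
        # Species never detected in this animal; avoid division by zero.
        return [0.0 for _ in series]
    return [100.0 * value / peak for value in series]


# Hypothetical relative abundances of one species across three time points.
poma = percent_of_max_abundance([0.02, 0.04, 0.08])
print(poma)
```

Because each species is scaled to its own maximum, low-abundance and high-abundance members become comparable on the same 0 to 100 color scale in a heatmap.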
A detailed list of the commands used to analyze the data, including the commands used in mothur, is available at
https://github.com/aseekatz/mouse.fiber. Raw sequences have been deposited in the Sequence Read Archive under the study accession and
Bioproject identifiers (SRA: SRP065682 and PRJNA300261).
Microbial RNA-Seq and CAZyme Annotation

Microbial RNA-Seq was performed on pure cultures of A. muciniphila and B. caccae that were grown separately on mucin O-glycans
and on the respective simple sugars (see above). For Bacteroides thetaiotaomicron, gene expression data from previous studies were
utilized (Martens et al., 2008; Sonnenburg et al., 2005). To perform RNA-Seq on cecal samples, 3 samples each (out of 4) were
randomly selected from the Fiber-rich (FR) and Fiber-free (FF) diet groups, and all three samples each from the Prebiotic diet (Pre), FR FF
daily oscillation and Pre FF daily oscillation groups were utilized. To remove ribosomal RNA, samples were treated with Ribo-Zero
rRNA Removal Kits (Bacteria) (Epicentre, Illumina, USA) according to the manufacturer’s instructions. The resulting residual mRNA
concentrations were quantified using the Qubit RNA Assay Kit (Life Technologies, USA).
Library preparation and sequencing of RNA-Seq libraries were carried out using the Illumina HiSeq platform and TruSeq adaptors.
Samples were multiplexed in groups of 24 per lane (see Tables S4, S5, and S6 for quantification of reads mapped to each sample).
The resulting data in fastq file format were demultiplexed and mapped to the respective species genomes or community metage-
nomes using RPKM normalization and default parameters, and were further analyzed for fold-change and statistics (moderated
t test with Benjamini-Hochberg correction) within the Arraystar software package (DNAStar, USA). Mapping reads to all genes in
the 14 species community was intended to retain the contributions of community member abundance shifts while mapping only
to individual genomes was intended to normalize abundance shifts of the same species between conditions and isolate gene expres-
sion changes between conditions. The diet-specific behavior of known B. thetaiotaomicron and B. ovatus genes involved in fiber
polysaccharide degradation (Table S6) was used as internal validation that biologically relevant changes in gene expression were
indeed being detected. As stated above, three biological replicates were analyzed for each of the in vivo dietary conditions used.
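The distinction drawn above between mapping to the whole community and mapping to an individual genome comes down to which read total is used as the denominator in RPKM. The sketch below uses the standard RPKM definition with made-up numbers; it is not the Arraystar implementation, whose internals are not described in the text.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads (standard definition)."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 1,000 bp gene.
# Denominator = reads mapped to that species' genome only (isolates expression change).
species_level = rpkm(500, 1_000, 1_000_000)
# Denominator = reads mapped to all 14 genomes (retains species abundance shifts).
community_level = rpkm(500, 1_000, 10_000_000)
print(species_level, community_level)
```

With these numbers the same gene scores tenfold lower at the community level because the species contributes only a tenth of all mapped reads.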
p-Nitrophenyl Glycoside-Based Enzyme Assays

p-Nitrophenyl glycoside-based enzyme assays were carried out on cecal samples stored at −80°C. The cecal samples were thawed
on wet ice and 500 μl buffer (50 mM Tris, 100 mM KCl, 10 mM MgCl2; pH 7.25) was added to 22–67 mg of cecal contents. The buffer
additionally contained the following additives: lysozyme (tiny amount of powder/100 mL buffer), TritonX (100 μl, 12%/100 mL buffer),
DNases (tiny amount of powder/100 mL buffer) and protease inhibitor (one tablet of EDTA-free Protease Inhibitor Cocktail, Roche,
USA/100 mL buffer). After adding the buffer to the cecal samples, the samples were sonicated on ice with an ultrasonic processor for 45 s
(9 cycles of 5 s sonication followed by a break of 10 s; 35% amplitude; using a tapered microtip of 3 mm). Sonicated samples
were centrifuged (10,000 × g, 10 min, 4°C). Supernatants (400 μl) were carefully pipetted off and stored at −20°C
until further use. The following nitrophenyl-linked substrates (Sigma-Aldrich, USA) were employed: potassium 4-nitrophenyl sulfate,
4-nitrophenyl α-D-galactopyranoside, 4-nitrophenyl N-acetyl-β-D-glucosaminide, 4-nitrophenyl β-D-glucopyranoside,
p-nitrophenyl α-L-fucopyranoside and p-nitrophenyl β-D-xylopyranoside. Protein concentrations in the supernatants were determined
using the Pierce Microplate BCA Protein Assay Kit (Thermo Scientific, USA). Some samples were diluted with the buffer (same buffer
as above) to obtain a homogeneous range of protein concentrations across all samples. 5 μg of total protein was used in the
150 μl reactions inside flat-bottom, 96-well plates (Costar) with 10 mM nitrophenyl-based substrate in the buffer (same buffer as
above). Absorbance measurements (405 nm) were started immediately in a plate reader (Biotek, USA) at 37°C and absorbance
values were recorded every minute for 6–12 hr, depending on the linearity of the kinetic curve. The enzyme activities were
determined by plotting a standard curve of known concentrations of 4-nitrophenol and measuring the OD values at 37°C.
Thickness Measurements of the Colonic Mucus Layer

After Carnoy’s fixation, the methanol-stored colon samples (see above) were embedded in paraffin and thin sections (5 μm) were
cut and deposited on glass slides. Alcian blue staining was performed by the following protocol: 1) deparaffinization and hydration to
distilled water, 2) Alcian blue solution for 30 min, 3) washing in running tap water for 2 min, 4) rinsing in distilled water, 5) dehydration
with 95% alcohol (2X changes) and treatment with absolute alcohol (2X changes), 3 min each, 6) clearance in xylene (3X changes),
3 min each, 7) cover with coverslip. To measure the thickness of the colonic inner mucus layer, thousands of partially overlapping
photographs were taken from nearly the entire length of each colon from the Alcian blue-stained slides after cross-validation
using anti-Muc2 staining (Figures 4A and 4B, main text). The images captured all of the available fecal masses of all mice, although this
number was variable and there were generally fewer colonic fecal masses in mice fed the FF diet alone or in any combination. Image
sample names were blinded by M.S.D. and M.W., and the thickness of the colonic sections was then measured by E.C.M. using
ImageJ. Only regions in which the mucus layer was sandwiched between epithelium on one side and luminal contents on the other
were used; care was taken to measure regions that represented the average thickness in each blinded image; 2–3 measurements
per image were taken and averaged over the entire usable colon surface. See Figures 4A and 4B for representative images in which
the region measured as the inner mucus layer is delineated in both Alcian blue and anti-Muc2 staining.
Measurements in Cr-infected mice were conducted exactly as described for non-infected mice above, with the exception that only
distal colon rectal tissue was considered as this was uniformly a site at which inflammation was high and C. rodentium would be
present. Since only sections in which luminal contents could be visualized adjacent to the mucus layer were considered, only
a few measurements were obtained for a single SM-colonized infected mouse fed the FF diet because all mice in this group
were extremely morbid and not eating.
qPCR
In addition to Illumina sequencing of the 16S rRNA genes (V4 region), as a second approach to quantifying relative bacterial
abundance in fecal samples, phylotype-specific bacterial primers were designed. The primers were designed against randomly selected
genes that were checked for homology against the other 13 species in each case. These primer sequences are listed in Table S3A.
The primers were tested for specificity against the target strain by comparing the primer and target gene sequences against
sequences in public databases. Moreover, the specificity of each primer was validated by the following three approaches: 1) by
quantitative PCR (qPCR) against the target species genome and melting curve analysis (for a single peak), 2) by qPCR for each primer set
against a non-target template comprising genomic DNA from the 13 bacterial species in our synthetic microbiota, 3) by performing
qPCR against DNA extracted from the fecal samples of germ-free mice fed the Fiber-rich diet. qPCR was carried out in 384-well
plates (each plate included known concentrations of template DNA to plot a standard curve). The qPCR analyses were
performed using KAPA SYBR FAST qPCR Kits (KAPA Biosystems, USA) on an Applied Biosystems (ABI) Real Time PCR instrument
(ABI, USA). The amount of DNA was quantified by plotting a standard curve of varying DNA concentrations of the target template.
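Quantification against a standard curve, as described above, is commonly done by fitting threshold cycle (Ct) against log10 of the known template amount and inverting the fit for unknown samples. The sketch below assumes that log-linear form, which the text does not spell out; the function names and all Ct values are hypothetical.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template quantity from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical tenfold dilution series of target template (ng) and measured Ct values.
standards_ng = [10.0, 1.0, 0.1, 0.01]
standards_ct = [15.00, 18.32, 21.64, 24.96]
slope, intercept = fit_standard_curve(standards_ng, standards_ct)
unknown_ng = quantity_from_ct(20.0, slope, intercept)
print(slope, intercept, unknown_ng)
```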
Quantification of Short-Chain Fatty Acids

Cecal samples stored at −80°C were used to quantify short-chain fatty acids (SCFAs). Samples were first thawed on wet ice. Then,
an equivalent amount of Milli-Q water was added (100 μl per 100 mg of material) to cecal contents (≥0.05 g) and the samples were
thoroughly homogenized by vortexing for 1 min. The samples were then centrifuged at 13,000 × g for at least 3 min (or for a longer time
depending on the time required to obtain a tight pellet). The supernatant was pipetted off and filtered through a 0.22 μm filter (Millex-GV 4 mm,
SLGV004SL). Samples were kept on ice or frozen until quantification of SCFAs by high-performance liquid chromatography (HPLC).
Some samples were diluted to obtain enough liquid to inject onto the HPLC, or in certain cases they were diluted so that they could
be filtered. A Shimadzu HPLC with an Agilent HP-87X column was utilized for separating compounds, with a mobile phase of
0.01 N H2SO4, a flow rate of 0.6 ml/min, and a column temperature of 50°C. A UV detector set to a wavelength of 214 nm was
used to measure concentrations.
Immunofluorescence Staining
The immunofluorescence staining for Muc2 mucin was performed on the colonic thin sections after several modifications to the
protocols from Johansson and Hansson (2012) and an immunohistochemistry/tissue section staining protocol from BD Biosciences,
USA (http://www.bdbiosciences.com). The sections were deparaffinized by dipping in 50 mL Falcon conical tubes filled with xylene
(Sigma-Aldrich, USA) for 5 min, followed by transfer to another tube with fresh xylene for 5 min; care was taken to completely
immerse the tissue material in the liquid (also in the subsequent steps). This was followed by two dehydration steps of 5 min each
using 100% isopropanol contained in conical tubes. The slides were then washed by dipping in conical tubes containing Milli-Q
water. The antigens were retrieved by placing the slides in a glass beaker with enough BD Retrievagen A (pH 6.0; BD Biosciences,
USA) to cover the slides. The sections were then heated by microwaving and holding at about 89°C for 10 min (microwaving was
repeated during this time, as required). The slides were then cooled for 20 min at room temperature. Afterward, the slides were
washed 3 times with Milli-Q water. Excess liquid was gently blotted away and a PAP pen was used to draw a circle around the tissue
area, in order to better hold liquid on the tissue area during subsequent steps. Blocking was performed by immersing the slides in
blocking buffer (1:10 dilution of goat serum (Sigma, USA) in 1x Tris-buffered saline (TBS; 500 mM NaCl, 50 mM Tris, pH 7.4)) and
incubating them at room temperature for 1 hr. For the primary antibody staining, the tissue sections were covered in a 1:200 dilution
of Mucin 2 antibody (H-300) (original concentration: 200 μg/ml; Santa Cruz Biotechnology, USA) in the aforementioned blocking buffer
and incubated for 2 hr at room temperature. After the incubation step, the excess liquid was blotted away and the slides were rinsed
3 times in 1x TBS (in conical tubes) for 5 min each. The secondary antibody staining was performed by covering the tissue sections
with a 1:200 dilution of Alexa Fluor 488-conjugated goat anti-rabbit IgG antibody (original concentration: 2 mg/ml; Thermo Fisher
Scientific, USA) in blocking buffer and the sections were incubated for 1 hr at room temperature in the dark. The excess liquid was blotted
away and the sections were rinsed twice for 5 min each using TBS. Next, the sections were stained for 5 min at room temperature in
the dark using a 10 μg/ml DAPI solution diluted in 1x TBS (Sigma-Aldrich, USA). The sections were then rinsed with Milli-Q water and
blotted dry. Finally, the sections were covered with ProLong Gold Antifade Mountant (Invitrogen, USA), covered with coverslips and
the edges of the coverslips were sealed with nail polish. The slides were kept at room temperature in the dark for at least 24 hr and then
visualized with an Olympus BX60 upright fluorescence microscope (Olympus, USA).
ELISA for Fecal Lipocalin

Frozen fecal samples (−20°C) were used to determine the levels of fecal Lipocalin (LCN-2). The assays were performed within
30 days of sample collection. The samples were prepared as described previously (Chassaing et al., 2015), with a few modifications
in the sample preparation protocol: fecal samples stored at −20°C were thawed on wet ice and 6.9–67.7 mg of each sample was
separated into fresh tubes, to which 0.5 mL of 1% (v/v) Tween 20 (Sigma-Aldrich, USA) prepared in PBS was added. To get a homogeneous
suspension, the samples were vortexed for 20 min. The suspension was then centrifuged at 4°C for 10 min at 12,000 rpm. Next, the
supernatant was carefully recovered and stored at −20°C until the analysis. To measure the LCN-2 levels, a mouse Lipocalin-2/NGAL
DuoSet ELISA kit (R&D Systems, USA) was employed and the manufacturer’s protocol was followed.
Tissue Histology
To perform histological analyses on GI tracts of C. rodentium-infected mice, the intestinal segments (cecum and colon together)
were first fixed in Carnoy’s fixative for 3 hr, followed by transfer to fresh Carnoy’s fixative overnight. Next, the samples were washed
in 100% methanol (2x) for 30 min each, which was followed by washing in 100% ethanol (2x) for 20 min each. The samples were
then stored in 100% ethanol at 4°C until further use. After the 100% ethanol washes, the intestinal tissue samples were divided into 3
sections for histology: cecum, ascending colon, and the descending colon/rectum. These sections were embedded, processed
and cut by an experienced histology core (Washington University, USA), and then stained with hematoxylin and eosin (H and E) prior
to analysis. An unblinded experienced pathologist (T.S.S.) examined the slides from each of the groups to determine a viable readout.
The best readout was determined to be the extent of epithelial area showing crypt hyperplasia. After a scoring rubric was devised, an
independent blinded evaluator (C.A.H.) then measured the total length of each intestinal segment with a ruler in millimeters. Areas of
increased crypt hyperplasia were then determined by microscopy and the lengths of these areas were measured as a percentage of
the total epithelial length on a single slide.
Mouse Microarray Analyses

Mouse microarrays were carried out with Affymetrix Mouse Gene ST 2.1 strips. Expression values for each gene were calculated
using a robust multi-array average (RMA) approach (Irizarry et al., 2003). Linear models were fitted to the data by employing the limma
Bioconductor package in R version 3.1.1. Selected probesets had a fold change greater than 1.5 and an FDR-adjusted
p value of 0.05 or less. Data were analyzed through the use of QIAGEN’s Ingenuity Pathway Analysis (IPA, QIAGEN Redwood
City, http://www.ingenuity.com) using the default parameters. Input data correspond to significantly detected genes with
FDR < 0.05 and absolute log Fold-Change > 0.5. ILK signaling was the number 1 canonical pathway detected (p value =
8.33E-07). TNF targets corresponded to the number 1 (overlap p value = 2.64E-06) upstream regulator detected by IPA. The
category ‘Immune response of cells’ was predicted as the most significantly activated (p value = 6.83E-06 and Z-score = 2.316)
in the inflammatory response category.
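The probeset selection rule stated above (fold change greater than 1.5 and FDR-adjusted p value of 0.05 or less) can be illustrated with a short Python sketch. It assumes that the FDR adjustment is the Benjamini-Hochberg procedure and that fold changes are available on a log2 scale; the analysis itself was run with limma in R, so the function names and the record layout below are illustrative only.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_probesets(records, min_fold_change=1.5, max_fdr=0.05):
    """Select probeset IDs from (probeset_id, log2_fold_change, raw_p) tuples.

    A probeset is kept when its absolute fold change exceeds min_fold_change
    and its adjusted p value is at most max_fdr.
    """
    fdr = benjamini_hochberg([r[2] for r in records])
    cutoff = math.log2(min_fold_change)
    return [r[0] for r, q in zip(records, fdr) if abs(r[1]) > cutoff and q <= max_fdr]


# Hypothetical input: four probesets with log2 fold changes and raw p values.
records = [("a", 1.0, 0.01), ("b", 0.2, 0.04), ("c", -0.8, 0.03), ("d", 0.9, 0.005)]
print(select_probesets(records))
```

Working on the log2 scale makes the threshold symmetric, so that up- and downregulated probesets are treated alike.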
Statistical Analyses
All experimental analyses were conducted in consultation with a statistician. Unless otherwise stated in individual method sections
above, all statistical analyses were performed using Prism 5.04 (GraphPad Software, Inc.), except statistics for colony forming units
(CFU) for C. rodentium (Figure 5A) were performed in Excel. Statistically significant differences are shown with asterisks as follows:
*p < 0.05, **p < 0.01, ***p < 0.001 and ****p < 0.0001; whereas, ns indicates comparisons that are not significant. Numbers of animals
(n) used for individual experiments, details of the statistical tests used and pooled values for several biological replicates are indicated
in the respective figure legends. A two-tailed t test was employed in all cases. Since generally the microbiome data did not follow a
normal distribution, for these data a nonparametric test such as the Wilcoxon test was used. An exception was the data in Figure 2C,
which generally followed a normal distribution and hence a t test was used (see details in the section: Illumina sequencing and data
analysis). For the other data that appeared normally distributed, a t test was used; otherwise, a non-parametric (Mann-Whitney) test
was used (for example for Figure 4E). Finally, ANOVA (parametric) and Kruskal-Wallis (nonparametric) methods were used to
describe differences between more than two groups. For data in Figure 6D, a non-parametric approach with Dunn’s test and
involving pairwise comparisons was employed.
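The asterisk convention given above can be written as a small lookup. The following Python sketch is illustrative only; the function name is an assumption, and the thresholds are those stated in this section.

```python
def significance_label(p):
    """Map a p value to the asterisk notation used in the figures."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p value must lie between 0 and 1")
    # Thresholds are checked from the most to the least stringent.
    for threshold, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return label
    return "ns"


for p in (0.2, 0.03, 0.004, 0.0002, 0.00005):
    print(p, significance_label(p))
```

All comparisons are strict, so a p value exactly equal to a threshold receives the less stringent label.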
Accession Numbers
Data from this study have been deposited in the NCBI Short-Read Archive (SRA) and Gene Expression Omnibus (GEO) databases
under the following accession and/or BioProjectID identifiers: 16S rRNA gene sequences and metadata (SRA: SRP065682,
PRJNA300261); RNA-seq data (NCBI: SRP092534, SRP092530, SRP092478, SRP092476, SRP092461, SRP092458, SRP092453);
mouse microarray data (GEO: GSM2084849-55). The commands used to analyze the 16S rRNA gene data can be found online at
the following link: https://github.com/aseekatz/mouse.fiber.

[Figure S1 image not reproduced. Panel A: schematic of the degradation of plant fiber polysaccharides and starch (hemicelluloses, pectins, cellobiose, starches, fructans, and α,β-glucans), sulfated and non-sulfated GAGs, and mucin by members of the SM, with fermentation of released monosaccharides and amino acids to SCFA, CO2, and H2, acetogenesis (Mfor), mucin degradation (Amuc, Bar, Bt, Bc), and sulfate reduction to H2S (Des). Panel B: growth curves (OD600 versus time in hours) for A. muciniphila, E. rectale, R. intestinalis, M. formatexigens, and F. prausnitzii on mucin O-glycan, N-acetylgalactosamine, N-acetylglucosamine, fucose, pullulan, glycogen, amylopectin (maize and potato), inulin, xylan, arabinoxylan, cellobiose, chondroitin sulfate, and galactomannan.]
Species key by phylum:
Bacteroidetes: Bacteroides ovatus (Bo), Bacteroides uniformis (Bu), Bacteroides thetaiotaomicron (Bt), Bacteroides caccae (Bc), Barnesiella intestinihominis (Bar)
Firmicutes: Roseburia intestinalis (Ros), Eubacterium rectale (Eur), Faecalibacterium prausnitzii (Fpra), Marvinbryantia formatexigens (Mfor), Clostridium symbiosum (Csym)
Actinobacteria: Collinsella aerofaciens (Col)
Verrucomicrobia: Akkermansia muciniphila (Amuc)
Proteobacteria: Escherichia coli (Ecol), Desulfovibrio piger (Des)
Figure S1. Versatile Metabolic Abilities Contributed by Members of the Human Gut Synthetic Microbiota (SM), Related to Figure 1
(A) A schematic displaying abilities of the SM to degrade a wide variety of dietary and host-derived polysaccharides and possible metabolic interactions between
members of the SM. GAGs, Glycosaminoglycans.
(B) Representative growth curves of selected members of the SM on several polysaccharides and glycans as sole carbon sources (n = 2 for each glycan; values
are shown as averages). The absorbance was measured every 10 min. See Table S1 for raw and normalized growth values and growth media descriptions for the
13 members of the SM evaluated for carbohydrate growth ability (all except D. piger).
[Figure S2 image not reproduced. Two PCoA panels with axes PCO 1 (80%) and PCO 2 (7%): samples coded by experiment (Experiment 1, Experiment 2; top) or by sex (male, female; bottom), and by diet and cage (FR: fiber-rich diet, cages 1, 2, 3, 4, 7, 8, 9, 10, and 11; FF: fiber-free diet, cages 1, 2, 5, and 6).]
Figure S2. PCoA Plots Demonstrating Clustering of Fecal Bacterial Communities over Time in Two Feeding Regimens, Related to Figure 2
Principal coordinates analysis (PCoA) of microbial community dissimilarity (Bray-Curtis) in fecal samples (collected according to Figure 1B) as determined by 16S
rRNA-based sequencing (V4 region). Samples from both Experiments 1 and 2 are shown, with samples coded by experiment (top panel) or sex (bottom panel) and
by cage (legend) (Experiment 1: n = 4 mice/group; Experiment 2: n = 7 mice/group).
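For reference, the Bray-Curtis dissimilarity underlying the PCoA can be computed as follows. This is a minimal Python sketch for two abundance vectors over the same taxa; it is not the pipeline used in the study (the commands for the 16S rRNA gene analysis are available at the repository cited under Accession Numbers), and the function name is illustrative.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical read counts for three taxa in two fecal samples.
sample_1 = [10, 0, 5]
sample_2 = [5, 5, 5]
print(bray_curtis(sample_1, sample_2))
```

The value ranges from 0 (identical composition) to 1 (no shared taxa); the matrix of pairwise values is the input to the ordination.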
[Figure S3 image not reproduced. Panel A: relative abundance (%) versus days (6 to 54) for A. muciniphila, B. caccae, D. piger, E. rectale, B. ovatus, and M. formatexigens in the fiber-rich (FR), fiber-free (FF), prebiotic (Pre), and 1-day FR/FF groups, with the period in which all groups were fed the FR diet marked. Panel B: heatmaps for the 1-day FR/FF and 4-day FR/FF groups over days 6 to 54 (relative abundance scale, 0 to 0.8), with species grouped as increased on FF, increased on FR, or random change.]
Figure S3. Fecal Microbial Community Dynamics in Mice from Distinct Dietary Feeding Groups, Related to Figure 2
(A) Relative abundance of indicated bacteria in mice over time subjected to various dietary regimes as determined by Illumina-based 16S rRNA sequencing
(Experiment 1). An explanation for the inverse relationship between the relative abundances of D. piger and M. formatexigens on FR and FF diets is their
competition for the same electron donor (hydrogen). The increased proliferation of mucin-degrading bacteria in the FF diet indicates higher degradation of
sulfated mucin O-glycans, which is supported by transcriptomic and enzyme assay data shown in Figure 3. The corresponding release of additional sulfate would
preferentially feed the sulfate-reducer D. piger, leading to production of the toxic metabolite hydrogen sulfide (Figure S1A). Increased D. piger has also been
observed in IBD patients (Gibson et al., 1991; Loubinoux et al., 2002), which could result from enhanced sulfate release by mucinolytic bacteria. Values are shown
as medians ± IQR. n = 4 for FR and FF groups; n = 3 for Pre and 1-day FR/FF groups.
(B) Heatmap showing Percent of Maximum Abundance (POMA) values of all species for two of the feeding groups from Illumina-based sequencing (see
Figure 1B); n = 3 mice/group (Experiment 1; according to timeline in Figure 1B). See also Table S2.
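A POMA transformation can be sketched as below, assuming that each species' relative abundance at every time point is divided by that species' maximum over the time course and expressed as a percentage. This definition is inferred from the name alone, and the function name is illustrative.

```python
def percent_of_max(abundances):
    """Scale one species' time series so that its maximum becomes 100."""
    peak = max(abundances)
    if peak == 0:
        return [0.0 for _ in abundances]  # species never detected
    return [100.0 * a / peak for a in abundances]


# Hypothetical relative abundances (%) of one species at three time points.
print(percent_of_max([2.0, 8.0, 4.0]))
```

Scaling each species to its own maximum puts abundant and rare species on a common color scale in the heatmap.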
[Figure S4 image not reproduced. Panel A (B. caccae): Venn diagram of upregulated genes in vitro (MOG) and in vivo (FR, FF, and FF-FR 1-day), 830 total genes upregulated (76 degradative enzymes), with histograms of frequency by enzyme family (representing 76 degradative enzymes). Panel B (A. muciniphila): Venn diagram for the same conditions, 554 total genes upregulated (36 degradative enzymes), with histograms of frequency by enzyme family (representing 36 degradative enzymes). Panel C: schematic mucin O-glycan structures (sulfated core 1, A blood group, and B blood group) attached to Ser/Thr, with cleavage sites for the indicated enzyme activities. Key: N-acetylglucosamine, N-acetylgalactosamine, N-acetylneuraminic acid, galactose, fucose.]
Figure S4. Dynamic Changes in Transcriptional Profiles of B. caccae and A. muciniphila In Vivo and In Vitro, Related to Figure 3
Figures are based on RNA-Seq measurements of B. caccae (A) and A. muciniphila (B) responses in vitro (minimal medium with simple sugars or MOG) and in vivo
(constant feeding or daily alternation of FR and FF diets). In vivo samples are from the entire cecal community at the end of Experiment 1. Gene transcripts that
were increased > 5-fold relative to the corresponding simple sugar references are included for each bacterium. Venn diagrams show overlap and differences of
the transcripts between various groups. Numbers indicate the total differentially regulated gene count for a given sector, while numbers in parentheses denote
numbers of carbohydrate-degrading enzymes (glycoside hydrolase, polysaccharide lyase, or carbohydrate esterase families counted toward this number;
sulfatases and carbohydrate binding module, CBM, families were not counted). Note that A. muciniphila shows less regulatory versatility, as manifested by most of its
upregulated enzymes being confined to the core (dark pink) sector containing all of the in vivo samples. This suggests that MOG only triggers a small percentage
of this species’ O-glycan degrading responses in vitro, although 8 enzymes were also triggered in vitro. The corresponding histograms display frequencies of
related enzyme families (shown with matching colors to their respective Venn sectors). Possible mucin-related degradative functions are given above each
family-specific histogram bar. For in vitro samples, n = 2 for each MOG and simple sugar grown condition; for in vivo samples, n = 3 mice/group (Experiment 1).
(C) Schematic mucin O-glycan structures, from among 102 that can be found on human and murine Muc2, with the sites at which various enzymes noted in (A)
and (B) would be expected, or are known, to cleave. See also Tables S4 and S5 for in vivo and in vitro transcript data.
[Figure S5 image not reproduced. Panel A: histology images of rectum from mice fed the fiber-rich (FR) diet and the fiber-free (FF) diet. Panel B: weight change (%) versus days (starting with first gavage) for the FR diet (with synthetic microbiota), FF diet, and prebiotic diet groups; mice were gavaged with synthetic microbiota (SM) for 3 consecutive days (days 1, 2, and 3). Panel C: top 15 diseases and functions, FR diet versus FF diet, plotted as −log10(p value): tissue morphology; cell death and survival; hematological system development and function; cardiovascular system development and function; organismal development; cardiovascular disease; developmental disorder; organismal injury and abnormalities; connective tissue development and function; renal and urological system development and function; cancer; cell-to-cell signaling and interaction; organ morphology; inflammatory response; organismal survival.]
Figure S5. Histology Images, Body Weights, and Additional Cecal Tissue Transcriptional Responses of Gnotobiotic Mice Fed Fiber-Rich (FR)
and Fiber-Free (FF) Diets, Related to Figure 4
(A) Depictive histology images (Hematoxylin and Eosin of colonic thin sections) showing no overt signs of inflammation between the two dietary regimens
(Experiment 1), in the absence of C. rodentium. Scale bars, 500 μm.
(B) Weight change in mice over time. Values are shown as averages ± SEM; n = 4 for FR and FF groups; and n = 3 for Pre group (Experiment 1). ns, not significant;
One-way ANOVA with Tukey’s test.
(C) Top 15 altered diseases and functions between two dietary regimens detected by Ingenuity Pathway Analysis of microarray data (cecal tissue mRNA). n = 4 for
the FR diet group and n = 3 for the FF diet group (Experiments 2A and 3).
[Figure S6 image not reproduced. Panel A: stream plots of fecal bacteria (% of synthetic microbiota, SM) over days 6 to 66 for the fiber-rich (FR) diet (n = 7 mice) and fiber-free (FF) diet (n = 7 mice) groups, showing A. muciniphila, C. aerofaciens, D. piger, E. coli HS, F. prausnitzii, R. intestinalis, M. formatexigens, C. symbiosum, E. rectale, B. intestinihominis, B. caccae, B. uniformis, B. ovatus, B. thetaiotaomicron, and C. rodentium (Cr). Experimental setup: in each group, 2 mice were sacrificed for mucus layer measurements on day 51 (Experiment 2A; measurements included in Figure 4C) and the remaining mice were infected with Cr on day 56 (n = 5 mice; Experiment 2B). Panel B: histology images for FR diet, SM+Cr; FR diet, Cr only; FF diet, SM+Cr; and FF diet, Cr only (all at 10 dpi).]
Figure S6. Microbial Community Structure Pre- and Post-Citrobacter rodentium Infection and Severity of Colitis Post-C. rodentium
Infection, Related to Figure 5
(A) Stream plots illustrating fecal microbial community dynamics over time. Stream plots are based on Illumina sequencing of the V4 region of 16S rRNA genes
(Experiment 2A,B); see Figure 1B for timeline. See Table S2 for % relative abundance of each species in individual mice. Experimental setup for the gnotobiotic
experiments 2A and 2B is also shown.
(B) Histological images illustrating the similar severity of C. rodentium-associated hyperplasia in SM-colonized mice from different feeding groups or germfree
animals only exposed to pathogen. The images are Hematoxylin and Eosin (H and E) stained sections of unflushed cecal tissue all at 10 dpi (Experiment 2B). Scale
bars, 500 μm; higher power inset bars, 50 μm.
[Figure S7 image not reproduced. Panel A: stained colonic sections for the SM + Cr, FR diet; SM + Cr, FF diet; GF + Cr, FR diet; and GF + Cr, FF diet groups. Panel B: thickness of the rectal mucus layer (μm) for the FR-SM+Cr, FF-SM+Cr, FR-Cr only, and FF-Cr only groups (mice: 5, 1, 5, and 4; measurements: 152, 37, 136, and 170). Panel C: flushed colons (4 dpi) from the FR diet (Cr only) and FF diet (Cr only) groups, with radiance scale (p/sec/cm2/sr) × 10^7.]
Figure S7. Thickness of the Rectal Mucus Layer Post-Citrobacter rodentium Infection and Bioluminescence Images of Flushed Colons
Showing Colonization Intensity of Luciferase-Expressing C. rodentium, Related to Figures 5 and 6
(A) Periodic acid-Schiff (PAS)-Alcian Blue (AB) stained colonic thin sections showing the mucus layer (shown with opposing arrows with shafts) in recta of different
groups of mice at 10 dpi (Experiment 2B). Scale bar, 50 μm.
(B) Mucus layer measurements in the recta of mice from PAS-AB stained thin sections (exemplified in A). Asterisk indicates that the FF-SM group had only one
mouse, whose rectal mucus layer could be measured, because the other mice from this group were severely affected with colitis. Data are shown as average ±
SEM; statistically significant differences are shown with different letters (p < 0.01); One-way ANOVA with Tukey’s test.
(C) Flushed colons showing intensity of adherent, luciferase-expressing C. rodentium in germ free (GF) mice pre-fed the FR and FF diets and infected with the
pathogen (that is mice without the synthetic microbiota).
Article
β-Glucan Reverses the Epigenetic State of LPS-Induced Immunological Tolerance
Boris Novakovic, Ehsan Habibi,
Shuang-Yin Wang, ..., Joost H.A. Martens,
Colin Logie, Hendrik G. Stunnenberg
Correspondence
h.stunnenberg@ncmls.ru.nl
In Brief
As part of the International Human
Epigenome Consortium (IHEC), this study
reveals that β-glucan reverses the state of
epigenetic immune tolerance that
develops after exposure to LPS and
restores the ability of human
macrophages to produce cytokines that
are critical for anti-pathogen responses.
Explore the Cell Press IHEC webportal at
http://www.cell.com/consortium/IHEC.

Highlights
• Epigenetic and transcriptional characterization of human macrophage tolerance
• LPS-exposed monocytes fail to induce macrophage-specific downstream pathways
• Monocyte-induced tolerance of macrophages can be reversed by β-glucan at the epigenetic level
• In-vivo-tolerized monocytes can be reverted to a responsive phenotype ex vivo by β-glucan

Accession Numbers
GSE85246, GSE85243, GSE85245, GSE87218, EGAD00001002693
Novakovic et al., 2016, Cell 167, 1354–1368

Article
β-Glucan Reverses the Epigenetic State of LPS-Induced Immunological Tolerance
Boris Novakovic,1,7 Ehsan Habibi,1,7 Shuang-Yin Wang,1,7 Rob J.W. Arts,2 Robab Davar,1 Wout Megchelenbrink,1
Bowon Kim,1 Tatyana Kuznetsova,1 Matthijs Kox,3 Jelle Zwaag,3 Filomena Matarese,1 Simon J. van Heeringen,4
Eva M. Janssen-Megens,1 Nilofar Sharifi,1 Cheng Wang,1 Farid Keramati,1 Vivien Schoonenberg,1 Paul Flicek,5
Laura Clarke,5 Peter Pickkers,3 Simon Heath,6 Ivo Gut,6 Mihai G. Netea,2 Joost H.A. Martens,1 Colin Logie,1
and Hendrik G. Stunnenberg1,8,*
1Department of Molecular Biology, Faculty of Science, Radboud University, 6525 GA Nijmegen, the Netherlands
2Department of Internal Medicine, Radboud University Medical Center, Radboud Center for Infectious Diseases (RCI), 6525 GA Nijmegen, the Netherlands
3Department of Intensive Care Medicine, Radboud University Medical Center, Radboud Center for Infectious Diseases (RCI), 6500 HB Nijmegen, the Netherlands
4Department of Molecular Developmental Biology, Faculty of Science, Radboud University, 6525 GA Nijmegen, the Netherlands
5European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
6Centro Nacional de Análisis Genómico (CNAG), Parc Científic de Barcelona, 08028 Barcelona, Spain
7Co-first author
8Lead Contact
*Correspondence: h.stunnenberg@ncmls.ru.nl
SUMMARY

Innate immune memory is the phenomenon whereby innate immune cells such as monocytes or macrophages undergo functional reprogramming after exposure to microbial components such as lipopolysaccharide (LPS). We apply an integrated epigenomic approach to characterize the molecular events involved in LPS-induced tolerance in a time-dependent manner. Mechanistically, LPS-treated monocytes fail to accumulate active histone marks at promoters and enhancers of genes in the lipid metabolism and phagocytic pathways. Transcriptional inactivity in response to a second LPS exposure in tolerized macrophages is accompanied by failure to deposit active histone marks at promoters of tolerized genes. In contrast, β-glucan partially reverses the LPS-induced tolerance in vitro. Importantly, ex vivo β-glucan treatment of monocytes from volunteers with experimental endotoxemia re-instates their capacity for cytokine production. Tolerance is reversed at the level of distal element histone modification and transcriptional reactivation of otherwise unresponsive genes.

INTRODUCTION

Accumulating evidence suggests that monocytes can be reprogrammed by exposure to microbe-associated molecular patterns (MAMPs) during their time in the circulation (Quintin et al., 2014). In this model, immune tolerance in myeloid cells, be they monocytes in the circulation or macrophages in the tissues (lipopolysaccharide macrophages [LPS-Mφs]), represents one extreme in the spectrum of innate immune memory and can be induced by high bacterial burden in vivo or lipopolysaccharide (LPS) exposure in vitro (Netea et al., 2016). On the other hand, trained immunity can be induced by exposure to certain vaccines, microbial components, or metabolites, and is a state characterized by increased pro-inflammatory response to secondary unrelated infections (Netea et al., 2016). We recently showed that tolerance (induced by LPS) and trained immunity (induced by Candida albicans β-glucan [BG]) are both associated with specific epigenomic states (Cheng et al., 2014; Saeed et al., 2014). Most notably, the identity of these macrophage subtypes was specified by differences in primed and active distal element repertoires (Saeed et al., 2014).
Monocytes and macrophages play an important role in the pathophysiology of sepsis and inflammation, along with other innate and adaptive immune cells (Biswas and Lopez-Collazo, 2009). Transcriptome analysis of tolerant monocytes from sepsis patients (Shalova et al., 2015) and a mouse sepsis model (Foster et al., 2007) reveals that the tolerized phenotype cannot be explained purely through failure of specific signaling pathways induced by pattern recognition receptors to activate downstream genes. This implicates a role for local chromatin architecture and specific transcriptional regulators in controlling the expression of tolerized genes (Glass and Natoli, 2016). Further, studies in human cancers have revealed commonalities between inflammation and cancer associated tolerance, including the role for IDO1 in both (Bessede et al., 2014). Accordingly, several anti-cancer drugs, such as bromodomain and extraterminal domain family (BET) inhibitors and a topoisomerase inhibitor, have proven efficacious in blocking inflammation-associated death in mice (Nicodeme et al., 2010; Rialdi et al., 2016). The specific epigenetic and transcriptional remodeling induced by the initial LPS exposure and the extent to which it specifies tolerance to future LPS exposure are unknown.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Here, as part of the BLUEPRINT epigenome consortium (http://www.blueprint-epigenome.eu), we report the time-resolved, comprehensive epigenomes of human monocyte-to-macrophage differentiation and induction of tolerance with LPS and training with BG. Our epigenomic analysis revealed that tolerance and trained immunity involve opposing regulation of common pathways during early exposure to MAMPs, leading to distinct epigenomic states in the two macrophage subtypes. We therefore hypothesized that BG may be capable of reversing LPS-induced tolerance. We show that ex vivo BG exposure can reinstate a responsive phenotype in both monocytes tolerized by ex vivo LPS exposure and monocytes tolerized by in vivo experimental endotoxemia in healthy volunteers. This reversal of tolerance involves epigenomic reprogramming of macrophages.

RESULTS

Distinct Temporal Epigenetic Remodeling in Response to Microbial Components
Two innate immune memory states can be induced in culture through an initial exposure of primary human monocytes to either LPS or BG for 24 hr, followed by removal of stimulus and differentiation to macrophages for an additional 5 days (Figures 1A and S1; Quintin et al., 2012; Saeed et al., 2014). The three subtypes of macrophages generated in this study were untreated naive macrophages (naive-Mφs), LPS-exposed tolerized macrophages (LPS-Mφs), and BG-exposed trained macrophages (BG-Mφs). To gain insight into the mechanisms and order of events that ultimately lead to these three subtypes, we generated epigenomic data at several time points during this process (two donors; summarized in Table S1 and Figure S1; GEO: GSE85246). Depending on the modification, 2%–31% of marked regions showed dynamics during differentiation or LPS or BG exposure, with H3K27ac at promoters and enhancers being the most dynamic mark in number and range (Figure S1; Table S1). Interestingly, epigenetic changes were observable as early as 1 hr in response to LPS and 4 hr to BG (Figure 1B). The overall H3K27ac pattern at dynamic promoters and enhancers indicates that the most pronounced changes are associated with differentiation (principal component 1 [PC1]), with BG- and RPMI-treated monocytes partially establishing macrophage-specific active regions already by day 1 (Figure 1B). Conversely, LPS treatment results in establishment of pro-inflammatory associated active elements (PC2) and stunted differentiation, followed by partial ‘‘catch up’’ establishment of differentiation marks following removal of stimulus (Figures 1B and S1). Contrary to this catch up of H3K27ac marked enhancers, the H3K4me1 marked enhancer repertoire of LPS-Mφs is significantly different from those of naive-Mφs and BG-Mφs (Figure 1B). Repressive marks, H3K9me3 and H3K27me3, showed no dynamics during the first 24 hr, indicating little role in the early, priming phase of innate immune memory (Figure S1C; Table S1).
In total, 17,500 enhancers with dynamic H3K27ac were identified (Figure S1B). The two largest clusters show gain (n = 4,028) or loss (n = 6,462) of H3K27ac during differentiation in all three macrophage subtypes (Figures 1C and S1B). The closest genes associated with differentiation gain or loss clusters are associated with leukocyte differentiation, activation, metabolism, and phagocytosis (Table S2). Upon monocyte exposure to LPS, H3K27ac induction precedes a temporally delayed H3K4me1 (Figures 1C and S1B). The closest genes associated with these enhancers are involved in cytokine response and nuclear factor κB (NF-κB) signaling, among other well-known LPS-response pathways (Table S2). The ‘‘BG up/LPS down’’ enhancer cluster shows accelerated H3K27ac deposition in BG-exposed monocytes and little to no H3K27ac accumulation in LPS-exposed monocytes relative to naive-Mφs (Figure 1C). This cluster is composed of >3,200 enhancers and shows concordant increase in H3K4me1 to day 6 (Figure S1B). Chromatin segmentation analysis using EpiCSeg (Mammana and Chung, 2015) revealed that these regions gain H3K4me1 at the expense of repressive H3K27me3 markings in naive-Mφs and BG-Mφs (Figure S1D). Conversely, LPS-Mφs maintain a chromatin state more similar to monocytes, primarily low H3K4me1 with the presence of H3K27me3 (Figure S1D). The closest genes to these enhancers are involved in lipid biosynthesis and lysosome and leukocyte differentiation (Table S2), indicating that BG exposure leads to the accumulation of membrane components necessary for phagocytosis and cytokine release, whereas LPS exposure prevents their activation (Figure 1C).

Transcriptome Changes Modulated by LPS and β-Glucan
RNA sequencing (RNA-seq) was performed on the same time points as epigenetic marks (n = 2 donors; Figure 1A). General kinetics similar to those unveiled for epigenetic remodeling were observable, with monocytes clustering after a short exposure to BG and LPS (Figure 1D). Over the time course, the major changes in gene expression patterns were associated with differentiation (PC1, 55.8% of the variance) and LPS exposure (PC2, 10.8% of the variance), which is most pronounced at 4 hr and day 1 (Figure 1D). Over 5,700 protein-coding genes showed dynamic expression (fold change [FC] > 2, adjusted p value [padj] < 0.05) in our model between either treatments or time points (Table S3; Figures 1E and S2A). LPS-induced genes

[Figure 1 image not reproduced. Panel A: innate immunity memory model with time points d0, 1 hr, 4 hr, 24 hr (washout), and d6, yielding naive Mφ (no treatment), LPS-Mφ, and BG-Mφ. Panel B: PCA plots of distal H3K27ac dynamics and distal H3K4me1 dynamics, with axis labels as printed: PC1 (55.9%), PC2 (13.4%); PC1 (69.3%), PC2 (15%). Panel C: H3K27ac signal (logFC) at d0, 1 hr, 4 hr, d1, and d6 for the BG up/LPS down, LPS up, differentiation gain, and differentiation loss clusters in naive, LPS, and BG cells. Panel D: RNA-seq overall expression PCA, PC1 (55.8%) versus PC2 (10.8%), by time and treatment. Panel E: heatmap of RNA level (scale −2 to 2) for monocytes (Mo) and naive, LPS, and BG cells at 1 hr, 4 hr, d1, and d6, with gene groups (i) differentiation gain, (ii) differentiation loss, LPS, and effect of BG or LPS. Panel F: Gene Ontology, with values as printed in the figure: (i) differentiation gain: oxidative reduction, 80 genes, padj < 1^-16; metabolism, 36 genes, padj < 1^-9; lysosome, 26 genes, padj < 1^-6; (ii) differentiation loss: immune response, 60 genes, padj < 1^-16; cytokine interaction, 30 genes, padj < 1^-7; chemokine signaling, 21 genes, padj < 1^-5.]
Figure 1. Epigenomic and Transcriptomic Remodeling of Monocytes Induced by Exposure to LPS or BG
(A) Experimental setup for epigenomic interrogation of monocyte-to-macrophage differentiation and induction of tolerance (with LPS) or trained immunity (with BG).
(B) PCA plots of H3K27ac and H3K4me1 dynamic enhancers (monocytes, red circle; naive, circle; LPS, triangle; BG, square; 1 hr, blue; 4 hr, black; day 1, green;
and day 6, brown). Dynamic H3K27ac patterns show a clear deviation from the differentiation pathway (PC1) in LPS-treated cells. On the other hand, BG-treated
cells at day 1 are well on their way toward a full macrophage epigenetic profile.
(C) A total of 17,500 H3K27ac dynamic gene-distal regions were identified and can be clearly separated into four clusters: BG up/LPS down, LPS up,
differentiation gain, and differentiation loss. Solid lines are median log-FC relative to day 0, and shaded areas represent the 25th and 75th percentiles. Naive cells are shown
as a green line, LPS as a red line, and BG as a purple line. H3K4me1 at these regions can be seen in Figure S1B; LPS induces early H3K27ac accumulation,
followed by long-term H3K4me1 marking, while BG induces concurrent accumulation of H3K27ac and H3K4me1.
(D) PCA plots showing the relationships among all samples based on dynamic gene expression. PC1 explains most of the variation and is associated with
differentiation. PC2 is LPS related, with LPS 4 hr and LPS day 1 samples separating from the corresponding naive and BG samples.
(E) Heatmap of differentiation associated genes, as well as those induced by LPS or BG exposure. The general trend in expression is that BG exposed cells start to
express differentiation associated genes faster (at day 1) than naive cells, while LPS exposed cells lag behind.
(F) Top pathways associated with differentiation and showing opposing directions in response to BG and LPS.
See also Figures S1, S2, and S3 and Tables S1, S2, S3, and S4.
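The dynamic-expression criterion quoted in the text (fold change > 2, adjusted p value < 0.05) can be illustrated with a minimal sketch. The function names, the choice of Benjamini-Hochberg adjustment, and the treatment of fold change as symmetric (up or down) are assumptions made for illustration; the text does not state which adjustment method or software produced padj.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in input order (assumed method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def dynamic_genes(genes, expr_a, expr_b, pvalues, min_fc=2.0, max_padj=0.05):
    """Select genes with more than min_fc-fold change (either direction) and padj < max_padj.

    expr_a and expr_b are positive expression values in the two conditions compared.
    """
    padj = benjamini_hochberg(pvalues)
    selected = []
    for gene, a, b, q in zip(genes, expr_a, expr_b, padj):
        log2_fc = math.log2(b / a)
        if abs(log2_fc) > math.log2(min_fc) and q < max_padj:
            selected.append(gene)
    return selected
```

With toy values, `dynamic_genes(["g1", "g2", "g3"], [10, 10, 10], [40, 12, 2], [0.001, 0.5, 0.01])` keeps `g1` and `g3`, because `g2` changes only 1.2-fold.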
1356 Cell 167, 1354–1368, November 17, 2016

[Figure 2 graphic. (A) Scatterplots of relative importance (RF) against relative importance (PLS), both on a 0 to 100 scale, for TF motifs at enhancers and promoters in the four clusters (BG up/LPS down, LPS up, differentiation gain, and differentiation loss). (B) Motif abundance over background at H3K27ac dynamic enhancers and H3K27ac dynamic promoters for the LPS up and BG up/LPS down clusters. (C) Network linking BG, dectin-1, EGR2, and MITF to lysosome and lipid biosynthesis genes, with the effect on expression by LPS and BG.]
Figure 2. Motif Enrichment at Epigenetically Dynamic Promoters and Enhancers and Associated Transcription Factor Networks
Motif enrichment analysis was performed on ATAC-sequencing (nucleosome-free) peaks that overlap H3K27ac dynamic enhancers and H3K27ac promoters.
(A) Random forest (RF) and partial least-squares (PLS) classifiers were trained using the TF motifs found by GIMME to determine features (TF motifs) based on
their ability to separate the 4 H3K27ac clusters shown in Figure 1C. Both classifiers produce a feature importance score (between 0 and 100), which is a measure
of how ‘‘characteristic’’ the presence or absence of the TF motif is for the considered cluster. Green dots represent positive features (motif over-represented in
cluster), and red dots represent negative features (motif under-represented in cluster). The EGR2 motif was the strongest positive feature for the BG up/LPS down

were involved in immune response, whereas LPS-delayed genes were generally differentiation associated (Figure 1E; Table S4).
The major ontologies of BG-induced genes (662 genes at day 1) were lipid biosynthesis, metabolism, and the lysosome
pathway (Figure 1E; Table S4). Intersection between exposure-dependent gene expression and promoter acetylation patterns
showed a strong overlap between H3K27ac and gene expression temporal profiles (Figures S2B and S2C).

LPS-Specific DNA (De)methylation Signatures
Recent studies have revealed extensive DNA methylation remodeling during B cell (Kulis et al., 2015) and osteoclast
differentiation (de la Rica et al., 2013; Nishikawa et al., 2015). Considering that our ex vivo differentiation model occurs in the
absence of cell division, we were interested to see the extent to which (de)methylation plays a role during
monocyte-to-macrophage differentiation and innate immune memory. Unlike the comprehensive histone modification remodeling,
consistent DNA methylation change (at least 30% change and four or more significant differentially methylated CpGs per
differentially methylated region [DMR]) was limited to a few hundred genomic regions (Figure S3). The vast majority of DMRs
showed loss of methylation during monocyte-to-macrophage differentiation irrespective of MAMP exposure (Figure S3B),
consistent with recent findings in macrophages and dendritic cells (Vento-Tormo et al., 2016). We did not observe a role for
DNA methylation in ‘‘training’’ the macrophages for future transcriptional response to infection. More than 90% of DMRs
occurred at distal elements marked by H3K4me1, and only 6% occurred at promoters (Figure S3C). Cumulatively, our data
indicate that LPS-specific DNA methylation changes occur and, due to the more stable nature of this mark, may represent a
useful biomarker for LPS-induced macrophage tolerance (Figures S3D and S3E).

LPS- and β-Glucan-Specific Transcriptional Networks
Motif analysis was used to gain insight into which pathways and transcription factors (TFs) regulate the epigenetic changes
associated with differentiation and LPS or BG exposure. Four clusters of enhancers and promoters were designated based
on H3K27ac dynamics over time: ‘‘BG up/LPS down,’’ ‘‘LPS up,’’ ‘‘differentiation gain,’’ and ‘‘differentiation loss’’
(Figures 1C and S1B). Two classifiers (random forest [RF] and partial least-squares [PLS]) were trained using the TF motifs
found by GIMME (van Heeringen and Veenstra, 2011). Both score features (TF motifs) on their ability to separate the clusters,
yielding a feature importance score (between 0 and 100) that measures how characteristic the presence or absence of the
TF motif is for the considered cluster. Both classifiers were trained with the caret R-package using 10-fold
cross-validation, repeated five times. We define positive (green dots) and negative predictors (red dots) as TF motifs that are
more or less abundant in the considered cluster compared to the other clusters, respectively.
Enhancers that show differentiation gain in H3K27ac were enriched for the SPI1 (PU.1) motif, while LPS-induced active
enhancers were enriched for the NF-κB motif (Figures 2A and 2B). The top positive predictor motif for the BG up/LPS down
cluster was EGR2, with a score of 100, followed by ARNT (Figure 2A). EGR2 is downstream of dectin-1 (Goodridge et al., 2007)
and shows prominent, transient induction in BG-exposed monocytes (Figure S4A). Enhancers with EGR2 motifs are mainly
associated with genes involved in lipid metabolism and biosynthesis and lysosome function (Figures 1E and 1F). The early
activation of these pathways in BG may account for the higher expression of LAMP1, the major component of the mature
lysosome, in BG-Mfs (Figure S4A).
Interestingly, LPS-exposed monocytes do not transiently activate EGR2 (Figure S4A). The discordant effect of BG and LPS on
EGR2 expression, the differential H3K27ac deposition at associated enhancers, and expression of downstream lipid metabolism
genes suggest that this pathway plays a role in inducing trained immunity as opposed to tolerance. In order to further confirm
the relationship between EGR2 and downstream lipid pathways, the DNA-binding motif of EGR2 was scanned at the promoters of
known transcription factors, as well as lipid metabolism and lysosome genes that are induced in BG-Mfs compared to
monocytes (Figure S4B). The EGR2 motif was found at the promoter of several highly expressed TFs, including MITF, which is a
positive identifier for the differentiation gain promoter cluster (Figure 2A) and is also not activated in LPS-exposed
monocytes (Figure S4A). Cumulatively, EGR2, MITF, and downstream TF motifs were found at the promoters of 79% of induced
lipid metabolism and lysosome genes (Figure 2B). This analysis suggests that BG/dectin-1-induced EGR2 activation leads to
higher expression of downstream TFs (e.g., MITF) and the establishment of promoters and enhancers that drive the expression
of lysosomal and lipid metabolism genes (Figure 2C). Given the importance of lipid pathways in macrophage function, the
opposing effect of LPS and BG on these genes suggests that this pathway may play a critical role in the low cytokine release
in LPS-Mfs and elevated release in BG-Mfs.

Transcriptional Response of Tolerized Macrophages to LPS Re-exposure
Previous analysis in an ex vivo mouse model showed that tolerant LPS-Mfs are impaired in their ability to produce pro-inflammatory
enhancer cluster, NF-κB for the LPS up cluster, SPI1 (PU.1) for the differentiation gain cluster, and JUNB for the differentiation loss cluster. At the promoter
regions, NF-κB was a positive feature for the LPS up cluster, MITF for the differentiation gain cluster, and CREB1 and JUNB for the differentiation loss cluster.
(B) Motif enrichment is plotted as absolute difference in abundance compared to background (yellow, higher abundance than background; blue, lower
abundance than background) for the top enriched motifs. Consistently identified transcription factor motifs include SPI1 at differentiation associated enhancers,
NF-κB at LPS enhancers, and EGR2 and MITF at BG enhancers. Abundance increase over background supports the level of importance score.
(C) A diagram of the transcription factor network based on EGR2 and MITF motif occurrence at BG-induced lysosome and lipid metabolism genes. Purple arrows
indicate the direction of expression induced by BG exposure, and red arrows indicate the direction of expression induced by LPS exposure. BG exposure
induces transient expression of the genes, while LPS exposure inhibits activation. The full network based on promoter abundance is shown in Figure S4B.
See also Figure S4.
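The quantity plotted in Figure 2B, the difference in motif abundance between a cluster and the background, can be sketched as follows. This is an illustration only: representing each ATAC peak as a set of motif names and the function names themselves are assumptions, and the sketch does not reproduce the GIMME motif scan or the RF/PLS classifiers.

```python
def motif_abundance(peaks, motif):
    """Fraction of peaks that contain at least one occurrence of the motif.

    Each peak is represented as a set of motif names (hypothetical representation).
    """
    if not peaks:
        raise ValueError("peak set is empty")
    return sum(1 for motifs_in_peak in peaks if motif in motifs_in_peak) / len(peaks)

def abundance_over_background(cluster_peaks, background_peaks, motifs):
    """Per-motif difference in abundance: cluster minus background.

    Positive values mean the motif is more abundant in the cluster than in the
    background; negative values mean it is less abundant.
    """
    return {
        motif: motif_abundance(cluster_peaks, motif)
        - motif_abundance(background_peaks, motif)
        for motif in motifs
    }
```

For example, a motif present in three of four cluster peaks and one of four background peaks gives a difference of 0.5.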

[Figure 3 graphic. (A) Experimental scheme: naive monocytes cultured for 24 hr with medium (naive), LPS (tolerized), or BG (trained), then re-exposed to LPS for 4 hr at day 6, with ChIP-seq and RNA-seq readouts. (B) PCA plots (PC1 against PC2) for the transcriptome, H3K27ac at promoters, H3K27ac at enhancers, and H3K4me1 at enhancers, labeled by time (d0, 1 hr, 4 hr, day 1, day 6, re-exposure) and treatment (naive, LPS, BG). (C) Heatmap of expression of LPS-induced genes in naive and tolerized macrophages at day 6 and after LPS re-exposure, grouped as tolerized (G1), partially tolerized (G2), and responsive (G3).]
Figure 3. Macrophage Endotoxin Tolerance Defined at the Transcriptional Level following LPS Re-exposure
(A) The innate immune memory model, including data collection at LPS re-exposure at day 6.
(B) PCA plots of dynamic RNA-seq, H3K27ac at promoters and enhancers, and H3K4me1 peaks, including LPS re-exposure samples. After re-exposure to LPS,
significant enhancer H3K27ac changes occur in LPS-Mfs, indicating that they are capable of activating their enhancers. However, the level of their response is
lower compared to monocytes, naive-Mfs, and BG-Mfs, which can be seen on the second principal component. Unlike RNA and H3K27ac, H3K4me1 does not
show significant changes following LPS re-exposure in any of the three macrophage subtypes.
(C) The total macrophage transcriptional response (750 genes) to LPS was separated into three groups based on the induction of genes in LPS-Mfs, relative to
naive-Mfs and BG-Mfs, revealing a gradient in LPS-Mf response to LPS re-exposure. The groups are (G1) tolerized genes, (G2) partially tolerized genes, and (G3)
responsive genes.
See also Figure S5.
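The three response groups in Figure 3C can be illustrated with a deliberately simplified rule. The study separated genes by polytomous modeling; the sketch below swaps in a plain ratio of induction in LPS-Mfs to induction in naive-Mfs, and the thresholds (0.25 and 0.75) and the function name are hypothetical values chosen only to show the idea of a gradient split into three bins.

```python
def tolerance_group(induction_naive, induction_lps, low=0.25, high=0.75):
    """Assign a gene to G1, G2, or G3 from its log2 induction after LPS re-exposure.

    induction_naive: log2 fold induction in naive macrophages (must be positive).
    induction_lps:   log2 fold induction in LPS-exposed (tolerized) macrophages.
    low, high:       hypothetical cut-offs on the ratio of the two inductions.
    """
    if induction_naive <= 0:
        raise ValueError("gene must be LPS-induced in naive macrophages")
    ratio = induction_lps / induction_naive
    if ratio < low:
        return "G1"  # tolerized: little or no induction in LPS-exposed cells
    if ratio < high:
        return "G2"  # partially tolerized
    return "G3"      # responsive: induction comparable to naive cells
```

A model-based assignment, as used in the study, additionally provides a posterior probability per gene, which a fixed-ratio rule like this one cannot.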
cytokines, but maintain their ability to express other genes, such as those required for tissue repair (Foster et al., 2007).
Given the wide-ranging epigenetic alterations in LPS-Mfs (Figure 1C; Table S1), we sought to investigate the epigenetic basis
for endotoxin tolerance by exposing differentiated naive-Mfs, LPS-Mfs, and BG-Mfs to LPS for 4 hr (LPS re-exposure)
(Figure 3A). The overall transcriptional and histone modification changes induced in macrophages by LPS re-exposure are
shown in Figure 3B, and few differences were observed between naive-Mfs and BG-Mfs. LPS-Mfs show an avid response to LPS
re-exposure both transcriptionally and with H3K27ac deposition at promoters and distal enhancers (observable as a large
shift in PC2; Figure 3B). This indicates that tolerized macrophages can and do respond to LPS at the epigenetic and
transcriptional levels. However, from H3K27ac and H3K4me1 principal-component analysis (PCA), it is clear that the
epigenetic profile of LPS-Mfs is markedly different from that of naive-Mfs and BG-Mfs (observable as an LPS-Mfs lag on PC1;
Figure 3B).
Polytomous modeling was used to separate genes based on their transcriptional response to LPS re-exposure (4 hr) in
macrophages at day 6. In total, 780 genes showed higher expression (FC > 2, posterior probability > 0.3) in naive-Mfs following 4-hr

[Figure 4 graphic. (A) Heatmap of RNA z-scores in naive-Mfs and LPS-Mfs at day 6 and after restimulation. (B) Heatmap of % motif abundance change (scale −15 to 15) for IRF, STAT2, EGR2, KLF6, TP53, ZBTB7B, ZBTB33, E2F3, HIF1A, TFDP2, SP1, NRF1, HINFP, CREB1, ZNF350, and ZFP161. (C) Relative importance (RF) against relative importance (PLS), both on a 0 to 100 scale, for the G1 (tolerized), G2 (partially tolerized), and G3 (responsive) groups. (D) logFC of H3K27ac at dynamic promoters of G1 (n = 95), G2 (n = 106), and G3 (n = 189) genes for RPMI, LPS, and BG at d0, 1 hr, 4 hr, d1, d6, and re-exposure (R).]
Figure 4. Histone Modification Dynamics and Open Chromatin Analysis at Tolerized Gene Promoters
(A) Heatmap showing average expression of 777 LPS-responsive genes in naive-Mfs. Genes are ranked based on their induction in LPS-Mfs, first by tolerance
group (G1, G2, and G3) and then by relative induction compared to naive-Mfs within each group. Response to LPS re-exposure is a gradient in LPS-Mfs, with the
most tolerized genes on the left and the most responsive genes on the right.
(B) Heatmap showing abundance of significant motifs in the promoter regions of the three macrophage LPS-responsive gene groups. The tolerized gene pro-
moters are enriched for several transcriptional repressors, such as EGR2 and TP53, while the partially tolerized gene promoters are enriched for IRF and STAT
motifs.
(C) Random forest (RF) and partial least-squares (PLS) classifier importance scores (between 0 and 100) for each tolerized gene cluster (G1, tolerized; G2,
partially tolerized; and G3, responsive). Green dots represent over-represented motifs, and red dots under-represented motifs. The top features of G1 gene

LPS exposure (Figure 3C; Table S5). Transcriptional responsiveness to LPS re-exposure in LPS-Mfs is a gradient, with genes
showing complete tolerance (unresponsiveness) (cluster G1), a partial response (G2), or a full response comparable to
naive-Mfs (G3) (Figures 3C and S5A). Cytokine genes were the most enriched group and were spread across the LPS re-exposure
response gradient, with CXCL9 (G1) and TNF (G2) showing complete or partial tolerance and IL6 and IL8 showing comparable
responsiveness to naive-Mfs (G3) (Figures S5B and S5C). The normal induction of interleukin 6 (IL-6) mRNA expression and
the absence of response in ELISA assays therefore suggests that tolerance is a complex phenotype that involves both
dampened transcriptional responses to LPS re-exposure and an inability to release some cytokines (Figure S5C). The top
tolerance-specific biological process was ‘‘cytokine production,’’ while the top pathways were ‘‘RIG-I-like signaling’’ and
‘‘p53 signaling’’ (Figure S5D).

Epigenetic Profile of Tolerized Genes
To understand the molecular mechanisms involved in the altered gene induction by LPS re-exposure in LPS-Mfs, we investigated
promoter motif enrichment at overlapping assay for transposase accessible chromatin (ATAC) peaks. Because transcriptional
responsiveness to LPS re-exposure in tolerized macrophages occurs on a gradient (Figure 4A), motif enrichment at promoters
was scanned in a sliding window of 100 promoters throughout the response gradient from most tolerized (G1) to responsive
(G3) genes (Figure 4B). This analysis identified discrete motif signatures in the G1 and G2 tolerized groups. The G1 gene
promoters were enriched for several TF motifs, including EGR2, HIF1A, and p53. The latter TF was also identified as a top
tolerized pathway (Figure S5D). The partially tolerized genes are enriched for IRF and STAT motifs (Figure 4B). Random forest
analysis also indicated that EGR2 was the top identifier for the G1 group, while IRF and STAT motifs are top identifiers for
the G2 group. Interestingly, the G3 group does not contain positive identifiers (Figure 4C). IRF and STAT genes show a
tolerized pattern (Figure S6A), indicating that their unresponsiveness to LPS affects downstream partially tolerized genes.
On the other hand, NFKB1 and RELA showed normal induction in LPS-Mfs (Figure S6B).
Dynamic H3K27ac change during differentiation and LPS or BG exposure was plotted over the promoter regions of G1, G2,
and G3 genes (Figure 4D). Dynamic promoter H3K27ac was observed for roughly half of all genes, with the rest showing
consistent high acetylation during all time-points, including LPS re-exposure (not shown). Tolerized genes (G1) and partially
tolerized genes (G2) showed no or impaired accumulation of H3K27ac, respectively, after LPS re-exposure in LPS-Mfs
compared to naive-Mfs and BG-Mfs (Figure 4D), while responsive genes (G3) were equally acetylated after LPS re-exposure
in all subtypes (Figure 4D). H3K4me3 patterns at these promoters closely matched those of H3K27ac (Figure S5B). This
finding suggests that LPS-Mfs fail to accumulate H3K27ac at tolerized genes either through absence of pro-inflammatory
activators, such as IRF and STATs in the case of G2 genes, or through presence of tolerance-inducing TFs, such as HIF1A in
the case of G1 genes.

β-Glucan Exposure Can Reverse Tolerance in Both In Vitro and In Vivo LPS-Exposed Monocytes
As indicated before, BG and LPS have an opposing effect on EGR2 and MITF expression (Figure S4A), accumulation of
H3K27ac at target enhancers and promoters (Figure 1C), and expression of genes involved in macrophage function, such as
lipid metabolism and lysosome and cytokine production (Figure 1E). These findings point to a potential for reversal of
LPS-induced tolerance by using BG to stimulate the dectin-1 pathway. To test this hypothesis, monocytes were exposed to
LPS for 24 hr and then to BG for 24 hr, followed by a rest period before LPS re-exposure (Figure 5A). We refer to these
macrophages as ‘‘rescue-Mfs’’. Additionally, we used the clinically relevant small-molecule histone mimic bromodomain and
extra-terminal domain family (BET) inhibitor IBET151 in a co-treatment with LPS (‘‘preventative’’) or following LPS exposure
(‘‘reversal’’) setting (Figure 5A). ELISAs showed that BG exposure was able to reverse LPS-induced tolerance and reinstate
normal levels of cytokine release in rescue-Mfs (Figure 5B). On the other hand, IBET151 was only effective in preventing
tolerance when used to block the LPS-induced response, but it did not reverse tolerance when administered after LPS
(Figure 5C). This is in line with the finding that IBET151 is effective in blocking inflammation-associated death in mice
(Nicodeme et al., 2010) but suggests that IBET151 is not an effective treatment in monocytes that have already experienced
an inflammatory response. Therefore, BG represents a possible treatment option for restoring proper macrophage cytokine
release during the post-inflammation tolerance phase.
The suitability of the in vitro tolerance model to mimic the in vivo situation is a major question. Chiefly, does LPS exposure
in vivo induce the same transcriptional responses in monocytes, and can in vivo LPS-induced tolerance be reversed by BG? To
answer these questions, we used an in vivo experimental human endotoxemia model (Draisma et al., 2009) (Figure 5D). In this
model, healthy volunteers are injected with 2 ng/kg US Standard Reference Endotoxin Escherichia coli O:113 LPS
(Pharmaceutical Development Section of the National Institutes of Health, Bethesda, MD, USA), which leads to a sepsis-like
state (reviewed in Bahador and Cross, 2007). Study protocols were approved by the local ethics committee of the Radboud
University Nijmegen Medical Centre. The volunteers experience transient fever and cold chills as well as pro- and
anti-inflammatory cytokine signatures. The in vivo LPS-exposed monocytes show elevated
promoters are E2F3, EGR2, and ZBTB7B motifs. The top features for G2 gene promoters are IRF and STAT, while G3 promoters do not have over-represented
features but are depleted of EGR2, E2F3, and ZNF350.
(D) Median H3K27ac at dynamic promoters of G1, G2, and G3 group genes; shaded areas span the 25th to 75th percentiles. This shows that LPS-Mfs do not
accumulate H3K27ac at tolerized genes but do so at the promoters of responsive genes. See also Figure S6 for H3K4me3.
See also Figure S6.
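The sliding-window scan behind Figure 4B (motif abundance in windows of 100 promoters along the tolerance gradient) can be sketched as below. This is a minimal illustration under stated assumptions: promoters are represented as sets of motif names, already ranked from most tolerized to most responsive, and the function reports the percentage of promoters carrying the motif in each window rather than the change relative to a reference that the figure displays.

```python
def sliding_motif_abundance(ranked_promoters, motif, window=100, step=1):
    """Percent of promoters carrying the motif in each window along a ranked list.

    ranked_promoters: list of sets of motif names, ordered along the response
                      gradient (hypothetical representation).
    window:           number of promoters per window (100 in the text).
    step:             number of promoters the window advances each time (assumed 1).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    n = len(ranked_promoters)
    if n < window:
        raise ValueError("fewer promoters than the window size")
    has_motif = [1 if motif in promoter else 0 for promoter in ranked_promoters]
    abundances = []
    for start in range(0, n - window + 1, step):
        count = sum(has_motif[start:start + window])
        abundances.append(100.0 * count / window)
    return abundances
```

A motif concentrated among the most tolerized promoters yields values that fall as the window moves toward the responsive end of the ranking.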

[Figure 5 graphic. (A) In vitro scheme: naive monocytes cultured as naive-Mf, LPS-Mf, rescue-Mf (LPS + BG), LPS-co-IBET, or LPS + IBET, followed by LPS re-exposure at day 6 and measurement of cytokine release. (B) IL6 release recovery by BG (pg/mL) in naive-Mf, LPS-Mf, and rescue-Mf, shown per donor. (C) IL6 release recovery by IBET or BG (pg/mL). (D) In vivo endotoxemia model: monocytes collected before (naive) and 4 hr after LPS (tolerized), then cultured ex vivo in media or B-glucan before LPS re-exposure and measurement of cytokine release. (E) IL6 release relative to naive (logFC, n = 12) and for individual donors (pg/mL). (F) TNF release relative to naive (logFC, n = 12) and for individual donors (pg/mL).]
Figure 5. BG Can Reverse Both In Vitro and In Vivo LPS-Induced Tolerance and Reinstate Proper Cytokine Production in Macrophages
(A) The in vitro monocyte tolerance reversal model, with BG added therapeutically after 24 hr of LPS exposure (rescue-Mfs). The histone-mimic and inflammation
blocker IBET was used in a preventative (co-culture with LPS for 24 hr LPS-co-IBET-Mfs) and a therapeutic (added after 24 hr of LPS exposure [LPS + IBET-Mfs])
manner. Following several days of rest, macrophages were re-exposed to LPS and cytokine release measured after 24 hr.
(B) BG re-instates IL-6 release in tolerized macrophages. Data from six donors are shown for naive-Mfs, LPS-Mfs, and rescue-Mfs.

mRNA expression of key cytokines at 4 hr (not shown) and fail to release cytokines in response to a second ex vivo LPS exposure.
In this regard, they behave much like in vitro LPS-tolerized monocytes. Monocytes were isolated from peripheral blood
taken before and after LPS administration and then exposed ex vivo to culture medium either alone or with BG. Cytokine
release was measured following LPS re-exposure in culture (Figure 5D). Ex vivo BG exposure increased the release of tumor
necrosis factor (TNF) and IL-6 in tolerized monocytes at LPS re-exposure (Figures 5B and 5C). This finding indicates that BG
can restore cytokine production of in-vivo-tolerized monocytes. Cumulatively, this confirms that the mechanisms involved in the
establishment of tolerance by LPS in vivo and in vitro are similar, validating the use of the in vitro model to study reversal of
tolerance by BG. More importantly, it suggests that the BG effect on monocyte tolerance may be transferred to the clinic in
the future.

β-Glucan Recovers the Transcriptional Response to LPS at Tolerized Genes
Next, we assessed whether BG reverses tolerance at the transcriptional level. In this experimental setup, monocytes were
exposed to LPS followed by BG and then left to rest 24 hr or 4 days before LPS re-exposure for 4 hr (Figure 6A). Additionally,
monocytes were treated with a combination of LPS and IBET151 (preventative) and LPS followed by IBET151 (reversal). BG was
able to recover the induction of 60% of tolerized genes at day 6 (Figure 6A), including several pro-inflammatory TFs (Figure 6B).
Similar effects were observed when IBET151 was used in a preventative model, indicating that BG reversal of LPS-induced
tolerance leads to an outcome similar to that produced by blocking LPS-induced tolerance altogether. Overall, BG reversal led
to a higher median expression of tolerized genes compared to both preventative and reversal use of IBET151 (Figure 6A).

Epigenomic Analysis of β-Glucan Recovery of Tolerized Macrophages
BG exposure following LPS exposure recovers the expression of genes involved in lipid biosynthesis, phagocytosis, and cytokine
transport (Figures S7A and S7B). Recovery of expression was observed as early as day 3 and maintained at a higher level in
macrophages at day 6 (Figure S7B). Interestingly, addition of BG to naive and tolerized monocytes at day 1 elicited the
expression of EGR2 and MITF within 4 hr, with a lower induction in tolerized monocytes (Figure S7C). These findings indicate
that BG-induced receptor pathways remain at least partially inducible after the LPS-induced cytokine response and that these
pathways can partially recover the naive macrophage epigenetic and transcriptional programs. Analysis of dynamic H3K27ac
promoters and enhancers in naive-Mfs, LPS-Mfs, rescue-Mfs, and LPS-co-IBET151-Mfs revealed that BG exposure restores H3K27ac
deposition at regions where H3K27ac increase was not obtained following LPS exposure (Figure 7). Interestingly, while IBET151
blocks 75% of the transcriptional response to LPS in monocytes at 4 hr (data not shown), LPS-co-IBET151-Mfs look more like
LPS-Mfs at day 6, indicating no effect of IBET151 on the overall epigenomic profile of LPS-Mfs (Figure 7A, blue square). The
effect of BG exposure on H3K27ac deposition in LPS-Mfs was observable at both promoters and distal enhancers of genes
involved in metabolism and lipid biosynthesis (Figures 7C and 7D).

DISCUSSION

Perturbation of normal monocyte-to-macrophage differentiation by exogenous signals, such as high bacterial burden in sepsis,
can lead to a changed chromatin state and an associated deviation from steady-state function (Amit et al., 2016). This
phenomenon is known as innate immune memory, with the best-characterized outcomes being endotoxin tolerance or trained innate
immunity (Netea et al., 2016). Trained immunity can have beneficial effects through priming of macrophages for stronger
responses to subsequent infection and can be induced by a variety of MAMPs, such as C. albicans (Quintin et al., 2012), Bacille
Calmette-Guérin (BCG) vaccine (Kleinnijenhuis et al., 2012), and BG (Saeed et al., 2014). Conversely, exposure to high levels
of LPS can induce a tolerized macrophage phenotype, which is a major cause of sepsis-associated mortality (SepsisReport,
2012). Previously, we showed that tolerized macrophages (LPS-Mfs) and trained macrophages (BG-Mfs) have distinct
epigenetic (Saeed et al., 2014) and metabolic states (Cheng et al., 2014). Mouse studies have shown that such distal element
markings are important for appropriate responses to infection (Ghisletti et al., 2010; Ostuni et al., 2013) and identity of
tissue-resident macrophages (Amit et al., 2016; Lavin et al., 2014). Nevertheless, until now, the epigenetic basis for endotoxin
tolerance in humans has not been explored.
In the current study, our aim was to unveil the early epigenetic and transcriptional events following monocyte exposure to LPS
or BG and how the resulting epigenetic landscapes determine the function of tolerized and trained macrophages. LPS- and
BG-induced active histone dynamics were observed as early as 1 hr and 4 hr after exposure, respectively (Figures 1B and
1C). Generally, H3K27ac accumulation was accompanied by H3K4me1 accumulation, most obviously at BG-induced enhancers
(Figure S1B). Contrary to this general pattern, LPS-induced active enhancers, associated with an inflammation
response, showed discordance in time with accumulation of H3K4me1 (Figure S1B), which remained at higher levels in
LPS-Mfs compared to naive-Mfs and BG-Mfs. This persistence of H3K4me1 in LPS-Mfs contributes to the overall epigenetic
signature of this macrophage subtype and may account for some of the tolerized phenotype (Figure 1B). More pointedly,
we discovered a set of more than 3,000 de novo macrophage
(C) Preventative use of IBET blocks the first LPS response in monocytes, resulting in differentiation of macrophages that can release cytokines at the second LPS
exposure. Therapeutic use of IBET does not re-instate cytokine release in macrophages.
(D) Experimental human endotoxemia model, with ex vivo BG administration. Monocytes were isolated from 12 healthy volunteers before (naive) and 4 hr after
LPS injection (tolerized). Naive or tolerized monocytes were exposed to BG for 24 hr, followed by culture media, or culture media alone. After 3 days ex vivo,
monocytes were re-exposed to LPS, and cytokines were measured 24 hr later.
(E and F) BG recovered IL-6 release in 9 out of 12 tolerized monocytes (E) and TNF release in 8 out of 12 monocytes (F). Data are presented as mean ± SD.
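The per-donor summaries in (E) and (F) can be illustrated with a short sketch. The legend does not define how a donor is scored as "recovered", so the rule used here (release from BG-treated tolerized monocytes exceeds release from untreated tolerized monocytes of the same donor) is an assumption, as are the function names.

```python
import math

def log2_fc_to_naive(release, naive_release):
    """log2 fold change of cytokine release (pg/mL) relative to the naive sample."""
    return math.log2(release / naive_release)

def count_recovered(tolerized, tolerized_bg):
    """Count donors whose BG-treated tolerized monocytes release more cytokine
    than their untreated tolerized monocytes (assumed definition of recovery).

    Both lists hold one release value (pg/mL) per donor, in the same donor order.
    """
    if len(tolerized) != len(tolerized_bg):
        raise ValueError("both conditions need one value per donor")
    return sum(1 for t, t_bg in zip(tolerized, tolerized_bg) if t_bg > t)
```

With hypothetical values for four donors, `count_recovered([100, 200, 150, 400], [300, 180, 500, 900])` counts three recovered donors, and a release of 250 pg/mL against a naive value of 1000 pg/mL corresponds to a log2 fold change of −2.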

[Figure 6 graphic. (A) Median expression (log2 RPKM) of all tolerized genes, above a heatmap of individual genes scaled by similarity to the naive response and ordered from (G1) most tolerized to (G2) partially tolerized. (B) Recovery of expression by BG at LPS re-exposure for specific TFs and downstream targets: logFC of STAT2, STAT5A, IRF1, IRF8, ITIH4, and CXCL10 in naive-Mf, LPS-Mf, and rescue-Mf.]
Figure 6. Reversal of Tolerance by BG at the Transcriptional Level

(A) Heatmap of the transcriptional response of naive-Mfs, LPS-Mfs, and rescue-Mfs (BG reversed LPS-Mfs) to LPS re-exposure at day 6. The scale represents
relative expression between LPS exposed naive-Mfs (1) and LPS-Mfs (0). Rescue-Mfs exposed to LPS show the most similar profile to naive-Mfs. On the top of
the heatmap is median expression (log2 RPKM) of tolerized genes at day 6 and LPS re-exposure in naive-Mfs (black), LPS-Mfs (red), LPS-co-IBET-Mfs (blue),
LPS + IBET-Mfs (light blue), and rescue-Mfs (purple).
(B) BG reverses the tolerization of key LPS-induced transcription factors, such as STAT2, STAT5A, IRF1, and IRF8. Log2 fold change increase in mRNA
expression is shown.
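The heatmap scale in (A), where the LPS-exposed LPS-Mf response maps to 0 and the LPS-exposed naive-Mf response maps to 1, corresponds to a linear rescaling per gene. The sketch below is one way to write it; the function name is hypothetical, and whether values outside the 0 to 1 range were clipped for display is not stated in the legend.

```python
def similarity_to_naive(value, naive, lps):
    """Rescale a gene's expression so that the LPS-Mf response maps to 0 and the
    naive-Mf response maps to 1.

    value: expression in the sample of interest (for example, rescue-Mfs).
    naive: expression of the same gene in LPS-exposed naive-Mfs.
    lps:   expression of the same gene in LPS-exposed LPS-Mfs.
    """
    if naive == lps:
        raise ValueError("naive and LPS-Mf responses are identical; scale is undefined")
    return (value - lps) / (naive - lps)
```

A rescue-Mf value halfway between the two references, such as `similarity_to_naive(5.0, 6.0, 4.0)`, gives 0.5.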
established distal enhancers that were modulated in the opposite direction by BG or LPS exposure (Figure 1D). Deposition of
H3K27ac and H3K4me1 at these regions was accelerated by BG exposure and delayed or completely blocked by LPS exposure.
Accordingly, expression of genes near these elements was induced by BG, peaking at 24 hr post-exposure, while they
remained lowly expressed in LPS-exposed monocytes (Figure 1F). These genes were involved in lipid metabolism and biosynthesis,
phagocytosis, and lysosome maturation (Figures 1G and S2) and have clear TF motif signatures for EGR2, MITF, and ARNT
(Figure 2A). Interestingly, EGR2, a TF downstream of the BG receptor dectin-1, showed clear transient upregulation by BG but
remained inactive in LPS-exposed monocytes, suggesting a possible role in modulating these pathways (Figure S4). TFs
and pathways linking lipid biosynthesis and inflammation have been described (Spann et al., 2012). Further, macrophage

[Figure 7 graphic. (A) PCA plot of H3K27ac (PC1 explains 77.7%, PC2 explains 8.5%) for naive, LPS-Mf, rescue-Mf, and LPS-coIBET-Mf samples at d0 and day 6. (B) Heatmap of H3K27ac signal recovery by BG exposure of LPS-Mf: signal intensity in naive-Mf, LPS-Mf, and rescue-Mf from individual donors. (C) H3K27ac tracks at ATP9B and (D) at LPL, marking LPS-repressed H3K27ac regions, in two replicates each of naive-Mf, LPS-Mf, and rescue-Mf.]
Figure 7. Reversal of Tolerance by BG at the Chromatin Level

(A) PCA plot of H3K27ac dynamics among monocytes, naive-Mfs, LPS-Mfs, rescue-Mfs, and LPS-co-IBET-Mfs. BG exposure of tolerized monocytes results in a
H3K27ac profile more similar to naive macrophages, while co-incubation of monocytes with LPS and IBET does not lead to activation of these regions.
(B) Heatmap showing re-establishment of the naive-Mf H3K27ac signal by BG exposure in tolerized macrophages.
(C) PCA plot of H3K4me1 dynamics among monocytes, naive-Mfs, LPS-Mfs, rescue-Mfs, and LPS-co-IBET-Mfs; the effect is similar to
that for H3K27ac, but to a lesser extent. (C) H3K27ac tracks at the ATP9B (glucose transport) gene enhancer and (D) the LPL (lipid
metabolism) gene promoter.
See also Figure S7.
response to infection requires a substantial amount of energy, and shifts in metabolism and energy production are a hallmark
of macrophage polarization to M1 or M2 subtypes (Ghesquière et al., 2014), as well as for establishment of trained immunity
(Cheng et al., 2014).
Reversal of tolerance after the initial inflammation phase has garnered interest because of the limited success of
inflammation-blocking treatments to reduce overall sepsis mortality (Angus and van der Poll, 2013) and because the majority of
sepsis deaths occur due to secondary hospital infection during the tolerized phase (Gilroy and Yona, 2015). Our hypothesis
was that BG can reverse LPS-induced tolerance because it discordantly regulated pathways that LPS also affected. Specifically,
LPS fails to activate key regulators of lipid, lysosome, and metabolism genes, EGR2 and MITF, while BG induces their
expression (Figures 1 and 2). Recently, IFNG was shown to partially recover metabolic function in tolerized monocytes
from sepsis patients, indicating that reversal of tolerance using innate immune “trainers” is a viable therapeutic strategy
(Cheng et al., 2016). We show that BG exposure can indeed reverse the tolerance in macrophages induced by LPS exposure, with
rescue-Mfs showing higher release of cytokines in response to a second LPS stimulus (Figures 5A–5C). This was in contrast to
the inflammation blocker IBET151, which only prevented tolerance when used to block the initial LPS response but could not
reverse it when given to cells after LPS-induced inflammation (Figure 5). In order to further relate our findings to the
in vivo situation, we used an experimental human endotoxemia model to induce tolerance in vivo (Draisma et al., 2009; Kox
et al., 2014). In terms of cytokine production, in-vivo-tolerized monocytes behave similarly to their in-vitro-tolerized
counterparts. The tolerized state of in vivo LPS-exposed monocytes is similar to that of ex-vivo-exposed monocytes and, most
importantly, can also be rescued by ex vivo BG exposure (Figures 5D–5F),

indicating that the mechanisms controlling monocyte tolerance in vivo can also be reverted to a more responsive phenotype.
In order to determine the ability of BG to reverse tolerance at the molecular level, we first characterized the transcriptional
and epigenetic response to a second LPS exposure in tolerized macrophages (Figures 3 and 4). Studies in mouse sepsis models
and human sepsis patients have shown that rather than being inert in response to a second LPS exposure, tolerized macrophages
show a shift in the specific pathways that they activate (Foster et al., 2007; Shalova et al., 2015). In line with these
studies, we show that LPS-Mfs remodel both H3K27ac and gene expression in response to LPS (Figure 3B). However, the
starting point of LPS-Mfs is significantly different from that of naive and BG-Mfs, most clearly for H3K4me1 marked
enhancers, suggesting that while activation is occurring, the available enhancer repertoire of these cells is limited or not
suited. Our analysis identified a gradient in the LPS-Mf response to LPS, with some genes showing a tolerized pattern (no
induction) and others showing a responsive pattern (Figure 3C). The most tolerized gene promoters were enriched for EGR2,
HIF1A, and p53 motifs, among many others. A potential role for HIF1A is in agreement with a recent transcriptional analysis in
monocytes from sepsis patients (Shalova et al., 2015), while the p53 pathway was a top-ranked tolerized pathway identified by
Gene Ontology (GO) analysis (Figure S5D). The strongest enrichment at partially tolerized genes was for the IRF and STAT TF
motifs that show strong tolerized expression patterns themselves, i.e., IRF1, IRF8, STAT2, and STAT5A (Figure S6A). IRF8 and
its downstream target, KLF4, both of which are important regulators of monocyte differentiation (Kurotaki et al., 2013), show a
tolerized profile and enrichment at tolerized gene promoters (Figure 4B). Other, non-TF regulators of tolerance, such as IRAK3,
HIF1A, SOCS3, and IDO1, are all more highly expressed in LPS-Mfs compared to naive-Mfs and BG-Mfs and have previously been
associated with endotoxin-induced tolerance (Bessede et al., 2014; Saeed et al., 2014; Shalova et al., 2015).
Rescue macrophages (BG exposed following LPS exposure) were able to induce 60% of tolerized genes at LPS re-exposure
(Figure 6). This indicates that BG reversal of tolerance at the transcriptional level is not complete (Figure 6A).
Fascinatingly, BG recovered the expression of tolerized genes to a level greater than that observed in macrophages treated
with IBET and LPS together (Figure 6A). This indicates that BG can reinstate a responsive state at a higher level than that
obtained by actually blocking the initial LPS transcriptional response. This important observation suggests that BG-associated
pathways remain intact even after large-scale epigenetic and transcriptional programs are induced by LPS. At the level of
histone modifications, BG recovers H3K27ac at regions that are silent in LPS-Mfs, further supporting the notion that the
molecular mechanisms required for BG-induced chromatin remodeling remain after the initial LPS response (Figures 7 and S7).
In conclusion, the hypothesis-free epigenomic and transcriptomic analysis of monocyte-to-macrophage differentiation and
innate immune memory generated a number of testable hypotheses. Our findings show that the innate immune “training
stimulus” β-glucan can reverse macrophage tolerance ex vivo. This is an important step toward understanding how the tolerized
phenotype can be reversed in sepsis patients and ultimately provides the framework for future therapeutic developments in
innate immune diseases.

STAR+METHODS

Detailed methods are provided in the online version of this paper and include the following:

● KEY RESOURCES TABLE
● CONTACT FOR REAGENTS AND RESOURCE SHARING
● EXPERIMENTAL MODEL AND SUBJECT DETAILS
   ○ Monocytes from Healthy Donors
   ○ In Vitro Monocyte-to-Macrophage Differentiation and Induction of Innate Immune Memory
   ○ Experimental Human Endotoxemia Model
● METHOD DETAILS
   ○ Cytokine Assays
   ○ RNA Extraction and cDNA Synthesis
   ○ Chromatin Immunoprecipitation
   ○ Library Preparation for Sequencing
   ○ Assay for Transposase Accessible Chromatin
   ○ Whole Genome Bisulfite Sequencing
   ○ RNA-Seq Data Analysis
   ○ ChIP-Seq Data Analysis
   ○ ATAC-Seq Data Analysis
   ○ DNA-Binding Motif Scanning
   ○ Gene Ontology Analysis
● QUANTIFICATION AND STATISTICAL ANALYSIS
● DATA AND SOFTWARE AVAILABILITY
   ○ Data Resources

SUPPLEMENTAL INFORMATION

Supplemental Information includes seven figures and five tables and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2016.09.034.
A video abstract is available at http://dx.doi.org/10.1016/j.cell.2016.09.034#mmc6.

AUTHOR CONTRIBUTIONS

Conceptualization, H.G.S., C.L., J.H.A.M., and M.G.N.; Methodology, B.N., R.D., B.K., R.J.W.A., T.K., F.M., C.W., E.M.J.-M.,
J.Z., M.K., P.P., and N.S.; Investigation, B.N., E.H., S.-Y.W., and W.M.; Writing – Original Draft, B.N., H.G.S., and C.L.;
Writing – Review & Editing, E.H., S.-Y.W., M.G.N., J.H.A.M., R.J.W.A., and T.K.; Funding Acquisition, H.G.S.; Resources, P.F.,
L.C., S.J.v.H., P.P., and I.G.; Supervision, H.G.S., C.L., J.H.A.M., and M.G.N.

ACKNOWLEDGMENTS

The research leading to the results described in this paper has received funding from the European Union’s Seventh Framework
Programme (FP7/2007-2013) under grant agreement 282510-BLUEPRINT. B.N. is supported by an NHMRC (Australia) CJ Martin Early
Career Fellowship. M.G.N. was supported by an ERC Consolidator Grant (310372). The authors would like to thank GSK Epinova and
Cellzome for providing the IBET reagent and Prof. David L. Williams (University of Tennessee) for β1,3(D)glucan (β-glucan).

Revised: August 5, 2016
Accepted: September 20, 2016
Published: November 17, 2016

REFERENCES

Amit, I., Winter, D.R., and Jung, S. (2016). The role of the local environment and epigenetics in shaping macrophage identity and their effect on tissue homeostasis. Nat. Immunol. 17, 18–25.
Angus, D.C., and van der Poll, T. (2013). Severe sepsis and septic shock. N Engl J Med. 369, 840–851.
Bahador, M., and Cross, A.S. (2007). From therapy to experimental model: a hundred years of endotoxin administration to human subjects. J. Endotoxin Res. 13, 251–279.
Barnett, D.W., Garrison, E.K., Quinlan, A.R., Strömberg, M.P., and Marth, G.T. (2011). BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics 27, 1691–1692.
Bessede, A., Gargaro, M., Pallotta, M.T., Matino, D., Servillo, G., Brunacci, C., Bicciato, S., Mazza, E.M., Macchiarulo, A., Vacca, C., et al. (2014). Aryl hydrocarbon receptor control of a disease tolerance defence pathway. Nature 511, 184–190.
Biswas, S.K., and Lopez-Collazo, E. (2009). Endotoxin tolerance: new mechanisms, molecules and clinical significance. Trends Immunol. 30, 475–487.
Cheng, S.C., Quintin, J., Cramer, R.A., Shepardson, K.M., Saeed, S., Kumar, V., Giamarellos-Bourboulis, E.J., Martens, J.H., Rao, N.A., Aghajanirefah, A., et al. (2014). mTOR- and HIF-1α-mediated aerobic glycolysis as metabolic basis for trained immunity. Science 345, 1250684.
Cheng, S.C., Scicluna, B.P., Arts, R.J., Gresnigt, M.S., Lachmandas, E., Giamarellos-Bourboulis, E.J., Kox, M., Manjeri, G.R., Wagenaars, J.A., Cremer, O.L., et al. (2016). Broad defects in the energy metabolism of leukocytes underlie immunoparalysis in sepsis. Nat. Immunol. 17, 406–413.
de la Rica, L., Rodríguez-Ubreva, J., García, M., Islam, A.B., Urquiza, J.M., Hernando, H., Christensen, J., Helin, K., Gómez-Vaquero, C., and Ballestar, E. (2013). PU.1 target genes undergo Tet2-coupled demethylation and DNMT3b-mediated methylation in monocyte-to-osteoclast differentiation. Genome Biol. 14, R99.
Draisma, A., Pickkers, P., Bouw, M.P., and van der Hoeven, J.G. (2009). Development of endotoxin tolerance in humans in vivo. Crit. Care Med. 37, 1261–1267.
Foster, S.L., Hargreaves, D.C., and Medzhitov, R. (2007). Gene-specific control of inflammation by TLR-induced chromatin modifications. Nature 447, 972–978.
Ghesquière, B., Wong, B.W., Kuchnio, A., and Carmeliet, P. (2014). Metabolism of stromal and immune cells in health and disease. Nature 511, 167–176.
Ghisletti, S., Barozzi, I., Mietton, F., Polletti, S., De Santa, F., Venturini, E., Gregory, L., Lonie, L., Chew, A., Wei, C.L., et al. (2010). Identification and characterization of enhancers controlling the inflammatory gene expression program in macrophages. Immunity 32, 317–328.
Gilroy, D.W., and Yona, S. (2015). HIF1α allows monocytes to take a breather during sepsis. Immunity 42, 397–399.
Glass, C.K., and Natoli, G. (2016). Molecular control of activation and priming in macrophages. Nat. Immunol. 17, 26–33.
Goodridge, H.S., Simmons, R.M., and Underhill, D.M. (2007). Dectin-1 stimulation by Candida albicans yeast or zymosan triggers NFAT activation in macrophages and dendritic cells. J. Immunol. 178, 3107–3115.
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589.
Huang da, W., Sherman, B.T., and Lempicki, R.A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57.
Kleinnijenhuis, J., Quintin, J., Preijers, F., Joosten, L.A., Ifrim, D.C., Saeed, S., Jacobs, C., van Loenhout, J., de Jong, D., Stunnenberg, H.G., et al. (2012). Bacille Calmette-Guerin induces NOD2-dependent nonspecific protection from reinfection via epigenetic reprogramming of monocytes. Proc. Natl. Acad. Sci. USA 109, 17537–17542.
Kox, M., van Eijk, L.T., Zwaag, J., van den Wildenberg, J., Sweep, F.C., van der Hoeven, J.G., and Pickkers, P. (2014). Voluntary activation of the sympathetic nervous system and attenuation of the innate immune response in humans. Proc. Natl. Acad. Sci. USA 111, 7379–7384.
Kuhn, M. (2008). Building predictive models in R using the caret package. J Stat Softw. 28, 1–26.
Kulis, M., Merkel, A., Heath, S., Queirós, A.C., Schuyler, R.P., Castellano, G., Beekman, R., Raineri, E., Esteve, A., Clot, G., et al. (2015). Whole-genome fingerprint of the DNA methylome during human B cell differentiation. Nat. Genet. 47, 746–756.
Kurotaki, D., Osato, N., Nishiyama, A., Yamamoto, M., Ban, T., Sato, H., Nakabayashi, J., Umehara, M., Miyake, N., Matsumoto, N., et al. (2013). Essential role of the IRF8-KLF4 transcription factor cascade in murine monocyte differentiation. Blood 121, 1839–1849.
Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25.
Lavin, Y., Winter, D., Blecher-Gonen, R., David, E., Keren-Shaul, H., Merad, M., Jung, S., and Amit, I. (2014). Tissue-resident macrophage enhancer landscapes are shaped by the local microenvironment. Cell 159, 1312–1326.
Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.
Mammana, A., and Chung, H.R. (2015). Chromatin segmentation based on a probabilistic model for read counts explains a large portion of the epigenome. Genome Biol. 16, 151.
McLean, C.Y., Bristor, D., Hiller, M., Clarke, S.L., Schaar, B.T., Lowe, C.B., Wenger, A.M., and Bejerano, G. (2010). GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501.
Netea, M.G., Joosten, L.A., Latz, E., Mills, K.H., Natoli, G., Stunnenberg, H.G., O’Neill, L.A., and Xavier, R.J. (2016). Trained immunity: A program of innate immune memory in health and disease. Science 352, aaf1098.
Nicodeme, E., Jeffrey, K.L., Schaefer, U., Beinke, S., Dewell, S., Chung, C.W., Chandwani, R., Marazzi, I., Wilson, P., Coste, H., et al. (2010). Suppression of inflammation by a synthetic histone mimic. Nature 468, 1119–1123.
Nishikawa, K., Iwamoto, Y., Kobayashi, Y., Katsuoka, F., Kawaguchi, S., Tsujita, T., Nakamura, T., Kato, S., Yamamoto, M., Takayanagi, H., and Ishii, M. (2015). DNA methyltransferase 3a regulates osteoclast differentiation by coupling to an S-adenosylmethionine-producing metabolic pathway. Nat. Med. 21, 281–287.
Ostuni, R., Piccolo, V., Barozzi, I., Polletti, S., Termanini, A., Bonifacio, S., Curina, A., Prosperini, E., Ghisletti, S., and Natoli, G. (2013). Latent enhancers activated by stimulation in differentiated cells. Cell 152, 157–171.
Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842.
Quintin, J., Saeed, S., Martens, J.H., Giamarellos-Bourboulis, E.J., Ifrim, D.C., Logie, C., Jacobs, L., Jansen, T., Kullberg, B.J., Wijmenga, C., et al. (2012). Candida albicans infection affords protection against reinfection via functional reprogramming of monocytes. Cell Host Microbe 12, 223–232.
Quintin, J., Cheng, S.C., van der Meer, J.W., and Netea, M.G. (2014). Innate immune memory: towards a better understanding of host defense mechanisms. Curr. Opin. Immunol. 29, 1–7.
Rialdi, A., Campisi, L., Zhao, N., Lagda, A.C., Pietzsch, C., Ho, J.S., Martinez-Gil, L., Fenouil, R., Chen, X., Edwards, M., et al. (2016). Topoisomerase 1 inhibition suppresses inflammatory genes and protects from death by inflammation. Science 352, aad7993.
Saeed, S., Quintin, J., Kerstens, H.H., Rao, N.A., Aghajanirefah, A., Matarese, F., Cheng, S.C., Ratter, J., Berentsen, K., van der Ent, M.A., et al. (2014). Epigenetic programming of monocyte-to-macrophage differentiation and trained innate immunity. Science 345, 1251086.

SepsisReport (2012). Focus on sepsis. Nat. Med. 18, 997.
Shalova, I.N., Lim, J.Y., Chittezhath, M., Zinkernagel, A.S., Beasley, F., Hernández-Jiménez, E., Toledano, V., Cubillos-Zapata, C., Rapisarda, A., Chen, J., et al. (2015). Human monocytes undergo functional re-programming during sepsis mediated by hypoxia-inducible factor-1α. Immunity 42, 484–498.
Spann, N.J., Garmire, L.X., McDonald, J.G., Myers, D.S., Milne, S.B., Shibata, N., Reichart, D., Fox, J.N., Shaked, I., Heudobler, D., et al. (2012). Regulated accumulation of desmosterol integrates macrophage lipid metabolism and inflammatory responses. Cell 151, 138–152.
van Heeringen, S.J., and Veenstra, G.J. (2011). GimmeMotifs: a de novo motif prediction pipeline for ChIP-sequencing experiments. Bioinformatics 27, 270–271.
Vento-Tormo, R., Company, C., Rodríguez-Ubreva, J., de la Rica, L., Urquiza, J.M., Javierre, B.M., Sabarinathan, R., Luque, A., Esteller, M., Aran, J.M., et al. (2016). IL-4 orchestrates STAT6-mediated DNA demethylation leading to dendritic cell differentiation. Genome Biol. 17, 4.
Weirauch, M.T., Yang, A., Albu, M., Cote, A.G., Montenegro-Montero, A., Drewe, P., Najafabadi, H.S., Lambert, S.A., Mann, I., Cook, K., et al. (2014). Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443.
Wu, T.D., and Nacu, S. (2010). Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881.
Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., Myers, R.M., Brown, M., Li, W., and Liu, X.S. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137.

STAR+METHODS
KEY RESOURCES TABLE

Antibodies
Rabbit polyclonal anti-H3K27ac Diagenode pAb-196-050
Rabbit polyclonal anti-H3K4me1 Diagenode pAb-037-050
IBET-151 GSK Epinova and Cellzome GSK1210151A
Human Serum Sigma-Aldrich H4522-100ML
RPMI 1640 Medium, GlutaMAX Thermo Fisher Scientific 61870036
Gentamycin Thermo Fisher Scientific 15750060
L-glutamine Thermo Fisher Scientific 25030081
Sodium Pyruvate Thermo Fisher Scientific 11360070
Percoll Sigma-Aldrich P1644-1L
Ficoll Paque Plus Sigma-Aldrich GE17-1440-03
Lipopolysaccharides from Escherichia coli 055:B5 Sigma-Aldrich L2880-10MG
β1,3(D)glucan (β-glucan) (Saeed et al., 2014) N/A
2-Mercaptoethanol Thermo Fisher Scientific 21985023
Actinomycin D Thermo Fisher Scientific 11805017
IGEPAL CA-630 Sigma-Aldrich I8896-50ML
KAPA library preparation kit Kapa Biosystems KK8400
riboZero gold rRNA removal kit Illumina MRZG12324
Nextera DNA Library Prep Kit Illumina FC-121-1031
TruSeq SBS KIT v3 - HS (50 cycles) Illumina FC-401-3002
NextSeq 500/550 High Output v2 kit (75 cycles) Illumina FC-404-2005
NEBNext High-Fidelity 2× PCR Master Mix New England Biolabs M0541
iQ SYBR Green Supermix Bio-Rad 1708880
100× SYBR Green I Nucleic Acid Gel Stain Thermo Fisher Scientific S7563
Human IL-6 ELISA Sanquin M9316
Human TNFα ELISA R&D DY210
SPRIselect reagent kit Beckman Coulter B23218
E-Gel SizeSelect Agarose Gels, 2% Thermo Fisher Scientific G661002
CD3 MicroBeads, human Miltenyi Biotec 130-050-101
dNTP set 100 mM Life Technologies 10297-018
dUTP 100 mM Promega U119A
Glycogen (20 mg/ml) Life Technologies 10814-010
Random Hexamer primers Sigma-Aldrich 11034731001
Second Strand Buffer Life Technologies 10812-014
Superscript III Reverse Transcriptase Life Technologies 18080-044
DNA polymerase I, E. coli New England Biolabs M0209S
USER enzyme New England Biolabs M5505L

E. coli Ligase New England Biolabs M0205L
RNasin Plus RNase Inhibitor Promega N2615
Ribonuclease H Life Technologies AM2293
T4 DNA polymerase New England Biolabs M0203L
Sodium Acetate (3M) Life Technologies AM9740
DNase I QIAGEN 79254
Qubit RNA HS assay kit Life Technologies Q32852
Ribozero Gold Kit Illumina MRZG12324
RNeasy Mini Kit QIAGEN 74106
Deposited Data
Raw data files for RNA sequencing This paper GEO: GSE85243
Raw data files for ChIP sequencing This paper GEO: GSE85245
Raw data files for ATAC sequencing This paper GEO: GSE87218
Raw data files for WGBS sequencing This paper EGA: EGAD00001002693
Human: primary monocytes from healthy volunteers Sanquin Blood Bank N/A
NEXTflex DNA Barcodes - 48 Bioo Scientific 514104
Primer EGR2: F 5′ TTGACCAGATGAACGGAGTG 3′ R 5′ GTTGAAGCTGGGGAAGTGAC 3′ This paper N/A
Primer MITF: F 5′ AACTCATGCGTGAGCAGATG 3′ R 5′ TACTTGGTGGGGTTTTCGAG 3′ This paper N/A
Primer CSF1: F 5′ CAGATGGAGACCTCGTGCC 3′ R 5′ GCATTGGGGGTGTTATCTCTG 3′ This paper N/A
Primer LAMP1: F 5′ TGAACAAGACAGGCCTTCCC 3′ R 5′ TGTGCAGCTCCAGAGTCACC 3′ This paper N/A
Bedtools (Quinlan and Hall, 2010) http://bedtools.readthedocs.io/en/latest/
Bamtools (Barnett et al., 2011) https://github.com/pezmaster31/bamtools
Samtools (Li and Durbin, 2009) http://samtools.sourceforge.net/
GSNAP (Wu and Nacu, 2010) http://research-pub.gene.com/gmap/
GimmeMotifs (van Heeringen and Veenstra, 2011) https://github.com/simonvh/gimmemotifs
Caret (Kuhn, 2008) http://cran.r-project.org/web/packages/caret/index.html
HOMER (Heinz et al., 2010) http://homer.salk.edu/homer/motif/
DAVID (Huang da et al., 2009) https://david.ncifcrf.gov/
bwa (Li and Durbin, 2009) http://bio-bwa.sourceforge.net/
bowtie (Langmead et al., 2009) http://bowtie-bio.sourceforge.net/index.shtml
MACS2 (Zhang et al., 2008) https://github.com/taoliu/MACS
CONTACT FOR REAGENTS AND RESOURCE SHARING
Further information and requests for reagents may be directed to, and will be fulfilled by the corresponding author Hendrik G. Stun-
nenberg (h.stunnenberg@ncmls.ru.nl).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Monocytes from Healthy Donors

All primary cells were isolated from healthy volunteers who gave written informed consent (Sanquin Blood Bank, Nijmegen, the
Netherlands). Volunteers are of Northern European descent. Peripheral blood mononuclear cells (PBMCs) were isolated by
centrifugation in Ficoll-Paque (GE Healthcare), followed by removal of T cells using an additional Percoll gradient. Monocytes
were purified from PBMCs using negative selection in an LD column magnet separator, with beads for CD3+ (T cells), CD19+
(B cells) and CD56+ (NK cells) positive cells (Miltenyi Biotec), yielding > 95% pure monocytes. Successful isolation of
monocytes was confirmed with FACS, as previously described (Saeed et al., 2014).
In Vitro Monocyte-to-Macrophage Differentiation and Induction of Innate Immune Memory

Monocytes were differentiated into resting macrophages by ex vivo culture in RPMI 1640 medium (Sigma Aldrich) with 10% human
serum. Medium was supplemented with 10 µg/mL gentamycin, 10 mM L-glutamine and 10 mM pyruvate (Life Technologies).
Tolerization was induced by treatment of monocytes with 10–100 ng/mL LPS for 24 hr, followed by washout and five days of
culture in RPMI + 10% human serum, while trained innate immunity was induced by treatment with 5 µg/mL BG for 24 hr, followed
by washout and five days in culture. Establishment of tolerance or training in the resulting macrophages at day 6 was
determined by TNF and IL6 release at 24 hr following LPS stimulation using ELISA. For ChIP-seq, 10 × 10^6 monocytes were
seeded in 10 cm dishes; for RNA-seq and ATAC-seq, 1.5 × 10^6 monocytes were seeded in 6-well plates. IBET151 (GSK) was diluted
to a 50 mM stock using DMSO. Following dosage titration, 5 µM was determined as the appropriate final concentration to prevent
tolerization without causing cell death. IBET-151 was added to monocytes at the same time as LPS for 24 hr, followed by
washout and five days of culture in RPMI + 10% human serum to allow macrophage differentiation.
Experimental Human Endotoxemia Model

In vivo endotoxin tolerance was examined in 12 healthy nonsmoking volunteers who participated in an experimental human endo-
toxemia study. The study is registered at Clinicaltrials.gov (NCT02602977) and study protocols were approved by the local ethics
committee of the Radboud University Nijmegen Medical Centre (NL53584.091.15/CMO 2015-1796). Written informed consent
was obtained from all study participants. Subjects were screened before the start of the experiment and had a normal physical ex-
amination, electrocardiography, and routine laboratory values. Throughout the study period, subjects were not allowed to take any
drugs, including acetaminophen, and were asked to refrain from alcohol and caffeine 24 hr and from food 12 hr before the start of the
endotoxemia experiment. All study procedures were conducted in accordance with the declaration of Helsinki including current re-
visions and Good Clinical Practice guidelines. Experimental human endotoxemia was conducted as described previously (Kox et al.,
2014). Briefly, all subjects received an intravenous bolus injection of LPS (lipopolysaccharide derived from Escherichia coli O:113,
Clinical Center Reference Endotoxin, National Institutes of Health (NIH), Bethesda, MD) at a dose of 2 ng/kg. Blood was obtained
before LPS administration and 4 hr afterward, and monocytes were isolated. Monocytes were exposed to culture medium or BG
ex vivo, and cytokine production in the supernatants was measured following ex vivo LPS (10 ng/ml) exposure. Cytokine
production was determined by ELISA following the manufacturers' protocols (IL-6, Sanquin; TNFα, R&D Systems).
METHOD DETAILS
Cytokine Assays
TNFα and IL-6 were measured using ELISA according to the manufacturers' protocols (IL-6: Sanquin; TNFα: R&D). For cytokine
production assays, differences between groups were analyzed using the Wilcoxon signed-rank test. The level of significance
was defined as a p value < 0.05.
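The paired comparison described above can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' analysis code: the function name `wilcoxon_signed_rank` and the cytokine values are hypothetical, and the exact enumeration of sign patterns shown here is practical only for a small number of paired donors.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. All 2**n sign patterns
    are enumerated, so this is intended for small n only.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Test statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)
    observed_dev = abs(w_plus - total / 2)

    # Null distribution: every rank is positive or negative with probability 1/2.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical TNF release (pg/ml) for six donors, paired by donor.
tolerized = [120, 95, 140, 110, 130, 105]
rescued = [310, 280, 400, 290, 350, 330]
w_plus, p_value = wilcoxon_signed_rank(tolerized, rescued)
print(w_plus, p_value)  # p_value is 2/64 = 0.03125, below the 0.05 threshold
```

With this exact two-sided test, the smallest attainable p value for n pairs is 2/2^n, so at least six pairs without zero differences are needed before a result can fall below 0.05.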
RNA Extraction and cDNA Synthesis

Total RNA was extracted from cells using the QIAGEN RNeasy RNA extraction kit (QIAGEN, Netherlands), using on-column DNaseI
treatment. Ribosomal RNA was removed using the riboZero rRNA removal kit (Illumina). RNA was then fragmented into 200 bp
fragments by incubation for 7.5 min at 95°C in fragmentation buffer (200 mM Tris-acetate, 500 mM potassium acetate, 150 mM
magnesium acetate [pH 8.2]). First strand cDNA synthesis was performed using SuperScript III (Life Technologies), followed by
synthesis of the second cDNA strand. Library preparation was performed using the KAPA hyperprep kit (KAPA Biosystems). Quality
of cDNA and the efficiency of ribosomal RNA removal were confirmed by quantitative RT-PCR using the iQ SYBR Supermix, with
primers for GAPDH, 18S and 28S rRNA.
Chromatin Immunoprecipitation
Purified cells were fixed with 1% formaldehyde (Sigma) at a concentration of approximately 10 million cells/ml. Fixed cell
preparations were sonicated using a Diagenode Bioruptor UCD-300 for 3 × 10 min (30 s on; 30 s off). 67 µl of chromatin
(1 million cells) was incubated with 229 µl dilution buffer, 3 µl protease inhibitor cocktail and 0.5–1 µg of H3K27ac,
H3K4me3, H3K4me1, H3K27me3, H3K9me3 or H3K36me3 antibodies (Diagenode) overnight at 4°C with rotation. Protein A/G magnetic
beads were washed in dilution buffer with 0.15% SDS and 0.1% BSA, added to the chromatin/antibody mix and rotated for 60 min
at 4°C. Beads were washed five times with 400 µl buffer for 5 min at 4°C. After washing, chromatin was eluted using elution
buffer for 20 min. Supernatant was collected, 8 µl 5 M NaCl and 3 µl proteinase K were added, and samples were incubated for
4 hr at 65°C. Finally, samples were purified using the QIAGEN QIAquick MinElute PCR purification kit and eluted in 20 µl EB.
Detailed protocols can be found on the Blueprint website
(http://www.blueprint-epigenome.eu/UserFiles/file/Protocols/Histone_ChIP_May2013.pdf).

Library Preparation for Sequencing
Illumina library preparation was done using the Kapa Hyper Prep Kit. For end repair and A-tailing, double-stranded DNA was
incubated with end repair and A-tailing buffer and enzyme, first for 30 min at 20°C and then for 30 min at 65°C. Subsequently,
adapters were ligated by adding 30 µl ligation buffer, 10 µl Kapa DNA ligase, and 5 µl diluted adaptor in a total volume of
110 µl and incubating for 15 min at 15°C. Post-ligation cleanup was performed using Agencourt AMPure XP reagent and products
were eluted in 20 µl elution buffer. Libraries were amplified by adding 25 µl 2× KAPA HiFi Hotstart ReadyMix and 5 µl 10×
Library Amplification Primer Mix, followed by 10 cycles of PCR. Samples were purified using the QIAquick MinElute PCR
purification kit and 300 bp fragments were selected using E-gel. Correct size selection was confirmed by BioAnalyzer analysis.
Sequencing was performed using Illumina HiSeq 2000 machines and generated 43 bp single-end reads. Samples for RNA-seq were
processed with the same protocol, with a single additional step: after post-ligation cleanup and before library
amplification, samples were incubated with 3 µl USER enzyme for 15 min at 37°C to digest the second cDNA strand.
Assay for Transposase Accessible Chromatin

Monocytes or macrophages (100,000 cells) were scraped in a well of a 6-well plate with cold PBS and then spun down at 800 × g
for 5 min at 4°C. Cells were washed with 50 µl of cold 1× PBS buffer, incubated in 50 µl of cold lysis buffer (10 mM Tris-HCl
[pH 7.4], 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL) and spun down at 800 × g for 10 min at 4°C. The nuclei were immediately
resuspended in the transposition reaction mix (22.5 µl TD buffer, 2.5 µl Tn5 Transposase, 25 µl NF H2O) and incubated for
30 min at 37°C. Following transposition, 100 µl AMPure beads were added to the reaction (sample-to-bead ratio of 1:2), mixed
thoroughly by pipetting, and incubated for 15 min at RT. Samples and beads were washed on the magnetic rack with 80% ethanol,
dried for 5 min, and resuspended in 15 µl EB buffer. DNA was amplified with 10–15 PCR cycles using the mix (15 µl transposed
DNA, 0.3 µl 100× SYBR Green I, 25 µl NEBNext High-Fidelity master mix, 2.5 µl Nextera Primer index N7.. (25 µM), 2.5 µl
Nextera Primer index S5.. (25 µM), 4.7 µl NF H2O). In order to reduce GC and size bias in PCR, the PCR reaction was monitored
using qPCR to stop amplification prior to saturation. Following amplification, samples were purified twice using SPRI beads,
first using negative selection with a sample-to-bead ratio of 1:0.65 and then positive selection with a sample-to-bead ratio
of 1:1.8. After an 80% ethanol wash and drying, the sample was eluted in 20 µl EB buffer and quality checked before
sequencing. A detailed protocol can be found on the Blueprint website
(http://www.blueprint-epigenome.eu/UserFiles/file/Protocols/ATAC_Seq_Protocol.pdf).
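As a bench-side aid, the reaction volumes listed above can be checked and scaled with a few lines of code. The sketch below is illustrative only: it assumes all listed volumes are in microliters, and the helper names `scale_mix` and `bead_volume_ul` as well as the 10% pipetting overage are hypothetical choices that do not come from the protocol.

```python
# Per-reaction volumes (assumed to be microliters) as listed in the protocol text.
TRANSPOSITION_MIX_UL = {
    "TD buffer": 22.5,
    "Tn5 Transposase": 2.5,
    "NF H2O": 25.0,
}
PCR_MIX_UL = {
    "transposed DNA": 15.0,
    "100x SYBR Green I": 0.3,
    "NEBNext High-Fidelity master mix": 25.0,
    "Nextera Primer index N7": 2.5,
    "Nextera Primer index S5": 2.5,
    "NF H2O": 4.7,
}


def scale_mix(per_reaction_ul, n_reactions, overage=0.10):
    """Scale a per-reaction recipe to n reactions plus a fractional overage.

    The default 10% overage is an assumption for pipetting loss, not a
    value taken from the protocol.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


def bead_volume_ul(sample_ul, beads_per_sample):
    """Bead volume for a sample-to-bead ratio of 1:beads_per_sample."""
    return round(sample_ul * beads_per_sample, 2)


# Both recipes add up to a 50 µl reaction.
print(round(sum(TRANSPOSITION_MIX_UL.values()), 2))
print(round(sum(PCR_MIX_UL.values()), 2))

# A 1:2 ratio on the 50 µl transposition reaction matches the 100 µl of beads in the text.
print(bead_volume_ul(50.0, 2.0))

# Master mix for eight transposition reactions with the assumed overage.
print(scale_mix(TRANSPOSITION_MIX_UL, 8))
```

In practice the transposed DNA and the sample-specific index primers would be added per tube rather than scaled into a shared master mix.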
Whole Genome Bisulfite Sequencing

Genomic DNA (1-2 µg) was spiked with unmethylated λ DNA (5 ng of λ DNA per µg of genomic DNA) (Promega). The DNA was sheared
by sonication to 50-500 bp using a Covaris E220, and fragments of 150-300 bp were selected using AMPure XP beads (Agencourt
Bioscience). Genomic DNA libraries were constructed using the Illumina TruSeq Sample Preparation kit (Illumina) following the
Illumina standard protocol: end repair was performed on the DNA fragments, an adenine was added to the 3′ extremities of the
fragments, and Illumina TruSeq adapters were ligated at each extremity. After adaptor ligation, the DNA was treated with sodium bisulfite
using the EpiTect Bisulfite kit (QIAGEN) following the manufacturer’s instructions for formalin-fixed and paraffin-embedded (FFPE)
tissue samples. Two rounds of bisulfite conversion were performed to ensure a high conversion rate. Adaptor-ligated DNA was
enriched through 7 PCR cycles using the PfuTurboCx Hotstart DNA polymerase (Stratagene). Library quality
was monitored using the Agilent 2100 BioAnalyzer (Agilent), and the concentration of viable sequencing fragments (molecules
carrying adaptors at both extremities) was estimated using quantitative PCR with the library quantification kit from KAPA Biosystems.
Paired-end DNA sequencing (2x100 nucleotides) was then performed using the Illumina HiSeq 2000. WGBS data are available upon request
from the BLUEPRINT consortium.
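An unmethylated spike-in of this kind is commonly used to check bisulfite conversion efficiency, because every cytosine in the spike-in should be read as thymine after conversion. The Python sketch below illustrates that calculation only; the function name and the read counts are hypothetical and are not taken from this study.

```python
def conversion_rate(converted_c: int, unconverted_c: int) -> float:
    """Estimate bisulfite conversion efficiency from spike-in reads.

    converted_c:   cytosine positions read as T (converted)
    unconverted_c: cytosine positions still read as C (not converted)
    """
    total = converted_c + unconverted_c
    if total == 0:
        raise ValueError("no cytosine calls on the spike-in")
    return converted_c / total


# Hypothetical counts, for illustration only.
rate = conversion_rate(converted_c=99_500, unconverted_c=500)
print(f"conversion rate: {rate:.3%}")
```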
RNA-Seq Data Analysis

For quality control and visualization, RNA-seq reads were aligned to the hg19 reference genome using GSNAP (Wu and Nacu, 2010)
with non-default parameters -m 1 -N 1 -n 1 -Q -s Ensembl_splice_68. Each RNA-seq sample was subjected to a quality control step,
where, based on read distribution over the annotated genome, libraries that are outliers were identified and discarded from further anal-
ysis. To infer gene expression levels, RNA-seq reads were aligned to the Ensembl v68 human transcriptome using Bowtie. Quantifica-
tion of gene expression was performed using MMSEQ. Differential expression was determined using MMDIFF. A two model compar-
ison was used to identify differentially expressed genes that confer cellular identity Mo/Mf. The null-model is that the mean expression
levels are the same in both cell types, and the alternative model is that the mean expression levels are allowed to differ between the two
cell types. Genes with a larger posterior probability for the second model, an RPKM value greater than 2 in any of Mo or Mf and mini-
mally a 2-fold expression change were considered as differentially expressed. Expression changes related to differentiation of each
treatment were studied using a 52-model comparison, a.k.a. polytomous comparison, under the null-model that assumes the mean
expression levels are the same across each time-point. Expression differences related to the treatments at each time-point were stud-
ied using a 5-model comparison, under the null-model that assumes the mean expression levels are the same across each treatment.
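The three filters used for the Mo/Mf two-model comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function and argument names are hypothetical, the posterior probabilities are taken as given (for example, from MMDIFF output), and the handling of zero expression is an assumption of this sketch.

```python
def is_differentially_expressed(post_null, post_alt, rpkm_mo, rpkm_mf,
                                min_rpkm=2.0, min_fold=2.0):
    """Apply the three filters described for the Mo/Mf comparison."""
    # 1. The alternative model (means differ) must be the more probable one.
    if post_alt <= post_null:
        return False
    # 2. RPKM must exceed the threshold in at least one of the two cell types.
    if max(rpkm_mo, rpkm_mf) <= min_rpkm:
        return False
    # 3. At least a 2-fold change, in either direction.
    low, high = sorted((rpkm_mo, rpkm_mf))
    if low == 0:
        return True  # assumption: expressed versus zero counts as >= 2-fold
    return high / low >= min_fold


print(is_differentially_expressed(0.1, 0.9, rpkm_mo=1.0, rpkm_mf=5.0))
```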
ChIP-Seq Data Analysis

Sequencing reads were aligned to human genome assembly hg19 (NCBI version 37) using bwa. Duplicate reads were removed after
the alignment with the Picard tools. For peak calling, the BAM files were first filtered to remove reads with mapping quality less
than 15, followed by fragment size modeling (https://code.google.com/archive/p/phantompeakqualtools/). MACS2
(https://github.com/taoliu/MACS/) was used to call the peaks. H3K4me1, H3K9me3, and H3K27me3 peaks were called using the broad setting of
MACS2, while H3K27ac and H3K4me3 peaks were called using the default (narrow) setting. For each histone mark dataset, the data
were normalized using the R package DESeq2, and pair-wise comparisons were then performed (fold change 3, adjusted p value <
0.05, and RPKM ≥ 2 in at least one condition) to determine the differentially expressed genes per condition. The results from all
possible pairwise comparisons (within each condition and similar time points across all conditions per mark) were pooled and
merged to define the dynamic set of enriched regions. Promoters were defined as regions within ± 2 kb of the TSS of each
Ensembl gene, and enhancers were defined as enriched H3K27ac/H3K4me1 regions more than 2 kb away from the TSS.
To find different patterns over dynamic promoters or enhancers, we applied a K-means clustering procedure (with the optimal number
of clusters for each dataset) to the dynamic datasets described above.
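The promoter/enhancer split by distance from the TSS can be illustrated with a short Python sketch. It is a simplified illustration rather than the pipeline used here: it handles a single chromosome, assumes a pre-sorted list of TSS coordinates, and treats any overlap between a peak and a TSS ± 2 kb window as a promoter hit (the overlap rule is an assumption). The function name is hypothetical.

```python
from bisect import bisect_left


def classify_peak(peak_start, peak_end, sorted_tss, window=2000):
    """Return 'promoter' if the peak overlaps any TSS +/- window, else 'distal'.

    sorted_tss: ascending TSS positions on the same chromosome as the peak.
    A 'distal' H3K27ac/H3K4me1 peak would be an enhancer candidate.
    """
    # First TSS at or beyond the left edge of the extended peak.
    i = bisect_left(sorted_tss, peak_start - window)
    # It overlaps only if it also lies within the right edge.
    if i < len(sorted_tss) and sorted_tss[i] <= peak_end + window:
        return "promoter"
    return "distal"


tss_positions = [10_000, 50_000]  # hypothetical coordinates
print(classify_peak(11_500, 12_500, tss_positions))
print(classify_peak(20_000, 21_000, tss_positions))
```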
ATAC-Seq Data Analysis

The full ATAC-seq protocol is available at the BLUEPRINT website
(http://www.blueprint-epigenome.eu/UserFiles/file/Protocols/ATAC_Seq_Protocol.pdf). ATAC-seq reads were mapped to the hg19 reference genome using BWA (Li and Durbin, 2009) with default
parameters. Non-uniquely mapped reads and PCR duplicates were removed. MACS2 (Zhang et al., 2008) was used to identify
regions of open chromatin (peaks) with parameters ‘‘--nomodel -p 1e-9.’’ Overlapping peaks from different samples were merged.
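Merging overlapping peaks across samples amounts to taking the union of intervals. The Python sketch below shows one way to do this; it is an illustration, not the tool used in the study. The function name and coordinates are hypothetical, and merging peaks that merely touch is an assumption of this sketch.

```python
def merge_peaks(peaks):
    """Merge overlapping (chrom, start, end) intervals pooled from all samples."""
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(p) for p in merged]


pooled = [
    ("chr1", 100, 200),  # sample 1
    ("chr1", 150, 300),  # sample 2, overlaps the peak above
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
print(merge_peaks(pooled))
```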
DNA-Binding Motif Scanning

All the DNA-binding motifs used in this study are based on the cis-bp database described in (Weirauch et al., 2014). Only motifs with
direct evidence of binding in a vertebrate species were selected. Within each motif family, as annotated by cis-bp, all motifs were
clustered using ‘gimme cluster’ from the GimmeMotifs package (van Heeringen and Veenstra, 2011) with a threshold of 0.9999. The
annotation of motifs is based on the annotation of human in the cis-bp database. Motifs were used for scanning if the assigned TF is
expressed (> 1 RPKM) in at least one time-point during the differentiation. Total ATAC-seq peaks were scanned for the presence of
motifs. We used GimmeMotifs for scanning with dynamic motif scoring cut-offs targeting a false discovery rate (FDR) of both 0.01
and 0.05. To look at the motif enrichments in each set of regions (epigenetic cluster or gene cluster), ATAC-seq peaks were assigned
to the epigenomic cluster or the gene promoters by intersection. Motif occurrences were acquired by intersection of the assigned
ATAC-seq peaks with the motif scanning results on total ATAC-seq peaks. Total ATAC-seq peaks were divided into promoter set
and non-promoter set as the background for the calculation of motif enrichment. Enrichment of motifs in each set of regions was
defined by applying a hypergeometric test using the motif frequency in the corresponding background. This results in TFs that pu-
tatively regulate the activities of the regulatory regions. Motifs in each heat map satisfy an arbitrary cutoff of > 5% motif presence and
a fixed minimal presence difference from background in at least one cluster. Hierarchical clustering (Pearson correlation) was per-
formed in each heat map using the motif occurrence frequencies in the clusters. Based on the gene activity and dynamics, only one
TF was selected to represent a motif if multiple genes are assigned to the same motif. Scanning results at FDRs of 0.01 and 0.05
were compared; the choice of threshold did not affect the result of the enrichment analysis.
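The hypergeometric enrichment test for a motif in a set of regions can be written out directly. The Python sketch below computes the upper-tail probability using only the standard library; the function name and the example counts are hypothetical, and the exact test configuration used in the study is not specified beyond the description above.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: background regions (e.g., all promoter or all non-promoter ATAC peaks)
    K: background regions containing the motif
    n: regions in the set of interest (e.g., one cluster)
    k: regions in that set containing the motif
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 40 of 200 cluster peaks carry the motif,
# against 500 of 10,000 background peaks.
print(hypergeom_pvalue(k=40, n=200, K=500, N=10_000))
```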
Gene Ontology Analysis

Gene ontology analysis on dynamic lists of genes was performed using DAVID (Huang da et al., 2009). Gene ontology on dynamic
enhancer clusters was performed using GREAT (McLean et al., 2010). KEGG pathways and Biological Processes were ranked by p
value and the top terms were plotted.
Statistical parameters including the exact value of n, the definition of center, dispersion, and precision measures (mean ± SEM) and
statistical significance are reported in the Figures and the Figure Legends. Data are judged to be statistically significant when p < 0.05
by two-tailed Student’s t test or two-way ANOVA, where appropriate.
Data Resources
Raw data files for the RNA, ATAC, and ChIP sequencing and analysis have been deposited in the NCBI Gene Expression Omnibus
under accession number: GSE85246.
Links to GEO SubSeries linked to GSE85246:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85243

[Figure S1 image. (A) Percentage of dynamic peaks out of total peaks for H3K27ac, H3K4me3, H3K27me3, H3K4me1, ATAC, and H3K9me3. (B) H3K27ac and H3K4me1 heatmaps across d0 and the Naive, LPS, and BG time-points (1h, 4h, d1, d6); clusters: LPS (3,729), BG (3,281), Diff gain (4,028), Diff loss (6,462); color scale: log(mean count+1). (C) PCA plots (PC1 versus PC2) of promoter H3K27ac, promoter H3K4me3, H3K27me3, and H3K9me3 dynamics, coded by time and treatment. (D) Chromatin states at H3K4me1 dynamics; y axis: % fraction of bins.]
Figure S1. Summary of Dynamic Histone Marks and PCA Plots of Dynamic Active Histone Modifications at Promoters and Repressive Marks,
Related to Figure 1
(A) Percentage of histone ChIP-seq peaks designated as dynamic across time-points and between treatments. H3K27ac was the most dynamic modification,
with almost a third of regions showing significant changes.
(B) Heatmap showing histone intensity of H3K27ac and H3K4me1 at dynamic H3K27ac enhancers, within ±12 kb of the peak center.
(C) PCA plots for all time-points for H3K27ac dynamic promoters, H3K4me3 dynamic promoters, dynamic H3K27me3 regions, and dynamic H3K9me3 regions.
H3K27ac and H3K4me3 at promoters behave similarly over time and in response to LPS or BG exposure, and reflect the behavior of H3K27ac at enhancers.
Unlike active marks, repressive marks show little dynamics up to day 1.
(D) Stacked plots showing chromatin state changes over differentiation at ‘‘LPS-Mf up’’ and ‘‘BG up / LPS down’’ H3K4me1 enhancers. These enhancers are
established through H3K27ac dynamics shown in Figure 1C. The genome was segmented into 9 chromatin states based on the 5 histone marks analyzed. This
analysis indicates that H3K4me1 increase is associated with loss of H3K27me3.
[Figure S2 image. (A) Number of genes at each time-point (1h, 4h, d1, d6) that deviate from RPMI by FC >2, up and down, for LPS exposure and BG exposure. (B) Overlap between H3K27ac clusters in Figure 1C and RNA clusters in Figure S2A. (C) LPL (Lipoprotein Lipase), whose promoter and enhancer belong to the ‘BG up / LPS down’ cluster: RNA-seq and histone modification tracks (H3K4me3, H3K27ac, H3K27me3).]
Figure S2. RNA-Seq Dynamics in Response to LPS and BG and Relationship to Histone Marks, Related to Figure 1
(A) Number of genes showing treatment (LPS or BG) specific expression at each time point (1h, 4h, d1, d6). LPS exposure induces the largest number of genes at
each time-point, with a minimum of 110 transcripts at 1h, and a maximum of 650 transcripts at day 1. Up to 100 genes maintain LPS-specific expression at d6.
Comparatively BG induced gene expression patterns peak at d1, a fraction of which is maintained to d6.
(B) Overlap between gene expression group and promoter H3K27ac cluster. LPS-induced H3K27ac accumulation at promoters correlates well with LPS induced
gene expression at all time-points. However, at day 1 and day 6, the ‘LPS-up’ genes are equally explained by a lag in differentiation-associated repression in LPS
treated cells. Conversely, BG exposure leads to faster expression of differentiation associated genes, with higher overlap between ‘BG-up’ genes and ‘differ-
entiation gain’ and BG-associated H3K27ac promoters.
(C) Example tracks of a BG induced/LPS repressed gene and an LPS induced gene, LPL (Lipoprotein Lipase).
[Figure S3 image. (A) DNA methylation (all CpGs) correlation across samples. (B) DNA methylation at 2,700 DMRs in d0 and day 6 Naive, LPS, and BG samples. (C) DMR overlap: distal H3K4me1, 91.6%; distal H3K27ac, 68.6%; open ATAC, 69.8%; promoter H3K4me3, 6%. (D) Macrophage subtype specific DMRs (differentiation DMRs, Naive/BG-Mf specific DMRs, LPS-Mf specific DMRs); y axis: 5mC+5hmC level. (E) H3K27ac signal at the DMRs; color scale: H3K27ac logFC.]
Figure S3. DNA Methylation Dynamics in Monocyte-to-Macrophage Differentiation and Tolerance and Training, Related to Figure 1
(A) Correlation plot of DNA methylation values, showing clear separation of LPS d1 and LPS-d6 from other samples.
(B) Boxplot of 2,700 DMRs, showing that the general trend is loss of methylation during monocyte-to-macrophage differentiation.
(C) Chromatin context of DMRs. The majority (91%) of DMRs occur in distal regions marked by H3K4me1, and 69% occur at H3K27ac marked enhancers and open
chromatin regions. Only 6% occur at promoters.
(D) Boxplots showing DNA methylation over time for macrophage sub-type specific DMRs. Analysis identified DMRs common to all macrophages, and those that
are only established in LPS-Mf or not-established in LPS-Mf.
(E) Heatmap of H3K27ac changes at DMRs. Generally, DNA de-methylation at DMRs was associated with accumulation of H3K27ac.
[Figure S4 image. (A) Timecourse RNA-seq (RPKM, donors 1 and 2) and time-specific validation (fold change relative to RPMI) for EGR2 (4 hour, N=5), CSF1 (4 hour, N=5), MITF (4 hour, N=7), and LAMP1 (day 1, N=4). (B) Transcription factor network downstream of BG and Dectin. Node colors: brown, expression peak 4h; yellow, expression peak d1; green, expression peak d6. Red lines: motif presence.]
Figure S4. Expression of Transcription Factors with Enriched Motifs at BG-Associated Promoters and Enhancers and Pathways Associated
with Downstream Genes, Related to Figure 2
(A) The expression of main genes enriched at ‘BG up / LPS down’ and ‘Differentiation gain’ promoters and enhancers is shown separately for each donor over
time. Naive cells are green, LPS exposed cells are red, and BG exposed cells are purple. EGR2 expression peaks transiently at 4 hr in BG exposed cells, but by
day 6, there is no difference between Naive, LPS-Mf or BG-Mf. CSF1 and MITF expression peaks at day 1 and then is reduced. Downstream TF USF2 shares one
motif with MITF, and shows high expression in BG macrophages at day 6. LAMP1 is a major component of the lysosome, and together with LAMP2 makes up
50% of all lysosomal proteins. LAMP1 expression peaks late, and is significantly higher in BG-Mf compared to naive and LPS-Mf. qPCR was used to validate
RNA-seq results in monocytes from multiple donors.
(B) Transcription Factor network based on EGR2 and MITF motif occurrence at BG induced lysosome and lipid metabolism genes. The size of the nodes
represents the number of connections. EGR2 motif is present in the MITF promoters (thick connection). EGR2 and/or MITF motifs are present in another 28 TFs,
which themselves have 14 distinct motifs (visible as a cluster). Most genes have a combination of EGR2, MITF, and downstream TF motifs (light brown
circle). The set of genes to the right do not have EGR2 or MITF motifs, but have motifs for one of the downstream TFs (light gray circle). Overall this network
explains 79% of BG-induced lipid metabolism and lysosome-associated genes, compared to 58% based on EGR2 and MITF scan alone. BG induces EGR2
expression, through its receptor, Dectin-1, and higher expression of MITF is observed, as well as its activator cytokine factor CSF1 (see also Figure S4).
Conversely, LPS treatment represses EGR2, CSF1 and MITF. Genes are labeled by time at which their expression peaks in BG exposed cells. EGR2 expression
peaks at 4 hr (brown), MITF and KLF9 at day 1 (gold). The rest of the downstream genes peak at day 1 (gold) or peak at day 6 (green). Connections between TFs
and downstream genes are shown as red lines.
[Figure S5 image. (A) Median expression (logFC) of tolerized (G1), partially tolerized (G2), and responsive (G3) genes over the time-course for RPMI, LPS, and BG. (B) Expression (RPKM) of CXCL11, CXCL9, IL10, and IL8 at day 6 and after restimulation. (C) RNA level (RPKM) of TNF and IL6 at day 6 and after restimulation in naïve-Mf and LPS-Mf, and relative cytokine release (fluorescence over naïve-Mf) of TNF and IL6 for LPS and BG. (D) Tolerized pathways and responsive pathways.]
Figure S5. Tolerance at the Transcriptional Level, Related to Figure 3

(A) Pattern of expression of tolerized and responsive genes during the time-course shown as median logFC of two donors (with first and third quartiles shown as
shaded areas). The most tolerized (G1) genes did not show upregulation in response to the initial LPS exposure in monocytes, while responsive genes (G3)
showed high induction in monocytes.
(B) Notable examples of tolerized and responsive genes. Data are represented as mean RPKM ± SD.
(C) Expression of IL6 and TNF. Release of these proteins from macrophages in response to LPS is considered the gold-standard for determining tolerance. At the
transcriptional level TNF is partially tolerized, while IL6 is responsive in LPS-Mf. Error bars represent standard deviation. IL6 and TNF protein release after LPS
restimulation is high in BG-Mf and absent in LPS-Mf compared to naive-Mf. The disconnect between transcription and release of IL6 can potentially be explained
by the larger size and higher lysosome content in BG-Mf, induced by early activation of lipid and lysosome pathways in BG exposed cells.
(D) Top 10 KEGG pathways enriched in tolerized and responsive gene groups from DAVID ontology analysis. Area relates to the number of genes within the
pathway, red font signifies that the pathway only shows significant enrichment in the tolerized gene group. Cytokine-cytokine receptor signaling was the top
pathway in both tolerized and responsive groups indicating that cytokine genes are equally spread across the gradient of LPS-Mf response to LPS re-exposure.
[Figure S6 image. (A) Tolerized pro-inflammatory transcription factors STAT2, STAT5A, IRF1, and IRF8 (RPKM at day 6 and after restimulation). (B) Responsive pro-inflammatory transcription factors NFKB1 and RELA in naïve-Mf and LPS-Mf (day 6 and after restimulation). (C) Promoter dynamic H3K4me3 (logFC) for tolerized (G1), partially tolerized (G2), and responsive (G3) genes in RPMI, LPS, and BG.]
Figure S6. Active Histone Mark Changes at Promoters of Tolerized and Responsive Genes and Overall Chromatin States at the Same
Promoters, Related to Figure 4
(A) Expression at day 6 and at LPS re-exposure for STAT2 and 5A, and IRF1 and 8 (mean RPKM of 4 donors, error bars represent standard deviation). These
pro-inflammatory TFs show a tolerized response in LPS-Mf to LPS re-exposure. The inability of these genes to be activated may play a role in the tolerance of
downstream targets, as suggested from the enrichment of their motifs in the G2 partially tolerized gene promoters (Figure 4B).
(B) Expression at day 6 and at LPS re-exposure for NFKB1 and RELA. These TFs are responsive to LPS re-exposure in LPS-Mf, and their motifs are not
significantly enriched in tolerized genes. This suggests that NF-kB signaling is not impaired at the level of transcription. Data are represented as mean ± SD.
(C) LPS-Mf do not accumulate H3K4me3 at tolerized genes, but do so at the promoters of responsive genes. This pattern is similar to that of H3K27ac shown in
Figure 4D.
[Figure S7 image. (A) Culture and RNA collection scheme (d0, 24h, 48h, d6) for RPMI, +BG, LPS, and LPS + BG, with collection at d1 +4h, day 3, and day 6. (B) Differentiation-associated genes (lipid metabolism, oxidative phosphorylation) and LAMP1 expression (RPKM). (C) RNA fold change of EGR2, MITF, and CSF1 after 4h of BG in day 1 naïve (RPMI culture) and day 1 tolerized (LPS exposed) cells.]
Figure S7. Expression of Genes Involved in Lipid Biosynthesis and Metabolism following BG Reversal of LPS-Induced Tolerance, Related to
Figure 7
(A) Experimental set-up, indicating the collection of samples for gene expression analysis. Samples were collected at day 1 +4h, indicating that monocytes were
treated with media (RPMI) or LPS for 24 hr, at which point cells were exposed to BG for 4 hr and collected. Additionally, samples were collected at day 3 and day 6.
(B) BG exposure, following LPS, recovers the expression of genes involved in lipid biosynthesis and oxidative phosphorylation as early as day 3. LAMP1 is
an example of a lysosome gene that shows high expression in BG-Mf and low expression in LPS-Mf. BG exposure recovers the expression of this gene in
LPS-BG-Mf.
(C) BG addition at day 1 in Naive monocytes induces the expression of EGR2, MITF and CSF1, as it does when added at day 0 (Figure S4C). In tolerized
monocytes, BG induces the expression of EGR2 and MITF, but to a lesser degree. This indicates that BG receptor pathways are not completely disrupted by LPS
exposure, providing a basis for BG reversal of LPS-induced tolerance.
Resource
Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters
Biola M. Javierre, Oliver S. Burren,
Steven P. Wilder, ..., Chris Wallace,
Mikhail Spivakov, Peter Fraser
Correspondence
mf471@cam.ac.uk (M.F.),
cew54@medschl.cam.ac.uk (C.W.),
mikhail.spivakov@babraham.ac.uk
(M.S.),
peter.fraser@babraham.ac.uk (P.F.)
In Brief
This study deploys a promoter capture
Hi-C approach in 17 primary blood cell
types to match collaborating regulatory
regions and identify genes regulated by
noncoding disease-associated variants.
Explore this and other papers at the Cell Press IHEC webportal at http://www.cell.com/consortium/IHEC.
Highlights
• High-resolution maps of promoter interactions in 17 human primary blood cell types
• Interaction patterns are cell type specific and segregate with the hematopoietic tree
• Promoter-interacting regions enriched for regulatory chromatin features and eQTLs
• Promoter interactions link non-coding GWAS variants with putative target genes
Javierre et al., 2016, Cell 167, 1369–1384
November 17, 2016 © 2016 The Authors. Published by Elsevier Inc.
Resource
Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters
Biola M. Javierre,1,11 Oliver S. Burren,2,11 Steven P. Wilder,3,11 Roman Kreuzhuber,3,4,5,11 Steven M. Hill,6,11 Sven Sewitz,1
Jonathan Cairns,1 Steven W. Wingett,1 Csilla Várnai,1 Michiel J. Thiecke,1 Frances Burden,4,5 Samantha Farrow,4,5
Antony J. Cutler,2 Karola Rehnström,4,5 Kate Downes,4,5 Luigi Grassi,4,5 Myrto Kostadima,3,4,5 Paula Freire-Pritchett,1
Fan Wang,6 The BLUEPRINT Consortium, Hendrik G. Stunnenberg,7 John A. Todd,2 Daniel R. Zerbino,3 Oliver Stegle,3
Willem H. Ouwehand,4,5,8,9 Mattia Frontini,4,5,8,* Chris Wallace,2,6,10,* Mikhail Spivakov,1,12,* and Peter Fraser1,*
1Nuclear Dynamics Programme, The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
2JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Cambridge Biomedical Research
Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK
3European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SD, UK
4Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
5National Health Service Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
6MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK
7Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University Nijmegen,
Geert Grooteplein Zuid 30, 6525 GA Nijmegen, the Netherlands

8British Heart Foundation Centre of Excellence, Division of Cardiovascular Medicine, Addenbrooke’s Hospital, Hills Road,
Cambridge CB2 0QQ, UK

9Department of Human Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK
10Department of Medicine, University of Cambridge, Addenbrooke’s Hospital, Cambridge CB2 0SP, UK
11Co-first author
12Lead Contact
*Correspondence: mf471@cam.ac.uk (M.F.), cew54@medschl.cam.ac.uk (C.W.), mikhail.spivakov@babraham.ac.uk (M.S.),
peter.fraser@babraham.ac.uk (P.F.)
SUMMARY
Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast
majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use
promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show
that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked
enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of
nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of
genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to
putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results
demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying
common diseases.
INTRODUCTION
Genomic regulatory elements such as transcriptional enhancers determine spatiotemporal patterns of gene expression. It has been
estimated that up to 1 million enhancer elements with gene regulatory potential are present in mammalian genomes (ENCODE Project
Consortium, 2012). Although a number of well-characterized enhancers map close to their target genes, assignment based on linear
proximity is error prone, as many enhancers map large distances away from their targets, bypassing the nearest gene (Mifsud et al.,
2015; Sanyal et al., 2012; Schoenfelder et al., 2015). Long-range gene regulation by enhancers in vivo involves close spatial
proximity between distal enhancers and their target gene promoters in the three-dimensional nuclear space (Carter et al., 2002),
most likely involving a direct interaction (Deng et al., 2014), while the intervening sequences are looped out. Thus, a
comprehensive catalog of promoter-interacting regions (PIRs) is a requisite to fully understand genome transcriptional control.
Thousands of disease- and trait-associated genetic variants have been identified by genome-wide association studies (GWAS). The
vast majority of these variants are located in non-coding regions of the genome, often at considerable genomic distances from
annotated genes, making assessment of their potential function in disease etiology problematic. However, GWAS variants are
enriched in close proximity to DNase I
Table 1. Summary of PCHi-C Datasets Generated in This Study
Cell Type                         Acronym  Biological Replicates  Unique Captured Read Pairsᵃ  Detected Promoter Interactionsᵇ
Megakaryocytes                    MK       4                      653,848,788                  150,779
Erythroblasts                     Ery      3                      588,786,672                  151,215
Neutrophils                       Neu      3                      736,055,569                  142,435
Monocytes                         Mon      3                      572,357,387                  165,947
Macrophages M0                    M40      3                      668,675,248                  180,190
Macrophages M1                    M41      3                      497,683,496                  171,031
Macrophages M2                    M42      3                      523,561,551                  186,172
Endothelial precursors            EndP     3                      420,536,621                  145,888
Naive B cells                     nB       3                      629,928,642                  189,720
Total B cells                     tB       3                      702,533,922                  213,539
Fetal thymus                      FetT     3                      776,491,344                  166,743
Naive CD4+ T cells                nCD4     4                      844,697,853                  210,074
Total CD4+ T cells                tCD4     3                      836,974,777                  199,525
Non-activated total CD4+ T cells  naCD4    3                      721,030,702                  211,720
Activated total CD4+ T cells      aCD4     3                      749,720,649                  213,235
Naive CD8+ T cells                nCD8     3                      747,834,572                  216,232
Total CD8+ T cells                tCD8     3                      628,771,947                  204,382
Total                                                             11,299,489,740               698,187ᶜ
ᵃ Total numbers of valid read pairs across all biological replicates are listed. See Table S1 for replicate-level statistics.
ᵇ Interactions with CHiCAGO scores >5. This excludes 9,396 interactions involving 484 captured non-promoter fragments that are not considered further in the study.
ᶜ Unique interactions detected in at least one cell type.
hypersensitive sites, potentially disrupting transcription factor binding sites, suggesting that they may contribute to disease
by altering the function of distal regulatory elements in gene control (Maurano et al., 2012). Therefore, promoter interactions may
link disease-associated variants to their putative target genes (Mifsud et al., 2015).
Recent advances in chromosome conformation capture technologies such as Hi-C have increased the potential to understand
long-range gene control. However, the enormous combinatorial complexity of DNA fragment pairs in Hi-C libraries impedes
high-resolution detection of specific regulatory interactions between individual genetic elements in a robust fashion. Using
sequence capture to enrich for Hi-C interactions that involve specific regions of interest is a versatile approach to overcome the
limitations imposed by library complexity (Dryden et al., 2014; Sahlén et al., 2015; Schoenfelder et al., 2015). We recently
developed promoter capture Hi-C (PCHi-C), in which sequence capture is used to pull down fragments containing nearly all annotated
promoters and their interacting regions from Hi-C libraries, resulting in strong enrichment for promoter interactions compared
with Hi-C (Schoenfelder et al., 2015).
RESULTS
Promoter Capture Hi-C
We performed PCHi-C experiments in 17 human primary blood cell types (three or more biological replicates per cell type).
The Hi-C step was performed using in-nucleus ligation (Nagano et al., 2015), and 22,076 fragments containing 31,253 annotated
promoters were captured to enrich the Hi-C material for promoter interactions. Sequencing of the PCHi-C samples produced over
11 billion unique, valid read pairs involving promoters (Tables 1 and S1). Comparison with Hi-C revealed a 15- to 17-fold
enrichment for promoter interactions, consistent with previous PCHi-C studies (Schoenfelder et al., 2015), equivalent in this
case to the promoter interaction detection power of over 165 billion conventional Hi-C read pairs. We used the CHiCAGO
pipeline (Cairns et al., 2016) to assign confidence scores to interactions between the captured promoter fragments and PIRs
(Figures 1A–1C), detecting on average 175,000 high-confidence interactions per cell type (CHiCAGO score ≥ 5; Figure 1D;
Tables 1 and S1; Data S1), with a median of four interactions per promoter fragment per cell type. More than half (55%) of PIRs
interacted with a single promoter fragment, while fewer than
Here, we apply PCHi-C in primary cells to generate a 10% PIRs had four or more promoter interactions per cell
comprehensive catalog of the interactomes of 31,253 annotated type. We found abundant examples of tissue-specific and tis-
promoters in 17 human primary blood cell types. Devising a sue-invariant interactions (Figure 1C). In total, 698,187 high-con-
statistical methodology to link GWAS SNPs to their putative fidence unique promoter interactions were detected across all
target genes based on PCHi-C interaction data, we prioritize cell types, of which 9.6% were promoter-to-promoter interac-
thousands of new candidate genes potentially implicating a tions and 90.4% promoter-to-PIR, with a median linear distance
number of gene pathways in susceptibility to common diseases. between promoters and their interacting regions of 331 Kb.
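As an illustration only (not the pipeline used in this study), the relationship between the per-cell-type interaction counts and the number of unique interactions across cell types can be sketched as below. The data layout, the function name `summarize_interactions`, and the toy scores are hypothetical; the only detail taken from the text is the CHiCAGO score cutoff of 5 for high-confidence interactions.

```python
from typing import Dict, Tuple

# Hypothetical layout: an interaction is a pair of fragment IDs
# (baited promoter fragment, other-end fragment).
Interaction = Tuple[str, str]


def summarize_interactions(
    scores: Dict[str, Dict[Interaction, float]], threshold: float = 5.0
) -> Tuple[Dict[str, int], int]:
    """Count high-confidence interactions per cell type and their union."""
    per_cell_type: Dict[str, int] = {}
    unique = set()
    for cell_type, table in scores.items():
        # Keep interactions whose score reaches the cutoff in this cell type.
        kept = {pair for pair, score in table.items() if score >= threshold}
        per_cell_type[cell_type] = len(kept)
        # An interaction counts once in the union, however many cell types share it.
        unique |= kept
    return per_cell_type, len(unique)


# Toy scores (invented for illustration, not data from the study).
toy = {
    "Ery": {("bait1", "frag7"): 8.2, ("bait1", "frag9"): 3.1},
    "Mon": {("bait1", "frag7"): 6.0, ("bait2", "frag4"): 5.4},
}
counts, n_unique = summarize_interactions(toy)
print(counts, n_unique)
```

Because shared interactions are counted once, the union is smaller than the sum of the per-cell-type counts, which is why the total number of unique interactions is far below 17 times the per-cell-type average.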

[Figure 1 graphic. (A) Schematic: 17 primary blood cell types; PCHi-C; physical interactions of 31,253 annotated promoters; CHiCAGO; ~175,000 interactions per cell type; 698,187 unique interactions across cell types. (B) Tracks: PCHi-C (y axis: read count per di-tag), DHSs, sDI, TADs, Hi-C. (D) y axes: cumulative number of interactions [x 1000] and cumulative number of PIRs [x 1000]. (E) y axis: frequency of interactions crossing TAD boundary.]

Approximately 10% of promoter interactions were between fragments greater than 1 Mb apart and
5,103 mapped across chromosomes (“trans-interactions”). A total of 230,525 unique PIRs were
detected, linked to 20,676 captured fragments containing 29,992 annotated promoters (Figure 1D).
We also sequenced 16 pre-capture Hi-C libraries from eight cell types (Table S1) and identified
topologically associated domains (TADs) using the directionality index score (Dixon et al., 2012)
(Figure 1B). We found that about a third of PCHi-C-identified interactions crossed TAD
boundaries, which is significantly below that expected at random in all eight cell types
(Figures 1E and S1A), consistent with previous results (Schoenfelder et al., 2015). The frequency
of TAD boundary-crossing interactions was broadly similar for both promoters adjacent to the
boundaries and those located in the centers of TADs (on average, 32% and 28.5%, respectively).
We chose approximately 1,000 identified PIRs for validation, using them as capture baits in a
reciprocal capture system that we applied to eight Hi-C libraries from four cell types
(Figure S2A; Table S1; and Data S1). The CHiCAGO interaction scores of PCHi-C and reciprocal
capture Hi-C aligned well (Figure S2 and Quantification and Statistical Analysis), thus
validating our approach.

Promoter Interactomes Are Lineage and Cell Type Specific
Principal component analysis (PCA) of CHiCAGO interaction scores across all biological replicates
of the 17 cell types revealed close clustering of the replicates and separation of the individual
cell types (Figure 2A). This demonstrates signal reproducibility across replicates and suggests
strong cell-type specificity of the interactomes. We noted that neutrophils showed a distinct PCA
profile, potentially reflecting their unusual segmented nuclear morphology. Hierarchical
clustering of the 17 cell types based on their CHiCAGO interaction scores demonstrated that
patterns of promoter interactions across the cell types segregated in a manner generally
consistent with the hematopoietic tree (Figure 2B, top). We further confirmed the cell-type
specificity and lineage relationships of the interactomes globally using conventional Hi-C at the
level of large-scale A/B nuclear compartments (Figures S1B–S1D).
We then used Autoclass Bayesian clustering (Cheeseman et al., 1988) to partition promoter
interactions based on their CHiCAGO scores across cell types, which produced 34 distinct
interaction clusters (Figure 2B, heatmap). Just under half (47.4%) of interactions mapped to
predominantly lymphoid-specific clusters (1–15, 25, 26) (Figures 2B and 2C). Examples of genes
whose promoter interactions predominantly map to this set of clusters include T cell receptor
components (CD247, CD3D, and CD3G), as well as IKZF3 coding for the AIOLOS protein that has a key
role in lymphoid development. 38.9% of the interactions mapped to generally myeloid-specific
clusters (16–18, 27–34). Promoters with predominant interactions in this set of clusters include,
for example, DIP2C (Disco-Interacting Protein 2 Homolog C) that shows high expression in acute
myeloid leukemia. Clusters 19–24, containing 13.6% of interactions, showed strong signal in both
lineages.
We found that just over 60% of captured promoter fragments had at least one interaction detected
in both myeloid and lymphoid lineages; however, nearly all of them (>99%) also engaged in
additional lineage- or cell-type-specific interactions (Figure S3A). On the whole, interactions
sharing the same promoter fragment tended to have more similar cell-type specificities than
expected at random (Figure S3B). This suggests a complex and potentially cooperative effect of
cell-type-specific and invariant interactions in setting up genome organization and expression.
Collectively, the cell-type specificity and lineage relatedness of promoter interactomes suggest
that higher-order genome structure undergoes widespread and coordinated remodeling during lineage
specification, dynamically reshaping transcriptional decisions.

Promoter-Interacting Regions Are Enriched for Regulatory Chromatin Features
PIRs were significantly enriched for regions of accessible chromatin (Figure S3C), with 56%
containing accessible regions detected by assay for transposase-accessible chromatin sequencing
(ATAC-seq) in at least one blood cell type (Corces et al., 2016). This points to the regulatory
potential of many PIRs. To further investigate this, we studied the chromatin properties of PIRs
using data from the BLUEPRINT project from the nine blood cell types, for which sufficient
information was
Figure 1. Promoter Capture Hi-C across 17 Human Primary Blood Cell Types
(A) Schematic representation of the project.
(B) Interaction landscape of INPP4B gene promoter along a 5-Mb region in naive CD4+ (nCD4) cells (PCHi-C, top panel). Each dot denotes a sequenced di-tag
mapping, on one end, to the captured HindIII fragment containing INPP4B gene promoter, and on the other end, to another HindIII fragment located as per the
x axis coordinate; the y axis shows read counts per di-tag. Red dots denote high-confidence PIRs (CHiCAGO score ≥5), and their interactions with INPP4B
promoter are shown as red arcs. Gray lines denote expected counts per di-tag according to the CHiCAGO background model, and dashed lines show the upper
bound of the 95% confidence interval. Genes whose promoters were found to physically interact with INPP4B promoter are labeled in bold. Promoters selectively
interact with specific DNase hypersensitivity sites (DHSs, middle panel) defined in the same cell type from the ENCODE project. Some of these interactions occur
within the same topologically associated domain (TADs, black line, as defined according to the standardized directionality index score, sDI), while others span
TAD boundaries. A conventional Hi-C profile for the same locus in nCD4 cells is shown in the bottom panel.
(C) Interaction landscape of the INPP4B, RHAG, ZEB2-AS, and ALAD promoters in naive CD4+ cells (nCD4), erythroblasts (Ery), and monocytes (Mon). Dot plots
as in (B), with high-confidence PIRs shown in red (CHiCAGO score ≥5) and sub-threshold PIRs (3 < CHiCAGO score < 5) shown in blue.
(D) The numbers of unique interactions (left) and PIRs (right) detected for a given number of analyzed cell types. Lines and dots show the mean values over 100
random orderings of cell types; gray ribbons show SDs.
(E) Proportions of interactions crossing TAD boundaries per cell type; observed and expected frequencies of TAD boundary-crossing interactions. Error bars
show ±SD across 1000 permutations (see Quantification and Statistical Analysis).
See also Figures S1 and S2, Table S1, and Data S1.
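The accumulation analysis described for panel (D), mean and SD of the number of unique interactions over random orderings of cell types, could be reproduced along the following lines. This is a minimal sketch under stated assumptions: the function name `accumulation_curve`, the input layout (a set of interaction IDs per cell type), the fixed seed, and the use of the population SD are illustrative choices, not details given in the paper.

```python
import random
from statistics import mean, pstdev


def accumulation_curve(interactions_by_cell_type, n_orderings=100, seed=0):
    """Mean and SD of cumulative unique interactions after 1..N cell types."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    cell_types = sorted(interactions_by_cell_type)
    # totals[k] collects the cumulative count after k + 1 cell types.
    totals = [[] for _ in cell_types]
    for _ in range(n_orderings):
        order = cell_types[:]
        rng.shuffle(order)
        seen = set()
        for k, cell_type in enumerate(order):
            seen |= interactions_by_cell_type[cell_type]
            totals[k].append(len(seen))
    return [(mean(t), pstdev(t)) for t in totals]


# Toy input (invented): integer IDs stand in for interactions.
toy = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4, 5}}
curve = accumulation_curve(toy)
for n_types, (avg, sd) in enumerate(curve, start=1):
    print(n_types, avg, sd)
```

The last point of the curve is independent of ordering, because every ordering ends with the union of all cell types; only the intermediate points carry a non-zero SD.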

[Figure 2 graphic. (A) PCA plot (axes PC1 and PC2) of individual replicates, with an inset for CD4+ and CD8+ T cells. (B) Dendrogram (y axis: height) of the 17 cell types, grouped as lymphoid and myeloid, above a heatmap of CHiCAGO interaction scores with cluster IDs 1–34. (C) Heatmap of cluster specificity scores by cluster ID and cell type, grouped as lymphoid and myeloid.]
Figure 2. Promoter Interactions Reflect the Lineage Relationships of the Hematopoietic Tree
(A) Principal Component Analysis (PCA) of the CHiCAGO interaction scores for each individual biological replicate (nB, naive B cells; tB, total B cells; FetT, fetal
thymus; aCD4, activated CD4+ T cells; naCD4, non-activated CD4+ T cells; tCD4, total CD4+ T cells; nCD8, naive CD8+ T cells; nCD4, naive CD4+ T cells; tCD8,
total CD8+ T cells; Mon, monocytes; Neu, neutrophils; Mφ0–2, Macrophages M0, M1, M2; EndP, endothelial precursors; MK, megakaryocytes; Ery, erythroblasts).
The inset shows the results of a separately performed PCA for CD4+ and CD8+ T cells only.
(B) Top (dendrogram): hierarchical clustering of the cell types according to their promoter interaction profiles. Bottom (heatmap): Autoclass Bayesian clustering of
interactions according to their cell-type specificity. Cluster IDs are shown on the right. Cluster 9 containing 108,066 interactions is not shown for clarity.
(C) Cell-type specificity of interaction clusters. The heatmap shows cluster specificity scores in each cell type (see Quantification and Statistical Analysis for
details). Cell types and clusters are arranged as in (B).
See also Figures S3A and S3B.
available (Figure 3). We found PIRs to be significantly enriched for histone marks associated
with active enhancers, such as H3K27ac and H3K4me1, in comparison with distance-matched random
controls (Figures 3A and 3B). We also found enrichment for H3K4me3 and H3K36me3 at PIRs, which are
marks associated with active promoters and transcribed regions, respectively, consistent with
non-coding transcription of regulatory regions (Natoli and Andrau, 2012).

[Figure 3 graphic. (A) PIR enrichment for histone modifications (mean across 9 cell types). (B) Significance of PIR enrichment (z-score) for Neu, nCD8, nCD4, Mon, MK, Mφ2, Mφ1, Mφ0, and Ery. (C) Ensembl Homo sapiens version 83.37 (GRCh37.p13), chromosome 11: 5,241,525-5,392,845, with the LCR marked and tracks for Regulatory Build annotation, Ensembl annotation, HindIII fragments, and ChromHMM activity and PCHi-C in Ery, Mon, and nCD8. (D) Promoter Capture Hi-C: enrichment for Regulatory Build distal active enhancers per cell type. (E) Correspondence of promoter and enhancer activities (observed / expected). (F) Correspondence of promoter interaction and enhancer activity (observed / expected). The p values annotated on panels E and F are 6e-5 and 0.02.]

We then focused on regions annotated as promoters and enhancers in the Ensembl Regulatory Build
(Zerbino et al., 2015), defining their activity on the basis of ChromHMM (Ernst and Kellis, 2012)
segmentations of the BLUEPRINT histone ChIP data. We asked whether the cell-type-specific
activity state of enhancers depended on their connectivity to promoters, or alternatively,
whether enhancer-promoter interactions tended to be primed irrespective of enhancer activity
(Ghavi-Helm et al., 2014). Consistent with previous findings in the β-globin locus (Tolhuis
et al., 2002), Figures 3C and S3D show that interactions between the Locus Control Region (LCR)
enhancers and the HBB and HBG genes occur in erythroblasts, in which they are active, but not in
monocytes or CD4+ T cells. We observed this activity-state-dependent connectivity of enhancers
with promoters globally (Figure 3D), and formally confirmed it using overdispersion-adjusted
statistical tests (Figures 3E and 3F). These results demonstrate that the dynamic nature of
enhancer-promoter interactions is preferentially coupled with the cell-type-specific activity of
the regulatory elements they connect.

Enhancer Activity Associates with Lineage-Specific Gene Expression
To gain insight into the role of promoter contacts in regulating lineage-specific gene
expression, we integrated information on chromatin states at promoters and enhancers with global
gene expression profiles in the same cells available from the BLUEPRINT consortium. Comparing
gene expression across cell types, we observed that promoter interactions with active enhancers
generally had an additive effect on cell-type-specific expression levels (p < 2 × 10^-16;
Figure 4A). Notably, a weak, but also significant additive effect was observed when all PIRs,
irrespective of their annotation, were considered for the analysis (p < 2 × 10^-16;
Figure S4A), with the fraction of active enhancers among them providing an independent predictor
(p < 2 × 10^-16; data not shown). These results confirm that active enhancers, and potentially
other elements devoid of canonical enhancer features, quantitatively contribute to gene
expression.
We then sought to partition genes based on the cell-type specificity of their interactions with
active enhancers. For each gene, we used CHiCAGO interaction scores and enhancer activity states
to calculate a “gene specificity score” for each cell type (see Quantification and Statistical
Analysis). Applying k-means clustering to the resulting gene specificity scores, we obtained the
12 clusters shown in Figure 4B. This revealed clusters of genes with predominant enhancer
specificity in one or multiple related cell types, and a cluster with no predominant specificity
(cluster 9).
We compared the gene specificity scores based on interactions with active enhancers with
analogous scores that capture cell-type specificity of the respective genes’ expression. As
shown in Figures 4C and S4B, genes mapping to a cell-type-specific cluster based on their
interactions with active enhancers were, on average, preferentially expressed in the same cell
type. The link between cell-type specificity of active enhancer interactions and gene expression
was the most apparent when focusing on genes expressed with the highest cell-type specificity
(Figures 4D, 4E, and S4C). For example, 46% of the top 100 lymphoid-specifically expressed genes
mapped to cluster 8, characterized by lymphoid-specific active enhancer interactions, while an
additional 37% mapped to clusters with active enhancers specific to both nCD4 cells and other
cell types (Figure 4D). Taken together, these results support a direct functional role of the
identified enhancer-promoter interactions in transcriptional control.

Expression Quantitative Trait Loci Provide Evidence for PIR Regulatory Function
Natural genetic variation has been described as an “in vivo mutagenesis screen” (Heinz et al.,
2013). Here, we used data on sequence variants associated with altered expression of specific
genes (expression quantitative trait loci, eQTLs) in primary monocytes and B cells (Fairfax
et al., 2012) to demonstrate PIR function. Integrating eQTL information with PCHi-C results, and
considering at most one “lead” eQTL per gene, we found 899 lead eQTLs in monocyte PIRs and 577 in
B cell PIRs that physically contact the promoters of the genes they regulate (false discovery
rate [FDR] <10%; Table S2). To confirm the specificity of eQTL localization to PIRs, we
randomized PIR locations accounting for interaction distance and compared the proportions of
variants that are eQTLs at PIRs and at these random regions. We found that PIRs are selectively
enriched for eQTLs regulating the same gene that the PIR is connected to, across a broad range of
linear distances from their target promoters (Figures 5A and 5B). We found a similar enrichment
when considering at most one eQTL variant per gene (Figures S5A and S5B). These
Figure 3. Promoters Preferentially Connect to Active Enhancers

(A) PIR enrichment for histone marks compared with distance-matched random regions. Error bars show SD across 100 draws of random regions.
(B) Significance of PIR enrichment for histone marks from (A), expressed in terms of Z scores.
(C) Promoter interactions and chromatin features in the β-globin locus. PCHi-C data from three cell types, showing regulatory element annotations from the
Ensembl Regulatory Build, colored by feature, and chromatin activities based on ChromHMM segmentations of BLUEPRINT histone modification data. The
image is based on a screenshot produced with Ensembl v83 using GRCh37 assembly and GENCODE v19 gene annotations. The β-globin Locus Control Region
(LCR) is highlighted (blue box).
(D) Enrichment of PIRs for active distal enhancers (shown per biological replicate).
(E) Enrichment of promoter-enhancer interactions for links between active promoters and active enhancers. The observed to expected ratios of each combination
of promoter and enhancer activity connected by an interaction are color coded. The p value is for the overdispersion-adjusted χ² test of independence of
promoter and enhancer states at either ends of interactions. The non-active category includes the “poised,” “Polycomb-repressed,” and “inactive” states
defined with ChromHMM.
(F) Interactions between an active promoter and an enhancer are preferentially found in cell types, in which the enhancer is active. Observed to expected ratios for
each combination of enhancer activity and the presence or absence of interaction are color coded. The p value is for the overdispersion-adjusted χ² test of
independence of the enhancer state and the presence of interaction. The non-active category is as in (E).
See also Figure S3C.
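For panels (A) and (B), one common way to turn an observed overlap and a set of random draws into a fold enrichment and a Z score is sketched below. The formulas (observed divided by the mean of the random draws; observed minus that mean, divided by the sample SD of the draws) are assumed here for illustration and are not stated in the caption; the function name and the toy numbers are hypothetical.

```python
from statistics import mean, stdev


def enrichment_stats(observed, random_draws):
    """Return (fold enrichment, Z score) of an observed count versus random draws."""
    mu = mean(random_draws)
    sd = stdev(random_draws)  # sample SD across the random draws
    if mu == 0 or sd == 0:
        raise ValueError("random draws must have non-zero mean and spread")
    fold = observed / mu
    z = (observed - mu) / sd
    return fold, z


# Toy numbers (invented): PIRs overlapping a histone mark, observed versus
# four draws of distance-matched random regions.
fold, z = enrichment_stats(30, [10, 12, 8, 10])
print(round(fold, 2), round(z, 2))
```

With 100 draws, as in panel (A), the SD estimate is more stable than in this four-draw toy example, but the calculation is the same.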

[Figure 4 graphic. (A) Residual gene expression versus number of active enhancers (mean centred). (B) Heatmap of gene specificity scores (interactions with active enhancers) by cluster ID (1–12) for nCD4, Mφ0, Mφ2, Mφ1, Mon, Neu, Ery, and MK. (C) Mean gene specificity score (expression) versus mean gene specificity score (interactions with active enhancers) for nCD4, MK, Ery, and Neu, colored by cluster ID. (D) Gene specificity scores (interactions with active enhancers) for the top 100 nCD4-specific genes (based on expression), with cluster IDs. (E) Cluster enrichment for the top 100 cell type-specifically expressed genes.]

results demonstrate that variants in physical contact with gene promoters are significantly more
likely to have regulatory effects on the genes they contact compared with other variants matched
by distance. Taken together, these findings provide robust functional support for over 1,000
promoter interactions in monocytes and in B cells.
The identification of eQTLs is affected by the power of the eQTL study. Therefore, we
additionally considered eQTLs from a larger meta-analysis in whole blood (Westra et al., 2013).
We found that 1,214 lead cis-eQTLs in whole blood are located in PIRs that physically contact the
respective eQTL target gene promoters in at least one analyzed cell type (Table S2), which
significantly exceeded random expectation (p < 1e-3; Figure S5C). In total, PIRs detected in our
study overlapped 25.7% of all lead cis-eQTLs in whole blood for the respective PIR-connected
genes. Collectively, these results provide abundant evidence of PIR function.
Examples of eQTLs at PIRs included those with effects on more than one gene. For instance, eQTL
SNP rs71636780 localizes to a PIR of two genes, ARID1A and ZDHHC18, in monocytes (located 50 and
100 kb away, respectively), with its variants showing opposite effects on expression of these
genes (Figure 5C). In contrast, eQTL SNP rs117561058 within a PIR of NDUFAF4 and ZBTB2 shows
consistent effects on the expression of both genes. Strikingly, this PIR is located 10 and 60 Mb
from NDUFAF4 and ZBTB2, respectively (Figure 5D). Further examples of long-range PIRs harboring
eQTLs are shown in Figures S5D and S5E.
Notably, we found 194 monocyte eQTLs, 118 B cell eQTLs, and 310 whole-blood eQTLs at PIRs
containing promoter regions of other genes, suggesting that promoter-promoter interactions may
have regulatory effects. This is consistent with previous findings for the INS and SYT8 genes
(Xu et al., 2011) and emerging genome-wide data (I. Jung and B. Ren, personal communication).
Taken together, expression quantitative trait loci provide functional and statistically supported
evidence for a regulatory role of the PCHi-C-identified promoter interactions and demonstrate
their potential to link non-coding regulatory variants with target genes.

Promoter Interactions Prioritize Putative Target Genes of Disease-Associated SNPs
We integrated PCHi-C data with summary statistics from 31 GWAS, including eight autoimmune
diseases, eight blood cell traits, and nine metabolic and six other traits (Table S3). To assess
cell-type-specific enrichment of GWAS signals at PIRs, we devised blockshifter, a method that
takes into account correlation structure in both GWAS and PIR datasets (Figure S6A). We found
that variants associated with autoimmune disease are enriched at PIRs in lymphoid compared to
myeloid cells (Figure 6A). In contrast, SNPs associated with platelet- and
red-blood-cell-specific traits were predominantly enriched at PIRs in myeloid lineages
(Figure 6A). Finally, SNPs associated with traits generally unrelated to hematopoietic cells,
such as blood pressure (systolic, BP S, and diastolic, BP D) and bone mineral density (in femoral
neck, femoral neck mineral density [FNMD], and lumbar spine, lumbar spine mineral density [LSMD])
were not selectively enriched at PIRs in any analyzed cell types (Figure 6B). Collectively, these
results confirm the selective enrichment of GWAS variants at PIRs in putative disease- and
trait-relevant cell types.
We next developed a Bayesian prioritization strategy termed COGS (Capture Hi-C Omnibus Gene
Score) for using promoter interaction data to rank putative disease-associated genes and tissues
across the 31 GWAS traits. This algorithm integrates statistical fine mapping of GWAS signals
across SNPs mapping to gene coding regions, promoters, and PIRs to provide a single measure of
support for each gene. Figure 6C shows an example of the COGS algorithm at work in the 1p13.1
rheumatoid arthritis (RA) susceptibility region, prioritizing RP4-753F5.1, CD101, TTF2, and
TRIM45 as RA candidate genes (Figure 6C, bottom panel). A possible role for CD101 in RA was
previously reported (Jovanovic et al., 2011). Future work may establish whether the other genes
prioritized on the basis of the same GWAS SNP-harboring PIR (Figure 6C, white bar) also
contribute toward disease, since a single element may regulate multiple genes, as evidenced by
eQTL examples in Figures 5C and 5D and previous studies (Hanscombe et al., 1991; Mohrs et al.,
2001).
Using the COGS algorithm genome-wide for 31 diseases and blood cell traits, we prioritized a
total of 2,604 candidate genes (with a median of 122 genes per trait at gene-level score >0.5;
Table S3). The prioritized genes exhibited both expected and unexpected enrichments for specific
pathways in the Reactome Pathway Database (Fabregat et al., 2016). In particular, and as
expected, genes prioritized for autoimmune diseases were enriched in inflammation and
immune-response-related pathways, such as interleukin and T cell receptor signaling, whereas
genes prioritized for platelet traits were preferentially associated with platelet production and
hemostasis (Figure 6D). Less obvious pathway associations included free oxygen species metabolism
in celiac disease (Yang et al., 2015), and post-translational and epigenetic modifications of
proteins and nucleic acids in the
Figure 4. Active Enhancers at PIRs Associate with Lineage-Specific Gene Expression

(A) Plot of log2-gene expression as a function of the number of interacting active enhancers in cell types, where the promoter is active. Trendline shows linear
regression. Asterisks above and below the boxplots reflect the fact that some outlying observations have been cropped.
(B) Heatmap of ‘‘gene specificity scores’’ for 7,004 protein-coding genes uniquely mapping to a captured fragment (rows), based on their interactions with active
enhancers in each of eight cell types (columns). Genes are partitioned using k-means clustering.
(C) Mean gene specificity score (based on interactions with active enhancers) for each of the clusters in (B) plotted against analogous mean gene specificity
scores based on expression data for nCD4, MK, Ery and Neu cells. Error bars indicate ±SD. Plots for Mon and Mφ0–2 are shown in Figure S4B.
(D) Subset of the heatmap in (B), showing interaction-based gene specificity scores for the top 100 nCD4-specifically expressed genes, together with cluster IDs.
(E) Enrichment of the 12 clusters shown in (B) for the 100 genes expressed with highest specificity in each analyzed cell type (see Quantification and Statistical
Analysis for details).
See also Figure S4.
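The linear regression trendline mentioned for panel (A) can be illustrated with an ordinary least-squares fit of expression against the number of interacting active enhancers. The sketch below is illustrative only: the function name `fit_line` and the toy values are hypothetical, and the study's actual model (including how residual expression was obtained) is described in Quantification and Statistical Analysis, not here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy values (invented): number of interacting active enhancers and
# log2 expression for four promoters.
n_enhancers = [0, 1, 2, 3]
log2_expr = [1.0, 1.5, 2.0, 2.5]
slope, intercept = fit_line(n_enhancers, log2_expr)
print(slope, intercept)
```

A positive slope in such a fit corresponds to the additive effect described in the text: each additional active enhancer contact is associated with a fixed increment in log2 expression.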

[Figure 5 graphic. (A, B) Proportion of SNPs that are eQTLs for the PIR target gene, for SNPs at PIRs and SNPs at randomised PIRs, by binned distance from TSS, in monocytes (A) and total B cells (B). (C) Monocytes: rs71636780 with ARID1A and ZDHHC18 on chr1 (baited promoter fragments, PIR, SNP); gene expression by genotype (G/G, G/A, A/A); -log10 p for the ARID1A and ZDHHC18 eQTL tests in a 100-kb window around rs71636780. (D) Total B cells: rs117561058 with NDUFAF4 and ZBTB2 on chr6 (baited promoter fragments, PIR, SNP); gene expression by genotype (A/A, A/T, T/T); -log10 p for the NDUFAF4 and ZBTB2 eQTL tests in a 100-kb window around rs117561058.]
Figure 5. Promoter-Interacting Regions Are Enriched for Interacting Gene eQTLs

(A and B) The proportion of SNPs that are eQTLs for the PIR-connected gene compared with the equivalent proportion at matched random regions (‘‘randomized
PIRs’’) in monocytes (A) and total B cells (B). Asterisks represent the significance of enrichment at observed versus randomized PIRs (permutation test *p < 0.05;
**p < 0.01; ***p < 0.001).
(C and D) Examples of a single common eQTL SNP identified for two genes (ARID1A and ZDHHC18, C; NDUFAF4 and ZBTB2, D) with either the opposite (C) or
the same (D) directionality of effect. SNPs have been tested within PIRs plus additional 500-bp windows on both sides of them. The Manhattan plots (bottom
panel) depict the eQTL signals for both genes. The gray dashed line represents the significance threshold.
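The comparison in panels (A) and (B) rests on a permutation test with the significance levels given in the caption. A minimal sketch of such a test is shown below, assuming a one-sided empirical p value with the usual add-one correction; that correction, the function names, and the toy proportions are assumptions for illustration, not details reported in the paper.

```python
def empirical_p_value(observed, permuted):
    """One-sided empirical p value: how often a permuted value reaches the observed one."""
    exceed = sum(1 for value in permuted if value >= observed)
    # The add-one correction keeps the estimate away from exactly zero.
    return (exceed + 1) / (len(permuted) + 1)


def significance_stars(p):
    """Map a p value to the asterisk code used in the figure legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Toy numbers (invented): proportion of SNPs that are eQTLs for the
# PIR-connected gene at observed PIRs versus 1,999 sets of randomized PIRs.
observed_proportion = 0.12
permuted_proportions = [0.04] * 1999
p = empirical_p_value(observed_proportion, permuted_proportions)
print(p, significance_stars(p))
```

Note that with this estimator the smallest attainable p value is 1 / (number of permutations + 1), so at least 1,000 permutations are needed before a result can be reported at the three-asterisk level.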
red blood cell traits (Figure 6D), inviting further in-depth validation by specialist
communities. The COGS prioritization strategy produced distinct results from a “brute-force”
approach based on promoter colocalization with disease susceptibility regions (DSRs) within the
same TADs, which yielded considerably more candidates per disease (on average, 5-fold more), and
did not capture all those prioritized with COGS (Figure S6B).
We further focused on a subset of 421 highest-scoring genes prioritized for at least one
autoimmune disease. Taking into account known and predicted protein-protein interactions and
pathway co-localization of their products, we constructed a

[Figure 6 graphic. (A) Scatterplot of GWAS traits, with axes "Lymphoid vs Myeloid" and "Mon, Mφ & Neu vs MK & Ery"; traits are colored as autoimmune, blood, metabolic, or other. (B) Traits plotted against the 17 cell types. (C) Chr1:117,200,000–117,600,000: input GWAS data (-log10 p value), causal variant analysis (posterior probability), and PCHi-C integrated analysis (posterior probability), with genes C1orf137, PTGFRN, CD101, TRIM45, CD58, CD2, TTF2, IGSF3, and RP4-753F5.1. (D) Pathway enrichment by trait (GeneRatio, p.adjust). (E) Gene network for SLE, T1D, MS, PBC, RA, CEL, UC, and CD, with edges marked as physical interaction, predicted interaction, or pathway.]
consolidated “autoimmune disease network” (Figure 6E). The highly connected core of this network (Figure 6E, inset) includes cytokine genes such as IL19 and IL24, signaling and transcription factors controlling proliferation, inflammation and lineage identity (such as MYC, JAK1/2, ETS1/2, CDKN1B, NFKB1, FOXO1, and IKZF2/3). According to ImmunoBase (http://www.immunobase.org), the majority (76%) of the genes in the core autoimmune disease network were not previously implicated as causal candidates for autoimmune diseases, and 65% fall outside of known DSRs (Table S3).
We compared COGS-prioritized genes for Crohn’s disease (CD) and ulcerative colitis (UC) with genes found to be differentially expressed in at least one of five sorted immune cell populations from inflammatory bowel disease (IBD) patients (Peters et al., 2016). A total of 33/182 (18.1%) and 49/278 (17.6%) genes prioritized by COGS for CD and UC, respectively, were differentially expressed in IBD patients. This corresponds to a significant enrichment of COGS-prioritized genes for differential expression in disease (Fisher’s exact test p = 0.007 and p = 0.016, respectively; Figure S6C). Notably, significant enrichment was not observed for genes prioritized on the basis of shared TADs (Figure S6C). The majority of the COGS-prioritized differentially expressed genes (20/33 and 44/49, respectively) were not previously implicated in these diseases based on GWAS results. This provides further functional evidence for our prioritization strategy.
Finally, we used the RA and systemic lupus erythematosus (SLE) GWAS datasets (Bentham et al., 2015; Okada et al., 2012), for which imputed results are publicly available, to ask whether the GWAS signals that drove candidate gene prioritization are supported by eQTLs in the respective LD blocks. Genome-wide, this analysis revealed that out of 456 genes prioritized for these two diseases, 136 had eQTLs, of which four genes (BLK, RASGRP1, SUOX, and GIN1) showed evidence for possible co-localization of GWAS signals and eQTLs in RA and two genes (BLK and SLC15A4) in SLE (see Figure S6 for examples). In addition, the genes prioritized for RA included 5/9 candidates (C8Orf13, BLK, TRAF1, FADS2, and SYNGR1) that were identified in a recent study (Zhu et al., 2016) combining whole-blood eQTL with RA GWAS data by Mendelian randomization. The relatively large number of prioritized genes without eQTL support is in agreement with previous reports of limited overlap of disease variants with eQTLs (Guo et al., 2015). This demonstrates complementary benefits of eQTL-based and physical-interaction-based approaches for prioritizing candidate target genes of non-coding disease variants.
Taken together, our results reveal large numbers of newly identified potential disease genes and pathways and demonstrate the power of high-resolution 3D promoter interactomes for large-scale interpretation of GWAS data.

DISCUSSION

We have presented a comprehensive analysis of promoter-associated genome architecture in human primary hematopoietic cells. We show that promoter interactomes are highly cell type specific, enriched for links between active promoters and active enhancers, and reflect the lineage relationships of the hematopoietic tree. Collectively, these results suggest that three-dimensional genome architecture undergoes stepwise remodeling during lineage specification.
Theoretically, enhancer-promoter contacts can be either “instructive” (triggering transcriptional activation) or “permissive” (poised for activation) (de Laat and Duboule, 2013). A mechanistically verified model of instructive interactions is provided by loops in the β-globin locus (Deng et al., 2014). Our observations in blood cells provide additional evidence for the “instructive” model. However, it is likely that both mechanisms are operational, particularly in early development. For example, permissive interactions were previously detected for early mesodermal enhancers in Drosophila (Ghavi-Helm et al., 2014), in mouse embryonic stem cells (Schoenfelder et al., 2015), as well as for tumor necrosis factor alpha (TNF-α) response genes in fibroblasts (Jin et al., 2013).
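The Fisher's exact test used in the Results to assess enrichment of COGS-prioritized genes among IBD differentially expressed genes can be sketched as a one-sided hypergeometric tail. The counts 33 and 182 come from the text; the background sizes in the usage example (number of genes tested and number of differentially expressed genes overall) are not given in this section and are purely hypothetical placeholders, so the printed value is not expected to reproduce the reported p = 0.007. The function name is likewise an assumption.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for enrichment.

    Returns P(X >= k) for X ~ Hypergeometric(N, K, n), where
      N = background genes, K = background genes with the property,
      n = genes in the tested set, k = tested genes with the property.
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible table configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # 33 of 182 COGS-prioritized CD genes were differentially expressed (from the text).
    # N and K below are HYPOTHETICAL background values, for illustration only.
    print(fisher_enrichment_p(k=33, n=182, K=2400, N=18000))
```

Exact integer arithmetic is used throughout, so the only rounding occurs in the final division.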
Figure 6. Promoter Interactions Link GWAS SNPs with Putative Target Genes
(A) Enrichment of GWAS summary statistics at PIRs by tissue type. Axes reflect blockshifter Z scores for two different tissue group comparisons, first lymphoid
versus myeloid, then additionally within the myeloid lineage. Traits are labeled and colored by category (BMI, body mass index; BP_D, diastolic blood pressure;
BP_S, systolic blood pressure; CD, Crohn’s disease; CEL, celiac disease; FNBMD, femoral neck bone mineral density; GLC, glucose sensitivity; GLC_B, glucose
sensitivity BMI-adjusted; HB, hemoglobin; HDL, high-density lipoprotein; HEIGHT, height; INS, insulin sensitivity; INS_B, insulin sensitivity BMI-adjusted; LDL,
low-density lipoprotein; LSBMD, lumbar spine bone mineral density; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration;
MCV, mean corpuscular volume; MS, multiple sclerosis; PBC, primary biliary cirrhosis; PCV, packed cell volume; PLT, platelet count; PV, platelet volume; RA,
rheumatoid arthritis; RBC, red blood cell count; SLE, systemic lupus erythematosus; T1D, type 1 diabetes; T2D, type 2 diabetes; TC, total cholesterol; TG,
triglycerides; UC, ulcerative colitis).
(B) Blockshifter enrichment Z scores of GWAS summary statistics in PIRs by individual tissue type using endothelial cells as a control. Red indicates enrichment in
the labeled tissue; green indicates enrichment in the endothelial cell control.
(C) Example of the COGS gene prioritization method in the 1p13.1 RA susceptibility region. GWAS summary p values for association with RA (Okada et al., 2012) (top)
are transformed into posterior probabilities of each variant being causal (middle), which are then aggregated at all PIRs interacting with a given gene, accounting for
LD, to compute gene scores. Arcs representing promoter-PIR interactions are color coded with genes.
(D) Bubble plot of traits with significant enrichment (p.adj < 0.05) in one or more pathways from the Reactome database (Fabregat et al., 2016). Top numbers
indicate the total number of genes analyzed for each trait (gene score >0.5), bubble size indicates the ratio of test genes to those in the pathway, and blue to red
corresponds to decreasing adjusted p value for enrichment.
(E) The ‘‘core autoimmune disease network’’ containing the 421 highest-scoring genes prioritized for autoimmune disease. Genes (nodes) are color coded based
on diseases for which they were prioritized as candidates by the COGS algorithm. Edges between genes are drawn based on prior knowledge about their physical
interactions, predicted interactions and pathway associations obtained from GeneMania (Montojo et al., 2010) and are color coded accordingly. Inset shows
gene names for the highest-connected central part of the network. See Quantification and Statistical Analysis.
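The three steps described for panel (C), from association statistics to per-variant posterior probabilities to a gene score, can be sketched as follows. This is a simplified illustration and not the COGS implementation: it uses Wakefield's approximate Bayes factor (Wakefield, 2009, cited in the reference list), assumes exactly one causal variant per LD block with equal priors, and ignores any further components of the actual method. The function names, the prior variance of 0.04, and the toy SNP identifiers are all assumptions.

```python
from math import exp, sqrt


def wakefield_abf(z, v, w=0.04):
    """Approximate Bayes factor in favour of association.

    z: association Z score; v: variance of the effect estimate;
    w: prior variance of the effect size (0.04 is an assumed default).
    """
    r = w / (v + w)
    return sqrt(1.0 - r) * exp(0.5 * z * z * r)


def posterior_probabilities(abfs):
    """Posterior probability of causality per SNP within one LD block.

    Assumes exactly one causal variant in the block and equal prior weights.
    """
    total = sum(abfs.values())
    return {snp: bf / total for snp, bf in abfs.items()}


def gene_score(posteriors, pir_snps):
    """Sum the posterior probabilities of SNPs lying in PIRs that contact the gene."""
    return sum(posteriors[snp] for snp in pir_snps if snp in posteriors)


if __name__ == "__main__":
    # Toy Z scores for three SNPs in one LD block (hypothetical values).
    z_scores = {"snp_a": 5.0, "snp_b": 2.0, "snp_c": 0.5}
    abfs = {snp: wakefield_abf(z, v=0.01) for snp, z in z_scores.items()}
    pp = posterior_probabilities(abfs)
    # Suppose only snp_a and snp_b overlap PIRs of the gene of interest.
    print(gene_score(pp, {"snp_a", "snp_b"}))
```

Because the posterior probabilities within a block sum to one under the single-causal-variant assumption, the resulting score is bounded by one per block, which is what makes a threshold such as the gene score > 0.5 in panel (D) interpretable.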

High-resolution interaction information makes it possible to connect genes to their enhancers. Using this approach, we observe that enhancers show generally additive effects on the expression of their target genes, which may explain why genes are often able to buffer the effects of mutations at individual enhancers (Frankel et al., 2010; Waszak et al., 2015). This buffering, in turn, may underlie the fact that many non-coding GWAS SNPs, while enriched at regulatory regions, are not detectable as eQTLs, particularly under normal conditions (Guo et al., 2015). Interestingly, we also observed additive effects, albeit weaker, for PIRs that were not annotated as enhancers. This provides additional support to recent findings that regions without “classic” enhancer or other gene regulatory signatures may also be involved in activation of gene expression (Rajagopal et al., 2016). However, we do not imply that all PIRs have gene regulatory roles in the analyzed cell types. Some promoter interactions may have structural or topological roles, whereas others could be remnants of past developmental stages or priming for future activation.
Using naturally occurring sequence variants that affect expression of specific genes (eQTLs), we provide abundant evidence for PIR function in gene expression control, demonstrating the power of PCHi-C to link non-coding regulatory variants with their target genes. Recent studies by ourselves and others have made a strong case for using 3D genome information to interpret non-coding disease-associated variants (Davison et al., 2012; Dryden et al., 2014; Martin et al., 2015; Mifsud et al., 2015; Smemo et al., 2014; Stadhouders et al., 2014). Here, we link thousands of GWAS SNPs to their putative target genes and prioritize more than 2,500 potential disease-associated genes, three-quarters of which were not previously implicated. These candidates map to expected and novel gene pathways. While further validation will be required to firmly establish the links to specific diseases, our work establishes a systematic approach to interpret non-coding genetic variation and creates an unprecedented opportunity to unlock the seemingly intractable promise created by current and future GWAS.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

● CONTACT FOR REAGENT AND RESOURCE SHARING
● EXPERIMENTAL MODEL AND SUBJECT DETAILS
● METHOD DETAILS
  ○ Cell Isolation and Purity Test
  ○ Cell Fixation
  ○ Hi-C Library Preparation
  ○ Biotinylated RNA Bait Library Design
  ○ PCHi-C
  ○ Sequencing
● QUANTIFICATION AND STATISTICAL ANALYSIS
  ○ Hi-C and PCHi-C Sequence Alignment
  ○ Hi-C Data Processing and the Definition of TAD Boundaries
  ○ PCHi-C Interaction Calling
  ○ Reciprocal Capture CHi-C
  ○ Comparing PCHi-C and Reciprocal Capture Hi-C
  ○ Promoter Interaction Localization with Respect to TADs
  ○ Interaction Clustering and Principal Component Analysis
  ○ Definition of Specificity Scores
  ○ Calculation of Cluster Specificity Scores
  ○ ATAC-Seq Data Analysis
  ○ Histone Modification ChIP and the Definition of Chromatin States
  ○ Dynamics of Enhancer-Promoter Interactions
  ○ Relationship between Active Enhancers and Gene Expression
  ○ Calculation and Clustering of Gene Specificity Scores (Interactions with Active Enhancers)
  ○ Calculation of Gene Specificity Scores (Expression)
  ○ Calculation of Gene Cluster Enrichment Scores
  ○ eQTL Analysis
  ○ GWAS Summary Statistics
  ○ Poor man’s Imputation (PMI)
  ○ GWAS Tissue Set Enrichment Analysis of PCHi-C
  ○ Integration of GWAS Summary Statistics with Tissue Specific PCHi-C and Functional Information
  ○ TAD-Based Prioritization
  ○ Prioritized Gene Enrichment in IBD Differentially Expressed Genes
  ○ Reactome Pathway Analysis
  ○ Core Autoimmune Network
● DATA AND SOFTWARE AVAILABILITY
  ○ Software
  ○ Data Resources

SUPPLEMENTAL INFORMATION

Supplemental Information includes six figures, three tables, and one data file
2016.09.037.

CONSORTIA

The contributing members of the BLUEPRINT Consortium (http://www.blueprint-epigenome.eu) are Joost H. Martens, Bowon Kim, Nilofar Sharifi, Eva M. Janssen-Megens, Marie-Laure Yaspo, Matthias Linser, Alexander Kovacsovics, Laura Clarke, David Richardson, Avik Datta, and Paul Flicek.

AUTHOR CONTRIBUTIONS

Conceptualization, P.F. and M.S.; Methodology, B.M.J., O.S.B., C.W., S.M.H., J.C., P.F.-P., and M.S.; Investigation, B.M.J.; Formal Analysis, O.S.B., S.P.W., R.K., S.M.H., S.S., J.C., S.W.W., C.V., M.J.T., P.F-P., F.W., C.W., and M.S.; Resources, M.F., F.B., S.F., A.J.C., K.R., K.D., L.G., BLUEPRINT Consortium, H.G.S., M.K., J.A.T., D.R.Z., and W.H.O.; Writing, M.S., B.M.J., and P.F., with contributions from all authors; Supervision, P.F., M.S., M.F., C.W., D.R.Z., O.S., W.H.O., and J.A.T.; Project Administration, M.S., M.F., W.H.O., and P.F.

ACKNOWLEDGMENTS

We thank Stefan Schoenfelder, Takashi Nagano, Stephen Eyre, Jane Worthington, Simon Andrews, and Sach Mukherjee for helpful advice and discussions; Inkyung Jung, Bing Ren, Julia Dmitrieva, and Michel Georges for sharing unpublished observations; and Frank Waldron-Lynch, Helen Stevens, and

Marcin Pekalski for provision of the fetal thymus tissue. We thank Nicole Soranzo, the HaemGen consortium, Benjamin Fairfax, and Julian Knight for sharing GWAS and eQTL data. This work was supported by the following grants: UK Medical Research Council (MR/L007150/1, MC_UP_1302/1, MC_UP_1302/3, MC_UP_1302/5), UK Biotechnology and Biological Sciences Research Council (BB/J004480/1), ERC (DEVOCHROMO advanced grant), JDRF (9-2011-253, 5-SRA-2015-130), Wellcome Trust (089989, 091157, 095908, 100140, 107212, 107881), European Union 7th Framework Programme (FP7/2007-2013, grant agreements 241447 [NAIMIT] and 282510 [BLUEPRINT]), NHS Blood and Transplant, NIHR (PG-0310-1002), and BHF (RG/09/12/28096). K.D. is funded by NHS Health Education England. M.F. is supported by the BHF Cambridge Centre of Excellence (RE/13/6/30180). S.P.W., M.K., D.R.Z., and O.S. are funded by the European Molecular Biology Laboratory. We gratefully acknowledge the participation of all NIHR Cambridge BioResource volunteers and thank the NIHR Cambridge BioResource centre and staff for their contribution. Raw data are shared under managed access in accordance with the ethical consent signed by the volunteers. Recall of Cambridge BioResource volunteers is by application. Processed data have been made publicly available as described in STAR Methods.

Received: June 3, 2016
Revised: September 6, 2016
Accepted: September 22, 2016
Published: November 17, 2016

REFERENCES

Anderson, C.A., Boucher, G., Lees, C.W., Franke, A., D’Amato, M., Taylor, K.D., Lee, J.C., Goyette, P., Imielinski, M., Latiano, A., et al. (2011). Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat. Genet. 43, 246–252.
Auton, A., Brooks, L.D., Durbin, R.M., Garrison, E.P., Kang, H.M., Korbel, J.O., Marchini, J.L., McCarthy, S., McVean, G.A., and Abecasis, G.R.; 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature 526, 68–74.
Barrett, J.C., Clayton, D.G., Concannon, P., Akolkar, B., Cooper, J.D., Erlich, H.A., Julier, C., Morahan, G., Nerup, J., Nierras, C., et al.; Type 1 Diabetes Genetics Consortium (2009). Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nat. Genet. 41, 703–707.
Bentham, J., Morris, D.L., Cunninghame Graham, D.S., Pinder, C.L., Tombleson, P., Behrens, T.W., Martín, J., Fairfax, B.P., Knight, J.C., Chen, L., et al. (2015). Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nat. Genet. 47, 1457–1464.
Blangiardo, M., and Richardson, S. (2007). Statistical tools for synthesizing lists of differentially expressed features in related experiments. Genome Biol. 8, R54.
Blangiardo, M., Cassese, A., and Richardson, S. (2010). sdef: An R package to synthesize lists of significant features in related experiments. BMC Bioinformatics 11, 270.
Cairns, J., Freire-Pritchett, P., Wingett, S.W., Várnai, C., Dimond, A., Plagnol, V., Zerbino, D., Schoenfelder, S., Javierre, B.-M., Osborne, C., et al. (2016). CHiCAGO: Robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol. 17, 127.
Carter, D., Chakalova, L., Osborne, C.S., Dai, Y.-F., and Fraser, P. (2002). Long-range chromatin regulatory interactions in vivo. Nat. Genet. 32, 623–626.
Cheeseman, P., Peter, C., James, K., Matthew, S., John, S., Will, T., and Don, F. (1988). AutoClass: A Bayesian Classification System. In Machine Learning Proceedings 1988, pp. 54–64.
Chen, L., Kostadima, M., Martens, J.H.A., Canu, G., Garcia, S.P., Turro, E., Downes, K., Macaulay, I.C., Bielczyk-Maczynska, E., Coe, S., et al.; BRIDGE Consortium (2014). Transcriptional diversity during lineage commitment of human blood progenitors. Science 345, 1251033.
Cline, M.S., Smoot, M., Cerami, E., Kuchinsky, A., Landys, N., Workman, C., Christmas, R., Avila-Campilo, I., Creech, M., Gross, B., et al. (2007). Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382.
Corces, M.R., Buenrostro, J.D., Wu, B., Greenside, P.G., Chan, S.M., Koenig, J.L., Snyder, M.P., Pritchard, J.K., Kundaje, A., Greenleaf, W.J., et al. (2016). Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203.
Cordell, H.J., Han, Y., Mells, G.F., Li, Y., Hirschfield, G.M., Greene, C.S., Xie, G., Juran, B.D., Zhu, D., Qian, D.C., et al.; Canadian-US PBC Consortium; Italian PBC Genetics Study Group; UK-PBC Consortium (2015). International genome-wide meta-analysis identifies new primary biliary cirrhosis risk loci and targetable pathogenic pathways. Nat. Commun. 6, 8019.
Davison, L.J., Wallace, C., Cooper, J.D., Cope, N.F., Wilson, N.K., Smyth, D.J., Howson, J.M.M., Saleh, N., Al-Jeffery, A., Angus, K.L., et al.; Cardiogenics Consortium (2012). Long-range DNA looping and gene expression analyses identify DEXI as an autoimmune disease candidate gene. Hum. Mol. Genet. 21, 322–333.
Deng, W., Rupon, J.W., Krivega, I., Breda, L., Motta, I., Jahn, K.S., Reik, A., Gregory, P.D., Rivella, S., Dean, A., and Blobel, G.A. (2014). Reactivation of developmentally silenced globin genes by forced chromatin looping. Cell 158, 849–860.
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380.
Dryden, N.H., Broome, L.R., Dudbridge, F., Johnson, N., Orr, N., Schoenfelder, S., Nagano, T., Andrews, S., Wingett, S., Kozarewa, I., et al. (2014). Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C. Genome Res. 24, 1854–1868.
Dubois, P.C.A., Trynka, G., Franke, L., Hunt, K.A., Romanos, J., Curtotti, A., Zhernakova, A., Heap, G.A.R., Adány, R., Aromaa, A., et al. (2010). Multiple common variants for celiac disease influencing immune gene expression. Nat. Genet. 42, 295–302.
Durinck, S., Spellman, P.T., Birney, E., and Huber, W. (2009). Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191.
Ehret, G.B., Munroe, P.B., Rice, K.M., Bochud, M., Johnson, A.D., Chasman, D.I., Smith, A.V., Tobin, M.D., Verwoert, G.C., Hwang, S.J., et al.; International Consortium for Blood Pressure Genome-Wide Association Studies; CARDIoGRAM consortium; CKDGen Consortium; KidneyGen Consortium; EchoGen consortium; CHARGE-HF consortium (2011). Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103–109.
ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74.
Ernst, J., and Kellis, M. (2012). ChromHMM: Automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216.
Estrada, K., Styrkarsdottir, U., Evangelou, E., Hsu, Y.-H., Duncan, E.L., Ntzani, E.E., Oei, L., Albagha, O.M.E., Amin, N., Kemp, J.P., et al. (2012). Genome-wide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci associated with risk of fracture. Nat. Genet. 44, 491–501.
Fabregat, A., Sidiropoulos, K., Garapati, P., Gillespie, M., Hausmann, K., Haw, R., Jassal, B., Jupe, S., Korninger, F., McKay, S., et al. (2016). The Reactome pathway Knowledgebase. Nucleic Acids Res. 44, D481–D487.
Fairfax, B.P., Makino, S., Radhakrishnan, J., Plant, K., Leslie, S., Dilthey, A., Ellis, P., Langford, C., Vannberg, F.O., and Knight, J.C. (2012). Genetics of gene expression in primary immune cells identifies cell type-specific master regulators and roles of HLA alleles. Nat. Genet. 44, 502–510.
Franke, A., McGovern, D.P.B., Barrett, J.C., Wang, K., Radford-Smith, G.L., Ahmad, T., Lees, C.W., Balschun, T., Lee, J., Roberts, R., et al. (2010). Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat. Genet. 42, 1118–1125.
Frankel, N., Davis, G.K., Vargas, D., Wang, S., Payre, F., and Stern, D.L. (2010). Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature 466, 490–493.

Frazer, K.A., Ballinger, D.G., Cox, D.R., Hinds, D.A., Stuve, L.L., Gibbs, R.A., Belmont, J.W., Boudreau, A., Hardenbol, P., Leal, S.M., et al.; International HapMap Consortium (2007). A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861.
Ghavi-Helm, Y., Klein, F.A., Pakozdi, T., Ciglar, L., Noordermeer, D., Huber, W., Furlong, E.E., and Furlong, E.E.M. (2014). Enhancer loops appear stable during development and are associated with paused polymerase. Nature 512, 96–100.
Gieger, C., Radhakrishnan, A., Cvejic, A., Tang, W., Porcu, E., Pistis, G., Serbanovic-Canic, J., Elling, U., Goodall, A.H., Labrune, Y., et al. (2011). New gene functions in megakaryopoiesis and platelet formation. Nature 480, 201–208.
Guo, H., Fortune, M.D., Burren, O.S., Schofield, E., Todd, J.A., and Wallace, C. (2015). Integration of disease association and eQTL data using a Bayesian colocalisation approach highlights six candidate causal genes in immune-mediated diseases. Hum. Mol. Genet. 24, 3305–3313.
Hanscombe, O., Whyatt, D., Fraser, P., Yannoutsos, N., Greaves, D., Dillon, N., and Grosveld, F. (1991). Importance of globin gene order for correct developmental expression. Genes Dev. 5, 1387–1394.
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589.
Heinz, S., Romanoski, C.E., Benner, C., Allison, K.A., Kaikkonen, M.U., Orozco, L.D., and Glass, C.K. (2013). Effect of natural genetic variation on enhancer selection and function. Nature 503, 487–492.
Imakaev, M., Fudenberg, G., McCord, R.P., Naumova, N., Goloborodko, A., Lajoie, B.R., Dekker, J., and Mirny, L.A. (2012). Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003.
Jeffries, C.D., Ward, W.O., Perkins, D.O., and Wright, F.A. (2009). Discovering collectively informative descriptors from high-throughput experiments. BMC Bioinformatics 10, 431.
Jin, F., Li, Y., Dixon, J.R., Selvaraj, S., Ye, Z., Lee, A.Y., Yen, C.A., Schmitt, A.D., Espinoza, C.A., and Ren, B. (2013). A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 503, 290–294.
Jovanovic, D.V., Boumsell, L., Bensussan, A., Chevalier, X., Mancini, A., and Di Battista, J.A. (2011). CD101 expression and function in normal and rheumatoid arthritis-affected human T cells and monocytes/macrophages. J. Rheumatol. 38, 419–428.
de Laat, W., and Duboule, D. (2013). Topology of mammalian developmental enhancers and their regulatory landscapes. Nature 502, 499–506.
Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., et al. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293.
Lippert, C., Casale, F.P., Rakitsch, B., and Stegle, O. (2014). LIMIX: Genetic analysis of multiple traits. bioRxiv.
Locke, A.E., Kahali, B., Berndt, S.I., Justice, A.E., Pers, T.H., Day, F.R., Powell, C., Vedantam, S., Buchkovich, M.L., Yang, J., et al.; LifeLines Cohort Study; ADIPOGen Consortium; AGEN-BMI Working Group; CARDIOGRAMplusC4D Consortium; CKDGen Consortium; GLGC; ICBP; MAGIC Investigators; MuTHER Consortium; MIGen Consortium; PAGE Consortium; ReproGen Consortium; GENIE Consortium; International Endogene Consortium (2015). Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206.
Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550.
Maller, J.B., McVean, G., Byrnes, J., Vukcevic, D., Palin, K., Su, Z., Howson, J.M., Auton, A., Myers, S., Morris, A., et al.; Wellcome Trust Case Control Consortium (2012). Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301.
Manning, A.K., Hivert, M.-F., Scott, R.A., Grimsby, J.L., Bouatia-Naji, N., Chen, H., Rybin, D., Liu, C.-T., Bielak, L.F., Prokopenko, I., et al.; DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium; Multiple Tissue Human Expression Resource (MUTHER) Consortium (2012). A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nat. Genet. 44, 659–669.
Martin, P., McGovern, A., Orozco, G., Duffus, K., Yarwood, A., Schoenfelder, S., Cooper, N.J., Barton, A., Wallace, C., Fraser, P., et al. (2015). Capture Hi-C reveals novel candidate genes and complex long-range interactions with related autoimmune risk loci. Nat. Commun. 6, 10069.
Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J., et al. (2012). Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195.
McLaren, W., Pritchard, B., Rios, D., Chen, Y., Flicek, P., and Cunningham, F. (2010). Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069–2070.
Mifsud, B., Tavares-Cadete, F., Young, A.N., Sugar, R., Schoenfelder, S., Ferreira, L., Wingett, S.W., Andrews, S., Grey, W., Ewels, P.A., et al. (2015). Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 47, 598–606.
Mohrs, M., Blankespoor, C.M., Wang, Z.-E., Loots, G.G., Afzal, V., Hadeiba, H., Shinkai, K., Rubin, E.M., and Locksley, R.M. (2001). Deletion of a coordinate regulator of type 2 cytokine expression in mice. Nat. Immunol. 2, 842–847.
Montojo, J., Zuberi, K., Rodriguez, H., Kazi, F., Wright, G., Donaldson, S.L., Morris, Q., and Bader, G.D. (2010). GeneMANIA Cytoscape plugin: Fast gene function predictions on the desktop. Bioinformatics 26, 2927–2928.
Morris, A.P., Voight, B.F., Teslovich, T.M., Ferreira, T., Segrè, A.V., Steinthorsdottir, V., Strawbridge, R.J., Khan, H., Grallert, H., Mahajan, A., et al.; Wellcome Trust Case Control Consortium; Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) Investigators; Genetic Investigation of ANthropometric Traits (GIANT) Consortium; Asian Genetic Epidemiology Network–Type 2 Diabetes (AGEN-T2D) Consortium; South Asian Type 2 Diabetes (SAT2D) Consortium; DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium (2012). Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990.
Nagano, T., Várnai, C., Schoenfelder, S., Javierre, B.M., Wingett, S.W., Fraser, P., and Peter, F. (2015). Comparison of Hi-C results using in-solution versus in-nucleus ligation. Genome Biol. 16, 175.
Natoli, G., and Andrau, J.-C. (2012). Noncoding transcription at enhancers: General principles and functional models. Annu. Rev. Genet. 46, 1–19.
Okada, Y., Terao, C., Ikari, K., Kochi, Y., Ohmura, K., Suzuki, A., Kawaguchi, T., Stahl, E.A., Kurreeman, F.A.S., Nishida, N., et al. (2012). Meta-analysis identifies nine new loci associated with rheumatoid arthritis in the Japanese population. Nat. Genet. 44, 511–516.
Ormiston, M.L., Toshner, M.R., Kiskin, F.N., Huang, C.J.Z., Groves, E., Morrell, N.W., and Rana, A.A. (2015). Generation and culture of blood outgrowth endothelial cells from human peripheral blood. J. Vis. Exp. 106, e53384.
Peters, J.E., Lyons, P.A., Lee, J.C., Richard, A.C., Fortune, M.D., Newcombe, P.J., Richardson, S., and Smith, K.G.C. (2016). Insight into genotype-phenotype associations through eQTL mapping in multiple cell types in health and immune-mediated disease. PLoS Genet. 12, e1005908.
Rajagopal, N., Srinivasan, S., Kooshesh, K., Guo, Y., Edwards, M.D., Banerjee, B., Syed, T., Emons, B.J.M., Gifford, D.K., and Sherwood, R.I. (2016). High-throughput mapping of regulatory DNA. Nat. Biotechnol. 34, 167–174.
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47.
Sahlén, P., Abdullayev, I., Ramsköld, D., Matskova, L., Rilakovic, N., Lötstedt, B., Albert, T.J., Lundeberg, J., and Sandberg, R. (2015). Genome-wide mapping of promoter-anchored interactions with close to single-enhancer resolution. Genome Biol. 16, 156.
Sanyal, A., Lajoie, B.R., Jain, G., and Dekker, J. (2012). The long-range interaction landscape of gene promoters. Nature 489, 109–113.

Sawcer, S., Hellenthal, G., Pirinen, M., Spencer, C.C., Patsopoulos, N.A., Moutsianas, L., Dilthey, A., Su, Z., Freeman, C., Hunt, S.E., et al.; International Multiple Sclerosis Genetics Consortium; Wellcome Trust Case Control Consortium 2 (2011). Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476, 214–219.
Schoenfelder, S., Furlan-Magaril, M., Mifsud, B., Tavares-Cadete, F., Sugar, R., Javierre, B.M., Nagano, T., Katsman, Y., Sakthidevi, M., Wingett, S.W., et al. (2015). The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements. Genome Res. 25, 582–597.
Schofield, E.C., Carver, T., Achuthan, P., Freire-Pritchett, P., Spivakov, M., Todd, J.A., and Burren, O.S. (2016). CHiCP: A web-based tool for the integrative and interactive visualization of promoter capture Hi-C datasets. Bioinformatics 32, 2511–2513.
Smemo, S., Tena, J.J., Kim, K.-H., Gamazon, E.R., Sakabe, N.J., Gómez-Marín, C., Aneas, I., Credidio, F.L., Sobreira, D.R., Wasserman, N.F., et al. (2014). Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature 507, 371–375.
Stadhouders, R., Aktuna, S., Thongjuea, S., Aghajanirefah, A., Pourfarzad, F., van Ijcken, W., Lenhard, B., Rooks, H., Best, S., Menzel, S., et al. (2014). HBS1L-MYB intergenic variants modulate fetal hemoglobin via long-range MYB enhancers. J. Clin. Invest. 124, 1699–1710.
Wakefield, J. (2009). Bayes factors for genome-wide association studies: Comparison with P-values. Genet. Epidemiol. 33, 79–86.
Waszak, S.M., Delaneau, O., Gschwind, A.R., Kilpinen, H., Raghav, S.K., Witwicki, R.M., Orioli, A., Wiederkehr, M., Panousis, N.I., Yurovsky, A., et al. (2015). Population variation and genetic control of modular chromatin architecture in humans. Cell 162, 1039–1050.
Westra, H.-J., Peters, M.J., Esko, T., Yaghootkar, H., Schurmann, C., Kettunen, J., Christiansen, M.W., Fairfax, B.P., Schramm, K., Powell, J.E., et al. (2013). Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243.
Wingett, S., Ewels, P., Furlan-Magaril, M., Nagano, T., Schoenfelder, S., Fraser, P., and Andrews, S. (2015). HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 4, 1310.
Wood, A.R., Esko, T., Yang, J., Vedantam, S., Pers, T.H., Gustafsson, S., Chu, A.Y., Estrada, K., Luan, J., Kutalik, Z., et al.; Electronic Medical Records and Genomics (eMERGE) Consortium; MIGen Consortium; PAGE Consortium; LifeLines Cohort Study (2014). Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186.
Stahl, E.A., Raychaudhuri, S., Remmers, E.F., Xie, G., Eyre, S., Thomson, B.P., Xu, Z., Wei, G., Chepelev, I., Zhao, K., and Felsenfeld, G. (2011). Mapping of
Li, Y., Kurreeman, F.A.S., Zhernakova, A., Hinks, A., et al.; BIRAC Consortium; INS promoter interactions reveals its role in long-range regulation of SYT8
YEAR Consortium (2010). Genome-wide association study meta-analysis transcription. Nat. Struct. Mol. Biol. 18, 372–378.
identifies seven new rheumatoid arthritis risk loci. Nat. Genet. 42, 508–514. Yang, Z., Matteson, E.L., Goronzy, J.J., and Weyand, C.M. (2015). T-cell meta-
Teslovich, T.M., Musunuru, K., Smith, A.V., Edmondson, A.C., Stylianou, I.M., bolism in autoimmune disease. Arthritis Res. Ther. 17, 29.
Koseki, M., Pirruccello, J.P., Ripatti, S., Chasman, D.I., Willer, C.J., et al.
(2010). Biological, clinical and population relevance of 95 loci for blood lipids. Yu, G., and He, Q.-Y. (2016). ReactomePA: An R/Bioconductor package for
reactome pathway analysis and visualization. Mol. Biosyst. 12, 477–479.
Nature 466, 707–713.
Tolhuis, B., Palstra, R.J., Splinter, E., Grosveld, F., and de Laat, W. (2002). Yu, G., Wang, L.-G., Han, Y., and He, Q.-Y. (2012). clusterProfiler: An R
Looping and interaction between hypersensitive sites in the active beta-globin package for comparing biological themes among gene clusters. OMICS 16,
locus. Mol. Cell 10, 1453–1465. 284–287.
Trynka, G., Westra, H.-J., Slowikowski, K., Hu, X., Xu, H., Stranger, B.E., Klein, Zerbino, D.R., Wilder, S.P., Johnson, N., Juettemann, T., and Flicek, P.R.
R.J., Han, B., and Raychaudhuri, S. (2015). Disentangling the effects of coloc- (2015). The ensembl regulatory build. Genome Biol. 16, 56.
alizing genomic annotations to functionally prioritize non-coding variants
within complex-trait loci. Am. J. Hum. Genet. 97, 139–152. Zerbino, D.R., Johnson, N., Juetteman, T., Sheppard, D., Wilder, S.P., Lavidas,
Turro, E., Su, S.-Y., Gonçalves, Â., Coin, L.J.M., Richardson, S., and Lewin, A. I., Nuhn, M., Perry, E., Raffaillac-Desfosses, Q., Sobral, D., et al. (2016). En-
(2011). Haplotype and isoform specific expression estimation using multi- sembl regulation resources. Database (Oxford) 2016, 2016.
mapping RNA-seq reads. Genome Biol. 12, R13. Zhu, Z., Zhang, F., Hu, H., Bakshi, A., Robinson, M.R., Powell, J.E., Montgom-
van der Harst, P., Zhang, W., Mateo Leach, I., Rendon, A., Verweij, N., Sehmi, ery, G.W., Goddard, M.E., Wray, N.R., Visscher, P.M., and Yang, J. (2016).
J., Paul, D.S., Elling, U., Allayee, H., Li, X., et al. (2012). Seventy-five genetic Integration of summary data from GWAS and eQTL studies predicts complex
loci influencing the human red blood cell. Nature 492, 369–375. trait gene targets. Nat. Genet. 48, 481–487.

STAR+METHODS
KEY RESOURCES TABLE

Antibodies
CD41 BD cat#555466
CD42 IBGRL cat#9448
CD71 BD cat#551374
CD235 BD cat#555570
CD66b-FITC IBGRL cat#9453FL
CD16PE Miltenyi cat#130-091-245
CD14-FITC BD cat#345784
CCR7-FIT BD cat#5612271
CD25-PE Miltenyi cat#120-001-311
CD14-PEcy5.5 Invitrogen cat#MCHD1418
CD40-PEcy7 BD cat#561215
CD206-P BD cat#555954
CD36-FITC Southern Biotech cat#9605-02
CD45-PEcy5.5 Invitrogen cat#MCHD4518
CD27 PE BD cat#555441
IgD-FITC BD cat#555778
CD19 APC BD cat#555415
CD45ra PE BD cat#555489
CD3 BD cat#555332
CD8 BD cat#555367
CD62L BD cat#559772
CD45RA BD cat#555489
CD3-brilliant violet 421 Biolegend cat#300434
CD4-BUV395 BD cat#563550
CD45RA-brilliant violet 785 Biolegend cat#304123
CD25-APC BD cat#555434 and cat#340907
CD127-PECy7 Biolegend cat#351320
CD62L-brilliant violet 605 BD cat#562719
CD34 microbead kit Miltenyi cat#130-046-702
Monocyte negative selection isolation kit StemCell technologies cat#19059
EasySep human naive B cell enrichment kit StemCell technologies cat#19254
EasySep human B cell enrichment kit StemCell technologies cat#19054
EasySep custom kit for Naive CD4 StemCell technologies cat#19309
EasySep human CD4+ T cell enrichment kit StemCell technologies cat#19052
RosetteSep Human CD4+ T cell enrichment cocktail StemCell technologies cat#15022
Dynabeads Human T activator CD3/CD28 beads Thermofisher cat#111.31D
EasySep Human Naive CD8+ T cell enrichment kit StemCell technologies cat#19158
EasySep Human CD8+ T cell enrichment kit StemCell technologies cat#19053
Quant-iT PicoGreen dsDNA Assay Kit Thermofisher cat#P7589
SureSelectXT Custom 3-5.9Mb library Agilent Technologies cat#5190-4831
SSEL TE Reagent Kit, ILM PE full adaptor Agilent Technologies cat#931108
SureSelectXT Custom 1Kb-499kb library Agilent Technologies cat#5190-4806
Deposited Data
Raw Promoter Capture Hi-C and reciprocal capture Hi-C data This study EGA: EGAS00001001911
Processed data generated in this study This study https://osf.io/u8tzp
BLUEPRINT raw gene expression data BLUEPRINT project EGA: EGAS00001000327
H3K4me3 ChIP-seq in human CD20+ cells ENCODE project https://www.encodeproject.org/experiments/ENCSR000DQR/ENCFF001WXC
DNase-seq in human naive CD4+ cells ENCODE project https://www.encodeproject.org/experiments/ENCSR000EML/
Histone modification ChIP data (GRCh37-based release) BLUEPRINT project ftp://ftp.ebi.ac.uk/pub/databases/blueprint/data/homo_sapiens/GRCh37/
Ensembl regulatory build Zerbino et al., 2015 ftp://ftp.ebi.ac.uk/pub/contrib/pchic/hg19/overview/RegBuild.bb
ATAC-seq data Corces et al., 2016 GEO: GSE74912
Monocyte and B cell eQTL data Fairfax et al., 2012 EGA: EGAS00000000109; ArrayExpress: E-MTAB-2232
Whole blood eQTL data Westra et al., 2013 http://genenetwork.nl/bloodeqtlbrowser/2012-12-21-CisAssociationsProbeLevelFDR0.5.zip
Blood trait GWAS summary data Gieger et al., 2011; van der Harst et al., 2012 Obtained from authors
Autoimmune disease GWAS summary data Anderson et al., 2011; Barrett et al., 2009; Bentham et al., 2015; Cordell et al., 2015; Dubois et al., 2010; Franke et al., 2010; Sawcer et al., 2011; Stahl et al., 2010 http://www.immunobase.org
Type 2 diabetes GWAS summary data Morris et al., 2012 http://diagram-consortium.org/downloads.html
Height GWAS summary data Wood et al., 2014 https://www.broadinstitute.org/collaboration/giant/images/0/01/GIANT_HEIGHT_Wood_et_al_2014_publicrelease_HapMapCeuFreq.txt.gz
Triglycerides GWAS summary data Teslovich et al., 2010 http://csg.sph.umich.edu/abecasis/public/lipids2010/TG2010.zip
High density lipoprotein GWAS summary data Teslovich et al., 2010 http://csg.sph.umich.edu/abecasis/public/lipids2010/HDL2010.zip
Low density lipoprotein GWAS summary data Teslovich et al., 2010 http://csg.sph.umich.edu/abecasis/public/lipids2010/LDL2010.zip
Total Cholesterol GWAS summary data Teslovich et al., 2010 http://csg.sph.umich.edu/abecasis/public/lipids2010/TC2010.zip
Glucose sensitivity BMI adjusted GWAS summary data Manning et al., 2012 ftp://ftp.sanger.ac.uk/pub/magic/MAGIC_Manning_et_al_FastingGlucose_MainEffect.txt.gz
Glucose sensitivity GWAS summary data Manning et al., 2012 ftp://ftp.sanger.ac.uk/pub/magic/MAGIC_Manning_et_al_FastingGlucose_MainEffect.txt.gz
Insulin sensitivity BMI adjusted GWAS summary data Manning et al., 2012 ftp://ftp.sanger.ac.uk/pub/magic/MAGIC_Manning_et_al_FastingGlucose_MainEffect.txt.gz
Insulin sensitivity GWAS summary data Manning et al., 2012 ftp://ftp.sanger.ac.uk/pub/magic/MAGIC_Manning_et_al_FastingGlucose_MainEffect.txt.gz
Femoral neck bone mineral density GWAS summary data Estrada et al., 2012 http://www.gefos.org/sites/default/files/GEFOS2_FNBMD_POOLED_GC.txt.gz
Lumbar spine bone mineral density GWAS summary data Estrada et al., 2012 http://www.gefos.org/sites/default/files/GEFOS2_LSBMD_POOLED_GC.txt.gz
Diastolic blood pressure GWAS summary data Ehret et al., 2011 http://www.georgehretlab.org/ICBP-summary-Nature.csv.gz
Systolic blood pressure GWAS summary data Ehret et al., 2011 http://www.georgehretlab.org/ICBP-summary-Nature.csv.gz
Body Mass Index GWAS summary data Locke et al., 2015 https://www.broadinstitute.org/collaboration/giant/images/1/15/SNP_gwas_mc_merge_nogc.tbl.uniq.gz
HiCUP Wingett et al., 2015 http://www.bioinformatics.babraham.ac.uk/projects/hicup
HOMER Heinz et al., 2010 http://homer.salk.edu/homer/
CHiCAGO: calling interactions and computing feature enrichment at PIRs Cairns et al., 2016 http://regulatorygenomicsgroup.org/chicago
Sdef method Blangiardo et al., 2010 https://cran.r-project.org/web/packages/sdef
Autoclass Bayesian clustering Cheeseman et al., 1988 https://ti.arc.nasa.gov/tech/rse/synthesis-projects-applications/autoclass/autoclass-c/
Specificity score computation This paper https://github.com/Steven-M-Hill/PCHiC-specificity-score-analysis
chromHMM Ernst and Kellis, 2012 http://compbio.mit.edu/ChromHMM/
DESeq2 Love et al., 2014 https://www.bioconductor.org/packages/DESeq2
Ensembl Regulatory Build process Zerbino et al., 2015 http://www.ensembl.org/info/genome/funcgen/regulatory_build.html
MMSEQ Turro et al., 2011 https://github.com/eturro/mmseq
LIMIX Lippert et al., 2014 https://github.com/PMBio/limix
Poor man’s imputation This paper https://github.com/ollyburren/CHIGP
Blockshifter This paper https://github.com/ollyburren/CHIGP
COGS algorithm This paper https://github.com/ollyburren/CHIGP
Wakefield’s synthesis of approximate Bayes factors Wakefield, 2009 https://github.com/ollyburren/CHIGP
GeneMania 3.4.0 plugin Montojo et al., 2010 http://genemania.org/plugin
Cytoscape 3.3.0 Cline et al., 2007 http://www.cytoscape.org
biomaRt Durinck et al., 2009 https://www.bioconductor.org/packages/biomaRt
ReactomePA Yu and He, 2016 https://www.bioconductor.org/packages/ReactomePA
clusterProfiler Yu et al., 2012 https://www.bioconductor.org/packages/clusterProfiler
VEP McLaren et al., 2010 https://github.com/Ensembl/ensembl-tools
As Lead Contact, Mikhail Spivakov is responsible for all reagent and resource requests. Please contact Mikhail Spivakov at
mikhail.spivakov@babraham.ac.uk with requests and inquiries. Raw data are shared under managed access in accordance with the ethical
consent signed by the volunteers. Recall of Cambridge BioResource volunteers is by application. Processed data have been made
publicly available as described below.
Human primary blood cells were obtained from either a single healthy donor (Mon, Neu, M40 (2/3 reps), M41, M42 (1/3 reps), Ery,
EndP, nCD4 (1/4 reps), tCD4, tCD8 (2/3 reps), tB, FetT) or pooled from multiple healthy donors (MK, M40 (1/3 reps), M42 (2/3 reps),
nCD4 (3/4 reps), naCD4, aCD4, nCD8, tCD8 (1/3 reps), nB). The samples were obtained after written informed consent under study
titles ‘‘A Blueprint of Blood Cells,’’ REC reference 12/EE/0040, and ‘‘Genes and mechanisms in type 1 diabetes in the Cambridge
BioResource,’’ REC reference 05/Q0106/20; NRES Committee East of England – Cambridgeshire and Hertfordshire.
METHOD DETAILS
Cell Isolation and Purity Test

Cells were isolated from venous or cord blood and, in some cases, cultured and differentiated in vitro following standard BLUEPRINT
protocols, as detailed below. Purity was confirmed by flow cytometry or morphological examination.
Monocytes were isolated from venous blood after CD16+ depletion and CD14+ selection of peripheral blood mononuclear
cells (PBMCs) using Miltenyi Biotec kits, as described in detail at
http://www.blueprint-epigenome.eu/UserFiles/file/Protocols/UCAM_BluePrint_Monocyte.pdf. Neutrophils were isolated from venous
blood after erythrocyte lysis and CD16+ selection using Miltenyi Biotec kits. Macrophages were differentiated in vitro from
monocytes isolated from venous blood. Briefly, M0 resting macrophages were obtained after stimulation of monocytes with
50 ng/ml M-CSF for 7 days. M1 inflammatory macrophages were obtained after stimulation of monocytes with 50 ng/ml M-CSF for
6 days followed by LPS alone at 100 ng/ml for the last 18 hours. M2 anti-inflammatory macrophages were obtained after stimulation
of monocytes with 15 ng/ml IL-13 and 0.1 µM rosiglitazone. See
http://www.blueprint-epigenome.eu/UserFiles/file/Protocols/UCAM_BluePrint_Macrophage.pdf for full details.
Erythroblasts and megakaryocytes were cultured from CD34+ cells isolated from cord blood mononuclear cells with the
human CD34 isolation kit (Miltenyi Biotec) as described previously (Chen et al., 2014). Erythroblasts were cultured with erythropoietin, SCF
and IL-3 for 14 days, while megakaryocytes were obtained by culturing CD34+ cells with thrombopoietin and IL-1β for 10 days.
Endothelial precursors (blood outgrowth endothelial cells (BOECs)) were generated from circulating endothelial progenitors in
adult peripheral blood after long-term culturing of PBMCs with endothelial cell growth medium and colony isolation (Ormiston
et al., 2015).
Naive CD4+ lymphocytes were obtained from PBMCs from venous blood using a custom kit (Catalog#19309) from STEMCELL
Technologies. Total CD4+ lymphocytes were obtained from PBMCs from venous blood by negative selection using EasySep Human
CD4+ T Cell Enrichment kit (Catalog#19052) from STEMCELL Technologies.
Activated and non-activated total CD4+ T cells were enriched from whole blood using RosetteSep human CD4+ T cell enrichment
cocktail according to the manufacturer’s protocol (STEMCELL Technologies, Vancouver, Canada). The enriched CD4+ T cell culture
was washed twice in X-VIVO-15 media (Lonza, Basel, Switzerland) supplemented with 1% human AB serum (Lonza) and penicillin/
streptomycin (GIBCO, ThermoFisher). 250,000 CD4+ T cells (93–99% pure) were stimulated with anti-CD3/CD28 T cell activator
beads (Dynal, ThermoFisher). Beads were added at a ratio of 0.3 beads / 1 CD4+ T cell (75,000 beads / well) and the cells ± beads
were cultured for 4 hr at 37°C and 5% CO2.
Naive CD8+ lymphocytes were obtained from PBMCs from venous blood by negative selection using EasySep Human Naive CD8+
T Cell Enrichment kit (Catalog#19158) from STEMCELL Technologies. Total CD8+ lymphocytes were obtained from PBMCs from
venous blood by negative selection using EasySep Human CD8+ T cell Enrichment kit (Catalog#19053) from STEMCELL Techno-
logies. Naive B lymphocytes were obtained from PBMCs from venous blood by negative selection using EasySep Naive B Cell
Enrichment kit (Catalog#19254) from STEMCELL Technologies. Total B lymphocytes were obtained from PBMCs from venous blood
by negative selection using EasySep Human B cell Enrichment kit (Catalog#19054) from STEMCELL Technologies. Fetal thymus
cells were obtained by disaggregation of fetal thymus tissue that was sourced from Advanced Bioscience Resources
(Alameda, CA, USA), processed and banked in accordance with UK Human Tissue Act 2004. Ficoll isolation was used to select
healthy cells.
Cell Fixation
8 × 10^7 cells per library were resuspended in 30.625 ml of DMEM supplemented with 10% FBS, and 4.375 ml of formaldehyde was
added (16% stock solution; 2% final concentration). The fixation reaction continued for 10 min at room temperature with mixing and
was then quenched by the addition of 5 ml of 1 M glycine (125 mM final concentration). Cells were incubated at room temperature for
5 min and then on ice for 15 min. Cells were pelleted by centrifugation at 400 × g for 10 min at 4°C, and the supernatant was discarded.
The pellet was washed briefly in cold PBS, and samples were centrifuged again to pellet the cells. The supernatant was removed, and
the cell pellets were flash frozen in liquid nitrogen and stored at −80°C.
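The stated final concentrations can be checked with a short dilution calculation. The sketch below is illustrative only; the helper name `final_concentration` is hypothetical, and the calculation assumes simple additive volumes.

```python
def final_concentration(stock_conc, stock_vol, start_vol):
    """Concentration after adding stock_vol of stock to start_vol of diluent."""
    return stock_conc * stock_vol / (start_vol + stock_vol)

# Formaldehyde: 4.375 ml of 16% stock added to 30.625 ml of medium.
formaldehyde_pct = final_concentration(16.0, 4.375, 30.625)

# Glycine: 5 ml of 1 M (1000 mM) stock added to the 35 ml fixation reaction.
glycine_mM = final_concentration(1000.0, 5.0, 30.625 + 4.375)

print(f"formaldehyde: {formaldehyde_pct:.2f}%")
print(f"glycine: {glycine_mM:.0f} mM")
```

Both values agree with the concentrations quoted in the protocol (2% formaldehyde, 125 mM glycine).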
Hi-C Library Preparation

Hi-C library generation was carried out with in-nucleus ligation as described previously (Nagano et al., 2015). Chromatin was then
de-crosslinked and purified by phenol-chloroform extraction. DNA concentration was measured using Quant-iT PicoGreen (Life
Technologies), and 40 µg of DNA was sheared to an average size of 400 bp, following the manufacturer’s instructions (Covaris). The sheared
DNA was end-repaired, adenine-tailed and double size-selected using AMPure XP beads to isolate DNA ranging from 250 to 550 bp.
Ligation fragments marked by biotin were immobilized using MyOne Streptavidin C1 DynaBeads (Invitrogen) and ligated to
paired-end adaptors (Illumina). The immobilized Hi-C libraries were amplified using PE PCR 1.0 and PE PCR 2.0 primers (Illumina)
with 7–8 PCR amplification cycles.

Biotinylated RNA Bait Library Design
Biotinylated 120-mer RNA baits were designed to the ends of HindIII restriction fragments that overlap Ensembl-annotated pro-
moters of protein-coding, noncoding, antisense, snRNA, miRNA and snoRNA transcripts (Mifsud et al., 2015). A target sequence
was accepted if its GC content ranged between 25% and 65%, the sequence contained no more than two consecutive Ns and
was within 330 bp of the HindIII restriction fragment terminus. A total of 22,076 HindIII fragments were captured, containing a total
of 31,253 annotated promoters for 18,202 protein-coding and 10,929 non-protein genes according to Ensembl v.75 (http://grch37.
ensembl.org).
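The acceptance criteria for bait target sequences can be expressed as a simple filter. The following is a minimal sketch rather than the design pipeline actually used: the function name and arguments are hypothetical, GC content is assumed to be computed over the full sequence length (including any Ns), and "within 330 bp" is assumed to be inclusive.

```python
import re

def accept_bait_target(seq, dist_to_terminus_bp,
                       gc_min=0.25, gc_max=0.65, max_n_run=2, max_dist_bp=330):
    """Return True if a candidate bait target passes the stated filters."""
    seq = seq.upper()
    if not seq:
        return False
    # GC fraction over the whole sequence (assumption: Ns count toward length).
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    # Longest run of consecutive N bases.
    longest_n_run = max((len(run) for run in re.findall(r"N+", seq)), default=0)
    return (gc_min <= gc <= gc_max
            and longest_n_run <= max_n_run
            and dist_to_terminus_bp <= max_dist_bp)

print(accept_bait_target("ACGT" * 30, 100))   # balanced 120-mer close to the terminus
print(accept_bait_target("AT" * 60, 100))     # GC content too low
```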
PCHi-C
Capture Hi-C of promoters was carried out with SureSelect target enrichment, using the custom-designed biotinylated RNA bait
library and custom paired-end blockers according to the manufacturer’s instructions (Agilent Technologies). After library enrich-
ment, a post-capture PCR amplification step was carried out using PE PCR 1.0 and PE PCR 2.0 primers with 4 PCR amplification
cycles.
Sequencing
Hi-C and PCHi-C libraries were sequenced on the Illumina HiSeq2500 platform. 3 sequencing lanes per PCHi-C library and 1
sequencing lane per Hi-C library were used.
Hi-C and PCHi-C Sequence Alignment

Raw sequencing reads were processed using the HiCUP pipeline (Wingett et al., 2015), which maps the positions of di-tags against
the human genome (GRCh37), filters out experimental artifacts, such as circularized reads and re-ligations, and removes all duplicate
reads. Library statistics are presented in Table S1.
Hi-C Data Processing and the Definition of TAD Boundaries

Aligned Hi-C data were analyzed using HOMER (Heinz et al., 2010). Using binned Hi-C data, we computed the coverage- and
distance-related background in the Hi-C data at 25kb, 100kb and 1Mb resolutions, based on an iterative correction algorithm (Imakaev
et al., 2012). General genome organization in the eight selected cell types was compared by plotting the distance-and-coverage
corrected Hi-C matrices at 1Mb resolution, and by computing the compartment signal-related (1st or 2nd) principal component of the
distance-and-coverage corrected interaction profile correlation matrix (Lieberman-Aiden et al., 2009) at 100kb resolution, with
positive values aligned with H3K4me3 ChIP-seq in human CD20+ cells
(https://www.encodeproject.org/experiments/ENCSR000DQR/ENCFF001WXC). The compartment signal for the selected cell types in each
replicate was plotted for comparison, and the genome-wide concatenated ChIP-seq-aligned principal components were clustered using
hierarchical clustering (using 1 − Pearson correlation as the distance metric). Directionality indices (Dixon et al., 2012) were
calculated from the number of interactions 1Mb upstream and downstream using a 25kb sliding window moved in 5kb steps, and were
smoothed using a ±25kb window. Topological domain (TAD) boundaries were called between consecutive negative and positive local
extrema of the smoothed directionality indices with a standard score above 0.5. For each analyzed cell type, TADs called on
individual biological replicates were merged by taking the mean of the TAD boundary genome locations; TADs showing an overlap of
less than 75% between biological replicates were removed from the analysis.
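For orientation, the directionality index can be sketched as follows, using the form given by Dixon et al. (2012), where A and B are the interaction counts upstream and downstream of a bin and E = (A + B)/2. This is an illustration only: the analysis here was run in HOMER, whose implementation may differ in detail, and the function names and the moving-average smoother are assumptions.

```python
def directionality_index(upstream, downstream):
    """Directionality index for one bin, in the form of Dixon et al. (2012)."""
    a, b = float(upstream), float(downstream)
    if a == b:  # no bias (this also covers a == b == 0)
        return 0.0
    e = (a + b) / 2.0
    sign = 1.0 if b > a else -1.0
    return sign * ((a - e) ** 2 / e + (b - e) ** 2 / e)

def smooth(values, half_window):
    """Centered moving average over +/- half_window bins, truncated at the ends."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

# Counts of interactions 1Mb upstream / downstream for a few consecutive bins.
counts = [(30, 10), (25, 15), (20, 20), (15, 25), (10, 30)]
di = [directionality_index(a, b) for a, b in counts]
# With 5kb steps, a +/-25kb smoothing window would span 5 bins on each side.
print(smooth(di, half_window=1))
```

A switch from negative (upstream-biased) to positive (downstream-biased) values, as in this toy series, is the pattern between which a boundary would be called.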
PCHi-C Interaction Calling

Interaction confidence scores were computed using the CHiCAGO pipeline (Cairns et al., 2016). Briefly, CHiCAGO calls interactions
based on a convolution background model reflecting both ‘Brownian’ (real, but expected interactions) and ‘technical’ (assay and
sequencing artifacts) components. The resulting p values are adjusted using a weighted false discovery control procedure that spe-
cifically accommodates the fact that increasingly larger numbers of tests are performed at regions where progressively smaller
numbers of interactions are expected. The weights were learned based on the decrease of the reproducibility of interaction calls
between the individual replicates of macrophage samples with distance. Interaction scores were then computed for each fragment
pair as –log-transformed, soft-thresholded, weighted p values. Interactions with a CHiCAGO score ≥ 5 in at least one cell type were
considered as high-confidence interactions.
Reciprocal Capture CHi-C

A capture system containing 949 PIRs identified in the PCHi-C experiments in at least one of the following cell types: activated, non-
activated CD4+ T cells, erythroblasts, and monocytes was used to probe the Hi-C material in these cell types. Data processing and
interaction detection were performed in the same way as for PCHi-C.

Comparing PCHi-C and Reciprocal Capture Hi-C
Determining consistent signals between genomics datasets is a non-trivial problem that requires leveraging both false-positive and
false-negative rates (Blangiardo and Richardson, 2007; Jeffries et al., 2009), particularly in undersampled datasets such as PCHi-C
(Cairns et al., 2016). Here we took advantage of the sdef method (Blangiardo et al., 2010) to determine the so-called q2 thresholds on
CHiCAGO interaction scores that minimize the global misclassification error by balancing sensitivity and specificity. The q2 thresh-
olds (Ery: 0.27; MK: 0.14; nCD4: 1.23; aCD4: 1.20) were below 5 in all cases, indicating that the consistency range between PCHi-C
and reciprocal capture Hi-C datasets extends considerably below the high-confidence threshold used throughout the study (as also
evident from Figure S2A). The proportions of high-confidence interactions called in PCHi-C (CHiCAGO score ≥ 5) that fell within the
consistency range in the reciprocal capture (score ≥ q2 in both experiments) were 96.3% (Ery), 98.7% (MK), 92.9% (nCD4), and
91.6% (aCD4), respectively.
Promoter Interaction Localization with Respect to TADs

High-confidence PCHi-C interactions (CHiCAGO score ≥ 5) were classified as either “within-TAD” or “TAD boundary-crossing”
(only interactions with baits located within TAD boundaries were considered in the analysis). Localization expected at random
was estimated by randomly reshuffling the distances between baits and the TAD boundaries on both their flanks across baits,
thus preserving the overall structure of promoter interactions and bait positioning within TADs.
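The classification step can be sketched as below. This is an illustration under simplifying assumptions, not the code used in the study: TADs are represented as sorted, non-overlapping (start, end) intervals on a single chromosome, baits and PIRs are reduced to single coordinates, and the function name is hypothetical. The random reshuffling of bait-to-boundary distances is not shown.

```python
from bisect import bisect_right

def classify_interaction(bait_pos, pir_pos, tads):
    """Classify one bait-PIR interaction relative to TADs.

    tads: sorted, non-overlapping (start, end) tuples on one chromosome.
    Returns 'within-TAD', 'TAD boundary-crossing', or None when the bait
    lies outside every TAD (such interactions are excluded).
    """
    starts = [start for start, _ in tads]
    k = bisect_right(starts, bait_pos) - 1  # last TAD starting at or before the bait
    if k < 0 or bait_pos > tads[k][1]:
        return None
    start, end = tads[k]
    return "within-TAD" if start <= pir_pos <= end else "TAD boundary-crossing"

tads = [(0, 100), (150, 300)]
print(classify_interaction(50, 80, tads))    # both ends in the first TAD
print(classify_interaction(50, 200, tads))   # PIR lies in a different TAD
print(classify_interaction(120, 50, tads))   # bait outside all TADs
```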
Interaction Clustering and Principal Component Analysis

Interactions with a CHiCAGO score ≥ 5 in at least one cell type were clustered by the Bayesian algorithm “autoclass”
(Cheeseman et al., 1988) based on the full range of asinh-transformed CHiCAGO scores in each cell type. The algorithm was trained on a
sample of 30,000 interactions, and then used in the “predict” mode to classify the complete dataset. The relative error parameter
was set to 0.1. This resulted in 34 clusters, with cluster sizes ranging from 108,066 interactions to 12 interactions and a mean
cluster size of 21,436 interactions. Clustering of the cell types based on their interaction profiles was performed using a
hierarchical algorithm with average linkage, based on Euclidean distances. Principal component analysis was performed using the prcomp
function in R.
Definition of Specificity Scores

Consider a set of cell types $I$. Let $x_i$ denote the measured value of a quantitative property (such as CHiCAGO interaction score or gene
expression level) for cell type $i \in I$. Then, the specificity score $s_c$ for a given cell type $c \in I$ is a weighted mean of the
differences $x_c - x_i$ for $i \neq c$,

$$s_c = \frac{1}{\sum_{i \neq c} d_{c,i}} \sum_{i \neq c} d_{c,i}\,(x_c - x_i)$$

where the weights $d_{c,i}$ are distances between cell type $c$ and cell types $i$, calculated using the complete dataset (e.g., CHiCAGO
interaction scores for all interactions or expression values for all genes; distances calculated using Euclidean distance
metric). The distance weights are introduced to account for imbalances in the distances between cell types. For example,
among the cell types considered here are three types of macrophages that are likely to have very similar profiles of the
measured property compared with other analyzed cell types (and so the distances between macrophage samples will also
be smaller than between macrophages and other cell types). The distance weights focus the calculation of $s_c$ on cell types
that are relatively more distant from cell type $c$. In this example therefore, they will result in the calculation of $s_c$ for each
type of macrophage placing relatively little weight on the other types of macrophages. Without this weighting, specificity scores
for macrophages would be smaller on average simply because macrophages are over-represented among the cell types
considered.
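The definition above can be illustrated with a small worked example. This is a minimal sketch, not the published implementation (which is referenced in the Key Resources Table): the function names, the dictionary-based data layout, and the toy one-dimensional profiles are all assumptions made for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def specificity_scores(x, profiles):
    """Distance-weighted specificity score for every cell type.

    x:        {cell_type: value} for one interaction or gene.
    profiles: {cell_type: full-dataset vector}, used only for the weights.
    Assumes no two cell types have identical profiles (non-zero weight sum).
    """
    scores = {}
    for c in x:
        others = [i for i in x if i != c]
        d = {i: euclidean(profiles[c], profiles[i]) for i in others}
        scores[c] = sum(d[i] * (x[c] - x[i]) for i in others) / sum(d.values())
    return scores

# Toy data: two similar macrophage types and one distant neutrophil profile.
profiles = {"M1": [0.0], "M2": [1.0], "Neu": [10.0]}
x = {"M1": 5.0, "M2": 5.0, "Neu": 0.0}
print(specificity_scores(x, profiles))
```

In this toy case the unweighted mean difference for M1 would be 2.5, whereas the distance-weighted score is 50/11 (about 4.55), because the near-identical M2 profile contributes little weight.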
Calculation of Cluster Specificity Scores

For a given Autoclass cluster (Figure 2B), a specificity score $s_c$ was calculated for each cell type $c$ using the equation above, with $x_i$
defined as the mean asinh-transformed CHiCAGO score for cell type $i$ (mean calculated across all interactions in the given cluster).
The distance weights $d_{c,i}$ were calculated based on the full set of CHiCAGO interaction scores. These cluster specificity
scores are shown in Figure 2C.
ATAC-Seq Data Analysis

Processed count data were downloaded from GEO (accession GSE74912). Samples were normalized using DESeq2 (Love et al.,
2014) and the mean normalized counts across replicates were computed for each sample. Regions with mean normalized counts in the
top 10% for each cell type were considered for PIR enrichment analysis. Enrichment at PIRs was computed using the
peakEnrichment4Features function in the CHiCAGO package (Cairns et al., 2016) with respect to randomized PIRs generated so
as to preserve the distribution of PIR distances to promoters.

Histone Modification ChIP and the Definition of Chromatin States
Processed histone modification ChIP-seq datasets were downloaded from the BLUEPRINT project (the January 2015 GRCh37-
based release, ftp://ftp.ebi.ac.uk/pub/databases/blueprint/data/homo_sapiens/GRCh37/). Histone modification enrichment at
PIRs was computed using the peakEnrichment4Features function in the CHiCAGO package (Cairns et al., 2016) with respect to
randomized PIRs generated so as to preserve the distribution of PIR distances to promoters. To form genome segmentations,
ChromHMM (Ernst and Kellis, 2012) was applied to all BLUEPRINT samples with full reference epigenome histone modification align-
ment files, using default settings and defining 25 epigenetic states. This dataset was used as the basis for the Ensembl Regulatory
Build process (Zerbino et al., 2015), defining regulatory features based on the histone profiles (transcription start site, proximal
enhancer, distal enhancer), and also assigning activity statuses based on sample-specific experiments (active, poised, repressed,
inactive) (Zerbino et al., 2016). Baits and PIRs were then overlapped with Ensembl Regulatory Build regulatory features.
Dynamics of Enhancer-Promoter Interactions

Hierarchical clustering was conducted on the presence or absence of high-confidence interactions (CHiCAGO score ≥ 5) and distal
enhancer activity defined as presented above, using binary distance and complete linkage. Enrichment was calculated as observed
over expected, where observed is the number of active distal enhancers overlapping PIRs, and expected is the expected number
under the null model of no association between enhancer activity and the presence of an interaction.
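One common way to obtain the expected count under a no-association null is from the marginal totals, as sketched below. The source does not spell out how the expectation was computed, so this independence-based formula and the function and argument names are assumptions.

```python
def observed_over_expected(n_both, n_interacting, n_active, n_total):
    """Observed/expected ratio under independence of activity and interaction.

    n_both:        active distal enhancers that overlap PIRs (observed)
    n_interacting: all distal enhancers that overlap PIRs
    n_active:      all active distal enhancers
    n_total:       all distal enhancers considered
    """
    expected = n_interacting * n_active / n_total
    return n_both / expected

# Toy numbers: 30 observed versus 20 expected gives a 1.5-fold enrichment.
print(observed_over_expected(30, 100, 200, 1000))
```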
For the analyses in Figures 3E and 3F, one representative BLUEPRINT sample was selected for each cell type to avoid double count-
ing interactions. A bait fragment was labeled ‘‘active’’ if it overlapped at least one promoter regulatory element in the chromHMM-
defined active state, and a PIR was labeled as ‘‘active’’ if it overlapped at least one distal enhancer in the chromHMM-defined active
state. Promoters and PIRs in all other states, including poised, repressed and inactive were considered as ‘‘non-active.’’ Removing
enhancers in the chromHMM-defined inactive state from the analysis in Figure 3F and considering only poised and repressed en-
hancers as non-active led to the same conclusions (overdispersion-adjusted p value = 0.0016; data not shown).
Sets were formed of overlapping promoter features and baits, and overlapping distal enhancers and PIRs. 2×2 contingency tables
were generated by summarizing these sets: either the full set (Figure 3E) or the subset where at least one cell type has a
high-confidence interaction between an active promoter and an active distal enhancer (Figure 3F). The p values for the null hypotheses of
independence between interaction state and regulatory state were calculated by the χ² test. Overdispersion was expected in the
underlying null distribution due to correlated observations arising from the shared baits of multiple interactions. Block bootstrapping
was therefore performed to estimate overdispersion by resampling baited fragments with replacement, and the observed χ² statistic
was scaled by a factor of sqrt(2) divided by the square root of the variance of the 1000 bootstrap-resampled χ² statistics.
Relationship between Active Enhancers and Gene Expression

BLUEPRINT gene expression data were obtained from EGA (https://www.ebi.ac.uk/ega, EGA: EGAS00001000327) and processed
as previously described (Chen et al., 2014), with quantification performed using MMSEQ (Turro et al., 2011). The data were then
filtered so that the Regulatory Build promoter feature was within 500 bp upstream and 50 bp downstream of an annotated transcrip-
tion start site for the gene. Only genes with active promoters in all BLUEPRINT samples were used in this analysis, to remove the large
effect of promoter status on gene expression. A linear model was fitted by robust regression using iterated reweighted least-squares,
where the gene expression was modeled by either the number of interacting active enhancers (Figure 4A), or the number of any
interacting PIRs and the fraction of interacting active enhancers (Figure S4A).
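Robust regression by iteratively reweighted least squares can be sketched for a single predictor as below. The source does not state the weight function or scale estimate, so the Huber weights, the tuning constant k = 1.345, and the MAD-based scale used here are assumptions, as are the function names.

```python
import statistics

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def irls_huber(x, y, k=1.345, n_iter=50, tol=1e-10):
    """Robust line fit by iteratively reweighted least squares with Huber weights."""
    intercept, slope = weighted_line(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(n_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal data.
        scale = statistics.median(abs(r) for r in resid) / 0.6745
        if scale == 0:
            break
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        new_intercept, new_slope = weighted_line(x, y, w)
        converged = abs(new_slope - slope) + abs(new_intercept - intercept) < tol
        intercept, slope = new_intercept, new_slope
        if converged:
            break
    return intercept, slope

# Toy example: expression rises by 0.5 per interacting active enhancer, plus one outlier.
n_enhancers = list(range(10))
expression = [2 + 0.5 * n for n in n_enhancers]
expression[9] = 100.0
print(irls_huber(n_enhancers, expression))
```

Down-weighting large residuals limits the influence of a few extreme genes on the fitted slope, which is the usual motivation for preferring a robust fit over ordinary least squares.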
Calculation and Clustering of Gene Specificity Scores (Interactions with Active Enhancers)
We quantified the cell type-specificity of each gene’s interactions with active enhancers through calculation of gene specificity
scores. This analysis was restricted to the eight cell types for which BLUEPRINT expression and histone modification data were
available. The original set of high-confidence interactions was filtered to (i) only contain baits that mapped exclusively to a unique
protein-coding gene promoter and (ii) only contain interactions for which at least one of the eight cell types has both a CHiCAGO
score ≥ 5 and an active enhancer. For this analysis, PIRs were considered as ‘‘active enhancers’’ if they contained proximal/distal
enhancer or transcription start site features (based on the Ensembl Regulatory Build) that were found to be in the active state based
on ChromHMM segmentations of the histone modification data in the corresponding cell type. This resulted in a set of 139,835 in-
teractions and 7,004 unique baits. To focus the analysis on active enhancers, for each interaction CHiCAGO scores were set to zero
for cell types where the enhancer had an inactive status. Finally, to avoid large CHiCAGO scores dominating the specificity analysis,
scores were asinh-transformed and values larger than a threshold of 4.3 (equivalent to a score ≈ 36.8) were set to 4.3. We refer to
these scores as ‘‘processed CHiCAGO scores.’’
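As a minimal sketch of the score processing just described (the function name and arguments are illustrative, not the authors' code), a raw CHiCAGO score is set to zero when the enhancer is inactive in that cell type, and is otherwise asinh-transformed and capped at 4.3:

```python
import math

CAP = 4.3  # upper threshold on the asinh scale, as stated in the text

def processed_score(chicago_score, enhancer_active):
    """Return the processed CHiCAGO score for one interaction in one cell type."""
    if not enhancer_active:
        return 0.0  # inactive enhancer: score set to zero
    return min(math.asinh(chicago_score), CAP)
```

For example, the high-confidence cutoff of 5 maps to about 2.31 on the asinh scale, and any raw score above roughly 36.8 is capped at 4.3.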
For each enhancer-promoter interaction, specificity scores sc for each cell type c were calculated as described above (see ‘‘Defi-
nition of specificity scores’’ and equation therein), with xi defined as the processed CHiCAGO score for cell type i. The distance
weights dc,i were calculated based on the full set of CHiCAGO interaction scores (asinh-transformed with upper threshold
of 4.3). Now consider a single gene (protein-coding gene promoter) g. Let ng denote the number of enhancer interactions this gene
has among the set of 139,835 interactions. The gene then has ng specificity scores sc for cell type c, one for each interaction. These ng
scores are averaged to obtain the interaction-based gene specificity score for cell type c, sgc . The heatmap in Figure 4B shows these
scores for eight cell types and 7,004 genes.

Clustering of genes based on these specificity scores was performed in R using k-means with Euclidean distance metric and
10,000 random starts each with a maximum of 10,000 iterations. The analysis was repeated with the number of clusters varying be-
tween 2 and 30. We selected 12 clusters (shown in Figure 4B) by inspecting the scree plot of within-cluster sum of squares versus
number of clusters. The cell types were also clustered according to their interaction-based gene specificity scores across genes.
Hierarchical clustering was applied with Euclidean distance and complete linkage (see dendrogram in Figure 4B).
Calculation of Gene Specificity Scores (Expression)

For each of the 7,004 genes, expression-based specificity scores sc were calculated for each cell type c based on BLUEPRINT
expression data, processed as previously described (Chen et al., 2014). The scores for each gene were calculated as described
above (see ‘‘Definition of specificity scores’’ and equation therein) with xi defined as the asinh-transformed gene expression value
for cell type i. The distance weights dc,i were calculated based on the full expression dataset.
Calculation of Gene Cluster Enrichment Scores

Scores were calculated to quantify enrichment of each of the 12 gene clusters in Figure 4B (capturing cell type-specificity of inter-
actions with active enhancers) for the 100 genes expressed with highest specificity in each analyzed cell type.
Let Gc denote the set of 100 genes with highest expression-based gene specificity score for cell type c (Figures 4D and S4C show
interaction-based gene specificity scores for genes in Gc where c is nCD4 and monocytes respectively). Let pc,k denote the propor-
tion of genes in Gc that are in cluster k and qk denote the proportion of all 7,004 analyzed genes that are in cluster k. Then, the cluster k
enrichment score for genes in Gc is given by ec,k = pc,k - qk. Note that qk is the expected value of pc,k when Gc is replaced by a random
selection of 100 genes.
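The enrichment score defined above can be sketched as follows, assuming a hypothetical mapping from each analyzed gene to its cluster and a list of the top genes Gc for one cell type:

```python
from collections import Counter

def enrichment_scores(cluster_of, top_genes, n_clusters=12):
    """Return e_{c,k} = p_{c,k} - q_k for clusters k = 1..n_clusters.

    cluster_of: dict gene -> cluster ID for all analyzed genes (assumed layout).
    top_genes:  genes in G_c, which must all be keys of cluster_of.
    """
    q = Counter(cluster_of.values())              # cluster sizes over all genes
    p = Counter(cluster_of[g] for g in top_genes)  # cluster sizes within G_c
    n_all, n_top = len(cluster_of), len(top_genes)
    return {k: p[k] / n_top - q[k] / n_all for k in range(1, n_clusters + 1)}
```

Because both sets of proportions sum to one, the scores for a given cell type sum to zero across clusters; positive values mark clusters over-represented in Gc.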
Enrichment scores are shown in Figure 4E. Overall, gene clusters characterized by interactions that are predominantly specific to
cell type c were the most enriched for the 100 genes in Gc.
eQTL Analysis
To evaluate the number of lead eQTLs in monocytes and B cells (Fairfax et al., 2012) that physically contact their target gene pro-
moters, we performed association tests using LIMIX (Lippert et al., 2014) within 2Mb windows around the gene bodies. For each
gene expression probe, at most one lead eQTL SNP was considered at FDR < 10%. We then counted cases in which the lead
eQTL or at least one SNP in LD with it (r2 ≥ 0.8, based on the 1000 Genomes EUR cohort (Auton et al., 2015)) overlapped a PIR
for the eQTL-associated gene. The same strategy was taken to evaluate the number of PIRs detected in at least one of the 17
cell types overlapping cis-eQTLs (FDR < 10%) for the PIR target genes reported in the whole-blood meta-analysis study (Westra
et al., 2013).
To compute the enrichment of eQTLs at PIRs in the monocyte and B cell data (Fairfax et al., 2012), we used LIMIX to perform as-
sociation tests between each SNP overlapping each PIR and the expression of the respective PIR-connected gene probe. The same
analysis was performed at random regions (‘‘randomised PIRs’’) generated in a manner maintaining the distribution of distances and
spatial interdependencies of the observed PIRs and accounting for the strand directionality of the genes. Specifically, the bait posi-
tion of all PIRs of a given gene was shifted to the bait position of another randomly selected gene. This procedure was performed for
all genes over 1000 permutations. If the randomly selected gene was on the opposite strand compared to the gene of origin, the set
of interactions was mirrored around the bait position. Enrichment was assessed by comparing a) proportions of SNPs that are eQTLs
for the PIR-connected target gene (Figures 5A and 5B) and b) proportions of PIR-connected genes with at least one significant
association (Figures S5A and S5B) at the observed and randomized PIRs over binned distances between the PIRs and the target
gene TSS. The p values were adjusted for all tests across variants and genes in each distance bin.
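A single round of the PIR randomization can be sketched as below. The data layout (a bait coordinate, a strand and a list of PIR intervals per gene) and the function name are assumptions made for illustration; the sketch requires at least two genes and performs one permutation rather than the 1000 used in the analysis.

```python
import random

def randomise_pirs(genes, seed=0):
    """genes: dict name -> {"bait": int, "strand": +1 or -1, "pirs": [(start, end), ...]}."""
    rng = random.Random(seed)
    names = list(genes)
    randomised = {}
    for name in names:
        src = genes[name]
        # Pick another gene whose bait position will host this gene's PIRs.
        target = genes[rng.choice([n for n in names if n != name])]
        flip = src["strand"] != target["strand"]
        shifted = []
        for start, end in src["pirs"]:
            off_start, off_end = start - src["bait"], end - src["bait"]
            if flip:
                # Opposite strand: mirror the interval around the bait position.
                off_start, off_end = -off_end, -off_start
            shifted.append((target["bait"] + off_start, target["bait"] + off_end))
        randomised[name] = shifted
    return randomised
```

Shifting offsets rather than redrawing them keeps the bait-to-PIR distances and the spacing between PIRs of the same gene identical to the observed set.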
For the examples of SNPs in PIRs, associations of PIRs (plus extra 500bp on either side of them) with the connected gene expres-
sion were tested for each gene, and the p values were corrected globally for all tests across all variants and genes. Significant
associations were reported at FDR < 10%.
To assess the enrichment of whole-blood cis-eQTLs at the PIRs of their target genes (Figure S5C), we randomized PIRs in the same
way as for the monocyte and B cell analysis presented above, and compared the overlap of observed versus randomized PIRs with
the lead eQTL SNPs for the PIR-connected genes or SNPs in LD with them.
GWAS Summary Statistics

Blood trait summary data (Gieger et al., 2011; van der Harst et al., 2012) were kindly provided by N. Soranzo and the HaemGen con-
sortium; autoimmune disease summary data were retrieved from ImmunoBase (http://www.immunobase.org) (Anderson et al., 2011;
Barrett et al., 2009; Bentham et al., 2015; Cordell et al., 2015; Dubois et al., 2010; Franke et al., 2010; Sawcer et al., 2011; Stahl et al.,
2010); the remaining GWAS summary data were retrieved from various internet resources (Estrada et al., 2012; Ehret et al., 2011;
Locke et al., 2015; Manning et al., 2012; Morris et al., 2012; Teslovich et al., 2010; Wood et al., 2014). Where necessary we used
liftOver or in-house scripts to convert to GRCh37 coordinates. In order to remove SNPs with spuriously strong association statistics,
we removed SNPs with p < 5 × 10⁻⁸ for which there were no SNPs in LD (r2 > 0.6 using 1000 genomes EUR cohort as a reference
genotype panel (Auton et al., 2015)) or within 50 Kb with p < 10⁻⁵.
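One reading of this filter is sketched below: a genome-wide significant SNP is kept only if some other SNP with p < 10⁻⁵ is either in LD with it (r2 > 0.6) or lies within 50 Kb. The input layout and the function name are hypothetical, and the quadratic scan is for clarity only.

```python
def drop_unsupported_hits(snps, ld_r2, strong=5e-8, support=1e-5,
                          r2_min=0.6, window=50_000):
    """snps: dict id -> (chrom, pos, p); ld_r2: dict frozenset({id1, id2}) -> r2."""
    kept = {}
    for sid, (chrom, pos, p) in snps.items():
        if p < strong:
            supported = any(
                other_p < support and (
                    ld_r2.get(frozenset((sid, oid)), 0.0) > r2_min
                    or (other_chrom == chrom and abs(other_pos - pos) <= window)
                )
                for oid, (other_chrom, other_pos, other_p) in snps.items()
                if oid != sid
            )
            if not supported:
                continue  # isolated strong signal: treated as spurious
        kept[sid] = (chrom, pos, p)
    return kept
```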

Poor Man’s Imputation (PMI)
We developed a pipeline that approximates the p value for missing SNP summary statistics for a given study using a suitable refer-
ence genotype set. First we split the genome into regions based on a recombination frequency of 0.1cM using HapMap recombina-
tion rate data (Frazer et al., 2007). For each region we retrieve from the reference genotype set (1000 genomes EUR cohort (Auton
et al., 2015)) all SNPs that have MAF > 1% and use these to compute pairwise LD. We pair each SNP from our summary statistics set,
where p values are present, with SNPs from the reference set where p values are unavailable using maximum pairwise r2 (r2Max). If
r2Max > 0.6, we then impute the missing p value as that at the paired SNP. SNPs with missing data without a pair above this r2Max
threshold are discarded as are SNPs that are included in the study but don’t map to the reference genotype set. We masked
the MHC region (GRCh37:6:25-35Mb) from all downstream analysis due to its extended LD and known strong and complex
association with autoimmune diseases.
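The pairing step within one recombination region can be sketched as follows. This is an illustration under assumed inputs (a dictionary of observed p values, a set of reference SNPs and a pairwise r2 lookup), not the CHIGP implementation.

```python
def pmi_impute(observed_p, reference_snps, r2, r2_min=0.6):
    """Approximate missing p values within one region.

    observed_p:     dict snp -> p value from the study summary statistics.
    reference_snps: set of SNPs in the reference genotype panel for this region.
    r2:             dict frozenset({snp1, snp2}) -> pairwise r2 (assumed layout).
    """
    # Study SNPs that do not map to the reference panel are discarded.
    typed = [s for s in observed_p if s in reference_snps]
    result = {s: observed_p[s] for s in typed}
    for snp in reference_snps:
        if snp in observed_p:
            continue
        best, best_r2 = None, 0.0
        for t in typed:
            value = r2.get(frozenset((snp, t)), 0.0)
            if value > best_r2:
                best, best_r2 = t, value
        if best is not None and best_r2 > r2_min:
            result[snp] = observed_p[best]  # copy the p value of the best-paired SNP
        # otherwise the SNP stays missing and is dropped
    return result
```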
GWAS Tissue Set Enrichment Analysis of PCHi-C

We developed a method, blockshifter, based on ideas implemented in GOSHIFTER (Trynka et al., 2015) to examine the enrichment of
GWAS signals at PIRs in order to overcome linkage disequilibrium (LD) and interaction fragment correlation. Blockshifter implements
a competitive test of enrichment between a test set of PIRs compared to a control set. First the coordinates of the PIR in the union of
test and control sets are retrieved, and PIRs with no GWAS signal overlap, or that are found in both the test and control sets, are discarded.
For the remaining PIRs we store the number and sum of overlapping GWAS posterior probabilities and these are used to compute d,
the difference in the means between the test and control sets. Because spatial correlation between GWAS signals and between PIRs
inflates the variance of d, we compute it empirically using permutation. Runs of one or more PIRs (separated by at most
one HindIII fragment) are combined into ‘blocks’, which are labeled unmixed (either test or control PIRs) or mixed (block contains both
test and control PIRs). Unmixed blocks are permuted in a standard fashion by reassigning either test or control labels randomly, tak-
ing into account the number of blocks in the observed sets. Mixed blocks are permuted by conceptually circularising each block and
rotating the labels (Figure S6A). We then randomly sample from each of these precomputed block permutations n times so that the
proportion of underlying PIR labels is the same as the observed set, and use this to compute the set of dnull. We use dnull to compute an
empirical Z-score:
$$Z = \frac{d - \overline{d_{\mathrm{null}}}}{\sqrt{\mathrm{Var}(d_{\mathrm{null}})}}$$
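Two ingredients of this procedure, the circular rotation of labels within a mixed block and the empirical Z-score, can be sketched as below. The sketch omits the reassignment of unmixed blocks and the sampling that preserves label proportions, uses the sample variance as an assumption, and all names are illustrative rather than taken from the blockshifter code.

```python
import statistics

def rotations(labels):
    """All circular rotations of a block's labels (the block is treated as a ring)."""
    return [labels[k:] + labels[:k] for k in range(len(labels))]

def delta(labels, values):
    """Difference in mean GWAS posterior probability between test and control PIRs."""
    test = [v for lab, v in zip(labels, values) if lab == "test"]
    control = [v for lab, v in zip(labels, values) if lab == "control"]
    return statistics.mean(test) - statistics.mean(control)

def z_score(d_obs, d_null):
    """Empirical Z-score of the observed difference against permuted differences."""
    return (d_obs - statistics.mean(d_null)) / statistics.variance(d_null) ** 0.5
```

Rotating labels instead of shuffling them keeps neighboring PIRs together, so the permuted sets retain the local correlation structure that inflates the variance of d.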
Integration of GWAS Summary Statistics with Tissue Specific PCHi-C and Functional Information
In order to prioritize genes, traits and tissues for further study we developed the COGS algorithm to compute tissue specific gene
scores for each GWAS trait, taking into account linkage disequilibrium, interactions and functional SNP annotation. For each
GWAS trait, and for each SNP in a given recombination block, we used Wakefield’s synthesis (Wakefield, 2009) to compute approx-
imate Bayes factors and thus the posterior probability for that SNP being causal for that trait assuming at most one causal variant in
the recombination block (Maller et al., 2012). For each gene annotation, for which we have at least one high-confidence interaction
(CHiCAGO score ≥ 5), and recombination block we compute a block gene score that is composed of the contributions of three
components: (1) coding SNPs in the annotated gene as computed by VEP (McLaren et al., 2010), (2) promoter SNPs, which we define
as SNPs that overlap a region encompassing the bait and flanking HindIII fragments and not any coding SNPs, (3) SNPs that overlap
PIRs for a tissue or set of tissues that do not overlap coding SNPs. Thus for a given target gene, recombination block and trait we can
derive a block ‘‘genescore’’ that is the sum of the posterior probabilities (as computed by PMI) of SNPs overlapping each component.
We assume statistical independence between blocks, so that we can combine block genescores to get an overall ‘‘genescore’’:
$$\mathrm{genescore} = 1 - \prod \left(1 - \mathrm{genescore.block}\right).$$
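The combination of block scores into an overall score can be sketched as follows, assuming per-SNP posterior probabilities have already been computed for each recombination block. The function names and input layout are hypothetical and do not reproduce the COGS implementation.

```python
import math

def block_genescore(posterior, coding, promoter, pir):
    """Sum posterior probabilities over SNPs in any of the three components.

    posterior: dict snp -> posterior probability of being causal in this block.
    coding, promoter, pir: SNP IDs assigned to each component for the gene.
    Taking the union guards against counting a SNP twice.
    """
    snps = set(coding) | set(promoter) | set(pir)
    return sum(posterior.get(s, 0.0) for s in snps)

def overall_genescore(block_scores):
    """Combine block scores assuming independence: 1 - prod(1 - block score)."""
    return 1.0 - math.prod(1.0 - s for s in block_scores)
```

For example, two blocks that each score 0.5 give an overall score of 0.75, the probability that at least one of two independent blocks carries a causal variant linked to the gene.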
TAD-Based Prioritization
To compare COGS with ‘‘brute-force’’ TAD-based prioritization, we computed TAD-level scores for eight autoimmune traits across
eight cell types. Briefly, for each TAD in each cell type, we subdivided and summed posterior probabilities for each trait (excluding the
MHC region) by overlap with 0.1cM recombination blocks to obtain block TAD scores, removing coding SNPs, and computed an
overall TAD score such that:
$$\mathrm{TAD.score} = 1 - \prod \left(1 - \mathrm{TADscore.block}\right).$$
A TAD score was assigned to each gene mapping within the respective TAD in each tissue, and the maximum score across all eight
cell types was selected.
Prioritized Gene Enrichment in IBD Differentially Expressed Genes

Normalized microarray expression data for sorted CD4+ T cells, CD8+ T cells, B cells, Monocytes and Neutrophils in 49 patients with
Crohn’s disease (CD), 42 with ulcerative colitis (UC) and 43 healthy controls (Peters et al., 2016) were downloaded from ArrayExpress
(accession E-MTAB-3554). We then used limma (Ritchie et al., 2015) to perform a two-degree-of-freedom test for differential expression
across any of the three patient groups, combining individual gene differential expression across cell types by selecting the most
significant cell type. Fisher’s test was used to compute enrichment across all protein coding genes that had both expression
and COGS scores for UC (Anderson et al., 2011) and CD (Franke et al., 2010). For comparison with TAD-based prioritized genes
in Figure S6C, COGS prioritization was rerun using only eight cell types for which Hi-C data (and therefore TAD information) was
available, with both MHC and coding variation masked.
Reactome Pathway Analysis

For each trait we selected all protein coding genes having an overall gene score above 0.5. We converted Ensembl gene identifiers
to Entrez identifiers with biomaRt (Durinck et al., 2009) and then used ReactomePA (Yu and He, 2016) to compute enrichment of
genes in Reactome pathways using an FDR cutoff of 0.05. We generated a bubble plot of significant results for each trait using
clusterProfiler (Yu et al., 2012).
Core Autoimmune Network

For each of the eight analyzed autoimmune traits (CD, CEL, RA, UC, PBC, SLE, MS, T1D) we selected top-scoring genes based on
the following criteria: genescore > 0.5, no more than top 75 genes per condition. The resulting 421 genes were combined into a single
list, and disease associations were assigned to each gene based on the respective genescore > 0.5. This gene list was used as input
to the GeneMania 3.4.0 plugin (Montojo et al., 2010) for Cytoscape 3.3.0 (Cline et al., 2007) to construct a network based on prior
knowledge about these 421 genes (shown in Figure 6E). The following information was used for linking gene pairs: physical interac-
tion (all sources in the plugin), co-localization (the ‘‘Satoh-Yamamoto-2013’’ dataset only), predicted interaction (I2D-based datasets
only), shared pathway annotation. Only the 421 network genes were plotted (‘‘find 0 related genes’’) and query-gene-based weights
were used. The Cytoscape network file is available through Open Science Framework (https://osf.io/u8tzp).
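The gene selection that feeds the network can be sketched as below, under the assumption that gene scores are held in a nested dictionary keyed by trait and then by gene. The function name is illustrative, and the network construction itself (GeneMania within Cytoscape) is not reproduced.

```python
def select_network_genes(scores, cutoff=0.5, max_per_trait=75):
    """scores: dict trait -> dict gene -> genescore (assumed layout).

    Returns dict gene -> set of traits for which that gene has genescore > cutoff.
    """
    selected = set()
    for trait, gene_scores in scores.items():
        passing = sorted(
            (g for g, s in gene_scores.items() if s > cutoff),
            key=lambda g: gene_scores[g],
            reverse=True,
        )
        selected.update(passing[:max_per_trait])  # no more than the top genes per trait
    # Disease associations are assigned from every trait where the gene passes the cutoff.
    return {
        gene: {t for t, gs in scores.items() if gs.get(gene, 0.0) > cutoff}
        for gene in selected
    }
```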
Software
Scripts to compute specificity scores are available at https://github.com/Steven-M-Hill/PCHiC-specificity-score-analysis. Imple-
mentations of the PMI, blockshifter and COGS algorithms, along with supporting documentation, are available at https://github.
com/ollyburren/CHIGP.
Data Resources
The accession number for the raw sequencing reads reported in this paper that were deposited to EGA (https://www.ebi.ac.uk/ega)
is EGAS00001001911. Lists of PCHi-C-detected significant interactions, detected interactions between active promoters and active
enhancers, and a comparison of interactions scores between PCHi-C and reciprocal capture Hi-C experiments are available as part
of the Data S1 archive. High-confidence interactions (CHiCAGO score ≥ 5 in at least one cell type) are available via the CHiCP
browser (Schofield et al., 2016), where they can be visualized alongside GWAS data (https://www.chicp.org) and as custom tracks
for the Ensembl browser (ftp://ftp.ebi.ac.uk/pub/contrib/pchic/CHiCAGO). The regulatory build annotations and segmentations of
the BLUEPRINT datasets are available as a track hub for the Ensembl browser (ftp://ftp.ebi.ac.uk/pub/contrib/pchic/hub.txt). Further
processed datasets, including TAD definitions, regulatory region annotations, specificity scores and gene prioritization data, are
available via Open Science Framework (https://osf.io/u8tzp).

[Figure S1 image. (A) Observed versus randomized TADs: number of baits against the fraction of within-TAD interactions per bait for Mon, nCD4 and Ery, with sDI profiles and bait, PIR and TAD boundary tracks for example loci on chr11 (KCTD21; USP35) and chr3 (BFSP2-AS1). (B) Hi-C matrices of chromosome 1 (genomic position in Mb; log2 enrichment) for MK, Ery, Neu, Mon, Mφ0, nCD4, nCD8 and nB. (C) Per-replicate tracks along chromosome 1 (genomic position in Mb). (D) Correlation heatmaps of the eight cell types with a count/value color key.]

Figure S1. Higher-Order Topological Properties of Eight Blood Cell Types, Related to Figure 1
(A) Top panel: Distributions of the frequencies of promoter interactions (per bait) that cross the cognate TAD boundaries in three representative cell types. Black
bars show the observed frequencies, and gray bars show expected frequencies computed by permuting TAD boundaries 1000 times (see Quantification and
Statistical Analysis). The error bars show ± standard deviations of 1000 permutations. On the x axis, 1 corresponds to a scenario whereby all interactions of a
given bait localize within the same TAD as the bait, and 0 corresponds to a scenario whereby all interactions of a given bait cross TAD boundaries. Bottom panel:
examples of baits with PIRs mapping fully within (left) or fully outside (right) the baits’ TADs. Purple bars show baited regions, black arrows show the direction of
the corresponding genes’ transcription, purple arcs show high-confidence interactions called by CHiCAGO (score >= 5), orange bars show TAD boundaries. Plots
above show the directionality index (DI) profiles in the displayed regions, with TAD boundaries defined on the basis of a switch from a negative to a positive DI.
(B) Coverage-and-distance corrected Hi-C matrices of chromosome 1 show the log2-enrichment of interactions between chromatin segments binned at 1Mb
resolution. The eight analyzed cell types (MK, megakaryocytes; Ery, erythroblasts; Neu, neutrophils; Mon, monocytes; Mφ0, macrophages M0; nCD4, naive CD4+
T cells; nCD8, naive CD8+ T cells; nB, naive B cells) are shown in columns, and the respective biological replicates are in rows.
(C) The first principal component of the 100kb-binned interaction correlation matrix for chromosome 1 shows compartmentalisation (positive values are
associated with A and negative values with B compartment). Each biological replicate of the eight analyzed cell types is shown.
(D) Correlation matrices of the genome-wide concatenated first principal components with dendrograms from hierarchical clustering show the grouping of cell
types according to the compartment signal.
[Figure S2 image. (A) Cumulative density against asinh CHiCAGO score (reciprocal capture system) for Ery, MK, naCD4 and aCD4, comparing HCI in PCHi-C with other fragment pairs, with the q2 and HCI cutoffs marked. (B) Ery, chr4:121,869,149-124,881,888 (TRPC3). (C) naCD4, chr7:115347371-117005131 (TES).]
Figure S2. Validation of Promoter Interactions Using Reciprocal Capture Hi-C, Related to Figure 1
(A) Cumulative density plots showing the distributions of asinh-transformed CHiCAGO interaction scores for promoter-containing reciprocal capture Hi-C
fragment pairs that are detected as high-confidence interactions (HCI) in the PCHi-C analyses in the respective cell types (blue line - HCI; CHiCAGO score ≥ 5)
versus those that are not detected as HCI in PCHi-C (gray line). Vertical lines show the high-confidence CHiCAGO score cutoff of 5 on the asinh-transformed scale
(2.31) for the reciprocal capture Hi-C samples and the q2 cutoffs minimizing the total misclassification error across the PCHi-C and reciprocal capture Hi-C
samples for each cell type (Blangiardo and Richardson, 2007). See Quantification and Statistical Analysis.
(B and C) Comparison of interactions detected with PCHi-C (top) and reciprocal capture (bottom two panels) for two example regions in erythroblasts (Ery, panel
B) and non-activated CD4 cells (naCD4, panel C). The PCHi-C baits capture the TRPC3 and TES promoters, respectively, while reciprocal capture baits were
designed to capture their selected PIRs. Interactions are plotted in the same way as in Figure 1C.
[Figure S3 image. (A) Venn diagram of bait numbers in the myeloid, lymphoid and invariant sets. (B) Number of baits against the variance of specificity score across interactions of the same bait, observed versus expected. (C) PIR enrichment for ATAC-seq peaks (z-scores) in tB, tCD4, tCD8, Ery and Mon. (D) Ensembl view of chromosome 11: 5,015,756 - 5,934,932 (GRCh37.p13) showing gene annotation, Regulatory Build features, HindIII fragments, ChromHMM activity and PCHi-C interactions in Ery, nCD8 and Mon, with the LCR marked.]

Figure S3. Additional Properties of Promoter Interactions, Related to Figures 2 and 3
(A) Venn diagram showing the numbers of promoter baits with interactions mapping to the ‘‘myeloid’’, ‘‘lymphoid’’ and ‘‘invariant’’ sets of clusters. See Figures 2B
and 2C and the main text for details. Includes 141 non-promoter-containing baits that are not considered in further analyses.
(B) Evidence that promoters preferentially have interactions with a similar cell type specificity. A histogram of the observed variance of the specificity scores
across interactions of the same bait (blue) versus the same obtained by permuting cluster labels (expected, gray). The specificity score for a given interaction was
taken to be the maximum of the interaction’s cluster specificity scores across all cell types. See Quantification and Statistical Analysis.
(C) Significance of PIR enrichment for chromatin accessibility regions detected by ATAC-seq in five blood cell types (tB, total B cells; tCD4, total CD4+ T cells;
tCD8, total CD8+ T cells; Ery, erythroblasts; Mon, monocytes) (Corces et al., 2016) in comparison with distance-matched random regions, expressed in terms of
z-scores. Error bars show ± SD across 100 draws of random regions.
(D) A zoomed-out view of promoter interactions and chromatin features in and around the β-globin locus. PCHi-C data from 3 cell types (Ery, erythroblasts; Mon,
monocytes; nCD8, naive CD8+ T cells) are shown together with regulatory element annotations from the Ensembl Regulatory Build, colored by feature, and chromatin activities
based on ChromHMM segmentations of BLUEPRINT histone modification data. (ChromHMM activities included four states: ‘‘active’’, ‘‘poised’’, ‘‘Polycomb-
repressed’’, and ‘‘inactive’’, with only ‘‘active’’ and ‘‘inactive’’ states observed in the region shown.) The image is based on a screenshot produced with Ensembl
v83 using GRCh37 assembly and GENCODE v19 gene annotations. The β-globin Locus Control Region (LCR) is highlighted in a blue box.
[Figure S4 image. (A) Residual gene expression against number of PIRs. (B) Mean gene specificity score (expression data) against mean gene specificity score (interactions with active enhancers) for Mon, Mφ0, Mφ1 and Mφ2, colored by cluster ID 1–12. (C) Heatmap of the top 100 Mon-specific genes (based on expression) across nCD4, Mφ1, Mφ2, Mφ0, Mon, MK, Ery and Neu, with cluster IDs.]
Figure S4. Additional Evidence of the Link between Promoter Interactions and Gene Expression, Related to Figure 4
(A) Partial residual plot of log2-gene expression as a function of the number of PIRs interacting with the respective baited region in the cell types, where the
promoter is active in all analyzed cell types. The trendline is from a linear regression using iterated reweighted least-squares (see Quantification and Statistical
Analysis).
(B) Mean gene specificity score (based on interactions with active enhancers) for each of the clusters in Figure 4B is plotted against analogous mean gene
specificity scores based on expression data for monocytes (Mon) and macrophages M0, M1, M2 (Mφ0-2). Error bars indicate ± SD. Plots for nCD4, MK, Ery and
Neu are shown in Figure 4C. See Quantification and Statistical Analysis for details.
(C) A subset of the heatmap in Figure 4B, showing interaction-based gene specificity scores for the top 100 monocyte-specifically expressed genes (obtained by
ranking genes according to their monocyte (Mon) expression-based specificity scores), together with cluster IDs.
[Figure S5 image. (A and B) Fraction of genes with an eQTL within PIRs, at observed versus randomised PIRs, by lead eQTL SNP location, in monocytes (A) and total B cells (B). (C) Number of lead cis-eQTLs physically contacting regulated gene promoters in whole blood, observed versus randomised PIRs. (D) Total B cells: rs3817995 and AURKA on chr20, with gene expression by genotype and -log10p in a 100kb window around rs3817995. (E) Monocytes: rs4948673, rs10821610 and NCOA4 on chr10, with gene expression by genotype and -log10p in 100kb windows around the two SNPs.]
Figure S5. Further Details on the Enrichment of eQTLs at Promoter-Interacting Regions, Related to Figure 5
(A and B) The proportion of genes with at least one eQTL SNP per gene expression probe located within PIRs compared with the equivalent proportion of eQTL
SNPs located within matched random regions (‘‘randomised PIRs’’) in monocytes (A) and total B cells (B). See Quantification and Statistical Analysis for details on
the randomization strategy. Asterisks represent the significance of enrichment at observed versus randomized PIRs (permutation test *p < 0.05; **p < 0.01; ***p <
0.001).
(C) Number of lead cis-eQTLs in whole blood (FDR < 10%) physically contacting regulated gene promoters (accounting for linkage disequilibrium). Results
obtained with randomized PIRs are shown as controls. Asterisks represent the significance of enrichment at observed versus randomized PIRs (permutation test
*p < 0.05; **p < 0.01; ***p < 0.001).
(D) An example of an extremely long-range eQTL association between rs3817995 and AURKA expression in total B cells, with the SNP located > 30 Mb away from
AURKA transcription start site (TSS). The gray dashed line represents the significance threshold.
(E) An example of two independent eQTL signals detected for NCOA4 in monocytes, with the primary eQTL SNP (rs4948673) located > 5 Mb away from the TSS.
The second, independent eQTL SNP (rs10821610) is located close (< 20kb) to the NCOA4 TSS. The gray dashed line represents the significance threshold.
[Figure S6 image. (A) Blockshifter schematic (observed versus block-shifted HindIII fragment labels; posterior probability). (B) COGS gene scores against TAD-based gene scores (max across 8 AI diseases); quadrant counts 219, 538, 12,490 and 3,203. (C) Odds ratio of differential expression in IBD for COGS and TAD-based prioritization (Crohn's disease, ulcerative colitis). (D–G) -log10 p for eQTL and GWAS tests in tB: SLE SLC15A4 (chr12), SLE BLK (chr8), RA GIN1 (chr5), RA RASGRP1 (chr15).]
Figure S6. Colocalization of GWAS and eQTL Signals at Prioritized Candidate Genes, Related to Figure 6
(A) A schematic of the permutation strategy implemented in blockshifter. GWAS summary statistics are converted to posterior probabilities for a given SNP to be
causal (red dots depict SNPs likely to be causal, blue dots depict other SNPs). Blocks of adjacent PIRs found in either test (purple) or control (cyan) tissue sets,
separated by two or more non-PIR HindIII fragments (gray), are then defined. Labels of HindIII fragments within each block are then rotated (‘block-shifted’) to
generate test sets for estimating the empirical variance of the test statistic under the null while accounting for genomic structure.
(B) Comparison of COGS prioritization scores with those obtained using a ‘‘brute-force’’ algorithm based on shared TADs for eight autoimmune (AI) diseases (see
Quantification and Statistical Analysis for details). Quadrants correspond to genes not exceeding the score cutoff of 0.5 with both methods, and exceeding it with
just one or both methods. Counts of genes in each quadrant are shown.
(C) Odds ratios of differential expression in the immune cells of inflammatory bowel disease (IBD) patients (FDR < 5%) (Peters et al., 2016) for genes prioritized for
Crohn’s disease (purple) and ulcerative colitis (blue) by the PCHi-C-based COGS or a TAD-based algorithm (score > 0.5).
(D–G) 2 Mb windows around the genes prioritized by the GWAS/PCHi-C-based algorithm in rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE)
were overlapped with eQTLs for the same genes in B cells. In five cases, high LD (r² > 0.8) was detected between the GWAS lead SNP and the eQTL lead SNP in the
2 Mb regions. Shown are Manhattan plots for two SLE-prioritized genes (SLC15A4, panel D; BLK, panel E) and two RA-prioritized genes (GIN1, panel F;
RASGRP1, panel G) from among these cases, providing evidence for colocalization of the
GWAS and eQTL signals in these regions.
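
The block-shifting step described in (A) can be illustrated with a minimal sketch. It assumes that each block is a contiguous run of fragment indices and that "rotation" means a circular shift of the labels inside that block; the function name `block_shift`, the label strings, and the example data are hypothetical and are not taken from the blockshifter software itself.

```python
import random

def block_shift(labels, blocks, rng):
    """Circularly rotate labels within each block; fragments outside blocks are untouched.

    labels: per-fragment labels, e.g. "test", "control", "none" (hypothetical names)
    blocks: list of (start, end) half-open index ranges, one per block
    rng:    a random.Random instance, so that runs are reproducible
    """
    shifted = list(labels)
    for start, end in blocks:
        size = end - start
        if size < 2:
            continue  # a single-fragment block has nothing to rotate
        k = rng.randrange(size)  # random rotation offset for this block
        segment = labels[start:end]
        shifted[start:end] = segment[k:] + segment[:k]
    return shifted

# Toy example: two blocks separated by two non-PIR fragments ("none").
rng = random.Random(0)
labels = ["test", "test", "control", "none", "none", "control", "test", "control"]
blocks = [(0, 3), (5, 8)]
permuted = block_shift(labels, blocks, rng)
```

Because the rotation keeps each block's label composition and the adjacency of labels within it, a null distribution built from many such draws would retain local genomic structure, which is the stated purpose of the permutation.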
Resource
Histone Acetylome-wide Association Study of Autism Spectrum Disorder
Wenjie Sun, Jeremie Poschmann,
Ricardo Cruz-Herrera del Rosario, ...,
Jonathan Mill, Daniel H. Geschwind,
Shyam Prabhakar
Correspondence
dhg@mednet.ucla.edu (D.H.G.),
prabhakars@gis.a-star.edu.sg (S.P.)
In Brief
As part of the IHEC consortium, this study
characterized histone acetylation
patterns in brain samples from patients
with autism spectrum disorder (ASD),
uncovering a distinct epigenetic
signature in ASD and providing a rich
resource for future molecular analyses of
ASD patients. Explore the Cell Press IHEC
web portal at http://www.cell.com/
consortium/IHEC.
Highlights
• Histone acetylation population study of ASD and control brain samples
• Discovery of ASD-specific epigenetic signature
• Similar epigenomic aberrations in syndromic and idiopathic ASD
• Thousands of QTLs are discovered
Sun et al., 2016, Cell 167, 1385–1397

Resource
Histone Acetylome-wide Association Study of Autism Spectrum Disorder
Wenjie Sun,1,6 Jeremie Poschmann,1,6 Ricardo Cruz-Herrera del Rosario,2 Neelroop N. Parikshak,3 Hajira Shreen Hajan,1
Vibhor Kumar,1 Ramalakshmi Ramasamy,1 T. Grant Belgard,3 Bavani Elanggovan,1 Chloe Chung Yi Wong,4
Jonathan Mill,4,5 Daniel H. Geschwind,3,* and Shyam Prabhakar1,7,*
1Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
2Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
3Program in Neurogenetics, Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of
Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA

4Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London SE5 8AF, UK
5University of Exeter Medical School, University of Exeter, Exeter EX2 5DW, UK
6Co-first author
7Lead Contact
*Correspondence: dhg@mednet.ucla.edu (D.H.G.), prabhakars@gis.a-star.edu.sg (S.P.)

SUMMARY

The association of histone modification changes with autism spectrum disorder (ASD) has not been systematically examined. We conducted a histone acetylome-wide association study (HAWAS) by performing H3K27ac chromatin immunoprecipitation sequencing (ChIP-seq) on 257 postmortem samples from ASD and matched control brains. Despite etiological heterogeneity, ≥68% of syndromic and idiopathic ASD cases shared a common acetylome signature at >5,000 cis-regulatory elements in prefrontal and temporal cortex. Similarly, multiple genes associated with rare genetic mutations in ASD showed common ‘‘epimutations.’’ Acetylome aberrations in ASD were not attributable to genetic differentiation at cis-SNPs and highlighted genes involved in synaptic transmission, ion transport, epilepsy, behavioral abnormality, chemokinesis, histone deacetylation, and immunity. By correlating histone acetylation with genotype, we discovered >2,000 histone acetylation quantitative trait loci (haQTLs) in human brain regions, including four candidate causal variants for psychiatric diseases. Due to the relative stability of histone modifications postmortem, we anticipate that the HAWAS approach will be applicable to multiple diseases.

INTRODUCTION

Autism spectrum disorder (ASD) is a collection of neuro-developmental disorders characterized by deficits in social interaction and social communication, along with restricted and repetitive behavior patterns. DNA sequence variation affecting the function of several hundred genes has been implicated in the etiology of ASD at various levels of significance (Abrahams et al., 2013; de la Torre-Ubieta et al., 2016). These genetic changes include copy number variation, chromosomal rearrangements, and also rare single-base pair mutations in coding genes (Devlin and Scherer, 2012; de la Torre-Ubieta et al., 2016). In addition, environmental risk factors such as chemical toxins and maternal infection during gestation are thought to play a role in the occurrence of ASD (Grabrucker, 2013; Matelski and Van de Water, 2016). Thus, the etiology of ASD is complex and multifactorial, including both genetic and environmental components.

Large-scale gene expression studies on ASD postmortem brain regions, as well as the prenatal and postnatal developing brain, suggest that alterations in common molecular pathways such as transcriptional regulation, synaptic function, and immunity may occur during brain development and contribute to ASD pathophysiology (Voineagu et al., 2011; Parikshak et al., 2013; Willsey et al., 2013). How the genetic and environmental heterogeneity is translated into shared molecular pathways is not well understood. In addition, most of the efforts have so far focused on gene expression and genetic changes in coding regions. Many of these coding variants are extremely rare and account only for a small proportion of ASD cases (Stein et al., 2013; Geschwind and State, 2015). Therefore, it has been proposed that epigenetic changes caused by non-coding genetic variation or by environmental insults might contribute to ASD (Kubota et al., 2012). An attempt to characterize epigenomic changes in patients is thus likely to provide novel insights into the etiology of ASD (Akbarian et al., 2015). Thus far, epigenome-wide association studies (EWAS) of psychiatric and other diseases have mostly focused on DNA methylation (Kubota et al., 2012; Mill and Heijmans, 2013; Lunnon et al., 2014; Loke et al., 2015; Montano et al., 2016). In contrast, little is known about histone modification changes in psychiatric disease (Shulha et al., 2012) or the genetics of population variation in histone modification (del Rosario et al., 2015; Grubert et al., 2015; van de Geijn et al., 2015).

To address this lack of knowledge, we globally interrogated the histone acetylomes of enhancers in a large cohort of ASD and control samples by analyzing tissue from three brain regions postmortem: prefrontal cortex (PFC), temporal cortex (TC), and cerebellum (CB). These brain regions were chosen due to the role of frontal and temporal lobe in social cognition and the
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
cerebellar dysfunction observed in some animal models of ASD (Abrahams and Geschwind, 2010; de la Torre-Ubieta et al., 2016). H3K27ac was selected as the representative acetylation mark because it highlights active enhancers and promoters (Wang et al., 2008; Heintzman et al., 2009; Creyghton et al., 2010) and is also correlated with gene expression and transcription factor binding (Kumar et al., 2013). We used the data to define aberrantly acetylated enhancers and promoters in ASD brain and thereby characterize commonly altered pathways, upstream regulatory factors, and developmental dynamics of affected loci. In addition, we used chromatin immunoprecipitation sequencing (ChIP-seq) reads to call SNPs within enhancers and promoters. We then used the genotype-independent signal correlation and imbalance (G-SCI) test (del Rosario et al., 2015) to detect haQTLs in regulatory regions and assessed their relationship to known psychiatric disease-associated variants. This dataset from postmortem human brains will provide a rich resource for future molecular analyses of ASD and serve as proof of principle for the HAWAS approach, which can be applied to a wide variety of human diseases.

RESULTS

Data Generation, Processing, and Differential Acetylation Analysis
In total, we performed 257 H3K27ac ChIP-seq assays on tissue samples from PFC, TC, and CB, in 94 individuals aged 10 years and above (45 ASD, 49 control; Figure 1A; Table S1). Forty-eight ChIP-seq profiles were discarded based on data quality, resulting in a final acetylome set comprising 209 profiles (STAR Methods; Table S2): 81 from PFC (41 ASD, 40 control), 66 from TC (30 ASD, 36 control), and 62 from CB (31 ASD, 31 control). We used DFilter (Kumar et al., 2013) to call peaks in each of the 209 ChIP-seq profiles and then defined two consensus peak sets: 56,503 cortical peaks (union of PFC and TC) and 38,069 CB peaks. Each consensus ChIP-seq peak defined a region of focal histone acetylation and thus represented a putative promoter or enhancer region.

The heights (aggregate read counts) of consensus peaks represent acetylation levels of cortical and cerebellar regulatory regions in each sample. We normalized these peak heights for GC content (Figure S1) and distributional skews and then controlled for confounders by regressing out multiple biological covariates such as age, sex, and proportion of neurons and also multiple technical covariates (STAR Methods; Figure S2). Corrected peak heights were used to define an initial set of differentially acetylated (DA) loci between ASD and control in each brain region (Wilcoxon rank sum test; Q ≤ 0.05, fold change ≥ 1.3). Based on acetylation levels at these DA loci, we measured inter-sample divergence and found that a small number of atypical ASD samples showed greater similarity to control and vice versa (Figure S3). This was not surprising, given the tremendous etiological heterogeneity of ASD and previous findings from transcriptomic analysis (Voineagu et al., 2011). Nevertheless, in the majority of cases, ASD acetylomes resembled each other more than they resembled control and vice versa (Figure S3).

In order to define the core set of chromatin aberrations in typical ASD cases, we employed a systematic mathematical criterion to exclude atypical samples (STAR Methods). Because the number of excluded samples was relatively small (5%–20% of total cases and 14%–20% of dup15q), acetylation fold changes were not substantially altered (PFC: R = 0.94, TC: R = 0.90, CB: R = 0.98; Figure S4). We then used the remaining (typical) samples to define the final set of DA peaks for each brain region. Strikingly, we detected 5,153 DA peaks in PFC and 7,009 in TC, indicating widespread, systematic histone acetylation changes in ASD cerebral cortex (Figures 1B and 1C). In contrast, only 247 DA peaks were detected in CB. The limited molecular pathology of ASD cerebellum is consistent with results from transcriptomic studies (Voineagu et al., 2011; Parikshak et al., 2016).

To evaluate the likelihood of false-positive DA peaks, we repeated the entire procedure (initial DA peaks, discarding atypical samples, final DA peaks) after randomly permuting ASD and control labels. At the same false discovery rate (FDR) threshold (Q ≤ 0.05), permuted datasets generated fewer than 100 DA peaks, on average. Moreover, after 1,000 tries, none of the permuted datasets generated as many DA peaks as the true dataset (Figure 1B). Thus, the chromatin changes we detected in ASD samples were far in excess of what would be expected by chance.

To further characterize the overall consistency of the DA peak sets, we examined their overlaps. Over 45% of ASD-upregulated regions in PFC overlapped ASD-Up peaks in TC (p ≈ 0; Figure 1D). The same was true of ASD-downregulated peaks. Moreover, the ASD-versus-control acetylation fold change was highly correlated between PFC and TC (R = 0.86; p ≈ 0). Thus, the chromatin dysregulation signature of ASD was highly consistent between the two cortical regions. Cerebellar DA peaks, on the other hand, showed only 5% overlap with same-direction cortical DA peaks.

Of the 45 cases, 7 had a monogenic form of ASD, duplication 15q syndrome (dup15q; Figure 1A), while the others had no detectable structural variants and were therefore idiopathic (Parikshak et al., 2016). It is possible that individuals with syndromic ASD could have unique chromatin aberrations. We therefore defined DA peaks separately for syndromic and idiopathic ASD, relative to the same set of controls (STAR Methods). Remarkably, acetylation changes were highly concordant genome-wide between the two forms of ASD (Figure 1E, R = 0.88 in PFC, R = 0.87 in TC). To maximize statistical power, we therefore retained the original set of DA peaks based on all ASD samples (syndromic plus idiopathic).

PFC and TC gene expression levels have been comprehensively measured using RNA sequencing (RNA-seq) in a parallel study on the same cohort (Parikshak et al., 2016). To investigate the consistency between chromatin aberrations and gene expression changes in ASD, we focused on promoter regions of differentially expressed (DE) genes (FDR ≤ 0.05; linear mixed model) (Parikshak et al., 2016). We used the EFilter tool (Kumar et al., 2013) to convert promoter histone acetylation profiles into expression estimates and then identified the subset of DE genes whose acetylation-based expression estimates were significantly divergent between ASD and control (Q ≤ 0.05; Wilcoxon test; Benjamini-Hochberg correction), after controlling for covariates as before. At these gene loci, promoter
1386 Cell 167, 1385–1397, November 17, 2016

[Figure 1 image, panels A–F. Panel B is a table:

Brain Region   ASD vs Control: no. of DA peaks (P-value)   Average no. of DA peaks in permuted data
PFC            5,153 (<1e-3)                               84
TC             7,009 (<1e-3)                               59
CB             247 (<1e-3)                                 2

(A) Sample overview, including dup15q cases. (C) PC1 versus PC2 plots (variance explained as extracted: 52.1%/10.4%, 44.7%/12.9%, 28.4%/10.5%). (D) Venn diagrams of PFC, TC, and CB DA peak sets (all P≈0; extracted counts: 1,410, 1,210, 2,078; 1,343, 1,184, 2,527; 2, 0, 3; 1, 3, 4; 78; 156) and a density plot with R=0.86. (E) log2FC idiopathic versus log2FC dup15q over the union of DA peaks, PFC (R=0.88, P≈0) and TC (R=0.87, P≈0). (F) log2FC DA (EFilter) versus log2FC DGE (RNAseq), with R=0.33, P=0.016 and R=0.38, P=0.0012.]
Figure 1. Histone Acetylome-wide Association Study of ASD

(A) Overview of postmortem tissues from the three brain regions used in this study. H3K27ac ChIP-seq was performed on prefrontal cortex (PFC), temporal cortex (TC),
and cerebellum (CB) samples from 45 ASD (A) and 49 control (C) individuals. ASD subjects with 15q duplication syndrome are highlighted in green.
(B) Number of DA peaks detected in the three brain regions. For comparison, the average number of DA peaks called in 1,000 randomized datasets (permutation
of case-control labels) is shown. The p value was computed as the fraction of randomized datasets yielding at least as many DA peaks as the true dataset.
(C) PCA of ChIP-seq peak heights (DA peaks only) in the three brain regions. Red dots, ASD samples; blue diamonds, control samples. Unfilled dots and
diamonds indicate atypical samples. Variance explained by PC1 and PC2 is shown in parentheses. The vertical line marks the threshold on the ASD-specific global
acetylome signature (AGAS) score.
(D) Venn diagrams showing overlap between DA peak sets from the three brain regions. The hypergeometric test was used to calculate p values, with the set of all
peaks as background. The density plot shows the log2 fold change in TC versus PFC in the union of DA peaks. The corresponding p value was calculated
assuming a t-distributed Pearson correlation coefficient.
(E) Correlation between log2 fold change of dup15q and idiopathic samples in PFC and TC. The correlation coefficient and its p value were calculated as in (D).
ChIP-seq peaks within the 15q duplication region are highlighted in red.
(F) Correlation between fold change of differential acetylation in the promoter region (DA) and differential gene expression (DGE) in PFC and TC. Only significantly
differential loci are shown (PFC: 58 genes, TC: 79 genes; STAR Methods). Statistical significance of concordance in fold change direction was calculated using the
hypergeometric test.
See also Figures S1, S2, S3, S4, and S5 and Tables S1, S2, and S4.
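
The permutation p value described in (B), the fraction of label-shuffled datasets that yield at least as many DA peaks as the true labels, can be sketched as follows. This is an illustration under stated assumptions only: `count_da_peaks` is a toy stand-in that thresholds the ratio of group means at 1.3, not the Wilcoxon-based procedure with covariate correction and atypical-sample exclusion used in the study, and all names and data are hypothetical.

```python
import random

def count_da_peaks(heights, labels, min_fold=1.3):
    """Toy stand-in for DA calling: count peaks whose group means differ
    by at least min_fold in either direction (assumes positive heights)."""
    cases = [row for row, lab in zip(heights, labels) if lab == "ASD"]
    ctrls = [row for row, lab in zip(heights, labels) if lab == "control"]
    count = 0
    for j in range(len(heights[0])):
        case_mean = sum(r[j] for r in cases) / len(cases)
        ctrl_mean = sum(r[j] for r in ctrls) / len(ctrls)
        ratio = case_mean / ctrl_mean
        if ratio >= min_fold or ratio <= 1 / min_fold:
            count += 1
    return count

def permutation_p_value(heights, labels, n_perm=1000, seed=0):
    """Return (observed DA count, fraction of shuffles with count >= observed)."""
    rng = random.Random(seed)
    observed = count_da_peaks(heights, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute case-control labels, keeping group sizes
        if count_da_peaks(heights, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm

# Toy data: 6 samples x 3 peaks; only the first peak differs between groups.
heights = [[10, 5, 8], [11, 5, 8], [12, 5, 8], [5, 5, 8], [5, 5, 8], [5, 5, 8]]
labels = ["ASD", "ASD", "ASD", "control", "control", "control"]
observed, p_value = permutation_p_value(heights, labels, n_perm=200)
```

With 1,000 permutations, a result of zero hits can only be reported as p < 1e-3, which matches the way the values are stated in panel B.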
acetylation changes were significantly correlated with expression fold change, both in PFC (R = 0.33; p = 0.016) and in TC (R = 0.38; p = 0.0012) (Figure 1F). This analysis was not extended to CB, due to the lack of detectable DE genes in that tissue. Thus, while measurement noise and biological differences between chromatin variation and expression variation may have

attenuated the similarity between the two, we nevertheless observed evidence of overall consistency between acetylation aberrations and expression dysregulation in ASD.

Having confirmed the robustness and consistency of the DA peak set, we constructed an ASD-specific global acetylome signature (AGAS), defined as the first principal component (PC1) of the corresponding peak height matrix. The strength of this signature in each brain sample was thus given by its PC1 score (X coordinate in Figure 1C). In all three brain regions, ASD samples had significantly lower AGAS scores, and disease status explained 12%–63% of the score variance (Figures 1C and S5). Conversely, a simple threshold on the AGAS score could be used to predict disease status for 68%–95% of samples in the three brain regions (Figure 1C). Thus, ASD was associated with a coherent global shift in the histone acetylome.

Functional Properties of ASD Chromatin Aberrations
Although ASD is known to be etiologically highly heterogeneous, we hypothesized that its diverse causal genetic and environmental perturbations could potentially converge on a small set of downstream pathways (Voineagu et al., 2011; Parikshak et al., 2015). We therefore used the GREAT tool (McLean et al., 2010) to test for significantly enriched (fold change ≥ 1.5, Q value ≤ 0.01) gene categories in DA loci. GREAT maps regulatory elements to flanking genes based on proximity and uses the hypergeometric test to determine if the fraction of DA peaks near genes from any particular functional category is greater than expected by chance. Overall, DA peaks in PFC and TC showed very similar functional profiles. In both cortical regions, upregulated DA peaks were significantly enriched in neuronal functions known to be perturbed in ASD, including synaptic transmission, metal cation transport, epilepsy, and the glutamate receptor pathway (Voineagu et al., 2011) (Figure 2A; Table S3). Known ASD genes from these categories include CACNA1C and GRIN2B (Splawski et al., 2004; O’Roak et al., 2011), which flank five and seven DA peaks, respectively (Figure 2B). In light of the observation that zinc deficiency is common in ASD (Yasuda et al., 2011), it is intriguing that upregulated DA peaks were also enriched in loci related to zinc ion homeostasis. Genes contributing to this result include the SLC30A5 zinc transporter gene, which flanks multiple DA peaks and has been shown to harbor rare single nucleotide variants in ASD (O’Roak et al., 2011; Sanders et al., 2012).

Downregulated DA peaks also showed highly significant enrichment for specific functions (Figure 2C). Immune-related terms such as abnormal immune serum protein physiology and lymphatic system disease were most prominently enriched in this peak set in PFC, perhaps reflecting unique microglial (Rodriguez and Kern, 2011; Zhan et al., 2014) or lymphoid cell states (Louveau et al., 2015) in ASD cortex. Downregulated peaks in TC showed similar immune-related enrichments (Table S3). DA-Down peaks in the two cortical regions were also enriched near histone deacetylase genes, including HDAC2 and HDAC4 (Pazin and Kadonaga, 1997). In particular, the syndromic autism gene HDAC4 (Williams et al., 2010) neighbored 16 downregulated DA peaks in TC and 4 in PFC (Figure 2D; Table S4; note that some of the DA peaks in Table S4 have been annotated with the names of smaller genes within the introns of HDAC4). Chemokine gene loci were also enriched for DA-Down peaks in both cortical regions. In addition, when all genes in the genome were individually scored for enrichment near DA-Down peaks, the chemokine receptor CX3CR1 ranked among the top five in both PFC and TC (Table 1). Multiple gut-related gene groups such as embryonic digestive tract morphogenesis and digestive/alimentary phenotype showed significantly reduced histone acetylation in ASD cortex. These gene sets included multifunctional morphogenetic genes such as FGFR2, chemokine ligand and receptor genes (including CX3CR1), and the HDAC SALL3 (Table S3). Among the individually enriched genes near downregulated peaks in TC, the behavior-related gene GRB10 (Garfield et al., 2011) was the most statistically significant (p < 1e-9).

We hypothesized that DA peaks might act as cis-regulatory elements for some of the genes thought to be causal for ASD. We therefore analyzed a curated list of 296 ASD genes for proximity to DA peaks (SFARI database, gene score ≤ 4) (Abrahams et al., 2013) (Table S5). In order to control for biases in gene length and intergenic size, this analysis was performed using the same statistical procedure as in the GREAT tool (McLean et al., 2010) (hypergeometric test). In both PFC (p = 0.017) and TC (p = 0.025), peaks showing increased acetylation in ASD were significantly enriched near the ASD gene set (Table S5). ASD-downregulated peaks in TC were also enriched, though not significantly (p = 0.055) and downregulated peaks in PFC showed no enrichment for known ASD loci. Cerebellar DA peaks were too few in number to analyze in this manner. Cortical DA peaks as a whole (union of the four DA peak sets) were clearly overrepresented near ASD genes (p = 7.6e-4; fold enrichment = 1.1).

To identify transcription factors (TFs) that potentially mediate aberrant histone acetylation in ASD, we used the HOMER tool (Heinz et al., 2010) to scan for TF-binding motifs within DA peaks. Most notably, we found strong enrichment of RFX motifs in ASD-upregulated peaks, both in TC and in PFC (Figure 3; DA Up). RFX2 has a DA peak at its promoter and RFX3 contains an intronic DA peak (Table S4). These two TFs are therefore the most likely candidates for driving increased acetylation in ASD. Three other TFs or TF families were enriched in DA Up peaks across both cortical regions: PAR bZIP, AP-1 and MEF2. Among the PAR bZIP candidate TFs, E4BP4 and HLF are the most promising, because their promoters are acetylated in cerebral cortex. Among the MEF2 factors, MEF2C is clearly the most prominent candidate, because the corresponding gene locus hosts five DA peaks from the downregulated list in PFC (p = 1.1e-4; Table S6). The nuclear receptor motif enriched in ASD-upregulated peaks in TC could relate to the glucocorticoid or mineralocorticoid receptor (GR or MR), because the corresponding gene promoters are marked by H3K27ac peaks. In contrast to the five to six motifs overrepresented in DA Up peaks in each cortical region, SOX was the only motif enriched in DA Down peaks in TC and ETS the only motif in PFC. These binding site enrichment results potentially indicate the presence of master TFs that drive dysregulation of groups of regulatory elements across the ASD genome.
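
The hypergeometric enrichment test named in this section can be written out as an upper-tail probability. The following is a minimal sketch rather than the GREAT implementation: the function name and the toy counts are hypothetical, and it ignores how GREAT assigns regulatory domains to genes.

```python
from math import comb

def hypergeom_enrichment_p(total, annotated, selected, overlap):
    """Upper-tail hypergeometric p value, P(X >= overlap).

    total:     size of the background set (e.g., all peaks or all genes)
    annotated: background items carrying the annotation of interest
    selected:  size of the foreground set (e.g., DA peaks)
    overlap:   foreground items carrying the annotation
    """
    upper = min(annotated, selected)
    denom = comb(total, selected)
    tail = sum(
        comb(annotated, k) * comb(total - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom

# Toy numbers: 20 background items, 5 annotated, 4 selected, 2 of them annotated.
p_toy = hypergeom_enrichment_p(total=20, annotated=5, selected=4, overlap=2)
```

A small p value means that the foreground set contains more annotated items than random sampling from the background would be expected to produce, which is the question the section asks of DA peaks near each gene category.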

[Figure 2 image, panels A–D. Bar plots show −log10 (P value) per enriched term.

(A) PFC Up: cation transporter activity (GO MF); calmodulin binding (GO MF); SNARE binding (GO MF); synaptic transmission (GO BP); regulation of DNA replication (GO BP); abnormal excitatory postsynaptic currents (MP); epilepsy (DO); Beta1 adrenergic receptor signaling pathway (PP).
TC Up: cation transporter activity (GO MF); SNARE binding (GO MF); serotonin binding (GO MF); synaptic transmission (GO BP); zinc ion homeostasis (GO BP); abnormal excitatory postsynaptic currents (MP); epilepsy (DO); glutamate receptor group III pathway (PP).

(B) PFC Up and TC Up peak tracks at CACNA1C (chr12: 2,200,000–2,900,000) and GRIN2B (chr12: 13,500,000–14,300,000).

(C) PFC Down: PolII transcription factor activity (GO MF); polysaccharide binding (GO MF); histone deacetylase binding (GO MF); embryonic organ development (GO BP); tissue morphogenesis (GO BP); response to biotic stimulus (GO BP); nuclear chromatin (GO CC); abnormal immune serum protein physiology (MP); lymphatic system disease (DO); Interleukin signaling pathway (PP); C-type lectin (InterPro).
TC Down: histone deacetylase activity (GO MF); sulfur compound binding (GO MF); epithelial cell proliferation in gland morphogenesis (GO BP); embryonic digestive tract morphogenesis (GO BP); negative regulation of carbohydrate metabolism (GO BP); increased muscle weight (MP); polydactyly (DO); Interferon-gamma signaling pathway (PP); CC chemokine, conserved site (InterPro).

(D) PFC Down and TC Down peak tracks at GRB10 (chr7: 50,700,000–50,850,000) and HDAC4 (chr2: 240,000,000–240,300,000).]
Figure 2. Enrichment Analysis of Up and Down DA Peaks

(A) Functional enrichment analysis of ASD-Up DA peaks in PFC and TC (GREAT tool). MF, molecular function; BP, biological process; MP, mouse phenotype; DO,
disease ontology; PP, PANTHER pathway.
(B) ASD-Up DA peaks in the CACNA1C (chr12: 2,161,809-2,900,900) and GRIN2B (chr12: 13,427,172-14,303,010) gene loci. Only the DA peaks closest to
CACNA1C and GRIN2B are visible in these genomic windows.
(C) Functional enrichment analysis of ASD-Down DA peaks in PFC and TC (GREAT tool). CC, cellular component.
(D) ASD-Down DA peaks in the GRB10 (chr7: 50,657,760-50,861,159) and HDAC4 (chr2: 239,960,131-240,388,294) gene loci. Only the DA peaks closest to
GRB10 and HDAC4 are visible in these genomic windows.
See also Table S3.

Table 1. Top Five Genes Enriched in DA Peaks
Gene                             p Value Rank   Raw p Value   FDR Q Value   Observed   Expected   Enrichment
PFC up
LOC100996286 (FBXW7 intron)      1              6.1e-13       2.1e-9        13         0.97       13
DEAR (FBXW7 intron)              2              2.1e-11       3.6e-8        8          0.37       22
MSRA                             3              7.7e-9        9.1e-6        13         1.8        7.4
METTL24                          4              4.7e-8        4.1e-5        7          0.46       15
SLC39A14                         5              6.5e-8        4.6e-5        6          0.32       19
PFC down
LMOD3                            1              5.7e-11       1.9e-7        12         1.0        12
NR2F2                            2              3.7e-10       6.2e-7        7          0.31       22
ABCC4                            3              1.2e-9        1.3e-6        9          0.63       14
FRMD4B                           4              1.5e-8        1.3e-5        14         2.2        6.5
CX3CR1                           5              3.9e-8        2.2e-5        7          0.45       16
TC up
SNAP25-AS1                       1              1.0e-8        4.8e-5        9          0.81       11
GPM6A                            2              2.7e-6        4.1e-3        9          1.3        6.8
SHANK2                           3              3.1e-6        4.1e-3        8          1.0        7.7
LINC01616                        4              3.6e-6        4.1e-3        5          0.35       14
NRG3-AS1                         5              5.1e-6        4.6e-3        8          1.1        7.3
TC down
GRB10                            1              4.3e-10       1.5e-6        16         2.4        6.5
FGFR2                            2              6.7e-10       1.5e-6        17         2.8        6.0
CCL3L3/CCL3L1                    3              3.1e-9        4.6e-6        8          0.59       13
LOC105375556 (CNTNAP2 intron)    4              1.4e-8        1.1e-5        8          0.66       12
CX3CR1                           5              1.4e-8        1.1e-5        8          0.66       12
See also Table S6.
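
Table 1 reports a raw p value and an FDR Q value for each gene. As an illustration of how such Q values can be derived, the sketch below applies the Benjamini-Hochberg adjustment to a list of p values. That the Q values in Table 1 were produced this way is an assumption made here for illustration (the text names Benjamini-Hochberg correction explicitly only for the expression-estimate analysis), and the function name and toy p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (Q values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that Q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Toy example with four tests.
q_toy = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Under this adjustment, the Q value of the top-ranked test is at most its raw p value multiplied by the number of tests, so the gap between the two columns grows with the number of genes scored.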
Developmental Stage Specificity of Epigenetically Dysregulated Loci
It has been shown that genes upregulated during early postnatal development are often differentially expressed in adolescent and adult ASD brain (Parikshak et al., 2013). We therefore asked whether early postnatal genes might also be enriched for the ASD-related acetylation changes we detected in older subjects (≥10 years old). Using a database of human RNA-seq profiles (BrainSpan, 2015), we defined the 2,000 genes most upregulated at each developmental stage (fold change relative to median expression across all stages). We then tested for enrichment of DA peaks near each such gene set. This analysis was performed separately for PFC and TC, using expression profiles from the corresponding regions of the developing human brain. As expected, ASD-Up DA peaks in the adult (more precisely, ≥10 year) brain were significantly overrepresented near adult-upregulated genes (Figure 4). Surprisingly, however, we found even greater enrichment of ASD-Up DA peaks near genes upregulated at 10–12 months after birth, which corresponds to the stage of synapse formation and neuronal maturation. In contrast, ASD-Down DA peaks did not show stage-specificity. Thus, although chromatin aberrations in ASD affect genes with a broad variety of developmental specificities, genes upregulated at or near 12 months after birth are particularly strongly associated with increased acetylation in ASD cortex.

Histone Acetylation QTLs in Human Brain Regions
Noncoding genetic variants that affect disease susceptibility potentially act via a gene regulatory mechanism (Boyle et al., 2012; Maurano et al., 2012). Because histone acetylation serves as a measure of gene regulatory function, such variants are also likely to influence acetylation levels. It is therefore instructive to identify histone acetylation QTLs (haQTLs), which are defined as genetic variants that correlate with population variation in histone acetylation (del Rosario et al., 2015). As we and others have previously shown (del Rosario et al., 2015; Grubert et al., 2015), haQTLs can be used to prioritize causal variants within disease-associated loci.

To identify haQTLs in the three human brain regions, we used the G-SCI pipeline that was previously validated on lymphoblastoid cell lines (del Rosario et al., 2015). The pipeline uses ChIP-seq reads to call DNA sequence variants in active regulatory regions, followed by filtering to remove low-confidence variants (STAR Methods). By analogy to exome sequencing, this stage of the pipeline can be termed ‘‘regulome sequencing.’’ A unique aspect of the G-SCI method is that called variants need not be explicitly genotyped. Rather, counts and base qualities of reference- and alternative-allele ChIP-seq reads are used to infer genotype likelihoods. These likelihoods are then used to compute the haQTL p value of the variant using the G-SCI test, which maximizes statistical power by combining information from peak

[Figure 3 tables. Motif logos are images and are not reproduced; where a logo label survived extraction it is shown in the Logo label column.]

(A) TC
Peaks      Motif name    Motif class   Protein                 Logo label   P-value   Q-value   Fold enrichment
TC up      V$RFX1_02     RFX           RFX                                  8e-31     <1e-4     2.03
TC up      V$VBP_01      PAR-BZIP      E4BP4, HLF, TEF, DBP                 1e-15     <1e-4     1.73
TC up      V$FRA1_Q5     BZIP          AP-1 family             M01267       3e-15     <1e-4     1.40
TC up      V$AR_01       ZFC4-NR       GR, MR, AR, PR                       1e-8      <1e-4     1.60
TC up      V$NRSF_01     ZFC2H2        NRSF                                 4e-8      <1e-4     4.25
TC up      V$MEF2_02     MADS          MEF2                    M00231       2e-7      <1e-4     1.36
TC down    V$SOX9_B1     HMG           SOX                                  9e-14     <1e-4     1.39

(B) PFC
Peaks      Motif name    Motif class   Protein                 Logo label   P-value   Q-value   Fold enrichment
PFC up     V$RFX1_02     RFX           RFX                                  4e-19     <1e-4     1.86
PFC up     V$FRA1_Q5     BZIP          AP-1 family                          2e-10     <1e-4     1.34
PFC up     V$E4BP4_01    PAR-BZIP      E4BP4, HLF, TEF, DBP                 9e-8      <1e-4     1.70
PFC up     V$HSF2_01     HSF           HSF                                  5e-7      1e-4      1.95
PFC up     V$MEF2_02     MADS          MEF2                    M00231       1e-6      1e-4      1.37
PFC down   V$SPIB_01     ETS           ETS family              M01204       2e-6      7e-4      1.34
Figure 3. Enrichment of Transcription Factor-Binding Motifs in DA Peaks

(A) Motifs significantly enriched in ASD-Up or ASD-Down DA peaks in TC (HOMER tool).
(B) Similar table, PFC.
height variability and allelic imbalance across all individuals within the cohort. In order to separate the cis effect of regulatory SNPs from more general disease effects, we first adjusted ChIP-seq peak heights by regressing out the diagnosis variable (ASD versus control). We then applied the G-SCI test to called SNPs and identified 2,000 haQTLs in each of the three brain regions (Figure 5A; Table S7). Note that these haQTLs are not specific to ASD. Rather, they represent region-specific regulatory variation in the general population.

GWAS analyses have not so far uncovered statistically significant ASD-associated variants that have been replicated in independent analysis at a genome-wide level. We therefore intersected the haQTL set with genome-wide significant (p ≤ 5e-8) variants known to be associated with shared aspects of five psychiatric disorders: schizophrenia, bipolar disorder, major depressive disorder, ASD, and attention-deficit/hyperactivity disorder (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013). While this GWAS set was too small to test for statistical enrichment near haQTLs, we did uncover two instances where brain haQTLs were strongly linked (R² ≥ 0.8) to disease-associated variants (Table S7). Most notably, an haQTL (rs4765905) in an intron of the syndromic ASD gene CACNA1C

Figure 4. Enrichment of DA Peaks near Genes Upregulated at Specific Stages of Brain Development
(A) ASD-Up DA peaks in PFC are most significantly enriched near genes upregulated 1 year after birth. Bar height indicates enrichment Q value (FDR). Numbers above bars indicate fold enrichment (Q ≤ 0.05).
(B) Similar plot, TC.
[Bar plots. Panel A: PFC, ASD-Up DA peaks. Panel B: TC, ASD-Up DA peaks. y axis: -log10(FDR); x axis: developmental stage, from 8-9 pcw to 8-40 yrs.]
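The bar heights in Figure 4 are FDR-adjusted Q values on a -log10 scale. As a minimal sketch of how such values can be derived from raw P values, the code below applies the Benjamini-Hochberg procedure. This is an assumption: the text here does not state which FDR method was used. The function names and the example P values are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted Q values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def bar_height(q_value):
    """Height of a bar on a -log10(FDR) axis."""
    return -math.log10(q_value)


# Hypothetical raw P values for four developmental stages.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
heights = [bar_height(q) for q in q_values]
```

With these example inputs, stages whose Q value is at or below 0.05 would be the ones annotated with a fold-enrichment number in a plot of this kind.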
was linked to multiple psychiatric disease-associated SNPs within the locus (Figure 5B). Based on Hi-C data from GM12878 cells (Jin et al., 2013), the putative enhancer containing this haQTL SNP was predicted to form a long-range loop to the CACNA1C promoter, suggesting that it could exert its influence on psychiatric disease by modulating the chromatin state of CACNA1C. In addition, we intersected haQTLs with 128 SNPs associated with schizophrenia in a recent large-scale meta-analysis (Ripke et al., 2014). This analysis revealed two additional haQTLs strongly linked to psychiatric disease-associated variants (Table S7). For example, we found that the haQTL SNP rs8054791 was linked to the schizophrenia-associated variant rs9922678 in an intron of GRIN2A, a glutamate receptor gene that has also been associated with ASD (Figure 5C).

DISCUSSION

Despite etiological heterogeneity, our results indicate that shared aberrations in histone acetylation are widespread in ASD cerebral cortex: over 5,000 enhancer or promoter loci were systematically shifted up or down (Figure 1B). The fact that histone acetylation changes were broadly similar between PFC and TC indicates similarity in ASD mechanisms across cortical regions and also suggests that our results on differential acetylation are unlikely to represent methodological artifacts. Note that, as expected for a complex disorder with highly heterogeneous etiology, this global signature of chromatin alteration is not shared by all ASD samples (Figure 1C). An earlier transcriptomic study revealed a similar pattern of changes shared by many, but not all, ASD cases (Voineagu et al., 2011). Nevertheless, the fact that the majority of patients conform to a single global epigenomic pattern indicates that the diverse causal mechanisms of ASD have shared downstream effects on the acetylome. In contrast to cerebral cortex, only 247 loci were found to be perturbed in cerebellum, indicating that the former is affected to a much greater degree. This disparity between ASD cerebrum and cerebellum has also been observed at the transcriptomic level (Voineagu et al., 2011). Syndromic dup15q cases showed acetylome alterations that were highly correlated with those observed in idiopathic ASD (R ≥ 0.87 in cerebral cortex), suggesting that most chromatin aberrations are shared between idiopathic ASD and this syndromic form.
To examine the genetic basis of the epigenomic aberrations detected in ASD, we tested all high-coverage SNPs within DA peaks for genetic differentiation between patients and controls (chi-square test). The distribution of genetic differentiation p values was close to uniform (data not shown), suggesting that genetic variation in cis SNPs is not a major contributor to case-control acetylation differences at DA peaks. It is thus likely that ASD-specific differential acetylation is driven mostly by other factors such as environmental influences, SNPs in trans (at a different locus), indels, and larger chromosomal variants (Krumm et al., 2015).
Overall, acetylation changes in ASD cerebral cortex were significantly correlated with differential gene expression, consistent with the known functional consequences of these alterations in chromatin structure (Figure 1F). However, the majority of DA peaks did not lie next to DE genes. This is consistent with previous studies; we and others have shown that differences in chromatin state between two sample types are only moderately correlated with differential expression (Kumar et al., 2013; Yen and Kellis, 2015). Differences in the sensitivity of ChIP-seq and RNA-seq at various loci could provide one explanation for this phenomenon. For example, post-mortem RNA degradation or low steady-state mRNA levels could reduce the detectability of DE genes in some cases, while low read mappability or occlusion of the acetylated epitope could limit the sensitivity of DA peak analysis at other loci. Moreover, noise levels could vary between the mRNA and chromatin readouts at individual loci, resulting in differential statistical power. Finally, although histone acetylation and gene expression are correlated in general, post-transcriptional regulation, other histone modifications, DNA methylation status, and the influence of additional regulatory elements within the same locus could all contribute to genuine biological differences between mRNA fold change and acetylation shifts. Thus, case-control chromatin profiling could serve as a valuable complement to the more common strategy of transcriptomic profiling by highlighting novel disease mechanisms.
We found evidence for shared pathways and functional themes among DA loci in ASD cerebral cortex (Figure 2). Among loci with increased H3K27ac, there was strong enrichment for genes related to ion channels, synaptic function, and epilepsy/neuronal excitability, all of which have previously been shown to be dysregulated in this disorder (Voineagu et al., 2011; Bourgeron, 2015). Moreover, these adult DA loci were strongly enriched for genes developmentally upregulated at or around 12 months of life (Figure 4), which coincides with the peak of early experience-dependent synaptogenesis. A similar temporal enrichment has also been observed for cerebral DE genes in

Figure 5. Histone Acetylation Quantitative Trait Loci and Linkage with GWAS SNPs
(A) Pie chart, number of histone acetylation quantitative trait loci (haQTLs) called in PFC, TC, and CB.
(B) A SNP within an intron of CACNA1C is an haQTL (Q = 1.5e-4) in CB and is in LD with four genome-wide significant SNPs from GWAS of five psychiatric disorders (Table S7). Histone acetylation tracks (chr12: 2,343,551-2,354,513), read depth analysis (bar graph) and peak height boxplots indicate that the reference “G” allele has higher histone acetylation than the non-reference “C” allele. All acetylation tracks are plotted on the same fold-enrichment scale (y axis: 0–120).
(C) A SNP within an intron of GRIN2A (chr16: 9,936,580-9,951,221) is an haQTL (Q = 2.2e-2) in TC and CB and is in LD with a genome-wide significant SNP (rs9922678; R² = 0.91) for schizophrenia. Histone acetylation tracks (chr12: 2,343,551-2,354,513), read depth analysis (bar graph), and peak height boxplots indicate that the reference “A” allele has higher histone acetylation than the non-reference “G” allele. All acetylation tracks are plotted on the same fold-enrichment scale (y axis: 0–50).
See also Table S7.
[Panel A pie charts (PFC, TC, CB): 117 (6%), 66 (3%), 56 (2%) and 1,795 (94%), 1,946 (97%), 2,199 (98%); legend: In dbSNP, Novel. Panel B: CACNA1C, chr12: 2,346,000-2,353,000, haQTL: rs4765905; tracks for genotypes GG, GC, CC; reference and nonreference read counts; peak heights by genotype. Panel C: GRIN2A, chr16: 9,940,000-9,950,000, haQTL: rs8054791; same layout with genotypes AA, AG, GG.]
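Figure 5 reports linkage disequilibrium between haQTL SNPs and GWAS SNPs as R² values. The sketch below shows the standard r² statistic for two biallelic SNPs computed from phased haplotypes. It is illustrative only: the text here does not say which reference panel or software was used, and the 0/1 allele coding, the function name, and the example haplotypes are assumptions.

```python
def ld_r_squared(haplotypes):
    """Compute LD r^2 between two biallelic SNPs.

    haplotypes is a list of (a, b) pairs, one per phased chromosome, where
    a and b are 0 (reference allele) or 1 (non-reference allele).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    # Frequency of chromosomes carrying the non-reference allele at both SNPs.
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    variance_product = p_a * (1 - p_a) * p_b * (1 - p_b)
    if variance_product == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / variance_product


# Hypothetical phased haplotypes for an haQTL SNP and a GWAS SNP.
perfectly_linked = [(1, 1)] * 3 + [(0, 0)] * 5
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]
r2_linked = ld_r_squared(perfectly_linked)
r2_unlinked = ld_r_squared(unlinked)
```

Under the threshold quoted in the text, a pair of SNPs would count as strongly linked when this value is at least 0.8.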
ASD (Parikshak et al., 2013). Loci with decreased acetylation in ASD also converged on shared functional categories, such as digestive tract morphogenesis, chemokine signaling, HDAC activity, and immune processes related to microglia. Note that it is possible for functional categories to appear systematically enriched in DA peaks merely because of the contribution of a single highly enriched “jackpot” gene. However, our results are likely to be robust to such artifacts, because we discarded functional terms that had fewer than five genes near DA peaks and then manually inspected the remaining top hits (shown in Figure 2) for jackpot effects. While the primary causes of ASD are highly heterogeneous, it appears that they nevertheless converge on shared downstream epigenomic changes associated with specific functions. It is possible that these shared chromatin alterations could in turn drive some of the shared symptoms of ASD.
The above functional enrichments have intriguing links to ASD epidemiology and results from model organisms. In addition to the well-studied roles of synaptic, ion-channel, and glutamate-pathway genes in ASD (Schmunk and Gargus, 2013; Parikshak et al., 2015), exposure to HDAC inhibitors in utero has been linked to ASD and ASD-like symptoms in humans and has also been demonstrated to cause social deficits in rodents (Chomiak et al., 2013; Christensen et al., 2013). HDAC suppression could thus be a common epigenomic feature of ASD. Chemokine pathway changes in ASD are also plausible. Suppression of the chemokine receptor gene CX3CR1, which flanks eight downregulated peaks in TC (p = 1.4e-8, Table 1), causes microglial activation (Wolf et al., 2013). Moreover, CX3CR1 knockout mice have two phenotypes observed in autism: impaired social interaction and increased repetitive behavior (Zhan et al., 2014). Finally, the enrichment of downregulated DA peaks near digestive tract morphogenesis genes could point to the existence of pleiotropic loci potentially contributing to the comorbidity of gastrointestinal problems with ASD (McElhanon et al., 2014).
In addition to pathway-level chromatin aberrations, we found strong enrichment of DA peaks near individual genes. The chemokine pathway genes CCL3L1/CCL3L3 (p = 3.1e-9) and CX3CR1 (p = 1.4e-8) were both among the top five genome-wide for enrichment in downregulated TC peaks (Tables 1 and S6). The top-ranked gene in the same downregulated peak list

was GRB10 (p = 4.3e-10), an imprinted gene expressed via the paternal allele in neurons and the maternal allele in most other adult mouse tissues (Plasschaert and Bartolomei, 2015). Deletion of the paternal allele specifically affects social behavior in mice (Garfield et al., 2011). Moreover, the GRB10-interacting GYF proteins GIGYF1 and GIGYF2 are known to harbor de novo loss-of-function mutations in ASD (Krumm et al., 2015). At a functional level, GRB10 mediates a negative feedback loop that damps mTORC1 signaling (Yu et al., 2011), a pathway with multiple links to ASD. mTORC1 hyperactivity alters the synaptic excitation/inhibition ratio and causes multiple autism-like symptoms in mice (Gkogkas et al., 2013). In addition, mTORC1 is negatively regulated by four syndromic autism genes (Wang and Doering, 2013). Thus, GRB10 deacetylation could represent a common epigenetic mechanism of idiopathic ASD via a pathway that is also affected by rare genetic variants in syndromic ASD. HDAC4 provides yet another example of mechanistic parallelism between rare genetic and common epigenetic mechanisms. The HDAC4 gene is mutated in a syndromic form of ASD (Williams et al., 2010) and flanks 16 peaks deacetylated in ASD (Table S4). The syndromic ASD gene CNTNAP2 provides yet another example: it ranks fourth in the genome for deacetylated TC peaks in ASD (Table 1). On a broader scale, the convergence of rare genetic mutations and common “epimutations” on similar pathways in ASD is supported by the genome-wide similarity of histone acetylation changes between dup15q syndrome and idiopathic ASD (Figure 1E).
Among the TFs highlighted by motif analysis of DA peaks upregulated in ASD, the neurodevelopmental factor MEF2C (Li et al., 2008) has substantial evidence for genetic association with ASD (Novara et al., 2010; Neale et al., 2012). Encouragingly, it has also been identified through motif analysis of co-regulated gene networks containing ASD risk genes (Parikshak et al., 2013). The MEF2 complex is known to interact with HDAC4 (Gruffat et al., 2002), which raises the hypothesis that downregulation of HDACs in ASD cerebral cortex could relieve the repression of MEF2C target sites, thus increasing their histone acetylation level. AP-1, another TF enriched in DA peaks, has also been shown to interact with HDAC4 (Yamaguchi et al., 2005). As the most highly enriched motif in PFC (p = 4e-19) and TC (p = 8e-31) DA peaks, RFX is particularly noteworthy. Though there is no known genetic association of RFX TFs with ASD, members of this family play key roles in neurodevelopment (Benadiba et al., 2012; Bae et al., 2014), and our results raise the hypothesis that they could serve as mediators of diverse upstream causal factors. The enrichment of MR binding sites at upregulated peaks in TC is also noteworthy: the NR3C2 gene encoding MR was shown to be significantly associated with autism in a recent exome-sequencing study (De Rubeis et al., 2014).
Although ASD is the focus of the current study, the haQTLs detected in PFC, TC, and CB (Table S7) are not specific to ASD. Thus, they can serve as a resource for prioritizing causal SNPs for a broad range of brain-related disorders. For example, the haQTL set included candidate causal SNPs at four GWAS loci for schizophrenia and other psychiatric disorders, including GRIN2A and CACNA1C (Figure 5; Table S7). At these loci, we hypothesize that perturbed histone acetylation could constitute a mechanistic intermediary between genotype and phenotype. As the number of genetic association studies increases and cohort sizes grow ever larger, the haQTLs identified here will serve as a valuable resource for mapping causal regulatory mutations within brain disease-associated loci. This is, to the best of our knowledge, the first cohort-scale HAWAS study, and as such it lays a foundation for future studies of histone modification changes in disease. Moreover, this initial analysis of human brain haQTLs paves the way for multiple future studies of chromatin-altering variants in primary samples based on ChIP-seq, DNase-seq (Degner et al., 2012), assay for transposase-accessible chromatin using sequencing (ATAC-seq), and other assays.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ◦ Human Subjects
• METHOD DETAILS
  ◦ ChIP-Seq on Brain Tissue
  ◦ Read Alignment and Peak Calling
  ◦ Peak Height Normalization
  ◦ Quality Control of 229 ChIP-Seq Datasets
  ◦ Removal of Confounding Factors
  ◦ Analysis of Differentially Acetylated (DA) Peaks
  ◦ Functional Enrichment of DA Peaks
  ◦ Enrichment of DA Loci for Expression at Specific Developmental Stages
  ◦ Motif Analysis
  ◦ SNP-Calling Pipeline
  ◦ haQTL Calling
  ◦ LD between Psychiatric Disorder GWAS SNPs and haQTLs
• QUANTIFICATION AND STATISTICAL ANALYSIS
  ◦ Statistical Method of Computation
  ◦ Inclusion and Exclusion Criteria of Any Data
• DATA AND SOFTWARE AVAILABILITY
  ◦ Data Resources

SUPPLEMENTAL INFORMATION

Supplemental Information includes five figures and seven tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.031.

AUTHOR CONTRIBUTIONS

J.P. prepared ChIP-seq libraries with help from H.S.H., R.R., and B.E. W.S. carried out data analysis. R.C.-H.d.R. helped to call haQTLs and intersected them with GWAS data. J.P. performed GREAT and motif analysis. V.K. performed the comparison of histone acetylation and gene expression. N.N.P., T.G.B., and C.C.Y.W. helped with data sample management, data interpretation and analysis. W.S. and S.P. wrote the manuscript with help from J.P. and comments from D.H.G. and J.M. D.H.G. and S.P. provided guidance on all experiments and analyses.

ACKNOWLEDGMENTS

This work was funded by PsychENCODE Grant 1R01MH094714 and the Agency for Science, Technology and Research (A*STAR), Singapore. We are grateful to the patients and families who participate in the tissue programs from which our samples are obtained. Human tissue was obtained from the Autism BrainNet (sponsored by the Simons Foundation and Autism Speaks, formerly the Autism Tissue Program) and the University of Maryland Brain and Tissue Bank, which is a component of the NIH NeuroBioBank and the Oxford Brain Bank.

Received: January 23, 2016
Revised: July 14, 2016
Accepted: October 18, 2016
Published: November 17, 2016

REFERENCES

Abrahams, B.S., and Geschwind, D.H. (2010). Connecting genes to brain in the autism spectrum disorders. Arch. Neurol. 67, 395–399.
Abrahams, B.S., Arking, D.E., Campbell, D.B., Mefford, H.C., Morrow, E.M., Weiss, L.A., Menashe, I., Wadkins, T., Banerjee-Basu, S., and Packer, A. (2013). SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol. Autism 4, 36.
Akbarian, S., Liu, C., Knowles, J.A., Vaccarino, F.M., Farnham, P.J., Crawford, G.E., Jaffe, A.E., Pinto, D., Dracheva, S., Geschwind, D.H., et al.; PsychENCODE Consortium (2015). The PsychENCODE project. Nat. Neurosci. 18, 1707–1712.
Bae, B.I., Tietjen, I., Atabay, K.D., Evrony, G.D., Johnson, M.B., Asare, E., Wang, P.P., Murayama, A.Y., Im, K., Lisgo, S.N., et al. (2014). Evolutionarily dynamic alternative splicing of GPR56 regulates regional cerebral cortical patterning. Science 343, 764–768.
Benadiba, C., Magnani, D., Niquille, M., Morlé, L., Valloton, D., Nawabi, H., Ait-Lounis, A., Otsmane, B., Reith, W., Theil, T., et al. (2012). The ciliogenic transcription factor RFX3 regulates early midline distribution of guidepost neurons required for corpus callosum development. PLoS Genet. 8, e1002606.
Bernstein, B.E., Stamatoyannopoulos, J.A., Costello, J.F., Ren, B., Milosavljevic, A., Meissner, A., Kellis, M., Marra, M.A., Beaudet, A.L., Ecker, J.R., et al. (2010). The NIH Roadmap Epigenomics Mapping Consortium. Nat. Biotechnol. 28, 1045–1048.
Bolstad, B.M., Irizarry, R.A., Astrand, M., and Speed, T.P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193.
Bourgeron, T. (2015). From the genetic architecture to synaptic plasticity in autism spectrum disorder. Nat. Rev. Neurosci. 16, 551–563.
Boyle, A.P., Hong, E.L., Hariharan, M., Cheng, Y., Schaub, M.A., Kasowski, M., Karczewski, K.J., Park, J., Hitz, B.C., Weng, S., et al. (2012). Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 22, 1790–1797.
BrainSpan (2015). BrainSpan: Atlas of the Developing Human Brain [Internet]. Available from: http://brainspan.org.
Chomiak, T., Turner, N., and Hu, B. (2013). What We Have Learned about Autism Spectrum Disorder from Valproic Acid. Pathol. Res. Int. 2013, 712758.
Christensen, J., Grønborg, T.K., Sørensen, M.J., Schendel, D., Parner, E.T., Pedersen, L.H., and Vestergaard, M. (2013). Prenatal valproate exposure and risk of autism spectrum disorders and childhood autism. JAMA 309, 1696–1703.
Creyghton, M.P., Cheng, A.W., Welstead, G.G., Kooistra, T., Carey, B.W., Steine, E.J., Hanna, J., Lodato, M.A., Frampton, G.M., Sharp, P.A., et al. (2010). Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl. Acad. Sci. USA 107, 21931–21936.
Cross-Disorder Group of the Psychiatric Genomics Consortium (2013). Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet 381, 1371–1379.
de la Torre-Ubieta, L., Won, H., Stein, J.L., and Geschwind, D.H. (2016). Advancing the understanding of autism disease mechanisms through genetics. Nat. Med. 22, 345–361.
De Rubeis, S., He, X., Goldberg, A.P., Poultney, C.S., Samocha, K., Cicek, A.E., Kou, Y., Liu, L., Fromer, M., Walker, S., et al.; DDD Study; Homozygosity Mapping Collaborative for Autism; UK10K Consortium (2014). Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215.
Degner, J.F., Pai, A.A., Pique-Regi, R., Veyrieras, J.-B.B., Gaffney, D.J., Pickrell, J.K., De Leon, S., Michelini, K., Lewellen, N., Crawford, G.E., et al. (2012). DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394.
del Rosario, R.C., Poschmann, J., Rouam, S.L., Png, E., Khor, C.C., Hibberd, M.L., and Prabhakar, S. (2015). Sensitive detection of chromatin-altering polymorphisms reveals autoimmune disease mechanisms. Nat. Methods 12, 458–464.
DePristo, M.A., Banks, E., Poplin, R., Garimella, K.V., Maguire, J.R., Hartl, C., Philippakis, A.A., del Angel, G., Rivas, M.A., Hanna, M., et al. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498.
Devlin, B., and Scherer, S.W. (2012). Genetic architecture in autism spectrum disorder. Curr. Opin. Genet. Dev. 22, 229–237.
Garfield, A.S., Cowley, M., Smith, F.M., Moorwood, K., Stewart-Cox, J.E., Gilroy, K., Baker, S., Xia, J., Dalley, J.W., Hurst, L.D., et al. (2011). Distinct physiological and behavioural functions for parental alleles of imprinted Grb10. Nature 469, 534–538.
Geschwind, D.H., and State, M.W. (2015). Gene hunting in autism spectrum disorder: on the path to precision medicine. Lancet Neurol. 14, 1109–1120.
Gkogkas, C.G., Khoutorsky, A., Ran, I., Rampakakis, E., Nevarko, T., Weatherill, D.B., Vasuta, C., Yee, S., Truitt, M., Dallaire, P., et al. (2013). Autism-related deficits via dysregulated eIF4E-dependent translational control. Nature 493, 371–377.
Grabrucker, A.M. (2013). Environmental factors in autism. Front. Psychiatry 3, 118.
Greenside, P., Srivas, R., Phanstiel, D.H., Pekowska, A., et al. (2015). Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Cell 162, 1051–1065.
Gruffat, H., Manet, E., and Sergeant, A. (2002). MEF2-mediated recruitment of class II HDAC at the EBV immediate early gene BZLF1 links latency and chromatin remodeling. EMBO Rep. 3, 141–146.
Guintivano, J., Aryee, M.J., and Kaminsky, Z.A. (2013). A cell epigenotype specific model for the correction of brain cellular heterogeneity bias and its application to age, brain region and major depression. Epigenetics 8, 290–302.
Harrow, J., Frankish, A., Gonzalez, J.M., Tapanari, E., Diekhans, M., Kokocinski, F., Aken, B.L., Barrell, D., Zadissa, A., Searle, S., et al. (2012). GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774.
Heintzman, N.D., Hon, G.C., Hawkins, R.D., Kheradpour, P., Stark, A., Harp, L.F., Ye, Z., Lee, L.K., Stuart, R.K., Ching, C.W., et al. (2009). Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–112.
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589.
Jin, F., Li, Y., Dixon, J.R., Selvaraj, S., Ye, Z., Lee, A.Y., Yen, C.-A.A., Schmitt, A.D., Espinoza, C.A., and Ren, B. (2013). A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 503, 290–294.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. (2002). The human genome browser at UCSC. Genome Res. 12, 996–1006.

Krumm, N., Turner, T.N., Baker, C., Vives, L., Mohajeri, K., Witherspoon, K., Raja, A., Coe, B.P., Stessman, H.A., He, Z.-X.X., et al. (2015). Excess of rare, inherited truncating mutations in autism. Nat. Genet. 47, 582–588.
Kubota, T., Miyake, K., and Hirasawa, T. (2012). Epigenetic understanding of gene-environment interactions in psychiatric disorders: a new concept of clinical genetics. Clin. Epigenetics 4, 1.
Kumar, V., Muratani, M., Rayan, N.A., Kraus, P., Lufkin, T., Ng, H.H., and Prabhakar, S. (2013). Uniform, optimal signal processing of mapped deep-sequencing data. Nat. Biotechnol. 31, 615–622.
Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.
Li, H., Radford, J.C., Ragusa, M.J., Shea, K.L., McKercher, S.R., Zaremba, J.D., Soussou, W., Nie, Z., Kang, Y.J., Nakanishi, N., et al. (2008). Transcription factor MEF2C influences neural stem/progenitor cell differentiation and maturation in vivo. Proc. Natl. Acad. Sci. USA 105, 9397–9402.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R.; 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.
Loke, Y.J., Hannan, A.J., and Craig, J.M. (2015). The role of epigenetic change in autism spectrum disorders. Front. Neurol. 6, 107.
Louveau, A., Smirnov, I., Keyes, T.J., Eccles, J.D., Rouhani, S.J., Peske, J.D., Derecki, N.C., Castle, D., Mandell, J.W., Lee, K.S., et al. (2015). Structural and functional features of central nervous system lymphatic vessels. Nature 523, 337–341.
Lunnon, K., Smith, R., Hannon, E., De Jager, P.L., Srivastava, G., Volta, M., Troakes, C., Al-Sarraj, S., Burrage, J., Macdonald, R., et al. (2014). Methylomic profiling implicates cortical deregulation of ANK1 in Alzheimer’s disease. Nat. Neurosci. 17, 1164–1170.
Matelski, L., and Van de Water, J. (2016). Risk factors in autism: thinking outside the brain. J. Autoimmun. 67, 1–7.
Mathelier, A., Fornes, O., Arenillas, D.J., Chen, C.-Y.Y., Denay, G., Lee, J., Shi, W., Shyr, C., Tan, G., Worsley-Hunt, R., et al. (2016). JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 44(D1), D110–D115.
Matys, V., Kel-Margoulis, O.V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter, I., Chekmenev, D., Krull, M., Hornischer, K., et al. (2006). TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 34, D108–D110.
Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J., et al. (2012). Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195.
McElhanon, B.O., McCracken, C., Karpen, S., and Sharp, W.G. (2014). Gastrointestinal symptoms in autism spectrum disorder: a meta-analysis. Pediatrics 133, 872–883.
McLean, C.Y., Bristor, D., Hiller, M., Clarke, S.L., Schaar, B.T., Lowe, C.B., Wenger, A.M., and Bejerano, G. (2010). GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501.
Mill, J., and Heijmans, B.T. (2013). From promises to practical strategies in epigenetic epidemiology. Nat. Rev. Genet. 14, 585–594.
Montano, C., Taub, M.A., Jaffe, A., Briem, E., Feinberg, J.I., Trygvadottir, R., Idrizi, A., Runarsson, A., Berndsen, B., Gur, R.C., et al. (2016). Association of DNA methylation differences with schizophrenia in an epigenome-wide association study. JAMA Psychiatry 73, 506–514.
Neale, B.M., Kou, Y., Liu, L., Ma’ayan, A., Samocha, K.E., Sabo, A., Lin, C.F., Stevens, C., Wang, L.S., Makarov, V., et al. (2012). Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485, 242–245.
Novara, F., Beri, S., Giorda, R., Ortibus, E., Nageshappa, S., Darra, F., Dalla Bernardina, B., Zuffardi, O., and Van Esch, H. (2010). Refining the phenotype associated with MEF2C haploinsufficiency. Clin. Genet. 78, 471–477.
O’Roak, B.J., Deriziotis, P., Lee, C., Vives, L., Schwartz, J.J., Girirajan, S., Karakoc, E., Mackenzie, A.P., Ng, S.B., Baker, C., et al. (2011). Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43, 585–589.
Parikshak, N.N., Luo, R., Zhang, A., Won, H., Lowe, J.K., Chandran, V., Horvath, S., and Geschwind, D.H. (2013). Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell 155, 1008–1021.
Parikshak, N.N., Gandal, M.J., and Geschwind, D.H. (2015). Systems biology and gene networks in neurodevelopmental and neurodegenerative disorders. Nat. Rev. Genet. 16, 441–458.
Parikshak, N.N., Swarup, V., Belgard, T.G., Irimia, M., Ramaswami, G., Gandal, M.J., Hartl, C., Leppa, V., de la Torre Ubieta, L., Huang, J., et al. (2016). Genome-wide changes in lncRNA, alternative splicing, and cortical patterning in autism. Nature. http://dx.doi.org/10.1038/nature20612.
Pazin, M.J., and Kadonaga, J.T. (1997). What’s up and down with histone deacetylation and transcription? Cell 89, 325–328.
Plasschaert, R.N., and Bartolomei, M.S. (2015). Tissue-specific regulation and function of Grb10 during growth and neuronal commitment. Proc. Natl. Acad. Sci. USA 112, 6841–6847.
Quail, M.A., Kozarewa, I., Smith, F., Scally, A., Stephens, P.J., Durbin, R., Swerdlow, H., and Turner, D.J. (2008). A large genome center’s improvements to the Illumina sequencing system. Nat. Methods 5, 1005–1010.
Ripke, S., Neale, B., Corvin, A., Walters, J., Farh, K.-H., Holmans, P., Lee, P., Bulik-Sullivan, B., Collier, D., Huang, H., et al.; Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427.
Rodriguez, J.I., and Kern, J.K. (2011). Evidence of microglial activation in autism and its possible role in brain underconnectivity. Neuron Glia Biol. 7, 205–213.
Sanders, S.J., Murtha, M.T., Gupta, A.R., Murdoch, J.D., Raubeson, M.J., Willsey, A.J., Ercan-Sencicek, A.G., DiLullo, N.M., Parikshak, N.N., Stein, J.L., et al. (2012). De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237–241.
Schmunk, G., and Gargus, J.J. (2013). Channelopathy pathogenesis in autism spectrum disorders. Front. Genet. 4, 222.
Shulha, H.P., Cheung, I., Whittle, C., Wang, J., Virgil, D., Lin, C.L., Guo, Y., Lessard, A., Akbarian, S., and Weng, Z. (2012). Epigenetic signatures of autism: trimethylated H3K4 landscapes in prefrontal neurons. Arch. Gen. Psychiatry 69, 314–324.
Splawski, I., Timothy, K.W., Sharpe, L.M., Decher, N., Kumar, P., Bloise, R., Napolitano, C., Schwartz, P.J., Joseph, R.M., Condouris, K., et al. (2004). Ca(V)1.2 calcium channel dysfunction causes a multisystem disorder including arrhythmia and autism. Cell 119, 19–31.
Stein, J.L., Parikshak, N.N., and Geschwind, D.H. (2013). Rare inherited variation in autism: beginning to see the forest and a few trees. Neuron 77, 209–211.
The 1000 Genomes Project Consortium, Abecasis, G.R., Auton, A., Brooks, L.D., DePristo, M.A., Durbin, R.M., Handsaker, R.E., Kang, H.M., Marth, G.T., and McVean, G.A. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65.
RefSeq. (2002). The Reference Sequence Project. The NCBI Handbook. Bethesda (MD): National Library of Medicine, National Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov/books/NBK21091/.
van de Geijn, B., McVicker, G., Gilad, Y., and Pritchard, J.K. (2015). WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063.
Voineagu, I., Wang, X., Johnston, P., Lowe, J.K., Tian, Y., Horvath, S., Mill, J., Cantor, R.M., Blencowe, B.J., and Geschwind, D.H. (2011). Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474, 380–384.
Wang, H., and Doering, L.C. (2013). Reversing autism by targeting downstream mTOR signaling. Front. Cell. Neurosci. 7, 28.
Wang, Z., Zang, C., Rosenfeld, J.A., Schones, D.E., Barski, A., Cuddapah, S., Cui, K., Roh, T.-Y.Y., Peng, W., Zhang, M.Q., and Zhao, K. (2008). Combinatorial patterns of histone acetylations and methylations in the human genome. Nat. Genet. 40, 897–903.
1396 Cell 167, 1385–1397, November 17, 2016

Williams, S.R., Aldred, M.A., Der Kaloustian, V.M., Halal, F., Gowans, G., McLeod, D.R., Zondag, S., Toriello, H.V., Magenis, R.E., and Elsea, S.H. (2010). Haploinsufficiency of HDAC4 causes brachydactyly mental retardation syndrome, with brachydactyly type E, developmental delays, and behavioral problems. Am. J. Hum. Genet. 87, 219–228.
Willsey, A.J., Sanders, S.J., Li, M., Dong, S., Tebbenkamp, A.T., Muhle, R.A., Reilly, S.K., Lin, L., Fertuzinhos, S., Miller, J.A., et al. (2013). Coexpression networks implicate human midfetal deep cortical projection neurons in the pathogenesis of autism. Cell 155, 997–1007.
Wolf, Y., Yona, S., Kim, K.-W., and Jung, S. (2013). Microglia, seen from the CX3CR1 angle. Front. Cell. Neurosci. 7, 26.
Yamaguchi, K., Lantowski, A., Dannenberg, A.J., and Subbaramaiah, K. (2005). Histone deacetylase inhibitors suppress the induction of c-Jun and its target genes including COX-2. J. Biol. Chem. 280, 32569–32577.
Yasuda, H., Yoshida, K., Yasuda, Y., and Tsutsui, T. (2011). Infantile zinc deficiency: association with autism spectrum disorders. Sci. Rep. 1, 129.
Yen, A., and Kellis, M. (2015). Systematic chromatin state comparison of epigenomes associated with diverse properties including sex and tissue type. Nat. Commun. 6, 7973.
Yu, Y., Yoon, S.-O.O., Poulogiannis, G., Yang, Q., Ma, X.M., Villén, J., Kubica, N., Hoffman, G.R., Cantley, L.C., Gygi, S.P., and Blenis, J. (2011). Phosphoproteomic analysis identifies Grb10 as an mTORC1 substrate that negatively regulates insulin signaling. Science 332, 1322–1326.
Zhan, Y., Paolicelli, R.C., Sforazzini, F., Weinhard, L., Bolasco, G., Pagani, F., Vyssotski, A.L., Bifone, A., Gozzi, A., Ragozzino, D., and Gross, C.T. (2014). Deficient neuron-microglia signaling results in impaired functional brain connectivity and social behavior. Nat. Neurosci. 17, 400–406.

STAR+METHODS
KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
H3K27ac | Active Motif | Cat#39133; RRID: AB_2561016
Protein G Dynal magnetic beads | Invitrogen | Cat#1003D
Ligase | Enzymatics | Cat#L6030-HC-L
Polynucleotide kinase | Enzymatics | Cat#Y9040L
T4 DNA polymerase | Enzymatics | Cat#P7080L
Phusion Polymerase | NEB | Cat#M0530L
Klenow exo- | Enzymatics | Cat#P7010
PCR_Primer_Index_4: CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | Illumina | N/A
CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
Deposited Data
Raw and analyzed data | This paper | https://www.synapse.org/#!Synapse:syn4587616
human reference genome NCBI build 37, GRCh37 | Genome Reference Consortium | http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/
Raw and analyzed data for RNA-seq | Parikshak et al., 2016 | http://biorxiv.org/content/early/2016/09/23/077057
Raw data | Bernstein et al., 2010; NHGRI Epigenome Atlas | http://www.genboree.org/epigenomeatlas/index.rhtml
Please see Table S3 for the GREAT results | This paper | N/A
Please see Table S5 for the curated ASD Gene list | SFARI gene | https://gene.sfari.org/autdb/GS_Home.do
Refseq gene set | Refseq | https://www.ncbi.nlm.nih.gov/refseq/
Gencode gene set | Gencode | https://www.gencodegenes.org/
Analyzed data for human brain development RNA-seq | BrainSpan | http://brainspan.org
Transcription factor motif database | TRANSFAC | http://www.gene-regulation.com/pub/databases.html
Transcription factor motif database | JASPAR | http://jaspar.genereg.net/
“Self Chain” regions of the genome | UCSC Genome Browser | http://genome.ucsc.edu/index.html
EUR SNPs and indels database | 1000 Genome | http://www.internationalgenome.org/
GWAS SNPs on schizophrenia | Ripke et al., 2014 | N/A
GWAS SNPs on 5 psychiatric disorders | Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013 | N/A
Other
Autism Tissue Program (ATP) | Harvard Brain Tissue Resource Center | http://www.autismtissueprogram.org/site/c.nlKUL7MQIsG/b.5183271/k.BD86/Home.htm
UMB BTB | University of Maryland Brain and Tissue Bank | http://medschool.umaryland.edu/btbank/
the Oxford Brain Bank | University of Oxford | https://www.ndcn.ox.ac.uk/
BWA | Li and Durbin, 2009 | http://bio-bwa.sourceforge.net/
SAMtools | Li et al., 2009 | http://www.htslib.org/
DFilter and EFilter | Kumar et al., 2013 | http://collaborations.gis.a-star.edu.sg/cmb6/kumarv1/dfilter/
CETS | Guintivano et al., 2013 | https://r-forge.r-project.org/projects/cets/
GREAT | McLean et al., 2010 | http://bejerano.stanford.edu/great/public/html/
HOMER: findMotifsGenome.pl script | Heinz et al., 2010 | http://homer.salk.edu/homer/motif/
GATK (v3.2-2) | DePristo et al., 2011 | https://software.broadinstitute.org/gatk/
G-SCI test | del Rosario et al., 2015 | http://collaborations.gis.a-star.edu.sg/cmb6/G-SCI_test/
R | The R Project for Statistical Computing | https://www.r-project.org/
MATLAB | MathWorks | http://www.mathworks.com/products/matlab/?s_tid=hp_ff_p_matlab
Further information and requests for reagents may be directed to, and will be fulfilled by the Lead Contact Shyam Prabhakar
(prabhakars@gis.a-star.edu.sg).
Human Subjects
Brain samples from 45 ASD and 49 control individuals were acquired from the Autism Tissue Program (ATP) at the Harvard Brain
Tissue Resource Center, the University of Maryland Brain and Tissue Bank and the Oxford Brain Bank. Sample acquisition protocols
were followed for each brain bank, and samples were de-identified prior to acquisition. Potential sample swaps were checked by independent
genotyping. Brain sample and individual level metadata are provided in Table S1.
METHOD DETAILS
ChIP-Seq on Brain Tissue

For each ChIP-seq experiment, approximately 100 mg of frozen brain tissue per sample was aliquoted and thawed on ice in 1 ml PBS
buffer. Tissue was then homogenized using a manual glass douncer with 7–15 slow strokes on ice. The cell suspension was filtered
with a 40 µm cell strainer (Falcon) by spinning at 2,000 rpm for 1 min at 4°C in a swing bucket centrifuge (Eppendorf Centrifuge 5810R).
Pellets were then washed twice with cold PBS and crosslinked with 1% formaldehyde for 15 min at room temperature, and excess
formaldehyde was quenched by addition of glycine (0.625 M). Cells were lysed with FA and nuclei were collected and re-suspended in 300 µL
SDS lysis buffer (1% SDS, 1% Triton X-100, 2 mM EDTA, 50 mM HEPES-KOH [pH 7.5], 0.1% sodium deoxycholate, Roche 1X Complete
protease inhibitor). Nuclei were lysed for 15 min, after which sonication was used to fragment chromatin to an average size of
200–500 bp (Bioruptor Next gen, Diagenode). Protein-DNA complexes were immunoprecipitated using 3 µg of H3K27acetyl antibody
of the same lot for all 257 ChIP experiments (catalog number 39133; Active Motif) coupled to 50 µL protein G Dynal beads
(Invitrogen) overnight. The beads were washed and protein-DNA complexes were eluted with 150 µL of elution buffer (1% SDS,
10 mM EDTA, 50 mM Tris.HCl [pH 8]), followed by protease treatment and de-crosslinking at 65°C overnight. After phenol/chloroform
extraction, DNA was purified by ethanol precipitation. 5% of sheared chromatin was aliquoted and treated with Pronase and RNase
following de-crosslinking, in the same manner as the ChIP DNA. To prepare pooled input libraries for each brain region, DNA
was then quantified and equal amounts from 10 ASD and 10 control samples (randomly chosen) were pooled. 100 ng of pooled DNA
was used for library preparation. Input libraries were multiplexed and sequenced in one HiSeq lane. Library preparation was
performed as in Quail et al. (2008). After 15 cycles of PCR using indexing primers, libraries were size selected for 300–500 bp on low
melting agarose gel, and 4 libraries were pooled and sequenced in one lane of 2 × 100 bp using the same Illumina HiSeq 2000
with V3 reagents.
Read Alignment and Peak Calling

Reads from 257 ChIP-seq libraries were mapped to the human genome (hg19) using BWA (Li and Durbin, 2009). Duplicate reads were
filtered out using SAMtools (Li et al., 2009). 17 libraries were then discarded due to low quality (< 200,000 reads or mapping rate <
6%). Peaks were called in the remaining 240 libraries using DFilter (Kumar et al., 2013) which used the input DNA library for the cor-
responding brain region as a control. On average 20,447 ChIP-seq peaks were called in PFC, 20,583 in TC and 21,685 in CB. 11
samples were then discarded because they contained fewer than 10,000 peaks, leaving 229 samples for further processing. In
CB, singletons were defined as peaks detected in only a single individual (zero overlap with peaks in other libraries). Non-singleton
peaks were then merged across individuals (overlap > 0 bp) to define the consensus set of 38,069 CB peaks. Because peaks in PFC
and TC were highly overlapping, we combined these two brain regions to define the consensus set of 56,503 neocortical peaks. All
subsequent analyses of PFC and TC were performed on this neocortical peak set.
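The singleton filter and cross-individual merge described above can be sketched as follows. This is a minimal illustration only, assuming peaks are half-open (start, end) intervals on a single chromosome; the function names are hypothetical and are not part of DFilter, and the all-against-all overlap check is quadratic, so a real pipeline would likely use a sorted sweep or an interval tool instead.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least 1 bp."""
    return a[0] < b[1] and b[0] < a[1]


def consensus_peaks(peaks_by_sample: Dict[str, List[Interval]]) -> List[Interval]:
    """Drop singleton peaks, then merge the remaining peaks across samples."""
    kept: List[Interval] = []
    for sample, peaks in peaks_by_sample.items():
        for peak in peaks:
            # A peak is a singleton if it overlaps no peak from any other sample.
            supported = any(
                overlaps(peak, other)
                for other_sample, other_peaks in peaks_by_sample.items()
                if other_sample != sample
                for other in other_peaks
            )
            if supported:
                kept.append(peak)

    merged: List[Interval] = []
    for start, end in sorted(kept):
        if merged and start < merged[-1][1]:  # overlap > 0 bp with previous
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy input (hypothetical coordinates): two peaks are singletons and are dropped.
example = {
    "s1": [(100, 200), (500, 600)],
    "s2": [(150, 250), (900, 950)],
    "s3": [(240, 300)],
}
print(consensus_peaks(example))
```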
Peak Height Normalization

For each brain region, reads were counted in 100-bp bins for each library and scaled to normalize for sequencing depth (total read
count). Binned counts were then adjusted by normalizing their GC-content against the average GC-content of all libraries in each
brain region. In each peak region, the sum of bin-wise normalized counts was defined as the peak height. Finally, to reduce technical
variation, the heights of peaks in the union peak set were quantile-normalized (Bolstad et al., 2003).
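As an illustration of the final step, quantile normalization across libraries can be sketched as below. This is a simplified version written for clarity, assuming a peaks × libraries matrix held as nested lists; ties are broken by row order rather than averaged, which is a simplification and not necessarily what the authors' implementation does.

```python
from typing import List


def quantile_normalize(matrix: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize the columns (libraries) of a peaks x libraries matrix."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]

    # Reference distribution: mean of the k-th smallest value across libraries.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(sorted_cols[c][k] for c in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]

    # Replace each value by the reference value of the same rank.
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for c, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda r: col[r])
        for rank, r in enumerate(order):
            normalized[r][c] = reference[rank]
    return normalized


# Toy peak heights (hypothetical): 4 peaks in 3 libraries.
heights = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
for row in quantile_normalize(heights):
    print(row)
```

After normalization every library has the same empirical distribution of peak heights, so the remaining differences between libraries are differences in rank only.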
Quality Control of 229 ChIP-Seq Datasets

The set of 229 ChIP-seq datasets contained 13 pairs of biological replicates. The average peak height correlation (Pearson) between
replicate datasets was 0.92, which is similar to replicate correlations in H3K27ac ChIP-seq data from the NHGRI Epigenome Atlas
(Inferior Temporal Lobe: 0.90; Mid-Frontal Lobe: 0.91) (Bernstein et al., 2010). For each of the 229 datasets, we quantified the mean
Pearson correlation (mPC) with other datasets from the same brain region. Then, from each of the 13 pairs of replicates, we discarded
the dataset with lower mPC. From the remaining 216 datasets, we discarded 7 that had low mPC values relative to the norm for the
corresponding brain region (< first quartile − 2.5 × interquartile range). The remaining 209 ChIP-seq datasets from 94 individuals were
used for downstream analysis.
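The mPC-based outlier rule can be sketched as follows. The function names are hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the text does not state which quartile definition was used.

```python
import statistics
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mean_pearson(datasets: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean Pearson correlation (mPC) of each dataset with all the others."""
    return {
        name: statistics.fmean(
            pearson(vec, other)
            for other_name, other in datasets.items()
            if other_name != name
        )
        for name, vec in datasets.items()
    }


def low_mpc_outliers(mpc: Dict[str, float], k: float = 2.5) -> List[str]:
    """Datasets with mPC below first quartile - k * interquartile range."""
    q1, _, q3 = statistics.quantiles(mpc.values(), n=4, method="inclusive")
    cutoff = q1 - k * (q3 - q1)
    return sorted(name for name, value in mpc.items() if value < cutoff)


# Toy mPC values (hypothetical) for five libraries from one brain region.
mpc = {"lib1": 0.90, "lib2": 0.91, "lib3": 0.92, "lib4": 0.93, "lib5": 0.50}
print(low_mpc_outliers(mpc))
```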
Removal of Confounding Factors

First, the normalized peak heights were transformed into the log2 domain. Then principal-component analysis (PCA) was performed
to detect potential confounding factors by correlating the top 5 principal components (PCs) with the biological covariates (diagnosis,
age, sex, neuronal cell fraction, ethnicity, and agonal state) and technical covariates (sequencing batches, brain bank, fragment me-
dian insert size from paired-end sequencing, percentage of duplicated reads, sequencing depth of each library and number of peaks
for each library). The neuronal cell fraction for each sample was estimated using CETS (Guintivano et al., 2013) from DNA methylation
data generated in a parallel study on the same cohort (C.C.Y.W., R. Smith, E. Hannon, L. Schalwyk, A. Kepa, J.P., W.S., N.N.P., S.P.,
D.H.G., and J.M., unpublished data). Neuronal cell fractions for samples that were not included in the parallel DNA methylation study
were assigned with the median neuronal cell fraction across all samples. Covariates that significantly correlated with top 5 PCs were
regressed out from the peak height matrix. For PFC, regressed out covariates included age, sex, neuronal cell fraction, sequencing
batches, brain bank, fragment median insert size, percentage of duplicated reads, sequencing depth and number of peaks. For TC,
regressed out covariates included age, sex, neuronal cell fraction, sequencing batches, brain bank, fragment median insert size,
sequencing depth and number of peaks. For CB, regressed out covariates included age, sex, neuronal cell fraction, sequencing
batches, brain bank, fragment median insert size, percentage of duplicated reads, sequencing depth and number of peaks. PCA
was performed again after regression to confirm that no confounding factors correlated strongly with the top 5 PCs (Figure S2).
Downstream analyses were based on the peak height matrix after covariate regression.
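Regressing covariates out of a single peak's log2 heights amounts to taking ordinary least squares residuals, which can be sketched as below. This is an assumption-laden illustration: the helper names are hypothetical, categorical covariates such as brain bank would first need numeric (for example one-hot) encoding, and the residuals returned here are mean-centered because an intercept is fitted, which may differ from the authors' implementation.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def regress_out(heights: List[float], covariates: List[List[float]]) -> List[float]:
    """Residuals of one peak's log2 heights after OLS on the covariates.

    covariates[i] holds the numeric covariate values for sample i; an
    intercept column is added here.
    """
    design = [[1.0] + list(row) for row in covariates]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, heights)) for i in range(p)]
    beta = solve(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, d)) for d, y in zip(design, heights)]


# Toy example (hypothetical): one covariate, four samples.
age = [[0.0], [1.0], [2.0], [3.0]]
log2_height = [1.0, 3.0, 5.0, 8.0]
print(regress_out(log2_height, age))
```

Applying `regress_out` to every row of the peak height matrix with the region-specific covariate list gives the corrected matrix; diagnosis is deliberately left out of the covariates so that the ASD versus control signal is preserved.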
Analysis of Differentially Acetylated (DA) Peaks

In each brain region, an initial set of differentially acetylated (DA) peaks between ASD and control was constructed based on the
above-described peak height matrix (fold-change ≥ 1.3; Q ≤ 0.05; Wilcoxon rank sum test; Benjamini-Hochberg correction). Using
this initial set of DA peaks, we calculated the pairwise Pearson correlation coefficient matrix R of peak heights, and raised each
element of the matrix to the ninth power ($R_{ij}^{9}$). The resulting row vectors were used to define the coordinates of ASD and control
samples in correlation space, for the purpose of calculating Euclidean distances. For each sample, two distances were calculated:
the median Euclidean distance to all the ASD samples (Distance_A) and the median Euclidean distance to all the control samples
(Distance_C). Any ASD sample with Distance_A > 1.05 × Distance_C was discarded (Figure S3). Similarly, any control sample with
Distance_C > 1.05 × Distance_A was discarded. A final set of DA peaks was constructed between ASD and control using the remaining
samples, with the same Q-value and fold change cutoffs as above (Table S4). To test whether these DA peaks were genuine, we
generated 1,000 randomized datasets by permuting sample labels (ASD, control). For each permuted dataset, we called DA peaks
using the above-described two-step approach. For each brain region, the P-value of the number of DA peaks in the actual data was
calculated as the fraction of permuted datasets with an equal or greater number of DA peaks (Figure 1B).
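The atypical-sample filter in the first step can be sketched as follows. The function and variable names are hypothetical, the toy data are invented for illustration, and the sketch assumes that each sample is excluded from its own median distance (following the "other" acetylomes wording in the Figure S3 legend).

```python
import math
import statistics
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def atypical_samples(
    heights: Dict[str, List[float]],
    labels: Dict[str, str],
    ratio: float = 1.05,
    power: int = 9,
) -> List[str]:
    """Samples whose median distance to their own group exceeds
    ratio * median distance to the other group."""
    names = list(heights)
    # Coordinates in correlation space: row i of the element-wise R**power matrix.
    coords = {
        a: [pearson(heights[a], heights[b]) ** power for b in names] for a in names
    }

    def median_dist(name: str, group: str) -> float:
        return statistics.median(
            math.dist(coords[name], coords[other])
            for other in names
            if other != name and labels[other] == group
        )

    flagged = []
    for name in names:
        own = labels[name]
        other = "control" if own == "ASD" else "ASD"
        if median_dist(name, own) > ratio * median_dist(name, other):
            flagged.append(name)
    return flagged


# Toy DA-peak heights (hypothetical): A4 is labeled ASD but looks like a control.
heights = {
    "A1": [3.0, 2.8, 1.0, 1.2], "A2": [2.9, 3.1, 1.1, 0.9],
    "A3": [3.2, 3.0, 0.8, 1.0], "A4": [1.0, 1.1, 3.0, 2.9],
    "C1": [1.0, 1.2, 3.0, 2.8], "C2": [1.1, 0.9, 2.9, 3.1],
    "C3": [0.8, 1.0, 3.2, 3.0],
}
labels = {n: ("ASD" if n.startswith("A") else "control") for n in heights}
print(atypical_samples(heights, labels))
```

Raising correlations to an odd power keeps their sign while shrinking moderate values toward zero, so only strongly correlated sample pairs contribute much to the coordinates.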
Functional Enrichment of DA Peaks

First, we masked the dup15q locus in the complete peak height matrix. We then used the GREAT tool (McLean et al., 2010) to determine
the enrichment of gene categories in DA peaks. Genes were associated with regulatory regions using the basal+extension
association rule defined by GREAT. The hypergeometric test was performed to determine if a gene category was enriched for genes
associated with DA peaks (foreground set) compared to genes associated with all peaks (background set). Gene categories with
fold-change ≥ 1.5 and Q ≤ 0.01 were retained. Additionally, we discarded an enriched gene category if fewer than 5 genes were
associated with DA peaks in that category. To display the non-redundant significantly enriched gene categories in Figure 2, we further
selected the top 3 non-redundant gene categories in the biological process and molecular function gene ontologies. The top gene
category from each of cellular component, PANTHER pathway, mouse phenotype and disease ontology is shown in the figure as well.
The complete GREAT results can be found in Table S3.
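The gene-level hypergeometric test and fold-change used for category enrichment can be written out directly. The sketch below is a generic implementation of the upper-tail hypergeometric probability, not GREAT's own code, and the argument names and toy counts are illustrative assumptions.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when drawing `draws` genes without replacement from a
    background of `population` genes, `successes` of which are in the category."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


def fold_enrichment(k: int, population: int, successes: int, draws: int) -> float:
    """Observed category fraction in the foreground over the background fraction."""
    return (k / draws) / (successes / population)


# Toy counts (hypothetical): 20 background genes, 5 in the category,
# 4 foreground genes (near DA peaks), 3 of which are in the category.
print(hypergeom_sf(3, 20, 5, 4), fold_enrichment(3, 20, 5, 4))
```

In the analysis described above, a category would additionally have to pass the Q-value cutoff after multiple testing correction and contain at least 5 foreground genes.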
Enrichment of DA Loci for Expression at Specific Developmental Stages

A similar analysis was performed to determine the enrichment of DA peaks in SFARI genes (Tables S5 and S6), DA peaks near
individual genes (Tables 1 and S6) and DA peaks near developmental stage-specific genes (Figure 4). Again, the dup15q locus was
excluded. The Refseq gene set (RefSeq, 2002) was used in the first two analyses. The Gencode v10 gene set (Harrow et al., 2012)
was used in the enrichment analysis in brain development. Human brain RNA-seq profiles were downloaded from BrainSpan (BrainSpan,
2015). We defined expressed genes as those with RPKM > 5 in at least 2 datasets. Then quantile-quantile normalization was performed
on the RPKM values across each developmental time point (8 post-conception weeks to 40 years old) in brain regions that develop into PFC
and TC. At each time point, the median RPKM values were used if there were replicate samples. We computed the coefficient of
variation (CV) of each gene and clustered the samples across time points based on the top 5,000 most variable expressed genes (high CVs).
Based on the dendrogram, similar time points were grouped to define 11 stages in PFC (Figure 4A) and 12 stages in TC (Figure 4B).
Samples at 2–3 years old were discarded due to low RNA quality (low RIN values). At each developmental stage, genes
were ranked based on their gene expression fold change relative to the other stages. The top 2,276 (PFC) and 2,549 (TC) upregulated
genes (fold change ≥ 1.5) at each developmental stage were tested for enrichment of DA peaks.
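The expressed-gene filter and the CV-based selection of variable genes can be sketched as follows. The function names and toy RPKM values are hypothetical, and the use of the sample standard deviation in the CV is an assumption, since the text does not specify it.

```python
import statistics
from typing import Dict, List


def is_expressed(rpkm: List[float], threshold: float = 5.0, min_datasets: int = 2) -> bool:
    """Expressed if RPKM exceeds the threshold in at least `min_datasets` datasets."""
    return sum(value > threshold for value in rpkm) >= min_datasets


def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def most_variable_genes(expression: Dict[str, List[float]], top_n: int) -> List[str]:
    """Expressed genes ranked by decreasing CV across time points."""
    expressed = {g: v for g, v in expression.items() if is_expressed(v)}
    ranked = sorted(
        expressed,
        key=lambda g: coefficient_of_variation(expressed[g]),
        reverse=True,
    )
    return ranked[:top_n]


# Toy RPKM profiles (hypothetical) across three time points.
expression = {
    "G1": [6.0, 7.0, 6.5],
    "G2": [1.0, 20.0, 30.0],
    "G3": [0.5, 1.0, 6.0],   # RPKM > 5 in only one dataset, so not expressed
    "G4": [10.0, 10.0, 10.5],
}
print(most_variable_genes(expression, top_n=2))
```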
Motif Analysis
For motif enrichment analysis, we used the HOMER ChIP-seq pipeline’s findMotifsGenome.pl script with the “-mknown” option
(Heinz et al., 2010). Motif models were drawn from the TRANSFAC vertebrate database (Matys et al., 2006) and the analysis was
performed separately on Up and Down DA peaks from each of the 3 brain regions (6 DA peak sets in total), with all peaks from
the same brain region as background. Motifs were classified as enriched based on fold enrichment (≥ 1.3), FDR (≤ 0.01) and number
of foreground peaks that had a motif match (≥ 20). The list of enriched motifs was almost identical when we used the JASPAR
database (Mathelier et al., 2016) instead of TRANSFAC (data not shown).
SNP-Calling Pipeline
ChIP-seq reads were aggregated across all three brain regions for each individual and then passed to the multi-sample SNP-calling
pipeline. Reads used for SNP calling were de-duplicated and retained only if they were mapped to the genome in the correct
orientation. We performed indel realignment, base-quality-score recalibration and SNP calling using GATK version 3.2-2 (DePristo et al.,
2011). 1,297,168 SNPs within peaks in all three brain regions were called using GATK’s HaplotypeCaller at a SNP quality threshold of
50. Subsequently, SNP calls were filtered out with the following criteria: MQ0Fraction > 0.001, QD < 4.3, within 6 bp of an indel, more
than seven SNPs within a 100-bp region, Mapping Quality < 45, Homopolymer Run > 10, MQ0 > 9.5, Dels > 0.255. Moreover, only
SNP calls covered by at least 5 non-reference reads across all libraries and 3 or more non-reference reads in at least one library were
retained. SNPs that violated Hardy-Weinberg equilibrium with a binomial test P-value below $1 \times 10^{-3}$ were discarded. To eliminate mapping
artifacts, SNPs in highly paralogous regions of the genome implicated by the “Self Chain” track on the UCSC Genome Browser (Kent
et al., 2002) (normalized score ≥ 90) were filtered out. Finally, a high-confidence set of 821,606 SNPs within PFC and TC peaks and
560,972 SNPs within CB peaks was obtained. Note that we did not perform genotype calling, since the G-SCI test does not
require prior knowledge of genotypes. Rather, it integrates over the likelihoods of all three genotypes for each individual, given the
data (del Rosario et al., 2015).
haQTL Calling
haQTLs were called in the 84 Caucasian samples using the G-SCI test (del Rosario et al., 2015). The diagnosis status and top PCs that
account for more than 5% of the variance were regressed out from peak heights before haQTL calling. We then performed the G-SCI test
on each of the 821,606 SNPs within peaks for PFC and TC regions and the 560,972 SNPs within peaks for CB. For each SNP, an
adjusted P-value was computed using a permutation test from 10,000 to 1 million permutations until a nonzero P-value was obtained.
After 1 million permutations, if the adjusted P-value was still 0, it was set to $5 \times 10^{-7}$. We then used the Benjamini and Hochberg
multiple testing correction to calculate the FDR. At an FDR threshold of 10%, 9,094, 7,468 and 9,860 candidate haQTLs were identified
in PFC, TC and CB, respectively. To detect possible artifactual haQTLs due to different mapping rates to the reference genome between
alleles, we simulated all possible 100 bp paired-end reads covering the haQTL and flanking SNPs and indels. The union of our SNP and indel
(quality > 50 by GATK) calls and the 1000 Genome EUR SNPs and indels (The 1000 Genomes Project Consortium, 2012) was used.
The fragment length of the simulated paired-end reads was set to 180 bp, which is the median fragment size of all libraries.
The simulated reads were then mapped to the reference genome using BWA. 1,510, 1,192 and 693 haQTLs were discarded because
their inferred allelic imbalances from the ChIP-seq data were smaller than five times the mapping bias estimated from the simulation.
The remaining haQTLs were further filtered by an effect-size filter, which calculated the Pearson correlation between peak height and
the fraction of Q30 nonreference bases; haQTLs with $R^2$ < 0.1 were discarded. Only the most significant SNP in each ChIP-seq peak
was retained, yielding the final set of 1,912, 2,012 and 2,255 haQTLs in PFC, TC and CB.
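The handling of zero permutation P-values and the Benjamini-Hochberg step can be sketched as below. This is a generic illustration rather than the G-SCI implementation; the function names are hypothetical, and the observation that $5 \times 10^{-7}$ equals half of 1/1,000,000 is an interpretation, not something stated in the text.

```python
from typing import List


def floor_zero_p(p: float, max_permutations: int = 1_000_000) -> float:
    """Replace an empirical P-value of 0 by 0.5 / max_permutations
    (5e-7 for 1 million permutations)."""
    return 0.5 / max_permutations if p == 0 else p


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (Q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Toy permutation P-values (hypothetical); the last one hit zero.
raw = [0.01, 0.04, 0.03, 0.0]
adjusted = benjamini_hochberg([floor_zero_p(p) for p in raw])
candidates = [i for i, value in enumerate(adjusted) if value <= 0.10]
print(adjusted, candidates)
```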
LD between Psychiatric Disorder GWAS SNPs and haQTLs

We downloaded two sets of GWAS SNPs, one on schizophrenia (Ripke et al., 2014) and another on 5 psychiatric disorders (Cross-
Disorder Group of the Psychiatric Genomics Consortium, 2013). For the schizophrenia study we used all 128 SNPs while for the 5
psychiatric disorder study, we used a P-value threshold of 5e-8 (99 SNPs). The LD was calculated on the EUR population, hence
for this analysis we only used SNPs that are polymorphic in the 1000 Genomes EUR population (The 1000 Genomes Project Con-
sortium, 2012), yielding 1,863 BA41 haQTLs, 1,714 BA9 haQTLs and 2,141 Vermis haQTLs. An haQTL was considered to be in
LD with a GWAS SNP if $R^2$ was at least 0.8.
Statistical Method of Computation

Statistical methods and software used in this study are cited in the STAR Methods and the Figure legends. The statistical analyses
were performed in MATLAB and R. The initial and final sets of DA peaks were constructed using the Wilcoxon rank sum test and
Benjamini-Hochberg multiple testing correction with fold-change ≥ 1.3 and Q ≤ 0.05 (e.g., Table S4). The P-value of the permutation
test was calculated as the fraction of permuted datasets with an equal or greater number of DA peaks (e.g., Figure 1B). In the Venn
diagram, the P-values were calculated using the hypergeometric test with the set of all peaks as background (e.g., Figure 1D). The
P-values in the dotplots (e.g., Figures 1D, 1E, and S4) and the violin plot (Figure S5) were calculated assuming a t-distributed Pearson
correlation coefficient. The P-value in the dotplot (e.g., Figure 1F) was calculated using the hypergeometric test. The gene category
enrichment (e.g., Figure 2; Table S3), the enrichment of DA peaks in SFARI genes (e.g., Table S5), DA peaks near individual genes
(e.g., Tables 1 and S6) and DA peaks near developmental stage-specific genes (e.g., Figure 4) were evaluated using the
hypergeometric test. When calling haQTLs using the G-SCI test, an adjusted P-value was computed for each SNP using a permutation test from
10,000 to 1 million permutations until a nonzero P-value was obtained. After 1 million permutations, if the adjusted P-value was still 0,
it was set to $5 \times 10^{-7}$. We then used the Benjamini and Hochberg multiple testing correction to calculate the FDR with a threshold of
10% (e.g., Figure 5A; Table S7).
Inclusion and Exclusion Criteria of Any Data

After mapping to the reference genome, 17 samples were discarded due to low quality (< 200,000 reads or mapping rate < 6%). 11
samples were discarded after peak calling because they contained fewer than 10,000 peaks.
Data Resources
The accession number for the ChIP-seq data reported in this paper is Synapse: syn4587616.

[Figure S1 plots. Panels: (A) PFC mean GC distribution, (B) TC mean GC distribution, (C) CB mean GC distribution. x axis: GC content (%); y axis: fraction of reads.]
Figure S1. GC Content Distribution of Samples in Three Brain Regions, Related to Figure 1
(A) GC content distributions of 81 samples in PFC were normalized to the mean GC distribution in PFC.
(B) GC content distributions of 66 samples in TC were normalized to the mean GC distribution in TC.
(C) GC content distributions of 62 samples in CB were normalized to the mean GC distribution in CB.
Figure S2. Correlation between Top 5 Principal Components and Covariates in Three Brain Regions before and after Regression, Related to
Figure 1
(A) PFC.
(B) TC.
(C) CB.
Pearson correlation coefficient is shown at each grid point. After regressing out correlated confounding factors, the top 5 PCs correlated with none of the co-
variates except diagnosis. InsertSize: fragment median insert size; Dup: percentage of duplicated reads; Reads: sequencing depth; Peaks: number of peaks;
Neuron: neuronal cell fraction.
[Figure S3 plots. Panels: (A) PFC, (B) TC, (C) CB. x axis: median distance to control; y axis: median distance to ASD.]
Figure S3. Identification of Atypical Samples, Related to Figure 1

Scatterplot of median divergence between acetylomes in PFC (A), TC (B) and CB (C). In this analysis, the acetylome is defined as the vector of peak heights at DA
peaks. x axis: median Euclidean distance to other control acetylomes; y axis: median Euclidean distance to other ASD acetylomes (STAR Methods). Red dots:
ASD samples; blue diamonds: control samples. Solid line: Y = X; Dotted lines: Y = 1.05X and X = 1.05Y. ASD samples above the Y = 1.05X line and control samples
below the X = 1.05Y line were defined as atypical samples.
[Figure S4 plots. Panels: (A) PFC, R = 0.94, P ≈ 0; (B) TC, R = 0.90, P ≈ 0; (C) CB, R = 0.98, P ≈ 0. x axis: log2FC subset; y axis: log2FC all.]
Figure S4. Acetylation Fold Change between ASD and Control, Calculated Using All Samples Displayed on the Y Axis or Using Only Typical
Samples Displayed on the X Axis, Related to Figure 1
(A) PFC. The P-value of the fold-change correlation was calculated assuming a t-distributed Pearson correlation coefficient.
(B) Similar plot, TC.
(C) Similar plot, CB.
[Figure S5 plots. Panels: (A) variance explained: 12%, P = 0.0019; (B) P = 3.39e-4; (C) P = 1.79e-14. y axis: AGAS score.]
Figure S5. ASD-Specific Global Acetylome Signature Scores, Related to Figure 1

Violin plot of AGAS scores in PFC (A), TC (B) and CB (C). A: ASD samples; C: control samples. The P-value was calculated assuming a t-distributed Pearson
correlation coefficient.
Resource
Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells
Lu Chen, Bing Ge,
Francesco Paolo Casale, ...,
Kate Downes, Tomi Pastinen,
Nicole Soranzo
Correspondence
tomi.pastinen@mcgill.ca (T.P.),
ns6@sanger.ac.uk (N.S.)
In Brief
As part of the IHEC consortium, this study integrates genetic, epigenetic, and transcriptomic profiling in three immune
cell types from nearly 200 people to characterize the distinct and cooperative contributions of diverse genomic inputs
to transcriptional variation. Explore the Cell Press IHEC web portal at http://www.cell.com/consortium/IHEC.

• Genome, transcriptome, and epigenome reference panel in three human immune cell types
• Identified 4,418 genes associated with epigenetic changes independent of genetics
• Described genome-epigenome coordination defining cell-type-specific regulatory events
• Functionally mapped disease mechanisms at 345 unique autoimmune disease loci
EGAD00001002663, EGAD00001002671, EGAD00001002674, EGAD00001002675, EGAD00001002670, EGAD00001002672, EGAD00001002673, EGAS00001001456
Chen et al., 2016, Cell 167, 1398–1414
November 17, 2016 © 2016 The Authors. Published by Elsevier Inc.
Resource
Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells
Lu Chen,1,2,28 Bing Ge,3,28 Francesco Paolo Casale,4,28 Louella Vasquez,1,28 Tony Kwan,3 Diego Garrido-Martı́n,5,6
Stephen Watt,1 Ying Yan,1 Kousik Kundu,1,2 Simone Ecker,7,8 Avik Datta,9 David Richardson,9 Frances Burden,2,18
Daniel Mead,1 Alice L. Mann,1 Jose Maria Fernandez,7 Sophia Rowlston,2,18 Steven P. Wilder,10 Samantha Farrow,2,18
Xiaojian Shao,3 John J. Lambourne,3,2,18 Adriana Redensek,3 Cornelis A. Albers,13,16 Vyacheslav Amstislavskiy,14
Sofie Ashford,2,18 Kim Berentsen,15 Lorenzo Bomba,1 Guillaume Bourque,3 David Bujold,3 Stephan Busche,3
Maxime Caron,3 Shu-Huang Chen,3 Warren Cheung,3 Oliver Delaneau,12 Emmanouil T. Dermitzakis,12 Heather Elding,1
1Department of Human Genetics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge
CB10 1HH, UK
3Human Genetics, McGill University, 740 Dr. Penfield, Montreal, QC H3A 0G1, Canada
4European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge
CB10 1SD, UK
5Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology,
Carrer del Dr. Aiguader, 88, Barcelona 8003, Spain

6Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Plaça de la Mercè, 10- 12, Barcelona 8002, Spain
SUMMARY

Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in
human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human
immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We assess,
quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential
sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene
expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses.
Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution
atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable
correlations between these inputs, and defines molecular events that may underpin complex disease risk.

INTRODUCTION

Many human complex diseases are characterized by dysregulation of immune and inflammatory activity. However, the repertoire of
immune genes and cell subsets implicated in the pathogenesis of individual disease can vary dramatically. Genome-wide association
studies (GWAS) have contributed to expanding catalogs of implicated genes and pathways for many complex human diseases (Hindorff
et al., 2009) and are beginning to shed light on shared and unique etiological and pathological components of disease (Farh et al.,
2015; Jostins et al., 2012). A key challenge is that these disease variants map predominantly to noncoding regions of the human genome,
where they are predicted to alter regulatory function (Kundaje et al., 2015). Linking susceptibility variants to their respective
causative genes and cell-specific regulatory elements thus remains a main priority in order to realize the potential of association
studies to advance understanding of disease biology and etiology, leading to therapeutic advances.
Molecular quantitative trait locus (QTL) studies testing for associations between genetic variants and intermediate phenotypes,
in particular gene expression levels, provide powerful approaches to annotate the putative consequence of disease associations
(Montgomery and Dermitzakis, 2011). The biological resolution of this approach can be further increased using two main strategies.
First, genetic effects on gene expression have been shown to be often context-specific (Kundaje et al., 2015) and thus are better
captured in studies probing multiple primary cell types or experimental conditions (Bentham et al., 2015; Fairfax et al., 2014;
Naranbhai et al., 2015). Second, extending these analyses beyond gene expression to other molecular phenotypes such as variable
histone modification or methylation status can greatly enhance the functional and mechanistic interpretation of genetic associations
(Allum et al., 2015). Recent studies in cell line models have demonstrated the occurrence of a high degree of local coordination
between transcriptional and epigenetic states and suggested that a fraction of
1398 Cell 167, 1398–1414, November 17, 2016 © 2016 The Authors. Published by Elsevier Inc.
Irina Colgiu,17 Frederik O. Bagger,2,4,18 Paul Flicek,9 Ehsan Habibi,15 Valentina Iotchkova,1,11 Eva Janssen-Megens,15
Bowon Kim,15 Hans Lehrach,14 Ernesto Lowy,9 Amit Mandoli,15 Filomena Matarese,15 Matthew T. Maurano,19
John A. Morris,3 Vera Pancaldi,7 Farzin Pourfarzad,20 Karola Rehnstrom,2,18 Augusto Rendon,2,21 Thomas Risch,14
Nilofar Sharifi,15 Marie-Michelle Simon,3 Marc Sultan,14 Alfonso Valencia,7 Klaudia Walter,1 Shuang-Yin Wang,15
Mattia Frontini,2,18,22 Stylianos E. Antonarakis,12 Laura Clarke,9 Marie-Laure Yaspo,14 Stephan Beck,8 Roderic Guigo,5,6,23
Daniel Rico,7,24 Joost H.A. Martens,15 Willem H. Ouwehand,1,2,18,22,25 Taco W. Kuijpers,2,20,26 Dirk S. Paul,8,27
Hendrik G. Stunnenberg,15 Oliver Stegle,4 Kate Downes,2,18 Tomi Pastinen,3,* and Nicole Soranzo1,2,22,25,29,*
7Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernandez Almagro, 3,
Madrid 28029, Spain
8UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
9Vertebrate Genomics, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus,
Hinxton, Cambridge CB10 1SD, UK

10Genome Analysis, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton,

11European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton,

12Genetic Medicine and Development, University of Geneva Medical School-CMU, 1 Rue Michel-Servet, Geneva 1211, Switzerland
13Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Centre, P.O. Box 9101,
Nijmegen 6500 HB, the Netherlands

14Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestr. 63/73, Berlin 14195, Germany
15Department of Molecular Biology, Faculty of Science, Radboud University, Nijmegen 6525GA,
the Netherlands
16Molecular Developmental Biology, Radboud Institute for Life Sciences, Radboud University, P.O. Box 9101, Nijmegen 6500 HB, the
Netherlands
17Human Genetics Informatics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK
18National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
19Institute for Systems Genetics, New York University Langone Medical Center, ACLS West, Room 511, 430 East 29th Street, New York,
NY 10016, USA
20Blood Cell Research, Sanquin Research and Landsteiner Laboratory, Plesmanlaan 125, Amsterdam 1066CX, the Netherlands
21Bioinformatics, Genomics England, Charterhouse Square, London EC1M 6BQ, UK

23Computational Genomics, Institut Hospital del Mar d’Investigacions Mediques (IMIM), Carrer del Dr. Aiguader, 88, Barcelona 8003, Spain
24Institute of Cellular Medicine, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, UK
25The National Institute for Health Research Blood and Transplant Unit (NIHR BTRU) in Donor Health and Genomics at the University of
Cambridge, Strangeways Research Laboratory, University of Cambridge, Wort’s Causeway, Cambridge CB1 8RN, UK
26Emma Children’s Hospital, Academic Medical Center (AMC), University of Amsterdam, Location H7-230, Meibergdreef 9,
Amsterdam 1105AZ, the Netherlands

27Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, Strangeways Research Laboratory,
University of Cambridge, Wort’s Causeway, Cambridge CB1 8RN, UK

28Co-first author
29Lead Contact
*Correspondence: tomi.pastinen@mcgill.ca (T.P.), ns6@sanger.ac.uk (N.S.)

disease-associated genetic variants may alter expression levels through changes in chromatin state (Grubert et al., 2015; Waszak et al., 2015). Extending these integrated investigations to primary human cells in disease-relevant contexts is the necessary next step to unravel the cell- and context-specific regulatory effects of complex disease variants.
Here, we report an integrated analysis of genetic, epigenetic, and transcriptomic datasets in the three major cells of the human immune system, namely CD14+ monocytes, CD16+ neutrophils, and CD4+ naive T cells. Monocytes contribute to maintenance of the resident macrophage pool under steady-state conditions and migrate to sites of infection in the tissues and divide/differentiate into macrophages and dendritic cells to elicit an immune response. Neutrophil granulocytes (neutrophils) are primary blood cells of the innate immune and inflammatory response system that form a first line of organismal response to bacterial and fungal infection, migrating within minutes to sites of infection, attracted by local tissue factors and resident macrophages during the acute phase of inflammation. Finally, CD4+ naive T cells are part of the adaptive immune response system, representing mature helper T cells that have not yet encountered their cognate antigen.
We generated high-resolution whole-genome sequence, transcriptome, DNA methylation, and histone modification datasets in up to 197 individuals selected from a population-based sample and applied variance decomposition, QTL, and allelic imbalance analyses to investigate genetic and epigenetic influences to transcription and RNA splicing in the three primary immune cells. We demonstrate colocalization of molecular trait QTLs with 345 unique genetic variants predisposing to seven human autoimmune diseases, involving all data layers. Overall, the data and results deepen our understanding of genetic and epigenetic

[Figure 1 graphic. Samples: N = 200 donors; neutrophils (CD66b+, CD16+), monocytes (CD14+, CD16−), and T cells (CD4+, CD45RA+). Assays: WGS of genomic DNA, total RNA, methylation, H3K4me1, and H3K27ac. Molecular traits: gene expression [N 17,082], sum of exon and junction reads, expressed in normalized read count; transcript (isoform ratio) [7,962], ratio of the two major transcripts/isoforms, expressed in MD values; percent splice-in (PSI) [12,843], percent splice-in of alternatively spliced junctions (exon skipping, alternative 3′ acceptor, alternative 5′ donor); methylation [440,905], M value (logit of beta value), with probes in intergenic, TSS1500, TSS200, UTR, first exon, and gene body regions; histone modifications H3K27ac and H3K4me1 [105,594], normalized read count of the peak regions, expressed in reads per million (RPM). Analyses: variance decomposition, quantitative trait loci, allele-specific analysis, and disease integration.]
Figure 1. Study Design

Overview of study design and molecular traits investigated. Details of sample collections are given in Figure S1 and Table S1.
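
Figure 1 describes the methylation trait as an M value, the logit of the beta value. A minimal sketch of that transform is given here, assuming the base-2 logit that is conventional for Illumina methylation arrays; the source does not state the base, and the function names and the clipping constant `eps` are illustrative rather than taken from the study.

```python
import numpy as np

def beta_to_m(beta, eps=1e-6):
    """Convert methylation beta values (0..1) to M values with a base-2 logit.

    eps is a hypothetical clipping constant that keeps the log finite
    when beta is exactly 0 or 1.
    """
    b = np.clip(np.asarray(beta, dtype=float), eps, 1.0 - eps)
    return np.log2(b / (1.0 - b))

def m_to_beta(m):
    """Inverse transform: map M values back to beta values."""
    m = np.asarray(m, dtype=float)
    return 2.0 ** m / (1.0 + 2.0 ** m)

if __name__ == "__main__":
    # A beta of 0.5 maps to M = 0; hypo- and hypermethylated probes
    # map to negative and positive M values, respectively.
    print(beta_to_m([0.2, 0.5, 0.8]))
```

M values are unbounded and closer to homoscedastic than beta values, which is one common reason to prefer them in linear association models.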
regulation of the transcriptional machinery in three primary cells of the immune system and inform the formulation and testing of functional hypotheses for human complex disease.

RESULTS

Study Design
As part of the BLUEPRINT epigenome project, we recruited an initial set of 200 blood donors from a local blood donor population, ascertained to be free of disease and representative of the United Kingdom (UK) population at large (54% females, mean age 55 years) (Figure 1; Table S1). We used a multi-step purification strategy (Figure S1) to isolate, for each donor, cell subsets corresponding to classical monocytes (CD14+CD16−) and neutrophils (CD66b+CD16+). Subsequently, through a collaboration with Epigenome Mapping Centre at McGill University, we were able to extend the study to a third cell type (“phenotypically naive” CD4+CD45RA+ T cells, henceforth referred to as CD4+ T cells or T cells for simplicity) for 169 out of the 200 donors.
For each individual, we performed whole-genome sequencing (WGS) (mean read depth, approximately 7×) (Figure S2; Table S1) and probed the transcriptional profiles (RNA sequencing [RNA-seq] at 80 million reads per sample) (Figure S3), genome-wide DNA methylation (Illumina 450K arrays) (Figure S2), and two histone modification marks for active and poised enhancers and active promoters (H3K4me1 and H3K27ac, chromatin immunoprecipitation sequencing [ChIP-seq] at ≥30 million reads per sample) (Figure S4). Molecular assays for monocytes and neutrophils were distributed across four laboratories, and assays on T cells were done at McGill (Figure S1). We carefully assessed and adjusted for possible sequencing artifacts that may arise due to differences in protocols between centers, applying stringent quality filters where needed. We confirmed that our approach avoided significant effects by profiling a subset of the same individuals across each respective center (“cross-over experiments”) (Figures S1, S2, S3, and S4; Table S2). Overall, the project generated 116,310 million QC-pass reads across all datasets, with 80% of donors passing ten or more assays and 56 donors

Figure 2. Variance Decomposition and Epigenetic Association Analysis of Gene Expression
(A) Mechanisms of genetic and epigenetic associations with gene expression. Considered are direct cis-acting genetic effects (light blue) as well as epigenetic
correlations with gene expression that are independent of genetics (dark blue). No assumption is made on causal directionality for shared genetic effects (light
blue, dashed line).
(B) Proportion of transcriptome variance explained by genetic and epigenetic factors for individual genes, when considering putative cis-regulatory elements
(within ±1 Mb of the gene body). Shown is the cumulative contribution of genes with increasing proportions of explained variance, considering genetic factors
(blue), DNA methylation (orange), H3K4me1 (violet), and H3K27ac (pink) in monocytes. Epigenetic variance components were estimated either with (solid lines,
G-corrected) or without (dashed lines, uncorrected) accounting for local cis-genetic variation (Methods).
(C) Scatterplot of the proportion of variance explained by cis-genetics (x axis) versus cis-epigenetic (y axis) effects in monocytes. Significant variance com-
ponents (VCs, FDR <5%) are coded in color.
(D) Overlap of genes with significant cis-genetic and cis-epigenetic contributions to expression variance.
(E) Overlap of genes with significant contributions from (cis) DNA methylation, (cis) H3K4me1, and (cis) H3K27ac.
(F) Manhattan plot for gene TMEM176A obtained from the cis-epigenetic association analysis of gene expression in monocytes. Top panel: analysis without
accounting for cis-genetic variation. Bottom panel: analysis when accounting for cis-genetic variation.
(G) Fraction of genes with significant epigenetic associations (epiGenes, FDR <5%) before (uncorrected) and after correcting (G-corrected) for common cis-
genetic variation. For T cells, a lower number of ChIP-seq data for H3K4me1 was due to lower initial immunoprecipitation enrichment for a subset of
cryopreserved samples with insufficient material for repeat assays; hence only methylation was used in this analysis.
See also Figures S5 and S6 and Tables S3 and S4.
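
The contrast between the uncorrected and G-corrected analyses in (F) and (G) can be illustrated with a toy regression. This is a minimal sketch on simulated data, not the variance decomposition or linear mixed models used in the study: it assumes a single variant that drives both an epigenetic trait and expression, and the names `epi_effect` and `simulate` as well as all simulation parameters are hypothetical.

```python
import numpy as np

def epi_effect(expr, epi, geno=None):
    """Least-squares slope of expression on an epigenetic trait.

    When geno is supplied, genotype dosage enters as a covariate, so the
    returned slope is the epigenetic effect adjusted for cis-genetics.
    """
    cols = [np.ones_like(epi), epi]
    if geno is not None:
        cols.append(geno)
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return coef[1]

def simulate(n=200, seed=0):
    """Toy data: one variant drives both the epigenetic trait and expression."""
    rng = np.random.default_rng(seed)
    geno = rng.integers(0, 3, size=n).astype(float)  # dosage 0, 1, or 2
    epi = geno + rng.normal(0.0, 0.5, n)             # epigenetic trait
    expr = geno + rng.normal(0.0, 0.5, n)            # no direct epi -> expr path
    return expr, epi, geno

if __name__ == "__main__":
    expr, epi, geno = simulate()
    print("uncorrected slope:", epi_effect(expr, epi))
    print("G-corrected slope:", epi_effect(expr, epi, geno))
```

In this simulation the uncorrected slope is clearly positive even though the epigenetic trait has no direct effect on expression, and it shrinks toward zero once genotype is included, which mirrors the confounding the caption describes.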
having complete data across all cell types and molecular assays (Table S1).

Decomposition of Transcriptional Variance into Genetic and Epigenetic Components
Matched genetic, epigenetic, and gene expression profiles from multiple donors in this study provide a unique opportunity to characterize the relationship between hierarchies of gene regulation and how these regulatory links ultimately affect human phenotypic variation. Detailed understanding of this relationship is necessary for the correct interpretation of the contribution of epigenetic variation to organismal traits and disease.
We first sought to quantify the relative impact of genetic and epigenetic factors on transcriptional variance. Associations between epigenetic and RNA traits may arise from two potential causes: (1) local epigenetic changes that correlate with RNA level but themselves are due to DNA sequence variation (Figure 2A), and (2) epigenetic changes that are correlated with RNA level and not associated with cis genetic variation.
To quantify the relative contribution of genetic and epigenetic

factors to transcriptional variance, we fit variance decomposition models (Lippert et al., 2014; Casale et al., 2015) to explain transcriptome variance using common genetic (minor allele frequency [MAF] >4%) and epigenetic features within 1 Mb of the gene. DNA methylation and histone modifications explained a lower proportion of transcriptome variance in models where epigenetic elements were adjusted for proximal genetic effects compared to the corresponding unadjusted models in all cell types (Figures 2B, S5A, and S5B), suggesting that genetic effects are the main determinant of transcriptome variance.
We next fit a joint model that considers all four molecular layers (genetic, methylation, H3K4me1, and H3K27ac). Globally, the proportion of expression variance explained by epigenetic effects (average 3.2% for H3K4me1, 3.1% for H3K27ac, and 1.9% for methylation in monocytes) was small compared to genetic effects (average 13.9% in monocytes) (Figures S5C–S5F). Estimates of the overall contribution of DNA methylation are conservative in this analysis, because methylation sites are incompletely ascertained in the Illumina 450k array (representing 2% of all annotated CpGs, for 99% of RefSeq genes at mainly promoters and genic enhancers). When testing for significance of the variance components in this model, we identified 2,451, 2,213, and 441 genes with a significant epigenetic component (false discovery rate [FDR] <5%) in monocytes, neutrophils, and T cells, respectively, of which 1,092, 940, and 258 genes had no significant genetic effect (Figures 2C–2E, S5G, and S5H; FDR <5%). These results indicate that some local epigenetic associations with RNA cannot be explained by shared genetic effects due to common variants. These genes were implicated in key functions in innate and acquired immunity and inflammation. As examples, genes of the inflammasome pathway were strongly enriched in neutrophils (p = 2 × 10^-6; Table S3). Inflammasomes are innate immune system complexes that regulate the activation of caspase-1 and the proinflammatory IL-1 family of cytokines. This process is induced by detection of pathogen-associated molecular patterns (PAMPs) or danger-associated molecular patterns (DAMPs) culminating in the induction of inflammation in response to infectious microbes and molecules derived from host damage. Inflammasomes have also been implicated in a range of inflammatory processes and disorders. In monocytes, we detected epigenetic influences for genes within a number of key signaling pathways involved in immune cell function, including the Tec kinase and eicosanoid signaling pathway, the nuclear factor κB (NF-κB), CXCL8, and interleukin-10 (IL-10) signaling pathways (Melcher et al., 2008; Schmidt et al., 2004) (Table S3). These findings suggest that function related to pathogen response may be primed and controlled at least in part through epigenetic rather than genetic mechanisms. Finally, estimated variance components of individual genes were correlated across cell types, most significantly for genetic factors, but also for pure epigenetic variance contributions (Figure S5I).
The large prevalence of epigenome-transcriptome associations that can be attributed to shared genetic effects may have important implications for interpretation of epigenome-wide association studies (EWAS). To explore this possibility, we next considered gene expression measured from RNA sequencing experiments as a proxy for organismal and disease phenotype. We then carried out a conventional EWAS analysis testing for association between epigenetic traits (within 1 Mb of the gene) and gene expression without accounting for cis-genetics and compared it to a second model where we adjusted for local cis-genetic effects (variants within 1 Mb of the gene body). In the traditional EWAS, we identified significant epigenome associations with gene expression for between 35% (5,813 in monocytes and 5,190 genes in neutrophils, FDR <5%) (Figures 2F and 2G) and 16.5% of the genes tested (2,942 genes in T cells, where the model was fitted only on methylation owing to the smaller H3K4me1 dataset). However, when accounting for cis-genetic effects, >50% of the genes with an EWAS signal were no longer significant (Figures 2F, 2G, and S6A). This demonstrates that failure to account for genetic factors in EWAS may lead to an overestimation of the total contribution of epigenetic factors to phenotype. The magnitude of this effect may also vary in disease-focused studies.
Overall, our results demonstrate that a large part of epigenetic associations with transcriptome variance at population level are correlated with underlying common cis-genetic variation, consistent with a high degree of local coordination between genetic, epigenetic, and transcriptional variation (Grubert et al., 2015; Waszak et al., 2015). We further show that this correlation is an important confounder in epigenome-wide association studies. Notably, however, careful integrative statistical modeling can identify clear epigenetic influences independent of cis-genetic factors for classes of biologically relevant genes.

Coordinated Influence of QTL Variants across Molecular Data Layers
We have shown that genetic variants determine a large fraction of observed epigenetic and transcriptional variation. Identifying these variants is essential to study their potential influence on cell function and disease mechanisms at individual loci. We first applied linear mixed models (Casale et al., 2015) to test associations of 7 M DNA sequence variants with gene expression quantified from total RNA sequencing. We considered variants within 1 Mb of gene bodies, for a total of 20,403 human genes that have a minimum of ten read counts in one of the cell types, including 13,245 (65%) protein-coding and 7,158 (35%) non-coding genes. Overall, 6,513 (39.3%), 5,845 (38.9%), and 5,799 (33.9%) genes had a QTL in monocytes, neutrophils, and T cells, respectively (2,482 non-coding genes; Figure 3A), encompassing biological functions that were for the most part separate from genes with uniquely epigenetic influences (Table S3).
We next sought to identify shared genetic effects linking genes to their putative regulatory elements such as gene promoters and enhancers. Enhancers play a central role in driving cell-type-specific gene expression (Ong and Corces, 2012), by activating transcription of target genes that may be located at distances of tens to hundreds or even thousands of kilobases. Here, we considered two different histone modifications typically associated with poised and active promoters and enhancers (H3K4me1 and H3K27ac), and DNA methylation levels measured using Illumina 450k arrays. We again tested associations by considering genetic variants within 1 Mb cis-windows centered on each feature. On average, 9.89% of methylation probes (64,836 probe-trait associations), 25.7% of H3K4me1 peaks (21,829 peaks), and 11.5% of H3K27ac peaks (15,548 peaks)

had at least one QTL associated with them. The majority of QTLs were associated with one phenotype (e.g., expression of one gene or one histone peak), while a fraction of them were linked to up to 15–80 phenotypes (Figure 3B).
Using the π1 statistic to assess the extent of sharing of genetic signals between the three immune cell types, we observed highly cell-type-specific effects at both histone modifications (H3K27ac, π1 = 0.27–0.44 and H3K4me1, π1 = 0.23–0.57; Figure 3C), consistent with predominantly cell-type-specific patterns of association for enhancers (Farh et al., 2015). Cell specificity was lower for expression quantitative trait loci (eQTLs) (π1 = 0.71–0.85) and methylation quantitative trait loci (meQTLs) (π1 = 0.79–0.93); sharing between the two myeloid cells was marginally greater than with T cells (Figure 3C). Across cell types, shared QTLs (defined by linkage disequilibrium [LD] r² ≥ 0.8) had predominantly concordant direction of effect (i.e., the same allele similarly increased or decreased a trait in the two comparison cells; Figure 3D).
As shown earlier, genetic, epigenetic, and gene expression variance within a given cell type are strongly locally coordinated. We thus sought to identify shared genetic effects linking genes to their putative regulatory elements. We considered all eQTL sentinel variants (eSNPs) and asked whether the same variant was also associated with histone modification or methylation status. For this comparison, we required that the variant was either identical or in high LD (r² ≥ 0.8) with the corresponding histone quantitative trait loci (hQTL) or meQTL sentinel variant. Using this rule, 43.3% of eSNPs were also associated with H3K4me1 or H3K27ac hQTLs (Figure 3E), denoting extensive local (median distance, 57 kb) coordination of genetic influences on gene expression and histone modifications. At shared variants, there was strong positive correlation of per-allele effect size between eQTLs and hQTLs at both histone marks (Figure 3F), indicating a predominant activating role. Approximately 43.3% of eQTL sentinel variants were also associated with a methylation probe, 44.2% of which were within corresponding eGenes (Figure 3E). The effect sizes for these meQTLs were weakly negatively correlated to eQTLs of corresponding genes (Figure 3F), suggestive of chance overlap or a partial uncoupling between the two (Gutierrez-Arcelus et al., 2013). Further, QTLs mapped to distinct regulatory domains defined by chromatin states in matched cells (Carrillo de Santa Pau et al., 2016), where eQTLs were enriched at transcribed regions and transcription start sites (Figure 3G), meQTLs around Polycomb-repressed transcription start site (TSS) regions, and hQTLs at active enhancer and TSS states. The QTLs provide a rich catalog of putative regulatory elements for genes implicated in immune function (Table S4). We describe further examples in the context of allelic expression and disease-focused analyses. Overall, these results demonstrate that associations are highly coordinated within the three cells, and genetic effects underpin observed correlation between molecular traits.

Genetic Regulation of Alternative Splicing
Alternative splicing regulates lineage commitment of human blood progenitors into mature blood cells (Chen et al., 2014) and contributes to disease as shown in lymphoblastoid cell lines (Li et al., 2016). We explored genetic influences on alternative mRNA splicing in the three primary immune cells using two complementary methods of quantification. In a first approach, we computed the ratio of alternatively spliced junctions (percent-splice-in or PSI) from mapped total RNA reads (Chen et al., 2014), allowing detailed surveys of splicing junctions for both annotated exons and also for exons annotated de novo from RNA-sequencing data in the three cell types. We identified a total of 32,357 alternatively spliced events (PSI) in 6,560 annotated genes and 2,288 unannotated transcripts. We then tested association with SNPs within 1-Mb regions surrounding each transcript, using comparable approaches to the eQTL analysis. As a second approach, we used sQTLseekeR to test for SNPs associated with variation in the relative abundance of a gene’s transcript isoforms (ISO). Here, we limited testing to associations for local (within the gene body ±5 kb) effects for annotated transcripts of protein-coding genes. In total, 9,485 genes and 1,462,663 SNPs were tested for association and FDR was used to correct for multiple testing. These two methods thus provide complementary analysis of alternative splicing, whereby ISO analysis is more sensitive to events involving main isoform changes, while PSI is able to recover more subtle splicing patterns involving novel exons.
On average, QTLs were detected (FDR 5%) for 15.3% of PSI events and 33.2% of ISO events, corresponding to 18.4% of the genes tested. A sizeable fraction of PSI splicing quantitative trait loci (sQTLs) (9.6%–11.7%) involved non-protein-coding genes (Figure 4A), suggesting alternative splicing of non-coding RNA species may provide an additional layer of genetic regulation of cellular identity and function. The number of sQTL genes (sGenes) was lower in neutrophils (2,260 and 15.0% of tested genes) compared to monocytes and T cells (Figure 4A), reflecting both lower levels of expression and higher rates of intron retention in neutrophils compared to the other cells (Wong et al., 2013). The majority of PSI QTLs involved exon-skipping events, followed by alternative 5′ or 3′ events, while the majority of ISO QTLs were complex events, given one isoform can contain several alternative splicing events (Figure 4B).
When considering alternative splicing events observed in two or more cells, the degree of sharing for sQTLs was higher than previously reported for eQTLs (π1 statistic = 0.88–0.96 for PSI and ISO; Figure 4C), and the effect sizes of shared sQTLs were highly correlated across cells (r = 0.94–0.97). However, a large proportion of sQTLs were specific to individual cell types (Figure 4C), for example, up to 56% of T cell PSI QTLs were for alternative splicing events that are only found in T cells. Overall, this suggests that although alternative splicing events tend to be

[Figure 3 graphic, panels A–G; M, N, and T denote monocytes, neutrophils, and T cells. Recoverable labels: (A) number of genes with eQTL (FDR 5%), by gene type (protein-coding, pseudogene, lincRNA, antisense, processed transcript, other non-coding); (B) number of phenotypes associated with the same QTL, for Gene, H3K4me1, H3K27ac, and Meth; (C) % cell-specific QTLs and pairwise sharing for Gene, Meth, H3K27ac, and H3K4me1; (D) comparisons MvN, MvT, and NvT; (E) hQTL, gene eQTL, and Meth overlaps; (G) fold enrichment of eQTL, H3K27ac hQTL, H3K4me1 hQTL, and meQTL by chromatin state (transcription with low or high H3K36me3, heterochromatin, low signal, repressed Polycomb, repressed Polycomb TSS, enhancer, active enhancer, active TSS).]
Figure 3. Features, Cell-type Specificity, and Coordination of QTLs

(A) Number of protein-coding and non-coding genes with significant eQTL (FDR <5%).
(B) Number of phenotypes associated with the same QTL.
(C) Percentage of phenotypes that are cell-type-specific (top) and genome-wide patterns of QTL sharing (π1 statistics) among cell types (bottom).
(D) Correlation (Pearson) between effect sizes for QTLs shared between different cell types.
(E) Percentage of eSNPs also associated (r² ≥ 0.8) with H3K27ac and H3K4me1 (left) or methylation levels (right).
(F) Correlation (Pearson) between effect size of expression and other molecular trait QTLs at overlapping signals (LD r² ≥ 0.8).
(G) Fold-enrichment of eQTLs, hQTLs, and meQTLs in different chromatin segmentation states.
See also Figures S2, S3, and S4 and Tables S2, S3, and S4.
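
The π1 statistic reported in (C) estimates the proportion of true associations among a set of tests. A minimal sketch of one way to compute it follows, assuming a Storey-type estimator with a single fixed threshold `lam`; the source does not describe how π1 was estimated, so the estimator choice, the function name `pi1_estimate`, and the default `lam=0.5` are assumptions. For QTL sharing, the input would typically be the p-values observed in one cell type for the QTLs that were significant in another.

```python
import numpy as np

def pi1_estimate(pvalues, lam=0.5):
    """Estimate pi1, the proportion of non-null tests, from p-values.

    Null p-values are uniform, so the fraction above lam, rescaled by
    (1 - lam), estimates the null proportion pi0; pi1 is 1 - pi0.
    """
    p = np.asarray(pvalues, dtype=float)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return 1.0 - min(1.0, pi0)

if __name__ == "__main__":
    # 80 strongly replicated signals plus 20 evenly spread null p-values.
    replicated = np.full(80, 1e-4)
    nulls = np.linspace(0.025, 0.975, 20)
    print(pi1_estimate(np.concatenate([replicated, nulls])))
```

Values near 1 indicate that most QTLs found in one cell type also show signal in the other, while lower values indicate cell-type-specific effects.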

highly cell-type-specific, genetic associations for alternative splicing events detected in two or more cell types are typically consistent. Genetic influences on splicing were predominantly independent from gene expression (Li et al., 2016), as reflected by predominantly unlinked eSNP-sSNP pairs (80% pairs with r² < 0.1 within-cell) and the different distribution of eQTLs and sQTLs within genes (Figure 4D). Further, ISO QTLs were enriched closer to splice sites (averaging 1.9-fold) and the nearest exon (averaging 1.6-fold) when compared to non-ISO QTLs (p < 0.05, Fisher’s exact test). A subset of genetic variants was predicted to result in a switch of major isoforms in genes of key importance in immunity and disease (Figures 4E–4G; Tables S3 and S4), potentially involving a switch to non-coding transcripts, or nonsense-mediated decay.

Allele-Specific Mapping from RNA-Sequencing and Histone Marks
As a complementary strategy to standard QTL mapping, we also considered allele-specific effects. By exploiting within-sample variation, allelic analyses can help to identify cis-regulatory variation in the presence of strong confounding (e.g., trans-acting loci, non-genetic effects), as well as uncover rare and private regulatory variants (Pastinen, 2010). We chose an approach maximizing allelic information from total RNA-seq, summing up strand-specific allele counts across GENCODE v15 transcript regions (exons and introns) and removing reads with mapping biases. We applied two models for primary association tests either using allelic information alone (allele-specific expression mapping [ASE]) or in combination with read depth of non-allelic reads (combined haplotype test [CHT]). Overall, 74%–86% of genes showed significant ASE or CHT (FDR 5%) with common SNPs (MAF >5%). Using peak sets described for hQTL mapping, we also mapped allelic variation in H3K27ac and H3K4me1 signals by the same approaches used for RNA-sequencing. Histone peaks showed lower mapping efficiency (20%–36% mapped using linear AS test or CHT, respectively), likely due to generally lower allelic information of shorter peak regions as compared to genes. Similar to QTL mapping, we observed stronger cell-type-dependence of allele-specific chromatin states as compared to ASE (Figure S7D).
Variants mapped by allelic tests showed large overlap with QTL-mapping, with 29%–43% top SNPs shared (r² ≥ 0.8), with ASE showing slightly better specificity as compared to CHT (that has higher power). There was strong agreement (43%–58% of shared mapped lead associations) of ASE versus CHT associations across expression and chromatin traits. Both allelic approaches showed similar regulatory variant distributions in gene flanking regions as compared to QTL mapped variants (Figure S7A) and allele-specific association tests show functional enrichments to expected chromatin states (Figures S7B and S7C). Altogether, these results indicate that each method captures additional true associations.
We also applied allelic mapping to gain further insight into several features of mapped QTLs. Focusing on genes with deep-phased read measurements, we first estimated the proportion of allelic variation captured by primary ASE mapping. We also carried out conditional (secondary) ASE mapping among genes with FDR <5% primary ASE signal after removing samples heterozygous for SNP to uncover more common SNP effects and included these in our estimation of common SNP effects. Remarkably, over 90% of the differences of 3-fold or greater between allelic transcripts were captured by our mapped common SNPs; even lowering the threshold to 1.5-fold difference, we still observed >70% contribution of common SNPs (Figure 5A). These results indicate a predominant role for common SNPs in governing allelic traits and argue for the comprehensiveness of our catalog for assessing the cis-acting impact of common SNPs in these cells as well as set the upper boundary for prevalence of rare high-effect regulatory alleles.
Utilizing the complementary information of QTL and allelic mapping tests, we linked genes to regulatory elements using the strict LD criteria as previously done for QTLs (lead-SNP r² ≥ 0.8). Joint use of all mapping approaches quadrupled genetically controlled gene-peak pairs offered by QTL mapping alone. Overall, 70%–89% of expression traits could be linked to at least one H3K4me1 or H3K27ac peak, respectively. Rich local genetic connectivity uncovered by joint allelic and QTL mapping was further validated by systematically correlating intra-individual allelic state of peaks and linked genes (Figure 5B). The shared allelic states provide orthogonal information complementing physical interaction maps and allow assessment of gene distal genetic effects often observed in association-mapping of disease SNPs (Figures 5C and 5D).
Finally, we explored modes of locally strongly correlated (sharing lead associations in LD r² ≥ 0.9) expression QTLs, where genomic coordinates of transcripts were not overlapping, using allelic data. We identified 2,691 local eQTL pairs when limiting to intergene distance of 250 kb or less. The majority of these associations (59%) occurred in transcripts transcribed from the same strand (shared orientation); next, commonly (26%) the genes shared a 5′ intergenic region (“head-to-head” orientation); in the rarest cases (15%) the genes sharing the same SNP association were in “tail-to-tail” orientation (sharing a 3′ intergenic region). The distribution of locally shared associations
Figure 4. Features, Cell-type Specificity, and Examples of Splicing QTLs

(A) Number of protein-coding, non-coding gene, and unannotated events with a significant splicing QTL (FDR <5%).
(B) Percentage of different alternative splicing events from PSI (top) and ISO (bottom) analyses.
(C) Percentage of PSI and ISO events that are cell-type-specific (top), and genome-wide patterns of QTL sharing (π1 statistics) among the three cell types
(bottom).
(D) Probability distribution of lead eQTL and sQTL SNPs around genes.
(E–G) Examples of alternatively spliced genes showing transcript structure and their distribution based on genotypes at each ISO sQTL. (E) IRF5 and rs3807306, a
RA-predisposing SNP that is associated with the switch of two major isoforms that have alternative 5′ UTR in neutrophils. (F) BTNL8 gene structure and
rs47007720, which switches a protein-coding major isoform to a non-coding isoform with intron retention in neutrophils. (G) GBP3 gene structure and
rs10922542, which switches a protein-coding major isoform to a nonsense-mediated-decay isoform and involves an exon skipping event in T cells.
See also Figure S3 and Tables S2, S3, and S4.
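The QTL-sharing statistic named in panel C is not defined in this excerpt. Assuming it refers to the commonly used estimate of the proportion of true associations (one minus the estimated null proportion, computed from the p values that QTLs discovered in one cell type obtain in another), a minimal Python sketch could look as follows. The function name, the fixed threshold `lam`, and the toy p values are illustrative assumptions, not the authors' implementation.

```python
def pi1_estimate(pvalues, lam=0.5):
    """Estimate the proportion of non-null tests from a list of p values.

    Assumes null p values are uniform on [0, 1], so the fraction above
    `lam`, rescaled by 1 / (1 - lam), estimates the null proportion pi0.
    """
    if not pvalues:
        raise ValueError("need at least one p value")
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    above = sum(1 for p in pvalues if p > lam)
    pi0 = min(1.0, above / (len(pvalues) * (1.0 - lam)))
    return 1.0 - pi0


# Hypothetical replication p values in a second cell type for ten QTLs
# discovered in a first cell type.
replication_pvalues = [1e-8, 3e-6, 2e-4, 0.001, 0.004, 0.01, 0.02, 0.04, 0.62, 0.91]
sharing = pi1_estimate(replication_pvalues)
```

A single fixed threshold keeps the sketch short; published implementations typically smooth the estimate over a range of thresholds.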
1406 Cell 167, 1398–1414, November 17, 2016

[Figure 5 graphic, panels A–D. Recoverable labels: proportion of SNPs and number of observations by measured allelic ratio (<1.5x to >9.0x); allelic coordination rate for gene-peak pairs (H3K4me1 and H3K27ac; QTL versus AS+QTL); hg19 browser views with H3K27ac, H3K4me1, RNA-seq, eQTL p value, NHGRI GWAS Catalog, and allelic correlation tracks at ARID5B (chr10), HOTAIRM1 (chr7), MTAP (chr9), and IL2RA (chr10), the last with gene and H3K27ac CHT mapping results and A/(A+B) allele ratios.]
Figure 5. Features of Molecular Traits Revealed by Allelic Analyses

(A) Relationship of significant allelic expression imbalance and mapped common cis-regulatory SNPs. Nearly 90% of transcripts show <1.5-fold difference between maternal and paternal copy (green line) with >2-fold differences seen in only 3% of transcripts. The primary (blue bar) or secondary (light blue) ASE mapped SNPs account for the majority of significant allelic effects, because homozygosity for these cis-rSNPs (red bars) is observed in only 7% of cases with allelic imbalances >3-fold.
(B) Coordinated genetic effects for genes and local chromatin peaks (lead SNPs r² ≥ 0.8) are approximately four times more numerous (blue bars “Gene-peak +ve” AS+QTL) when both allelic and QTL mapping hits are considered as compared to QTL mapped hits alone (blue bars “Gene-peak +ve” QTL) and can be validated in up to 47% of cases (green line) by intra-individual allelic correlation among genes and peaks. Genes with QTLs (QTL or AS+QTL) without coordinated genetic effects do not show (<5%) allelic correlation of local peaks.
(C) Validated gene TSS/peak allelic coordination (arcs scaled by Pearson r²). Three H3K27ac (blue arcs) and one H3K4me1 (red) elements are allelically linked to ARID5B, with similar allelic coordination for MTAP, while HOTAIRM1 is linked to multiple regulatory elements. For ARID5B and MTAP, the underlying SNPs (red [-log10] p value track “eQTL Pv”) overlap a coordinated peak as well as a GWAS variant (green NHGRI GWAS catalog SNPs on bottom) linked to rheumatoid arthritis and nevus counts, respectively.
(D) Disease locus functional phenotype captured solely in allele-specific analyses. IL2RA SNP (rs12722489) is associated with multiple sclerosis and Crohn’s disease and is the top SNP for a H3K27ac CHT event spanning the transcript (blue bar); the top IL2RA CHT SNP is in high LD (r² = 0.8) with the chromatin allelic signal. Allelic variations between gene and H3K27ac among individuals are extremely highly correlated (Pearson r² > 0.95, blue arc), suggesting that allelic chromatin altered by disease SNP can lead to differential allelic expression of IL2RA.
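The allelic quantities used in Figure 5 (the A/(A+B) allele ratio, the fold imbalance between the two copies, and the Pearson r² between the allelic ratios of a gene and a linked peak across individuals) can be illustrated with a short Python sketch. The function names and the read counts are hypothetical and serve only to make the definitions concrete; this is not the authors' pipeline.

```python
def allelic_ratio(a_reads, b_reads):
    """Return A / (A + B), the fraction of allelic reads carrying allele A."""
    total = a_reads + b_reads
    if total == 0:
        raise ValueError("no allelic reads at this site")
    return a_reads / total


def fold_imbalance(a_reads, b_reads):
    """Fold difference between the higher- and the lower-expressed allele."""
    low, high = sorted((a_reads, b_reads))
    if low == 0:
        return float("inf")
    return high / low


def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance in one of the sequences")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical (allele A, allele B) read counts in five heterozygous individuals.
gene_counts = [(80, 20), (60, 40), (55, 45), (30, 70), (90, 10)]
peak_counts = [(45, 15), (33, 27), (26, 24), (20, 40), (50, 10)]

gene_ratios = [allelic_ratio(a, b) for a, b in gene_counts]
peak_ratios = [allelic_ratio(a, b) for a, b in peak_counts]
r2 = pearson_r2(gene_ratios, peak_ratios)
```

In this toy data the gene and the peak are skewed toward the same allele in every individual, which is the pattern summarized by the arcs in panels C and D.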
deviated strongly from the null distribution (chi-square test = 71, 2 degrees of freedom [df]) (based on orientation of genes tested) where the strand sharing was more common (+12%) and “tail-to-tail” configuration was depleted (31%). This can partly be explained by 5′ overall bias of regulatory variants giving rise to, for example, bi-directional promoter variants (Figure 6A, top). We also identified strong locally correlated allelic effects spanning multiple independent annotations, extending to chromatin layer and multiple genes in both strands (Figure 6B, middle). Perhaps the most intriguing associations were those where only one of the transcripts in the pair showed allelic expression effect with mapped local QTL, whereas other transcripts showed equal allelic expression for same eSNP (Figure 6C, bottom). This could indicate local trans acting activity of the verified cis variant. This hypothesis was supported in follow-up analyses where we tested 342 “cis-eQTLs” showing potential local cis and trans effects (the latter showing no allelic bias despite high allelic informativity) for genome-wide trans-associations and compared them to control set of 678 lead eSNPs (matched by mapping significance and distance from TSS). The candidate local

Figure 6. Allelic Behavior of Locally Correlated eQTLs
Examples of modes for clustering of “cis-eQTLs.” Top to bottom: gene annotations (blue), eQTL pair sharing same top association (blue and red rectangles), local RNA-signal (fwd and rev strand; black), H3K4me1 (red) and H3K27ac (blue), average (log) RNA-seq intensity among top eQTL SNP homozygotes (AA, red; BB, green), top SNP (blue tick and rsID), eQTL mapping result (-log10 p value track in blue), allelic expression deviation (equal expression = 0, monoallelic expression = |0.5|) among top QTL SNP heterozygotes in forward (black) and reverse (gray) strands, allelic H3K27ac (blue), and H3K4me1 (red) deviation among top QTL SNP heterozygotes.
(A) “Head-to-head” configuration of eQTL and allelic effect, where total and allelic difference is mapped to a variant in a bidirectional promoter.
(B) Local SNP altering both chromatin and reverse and forward strands across multiple transcripts and chromatin signal.
(C) Example of a putative “cis-trans” pair where B3GALNT2 shows strong overexpression of one genotype and consistent allelic effect with eSNP localizing to its promoter, which also alters expression level of GGPS1 without detectable allelic effect.
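The orientation analysis of locally correlated eQTL pairs rests on a chi-square goodness-of-fit test with 2 degrees of freedom. The sketch below, in Python, shows the form of such a test. The observed counts are approximations reconstructed from the reported percentages of 2,691 pairs, and the null proportions are purely hypothetical placeholders, because the null distribution used in the study (based on the orientation of the genes tested) is not given in this section; the resulting statistic is therefore not expected to reproduce the reported value. For 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), which avoids any external dependency.

```python
from math import exp, isclose


def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic for observed category counts."""
    if len(observed) != len(expected_props):
        raise ValueError("observed and expected must have the same length")
    if not isclose(sum(expected_props), 1.0, abs_tol=1e-9):
        raise ValueError("expected proportions must sum to 1")
    total = sum(observed)
    return sum(
        (obs - prop * total) ** 2 / (prop * total)
        for obs, prop in zip(observed, expected_props)
    )


def chi2_sf_two_df(stat):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return exp(-stat / 2.0)


# Approximate counts from 59% / 26% / 15% of 2,691 pairs
# (same strand, head-to-head, tail-to-tail).
observed = [1588, 699, 404]
# Hypothetical null proportions, for illustration only.
null_props = [0.50, 0.25, 0.25]

stat = chi_square_gof(observed, null_props)
p_value = chi2_sf_two_df(stat)
```

Applying `chi2_sf_two_df` to the reported statistic of 71 gives a tail probability below 1e-15, consistent with the description of a strong deviation from the null.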

trans-activators showed 2-fold enrichment (p = 0.0003, Fisher’s exact test) of trans-associations in different chromosomes as compared to matched control eQTLs (14% versus 7% of SNPs with trans-associations FDR <5%), establishing that allelic information can be utilized to reveal unexpected features in local eQTLs.

Mapping Molecular Mechanisms at Disease-Associated Variants
Many GWAS loci map to regulatory domains. Thus, a key goal of this study was to explore the value of molecular trait associations to annotate putative functional consequences of disease-associated loci with single-base resolution. We focused on seven autoimmune diseases (celiac disease [CEL] [Dubois et al., 2010], inflammatory bowel disease [IBD] [Liu et al., 2015], including Crohn’s disease [CD] and ulcerative colitis [UC], multiple sclerosis [MS] [Beecham et al., 2013], type 1 diabetes [T1D] [Onengut-Gumuscu et al., 2015], and rheumatoid arthritis [RA] [Okada et al., 2014]), for which we retrieved publicly available genome-wide summary statistics. We first tested genome-wide enrichment for variants nominally associated with disease (p value ≤ 10⁻⁵), applying an enrichment test controlling for LD, local gene density, and variant minor allele frequency (Iotchkova et al., 2016). We detected moderate-to-strong enrichment of disease associations for all classes of molecular QTLs tested and specific to autoimmune diseases with limited evidence for cell-type specificity of enrichments (Figure 7A).
These significant overlaps suggest these are not chance events, motivating investigation of individual colocalized variants. To identify such loci, we first systematically intersected (LD r² ≥ 0.8) disease-associated variants with molecular QTLs and identified 14,074 instances of trait-locus overlap (Figure 7B). We then applied a more stringent Bayesian model test to estimate the posterior probability (PP) of each genomic locus containing a single variant affecting both disease and molecular trait (“colocalization”) against other possible models (single trait or two independent associations) (Giambartolomei et al., 2014; Pickrell et al., 2016). Of note, the colocalization model does not differentiate a causal relationship between a molecular trait and disease from independent (“pleiotropic”) effects driven by the same variant. Further, in regions of extended LD, the model has limited power to distinguish colocalization from two variants in high LD but with independent effects on phenotype.
Overall, 3,169 disease-molecular trait pairs (or 23%) had high posterior probability for colocalization according to a stringent cutoff (PP3 ≥ 0.99), corresponding to 345 unique disease loci (Figures 7B and 7C; Table S5). Colocalization of H3K27ac marks displayed marginally higher levels of enrichment within the colocalized set compared to eQTLs after accounting for the overall number of overlapped loci in each mark (Fisher’s p value < 0.01; Figure 7B). MS, T1D, and UC had associations predominantly colocalizing with T cell marks, compared to other diseases (Figure 7C).
These overlaps offer insights into disease associations and disease specificity (Figures 7D–7G; Table S5). Colocalization to eQTLs/sQTLs (31% of disease loci) guides the identification of possible functional gene candidates and mechanisms. For instance, CD associations centered on rs7423615 were colocalized with alternative splicing signal for SP140 (nuclear body protein) in T cells (Figure 7D). Similarly, the MS-associated variant rs1800693 colocalizes with tumor necrosis factor receptor superfamily member 1A (TNFRSF1A) alternative splicing in monocytes and neutrophils; the rs917997 CEL variant colocalizes with interleukin 18 receptor accessory protein (IL18RAP) alternative splicing in neutrophils; rs35260072 (IBD) colocalizes with interferon regulatory factor 1 (IRF1) splicing in neutrophils; and the rs12936409 IBD/CD variant (Anderson et al., 2011) colocalizes with gasdermin B (GSDMB) splicing in T cells.
Intersection with DNA methylation and/or histone modifications allows extending mechanistic hypotheses for eQTLs (Figures 7E and 7F). For instance, the IBD/CD locus colocalized with an eQTL, rs1081768, that is associated with TNFSF15 expression levels in monocytes and also with H3K27ac/H3K4me1 in the same cell. Similarly, IBD/CD SNP rs4077515 colocalized with an eQTL governing CARD9 expression that is also an hQTL for H3K4me1 in both monocytes and neutrophils (Figure 7F). Allele-specific analyses confirmed that H3K4me1 variation was linked to CARD9 cis-regulation in both cell types, suggesting that a weaker marginal effect, perhaps due to lower activity of the affected enhancer in neutrophils, was missed in eQTL mapping.
At least two thirds of disease-colocalized loci involved a DNA methylation or histone modification QTL without a corresponding eQTL, indicating a possible effect on poised or primed promoters/enhancers with no effect on gene expression levels at baseline conditions (while for a subset of loci incomplete ascertainment of eQTLs due to power limitations cannot be ruled out). An example is shown in Figure 7G, where the IBD/CD/UC variant rs7282490 (21q22.3) colocalized with H3K27ac and H3K4me1 variation, but not gene expression, in neutrophils. Associations at this locus were driven by rs8134436, mapping to within the two histone modification peaks. This SNP is also predicted to affect binding of pioneer transcription factors PU1 and CEBPB in neutrophils (S.W., unpublished data), sits within an active enhancer chromatin state in neutrophils, and overlaps binding
Figure 7. Molecular Mechanisms at Autoimmune Disease Loci

(A) Enrichment in molecular QTLs of celiac disease (CEL), Crohn’s disease (CD), inflammatory bowel disease (IBD), ulcerative colitis (UC), multiple sclerosis (MS),
rheumatoid arthritis (RA), and type 1 (T1D) and type 2 diabetes (T2D).
(B) N overlap = Number of observed QTL-trait pairs (top table) or unique disease loci (bottom table) that overlap (r² ≥ 0.8) disease variants across all three cell types. Disease colocalized = number and proportion of overlapping pairs that colocalize with disease variants with PP3 ≥ 0.99. FE = Ratio of fold enrichment of these proportions over eQTLs.
(C) Number (%) of disease loci colocalizing with cell-type-specific molecular QTLs, for associations unique to M, N, T, or shared between two or three cell types.
(D–G) Examples of colocalization between disease and molecular traits. Each plot shows regional association (window 2 Mb centered on the significant peak) for
a given disease locus (gray), molecular mark (color) and cell type, and corresponding molecular trait signal coverage (log2 RPM, 20 kb). (D) PSI sQTL. (E) eQTL/
meQTL. (F) eQTL/hQTL. (G) hQTL with no corresponding eQTL.
See Table S5.
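The colocalization step compares, for each locus, a model with one variant shared by disease and molecular trait against the alternatives (no association, association with a single trait, or two distinct variants). A minimal Python sketch of how posterior probabilities for such models can be obtained from per-SNP Bayes factors is given below, loosely following the structure of summary-statistics methods such as Giambartolomei et al. (2014). It assumes at most one causal variant per trait in the region. The function name, the prior values, and the Bayes factors are illustrative assumptions and do not reproduce the analysis in this study. Because hypothesis numbering differs between implementations, the models are given descriptive names.

```python
def colocalization_posteriors(bf_trait1, bf_trait2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of five association models in one region.

    bf_trait1, bf_trait2: per-SNP Bayes factors (association versus null)
    for the same ordered set of SNPs. p1, p2, p12: assumed prior
    probabilities that a SNP affects trait 1 only, trait 2 only, or both.
    """
    if len(bf_trait1) != len(bf_trait2) or not bf_trait1:
        raise ValueError("need two non-empty Bayes factor lists of equal length")
    sum1 = sum(bf_trait1)
    sum2 = sum(bf_trait2)
    sum_both = sum(a * b for a, b in zip(bf_trait1, bf_trait2))
    weights = {
        "none": 1.0,
        "trait1_only": p1 * sum1,
        "trait2_only": p2 * sum2,
        # Two different SNPs: all ordered pairs minus the same-SNP pairs.
        "distinct": p1 * p2 * (sum1 * sum2 - sum_both),
        "shared": p12 * sum_both,
    }
    total = sum(weights.values())
    return {model: weight / total for model, weight in weights.items()}


# Hypothetical three-SNP regions: both traits peak at the same SNP ...
same_snp = colocalization_posteriors([1.0, 1e3, 1.0], [1.0, 1e3, 1.0])
# ... or each trait peaks at a different SNP.
different_snps = colocalization_posteriors([1e6, 1.0, 1.0], [1.0, 1e6, 1.0])
```

Real Bayes factors span many orders of magnitude, so practical implementations work with logarithms; plain multiplication is kept here for readability. A threshold such as the 0.99 used in the text would then be applied to the posterior of the shared-variant model.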

sites of multiple transcription factors (cohesin subunit RAD21, CEBPB/E, and P300) in the neutrophilic cell line HL60. Additional disease links similar to CARD9 locus were evident from allele-specific analyses with 22% of histone mark or gene expression traits linked to disease SNPs observed solely in allele-specific datasets (Figure 5D).
Overall, these findings confirm the occurrence of widespread genetic regulation of immune and host defense pathways overlapping disease loci and involving not only gene expression but also splicing and epigenetic modifications. The occurrence of potential nonfunctional (chance) overlap at individual loci will require careful follow-up studies to validate functional hypotheses. Nevertheless, these results suggest the convergence of independent regulatory layers for cell-specific function, as well as independent techniques for their measurement, yields biological validity to mapped traits well beyond traditional eQTL studies.

DISCUSSION

We generated a high quality expansive resource for the scientific community. Exploiting this unprecedented dataset, three distinct aspects of the interplay of genetic and epigenetic factors in gene regulation were explored. Variance decomposition analysis was used to obtain a quantitative assessment of the contribution of epigenetic factors to transcription, independent of cis-genetic influences. We showed that cis-genetic effects explain the majority of transcriptional variance for a majority of genes with relatively modest independent epigenetic influences for a small subset of biologically relevant genes. These results strongly suggest the need to adequately control for the effect of cis-genetic variation in epigenome-focused explorations. While our data only covers a fraction of epigenomic space (chromatin states, interactions, methylome) these observations are important in context of EWAS, which typically survey a smaller fraction of epigenome. In fact, our estimates may be conservative for the role of DNA sequence governing gene expression variance, because it is expected that trans-effects and rare cis-genetic effects will account for part of the cis-variance we attribute to independent epigenetic effects.
The use of allele-specific analysis in parallel with QTL mapping allowed us to expand the spectrum of genetic influences assayed in this study. True cis-regulatory variation is expected to give rise to allelic differences in distribution of sequence reads from functional elements (Pastinen, 2010) and can be used in “cis-rSNP” mapping (Adoue et al., 2014) with improved specificity when methods to tackle alignment biases leading to spurious signals (Kumasaka et al., 2016) are applied. The majority of coordinated, genetically controlled regulatory element connections require the combined discovery power of QTL and AS-specific mapping techniques, and thus our resource will allow detailed investigation of long-range interactions in context of population variation. We further demonstrate that large-magnitude allelic imbalance is rare and predominantly (up to 93%) explained by common cis-regulatory variants. Finally, clustering of “cis-eQTLs” observed by us and earlier studies were explored, and diversity of mechanisms were suggested by allelic data, including bi-directional promoters, locally expanding chromatin effects, as well as local “cis-trans” pairs.
The use of high-resolution genome, transcriptome, and epigenome sequencing reveals genetic influences at disease variants captured by effects across epigenomic data layers, and the use of distinct primary immune cell lineages reveals a sizeable fraction of genetic variants where correlations are only visible in cell-specific contexts. These cases include hundreds of autoimmune disease variants likely acting through perturbation of local regulatory circuitry. Overlap between molecular and disease associations alone does not provide proof of causality. However, our rigorous approach combining strict linkage disequilibrium thresholds and statistical colocalization techniques yielded high value targets for experimental follow-up by the community to identify causal mechanisms.
Overall, the data and results are expected to improve understanding of the regulation of the transcriptional machinery in three important cells of the immune system. This deep characterization of molecular events is expected to substantially boost focused functional explorations of human disease variants, revealing potential new disease mechanisms and therapeutic opportunities.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ◦ Human Subjects
• METHOD DETAILS
  ◦ Sample Collection and Cell Isolation
  ◦ Molecular Data Generation and Processing
• QUANTIFICATION AND STATISTICAL ANALYSIS
  ◦ Statistical Analyses
• DATA AND SOFTWARE AVAILABILITY
  ◦ Data Resources
• ADDITIONAL RESOURCES

SUPPLEMENTAL INFORMATION

Supplemental Information includes seven figures and six tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.026.

AUTHOR CONTRIBUTIONS

Conceptualization, L. Chen, T.P., O.S., N.S., and H.G.S.; Methodology, V.A., S.B., L. Chen, O.D., K.D., S.E., M.F., D.G.-M., R.G., H.L., J.H.A.M., D.S.P., A.R., D.R., N.S., T.P., M.S., A.V., L.V., S.P.W., and M.-L.Y.; Software, J.A.M., J.M.F., L. Chen, D.G.-M., R.G., V.I., K.K., A.V., and L.V.; Validation, L. Chen and K.D.; Formal Analysis, C.A.A., V.A., S.B., M.C., L. Chen, W.C., K.D., S.E., H.E., D.G.-M., B.G., E.H., V.I., K.K., T.K., A.L.M., V.P., T.P., D.S.P., D.R., T.R., X.S., N.S., L.V., K.W., Y.Y., and M.-L.Y.; Investigation, T.W.K., I.C., F.P., M.-M.S., S.-Y.W., F.O.B., S.A., K.B., F.B., S.-H.C., L. Chen, K.D., S.E., S.F., M.F., D.G.-M., R.G., E.J.-M., B.K., J.J.L., A.M., J.H.A.M., F.M., V.P., F.P.C., A.R., K.R., D.R., S.R., N.S., M.S., N.S., T.P., L.V., and S.W.; Resources, M.T.M., V.A., L. Chen, A.D., K.D., S.E., S.F., M.F., F.J., H.L., E.L., J.H.A.M., T.P., D.S.P., F.P.C., D.R., D.R., N.S., L.V., S.W., S.P.W., Y.Y., and M.-L.Y.; Data Curation, V.A., L.B., D.B., M.C., L. Chen, W.C., A.D., K.D., S.F., B.G., F.J., H.H.D.K., E.L., J.H.A.M., T.P., D.S.P., D.R., N.S., L.V., S.W., and Y.Y.; Writing – Original Draft, L. Chen,

T.P., O.S., N.S., and L.V.; Writing – Review & Editing, L. Chen, L.V., A.L.M., V.P., P.C., O.S., T.P., and N.S.; Visualization, J.A.M., L. Chen, S.F., T.K., B.G., and L.V.; Supervision, S.B., G.B., L. Clarke, K.D., S.F., P.F., M.F., R.G., J.H.A.M., W.H.O., T.P., D.R., N.S., H.G.S., A.V., L.V., and M.-L.Y.; Project Administration, G.B., K.D., S.F., M.F., D.M., and N.S.; Funding Acquisition, S.E.A., S.B., G.B., E.T.D., S.F., P.F., M.F., W.H.O., T.P., N.S., and H.G.S.

ACKNOWLEDGMENTS

This work was predominantly funded by the EU FP7 High Impact Project BLUEPRINT (HEALTH-F5-2011-282510) and the Canadian Institutes of Health Research (CIHR EP1-120608). We thank Simon Dökel, Matthias Linser, Alexander Kovacsovics, and Daniela Balzereit for excellent technical skills on the Illumina platform and Mark Kristiansen (UCL Genomics) for processing the Illumina Infinium HumanMethylation450 BeadChIPs. We gratefully acknowledge the participation of all NIHR Cambridge BioResource volunteers. We thank the Cambridge BioResource staff for their help with volunteer recruitment. We thank members of the Cambridge BioResource SAB and Management Committee for their support of our study and the National Institute for Health Research Cambridge Biomedical Research Centre for funding. The research leading to these results has also received funding from the European Molecular Biology Laboratory, the Max Planck Society, the Spanish Ministry of Economy and Competitiveness, “Centro de Excelencia Severo Ochoa 2013-2017”, SEV-2012-0208 and Spanish National Bioinformatics Institute (INB-ISCIII) PT13/0001/0021 co-funded by FEDER “Una Manera de hacer Europa.” D.G.-M. is supported by a “la Caixa”-Severo Ochoa pre-doctoral fellowship, M.F. was supported by the BHF Cambridge Centre of Excellence (RE/13/6/30180), K.D. is funded as an HSST trainee by NHS Health Education England, S.E. is supported by a fellowship from “La Caixa”, V.P. is supported by an FEBS long-term fellowship, and N.S.’s research is supported by the Wellcome Trust (WT098051 and WT091310), the EU FP7 (EPIGENESYS 257082), and the National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre. The Cardiovascular Epidemiology Unit is supported by the UK Medical Research Council (G0800270), British Heart Foundation (SP/09/002), UK National Institute for Health Research Cambridge Biomedical Research Centre. The Blood and Transplant Unit (BTRU) in Donor Health and Genomics is part of, and funded by, the National Institute for Health Research (NIHR) and is a partnership between the University of Cambridge and NHS Blood and Transplant (NHSBT) in collaboration with the University of Oxford and the Wellcome Trust Sanger Institute. The T cell data were produced by the McGill Epigenomics Mapping Centre (EMC McGill). It is funded under the Canadian Epigenetics, Environment, and Health Research Consortium (CEEHRC) by the Canadian Institutes of Health Research and by Genome Quebec (CIHR EP1-120608), with additional support from Genome Canada and FRSQ. T.P. holds a Canada Research Chair. P.F. is a member of the Scientific Advisory Board of Omicia, Inc. S.W. is now an employee of GENOMICS plc, with some minor share options, although all work was conducted when he was an employee of EMBL-EBI.

Received: January 31, 2016
Revised: August 19, 2016
Accepted: October 14, 2016
Published: November 17, 2016

REFERENCES

Abnizova, I., Skelly, T., Naumenko, F., Whiteford, N., Brown, C., and Cox, T. (2010). Statistical comparison of methods to estimate the error probability in short-read Illumina sequencing. J. Bioinform. Comput. Biol. 8, 579–591.
Adoue, V., Schiavi, A., Light, N., Almlöf, J.C., Lundmark, P., Ge, B., Kwan, T., Caron, M., Rönnblom, L., Wang, C., et al. (2014). Allelic expression mapping across cellular lineages to establish impact of non-coding SNPs. Mol. Syst. Biol. 10, 754.
Aldridge, S., Watt, S., Quail, M.A., Rayner, T., Lukk, M., Bimson, M.F., Gaffney, D., and Odom, D.T. (2013). AHT-ChIP-seq: a completely automated robotic protocol for high-throughput chromatin immunoprecipitation. Genome Biol. 14, R124.
Allum, F., Shao, X., Guénard, F., Simon, M.M., Busche, S., Caron, M., Lambourne, J., Lessard, J., Tandre, K., Hedman, A.K., et al.; Multiple Tissue Human Expression Resource Consortium (2015). Characterization of functional methylomes by next-generation capture sequencing identifies novel disease-associated variants. Nat. Commun. 6, 7211.
Anderson, C.A., Boucher, G., Lees, C.W., Franke, A., D’Amato, M., Taylor, K.D., Lee, J.C., Goyette, P., Imielinski, M., Latiano, A., et al. (2011). Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat. Genet. 43, 246–252.
Aryee, M.J., Jaffe, A.E., Corrada-Bravo, H., Ladd-Acosta, C., Feinberg, A.P., Hansen, K.D., and Irizarry, R.A. (2014). Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369.
Battle, A., Mostafavi, S., Zhu, X., Potash, J.B., Weissman, M.M., McCormick, C., Haudenschild, C.D., Beckman, K.B., Shi, J., Mei, R., et al. (2014). Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res. 24, 14–24.
Beecham, A.H., Patsopoulos, N.A., Xifara, D.K., Davis, M.F., Kemppinen, A., Cotsapas, C., Shah, T.S., Spencer, C., Booth, D., Goris, A., et al.; International Multiple Sclerosis Genetics Consortium (IMSGC); Wellcome Trust Case Control Consortium 2 (WTCCC2); International IBD Genetics Consortium (IIBDGC) (2013). Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat. Genet. 45, 1353–1360.
Bentham, J., Morris, D.L., Cunninghame Graham, D.S., Pinder, C.L., Tombleson, P., Behrens, T.W., Martín, J., Fairfax, B.P., Knight, J.C., Chen, L., et al. (2015). Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nat. Genet. 47, 1457–1464.
Browning, S.R., and Browning, B.L. (2007). Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097.
Carrillo de Santa Pau, E., Juan, D., Pancaldi, V., Were, F., Martin-Subero, I., Rico, D., and Valencia, A. (2016). Searching for the chromatin determinants of human hematopoiesis. bioRxiv. http://dx.doi.org/10.1101/082917.
Casale, F.P., Rakitsch, B., Lippert, C., and Stegle, O. (2015). Efficient set tests for the genetic analysis of correlated traits. Nat. Methods 12, 755–758.
Chen, C., Grennan, K., Badner, J., Zhang, D., Gershon, E., Jin, L., and Liu, C. (2011). Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PLoS ONE 6, e17238.
Chen, L., Kostadima, M., Martens, J.H., Canu, G., Garcia, S.P., Turro, E., Downes, K., Macaulay, I.C., Bielczyk-Maczynska, E., Coe, S., et al.; BRIDGE Consortium (2014). Transcriptional diversity during lineage commitment of human blood progenitors. Science 345, 1251033.
DePristo, M.A., Banks, E., Poplin, R., Garimella, K.V., Maguire, J.R., Hartl, C., Philippakis, A.A., del Angel, G., Rivas, M.A., Hanna, M., et al. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498.
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21.
Dubois, P.C., Trynka, G., Franke, L., Hunt, K.A., Romanos, J., Curtotti, A., Zhernakova, A., Heap, G.A., Adány, R., Aromaa, A., et al. (2010). Multiple common variants for celiac disease influencing immune gene expression. Nat. Genet. 42, 295–302.
Ernst, J., and Kellis, M. (2012). ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216.
Fairfax, B.P., Humburg, P., Makino, S., Naranbhai, V., Wong, D., Lau, E., Jostins, L., Plant, K., Andrews, R., McGee, C., and Knight, J.C. (2014). Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression. Science 343, 1246949.

Farh, K.K., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W.J., Beik, S., Shoresh, N., Whitton, H., Ryan, R.J., Shishkin, A.A., et al. (2015). Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343.
Foissac, S., and Sammeth, M. (2007). ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets. Nucleic Acids Res. 35, W297–W299.
Ge, B., Pokholok, D.K., Kwan, T., Grundberg, E., Morcos, L., Verlaan, D.J., Le, J., Koka, V., Lam, K.C., Gagné, V., et al. (2009). Global patterns of cis variation in human cells revealed by high-density allelic expression analysis. Nat. Genet. 41, 1216–1222.
Giambartolomei, C., Vukcevic, D., Schadt, E.E., Franke, L., Hingorani, A.D., Wallace, C., and Plagnol, V. (2014). Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383.
Greenside, P., Srivas, R., Phanstiel, D.H., Pekowska, A., et al. (2015). Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Cell 162, 1051–1065.
Gurdasani, D., Carstensen, T., Tekola-Ayele, F., Pagani, L., Tachmazidou, I., Hatzikotoulas, K., Karthikeyan, S., Iles, L., Pollard, M.O., Choudhury, A., et al. (2015). The African Genome Variation Project shapes medical genetics in Africa. Nature 517, 327–332.

estimation of genetic variance components of complex traits. Genet. Sel. Evol. 42, 22.
Leek, J.T., Johnson, W.E., Parker, H.S., Jaffe, A.E., and Storey, J.D. (2012). The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883.
Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993.
Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.
Li, Y.I., van de Geijn, B., Raj, A., Knowles, D.A., Petti, A.A., Golan, D., Gilad, Y., and Pritchard, J.K. (2016). RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604.
Lippert, C., Casale, F.P., Rakitsch, B., and Stegle, O. (2014). LIMIX: genetic analysis of multiple traits. bioRxiv. http://dx.doi.org/10.1101/003905.
Liu, J.Z., van Sommeren, S., Huang, H., Ng, S.C., Alberts, R., Takahashi, A., Ripke, S., Lee, J.C., Jostins, L., Shah, T., et al.; International Multiple Sclerosis Genetics Consortium; International IBD Genetics Consortium (2015). Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986.
Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550.
Gutierrez-Arcelus, M., Lappalainen, T., Montgomery, S.B., Buil, A., Ongen, H., Maksimovic, J., Gordon, L., and Oshlack, A. (2012). SWAN: Subset-quantile
Yurovsky, A., Bryois, J., Giger, T., Romano, L., Planchon, A., et al. (2013). Pas- within array normalization for illumina infinium HumanMethylation450
sive and active DNA methylation and the interplay with genetic variation in BeadChips. Genome Biol. 13, R44.
gene regulation. eLife 2, e00523. Melcher, M., Unger, B., Schmidt, U., Rajantie, I.A., Alitalo, K., and Ellmeier, W.
Harrow, J., Frankish, A., Gonzalez, J.M., Tapanari, E., Diekhans, M., Kokocin- (2008). Essential roles for the Tec family kinases Tec and Btk in M-CSF recep-
ski, F., Aken, B.L., Barrell, D., Zadissa, A., Searle, S., et al. (2012). GENCODE: tor signaling pathways that regulate macrophage survival. J. Immunol. 180,
the reference human genome annotation for The ENCODE Project. Genome 8048–8056.
Res. 22, 1760–1774. Monlong, J., Calvo, M., Ferreira, P.G., and Guigó, R. (2014). Identification of
Hindorff, L.A., Sethupathy, P., Junkins, H.A., Ramos, E.M., Mehta, J.P., genetic variants associated with alternative splicing using sQTLseekeR. Nat.
Collins, F.S., and Manolio, T.A. (2009). Potential etiologic and functional impli- Commun. 5, 4698.
cations of genome-wide association loci for human diseases and traits. Proc.
Montgomery, S.B., and Dermitzakis, E.T. (2011). From expression QTLs to
Natl. Acad. Sci. USA 106, 9362–9367.
personalized transcriptomics. Nat. Rev. Genet. 12, 277–282.
Horvath, S. (2013). DNA methylation age of human tissues and cell types.
Morales, J.L., and Nocedal, J. (2011). Remark on Algorithm 778: L-BFGS-B,
Genome Biol. 14, R115.
FORTRAN routines for large scale bound constrained optimization. ACM
Iotchkova, V., Ritchie, G.R.S., Geihs, M., Morganella, S., Min, J.L., Walter, K., Trans. Math. Softw. 38, 1.
Timpson, N.J., UK10K Consortium, Dunham, I., Birney, E., and Nicole Sor-
Morris, A.P., Voight, B.F., Teslovich, T.M., Ferreira, T., Segrè, A.V., Steinthors-
anzo. (2016). GARFIELD - GWAS Analysis of Regulatory or Functional Informa-
dottir, V., Strawbridge, R.J., Khan, H., Grallert, H., Mahajan, A., et al.; Well-
tion Enrichment with LD correction. bioRxiv. http://dx.doi.org/10.1101/
come Trust Case Control Consortium; Meta-Analyses of Glucose and Insu-
085738.
lin-related traits Consortium (MAGIC) Investigators; Genetic Investigation of
Johnson, W.E., Li, C., and Rabinovic, A. (2007). Adjusting batch effects in mi- ANthropometric Traits (GIANT) Consortium; Asian Genetic Epidemiology
croarray expression data using empirical Bayes methods. Biostatistics 8, Network–Type 2 Diabetes (AGEN-T2D) Consortium; South Asian Type 2 Dia-
118–127. betes (SAT2D) Consortium; DIAbetes Genetics Replication And Meta-analysis
Jostins, L., Ripke, S., Weersma, R.K., Duerr, R.H., McGovern, D.P., Hui, K.Y., (DIAGRAM) Consortium (2012). Large-scale association analysis provides in-
Lee, J.C., Schumm, L.P., Sharma, Y., Anderson, C.A., et al.; International IBD sights into the genetic architecture and pathophysiology of type 2 diabetes.
Genetics Consortium (IIBDGC) (2012). Host-microbe interactions have shaped Nat. Genet. 44, 981–990.
the genetic architecture of inflammatory bowel disease. Nature 491, 119–124. Naranbhai, V., Fairfax, B.P., Makino, S., Humburg, P., Wong, D., Ng, E., Hill,
Kang, H.M., Ye, C., and Eskin, E. (2008a). Accurate discovery of expression A.V., and Knight, J.C. (2015). Genomic modulators of gene expression in hu-
quantitative trait loci under confounding from spurious and genuine regulatory man neutrophils. Nat. Commun. 6, 7545.
hotspots. Genetics 180, 1909–1925. Nica, A.C., Parts, L., Glass, D., Nisbet, J., Barrett, A., Sekowska, M., Travers,
Kang, H.M., Zaitlen, N.A., Wade, C.M., Kirby, A., Heckerman, D., Daly, M.J., M., Potter, S., Grundberg, E., Small, K., et al.; MuTHER Consortium (2011). The
and Eskin, E. (2008b). Efficient control of population structure in model organ- architecture of gene regulatory variation across multiple human tissues: the
ism association mapping. Genetics 178, 1709–1723. MuTHER study. PLoS Genet. 7, e1002003.
Kumasaka, N., Knights, A.J., and Gaffney, D.J. (2016). Fine-mapping cellular Nordlund, J., Bäcklin, C.L., Wahlberg, P., Busche, S., Berglund, E.C., Eloranta,
QTLs with RASQUAL and ATAC-seq. Nat. Genet. 48, 206–213. M.L., Flaegstad, T., Forestier, E., Frost, B.M., Harila-Saari, A., et al. (2013).
Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, Genome-wide signatures of differential DNA methylation in pediatric acute
A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M.J., et al.; Roadmap Epige- lymphoblastic leukemia. Genome Biol. 14, r105.
nomics Consortium (2015). Integrative analysis of 111 reference human epige- Okada, Y., Wu, D., Trynka, G., Raj, T., Terao, C., Ikari, K., Kochi, Y., Ohmura,
nomes. Nature 518, 317–330. K., Suzuki, A., Yoshida, S., et al.; RACI consortium; GARNET consortium
Lee, S.H., Goddard, M.E., Visscher, P.M., and van der Werf, J.H. (2010). Using (2014). Genetics of rheumatoid arthritis contributes to biology and drug dis-
the realized relationship matrix to disentangle confounding factors for the covery. Nature 506, 376–381.

Onengut-Gumuscu, S., Chen, W.M., Burren, O., Cooper, N.J., Quinlan, A.R., Mychaleckyj, J.C., Farber, E., Bonnie, J.K., Szpak, M., Schofield, E., et al.; Type 1 Diabetes Genetics Consortium (2015). Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers. Nat. Genet. 47, 381–386.
Ong, C.T., and Corces, V.G. (2012). Enhancers: emerging roles in cell fate specification. EMBO Rep. 13, 423–430.
Pastinen, T. (2010). Genome-wide allele-specific analysis: insights into regulatory variation. Nat. Rev. Genet. 11, 533–538.
Pickrell, J.K., Berisa, T., Liu, J.Z., Ségurel, L., Tung, J.Y., and Hinds, D.A. (2016). Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A., Bender, D., Maller, J., Sklar, P., de Bakker, P.I., Daly, M.J., and Sham, P.C. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575.
Schmidt, U., Boucheron, N., Unger, B., and Ellmeier, W. (2004). The role of Tec family kinases in myeloid cells. Int. Arch. Allergy Immunol. 134, 65–78.
Stegle, O., Parts, L., Piipari, M., Winn, J., and Durbin, R. (2012). Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507.
Storey, J.D., and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100, 9440–9445.
Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515.
Triche, T.J., Jr., Weisenberger, D.J., Van Den Berg, D., Laird, P.W., and Siegmund, K.D. (2013). Low-level processing of Illumina Infinium DNA Methylation BeadArrays. Nucleic Acids Res. 41, e90.
van de Geijn, B., McVicker, G., Gilad, Y., and Pritchard, J.K. (2015). WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063.
Waszak, S.M., Delaneau, O., Gschwind, A.R., Kilpinen, H., Raghav, S.K., Witwicki, R.M., Orioli, A., Wiederkehr, M., Panousis, N.I., Yurovsky, A., et al. (2015). Population variation and genetic control of modular chromatin architecture in humans. Cell 162, 1039–1050.
Welter, D., MacArthur, J., Morales, J., Burdett, T., Hall, P., Junkins, H., Klemm, A., Flicek, P., Manolio, T., Hindorff, L., and Parkinson, H. (2014). The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006.
Wong, J.J., Ritchie, W., Ebner, O.A., Selbach, M., Wong, J.W., Huang, Y., Gao, D., Pinello, N., Gonzalez, M., Baidya, K., et al. (2013). Orchestrated intron retention regulates normal granulocyte differentiation. Cell 154, 583–595.
Yang, J., Lee, S.H., Goddard, M.E., and Visscher, P.M. (2011). GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82.
Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., Myers, R.M., Brown, M., Li, W., and Liu, X.S. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137.

STAR★METHODS
KEY RESOURCES TABLE

Antibodies
CD16 microbeads | Miltenyi | Cat# 130-045-701
CD14 microbeads | Miltenyi | Cat# 130-050-201
M4P9 FITC | BD Biosciences | Cat# 345786
B73.1 / leu11c PE | BD Biosciences | Cat# 347617
VEP13, MACS PE | Miltenyi | Cat# 130-091-245
BIRMA 17C FITC | IBGRL-NHS | Cat# 9453CE
RPA-T4 FITC | BD Biosciences | Cat# 561842
HI100 PE | BD Biosciences | Cat# 555489
H3K4me1 | Diagenode | Cat# C15410194
H3K27ac | Diagenode | Cat# C15410196
EasySep Human Naive CD4+ T Cell Enrichment Kit | StemCell | Cat# 19155
Illumina TruSeq Stranded Total RNA Kit with Ribo-Zero Gold | Illumina | Cat# RS-122-2201
DNeasy Blood & Tissue Kit | QIAGEN | Cat# 69506
EZ-96 DNA Methylation MagPrep Kit | Zymo Research | Cat# D5040
Infinium HumanMethylation450 assays | Illumina | Cat# WG-317-1001 (superseded by Infinium MethylationEPIC BeadChip Kit)
Qiaquick MinElute PCR purification Kit | QIAGEN | Cat# 28004
Kapa Hyper Prep Kit | Kappa | Cat# KK8500
Agencourt AMPure XP | Agencourt | Cat# A63880
Protein A Dynabeads | Invitrogen | Cat# 10001D
NEBnext | New England Biolabs | Cat# E6000S
Ideal Kit | Diagenode | Cat# C01010011
GeneRead Size Selection kit | QIAGEN | Cat# 180514
Deposited Data
1000 Genomes Project | | http://www.1000genomes.org/data/; ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/
GENCODE 15 | Harrow et al., 2012 | http://www.gencodegenes.org/releases/15.html
ENCODE blacklisted regions | | http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeDacMapabilityConsensusExcludable.bed.gz
GWAS Catalog | Welter et al., 2014 | https://www.ebi.ac.uk/gwas/
Blueprint GRCh37 genome and gene annotation | N/A | ftp://ftp.ebi.ac.uk/pub/databases/blueprint/releases/20130301/homo_sapiens/reference
WGS data files | This paper | EGAD00001002663
RNA data files | This paper | EGAD00001002671; EGAD00001002674; EGAD00001002675
ChIP-seq data files | This paper | EGAD00001002670; EGAD00001002672; EGAD00001002673
450k data files | This paper | EGAS00001001456

BWA (v0.5.9) | Li and Durbin, 2009 | http://bio-bwa.sourceforge.net/
Picard (v1.98) | N/A | https://github.com/broadinstitute/picard
GATK (v3.4) | DePristo et al., 2011 | https://www.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=3.4-0-g7e26428
SAMtools/bcftools | Li, 2011 | https://github.com/SAMtools/SAMtools/releases/tag/1.2
VQSR | DePristo et al., 2011 | https://www.broadinstitute.org/gatk/guide/article?id=39
BEAGLE r1398 | Browning and Browning, 2007 | https://faculty.washington.edu/browning/beagle/
PLINK v1.9 | Purcell et al., 2007 | https://www.cog-genomics.org/plink2
FastQC (v0.10.1) | N/A | http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
STAR (v2.4.0k) | Dobin et al., 2013 | https://github.com/alexdobin/STAR
DESeq2 (v1.4.5) | Love et al., 2014 | https://bioconductor.org/install/#install-bioconductor-packages
Cufflinks (v2.2.1.) | Trapnell et al., 2010 | http://cole-trapnell-lab.github.io/cufflinks/releases/v2.2.1/
sQTLseekeR R package (v2.0) | Monlong et al., 2014 | https://github.com/jmonlong/sQTLseekeR
AStalavista 3.2 | Foissac and Sammeth, 2007; Monlong et al., 2014 | http://sammeth.net/confluence/display/ASTA/2+-+Download
Minfi | Aryee et al., 2014 | https://bioconductor.org/packages/release/bioc/html/minfi.html
NOOB | Triche et al., 2013 | https://www.bioconductor.org/packages/release/bioc/html/methylumi.html
SWAN | Maksimovic et al., 2012 | https://bioconductor.org/packages/release/bioc/html/minfi.html
SVA | Leek et al., 2012 | http://www.bioconductor.org/packages/release/bioc/html/sva.html
PhantomPeakQualTools vr18 | N/A | http://code.google.com/p/phantompeakqualtools/
MACS2 v2.0.10.20131216 | Zhang et al., 2008 | https://pypi.python.org/pypi/MACS2
BEDOPS v2.4.14 | N/A | http://bedops.readthedocs.io/en/latest/
ComBat | Chen et al., 2011 | http://www.bioconductor.org/packages/release/bioc/html/sva.html
PEER | Stegle et al., 2012 | https://github.com/PMBio/peer/wiki
DNA Methylation Age Calculator | Horvath, 2013 | https://dnamage.genetics.ucla.edu/
LIMIX | Casale et al., 2015; Lippert et al., 2014 | https://github.com/PMBio/limix
association testing (EMMA) | Kang et al., 2008a | http://mouse.cs.ucla.edu/emma/
multiple hypothesis correction (LRVM) | Battle et al., 2014 | http://dags.stanford.edu/dgn/
heritability analysis (GCTA) | Yang et al., 2011 | http://cnsgenomics.com/software/gcta/
Pysam | N/A | https://github.com/pysam-developers/pysam
WASP | van de Geijn et al., 2015 | https://github.com/bmvdgeijn/WASP
qvalue (v1.99.1) | Storey and Tibshirani, 2003 | https://github.com/jdstorey/qvalue
GARFIELD | N/A | http://www.ebi.ac.uk/birney-srv/GARFIELD/; http://bioconductor.org/packages/release/bioc/html/garfield.html
ChromHMM | Ernst and Kellis, 2012 | http://compbio.mit.edu/ChromHMM/
Trim Galore v0.32 | N/A | http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
Bedtools2 v2.23.0 | N/A | https://github.com/arq5x/bedtools2/releases/tag/v2.23.0
gwas-pw | Pickrell et al., 2016 | https://github.com/joepickrell/gwas-pw
Ingenuity Pathway Analysis (IPA) | N/A | www.qiagen.com/ingenuity; QIAGEN Redwood City

Further information and requests for reagents may be directed to the corresponding author and lead contact, Nicole Soranzo (ns6@sanger.ac.uk).
Human Subjects
Blood was obtained from donors who were members of the NIHR Cambridge BioResource (http://www.cambridgebioresource.org.uk/)
with informed consent (REC 12/EE/0040) at the NHS Blood and Transplant, Cambridge. Details of donor characteristics (gender,
smoking status past and present and age bin), identification (ID) code and donation date are listed in Table S1. Blood collection is
described in the STAR Methods.
METHOD DETAILS
Sample Collection and Cell Isolation

Peripheral Adult Blood Collection
Donors were on average 55 years old (range 20-75 years old) with 46% of donors being male. A unit of whole blood (475 ml) was
collected in 3.2% Sodium Citrate. An aliquot of this sample was collected in EDTA for genomic DNA purification. A full blood count
(FBC) for all donors was obtained from an EDTA blood sample, collected in parallel with the whole-blood unit, using a Sysmex Hae-
matological analyzer. The level of C-reactive protein (CRP), an inflammatory marker, was also measured in the sera of all individuals.
All donors used for the collection had FBC and CRP parameters within the normal healthy range. Blood was processed within 4 hr of
collection.
Isolation of Cell Subsets
To obtain pure samples of ‘classical’ monocytes (CD14+ CD16-), neutrophils (CD66b+ CD16+) and naive CD4+ T cells (CD4+
CD45RA+), we implemented a multi-step purification strategy. Whole blood was diluted 1:1 in a buffer of Dulbecco’s Phosphate Buffered
Saline (PBS, Sigma) containing 13mM sodium citrate tribasic dihydrate (Sigma) and 0.2% human serum albumin (HSA, PAA)
and separated using an isotonic Percoll gradient of 1.078 g/ml (Fisher Scientific). Peripheral blood mononuclear cells (PBMCs) were
collected and washed twice with buffer, diluted to 25 million cells/ml and separated into two layers, a monocyte rich layer and a
lymphocyte rich layer, using a Percoll gradient of 1.066 g/ml. Cells from each layer were washed in PBS (13mM sodium citrate
and 0.2% HSA) and subsets purified using an antibody/magnetic bead strategy. To purify monocytes, CD16+ cells were depleted
from the monocyte rich layer using CD16 microbeads (Miltenyi) according to the manufacturer’s instructions. Cells were washed
in PBS (13mM sodium citrate and 0.2% HSA) and CD14+ cells were positively selected using CD14 microbeads (Miltenyi). CD4+
naive T cells were negatively selected using an EasySep Human Naive CD4+ T Cell Enrichment Kit (StemCell) according to the man-
ufacturer’s instructions. To purify neutrophils, the dense layer of cells from the 1.078 g/ml Percoll separation was lysed twice using an
ammonium chloride buffer to remove erythrocytes. The resulting cells (including neutrophils and eosinophils) were washed and neu-
trophils positively selected using CD16 microbeads (Miltenyi) according to the manufacturer’s instructions. The purity of each cell
preparation was assessed by multicolor FACS (Figure S1) using conjugated antibodies for CD14 (M4P9, BD Biosciences) and
CD16 (B73.1 / leu11c, BD Biosciences) for monocytes, CD16 (VEP13, MACS, Miltenyi) and CD66b (BIRMA 17C, IBGRL-NHS) for
neutrophils and CD4 (RPA-T4, BD) and CD45RA (HI100, BD) for naive CD4+ T cells. Purity was on average 95% for monocytes,
98% for neutrophils and 93% for naive CD4+ T cells.
Molecular Data Generation and Processing

Genome and Annotation Version
All alignments and analyses in the Blueprint EpiVar project were carried out using GRCh37/hg19 and GENCODE 15 (Harrow et al.,
2012).
Whole-Genome Sequencing
Sample Preparation. Genomic DNA preparation was performed at the University of Cambridge (UCAM). Red blood cells from EDTA
whole blood were lysed prior to lysis of leukocytes using guanidine hydrochloride, sodium acetate and a protease lysis buffer. DNA
was extracted using chloroform and precipitated in ethanol prior to washing, resuspension in ultra-pure water and quantification
(Qubit, Invitrogen).
Library Preparation. Whole-genome sequencing (WGS) was performed at the Wellcome Trust Sanger Institute (WTSI). Genomic
DNA (approximately 1 μg) was fragmented to an average size of 500 base pairs (bp), and indexed, adaptor-ligated DNA libraries
were created using established Illumina paired-end protocols. A portion of each library was used to create an equimolar pool
comprising eight indexed libraries.
Sequence Data Generation. Libraries were subjected to 100bp paired-end (PE) sequencing (HiSeq 2000/2500; Illumina) at the
WTSI following manufacturer’s instructions. Each pool of eight libraries was sequenced on multiple lanes/flowcells to an (average)
depth of 7.05x coverage (SD = 1.84) of the human genome and aligned to GRCh37/hg19 using BWA (v0.5.9) (Li and Durbin, 2009).

Sequence Data Processing. Sequence data were processed by the Human Genetics Informatics Group at the WTSI as described in
more detail in Gurdasani et al. (2015). Briefly the following steps were carried out:
BAM Processing. After BAM files were created from the sequenced lanes, base qualities were recalibrated (Abnizova et al., 2010) and
reads were mapped to the human reference genome (GRCh37/hg19) with BWA. BAM files were sorted and duplicates were marked using
Picard (v1.98). BAMs were then realigned around known and discovered INDELs using GATK (v3.4) (DePristo et al., 2011)
and re-calibrated by GATK.
Variant Calling. SNP and INDEL calls were made using SAMtools/bcftools (Li, 2011) by pooling the alignments from 200 individual
low coverage BAM files. All-samples and all-sites genotype likelihood files (bcf) were created with SAMtools mpileup on chunked
chromosomes. The resulting VCFs were merged and Variant Quality Score Recalibration (VQSR) (DePristo et al., 2011) was per-
formed on the chunks, independently for SNPs and INDELs. GATK was run independently for SNPs and INDELs producing a VCF
file containing variant quality score log odds ratio (VQSLOD) scores for each site. The VQSR filter was applied to the SAMtools
calls.
Variant Quality Control and Filtering. We filtered out INDELs located within 10 bp of another INDEL and SNPs located
within 3 bp of an INDEL. Additionally, variants were filtered if their VQSLOD score was below the score necessary to
discover 96% of truth sites. For SNPs this cut-off was a minimum VQSLOD score of 1.0078 and for INDELs a score of 0.91.
The missing and low confidence genotypes in the filtered VCFs were filled in with BEAGLE r1398 (Browning and Browning,
2007). Additional filtering was then applied to generate a final dataset containing variants with (i) Allelic R-Squared (AR2) ≥ 0.8
(AR2 is the estimated squared correlation between the most likely allele dosage and the true allele dosage); (ii) Hardy-Weinberg
equilibrium (HWE) P value ≥ 1 × 10⁻³; and (iii) allele count (AC) > 4.
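The four thresholds above can be read as a single pass/fail predicate per variant. The following is a minimal sketch in Python, assuming the comparisons are "greater than or equal to" for VQSLOD, AR2 and the HWE P value, and that the HWE cut-off is 1 × 10⁻³; the `Variant` record and its field names are hypothetical and do not correspond to any VCF library.

```python
from dataclasses import dataclass

# Thresholds taken from the text. The HWE exponent is an assumption,
# since only its magnitude is recoverable from the source.
VQSLOD_MIN = {"SNP": 1.0078, "INDEL": 0.91}
AR2_MIN = 0.8
HWE_P_MIN = 1e-3
AC_MIN_EXCLUSIVE = 4


@dataclass
class Variant:
    """Hypothetical per-site summary; not a real VCF parser record."""
    kind: str          # "SNP" or "INDEL"
    vqslod: float      # variant quality score log odds ratio
    ar2: float         # BEAGLE allelic R-squared
    hwe_p: float       # Hardy-Weinberg equilibrium P value
    allele_count: int  # number of alternate alleles observed


def passes_filters(v: Variant) -> bool:
    """Return True only if the variant clears every threshold."""
    return (
        v.vqslod >= VQSLOD_MIN[v.kind]
        and v.ar2 >= AR2_MIN
        and v.hwe_p >= HWE_P_MIN
        and v.allele_count > AC_MIN_EXCLUSIVE
    )
```

The proximity filters (INDELs within 10 bp of another INDEL, SNPs within 3 bp of an INDEL) need neighbouring sites and are deliberately left out of this per-variant sketch.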
Data QC. A set of 154,222 robustly QC’d autosomal SNPs extracted from a total of 7,009,917 was used to carry out sample quality
control using principal components analysis (PCA) for the identification of ethnic outliers and Identity-By-Descent (IBD) analysis for
the identification of duplicate samples. The SNPs used for the sample quality control consisted of bi-allelic variants with minor allele
frequency (MAF) ≥ 0.05, Hardy-Weinberg P value ≥ 10⁻⁴ and genotype missingness < 3%. In addition, a pairwise r² threshold of 0.2
was used to select unlinked SNPs. This was done using the indep-pairwise function within PLINK v1.9 (Purcell et al., 2007), with a
moving window of 1000bp. Ethnicity was evaluated by merging the BLUEPRINT samples with the 14 populations present in the
1000 Genomes Project data. PCA was performed and the first three principal components were plotted to identify possible ethnic
outliers (see Figure S2A). A threshold on PC2 scores of 0.018 was used to differentiate the samples of European origin (GBR,
CEU, TSI, FIN, IBS) from the rest. In total 3 outliers were identified and excluded as being of mixed ethnic origins. The proportion
of alleles that were IBD was estimated in a pairwise manner for all samples using the PLINK Method-of-Moments function. The
probability of sharing zero alleles by descent (Z0) was between 0.91 and 1 for all pairwise estimations, and therefore all the
individuals in the data were defined as unrelated. Other metrics for the complete variant call set, such as number of variants per
sample and allele frequency, as well as depth of coverage and Ts/Tv ratio, are shown in Figures S2B–S2H.
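The SNP selection used for sample QC can be illustrated with a small sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, that the Hardy-Weinberg P value is computed elsewhere, and that the cut-offs are MAF of at least 0.05, HWE P of at least 10⁻⁴ and missingness below 3%. The function names are hypothetical, and the LD pruning step performed in PLINK is not reproduced.

```python
from typing import Optional, Sequence, Tuple


def maf_and_missingness(genotypes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (minor allele frequency, fraction of missing calls) for one SNP."""
    called = [g for g in genotypes if g is not None]
    missing = 1.0 - len(called) / len(genotypes)
    if not called:
        return 0.0, missing
    # Each called diploid genotype contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq), missing


def keep_for_sample_qc(
    genotypes: Sequence[Optional[int]],
    hwe_p: float,
    maf_min: float = 0.05,
    hwe_min: float = 1e-4,
    max_missing: float = 0.03,
) -> bool:
    """Apply the three per-SNP criteria described in the text."""
    maf, missing = maf_and_missingness(genotypes)
    return maf >= maf_min and hwe_p >= hwe_min and missing < max_missing
```

A SNP passing this check would still need to survive the pairwise r² pruning before entering the PCA and IBD analyses.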
RNA-Sequencing Sample Preparation
RNA sequencing (RNA-seq) preparation and library creation at McGill University (naive CD4+ T cells) and the Max Planck Institute for
Molecular Genetics (MPIMG, monocytes and neutrophils) were performed using identical methods. Following purification, cells were
lysed in TRIZOL reagent (Life Technologies) at a concentration of approximately 2.5 million cells/ml. RNA was extracted as per man-
ufacturer’s instructions, resuspended in ultra-pure water and quantified (Qubit, Invitrogen) prior to library preparation.
Data Generation
Library preparation. Sequencing libraries were prepared from 200ng RNA using an Illumina TruSeq Stranded Total RNA Kit with Ribo-
Zero Gold (Illumina). Adaptor-ligated libraries were amplified and indexed via PCR.
RNA Sequencing. For monocytes and neutrophils up to six libraries were multiplexed per lane and sequenced at MPIMG using
100bp single end (SE) protocols following manufacturer’s instructions (V3 chemistry, HiSeq 2000, Illumina). On average each sample
generated 9.18Gb of raw data (med 9.32Gb, SD 1.15Gb). For naive CD4+ T cells, libraries were prepared in the same way and
sequenced at McGill university using 100bp paired-end (PE) reads, generating on average 11.74Gb of raw data (med 10.83Gb,
SD 3.38Gb).
Data Processing
Pre-alignment QC. Prior to alignment reads from each RNA-seq library were initially subjected to a quality control step using FastQC
(v0.10.1), where, based on duplication rates and gene coverage, outliers were identified and discarded from further analysis. Reads
of monocytes, neutrophils and naive CD4+ T cells were trimmed for both PCR and sequencing adapters using Trim Galore (v0.32).
Alignment. Trimmed reads were aligned to the human genome using STAR (v2.4.0k) (Dobin et al., 2013). STAR default settings
were used given that they were optimized for 100bp reads in human. For STAR runs, annotated splice junctions retrieved from
GENCODE 15 were used to guide the alignment step.
Quantification of Gene Expression
To quantify and normalize gene expression, we used DESeq2 (v1.4.5) (Love et al., 2014) to obtain the read counts for each gene an-
notated in GENCODE 15.

RNA Splicing QTLs
We assessed alternative splicing using two complementary methods of quantification.
Identification of Alternative Splicing (PSI). To identify alternative splicing events, we used the uniquely mapped splice junction
output from STAR and examined the ones that shared either the acceptor or the donor site. As described previously (Chen et al.,
2014), these splicing events were compared to GENCODE 15 annotation in order to be classified as: exon-skipping, alternative 3′
splice site, or alternative 5′ splice site. Alternative splice junctions that could not be matched to any annotated splice junction
were defined as ‘unannotated’. PSIs, the ratio of alternatively spliced junctions, were computed for all the alternative splicing events.
We then tested association for SNPs within a 1 Mb region surrounding the PSI event, using comparable approaches to the eQTL
analysis.
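As an illustration of the PSI idea, the sketch below groups junctions that share a donor site and expresses each junction's read count as a fraction of the group total. This is one plausible reading of "the ratio of alternatively spliced junctions", not the authors' exact procedure; the junction tuple layout and the function name are assumptions, and an analogous grouping by acceptor site would be handled the same way.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical junction key: (chromosome, donor position, acceptor position)
Junction = Tuple[str, int, int]


def psi_by_shared_donor(counts: Dict[Junction, int]) -> Dict[Junction, float]:
    """Compute a PSI-like ratio for junctions that share a donor site.

    Donors used by a single junction are skipped, because they carry
    no evidence of alternative splicing.
    """
    groups: Dict[Tuple[str, int], List[Junction]] = defaultdict(list)
    for junction in counts:
        chrom, donor, _acceptor = junction
        groups[(chrom, donor)].append(junction)

    psi: Dict[Junction, float] = {}
    for members in groups.values():
        total = sum(counts[j] for j in members)
        if len(members) < 2 or total == 0:
            continue
        for j in members:
            psi[j] = counts[j] / total
    return psi
```

For example, two junctions from the same donor with 30 and 10 uniquely mapped reads would receive values of 0.75 and 0.25.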
Quantification of Splicing Isoform Abundance (ISO). Splicing isoform abundance was estimated using Cufflinks (v2.2.1.) (Trapnell
et al., 2010), selecting GENCODE 15 as guide annotation, without de novo transcript assembly. Cufflinks was run on RNA-Seq BAM
files for monocytes, neutrophils and naive CD4+ T cells. Abundance was reported in FPKM (expected fragments per kilobase of tran-
script per million fragments sequenced). For ISO sQTL mapping, we employed the sQTLseekeR R package (v2.0) (Monlong et al.,
2014). sQTLseekeR provides an appropriate method to identify SNPs associated with the variation in the relative abundance of a
gene’s transcript isoforms, or transcript ratios, which configure a multivariate phenotype. We considered only GENCODE 15 protein
coding genes that expressed at least two isoforms (thresholds for gene and transcript expression were set at 1 FPKM and 0.1 FPKM,
respectively) and with a minimum splicing variability across samples. Since we were looking for cis-effects on splicing, we focused on
SNPs within the gene body ± 5 kb, and separately for each cell type. To avoid testing uninformative variants, only biallelic SNPs
creating at least two genotypes, each present in at least 5 individuals, were considered. In total, 9,485 genes and
1,462,663 SNPs were tested for association and FDR was used to correct for multiple testing. For a given transcript ratio QTL, we
identified the two transcripts of the target gene that changed the most between genotypes and exhibited a symmetric behavior.
Then, we employed AStalavista 3.2 software (Foissac and Sammeth, 2007) to compare their exonic structure, and determined the
proportion of transcript ratio QTLs that were associated with each type of splicing event. Finally, the effect size of the identified
transcript ratio QTLs was estimated as the maximum difference (MD) in relative expression between genotype groups, e.g., if MD = 0.25,
there is one transcript whose relative expression shifted by 25% between two genotype groups.
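The MD effect size can be sketched as follows. This is a simplified illustration rather than the sQTLseekeR implementation: it assumes that relative expression is each transcript's share of the gene's total FPKM, that genotype groups are summarized by their mean ratios, and that MD is the largest absolute difference over all transcripts and all pairs of groups. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence


def transcript_ratios(fpkm: Sequence[float]) -> List[float]:
    """Relative abundance of each isoform within one sample.

    Assumes the gene is expressed (total FPKM > 0), as guaranteed by
    the expression thresholds described in the text.
    """
    total = sum(fpkm)
    return [x / total for x in fpkm]


def max_difference(
    fpkm_by_sample: Dict[str, Sequence[float]],
    genotype: Dict[str, str],
) -> float:
    """Largest shift in mean transcript ratio between any two genotype groups."""
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for sample, fpkm in fpkm_by_sample.items():
        grouped[genotype[sample]].append(transcript_ratios(fpkm))

    # Mean ratio per transcript within each genotype group.
    means = {
        g: [sum(col) / len(col) for col in zip(*rows)]
        for g, rows in grouped.items()
    }
    # Requires at least two genotype groups, as in the SNP selection above.
    return max(
        abs(a - b)
        for g1, g2 in combinations(means, 2)
        for a, b in zip(means[g1], means[g2])
    )
```

With one group averaging ratios of 0.75 and 0.25 for two isoforms and another averaging 0.5 and 0.5, the function returns 0.25, matching the worked interpretation in the text.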
DNA Methylation
Sample Preparation. Purified cells were pelleted, snap frozen and stored at −80 °C prior to pre-processing at UCAM (monocyte and
neutrophil) or shipping to McGill (naive CD4+ T cells).
DNA Extraction.
UCAM–DNA for monocyte and neutrophil samples were processed at UCAM. Cells were lysed using guanidine hydrochloride, so-
dium acetate and protease lysis buffer. DNA was extracted using chloroform and precipitated in ethanol prior to washing and resus-
pension in ultra-pure water. DNA was quantified (Qubit, Invitrogen) and shipped to University College London (UCL) for processing.
McGill–DNA was extracted from cell pellets of purified naive CD4+ T cells at McGill University using a protocol modified from the
DNeasy Blood and Tissue Handbook (QIAGEN). Briefly, cell pellets were lysed using proteinase K, RNase A, and Buffer AL lysis
buffer, followed by precipitation in ethanol. The DNA was purified through four successive steps with wash buffers in the DNeasy
Mini spin columns, and finally eluted using a DNeasy membrane.
Data Generation.
UCL–500ng of DNA for each monocyte and neutrophil sample was randomly dispensed onto a 96-well plate to reduce batch ef-
fects. Samples were bisulfite-converted using an EZ-96 DNA Methylation MagPrep Kit (Zymo Research) following the manufacturer’s
instructions with optimized incubation conditions (i.e., 16 cycles of 95 °C for 30 s, 50 °C for 60 min; followed by 4 °C until further
processing). Purified bisulfite-treated DNA was eluted in 15 μl of M-Elution Buffer (Zymo Research).
McGill–DNA samples were bisulfite-converted using the EZ DNA Methylation Kit (Zymo Research) according to manufacturer’s
instructions.
At both institutes, DNA methylation levels were measured using Infinium HumanMethylation450 assays (Illumina) according to the
manufacturer’s protocol.
Data Processing. Data files generated for all cell types were processed at UCL. All 450K array data pre-processing steps were car-
ried out using established analytical methods incorporated in the R package minfi (Aryee et al., 2014). First, we performed back-
ground correction and dye-bias normalization using NOOB (Triche et al., 2013), followed by normalization between Infinium probe
types with SWAN (Maksimovic et al., 2012). Next, we filtered out probes based on the following criteria: (i) median detection
P value ≥ 0.01 in one or more samples; (ii) bead count of less than three in at least 5% of samples; (iii) mapping to sex chromosomes;
(iv) ambiguous genomic locations (Nordlund et al., 2013); (v) non-CG probes; (vi) probes containing SNPs (MAF ≥ 0.05) within 2 bp of
the probed CG. Finally, we adjusted for batch effects using an empirical Bayesian framework (Johnson et al., 2007), as implemented
in the ComBat function of the R package SVA (Leek et al., 2012). The final data matrix used for statistical analyses, after additionally
removing samples without a matching WGS sample, comprised DNA methylation M-values across 440,905 CpG sites and 525 sam-
ples, i.e., 196 monocytes, 197 neutrophils and 132 naive CD4+ T cells.
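The M-values in the final matrix are commonly defined as the base-2 logit of the methylation beta value. A minimal sketch of that transform follows; the function name `beta_to_m` and the clipping constant `eps` are illustrative assumptions and are not taken from the study's pipeline, which used minfi in R.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value (0..1) to an M-value, log2(beta / (1 - beta))."""
    # Clip away from 0 and 1 so the logarithm stays finite (assumed safeguard).
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))
```

A beta value of 0.5 maps to an M-value of 0, and the transform is symmetric around that point.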
Data Quality. To assess the quality of the presented 450K array data and to exclude the possibility of sample mismatches, we
performed a series of data quality control steps. First, we assessed the distribution of DNA methylation M-values for each cell
type to identify samples of low DNA integrity (Figures S2I–S2K). Second, principal component analyses and multidimensional scaling

were carried out to detect sample mismatches and outliers (Figures S2L–S2N). Third, as detailed later, we performed additional an-
alyses to ensure that DNA methylation profiles were correctly matched to the other datasets obtained from the same donors.
ChIP-Sequencing
Sample Preparation. Purified cells were fixed with 1% formaldehyde (Sigma) at a concentration of approximately 10 million cells/ml.
Fixed cell preparations were washed and either stored re-suspended in PBS at 4°C for monocytes and neutrophils, or pelleted and
stored at −80°C for naive CD4+ T cells at UCAM before shipping to the processing institutes (monocytes and neutrophils to WTSI/
Nijmegen Centre for Molecular Life Sciences (NCMLS), naive CD4+ T cells to McGill University).
Chromatin Immunoprecipitation. ChIP-seq was performed at three different institutes using different protocols (as described
below and Figure S1). Naive CD4+ T cells were processed at McGill University while monocytes and neutrophils were processed
at WTSI and NCMLS. For monocytes H3K27ac/H3K4me1 data, samples 1-49/1-48 are from NCMLS and 50-162/49-172 from
WTSI. For neutrophil H3K27ac/H3K4me1 data, samples 1-48/1-47 are from NCMLS and 49-174/48-173 from WTSI. In these exper-
iments antibodies from identical batches (H3K4me1, C15410194; H3K27ac, C15410196) were obtained from Diagenode (Liege,
Belgium).
NCMLS–Sonication was performed using a Diagenode Bioruptor UCD-300 for 3 × 10 min (30 s on; 30 s off). 67 µl of chromatin
(1 million cells) was mixed with 229 µl dilution buffer, 3 µl protease inhibitor cocktail and 1 µg of H3K27ac or 0.5 µg of H3K4me1
antibody and incubated overnight at 4°C with rotation. Protein A/G magnetic beads were washed in dilution buffer with 0.15%
SDS and 0.1% BSA, added to the chromatin/antibody mix and rotated for 60 min at 4°C. Beads were washed with 400 µl buffer
for 5 min at 4°C with five rounds of washes. After washing, chromatin was eluted using elution buffer for 20 min. Supernatant was
collected, 8 µl 5M NaCl and 3 µl proteinase K were added, and samples were incubated for 4 hr at 65°C. Finally, samples were purified
using the QIAGEN QIAquick MinElute PCR purification kit and eluted in 20 µl EB. Illumina library preparation was performed using the
Kapa Hyper Prep Kit. For end repair and A-tailing, double-stranded DNA was incubated with end repair and A-tailing buffer and enzyme,
first for 30 min at 20°C and then for 30 min at 65°C. Subsequently, adapters were ligated by adding 30 µl ligation buffer,
10 µl Kapa DNA ligase and 5 µl diluted adaptor in a total volume of 110 µl and incubating for 15 min at 15°C. Post-ligation cleanup was
performed using Agencourt AMPure XP reagent and products were eluted in 20 µl elution buffer. Libraries were amplified by adding 25 µl
2x KAPA HiFi Hotstart ReadyMix and 5 µl 10x Library Amplification Primer Mix, followed by 10 cycles of PCR. Samples were purified using
the QIAquick MinElute PCR purification kit and 300bp fragments were selected using E-gel. Correct size selection was confirmed by
BioAnalyzer analysis.
WTSI–Sonication was performed in a Diagenode PicoRuptor for 8 cycles of 30 s on, 30 s off in a 4°C water cooler. Samples
were checked for sonication efficiency using the criteria of 150-500bp, by Agilent DNA bioanalyzer. ChIP-seq was carried out as
previously described (Aldridge et al., 2013); all liquid handling steps were performed on an Agilent Bravo NGS. Protein A Dynabeads
(Invitrogen) were coupled with 2.5 µg of antibody. Sonicated lysate (3-5 million cells) was then added to the bead/antibody mix and
incubated at 4°C overnight. ChIP-DNA bound beads were washed for ten repetitions in cold RIPA solution. DNA was eluted from the beads
at 65°C for five hours to reverse the cross-linking. 2 µl RNase was added to the ChIP-DNA and incubated at 37°C for 30 min,
followed by treatment with 2 µl of Proteinase K at 55°C for 1 hr. A 1:1.8 ratio of Ampure beads (Beckman Coulter, A63881) was added
to the DNA followed by two cold 70% ethanol washes. ChIP-DNA was eluted in 50 µl elution buffer. Illumina sequencing libraries
were prepared on a Beckman Fx liquid handling system. End-repair, A-tailing and paired-end adaptor ligation were performed using
NEBnext reagents from New England Biolabs, with purification using a 1:1 ratio of AMPure XP to sample between each reaction.
Amplification of ChIP-DNA was performed using Kapa HiFi mastermix (Kapa Biosystems), 18 cycles of PCR followed by a 0.7:1
Ampure XP clean-up.
McGill–Sonication of nuclei was performed on a BioRuptor UCD-300 for 90 cycles (10 s on, 20 s off), centrifuged every 15 cycles
and chilled by a 4°C water cooler. Samples were checked for sonication efficiency using the criteria of 150-500bp by gel electrophoresis.
The ChIP reaction was performed on a Diagenode SX-8G IP-Star Compact using the Diagenode automated Ideal Kit. 25 µl Protein A beads
were washed and then incubated with 3-6 µg of antibody and 2-4 million cells of sonicated cell lysate combined with protease
inhibitors for 10 hr, followed by a 20 min wash cycle with the provided wash buffers. Reverse cross-linking took place on a heat block
at 65°C for 4 hr. ChIP samples were then treated with 2 µl RNase Cocktail at 65°C for 30 min followed by 2 µl Proteinase K at 65°C for
30 min. Samples were then purified with the QIAGEN MinElute PCR purification kit as per the manufacturer’s protocol. Library preparation
was carried out using Kapa HTP Illumina library preparation reagents. Briefly, 25 µl of ChIP sample was incubated with 20 µl end repair
mix at 20°C for 30 min followed by Ampure XP bead purification. For A-tailing, the bead-bound sample was incubated with 50 µl buffer
enzyme mix at 30°C for 30 min, followed by PEG/NaCl purification. Adaptor ligation and a further Ampure purification followed, and library
preparation was completed by 14 cycles of PCR amplification. Size selection was performed using a Sage Pippin prep system set to
collect 200-400bp fragments, targeting a 300bp peak fragment size, and final libraries were purified with the QIAGEN GeneRead Size
Selection kit.
Data Processing and Peak Calling. ChIP libraries were sequenced using Illumina HiSeq 2000 at 50bp SE reads in WTSI, 100bp SE in
McGill and 43bp SE in NCMLS. Sequenced reads were aligned to a gender-matched reference genome (Blueprint GRCh37) using
BWA (bwa aln -q 15). Duplicate reads were marked using Picard MarkDuplicates. Reads with mapping quality less than 15 were
removed (SAMtools). The fragment size L for each aligned bam was estimated using PhantomPeakQualTools vr18, which uses
cross correlation of binned read counts between forward and reverse strands. To identify highly enriched genomic regions, we
used MACS2 (v2.0.10.20131216, standard options) (Zhang et al., 2008) for peak calling with the estimated fragment size from
PhantomPeakQualTools (--shiftsize = half fragment size), and with narrow and broad flags set for H3K27ac and H3K4me1,
respectively. Furthermore, ChIP input was created by merging 3-12 samples, from which we randomly drew an equal number of reads
per experiment. Significant peaks were selected at 1% FDR or less. ChIP inputs were as follows.
• Neutrophils Female (NCMLS): S001GVH2, S000X1H1, S002KJH1;
• Neutrophils Male (NCMLS): S00294H1, S001NHH1, S001C2H1;
• Neutrophils Female (WTSI): NS1140 (pool of S00W29H1, S00WP0H1, S00FK4H1) and NS1163 (pool of S00T4H1, S00NXKH1,
S00PBJH1);
• Neutrophils Male (WTSI): NS1141 (pool of S00JT7H2, S00HVBH1, S00M0GH1) and NS1164 (pool of S00RMQH3, S00RD7H2,
S00NRWH1);
• Monocyte Female (NCMLS): S002KJH4, S000X1H3, S001GVH4;
• Monocyte Male (NCMLS): S00294H2, S001NHH3, S001C2H3;
• T cell Female (McGill): S00DKCH4, S00G7QH2, S00GSLH2, S00GWDH2, S00JYYH2;
• T cell Male (McGill): S0021KH3, S002EVH2, S00382H2, S0064ZH2, S00D9YH1, S00DQ0H2, S00E9UH2, S00GBIH2,
S00GECH2, S00HVBH2, S00JT7H1, S00KEXH3.
The majority of the neutrophil samples were immunoprecipitated at WTSI but sequenced independently at WTSI and NCMLS. For
these specific samples only, we aligned each raw fastq file from the different sequencing centers to the reference genome and merged
the aligned BAM files to create a single BAM file for each neutrophil sample. For MACS2 peak calling of these merged samples, we used
the WTSI ChIP input, as these samples were all immunoprecipitated at WTSI. For 55 T cell H3K4me1 donors, we merged the aligned
BAM files of duplicates from the same donor to boost the signal, as a single BAM file for these donors had poor amplification.
For a complete overview of data production, refer to Figure S1.
Data Quality. We removed ChIP samples that had a fraction of reads in peaks (FRiP) score < 0.01, relative strand correlation
(RSC) < 0.8 and normalized strand correlation (NSC) < 1.05. FRiP was calculated using the reference peak set that is generated as
described in the next section. We identified highly successful ChIP samples as those with FRiP > 0.01, RSC > 0.8 and NSC > 1.05.
For the remaining samples, we used genome browser tracks to confirm visually that the ChIP was good before including it in the final
dataset. Figure S4 shows quality control metrics and corresponding principal components, showing no batch effects after PEER
correction using K = 10 factors.
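The three-way triage described above can be sketched as a small decision function. This is an illustration under a literal reading of the thresholds (a sample is dropped only when all three scores fall below their cutoffs); the function name and the return labels are hypothetical.

```python
def triage_chip_sample(frip, rsc, nsc):
    """Classify a ChIP sample from its FRiP, RSC and NSC quality scores."""
    passes = [frip > 0.01, rsc > 0.8, nsc > 1.05]
    fails = [frip < 0.01, rsc < 0.8, nsc < 1.05]
    if all(passes):
        return "keep"      # highly successful ChIP
    if all(fails):
        return "remove"    # fails every metric
    return "inspect"       # mixed result: check genome browser tracks by eye
```

Samples returned as "inspect" correspond to those that were confirmed visually before inclusion.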
Normalized Read Count in the Reference Peak Set. For each histone modification marker, we generated one reference peak set for
all cell types to provide an unbiased cross cell comparison of peak-based counts. For each marker, we took the union of significant
peaks (1% FDR) across all donors and across all three cell types, merged overlapping regions (BEDOPS --merge, v2.4.14) and removed
peaks found within ENCODE blacklisted regions. This process created one reference peak set per histone modification marker. Note
that the merging process will introduce very wide peaks (≥ 100 kb), but these make up a very low proportion: less than 1% and 5% for
H3K27ac and H3K4me1 respectively. The reference peak set will be filtered further for read counts as described below.
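To illustrate how taking the union of peaks and merging overlaps can produce a few very wide regions, here is a minimal pure-Python sketch of interval merging. It is not the BEDOPS implementation; it assumes half-open BED-style coordinates and treats touching intervals as mergeable, and the function name is hypothetical.

```python
def merge_peaks(peaks):
    """Merge overlapping (chrom, start, end) intervals into a union peak set."""
    merged = []
    for chrom, start, end in sorted(peaks):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps (or touches) the previous region: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(p) for p in merged]
```

Because each donor's peaks can bridge neighbouring regions, chains of partially overlapping peaks collapse into a single wide interval.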
Next we generated a quantification signal of ChIP-seq for each donor. Here, we only considered read counts under the peaks, as the
regions outside peaks are more likely to be noise or background signal than true ChIP enrichment. For each donor, we generated a
vector of log2 reads per million (log2RPM) per peak in the reference peak set by counting the number of overlapping reads under the
peaks (BEDOPS bedmap –count) and normalized the counts with the total number of reads in the library.
Note that by using only one reference peak set for all three cell types, there will be peaks where there is no signal in one cell type but
quite high in another. Hence for the QTL association analysis carried out per cell type and any downstream cell-specific analyses, we
further filtered the reference peak set to only consider peaks with log2RPM > 0 in at least 50% of the donors in a given cell type,
corrected for 10 PEER factors and applied quantile normalization across donors.
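The per-peak quantification and the cell-type-specific filter can be sketched as follows. This is a simplified illustration: the handling of zero counts (returned as negative infinity) is an assumption, since the text does not state whether a pseudocount was used, and PEER correction and quantile normalization are not shown.

```python
import math

def log2_rpm(peak_counts, total_reads):
    """log2 reads-per-million for each peak, given the library's total read count."""
    scale = 1e6 / total_reads
    # Zero counts have no finite logarithm; -inf is an assumed convention here.
    return [math.log2(c * scale) if c > 0 else float("-inf") for c in peak_counts]

def keep_peak(log2rpm_across_donors, min_fraction=0.5):
    """True if log2RPM > 0 in at least `min_fraction` of donors of one cell type."""
    n_pass = sum(1 for v in log2rpm_across_donors if v > 0)
    return n_pass >= min_fraction * len(log2rpm_across_donors)
```

For a library of 10 million reads, a peak with 20 overlapping reads has 2 RPM, i.e., a log2RPM of 1.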
Additional Quality Control to Estimate Cross-Center and Cross-Sample Identity
Batch Correction. Within the study, sequencing data were generated at different sequencing centers (Figure S1). We performed
the following steps to correct possible batch effects.
For RNA-sequencing and gene-level quantification, we first quantified gene expression by read count for single end RNA-seq sam-
ples, and fragment (pair) count for paired-end RNA-seq using DESeq2. The sequencing depth of different samples was then corrected
by using library size factor from DESeq2. We used 15 cross-over samples to assess the impact of the different sequencing protocol,
and specifically how the quantifications of single end and paired end samples correlated from the same donor in two different centers.
Using PCA analysis, we observed that the cross-over samples deviated from the main clusters before ComBat, which was corrected
and these samples clustered within the corresponding cell types after ComBat (Figure S3A). In Table S2 and Figure S3B, we assessed
correlation in gene expression for the 15 crossover samples at different stages of the analysis (raw data, before batch effect correction
using ComBat, after batch correction and finally after PEER correction). We observed a high correlation coefficient (mean 0.85) at the
level of raw data. ComBat further corrected the sequencing center effect and improved the correlation coefficient (mean 0.96),
suggesting that the quantifications of single and paired-end RNA-seq were highly comparable. We observed that lowly expressed
genes tended to be less well correlated. Therefore, in the QTL analysis, we further required a gene to have a read count of more than 10
in 50% of the samples. Furthermore, we applied PEER to infer and correct for 10 hidden factors.
For PSI quantification, we found the crossover samples (see later) to display the greatest differences for low quantification values
(PSI from 0 to 0.1), with low overall correlations in pairwise comparisons (mean 0.556). We therefore required PSI quantification to
be at least 0.1 in 50% of samples. After removing the high noise in the low PSI range, the correlations improved to 0.920 (Table S2). PSI values
had bimodal distribution and we standardized and applied PEER to infer and correct for 10 hidden factors (Figure S3C).
For 450K methylation arrays, we used identical protocols across the two production centers (UCL and McGill). We again compared
the correlation between nine crossover samples between the two centers. The average correlation coefficient of normalized beta
values before correction was 0.959. After applying ComBat correction using the sequencing center as a covariate, the average
correlation coefficient increased to 0.994 (Table S2). After ComBat, we further examined the M-value distributions and PCA and found
that the cross-over samples fitted into their corresponding distributions and PCA clusters in all three cell types (Figures S2I–S2N).
For ChIP-sequencing data, we generated one reference peak set per histone modification mark and obtained the log2 RPM (reads
per million of sequencing depth) under each peak. In Table S2 and Figure S4E, we show high Pearson correlation (0.86 - 0.97) for 2-3
cross-over replicates in two different ChIP centers. Furthermore, we show in Figure S4E that the samples cluster into their
correct cell types. To remove the ChIP center bias that is present in monocytes and neutrophils only, we carried out PEER correction
for 10 hidden factors on the log2 RPM peak signals and show in Figures S4G–S4J uniform density profiles across all samples and
PCA plots devoid of ChIP center effect. The PEER-corrected signal is used for the QTL association analysis of each cell type, where
we further filtered the peaks to only consider peaks with log2 RPM ≥ 1.
Cross-Center Data Validation. In addition to the samples processed in the above RNA-seq and DNA methylation experiments,
three samples of each cell type per sample batch were sent to the reciprocal institute that did not process that particular cell-
type specific sample set, in order to account for institute-specific experimental variation. For RNA-seq naive CD4+ T cells were
sent to MPIMG, monocytes and neutrophils to McGill University; for DNA methylation naive CD4+ T cells were sent to UCL, mono-
cytes and neutrophils to McGill University.
For RNA-seq, sequencing was done in different centers using either single-end or paired-end protocols. To evaluate whether we
could compare the RNA-seq data across centers and library protocols, we quantified the cross-over samples and adjusted for batch
effects using sequencing centers and library protocols as covariates in ComBat (Chen et al., 2011). We subsequently performed
principal component and multidimensional scaling analysis on the adjusted data. As shown in Figure S3, the crossover samples (indi-
cated in a darker color) clustered with the main cell-type specific sample set, demonstrating successful correction of any confound-
ing by institute-specific experimental variation.
For DNA methylation, we adjusted for batch effects as described above, and subsequently performed principal component ana-
lyses and multidimensional scaling on the adjusted data. The crossover samples (indicated in a darker color) clustered with the main
cell-type specific sample set, demonstrating successful correction of any confounding institute-specific experimental variation (Fig-
ures S2M and S2N).
Confirmation of Sample Identity across Datasets. Identity matching for each sample and for each analysis was performed by ex-
tracting genotypes from RNA-seq and ChIP-seq and comparing them to SNPs from the WGS data. The first stage of verifying the
sample identity concordance between the RNA-seq/ChIP-seq and WGS data involved pre-processing the BAM files for one
autosomal chromosome (chr1) to remove PCR duplicates and reads with mapping quality score < 10. The variants were then called
from the resulting BAM file using mpileup from the SAMtools package (Li, 2011). The variants with QUAL < 20, DP < 5 and GQ < 5 were
filtered out. Then, we compared genotypes of the filtered variants with genotypes generated from WGS and imputation. The geno-
types generated were considered to be from the same sample if the concordance rate was greater than 90%.
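The identity check reduces to a concordance rate over sites genotyped in both datasets. A minimal sketch, assuming genotypes have already been called and filtered and are held in dictionaries keyed by site; the data layout and function names are hypothetical.

```python
def concordance(calls_a, calls_b):
    """Fraction of shared sites with identical genotypes.

    calls_a, calls_b: dict mapping (chrom, pos) -> genotype string, e.g. "0/1".
    Returns None when the two call sets share no sites.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return None
    matches = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return matches / len(shared)

def same_sample(calls_a, calls_b, threshold=0.90):
    """Apply the >90% concordance rule described in the text."""
    rate = concordance(calls_a, calls_b)
    return rate is not None and rate > threshold
```

Only sites present in both call sets contribute, so sparse coverage in RNA-seq or ChIP-seq reduces the number of comparisons rather than the rate itself.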
For DNA methylation, first we estimated sex and age of all samples based on raw DNA methylation values using the getSex function
in minfi (Aryee et al., 2014) and the DNA Methylation Age Calculator as described by Horvath (Horvath, 2013), respectively. We then
correlated the estimated information from the experimental data to the information collected from each donor, to confirm sample
identity. Second, for the 65 SNPs from methylation probes (the internal controls) on the chip, we derived the genotypes from raw
beta values: raw beta value < 0.3 for homozygous AA or TT genotype call, beta value > 0.7 for homozygous CC or GG genotype,
the remainder were classified as heterozygous genotypes. We then checked sample identity by comparing these inferred genotypes
to their genotype from WGS.
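The beta-value thresholds for the 65 control SNP probes translate directly into a three-class genotype call. A minimal sketch; the function name and the label strings are illustrative choices.

```python
def call_snp_probe_genotype(beta):
    """Infer a genotype class from the raw beta value of a 450K control SNP probe."""
    if beta < 0.3:
        return "hom AA/TT"   # homozygous AA or TT
    if beta > 0.7:
        return "hom CC/GG"   # homozygous CC or GG
    return "het"             # everything in between is called heterozygous
```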
Finally, we used PCA and unsupervised clustering methods to verify that each RNA-seq, ChIP-seq and methylation sample
matched its predicted cell type of origin. All sample identities and types were confirmed prior to uploading the data files to the EGA.
Summary of Dataset
The dataset used for the analysis consists of 2,205 samples across all assays from 197 unique donors (Table S1). This breaks down
as follows: WGS 197 samples; RNA-seq 194/192/171 monocytes/neutrophils/CD4+ T cells respectively; DNA methylation 196/197/
133 monocytes/neutrophils/CD4+ T cells respectively; ChIP-seq H3K4me1 172/173/104 monocytes/neutrophils/CD4+ T cells
respectively; ChIP-seq H3K27ac 162/174/142 monocytes/neutrophils/CD4+ T cells respectively.
Statistical Analyses
Variance Component Modeling of Gene Expression
To investigate the contributions to gene expression variability from different proximal molecular features we considered different vari-
ance component models fit using LIMIX (Casale et al., 2015; Lippert et al., 2014).

For all variance component analyses, we considered only individuals for which data for all molecular layers (gene expression, DNA,
methylation, H3K4me1 and H3K27ac) were available. For T cells we excluded the ChIP-seq data, as the matching would have
reduced the dataset size to less than 100 samples. The resulting matched dataset consisted of gene expression profiles for
16,549, 14,986 and 17,802 genes in 158, 165 and 125 individuals respectively in monocytes, neutrophils and CD4+ T cells.
Independent Variance Component Models for Epigenetic and Genetic Effects on Gene Expression.
Accounting for Confounding–When correlating transcriptional and epigenetic variation, there is a concern that sample processing
effects and other sources of heterogeneity may be shared between gene expression levels and epigenetic profiles, thereby intro-
ducing spurious correlations. To mitigate such confounding factors, we applied PEER and used residual profiles for gene expression
levels, DNA methylation and histone modification marks (PEER (Stegle et al., 2012) was fit using 10 factors as described above).
PEER residuals were quantile-normalized to a unit variance Gaussian distribution. To further reduce the risk of confounding corre-
lations we additionally considered a random effect term in our model that accounts for transcriptome heterogeneity not captured by
PEER. Specifically, the sample covariance of this expression heterogeneity term was estimated as $K_h = (1/G)\,Z Z^T$, where $Z$ is the $N \times G$
matrix of gene-expression levels for N individuals and all G genes (after quantile-normalization of the distribution of PEER residuals for
each gene to a unit variance Gaussian distribution).
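The two operations in this paragraph, rank-based mapping to a Gaussian and construction of the expression heterogeneity kernel, can be sketched in plain Python. This is an illustration rather than the LIMIX or PEER code; the rank offset `(rank - 0.5) / n` and the tie handling (ties broken by input order) are assumptions.

```python
from statistics import NormalDist

def quantile_normalize(values):
    """Map values to standard-normal quantiles according to their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = NormalDist().inv_cdf((rank - 0.5) / n)  # assumed rank offset
    return out

def expression_kernel(Z):
    """K_h = (1/G) Z Z^T for an N x G matrix Z given as a list of N rows."""
    n, g = len(Z), len(Z[0])
    return [[sum(Z[i][k] * Z[j][k] for k in range(g)) / g for j in range(n)]
            for i in range(n)]
```

Each gene (a column of Z) would be normalized across individuals before the kernel is computed, so that every gene contributes on the same scale.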
Without Correction for Local Genetic Effects–For each gene, we considered the model

$$y \sim \mathcal{N}\left(\mathbf{1}\mu,\ \sigma_l^2 K_l + \sigma_g^2 K_g + \sigma_h^2 K_h + \sigma_e^2 I\right), \qquad (1)$$
where $y$ denotes the gene-expression profile across individuals, $\mathbf{1}\mu$ an offset term, $K_l$ is a local relatedness matrix built using all
features from either one of the four molecular layers (genetic, methylation, H3K4me1 or H3K27ac data) that are within 1Mb from the gene
body, $K_g$ denotes the realized relatedness matrix (Lee et al., 2010), $K_h$ is the expression heterogeneity term and $\sigma_e^2 I$ is the noise term.
Specifically, the local relatedness matrix for each feature type was estimated as linear kernel from all cis features of the considered
type (after standardization).
The variance parameters $\sigma_l^2$, $\sigma_g^2$, $\sigma_h^2$ and $\sigma_e^2$ were fitted using restricted maximum likelihood, independently for each of 16,549,
14,985 and 17,082 genes in monocytes, neutrophils and naive CD4+ T cells. The log restricted marginal likelihood was optimized
using a gradient-based optimization algorithm (BFGS) (Morales and Nocedal, 2011). The proportion of variance explained by individual
components was then estimated analogously to the approach taken in classical (narrow sense) heritability analysis (Yang et al.,
2011):
$$h = \frac{\sigma_l^2}{\sigma_l^2 + \sigma_g^2 + \sigma_h^2 + \sigma_e^2}$$
When comparing variance component estimates of the model in (1) with a model that does not account for expression heteroge-
neity, we found that accounting for expression heterogeneity yielded substantially lower epigenome variance estimates, whereas the
genetic variance estimates were unaffected (Figure S5J). Consequently, we considered a model that accounts for expression het-
erogeneity in all subsequent analyses. We also considered alternative window sizes (100kb and 1Mb), finding that the results
were most robust and that the overall variance was slightly increased when using 1Mb window sizes (Figure S5K).
Accounting for Local Genetic Effects–To account for cis common genetic variation, we first corrected epigenetic features for local
genetic effects. To do so we fitted a separate variance component model for each individual epigenetic feature, using a local relat-
edness matrix based on all SNPs within 100Kb from the epigenetic mark. The effect from local genetic variants was estimated using
the best linear unbiased predictor and the residuals of this model were then used as an estimate of the non-genetic component of the
epigenetic marks (G-corrected marks). Additionally, we introduced a random effect in the model to account for genetic effects on
gene expression from variants within 1Mb from the gene body. Specifically, for each gene we considered the model

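$$y \sim \mathcal{N}\left(\mathbf{1}\mu,\ \sigma_l^2 K_l + \sigma_{\mathrm{geno}}^2 K_{\mathrm{geno}} + \sigma_g^2 K_g + \sigma_h^2 K_h + \sigma_e^2 I\right), \qquad (2)$$
Here, $K_{\mathrm{geno}}$ is a local realized relatedness matrix built considering all genetic variants within 1Mb from the gene body and $K_l$ is a local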
relatedness matrix built considering all features from either one of the three epigenetic layers (methylation, H3K4me1 or H3K27ac
data) that are within 1Mb from the gene body. This model was used to estimate the proportion of variance explained by methylation,
H3K4me1 and H3K27ac data while accounting for underlying genetic effects.
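The G-correction of an individual epigenetic mark can be written out explicitly. The following is a sketch under a standard random-effect formulation; the symbols $m$ (one epigenetic feature measured across individuals), $K_{\mathrm{snp}}$ (the local relatedness matrix built from SNPs near the mark), $\sigma_s^2$ and $\sigma_\epsilon^2$ are notation introduced here for illustration and do not appear in the original text, so the exact parameterization used in the study may differ.

```latex
% Variance component model for a single epigenetic feature m
m \sim \mathcal{N}\left(\mathbf{1}\mu_m,\ \sigma_s^2 K_{\mathrm{snp}} + \sigma_\epsilon^2 I\right)

% Best linear unbiased predictor of the local genetic effect
\hat{u} = \hat{\sigma}_s^2 K_{\mathrm{snp}}
          \left(\hat{\sigma}_s^2 K_{\mathrm{snp}} + \hat{\sigma}_\epsilon^2 I\right)^{-1}
          \left(m - \mathbf{1}\hat{\mu}_m\right)

% G-corrected mark: the residual after removing the predicted genetic component
m' = m - \mathbf{1}\hat{\mu}_m - \hat{u}
```

When the estimated genetic variance $\hat{\sigma}_s^2$ is zero, $\hat{u}$ vanishes and the G-corrected mark equals the mean-centered original.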
The cumulative distribution of the proportion of variance explained by local genetics (using model (1)) and each of the three epige-
netic layers either accounting (model (2)) or not accounting (model (1)) for local genetic effects is shown in Figure 2B for monocytes,
Figure S5A for neutrophils and Figure S5B for T cells.
Joint Variance Component Model.
For each gene, we also considered variance component estimates obtained from a joint model across all four molecular layers
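(genetics, methylation, H3K4me1 and H3K27ac):

$$y \sim \mathcal{N}\left(\mathbf{1}\mu,\ \sigma_{\mathrm{geno}}^2 K_{\mathrm{geno}} + \sigma_{\mathrm{meth}}^2 K_{\mathrm{meth}} + \sigma_{\mathrm{K4me1}}^2 K_{\mathrm{K4me1}} + \sigma_{\mathrm{K27ac}}^2 K_{\mathrm{K27ac}} + \sigma_g^2 K_g + \sigma_h^2 K_h + \sigma_e^2 I\right).$$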

Here, the local relatedness matrix for each layer was computed considering all G-corrected epigenetic marks within 1Mb from the
gene body relative to the specific layer. Epigenetic variance estimates were either considered for individual layers or by aggregating
using the sum of the variance components across the three epigenetic layers. The distribution of the total epigenetic contribution to
variance is shown in Figure 2C (y axis) for monocytes, Figure S5G (y axis) for neutrophils and Figure S5H (y axis) for T cells.
Testing for Variance Components
To test for cis genetic contributions (within 1Mb from the gene-body), we considered the model

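$$y \sim \mathcal{N}\left(\mathbf{1}\mu,\ \sigma_{\mathrm{geno}}^2 K_{\mathrm{geno}} + \sigma_g^2 K_g + \sigma_h^2 K_h + \sigma_e^2 I\right),$$
and tested for $\sigma_{\mathrm{geno}}^2 > 0$. To test for cis contributions from methylation, H3K4me1 peaks and H3K27ac peaks that are independent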
from cis common genetic variation, we used the model in (2), where the local relatedness matrix K l was built considering either
methylation, H3K4me1 or H3K27ac features (again within 1Mb from the gene body) after correction for local genetic effects, and
tested for $\sigma_l^2 > 0$. We considered the log likelihood ratio (LLR) as test statistic and obtained p values using permutations, similar to
the approach in (Casale et al., 2015; Lippert et al., 2014). Specifically, we considered 30 permutations for each test and gene and
combined null LLRs across all genes. This resulted in a total of 600,000 permutation LLRs for each epigenetic layer and cell
type, which we used to estimate empirical P values (minimum P value ≈ $1.7 \times 10^{-6}$). Empirical P values were corrected for multiple testing
using the Benjamini-Hochberg procedure. Significant associations with gene expression levels were reported at an overall FDR of
5%. Results from the variance component tests are shown in Figures 2C–2E for monocytes, Figure S5G for neutrophils and Fig-
ure S5H for T cells.
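Pooling permutation LLRs across genes gives one shared null distribution against which each observed statistic is ranked. A minimal sketch of that step follows; the add-one estimator is an assumption (the text does not give the exact formula), chosen because it keeps P values above zero and yields a minimum near 1/600,000 ≈ 1.7 × 10^-6 for a pool of 600,000 null values.

```python
from bisect import bisect_left

def empirical_pvalues(observed_llrs, null_llrs):
    """Empirical P value = (1 + number of null LLRs >= observed) / (1 + number of nulls)."""
    null_sorted = sorted(null_llrs)
    n = len(null_sorted)
    pvals = []
    for llr in observed_llrs:
        # Everything from the first element >= llr to the end is at least as extreme.
        n_ge = n - bisect_left(null_sorted, llr)
        pvals.append((1 + n_ge) / (1 + n))
    return pvals
```

Sorting the pooled null once and using binary search keeps the lookup cheap even for hundreds of thousands of permutation values.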
Epigenome-wide Association Analysis of Gene Expression
To differentiate epigenetic associations with gene expression that are due to underlying local genetic variation from associations that
are independent of genetic effects, we also carried out classical single-feature association tests, with and without adjusting for ge-
netic factors in the model. For both models, we considered associations between gene expression level and all epigenetic features
that are in 1Mb from the gene body.
Uncorrected EWAS Model. To test for association between gene expression and epigenetic features within 1Mb from the gene
body (including methylation and histone modification) we consider the following linear mixed model:

$$y \sim \mathcal{N}\left(\mathbf{1}\mu + e\beta,\ \sigma_g^2 K_g + \sigma_h^2 K_h + \sigma_e^2 I\right)$$
Here, $y$ denotes the gene-expression profile across individuals for gene g, $\mathbf{1}\mu$ an offset term, $e$ is the specific epigenetic feature of
interest and $\beta$ its effect size, $K_g$ denotes the realized relatedness matrix (Lee et al., 2010), $K_h$ is the expression heterogeneity term and
$\sigma_e^2 I$ explains residual variance. All epigenetic features and gene-expression levels were quantile-normalized to unit variance Gaussian distribution
prior to testing for associations.
G-Corrected Model. Proceeding as in the variance component analysis, we considered the model:

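$$y \sim \mathcal{N}\left(\mathbf{1}\mu + e'\beta,\ \sigma_{\mathrm{geno}}^2 K_{\mathrm{geno}} + \sigma_g^2 K_g + \sigma_h^2 K_h + \sigma_e^2 I\right),$$
where $e'$ is the G-corrected epigenetic feature being tested and $K_{\mathrm{geno}}$ is a local realized relatedness matrix built considering all variants within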
1Mb from the gene-body. G-corrected epigenetic features were also quantile-normalized to a normal distribution prior to association
testing.
Association testing was performed using LIMIX (Casale et al., 2015; Lippert et al., 2014). For both models, variance components
were estimated under the null model and only the total variance was updated during the association testing (Kang et al., 2008b). For
multiple hypothesis correction, we performed a two-step procedure (Battle et al., 2014): we first obtained a gene-level P value as the
minimum nominal P value (Bonferroni corrected to account for multiple testing across cis features) and then used the Q-value pro-
cedure (Storey and Tibshirani, 2003) to correct for multiple testing across genes. We called genes with significant epigenetic asso-
ciation at FDR < 5%.
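The two-step correction can be sketched as follows. Step 1 follows the text (Bonferroni correction of each gene's minimum nominal P value). For step 2, this sketch swaps in Benjamini-Hochberg adjustment as a stand-in for the Storey-Tibshirani Q-value procedure, which it matches only when the estimated proportion of true nulls is 1; the function names and data layout are hypothetical.

```python
def gene_level_pvalues(pvals_by_gene):
    """Step 1: Bonferroni-correct the minimum nominal P value across each gene's cis features."""
    return {g: min(1.0, min(p) * len(p)) for g, p in pvals_by_gene.items()}

def bh_adjust(pvals):
    """Step 2 (stand-in for Q-values): Benjamini-Hochberg adjustment across genes."""
    genes = sorted(pvals, key=pvals.get)
    n = len(genes)
    adjusted, running_min = {}, 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(n, 0, -1):
        g = genes[rank - 1]
        running_min = min(running_min, pvals[g] * n / rank)
        adjusted[g] = running_min
    return adjusted

def significant_genes(pvals_by_gene, fdr=0.05):
    """Genes whose adjusted gene-level P value falls below the FDR threshold."""
    adjusted = bh_adjust(gene_level_pvalues(pvals_by_gene))
    return sorted(g for g, q in adjusted.items() if q < fdr)
```

Correcting within genes first means that genes with many tested cis features are not favored simply because they have more chances to produce a small nominal P value.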
QTL-Mapping
Gene, Methylation, Histone Modification QTL Mapping. Cis-acting QTL mapping was done using the LIMIX package. We considered
genetic variants mapping to within 1 Mb (on each side) of each tested feature, and tested their association with gene expression,
splicing (percent splice in, PSI), methylation levels and histone modification peaks (H3K27ac and H3K4me1).
Linear regression models were fit between the genotypes and trait quantification, also including a random effect term accounting
for polygenic signal and sample relatedness (as in the variance component models above we used the realized relatedness matrix to
capture sample relatedness). Analogously to the variance decomposition analysis, we considered quantile-normalized PEER resid-
uals for this analysis. From the linear regression, we obtained the effect size and p value for each tested association.
To correct for multiple hypothesis testing, we performed a two-step procedure (LRVM) (Battle et al., 2014): first, we corrected for
multiple testing across variants for each molecular outcome using Bonferroni correction and, second, we adjusted the obtained p
values for multiple-testing across phenotypes within each layer using the Q-value procedure (Storey and Tibshirani, 2003). We considered
QTLs significant at a threshold of 5% FDR.

Allele-Specific Expression (ASE) Mapping
To assess allele-specific expression (ASE) mapping in a similar manner to QTL mapping (above), the aligned RNA-seq reads (to hg19
reference genome) were divided into separate BAM files based on forward and reverse orientations. Reads from only the forward
strand were used in analysis of transcripts in forward orientation, and reads from reverse strand were used for analysis of reverse
orientation transcripts. PCR duplicates were filtered out from all subsequent analysis.
In total 672,115,720, 623,962,195 and 496,318,001 filtered reads with allelic information were available in neutrophils (n = 196),
monocytes (n = 194) and naive CD4+ T cells (n = 169), respectively. Allelic expression from RNA-seq reads at all heterozygous
SNPs was counted with customized python code using the Pysam package. We adjusted for reference bias caused by genome align-
ment by using only heterozygous SNPs with reads in both alleles in the sample. This requirement that both alleles at heterozygous
sites were observed reduced the overall number of informative reads by 8.5%. On average, there were 138K heterozygous expressed
sites per individual with both alleles observed in RNA-seq and with a mean count of 20 reads per site per sample. To further reduce
the reference allele bias, we re-mapped the reads and filtered out potentially problematic reads using WASP (van de Geijn et al.,
2015), which almost completely removed the reference allele bias (final bias 50.1%). This step also reduced informative sequence
by 15% on average.
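The counting and filtering logic described above can be sketched as follows. This is an illustration only: it works on already-extracted (site, base) observations rather than on BAM files through pysam, and all function names and the input layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Counts = Dict[str, Tuple[int, int]]


def count_alleles(observations: Iterable[Tuple[str, str]],
                  het_sites: Dict[str, Tuple[str, str]]) -> Counts:
    """Count reference and alternate reads at heterozygous sites.

    observations: (site_id, observed_base) pairs, one per read overlapping a site.
    het_sites: site_id -> (ref_base, alt_base) for the sample's heterozygous SNPs.
    """
    counts = defaultdict(lambda: [0, 0])
    for site, base in observations:
        if site not in het_sites:
            continue  # not a heterozygous SNP in this sample
        ref, alt = het_sites[site]
        if base == ref:
            counts[site][0] += 1
        elif base == alt:
            counts[site][1] += 1
        # bases matching neither allele are ignored
    return {site: (c[0], c[1]) for site, c in counts.items()}


def informative_sites(counts: Counts) -> Counts:
    """Keep only sites where both alleles were observed at least once."""
    return {site: c for site, c in counts.items() if c[0] > 0 and c[1] > 0}


def reference_fraction(counts: Counts) -> float:
    """Overall fraction of reads carrying the reference allele (0.5 means no bias)."""
    ref = sum(c[0] for c in counts.values())
    total = sum(c[0] + c[1] for c in counts.values())
    if total == 0:
        raise ValueError("no allelic reads available")
    return ref / total
```

A reference fraction close to 0.5 across informative sites corresponds to the "final bias 50.1%" figure reported in the text.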
After WASP correction, we performed two types of allele-specific testing: (a) a linear regression test and (b) the combined haplotype test (CHT).
Linear AS Test. The sum of allelic counts of each haplotype in the gene region was used to calculate its ASE ratio. For each gene, a sample was used in ASE mapping only if it had at least two informative SNPs (heterozygous genotype with reads from both alleles) with ≥ 10 reads at each SNP in the gene region. Given that ASE mapping can be sensitive to outlier effects when few heterozygotes are available for analysis, we applied a cutoff of MAF ≥ 0.05 for the tested SNP and required more than 5 samples with ASE in the gene region. We performed ASE mapping by using those allele ratios in regression testing for local SNP association (250 kb flanking each side (TSS/TES) of the transcript), with a linear regression model as previously described (Ge et al., 2009). In aggregate, 14,962 gene loci were tested. The gene region definition is the same as the one used in our earlier eQTL analysis. We also carried out conditional (secondary) AS mapping by performing a linear AS test using gene allele ratios from samples with a homozygous genotype at the lead SNP from the primary mapping. All genes whose most significant p value passed 5% FDR were tested.
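The sample-level filters and the haplotype ratio described above can be sketched as below. The function names are hypothetical, and the reading of "≥ 10 reads at each SNP" as the total read count per SNP is an assumption; the regression step itself (Ge et al., 2009) is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple


def gene_ase_ratio(snp_counts: List[Tuple[int, int]],
                   min_snps: int = 2,
                   min_reads: int = 10) -> Optional[float]:
    """Haplotype-level allele ratio for one sample and one gene region.

    snp_counts: (haplotype1_reads, haplotype2_reads) per phased heterozygous SNP.
    Returns None when the sample does not pass the filters.
    """
    informative = [(h1, h2) for h1, h2 in snp_counts
                   if h1 > 0 and h2 > 0 and (h1 + h2) >= min_reads]
    if len(informative) < min_snps:
        return None
    hap1 = sum(h1 for h1, _ in informative)
    hap2 = sum(h2 for _, h2 in informative)
    return hap1 / (hap1 + hap2)


def usable_samples(ratios_by_sample: Dict[str, Optional[float]],
                   min_samples: int = 6) -> Dict[str, float]:
    """Keep a gene only when more than 5 samples have an ASE ratio (returns {} otherwise)."""
    usable = {s: r for s, r in ratios_by_sample.items() if r is not None}
    return usable if len(usable) >= min_samples else {}
```

Summing counts over SNPs before taking the ratio means each sample contributes one averaged data point per gene, which is the property contrasted with the CHT later in the text.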
Combined Haplotype Test (CHT). The analyses were carried out using the combined haplotype test in WASP (van de Geijn et al.,
2015), following the authors' instructions. A minimum of 20 samples with allele-specific data in the gene region was required for the CHT
testing. The first 4 principal components, generated from PCA, were also used as covariates in the CHT. In total, 19,283 gene loci
were tested.
In the linear AS test, one sample contributes to a single data point (average ratio) in the final statistical testing while in the CHT, each
heterozygous SNP site with allele reads is one data point. Therefore, CHT allows us to test more features.
Allele-Specific Histone Mark (ASH) Mapping
The methods and steps are very similar to those described for ASE (above), with the exception that read orientation is not required. After filtering for duplicates, we obtained a total of 154,674,565, 143,357,603, and 373,066,694 sequence reads with allelic information for H3K4me1 and 122,631,444, 105,172,159, and 203,402,978 sequence reads with allelic information for H3K27ac in neutrophils, monocytes, and naive CD4+ T cells, respectively. We noticed that WASP correction removed less than 5% of informative sequence reads in ChIP-seq, which is substantially lower than the 15% observed in the RNA-seq dataset. Mapping methods similar to those for ASE (described above) were applied to histone mark reads. A total of 36,729 and 38,546 peak regions were analyzed with the linear AS test for the H3K27ac and H3K4me1 histone marks, respectively. With the CHT method, 70,894 and 45,867 peaks were tested for H3K27ac and H3K4me1, respectively.
In the end, we obtained 18 sets of AS mapping data (6 sets from ASE and 12 sets from ASH). We performed false discovery rate (FDR) estimation from the p values for each dataset, using the qvalue package in R. The FDR estimates (q-values) were used for later comparisons between datasets.
In order to assess the regulatory SNPs shared between genes and histone peaks, we used two different approaches. The first approach is based on LD information between the two lead SNPs from the ASE and ASH mapping data, respectively. We define a gene and peak pair as sharing the same regulatory element if the two lead SNPs are in LD (r² ≥ 0.8). The LD r² values were calculated from our phased SNP genotype dataset (197 samples).
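As an illustration of the LD criterion above, r² between two biallelic SNPs can be computed directly from phased haplotypes. This is a minimal sketch with hypothetical function names, using 0/1 allele coding per haplotype; it is not the tool used in the study.

```python
from typing import Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Squared correlation (r^2) between two SNPs from phased haplotypes coded 0/1."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


def share_regulatory_element(hap_gene_lead: Sequence[int],
                             hap_peak_lead: Sequence[int],
                             r2_min: float = 0.8) -> bool:
    """Apply the r^2 >= 0.8 rule to the lead SNPs of a gene and a histone peak."""
    return ld_r2(hap_gene_lead, hap_peak_lead) >= r2_min
```

With 197 phased samples, each haplotype vector would hold 394 entries (two haplotypes per sample).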
The second approach is based on a correlation test between the allele ratios of the gene and the histone mark. We extracted allele ratios from both sets and from shared samples for all gene and histone peak pairs at a distance of less than 1 Mb. We required a minimum of 25 samples with 2 informative SNPs and a minimum of 10 reads in both RNA-seq and ChIP-seq data. On average, 4% of tested pairs have correlation |r| ≥ 0.3 or p value < 0.05.
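The correlation-based approach above can be sketched as a Pearson correlation over the samples shared by a gene and a peak. The function names and dictionary layout are assumptions, and only the |r| criterion is shown; the p value criterion is omitted.

```python
import math
from typing import Dict, Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def allele_ratio_correlation(gene_ratios: Dict[str, float],
                             peak_ratios: Dict[str, float],
                             min_samples: int = 25) -> Optional[float]:
    """Correlate gene and peak allele ratios over shared samples (None if too few)."""
    shared = sorted(set(gene_ratios) & set(peak_ratios))
    if len(shared) < min_samples:
        return None
    return pearson_r([gene_ratios[s] for s in shared],
                     [peak_ratios[s] for s in shared])


def passes_threshold(r: Optional[float], r_min: float = 0.3) -> bool:
    """Apply the |r| >= 0.3 criterion from the text."""
    return r is not None and abs(r) >= r_min
```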
Overall, 60% of pairs of genes and histone peaks in LD were confirmed by correlation of allele ratios in the linear AS test set. However, only 30% of pairs were confirmed in the CHT set and 15% of pairs in the QTL set. Since average ratios were used in both the linear AS mapping and the allele ratio correlations, this set is expected to show a much higher concordance rate than the other sets.
The phased allele ratio was also used to verify ASE mapping results. Allele ratio values that were shared in all three cell types from 166 samples were extracted. To obtain more reliable results, we required a minimum of 3 SNPs in the gene region with at least 40 reads each for all genes with the most significant p value at 10% FDR. For a verified association, the allele ratios should differ significantly between the groups homozygous and heterozygous for the lead SNP.

Annotation and Comparative Analyses
QTL Sharing and Cell-type Specificity. Based on the π1 statistic (Storey and Tibshirani, 2003) and the procedures in Nica et al. (2011), QTL sharing was estimated as the proportion of true associations π1 among the QTLs from a first cell type in a second cell type. We employed qvalue to compute π1 as 1 − π0, where π0 is the estimated proportion of truly null associations. Cell type specificity was estimated as 1 minus the average of the π1 values from one cell type in the others.
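The π1 computation above can be illustrated with a simplified estimator. This sketch assumes a single fixed tuning parameter λ = 0.5, whereas the qvalue package can smooth the estimate over a range of λ values; the function names are hypothetical.

```python
from typing import Sequence


def pi0_estimate(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """Storey-type estimate of the proportion of true nulls at a fixed lambda (assumption)."""
    if not pvalues:
        raise ValueError("at least one p value is required")
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (len(pvalues) * (1.0 - lam)))


def pi1_estimate(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """pi1 = 1 - pi0: estimated proportion of true associations."""
    return 1.0 - pi0_estimate(pvalues, lam)


def cell_type_specificity(pi1_in_other_cell_types: Sequence[float]) -> float:
    """1 minus the average pi1 of one cell type's QTLs evaluated in the other cell types."""
    return 1.0 - sum(pi1_in_other_cell_types) / len(pi1_in_other_cell_types)
```

Here the input p values would be those of the second cell type, taken at the QTLs discovered in the first cell type.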
Enrichment of ISO QTLs in Biologically Relevant Features. We tested whether ISO QTLs (FDR ≤ 5%) mapped to exons and splice sites more often than non-ISO QTLs (FDR > 5%) with matched minor allele frequencies. We also compared the distance to the closest exon for intronic ISO QTLs and non-ISO QTLs.
GWAS Annotation and Enrichment in QTL Overlaps. To overlap our molecular QTLs with GWAS disease variants, we used the full summary statistics of seven selected autoimmune diseases: celiac disease [CEL] (Dubois et al., 2010), inflammatory bowel disease [IBD] (Liu et al., 2015), including Crohn’s disease [CD] (Liu et al., 2015) and ulcerative colitis [UC] (Liu et al., 2015), multiple sclerosis [MS] (Beecham et al., 2013), Type 1 diabetes [T1D] (Onengut-Gumuscu et al., 2015), and rheumatoid arthritis [RA] (Okada et al., 2014). The associations of IBD, CD, and UC in the European cohorts were used for this study. We also used Type 2 diabetes (Morris et al., 2012) as a negative control. If the lead QTL (≤ 5% FDR) or its LD tag (r² ≥ 0.8) maps to a GWAS variant (p value ≤ 5 × 10⁻⁸), then we consider that the QTL overlaps with a GWAS signal. Here, we calculated the LD information of the QTLs based on our WGS data using plink (Purcell et al., 2007) and a 500 kb window.
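The overlap rule above can be written as a small predicate. This is a sketch under stated assumptions: the function name and the dictionary inputs (r² values between the lead QTL SNP and nearby SNPs, and GWAS p values keyed by SNP identifier) are hypothetical, and the lead QTL is assumed to have already passed 5% FDR.

```python
from typing import Dict


def qtl_overlaps_gwas(lead_snp: str,
                      r2_with_lead: Dict[str, float],
                      gwas_pvalues: Dict[str, float],
                      r2_min: float = 0.8,
                      p_max: float = 5e-8) -> bool:
    """True if the lead QTL SNP or any of its LD tags is a genome-wide significant GWAS variant."""
    candidates = {lead_snp}
    candidates.update(snp for snp, r2 in r2_with_lead.items() if r2 >= r2_min)
    return any(snp in gwas_pvalues and gwas_pvalues[snp] <= p_max
               for snp in candidates)
```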
In order to systematically measure the statistical significance of the overlaps between GWAS disease variants and molecular QTLs, we used GARFIELD (Iotchkova et al., 2016), a novel enrichment analysis approach that takes genome-wide association summary statistics and calculates odds ratios for association between annotation overlap and disease status at given GWAS significance thresholds, while testing for significance via a generalized linear modeling framework that accounts for linkage disequilibrium, minor allele frequency, and local gene density. Linkage disequilibrium was calculated using SNPs from the combined UK10K and 1000 Genomes Phase 3 European cohorts. For functional annotations, we used the genomic positions of unique significant variants (5% FDR) for each QTL type (gene expression, splicing, methylation, H3K27ac, and H3K4me1) in all three cell types. We tested for enrichment of variants reaching the 1 × 10⁻⁵ significance threshold for the selected autoimmune diseases listed above. Multiple testing correction was further performed based on the effective number of annotations used.
Colocalization between Diseases and Molecular Trait. We used a Bayesian colocalization method (Giambartolomei et al., 2014; Pickrell et al., 2016) to elucidate whether the observed overlap between disease and molecular trait may be due to a shared genetic effect. The method calculates the posterior probability (PP), versus the null model of no association, for four alternative models: a model where a region or locus contains a single variant associated with either the molecular trait or disease (models 1, 2); a model where a single causal variant affects association with both traits (model 3); or a model where two distinct associations exist (model 4). The method derives the PP of each variant in the locus being the causal one under the different models, and the PP of a given locus is then the sum of the PPs of all variants within it, with all variants given equal prior probability of being causal. The prior for each model is computed as the one that maximizes the log-likelihood function (Pickrell et al., 2016). We acknowledge the limitations of the model: it assumes one causal variant in the locus, and in the case of high LD between two causal variants the model has limited power to distinguish model 4 from model 3. We also note that colocalization does not imply a causal relationship between molecular trait and disease, but may also be compatible with the same variant having independent (‘pleiotropic’) effects on molecular traits and disease. We applied the colocalization test to each of the 1,003 disease-molecular trait pairs where the lead SNPs in both traits are in high LD (r² ≥ 0.8). To avoid overlapping 2 Mb-wide genetic loci due to features in close proximity (e.g., splicing junctions, genes, histone peaks, CpGs in islands), we tested colocalization per locus, which means that the prior model parameters were estimated using one locus instead of multiple loci and hence the priors may be overestimated.
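The per-variant posterior described above, with equal prior probability for every variant and one causal variant per locus, can be illustrated with Wakefield-style approximate Bayes factors. This is a simplified sketch, not the software used in the study: the function names are hypothetical, the prior standard deviation of 0.15 on the effect size is an assumed default, and the model-level priors estimated by likelihood maximization are not reproduced.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_sd: float = 0.15) -> float:
    """Log approximate Bayes factor for association versus the null at one variant."""
    z = beta / se
    shrink = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - shrink) + shrink * z * z)


def snp_posteriors(log_abfs: Sequence[float]) -> List[float]:
    """Posterior of each variant being the causal one, given equal priors and one causal variant."""
    top = max(log_abfs)
    weights = [math.exp(value - top) for value in log_abfs]  # subtract max for stability
    total = sum(weights)
    return [w / total for w in weights]


def shared_variant_posteriors(log_abf_trait1: Sequence[float],
                              log_abf_trait2: Sequence[float]) -> List[float]:
    """Per-variant posterior under the shared-variant model (model 3), conditional on that model."""
    combined = [a + b for a, b in zip(log_abf_trait1, log_abf_trait2)]
    return snp_posteriors(combined)
```

Under model 3, the evidence at a variant is the product of its Bayes factors for the two traits, which is why the log values are added before normalizing.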
Integration with Blueprint ChromHMM Segmentation States. We used the reference Blueprint chromatin segmentation states for the three cell types in this study; the full methodology is described in Carrillo de Santa Pau et al. (2016). For each cell type, the chromatin states were inferred using ChromHMM (Ernst and Kellis, 2012) and six histone modification marks: H3K4me3, H3K36me3, H3K27ac, H3K4me1, H3K27me3, and H3K9me3. The 11 chromatin states are: E1, Transcription Low signal H3K36me3; E2, Transcription High signal H3K36me3; E3, Heterochromatin High signal H3K9me3; E4, Low signal; E5, Repressed Polycomb High signal H3K27me3; E6, Repressed Polycomb Low signal H3K27me3; E7, Repressed Polycomb TSS High signal H3K27me3 & H3K4me3 & H3K4me1; E8, Enhancer High signal H3K4me1; E9, Active Enhancer High signal H3K4me1 & H3K27ac; E10, Active TSS High signal H3K4me3 & H3K4me1; E11, Active TSS High signal H3K4me3 & H3K27ac. For each cell type, we then merged (Bedtools multiIntersectBed) the chromatin states from multiple replicates (2 monocytes, 8 neutrophils, 6 T cells), requiring that the state be present in at least 50% of the samples. Hence, we used only one reference chromatin state per cell type.
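The replicate-merging rule above can be sketched on a fixed genomic binning. This is an illustration only: the study merged interval files with Bedtools, whereas the hypothetical function below assumes every replicate has been reduced to one state label per bin, and it leaves a bin unassigned when two states tie, which is an assumption not stated in the text.

```python
from collections import Counter
from typing import List, Optional, Sequence


def consensus_states(replicates: Sequence[Sequence[str]],
                     min_fraction: float = 0.5) -> List[Optional[str]]:
    """Per-bin consensus chromatin state across replicates.

    replicates: one sequence of state labels (e.g., "E1".."E11") per replicate,
    all using the same bins. A state is kept when it is the unique most frequent
    label and is present in at least min_fraction of the replicates.
    """
    n = len(replicates)
    consensus: List[Optional[str]] = []
    for states in zip(*replicates):
        ranked = Counter(states).most_common(2)
        top_state, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        if top_count / n >= min_fraction and not tied:
            consensus.append(top_state)
        else:
            consensus.append(None)  # no reference state for this bin
    return consensus
```

With only two monocyte replicates, the 50% rule combined with the tie handling assumed here keeps a bin only when both replicates agree.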
Data Resources
The full QTL summary statistics from this study can be accessed from http://blueprint-dev.bioinfo.cnio.es/WP10/qtls. The accession
numbers for the alignment data reported in this paper are European Genome-phenome Archive (EGA): EGAD00001002663 (WGS), EGAD00001002671/EGAD00001002674/EGAD00001002675 (RNA), EGAD00001002670/EGAD00001002672/EGAD00001002673 (ChIP-seq), and EGAS00001001456 (450K DNA methylation). Quantification matrices, donor metadata, ChIP-seq peaks, and ChIP-seq coverage files are available via ftp://ftp.ebi.ac.uk/pub/databases/blueprint/blueprint_Epivar/.
ADDITIONAL RESOURCES
Chromatin immunoprecipitation protocols: http://www.blueprint-epigenome.eu/index.cfm?p=7BF8A4B6-F4FE-861A-2AD57A08D63D0B58


Figure S1. Sample Collection, Related to Figure 1
(A) Morphological assessment of purified cell preparations. Cells were fixed to slides using a Cytospin and stained using Wright-Giemsa stain prior to photo-
graphing using 100x magnification.
(B) Examples of neutrophil, monocyte and naive CD4+ T cell staining to assess purity of cell preparations.
(C) Histogram of cell purity based on FACS analysis in the three cell types.
(D) Details of data production centers. Data from this project were produced at different institutes, as detailed here: University of Cambridge (UCAM), European Bioinformatics Institute (EBI), Wellcome Trust Sanger Institute (WTSI), Nijmegen Centre for Molecular Life Sciences (NCMLS), University College London (UCL), McGill University (McGill), and Max Planck Institute for Molecular Genetics (MPIMG). Peripheral blood mononuclear cells (PBMC) were isolated from donors at UCAM, and from these, monocytes (M), neutrophils (N), and naive CD4+ T cells (T) were extracted, with a further aliquot used as a source of genomic (g)DNA samples. gDNA was shipped to the WTSI for sequencing; the monocyte/neutrophil samples were divided between MPIMG, UCL, and WTSI+NCMLS for RNA-seq, DNA methylation sequencing (Methylation), and ChIP-seq, respectively; and the naive CD4+ T cells were sent to McGill for RNA-seq/Methylation/ChIP-seq. In addition, three samples from each institute/assay set were sent to the reciprocal institute for cross-center validation purposes (e.g., RNA-seq assays were carried out on the same three samples at both MPIMG and McGill). Data processing/analysis was carried out at WTSI for WGS and RNA-seq, UCL for DNA methylation sequencing, and EBI for ChIP-seq.
[Figure S2 image, panels A–N. Recoverable labels: (A) PCA scatterplot, x axis PC1, with populations BLUEPR., CHB, CHS, JPT, LWK, YRI, CEU, GBR, FIN, IBS, TSI, ASW, CLM, MXL, PUR; (B) number of SNVs (× 1,000,000) by allele frequency (%); (C) number of INDELs (× 10,000) by allele frequency (%); (D) counts (× 1,000) of deletions and insertions by length; (E) number of variants (× 1,000,000) and number of INDELs (× 10,000) by sample; (F) depth of coverage by sample; (G) Ts/Tv ratio and Het/HomAlt ratio by sample; (H) percentage by substitution type; (I–K) density of M-value; (L) proportion of variance (%) for PC1 to PC10; (M) PC 1 (18.71%) versus PC 2 (13.51%); (N) Dimension 1 versus Dimension 2.]

Figure S2. WGS and DNA Methylation Sample and Data Quality Metrics, Related to Figures 3 and 4
WGS (A-H) and DNA methylation (I-N) sample and data quality metrics. (A) Principal component analysis (PCA) scatterplot of the first two components using the
resulting merged datasets (1000GP + Blueprint). The dashed line indicates the arbitrary threshold to discriminate the population of European ancestry.
(B) Number of SNPs (×10⁶) by non-reference allele frequency (AF) bins. (C) Number of INDELs (×10⁴) by non-reference AF bins. (D) Size distribution of INDELs.
Negative lengths represent deletions and positive lengths represent insertions. (E) Number of SNPs (×10⁶) and INDELs (×10⁴) by sample. (F) Depth of coverage by
sample. (G) Ratio of heterozygous and homozygous non-reference SNP genotypes by sample and transition to transversion ratio (Ts/Tv) by sample. (H) Types of
substitution in percentage. (I-K) Distributions of DNA methylation M-values for each cell type. Each line represents one sample. (L) Barplot representing the
proportions of variance explained by the first ten principal components of a principal component analysis across all samples used in the study. (M) Visualization of
the first two principal components of a principal component analysis across all samples used in the study. Each data point represents one sample, colored by cell
type. (N) Multidimensional scaling of all samples used in the study, based on Euclidean distances. Each data point represents one sample, colored by cell type.
Figure S3. RNA-Seq Distribution and Batch Correction, Related to Figures 3 and 4
(A) PCA before and after batch correction using ComBat at the gene level. Darker lines and dots represent crossover samples from different sequencing centers. Distribution of normalized read counts (log2) at the gene level in monocytes, neutrophils, and naive CD4+ T cells.
(B) Scatterplots of the pairwise correlation of gene quantification between crossover samples before and after batch correction.
(C) Distribution of PSI values before and after PEER correction (upper panel) and PCA plots (lower panel) in monocytes, neutrophils, and naive CD4+ T cells.
[Figure S4 image, panels A–J. Recoverable labels: % Individuals versus Million Reads; Number Peaks versus FRiP; % Individuals versus RSC; % Individuals versus NSC; replicate scatterplots between McGill and NCMLS for samples SOO26A, SOO2FT, SOO2KJ, SOO6UK, SOO7CF, SOO7VE, SOO7DD, and SOO7G7 (r between 0.86 and 0.97); Z-score heatmaps for H3K27ac and H3K4me1; density of log2rpm for H3K27ac and H3K4me1 in monocytes, neutrophils, and T cells; PCA scatterplots colored by ChIP center (WTSI, NCMLS, McGill).]

Figure S4. ChIP-Sequencing Data Quality Metrics, Related to Figure 3
(A–D) ChIP-seq quality control plots with a consistent color convention throughout: neutrophil (blue), monocyte (green), and T cell (yellow). Plots are split by factor assayed, H3K4me1 (left) and H3K27ac (right). (A) Histogram displaying bins of quality control-passed reads on the x axis and the percentage of individuals falling into each bin on the y axis. (B) Scatterplot displaying, per individual and colored by cell type, the number of peaks called at the FDR threshold on the x axis and the fraction of reads intersecting a consensus peak set of regions shared across all three cell types on the y axis. (C) Histogram displaying bins of normalized strand coefficient on the x axis and the percentage of individuals falling into each bin on the y axis. (D) Histogram displaying bins of relative strand coefficient on the x axis and the percentage of individuals falling into each bin on the y axis.
(E and F) (E) Scatterplots showing the Pearson correlation r between replicates of the same donors processed at NCMLS and McGill. (F) Hierarchical clustering for each histone modification mark using Pearson correlation as the distance metric and standardized log2 RPM (reads per million) on chromosome 1 only. Similar clustering is seen for all chromosomes.
(G–J) PEER-corrected matrices of log2 RPM. Density of log2 RPM values for H3K27ac in (G) and for H3K4me1 in (I). Scatterplot, colored by ChIP center, of the first two orthogonal components from PCA for H3K27ac in (H) and for H3K4me1 in (J).
Figure S5. Variance Decomposition Analyses, Related to Figure 2
(A and B) Figures showing analogous results as those presented in Figure 2B; however, for neutrophils and naive CD4+ T cells.
(C–F) Variance partitioning results obtained from the joint model across all four molecular layers in monocytes. Shown are the distributions of variance explained
by genetics, cumulative epigenetics as well as separately for individual epigenetic layers for different sets of genes. Specifically, genes were stratified by the
median of (C) the total variance explained by the joint model (‘‘low’’ and ‘‘high’’ indicate genes below and above the median), (D) the median gene-expression
level, (E) gene type and (F) the variance of the log of the expression levels.
(G and H) Figures showing analogous results as those presented in Figures 2C–2E; however for neutrophils and naive CD4+ T cells.
(I) Pairwise correlation of the variance explained by different molecular layers between monocytes and neutrophils. Epigenetic contributions were estimated using
a model that accounts for underlying genetic variation (see the STAR Methods). The Spearman’s rank correlation (r) is also reported. Venn Diagrams show the
overlap of genes with significant genetic, methylation, H3K4me1 and H3K27ac contributions between monocytes and neutrophils (FDR < 5%, using a variance
component test, see the STAR Methods).
(J) Comparison of variance component estimates for individual molecular layers either considering a model that accounts for expression heterogeneity (EH,
y axis) or a model that does not account for EH (no EH, x axis) in monocytes (see the STAR Methods). The genetic variance estimates were consistent across both
approaches, whereas epigenetic variance estimates were substantially increased when not using the additional EH adjustment.
(K) Comparison of the proportion of variance explained by different molecular layers across cell types when either considering a 100Kb or a 1Mb cis window (see
the STAR Methods).
[Figure S6 image, panels A and B. Panel A subplots: Monocytes, Neutrophils, T cells; legend: Both, Only without accounting, Only accounting.]
Figure S6. EWAS, Related to Figure 2

(A) Scatterplot of the gene-level P values (see the STAR Methods) obtained from the EWAS analysis either accounting (y axis) or not (x axis) for genetic effects in all
three cell types. Genes with significant cis-epigenetic association only when not accounting for underlying genetic effects (‘‘Only without accounting,’’ FDR < 5%)
are indicated in dark blue. Genes with significant cis-epigenetic association only when accounting for underlying genetic effects (‘‘Only accounting’’) are indicated
in green. Finally, genes with significant cis-epigenetic associations both when accounting or not for underlying genetic effects (‘‘Both’’) are indicated in blue.
(B) Manhattan plot for the gene MSR1 (ENSG00000038945), illustrating a cis epigenetic association that is robust to correction for genetic effects. Shown are −log10(p value) from an EWAS analysis either without accounting for cis genetic effects (top panel) or when accounting for cis genetic variation (bottom panel).
[Figure S7 image, panels A–D. Recoverable labels: (A) Prob. to be most significant SNP by position from −100kb to +100kb around the TSS and TES, for ASE, CHT, and SEC; (B) fold-enrichment across chromHMM states E1–E11 for ASE, CHT, and SEC; (C) fold-enrichment across chromHMM states E1–E11 for ASE, ASE/QTL, and QTL; (D) proportion of features with QTL/ASE/ASH (5% FDR) for the ASE, CHT, and SASE tests on RNASeq, H3K4me1, and H3K27ac, shown for Monocyte, Neutrophil, CD4+ T-cell, and their pairwise and three-way intersections, with feature counts above each bar.]
Figure S7. Distribution of Primary ASE Associations, Related to Figures 5 and 6

(A) Distribution of primary associations with respect to measured transcript for ASE (Blue), CHT (Red), or secondary, conditional ASE (Green) associations. The
relative density of associations is adjusted to tested common SNPs in different bins.
(B) Enrichment of chromHMM chromatin states for top primary ASE (Blue), primary CHT (Orange), or secondary ASE (gray) associations. The y axis is the fold-
enrichment of SNPs in E1-E11 chromHMM states relative to all SNPs tested for association.
(C) Enrichment of chromHMM chromatin states for primary ASE associations (Blue), top primary associations overlapping from ASE and QTL tests (Orange), and
from QTL tests (gray). The y axis is the fold-enrichment of SNPs in E1-E11 chromHMM states relative to all SNPs tested for association.
(D) Proportion of associations versus tested traits. For each type of test (ASE/CHT/ASES) and assay (Gene, H3K27ac, H3K4me1), the proportion of features with a QTL/ASE/ASH at 5% FDR relative to the total number of features tested is shown as a bar graph for each cell type alone (blue shade), common to two cell types (red shade), and common to all three cell types (green). The actual number of features at 5% FDR is shown above each bar.
Resource
The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease
William J. Astle, Heather Elding,
Tao Jiang, ..., Willem H. Ouwehand,
Adam S. Butterworth, Nicole Soranzo
Correspondence
jd292@medschl.cam.ac.uk (J.D.),
david.roberts@ndcls.ox.ac.uk (D.J.R.),
who1000@cam.ac.uk (W.H.O.),
asb38@medschl.cam.ac.uk (A.S.B.),
ns6@sanger.ac.uk (N.S.)
In Brief
As part of the IHEC Consortium, this study probes the allelic architecture and regulatory landscape of cellular complex traits with power to identify causal pathways and links to diseases such as schizophrenia. Explore the Cell Press IHEC web portal at http://www.cell.com/consortium/IHEC.
Highlights
• Genome-wide association study interrogates 36 traits across the hematopoietic system
• A total of 2,706 associated variants, including 130 rare and 230 low frequency
• Describes allelic spectrum and heritability of coding and regulatory variants
• Unravels causal contributions to cardiovascular, immune, and psychiatric disease
Astle et al., 2016, Cell 167, 1415–1429

Resource
The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease
William J. Astle,1,2,3,4,31 Heather Elding,5,6,31 Tao Jiang,4,31 Dave Allen,7 Dace Ruklisa,1,2,3 Alice L. Mann,5 Daniel Mead,5
Heleen Bouman,5 Fernando Riveros-Mckay,5 Myrto A. Kostadima,1,2,8 John J. Lambourne,1,2 Suthesh Sivapalaratnam,1,9
Kate Downes,1,2 Kousik Kundu,1,5 Lorenzo Bomba,5 Kim Berentsen,10 John R. Bradley,11,12 Louise C. Daugherty,1,2,13
Olivier Delaneau,14 Kathleen Freson,15 Stephen F. Garner,1,2 Luigi Grassi,1,2 Jose Guerrero,1,2 Matthias Haimel,11,13
Eva M. Janssen-Megens,10 Anita Kaan,10 Mihir Kamat,4 Bowon Kim,10 Amit Mandoli,10 Jonathan Marchini,16,17
Joost H.A. Martens,10 Stuart Meacham,1,2,13 Karyn Megy,1,2,13 Jared O’Connell,16,17 Romina Petersen,1,2
Nilofar Sharifi,10 Simon M. Sheard,18 James R. Staley,4 Salih Tuna,1,13 Martijn van der Ent,10 Klaudia Walter,5
2National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
3Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Forvie Site,
Robinson Way, Cambridge CB2 0SR, UK

4MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Strangeways
Research Laboratory, Wort’s Causeway, Cambridge CB1 8RN, UK

5Department of Human Genetics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1HH, UK

6The National Institute for Health Research Blood and Transplant Unit (NIHR BTRU) in Donor Health and Genomics at the University of
Cambridge, University of Cambridge, Strangeways Research Laboratory, Wort’s Causeway, Cambridge CB1 8RN, UK
7Blood Research Group, NHS Blood and Transplant, John Radcliffe Hospital, Headley Way, Headington, Oxford OX3 9BQ, UK
8European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton,

9Department of Haematology, Barts Health NHS Trust, The Royal London Hospital, Whitechapel Road, London, London E1 1BB, UK
10Department of Molecular Biology, Radboud University, Faculty of Science, Nijmegen 6525GA, the Netherlands
11Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0QQ, UK
12National Institute for Health Research Cambridge Biomedical Research Centre, Cambridge University Hospitals, Cambridge CB2 0QQ, UK
13NIHR BioResource-Rare Diseases, University of Cambridge, Cambridge Biomedical Campus, Long Road, Cambridge CB2 0PT, UK
14Département de Génétique et Développement (GEDEV), University of Geneva, 1211 Geneve 4, Switzerland
15Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, University of Leuven, 3000 Leuven, Belgium
16Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
17Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
18UK Biobank Ltd., 1-4 Spectrum Way, Adswood, Stockport SK3 0SA, UK
SUMMARY genome regulatory domains. Finally, through Mende-

lian randomization, we provide evidence of shared
Many common variants have been associated with genetic pathways linking blood cell indices with
hematological traits, but identification of causal complex pathologies, including autoimmune dis-
genes and pathways has proven challenging. We eases, schizophrenia, and coronary heart disease
performed a genome-wide association analysis in and evidence suggesting previously reported popu-
the UK Biobank and INTERVAL studies, testing 29.5 lation associations between blood cell indices and
million genetic variants for association with 36 red cardiovascular disease may be non-causal.
cell, white cell, and platelet properties in 173,480
European-ancestry participants. This effort yielded
INTRODUCTION
hundreds of low frequency (<5%) and rare (<1%) var-
iants with a strong impact on blood cell phenotypes. Modern genetic analysis has transformed our understanding of
Our data highlight general properties of the allelic the contribution of inherited variation to complex human disease.
architecture of complex traits, including the propor- Over the last decade, the widespread application of large-scale
tion of the heritable component of each blood trait genome-wide association studies based on sparse genotyping
explained by the polygenic signal across different arrays has led to a dramatic increase in the number of known
Shuang-Yin Wang,10 Eleanor Wheeler,5 Steven P. Wilder,8 Valentina Iotchkova,5,8 Carmel Moore,4
Jennifer Sambrook,1,2,4 Hendrik G. Stunnenberg,10 Emanuele Di Angelantonio,4,6,19 Stephen Kaptoge,4,6
Taco W. Kuijpers,20,21 Enrique Carrillo-de-Santa-Pau,22 David Juan,22 Daniel Rico,22,23 Alfonso Valencia,22
Lu Chen,1,5 Bing Ge,24 Louella Vasquez,5 Tony Kwan,24 Diego Garrido-Martín,25,26 Stephen Watt,5 Ying Yang,5
Roderic Guigo,25,26,27 Stephan Beck,28 Dirk S. Paul,4,28 Tomi Pastinen,24 David Bujold,24 Guillaume Bourque,24
Mattia Frontini,1,2,19 John Danesh,4,5,6,12,19,* David J. Roberts,29,30,* Willem H. Ouwehand,1,2,5,6,19,*
Adam S. Butterworth,4,6,19,* and Nicole Soranzo1,5,6,19,32,*
20Emma Children’s Hospital, Academic Medical Center (AMC), University of Amsterdam, Location H7-230, Meibergdreef 9,
Amsterdam 1105AZ, the Netherlands
21Blood Cell Research, Sanquin Research and Landsteiner Laboratory, Plesmanlaan 125, Amsterdam, 1066CX, the Netherlands
22Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3,
28029 Madrid, Spain

23Institute of Cellular Medicine, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, UK
24Human Genetics, McGill University, 740 Dr. Penfield, Montreal, QC H3A 0G1, Canada
25Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology,
Carrer del Dr. Aiguader, 88, Barcelona 8003, Spain

26Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Plaça de la Mercè, 10- 12, Barcelona 8002, Spain
27Computational Genomics, Institut Hospital del Mar d’Investigacions Mediques (IMIM), Carrer del Dr. Aiguader, 88, Barcelona 8003, Spain
28UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
29Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Headington, Oxford OX3 9DU, UK
30Department of Haematology, Churchill Hospital, Headington, Oxford OX3 7LE, UK
31Co-first author
32Lead Contact
*Correspondence: jd292@medschl.cam.ac.uk (J.D.), david.roberts@ndcls.ox.ac.uk (D.J.R.), who1000@cam.ac.uk (W.H.O.), asb38@medschl.cam.ac.uk (A.S.B.), ns6@sanger.ac.uk (N.S.)
disease-associated genetic variants (Hindorff et al., 2009). The development of clinically useful applications of these discoveries, such as disease prediction algorithms, identification of etiological mechanisms (Ferreira et al., 2013; Voight et al., 2012), and prioritization of new targets for drug discovery (Lopez, 2008) has lagged behind. This is due partly to the characteristics of the disease-associated variants, which are predominantly common (minor allele frequency [MAF] ≥5%), which tend to be associated with small differences in disease risk and which often lie in regulatory regions of the genome, hindering the identification of causal alleles, genes, and disease mechanisms.

Examples of low-frequency (MAF = 1%–5%) and rare variant (MAF <1%) associations are beginning to emerge from the application of massively parallel whole genome and exome sequencing to human populations (Polfus et al., 2016). Associated rare variants tend to be easier to link to genes as they map predominantly in or near coding regions and have fewer correlated variants. Furthermore, they can have larger phenotypic effect sizes and are more likely to act through interpretable mechanisms such as disruption of protein function. These features also enhance their clinical and scientific usefulness. For instance, rare loss of function alleles can be used to assess the likely consequences of modulating a pathway pharmacologically to prevent disease (Plenge et al., 2013). However, very large studies are required for power to detect rare variant associations and consequently the sequencing approach is still relatively limited by cost.

Genotype imputation of large population cohorts (i.e., the systematic genome-wide statistical inference of unmeasured genotypes using exogenous reference panels of sequenced individuals) (Howie et al., 2011) is fast becoming a viable strategy to explore rare and low-frequency variant associations. Increasingly large whole-genome sequencing (WGS) reference panels are being created. Larger panels include rare alleles from more variants and better capture the between-variant correlation structure of study populations (1000 Genomes Project Consortium et al., 2015; Iotchkova et al., 2016b; Loh et al., 2016; UK10K Consortium et al., 2015). Here, we exploit the recent improvements in the quality of imputation to carry out association analyses of rare and low-frequency genetic variants with 36 different blood cell indices.

Blood cells make essential contributions to oxygen transport, hemostasis, and innate and acquired immune responses (Jenne et al., 2013; Jensen, 2009; Varol et al., 2015) and participate in many other functions such as iron homeostasis, the clearance of apoptotic cells and toxins, vascular and endothelial cell function, and response to systemic stress (Buttari et al., 2015). Qualitative or quantitative abnormalities of blood cell formation, and of their physiological and functional properties, have been associated with predisposition to cancer and with many severe congenital disorders including anemias, bleeding, and thrombotic disorders and immunodeficiencies (Routes et al., 2014; Schneider et al., 2015). Furthermore, variations in the properties of many blood-cell subtypes have been associated with a wide variety of systemic diseases. However, the causal relationships between blood indices and disease risks are unclear and this hinders their potential value for informing new treatments.

We report over 2,500 variants independently associated with variation in the 36 indices. We examine the genetic architecture of the associated variants and use them to reveal causal relationships with autoimmune, cardiovascular, and psychiatric diseases. Overall, this study expands the repertoire of genes and regulatory mechanisms governing hematopoietic development
1416 Cell 167, 1415–1429, November 17, 2016

[Figure 1 diagram: three panels titled Phenotypes, Study samples, and Genetic analysis]

Phenotypes (lineage tree nodes shown: HSC, MPP, CMP, LMPP, MEP, GMP, CLP, Ba, Ne, Eo, Mo, T, B, P, R, Ma, APC), with indices grouped by cell type:
Platelets: PLT#, PCT, PDW, MPV
Red cells: RBC#, HGB, HCT, MCV, RDW, MCH, MCHC, RET#, RET%, IRF, HLSR#, HLSR%
Myeloid: NEUT#, BASO#, EO#, MONO#, MYELOID, GRAN#, GRAN%MYELOID, NEUT%GRAN, BASO%GRAN, EO%GRAN, (NEUT+EO)#, (BASO+NEUT)#, (EO+BASO)#
Lymphoid: LYMPH#
Compound white cell: BASO%, NEUT%, EO%, MONO%, LYMPH%, WBC

Study samples: UK Biobank (N=87,265); UK BiLEVE (N=45,694); INTERVAL (N=40,521)

Genetic analysis, in the order shown: genotyping with Affymetrix Axiom arrays; sample and variant QC; imputation to UK10K+1000 Genomes Project (phase 3) reference panel; sample and variant QC; study-specific association analysis; meta-analysis; multiple regression analysis; 3,755 ‘conditional lead’ variants (6,736 trait-variant pairs); LD-clumping; 2,706 sentinel variants; annotation and integrative analyses.
HSC = hematopoietic stem cell; MPP = multipotent progenitor; LMPP = lymphomyeloid-restricted progenitors; CMP = common myeloid progenitor;
CLP = common lymphoid progenitor; MEP = megakaryocyte and erythroblast progenitor; GMP = granulocyte macrophage progenitor; P = platelet; R = red cell;
Ba = basophil; Ne = neutrophil; Eo = eosinophil; Mo = monocyte; Ma = macrophage; APC = antigen presenting cell; T = T-lymphocyte; B = B-lymphocyte.
Figure 1. Study Design for GWAS of Complete Blood Count Indices

The phenotypes and their classification by hematopoietic cell type; the study sample sizes; and a summary of the analysis methods employed to identify
associated loci. Blood cell index names are defined in Table S1.
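
The LD-clumping step in the study design, which reduces the conditional lead variants to sentinel variants with pairwise r² below 0.8, can be illustrated with a greedy procedure. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the variant names, p values, and r² table are hypothetical, and choosing the most significant remaining variant as each sentinel is an assumed rule.

```python
from typing import Dict, FrozenSet, List


def ld_clump(pvalues: Dict[str, float],
             r2: Dict[FrozenSet[str], float],
             threshold: float = 0.8) -> Dict[str, List[str]]:
    """Greedy clumping (illustrative only).

    Repeatedly take the most significant unassigned variant as a sentinel
    and attach every unassigned variant whose r2 with it is >= threshold.
    Pairs missing from the r2 table are treated as uncorrelated (r2 = 0).
    """
    remaining = sorted(pvalues, key=pvalues.get)  # most significant first
    clumps: Dict[str, List[str]] = {}
    while remaining:
        sentinel = remaining.pop(0)
        members = [v for v in remaining
                   if r2.get(frozenset((sentinel, v)), 0.0) >= threshold]
        clumps[sentinel] = members
        remaining = [v for v in remaining if v not in members]
    return clumps


# Hypothetical inputs: four variants and a sparse pairwise r2 table.
pvalues = {"var_a": 1e-30, "var_b": 1e-12, "var_c": 1e-20, "var_d": 1e-10}
r2 = {
    frozenset(("var_a", "var_b")): 0.92,
    frozenset(("var_c", "var_d")): 0.85,
    frozenset(("var_a", "var_c")): 0.10,
}
clumps = ld_clump(pvalues, r2)
print(clumps)
```

Because a variant only becomes a sentinel if no earlier sentinel absorbed it, every pair of sentinels produced this way has r² below the threshold, which matches the between-sentinel criterion described for the study.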
in humans and opens potential avenues for targeting key pathways involved in abnormal or dysregulated hematopoiesis.

RESULTS

Genetic Discoveries
To identify genetic variants associated with 36 blood cell indices with increased resolution and statistical power, we studied a total of 173,480 European ancestry individuals from three large-scale UK studies: INTERVAL (Moore et al., 2014), approved by Cambridge (East) Research Ethics Committee, UK Biobank (Sudlow et al., 2015), and UK BiLEVE (a selected subset of the UK Biobank cohort) (Wain et al., 2015), both approved by the North West Multi-centre Research Ethics Committee (Figures 1, S1, and S2; Tables S1 and S2). We tested univariate associations of 36 indices with 29.5 million imputed variants passing quality control filters (MAF >0.01%, Figure S3) and used stepwise multiple regression to identify a parsimonious subset of genetic variants explaining the genome-wide significant (p value < 8.31 × 10⁻⁹) associations for each trait (Xu et al., 2014) (STAR Methods). We identified 6,736 conditionally independent index-variant associations and clustered these variants into 2,706 high linkage disequilibrium (LD) groups each represented by a sentinel variant (between-sentinel pairwise LD r² < 0.8) (Figure 2; Tables S3 and S4). We confirmed the accurate imputation of variants at the rare end of the allelic spectrum by genotype comparisons with high read-depth (>50×) whole exome sequencing data from overlapping individuals, which showed 92.95% concordance and 94.97% precision for rare alleles (STAR Methods). Of the sentinel variants, 283 were correlated (r² ≥ 0.8) with previously reported variants (Table S5), validating most blood trait associations reported in populations of European ancestry (Gieger et al., 2011; van der Harst et al., 2012; Vasquez et al., 2016).

The sentinel variants included an unprecedented number of low-frequency (n = 210) and rare (n = 130) alleles (Figure 3A). The genetic associations were almost completely cell-type-specific (Figure 3B), with 900 sentinels (33%) associated exclusively

Figure 2. Summary of Genetic Associations with the 36 Blood Cell Indices
A Manhattan plot summarizing genome-wide phenotypic associations over 36 indices. Each dot corresponds to a variant. Its x coordinate represents its genomic
position and its y coordinate represents the maximum -log10 (p value) for association over all phenotypes. Variants with -log10 (p value) <6 have been removed for
clarity. The yellow horizontal line at p = 8.31 × 10⁻⁹ represents the GWAS significance threshold. Sentinel variants are colored green if their associations (or
associations with their proxies) have been previously reported and are colored red otherwise.
See also Table S3.
with red blood cell traits, 1,040 (38%) exclusively with white cell traits, and 570 (21%) exclusively with platelet traits. Only five common variants (at ZFP36L2/THADA, SH2B3, HBS1L, PRTFDC1, and GCKR) were associated with traits across all six trait classes defined in Table S1.

Properties and Biological Significance of Associated Variants
To evaluate the representation of classes of genetic variants across the allele frequency spectrum, we annotated variants with their most severe consequence on GENCODE transcripts using VEP (McLaren et al., 2016). Variants predicted to have severe consequences (missense, frameshift, stop gained, start lost variants; Table S4) were highly enriched in the rare and low-frequency ranges, consistent with observations from large-scale sequencing projects (UK10K Consortium et al., 2015) and negative selection against variants affecting protein function (Figure 3C). Phenotypic effect sizes (the absolute additive change in trait mean measured in SD per allele) decreased with decreasing severity of the variant consequence (p = 2.2 × 10⁻¹⁶, Jonckheere-Terpstra test for trend in absolute value of effect size with VEP impact; Figure 3D). For instance, missense changes were overrepresented in the rare frequency range (p = 9.8 × 10⁻²⁹, Pearson’s χ² test) and displayed larger absolute effect sizes compared to non-missense variants (median 0.063 SD versus 0.035 SD, p = 2.5 × 10⁻¹⁶, Mann-Whitney-Wilcoxon test). There were also significant differences in median phenotypic effect sizes between variants mapping to five distinct regulatory states inferred from genome segmentations based on six histone marks in matched cells. Variants mapping to enhancer and promoter regions had larger median effect sizes than those mapping to other regulatory classes (Figure 3E).

Curated genes known to cause rare inherited Mendelian blood disorders (Greene et al., 2016; Westbury et al., 2015) were enriched among genes containing conditionally significant associations between variants altering protein sequence (missense, frameshift, stop gained, start lost variants) and blood indices of cell types matched to the disorders. For instance, we detected a 21.3 (95% confidence interval [CI]: 5.8–52.0) fold enrichment (FE) of Mendelian genes for bleeding, thrombotic and platelet disorders in the platelet-associated genes, a 34.0 FE (95% CI: 11.4–72.1) of genes carrying mutations for Mendelian diseases of the red blood cells in red cell genes and a 6.8 FE (95% CI: 2.2–15.6) of Mendelian genes for primary immune disorders in myeloid white cell genes. The enrichment overlaps included a known pathogenic missense variant (Landrum et al., 2016) in myeloperoxidase deficiency (MPO) (Romano et al., 1997), and we identified additional known pathogenic variants in uncurated genes including CX3CR1 (HIV progression) (Faure et al., 2000) and hemochromatosis type 1 (HFE) (Adams et al., 2005) (Table S4). We also found rare missense variants in Mendelian disorder genes that had not previously been associated with blood cell indices (Table S3) and/or where no pathogenic variants have been recorded in ClinVar. For example, missense variants in

Figure 3. Distribution of Genetic Effects and Variant Consequences

(A) Number of conditionally independent genetic associations categorized by blood cell index and by MAF range.
(B) Summary of sizes of subsets of sentinel variants categorized by cell types of associated indices, showing that most associations are cell-type-specific. Each
bar counts the number of sentinel variants associated with and only with the blood index class(es) shown. (mRBC, Mature RBC; iRBC, Immature RBC; Lymph,
Lymphoid WBC; Comp, Compound WBC; All, Intersection of all blood index classes. ‘‘Other’’ counts variants uncounted by the other bars.) See Table S1 for
blood index classification.
(C) Bar plot showing the proportions of variants categorized by VEP consequence stratified by derived allele frequency (DAF) range.
(D and E) Violin plots showing the distribution of the absolute value of the estimated effect size stratified by VEP impact categories (D) or cell-matched chromatin
segmentation states (E). p values correspond to Mann-Whitney-Wilcoxon tests comparing the distributions indicated.
See also Table S4.
GMPR, TMC8, and RIOK3 were associated with reticulocyte counts.

More generally, the 158 variants predicted to alter protein sequence (Table S4) are of interest because of their potential medical value. We focused on rare (MAF < 1%) protein-altering variants because they can be more reliably linked to causal genes. For red blood cell indices, we found 14 missense variants and one frameshift variant (in SPHK1), only one of which (rs116100695) was previously identified as pathogenic. rs116100695 is a rare missense variant in PKLR causing red cell pyruvate kinase deficiency, a common cause of hereditary nonspherocytic hemolytic anemia (Kanno and Miwa, 1991). Some of the other variants are in genes previously associated with hereditary anemias. For example, a rare missense variant (rs201514157) in SPTA1 was associated with reticulocyte count, and a rare missense variant (rs202099525) in PIEZO1 was associated with mean corpuscular hemoglobin concentration. Similarly we identified 11 rare protein-altering variants associated with platelet indices, ten of which were missense variants and one a nonsense variant (in KALRN). These include variants from regions previously identified to contain common weak-effect variants (IQGAP2, JAK2, SH2B3, and TUBB1) but also from three gene regions not previously identified by GWAS (CKAP2L, PLEK, and TNFRSF13B).

We identified 11 rare protein-altering variants associated with white cell traits, including ten missense variants in regions previously associated (CEBPE, CXCR2, IL17RA, S1PR4), as well as in novel genes not previously known to play a role in hematological processes. These findings demonstrate roles in leukocyte formation and/or function for ALOX15, AMICA1, and PLEK. Finally, some rare missense variants had pleiotropic effects across cell types. For instance, the rare missense variant in TNFRSF13B (rs72553883) causing common variable immunodeficiency and selective immunoglobulin A deficiency (Castigli et al., 2005) was associated with platelet, myeloid white cell and lymphoid white cell indices (Table S4).

Figure 4. Allelic Architecture of Blood Cell Indices

(A) Scatterplot showing the relationship between estimated derived allele frequency (DAF) and the absolute value of the estimated effect size for the sentinel
variants. The inset gives the same plot on the logit/log scales. Only associations annotated with an ancestral allele are shown.
(B) Scatterplot of LD score estimated heritability (due to common variants) against the (unadjusted) phenotypic variance explained by the conditionally significant
variants in a multiple regression model, colored according to index type.
(C) A barplot showing the LD score estimated heritability due to common variants (upper limit of gray bars) and the distribution of the unadjusted proportion of
phenotypic variance explained (R²) by the conditionally significant variants grouped by genomic location (range of color fills).
(D) The same plot for variants grouped by cell-matched chromatin segmentation states.
See also Table S4.
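
The power statement in the Allelic Architecture section (at least 80% power for associations explaining 0.0265% of trait variance) can be related to per-allele effect sizes with the standard additive-model identity: variance explained = 2p(1−p)β², where p is the allele frequency and β is the effect in phenotypic SD. The sketch below is a worked check rather than the authors' calculation; it assumes Hardy-Weinberg equilibrium and assumes that the quoted common-variant bound refers to the most favorable case, MAF = 50%.

```python
import math


def per_allele_effect(variance_explained: float, maf: float) -> float:
    """Per-allele additive effect (in phenotypic SD) needed to explain a
    given fraction of trait variance, from var = 2 * p * (1 - p) * beta**2.
    Assumes Hardy-Weinberg equilibrium and a standardized trait."""
    return math.sqrt(variance_explained / (2.0 * maf * (1.0 - maf)))


V = 0.0265 / 100  # 0.0265% of trait variance, as quoted in the text

beta_common = per_allele_effect(V, 0.5)     # assumed best case for common variants
beta_rare = per_allele_effect(V, 0.0001)    # MAF = 0.01%, the lower frequency limit

print(round(beta_common, 3))  # about 0.023 SD
print(round(beta_rare, 2))    # about 1.15 SD
```

Under these assumptions the identity reproduces the quoted 0.023 SD figure and comes within rounding distance of the quoted 1.154 SD figure; the small difference is consistent with rounding of the variance-explained value.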
Overall, these results expand our knowledge of the genes and regulatory regions controlling blood cell biology and function. For rare variants, there were too few minor allele homozygotes to estimate precisely genotypic effects on phenotype, even across >170,000 individuals. However, the magnitude of some rare heterozygote effects suggests that the corresponding homozygote effects could be clinically relevant. Indeed, it is possible that effects of some homozygotes are more than double those of corresponding heterozygotes depending on the degree of loss or gain of function, possible compensatory pathways, and stress or demand for adaptation in response to injury or insult.

Allelic Architecture of Hematological Indices
The comprehensive nature of this study allows us to draw more general inferences about the allelic architecture of hematological indices as an exemplar class of complex human traits. Our analysis had at least 80% power to detect associations explaining 0.0265% of trait variance, which could be attained by a per-allele additive effect as small as 0.023 phenotypic SD for common (MAF ≥5%) variants and 1.154 SD for variants at the lower limit of the frequency range we considered (MAF = 0.01%). No common or low-frequency variant had an estimated absolute effect size >0.5 SD, suggesting an upper boundary on phenotypic effect sizes for variants in these frequency classes. The relationship between allele frequency and the absolute value of the estimated effect size for the sentinel variants could in principle be explained by differential winner’s curse by allele frequency (Figure 4A). However, the strength of the signal strongly suggests natural selection against variants with large effects. Conversely, associations with large phenotypic effects were overrepresented among rare variants (p value = 1.58 × 10⁻⁷⁷, Pearson’s χ² test), with 21 rare sentinel variants having an estimated effect size >0.5 SD (median MAF = 0.09%), five of which had effects greater than 1 SD (Table S4). These correspond to effects on traits of 2.73 g/dl, 3.77 fL (femtoliters), 51 × 10⁹/L, and 1.37 × 10⁹/L for hemoglobin concentration (HGB), mean corpuscular volume (MCV), and platelet and neutrophil counts, respectively. The effect sizes seen in heterozygotes are sufficiently large to cause disease when carried in homozygosis.

Using the LD score regression (Finucane et al., 2015) approach to polygenic modeling, we estimated that common autosomal genotypes explained between 18% and 30% of variance in platelet indices, between 10% and 28% of variance in red cell indices, and between 5% and 21% of variance in white cell indices (Figure 4B). Conditionally significant coding variants

explained between 0.2% and 3.7% of trait variance (R² unadjusted for winner’s curse), while intronic variants, variants near genes, and intergenic variants explained between 1.2% and 18.0%, between 0.6% and 6.7%, and between 0.5% and 6.4% of trait variance, respectively (Figure 4C). Interestingly, conditionally significant variants associated with mean platelet volume (MPV) explain a slightly larger proportion of trait variance than the polygenic common-variant estimate of heritability made by LD score. This suggests that the low frequency and rare variants we discovered contribute more to heritability than the undiscovered common variants. The extent of the winner’s curse effect will need to be assessed when comparable datasets become available (e.g., the remaining 350,000 UK Biobank participants), but if the effect is weak, we may have identified almost all common variants with non-negligible effects associated with MPV. However, as a substantial proportion of the common-variant heritable variance of most blood cell indices remains unexplained by the conditionally significant genetic variants, it is likely that many more common variants of small effect remain to be discovered. Moreover, larger studies are also likely to identify even rarer variants with stronger effects, which will be clinically valuable.

Finally, we estimated the proportion of the heritable component of each blood cell index that was explained by the polygenic signal across different genome regulatory domains, as defined by chromatin segmentation states in the relevant cell types (Carrillo de Santa Pau et al., 2016). We found that variants lying in enhancers explained 19%–46% of heritable variation, with similar estimates for transcribed regions (15%–48%), and lower estimates for promoters (4%–30%) and silencers (3%–15%). Additionally, we estimated the variance explained by the conditionally significant variants using multiple regression, showing that the identified signals are distributed across regulatory states (Figure 4D). To understand the extent to which these patterns may be driven by cell-type-specific regulatory elements, we used a robust non-parametric analysis approach to test the significance of enrichments of each set of summary statistics against cell-type matched and cell-type discordant chromatin segmentation states (Iotchkova et al., 2016a). Active enhancer regions defined by H3K4me1/H3K27ac histone modifications (E9, Figure 5) demonstrated striking patterns of cell-type specificity of enrichments compared to those defined by other chromatin states. For example, we saw up to 15-fold enrichment of red-cell associations in corresponding active enhancer regions and up to 10-fold enrichment for platelet signals in megakaryocyte (the platelet progenitor cell) enhancers. There was also statistically significant evidence for depletion of associated variants in transcriptionally inactive regions.

Regulatory Consequences of Blood-Cell-Associated Variants
The linking of regulatory variants to their effector genes and mechanisms continues to be a challenge for the complex traits community. Public resources that annotate sequence variation facilitate the task through overlap with molecular traits including cell-type-specific chromatin states and transcription factors (Roadmap Epigenomics Consortium et al., 2015), gene expression quantitative trait locus datasets (eQTL) (GTEx Consortium, 2015), or, more recently, annotation of physical interactions between different regions of the genome (Hughes et al., 2014). However, as the fraction of the genome that is annotated continues to increase, so does the risk of non-functional (random) overlap. The intersection of genetic and regulatory data at the level of individual genetic variants allows formal modeling of the probability that a cellular or organismal trait “colocalizes” with its molecular counterpart, thus allowing the robust assignment of candidate genes and functional mechanisms to GWAS variants. For example, in a companion paper by the BLUEPRINT project, we have shown that only 25% of disease associations that were in high LD (r² ≥ 0.8) with a given molecular event had a high probability (>99%) of colocalization (Chen et al., 2016).

The Chen et al. (2016) dataset includes three primary human cell types (classical monocytes, neutrophils, and CD4+ naive T cells) matched to our blood indices. We thus accessed summary statistics generated for gene expression (eQTL), mRNA splicing (sQTL), and histone modifications marking enhancers and promoters (H3K4me1 hQTL and H3K27ac hQTL) and used summary-data-based Mendelian randomization (SMR) analysis (Zhu et al., 2016) to test for colocalization of signal between molecular and blood cell index GWAS in matched cell types (MONO#, NEUT#, and LYMPH#) (Figure 6A).

Across all the three cell-types and the four QTL datasets, there were 276 cell trait variants that colocalized with at least one molecular QTL in the corresponding cell type, indicating a shared genetic influence on the two phenotypes (p value HEIDI > 0.05; Table S6). As in the Chen study, only 30% of overlapping associations detected resulted in a robust colocalization. Overall, we can thus assign a putative functional interpretation to 10% of all sentinel variants. Only 47% of colocalizing signals involved changes in gene expression or mRNA splicing (126 unique genes), indicating likely effector genes underpinning associations. These include disease-associated variants (e.g., an eQTL for SLC22A5 associated with Crohn’s disease and a sQTL for GSDMB associated with a range of autoimmune diseases) (Figures 6B and 6C). The remaining 53% of signals involved histone modifications, indicating a regulatory change not associated with detectable expression changes in our data. Interestingly, 24 instances involving both gene expression and histone modifications at closely located variants allow us to assign putative regulatory elements to their effector genes, as illustrated by the case of the JAZF1 locus (Figure 6B). Overall, these examples show how genetic variants affect cellular traits and complex disease through molecular mechanisms of gene regulation.

Causal Contribution of Hematological Trait Variation to Common Complex Diseases
Patients with complex disease often display abnormal blood cell index levels, but it is not always clear whether these reflect etiological roles of hematological pathways or are a consequence of disease. As pharmacological modulation of blood cell indices advances, identifying shared causal pathways between these indices and complex diseases could provide new therapeutic opportunities. Mendelian randomization (MR) uses genetic variants to estimate causal associations, reducing the potential for confounding and reverse causation that limit observational studies.
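
To make the idea behind MR concrete, the sketch below computes a fixed-effect inverse-variance-weighted (IVW) estimate from per-variant summary statistics. This is a simplified univariable estimator chosen for illustration, not the multivariable analysis used in the study, and every number in it is hypothetical. `bx` holds variant effects on the blood cell index (in SD), `by` holds variant effects on disease (log odds ratio), and `se_by` holds the standard errors of `by`.

```python
import math
from typing import List, Tuple


def ivw_estimate(bx: List[float], by: List[float],
                 se_by: List[float]) -> Tuple[float, float]:
    """Fixed-effect IVW Mendelian randomization estimate.

    Each variant gives a Wald ratio by/bx; ratios are averaged with
    weights bx**2 / se_by**2. Returns (log odds ratio per SD, its SE).
    """
    weights = [(x / s) ** 2 for x, s in zip(bx, se_by)]
    ratios = [y / x for x, y in zip(bx, by)]
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    return beta, se


# Hypothetical summary statistics for three variants (illustration only).
bx = [0.10, 0.20, 0.05]
by = [0.05, 0.10, 0.025]
se_by = [0.01, 0.02, 0.01]

beta, se = ivw_estimate(bx, by, se_by)
odds_ratio = math.exp(beta)  # odds ratio of disease per SD increase in the index
print(beta, se, odds_ratio)
```

The exponentiated estimate has the same interpretation as the odds ratios per SD reported for the index-disease pairs, although the multivariable model additionally adjusts each index for all others.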

Figure 5. Enrichment of Trait Associations within Regulatory Regions
Odds ratios (bar heights) and 95% confidence intervals (whiskers) for enrichment of blood-index associations with chromatin segmentation states from blood
cells. P values for significance are obtained from a generalized linear model, modeling a threshold on the GWAS test statistic as a Bernoulli response while
controlling for MAF, distance from gene, and number of LD proxies. The cell types are shown from left to right in each block as follows: megakaryocyte (i.e., the
platelet progenitor, purple), erythroblast (i.e., the red cell progenitor, red), monocyte (orange), eosinophil (orange), neutrophil (orange), naive B cell (light blue), and
T cell (light blue).
See also Table S4.
We conducted a multivariable MR analysis to reassess epidemiological correlations between blood cell indices and a range of human complex diseases and to identify shared causal pathways. The multivariable approach is advantageous because it ensures that results for one index are conditional on (i.e., control for covariation in) all other indices. For this analysis, we retrieved publicly available summary statistics for six autoimmune, three cardiometabolic, and five neuropsychiatric diseases (STAR Methods) and used genetic variants associated with 13 main hematological indices. For each index-disease pair, we estimated the unconfounded increase in the odds ratio of disease per unit change (in SD) in the index. We applied a multiple testing correction for 182 disease-index comparisons (Figure 7).
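
The count of 182 comparisons and the Bonferroni threshold quoted in the Figure 7 caption can be checked with simple arithmetic. This is a minimal sketch; the family-wise error rate of 0.05 is an assumption, since the passage does not state it.

```python
# Number of indices and diseases as stated in the text.
n_indices = 13
n_diseases = 6 + 3 + 5  # autoimmune + cardiometabolic + neuropsychiatric

n_tests = n_indices * n_diseases

alpha = 0.05  # assumed family-wise error rate (not stated in the passage)
threshold = alpha / n_tests  # Bonferroni-corrected per-test threshold

print(n_tests)               # 182
print(f"{threshold:.2e}")    # about 2.7e-04
```

With these inputs the threshold agrees with the reported value of 2.7 × 10⁻⁴.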

Figure 6. Colocalization between Cellular and Molecular Traits
(A) Illustrates the models tested using SMR, as well as the number of variants that were significant for both the cellular and molecular trait at a p value threshold of
8.4 × 10⁻⁶ that show colocalization (P_HEIDI > 0.05) between the cellular and the molecular trait and the overlap of colocalized marks between the four marks
across the three cell types.
(B and C) Regional plots for the colocalization result in the (B) JAZF1, (C) SLC22A5 and GSDMB loci for monocytes and T cells. The gray squares represent the
p value distribution for the corresponding (monocyte and lymphocyte) blood cell index. The black triangles represent the GWAS variant that colocalizes with the
eQTL (pink diamond), hQTL (light blue diamonds), and sQTL (gold diamond). The dark blue diamonds represent QTL in the region that do not show colocalization.
The crosses represent the regional QTL p value distribution.
See also Table S6.
We detected significant associations between white blood cell indices and autoimmune diseases (Figure 7C). The strongest was a positive association between eosinophilic indices and asthma (asthma odds ratio [OR] per SD increase in eosinophil count = 1.71; 95% CI: 1.53–1.95; p = 4.0 × 10⁻²²). This finding corroborates evidence from known associations with eosinophil counts at confirmed asthma loci, such as IL5, IL33, and IL1R1, as well as our discovery that the region around TSLP (another known asthma locus) contains three independent signals associated with eosinophil count (Table S4). There was weaker evidence of a positive association between asthma and neutrophil indices (p = 2.74 × 10⁻⁵), as well as inverse associations with monocyte (p = 1.24 × 10⁻⁴) and lymphocyte (p = 7.56 × 10⁻⁵) counts. There was also strong evidence for a positive association between eosinophilic indices and rheumatoid arthritis (OR = 2.34, 95% CI: 2.01–2.74; p = 1.84 × 10⁻²⁷), a signal that was robust to a range of sensitivity analyses, including removal of the MHC region (Table S7). Other loci containing alleles robustly associated with higher eosinophil count and increased risk of rheumatoid arthritis were COG6, SPRED2, RUNX1, and the highly pleiotropic ATXN2/SH2B3/BRAP region (Table S4).

As with eosinophils, we saw directionally discordant disease associations with lymphocyte count, which had positive associations with schizophrenia (OR = 1.17, 95% CI: 1.10–1.24; p = 1.1 × 10⁻⁷), multiple sclerosis (OR = 1.28, 95% CI: 1.14–1.45; p = 6.6 × 10⁻⁵), and coronary heart disease (CHD) (OR = 1.10, 95% CI: 1.04–1.15; p = 1.8 × 10⁻⁴), as well as inverse associations with asthma (OR = 0.81, 95% CI: 0.73–0.90; p = 7.6 × 10⁻⁵) and celiac disease (OR = 0.75, 95% CI: 0.64–0.87; p = 2.6 × 10⁻⁴). However, only the associations with multiple sclerosis and celiac disease were robust to removal of the MHC region, suggesting that genes within MHC predominantly drive the links between schizophrenia, coronary artery disease, and asthma. Finally, there was a weak positive association of CHD risk with

[Figure 7: forest plots in three panels. Panel A: PDW, MPV, PLT#. Panel B: IRF, RET#, RDW, HCT, MCH. Panel C: MONO#, BASO#, EO#, NEUT#, LYMPH#. Rows are grouped as Autoimmune (AST, CEL, IBD, MS, RA, T1D), Cardiometabolic (CKD, CHD, T2D), and Neuropsychiatric (AD, BpD, CrD, MDD, SCZ). x axis: Odds Ratio. Legend: Significant (p < 2.7 × 10⁻⁴) versus non-significant.]
Asthma (AST), celiac disease (CEL), inflammatory bowel disease (IBD), multiple sclerosis (MS), rheumatoid arthritis (RA) and type 1 diabetes (T1D). Chronic kidney disease (CKD),
coronary heart disease (CHD) and type 2 diabetes (T2D). Alzheimer's disease (AD), bipolar disorder (BpD), cross disorder (CrD), major depressive disorder (MDD) and schizophrenia (SCZ).
Figure 7. Causal Associations with Common Diseases

(A–C) A forest plot showing the results of the multivariable Mendelian randomization (MR) analysis conducted on 13 blood cell indices versus fourteen common
diseases. Colored diamonds represent the significant trait-disease association at our Bonferroni corrected p value threshold of 2.7 × 10⁻⁴ with uncolored circles
denoting non-significant results. Each diamond/circle represents the estimated unconfounded causal odds ratio of disease risk per SD increase of the blood cell
index, adjusted for all other blood cell indices tested. The size of the shape is inversely proportional to the SE and the whiskers denote 95% confidence intervals.
Forest plots are presented for (A) platelet indices, (B) immature and mature red cell indices, and (C) myeloid and lymphoid white cell indices.
See also Table S7.
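The Bonferroni threshold quoted in the legend can be checked with a few lines of arithmetic. The sketch below assumes a family-wise error rate of 0.05 (not stated in this section) divided by the number of index-disease pairs tested; the counts of 13 indices and 14 diseases come from the legend, and the helper name `is_significant` is hypothetical.

```python
# Bonferroni threshold for the Mendelian randomization tests in Figure 7.
N_BLOOD_CELL_INDICES = 13   # stated in the Figure 7 legend
N_DISEASES = 14             # stated in the Figure 7 legend
FAMILY_WISE_ALPHA = 0.05    # assumption: not stated in this section

n_tests = N_BLOOD_CELL_INDICES * N_DISEASES
threshold = FAMILY_WISE_ALPHA / n_tests


def is_significant(p_value, cutoff=threshold):
    """Return True when a p value passes the Bonferroni-corrected cutoff."""
    return p_value < cutoff


print(f"{n_tests} tests, threshold = {threshold:.2e}")
# p values reported in the text for CHD versus MPV and CHD versus lymphocyte count
print(is_significant(8.1e-5), is_significant(1.8e-4))
```

Under that assumption the quotient rounds to the 2.7 × 10⁻⁴ threshold given in the legend, which suggests the correction was applied across all 182 index-disease pairs.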
reticulocyte indices (OR = 1.12; 95% CI: 1.07–1.17; p value = 1.7 × 10⁻⁶) and a weak inverse association of CHD risk with MPV (OR = 0.92; 95% CI: 0.88–0.96; p = 8.1 × 10⁻⁵), both of which were robust to all sensitivity analyses (Figure 7).
These analyses have suggested a weak but significant positive association between hemolysis and CHD risk. This may prompt re-evaluation of the risk of arterial thrombosis for patients with on-going hemolysis as has been done for venous thrombosis. Perhaps most strikingly, the association between eosinophil count and rheumatoid arthritis may trigger more detailed genetic and clinico-epidemiological studies to dissect the provoking and perpetuating pathology of this inflammatory disease.

DISCUSSION

The molecular programs that control hematopoietic stem cell differentiation and proliferation are incompletely understood (Notta et al., 2016; Paul et al., 2015). Clues to these molecular pathways have traditionally come from discoveries of highly penetrant mutations associated with inherited disorders of the hematopoietic system, somatic mutations underlying blood cell cancers, and from functional screens in model organisms (Boatman et al., 2013; Ganz and Nemeth, 2012). More recently, such studies have been complemented by high-throughput molecular and genetic analyses of common biological variation (Vasquez et al., 2016). Our study benefited from a substantial increase in statistical power compared to previous GWAS, driven by improvements in study design and data capture, including the use of dense WGS-imputation panels and the accurate adjustment of phenotypes for biological and technical covariates.
The new associations, including a large number of rare and low-frequency coding variants, define a detailed atlas of genes and regulatory regions influencing blood cell indices with cell-type-specific effects. There were several rare variants in
genes known to carry mutations causing severe disorders. For example, rs149000560, a rare missense variant associated with immature red cell indices, lies in FERMT3, the gene responsible for the leukocyte adhesion deficiency-1/variant syndrome (Kuijpers et al., 2009). Loss of function mutations in CKAP2L associated with platelet traits cause the autosomal-recessive Filippi syndrome characterized by microcephaly, pre- and post-natal growth failure, although case series do not describe hematological abnormalities (Hussain et al., 2014). CKAP2L is associated with microtubules in dividing cells and the association of a mutation in this gene with platelet phenotype and cortical development reflects the role of tubulin function in megakaryopoiesis and neuronal migration (Moon and Wynshaw-Boris, 2013). PLEK (encoding pleckstrin) is not known to carry rare pathogenetic mutations but it is a crucial protein for platelet function. Platelets from mice lacking pleckstrin exhibit a marked defect in exocytosis of delta and alpha granules, alphaIIbbeta3 activation, actin assembly, and aggregation (Lian et al., 2009).
Other variants point to previously unknown genes. For instance, the biological functions of TMC8 and RIOK3 in developing erythroid cells are not understood but their associations with specific blood cell phenotypes may inform future experimental studies. For example, RIOK3 has been associated with organization of the actin cytoskeleton, as a component of pre-40S pre-ribosomal particle and as mediating phosphorylation of MDA5.
Finally, other rare variants were potentially regulatory, mapping to intronic regions of genes not expressed in the relevant cell types. For instance, there was a series of intronic variants associated with MCV in NPRL3, LUC7L, ITFG3, and AXIN1 genes that lie within 1 Mb of the alpha globin gene. Such variants may be in LD with a deletion of the respective variants of alpha globin (HGA), but it is also possible that the respective variants are disrupting long-range enhancers of alpha-globin.
One intriguing set of associations with multiple hematopoietic lineages was of variants in genes involved in sphingosine signaling. A frameshift variant in the sphingosine-1-phosphate kinase gene (S1PK) and a missense variant in the sphingosine-1-phosphate receptor gene (S1PR2), which is expressed during erythroid development, were associated with altered reticulocyte count. Missense alleles in S1PR4 are associated with reduced neutrophil, monocyte, and eosinophil counts consistent with previous reports (CHARGE Consortium Hematology Working Group, 2016). Taken together, these data suggest sphingosine-1-phosphate may be involved with the release and/or survival of red cells as well as white cells.
Variation in blood cell indices has been linked to diseases with high population burdens, including chronic complex conditions such as autoimmune disease, susceptibility to infection, and respiratory and cardiovascular illnesses. Here, we used Mendelian randomization inference to unravel causal mechanisms underlying reported index-disease correlations and applied a range of sensitivity analyses. Our genetic evidence for a causal role of eosinophilic pathways in asthma supports the pathophysiological and pharmacological evidence that eosinophils are key effector cells in asthma pathogenesis (Zijlmans et al., 2008). More surprising was the strong evidence for a positive association between eosinophilic indices and rheumatoid arthritis. Unexplained eosinophilia has been reported in rheumatoid patients, and the magnitude of eosinophilia has been associated with disease severity or activity, but little attention has been given to a pathogenetic role of eosinophils in rheumatoid arthritis. Our data support recent hypotheses linking eosinophil activation to rheumatoid processes (Rosenstein et al., 2014). Eosinophilic indices were also weakly positively associated with both celiac disease (p = 3.28 × 10⁻⁵) and type 1 diabetes (p = 7.66 × 10⁻⁵), highlighting a key role of eosinophils in pathways influencing the development of a range of autoimmune diseases.
Immune system dysfunction has been suspected to play a role in schizophrenia, a hypothesis supported by abnormal lymphocyte levels seen in schizophrenic patients but lacking support from longitudinal data (Miller et al., 2013). Our finding of shared genetic links between lymphocyte count and schizophrenia at the MHC region through multiple independent pathways may support a pathogenic role for immune dysfunction in development of schizophrenia, exemplified by the recent identification of key complement factor genes (C4A, C4B) as drivers of schizophrenia (Sekar et al., 2016). The positive association of lymphocyte count with multiple sclerosis is also confirmatory of the assumed pathogenetic role of T cells and is supported by the strong enrichment of genes involved in T cell activation or proliferation among known multiple sclerosis loci (Sawcer et al., 2011).
The most intriguing observations were the weak positive association of CHD risk with reticulocyte indices and the weak inverse association of CHD risk with MPV. Reticulocyte count and percentage are indicators of erythrocyte turnover and higher levels indicate increased hemolysis, which leads to increased levels of circulating free hemoglobin. Our data were consistent with previous studies that have shown that reduced clearance of free hemoglobin in carriers of the haptoglobin Hp2-2 allotype is associated with more oxidative stress and inflammation (Asleh et al., 2005; Kristiansen et al., 2001) and is associated with a higher risk of CHD events in type 1 diabetes (Ijäs et al., 2013; Levy et al., 2002). Moreover, it is also well established that free hemoglobin in blood substitutes leads to reduced nitric oxide, increased vasoconstriction, and a higher risk of acute myocardial ischemia (Natanson et al., 2008). Our data support the hypothesis that hemolysis and risk of CHD are influenced by shared causal pathways. However, the pathways through which increased MPV could be protective of atherosclerotic disease remain to be determined, as does the apparent contradiction with prospective observational studies, which have reported associations in the opposite direction (Sansanayudh et al., 2014).
Finally, we were also able to reduce the likelihood of causality for several previously reported observational associations between blood cell indices and risks for various complex diseases, including previously reported associations of total white blood cell, granulocyte, and neutrophil counts with CHD risk (Wheeler et al., 2004) and type 2 diabetes (Gkrania-Klotsas et al., 2010), and red blood cell count associations with CHD risk (Schaffer et al., 2015) and red cell distribution width and mean corpuscular volume with type 2 diabetes (Engström et al., 2014). This suggests that the original observational studies were likely to be confounded.
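The Mendelian randomization reasoning used in this Discussion can be illustrated with a small numerical sketch. The study itself reports a multivariable analysis (Burgess and Thompson, 2015); the code below is a deliberately simpler, univariable inverse-variance weighted (IVW) estimator, shown only to make the idea concrete and not as the authors' implementation. The function name `ivw_estimate` and all summary statistics are hypothetical, made-up values.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Univariable inverse-variance weighted Mendelian randomization estimate.

    Each variant j yields a ratio estimate beta_outcome[j] / beta_exposure[j].
    Ratios are pooled with weights (beta_exposure[j] / se_outcome[j]) ** 2.
    Returns the pooled log odds ratio per unit of exposure and its
    fixed-effect standard error.
    """
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    log_or = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    return log_or, math.sqrt(1.0 / total_weight)


# Hypothetical per-variant summary statistics (illustrative numbers only):
# effect on the blood cell index in SD units, effect on disease in log odds.
beta_exposure = [0.10, 0.20, 0.05]
beta_outcome = [0.05, 0.10, 0.025]
se_outcome = [0.01, 0.02, 0.01]

log_or, se = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
odds_ratio = math.exp(log_or)
ci_low = math.exp(log_or - 1.96 * se)
ci_high = math.exp(log_or + 1.96 * se)
print(f"OR per SD = {odds_ratio:.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
```

The result has the same form as the estimates reported here: an odds ratio of disease per SD increase in a blood cell index with a 95% confidence interval. A multivariable analysis would additionally adjust each estimate for the variants' effects on the other indices, which this sketch does not attempt.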

In conclusion, the discovery of a substantial number of rare alleles with large effect sizes highlights the potential of large-scale population studies to identify variants on a continuum between extremely rare highly penetrant mutations driving Mendelian disorders and common variants of weak effect typically identified by GWAS. Our results are expected to boost current efforts to identify and assess possible novel etiologies and therapeutic targets for hematological diseases. Some of the variants discovered have phenotypic effects of large magnitude, perhaps sufficient to cause disease if carried in homozygosis. Carrier status may influence the interpretation of clinical tests of blood cell indices, and the variants and loci could be incorporated into the current diagnostic panels for inherited anemia and thrombocytopenia after biological validation of these results (Lentaigne et al., 2016; Roy et al., 2016).

STAR+METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ◦ The UK Biobank Study
  ◦ The UK BiLEVE Study
  ◦ The INTERVAL Study
• METHOD DETAILS
  ◦ The UK Biobank and UK BiLEVE Affymetrix Axiom Genotyping Arrays
  ◦ Genotyping
  ◦ Quality Control (QC) of Genotype Data
  ◦ Variant Imputation
  ◦ Phenotype Measurement, QC, and Processing
• QUANTIFICATION AND STATISTICAL ANALYSIS
  ◦ Association Tests, Meta-Analyses, and Identification of Distinct Associations
  ◦ Annotation of Associated Variants
  ◦ Mendelian Randomization Analysis

SUPPLEMENTAL INFORMATION

Supplemental Information includes three figures and seven tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.10.042.

AUTHOR CONTRIBUTIONS

Conceptualization, J.D., D.J.R., W.H.O., A.S.B., and N. Soranzo; Methodology, W.J.A., H.E., T.J., D. Ruklisa, S.M.S., J.R.S., S.K., A.S.B., and N. Soranzo; Software, W.J.A., H.E., T.J., D. Ruklisa, M.K., J.R.S., and V.I.; Validation, W.J.A., H.E., T.J., and F.R.-M.; Formal Analysis, W.J.A., H.E., T.J., D.A., D. Ruklisa, A.L.M., M.A.K., K.K., L.B., L.G., R.P., K.W., E.W., V.I., A.S.B., and N. Soranzo; Investigation, W.J.A., H.E., T.J., D. Ruklisa, H.B., M.A.K., J.J.L., S.S., K.D., K.F., L.G., J.G., R.P., J.R.S., M.F., A.S.B., and N. Soranzo; Resources, K.B., J.R.B., O.D., S.F.G., M.H., E.M.J-M., A.K., B.K., A.M., J.M., J.H.A.M., J.O’C., N. Sharifi, S.M.S., S.T., M.v.d.E., S-Y.W., S.P.W., C.M., J.S., H.G.S., E.C.-d.-S.-P., D.J., D. Rico, A.V., L.C., B.G., L.V., T.K., D.G.-M., S.W., Y.Y., R.G., S.B., D.S.P., T.P., D.B., G.B., M.F., J.D., D.J.R., W.H.O., A.S.B., and N. Soranzo; Data Curation, W.J.A., H.E., T.J., D.A., D. Ruklisa, L.C.D., S.F.G., S.M., K.M., S.K., A.S.B., and N. Soranzo; Writing – Original Draft, W.J.A., H.E., T.J., D.J.R., A.S.B., and N. Soranzo; Writing – Review & Editing, W.J.A., H.E., T.J., T.W.K., D.J.R., W.H.O., A.S.B., and N. Soranzo; Visualization, W.J.A., H.E., T.J., A.L.M., D.M., K.K., L.B., V.I., and N. Soranzo; Supervision, J.D., D.J.R., W.H.O., A.S.B., and N. Soranzo; Project Administration, D.M., A.S.B., and N. Soranzo; Funding Acquisition, J.R.B., C.M., E.D.A., M.F., J.D., W.H.O., A.S.B., and N. Soranzo.

ACKNOWLEDGMENTS

This research has been conducted using the UK Biobank Resource under Application Number 13745. We thank Drs. Jarob Saker and Joachim Linssen of Sysmex Europe and Rob Gillions of UK Biobank for invaluable technical assistance and advice. We gratefully acknowledge the participation of all UK Biobank, NIHR Cambridge BioResource, and INTERVAL volunteers. We thank the INTERVAL study co-ordination teams (at the Universities of Cambridge and Oxford and at NHS Blood and Transplant [NHSBT]), including the blood donation staff at the 25 static centers, for their help with INTERVAL participant recruitment and study fieldwork, as well as the Cambridge BioResource and NHSBT staff for their help with volunteer recruitment. We thank members of the Cambridge BioResource Scientific Advisory Board and Management Committee for their support of our study and the NIH Research Cambridge Biomedical Research Centre for funding (RG64219). We thank Stephen Burgess for providing code to conduct multivariable Mendelian randomization analysis. K.D. is funded as a HSST trainee by NHS Health Education England. M.F. is funded from the BLUEPRINT Grant Code HEALTH-F5-2011-282510 and the BHF Cambridge Centre of Excellence (RE/13/6/30180). J.R.S. is funded by a MRC CASE Industrial studentship (G66840), co-funded by Pfizer (G73632). J.D. is a British Heart Foundation Professor, European Research Council Senior Investigator, and National Institute for Health Research (NIHR) Senior Investigator. S.M., S.T., M.H., K.M., and L.C.D. are supported by the NIHR BioResource-Rare Diseases, which is funded by the NIHR (RBAG163). Research in the W.H.O. laboratory is supported by program grants from the NIHR to W.H.O., the European Commission (HEALTH-F2-2012-279233), the British Heart Foundation (BHF) to W.J.A. and D. Ruklisa under numbers RP-PG-0310-1002 and RG/09/12/28096, and Bristol Myers-Squibb; the laboratory also receives funding from NHSBT. W.H.O. is a NIHR Senior Investigator. The INTERVAL academic coordinating centre receives core support from the UK Medical Research Council (G0800270), the BHF (SP/09/002), the NIHR, and Cambridge Biomedical Research Centre, as well as grants from the European Research Council (268834), the European Commission Framework Programme 7 (HEALTH-F2-2012-279233), Merck, and Pfizer. D.J.R. and D.A. were supported by the NIHR Programme ‘‘Erythropoiesis in Health and Disease’’ (NIHR-RP-PG-0310-1004). N. Soranzo is supported by the Wellcome Trust (WT098051 and WT091310), the EU FP7 (EPIGENESYS 257082 and BLUEPRINT HEALTH-F5-2011-282510). The INTERVAL study is funded by NHSBT (11-01-GEN) and has been supported by the NIHR-BTRU in Donor Health and Genomics (NIHR BTRU-2014-10024) at the University of Cambridge in partnership with NHSBT. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, the Department of Health of England, or NHSBT. D.G.-M. is supported by a ‘‘la Caixa’’-Severo Ochoa pre-doctoral fellowship. J.O’C. is an employee of Illumina Inc., a public company that develops and markets systems for genetic analysis, and receives shares as part of his compensation.

Received: February 1, 2016

REFERENCES

Abraham, G., and Inouye, M. (2014). Fast principal component analysis of large-scale genome-wide data. PLoS ONE 9, e93766.
Adams, P.C., Reboussin, D.M., Barton, J.C., McLaren, C.E., Eckfeldt, J.H., McLaren, G.D., Dawkins, F.W., Acton, R.T., Harris, E.L., Gordeuk, V.R., et al.; Hemochromatosis and Iron Overload Screening (HEIRS) Study Research Investigators (2005). Hemochromatosis and iron-overload
screening in a racially diverse population. N. Engl. J. Med. 352, 1769–1778.
Asleh, R., Guetta, J., Kalet-Litman, S., Miller-Lotan, R., and Levy, A.P. (2005). Haptoglobin genotype- and diabetes-dependent differences in iron-mediated oxidative stress in vitro and in vivo. Circ. Res. 96, 435–441.
1000 Genomes Project Consortium, Auton, A., Brooks, L.D., Durbin, R.M., Garrison, E.P., Kang, H.M., Korbel, J.O., Marchini, J.L., McCarthy, S., McVean, G.A., and Abecasis, G.R. (2015). A global reference for human genetic variation. Nature 526, 68–74.
Boatman, S., Barrett, F., Satishchandran, S., Jing, L., Shestopalov, I., and Zon, L.I. (2013). Assaying hematopoiesis using zebrafish. Blood Cells Mol. Dis. 51, 271–276.
Bowden, J., Davey Smith, G., and Burgess, S. (2015). Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525.
Burgess, S., and Thompson, S.G. (2015). Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am. J. Epidemiol. 181, 251–260.
Buttari, B., Profumo, E., and Riganò, R. (2015). Crosstalk between red blood cells and the immune system and its impact on atherosclerosis. BioMed Res. Int. 2015, 616834.
Carrillo de Santa Pau, E., Juan, D., Pancaldi, V., Were, F., Martin-Subero, I., Rico, D., and Valencia, A. (2016). Searching for the chromatin determinants of human hematopoiesis. bioRxiv. http://dx.doi.org/10.1101/082917.
Castigli, E., Wilson, S.A., Garibyan, L., Rachid, R., Bonilla, F., Schneider, L., and Geha, R.S. (2005). TACI is mutant in common variable immunodeficiency and IgA deficiency. Nat. Genet. 37, 829–834.
Chami, N., Chen, M.-H., Slater, A.J., Eicher, J.D., Evangelou, E., Tajuddin, S.M., Love-Gregory, L., Kacprowski, T., Schick, U.M., Nomura, A., et al. (2016). Exome genotyping identifies pleiotropic variants associated with red blood cell traits. Am. J. Hum. Genet. 99, 8–21.
Chang, C.C., Chow, C.C., Tellier, L.C., Vattikuti, S., Purcell, S.M., and Lee, J.J. (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7.
CHARGE Consortium Hematology Working Group (2016). Meta-analysis of rare and common exome chip variants identifies S1PR4 and other loci influencing blood cell traits. Nat. Genet. 48, 867–876.
Chen, L., Ge, B., Casale, F.P., Vasquez, L., Kwan, T., Garrido-Martín, D., Watt, S., Yang, Y., Kundu, K., Ecker, S., et al. (2016). Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167. http://dx.doi.org/10.1371/journal.pbio.0000051.
Delaneau, O., Zagury, J.-F., and Marchini, J. (2013). Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6.
Durbin, R. (2014). Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT). Bioinformatics 30, 1266–1272.
Eicher, J.D., Chami, N., Kacprowski, T., Nomura, A., Chen, M.-H., Yanek, L.R., Tajuddin, S.M., Schick, U.M., Slater, A.J., Pankratz, N., et al.; Global Lipids Genetics Consortium; CARDIoGRAM Exome Consortium; Myocardial Infarction Genetics Consortium (2016). Platelet-related variants identified by exomechip meta-analysis in 157,293 individuals. Am. J. Hum. Genet. 99, 40–55.
Engström, G., Smith, J.G., Persson, M., Nilsson, P.M., Melander, O., and Hedblad, B. (2014). Red cell distribution width, haemoglobin A1c and incidence of diabetes mellitus. J. Intern. Med. 276, 174–183.
Ernst, J., and Kellis, M. (2012). ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216.
Faure, S., Meyer, L., Costagliola, D., Vaneensberghe, C., Genin, E., Autran, B., Delfraissy, J.F., McDermott, D.H., Murphy, P.M., Debré, P., et al. (2000). Rapid progression to AIDS in HIV+ individuals with a structural variant of the chemokine receptor CX3CR1. Science 287, 2274–2277.
Ferreira, R.C., Freitag, D.F., Cutler, A.J., Howson, J.M.M., Rainbow, D.B., Smyth, D.J., Kaptoge, S., Clarke, P., Boreham, C., Coulson, R.M., et al. (2013). Functional IL6R 358Ala allele impairs classical IL-6 receptor signaling and influences risk of diverse inflammatory diseases. PLoS Genet. 9, e1003444.
Finucane, H.K., Bulik-Sullivan, B., Gusev, A., Trynka, G., Reshef, Y., Loh, P.-R., Anttila, V., Xu, H., Zang, C., Farh, K., et al.; ReproGen Consortium; Schizophrenia Working Group of the Psychiatric Genomics Consortium; RACI Consortium (2015). Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235.
Ganz, T., and Nemeth, E. (2012). Hepcidin and iron homeostasis. Biochim. Biophys. Acta 1823, 1434–1443.
Gieger, C., Radhakrishnan, A., Cvejic, A., Tang, W., Porcu, E., Pistis, G., Serbanovic-Canic, J., Elling, U., Goodall, A.H., Labrune, Y., et al. (2011). New gene functions in megakaryopoiesis and platelet formation. Nature 480, 201–208.
Gkrania-Klotsas, E., Ye, Z., Cooper, A.J., Sharp, S.J., Luben, R., Biggs, M.L., Chen, L.-K., Gokulakrishnan, K., Hanefeld, M., Ingelsson, E., et al. (2010). Differential white blood cell count and type 2 diabetes: systematic review and meta-analysis of cross-sectional and prospective studies. PLoS ONE 5, e13405.
Greene, D., NIHR BioResource, Richardson, S., and Turro, E. (2016). Phenotype similarity regression for identifying the genetic determinants of rare diseases. Am. J. Hum. Genet. 98, 490–499.
GTEx Consortium (2015). Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660.
Hindorff, L.A., Sethupathy, P., Junkins, H.A., Ramos, E.M., Mehta, J.P., Collins, F.S., and Manolio, T.A. (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367.
Howie, B., Marchini, J., and Stephens, M. (2011). Genotype imputation with thousands of genomes. G3 (Bethesda) 1, 457–470.
Hughes, J.R., Roberts, N., McGowan, S., Hay, D., Giannoulatou, E., Lynch, M., De Gobbi, M., Taylor, S., Gibbons, R., and Higgs, D.R. (2014). Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat. Genet. 46, 205–212.
Hussain, M.S., Battaglia, A., Szczepanski, S., Kaygusuz, E., Toliat, M.R., Sakakibara, S., Altmüller, J., Thiele, H., Nürnberg, G., Moosa, S., et al. (2014). Mutations in CKAP2L, the human homolog of the mouse Radmis gene, cause Filippi syndrome. Am. J. Hum. Genet. 95, 622–632.
Ijäs, P., Saksi, J., Soinne, L., Tuimala, J., Jauhiainen, M., Jula, A., Kähönen, M., Kesäniemi, Y.A., Kovanen, P.T., Kaste, M., and Lindsberg, P.J. (2013). Haptoglobin 2 allele associates with unstable carotid plaque and major cardiovascular events. Atherosclerosis 230, 228–234.
International HapMap 3 Consortium, Altshuler, D.M., Gibbs, R.A., Peltonen, L., Altshuler, D.M., Gibbs, R.A., Peltonen, L., Dermitzakis, E., Schaffner, S.F., Yu, F., et al. (2010). Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58.
Iotchkova, V., Ritchie, G.R.S., Geihs, M., Morganella, S., Min, J.L., Walter, K., Timpson, N.J., UK10K Consortium, Dunham, I., Birney, E., and Soranzo, N. (2016a). GARFIELD - GWAS Analysis of Regulatory or Functional Information Enrichment with LD correction. bioRxiv. http://dx.doi.org/10.1101/085738.
Iotchkova, V., Huang, J., Morris, J.A., Jain, D., Barbieri, C., Walter, K., Min, J.L., Chen, L., Astle, W., Cocca, M., et al. (2016b). Discovery and refinement of genetic loci associated with cardiometabolic risk using dense imputation maps. Nat. Genet. 48, 1303–1312.
Jenne, C.N., Urrutia, R., and Kubes, P. (2013). Platelets: bridging hemostasis, inflammation, and immunity. Int. J. Lab. Hematol. 35, 254–261.
Jensen, F.B. (2009). The dual roles of red blood cells in tissue oxygen delivery: oxygen carriers and regulators of local blood flow. J. Exp. Biol. 212, 3387–3393.
Jun, G., Flickinger, M., Hetrick, K.N., Romm, J.M., Doheny, K.F., Abecasis, G.R., Boehnke, M., and Kang, H.M. (2012). Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91, 839–848.

Kanno, H., and Miwa, S. (1991). Single-nucleotide substitution in pyruvate Natanson, C., Kern, S.J., Lurie, P., Banks, S.M., and Wolfe, S.M. (2008). Cell-
kinase deficiency. Blood 78, 1891–1892. free hemoglobin-based blood substitutes and risk of myocardial infarction and
Kristiansen, M., Graversen, J.H., Jacobsen, C., Sonne, O., Hoffman, H.J., Law, death: a meta-analysis. JAMA 299, 2304–2312.
S.K., and Moestrup, S.K. (2001). Identification of the haemoglobin scavenger Notta, F., Zandi, S., Takayama, N., Dobson, S., Gan, O.I., Wilson, G., Kauf-
receptor. Nature 409, 198–201. mann, K.B., McLeod, J., Laurenti, E., Dunant, C.F., et al. (2016). Distinct routes
Kuijpers, T.W., van de Vijver, E., Weterman, M.A.J., de Boer, M., Tool, A.T.J., of lineage development reshape the human blood hierarchy across ontogeny.
van den Berg, T.K., Moser, M., Jakobs, M.E., Seeger, K., Sanal, O., et al. Science 351, aab2116.
(2009). LAD-1/variant syndrome is caused by mutations in FERMT3. Blood O’Connell, J., Sharp, K., Shrine, N., Wain, L., Hall, I., Tobin, M., Zagury, J.-F.,
113, 4740–4746. Delaneau, O., and Marchini, J. (2016). Haplotype estimation for biobank-scale
Roadmap Epigenomics Consortium, Kundaje, A., Meuleman, W., Ernst, J., data sets. Nat. Genet. 48, 817–820.
Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, Paul, F., Arkin, Y., Giladi, A., Jaitin, D.A., Kenigsberg, E., Keren-Shaul, H.,
J., Ziller, M.J., et al. (2015). Integrative analysis of 111 reference human epige- Winter, D., Lara-Astiaso, D., Gury, M., Weiner, A., et al. (2015). Transcriptional
nomes. Nature 518, 317–330. heterogeneity and lineage commitment in myeloid progenitors. Cell 163, 1663–
Landrum, M.J., Lee, J.M., Benson, M., Brown, G., Chao, C., Chitipiralla, S., Gu, 1677.
B., Hart, J., Hoffman, D., Hoover, J., et al. (2016). ClinVar: public archive of Plenge, R.M., Scolnick, E.M., and Altshuler, D. (2013). Validating therapeutic
interpretations of clinically relevant variants. Nucleic Acids Res. 44(D1), targets through human genetics. Nat. Rev. Drug Discov. 12, 581–594.
D862–D868.
Polfus, L.M., Khajuria, R.K., Schick, U.M., Pankratz, N., Pazoki, R., Brody, J.A.,
Lentaigne, C., Freson, K., Laffan, M.A., Turro, E., and Ouwehand, W.H.;
Chen, M.-H., Auer, P.L., Floyd, J.S., Huang, J., et al. (2016). Whole-exome
BRIDGE-BPD Consortium and the ThromboGenomics Consortium (2016).
sequencing identifies loci associated with blood cell traits and reveals a role
Inherited platelet disorders: toward DNA-based diagnosis. Blood 127, 2814–
for alternative GFI1B splice variants in human hematopoiesis. Am. J. Hum.
2823.
Genet. 99, 481–488.
Leslie, R., O’Donnell, C.J., and Johnson, A.D. (2014). GRASP: analysis of
R Core Team (2014). A language and environment for statistical computing
genotype-phenotype results from 1390 genome-wide association studies
(Vienna, Austria: R Foundation for Statistical Computing).
and corresponding open access database. Bioinformatics 30, i185–i194.
Romano, M., Dri, P., Da Dalt, L., Patriarca, P., and Baralle, F.E. (1997).
Levy, A.P., Hochberg, I., Jablonski, K., Resnick, H.E., Lee, E.T., Best, L., and
Biochemical and molecular characterization of hereditary myeloperoxidase
Howard, B.V.; Strong Heart Study (2002). Haptoglobin phenotype is an inde-
deficiency. Blood 90, 4126–4134.
pendent risk factor for cardiovascular disease in individuals with diabetes:
The Strong Heart Study. J. Am. Coll. Cardiol. 40, 1984–1990. Rosenstein, R.K., Panush, R.S., Kramer, N., and Rosenstein, E.D. (2014).
Hypereosinophilia and seroconversion of rheumatoid arthritis. Clin. Rheuma-
tol. 33, 1685–1688.
Lian, L., Wang, Y., Flick, M., Choi, J., Scott, E.W., Degen, J., Lemmon, M.A., and Abrams, C.S. (2009). Loss of pleckstrin defines a novel pathway for PKC-mediated exocytosis. Blood 113, 3577–3584.
Linderman, M.D., Brandt, T., Edelmann, L., Jabado, O., Kasai, Y., Kornreich, R., Mahajan, M., Shah, H., Kasarskis, A., and Schadt, E.E. (2014). Analytical validation of whole exome and whole genome sequencing for clinical applications. BMC Med. Genomics 7, 20.
Loh, P.-R., Tucker, G., Bulik-Sullivan, B.K., Vilhjálmsson, B.J., Finucane, H.K., Salem, R.M., Chasman, D.I., Ridker, P.M., Neale, B.M., Berger, B., et al. (2015). Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290.
Loh, P.-R., Danecek, P., Palamara, P.F., Fuchsberger, C., Reshef, Y.A., Finucane, H.K., Schoenherr, S., Forer, L., McCarthy, S., Abecasis, G.R., et al. (2016). Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448.
Lopez, D. (2008). Inhibition of PCSK9 as a novel strategy for the treatment of hypercholesterolemia. Drug News Perspect. 21, 323–330.
McLaren, W., Gil, L., Hunt, S.E., Riat, H.S., Ritchie, G.R.S., Thormann, A., Flicek, P., and Cunningham, F. (2016). The Ensembl Variant Effect Predictor. Genome Biol. 17, 122.
Miller, B.J., Gassama, B., Sebastian, D., Buckley, P., and Mellor, A. (2013). Meta-analysis of lymphocytes in schizophrenia: clinical status and antipsychotic effects. Biol. Psychiatry 73, 993–999.
Minikel, E.V., Vallabh, S.M., Lek, M., Estrada, K., Samocha, K.E., Sathirapongsasuti, J.F., McLean, C.Y., Tung, J.Y., Yu, L.P.C., Gambetti, P., et al.; Exome Aggregation Consortium (ExAC) (2016). Quantifying prion disease penetrance using large population control cohorts. Sci. Transl. Med. 8, 322ra9.
Moon, H.M., and Wynshaw-Boris, A. (2013). Cytoskeleton in action: lissencephaly, a neuronal migration disorder. Wiley Interdiscip. Rev. Dev. Biol. 2, 229–245.
Moore, C., Sambrook, J., Walker, M., Tolkien, Z., Kaptoge, S., Allen, D., Mehenny, S., Mant, J., Di Angelantonio, E., Thompson, S.G., et al. (2014). The INTERVAL trial to determine whether intervals between blood donations can be safely and acceptably decreased to optimise blood supply: study protocol for a randomised controlled trial. Trials 15, 363.
Routes, J., Abinun, M., Al-Herz, W., Bustamante, J., Condino-Neto, A., De La Morena, M.T., Etzioni, A., Gambineri, E., Haddad, E., Kobrynski, L., et al. (2014). ICON: the early diagnosis of congenital immunodeficiencies. J. Clin. Immunol. 34, 398–424.
Roy, N.B.A., Wilson, E.A., Henderson, S., Wray, K., Babbs, C., Okoli, S., Atoyebi, W., Mixon, A., Cahill, M.R., Carey, P., et al. (2016). A novel 33-Gene targeted resequencing panel provides accurate, clinical-grade diagnosis and improves patient management for rare inherited anaemias. Br. J. Haematol. 175, 318–330.
Sansanayudh, N., Anothaisintawee, T., Muntham, D., McEvoy, M., Attia, J., and Thakkinstian, A. (2014). Mean platelet volume and coronary artery disease: a systematic review and meta-analysis. Int. J. Cardiol. 175, 433–440.
Sawcer, S., Hellenthal, G., Pirinen, M., Spencer, C.C., Patsopoulos, N.A., Moutsianas, L., Dilthey, A., Su, Z., Freeman, C., Hunt, S.E., et al.; International Multiple Sclerosis Genetics Consortium; Wellcome Trust Case Control Consortium 2 (2011). Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476, 214–219.
Schaffer, A., Verdoia, M., Cassetti, E., Barbieri, L., Perrone-Filardi, P., Marino, P., and De Luca, G. (2015). Impact of red blood cells count on the relationship between high density lipoproteins and the prevalence and extent of coronary artery disease: a single centre study [corrected]. J. Thromb. Thrombolysis 40, 61–68.
Schick, U.M., Jain, D., Hodonsky, C.J., Morrison, J.V., Davis, J.P., Brown, L., Sofer, T., Conomos, M.P., Schurmann, C., McHugh, C.P., et al. (2016). Genome-wide Association Study of Platelet Count Identifies Ancestry-Specific Loci in Hispanic/Latino Americans. Am. J. Hum. Genet. 98, 229–242.
Schneider, M., Chandler, K., Tischkowitz, M., and Meyer, S. (2015). Fanconi anaemia: genetics, molecular biology, and cancer – implications for clinical management in children and adults. Clin. Genet. 88, 13–24.
Sekar, A., Bialas, A.R., de Rivera, H., Davis, A., Hammond, T.R., Kamitaki, N., Tooley, K., Presumey, J., Baum, M., Van Doren, V., et al.; Schizophrenia Working Group of the Psychiatric Genomics Consortium (2016). Schizophrenia risk from complex variation of complement component 4. Nature 530, 177–183.
1428 Cell 167, 1415–1429, November 17, 2016

Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M., and Sirotkin, K. (2001). dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311.
Sowemimo-Coker, S.O. (2002). Red blood cell hemolysis during processing. Transfus. Med. Rev. 16, 46–60.
Staley, J.R., Blackshaw, J., Kamat, M.A., Ellis, S., Surendran, P., Sun, B.B., Paul, D.S., Freitag, D., Burgess, S., Danesh, J., et al. (2016). PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics 32, 3207–3209.
Stenson, P.D., Mort, M., Ball, E.V., Shaw, K., Phillips, A., and Cooper, D.N. (2014). The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum. Genet. 133, 1–9.
Sudlow, C., Gallacher, J., Allen, N., Beral, V., Burton, P., Danesh, J., Downey, P., Elliott, P., Green, J., Landray, M., et al. (2015). UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779.
Tajuddin, S.M., Schick, U.M., Eicher, J.D., Chami, N., Giri, A., Brody, J.A., Hill, W.D., Kacprowski, T., Li, J., Lyytikäinen, L.-P., et al. (2016). Large-scale exome-wide association analysis identifies loci for white blood cell traits and pleiotropy with immune-mediated diseases. Am. J. Hum. Genet. 99, 22–39.
Tennessen, J.A., Bigham, A.W., O’Connor, T.D., Fu, W., Kenny, E.E., Gravel, S., McGee, S., Do, R., Liu, X., Jun, G., et al.; Broad GO; Seattle GO; NHLBI Exome Sequencing Project (2012). Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69.
UK10K Consortium, Walter, K., Min, J.L., Huang, J., Crooks, L., Memari, Y., McCarthy, S., Perry, J.R., Xu, C., Futema, M., et al. (2015). The UK10K project identifies rare variants in health and disease. Nature 526, 82–90.
Ulirsch, J.C., Nandakumar, S.K., Wang, L., Giani, F.C., Zhang, X., Rogov, P., Melnikov, A., McDonel, P., Do, R., Mikkelsen, T.S., and Sankaran, V.G. (2016). Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165, 1530–1545.
Ulset, R.A., Petrasch, E., Saker, J., Linssen, J., Kimura, K., Uchihashi, K., Philipsen, P., and Eide, A. (2014). ‘‘Aged sample’’ software on automated routine hematology analyzer enables differentiation between pathological and non-pathological WBC flagging in aging samples. Clin. Lab. 60, 1961–1968.
van der Harst, P., Zhang, W., Mateo Leach, I., Rendon, A., Verweij, N., Sehmi, J., Paul, D.S., Elling, U., Allayee, H., Li, X., et al. (2012). Seventy-five genetic loci influencing the human red blood cell. Nature 492, 369–375.
Varol, C., Mildner, A., and Jung, S. (2015). Macrophages: development and tissue specialization. Annu. Rev. Immunol. 33, 643–675.
Vasquez, L.J., Mann, A.L., Chen, L., and Soranzo, N. (2016). From GWAS to function: lessons from blood cells. ISBT Sci. Ser. 11 (Suppl 1), 211–219.
Voight, B.F., Peloso, G.M., Orho-Melander, M., Frikke-Schmidt, R., Barbalic, M., Jensen, M.K., Hindy, G., Hólm, H., Ding, E.L., Johnson, T., et al. (2012). Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study. Lancet 380, 572–580.
Wain, L.V., Shrine, N., Miller, S., Jackson, V.E., Ntalla, I., Soler Artigas, M., Billington, C.K., Kheirallah, A.K., Allen, R., Cook, J.P., et al.; UK Brain Expression Consortium (UKBEC); OxGSK Consortium (2015). Novel insights into the genetics of smoking behaviour, lung function, and chronic obstructive pulmonary disease (UK BiLEVE): a genetic association study in UK Biobank. Lancet Respir. Med. 3, 769–781.
Welter, D., MacArthur, J., Morales, J., Burdett, T., Hall, P., Junkins, H., Klemm, A., Flicek, P., Manolio, T., Hindorff, L., and Parkinson, H. (2014). The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006.
Westbury, S.K., Turro, E., Greene, D., Lentaigne, C., Kelly, A.M., Bariana, T.K., Simeoni, I., Pillois, X., Attwood, A., Austin, S., et al.; BRIDGE-BPD Consortium (2015). Human phenotype ontology annotation and cluster analysis to unravel genetic defects in 707 cases with unexplained bleeding and platelet disorders. Genome Med. 7, 36.
Wheeler, J.G., Mussolino, M.E., Gillum, R.F., and Danesh, J. (2004). Associations between differential leucocyte count and incident coronary heart disease: 1764 incident cases from seven prospective studies of 30,374 individuals. Eur. Heart J. 25, 1287–1292.
Willer, C.J., Li, Y., and Abecasis, G.R. (2010). METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191.
Wood, S.N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. Series B Stat. Methodol. 73, 3–36.
Xu, C., Tachmazidou, I., Walter, K., Ciampi, A., Zeggini, E., and Greenwood, C.M.; UK10K Consortium (2014). Estimating genome-wide significance for whole-genome sequencing studies. Genet. Epidemiol. 38, 281–290.
Zhu, Z., Zhang, F., Hu, H., Bakshi, A., Robinson, M.R., Powell, J.E., Montgomery, G.W., Goddard, M.E., Wray, N.R., Visscher, P.M., and Yang, J. (2016). Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487.
Zijlmans, W.C.W.R., van Kempen, A.A.M.W., Ackermans, M.T., de Metz, J., Kager, P.A., and Sauerwein, H.P. (2008). Very young children with uncomplicated falciparum malaria have higher risk of hypoglycaemia: a study from Suriname. Trop. Med. Int. Health 13, 626–634.

STAR+METHODS
KEY RESOURCES TABLE

flashpca (Abraham and Inouye, 2014) https://github.com/gabraham/flashpca
R 3.1.2 (R Core Team, 2014) https://www.r-project.org/
biomaRt Bioconductor https://bioconductor.org/packages/release/bioc/html/biomaRt.html
data.table The R Foundation https://cran.r-project.org/web/packages/data.table/index.html
doMC The R Foundation https://cran.r-project.org/web/packages/doMC/index.html
dplyr The R Foundation https://cran.r-project.org/web/packages/dplyr/index.html
foreach The R Foundation https://cran.r-project.org/web/packages/foreach/index.html
GenomicRanges Bioconductor https://bioconductor.org/packages/release/bioc/html/GenomicRanges.html
Hmisc The R Foundation https://cran.r-project.org/web/packages/Hmisc/index.html
openxlsx The R Foundation https://cran.r-project.org/web/packages/openxlsx/index.html
RcppEigen The R Foundation https://cran.r-project.org/web/packages/RcppEigen/index.html
reshape2 The R Foundation https://cran.r-project.org/web/packages/reshape2/index.html
rhdf5 Bioconductor https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html
stringr The R Foundation https://cran.r-project.org/web/packages/stringr/index.html
tidyr The R Foundation https://cran.r-project.org/web/packages/tidyr/index.html
MASS The R Foundation https://cran.r-project.org/web/packages/MASS/index.html
ggplot2 The R Foundation https://cran.r-project.org/web/packages/ggplot2/index.html
lubridate The R Foundation https://cran.r-project.org/web/packages/lubridate/index.html
mgcv The R Foundation https://cran.r-project.org/web/packages/mgcv/index.html
RColorBrewer The R Foundation https://cran.r-project.org/web/packages/RColorBrewer/index.html
PLINK v1.9 (Chang et al., 2015) https://www.cog-genomics.org/plink2
SHAPEIT3 (O’Connell et al., 2016) https://jmarchini.org/software/
PBWT (Durbin, 2014) https://imputation.sanger.ac.uk/
BOLT-LMM (Loh et al., 2015) https://data.broadinstitute.org/alkesgroup/BOLT-LMM/
METAL (Willer et al., 2010) http://csg.sph.umich.edu//abecasis/Metal/
SMR (Zhu et al., 2016) http://cnsgenomics.com/software/smr/
Other
Clinvar database (Landrum et al., 2016) https://www.ncbi.nlm.nih.gov/clinvar/
Variant Effect Predictor (McLaren et al., 2016) http://www.ensembl.org/info/docs/tools/vep/index.html
PhenoScanner (Staley et al., 2016) http://www.phenoscanner.medschl.cam.ac.uk/phenoscanner
Further information may be directed to the Lead Contact, Nicole Soranzo (ns6@sanger.ac.uk). Results, including genome-wide
univariable summary statistics, are available from http://www.bloodcellgenetics.org.
We analyzed data from three large population studies with measurements of blood cell indices and imputed genome-wide
genotypes: the UK Biobank study, the UK BiLEVE study (a selected subset of UK Biobank) and the INTERVAL study. Although the UK
BiLEVE study is a subset of the UK Biobank study, we often refer to it separately, since we conducted association
analyses of UK BiLEVE participants as a distinct dataset owing to their selected nature and a slightly different genotyping
array.

The UK Biobank Study
The UK Biobank study is a prospective cohort study of 502,682 participants recruited at 22 assessment centers across the UK
between 2006 and 2010 (Sudlow et al., 2015). Participants aged between 40 and 69 were selected from GP lists and invited to attend
a center, where blood, urine and saliva samples were taken, physical measurements were made (e.g., blood pressure, anthropometric
measurements), and extensive health and lifestyle questionnaires were completed.
DNA was extracted from buffy coat at UK Biocenter (Stockport, UK) using a Promega Maxwell 16 Blood DNA Purification Kit
(AS1010). Samples with sufficient DNA concentration and purity (as measured by 260/280 ratio) were aliquoted and 50 µL were shipped
for genotyping at Affymetrix (Santa Clara, CA, USA). A bespoke sample selection algorithm was used to ensure that the samples
on each plate were from participants from a range of recruitment centers.
The UK BiLEVE Study

The UK Biobank Lung Exome Variant Evaluation (UK BiLEVE) study involves a subset of 50,008 participants from UK Biobank,
selected to investigate the genetic determinants of smoking behavior, lung function and chronic obstructive pulmonary disorder
(COPD) (Wain et al., 2015). The UK BiLEVE participants included equal numbers of males and females selected from those who
self-reported being of white European ancestry, had sufficient spirometric measurements to determine lung function measures,
were either never smokers or ‘heavy smokers’ (mean 35 pack years), and had either poor lung function, average lung function or
high lung function. As the UK BiLEVE participants are a subset of the UK Biobank study, DNA extraction, aliquoting and shipment
procedures were as described above.
The INTERVAL Study

The INTERVAL study is a prospective cohort study of approximately 50,000 participants nested within a pragmatic randomized trial
of blood donors (Moore et al., 2014). Between 2012 and 2014, blood donors 18 years and older were consented and recruited from
25 NHSBT (National Health Service Blood and Transplant) static donor centers across England. Participants are predominantly
healthy individuals since people with major disease (myocardial infarction, stroke, cancer, etc.) are ineligible for donation, as are those
who report being unwell or having had recent illness or infection.
Participants completed online questionnaires containing basic lifestyle and health-related information, including self-reported
height and weight, ethnicity, current smoking status, alcohol consumption, doctor-diagnosed anemia, use of medications (hormone
replacement therapy, iron supplements) and menopausal status.
DNA was extracted from buffy coat at LGC Genomics (UK) using a Kleargene method and samples of sufficient concentration and
purity were aliquoted for shipment to Affymetrix for genotyping. A modified version of the sample selection algorithm used for the UK
Biobank study was implemented to ensure that samples on each plate came from participants with a mix of recruitment center,
recruitment date, regional hub and gender.
The INTERVAL study was approved by the Cambridge (East) Research Ethics Committee and UK Biobank was approved by the
North West Multi-center Research Ethics Committee (MREC). Informed consent was obtained from all participants.
METHOD DETAILS
The UK Biobank and UK BiLEVE Affymetrix Axiom Genotyping Arrays

The UK Biobank Affymetrix Axiom array is a customized genotyping array comprising 845,485 probesets assaying 820,967 single
nucleotide variants (SNVs) and short insertions/deletions (indels; http://www.ukbiobank.ac.uk/scientists-3/uk-biobank-axiom-
array/). The array includes an ‘‘exome’’ component, designed to capture variants likely to have transcriptional consequences (non-
synonymous, splice altering, truncating), and a ‘‘genome-wide association study (GWAS) scaffold’’ selected to ensure good quality
genome-wide imputation of variants that are common (minor allele frequency [MAF] > 5%) or low-frequency (MAF = 1%–5%) in
European populations. The exome component, which includes approximately 130,000 (predominantly rare) variants, was designed
using data from three large exome sequencing projects: the NHLBI Exome Sequencing Project (Tennessen et al., 2012), the Exome
Aggregation Consortium (ExAC) (Minikel et al., 2016) and the UK10K project (UK10K Consortium et al., 2015). Additional rare variants
were included in cardiac disease and cancer predisposition genes, as well as other variants from the Human Gene Mutation Data-
base (HGMD) (Stenson et al., 2014).
The genome-wide imputation scaffold was designed by selecting tagging variants from Affymetrix databases using a custom
algorithm. In addition to 246,000 variants from the 1000 Genomes CEU population designed to tag common variants in European
populations, a further 103,000 variants from other European 1000 Genomes populations were added to boost imputation
of common variants, as well as 280,000 variants to boost imputation in the UK population in the 1%–5% MAF range. Mean r²
between observed and imputed genotypes for common variants was estimated to be 0.92, while for low-frequency variants it was
0.79 (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=155580), suggesting that the array was able to impute lower frequency
variants with greater accuracy than previous GWAS arrays typically could.
The remaining content on the array includes markers of specific relevance, including markers related to diseases and traits
(Alzheimer’s, autoimmune and inflammatory, blood phenotypes, cancers, cardiometabolic, neurological disease), dense coverage
of selected genomic regions (HLA, ApoE, KIR, Y chromosome, mitochondria, copy number variants relevant to certain conditions)
and other categories (variants related to gene expression, fingerprint markers, tags for Neanderthal ancestry, and pharmacogenetic
markers). Of particular relevance to this study, the array included 2,545 variants related to blood and iron phenotypes, including red
cell blood groups, regulation of hematopoiesis (red blood cells, platelets, white blood cells) and regulation of blood homeostasis
identified from candidate gene studies, GWAS and review of the literature.
The UK BiLEVE Affymetrix Axiom genotyping array preceded the UK Biobank array and was designed similarly (overlapping
content > 95%). Due to the focus of the UK BiLEVE study, their array contained content designed to genotype or tag variants
known or suspected to be related to lung function or disease, COPD, asthma or smoking behavior. In total, the array had
833,090 probesets assaying 807,411 variants. The 50,008 participants in the UK BiLEVE study were genotyped on the UK BiLEVE
array, while the remaining UK Biobank participants and the INTERVAL participants were genotyped on the UK Biobank array.
Genotyping
For all three studies, aliquots were shipped to Affymetrix in 96-well barcoded plates with two empty wells for Affymetrix controls.
Samples were quantified using a PicoGreen-based method to identify plates with high numbers of low concentration samples, which
could be replaced prior to genotyping. Genotyping was performed on the Affymetrix GeneTitan Multi-Channel (MC) Instrument
according to the Affymetrix Axiom 2.0 Assay Automated Workflow. Genotypes were then called in batches of approximately 50 plates
(4,800 samples) using the Affymetrix Power Tools software to implement the Axiom GT1 algorithm. For the UK Biobank and UK
BiLEVE studies, rare variants (i.e., those with fewer than six minor alleles in a genotyping batch) were recalled using variant-specific
priors to improve performance.
Quality Control (QC) of Genotype Data

For all studies, Affymetrix implemented standard QC procedures during the genotype calling pipeline, excluding samples with poor
signal intensity (dish QC < 0.82) and samples with low call rate (< 97%) based on 20,000 high quality probesets. Variants were
excluded if they had low call rate (< 95%), had more than three clusters (indicative of off-target measurement), had cluster statistics
(Fisher’s linear discriminant, heterozygous cluster strength, homozygote ratio offset) indicative of poor quality genotyping or were
complicated multi-allelic variants that could not easily be called.
QC of UK BiLEVE Genotype Data
As UK BiLEVE participants were genotyped prior to the other UK Biobank participants on a slightly different array, QC of UK BiLEVE
genotyping data was carried out separately by UK BiLEVE investigators (Wain et al., 2015). Briefly, a total of 50,561 samples were
genotyped in eleven batches. Samples were excluded if they were sex mismatches, unresolvable duplicates (> 98% of alleles identical
by descent [IBD]), heterozygosity outliers (greater than three standard deviations [SD] from the mean), ethnic outliers (greater
than ten SD from the mean on any of the first ten principal components [PCs] generated including all HapMap3 panels; International
HapMap 3 Consortium et al., 2010), or had withdrawn consent. Intentional duplicate pairs and related individuals (IBD > 20%) were
resolved, excluding individuals with the highest number of pairwise relationships then the lowest call rate. After these steps, 48,943
participants remained for analysis.
For variants with multiple probesets, only the probeset with the highest call rate was retained. Variants were additionally excluded
from a batch if they showed within-batch plate effects (p value < 1 × 10⁻⁶) and variants that failed in more than two of the eleven
batches were dropped from the dataset. A total of 782,260 variants remained after QC.
QC of UK Biobank Genotype Data
At the time of submission of this paper, genotyping data were available on the first 150,000 participants from the UK Biobank
study, including the 50,000 participants selected for the UK BiLEVE study. QC of UK Biobank genotyping data from these par-
ticipants, carried out by UK Biobank investigators, has been described in detail elsewhere (http://biobank.ctsu.ox.ac.uk/crystal/
refer.cgi?id=155580). In total, 153,293 samples were genotyped across 33 batches. Samples with high missingness or high het-
erozygosity (accounting for ethnicity) were excluded based on visual inspection of ancestry-specific plots, as were samples from
participants who had withdrawn. A further eight samples with low heterozygosity that could not be explained by long runs of
homozygosity were also excluded. For variants with multiple probesets, the probeset defined by Affymetrix as ‘‘best’’ was retained.
Variants showing batch effects (either within the UK Biobank batches or between UK Biobank and UK BiLEVE batches),
within-batch plate effects, or within-batch deviations from Hardy-Weinberg equilibrium (HWE) in European ancestry samples
defined by principal component analysis (PCA), all at p value < 1 × 10⁻¹², were filtered from the batches in which they failed.
In total, after these exclusions, data were available for 151,733 participants on 806,466 variants that passed in at least one
batch.
Additional QC of UK Biobank and UK BiLEVE Genotype Data
In addition to the QC steps applied by UK Biobank and UK BiLEVE investigators, we implemented sample filtering on the combined
dataset. We excluded samples with more than 3% missingness, samples with missing phenotypic sex, and samples with sex mis-
matches or dubious sex estimation from the genotyped data. To restrict analyses to participants of European continental ancestry,
we defined a ‘genetic distance’ d(i) between individual i and a hypothetical median ‘‘white British’’ participant using
variance-weighted PC scores:

$$d(i) = \sqrt{\sum_{m=1}^{15} E_m \left(P_{im} - C_m\right)^2}$$

where:
m is an index for each of the 15 PCs provided by UK Biobank,
E_m represents the eigenvalue corresponding to PC m (i.e., the genetic variance explained by PC m),
P_im represents the score of individual i on PC m,
C_m represents the median score on PC m of participants with self-reported White ancestry (defined as ‘‘British,’’ ‘‘Irish,’’ ‘‘White,’’ or
‘‘Any other White background’’).
We used a threshold of genetic distance > 50 to identify non-Europeans, which resulted in the exclusion of 7,848 non-European
samples.
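As an illustration of the distance defined above, the following minimal Python sketch computes d(i) from a table of PC scores. The function name, argument layout and toy numbers are assumptions made for this example; they are not taken from the study's analysis code, and the example uses two PCs rather than fifteen for brevity.

```python
import math
from statistics import median

def genetic_distance(pc_scores, eigenvalues, reference_rows):
    """Variance-weighted distance of every sample from the reference median.

    pc_scores:      list of per-sample lists of PC scores (P_im)
    eigenvalues:    list of eigenvalues, one per PC (E_m)
    reference_rows: indices of samples in the self-reported reference group
    """
    n_pcs = len(eigenvalues)
    # C_m: median score on each PC within the reference group
    centre = [median(pc_scores[i][m] for i in reference_rows) for m in range(n_pcs)]
    # d(i) = sqrt(sum_m E_m * (P_im - C_m)^2)
    return [
        math.sqrt(sum(eigenvalues[m] * (row[m] - centre[m]) ** 2 for m in range(n_pcs)))
        for row in pc_scores
    ]

# Toy data: three reference samples and one distant sample (illustrative only)
scores = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [30.0, 40.0]]
eigvals = [4.0, 1.0]
distances = genetic_distance(scores, eigvals, reference_rows=[0, 1, 2])
flagged = [d > 50 for d in distances]  # threshold of 50 as stated in the text
```

Only the last toy sample exceeds the threshold, so it alone would be flagged as non-European under this rule.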
To implement further QC steps (heterozygosity analysis, PCA and identification of duplicate samples), a robust set of variants were
derived using the same methods as UK Biobank, i.e., selecting autosomal variants on both arrays that had passed variant QC in all 33
batches, had MAF ≥ 2.5% and missingness ≤ 1.5%, were not indels, were not C/G or A/T variants, and were not within 23 regions of
known long-range linkage disequilibrium (LD). These variants were then LD-pruned (r² < 0.1) to obtain an uncorrelated set of variants.
The first fifty PCs were estimated using flashpca (Abraham and Inouye, 2014) and the heterozygosity analysis, which was carried out
in parallel with the ethnic outlier identification using PLINK v1.9 (Chang et al., 2015), identified 3,030 samples that had autosomal
heterozygosity greater than three SD from the mean, 2,667 of which were also ethnic outliers. To identify duplicate samples, we performed
identity-by-descent (IBD) analysis using the PLINK Method-of-Moments approach (http://pngu.mgh.harvard.edu/purcell/plink/ibdibs.shtml),
which identified 19 pairs of duplicates or monozygotic twins (pi_hat ≥ 0.9). All 38 samples were excluded from the
analysis dataset.
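The variant selection criteria for the robust set can be written as a single predicate. The sketch below is a hedged illustration only: the `Variant` record and its field names are hypothetical, and LD pruning (which in practice would be done with a tool such as PLINK) is not shown.

```python
from dataclasses import dataclass

# C/G and A/T variants have complementary alleles, so their strand is ambiguous.
STRAND_AMBIGUOUS = {frozenset(("C", "G")), frozenset(("A", "T"))}

@dataclass
class Variant:  # hypothetical record, for illustration only
    ref: str
    alt: str
    maf: float
    missingness: float
    autosomal: bool
    on_both_arrays: bool
    batches_passed: int
    in_long_range_ld: bool

def in_robust_set(v: Variant, total_batches: int = 33) -> bool:
    """Apply the pre-pruning filters described in the text."""
    is_indel = len(v.ref) != 1 or len(v.alt) != 1
    is_ambiguous = frozenset((v.ref, v.alt)) in STRAND_AMBIGUOUS
    return (
        v.autosomal
        and v.on_both_arrays
        and v.batches_passed == total_batches  # passed variant QC in all batches
        and v.maf >= 0.025
        and v.missingness <= 0.015
        and not is_indel
        and not is_ambiguous
        and not v.in_long_range_ld
    )

good = Variant("A", "G", 0.20, 0.001, True, True, 33, False)
ambiguous = Variant("C", "G", 0.20, 0.001, True, True, 33, False)
rare = Variant("A", "G", 0.01, 0.001, True, True, 33, False)
indel = Variant("A", "AT", 0.20, 0.001, True, True, 33, False)
kept = [in_robust_set(v) for v in (good, ambiguous, rare, indel)]
```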
Quality control (QC) of INTERVAL Genotype Data
In total, 48,813 INTERVAL samples were genotyped in ten batches. Following standard Affymetrix QC exclusions, within-batch sam-
ple and variant QC was performed. Non-best probesets were excluded to leave a single probeset per variant. As visual inspection of
cluster plots had identified that some variants, particularly rare variants, had minor allele homozygotes incorrectly called due to the
presence of an extreme intensity outlier, we failed variants from a batch if:
• the variant had fewer than ten called minor allele homozygotes;
• the cluster plot contained at least one sample with an intensity at least twice as far from the origin as the next most extreme sample;
• the outlying sample(s) had an extreme polar angle (< 15° or > 75°) in the direction of the minor allele.
Prior to further QC of variants within each batch, we excluded duplicate samples and samples that were clearly not of European
ancestry using a set of high-quality autosomal variants, defined as those with:
• MAF > 0.05
• HWE p value > 1 × 10⁻⁶
• r² ≤ 0.2 between pairs of variants.
Duplicate samples were defined as those with pi_hat ≥ 0.9 using the PLINK Method-of-Moments IBD approach and non-Europeans
were defined as those with scores on PC1 or PC2 < 0 following a PCA including INTERVAL samples with 1000 Genomes major
ancestry populations (1000 Genomes Project Consortium et al., 2015).
Variants were then excluded from a batch if they strongly deviated from HWE (p value < 5 × 10⁻⁶), following a Fisher’s exact test for
low-frequency and rare variants (defined as those with a maximum MAF < 0.05 across all ten batches) or a χ² test for common variants.
Similarly, variants were excluded from a batch if they had a within-batch call rate < 0.97. Finally, variants were dropped from all
batches if they failed in at least four of the batches due to deviation from HWE, low call rate or Affymetrix variant exclusion criteria.
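For common variants, the within-batch HWE check can be illustrated with a one-degree-of-freedom χ² goodness-of-fit test on genotype counts. This is a minimal sketch under stated assumptions: the function names are hypothetical, the exact test used for low-frequency and rare variants is not shown, and the actual analysis was presumably run with standard software rather than code like this.

```python
import math

HWE_P_THRESHOLD = 5e-6  # within-batch threshold stated in the text

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with one degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))

def fails_hwe(n_aa, n_ab, n_bb):
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) < HWE_P_THRESHOLD

def dropped_from_all_batches(n_failed_batches):
    """A variant failing in at least four of the ten batches is dropped everywhere."""
    return n_failed_batches >= 4

p_in_equilibrium = hwe_chi2_pvalue(25, 50, 25)  # counts exactly at HWE proportions
p_no_hets = hwe_chi2_pvalue(50, 0, 50)          # complete absence of heterozygotes
```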
After merging passing samples and variants across the ten batches, we estimated the level of sample contamination using the
method described by Jun et al. (2012), which examines the relationship between allele frequency and probeset intensity. We
excluded samples with more than 10% contamination, as well as those who had both 3%–10% contamination and ten or more first-
or second-degree relatives (defined as pi_hat ≥ 0.1875). Heterozygosity outliers (heterozygosity more than three standard deviations
away from the mean), samples with missing phenotypic sex and sex mismatches were then also removed, as were variants with a
MAF range greater than 0.05 across all batches, variants that were monomorphic in one or more batches but had MAF > 0.01 in
another batch, and variants that had different minor alleles between batches (only for variants with maximum MAF < 0.475 across
batches).
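The sample-level contamination rule and the cross-batch variant checks described above reduce to a few comparisons. The sketch below is illustrative only; the function names and input layout are assumptions, and a batch MAF of exactly zero is used here to represent a monomorphic batch.

```python
def exclude_for_contamination(contamination, n_close_relatives):
    """Exclude if > 10% contaminated, or 3%-10% with ten or more close relatives."""
    if contamination > 0.10:
        return True
    return 0.03 <= contamination <= 0.10 and n_close_relatives >= 10

def fails_cross_batch_checks(batch_mafs, minor_alleles):
    """Apply the three between-batch consistency rules to one variant.

    batch_mafs:    MAF of the variant in each batch
    minor_alleles: minor allele observed in each polymorphic batch
    """
    lowest, highest = min(batch_mafs), max(batch_mafs)
    if highest - lowest > 0.05:          # MAF range across batches too wide
        return True
    if lowest == 0 and highest > 0.01:   # monomorphic in one batch only
        return True
    # Minor allele flips are only informative away from MAF near 0.5
    return highest < 0.475 and len(set(minor_alleles)) > 1

sample_decisions = [
    exclude_for_contamination(0.12, 0),
    exclude_for_contamination(0.05, 12),
    exclude_for_contamination(0.05, 3),
    exclude_for_contamination(0.01, 20),
]
variant_decisions = [
    fails_cross_batch_checks([0.10, 0.16], ["A", "A"]),
    fails_cross_batch_checks([0.0, 0.02], ["A"]),
    fails_cross_batch_checks([0.20, 0.22], ["A", "G"]),
    fails_cross_batch_checks([0.48, 0.49], ["A", "G"]),
    fails_cross_batch_checks([0.20, 0.21], ["A", "A"]),
]
```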
For IBD analysis and PCA, another set of 100,000 high quality variants was selected using the same criteria described above for
the UK Biobank QC (Figure S3). The global IBD analysis (performed using PLINK Method-of-Moments approach) revealed 69 pairs of
across-batch duplicates (or monozygotic twins), who were removed from the dataset. A between-study IBD analysis, including the
INTERVAL, UK Biobank and UK BiLEVE studies, revealed a further 1,100 participants who were in both the INTERVAL and combined
UK Biobank-UK BiLEVE datasets, so these participants were excluded from the INTERVAL dataset to avoid overlap. The PCA, performed
using flashpca without including 1000 Genomes samples, identified a further twelve outliers who leveraged lower PCs (PC 6,
8 and 9) according to a visual check and were therefore excluded from the dataset. The PCA was then re-run to obtain final PCs for
use as covariates in analysis models. 43,059 participants remained in the final dataset.
Variant Imputation
UK Biobank and UK BiLEVE
The pre-imputation variant QC, phasing and imputation conducted on the combined UK Biobank and UK BiLEVE dataset has been
described in detail (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=157020). Briefly, sample and variant QC was performed as
described above, then variants were additionally removed if they:
• were only on the UK BiLEVE array and had failed in more than one (of eleven) UK BiLEVE batches;
• were only on the UK Biobank array and had failed in more than two (of 22) UK Biobank batches;
• were on both arrays and had failed in three or more of the 33 total batches.
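The three array-specific thresholds above can be expressed as one small function. This is a sketch for illustration; the function and argument names are hypothetical.

```python
def removed_before_imputation(on_bileve, on_biobank, failed_bileve, failed_biobank):
    """Return True if a variant is removed under the batch-failure rules."""
    if on_bileve and on_biobank:
        # Both arrays: three or more failures across the 33 batches
        return failed_bileve + failed_biobank >= 3
    if on_bileve:
        # UK BiLEVE array only: more than one of eleven batches
        return failed_bileve > 1
    if on_biobank:
        # UK Biobank array only: more than two of 22 batches
        return failed_biobank > 2
    raise ValueError("variant must be present on at least one array")

decisions = [
    removed_before_imputation(True, False, 1, 0),
    removed_before_imputation(True, False, 2, 0),
    removed_before_imputation(False, True, 0, 2),
    removed_before_imputation(False, True, 0, 3),
    removed_before_imputation(True, True, 1, 1),
    removed_before_imputation(True, True, 1, 2),
]
```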
Multiallelic variants and variants with MAF < 0.01 were then removed, as were non-autosomal variants. The UK Biobank and UK
BiLEVE study samples were then jointly phased and imputed using a combined 1000 Genomes Phase 3-UK10K panel. Phasing was
conducted using SHAPEIT3 (O’Connell et al., 2016), a modified version of SHAPEIT2 (Delaneau et al., 2013), in chunks of 5,000 variants
with an overlap of 250 variants between chunks. Imputation was performed using IMPUTE3, a modified version of the IMPUTE2
software (Howie et al., 2011), in chunks of 2 Mb with a 250 kb buffer region. Post-imputation, variants with MAF < 0.00001 (1 in
100,000) were filtered from the dataset using QCTOOL (http://www.well.ox.ac.uk/gav/qctool/), leaving 72,355,667 variants for
analysis in the dataset.
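To make the phasing chunk layout concrete, the sketch below generates overlapping index windows over an ordered list of variants. It assumes that the 250-variant overlap is counted within each 5,000-variant chunk, which is one plausible reading of the description; the function name is hypothetical and this is not the SHAPEIT3 implementation.

```python
def overlapping_chunks(n_variants, chunk_size=5000, overlap=250):
    """Yield (start, end) half-open index ranges covering n_variants."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if n_variants <= 0:
        return
    step = chunk_size - overlap  # consecutive chunks share `overlap` variants
    start = 0
    while True:
        end = min(start + chunk_size, n_variants)
        yield start, end
        if end >= n_variants:
            break
        start += step

chunks = list(overlapping_chunks(12000))
```

For 12,000 variants this gives three chunks, with each consecutive pair sharing 250 variants.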
INTERVAL
Prior to imputation, additional variant QC steps were performed to establish a high quality imputation scaffold. We imposed a global
HWE filter of p value < 5 × 10⁻⁶, a call rate filter of 99% over the batches in which a variant had not failed, and a global call rate filter of
75% (effectively ensuring a variant passed in at least eight of the ten batches). Finally we removed all monomorphic variants.
Non-autosomal and multi-allelic variants were removed as part of the QC process and the dataset was then phased using
SHAPEIT3, with the same criteria used for UK Biobank (chunks of 5,000 variants with an overlap of 250 variants between chunks)
and subsequently imputed using the same combined 1000 Genomes Phase 3-UK10K imputation panel described above. Imputation
was performed on the Sanger Imputation Server (https://imputation.sanger.ac.uk), which uses the PBWT imputation algorithm (Durbin,
2014) and analyses whole chromosomes. No imputation quality or variant frequency filters were applied, resulting in 87,696,910
imputed variants in the dataset.
Using whole-exome sequencing (WES) data for 3,976 INTERVAL study participants who were also in our post-QC imputation
dataset, we were able to assess imputation accuracy. We adapted two metrics (Linderman et al., 2014) to compare genotype
data to sequencing data for these purposes. The first was non-reference concordance, which considers all heterozygotes and minor
allele homozygotes in the WES dataset and calculates the proportion seen in the imputed dataset. The second was precision, which
considers all the heterozygous and minor allele homozygotes in the imputed dataset, and calculates what proportion of these calls
was correct according to the WES dataset. For 146 missense, loss-of-function or rare high-impact (beta > 0.5SD) variants that
passed QC in the WES dataset, we observed a median non-reference concordance of 98.6%, 98.8% and 93.9% in common
(MAF > 0.05), low-frequency (MAF > 0.01 and MAF ≤ 0.05) and rare variants (MAF < 0.01) respectively and median precision of
99.5%, 99.3% and 98.5% in common, low-frequency and rare variants respectively.
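The two accuracy metrics can be made concrete with a small sketch. It assumes hard genotype calls coded as minor-allele counts (0, 1 or 2) for the same participants in both datasets; the function name and the toy data are illustrative and are not taken from the study's pipeline.

```python
def imputation_accuracy(wes, imputed):
    """Return (non_reference_concordance, precision) for one variant.

    wes, imputed: equal-length lists of genotype calls per participant,
    coded as minor-allele counts 0, 1 or 2 (an assumed encoding).
    """
    pairs = list(zip(wes, imputed))
    # Heterozygous / minor-homozygous calls in the sequencing data.
    wes_nonref = [(w, g) for w, g in pairs if w > 0]
    # Heterozygous / minor-homozygous calls in the imputed data.
    imp_nonref = [(w, g) for w, g in pairs if g > 0]
    # Proportion of sequenced non-reference calls reproduced by imputation.
    concordance = sum(w == g for w, g in wes_nonref) / len(wes_nonref)
    # Proportion of imputed non-reference calls confirmed by sequencing.
    precision = sum(w == g for w, g in imp_nonref) / len(imp_nonref)
    return concordance, precision


wes = [0, 1, 1, 2, 0, 1]       # toy sequencing calls
imputed = [0, 1, 0, 2, 0, 1]   # toy imputed calls
print(imputation_accuracy(wes, imputed))
```

In this toy example one sequenced heterozygote is imputed as reference, which lowers the concordance but not the precision. A variant with no non-reference calls in either dataset would need separate handling, since the denominators would be zero.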
Phenotype Measurement, QC, and Processing

Variability of Hematological Indices
We studied 36 hematological traits in individuals of European ancestry selected from the UK Biobank and INTERVAL studies (Figure 1;
Table S1). The traits comprise the main hematological indices of the seven types of cells reported in a standard clinical full blood
count (FBC) analysis and additional variables derived from them (Table S2), measuring properties of mature and immature red blood
cells (twelve indices), platelets (four indices) and myeloid and lymphoid white blood cells (twenty indices). The indices include cell
counts per unit volume of blood (e.g., the counts of the six types of myeloid cells and lymphocytes), ratios of cell counts (e.g., count
of neutrophils as the percentage of myeloid white cells), mean volume of platelets and red cells (MPV and MCV, respectively), pro-
portions of blood volume occupied by cells (e.g., hematocrit) and measurements of the concentration and mass distribution of
cellular hemoglobin (e.g., mean corpuscular hemoglobin [MCH]).
Exploiting extensive metadata on the blood cell index measurements and anthropometric covariates, we performed thorough
quality control to identify and remove sources of technical and non-genetic biological variation, increasing our power to detect ge-
netic associations. Technical covariables such as the time between venipuncture and FBC analysis, FBC instrument drift and cali-
bration events and episodes of malfunction, explained up to 16% of the variance of each index (Figure S1). Further, non-technical
sources of covariation such as age, sex and menopause status were shown to affect blood cell indices strongly, accounting for
up to 40% of variance in the residuals after adjustment for technical factors (Figure S2). We made flexible adjustments for age within
sex and menopause categories using semi-parametric regression. Additionally, using clinical knowledge, we selected a subset of
measured covariates to screen for association with indices in the UK Biobank dataset. Body mass index and variables measuring
smoking habits and alcohol consumption each explained at least 0.5% of variance in one or more of red cell, platelet or white
cell indices after adjustment for age and sex, and were thus included in our adjustments.
Measurement of Blood Cell Indices
Full blood counts were measured in UK Biobank and INTERVAL study participants using clinical hematology analyzers at the central-
ized processing laboratory of UK Biocenter (Stockport, UK). Research blood samples for the baseline assays of UK Biobank volun-
teers were collected into 4ml EDTA vacutainers by vacuum draw at the UK Biobank assessment centers and were then stored at 4
degrees centigrade. The samples were transported overnight to UK Biocenter in temperature-controlled shipping boxes.
For the INTERVAL baseline assays, research blood samples were taken from each participant through the satellite pouch of a
blood collection unit, with the venipuncture performed as part of a routine NHSBT whole blood donation (Moore et al., 2014). The
samples for FBC analysis were collected in 3ml EDTA tubes and were transported to NHSBT holding sites (‘hubs’) at Manchester,
Colindale (London) and Bristol, from where they were taken overnight by courier to UK Biocenter. The INTERVAL blood samples
were kept predominantly at ambient temperature from the time of collection to the time of measurement.
At UK Biocenter, the UK Biobank whole blood samples were processed using four Beckman Coulter LH700 Series instruments
while the INTERVAL samples were processed using two Sysmex XN-1000 instruments. The two models of analyzer both measure
full blood counts by a combination of fluorescence and impedance flow cytometry. However, there are some differences in the
cytometric methods used by the instruments to distinguish and count the different types of blood cell. The different analysis tech-
niques require different manufacturer-supplied reagents to treat, lyse and fluoresce the cells, which can result in differences in
the measurement response.
Technical Sample Exclusions
As a blood sample ages, the accuracy of a full blood count (as a measurement of the properties of peripheral blood at the time of
venipuncture) deteriorates. The exact consequences of sample aging depend on the measurement techniques used by the instru-
ment. For example, blood cell membranes become more elastic as a sample ages. Consequently, if the analyzer uses a hypotonic
solution, cells in an older sample tend to swell more at the point of measurement than cells in a younger sample. This excess swelling
leads to bias in the measurement of traits determined as a function of plateletcrit (PCT) or hematocrit (HCT) (Ulset et al., 2014). It may
also become more difficult to differentiate between the types of white cell as a sample ages and very old samples are likely to suffer
from hemolysis (Sowemimo-Coker, 2002).
Greater than 99% of UK Biobank baseline FBCs and 98% of INTERVAL baseline FBCs were measured fewer than 48 hr after veni-
puncture. Respectively 72% and 75% of the FBCs were measured fewer than 24 hr after venipuncture. Although clinical laboratories
do not usually issue FBC reports measured on samples aged for more than about 12 hr, FBCs from blood samples below clinical
standard may still add useful information to genetic association analysis. However there is a tradeoff; the inclusion of very noisy sam-
ples may reduce power if the marginal increase in sample size is insufficient to compensate for the reduction in signal to noise ratio.
Consequently, we excluded participants from the association analysis if they had FBCs measured more than 36 hr after venipuncture.
This removed 11,490 participants from the UK Biobank phenotype dataset, 3,365 of whom had been genotyped, and 1,490 partic-
ipants from the INTERVAL phenotype dataset.
The Coulter analyzers distinguish platelets from red cells by impedance (Table S1), a proxy for cell volume. Consequently small red
cells can sometimes be confused with large platelets. Sysmex analyzers also use impedance to measure platelet volume, but they
measure platelet count by both fluorescence flow cytometry and impedance and routinely report the former measurement. Sysmex
instruments flag measurements of mean platelet volume (MPV) greater than 13 as unreliable on the grounds that the large volume
measurements suggest contamination of the platelet impedance channel by red cells. We excluded such data points from the
INTERVAL analysis. In order to similarly protect against contamination of platelet variables by red cells in the UK Biobank dataset,
we removed all platelet trait data from FBCs with a technically adjusted MPV measurement larger than the 96th percentile.
Blood Cell Index Data Adjustments
In order to optimize the power to detect allelic associations, we adjusted the baseline blood cell index values from the INTERVAL and
UK Biobank datasets to remove variance explained by technical, environmental and sex effects. We adjusted the data from the
INTERVAL and UK Biobank studies independently because of differences between the study populations, the differences between
the sample collection protocols and the use of different models of hematology analyzer. At the time we carried out the analyses
described in the present publication, the UK Biobank study investigators had released genetic data for approximately one third of
the cohort, which included the UK BiLEVE participants as a subset. However, we chose to adjust the phenotypes from the entire
UK Biobank baseline blood indices dataset (n = 476,675) in order to estimate covariate effects as precisely as possible. Our pheno-
typic adjustments are more extensive than has hitherto been usual for genome-wide association studies. Covariate adjustment ab-
sorbs variance (i.e., ‘‘uses up degrees of freedom’’) and we do not model this directly in the association analyses. However, we do use
Genomic Control (see below), which corrects the test statistics for this modelling omission.
We made the adjustments differently by blood cell index and by analyzer model according to whether the index was measured
or derived. For each analyzer model the measured indices are a minimal subset of indices from which the full set of indices can
be deterministically calculated (Table S1). These subsets were chosen, from all possible minimal subsets, to correspond as closely
as possible to direct independent measurements made by the analyzers. The derived indices are the indices complementary to the
measured indices and which can therefore be calculated from them.
We made the blood cell index adjustments in two stages. In the first stage we removed technical outliers and independently
adjusted each measured index for technical and seasonal covariates. We then recomputed the derived indices from the measured
indices. In the second stage we adjusted both measured and derived indices for non-seasonal environmental covariates and for sex.
FBC indices divide into variables that have a population distribution with positive support (counts and concentrations), and vari-
ables that have a population distribution with support in [0,1] or [0%,100%] (cell count ratios and volume proportions). We adjusted
the positively supported indices on the log-scale and the proportion-supported indices on the logit scale. We call these scales the
adjustment-scales for the indices. To adjust platelet distribution width and red cell distribution width, we computed the standard
deviations of the platelet volume and red-cell volume distributions, adjusted these on the log-scale and then recomputed the distri-
bution widths as coefficients of variation.
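A minimal sketch of the adjustment scales follows. It assumes that proportions are expressed in [0, 1] and that a distribution width is the coefficient of variation (standard deviation divided by mean volume); the function names and support labels are illustrative only.

```python
import math


def to_adjustment_scale(value, support):
    """Map an index value to its adjustment scale.

    support is "positive" (counts, concentrations) or "proportion"
    (ratios expressed in [0, 1]); both labels are illustrative names.
    """
    if support == "positive":
        return math.log(value)
    if support == "proportion":
        return math.log(value / (1.0 - value))  # logit
    raise ValueError(f"unknown support: {support}")


def from_adjustment_scale(x, support):
    """Inverse of to_adjustment_scale."""
    if support == "positive":
        return math.exp(x)
    if support == "proportion":
        return 1.0 / (1.0 + math.exp(-x))
    raise ValueError(f"unknown support: {support}")


def width_to_log_sd(width, mean_volume):
    """Distribution width (coefficient of variation) -> log SD of cell volume."""
    return math.log(width * mean_volume)


def log_sd_to_width(log_sd, mean_volume):
    """Recompute the width as a coefficient of variation from a log SD."""
    return math.exp(log_sd) / mean_volume


hct = 0.42  # hematocrit as a proportion (toy value)
x = to_adjustment_scale(hct, "proportion")
print(x, from_adjustment_scale(x, "proportion"))
```

Working on these scales keeps adjusted values inside the valid range of each index once they are transformed back.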
Technical and Seasonal Variance
Clinical FBCs, like all assays, are subject to measurement error. Moderate technical variation in FBCs is rarely a concern for clinicians
who use FBC reports to diagnose or exclude blood pathologies that cause a large deviation in a measured blood cell parameter from
its typical population value. However, the power of quantitative trait association analysis depends monotonically on the proportion of
variance explained by the associated allele. It is therefore important to remove as much technical variation from the measured trait
values as possible.
By visual inspection of within-instrument window-averaged time series for each blood cell index (e.g., plots of mean index value by
day of study, by week of study within machine, by time of day within machine) we identified for some or all of the measured indices for
both studies the following sources of technical variance (Figures S1 and S2):
• differences in the average index value by instrument
• short time periods during which the day-averaged value of the instrument reading deviated dramatically from the global
average value for the instrument over the duration of the study, probably due to temporary aberrant behavior of the instrument
• continuous long term drift in the average index value reported by the instrument over time
• time-discontinuities in average index values probably due to calibration events
• variation in the average index value by time of year
• variation in the average index value by age of sample i.e., time between venipuncture and measurement
• variation in the average index value by the time of day of measurement.
For each blood cell index, we used the central part of the data (the data differing from the median by less than 3.5 median absolute
deviations on the adjustment scale) to estimate the effect on the mean of the (adjustment scale transformed) index of within machine
time-dependent drifts, delay time between venipuncture and measurement, day of the week and time of year. We restricted the
model fit to the central part of the data in order to minimize influence from outlying data points. After fitting the regression model
we computed model residuals for the full dataset and used these residuals to compute index values adjusted for technical effects.
Specifically, we used the R package mgcv (https://cran.r-project.org/package=mgcv) (R Core Team, 2014; Wood, 2011) to fit a
generalized additive model (GAM) with the following regression equation:
$$
E\big(a(y_i)\big) = s\big[t(i) \otimes m(i)\big] + s\big[t_{\mathrm{day}}(i),\, t_{\mathrm{ven}}(i) \otimes m(i)\big] + \sum_{w \in \{\mathrm{mon},\ldots,\mathrm{sun}\}} \big[\mathbf{1}_{w(i) = w}\big] + c\big[t_{\mathrm{year}}(i)\big] + \sum_{m} \big[\mathbf{1}_{m(i) = m}\big]
$$
Here:
• $a$ denotes a function transforming the measured index data $y$ to the adjustment scale.
• $m(i)$ denotes the instrument used to acquire measurement $i$.
• $w(i)$ denotes the day of the week on which measurement $y_i$ was acquired.
• $t(i)$ denotes the time difference between the time of measurement of observation $i$ and midnight (am) on the first day of the
study.
• $t_{\mathrm{year}}(i)$ represents the difference between the time of measurement of observation $i$ and midnight (am) on the 1st of January on
the year in which observation $i$ was measured.
• $t_{\mathrm{day}}(i)$ represents the difference between the time of measurement of observation $i$ and midnight (am) on the day of
measurement.
• $t_{\mathrm{ven}}(i)$ represents the difference between midnight (am) on the day observation $i$ was measured and the time of venipuncture.
• Each term in square brackets represents a contribution to the linear predictor.
• $s[\,]$ indicates a smoothing term. For the univariate terms we smooth using P-splines, while for bivariate smooths we smooth
using thin plate splines.
• $c[\,]$ indicates a cyclic smoothing term, used here to model seasonal variation on a circle representing time of year.
• We use the symbol $\otimes$ to indicate the presence of an interaction between the smooth and the categorical variable to its right
(in both cases here, the instrument id $m(i)$).

The first term in the regression equation models long term drift effects, for which we fit a smooth with 50 knots, allowing a different
drift model within each machine. The second term (bivariate in $t_{\mathrm{day}}(i)$ and $t_{\mathrm{ven}}(i)$, for which we fit a smooth with 30 knots) jointly
absorbs variation due to mean drift within machine over the course of a day and mean drift caused by the time delay between
venipuncture and measurement. The cyclic term (30 knots) models seasonal effects for which we force consistency across the
instruments. The dummy variable terms model mean differences by day of the week and machine.
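The pattern of fitting on the central part of the data and then computing residuals for the full dataset can be sketched as below. This is not the GAM described above: a straight-line drift in a single time variable stands in for the smooth terms purely to keep the example self-contained, and all function names are illustrative.

```python
from statistics import median


def central_mask(values, n_mads=3.5):
    """True where a value lies within n_mads median absolute deviations of the median."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    return [abs(v - centre) < n_mads * mad for v in values]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjust_for_drift(t, y):
    """Fit the drift on central data only, then return residuals for every point."""
    keep = central_mask(y)
    intercept, slope = fit_line(
        [a for a, k in zip(t, keep) if k],
        [b for b, k in zip(y, keep) if k],
    )
    return [b - (intercept + slope * a) for a, b in zip(t, y)]


t = list(range(10))
y = [2.0 + 0.5 * a for a in t]
y[5] = 100.0  # one outlying measurement
residuals = adjust_for_drift(t, y)
print([round(r, 3) for r in residuals])
```

Because the outlying point is excluded from the fit, it does not distort the estimated drift, yet it still receives a residual and so remains available for later outlier screening.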
After making the adjustment for drift we sought to remove data-points due to periods of aberrant operation. After transforming the
index data to the adjustment scale, we computed a standardized score zd,m to measure the deviation of the day (d) and instrument (m)
specific average trait values from the global mean value:
$$
z_{d,m} = \sqrt{\big|\#\{a(y_i) : m(i) = m,\ d(i) = d\}\big|} \times \frac{\big|\operatorname{mean}\{a(y_i) : m(i) = m,\ d(i) = d\} - \operatorname{median}\{a(y_i)\}\big|}{\operatorname{median}\big\{\big|a(y_i) - \operatorname{median}\{a(y_i)\}\big|\big\}}
$$
Here:
• $a(y_i)$ represents the trait data for observation $i$ on the adjustment-scale, after correction for drift using the GAM.
• $d(i)$ denotes the day on which the index measurement for $i$ was acquired.
• Measurements acquired on day-instrument pairs with fewer than 10 data-points or for which $z_{d,m} > 8$ were excluded from
further analysis.
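The standardized score and the two exclusion rules can be sketched as follows. The sketch reads the score as the square root of the number of measurements in a day-instrument pair, multiplied by the absolute difference between the pair's mean and the global median, divided by the global median absolute deviation. The record layout and function names are assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, median


def day_instrument_scores(records):
    """records: iterable of (instrument, day, value) tuples, with value on the
    adjustment scale after drift correction (an assumed layout).

    Returns {(instrument, day): (n_points, z_score)}.
    """
    records = list(records)
    values = [v for _, _, v in records]
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    groups = defaultdict(list)
    for instrument, day, value in records:
        groups[(instrument, day)].append(value)
    return {
        key: (len(vals), math.sqrt(len(vals)) * abs(mean(vals) - centre) / mad)
        for key, vals in groups.items()
    }


def aberrant_pairs(records, min_points=10, max_z=8.0):
    """Day-instrument pairs with too few data points or an extreme score."""
    scores = day_instrument_scores(records)
    return {key for key, (n, z) in scores.items() if n < min_points or z > max_z}


records = (
    [("A", 1, v) for v in [-1.0, 1.0] * 5]  # ordinary day
    + [("A", 2, 10.0)] * 10                 # day with a shifted average
    + [("A", 3, 0.0)] * 3                   # day with too few data points
)
print(sorted(aberrant_pairs(records)))
```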
After making these exclusions we refitted the GAM for drift described above to obtain measured index values that are adjusted for
drift effects without the influence of data from aberrant days. We then recomputed the derived indices from the measured indices. For
some indices, the power gained from the adjustments for technical effects alone is equivalent to thousands of additional samples
(Figure S1).
Exclusions Based on Phenotypes and Covariates
We sought to exclude individuals with blood cancers or major blood disorders from the UK Biobank study on the grounds that, if
included, their noisy blood counts may reduce the power to detect genetic associations. Using data from the baseline health assessment
self-report, the linked cancer registry and linked hospital inpatient record summaries, we identified and removed individuals suffering
from blood cancers or other blood disorders. Specifically we excluded participants who had a self-report or medical history containing
a record of myelofibrosis, lymphoma, leukemia, malignant lymphoma, multiple myeloma, multiple myelofibrosis or myelodysplasia,
chronic lymphocytic leukemia, chronic myeloid leukemia, acute myeloid leukemia, polycythemia vera, polycythemia, a myeloprolifera-
tive disorder, essential thrombocytosis, a hematological cancer histology report, an unspecified lymphatic or general hematological
neoplasm, a myelodysplastic syndrome, or an unspecified heme malignancy, monoclonal gammopathy, an unspecified hereditary he-
matological disorder, hemochromatosis, thalassemia, hemophilia, sickle cell anemia, neutropenia, lymphopenia or pancytopenia. In
aggregate this excluded 5,045 participants from the UK Biobank phenotype dataset, of whom 1,611 had measured genotypes.
Since we had no access to detailed health record data on the INTERVAL participants, we did not make any similar exclusion for
INTERVAL. However, participants in the INTERVAL study are generally healthier than those in UK Biobank and are active whole blood
donors, therefore the incidence of blood disorders is likely to be substantially lower. Hematologists screened the baseline full blood
counts of INTERVAL participants and very few probable cases of leukemia were identified.
Non-seasonal Environmental and Variance Explained by Sex Differences
We adjusted all indices for environmental and sex differences using a GAM, again solely using the central part of the data (the data
after adjustment for technical effects, differing from the median by less than 3.5 median absolute deviations on the adjustment scale)
to fit the model. The measured environmental covariates differ between the INTERVAL and UK Biobank studies and consequently the
models we fitted differed slightly.
For the INTERVAL study dataset we fit a model with the following terms:
• A univariate smooth (30 knots) for age at venipuncture, with an interaction with a categorical variable describing menopausal status
with the following levels: male, female-premenopausal, female-postmenopausal, female-had-hysterectomy, no-answer, unsure
• A bivariate smooth (30 knots) for log-height and log-weight (which implicitly adjusts for body-mass index [BMI]), with the same
categorical interaction variable as for age
• A univariate smooth for pack-years of smoking
• A categorical variable describing current smoking habits with levels: never, special-occasions, rarely, occasional, most-days,
every-day, no-answer
• A categorical variable describing alcohol drinking status with levels: never, previous, current, no-answer
• A categorical variable describing current alcohol drinking habits with levels: never, special-occasions, 1-3-times-monthly,
1-2-times-weekly, 3-5 times weekly, most-days, no-answer.
For the UK Biobank study dataset we fit a model with the following terms:
• A univariate smooth (30 knots) for age at venipuncture, with an interaction with a categorical variable describing menopausal
status with the following levels: male, female-premenopausal, female-postmenopausal, female-had-hysterectomy, unsure
• A univariate smooth (30 knots) for number of days since last period (within women only)
• A bivariate smooth (30 knots) for log-height and log-weight (which implicitly adjusts for BMI), with the same categorical
interaction variable as for age
• A univariate smooth (30 knots) for quantity of alcohol consumed the day prior to recruitment
• A univariate smooth for pack-years of smoking
• A categorical variable describing current smoking habits with levels: never, special-occasions, rarely, occasional, most-days,
every-day, no-answer
• A categorical variable describing alcohol drinking status with levels: never, previous, current, no-answer
• A categorical variable describing current alcohol drinking habits with levels: never, special-occasions, 1-3-times-monthly,
1-2-times-weekly, 3-5 times weekly, most-days, no-answer.
For both datasets, where data-points were missing for a covariate, we imputed them by the mean covariate value and included a
dummy variable to allow the mean of the index value for individuals with missing data to differ from the mean index value for individ-
uals with non-missing data.
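Mean imputation with a missingness indicator can be sketched in a few lines. Missing values are represented here by `None`, which is an assumption about the data layout, and the function name is illustrative.

```python
def impute_with_indicator(values):
    """Replace missing covariate values by the observed mean and return a
    0/1 dummy variable marking which entries were missing."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    imputed = [fill if v is None else v for v in values]
    missing = [1 if v is None else 0 for v in values]
    return imputed, missing


pack_years = [1.0, None, 3.0]  # toy covariate with one missing entry
print(impute_with_indicator(pack_years))
```

Including the dummy variable as a regression term lets individuals with missing data have their own mean index value, so the imputed constant does not bias the fitted covariate effect.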
Removal of Outliers and Normalization
We removed observations by index for which there was a large difference between the raw measured index value and the adjusted
index value. Specifically, we removed a data point if the difference, on the adjustment scale, between the original raw measured data
and the adjusted data was more than 3.5 median absolute deviations from the median of the distribution of such differences for the relevant
index.
We removed outliers from the phenotype data. We first considered outliers in each marginal univariate distribution. For each index
on the adjustment scale, we removed all data-points lying more than 4.5 median absolute deviations from the median index value on
the adjustment-scale. We then grouped the indices as follows:
• MPV, PLT#, PDW, PCT (platelet traits)
• HGB, RBC#, MRV, MCV, MCH, MCHC, RDW, RDW, RET#, HLR, HCT, RET%, HLR%, IRF (mature and immature red cell traits)
• RET, HLR, RET%, HLR%, IRF (immature red cell traits)
• WBC#, NEUT#, MONO#, BASO#, EO#, LYMPH#, MYELOID, GRAN#, (EO+BASO)#, (NEUT+EO)#, (BASO+NEUT)#, NEUT%,
EO%, MONO%, LYMPH%, BASO%, GRAN%MYELOID, EO%GRAN, NEUT%GRAN, BASO%GRAN (white cell traits)
• NEUT#, BASO#, EO#, GRAN#, (EO+BASO)#, (NEUT+EO)#, (BASO+NEUT)#, EO%GRAN, NEUT%GRAN, BASO%GRAN
(granulocyte traits)
• NEUT#, MONO#, BASO#, EO#, MYELOID, GRAN#, (EO+BASO)#, (NEUT+EO)#, (BASO+NEUT)#, GRAN%MYELOID,
EO%GRAN, NEUT%GRAN, BASO%GRAN (myeloid white cell traits)
• all traits
After standardizing the variables on the adjustment scale, we performed a principal component analysis for each group and
computed the sum of squares of the leading d PC-scores where d is the number of independent measurements required to compute
the variables in each group. We compared the sum of squares to a χ² distribution with d degrees of freedom and removed outliers falling into the upper 10⁻⁷ tail
probability.
Finally, within each study we quantile-inverse-normal transformed the trait data within each level of a categorical variable formed
by crossing a categorical variable indexing the hematology analyzer with a categorical variable with the levels male, female-premen-
opausal, female-postmenopausal, female-had-hysterectomy, no-answer, unsure.
The final number of participants passing phenotype and genotype QC from each of the studies is shown in Table S2, along with
summary statistics for each blood cell index.
Association Tests, Meta-Analyses, and Identification of Distinct Associations

Univariable GWAS
Genetic and phenotypic QC retained 173,480 participants (87,265 in the UK Biobank study dataset, 45,694 in the UK BiLEVE study
dataset and 40,521 in the INTERVAL study dataset). We performed a univariable GWAS for each of the 36 blood cell indices that had
phenotype data measured or derived in all three studies. Specifically, we computed the association statistics (i.e., an estimate of the
regression coefficient and the corresponding standard error) for a mixed linear regression of phenotype on the probabilistic imputed
allele dose (i.e., an additive model) separately for each of the three datasets using BOLT-LMM v2.2 (Loh et al., 2015). The linear mixed
model accounts for the genetic component of phenotypic correlation generated by relatedness. In order to maximize protection
against confounding by large scale relatedness, we included a dummy variable for each recruitment center and the first ten PCs
of the study specific kinship matrices as covariates in each regression model.
Meta-Analyses and Significance Threshold
Having performed univariable GWAS within each study, we then combined the results across the three studies using meta-analysis.
For inclusion in the meta-analysis, a variant had to have a study-specific MAF > 0.01%, an imputation dataset-specific information
score greater than 0.4, and non-missing effect size estimates and standard errors in all three datasets. In total, 29.5 million variants were
retained. We performed an inverse variance weighted meta-analysis using METAL (Willer et al., 2010). To guard against confounding by
unmodeled relatedness, we performed double genomic control to adjust the pre and post meta-analyses standard errors for variance
inflation, with respect to a genome-wide null assumption. The inflation factors were estimated as ratios of the median of the observed
χ²₁ test statistics to the median of the χ²₁ distribution. Finally, using the meta-analyses summary statistics, we performed a Wald test
for each blood cell-index variant pair against the null hypothesis of no additive allelic association. We used the significance level α =
8.31 × 10⁻⁹, a threshold recently estimated for genome-wide analyses of common, low frequency and rare variants (UK10K Consortium
et al., 2015; Xu et al., 2014).
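The building blocks of this step (inflation factor, standard error correction, inverse variance weighting and the Wald test) can be sketched with the standard formulas below. This is an illustration and not METAL's implementation; in particular, leaving standard errors unchanged when the inflation factor is below one is a common convention that is assumed here.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(betas, ses):
    """Median observed chi-square statistic over the null median."""
    return median([(b / s) ** 2 for b, s in zip(betas, ses)]) / CHI2_1_MEDIAN


def gc_adjust_se(se, lam):
    """Inflate a standard error by sqrt(lambda); no deflation if lambda < 1 (assumed)."""
    return se * math.sqrt(max(lam, 1.0))


def ivw_meta(betas, ses):
    """Inverse variance weighted estimate and its standard error."""
    weights = [1.0 / s ** 2 for s in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))


def wald_p(beta, se):
    """Two-sided p value of the Wald test against beta = 0."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


beta, se = ivw_meta([0.10, 0.10, 0.10], [0.02, 0.02, 0.02])
print(beta, se, wald_p(beta, se) < 8.31e-9)
```

Double genomic control applies `gc_adjust_se` once to each study's standard errors before pooling, and once more to the pooled standard errors, each time with an inflation factor estimated genome-wide.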
Heterogeneity Filtering
Substantial statistical evidence for heterogeneity in effect sizes between the studies of a meta-analysis for a genome-wide significant
variant is often taken to suggest a false-positive association. However, effect size heterogeneity in GWAS can be generated by:
• population-genotype interactions (i.e., true allelic effect size differences between studies),
• variation in LD between study populations,
• study specific quantile-inverse-normal transformations, when there are differences in the adjustment of phenotypes for covariates
between studies,
• differences in genotyping measurement error between studies (when independent of phenotype, such errors tend to bias associations
toward the null) and
• differences in phenotyping measurement techniques between studies.
None of these is necessarily a reason to regard an observed population association as spurious.
Due to the high power of the present analysis, we found that common variants showing directionally concordant evidence for
association across the three studies were often removed when we filtered variants by thresholding a statistic measuring evidence
for quantitative heterogeneity in effect size (Cochran’s Q). Consequently, we devised an alternative (generalized) statistic to detect
heterogeneity in effect size that we regard as implausible for genuine population associations. The three dimensional plot (Figure S2E)
illustrates our approach.
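For reference, the standard Cochran's Q statistic mentioned above can be computed as in the sketch below; the alternative statistic devised for this analysis is not reproduced here. With three studies Q has two degrees of freedom under homogeneity, for which the chi-square tail probability is exactly exp(-Q/2). The effect sizes in the example are invented for illustration.

```python
import math


def cochran_q(betas, ses):
    """Weighted sum of squared deviations of study effects from the pooled effect."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))


def q_p_value_three_studies(q):
    """Upper tail of a chi-square distribution with 2 degrees of freedom."""
    return math.exp(-q / 2.0)


# Directionally concordant effects estimated with very small standard errors.
q = cochran_q([0.10, 0.12, 0.14], [0.005, 0.005, 0.005])
print(q, q_p_value_three_studies(q))
```

The toy effects all have the same sign and similar magnitude, yet the small standard errors make Q large. This is the behavior described above: in a highly powered analysis, thresholding Q can discard variants whose association is consistent across studies.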
Model Selection by Stepwise Multiple Regression
Many of our observed associations likely reflect the same underlying causal signal due to LD between the variants. For each blood
index, we therefore sought to identify a parsimonious set of genetic variants explaining the genome-wide significant associations by
stepwise multiple linear regression, using the fastLM implementation in the R package RcppEigen. We first partitioned the blood in-
dex-specific genome-wide significant variants into the unique minimal set of blocks such that no block could be further partitioned
into subsets of variants separated by at least 10Mb. We then performed a block and blood index-specific bidirectional stepwise
model selection procedure, combining the individual level data from all three studies. Every regression model we assessed included
the covariates used in the original marginal analyses (i.e., study-specific principal component scores and dummy variables for
recruitment center). Additionally, we included dummy variables to absorb between-study blood index variation, an adjustment which
was intrinsic to the meta-analyses of marginal associations.
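The partition of significant variants into blocks can be sketched as a single pass over sorted positions. The sketch assumes positions in base pairs on one chromosome; the function name is illustrative.

```python
def partition_into_blocks(positions, min_gap=10_000_000):
    """Split variant positions on one chromosome into the minimal blocks such
    that consecutive blocks are separated by at least min_gap base pairs."""
    blocks = []
    for pos in sorted(positions):
        if blocks and pos - blocks[-1][-1] < min_gap:
            blocks[-1].append(pos)   # closer than min_gap to the previous variant
        else:
            blocks.append([pos])     # gap of at least min_gap: start a new block
    return blocks


print(partition_into_blocks([1_000_000, 5_000_000, 30_000_000, 14_000_000]))
```

Note that a block can span more than 10 Mb: in the example the variants at 1 Mb and 14 Mb share a block because the variant at 5 Mb links them.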
The stepwise procedure started with the ‘empty’ model, containing only covariates as predictors. At each step we adjusted the
model by:
1. adding the unmodeled variant with the smallest p value for association with the residuals of the current model, providing that
such a p value was below the genome-wide significance level (8.31 × 10⁻⁹)
2. iteratively pruning variants from the model when the p value comparing the current model with the sparser model was greater
than the genome-wide significance level, with the variant corresponding to the largest such p value being pruned at each
iteration.
When neither step 1 nor step 2 was possible the procedure terminated. We modeled only the additive effects of the imputed allele dosages.
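The control flow of the bidirectional procedure can be sketched independently of the regression itself. In the sketch below, `p_given` is a hypothetical callable standing in for a regression fit (in the analysis these p values come from multiple linear regression with covariates), and the toy p values are invented. The `max_rounds` guard is an addition for safety and is not part of the described procedure.

```python
def stepwise_select(candidates, p_given, alpha=8.31e-9, start=(), max_rounds=1000):
    """Bidirectional stepwise selection driven by a user-supplied p-value function.

    p_given(variant, others) is a hypothetical callable returning the p value
    for `variant` in a model that also contains the variants in `others`.
    """
    model = list(start)
    for _ in range(max_rounds):
        changed = False
        # Step 1: add the most significant unmodeled variant if it passes alpha.
        outside = [v for v in candidates if v not in model]
        if outside:
            best = min(outside, key=lambda v: p_given(v, model))
            if p_given(best, model) < alpha:
                model.append(best)
                changed = True
        # Step 2: repeatedly prune the variant with the largest p value above alpha.
        while model:
            def drop_p(v):
                return p_given(v, [u for u in model if u != v])
            worst = max(model, key=drop_p)
            if drop_p(worst) <= alpha:
                break
            model.remove(worst)
            changed = True
        if not changed:
            break
    return model


MARGINAL = {"a": 1e-20, "b": 1e-15, "c": 1e-12}  # invented toy p values


def toy_p(variant, others):
    """Toy stand-in: 'b' loses its signal once 'a' is in the model."""
    if variant == "b" and "a" in others:
        return 0.3
    if variant == "a" and "b" in others:
        return 1e-10
    return MARGINAL[variant]


print(stepwise_select(["a", "b", "c"], toy_p))                         # from the empty model
print(stepwise_select(["a", "b", "c"], toy_p, start=["a", "b", "c"]))  # from a saturated model
```

Starting from the empty model corresponds to the within-block search, while starting from a model containing all candidates corresponds to a search over a merged set of variants.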
After identifying a terminal set of variants for each block, we merged the variants for each blood index across blocks and ran the
same stepwise procedure but on the merged set of variants for each index, starting with the saturated model. This ensured selection
of a set of variants for each index that were mutually conditionally significant at the genome-wide level, accounting for any residual LD
over 10Mb. Although the stepwise procedure made no adjustment of p values to account for the model search, it also ignored addi-
tional strong evidence for associations from the apposition of distinct signals. Our genome-wide significance level is conservative, so
the selected variants for each index are likely to represent causally distinct signals, except in regions where imputation is imprecise
(where multiple variants may tag a single causal signal).
We report univariable and multivariable summary association statistics for the variants with conditionally significant associations in
Table S4.
Consensus Set of Variants over Blood Indices
Because we performed a distinct model selection procedure for each blood cell index, a locus that was associated with
multiple indices could be represented by different sentinel variants. To identify conditional variants reflecting the same signals, we
clumped the selected set of variants from all indices using pairwise LD. First, we identified the set of variants considered conditionally
significantly associated with at least one index after model selection. We then ‘clumped’ the variants by taking each conditionally
significant variant in turn and looking for other conditionally significant variants in LD with it (r² > 0.8 in the UK Biobank dataset). If other
variants with r² > 0.8 were found, then these variants were assigned to the same clump. If there were no such variants, then the new
variant was assigned to a new clump. The process was repeated until each variant was assigned to a single clump. We report the
summary information for each clump in Table S3. We defined the sentinel variant within each clump as the variant with the smallest
univariable association p value across all indices.
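One reading of the clumping procedure is a greedy pass in which each unassigned variant opens a clump and collects every other unassigned variant in strong LD with it. The sketch below follows that reading; the pairwise r² lookup, the variant identifiers and the p values are all hypothetical.

```python
def clump_variants(variants, r2, threshold=0.8):
    """Greedy LD clumping.

    variants: list of variant ids, visited in order.
    r2: dict mapping frozenset({a, b}) to pairwise r^2 (missing pairs count as 0).
    """
    clumps, assigned = [], set()
    for v in variants:
        if v in assigned:
            continue
        partners = [
            u for u in variants
            if u != v and u not in assigned
            and r2.get(frozenset((v, u)), 0.0) > threshold
        ]
        clump = [v] + partners
        assigned.update(clump)
        clumps.append(clump)
    return clumps


def sentinel(clump, min_p):
    """Variant with the smallest univariable p value across all indices."""
    return min(clump, key=lambda v: min_p[v])


variants = ["var1", "var2", "var3"]                      # hypothetical ids
r2 = {frozenset(("var1", "var2")): 0.95,
      frozenset(("var2", "var3")): 0.10}                 # hypothetical LD values
min_p = {"var1": 1e-12, "var2": 1e-30, "var3": 1e-9}     # hypothetical p values
clumps = clump_variants(variants, r2)
print(clumps, [sentinel(c, min_p) for c in clumps])
```

Under this greedy reading the result can depend on the order in which variants are visited when LD is not transitive, so a fixed ordering would be needed for reproducibility.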
Annotation of Associated Variants

Conditionally Significant Variant Annotation
We queried dbSNP v14 to retrieve rsIDs for all variants if available (Sherry et al., 2001). All conditionally significant variants
were further annotated using the Ensembl Variant Effect Predictor (VEP) with Ensembl v83 and Gencode v24 for gene annotations
(McLaren et al., 2016). Annotations were retrieved using the ‘‘most severe’’ option, which considers variant annotations across all
genes and transcripts and selects the consequence with the greatest severity in terms of potential functional consequence (Table
S4). Where the most severe consequence affected multiple genes (e.g., a variant that is intronic in overlapping genes), we listed
all genes.
Associations with Traits and Complex Diseases
To identify whether our blood cell trait-associated signals were novel, we extracted previously reported sentinel variants associ-
ated with red blood cell traits, white blood cell traits or platelets from a recent review of published GWAS (Vasquez et al., 2016),
supplemented by a literature review to identify more recent genetic studies of blood cell traits (Chami et al., 2016; CHARGE
Consortium Hematology Working Group, 2016; Eicher et al., 2016; Polfus et al., 2016; Schick et al., 2016; Tajuddin et al.,
2016; Ulirsch et al., 2016). We defined a locus as ‘previously reported’ if our sentinel variant, or any of its strong proxies (defined
as r2 > 0.8 in European participants from the 1000 Genomes project Phase 3 or the UK10K project) had been previously reported
(Table S3).
To identify whether our signals have also been associated with other traits or disease outcomes, we interrogated PhenoScanner
(www.phenoscanner.medschl.cam.ac.uk), a variant-phenotype database capturing a wide range of large-scale genetic association
studies, primarily from GWAS. The database includes the NHGRI-EBI GWAS Catalog (Welter et al., 2014), the GRASP database (Le-
slie et al., 2014), plus more than 100 publicly available sets of summary statistics from published studies. For each of our sentinel
variants, we identified all proxies with r2 ≥ 0.8 in the European participants from 1000 Genomes Phase 3 or the UK10K project.
We then retrieved all associations with p value < 5 × 10⁻⁸. Associations were flipped across proxies and traits to achieve a consistent
direction of effect for each sentinel variant. For ease of interpretation, we split associations into three categories: expression QTL,
metabolites and other traits or diseases (Table S5).
Annotation of Clinically Relevant Genes and Variants
First, we annotated all strong proxies (r2 ≥ 0.8) of our sentinel variants using VEP as described above and identified coding variants
likely to have functional consequences (i.e., missense, nonsense, frameshift, splice site). Second, we took a systematic approach to
identifying likely causal genes in regions identified to be associated with blood cell indices, using sets of genes known to cause
relevant rare diseases from ClinVar and the set of genes that contain the alleles defining red cell, platelet and neutrophil antigens.
ClinVar is a manually curated database of genetic variants that have evidence for a pathogenic role in human disease or phenotypes
(Landrum et al., 2016).
Integration with BLUEPRINT Cell Type Specific Epigenetic Data
As part of the BLUEPRINT project, ChromHMM (Ernst and Kellis, 2012) was used to segment the genomes of primary blood cells into
regulatory states obtained from histone marks - H3K4me3, H3K4me1, H3K36me3, H3K27ac and H3K9me3 - and DNaseI hypersen-
sitive sites. The regulatory states are as follows: E1:Transcription low signal H3K36me3, E2:Transcription high signal H3K36me3,
E3:Heterochromatin high signal H3K9me3, E4:Low signal, E5:Repressed Polycomb high signal H3K27me3, E6:Repressed Poly-
comb low signal H3K27me3, E7:Repressed Polycomb TSS high signal H3K27me3 & H3K4me3 & H3K4me1, E8:Enhancer high
signal H3K4me1, E9:Active Enhancer high signal H3K4me1 & H3K27Ac, E10:Active TSS high signal H3K4me3 & H3K4me1,
E11:Active TSS high signal H3K4me3 & H3K27Ac.
We focused on the cell types matched as closely as possible to the GWAS traits, specifically CD34-negative CD41-positive CD42-
positive megakaryocytes (cord blood, 2 samples), erythroblasts (cord blood, 2 individuals), CD14-positive CD16-negative classical
monocytes (venous blood, 2 individuals), mature neutrophils (venous blood, 4 individuals), mature eosinophils (venous blood, 2 in-
dividuals), naive B cells (venous blood, 3 individuals) and CD4-positive alpha beta T cells (venous blood, 4 individuals). We merged
the segmentations across individuals by defining consensus states based on majority vote plus one (e.g., for cell types measured in 2
individuals, both individuals must be called in a region as ‘‘Transcription High Signal - H3K36me3’’ for that to be the consensus call
in the region).
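The consensus rule can be illustrated with a short sketch. It assumes that "majority vote plus one" means a state needs at least floor(n/2) + 1 of the n individuals, which reproduces the two-individual example in the text; the behavior for other n is an assumption, as is returning no call when the requirement is not met.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_state(calls: Sequence[str]) -> Optional[str]:
    """Return the consensus chromatin state for one region, or None if no
    state reaches floor(n / 2) + 1 votes among the n individuals."""
    if not calls:
        return None
    needed = len(calls) // 2 + 1
    state, votes = Counter(calls).most_common(1)[0]
    return state if votes >= needed else None


# State labels follow the E1..E11 naming used for the segmentation.
two_agree = consensus_state(["E2", "E2"])
two_disagree = consensus_state(["E2", "E1"])
three_of_four = consensus_state(["E9", "E9", "E9", "E8"])
split_four = consensus_state(["E9", "E9", "E8", "E8"])
print(two_agree, two_disagree, three_of_four, split_four)
```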
We used LD score regression v1.0.0 (Finucane et al., 2015) to estimate the heritability due to common (MAF > 5%) genetic variants
for each trait and to partition that heritability across regulatory states estimated from epigenomic data measured in matched cell
types. We generated LD scores using the HapMap3 common variants measured in 1000 Genomes Europeans (excluding Finns).
We then partitioned the heritability into regulatory states estimated by the BLUEPRINT consortium.

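LD score heritability estimates are based on summary statistics and are biased by genomic control adjustment. Consequently, we
adjusted each raw heritability estimate by the factor
\[
\lambda_{\mathrm{META}} \,
\frac{\dfrac{n_{\mathrm{INTERVAL}}}{\lambda_{\mathrm{INTERVAL}}} + \dfrac{n_{\mathrm{UKBiLEVE}}}{\lambda_{\mathrm{UKBiLEVE}}} + \dfrac{n_{\mathrm{UKBiobank}}}{\lambda_{\mathrm{UKBiobank}}}}
{\dfrac{n_{\mathrm{INTERVAL}}}{\lambda^{2}_{\mathrm{INTERVAL}}} + \dfrac{n_{\mathrm{UKBiLEVE}}}{\lambda^{2}_{\mathrm{UKBiLEVE}}} + \dfrac{n_{\mathrm{UKBiobank}}}{\lambda^{2}_{\mathrm{UKBiobank}}}} ,
\]
where each λ corresponds to a genomic control inflation factor (Table S2), to undo approximately the effect of our genomic control
adjustments.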
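As a worked illustration, the adjustment factor can be read as λ_META multiplied by the ratio of the sum of n/λ to the sum of n/λ² over the three studies. The sketch below assumes this reading and assumes that each n is the study sample size; the numerical inputs are hypothetical and are not the values in Table S2.

```python
from typing import Dict


def gc_undo_factor(n: Dict[str, float], lam: Dict[str, float], lam_meta: float) -> float:
    """Factor applied to a raw heritability estimate to approximately undo
    per-study and meta-analysis genomic control adjustments."""
    numerator = sum(n[study] / lam[study] for study in n)
    denominator = sum(n[study] / lam[study] ** 2 for study in n)
    return lam_meta * numerator / denominator


# Hypothetical sample sizes and inflation factors, for illustration only.
sizes = {"INTERVAL": 100.0, "UKBiLEVE": 200.0, "UKBiobank": 300.0}
lambdas = {"INTERVAL": 1.1, "UKBiLEVE": 1.1, "UKBiobank": 1.1}
factor = gc_undo_factor(sizes, lambdas, lam_meta=1.05)
print(round(factor, 4))
```

When every study shares the same inflation factor λ, the expression reduces to λ_META × λ, which gives a quick sanity check on the arithmetic.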
In order to measure systematically the statistical significance of the overlaps between our blood cell index-associated variants and
BLUEPRINT epigenetic data, we used GARFIELD (Iotchkova et al., 2016b), a novel enrichment analysis approach that uses genome-
wide association summary statistics to calculate odds ratios for association between annotation overlap and disease status at given
genome-wide statistical significance thresholds. Tests for significance are implemented via a generalized linear modeling framework
accounting for LD, minor allele frequency (MAF), and local gene density. LD (r2) was calculated with PLINK v1.9 using variants from
the combined UK10K and 1000 Genomes Phase 3 European cohorts in 1 Mb windows. Overlap of blood cell index-associated variants
with BLUEPRINT annotations was based on genomic position overlap or LD tagging (r2 ≥ 0.8). Variants significantly associated
with blood cell indices were ‘greedily’ pruned by sequentially retaining the most significant variant and pruning around it (LD r2 ≥ 0.1)
until no significant variants remained. This approach tries to ensure independence of variants in the enrichment tests, while ensuring
that we retain the most significantly associated variants. We tested for enrichment all variants with MAF ≥ 1% reaching a p value of
1 × 10⁻⁸ and performed multiple testing correction based on the number of traits, segmentation states and cell types used.
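The greedy pruning step can be sketched as below. This is an illustrative reading of the description rather than the GARFIELD implementation: the variant names, p values and LD values are hypothetical, and ties in p value are broken alphabetically purely to make the example deterministic.

```python
from typing import Callable, Dict, List


def greedy_prune(
    pvalues: Dict[str, float],
    r2: Callable[[str, str], float],
    p_threshold: float,
    r2_threshold: float = 0.1,
) -> List[str]:
    """Repeatedly keep the most significant remaining variant and discard
    every remaining variant with r2 >= r2_threshold to it."""
    remaining = {v for v, p in pvalues.items() if p <= p_threshold}
    kept: List[str] = []
    while remaining:
        lead = min(remaining, key=lambda v: (pvalues[v], v))
        kept.append(lead)
        remaining = {
            v for v in remaining if v != lead and r2(lead, v) < r2_threshold
        }
    return kept


# Hypothetical association p values and pairwise LD.
pvals = {"varA": 1e-30, "varB": 1e-12, "varC": 1e-9, "varD": 1e-3}
pair_ld = {
    frozenset(("varA", "varB")): 0.50,
    frozenset(("varA", "varC")): 0.02,
    frozenset(("varB", "varC")): 0.30,
}


def pair_r2(a: str, b: str) -> float:
    """Look up r2 for an unordered pair; unlisted pairs are treated as unlinked."""
    return pair_ld.get(frozenset((a, b)), 0.0)


independent = greedy_prune(pvals, pair_r2, p_threshold=1e-8)
print(independent)
```

Here varB is pruned because of its LD with the lead variant varA, varC survives because it is nearly independent of varA, and varD never enters because it does not reach the significance threshold.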
Integration with BLUEPRINT Molecular QTL Data
Many of the common variants we discovered were non-coding (i.e., intronic, intergenic, in 5′ or 3′ untranslated regions or were just
upstream or downstream of genes) suggesting they may act through regulatory mechanisms. To investigate this, we tested coloc-
alization of the 29.5 million variants we included in our GWAS of blood indices with BLUEPRINT molecular QTL data (Table S6) using
the software SMR (Summary data-based Mendelian Randomization) (Zhu et al., 2016). The BLUEPRINT QTL data consists of expres-
sion QTL (eQTL), splicing QTL (sQTL) and a histone mark H3K4me1 (hQTL) identifying sites of active or poised enhancers in 200
European samples (Chen et al., 2016). Data were available for monocytes, neutrophils and T cells, hence we restricted our annotation
to loci that were associated with myeloid or lymphoid cell indices. SMR takes the variant with the most statistically significant association
with each QTL (defined as p < 5 × 10⁻⁸), then tests whether the ratio of that variant’s effect size with the QTL against its effect
size with each myeloid or lymphoid index is significant (p < 0.001). Having established the presence of a QTL and a blood cell index
association at the same locus, the software then proceeds to test whether this apparently colocalized signal is the result of linkage
(i.e., two independent signals in the same genomic region) or causality/pleiotropy (i.e., the same causal variant affects both the QTL
and the blood cell index). This is performed via a Heterogeneity In Dependent Instruments (HEIDI) test statistic, which assesses the
homogeneity of the ratio across variants in the region, with p > 0.05 indicating colocalization (Figure 6).
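The ratio test can be illustrated with a small sketch. It assumes the approximate SMR statistic described by Zhu et al. (2016), T = z_qtl² z_trait² / (z_qtl² + z_trait²), compared against a chi-squared distribution with one degree of freedom; the effect sizes and standard errors are hypothetical, the function name is illustrative, and the HEIDI test is not reproduced here.

```python
import math
from typing import Tuple


def smr_ratio_test(
    b_qtl: float, se_qtl: float, b_trait: float, se_trait: float
) -> Tuple[float, float, float]:
    """Return (ratio estimate, approximate SMR statistic, p value) for the
    top QTL variant, using its effect on the QTL and on the trait."""
    z_qtl = b_qtl / se_qtl
    z_trait = b_trait / se_trait
    ratio = b_trait / b_qtl
    t_smr = (z_qtl ** 2 * z_trait ** 2) / (z_qtl ** 2 + z_trait ** 2)
    # Upper tail of a chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return ratio, t_smr, p_value


# Hypothetical summary statistics for one variant.
ratio, t_smr, p_smr = smr_ratio_test(b_qtl=0.5, se_qtl=0.05, b_trait=0.1, se_trait=0.02)
print(ratio, t_smr, p_smr)
print("passes p < 0.001:", p_smr < 0.001)
```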
Mendelian Randomization Analysis

To evaluate the potential causal role of blood cell indices on common complex diseases, we used the set of variants we identified to
perform multivariable Mendelian Randomization (MR) analysis (Table S7). MR analysis uses the random allocation of alleles at
conception to obtain an ‘‘unconfounded’’ estimate of the association between a risk factor and an outcome, thereby avoiding the
potential residual confounding and reverse causation in observational association studies. This is done by effectively treating the ge-
netic information as a proxy for the exposure (in this case, a blood cell index). Under certain assumptions, particularly that the genetic
variant only has one causal pathway to the disease which is via the blood cell index, one can assess the likely causal relationship
between blood cell index and disease. Multivariable MR analysis has the added benefit that we can estimate the causal effect of
each blood cell index on the outcome, conditioning on all other blood cell indices, thereby allowing us to account for the correlation
between them.
Due to the high degree of genetic correlation between the blood cell indices, in particular due to the presence of calculated and
compound indices, we initially selected the minimal set of indices needed to represent all 36 indices by filtering out those that were
strongly correlated (r2 > 0.8). This left 13 sentinel indices (PLT#, MPV, PDW, HCT, MCH, RDW, RET#, IRF, MONO#, NEUT#, EO#,
BASO# and LYMPH#; Table S1). We obtained variant association summary statistics (i.e., betas and standard errors) from publicly
available data using the PhenoScanner (Staley et al., 2016) and ImmunoBase (www.immunobase.org). To be included, a dataset had
to be large (> 5000 disease cases), have good genome coverage (> 100,000 variants), and allow identification of the direction of effect
at each variant. We were able to analyze three cardiometabolic diseases (coronary heart disease, Type 2 diabetes, chronic kidney
disease), five neuropsychiatric diseases (Alzheimer’s disease, bipolar disorder, cross disorder, major depressive disorder and
schizophrenia) and six autoimmune diseases (asthma, celiac disease, inflammatory bowel disease, multiple sclerosis, rheumatoid
arthritis and Type 1 diabetes). We identified overlapping variants between our disease dataset and the list of proxies (variants
with r2 > 0.8 with the sentinel variant) for our sentinel variants which went into the MR analysis. We then performed multivariable
MR using the inverse variance weighted approach, which uses summary statistics to regress the effect of each variant on the disease
outcome against its effects on the blood cell indices (Burgess and Thompson, 2015). To account for the 182 tests (13 blood cell trait
indices × 14 disease outcomes), we applied a Bonferroni correction and considered associations with p < 2.7 × 10⁻⁴ (i.e., 0.05/182) to
be significant.
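The multivariable inverse variance weighted regression and the Bonferroni threshold can be sketched in plain Python. This is a minimal illustration, assuming a weighted least squares fit without intercept in which each variant is weighted by the inverse squared standard error of its disease association; the input numbers are hypothetical, the helper names are illustrative, and standard errors of the causal estimates are not computed.

```python
from typing import List, Sequence


def solve_linear(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def multivariable_ivw(
    beta_x: Sequence[Sequence[float]],
    beta_y: Sequence[float],
    se_y: Sequence[float],
) -> List[float]:
    """Regress variant-disease effects on variant-index effects (one row per
    variant, one column per index), with weights 1 / se_y**2 and no intercept."""
    k = len(beta_x[0])
    weights = [1.0 / s ** 2 for s in se_y]
    xtwx = [
        [sum(w * row[i] * row[j] for w, row in zip(weights, beta_x)) for j in range(k)]
        for i in range(k)
    ]
    xtwy = [
        sum(w * row[i] * y for w, row, y in zip(weights, beta_x, beta_y))
        for i in range(k)
    ]
    return solve_linear(xtwx, xtwy)


# Hypothetical effects of four variants on two indices and on one disease,
# constructed so that the true causal effects are 0.5 and -0.2.
effects_on_indices = [[0.1, 0.0], [0.0, 0.2], [0.1, 0.1], [0.2, -0.1]]
effects_on_disease = [0.05, -0.04, 0.03, 0.12]
disease_se = [0.01, 0.02, 0.01, 0.03]
estimates = multivariable_ivw(effects_on_indices, effects_on_disease, disease_se)

# Bonferroni threshold for 13 indices x 14 disease outcomes.
bonferroni_threshold = 0.05 / (13 * 14)
print(estimates, bonferroni_threshold)
```

Because each index is a separate column in the regression, the coefficient for one index is estimated conditional on the others, which is the property the multivariable analysis relies on.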
To assess how robust our results were, we then performed sensitivity analysis using multivariable MR-Egger to test for pleiotropy.
This fits the same model as the multivariable MR but allows the intercept to be freely estimated, which represents the level of unbal-
anced pleiotropy in the system (Bowden et al., 2015). Furthermore, for each blood cell index the regression coefficient is realigned
(i.e., flipping the signs so all the associations with the index are increasing and adjusting the signs on the association with the disease
accordingly to account for this) separately which ensures the intercept represents the level of unbalanced pleiotropy for that
index. Since many of our most significant results involved white blood cell indices and autoimmune diseases, which both have large
components of heritability driven by the MHC region, we also performed a sensitivity analysis removing the region surrounding MHC
(chr6:20,000,000-40,000,000). To ensure our strong association between eosinophil count and asthma risk was genuine and not
driven by a few variants with very strong effects, we removed all known variants associated with asthma at GWAS levels (p <
5 × 10⁻⁸) before repeating our analysis for asthma as a sensitivity analysis. Finally, we assessed whether our results were driven by
loci that are associated with many cell lineages by repeating our analyses excluding the 42 sentinel variants representing clumps
univariately associated with all five index classes (i.e., platelets, mature red cells, immature red cells, myeloid cells, lymphoid cells).

Figure S1. Adjustment for Technical Covariates Affecting Full Blood Count Measurements, Related to Figure 1, Tables S1 and S2, and the
STAR Methods
(A) Day averaged measurements of MCV taken from a single instrument over the course of UK Biobank baseline recruitment. The discontinuities may have been
generated by calibration of the machine against a variable deterministically related to MCV. Continuous drift is visible within some of the piecewise continuous
segments. The left plot is obtained using the raw data while the right plot is obtained using the technically adjusted trait, showing elimination of discontinuities and
drift.
(B) The effect of the time of day of acquisition on the average measurement of MONO%. Data are taken from a single Coulter instrument over the full UK Biobank
baseline recruitment period. The left plot is obtained using the raw data while the right plot is obtained using the technically adjusted trait, showing elimination of
the dependence of the mean of MONO% on time of day.
(C) Example of the effect of time delay between venipuncture and acquisition on the measurement of the mean white blood cell count. Each point gives the
average WBC# for samples acquired during baseline UK Biobank recruitment on a single Coulter instrument during a fifteen minute delay interval. The boundaries
of the shaded region interpolate the 95% confidence intervals of the means. The left plot is obtained using the raw data while the right plot is obtained using the
WBC# trait data that has been adjusted for the technical covariates. The dependence of the mean cell count on delay time has been eliminated.
(D) Percentages of the variance of each UK Biobank measured variable explained by the adjustment for technical covariates and seasonal drift on the relevant
adjustment scale. Integer labels show the effective number of additional samples gained from making the technical adjustments, meaning the expected number
of additional samples that would be required to obtain equivalent p values in a GWAS for the trait if the adjustment were not made.
(E) As for (D) except for INTERVAL.
Figure S2. Adjustments for Sex and for Biological and Environmental Covariates Affecting Full Blood Count Measurements, Related to
Figure 1, Tables S1 and S2, and the STAR Methods
(A) The dependence of mean neutrophil count on sex and menopause status in the UK Biobank data adjusted for technical effects. The top plot is obtained using
the raw data while the bottom plot is obtained adjusting the data for menopause and sex effects showing the elimination of the variance these covariates explain.
(B) Day averaged measurements of neutrophil count taken from a single instrument over the course of the UK Biobank baseline recruitment. There is a long run
upward drift in the average count over time. Seasonal oscillation in the average counts is also visible. The top plot is obtained using the raw data while the bottom
plot is obtained using the technically adjusted data, showing the elimination of drift and seasonal oscillation.
(C) Percentage of variance of UK Biobank traits explained (on the relevant adjustment scale) by sex and covariates affecting full blood counts, including age,
menopausal status, smoking and alcohol variables.
(D) As for (C) except for INTERVAL traits.
(E) Illustration of the method used to determine the weight of evidence that heterogeneity in effect sizes across the three studies exceeded a tolerance criterion.
The axes represent effect sizes in UK Biobank, INTERVAL and UK BiLEVE. The black dot represents the vector of study specific effect size estimates (β̂_UK Biobank,
β̂_INTERVAL, β̂_UK BiLEVE) for a variant. If the dot lies inside the infinite yellow double-pyramid (defined by three planes intersecting the origin, each normal to one of
n1 = (1, 1/4, 1/4), n2 = ( 1/4,1, 1/4), n3 = ( 1/4, 1/4, 1)) we consider that there is no evidence of between study heterogeneity. If the black dot lies outside the
yellow double-pyramid we measure the strength of evidence for heterogeneity as the distance between the black dot and the nearest point on the surface of
the pyramid (red dot), with distances scaled to account for the standard errors of the study specific estimators. The nearest point on the pyramid is thus defined as
the point in the smallest confidence surface for the estimators that intersects the pyramid (blue ellipsoid). We thresholded the distance score at 5.2 and filtered all
variant-blood index pairs exceeding the score from further analysis.
(A) INTERVAL
Raw genotype data (48,813 samples in 10 batches; 820,967 genotyped variants)
• Batch-specific variant exclusions: non-best probesets, standard Affymetrix criteria fails, non-autosomal, multiallelic, intensity outliers, call rate < 95%.
• Batch-specific sample exclusions: non-Europeans, duplicates, call rate < 97%.
• Additional variant exclusions: HWE p value < 5 × 10⁻⁶, call rate < 97%, variants failing > 4 batches because of HWE/call rate/Affymetrix criteria fail.
Merge 10 batches
• Global sample exclusions: heterozygosity (> 3 SD from the mean), potential sample contamination (contamination > 10%, or contamination > 3% and > 10 close relatives), missing gender or mismatching genotypic/phenotypic gender, visual ethnic outliers, across-batch duplicates, duplicates with UK Biobank/UK BiLEVE.
• Global variant exclusions: MAF difference > 0.05 across batches, monomorphic in a batch but MAF > 0.01 in another, MAF < 0.475 and different minor alleles across batches.
Post-QC dataset
• Pre-imputation variant exclusions: HWE p value < 5 × 10⁻⁶, call rate < 99% in passed batches, call rate < 75% across all 10 batches, monomorphic variants.
Dataset for imputation (43,059 samples; 654,966 variants)
Imputation (87,696,910 variants)
• Post-imputation variant exclusions: Info score < 0.4, MAF < 0.0001, failed these filters in UK BiLEVE or UK Biobank.
Association analysis (~29.5M variants)

(B) UK Biobank + UK BiLEVE
Raw genotype data (153,293 samples in 33 batches; 820,967 variants)
• Batch-specific variant exclusions: duplicate probesets, standard Affymetrix criteria fails, non-autosomal, multiallelic, call rate < 95%, within-batch plate effects, HWE deviation.
• Batch-specific sample exclusions: sex mismatches, duplicates, heterozygosity outliers, high missingness.
• Additional variant exclusions: batch effects,
UK Biobank + UK BiLEVE imputed data release (151,733 samples; 72,355,667 variants)
• Secondary sample exclusions: non-Europeans (genetic distance > 50 based on first 15 PCs), heterozygosity outliers (> 3 SD from the mean), duplicates.
Analysis dataset (132,959 samples), split into UK Biobank (87,265 samples) and UK BiLEVE (45,694 samples)
• Post-imputation variant exclusions (each arm): Info score < 0.4, MAF < 0.0001, failed these filters in either of the other two studies.
Association analysis in each arm (~29.5M variants)

Figure S3. Quality Control of Genetic Data for UK Biobank, UK BiLEVE, and INTERVAL, Related to the STAR Methods
Workflow describing QC steps for genotypic datasets. Detailed description of QC can be found in the STAR Methods and on the UK Biobank website (http://
biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=155580).
(A) INTERVAL samples.
(B) UK Biobank + UK BiLEVE samples.
SnapShot: Epigenomic Assays
Martin Krzywinski1 and Martin Hirst1,2
1Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency Research Centre, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada; 2Department of Microbiology and Immunology, Michael Smith Laboratories Centre for High-Throughput Biology, University of British Columbia, Vancouver, BC V6T 1Z4, Canada

1 DNA methylation analysis
Cytosine modification is a feature of the mammalian genome. 5-methyl-cytosine (5mC) and its oxidative derivatives (e.g., 5hmC) are measured genome-wide using enrichment- and conversion-based methodologies followed by massively parallel sequencing. Bisulfite conversion provides quantitative measurements of 5mC but is unable to distinguish 5hmC. Antibody enrichment provides qualitative measurement of 5mC and 5hmC. Bisulfite-converted or -enriched DNA is purified, subjected to library construction and clonally sequenced. Specialized algorithms are required to align bisulfite-converted reads to a reference genome.

2 Chromatin immunoprecipitation sequencing
Histones are decorated with chemical modifications. Genomic positions of modified histones are measured genome-wide by chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq). Histones can be liberated from the genome by sonication, enzymatic digestion, or by transposon insertion (not shown). If sonication is used, chromatin must first be chemically cross-linked (see Step 4). Following histone liberation, specific chemical modifications are enriched by immuno-absorption. DNA associated with enriched histones is purified, subjected to library construction, clonally sequenced, and aligned to a reference genome.

3 Open chromatin analysis
Nucleosome-depleted regions are enriched in regulatory elements. Genomic positions of open chromatin are measured genome-wide by massively parallel sequencing of a collection of DNA fragments liberated from intact chromatin by either transposon insertion, enzymatic digestion, or sonication. The resulting DNA fragments are subjected to size selection or phenol-chloroform extraction to deplete nucleosome-associated DNA. The resulting DNA is purified, subjected to library construction, clonally sequenced, and aligned to a reference genome.

4 3D chromatin capture
Chromatin loop contacts reveal distal regulatory elements and structural domains. Genomic positions of long-range chromatin contacts are measured genome-wide by massively parallel sequencing of a collection of DNA fragments generated by proximity ligation. Intact chromatin is cross-linked to physically link genomically distal nucleosomes that are adjacent in 3-dimensional space. Cross-linked chromatin is enzymatically digested and the resulting DNA end labeled with biotin and subjected to proximity ligation. Ligated DNA is sheared by sonication or enzymatic digestion and the ligated junctions enriched by streptavidin pull-down. The resulting DNA is purified, subjected to library construction, clonally sequenced, and aligned to a reference genome.

See online version for legends and references.
1430 Cell 167, November 17, 2016 © 2016 Elsevier Inc. DOI http://dx.doi.org/10.1016/j.cell.2016.11.015
Massively parallel sequencing technologies provided the foundation from which the field of epigenomics has been built. This SnapShot depicts key sequencing-based
methods used in the analysis of epigenomes, including (1) bisulfite sequencing, (2) chromatin immunoprecipitation sequencing (ChIP-seq), (3) determination of open chromatin,
and (4) 3D chromatin capture.
Bisulfite Sequencing
Genome-wide 5-methyl-cytosine (5mC) is measured by enrichment-based methods and direct sequencing of bisulfite-converted DNA. Enrichment methods provide
qualitative measures of 5mC (Down et al., 2008), are capable of distinguishing oxidative derivatives (e.g., hmC), and can be combined with methyl-restriction-based assays to
improve detection of unmethylated cytosines (Maunakea et al., 2010). Enrichment-based methods require 50–100 million sequence reads per sample. Direct sequencing of
bisulfite-treated DNA provides quantitative measurements of 5mC genome wide but is unable to distinguish oxidized derivatives. Two main strategies for library construction
have emerged for bisulfite sequencing. In the first method (shown), a genomic library with adapters added is subjected to bisulfite treatment (Lister et al., 2009). In the second
(not shown), genomic DNA is first bisulfite-converted and then subjected to library construction (Miura et al., 2012). For conversion methods, 1 billion 100 nt sequence reads
are generated for each sample, and library construction methods introduce distinct library-specific biases in genome coverage.
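The logic of conversion-based measurement can be illustrated with a toy in-silico model. Bisulfite treatment converts unmethylated cytosine to uracil, which is read as thymine after amplification, whereas modified cytosines are protected. The sketch below is a simplification under stated assumptions: it handles a single strand, assumes complete conversion and an already aligned read, and the function names are illustrative rather than taken from any alignment tool.

```python
from typing import Dict, Set


def bisulfite_convert(sequence: str, protected: Set[int]) -> str:
    """Convert every C to T except at protected (modified) positions."""
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(sequence)
    )


def call_methylation(reference: str, read: str) -> Dict[int, bool]:
    """For each reference C, report True if the aligned read retains C
    (protected) and False if it shows T (converted)."""
    calls: Dict[int, bool] = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls


reference = "ACGTCCGA"
# Position 1 carries a modified cytosine; 5mC and 5hmC are both protected,
# which is why conversion alone cannot tell them apart.
read = bisulfite_convert(reference, protected={1})
calls = call_methylation(reference, read)
print(read)
print(calls)
```

Because converted reads no longer match the reference at unmethylated cytosines, aligners for such data typically compare reads against a reference that has been converted in the same way, which is one reason specialized alignment algorithms are needed.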
ChIP-Seq
Genomic locations of modified histones are detected genome wide by ChIP-seq (Barski et al., 2007). Typically, 25–50 million immunoprecipitated fragments are sequenced
for a histone mark. The two main strategies that have emerged to release nucleosomes from chromatin associated with 100–300 bp genomic fragments are shown. In addi-
tion, strategies that utilize transposon-based fragmentation of chromatin have recently become available (Schmidl et al., 2015).
Mapping Open Chromatin

Genomic positions of open chromatin regions are measured by directly sequencing genomic DNA fragments that are released from chromatin either by transposon inser-
tion (Buenrostro et al., 2013) or enzymatic digestion (Boyle et al., 2008; Crawford et al., 2006). A variant of this protocol (not shown) chemically cross-links chromatin and uses
sonication to release non-cross-linked regions that are subsequently enriched by phenol-chloroform extraction (Boyle et al., 2008). Sequencing requirements depend on
experimental parameters and resolution requirements and range from tens to hundreds of millions of fragments per sample.
Chromosome Conformation Capture

Chromosome conformation capture (3C) technologies provide the location of DNA fragments that interact together on the basis of their proximity in three-dimensional (3D)
space (Bonev and Cavalli, 2016). To measure chromatin interactions genome-wide, a single experiment typically requires 500 million sequence reads.
REFERENCES
Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G., Chepelev, I., and Zhao, K. (2007). Cell 129, 823–837.
Bonev, B., and Cavalli, G. (2016). Nat. Rev. Genet. 17, 661–678.
Boyle, A.P., Davis, S., Shulha, H.P., Meltzer, P., Margulies, E.H., Weng, Z., Furey, T.S., and Crawford, G.E. (2008). Cell 132, 311–322.
Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y., and Greenleaf, W.J. (2013). Nat. Methods 10, 1213–1218.
Crawford, G.E., Holt, I.E., Whittle, J., Webb, B.D., Tai, D., Davis, S., Margulies, E.H., Chen, Y., Bernat, J.A., Ginsburg, D., et al. (2006). Genome Res. 16, 123–131.
Down, T.A., Rakyan, V.K., Turner, D.J., Flicek, P., Li, H., Kulesha, E., Gräf, S., Johnson, N., Herrero, J., Tomazou, E.M., et al. (2008). Nat. Biotechnol. 26, 779–785.
Lister, R., Pelizzola, M., Dowen, R.H., Hawkins, R.D., Hon, G., Tonti-Filippini, J., Nery, J.R., Lee, L., Ye, Z., Ngo, Q.-M., et al. (2009). Nature 462, 315–322.
Maunakea, A.K., Nagarajan, R.P., Bilenky, M., Ballinger, T.J., D’Souza, C., Fouse, S.D., Johnson, B.E., Hong, C., Nielsen, C., Zhao, Y., et al. (2010). Nature 466, 253–257.
Miura, F., Enomoto, Y., Dairiki, R., and Ito, T. (2012). Nucleic Acids Res. 40, e136.
Schmidl, C., Rendeiro, A.F., Sheffield, N.C., and Bock, C. (2015). Nat. Methods 12, 963–965.
