The genome-wide architecture of chromatin-associated proteins that maintains chromosome integrity and gene regulation is not well defined. Here we use chromatin immunoprecipitation, exonuclease digestion and DNA sequencing (ChIP–exo/seq)¹,² to define this architecture in Saccharomyces cerevisiae. We identify 21 meta-assemblages consisting of roughly 400 different proteins that are related to DNA replication, centromeres, subtelomeres, transposons and transcription by RNA polymerase (Pol) I, II and III. Replication proteins engulf a nucleosome, centromeres lack a nucleosome, and repressive proteins encompass three nucleosomes at subtelomeric X-elements. We find that most promoters associated with Pol II evolved to lack a regulatory region, having only a core promoter. These constitutive promoters comprise a short nucleosome-free region (NFR) adjacent to a +1 nucleosome, which together bind the transcription-initiation factor TFIID to form a preinitiation complex. Positioned insulators protect core promoters from upstream events. A small fraction of promoters evolved an architecture for inducibility, whereby sequence-specific transcription factors (ssTFs) create a nucleosome-depleted region (NDR) that is distinct from an NFR. We describe structural interactions among ssTFs, their cognate cofactors and the genome. These interactions include the nucleosomal and transcriptional regulators RPD3-L, SAGA, NuA4, Tup1, Mediator and SWI–SNF. Surprisingly, we do not detect interactions between ssTFs and TFIID, suggesting that such interactions do not stably occur. Our model for gene induction involves ssTFs, cofactors and general factors such as TBP and TFIIB, but not TFIID. By contrast, constitutive transcription involves TFIID but not ssTFs engaged with their cofactors. From this, we define a highly integrated network of gene regulation by ssTFs.
1. Nature | Vol 592 | 8 April 2021 | 309
Article
A high-resolution protein architecture of the budding yeast genome
Matthew J. Rossi¹, Prashant K. Kuntala¹, William K. M. Lai¹,², Naomi Yamada¹, Nitika Badjatia¹, Chitvan Mittal¹,², Guray Kuzu¹, Kylie Bocklund¹, Nina P. Farrell¹, Thomas R. Blanda¹, Joshua D. Mairose¹, Ann V. Basting¹, Katelyn S. Mistretta¹, David J. Rocco¹, Emily S. Perkinson¹, Gretta D. Kellogg¹,², Shaun Mahony¹ & B. Franklin Pugh¹,² ✉
Genomes regulate genes so as to achieve homeostasis: the maintenance of cellular components in proper balance. They also adapt, making adjustments in rapidly changing environments, so as to regain homeostasis³. Achieving these tasks has necessitated the evolution of constitutive and inducible gene control. Whether or not these controls are fundamentally different at the molecular level is unknown. A classical view posits a single basic regulatory paradigm for genes (Extended Data Fig. 1a)⁴: environmental signals toggle ‘on’ ssTFs that recruit cofactors and assemble a preinitiation complex (PIC) consisting of Pol II and general transcription factors (GTFs) such as TBP, TFIID and TFIIB at core promoter transcription start sites (TSSs)⁵. However, the extent to which constitutive gene expression involves ssTFs is unclear, as ssTF-binding sites and their cofactors remain unidentified at most promoters. ssTFs, cofactors, chromatin and PICs play into any distinction between inducible and constitutive mechanisms, but their interrelationships remain enigmatic.
Genome-wide protein meta-assemblages
Here we used ChIP–exo (Extended Data Fig. 1b)¹,², an ultra-high-resolution version of ChIP–seq, to map genome-wide binding. We selected target proteins on the basis of Gene Ontology (GO) annotations related to chromosomal function (Extended Data Fig. 1c and Supplementary Data 1 (1BY); characters in parentheses refer to the worksheet number and column letter). In total, we collected 1,229 datasets on 791 targets, of which 400 targets had reproducibly significant data (Supplementary Data 2 (1A)). The interaction pattern of all 1,229 datasets around individual and broad classes of genomic features (Fig. 1a) can be visualized and downloaded at yeastepigenome.org (an example is given in Extended Data Fig. 2). We also developed and provide ScriptManager, a platform for customized analysis of these data (see Methods).
Binarized colocation counts among targets were hierarchically clustered (Fig. 1b). The three largest clusters (yellow) correspond to three
https://doi.org/10.1038/s41586-021-03314-8
Received: 8 May 2020
Accepted: 29 January 2021
Published online: 10 March 2021
¹Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA. ²Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA. ✉e-mail: fp265@cornell.edu
major aspects of gene expression: first, promoter regulation; second, PIC assembly; and third, transcription elongation. Thus, the vast majority of chromatin-associated proteins are dedicated to gene regulation. We used uniform manifold approximation and projection (UMAP) to represent each dataset as a single point in a two-dimensional projection (Fig. 1c and Extended Data Fig. 3). Points in close proximity reflect a population-based composite colocalization of targets (‘meta-assemblages’). We performed K-means clustering on the projection and derived 21 meta-assemblages that correspond largely to known interacting biochemical complexes, or related gene ontologies (Fig. 1c, outer pie, and Supplementary Data 2 (1F, 1H, 2G–2I)). This probably represents a comprehensive predominant protein architecture of the yeast genome (‘epigenome’) in rich media (see Supplementary Data 2 (1–8) for a deeper analysis).
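The clustering input described above, binarized colocation counts among targets, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the occupancy threshold and the toy feature IDs are all hypothetical, and the paper's actual significance criteria are described in its Methods.

```python
from itertools import combinations

def binarize(occupancy, threshold):
    """Return the set of feature IDs at which a target is called 'bound'.

    occupancy: dict mapping feature ID -> occupancy value (hypothetical units).
    threshold: assumed cutoff, chosen here only for illustration.
    """
    return {feature for feature, value in occupancy.items() if value >= threshold}

def colocation_counts(bound_sets):
    """For every pair of targets, count the features bound by both."""
    counts = {}
    for a, b in combinations(sorted(bound_sets), 2):
        counts[(a, b)] = len(bound_sets[a] & bound_sets[b])
    return counts

# Toy occupancy values, invented for illustration only.
occupancy = {
    "Orc6": {"ACS1": 9.0, "ACS2": 7.5, "PRO1": 0.2},
    "Mcm5": {"ACS1": 8.1, "ACS2": 0.4, "PRO1": 0.1},
    "Sua7": {"ACS1": 0.3, "ACS2": 0.2, "PRO1": 6.6},
}
bound = {target: binarize(values, threshold=1.0) for target, values in occupancy.items()}
counts = colocation_counts(bound)
```

A matrix assembled from such pairwise counts could then be handed to any hierarchical clustering routine; which linkage and distance measure the study used is not stated in this section.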
Overall, the organization defined by UMAP represents a remarkable degree of concordance and mutual validation of biochemically purified and functionally annotated complexes with their architectural organization across a genome, particularly from an unsupervised approach. For example, the promoter cofactors Mediator, SWI–SNF, SAGA, NuA4 and their cognate ssTFs each formed tight meta-assemblages that were located near each other but far from gene-body elongation factors (Fig. 1c). Proteins of replication origins, subtelomeres and centromeres also formed distinct tight meta-assemblages that were far from each other and from gene meta-assemblages. This provided strong validation of the ChIP–exo/seq approach and epitope tagging. Notably, we can now link most ssTFs with their cognate cofactors and promoter architecture.
Protein architecture at genomic features
DNA replication initiates at 253 autonomously replicating sequence consensus sequence (ACS) elements that are constitutively bound by origin recognition complexes (ORCs)⁶. The ‘ORC’ meta-assemblage contained six measured targets (Fig. 2a and Extended Data Fig. 4), which gave highly structured ChIP–exo patterns based on ORCs and the DNA helicase MCM, spread over roughly 300 base pairs. ORCs at nucleosome-free ACSs engulfed a neighbouring nucleosome. The binding of Mcm5 from ORCs was offset by 50–100 bp, consistent with a recently published model based on cryo-electron microscopy⁷.
Subtelomeric X-elements represent a heterochromatic environment that is repressed by silent information regulators (SIRs), functionally supporting telomeres⁸. Indeed, we found that SIR proteins formed a structurally robust meta-assemblage on a single nucleosome, centred on roughly 300-bp X-core elements (XCEs), along with ORC/MCMs and insulator ssTFs at two flanking nucleosomes (Fig. 2b). KU (Yku70) and RIF (Rif1) complexes, along with ssTFs Fkh1, Abf1 and Reb1, were present at the vast majority of mappable X-elements. However, a Sko1-mediated Tup1 repression complex was present at only half, perhaps reflecting variable repression capabilities of subtelomeric regions. Thus, XCEs appear to create a well structured triple nucleosome ensemble comprising major repressor proteins.
The centromeric meta-assemblage (‘CEN’) contained 12 targets at 16 centromeres (Fig. 2c), which are responsible for proper chromosomal segregation during cell division. They included site-specifically bound Cbf1 at the centromere centre (CDE I) and kinetochore components offset by roughly 100 bp towards the AT-rich CDE III elements⁹. These factors generated strong and well positioned crosslinks covering roughly 170 bp of DNA, suggesting that they are positionally fixed to CDEs. Condensin and cohesin play a part in chromosomal condensation and segregation. They were absent from the centromere and instead overlapped the surrounding nucleosomes, suggesting that they interact with nucleosomes. In contrast with lower-resolution maps¹⁰,¹¹, histones were not detected at centromeres, despite robust detection of histone-like Cse4 and kinetochore components there, and robust detection of histones (H2A, H2A.Z, H2B, H3 and H4) in the immediate flanking regions¹². Thus, yeast centromeres appear to lack the histone components of a nucleosome in vivo. The resident kinetochore complex protects a nucleosome-sized region of DNA from nucleases, which was a basis for a nucleosome originally being called there¹³. Nonetheless, Cse4-containing nucleosomes have been defined biochemically and structurally in vitro¹⁰,¹⁴, and so the question remains open.
The Pol I complex produces ribosomal RNA (rRNA) from a single highly repeated gene. It contained TBP anchored near the rRNA TSS (Fig. 3a). It also had major crosslinking interactions with the well positioned Pol-I-specific upstream activating factor (UAF, Uaf30) complex, which covered roughly 70 bp between −155 bp and −60 bp from the TSS. UAF also had reciprocal crosslinks with TBP at the core promoter. Thus, the Pol I initiation complex has a fixed bipartite engagement that covers around 200 bp of rRNA promoter DNA, with an intervening 100 bp or so. The broad extension of Pol I downstream into the rRNA gene body with less occupancy at promoters indicates that Pol I dissociates rapidly from its PIC into an elongating state.
Pol III of the ‘POL3’ meta-assemblage transcribes 272 highly similar genes encoding transfer RNAs (tRNAs). It contained 18 targets that could be separated into TFIIIB/C and Pol III meta-assemblages (Fig. 3b). Their organization matched locations modelled from atomic structures of the TFIIIB/Pol III promoter complex¹⁵, but with the TBP component of TFIIIB crosslinking approximately 30 bp upstream of the TSS. The ChIP–exo pattern further demonstrated that TFIIIC and Pol III make crosslinks not only at the internal A and B boxes, but also at coincident locations roughly 40 bp upstream of TBP. Owing to DNA bending by
[Fig. 1 graphic. Panel a, all features in this study, with N per class: Transcribed (7,741), Non-transcribed (295), Replication (253), X-element (25), Centromere (16), Pol I (2), Pol II (7,467), Pol III (272), Coding (6,121), Non-coding (1,346), CUTs (447), SUTs (365), XUTs (440), NCR (94), RPG (137), STM (984), TFO (1,783), UNB (2,474), LTR (357), tRNA-proximal (135), No PIC (251), Other (3,076), Not analysed (11,112); legend entries: PIC occupancy, TFIID-dominated, SAGA-dominated. Panel b, colocalization of 371 targets (scale, low to high), with cluster labels: promoter regulation, PIC, elongation and chromatin regulation, Pol II, Pol III, replication. Panel c, UMAP projection (axis 1 versus axis 2, −15 to 25), with meta-assemblage labels including Histones, Tup1, MET, SIR, NuA4, SAGA, RPD3-L, SBF, SWI/SNF, Mediator, GTFs, TFIID, SWR, ORC, Splicing, THO, Nrd1, CPSF, Pol II, Spt5, RSC, CEN, TFIIIB/C, Pol III, PAF, Set1, ssTFs and ssTFs ISO.]
Fig. 1 | Genome-wide meta-assemblages. a, Classes of genomic features, with N memberships analysed (Supplementary Data 1 (1D)). Pol II classes are from this study (see Methods), along with relative PIC occupancy levels (green dots). CUTs, cryptic unstable transcripts; SUTs, stable unannotated transcripts; XUTs, Xrn1-sensitive unstable transcripts; NCR, noncoding RNA. b, Hierarchical clustering showing the genome-wide colocalization of 371 targets (Supplementary Data 3). c, UMAP projection showing the colocations of 371 targets (coloured on the basis of K-means; Supplementary Data 2 (1C, 1D)). AU, arbitrary units.
TBP, this region is in close proximity to TFIIIB/C and Pol III within gene bodies. Equivalent positions of crosslinking points were observed across all TFIIIB/C/Pol III subunits. This suggests that a single predominant structure envelops entire Pol III genes and approximately 70 bp upstream, as it makes a short (roughly 80 bp) transcript.
There are around 7,500 distinct Pol II transcription units (defined by a TSS/PIC), of which approximately 80% code for proteins. Targets that are associated with transcription elongation generally matched Pol II occupancy across gene bodies, but unlike Pol II (Rpb3) were not present at promoters (Fig. 3c and Extended Data Fig. 5). Instead, occupancy within genes increased in the 5′ region and decreased in the 3′ region, with many having distinct ‘entry/exit’ points, consistent with other studies¹⁶. Whether these are true cotranscriptional entry/exit points or are simply crosslinkable retention sites is not clear. Termination factors such as Pcf11 were found primarily at sites of termination, along with nearby cohesin. There was little evidence of the binding of
[Fig. 2 graphic. Panels: DNA replication origins (ACS), N = 253, x axis: distance from ACS start (bp); subtelomeric X-elements (XCE), N = 25, x axis: distance from XCE start (bp); centromeres (CEN), x axis: distance from CEN start (bp). All x axes run from −500 to 500; y axes show occupancy (AU) for the same strand and the opposite strand, with nucleosome dyads marked. Labelled factors include Orc6, Mcm5, Reb1, ORC, Fkh1, Tup1, Cyc8, Sko1, Abf1, Sir2,3,4, Yku70, Rif1,2, histone H4, kinetochore, Mcm16, Cbf1, Cse4, Nkp2 and Smc3 (cohesin).]
Fig. 2 | Architecture at nontranscribed features. a–c, Averaged distribution of strand-separated 5′ ends of ChIP–exo sequencing tags (exonuclease stop sites; see Extended Data Fig. 1b), showing representative targets around strand-oriented annotated features. The diagrams at the top of each panel are cartoon representations of DNA, nucleosomes and protein factors that bind to DNA replication origins (a), subtelomeric X-elements (b) or centromeres (c). The relevant start sequences (coloured As, Ts, Cs and Gs) are also shown in a, c. Underneath are composite data showing the distribution of the protein factors. Same-strand data are oriented with 5′ to 3′ to be read from left to right. Opposite-strand data are inverted (right to left is 5′ to 3′). The y-axes show linear arbitrary units (AU), which are not comparable in magnitude across different datasets. Nucleosome dyads were derived from MNase-digested chromatin that was assayed by H3/H2B ChIP–seq (strands averaged).
[Fig. 3 graphic. Panels: RNA polymerase I (rRNA), x axis: distance from Pol I TSS (bp); RNA polymerase III (tRNA), x axis: distance from Pol III TSS (bp); RNA polymerase II (mRNA), x axes: distance from Pol II TSS (bp) and distance from Pol II TES (bp); transposon Ty3 (σ) LTR with adjacent tRNA, x axis: distance from Ty3 start (bp). x axes run from −500 to 500; y axes show occupancy (AU) for the same strand and the opposite strand. Labelled tracks include Pol I (Rpa135), TBP (Spt15), UAF (Uaf30), UCE, TFIIIC-τA (Tfc4), TFIIIC-τB (Tfc6), Pol III (Rpo31), TFIIIB, A and B boxes, Rpb3, Pol II Ser5P, Pol II Ser2P, Sua7, Cbc2, Spt5, Spt6, Spt16, Spn1, Paf1, Elf1, Set1, Set2, Set3, Smd1, Nrd1, Pcf11, Ste12, Dig1 and Kar4, with the +1 and +2 nucleosomes, TSS and TES marked.]
Fig. 3 | Architecture at transcribed features. a–d, Experiments were carried out as in Fig. 2, but for transcribed features. In a, UCE is an upstream control element at Pol I promoters. In b, A and B are box elements at Pol III promoters. In c, the protein architecture for RP genes is shown (not strand separated); Ser2P and Ser5P are phosphorylated serines 2 and 5 of heptad repeats; grey arrows show nucleosome dyads.
elongation/termination-associated factors being restricted to specific sets of genes, except that Nrd1 of the early termination pathway was enriched at noncoding transcription (ncRNA) units (Extended Data Fig. 5a, lower left). In addition, RNA splicing factors (such as Smd1) were largely limited to the 3′ half of intronic genes encoding ribosomal proteins (RPs; Extended Data Fig. 5b, upper right). The data are consistent with one predominant elongation entourage at most Pol II genes that changes in composition at fixed distances from the TSS or transcription end site (TES) (rather than at a percentage of gene length).
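The distinction between fixed distances from the TSS and a percentage of gene length rests on composites built in absolute base-pair coordinates around strand-oriented features. The following Python sketch shows one way such a strand-separated composite of tag 5′ ends could be assembled. It is an illustration under stated assumptions: the function names and toy coordinates are hypothetical, all positions are taken to lie on a single chromosome, and no normalization is applied.

```python
from collections import Counter

def relative_position(tag_pos, tag_strand, feature_pos, feature_strand):
    """Distance (bp) of a tag 5' end from a feature start, oriented so that
    positive values lie downstream of the feature on the feature's own strand."""
    offset = tag_pos - feature_pos
    if feature_strand == "-":
        offset = -offset  # flip coordinates for features on the minus strand
    strand_class = "same" if tag_strand == feature_strand else "opposite"
    return offset, strand_class

def composite(tags, features, window=500):
    """Sum tag 5' ends at each fixed distance from every feature start.

    tags and features are lists of (position, strand) tuples.
    Returns {"same": Counter, "opposite": Counter}, keyed by distance in bp.
    """
    profile = {"same": Counter(), "opposite": Counter()}
    for f_pos, f_strand in features:
        for t_pos, t_strand in tags:
            offset, strand_class = relative_position(t_pos, t_strand, f_pos, f_strand)
            if -window <= offset <= window:
                profile[strand_class][offset] += 1
    return profile

# Toy coordinates, invented for illustration only.
features = [(1000, "+"), (5000, "-")]
tags = [(1030, "+"), (1060, "-"), (4970, "-"), (4940, "+")]
profile = composite(tags, features)
```

Because offsets are kept in base pairs rather than rescaled to gene length, a factor that always crosslinks a set distance downstream of the TSS piles up at one position in the composite regardless of how long each gene is.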
Consistent with some other reports¹⁷,¹⁸, although not all¹⁹⁻²¹, we found no evidence for Mediator being stably associated with the Pol II core initiation or elongation entourage, despite its detection in upstream promoter regulatory regions (for example, Med2 in Extended Data Fig. 5b). Equivocal binding in gene bodies may be related to approximately 100 genes that produced relatively high and variable background in ChIP assays (see Methods).
The long terminal repeats (LTRs) of certain classes of Ty transposons are transcribed by Pol II as part of retroviral-like transposition²². However, most lacked a PIC, except a subset of full-length Ty1,2 (δ) (Extended Data Fig. 6). At Ty3 (σ), the Pol II pheromone factors Ste12, Dig1 and Kar4 were assembled and had nearly identical points of crosslinking (Fig. 3d). However, instead of Pol II, we detected the Pol III machinery associated with adjacent divergent tRNA genes. This suggests that Pol II ssTFs may work with Pol III at some tRNA genes to integrate mating and Ty3 transposition²².
Inducible versus constitutive promoters
In classifying Pol II promoters, we opted against an unsupervised approach, as it treats binding events equivalently, without considering that certain targets have a more central role in defining specific regulatory architectures. Four fundamentally distinct architectural themes emerged (see Methods, Fig. 4a and Supplementary Data 1 (1D)): first, an RP theme, as seen for 137 RP promoters with unique architectures (examined separately²³); second, an STM theme, as for 984 promoters that had properties associated with inducibility, and characteristically bound by ssTFs and major cofactor meta-assemblages SAGA, TUP and/or Mediator/SWI–SNF; third, a TFO theme, from 1,783 promoters with a ssTF organization that lacked STM cofactors (but typically had the insulator ssTFs Abf1 or Reb1); and fourth, a UNB theme, as seen with 2,474 promoters that were unbound by anything except a PIC. Notably, as detailed in the Supplementary Information, the consensus architecture at TFO/UNB promoters indicates that two-thirds of all promoters evolved to lack regulation by ssTFs and their cofactors under any condition (not just in rich media). This is an architecture suitable for constitutively low gene expression. RP and STM represent the architecture of inducible promoters that have upstream activator sequences (UASs). The roughly 1,300 ncRNA promoters were similarly classified (Supplementary Data 1 (E)), indicating that they are governed by the same regulatory mechanisms.
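The four themes described above lend themselves to a simple rule-based assignment. The Python sketch below is a hypothetical reading of those definitions, not the procedure given in the paper's Methods: the function name, the order in which the rules are tested and the treatment of a promoter with STM cofactors but no detected ssTF are all assumptions.

```python
# Cofactor meta-assemblages named in the text as characteristic of STM promoters.
STM_COFACTORS = {"SAGA", "TUP", "Mediator", "SWI-SNF"}

def classify_promoter(is_rp_gene, bound_sstfs, bound_cofactors):
    """Assign a promoter to one of the four architectural themes.

    is_rp_gene: True for ribosomal protein gene promoters.
    bound_sstfs: set of ssTF names detected at the promoter.
    bound_cofactors: set of cofactor meta-assemblage names detected.
    Rule order (RP, then STM, then TFO, then UNB) is an assumption.
    """
    if is_rp_gene:
        return "RP"
    if set(bound_cofactors) & STM_COFACTORS:
        return "STM"   # inducible architecture with major cofactors
    if bound_sstfs:
        return "TFO"   # ssTF organization lacking STM cofactors
    return "UNB"       # nothing bound except a PIC

# Toy promoters, invented for illustration only.
examples = [
    classify_promoter(True, {"Rap1"}, set()),
    classify_promoter(False, {"Yrr1"}, {"SAGA", "Mediator"}),
    classify_promoter(False, {"Reb1"}, set()),
    classify_promoter(False, set(), set()),
]
```

A supervised scheme of this kind gives some targets (here the STM cofactors) more weight than others, which is the property the text cites as its reason for not using an unsupervised approach.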
Assembly of Pol II PICs occurs in the context of chromatin, where the TSS resides on the inside edge of a downstream +1 nucleosome (Fig. 4b). Most promoters have a constitutive NFR. The seemingly interchangeable term ‘NDR’, applying to nucleosome depletion mediated by ssTFs, is problematic. As ssTFs are absent from UNB promoters, they should lack ssTF-regulated nucleosome depletion and an NDR. We therefore considered whether NFRs and NDRs are distinct.
NFRs at TFO/UNB promoters were short (less than 150 bp) and
bisected by a pair of oppositely stranded, nucleosome-disfavouring
[Fig. 4 graphic. Panel a, Pol II promoters (rows; RP, STM, TFO, UNB) against 371 targets (columns), with PIC (Sua7), Abf1, Reb1 and H2A.Z marked. Panel b, diagram of insulator ssTFs, the −2, −1 and +1 nucleosomes, NFR, TSS and gene, above composite data for STM (N = 984), TFO (N = 1,783) and UNB (N = 2,474) promoters, with tracks for ssTFs and cofactors, PIC/TFIIB (Sua7), H2A.Z and nucleosome dyads; x axis: distance from +1 nucleosome dyad (bp), −700 to 300. Panel c, in vivo and in vitro reconstitution of nucleosome dyads (+ RSC, + INO80), with poly(T:A), NDR and NFR marked; y axis: occupancy (AU); x axis: −700 to 300. Panel d, insulation at divergent genes: correlation of nascent transcription (y axis, −0.1 to 0.4) for RP, STM, TFO, UNB and Reb1-bound promoters, with conditions labelled Parent, No AA, Rap1 AA and Reb1 AA. Panel e, insulation at tandem genes: occupancy (AU) of Reb1, PIC (Sua7) and Pcf11 at TFO (292) and UNB (1,097) promoters; x axis: distance from +1 nucleosome dyad (bp), −700 to 300.]
Fig. 4 | Classification of inducible, insulated and constitutive Pol II promoters. a, Individual promoters (rows) can be grouped into four architectural themes (coloured boxes) and sorted by PIC occupancy level. Targets are listed at the top of columns, with arrows denoting Abf1, Reb1 and H2A.Z. Black lines denote target binding (Supplementary Data 2 (3)). b, Top, diagram, and bottom, example composite data for the STM, TFO and UNB classes. ‘ssTFs and cofactors’ represents a combined set of target locations determined by ChExMix for those targets labelled as such in Supplementary Data 2 (1K), including ssTFs, SAGA, TUP and Mediator. c, Composite data show that STM promoters have NDRs, whereas TFO and UNB promoters have NFRs. In vitro nucleosomes assembled with purified genomic DNA and histones (black filled areas) had ATP plus either purified RSC (yellow) or INO80 (purple) added (data from ref. 24). Poly(T:A) regions are sense-strand tracts (larger than 5 bp) of As (red) or Ts (green). d, Insulator ssTFs uncouple divergent transcription. Data on nascent transcription (CRAC data²⁶) for control strains or strains depleted of Rap1 or Reb1 by the anchor-away (AA) technique were collected for N divergent gene pairs sharing the same promoter region, then correlated between the gene pairs. To the right are diagrams of divergent gene pairs, with the differential size of each green arrow pair reflecting the extent of insulation. e, The termination factor Pcf11 accumulates at insulator ssTFs. Shown is the architecture at promoters adjacent to an upstream termination region (tandem genes) and having (TFO) or lacking (UNB) an insulator ssTF.
poly(dA:dT) tracts (Fig. 4c, red/green). NFRs have been biochemically reconstituted on genomic DNA with purified histones and chromatin remodellers²⁴. When applied to our promoter classes, we found that histones alone partially reconstituted NFRs in vitro at TFO/UNB promoters, but less effectively at STM promoters (Fig. 4c, compare black-filled dips with in vivo plots, and Extended Data Fig. 7a). TFO/UNB NFRs were widened by the RSC remodeller (Fig. 4c, compare the yellow-filled wider dip with the black-filled areas) and had their −1/+1 nucleosomes positioned by the chromatin-remodelling ATPase INO80 (purple fill)²⁴. STM promoter nucleosomes, by contrast, had an intrinsic capacity to form nucleosomes and were less responsive to RSC and INO80 (Fig. 4c, vertical arrow around −400). They bound to ssTFs and cofactors in vivo (Fig. 4b, magenta), and were nucleosome-depleted at the −1/−2 nucleosome positions. These same regions have been interpreted to have MNase-sensitive ‘fragile’ nucleosomes in vivo (Supplementary Data 1 (BX); 69% were ‘fragile’ at STMs versus 19% at UNBs). However, our data indicate that sensitivity to MNase might reflect the binding of ssTFs/cofactors rather than unstable nucleosomes²⁵. Thus, inducible promoters have NDRs, while constitutive promoters have NFRs.
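The poly(dA:dT) tracts that bisect NFRs are defined in the Fig. 4 legend as sense-strand runs of As or Ts larger than 5 bp. A minimal Python sketch of how such tracts could be located in a promoter sequence follows. The function name, the default minimum length of 6 (one reading of 'larger than 5 bp') and the toy sequence are assumptions for illustration.

```python
import re

def find_tracts(sequence, min_len=6):
    """Locate homopolymeric A or T runs of at least min_len bases.

    Returns a list of (start, end, base) tuples using 0-based,
    end-exclusive coordinates on the strand supplied.
    """
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.end(), m.group()[0])
            for m in pattern.finditer(sequence.upper())]

# Toy sequence, invented for illustration: a 7-bp T tract and a 6-bp A tract.
seq = "GCG" + "T" * 7 + "CGC" + "A" * 6 + "GC"
tracts = find_tracts(seq)
```

On the sense strand, a T run followed by an A run corresponds to a pair of oppositely stranded tracts of the kind the text describes, since a run of Ts on one strand is a run of As on the other.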
In the compact yeast genome, promoters and terminators often share the same NFRs or NDRs at adjacent genes, with the potential to mutually influence their expression unless insulated²⁶. In support of this, PIC occupancy at divergent promoter pairs was less correlated at TFO promoters, which have insulator ssTFs, compared with UNB promoters (Extended Data Fig. 7b). The same was observed for divergent nascent transcription (Fig. 4d). RP/STM divergent promoters also showed low nascent transcription correlation. Anchor-away removal of Rap1, which binds RP/STM promoters, resulted in a higher correlation (Fig. 4d, red). This was not observed with removal of Reb1, which mainly binds TFO promoters. Removal of Reb1, but not Rap1, resulted in higher correlations at TFO and Reb1-bound promoters (Fig. 4d, cyan). As a negative control, removal of Rap1 had little effect at Reb1-bound promoters. We suggest that insulator ssTFs such as Rap1 and Reb1 uncouple divergent transcription at promoters to which they bind. Similarly, where a gene terminator is shared with a promoter (tandem genes), the termination factor Pcf11 overlapped with the adjacent PIC, unless an insulator ssTF intervened (Fig. 4e and Extended Data Fig. 7c). This finding supports prior conclusions on insulators that were based on nascent transcription²⁶.
Taken together, these results suggest that the assembly of PICs is mechanistically tied to PIC assembly at adjacent upstream divergent genes, and to transcription termination at tandem genes, unless these events are insulated. In such architectural arrangements, some insulator ssTFs may not act as direct effectors of transcription by recruiting cofactors, but instead insulate and direct −1/+1 nucleosome positioning²⁴. Others may recruit cofactors in a condition-specific way.
ssTF–cofactor interactions and circuits
A comprehensive set of 78 ssTFs were detectably bound to promoters
in rich media (Supplementary Data 2 (1K)). A search of the JASPAR
database of interactions between ssTFs and sequence motifs independently
confirmed proper motif specificity for 90% of the ssTFs
(Supplementary Data 2 (1M)). Some ssTFs had robust ChIP–exo patterning
around their cognate motif (Extended Data Fig. 8a; for example, Cup9
and Cin5), which reflects their site-specific structural interactions with
DNA on a genomic scale. Remarkably, most ssTFs had relatively diffuse
ChIP–exo patterning flanking their motif (Extended Data Fig. 8a; for
example, Nrg1, Bas1 and Yrr1). As exemplified by Yrr1 in Fig. 5a (magenta
versus cyan areas), the diffuse patterning of ssTFs was particularly
pronounced at sites with multiple STM cofactors present (for example,
SAGA, TUP, Mediator, SWI–SNF and RPD3-L), and less diffuse at other
sites that bind the same ssTFs but lack STM cofactors. STM cofactors
may impart a distinct local environment that results in more dispersed
crosslinking. The same diffuse patterning occurred with STM cofactors
which were anchored at ssTF sites (Fig. 5a and Extended Data Fig. 8b).
As they tend to co-occupy the same set of promoters (Extended Data
Fig. 9a, Supplementary Data 2 (1K)), ssTFs might coexist with multiple
positive/negative cofactors of chromatin accessibility and Pol II
recruitment. This diffuse patterning is consistent with the notion of
condensates that are anchored by ssTFs27.
In contrast to STM cofactors, we detected essentially no ChIP–exo
patterning of TFIID, TBPs or any GTFs at a consolidated set of ssTF
sites, despite identifying these GTFs in the periphery where TSSs
reside (Fig. 5b and Extended Data Fig. 9b). Thus, although using the
same paradigm for detecting ssTF–cofactor interactions, our results
in yeast do not support the long-standing model that ssTFs stably
engage TFIID at promoters. PIC assembly is driven by TFIID at nearly
all genes28, although at inducible genes it is augmented through SAGA
independently of TFIID28–30. Although the gene specificity of SAGA
has been enigmatic and controversial31, the ChIP–exo assay detects
SAGA at only a subset of genes. The discrepancy may reside in the low
specificity of other assays32.
We addressed the specificity of SAGA further. As a direct readout
of TFIID-independent PIC assembly, we expected high levels of GTFs
relative to TFIID where SAGA is bound. However, we found that most
SAGA-bound promoters (RP/STM/‘SAGA-bound’) lacked high ratios of
GTFs to TFIID, although a smaller fraction did have high ratios (equivalent
modes and rightward tail in Fig. 5c and Extended Data Fig. 9c). Thus,
SAGA binding is not always concomitant with TFIID-independent PIC
assembly, and may reflect a poised state. Instead, promoters having
multiple STM cofactors displayed high GTF/TFIID ratios (‘STM-bound’
and ‘RSTM-bound’ in Fig. 5c). Thus, maximal TFIID-independent PIC
assembly is achieved under conditions in which there is maximal
engagement of a wide variety of negative and positive ssTFs and cofactors
with NDRs, including but not limited to SAGA.
Promoters bound by ssTFs included both cognate (motif-based) and
noncognate interactions (Extended Data Fig. 10). In assessing cognate
interactions, we found that most ssTFs bound to the promoters of
[Fig. 5 image. a, Occupancy (AU) versus distance from Yrr1 motif (bp),
−500 to +500, same and opposite strands, for ssTF (Yrr1), TUP (Tup1),
SAGA (Sgf73) and Mediator (Med2) at STM-bound and not STM-bound Yrr1
sites. b, Occupancy (AU) versus distance from consolidated ssTF motif
(bp), −500 to +500, for Mediator (Med2), SAGA (Sgf73), SAGA and TFIID
(Taf12), TFIID (Taf2), TBP (Spt15), TFIIB (Sua7) and bkd. c, Frequency
versus PIC/TFIID (GTF/Taf2) log2 ratio (AU) for RP, UNB, TFO,
‘SAGA-bound’, ‘STM-bound’ and ‘RSTM-bound’ promoters.]
Fig. 5 | ssTFs stably interact with STM cofactors but not GTFs. a, Architecture
at Yrr1 motifs in two classes of Yrr1-bound promoters: ‘STM-bound’ (labels on
left) and ‘not STM-bound’ (cyan and black labels on right) (Methods). The arrow
points to where cofactor crosslinking permeates Yrr1 crosslinking.
b, Representative architecture of STM cofactors or PIC components at a
consolidated set of ssTF-binding motifs at RSTM promoters (strand averaged;
see Methods and Supplementary Data 1 (1AI)), and oriented by TSS. Taf12 is in
SAGA and TFIID; bkd, background that was generated from a strain lacking a
TAP tag. c, Frequency distribution of promoters having the indicated PIC/TFIID
ratios (average of six GTFs; three-bin moving average), separated by promoter
class (RP, STM, TFO or UNB) or promoter sets based on cofactor enrichment.
‘SAGA-bound’ excludes RP promoters, which are highly enriched with SAGA
and shown separately. The ‘STM-bound’ promoter set required all of the
following to be present: SAGA, Mediator/SWI/SNF and TUP; ‘RSTM-bound’ also
required the presence of the RPD3-L complex. The x-axis is in arbitrary units.
Methods
No statistical methods were used to predetermine sample size. The
experiments were not randomized and the investigators were not
blinded to allocation during experiments and outcome assessment.
Strains and antibodies
The vast majority of data for this study were collected from tandem
affinity purification (TAP)-tagged S. cerevisiae strains (originally pur-
chased from Dharmacon; now available from Horizon Inspired Cell
Solutions, Cambridge, UK). The background strain for this collection
was BY4741 (a derivative of S288-C; MATa his3Δ1 leu2Δ0 met15Δ0
ura3Δ0). Negative control ChIPs and ChIPs with specific antibodies
were performed with BY4741. If the TAP-tagged strain for a particular
target was unavailable, we instead used a haemagglutinin (HA)-tagged
strain (originally purchased from Dharmacon; now available from Horizon
Inspired Cell Solutions). The background strain for the HA-tagged
collection was diploid, derived from BY4741 and designated Y800
(MATa leu2-D98 cry1R/MATα leu2-D98 CRY1 ade2-101 HIS3/ade2-101
his3-D200 ura3-52 caniR/ura3-52 CAN1 lys2-801/lys2-801 CYH2/cyh2R
trp1-1/TRP1 Cir0 carrying pGAL-cre (amp, ori, CEN, LEU2)).
Rabbit IgG (Sigma, catalogue number I5006, various lot numbers)
conjugated to Dynabeads was used to immunoprecipitate chromatin
from TAP-tagged strains. Santa Cruz Biotechnology sc-7392 antibody
was used to immunoprecipitate chromatin from HA-tagged strains.
Millipore antibodies 04-1570-I, 04-1571-I or 04-1572-I were used to
immunoprecipitate Pol II having its carboxy-terminal domain
phosphorylated at positions serine 7, 2 or 5, respectively, of the heptad repeats.
Millipore antibody 07-352 was used to immunoprecipitate histone H3
with acetylated lysine 9 (H3K9ac). Cell Signaling antibody 5546S was
used to immunoprecipitate histone H2B with ubiquitinated lysine 123
(H2BK123ub). Cse4 antibody from C. Wu (Johns Hopkins Univ.,
Baltimore, MD) was used to immunoprecipitate Cse4. Heat shock factor 1
(Hsf1) antibody from D. Gross (Louisiana State Univ., Baton Rouge, LA)
was used to immunoprecipitate Hsf1. ChIP–seq experiments using
micrococcal nuclease (MNase) to identify nucleosomes were performed for
the following histones and histone modifications: H3 (detected using
Abcam antibody ab1971), H3K27ac (ab4729), H3K36me3 (ab9050),
H3K4me3 (ab8580), H3K79me3 (ab2621), H3K12ac (ab46983) and H2B
(Active Motif 39237).
Cell growth and ChIP–exo
S. cerevisiae strains were grown in 67 ml of yeast peptone dextrose
(YPD) media to an optical density at 600 nm (OD600) of 0.8 at 25 °C.
Cells were crosslinked with formaldehyde at a final concentration of 1%
for 15 min at 25 °C, and quenched with a final concentration of 125 mM
glycine for 5 min. Cells were collected by centrifugation, and washed
in 1 ml of ST buffer (10 mM Tris-HCl, pH 7.5, 100 mM NaCl) at 4 °C. The
cells were pelleted again, the supernatant was removed, and the pellet
was flash frozen.
As STM classification criteria included promoters that became bound
by SAGA upon acute heat shock (as described36), we carried out equivalent
heat-shock experiments but using the workflow of this study. We
used these new data to assign heat-shock-induced binding locations
of SAGA (which correlated highly with binding locations in ref. 36). For
these heat-shock samples, yeast was grown in 67 ml of YPD to an OD600 of
0.8 at 25 °C; an equal volume of YPD medium at 55 °C was added to raise
the temperature of the culture to 37 °C and incubated at 37 °C for 6 min.
Then, cells were crosslinked with formaldehyde at a final concentration
of 1% for 15 min at room temperature by adding a 50 ml solution of
ice-cold 3.7% formaldehyde in water. Note that protein–DNA crosslinks
occur rapidly. Crosslinking was quenched with a final concentration of
125 mM glycine for 5 min. Cells were collected by centrifugation, and
washed in 1 ml of ST buffer at 4 °C. The cells were pelleted again, the
supernatant was removed, and the pellet was flash frozen.
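The stated final formaldehyde concentration can be checked with a simple dilution calculation. The sketch below is illustrative only: it assumes that the culture volume at crosslinking is the sum of the 67 ml culture and the equal volume of warm YPD, with no volume loss, and the function name is hypothetical.

```python
def final_concentration(stock_percent, stock_ml, sample_ml):
    """Percent concentration after mixing a stock solution into a sample."""
    return stock_percent * stock_ml / (stock_ml + sample_ml)


# Assumed culture volume: 67 ml of culture plus an equal volume of warm YPD.
culture_ml = 67 + 67

# 50 ml of 3.7% formaldehyde added to the culture gives roughly 1%.
formaldehyde_percent = final_concentration(3.7, 50, culture_ml)
print(f"final formaldehyde: {formaldehyde_percent:.2f}%")
```

Under these assumptions the result is close to the 1% final concentration given in the text.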
Chromatin preparations are based on modifications of a prior
protocol1. Frozen cell pellets were resuspended and lysed in 1 ml of FA lysis
buffer (50 mM Hepes-KOH, pH 7.5, 150 mM NaCl, 2 mM EDTA, 1% Triton,
0.1% sodium deoxycholate and complete protease inhibitor (CPI)) and
a 500 μl volume of 0.5 mm zirconia/silica beads by bead beating in a
Mini-Beadbeater-96 machine (Biospec) for three cycles each of three
minutes on/seven minutes off (samples were kept in a −20 °C freezer
during the off cycle). The lysates were transferred to a new tube and
microcentrifuged at maximum speed for 3 min at 4 °C to pellet the
chromatin. The supernatants were discarded; the pellets were resuspended
in 600 μl of FA lysis buffer and transferred to 15 ml polystyrene conical
tubes containing 300 μl of 0.1 mm zirconia/silica beads. The samples
were then sonicated in a Bioruptor Pico (Diagenode) for 8 cycles (15 s
on/30 s off) to obtain DNA fragments of 100–500 bp in size. Each
ChIP–exo assay processed the equivalent of 33 ml of cell culture (roughly
8 × 10⁸ cells). The remaining half of the processed chromatin was flash
frozen and stored at −80 °C in case a technical replicate was desired.
A culture equivalent of 33 ml (roughly 630 million cells) of yeast was
fragmented to produce solubilized chromatin (roughly 190 μl). This was
incubated overnight (roughly 16 h) at 4 °C with the appropriate
antibody. A 10 μl bed volume of conjugated IgG–Dynabeads (0.83 mg ml−1
IgG and 5 mg ml−1 Dynabeads) or 3 μg of specific antibodies with a 10 μl
slurry-equivalent of Protein A Mag Sepharose (GE Healthcare) was
used in each reaction.
ChIP–exo 5.0 was performed as described1. Essentially, ChIP libraries
were partially constructed on the immunoprecipitated resin, and then
λ exonuclease was used to trim nucleotides in the 5′ to 3′ direction until
stopped by a protein–DNA crosslink. The DNA was then eluted and
library construction completed.
In a typical experiment with TAP-tagged yeast strains, 48 ChIP–exo
experiments were performed concurrently. Each set included 46 unique
targets, a Reb1–TAP sample as a positive control, and a BY4741 sample
(from a parental strain lacking the TAP tag) as a negative control.
Following 18 cycles of polymerase chain reaction (PCR), all 48 samples
were pooled equally by volume. Library concentration was quantified
by quantitative PCR (qPCR). Equivalent workflows occurred with
other strains.
Using paired-end Illumina sequencing and cellular conditions
identical to those used to produce ChIP–exo data, we generated a
genome-wide nucleosome map (MNase histone H3 and H2B ChIP–seq)
with improved accuracy over our prior maps. MNase ChIP–seq was
performed as described37. Briefly, formaldehyde-crosslinked chromatin
was digested with MNase to achieve roughly 80% of mononucleosomes.
After H3 or H2B ChIP and library construction, libraries were
size selected by agarose gel electrophoresis, and sequenced.
Sequencing and mapping
High-throughput DNA sequencing was performed with an Illumina
NextSeq 500 or 550 in paired-end mode, producing a 40 bp Read_1 and a 36 bp
Read_2. Additional previously published ChIP–exo datasets23,36 for Hsf1,
Msn2, Spt15, Spt16, Ifh1 and Fhl1 were included in data processing and
analysis for our study. Data were managed, quality controlled, and
processed through a custom automated workflow control called PEGR
(Platform for Epi-Genomic Research)38. Sequence reads were aligned to the
yeast (sacCer3) genome using bwa-mem (version 0.7.17). Aligned reads
were filtered using Picard (version 2.7.1)39 and samtools (version 0.1.18)40
to remove PCR duplicates (that is, where the 5′ coordinates-strand of
Read_1 and Read_2 were identical to another read pair) and non-uniquely
mapping reads. For ChIP–exo, the resulting mapped 5′ end of Read_1
(the exonuclease stop site) is defined as a tag. For MNase, the resulting
mapped midpoint of Read_1 and Read_2 is defined as a tag.
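The two tag definitions above can be sketched as follows. This is a minimal illustration and not the PEGR implementation: the 0-based, half-open coordinate convention, the function names and the use of integer division for the midpoint are all assumptions.

```python
def chip_exo_tag(read1_start, read1_end, strand):
    """Return the 0-based coordinate of the 5' end of Read_1.

    Assumes a 0-based, half-open alignment interval [read1_start, read1_end).
    On the reverse strand the 5' end is the last aligned base.
    """
    if strand == "+":
        return read1_start
    if strand == "-":
        return read1_end - 1
    raise ValueError("strand must be '+' or '-'")


def mnase_tag(fragment_start, fragment_end):
    """Return the midpoint of the fragment spanned by Read_1 and Read_2.

    Assumes [fragment_start, fragment_end) covers the outer 5' ends of the pair.
    """
    return (fragment_start + fragment_end) // 2
```

For example, a forward-strand Read_1 aligned to [100, 140) yields a tag at 100, whereas the same interval on the reverse strand yields a tag at 139.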
Data quality, statistics and reproducibility
We tested many targets that were not expected to bind directly to
DNA, and thus could not assume that every target would produce a
positive ChIP signal. We empirically determined that a minimum of
200,000 deduplicated tags were required to assess the quality of an
individual dataset. If a dataset received fewer than 200,000 tags, then
we required the tag duplication level (number of reads discarded by
PICARD)/(number of input reads) of the sample to be less than 70%
before we sequenced it more deeply. For example, if a dataset had
100,000 mappable deduplicated tags (unique Read_1 and Read_2
combination), but a total of 1 million mappable tags before filtering, then
the duplication level was 90% and it was assumed that the library was
insufficiently complex to warrant additional sequencing. If a library
was insufficiently complex, we performed a technical replicate with
the remainder of the chromatin preparation. Following this procedure,
we produced a sufficiently complex library for more than 95% of
targets tested from a single yeast culture. In practice, pooling equivalent
proportions of 48 barcoded libraries (in terms of reaction volumes)
provided similar sequencing depth across all samples. All analysed
datasets were confirmed with independent biological replicates that
passed our quality-control metrics. A dataset was considered successful
if significant locations (binomial, 1.5-fold, P < 0.01) were identified
by ChExMix (see below) and these locations were not in regions that
produce highly variable data. N values are reported for the number of
target datasets (hierarchical clustering and UMAP) or the number of
genomic features (composite plots and heatmaps) analysed.
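The library-complexity decision described above can be written as a short rule. The sketch below is one reading of the text, not the pipeline's actual code: the function names and the three outcome labels are hypothetical, and only the 200,000-tag and 70% thresholds come from the text.

```python
MIN_DEDUP_TAGS = 200_000   # minimum deduplicated tags needed to assess a dataset
MAX_DUPLICATION = 0.70     # duplication level must be below this to sequence deeper


def duplication_level(input_reads, dedup_tags):
    """Fraction of input reads discarded as duplicates."""
    return (input_reads - dedup_tags) / input_reads


def next_step(input_reads, dedup_tags):
    """Return a hypothetical label for what to do with a sequenced library."""
    if dedup_tags >= MIN_DEDUP_TAGS:
        return "assess"
    if duplication_level(input_reads, dedup_tags) < MAX_DUPLICATION:
        return "sequence_deeper"
    return "technical_replicate"
```

With the worked example from the text (1 million mappable tags, 100,000 after deduplication), the duplication level is 0.9 and the library is routed to a technical replicate.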
Raw FASTQ reads for each sample were aligned against the known TAP
or HA FASTA sequence and nearby genomic sequence to confirm the
presence and location of the epitope in each strain. See 03_EpitopeID
at https://github.com/CEGRcode/2021-Rossi_Nature.
Mapping statistics for each dataset are available at yeastepigenome.org,
along with mapped data downloads. Analyses shown at
yeastepigenome.org can be reproduced or further custom analysed using
ScriptManager (https://github.com/CEGRcode/scriptmanager), which
provides a simple user-friendly interface. It includes straightforward
instructions for installation and for data analysis. Data values from the
paper’s composite plots can be found in 01_Composite_Files at
https://github.com/CEGRcode/2021-Rossi_Nature.
ChExMix locations
ChExMix41 version 0.31 was run with the following non-default
parameters: --noread2 --scalewin 1000 --minmodelupdateevents 50
--fixedalpha 0 --mememinw 8 --mememaxw 21 --minmodelupdaterefs 25
--lenientplus. We also used the --excludebed option to exclude from
analysis a custom set of hypervariable regions
(ChExMix_Peak_Filter_List_190612.bed), including the rDNA locus, tRNA genes and telomere
regions (this list is available in 02_References_and_Features_Files at
https://github.com/CEGRcode/2021-Rossi_Nature). By default, ChExMix
requires the tag count at binding events to achieve at least 1.5-fold
enrichment and a minimum Benjamini–Hochberg42 corrected P value
of 0.01 (binomial), compared with the scaled ‘masterNoTag_20180928’
negative control count. All experiments for a given protein target were
analysed by ChExMix individually. The resulting peak calls for each
individual replicate experiment can be found at yeastepigenome.org
or the National Center for Biotechnology Information (NCBI) Gene
Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/). In
addition, the --lenientplus option enables a multireplicate
reproducibility assessment mode in ChExMix. Using this feature, replicate
experiments passing quality control were analysed simultaneously, and the
resulting joint peak calls were used to classify Pol II features (see the
section on ‘Pol II promoter classes’ below). Locations are defined as
ChExMix peaks if their tag counts pass the thresholds in the combined
meta-experiment (essentially merging tag counts across replicates),
or in one or more individual replicate experiments. However, locations
are reported only if the normalization of ChIP–seq (NCIS)-scaled tag
counts did not vary significantly across replicates (binomial, 1.5-fold,
P < 0.01). This latter condition had the effect of screening out locations
that were not reproducibly enriched across replicated experiments.
Locations resulting from a combined analysis of two independent
replicates can be found in 04_ChExMix_Peaks at
https://github.com/CEGRcode/2021-Rossi_Nature (and at
https://doi.org/10.26208/rykf-6050 for individual replicates).
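To illustrate the two default thresholds mentioned above (1.5-fold enrichment and a Benjamini–Hochberg-corrected P value of 0.01), the following sketch applies the standard Benjamini–Hochberg adjustment to a list of P values and then tests a candidate location. It is not ChExMix code: the function names are hypothetical, the binomial test that produces the raw P values is not shown, and the reading that the corrected P value must fall below 0.01 is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def passes_thresholds(signal_tags, scaled_control_tags, q_value,
                      min_fold=1.5, max_q=0.01):
    """Check fold enrichment over the scaled control and the corrected P value."""
    if scaled_control_tags > 0:
        fold = signal_tags / scaled_control_tags
    else:
        fold = float("inf")
    return fold >= min_fold and q_value < max_q
```

A candidate with 30 tags against a scaled control count of 10 passes the fold criterion, and is kept only if its adjusted P value is also below 0.01.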
The negative control for ChExMix peak calling, termed
‘masterNoTag_20180928’, was created by merging 15 individual BY4741 (parent
strain containing no epitope tag) ChIP–exo experiments into a single
BAM file. These negative controls were generated over an 18-month
period during the main phase of data collection. The file
‘masterNoTag_20180928.bam’ comprises the following Sample IDs: 11851, 11946,
12094, 12880, 13484, 13822, 14202, 14408, 14637, 14825, 15256, 15818,
16073, 17814 and 18504, and is available at
https://doi.org/10.26208/rykf-6050.
Meta-assemblages
Meta-assemblages are based on cell populations. Thus, their member
targets tend to bind the same genomic locations, although not
necessarily at the same time or above a preset algorithmic threshold. Owing
to parameter constraints placed on clustering, significant (P < 0.01)
but rare (for example, HIR) and/or highly isolated (for example,
Vid22/Tbf1) binding events tended to cluster near each other in UMAP, and so
were placed in a single miscellaneous meta-assemblage (ISO) without
further analysis.
Using bedtools intersect (bedtools version 2.27.1), all ChExMix peaks
(regardless of whether they were associated with the Pol II sector,
defined above) for each of 384 validated input targets were intersected
in a 100-bp window around themselves. This produced a symmetrical
matrix of counts representing the frequency of peak overlap between
all samples. 2D hierarchical clustering43 was then performed, using
average linkage and uncentred correlation as the metric.
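The construction of the peak-overlap matrix can be sketched in plain Python. This is an illustration rather than the bedtools-based workflow used in the study: it assumes that a 100-bp window means peaks within 50 bp of each other on the same chromosome, and it counts overlapping peak pairs so that the matrix is symmetric by construction. The function name and input layout are hypothetical.

```python
def overlap_matrix(peaks_by_target, window=100):
    """Count overlapping peak pairs between every pair of targets.

    peaks_by_target maps a target name to a list of (chromosome, position)
    peak coordinates. Two peaks overlap if they lie on the same chromosome
    and are no more than window // 2 bp apart (an assumed interpretation).
    """
    half = window // 2
    targets = sorted(peaks_by_target)
    matrix = [[0] * len(targets) for _ in targets]
    for i, target_a in enumerate(targets):
        for j in range(i, len(targets)):
            target_b = targets[j]
            count = sum(
                1
                for chrom_a, pos_a in peaks_by_target[target_a]
                for chrom_b, pos_b in peaks_by_target[target_b]
                if chrom_a == chrom_b and abs(pos_a - pos_b) <= half
            )
            matrix[i][j] = count
            matrix[j][i] = count  # pair counts are symmetric
    return targets, matrix
```

The resulting matrix could then be passed to any hierarchical clustering routine that supports average linkage with an uncentred correlation metric.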
The interaction matrix was further filtered to remove 13 targets with
fewer than five total ChExMix peaks (for example, Pol I targets having
only two binding locations that are annotated in the reference yeast
genome, despite the rDNA locus being highly repetitive). This
produced a symmetrical matrix of 371 samples (Fig. 1b and Supplementary
Data 3). The matrix was then used as the input into the UMAP algorithm
(version 0.3.7)44 using the following parameters:
umap.UMAP(n_neighbors = 5, min_dist = 0.0, n_components = 2, metric = ‘correlation’,
random_state = RS,).fit_transform(X). K-means clustering was performed
on the resulting 2D projection at a variety of K values (5, 10, 20, 25,
30, 35, 40, 100, 145). No new biologically distinct clusters appeared
beyond K = 40.
Reference features and intervals
Coordinates for 253 replication origins (ACS sequences, for
‘autonomously replicating sequence (ARS) consensus sequences’) were
obtained from ref. 6. Note that ACS_6_32973 has a duplicate entry on
the yeastepigenome.org website, resulting in 254 features. Coordinates
for X-core elements (XCEs), centromeres (CENs), RNA polymerase I
(Pol I) TSS, Pol III TSS, NCR (SGD-defined noncoding RNA annotated
as ncRNA_gene, snoRNA_gene, and snRNA_gene) and Ty transposon
LTRs were obtained from the Saccharomyces Genome Database (SGD;
https://www.yeastgenome.org) on 3 March 2017 (available as
SGD_features_170331.tab in 02_References_and_Features_Files at
https://github.com/CEGRcode/2021-Rossi_Nature). TSS and TES coordinates for Pol
II were obtained from ref. 45. They were matched to each SGD coding
feature through their systematic Gene ID. These coordinates were based
on microarrays. For TSS, the most 5′-enriched sense-strand coordinate
in each promoter is reported. When no transcript was reported for an
SGD feature, the TSS and TES were imputed from the SGD coordinates
by moving 70 bp upstream of the start ATG (SGD start) for TSSs and
70 bp downstream of the stop codon (SGD end) for TESs. This imputation
was based on the empirical observation that the median distance
from the TSS defined in ref. 45 and the start codon was 70 bp. ‘Dubious
ORFs’ were initially considered and then excluded from further analysis
because we and others46 found no validating evidence. Noncoding
RNAs (ncRNAs) were from SGD annotations; cryptic unstable
transcripts (CUTs) and stable unannotated transcripts (SUTs) were from
ref. 45; and Xrn1-sensitive unstable transcripts (XUTs) were from ref. 47.
Reference datasets are available in 02_References_and_Features_Files
at https://github.com/CEGRcode/2021-Rossi_Nature: SGD features
(SGD_features_170331.tab), ORF TSS (Xu_2009_ORF-Ts_V64.gff3), CUT
(Xu_2009_CUTs_V64.gff3), SUT (Xu_2009_SUTs_V64.gff3), and XUT
(van_Dijk_2011_XUTs_V64.gff3).
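The 70-bp imputation of TSS and TES coordinates described in this section is strand-dependent, which the following sketch makes explicit. The function name and the convention that the ORF is given as genomic left and right coordinates are assumptions made for illustration.

```python
IMPUTATION_OFFSET_BP = 70  # median TSS-to-ATG distance reported in the text


def impute_tss_tes(orf_left, orf_right, strand):
    """Impute (TSS, TES) for a feature that lacks a reported transcript.

    orf_left < orf_right are the genomic bounds of the ORF. On the forward
    strand the ATG is at orf_left; on the reverse strand it is at orf_right.
    """
    if strand == "+":
        return orf_left - IMPUTATION_OFFSET_BP, orf_right + IMPUTATION_OFFSET_BP
    if strand == "-":
        return orf_right + IMPUTATION_OFFSET_BP, orf_left - IMPUTATION_OFFSET_BP
    raise ValueError("strand must be '+' or '-'")
```

For a forward-strand ORF spanning 1,000 to 2,000, this gives a TSS at 930 and a TES at 2,070; the same ORF on the reverse strand gives a TSS at 2,070 and a TES at 930.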
Nucleosome maps at Pol II promoter regions
MNase H3 and H2B ChIP–seq paired-end reads were bioinformatically
filtered to fragment sizes of 100–160 bp, and then nucleosome dyads
(peaks) were called from the mapped midpoint location of Read_1 and
Read_2 5′ ends using GeneTrack (v1) (parameters: s40e80F1)48. Peaks
were required to overlap within a 75-bp window in at least 4 of 6
datasets (3 H2B and 3 H3 MNase ChIP–seq; Sample IDs 10951, 10952, 10967;
10947, 10948 and 10966) to call a consensus nucleosome (N = 6). The
average location of overlapping peaks defined the dyad coordinate of
a consensus nucleosome.
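One way to express the consensus rule (peaks within a 75-bp window in at least 4 of 6 datasets, averaged to a single dyad) is sketched below. The greedy left-to-right grouping is an assumption made for illustration and may differ from the procedure actually used; the function name and input layout are hypothetical.

```python
def consensus_dyads(peaks, window=75, min_datasets=4):
    """Call consensus nucleosome dyads on one chromosome.

    peaks is a list of (dataset_id, position) dyad calls. Peaks are grouped
    greedily: a group holds every peak within `window` bp of its leftmost
    member. A group supported by at least `min_datasets` distinct datasets
    yields one consensus dyad at the mean position of its members.
    """
    ordered = sorted(peaks, key=lambda peak: peak[1])
    consensus = []
    i = 0
    while i < len(ordered):
        group_start = ordered[i][1]
        group = []
        while i < len(ordered) and ordered[i][1] - group_start <= window:
            group.append(ordered[i])
            i += 1
        if len({dataset for dataset, _ in group}) >= min_datasets:
            mean_position = sum(pos for _, pos in group) / len(group)
            consensus.append(round(mean_position))
    return consensus
```

In this sketch, a cluster supported by only three datasets is discarded, whereas a cluster supported by four or more is reported at its average position.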
The +1 nucleosome was defined as the nucleosome dyad peak that
was closest to a TSS in a window −60 bp to +140 bp. If no nucleosome
was found, then an additional search was performed −80 bp to −61 bp
relative to the TSS. If none was found, then the region was viewed in
Integrative Genomics Viewer version 2.5.2 (IGV;
http://software.broadinstitute.org/software/igv/)49, and manually assigned. If no nucleosomes
could visually be assigned to a TSS in IGV, then a +1 nucleosome dyad
coordinate was imputed as the SGD ATG start coordinate (which is
the consensus location of +1 nucleosomes). This placed the TSS at the
genome-wide canonical location relative to the imputed +1 dyad.
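The algorithmic part of the +1 nucleosome assignment (before any manual inspection in IGV) can be sketched as a two-stage window search. The function name is hypothetical, window bounds are treated as inclusive, and distances are measured in the direction of transcription; these conventions are assumptions.

```python
def assign_plus_one(tss, strand, dyads):
    """Return the dyad chosen as the +1 nucleosome, or None.

    Searches −60 to +140 bp relative to the TSS first, then −80 to −61 bp,
    and picks the dyad closest to the TSS within the first window that
    contains any dyad. None means the feature needs manual assignment.
    """
    sign = 1 if strand == "+" else -1
    relative = [(sign * (dyad - tss), dyad) for dyad in dyads]
    for low, high in ((-60, 140), (-80, -61)):
        candidates = [(abs(rel), dyad) for rel, dyad in relative
                      if low <= rel <= high]
        if candidates:
            return min(candidates)[1]
    return None
```

A return value of None corresponds to the cases that were inspected visually and, failing that, imputed from the SGD ATG start coordinate.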
We previously defined consensus −1 nucleosome positions of all
genes transcribed by Pol II, regardless of whether a nucleosome had
low occupancy or was even detectable50
. However, here our intent
was to define the region encompassing NFRs and NDRs, and so we
chose to ignore nucleosome positions that were highly depleted of
nucleosomes. Our goal was to manually determine the location of
the most robust algorithmic nucleosome position (upstream stable
nucleosome, USN) that was located closest to a TSS and in a window
−500 bp to −60 bp from the TSS, as long as that nucleosome was not
already called a +1 nucleosome. If one of the following criteria was
met, then the nucleosome landscape was visualized in IGV, and the
USN and/or +1 nucleosomes were manually (re)assigned (N = 753): 1)
either the USN or +1 was not present in the original algorithmically
defined set; 2) the USN-to-(+1) dyad-to-dyad distance was calculated
to be smaller than 187 bp (the size of a nucleosome (147 bp) and two
linkers (2 × 20 bp)); 3) a ssTF peak was, first, located less than 600 bp
upstream of the TSS, and second, upstream (more 5′ to the nearest
TSS) of a nucleosome call having an occupancy score that was in
the bottom 5% of all nucleosomes (that is, an algorithmically called
nucleosome that was in fact highly depleted in the vicinity of a ssTF).
If no nucleosomes could visually be assigned, the USN nucleosome
coordinate was imputed as 750 bp upstream of the +1 nucleosome
dyad (99th percentile of calculated NDR/NFR lengths). The NDR/NFR
length at these features was reported as ‘9999’ in Supplementary
Data 1 (1S) (N = 297). As the promoter regions defined in this study
include arbitrary limits and do not consider limits defined by insula-
tion, there will be some inaccuracies in relation to actual biological
promoter boundaries. This is expected to result in some promoter
misclassifications.
In total, 59,002 nucleosomes were called across the S. cerevisiae
genome. Nucleosome occupancy and fuzziness scores were calculated
as described51. All nucleosome calls with their median occupancy
and fuzziness scores are available as Nucleosome_calls_and_stats.xlsx
in 02_References_and_Features_Files at
https://github.com/CEGRcode/2021-Rossi_Nature.
ChExMix locations at filtered Pol II genes
The initial list of all compiled features totalled 11,112 (Supplementary
Data 1). Numerous quality-control metrics were calculated for each
Pol II transcribed feature to assess their validity and mappability. We
used two GTFs (Sua7 (Sample ID = 11743) and Ssl2 (11747)) and a negative
control (masterNoTag_20180928.bam), with total tags set to be equal
across all three in order to assess the enrichment around each
candidate coding and noncoding Pol II TSS (N = 9,844; feature class level 1:
01–12, 14, 24 and 25 in Supplementary Data 1 (1D)), as described below.
A region of the genome was defined for each transcribed feature that
included the transcribed sequence (TSS to TES) and the surrounding
regulatory region. The upstream (promoter) regulatory region was
defined as the inclusive interval between the dyad coordinate of the
USN (see above) and the TSS. When no USN was called for a feature,
then the upstream boundary was defined as 750 bp upstream (5′) of the
TSS. Note that the upstream boundary does not consider boundaries
defined by insulators, as they have not yet been fully defined. This
may result in unwarranted attachment of ssTF/cofactor locations to
some promoters. The downstream regulatory region was defined as the
inclusive interval from TES to 100 bp downstream (3′). This boundary
was based on the consensus position of the termination machinery
relative to the TES. The genomic region from the USN dyad to 100 bp
downstream of TES was defined as a ‘Pol II sector’.
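The ‘Pol II sector’ definition can be summarized in a few lines. The sketch below returns the sector as a (left, right) genomic interval; the function name and argument layout are hypothetical, and the 750-bp fallback is applied when no USN dyad is supplied.

```python
def pol2_sector(tss, tes, strand, usn_dyad=None):
    """Return the (left, right) genomic interval of a Pol II sector.

    The sector runs from the USN dyad (or 750 bp upstream of the TSS when
    no USN was called) to 100 bp downstream of the TES, oriented by strand.
    """
    if strand == "+":
        upstream = usn_dyad if usn_dyad is not None else tss - 750
        return upstream, tes + 100
    if strand == "-":
        upstream = usn_dyad if usn_dyad is not None else tss + 750
        return tes - 100, upstream
    raise ValueError("strand must be '+' or '-'")
```

For a forward-strand gene with TSS at 1,000, TES at 2,000 and a USN dyad at 700, the sector spans 700 to 2,100.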
ChExMix peaks for all datasets here were intersected with each Pol
II sector using Bedtools. A protein was defined to be located within a
feature if at least one ChExMix peak overlapped with any portion of the
sector. If a ChExMix peak intersected two overlapping sectors (that is,
the peak exists in the promoter region of two genes in a head-to-head
orientation), then that protein was located in both sectors. Conse-
quently, the number of ChExMix peaks and the number of bound fea-
tures (or sectors) is not equal.
Pol II sectors were excluded as ‘hypervariable’ if any of the following
conditions were met: 1) the TSS was in the highest 1% of
masterNoTag_20180928 tag counts (negative control) in a 1,000-bp window
centred over the TSS; 2) the TSS was in the highest 5% of
masterNoTag_20180928 tag counts in a 200-bp window centred over the TSS
and the occupancy ratios of both Sua7/NoTag and Ssl2/NoTag were
less than 2 (based on total tag normalization). The rationale for these
criteria was that if the signal in the negative control was too high, and
the signal-to-noise ratios of robust GTFs such as Sua7 and Ssl2 were not
well above the high background, then we did not have confidence in
locations called at these sites. The sector was retained if it overlapped
with a peak call from any dataset in this study. We assumed that the peak
indicated enough dynamic range to have useable data in this region.
We excluded N = 75 Pol II sectors by this metric (‘08_Hyper-variable’ in
Supplementary Data 1 (1D)).
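The ‘hypervariable’ filter combines two background criteria with a rescue rule, which can be sketched as a single predicate. This is one reading of the text, not the study's code: the inputs are assumed to be precomputed percentile ranks (0 to 100) of the negative-control tag counts, the function and argument names are hypothetical, and the rescue by any overlapping peak is assumed to apply to both criteria.

```python
def is_hypervariable(notag_pct_1000bp, notag_pct_200bp,
                     sua7_ratio, ssl2_ratio, overlaps_any_peak):
    """Decide whether a Pol II sector is excluded as hypervariable.

    notag_pct_1000bp / notag_pct_200bp: percentile rank of the negative
    control tag count in 1,000-bp and 200-bp windows centred on the TSS.
    sua7_ratio / ssl2_ratio: occupancy ratios over the negative control.
    """
    if overlaps_any_peak:
        return False  # retained: a peak call indicates usable dynamic range
    top_1_percent = notag_pct_1000bp >= 99
    top_5_percent_weak_gtfs = (
        notag_pct_200bp >= 95 and sua7_ratio < 2 and ssl2_ratio < 2
    )
    return top_1_percent or top_5_percent_weak_gtfs
```

The other exclusion classes in this section have the same general shape, with different background criteria and ratio cut-offs.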
Pol II sectors were excluded for having ‘poor mappability’ if any of
the following conditions were met: 1) the TSS was in the lowest 1% of
masterNoTag_20180928 tag counts in a 1,000-bp window centred over
the TSS; 2) the TSS was in the lowest 5% of masterNoTag_20180928 tag
counts in a 200-bp window centred over the TSS and the occupancy
ratios of both Sua7/NoTag and Ssl2/NoTag were less than 2 (based on
total tag normalization). Visual inspection of heatmaps confirmed
that these segments of the genome were not uniquely mappable, and
thus had low intrinsic tag counts. We excluded N = 116 Pol II sectors
by this metric (‘24_Hyper-variable_noncoding’ in Supplementary
Data 1 (1D)).
Pol II sectors were excluded as ‘Quiescent-NoPIC’ if the occupancy
ratios of both Sua7/NoTag and Ssl2/NoTag were less than 1. The sec-
tor was retained if it overlapped with a peak call from any dataset in
this study. The rationale here was that if there were no peaks in the
sector vicinity and no enrichment of GTFs, then this feature was rela-
tively quiescent. Thus, it was uninformative to analyse it further. We
do not exclude the possibility that these features had low subthreshold
activity. We excluded N = 251 Pol II sectors by this metric (‘05_NoPIC’ in
Supplementary Data 1 (1D)).
Pol II sectors were excluded as ‘tRNA proximal’ if peaks from Tfc3
(11835)—a component of the RNA polymerase III transcription ini-
tiation factor complex—overlapped with the region between the +1
nucleosome dyad and the USN dyad of the sector. tRNA genes produced
high levels of background owing to strong crosslinking of the Pol III
machinery, which digestion by λ exonuclease then focuses into high
background peaks. Although this background is present in all samples,
it is most problematic or evident where the target foreground signal is
close to background. We excluded N = 135 Pol II sectors by this metric
(‘06_tRNAprox’ in Supplementary Data 1 (1D)).
Pol II sectors were excluded as ‘ChExMix extreme’ if they overlapped
with an unusually high number of peaks. These features contained many
gene-body peaks for targets that, across the rest of the genome, were
bound primarily in promoter regions. Further analysis revealed that
the density of tags across the gene body in the masterNoTag_20180928
negative control was abnormally high or low, relative to the rest of the
genome, thereby creating statistical anomalies of bound locations.
Consequently, ChExMix produced many false-positive peak calls in
unrelated datasets at these extreme regions where the background
model appears to break down. The peak calls at these extreme features
are still included in the ChExMix peak files but should not be considered
valid locations unless validated by orthogonal methods. The number of
Pol II sectors given this label was empirically capped at N = 25
(‘07_ChExMix_extreme’ in Supplementary Data 1 (1D)). The value of this filter
is that it decreased the number of potentially artefactual locations
occurring in noncanonical places, particularly for ssTFs that bind to
few genes. However, we do not exclude the possibility of noncanonical
extreme, yet still biological, behaviour occurring at these genes. For
example, large condensates might behave in this way.
Our analysis of the ncRNA features reported in refs. 45,47 found that
many of these calls were not supported by evidence of GTF binding
(Sua7) in the TSS vicinity, suggesting that many were false positives.
Noncoding Pol II sectors were excluded if no Sua7 peak was found within
80 bp of the TSS. We excluded N = 2,161 ncRNA Pol II sectors by this
metric (‘25_excluded_ncRNA’ in Supplementary Data 1 (1D)).
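The ncRNA filter reduces to a distance test between each noncoding TSS and the Sua7 peak calls. The sketch below assumes that peaks and the TSS are on the same chromosome and that ‘within 80 bp’ is inclusive; the function name is hypothetical.

```python
def keep_ncrna_sector(tss, sua7_peaks, max_distance=80):
    """Return True if any Sua7 peak lies within max_distance bp of the TSS.

    sua7_peaks is a list of peak positions on the same chromosome as the TSS.
    Sectors for which this returns False would be excluded.
    """
    return any(abs(peak - tss) <= max_distance for peak in sua7_peaks)
```

A noncoding TSS at position 5,000 is retained with a Sua7 peak at 5,060, but excluded if the nearest peak is at 5,200.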
Pol II promoter classes
Our unsupervised approach to chromatin organization genome-wide
produced meta-assemblages that reflect predominant architectural
themes. Meta-assemblages are computed ensembles of many
genome-wide locations averaged across millions of cells, and thus do
not necessarily correspond to biochemically stable complexes. There
are cases in which a meta-assemblage such as ORC would appear to
have a corresponding biochemical ensemble at replication origins.
This makes meta-assemblages and real ensembles seemingly the same.
However, as expected, there was no single promoter architecture that
emerged from our unsupervised approach. Instead, meta-assemblages
reflected predominant architectural themes that ranged along a
compositional spectrum from relatively heterogeneous (ssTFs/MED/SAGA/
TUP) to relatively homogeneous (PIC). Meta-assemblages could be
merged or subdivided to achieve levels of granularity, but also levels
of uncertainty. They permeated promoters to varying extents.
The variation in the types of meta-assemblages within and across
promoter classes gives them their unique regulatory properties, but
also makes promoter classification fluid. Classification depends on
input criteria that reflect subjective concepts. For example, prior
work created SAGA-dominated and TFIID-dominated gene groups on
the basis of functional criteria (relative sensitivity to SAGA and TFIID
mutants)28. This helped to produce a genome-wide concept of inducible
versus constitutive genes, but could not address other concepts
such as insulation, or the fact that some themes may not be manifested
through SAGA and TFIID, or that there may be more granularity in each
of those classes. We attempt here to provide more granularity, but
recognize that simplifying overarching concepts are best served with
fewer groups. To this end, we created promoter classes that arose in part
from our unsupervised learning approach. However, we also injected
additional a priori knowledge. This knowledge considers the functionality
of each factor that contributes to distinctive regulatory archetypes.
The 137 RP promoters (defined by SGD) encode subunits of the ribosome.
They comprise the largest known set of genes that are thought
to be coregulated under all conditions. This may be due to the fact that
they are predominantly regulated by the ssTF Rap1. They are highly
expressed and well studied by ChIP–exo as a group23, and so form a
distinct gene set.
SAGA, Mediator and Tup1 (‘STM’) are major cofactor complexes
that, along with other ssTFs and cofactors (listed in Supplementary
Data 2 (1K)), co-occur at highly expressed genes and formed major
UMAP clusters. We therefore defined a set of non-RP STM promoters
(using Bedtools intersect) if the region between the +1 nucleosome
and USN dyads had at least one SAGA, Mediator or TUP ChExMix
call (Supplementary Data 2 (10A)) in YPD at 25 °C or a SAGA call upon
acute heat shock36 (6 min at 37 °C) (N = 984 in the STM group; see Supplementary
Data 1 (1E)). Most STM promoter regions (N = 854, or 87%)
also bound at least one of 78 ssTFs site-specifically (Supplementary
Data 2 (10C)). The majority of these ssTF peaks overlapped position-
ally with STM cofactor peaks. Applicable to Fig. 5b, we labelled each
ssTF-bound motif as a ‘consolidated ssTF motif’ if it overlapped with a
STM peak. This consolidated motif set was considered the organizing
centre of that promoter. When a ssTF motif was absent, the ssTF peak
call was used instead. When numerous ssTFs were bound to the same
promoter, the ssTF closest to the STM peak was used (Supplementary
Data 1 (1Y–1AI)).
Of the remaining promoters (non-RP, non-STM), a subset had ssTF
ChExMix peaks (whether site-specifically bound or not) or other cofactor
ChExMix peaks in the region between the +1 nucleosome and the
USN. This list of ssTFs and cofactors did not include the core transcription
machinery (initiation, elongation or termination), which nevertheless
was present. We therefore defined these as ‘TFO’ (N = 1,783). About
one-quarter of TFO promoters had a bound ssTF that was more associated
with STM promoters, and thus presumably capable of recruiting
cofactors (Supplementary Data 2 (8)). These TFO promoters may have
been algorithmically misclassified, perhaps being expressed under
other environmental conditions. Those non-RP, non-STM and non-TFO
promoters that remained constituted 2,474 promoters whose promoter
regions lacked evidence of a binding event beyond a PIC or a
nucleosome, and thus formed the largest of all groups, the ‘unbound’
(‘UNB’). These classifications are indicated in Fig. 1a, along with their
relationship to the TFIIDdom and SAGAdom gene classes. Relative PIC
occupancy (green-dot count) is based on average TFIIB (Sua7) occu-
pancy (Supplementary Data 1 (1AJ)) but confirmed with nascent and
steady-state transcription.
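
The successive assignment of promoters to the RP, STM, TFO and UNB classes can be summarized in a short sketch. It assumes that three boolean properties have already been computed for each promoter; the function and argument names are hypothetical, and the order of the tests simply mirrors the order in which the classes are introduced in the text.

```python
def classify_promoter(is_rp, has_stm_call, has_sstf_or_cofactor_peak):
    """Assign one class per promoter: RP first, then STM, then TFO, else UNB."""
    if is_rp:
        return "RP"    # ribosomal protein promoter (defined by SGD)
    if has_stm_call:
        return "STM"   # at least one SAGA, Mediator or TUP call
    if has_sstf_or_cofactor_peak:
        return "TFO"   # ssTF or other cofactor peak, but no STM call
    return "UNB"       # nothing detected beyond a PIC or a nucleosome

# Class sizes reported in the text
class_sizes = {"RP": 137, "STM": 984, "TFO": 1783, "UNB": 2474}
total_promoters = sum(class_sizes.values())
```

Because the tests are applied in order, a promoter that satisfies more than one criterion is assigned to the earliest matching class.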
Stringent Pol II promoter classes
These classifications were more stringent than those above and relate to
Fig. 5b,c, and Extended Data Fig. 9b,c. The ‘SAGA-bound’ classification
required a promoter to have a ChExMix peak call (‘1’ in Supplementary
Data 2 (3)) for two or more of the following targets: Spt7, Ada2, Sgf11 or
Sgf73. The ‘STM-bound’ classification required a promoter to have all
three of the following labels: SAGA-bound, TUP-bound and Mediator/
SWI–SNF-bound, as follows. The ‘TUP-bound’ classification required a
promoter to have a ChExMix call (‘1’) for two or more of the following
targets: Tup1, Cyc8, Sok2 and Cin5. The ‘Mediator/SWI–SNF-bound’
classification required a promoter to have a ChExMix call (‘1’) for
two or more of the following targets: Swi1, Med2, Snf6 and Swi3. The
‘RSTM-bound’ classification required a promoter to have both of the
following labels: STM-bound and RPD-bound. The RPD-bound classification
required a promoter to have a ChExMix call (‘1’) for two or more
of the following targets: Rpd3, Rxt1/Cti6, Rxt2, Rxt3, Nrm1 and Ume6.
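
The "two or more targets" rules above lend themselves to a compact sketch. The target lists are copied from the text; the label keys, the function name and the treatment of Rxt1/Cti6 as a single target name are assumptions made for illustration.

```python
# Target sets as listed in the text; 'Rxt1/Cti6' is treated as one target here.
TARGET_SETS = {
    "SAGA": {"Spt7", "Ada2", "Sgf11", "Sgf73"},
    "TUP": {"Tup1", "Cyc8", "Sok2", "Cin5"},
    "MED_SWISNF": {"Swi1", "Med2", "Snf6", "Swi3"},
    "RPD": {"Rpd3", "Rxt1/Cti6", "Rxt2", "Rxt3", "Nrm1", "Ume6"},
}

def stringent_labels(called_targets, min_hits=2):
    """Return the stringent labels for one promoter from its ChExMix calls."""
    called = set(called_targets)
    labels = {name for name, members in TARGET_SETS.items()
              if len(called & members) >= min_hits}
    if {"SAGA", "TUP", "MED_SWISNF"} <= labels:
        labels.add("STM")    # all three component labels are required
    if {"STM", "RPD"} <= labels:
        labels.add("RSTM")   # STM-bound and RPD-bound
    return labels
```

For example, a promoter with calls for Spt7 and Ada2 only would receive the SAGA label but not the STM label.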
Heatmaps and composite plots
Analysis was performed using the GUI ScriptManager version 012,
which is available for download at https://github.com/CEGRcode/scriptmanager.
ScriptManager provides a simple user-friendly interface
for ChIP–exo analysis, and includes simple installation instructions.
Heatmaps and composite plots were generated using the TagPileup
script. For ChIP–exo data, the following settings were used: Read_1
5′ end; separate strands, 0 bp tag shift, 1 bp bin size, sliding window
(moving average) 11. For MNase ChIP–seq data the following settings
were used: (paired-end) read midpoint; combined strands, 0 bp tag
shift, 1 bp bin size, sliding window 21. All data are oriented by TSS or
reference point strand.
For graphical display of composite plots, output data consisted of
frequency counts of Read_1 5′ ends for ChIP–exo or Read_1/Read_2
midpoint for MNase H3/H2B ChIP–seq dyads (BAM files) that were at
x-axis base-pair distances from sets of genomic reference points (BED
files). Underlying patterns and datapoints are available at yeastepigenome.org
and as Excel_Composite_Data_Processed.xlsx in 01_Composite_Files
at https://github.com/CEGRcode/2021-Rossi_Nature. An
additional moving average of 20 bp (30 bp for Pol II elongation and
Yrr1 composites) was performed for the purpose of improving visual
clarity. Without this, the high-bp resolution of ChIP–exo resulted in
peaks that were quite narrow in the 1-kb visualization window, such
that their fill patterns were less visually obvious. For gene-body targets
(Fig. 3c and Extended Data Fig. 5), smoothed strand-separated data
were shifted 50 bp in the 3′ direction before combining strands. The
rationale for this is that when we examined each strand separately, we
noticed that patterns on the transcribed strand showed some mirroring
on the nontranscribed strand. But this pattern was shifted in the
3′ direction relative to the transcribed strand (that is, more downstream
of the TSS). We surmise that this ‘double-vision’ effect was caused by
efficient crosslinking such that the 5′–3′ λ exonuclease is generally
stopped at the back end of the Pol II entourage on the transcribed strand
and stopped at the front-end of the entourage on the nontranscribed
strand. Shifting data on both strands by 50 bp in their respective 3′
directions partially corrected this double vision and reflects the middle
of the complex. In the absence of a strand-specific 3′ shift for gene-body
targets, patterns near the TSS reflect the back end of the Pol II entourage,
and patterns near the TES represent its front end. The data in Fig. 5b
and Extended Data Fig. 9b were not strand-shifted before removing
strand information.
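
The smoothing and strand-shifting steps described above can be illustrated with a small sketch. It assumes per-base-pair tag profiles held as Python lists that are already oriented by TSS, so that the 3′ direction is towards higher indices for one strand and towards lower indices for the other. That orientation convention, the function names and the edge handling are assumptions and do not describe the ScriptManager implementation.

```python
def moving_average(values, window):
    """Sliding-window mean; windows are clipped at the ends of the profile."""
    out = []
    for i in range(len(values)):
        start = i - window // 2
        lo, hi = max(0, start), min(len(values), start + window)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def shift(values, offset):
    """Shift a profile by offset bins (positive = higher index), zero-padded."""
    out = [0.0] * len(values)
    for i, v in enumerate(values):
        j = i + offset
        if 0 <= j < len(values):
            out[j] = v
    return out

def combine_shifted(plus_profile, minus_profile, shift_bp=50):
    """Shift each strand in its own 3' direction, then sum the strands."""
    return [a + b for a, b in zip(shift(plus_profile, shift_bp),
                                  shift(minus_profile, -shift_bp))]
```

With a toy shift of 2 bins, two peaks that sit 4 bins apart on opposite strands converge on the same position, which is the intended correction of the 'double-vision' effect.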
In composite plots, the y-axis is labelled ‘Occupancy (a.u.)’ (arbitrary
units), reflecting y-axis scaling that was adjusted to highlight the pat-
terning of the data. Within a single figure (including any Extended
Data figure counterparts), occupancy levels can be compared across
multiple panels only for the same dataset. Occupancy levels of different
datasets in the same or different panels cannot be compared directly.
Only the peak positions are comparable. For Fig. 2, the MEME motif
obtained and shown for Orc6 starts at position 2 of the ACS. For Cbf1, the
MEME motif starts at position 1 of CEN. Schematics reflect subjective
interpretations of peak locations, are nonlinear with respect to the diagrammed
DNA linearity, and do not reflect protein molecular weights.
Nascent RNA (CRAC) analysis
This analysis relates to Fig. 4d. CRAC datasets were downloaded from
GEO using accession code GSE97913. Raw sequencing data were
trimmed of adapters and aligned to the sacCer3 genome using the
recommended parameters in ref. 26. The 5′ ends of reads (corresponding
to the 3′ ends of sequenced nascent RNA) were counted in a window
from the TSS45 to 300 bp downstream (more 3′ on the ‘sense’ strand).
Only those reads that mapped to the sense strand relative to the gene
body were retained. Datasets were normalized such that the total tag
counts were equal. However, as all analysis was internal to each dataset,
this had no effect on final output.
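
As a rough illustration of the counting step, the sketch below tallies read 5′ ends that fall between a TSS and 300 bp downstream on the sense strand. The representation of reads as (position, strand) pairs, the inclusive window bounds and the function name are assumptions for the example only.

```python
def count_sense_ends(read_ends, tss, gene_strand, window=300):
    """Count read 5' ends on the gene's strand from the TSS to window bp downstream."""
    if gene_strand == "+":
        lo, hi = tss, tss + window
    else:
        # Downstream of a minus-strand gene means lower coordinates
        lo, hi = tss - window, tss
    return sum(1 for pos, strand in read_ends
               if strand == gene_strand and lo <= pos <= hi)

# Hypothetical read 5' ends (illustration only)
reads = [(1010, "+"), (1290, "+"), (1400, "+"), (1100, "-"), (900, "+")]
plus_gene_count = count_sense_ends(reads, tss=1000, gene_strand="+")
```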
This analysis relates to Extended Data Fig. 7b. TFIIB (Sua7) occu-
pancy data (Read_1 5′ end) were counted in a 100-bp window centred
on each promoter TSS. The list of all coding genes was filtered to be
only head-to-head such that each gene possessed a promoter region
overlapping/adjacent to another gene’s promoter (Supplementary
Data 1 (1AZ–1BG)). Promoter regions were then separated into three
groups: RP + STM, TFO and UNB. A separate Reb1-bound group was
also created. A Pearson correlation was calculated for CRAC signals for
one promoter side compared with the other side, within each dataset.
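
For reference, the Pearson correlation between the two sides of head-to-head promoters can be computed as in the following sketch. The inputs are assumed to be two equal-length lists of signal values, one value per promoter pair; the example values are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical signals for the two sides of four head-to-head promoters
side_a = [12.0, 40.0, 7.0, 55.0]
side_b = [10.0, 35.0, 9.0, 60.0]
r = pearson(side_a, side_b)
```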
Classification of ssTFs
We used GO classifications and the JASPAR motif database to identify
candidate ssTFs. Here we define a ssTF as a target that has at least four
ChExMix peaks in the total set of promoter regions, and an enriched
motif that is not more enriched with another ssTF. As of October 2019,
the JASPAR database reported 175 nonredundant ssTF motifs for S.
cerevisiae, which are based on experimental assays including in vitro
protein-binding microarrays with purified protein52. Of those, 70 corresponded
to ssTFs, in which we confirmed their site specificity in vivo
by ChIP–exo. Another two (Mot3 and Rgt1) were confirmed after this
study was completed. As ChIP–exo can define site specificity within a
few base pairs, this represents a remarkable degree of concordance
between in vivo and in vitro binding. Because of co-occurrence of
motifs in the genome, additional nearby motifs were also enriched
for these ssTFs. If multiple targets had a match with essentially the
same JASPAR motif, then we used GO descriptions and the literature
to identify those that were most likely to be direct binders (ssTFs). The
rest were labelled as cofactors. For example, Nrg1 and Nrg2 bind the
same motif, although JASPAR assigns this motif to Nrg1. We labelled
both as ssTFs. Another equivalent example involves Met4, Met31 and
Met32. Both Yox1 and Mcm1 have distinct motifs reported in JASPAR,
and both biochemically interact. However, ChIP–exo reported the
Mcm1 motif for both, with Mcm1 being much stronger. We therefore
classified Yox1 as a cofactor in YPD at 25 °C, instead of a ssTF. Eight
targets had GO annotations indicative of a ssTF and yielded robust
motifs by ChIP–exo with a robust ChIP–exo pattern, but five of them
had no motif in JASPAR (Nrg2, Hms2, Hmo1, War1 and Pip2), and three
had a different motif in JASPAR (Tea1, Rds2 and Sum1). These eight
were also labelled as ssTFs. This resulted in 78 ssTFs that ChIP–exo/
ChExMix detected as bound to a motif in YPD at 25 °C. The remaining
candidate targets that had JASPAR motifs were not labelled as ssTFs
for the following reasons. First, one (Yox1) appeared site specific but
was classified as a cofactor. Second, one is a GTF (TBP/Spt15). Third, 21
produced ChExMix binding locations but were deemed to be cofactors
in YPD at 25 °C (that is, they had bound locations, but were not bound
site-specifically). Their site specificity could be condition specific.
Fourth, 37 were not tested or not epitope-tagged (possibly because of
lethality or technical difficulty in tagging). The remaining targets did
not pass our detection thresholds. See Supplementary Data 2 (1, 9, 11)
for the complete list of candidate factors, JASPAR/cis-bp motifs, and
matches to ssTF-bound location in ref. 53.
Circuitry involving ssTFs
This analysis relates to Extended Data Fig. 10. We analysed the set of 78
genes encoding ssTFs (defined in YPD) along with the ssTFs that bound
their promoter regions site specifically (Supplementary Data 2 (1K)). A
circuit-like diagram was then constructed by connecting ssTFs to the
ssTF-encoding genes to which they bound. The total number of genes
(ssTF and all others) to which a ssTF bound was recorded, separated into
site-specifically bound versus those for which binding was recorded
but a cognate motif was not detected.
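
One way to represent such a circuit is as a directed graph from each ssTF to the ssTF-encoding genes it binds site specifically, together with per-ssTF counts of all bound genes. The sketch below assumes binding records of the form (ssTF, target gene, site-specific flag); the placeholder names TF_A, TF_B and GENE_X are hypothetical and do not correspond to real data.

```python
from collections import defaultdict

def build_circuit(bindings, sstf_genes):
    """Return (edges, counts) from (sstf, target_gene, site_specific) records."""
    edges = defaultdict(set)
    counts = defaultdict(lambda: {"site_specific": 0, "no_motif": 0})
    for tf, gene, site_specific in bindings:
        # Count every bound gene, split by whether a cognate motif was found
        counts[tf]["site_specific" if site_specific else "no_motif"] += 1
        # Only site-specific binding to ssTF-encoding genes becomes an edge
        if site_specific and gene in sstf_genes:
            edges[tf].add(gene)
    return dict(edges), dict(counts)

# Placeholder records (illustration only)
records = [("TF_A", "TF_B", True), ("TF_A", "GENE_X", True),
           ("TF_B", "TF_A", False), ("TF_B", "TF_B", True)]
edges, counts = build_circuit(records, sstf_genes={"TF_A", "TF_B"})
```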
The yeastepigenome.org website
Design. The backend of yeastepigenome.org is composed of two internal
modules: a Node.js REST application and a MongoDB database (version
4.2.8). MongoDB stores sample-specific meta information and URLs in
a JSON/BSON structure. The frontend of yeastepigenome.org is composed
of a React application, bootstrapped using the create-react-app
tool. A target page is subdivided into sections containing heatmaps,
composite plots and other analyses and visualizations. The frontend
retrieves sample information by making an application program interface
(API) request to the backend application. The frontend is designed
to support a cart system for downloading target datasets; it has UCSC
(https://genome.ucsc.edu) trackhub integrations and an integrated
target lookup on the SGD website, and it comes with a set of frequently
asked questions (FAQs) with detailed explanations of all of the plots
and visualizations.
Target locations. ChExMix called binding events using a stringent statistical
test of highly localized tags that was optimized to minimize false
positives41. As a consequence, ChExMix did not call bound locations
where tag distributions were diffuse and marginally above background
(for example, chromatin remodellers). To potentially capture events
with marginal significance, we divided each sector into five ‘subsectors’
and determined for each dataset whether there was enrichment over
the negative control (masterNoTag_20180928) across each subsector.
We defined the subsectors as follows: first, promoter region (−350 bp
to −75 bp relative to the TSS); second, TSS region (−75 bp to +150 bp
relative to the TSS); third, gene body 5′-end (+150 bp to +450 bp relative
to the TSS); fourth, gene body 3′-end (−400 bp to −100 bp relative to
the TES); and fifth, TES region (−100 bp to +100 bp relative to the TES).
The ratio of tag counts (test/control) in a subsector (or the selected
region) was calculated after the test and the negative control samples
were normalized using the NCIS method54. The following steps were
taken to calculate the significance of tag enrichment in a subsector.
First, test/control tag ratios for subsectors were calculated, then converted
to a log2 scale. Second, a Gaussian model, which represents the
background ratio of tag counts, was fit to the distribution of tag ratios.
Third, a significance value was calculated with respect to the Gaussian
model. Fourth, P values were adjusted with the Benjamini–Hochberg
correction42 (P = 0.05). The subsector analysis of each dataset is presented
as a separate tab at yeastepigenome.org. These subsectors were
not used for any other analyses herein.
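
The four steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the NCIS scaling factor is taken as a given input (scale), a pseudocount is added to avoid division by zero, the Gaussian is fit simply as the mean and standard deviation of all log2 ratios, and significance is a one-sided upper-tail probability. How the background Gaussian was actually fit is not specified in the text.

```python
from math import erfc, log2, sqrt
from statistics import mean, stdev

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def enriched_subsectors(test_counts, control_counts, scale=1.0,
                        pseudocount=1.0, alpha=0.05):
    """Flag subsectors whose log2(test/control) ratio exceeds background."""
    ratios = [log2((t + pseudocount) / (c * scale + pseudocount))
              for t, c in zip(test_counts, control_counts)]
    # Assumed background model: Gaussian with the sample mean and SD
    # (requires at least two subsectors with non-identical ratios)
    mu, sigma = mean(ratios), stdev(ratios)
    pvals = [0.5 * erfc((r - mu) / (sigma * sqrt(2))) for r in ratios]
    return [q <= alpha for q in benjamini_hochberg(pvals)]
```

In this sketch a single strongly enriched subsector among many background-like subsectors is flagged, whereas subsectors with ratios near the background mean are not.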
Motif discovery. The de novo motif discovery presented at yeastepigenome.org
was achieved using the MEME suite55 as follows: a ChExMix
peak .bed file was intersected with a curated BED file (Merged_sectors_for_MEME_924.bed)
consisting of all gene sectors (this reference
dataset is available in 02_References_and_Features_Files at https://github.com/CEGRcode/2021-Rossi_Nature),
with overlapping regions
merged into a single region. The intersected output BED file was sorted
on the basis of the score reported by ChExMix for each peak. After
sorting, the top 200 peak locations were bidirectionally expanded
to 60 bp and the underlying DNA sequence was extracted in FASTA
format. These sequences were used as the input for MEME55. Default
parameters were used, with the following exceptions: the minimum
and maximum motif widths (mememinw and mememaxw) were set
as 6 and 18, respectively.
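
The peak selection that precedes MEME can be illustrated as follows. The sketch assumes that a higher ChExMix score indicates a stronger peak, that each peak is represented by a single position, and that 'bidirectionally expanded to 60 bp' means 30 bp on each side of that position. The tuple layout and function name are hypothetical.

```python
def top_peak_windows(peaks, n_top=200, width=60):
    """Rank (chrom, position, score) peaks by score and return BED-like windows."""
    ranked = sorted(peaks, key=lambda p: p[2], reverse=True)[:n_top]
    half = width // 2
    # Clip at zero so that windows never start before the chromosome
    return [(chrom, max(0, pos - half), pos + half)
            for chrom, pos, _score in ranked]

# Hypothetical peaks (illustration only)
example_peaks = [("chrI", 100, 5.0), ("chrI", 500, 9.0), ("chrII", 20, 7.5)]
windows = top_peak_windows(example_peaks, n_top=2)
```

The resulting windows would then be used to extract FASTA sequences for motif discovery with the minimum and maximum motif widths given above.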
Data visualization. To generate heatmaps, the ‘TagPileUpFrequency’
tool was used with no tag shifts, single-base-pair bins, and tags set to
equal with combined strands. The tool takes as input a BED file
containing regions that have at least one overlapping ChExMix peak
and the target experiment BAM file. The tool outputs a matrix containing
tag frequencies, with each row representing a region of interest
and each column a single-base-pair bin. This output file was fed into a
heatmap script that uses Java TreeView’s algorithm and matplotlib to
generate the required heatmap. BED files were presorted on the basis
of the criteria indicated in each online graphical image before running
TagPileUpFrequency to generate the desired heatmaps. All heatmaps
were set to the same contrast threshold, which was calculated by taking the
tag pileup frequency matrix of BoundGenes and determining a 95th
percentile cutoff from this frequency distribution.
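
A percentile-based contrast threshold of this kind can be computed as in the sketch below. It assumes a nearest-rank percentile taken over all matrix entries, including zeros; the exact percentile definition used for the website is not stated in the text, and the function names are hypothetical.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling
    return ordered[rank - 1]

def contrast_threshold(matrix, pct=95):
    """Flatten a tag-frequency matrix (rows = regions) and take a percentile."""
    flat = [value for row in matrix for value in row]
    return percentile_nearest_rank(flat, pct)

# Toy matrix: 10 regions x 10 bins holding the values 1..100
toy_matrix = [[10 * r + c + 1 for c in range(10)] for r in range(10)]
threshold = contrast_threshold(toy_matrix)
```

Values above the threshold would be drawn at full colour intensity, so that a few extreme bins do not dominate the colour scale.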
To generate composites, the ‘TagPileUpFrequency’ tool was used with
no tag shifts, single-base-pair bins and tags set to equal with combined
strands. One of the inputs to this tool is a BED file containing regions
that have at least one overlapping ChExMix peak; the other is a BAM
file. The tool was run on the target and masterNoTag_20180928 control
BAM files individually, to generate two data files that were fed into
a composite generation script. The script uses matplotlib, a Python
plotting library, to generate a combined composite plot.
Reporting summary
Further information on research design is available in the Nature
Research Reporting Summary linked to this paper.
Data availability
See Supplementary Data 4 for a list of where to find available data and
code online. In essence, all raw sequencing data and peak files from this
study are available at the NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/)
under accession number GSE147927. Processed data are available
at https://doi.org/10.26208/rykf-6050. Additional analyses and data
are at yeastepigenome.org. We warn that single-replicate data files
are not likely to have meaningful data and should not be used without
further replication. All underlying data used to generate composite
plots, coordinate files and script parameters for Figs. 2–5, Extended
Data Figs. 4, 5, 7, 8b and Supplementary Fig. 1 can be downloaded from
https://github.com/CEGRcode/2021-Rossi_Nature. Final composite
plot values can be found in Supplementary Data 5.
Code availability
Code is available at https://github.com/CEGRcode/scriptmanager.
36. Vinayachandran, V. et al. Widespread and precise reprogramming of yeast
protein-genome interactions in response to heat shock. Genome Res. 28, 357–366 (2018).
37. Wal, M. & Pugh, B. F. Genome-wide mapping of nucleosome positions in yeast using
high-resolution MNase ChIP-Seq. Methods Enzymol. 513, 233–250 (2012).
38. Shao, D., Kellogg, G. D., Lai, W. K. M., Mahony, S. & Pugh, B. F. in Practice and Experience in
Advanced Research Computing 285–292 (Association for Computing Machinery,
Portland, OR, 2020).
39. Picard Toolkit. http://broadinstitute.github.io/picard/ (2019).
40. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–
2079 (2009).
41. Yamada, N., Lai, W. K. M., Farrell, N., Pugh, B. F. & Mahony, S. Characterizing protein–DNA
binding event subtypes in ChIP–exo data. Bioinformatics 35, 903–913 (2019).
42. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful
approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995).
43. de Hoon, M. J., Imoto, S., Nolan, J. & Miyano, S. Open source clustering software.
Bioinformatics 20, 1453–1454 (2004).
44. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat.
Biotechnol. 37, 38–44 (2019).
45. Xu, Z. et al. Bidirectional promoters generate pervasive transcription in yeast. Nature 457,
1033–1037 (2009).
46. Rhee, H. S. & Pugh, B. F. Genome-wide structure and organization of eukaryotic
pre-initiation complexes. Nature 483, 295–301 (2012).
47. van Dijk, E. L. et al. XUTs are a class of Xrn1-sensitive antisense regulatory non-coding
RNA in yeast. Nature 475, 114–117 (2011).
48. Albert, I., Wachi, S., Jiang, C. & Pugh, B. F. GeneTrack—a genomic data processing and
visualization framework. Bioinformatics 24, 1305–1306 (2008).
49. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
50. Jiang, C. & Pugh, B. F. A compiled and systematic reference map of nucleosome positions
across the Saccharomyces cerevisiae genome. Genome Biol. 10, R109 (2009).
51. Yen, K., Vinayachandran, V., Batta, K., Koerber, R. T. & Pugh, B. F. Genome-wide
nucleosome specificity and directionality of chromatin remodelers. Cell 149, 1461–1473
(2012).
52. Badis, G. et al. A library of yeast transcription factor motifs reveals a widespread function
for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell 32, 878–887 (2008).
53. MacIsaac, K. D. et al. An improved map of conserved regulatory sites for Saccharomyces
cerevisiae. BMC Bioinformatics 7, 113 (2006).
54. Liang, K. & Keleş, S. Normalization of ChIP-seq data with control. BMC Bioinformatics 13,
199 (2012).
55. Bailey, T. L. & Elkan, C. Fitting a mixture model by expectation maximization to discover
motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
Acknowledgements This work was supported by National Institutes of Health (NIH) grants
ES013768, GM059055 and HG004160 to B.F.P.; National Science Foundation (NSF) ABI
INNOVATION grant 1564466 to S.M.; grants from the Pennsylvania State University Institute for
Computational and Data Sciences to B.F.P. and W.K.M.L.; and computation from Advanced
CyberInfrastructure (ROAR) at the Pennsylvania State University. We thank D. Shao for her role
as lead software engineer for the PEGR platform and for support through the Penn State
Institute for Computational and Data Sciences (ICDS) Research Innovations with Scientists and
Engineers (RISE) team. We thank O. Lang for operating EpitopeID.
Author contributions M.J.R. designed and conducted experiments; performed library
sequencing and data analysis; designed and tested the quality-control pipelines and web
page; trained and managed lab personnel to produce data; supervised the project and
co-wrote the manuscript. P.K.K. designed, developed and implemented the quality-control
pipeline, analysis pipeline and website; organized and maintained data files; and provided
bioinformatic support. W.K.M.L. performed high-throughput data processing and analysis, and
provided bioinformatic support and scientific discussion. N.Y. provided bioinformatic support
and developed the initial quality-control pipeline. N.B. and C.M. performed ChIP–exo and
MNase ChIP–seq experiments and provided scientific discussion. G.K. provided bioinformatic
support. K.B. and N.P.F. conducted ChIP–exo experiments and performed library sequencing.
T.R.B., J.D.M., A.V.B., K.S.M., D.J.R. and E.S.P. conducted ChIP–exo experiments. G.D.K. provided
high-performance infrastructure architecture and development, and edge-computing
infrastructure design and support. S.M. provided bioinformatic guidance and support. B.F.P.
conceptualized the project and conclusions, designed experiments, analysed the data, wrote
the main text of the manuscript and co-wrote the remaining parts.
Competing interests B.F.P. has a financial interest in Peconic, LLC, which offers the ChIP–exo
technology (US Patent 20100323361A1) implemented herein as a commercial service and
could potentially benefit from the outcomes of this research. The remaining authors declare
no competing interests.
Additional information
Supplementary information The online version contains supplementary material available at
https://doi.org/10.1038/s41586-021-03314-8.
Correspondence and requests for materials should be addressed to B.F.P.
Peer review information Nature thanks Vishwanath Iyer and the other, anonymous, reviewer(s)
for their contribution to the peer review of this work. Peer reviewer reports are available.
Reprints and permissions information is available at http://www.nature.com/reprints.