Nature | Vol 592 | 8 April 2021 | 309
Article
A high-resolution protein architecture of the budding yeast genome
Matthew J. Rossi¹, Prashant K. Kuntala¹, William K. M. Lai¹,², Naomi Yamada¹, Nitika Badjatia¹, Chitvan Mittal¹,², Guray Kuzu¹, Kylie Bocklund¹, Nina P. Farrell¹, Thomas R. Blanda¹, Joshua D. Mairose¹, Ann V. Basting¹, Katelyn S. Mistretta¹, David J. Rocco¹, Emily S. Perkinson¹, Gretta D. Kellogg¹,², Shaun Mahony¹ & B. Franklin Pugh¹,² ✉
The genome-wide architecture of chromatin-associated proteins that maintains chromosome integrity and gene regulation is not well defined. Here we use chromatin immunoprecipitation, exonuclease digestion and DNA sequencing (ChIP–exo/seq)¹,² to define this architecture in Saccharomyces cerevisiae. We identify 21 meta-assemblages consisting of roughly 400 different proteins that are related to DNA replication, centromeres, subtelomeres, transposons and transcription by RNA polymerase (Pol) I, II and III. Replication proteins engulf a nucleosome, centromeres lack a nucleosome, and repressive proteins encompass three nucleosomes at subtelomeric X-elements. We find that most promoters associated with Pol II evolved to lack a regulatory region, having only a core promoter. These constitutive promoters comprise a short nucleosome-free region (NFR) adjacent to a +1 nucleosome, which together bind the transcription-initiation factor TFIID to form a preinitiation complex. Positioned insulators protect core promoters from upstream events. A small fraction of promoters evolved an architecture for inducibility, whereby sequence-specific transcription factors (ssTFs) create a nucleosome-depleted region (NDR) that is distinct from an NFR. We describe structural interactions among ssTFs, their cognate cofactors and the genome. These interactions include the nucleosomal and transcriptional regulators RPD3-L, SAGA, NuA4, Tup1, Mediator and SWI–SNF. Surprisingly, we do not detect interactions between ssTFs and TFIID, suggesting that such interactions do not stably occur. Our model for gene induction involves ssTFs, cofactors and general factors such as TBP and TFIIB, but not TFIID. By contrast, constitutive transcription involves TFIID but not ssTFs engaged with their cofactors. From this, we define a highly integrated network of gene regulation by ssTFs.
Genomes regulate genes so as to achieve homeostasis, the maintenance of cellular components in proper balance. They also adapt, making adjustments in rapidly changing environments, so as to regain homeostasis³. Achieving these tasks has necessitated the evolution of constitutive and inducible gene control. Whether or not these controls are fundamentally different at the molecular level is unknown. A classical view posits a single basic regulatory paradigm for genes (Extended Data Fig. 1a)⁴: environmental signals toggle ‘on’ ssTFs that recruit cofactors and assemble a preinitiation complex (PIC) consisting of Pol II and general transcription factors (GTFs) such as TBP, TFIID and TFIIB at core promoter transcription start sites (TSSs)⁵. However, the extent to which constitutive gene expression involves ssTFs is unclear, as ssTF-binding sites and their cofactors remain unidentified at most promoters. ssTFs, cofactors, chromatin and PICs play into any distinction between inducible and constitutive mechanisms, but their interrelationships remain enigmatic.
Genome-wide protein meta-assemblages
Here we used ChIP–exo (Extended Data Fig. 1b)¹,², an ultra-high-resolution version of ChIP–seq, to map genome-wide binding. We selected target proteins on the basis of Gene Ontology (GO) annotations related to chromosomal function (Extended Data Fig. 1c and Supplementary Data 1 (1BY); characters in parentheses refer to the worksheet number and column letter). In total, we collected 1,229 datasets on 791 targets, of which 400 targets had reproducibly significant data (Supplementary Data 2 (1A)). The interaction pattern of all 1,229 datasets around individual and broad classes of genomic features (Fig. 1a) can be visualized and downloaded at yeastepigenome.org (an example is given in Extended Data Fig. 2). We also developed and provide ScriptManager, a platform for customized analysis of these data (see Methods).
https://doi.org/10.1038/s41586-021-03314-8
Received: 8 May 2020
Accepted: 29 January 2021
Published online: 10 March 2021
¹Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA. ²Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA. ✉e-mail: fp265@cornell.edu
Binarized colocation counts among targets were hierarchically clustered (Fig. 1b). The three largest clusters (yellow) correspond to three major aspects of gene expression: first, promoter regulation; second, PIC assembly; and third, transcription elongation. Thus, the vast majority of chromatin-associated proteins are dedicated to gene regulation. We used uniform manifold approximation and projection (UMAP) to represent each dataset as a single point in a two-dimensional projection (Fig. 1c and Extended Data Fig. 3). Points in close proximity reflect a population-based composite colocalization of targets (‘meta-assemblages’). We performed K-means clustering on the projection and derived 21 meta-assemblages that correspond largely to known interacting biochemical complexes, or related gene ontologies (Fig. 1c, outer pie, and Supplementary Data 2 (1F, 1H, 2G–2I)). This probably represents a comprehensive predominant protein architecture of the yeast genome (‘epigenome’) in rich media (see Supplementary Data 2 (1–8) for a deeper analysis).
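The clustering of binarized colocalization can be illustrated with a minimal sketch. It assumes that each target is reduced to the set of features at which binding was called, uses Jaccard similarity as the colocalization measure and average linkage for merging; these choices, the function names and the toy data are illustrative assumptions and are not taken from the study's Methods.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of bound feature IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_linkage(bound, n_clusters):
    """Greedy agglomerative clustering of targets by mean pairwise Jaccard similarity."""
    clusters = [[name] for name in sorted(bound)]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [jaccard(bound[x], bound[y])
                    for x in clusters[i] for y in clusters[j]]
            score = sum(sims) / len(sims)
            if best is None or score > best[0]:
                best = (score, i, j)
        _, i, j = best
        # Merge the most similar pair of clusters (j > i, so deleting j is safe).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: target -> set of promoter IDs where binding was called.
bound = {
    "Med2": {1, 2, 3, 4},
    "Sgf73": {1, 2, 3, 5},
    "Sua7": {6, 7, 8, 9},
    "Taf2": {6, 7, 8, 10},
    "Spt5": {11, 12, 13},
    "Spt6": {11, 12, 14},
}
groups = average_linkage(bound, n_clusters=3)
print(groups)
```

In this toy case the targets fall into three groups that loosely mirror promoter regulation, PIC assembly and elongation. A real analysis would more likely rely on an established clustering library than on this quadratic loop.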
Overall, the organization defined by UMAP represents a remarkable degree of concordance and mutual validation of biochemically purified and functionally annotated complexes with their architectural organization across a genome, particularly from an unsupervised approach. For example, the promoter cofactors Mediator, SWI–SNF, SAGA, NuA4 and their cognate ssTFs each formed tight meta-assemblages that were located near each other but far from gene-body elongation factors (Fig. 1c). Proteins of replication origins, subtelomeres and centromeres also formed distinct tight meta-assemblages that were far from each other and from gene meta-assemblages. This provided strong validation of the ChIP–exo/seq approach and epitope tagging. Notably, we can now link most ssTFs with their cognate cofactors and promoter architecture.
Protein architecture at genomic features
DNA replication initiates at 253 autonomously replicating sequence consensus sequence (ACS) elements that are constitutively bound by origin recognition complexes (ORCs)⁶. The ‘ORC’ meta-assemblage contained six measured targets (Fig. 2a and Extended Data Fig. 4), which gave highly structured ChIP–exo patterns based on ORCs and the DNA helicase MCM, spread over roughly 300 base pairs. ORCs at nucleosome-free ACSs engulfed a neighbouring nucleosome. The binding of Mcm5 was offset from ORCs by 50–100 bp, consistent with a recently published model based on cryo-electron microscopy⁷.
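Offsets of this kind are read from composite plots in which tag 5′ ends are aligned to a strand-oriented reference point. The sketch below shows one way such a composite could be tallied; the function name, tuple layouts and toy coordinates are hypothetical, and this is not the ScriptManager implementation.

```python
from collections import Counter


def composite(features, tags, window=500):
    """Tally tag 5' ends by position relative to strand-oriented features.

    features: list of (chrom, position, strand) reference points.
    tags: list of (chrom, five_prime_position, strand) sequencing tags.
    Returns (same_strand, opposite_strand) Counters keyed by relative position.
    """
    same, opposite = Counter(), Counter()
    for f_chrom, f_pos, f_strand in features:
        for t_chrom, t_pos, t_strand in tags:
            if t_chrom != f_chrom:
                continue
            # Flip coordinates for minus-strand features so that positive
            # offsets always point downstream of the reference point.
            rel = t_pos - f_pos if f_strand == "+" else f_pos - t_pos
            if abs(rel) > window:
                continue
            (same if t_strand == f_strand else opposite)[rel] += 1
    return same, opposite


# Hypothetical toy data: two reference points on opposite strands.
features = [("chrI", 1000, "+"), ("chrII", 5000, "-")]
tags = [
    ("chrI", 990, "+"), ("chrI", 1060, "-"), ("chrI", 3000, "+"),
    ("chrII", 5010, "-"), ("chrII", 4940, "+"),
]
same, opposite = composite(features, tags)
# Distance between the most frequent opposite-strand and same-strand stop sites.
peak_offset = max(opposite, key=opposite.get) - max(same, key=same.get)
print(peak_offset)
```

The nested loop is quadratic and serves only to make the coordinate flip explicit; genome-scale data would need an indexed lookup.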
Subtelomeric X-elements represent a heterochromatic environment that is repressed by silent information regulators (SIRs), functionally supporting telomeres⁸. Indeed, we found that SIR proteins formed a structurally robust meta-assemblage on a single nucleosome, centred on roughly 300-bp X-core elements (XCEs), along with ORC/MCMs and insulator ssTFs at two flanking nucleosomes (Fig. 2b). KU (Yku70) and RIF (Rif1) complexes, along with ssTFs Fkh1, Abf1 and Reb1, were present at the vast majority of mappable X-elements. However, a Sko1-mediated Tup1 repression complex was present at only half, perhaps reflecting variable repression capabilities of subtelomeric regions. Thus, XCEs appear to create a well structured triple nucleosome ensemble comprising major repressor proteins.
The centromeric meta-assemblage (‘CEN’) contained 12 targets at 16 centromeres (Fig. 2c), which are responsible for proper chromosomal segregation during cell division. They included site-specifically bound Cbf1 at the centromere centre (CDE I) and kinetochore components offset by roughly 100 bp towards the AT-rich CDE III elements⁹. These factors generated strong and well positioned crosslinks covering roughly 170 bp of DNA, suggesting that they are positionally fixed to CDEs. Condensin and cohesin play a part in chromosomal condensation and segregation. They were absent from the centromere and instead overlapped the surrounding nucleosomes, suggesting that they interact with nucleosomes. In contrast with lower-resolution maps¹⁰,¹¹, histones were not detected at centromeres, despite robust detection of histone-like Cse4 and kinetochore components there, and robust detection of histones (H2A, H2A.Z, H2B, H3 and H4) in the immediate flanking regions¹². Thus, yeast centromeres appear to lack the histone components of a nucleosome in vivo. The resident kinetochore complex protects a nucleosome-sized region of DNA from nucleases, which was a basis for a nucleosome originally being called there¹³. Nonetheless, Cse4-containing nucleosomes have been defined biochemically and structurally in vitro¹⁰,¹⁴, and so the question remains open.
The Pol I complex produces ribosomal RNA (rRNA) from a single highly repeated gene. It contained TBP anchored near the rRNA TSS (Fig. 3a). It also had major crosslinking interactions with the well positioned Pol-I-specific upstream activating factor (UAF, Uaf30) complex, which covered roughly 70 bp between −155 bp and −60 bp from the TSS. UAF also had reciprocal crosslinks with TBP at the core promoter. Thus, the Pol I initiation complex has a fixed bipartite engagement that covers around 200 bp of rRNA promoter DNA, with an intervening 100 bp or so. The broad extension of Pol I downstream into the rRNA gene body with less occupancy at promoters indicates that Pol I dissociates rapidly from its PIC into an elongating state.
[Fig. 1 panel labels. a, All features in this study: transcribed (7,741), comprising Pol I (2), Pol II (7,467) and Pol III (272); non-transcribed (295), including replication (253), X-element (25) and centromere (16); Pol II coding (6,121) and non-coding (1,346; CUTs, 447; SUTs, 365; XUTs, 440; NCR, 94); coding classes RPG (137), STM (984), TFO (1,783), UNB (2,474), LTR (357), tRNA-proximal (135) and no PIC (251); other (3,076); not analysed (11,112). b, Colocalization of 371 targets (high to low), with clusters labelled promoter regulation, PIC, elongation and chromatin regulation, Pol II, Pol III and replication. c, UMAP axis 1 versus axis 2, with labelled meta-assemblages.]
Fig. 1 | Genome-wide meta-assemblages. a, Classes of genomic features, with N memberships analysed (Supplementary Data 1 (1D)). Pol II classes are from this study (see Methods), along with relative PIC occupancy levels (green dots). CUTs, cryptic unstable transcripts; SUTs, stable unannotated transcripts; XUTs, Xrn1-sensitive unstable transcripts; NCR, noncoding RNA. b, Hierarchical clustering showing the genome-wide colocalization of 371 targets (Supplementary Data 3). c, UMAP projection showing the colocations of 371 targets (coloured on the basis of K-means; Supplementary Data 2 (1C, 1D)). AU, arbitrary units.
Pol III of the ‘POL3’ meta-assemblage transcribes 272 highly similar genes encoding transfer RNAs (tRNAs). It contained 18 targets that could be separated into TFIIIB/C and Pol III meta-assemblages (Fig. 3b). Their organization matched locations modelled from atomic structures of the TFIIIB/Pol III promoter complex¹⁵, but with the TBP component of TFIIIB crosslinking approximately 30 bp upstream of the TSS. The ChIP–exo pattern further demonstrated that TFIIIC and Pol III make crosslinks not only at the internal A and B boxes, but also at coincident locations roughly 40 bp upstream of TBP. Owing to DNA bending by TBP, this region is in close proximity to TFIIIB/C and Pol III within gene bodies. Equivalent positions of crosslinking points were observed across all TFIIIB/C/Pol III subunits. This suggests that a single predominant structure envelops entire Pol III genes and approximately 70 bp upstream, as it makes a short (roughly 80 bp) transcript.
[Fig. 2 panel labels. a, DNA replication origins (ACS), N = 253: Orc6 and Mcm5, plotted against distance from ACS start (bp). b, Subtelomeric X-elements (XCE), N = 25: Reb1, ORC, Fkh1, Tup1, Cyc8, Sko1, Abf1, Sir2,3,4, Yku70 and Rif1,2, plotted against distance from XCE start (bp). c, Centromeres (CEN): histone H4, kinetochore (Mcm16, Cse4, Nkp2), Cbf1 and Smc3 (cohesin), plotted against distance from CEN start (bp). All panels: occupancy (AU) on same and opposite strands from −500 to 500 bp, with nucleosome dyads.]
Fig. 2 | Architecture at nontranscribed features. a–c, Averaged distribution of strand-separated 5′ ends of ChIP–exo sequencing tags (exonuclease stop sites; see Extended Data Fig. 1b), showing representative targets around strand-oriented annotated features. The diagrams at the top of each panel are cartoon representations of DNA, nucleosomes and protein factors that bind to DNA replication origins (a), subtelomeric X-elements (b) or centromeres (c). The relevant start sequences (coloured As, Ts, Cs and Gs) are also shown in a, c. Underneath are composite data showing the distribution of the protein factors. Same-strand data are oriented with 5′ to 3′ to be read from left to right. Opposite-strand data are inverted (right to left is 5′ to 3′). The y-axes show linear arbitrary units (AU), which are not comparable in magnitude across different datasets. Nucleosome dyads were derived from MNase-digested chromatin that was assayed by H3/H2B ChIP–seq (strands averaged).
[Fig. 3 panel labels. a, RNA polymerase I: Pol I (Rpa135), TBP (Spt15) and UAF (Uaf30) at the rRNA gene, with the UCE marked, plotted against distance from Pol I TSS (bp). b, RNA polymerase III: TFIIIC-τA (Tfc4), TFIIIC-τB (Tfc6), TFIIIB, TBP (Spt15) and Pol III (Rpo31) at tRNA genes, with A and B boxes marked, plotted against distance from Pol III TSS (bp). c, RNA polymerase II: Rpb3, Sua7, Pol II Ser5P, Pol II Ser2P, Cbc2, Spt5, Spt6, Spt16, Spn1, Paf1, Elf1, Set1, Set2, Set3, Smd1, Nrd1 and Pcf11, plotted against distance from Pol II TSS and TES (±500 bp). d, Transposon Ty3 (σ) with LTRs and adjacent tRNA: Ste12, Dig1, Kar4 and TFIIIC-τB (Tfc6), plotted against distance from Ty3 start (bp). All panels: occupancy (AU).]
Fig. 3 | Architecture at transcribed features. a–d, Experiments were carried out as in Fig. 2, but for transcribed features. In a, UCE is an upstream control element at Pol I promoters. In b, A and B are box elements at Pol III promoters. In c, the protein architecture for RP genes is shown (not strand separated); Ser2P and Ser5P are phosphorylated serines 2 and 5 of heptad repeats; grey arrows show nucleosome dyads.
There are around 7,500 distinct Pol II transcription units (defined by a TSS/PIC), of which approximately 80% code for proteins. Targets that are associated with transcription elongation generally matched Pol II occupancy across gene bodies, but unlike Pol II (Rpb3) were not present at promoters (Fig. 3c and Extended Data Fig. 5). Instead, occupancy within genes increased in the 5′ region and decreased in the 3′ region, with many having distinct ‘entry/exit’ points, consistent with other studies¹⁶. Whether these are true cotranscriptional entry/exit points or are simply crosslinkable retention sites is not clear. Termination factors such as Pcf11 were found primarily at sites of termination, along with nearby cohesin. There was little evidence of the binding of elongation/termination-associated factors being restricted to specific sets of genes, except that Nrd1 of the early termination pathway was enriched at noncoding transcription (ncRNA) units (Extended Data Fig. 5a, lower left). In addition, RNA splicing factors (such as Smd1) were largely limited to the 3′ half of intronic genes encoding ribosomal proteins (RPs; Extended Data Fig. 5b, upper right). The data are consistent with one predominant elongation entourage at most Pol II genes that changes in composition at fixed distances from the TSS or transcription end site (TES) (rather than at a percentage of gene length).
Consistent with some other reports¹⁷,¹⁸, although not all¹⁹–²¹, we found no evidence for Mediator being stably associated with the Pol II core initiation or elongation entourage, despite its detection in upstream promoter regulatory regions (for example, Med2 in Extended Data Fig. 5b). Equivocal binding in gene bodies may be related to approximately 100 genes that produced relatively high and variable background in ChIP assays (see Methods).
The long terminal repeats (LTRs) of certain classes of Ty transposons are transcribed by Pol II as part of retroviral-like transposition²². However, most lacked a PIC, except a subset of full-length Ty1,2 (δ) (Extended Data Fig. 6). At Ty3 (σ), the Pol II pheromone factors Ste12, Dig1 and Kar4 were assembled and had nearly identical points of crosslinking (Fig. 3d). However, instead of Pol II, we detected the Pol III machinery associated with adjacent divergent tRNA genes. This suggests that Pol II ssTFs may work with Pol III at some tRNA genes to integrate mating and Ty3 transposition²².
Inducible versus constitutive promoters
In classifying Pol II promoters, we opted against an unsupervised approach, as it treats binding events equivalently, without considering that certain targets have a more central role in defining specific regulatory architectures. Four fundamentally distinct architectural themes emerged (see Methods, Fig. 4a and Supplementary Data 1 (1D)): first, an RP theme, as seen for 137 RP promoters with unique architectures (examined separately²³); second, an STM theme, as for 984 promoters that had properties associated with inducibility, and were characteristically bound by ssTFs and major cofactor meta-assemblages SAGA, TUP and/or Mediator/SWI–SNF; third, a TFO theme, from 1,783 promoters with a ssTF organization that lacked STM cofactors (but typically had the insulator ssTFs Abf1 or Reb1); and fourth, a UNB theme, as seen with 2,474 promoters that were unbound by anything except a PIC. Notably, as detailed in the Supplementary Information, the consensus architecture at TFO/UNB promoters indicates that two-thirds of all promoters evolved to lack regulation by ssTFs and their cofactors under any condition (not just in rich media). This is an architecture suitable for constitutively low gene expression. RP and STM represent the architecture of inducible promoters that have upstream activator sequences (UASs). The roughly 1,300 ncRNA promoters were similarly classified (Supplementary Data 1 (E)), indicating that they are governed by the same regulatory mechanisms.
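The four themes can be read as a small decision rule. The sketch below is a simplified paraphrase of the descriptions in the text, not the procedure in the Methods: the function name, argument names and the order of the checks are assumptions, and a promoter with an STM cofactor is assigned to STM even when no ssTF is listed.

```python
# Cofactor meta-assemblages named in the text as characteristic of STM promoters.
STM_COFACTORS = {"SAGA", "TUP", "Mediator", "SWI-SNF"}


def classify_promoter(is_rp_gene, bound_sstfs, bound_cofactors):
    """Assign a promoter to one of the four architectural themes.

    is_rp_gene: True for ribosomal protein gene promoters.
    bound_sstfs: names of ssTFs detected at the promoter.
    bound_cofactors: names of cofactor meta-assemblages detected there.
    """
    if is_rp_gene:
        return "RP"
    if set(bound_cofactors) & STM_COFACTORS:
        return "STM"  # bound by major cofactor meta-assemblages
    if bound_sstfs:
        return "TFO"  # ssTF organization without STM cofactors
    return "UNB"      # nothing detected except a PIC


# Hypothetical example promoters.
examples = {
    "rp_promoter": classify_promoter(True, ["Rap1"], ["SAGA"]),
    "inducible": classify_promoter(False, ["Yrr1"], ["SAGA", "Mediator"]),
    "insulated": classify_promoter(False, ["Reb1"], []),
    "bare": classify_promoter(False, [], []),
}
print(examples)
```

Checking the RP case first reflects the text's treatment of RP promoters as a separate theme, examined on their own.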
Assembly of Pol II PICs occurs in the context of chromatin, where the TSS resides on the inside edge of a downstream +1 nucleosome (Fig. 4b). Most promoters have a constitutive NFR. The seemingly interchangeable term ‘NDR’, applying to nucleosome depletion mediated by ssTFs, is problematic. As ssTFs are absent from UNB promoters, they should lack ssTF-regulated nucleosome depletion and an NDR. We therefore considered whether NFRs and NDRs are distinct.
[Fig. 4 panel labels. a, Pol II promoters (rows, grouped as RP, STM, TFO and UNB) against 371 targets (columns), with PIC (Sua7), Abf1, Reb1 and H2A.Z marked. b, Diagram of −2, −1 and +1 nucleosomes, NDR, NFR, insulator ssTFs, TSS and gene, with composites for STM (N = 984), TFO (N = 1,783) and UNB (N = 2,474) showing PIC TFIIB (Sua7), H2A.Z, ssTFs and cofactors, and nucleosome dyads. c, In vivo and in vitro reconstitution (+ RSC, + INO80) with poly(T:A) tracts. d, Correlation of divergent nascent transcription for RP, STM, TFO, UNB and Reb1-bound promoters in parent (no AA), Rap1 AA and Reb1 AA strains. e, Reb1, PIC (Sua7) and Pcf11 at tandem genes for TFO (292) and UNB (1,097). Panels b, c and e: occupancy (AU) against distance from +1 nucleosome dyad (bp), −700 to 300.]
Fig. 4 | Classification of inducible, insulated and constitutive Pol II promoters. a, Individual promoters (rows) can be grouped into four architectural themes (coloured boxes) and sorted by PIC occupancy level. Targets are listed at the top of columns, with arrows denoting Abf1, Reb1 and H2A.Z. Black lines denote target binding (Supplementary Data 2 (3)). b, Top, diagram, and bottom, example composite data for the STM, TFO and UNB classes. ‘ssTFs and cofactors’ represents a combined set of target locations determined by ChExMix for those targets labelled as such in Supplementary Data 2 (1K), including ssTFs, SAGA, TUP and Mediator. c, Composite data show that STM promoters have NDRs, whereas TFO and UNB promoters have NFRs. In vitro nucleosomes assembled with purified genomic DNA and histones (black filled areas) had ATP plus either purified RSC (yellow) or INO80 (purple) added (data from ref. 24). Poly(T:A) regions are sense-strand tracts (larger than 5 bp) of As (red) or Ts (green). d, Insulator ssTFs uncouple divergent transcription. Data on nascent transcription (CRAC data²⁶) for control strains or strains depleted of Rap1 or Reb1 by the anchor-away (AA) technique were collected for N divergent gene pairs sharing the same promoter region, then correlated between the gene pairs. To the right are diagrams of divergent gene pairs, with the differential size of each green arrow pair reflecting the extent of insulation. e, The termination factor Pcf11 accumulates at insulator ssTFs. Shown is the architecture at promoters adjacent to an upstream termination region (tandem genes) and having (TFO) or lacking (UNB) an insulator ssTF.
NFRs at TFO/UNB promoters were short (less than 150 bp) and bisected by a pair of oppositely stranded, nucleosome-disfavouring poly(dA:dT) tracts (Fig. 4c, red/green). NFRs have been biochemically reconstituted on genomic DNA with purified histones and chromatin remodellers²⁴. When this reconstitution was applied to our promoter classes, we found that histones alone partially reconstituted NFRs in vitro at TFO/UNB promoters, but less effectively at STM promoters (Fig. 4c, compare black-filled dips with in vivo plots, and Extended Data Fig. 7a). TFO/UNB NFRs were widened by the RSC remodeller (Fig. 4c, compare the yellow-filled wider dip with the black-filled areas) and had their −1/+1 nucleosomes positioned by the chromatin-remodelling ATPase INO80 (purple fill)²⁴. STM promoter nucleosomes, by contrast, had an intrinsic capacity to form nucleosomes and were less responsive to RSC and INO80 (Fig. 4c, vertical arrow around −400). They bound to ssTFs and cofactors in vivo (Fig. 4b, magenta), and were nucleosome-depleted at the −1/−2 nucleosome positions. These same regions have been interpreted to have MNase-sensitive ‘fragile’ nucleosomes in vivo (Supplementary Data 1 (BX); 69% were ‘fragile’ at STMs versus 19% at UNBs). However, our data indicate that sensitivity to MNase might reflect the binding of ssTFs/cofactors rather than unstable nucleosomes²⁵. Thus, inducible promoters have NDRs, while constitutive promoters have NFRs.
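The length criterion for NFRs (less than 150 bp) can be made concrete with a small calculation. The sketch below assumes that the nucleosome-free length is approximated as the distance between the −1 and +1 dyads minus one 147-bp nucleosome footprint; this approximation, the function names and the coordinates are illustrative assumptions, and width alone does not separate an NFR from an ssTF-dependent NDR.

```python
NUCLEOSOME_CORE_BP = 147  # DNA wrapped by one nucleosome core particle


def nfr_width(minus1_dyad, plus1_dyad):
    """Approximate nucleosome-free length (bp) between the -1 and +1 dyads."""
    # Half a nucleosome footprint is removed on each side of the gap.
    return max(0, (plus1_dyad - minus1_dyad) - NUCLEOSOME_CORE_BP)


def is_short_nfr(minus1_dyad, plus1_dyad, cutoff=150):
    """True when the free region is shorter than the cutoff (bp)."""
    return nfr_width(minus1_dyad, plus1_dyad) < cutoff


# Hypothetical dyad coordinates relative to the +1 dyad (bp).
narrow = nfr_width(-270, 0)
wide = nfr_width(-450, 0)
print(narrow, is_short_nfr(-270, 0))
print(wide, is_short_nfr(-450, 0))
```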
In the compact yeast genome, promoters and terminators often share the same NFRs or NDRs at adjacent genes, with the potential to mutually influence their expression unless insulated²⁶. In support of this, PIC occupancy at divergent promoter pairs was less correlated at TFO promoters, which have insulator ssTFs, compared with UNB promoters (Extended Data Fig. 7b). The same was observed for divergent nascent transcription (Fig. 4d). RP/STM divergent promoters also showed low nascent transcription correlation. Anchor-away removal of Rap1, which binds RP/STM promoters, resulted in a higher correlation (Fig. 4d, red). This was not observed with removal of Reb1, which mainly binds TFO promoters. Removal of Reb1, but not Rap1, resulted in higher correlations at TFO and Reb1-bound promoters (Fig. 4d, cyan). As a negative control, removal of Rap1 had little effect at Reb1-bound promoters. We suggest that insulator ssTFs such as Rap1 and Reb1 uncouple divergent transcription at promoters to which they bind. Similarly, where a gene terminator is shared with a promoter (tandem genes), the termination factor Pcf11 overlapped with the adjacent PIC, unless an insulator ssTF intervened (Fig. 4e and Extended Data Fig. 7c). This finding supports prior conclusions on insulators that were based on nascent transcription²⁶.
Taken together, these results suggest that the assembly of PICs is mechanistically tied to PIC assembly at adjacent upstream divergent genes, and to transcription termination at tandem genes, unless these events are insulated. In such architectural arrangements, some insulator ssTFs may not act as direct effectors of transcription by recruiting cofactors, but instead insulate and direct −1/+1 nucleosome positioning²⁴. Others may recruit cofactors in a condition-specific way.
ssTF–cofactor interactions and circuits
A comprehensive set of 78 ssTFs were detectably bound to promoters in rich media (Supplementary Data 2 (1K)). A search of the JASPAR database of interactions between ssTFs and sequence motifs independently confirmed proper motif specificity for 90% of the ssTFs (Supplementary Data 2 (1M)). Some ssTFs had robust ChIP–exo patterning around their cognate motif (Extended Data Fig. 8a; for example, Cup9 and Cin5), which reflects their site-specific structural interactions with DNA on a genomic scale. Remarkably, most ssTFs had relatively diffuse ChIP–exo patterning flanking their motif (Extended Data Fig. 8a; for example, Nrg1, Bas1 and Yrr1). As exemplified by Yrr1 in Fig. 5a (magenta versus cyan areas), the diffuse patterning of ssTFs was particularly pronounced at sites with multiple STM cofactors present (for example, SAGA, TUP, Mediator, SWI–SNF and RPD3-L), and less diffuse at other sites that bind the same ssTFs but lack STM cofactors. STM cofactors may impart a distinct local environment that results in more dispersed crosslinking. The same diffuse patterning occurred with STM cofactors which were anchored at ssTF sites (Fig. 5a and Extended Data Fig. 8b). As they tend to co-occupy the same set of promoters (Extended Data Fig. 9a, Supplementary Data 2 (1K)), ssTFs might coexist with multiple positive/negative cofactors of chromatin accessibility and Pol II recruitment. This diffuse patterning is consistent with the notion of condensates that are anchored by ssTFs²⁷.
In contrast to STM cofactors, we detected essentially no ChIP–exo
patterning of TFIID, TBPs or any GTFs at a consolidated set of ssTF
sites, despite identifying these GTFs in the periphery where TSSs
reside (Fig. 5b and Extended Data Fig. 9b). Thus, although using the
same paradigm for detecting ssTF–cofactor interactions, our results
in yeast do not support the long-standing model that ssTFs stably
engage TFIID at promoters. PIC assembly is driven by TFIID at nearly
all genes28, although at inducible genes it is augmented through SAGA
independently of TFIID28–30. Although the gene specificity of SAGA
has been enigmatic and controversial31, the ChIP–exo assay detects
SAGA at only a subset of genes. The discrepancy may reside in the low
specificity of other assays32.
We addressed the specificity of SAGA further. As a direct readout
of TFIID-independent PIC assembly, we expected high levels of GTFs
relative to TFIID where SAGA is bound. However, we found that most
SAGA-bound promoters (RP/STM/‘SAGA-bound’) lacked high ratios of
GTFs to TFIID, although a smaller fraction did have high ratios (equivalent
modes and rightward tail in Fig. 5c and Extended Data Fig. 9c). Thus,
SAGA binding is not always concomitant with TFIID-independent PIC
assembly, and may reflect a poised state. Instead, promoters having
multiple STM cofactors displayed high GTF/TFIID ratios (‘STM-bound’
and ‘RSTM-bound’ in Fig. 5c). Thus, maximal TFIID-independent PIC
assembly is achieved under conditions in which there is maximal
engagement of a wide variety of negative and positive ssTFs and cofactors
with NDRs, including but not limited to SAGA.
Promoters bound by ssTFs included both cognate (motif-based) and
noncognate interactions (Extended Data Fig. 10). In assessing cognate
interactions, we found that most ssTFs bound to the promoters of
[Fig. 5 panels. a, Occupancy (AU) versus distance from Yrr1 motif (bp; −500 to 500), same and opposite strands, for ssTF (Yrr1), TUP (Tup1), SAGA (Sgf73) and Mediator (Med2) at STM-bound and not STM-bound Yrr1 sites. b, Occupancy (AU) versus distance from consolidated ssTF motif (bp; −500 to 500) for SAGA and TFIID (Taf12), Mediator (Med2), SAGA (Sgf73), TFIID (Taf2), TBP (Spt15), TFIIB (Sua7) and bkd. c, Frequency (0.0 to 0.3) versus PIC/TFIID (GTF/Taf2) log2 ratio (AU; −2.5 to 5.5) for RP, UNB, TFO, ‘SAGA-bound’, ‘STM-bound’ and ‘RSTM-bound’ promoters.]
Fig. 5 | ssTFs stably interact with STM cofactors but not GTFs. a, Architecture
at Yrr1 motifs in two classes of Yrr1-bound promoters: ‘STM-bound’ (labels on
left) and ‘not STM-bound’ (cyan and black labels on right) (Methods). The arrow
points to where cofactor crosslinking permeates Yrr1 crosslinking.
b, Representative architecture of STM cofactors or PIC components at a
consolidated set of ssTF-binding motifs at RSTM promoters (strand averaged;
see Methods and Supplementary Data 1 (1AI)), and oriented by TSS. Taf12 is in
SAGA and TFIID; bkd, background that was generated from a strain lacking a
TAP tag. c, Frequency distribution of promoters having the indicated PIC/TFIID
ratios (average of six GTFs; three-bin moving average), separated by promoter
class (RP, STM, TFO or UNB) or promoter sets based on cofactor enrichment.
‘SAGA-bound’ excludes RP promoters, which are highly enriched with SAGA
and shown separately. The ‘STM-bound’ promoter set required all of the
following to be present: SAGA, Mediator/SWI/SNF and TUP; ‘RSTM-bound’ also
required the presence of the RPD3-L complex. The x-axis is in arbitrary units.
around 4–30 genes; roughly 20% of ssTFs bound 50–100 genes each;
and 8, which were mostly insulator-like (Abf1, Reb1, Cin5, Mcm1, Tbf1,
Ume6, Fkh1 and Rap1), bound more than 100 genes each.
ssTFs also bound to the promoters of genes encoding other ssTFs
(Extended Data Fig. 10), from which archetypical regulatory circuit
motifs have been described33. About half of all these ssTF-encoding
genes lacked bound ssTFs (42 of 78 were UNBs), and thus are expected
to be constitutive and at the start of their regulatory circuit. Of note,
about half (43 of 78) of the ssTFs existed within a single highly integrated
circuit, suggesting that the regulation of ssTFs is highly interconnected.
Eleven ssTFs bound to multiple ssTF-encoding genes (multi-output
archetype), suggesting that they have the potential to diversify their
control through other such factors. Most ssTFs (47 of 78) bound only
one other ssTF gene (single output), thereby propagating the circuit.
There were long regulatory series with as many as seven ssTFs in series
that bifurcated and/or looped (Extended Data Fig. 11a). About one-third
of the ssTFs bound to their own promoter (in a simple loop), indicating
that direct feedback control of these factors is common (an autoregulation
archetype). Nine promoters of ssTF-encoding genes had multiple
ssTFs bound (multi-input archetype; Extended Data Fig. 11b). In most
cases, each bound ssTF was a member of a different meta-assemblage
(for example, RPD, SAGA, TUP and MED). Thus, numerous regulatory
mechanisms/meta-assemblages may converge at promoters through
distinct ssTFs. One-quarter (21 of 78 ssTFs) bound to no other ssTF gene
and thus are likely to be at the end of their circuit.
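The circuit archetypes named in this section can be read off a directed graph in which an edge runs from an ssTF to each ssTF-encoding gene whose promoter it binds. The following Python sketch is illustrative only: the function name, the labels and the toy edges (factors "A", "B" and "C") are hypothetical and are not the bindings reported in Extended Data Figs. 10 and 11, and the classification rules are a simplified reading of the text rather than the analysis used in the study.

```python
from collections import defaultdict

def classify_circuit(edges):
    """Label each ssTF with the circuit archetypes described in the text.

    edges: iterable of (regulator, target) pairs, meaning that the regulator
    ssTF binds the promoter of the gene encoding the target ssTF.
    """
    outputs = defaultdict(set)  # regulator -> OTHER ssTF genes it binds
    inputs = defaultdict(set)   # target -> ssTFs bound at its promoter
    nodes = set()
    for regulator, target in edges:
        nodes.update((regulator, target))
        inputs[target].add(regulator)
        if regulator != target:
            outputs[regulator].add(target)

    labels = {}
    for tf in sorted(nodes):
        tags = []
        if tf in inputs[tf]:
            tags.append("autoregulation")      # binds its own promoter
        n_out = len(outputs[tf])
        if n_out == 0:
            tags.append("end of circuit")      # binds no other ssTF gene
        elif n_out == 1:
            tags.append("single output")
        else:
            tags.append("multi-output")
        if len(inputs[tf]) > 1:
            tags.append("multi-input")         # several ssTFs at its promoter
        if not inputs[tf]:
            tags.append("start of circuit (unbound)")
        labels[tf] = tags
    return labels

# Hypothetical toy circuit: A binds B and C, B binds C, C binds itself.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "C")]
for tf, tags in classify_circuit(toy_edges).items():
    print(tf, tags)
```

In this toy circuit, A sits at the start of the circuit with multiple outputs, B propagates it through a single output, and C ends it while carrying both multi-input and autoregulatory features.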
Conclusions
Consistent with published studies, we have found that the vast majority
of Pol II promoters share the same basic constitutive architecture.
Local DNA sequence and chromatin remodellers create a constitutive
NFR that is flanked by stable and well positioned nucleosomes. This
is recognized by TFIID and is configured for constitutively low gene
expression. ssTFs and cofactors are generally not involved, except that
some ssTFs (such as Abf1 and Reb1) organize nucleosomes and insulate
against nearby genomic events.
ssTFs and cofactors that directly regulate PIC assembly define
roughly 20% of all genes, with an architecture that supports inducibility.
Our data support a dynamic ‘futile cycle’ of nucleosome acetylation (by
SAGA and NuA4) and deacetylation (by Rpd3-L), coupled to nucleosome
eviction (mediated by SWI/SNF) and stabilization (by Tup1 and Cyc8),
which produces an NDR. In this inducible environment, the assembly
of a PIC is augmented beyond what TFIID delivers. The stage is then
set for enhanced recruitment of Pol II via ensembles of ssTFs and the
Mediator complex34. Much of this induced transcription may exist in
hubs in which numerous induced promoters coalesce, perhaps for
the purposes of efficiently recycling the transcription machinery34.
Once transcription has cleared the promoter, most genes appear to
encounter the same Pol II ensemble, whose architecture changes at
fixed distances from either the TSS or the TES.
This comprehensive high-resolution view of genomic chromatin architecture
ties into our understanding of the post-initiation global regulatory
control of constitutive genes35, and raises questions as to how environmental
signalling directs inducibility through the control of ssTFs and
cofactors. A clear view of epigenomic architecture should provide a better
context for understanding how it integrates with other layers of gene
regulation that occur during RNA processing, transport and translation.
Moreover, as most of the key proteins examined here are evolutionarily
conserved, their architectural themes are likely to exist in other eukaryotes.
Online content
Any methods, additional references, Nature Research reporting summaries,
source data, extended data, supplementary information,
acknowledgements, peer review information; details of author contributions
and competing interests; and statements of data and code
availability are available at https://doi.org/10.1038/s41586-021-03314-8.
1.	 Rossi, M. J., Lai, W. K. M. & Pugh, B. F. Simplified ChIP-exo assays. Nat. Commun. 9, 2842
(2018).
2.	 Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein-DNA interactions detected
at single-nucleotide resolution. Cell 147, 1408–1419 (2011).
3.	 Hahn, S. & Young, E. T. Transcriptional regulation in Saccharomyces cerevisiae:
transcription factor regulation and function, mechanisms of initiation, and roles of
activators and coactivators. Genetics 189, 705–736 (2011).
4.	 Levine, M., Cattoglio, C. & Tjian, R. Looping back to leap forward: transcription enters a
new era. Cell 157, 13–25 (2014).
5.	 Cramer, P. Organization and regulation of gene transcription. Nature 573, 45–54 (2019).
6.	 Eaton, M. L., Galani, K., Kang, S., Bell, S. P. & MacAlpine, D. M. Conserved nucleosome
positioning defines replication origins. Genes Dev. 24, 748–753 (2010).
7.	 Li, N. et al. Structure of the origin recognition complex bound to DNA replication origin.
Nature 559, 217–222 (2018).
8.	 Wellinger, R. J. & Zakian, V. A. Everything you ever wanted to know about Saccharomyces
cerevisiae telomeres: beginning to end. Genetics 191, 1073–1105 (2012).
9.	 Biggins, S. The composition, functions, and regulation of the budding yeast kinetochore.
Genetics 194, 817–846 (2013).
10.	 Camahort, R. et al. Cse4 is part of an octameric nucleosome in budding yeast. Mol. Cell
35, 794–805 (2009).
11.	 Henikoff, S. et al. The budding yeast centromere DNA element II wraps a stable Cse4
hemisome in either orientation in vivo. eLife 3, e01861 (2014).
12.	 Rhee, H. S., Bataille, A. R., Zhang, L. & Pugh, B. F. Subnucleosomal structures and
nucleosome asymmetry across a genome. Cell 159, 1377–1388 (2014).
13.	 Furuyama, S. & Biggins, S. Centromere identity is specified by a single centromeric
nucleosome in budding yeast. Proc. Natl Acad. Sci. USA 104, 14706–14711 (2007).
14.	 Yan, K. et al. Structure of the inner kinetochore CCAN complex assembled onto a
centromeric nucleosome. Nature 574, 278–282 (2019).
15.	 Han, Y., Yan, C., Fishbain, S., Ivanov, I. & He, Y. Structural visualization of RNA polymerase
III transcription machineries. Cell Discov. 4, 40 (2018).
16.	 Mayer, A. et al. Uniform transitions of the general RNA polymerase II transcription
complex. Nat. Struct. Mol. Biol. 17, 1272–1278 (2010).
17.	 Petrenko, N., Jin, Y., Wong, K. H. & Struhl, K. Evidence that Mediator is essential for Pol II
transcription, but is not a required component of the preinitiation complex in vivo. eLife
6, e28447 (2017).
18.	 Jeronimo, C. et al. Tail and kinase modules differently regulate core Mediator recruitment
and function in vivo. Mol. Cell 64, 455–466 (2016).
19.	 Andrau, J. C. et al. Genome-wide location of the coactivator mediator: binding without
activation and transient Cdk8 interaction on DNA. Mol. Cell 22, 179–192 (2006).
20.	 Paul, E., Zhu, Z. I., Landsman, D. & Morse, R. H. Genome-wide association of mediator and
RNA polymerase II in wild-type and mediator mutant yeast. Mol. Cell. Biol. 35, 331–342
(2015).
21.	 Zhu, X. et al. Genome-wide occupancy profile of mediator and the Srb8-11 module
reveals interactions with coding regions. Mol. Cell 22, 169–178 (2006).
22.	 Krastanova, O., Hadzhitodorov, M. & Pesheva, M. Ty elements of the yeast Saccharomyces
cerevisiae. Biotechnol. Biotechnol. Equip. 19, 19–26 (2005).
23.	 Reja, R., Vinayachandran, V., Ghosh, S. & Pugh, B. F. Molecular mechanisms of ribosomal
protein gene coregulation. Genes Dev. 29, 1942–1954 (2015).
24.	 Krietenstein, N. et al. Genomic nucleosome organization reconstituted with pure
proteins. Cell 167, 709–721 (2016).
25.	 Chereji, R. V., Ocampo, J. & Clark, D. J. MNase-sensitive complexes in yeast: nucleosomes
and non-histone barriers. Mol. Cell 65, 565–577 (2017).
26.	 Candelli, T. et al. High-resolution transcription maps reveal the widespread impact of
roadblock termination in yeast. EMBO J. 37, e97490 (2018).
27.	 Brzovic, P. S. et al. The acidic transcription activator Gcn4 binds the mediator subunit
Gal11/Med15 using a simple protein interface forming a fuzzy complex. Mol. Cell 44, 942–
953 (2011).
28.	 Huisinga, K. L. & Pugh, B. F. A genome-wide housekeeping role for TFIID and a highly
regulated stress-related role for SAGA in Saccharomyces cerevisiae. Mol. Cell 13, 573–585
(2004).
29.	 Dudley, A. M., Rougeulle, C. & Winston, F. The Spt components of SAGA facilitate TBP
binding to a promoter at a post-activator-binding step in vivo. Genes Dev. 13, 2940–2945
(1999).
30.	 Moqtaderi, Z., Bai, Y., Poon, D., Weil, P. A. & Struhl, K. TBP-associated factors are not
generally required for transcriptional activation in yeast. Nature 383, 188–191 (1996).
31.	 Baptista, T. et al. SAGA is a general cofactor for RNA polymerase II transcription. Mol. Cell
68, 130–143 (2017).
32.	 Mittal, C., Rossi, M. J. & Pugh, B. F. High similarity among ChEC-seq datasets. Preprint at
https://www.biorxiv.org/content/10.1101/2021.02.04.429774v1 (2021).
33.	 Harbison, C. T. et al. Transcriptional regulatory code of a eukaryotic genome. Nature 431,
99–104 (2004).
34.	 Boija, A. et al. Transcription factors activate genes through the phase-separation capacity
of their activation domains. Cell 175, 1842–1855 (2018).
35.	 Badjatia, N. et al. Acute stress drives global repression through two independent RNA
polymerase II stalling events in Saccharomyces. Cell Rep. 34, 108640 (2021).
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
© The Author(s), under exclusive licence to Springer Nature Limited 2021
Methods
No statistical methods were used to predetermine sample size. The
experiments were not randomized and the investigators were not
blinded to allocation during experiments and outcome assessment.
Strains and antibodies
The vast majority of data for this study were collected from tandem
affinity purification (TAP)-tagged S. cerevisiae strains (originally pur-
chased from Dharmacon; now available from Horizon Inspired Cell
Solutions, Cambridge, UK). The background strain for this collection
was BY4741 (a derivative of S288-C; MATa his3Δ1 leu2Δ0 met15Δ0
ura3Δ0). Negative control ChIPs and ChIPs with specific antibodies
were performed with BY4741. If the TAP-tagged strain for a particular
target was unavailable, we instead used a haemagglutinin (HA)-tagged
strain (originally purchased from Dharmacon; now available from Horizon
Inspired Cell Solutions). The background strain for the HA-tagged
collection was diploid, derived from BY4741 and designated Y800
(MATa leu2-D98 cry1R/MATα leu2-D98 CRY1 ade2-101 HIS3/ade2-101
his3-D200 ura3-52 caniR/ura3-52 CAN1 lys2-801/lys2-801 CYH2/cyh2R
trp1-1/TRP1 Cir0 carrying pGAL-cre (amp, ori, CEN, LEU2)).
Rabbit IgG (Sigma, catalogue number I5006, various lot numbers)
conjugated to Dynabeads was used to immunoprecipitate chromatin
from TAP-tagged strains. Santa Cruz Biotechnology sc-7392 antibody
was used to immunoprecipitate chromatin from HA-tagged strains.
Millipore antibodies 04-1570-I, 04-1571-I or 04-1572-I were used to
immunoprecipitate Pol II having its carboxy-terminal domain phosphorylated
at positions serine 7, 2 or 5, respectively, of the heptad repeats.
Millipore antibody 07-352 was used to immunoprecipitate histone H3
with acetylated lysine 9 (H3K9ac). Cell Signaling antibody 5546S was
used to immunoprecipitate histone H2B with ubiquitinated lysine 123
(H2BK123ub). Cse4 antibody from C. Wu (Johns Hopkins Univ., Baltimore,
MD) was used to immunoprecipitate Cse4. Heat shock factor 1
(Hsf1) antibody from D. Gross (Louisiana State Univ., Baton Rouge, LA)
was used to immunoprecipitate Hsf1. ChIP–seq experiments using micrococcal
nuclease (MNase) to identify nucleosomes were performed for
the following histones and histone modifications: H3 (detected using
Abcam antibody ab1971), H3K27ac (ab4729), H3K36me3 (ab9050),
H3K4me3 (ab8580), H3K79me3 (ab2621), H3K12ac (ab46983) and H2B
(Active Motif 39237).
Cell growth and ChIP–exo
S. cerevisiae strains were grown in 67 ml of yeast peptone dextrose
(YPD) media to an optical density at 600 nm (OD600) of 0.8 at 25 °C.
Cells were crosslinked with formaldehyde at a final concentration of 1%
for 15 min at 25 °C, and quenched with a final concentration of 125 mM
glycine for 5 min. Cells were collected by centrifugation, and washed
in 1 ml of ST buffer (10 mM Tris-HCl, pH 7.5, 100 mM NaCl) at 4 °C. The
cells were pelleted again, the supernatant was removed, and the pellet
was flash frozen.
As STM classification criteria included promoters that became bound
by SAGA upon acute heat shock (as described36), we carried out equivalent
heat-shock experiments but using the workflow of this study. We
used these new data to assign heat-shock-induced binding locations
of SAGA (which correlated highly with binding locations in ref. 36). For
these heat-shock samples, yeast was grown in 67 ml of YPD to an OD600 of
0.8 at 25 °C; an equal volume of YPD medium at 55 °C was added to raise
the temperature of the culture to 37 °C and incubated at 37 °C for 6 min.
Then, cells were crosslinked with formaldehyde at a final concentration
of 1% for 15 min at room temperature by adding a 50 ml solution of
ice-cold 3.7% formaldehyde in water. Note that protein–DNA crosslinks
occur rapidly. Crosslinking was quenched with a final concentration of
125 mM glycine for 5 min. Cells were collected by centrifugation, and
washed in 1 ml of ST buffer at 4 °C. The cells were pelleted again, the
supernatant was removed, and the pellet was flash frozen.
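The stated 1% final formaldehyde concentration in the heat-shock protocol can be checked with a simple dilution calculation. The Python sketch below assumes that volumes are additive and that the culture is 134 ml at the time of crosslinking (67 ml of culture plus an equal volume of pre-warmed YPD); the function name is hypothetical and the calculation ignores any evaporation or sampling losses.

```python
def final_percent(stock_percent, stock_ml, culture_ml):
    """Final concentration (%) after adding a stock solution to a culture.

    Assumes simple dilution with additive volumes.
    """
    return stock_percent * stock_ml / (stock_ml + culture_ml)

# 67 ml of culture plus an equal volume of pre-warmed YPD, as in the text.
culture_ml = 67 + 67

# 50 ml of 3.7% formaldehyde is added to the 134 ml culture.
final = final_percent(stock_percent=3.7, stock_ml=50, culture_ml=culture_ml)
print(f"{final:.3f}% formaldehyde")
```

Under these assumptions the result is 3.7 × 50/184, which is within about 0.01 percentage points of the 1% stated in the protocol.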
Chromatin preparations are based on modifications of a prior protocol1.
Frozen cell pellets were resuspended and lysed in 1 ml of FA lysis
buffer (50 mM Hepes-KOH, pH 7.5, 150 mM NaCl, 2 mM EDTA, 1% Triton,
0.1% sodium deoxycholate and complete protease inhibitor (CPI)) and
a 500 μl volume of 0.5 mm zirconia/silica beads by bead beating in a
Mini-Beadbeater-96 machine (Biospec) for three cycles each of three
minutes on/seven minutes off (samples were kept in a −20 °C freezer
during the off cycle). The lysates were transferred to a new tube and
microcentrifuged at maximum speed for 3 min at 4 °C to pellet the chromatin.
The supernatants were discarded; the pellets were resuspended
in 600 μl of FA lysis buffer and transferred to 15 ml polystyrene conical
tubes containing 300 μl of 0.1 mm zirconia/silica beads. The samples
were then sonicated in a Bioruptor Pico (Diagenode) for 8 cycles (15 s
on/30 s off) to obtain DNA fragments of 100–500 bp in size. Each ChIP–exo
assay processed the equivalent of 33 ml of cell culture (roughly
8 × 10⁸ cells). The remaining half of the processed chromatin was flash
frozen and stored at −80 °C in case a technical replicate was desired.
A culture equivalent of 33 ml (roughly 630 million cells) of yeast was
fragmented to produce solubilized chromatin (roughly 190 μl). This was
incubated overnight (roughly 16 h) at 4 °C with the appropriate antibody.
A 10 μl bed volume of conjugated IgG–Dynabeads (0.83 mg ml⁻¹
IgG and 5 mg ml⁻¹ Dynabeads) or 3 μg of specific antibodies with a 10 μl
slurry-equivalent of Protein A Mag Sepharose (GE Healthcare) was
used in each reaction.
ChIP–exo 5.0 was performed as described1. Essentially, ChIP libraries
were partially constructed on the immunoprecipitated resin, and then
λ exonuclease was used to trim nucleotides in the 5′ to 3′ direction until
stopped by a protein–DNA crosslink. The DNA was then eluted and
library construction completed.
In a typical experiment with TAP-tagged yeast strains, 48 ChIP–exo
experiments were performed concurrently. Each set included 46 unique
targets, a Reb1–TAP sample as a positive control, and a BY4741 sample
(from a parental strain lacking the TAP tag) as a negative control. Following
18 cycles of polymerase chain reaction (PCR), all 48 samples
were pooled equally by volume. Library concentration was quantified
by quantitative PCR (qPCR). Equivalent workflows occurred with
other strains.
Using paired-end Illumina sequencing and cellular conditions
identical to those used to produce ChIP–exo data, we generated a
genome-wide nucleosome map (MNase histone H3 and H2B ChIP–seq)
with improved accuracy over our prior maps. MNase ChIP–seq was
performed as described37. Briefly, formaldehyde-crosslinked chromatin
was digested with MNase to achieve roughly 80% of mononucleosomes.
After H3 or H2B ChIP and library construction, libraries were
size selected by agarose gel electrophoresis, and sequenced.
Sequencing and mapping
High-throughput DNA sequencing was performed with an Illumina NextSeq
500 or 550 in paired-end mode, producing a 40 bp Read_1 and a 36 bp
Read_2. Additional previously published ChIP–exo datasets23,36 for Hsf1,
Msn2, Spt15, Spt16, Ifh1 and Fhl1 were included in data processing and
analysis for our study. Data were managed, quality controlled, and processed
through a custom automated workflow control called PEGR (Platform
for Epi-Genomic Research)38. Sequence reads were aligned to the
yeast (sacCer3) genome using bwa-mem (version 0.7.17). Aligned reads
were filtered using Picard (version 2.7.1)39 and samtools (version 0.1.18)40
to remove PCR duplicates (that is, where the 5′ coordinates-strand of
Read_1 and Read_2 were identical to another read pair) and non-uniquely
mapping reads. For ChIP–exo, the resulting mapped 5′ end of Read_1
(the exonuclease stop site) is defined as a tag. For MNase, the resulting
mapped midpoint of Read_1 and Read_2 is defined as a tag.
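The two tag definitions above can be illustrated with a short Python sketch. The coordinate convention is an assumption made for illustration: positions are 1-based, `read1_start` is the leftmost aligned base (as in the SAM POS field) and the alignment is ungapped. The function names are hypothetical and this is not the code used in the PEGR workflow.

```python
def exo_tag(read1_start, read1_length, strand):
    """Return (coordinate, strand) of the 5' end of Read_1.

    For ChIP-exo this 5' end marks the exonuclease stop site.
    read1_start is the leftmost aligned base (1-based, assumed ungapped).
    """
    if strand == "+":
        return read1_start, "+"
    if strand == "-":
        # On the reverse strand the 5' end is the rightmost aligned base.
        return read1_start + read1_length - 1, "-"
    raise ValueError("strand must be '+' or '-'")

def mnase_tag(read1_five_prime, read2_five_prime):
    """Midpoint between the 5' ends of Read_1 and Read_2 (rounded down).

    For MNase ChIP-seq this midpoint approximates the nucleosome dyad.
    """
    return (read1_five_prime + read2_five_prime) // 2

print(exo_tag(1000, 40, "+"))
print(exo_tag(1000, 40, "-"))
print(mnase_tag(1000, 1146))
```

Keeping the strand with each ChIP–exo tag matters because exonuclease stop sites on opposite strands flank the crosslinked protein.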
Data quality, statistics and reproducibility
We tested many targets that were not expected to bind directly to
DNA, and thus could not assume that every target would produce a
positive ChIP signal. We empirically determined that a minimum of
200,000 deduplicated tags were required to assess the quality of an
individual dataset. If a dataset received fewer than 200,000 tags, then
we required the tag duplication level of the sample, defined as (number
of reads discarded by Picard)/(number of input reads), to be less than 70%
before we sequenced it more deeply. For example, if a dataset had
100,000 mappable deduplicated tags (unique Read_1 and Read_2 combination),
but a total of 1 million mappable tags before filtering, then
the duplication level was 90% and it was assumed that the library was
insufficiently complex to warrant additional sequencing. If a library
was insufficiently complex, we performed a technical replicate with
the remainder of the chromatin preparation. Following this procedure,
we produced a sufficiently complex library for more than 95% of targets
tested from a single yeast culture. In practice, pooling equivalent
proportions of 48 barcoded libraries (in terms of reaction volumes)
provided similar sequencing depth across all samples. All analysed
datasets were confirmed with independent biological replicates that
passed our quality-control metrics. A dataset was considered successful
if significant locations (binomial, 1.5-fold, P < 0.01) were identified
by ChExMix (see below) and these locations were not in regions that
produce highly variable data. N values are reported for the number of
target datasets (hierarchical clustering and UMAP) or the number of
genomic features (composite plots and heatmaps) analysed.
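The library-complexity decision described in this section can be written as a small Python sketch. The thresholds (200,000 deduplicated tags and a 70% duplication level) come from the text; the function names and the returned labels are hypothetical, and the sketch is a simplified reading of the rule rather than the workflow used in the study.

```python
MIN_DEDUP_TAGS = 200_000   # minimum deduplicated tags to assess a dataset
MAX_DUPLICATION = 0.70     # duplication level must be below this to resequence

def duplication_level(total_tags, dedup_tags):
    """Fraction of mappable tags discarded as duplicates."""
    return (total_tags - dedup_tags) / total_tags

def next_step(total_tags, dedup_tags):
    """Decide what to do with a sequenced library."""
    if dedup_tags >= MIN_DEDUP_TAGS:
        return "assess quality"
    if duplication_level(total_tags, dedup_tags) < MAX_DUPLICATION:
        return "sequence more deeply"
    # Library is insufficiently complex: repeat from the stored chromatin.
    return "technical replicate"

# Worked example from the text: 100,000 deduplicated tags out of 1 million.
print(duplication_level(1_000_000, 100_000))
print(next_step(1_000_000, 100_000))
```

For the example given in the text, the duplication level is 90%, so the library is judged insufficiently complex and a technical replicate is performed instead of deeper sequencing.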
Raw FASTQ reads for each sample were aligned against the known TAP
or HA FASTA sequence and nearby genomic sequence to confirm the
presence and location of the epitope in each strain. See 03_EpitopeID
at https://github.com/CEGRcode/2021-Rossi_Nature.
Mapping statistics for each dataset are available at yeastepigenome.org,
along with mapped data downloads. Analyses shown at yeastepigenome.org
can be reproduced or further custom analysed using
ScriptManager (https://github.com/CEGRcode/scriptmanager), which
provides a simple user-friendly interface. It includes straightforward
instructions for installation and for data analysis. Data values from the
paper’s composite plots can be found in 01_Composite_Files at
https://github.com/CEGRcode/2021-Rossi_Nature.
ChExMix locations
ChExMix41 version 0.31 was run with the following non-default
parameters: --noread2 --scalewin 1000 --minmodelupdateevents 50
--fixedalpha 0 --mememinw 8 --mememaxw 21 --minmodelupdaterefs 25
--lenientplus. We also used the --excludebed option to exclude from
analysis a custom set of hypervariable regions
(ChExMix_Peak_Filter_List_190612.bed), including the rDNA locus, tRNA genes
and telomere regions (this list is available in
02_References_and_Features_Files at
https://github.com/CEGRcode/2021-Rossi_Nature). By default, ChExMix
requires the tag count at binding events to achieve at least 1.5-fold
enrichment and a minimum Benjamini–Hochberg42 corrected P value
of 0.01 (binomial), compared with the scaled ‘masterNoTag_20180928’
negative control count. All experiments for a given protein target were
analysed by ChExMix individually. The resulting peak calls for each
individual replicate experiment can be found at yeastepigenome.org
or the National Center for Biotechnology Information (NCBI) Gene
Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/). In
addition, the --lenientplus option enables a multireplicate reproducibility
assessment mode in ChExMix. Using this feature, replicate experiments
passing quality control were analysed simultaneously, and the
resulting joint peak calls were used to classify Pol II features (see the
section on ‘Pol II promoter classes’ below). Locations are defined as
ChExMix peaks if their tag counts pass the thresholds in the combined
meta-experiment (essentially merging tag counts across replicates),
or in one or more individual replicate experiments. However, locations
are reported only if the normalization of ChIP–seq (NCIS)-scaled tag
counts did not vary significantly across replicates (binomial, 1.5-fold,
P < 0.01). This latter condition had the effect of screening out locations
that were not reproducibly enriched across replicated experiments.
Locations resulting from a combined analysis of two independent
replicates can be found in 04_ChExMix_Peaks at
https://github.com/CEGRcode/2021-Rossi_Nature (and at
https://doi.org/10.26208/rykf-6050 for individual replicates).
The negative control for ChExMix peak calling, termed
‘masterNoTag_20180928’, was created by merging 15 individual BY4741 (parent
strain containing no epitope tag) ChIP–exo experiments into a single
BAM file. These negative controls were generated over an 18-month
period during the main phase of data collection. The file
‘masterNoTag_20180928.bam’ comprises the following Sample IDs: 11851, 11946,
12094, 12880, 13484, 13822, 14202, 14408, 14637, 14825, 15256, 15818,
16073, 17814 and 18504, and is available at
https://doi.org/10.26208/rykf-6050.
Meta-assemblages
Meta-assemblages are based on cell populations. Thus, their member
targets tend to bind the same genomic locations, although not necessarily
at the same time or above a preset algorithmic threshold. Owing
to parameter constraints placed on clustering, significant (P < 0.01)
but rare (for example, HIR) and/or highly isolated (for example, Vid22/Tbf1)
binding events tended to cluster near each other in UMAP, and so
were placed in a single miscellaneous meta-assemblage (ISO) without
further analysis.
Using bedtools intersect (bedtools version 2.27.1), all ChExMix peaks
(regardless of whether they were associated with the Pol II sector,
defined above) for each of 384 validated input targets were intersected
in a 100-bp window around themselves. This produced a symmetrical
matrix of counts representing the frequency of peak overlap between
all samples. 2D hierarchical clustering43 was then performed, using
average linkage and uncentred correlation as the metric.
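The construction of the symmetrical overlap matrix can be sketched in plain Python. Several details here are assumptions made for illustration: each peak is reduced to a single coordinate, two peaks are counted as overlapping when their 100-bp windows (centred on each peak) intersect, and the matrix entry is the number of overlapping peak pairs, which guarantees symmetry. The study used bedtools intersect, whose exact invocation is not given in the text; the function names and the toy targets below are hypothetical, and the clustering step is not shown.

```python
from itertools import combinations_with_replacement

def overlap_count(peaks_a, peaks_b, window=100):
    """Count peak pairs on the same chromosome whose windows intersect.

    Each window is `window` bp wide and centred on its peak, so two
    windows intersect when the peaks are less than `window` bp apart.
    """
    count = 0
    for chrom_a, pos_a in peaks_a:
        for chrom_b, pos_b in peaks_b:
            if chrom_a == chrom_b and abs(pos_a - pos_b) < window:
                count += 1
    return count

def overlap_matrix(peaks_by_target, window=100):
    """Build a symmetrical matrix of pairwise overlap counts."""
    names = sorted(peaks_by_target)
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for a, b in combinations_with_replacement(names, 2):
        c = overlap_count(peaks_by_target[a], peaks_by_target[b], window)
        matrix[index[a]][index[b]] = c
        matrix[index[b]][index[a]] = c
    return names, matrix

# Hypothetical peaks for three targets, given as (chromosome, coordinate).
toy = {
    "T1": [("chrI", 1000), ("chrI", 5000)],
    "T2": [("chrI", 1040), ("chrII", 5000)],
    "T3": [("chrII", 5090)],
}
names, matrix = overlap_matrix(toy)
for name, row in zip(names, matrix):
    print(name, row)
```

The nested loops are quadratic in the number of peaks, which is acceptable for a toy example; an interval-based tool such as bedtools is the practical choice at genome scale.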
The interaction matrix was further filtered to remove 13 targets with
fewer than five total ChExMix peaks (for example, Pol I targets having
only two binding locations that are annotated in the reference yeast
genome, despite the rDNA locus being highly repetitive). This produced
a symmetrical matrix of 371 samples (Fig. 1b and Supplementary
Data 3). The matrix was then used as the input into the UMAP algorithm
(version 0.3.7)44 using the following parameters:
umap.UMAP(n_neighbors=5, min_dist=0.0, n_components=2, metric='correlation',
random_state=RS).fit_transform(X). K-means clustering was performed
on the resulting 2D projection at a variety of K values (5, 10, 20, 25,
30, 35, 40, 100, 145). No new biologically distinct clusters appeared
beyond K = 40.
Reference features and intervals
Coordinates for 253 replication origins (ACS sequences, for ‘autonomously
replicating sequence (ARS) consensus sequences’) were
obtained from ref. 6. Note that ACS_6_32973 has a duplicate entry on
the yeastepigenome.org website, resulting in 254 features. Coordinates
for X-core elements (XCEs), centromeres (CENs), RNA polymerase I
(Pol I) TSS, Pol III TSS, NCR (SGD-defined noncoding RNA annotated
as ncRNA_gene, snoRNA_gene, and snRNA_gene) and Ty transposon
LTRs were obtained from the Saccharomyces Genome Database (SGD;
https://www.yeastgenome.org) on 3 March 2017 (available as
SGD_features_170331.tab in 02_References_and_Features_Files at
https://github.com/CEGRcode/2021-Rossi_Nature). TSS and TES coordinates for Pol
II were obtained from ref. 45. They were matched to each SGD coding
feature through their systematic Gene ID. These coordinates were based
on microarrays. For TSS, the most 5′-enriched sense-strand coordinate
in each promoter is reported. When no transcript was reported for an
SGD feature, the TSS and TES were imputed from the SGD coordinates
by moving 70 bp upstream of the start ATG (SGD start) for TSSs and
70 bp downstream of the stop codon (SGD end) for TESs. This imputation
was based on the empirical observation that the median distance
between the TSS defined in ref. 45 and the start codon was 70 bp. ‘Dubious
ORFs’ were initially considered and then excluded from further analysis
because we and others46 found no validating evidence. Noncoding
RNAs (ncRNAs) were from SGD annotations; cryptic unstable transcripts
(CUTs) and stable unannotated transcripts (SUTs) were from
ref. 45; and Xrn1-sensitive unstable transcripts (XUTs) were from ref. 47.
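The 70-bp imputation rule for features without a reported transcript is strand dependent, which the following Python sketch makes explicit. It assumes that `orf_start` is the coordinate of the start ATG and `orf_end` the coordinate of the end of the stop codon, with `orf_start` greater than `orf_end` for features on the minus strand; the function name is hypothetical.

```python
IMPUTED_OFFSET_BP = 70  # median TSS-to-start-codon distance cited in the text

def impute_tss_tes(orf_start, orf_end, strand):
    """Impute (TSS, TES) from ORF coordinates when no transcript is reported.

    'Upstream' and 'downstream' are relative to the direction of
    transcription, so the arithmetic flips on the minus strand.
    """
    if strand == "+":
        return orf_start - IMPUTED_OFFSET_BP, orf_end + IMPUTED_OFFSET_BP
    if strand == "-":
        return orf_start + IMPUTED_OFFSET_BP, orf_end - IMPUTED_OFFSET_BP
    raise ValueError("strand must be '+' or '-'")

print(impute_tss_tes(1000, 2500, "+"))
print(impute_tss_tes(2500, 1000, "-"))
```

Both example features span the same interval but are transcribed in opposite directions, so their imputed TSS and TES coordinates are swapped.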
Reference datasets are available in 02_References_and_Features_Files
at https://github.com/CEGRcode/2021-Rossi_Nature: SGD features
(SGD_features_170331.tab), ORF TSS (Xu_2009_ORF-Ts_V64.gff3), CUT
(Xu_2009_CUTs_V64.gff3), SUT (Xu_2009_SUTs_V64.gff3), and XUT
(van_Dijk_2011_XUTs_V64.gff3).
Nucleosome maps at Pol II promoter regions
MNase H3 and H2B ChIP–seq paired-end reads were bioinformatically
filtered to fragment sizes of 100–160 bp, and then nucleosome dyads
(peaks) were called from the mapped midpoint location of Read_1 and
Read_2 5′ ends using GeneTrack (v1) (parameters: s40e80F1)48. Peaks
were required to overlap within a 75-bp window in at least 4 of 6 datasets
(3 H2B and 3 H3 MNase ChIP–seq; Sample IDs 10951, 10952, 10967;
10947, 10948 and 10966) to call a consensus nucleosome (N = 6). The
average location of overlapping peaks defined the dyad coordinate of
a consensus nucleosome.
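The consensus rule (peaks from at least 4 of 6 datasets within a 75-bp window, with the dyad placed at their average location) can be sketched as follows. The greedy grouping used here, in which sorted peaks are grouped until they span more than the window, is an assumption chosen for simplicity; the text does not specify how overlapping peaks were grouped. The function name and dataset labels are hypothetical.

```python
def consensus_nucleosomes(peaks, window=75, min_datasets=4):
    """Call consensus dyads on one chromosome.

    peaks: list of (position, dataset_id) pairs pooled across datasets.
    Returns the rounded mean position of every group of peaks that spans
    at most `window` bp and contains at least `min_datasets` datasets.
    """
    groups, current = [], []
    for pos, dataset in sorted(peaks):
        # Start a new group once the span from the first peak exceeds the window.
        if current and pos - current[0][0] > window:
            groups.append(current)
            current = []
        current.append((pos, dataset))
    if current:
        groups.append(current)

    dyads = []
    for group in groups:
        datasets = {dataset for _, dataset in group}
        if len(datasets) >= min_datasets:
            positions = [pos for pos, _ in group]
            dyads.append(round(sum(positions) / len(positions)))
    return dyads

# Hypothetical peaks: four datasets agree near 1,000; only two near 1,400.
toy_peaks = [
    (1000, "H3_1"), (1010, "H3_2"), (1005, "H2B_1"), (996, "H2B_2"),
    (1400, "H3_1"), (1410, "H2B_1"),
]
print(consensus_nucleosomes(toy_peaks))
```

Only the first group is supported by four datasets, so a single consensus dyad is called and the weakly supported position near 1,400 is dropped.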
The +1 nucleosome was defined as the nucleosome dyad peak that
was closest to a TSS in a window −60 bp to +140 bp. If no nucleosome
was found, then an additional search was performed −80 bp to −61 bp
relative to the TSS. If none was found, then the region was viewed in
Integrative Genomics Viewer version 2.5.2 (IGV;
http://software.broadinstitute.org/software/igv/)49, and manually assigned. If no nucleosomes
could visually be assigned to a TSS in IGV, then a +1 nucleosome dyad
coordinate was imputed as the SGD ATG start coordinate (which is
the consensus location of +1 nucleosomes). This placed the TSS at the
genome-wide canonical location relative to the imputed +1 dyad.
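The algorithmic part of the +1 nucleosome assignment (before any manual inspection in IGV) can be sketched in Python. The two search windows come from the text; treating both window boundaries as inclusive, measuring positions in the direction of transcription, and returning None when manual assignment would be needed are assumptions of this sketch, and the function names are hypothetical.

```python
def relative_position(dyad, tss, strand):
    """Dyad position relative to the TSS in the direction of transcription."""
    return dyad - tss if strand == "+" else tss - dyad

def assign_plus_one(tss, strand, dyads):
    """Return the +1 nucleosome dyad for a TSS, or None if none is found.

    Searches -60 to +140 bp first, then -80 to -61 bp, and picks the dyad
    closest to the TSS within the first window that contains any dyad.
    """
    rel = [(relative_position(d, tss, strand), d) for d in dyads]
    for low, high in ((-60, 140), (-80, -61)):
        candidates = [(abs(r), d) for r, d in rel if low <= r <= high]
        if candidates:
            return min(candidates)[1]
    return None  # would be inspected manually in the described procedure

print(assign_plus_one(1000, "+", [850, 1055, 1220]))
print(assign_plus_one(1000, "-", [945, 1070]))
print(assign_plus_one(1000, "+", [925, 1300]))
print(assign_plus_one(1000, "+", [500]))
```

The third call falls back to the secondary window, and the last call returns None, which corresponds to the cases that were viewed and assigned by hand.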
We previously defined consensus −1 nucleosome positions of all
genes transcribed by Pol II, regardless of whether a nucleosome had
low occupancy or was even detectable50. However, here our intent
was to define the region encompassing NFRs and NDRs, and so we
chose to ignore nucleosome positions that were highly depleted of
nucleosomes. Our goal was to manually determine the location of
the most robust algorithmic nucleosome position (upstream stable
nucleosome, USN) that was located closest to a TSS and in a window
−500 bp to −60 bp from the TSS, as long as that nucleosome was not
already called a +1 nucleosome. If one of the following criteria was
met, then the nucleosome landscape was visualized in IGV, and the
USN and/or +1 nucleosomes were manually (re)assigned (N = 753): 1)
either the USN or +1 was not present in the original algorithmically
defined set; 2) the USN-to-(+1) dyad-to-dyad distance was calculated
to be smaller than 187 bp (the size of a nucleosome (147 bp) and two
linkers (2 × 20 bp)); 3) a ssTF peak was, first, located less than 600 bp
upstream of the TSS, and second, upstream (more 5′ to the nearest
TSS) of a nucleosome call having an occupancy score that was in
the bottom 5% of all nucleosomes (that is, an algorithmically called
nucleosome that was in fact highly depleted in the vicinity of a ssTF).
If no nucleosomes could visually be assigned, the USN nucleosome
coordinate was imputed as 750 bp upstream of the +1 nucleosome
dyad (99th percentile of calculated NDR/NFR lengths). The NDR/NFR
length at these features was reported as ‘9999’ in Supplementary
Data 1 (1S) (N = 297). As the promoter regions defined in this study
include arbitrary limits and do not consider limits defined by insula-
tion, there will be some inaccuracies in relation to actual biological
promoter boundaries. This is expected to result in some promoter
misclassifications.
In total, 59,002 nucleosomes were called across the S. cerevisiae
genome. Nucleosome occupancy and fuzziness scores were calculated
as described51. All nucleosome calls with their median occupancy
and fuzziness scores are available as Nucleosome_calls_and_stats.xlsx
in 02_References_and_Features_Files at
https://github.com/CEGRcode/2021-Rossi_Nature.
ChExMix locations at filtered Pol II genes
The initial list of all compiled features totalled 11,112 (Supplementary
Data 1). Numerous quality-control metrics were calculated for each
Pol II transcribed feature to assess their validity and mappability. We
used two GTFs (Sua7 (Sample ID = 11743) and Ssl2 (11747)) and a negative
control (masterNoTag_20180928.bam), with total tags set to be equal
across all three in order to assess the enrichment around each candi-
date coding and noncoding Pol II TSS (N = 9,844; feature class level 1:
01–12, 14, 24 and 25 in Supplementary Data 1 (1D)), as described below.
A region of the genome was defined for each transcribed feature that
included the transcribed sequence (TSS to TES) and the surrounding
regulatory region. The upstream (promoter) regulatory region was
defined as the inclusive interval between the dyad coordinate of the
USN (see above) and the TSS. When no USN was called for a feature,
then the upstream boundary was defined as 750 bp upstream (5′) of the
TSS. Note that the upstream boundary does not consider boundaries
defined by insulators, as they have not yet been fully defined. This
may result in unwarranted attachment of ssTF/cofactor locations to
some promoters. The downstream regulatory region was defined as the
inclusive interval from TES to 100 bp downstream (3′). This boundary
was based on the consensus position of the termination machinery
relative to the TES. The genomic region from the USN dyad to 100 bp
downstream of TES was defined as a ‘Pol II sector’.
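The sector definition above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: the function name `pol2_sector` and its parameters are hypothetical, coordinates are assumed to be 1-based and inclusive, and no clipping at chromosome ends is attempted.

```python
from typing import Optional, Tuple


def pol2_sector(tss: int, tes: int, strand: str,
                usn_dyad: Optional[int] = None,
                default_upstream: int = 750,
                downstream: int = 100) -> Tuple[int, int]:
    """Return the inclusive (start, end) interval of a Pol II sector.

    The returned interval always has start <= end, whatever the strand.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand, 'upstream' means higher genomic coordinates.
    direction = 1 if strand == "+" else -1
    # Without a called USN, fall back to a fixed distance upstream of the TSS.
    if usn_dyad is not None:
        upstream_edge = usn_dyad
    else:
        upstream_edge = tss - direction * default_upstream
    # The downstream regulatory region extends 100 bp past the TES.
    downstream_edge = tes + direction * downstream
    return (min(upstream_edge, downstream_edge),
            max(upstream_edge, downstream_edge))
```

For a plus-strand feature with a TSS at 1,000, a TES at 2,000 and a USN dyad at 800, the sketch returns the interval 800 to 2,100.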
ChExMix peaks for all datasets here were intersected with each Pol
II sector using Bedtools. A protein was defined to be located within a
feature if at least one ChExMix peak overlapped with any portion of the
sector. If a ChExMix peak intersected two overlapping sectors (that is,
the peak exists in the promoter region of two genes in a head-to-head
orientation), then that protein was located in both sectors.
Consequently, the number of ChExMix peaks and the number of bound
features (or sectors) are not equal.
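The overlap rule can be illustrated without Bedtools. The sketch below is a simplified stand-in for the intersection step, assuming inclusive coordinates and small in-memory lists; `assign_peaks` and the tuple layouts are hypothetical.

```python
from collections import defaultdict


def assign_peaks(peaks, sectors):
    """Map each sector ID to the set of proteins with an overlapping peak.

    peaks:   iterable of (chrom, start, end, protein)
    sectors: iterable of (chrom, start, end, sector_id)
    """
    bound = defaultdict(set)
    for chrom, p_start, p_end, protein in peaks:
        for s_chrom, s_start, s_end, sector_id in sectors:
            # Any shared base counts as an overlap; a single peak may
            # therefore be assigned to two overlapping sectors.
            if chrom == s_chrom and p_start <= s_end and s_start <= p_end:
                bound[sector_id].add(protein)
    return dict(bound)
```

Because a peak in a shared head-to-head promoter region is credited to both sectors, the number of bound features can exceed the number of peaks.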
Pol II sectors were excluded as ‘hypervariable’ if any of the following
conditions were met: 1) the TSS was in the highest 1% of
masterNoTag_20180928 tag counts (negative control) in a 1,000-bp window
centred over the TSS; 2) the TSS was in the highest 5% of
masterNoTag_20180928 tag counts in a 200-bp window centred over the TSS
and the occupancy ratios of both Sua7/NoTag and Ssl2/NoTag were
less than 2 (based on total tag normalization). The rationale for these
criteria was that if the signal in the negative control was too high, and
the signal-to-noise ratios of robust GTFs such as Sua7 and Ssl2 were not
well above the high background, then we did not have confidence in
locations called at these sites. The sector was retained if it overlapped
with a peak call from any dataset in this study. We assumed that the peak
indicated enough dynamic range to have useable data in this region.
We excluded N = 75 Pol II sectors by this metric (‘08_Hyper-variable’ in
Supplementary Data 1 (1D)).
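The decision logic of this filter can be written compactly. The following is a sketch under two assumptions that the text does not spell out: the negative-control tag counts are supplied as percentile ranks (0 to 100) among all TSSs, and the peak-overlap rule is treated as an override of both conditions. The function name is hypothetical.

```python
def is_hypervariable(notag_pct_1kb: float, notag_pct_200bp: float,
                     sua7_ratio: float, ssl2_ratio: float,
                     overlaps_peak: bool) -> bool:
    """Return True if a sector would be excluded as 'hypervariable'."""
    # Assumed reading: any overlapping peak call rescues the sector.
    if overlaps_peak:
        return False
    # Condition 1: top 1% of negative-control counts in the 1,000-bp window.
    high_wide = notag_pct_1kb >= 99.0
    # Condition 2: top 5% in the 200-bp window AND both GTF ratios below 2.
    high_narrow = (notag_pct_200bp >= 95.0
                   and sua7_ratio < 2.0 and ssl2_ratio < 2.0)
    return high_wide or high_narrow
```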
Pol II sectors were excluded for having ‘poor mappability’ if any of
the following conditions were met: 1) the TSS was in the lowest 1% of
masterNoTag_20180928 tag counts in a 1,000-bp window centred over
the TSS; 2) the TSS was in the lowest 5% of masterNoTag_20180928 tag
counts in a 200-bp window centred over the TSS and the occupancy
ratios of both Sua7/NoTag and Ssl2/NoTag were less than 2 (based on
total tag normalization). Visual inspection of heatmaps confirmed
that these segments of the genome were not uniquely mappable, and
thus had low intrinsic tag counts. We excluded N = 116 Pol II sectors
by this metric (‘24_Hyper-variable_noncoding’ in Supplementary
Data 1 (1D)).
Pol II sectors were excluded as ‘Quiescent-NoPIC’ if the occupancy
ratios of both Sua7/NoTag and Ssl2/NoTag were less than 1. The sec-
tor was retained if it overlapped with a peak call from any dataset in
this study. The rationale here was that if there were no peaks in the
sector vicinity and no enrichment of GTFs, then this feature was rela-
tively quiescent. Thus, it was uninformative to analyse it further. We
do not exclude the possibility that these features had low subthreshold
activity. We excluded N = 251 Pol II sectors by this metric (‘05_NoPIC’ in
Supplementary Data 1 (1D)).
Pol II sectors were excluded as ‘tRNA proximal’ if peaks from Tfc3
(11835)—a component of the RNA polymerase III transcription ini-
tiation factor complex—overlapped with the region between the +1
nucleosome dyad and the USN dyad of the sector. tRNA genes produced
high levels of background owing to strong crosslinking of the Pol III
machinery, which digestion by λ exonuclease then focuses into high
background peaks. Although this background is present in all samples,
it is most problematic or evident where the target foreground signal is
close to background. We excluded N = 135 Pol II sectors by this metric
(‘06_tRNAprox’ in Supplementary Data 1 (1D)).
Pol II sectors were excluded as ‘ChExMix extreme’ if they overlapped
with an unusually high number of peaks. These features contained many
gene-body peaks for targets that, across the rest of the genome, were
bound primarily in promoter regions. Further analysis revealed that
the density of tags across the gene body in the masterNoTag_20180928
negative control was abnormally high or low, relative to the rest of the
genome, thereby creating statistical anomalies of bound locations.
Consequently, ChExMix produced many false-positive peak calls in
unrelated datasets at these extreme regions where the background
model appears to break down. The peak calls at these extreme features
are still included in the ChExMix peak files but should not be considered
valid locations unless validated by orthogonal methods. The number of
Pol II sectors given this label was empirically capped at N = 25
(‘07_ChExMix_extreme’ in Supplementary Data 1 (1D)). The value of this filter
is that it decreased the number of potentially artefactual locations
occurring in noncanonical places, particularly for ssTFs that bind to
few genes. However, we do not exclude the possibility of noncanonical
extreme, yet still biological, behaviour occurring at these genes. For
example, large condensates might behave in this way.
Our analysis of the ncRNA features reported in refs. 45,47 found that
many of these calls were not supported by evidence of GTF binding
(Sua7) in the TSS vicinity, suggesting that many were false positives.
Noncoding Pol II sectors were excluded if no Sua7 peak was found within
80 bp of the TSS. We excluded N = 2,161 ncRNA Pol II sectors by this
metric (‘25_excluded_ncRNA’ in Supplementary Data 1 (1D)).
Pol II promoter classes
Our unsupervised approach to chromatin organization genome-wide
produced meta-assemblages that reflect predominant architec-
tural themes. Meta-assemblages are computed ensembles of many
genome-wide locations averaged across millions of cells, and thus do
not necessarily correspond to biochemically stable complexes. There
are cases in which a meta-assemblage such as ORC would appear to
have a corresponding biochemical ensemble at replication origins.
This makes meta-assemblages and real ensembles seemingly the same.
However, as expected, there was no single promoter architecture that
emerged from our unsupervised approach. Instead, meta-assemblages
reflected predominant architectural themes that ranged along a
compositional spectrum from relatively heterogeneous (ssTFs/MED/SAGA/
TUP) to relatively homogeneous (PIC). Meta-assemblages could be
merged or subdivided to achieve levels of granularity, but also levels
of uncertainty. They permeated promoters to varying extents.
The variation in the types of meta-assemblages within and across
promoter classes gives them their unique regulatory properties, but
also makes promoter classification fluid. Classification depends on
input criteria that reflect subjective concepts. For example, prior
work created SAGA-dominated and TFIID-dominated gene groups on
the basis of functional criteria (relative sensitivity to SAGA and TFIID
mutants; ref. 28). This helped to produce a genome-wide concept of
inducible versus constitutive genes, but could not address other concepts
such as insulation, or the fact that some themes may not be manifested
through SAGA and TFIID, or that there may be more granularity in each
of those classes. We attempt here to provide more granularity, but
recognize that simplifying overarching concepts are best served with
fewer groups. To this end, we created promoter classes that arose in part
from our unsupervised learning approach. However, we also injected
additional a priori knowledge. This knowledge considers the
functionality of each factor that contributes to distinctive regulatory
archetypes.
The 137 RP promoters (defined by SGD) encode subunits of the ribosome.
They comprise the largest known set of genes that are thought
to be coregulated under all conditions. This may be because
they are predominantly regulated by the ssTF Rap1. They are highly
expressed and well studied by ChIP–exo as a group (ref. 23), and so form a
distinct gene set.
SAGA, Mediator and Tup1 (‘STM’) are major cofactor complexes
that, along with other ssTFs and cofactors (listed in Supplementary
Data 2 (1K)), co-occur at highly expressed genes and formed major
UMAP clusters. We therefore defined a set of non-RP STM promoters
(using Bedtools intersect) if the region between the +1 nucleosome
and USN dyads had at least one SAGA, Mediator or TUP ChExMix
call (Supplementary Data 2 (10A)) in YPD at 25 °C or a SAGA call upon
acute heat shock (ref. 36; 6 min at 37 °C) (N = 984 in the STM group; see
Supplementary Data 1 (1E)). Most STM promoter regions (N = 854, or 87%)
also bound at least one of 78 ssTFs site-specifically (Supplementary
Data 2 (10C)). The majority of these ssTF peaks overlapped position-
ally with STM cofactor peaks. Applicable to Fig. 5b, we labelled each
ssTF-bound motif as a ‘consolidated ssTF motif’ if it overlapped with a
STM peak. This consolidated motif set was considered the organizing
centre of that promoter. When a ssTF motif was absent, the ssTF peak
call was used instead. When numerous ssTFs were bound to the same
promoter, the ssTF closest to the STM peak was used (Supplementary
Data 1 (1Y–1AI)).
Of the remaining promoters (non-RP, non-STM), a subset had ssTF
ChExMix peaks (whether site-specifically bound or not) or other cofactor
ChExMix peaks in the region between the +1 nucleosome and the
USN. This list of ssTFs and cofactors did not include the core
transcription machinery (initiation, elongation or termination), which
was nevertheless present. We therefore defined these as ‘TFO’ (N = 1,783).
About one-quarter of TFO promoters had a bound ssTF that was more
associated with STM promoters, and thus presumably capable of recruiting
cofactors (Supplementary Data 2 (8)). These TFO promoters may have
been algorithmically misclassified, perhaps being expressed under
other environmental conditions. Those non-RP, non-STM and non-TFO
promoters that remained constituted 2,474 promoters whose promoter
regions lacked evidence of a binding event beyond a PIC or a
nucleosome, and thus formed the largest of all groups, the ‘unbound’
(‘UNB’). These classifications are indicated in Fig. 1a, along with their
relationship to the TFIIDdom and SAGAdom gene classes. Relative PIC
occupancy (green-dot count) is based on average TFIIB (Sua7) occu-
pancy (Supplementary Data 1 (1AJ)) but confirmed with nascent and
steady-state transcription.
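The four promoter classes are assigned in a fixed order of precedence (RP, then STM, then TFO, then UNB). A minimal sketch of that hierarchy follows; the function and argument names are hypothetical, and each boolean is assumed to have been computed beforehand from the criteria described above.

```python
def promoter_class(is_rp: bool, has_stm_call: bool,
                   has_other_factor_peak: bool) -> str:
    """Assign one promoter class using the precedence RP > STM > TFO > UNB."""
    if is_rp:
        # Ribosomal protein promoters are set aside first.
        return "RP"
    if has_stm_call:
        # At least one SAGA, Mediator or TUP call between the +1 and USN dyads.
        return "STM"
    if has_other_factor_peak:
        # Any other ssTF or cofactor peak (core machinery excluded).
        return "TFO"
    # Nothing beyond a PIC or a nucleosome was detected.
    return "UNB"
```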
Stringent Pol II promoter classes
These classifications were more stringent than those above and relate to
Fig. 5b, c, and Extended Data Fig. 9b, c. The ‘SAGA-bound’ classification
required a promoter to have a ChExMix peak call (‘1’ in Supplementary
Data 2 (3)) for two or more of the following targets: Spt7, Ada2, Sgf11 or
Sgf73. The ‘STM-bound’ classification required a promoter to have all
three of the following labels: SAGA-bound, TUP-bound and Mediator/
SWI–SNF-bound, defined as follows. The ‘TUP-bound’ classification required a
promoter to have a ChExMix call (‘1’) for two or more of the following
targets: Tup1, Cyc8, Sok2 and Cin5. The ‘Mediator/SWI–SNF-bound’
classification required a promoter to have a ChExMix call (‘1’) for
two or more of the following targets: Swi1, Med2, Snf6 and Swi3. The
‘RSTM-bound’ classification required a promoter to have both of the
following labels: STM-bound and RPD-bound. The RPD-bound
classification required a promoter to have a ChExMix call (‘1’) for two or more
of the following targets: Rpd3, Rxt1/Cti6, Rxt2, Rxt3, Nrm1 and Ume6.
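These rules share one pattern: a label is given when at least two members of a target set have a ChExMix call, and composite labels require several base labels together. The sketch below encodes that pattern; the dictionary keys and function name are hypothetical shorthand, and the target names are taken from the text.

```python
# Target sets named in the text; the keys are shorthand for the labels.
TARGET_SETS = {
    "SAGA": {"Spt7", "Ada2", "Sgf11", "Sgf73"},
    "TUP": {"Tup1", "Cyc8", "Sok2", "Cin5"},
    "MED_SWISNF": {"Swi1", "Med2", "Snf6", "Swi3"},
    "RPD": {"Rpd3", "Rxt1/Cti6", "Rxt2", "Rxt3", "Nrm1", "Ume6"},
}


def stringent_labels(bound_targets, min_hits=2):
    """Return the set of stringent labels for one promoter."""
    bound = set(bound_targets)
    # Base labels: two or more targets of a set have a ChExMix call.
    labels = {name for name, targets in TARGET_SETS.items()
              if len(bound & targets) >= min_hits}
    # STM-bound requires SAGA-, TUP- and Mediator/SWI-SNF-bound together.
    if {"SAGA", "TUP", "MED_SWISNF"} <= labels:
        labels.add("STM")
    # RSTM-bound requires STM-bound and RPD-bound.
    if {"STM", "RPD"} <= labels:
        labels.add("RSTM")
    return labels
```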
Heatmaps and composite plots
Analysis was performed using the GUI ScriptManager version 012,
which is available for download at
https://github.com/CEGRcode/scriptmanager. ScriptManager provides a
simple user-friendly interface for ChIP–exo analysis, and includes simple
installation instructions. Heatmaps and composite plots were generated
using the TagPileup script. For ChIP–exo data, the following settings
were used: Read_1
5′ end; separate strands, 0 bp tag shift, 1 bp bin size, sliding window
(moving average) 11. For MNase ChIP–seq data the following settings
were used: (paired-end) read midpoint; combined strands, 0 bp tag
shift, 1 bp bin size, sliding window 21. All data are oriented by TSS or
reference point strand.
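The pileup settings above (1-bp bins, strand-oriented offsets, a sliding-window average) can be illustrated with a small stand-alone sketch. It is not the ScriptManager implementation: both function names are hypothetical, tags are assumed to be given as 5′-end coordinates, and the truncated window at the array edges is an assumption.

```python
def tag_pileup(tag_positions, ref_point, strand, half_window=500):
    """Count tag 5' ends at each 1-bp offset from a reference point.

    Offsets are oriented so that positive values lie downstream of the
    reference point on its own strand.
    """
    counts = [0] * (2 * half_window + 1)
    for pos in tag_positions:
        offset = pos - ref_point if strand == "+" else ref_point - pos
        if -half_window <= offset <= half_window:
            counts[offset + half_window] += 1
    return counts


def moving_average(values, window=11):
    """Centred sliding-window mean; windows are truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

A window of 11 corresponds to the ChIP–exo setting and 21 to the MNase ChIP–seq setting quoted above.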
For graphical display of composite plots, output data consisted of
frequency counts of Read_1 5′ ends for ChIP–exo or Read_1/Read_2
midpoint for MNase H3/H2B ChIP–seq dyads (BAM files) that were at
x-axis base-pair distances from sets of genomic reference points (BED
files). Underlying patterns and datapoints are available at
yeastepigenome.org and as Excel_Composite_Data_Processed.xlsx in
01_Composite_Files at https://github.com/CEGRcode/2021-Rossi_Nature. An
additional moving average of 20 bp (30 bp for Pol II elongation and
Yrr1 composites) was performed for the purpose of improving visual
clarity. Without this, the high-bp resolution of ChIP–exo resulted in
peaks that were quite narrow in the 1-kb visualization window, such
that their fill patterns were less visually obvious. For gene-body targets
(Fig. 3c and Extended Data Fig. 5), smoothed strand-separated data
were shifted 50 bp in the 3′ direction before combining strands. The
rationale for this is that when we examined each strand separately, we
noticed that patterns on the transcribed strand showed some mirroring
on the nontranscribed strand. But this pattern was shifted in the
3′ direction relative to the transcribed strand (that is, more downstream
of the TSS). We surmise that this ‘double-vision’ effect was caused by
efficient crosslinking such that the 5′–3′ λ exonuclease is generally
stopped at the back end of the Pol II entourage on the transcribed strand
and stopped at the front end of the entourage on the nontranscribed
strand. Shifting data on both strands by 50 bp in their respective 3′
directions partially corrected this double vision and reflects the middle
of the complex. In the absence of a strand-specific 3′ shift for gene-body
targets, patterns near the TSS reflect the back end of the Pol II entourage,
and patterns near the TES represent its front end. The data in Fig. 5b
and Extended Data Fig. 9b were not strand-shifted before removing
strand information.
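The strand-specific shift can be sketched as follows. The sketch assumes that both profiles are already oriented with array index increasing downstream of the TSS, so that the 3′ direction is rightward for the strand read in the direction of transcription (called `sense` here) and leftward for the opposite strand (`antisense`). The names and the zero padding at the edges are assumptions.

```python
def shift(values, offset):
    """Shift a profile by `offset` bins (positive = toward higher index)."""
    n = len(values)
    shifted = [0.0] * n
    for i, value in enumerate(values):
        j = i + offset
        if 0 <= j < n:
            # Bins pushed past either edge are dropped.
            shifted[j] = value
    return shifted


def combine_shifted(sense, antisense, shift_bp=50):
    """Shift each strand in its own 3' direction, then sum the strands."""
    sense_shifted = shift(sense, shift_bp)           # 3' is downstream
    antisense_shifted = shift(antisense, -shift_bp)  # 3' is upstream
    return [a + b for a, b in zip(sense_shifted, antisense_shifted)]
```

Two mirrored peaks that sit 100 bp apart on opposite strands land on the same bin after a 50-bp shift of each strand, which is the partial correction of the double-vision effect described above.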
In composite plots, the y-axis is labelled ‘Occupancy (a.u.)’ (arbitrary
units), reflecting y-axis scaling that was adjusted to highlight the
patterning of the data. Within a single figure (including any Extended
Data figure counterparts), occupancy levels can be compared across
multiple panels only for the same dataset. Occupancy levels of different
datasets in the same or different panels cannot be compared directly.
Only the peak positions are comparable. For Fig. 2, the MEME motif
obtained and shown for Orc6 starts at position 2 of the ACS. For Cbf1, the
MEME motif starts at position 1 of CEN. Schematics reflect subjective
interpretations of peak locations, are nonlinear with respect to the
diagrammed DNA linearity, and do not reflect protein molecular weights.
Nascent RNA (CRAC) analysis
This analysis relates to Fig. 4d. CRAC datasets were downloaded from
GEO using accession code GSE97913. Raw sequencing data were
trimmed of adapters and aligned to the sacCer3 genome using the
recommended parameters in ref. 26. The 5′ ends of reads (corresponding
to the 3′ ends of sequenced nascent RNA) were counted in a window
from the TSS (ref. 45) to 300 bp downstream (more 3′ on the ‘sense’ strand).
Only those reads that mapped to the sense strand relative to the gene
body were retained. Datasets were normalized such that the total tag
counts were equal. However, as all analysis was internal to each dataset,
this had no effect on final output.
This analysis relates to Extended Data Fig. 7b. TFIIB (Sua7) occu-
pancy data (Read_1 5′ end) were counted in a 100-bp window centred
on each promoter TSS. The list of all coding genes was filtered to be
only head-to-head such that each gene possessed a promoter region
overlapping/adjacent to another gene’s promoter (Supplementary
Data 1 (1AZ–1BG)). Promoter regions were then separated into three
groups: RP + STM, TFO and UNB. A separate Reb1-bound group was
also created. A Pearson correlation was calculated for CRAC signals for
one promoter side compared with the other side, within each dataset.
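For reference, the Pearson correlation between the two sides of head-to-head promoter pairs can be computed as below. This is a generic, dependency-free sketch: `pearson` is a hypothetical helper, and `x` and `y` are assumed to hold the CRAC signal of one gene and of its divergent partner, in the same pair order.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centred cross-products and squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```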
Classification of ssTFs
We used GO classifications and the JASPAR motif database to identify
candidate ssTFs. Here we define a ssTF as a target that has at least four
ChExMix peaks in the total set of promoter regions, and an enriched
motif that is not more enriched with another ssTF. As of October 2019,
the JASPAR database reported 175 nonredundant ssTF motifs for S.
cerevisiae, which are based on experimental assays including in vitro
protein-binding microarrays with purified protein (ref. 52). Of those, 70
corresponded to ssTFs for which we confirmed site specificity in vivo
by ChIP–exo. Another two (Mot3 and Rgt1) were confirmed after this
study was completed. As ChIP–exo can define site specificity within a
few base pairs, this represents a remarkable degree of concordance
between in vivo and in vitro binding. Because of co-occurrence of
motifs in the genome, additional nearby motifs were also enriched
for these ssTFs. If multiple targets had a match with essentially the
same JASPAR motif, then we used GO descriptions and the literature
to identify those that were most likely to be direct binders (ssTFs). The
rest were labelled as cofactors. For example, Nrg1 and Nrg2 bind the
same motif, although JASPAR assigns this motif to Nrg1. We labelled
both as ssTFs. Another equivalent example involves Met4, Met31 and
Met32. Both Yox1 and Mcm1 have distinct motifs reported in JASPAR,
and both biochemically interact. However, ChIP–exo reported the
Mcm1 motif for both, with Mcm1 being much stronger. We therefore
classified Yox1 as a cofactor in YPD at 25 °C, instead of a ssTF. Eight
targets had GO annotations indicative of a ssTF and yielded robust
motifs by ChIP–exo with a robust ChIP–exo pattern, but five of them
had no motif in JASPAR (Nrg2, Hms2, Hmo1, War1 and Pip2), and three
had a different motif in JASPAR (Tea1, Rds2 and Sum1). These eight
were also labelled as ssTFs. This resulted in 78 ssTFs that ChIP–exo/
ChExMix detected as bound to a motif in YPD at 25 °C. The remaining
candidate targets that had JASPAR motifs were not labelled as ssTFs
for the following reasons. First, one (Yox1) appeared site specific but
was classified as a cofactor. Second, one is a GTF (TBP/Spt15). Third, 21
produced ChExMix binding locations but were deemed to be cofactors
in YPD at 25 °C (that is, they had bound locations, but were not bound
site-specifically). Their site specificity could be condition specific.
Fourth, 37 were not tested or not epitope-tagged (possibly because of
lethality or technical difficulty in tagging). The remaining targets did
not pass our detection thresholds. See Supplementary Data 2 (1, 9, 11)
for the complete list of candidate factors, JASPAR/cis-bp motifs, and
matches to ssTF-bound location in ref. 53.
Circuitry involving ssTFs
This analysis relates to Extended Data Fig. 10. We analysed the set of 78
genes encoding ssTFs (defined in YPD) along with the ssTFs that bound
their promoter regions site specifically (Supplementary Data 2 (1K)). A
circuit-like diagram was then constructed by connecting ssTFs to the
ssTF-encoding genes to which they bound. The total number of genes
(ssTF and all others) to which a ssTF bound was recorded, separated into
site-specifically bound versus those for which binding was recorded
but a cognate motif was not detected.
The yeastepigenome.org website
Design. The backend of yeastepigenome.org is composed of two internal
modules: a Node.js REST application and a MongoDB database (version
4.2.8). MongoDB stores sample-specific meta information and URLs in
a JSON/BSON structure. The frontend of yeastepigenome.org is
composed of a React application, bootstrapped using the create-react-app
tool. A target page is subdivided into sections containing heatmaps,
composite plots and other analyses and visualizations. The frontend
retrieves sample information by making an application program
interface (API) request to the backend application. The frontend is designed
to support a cart system for downloading target datasets; it has UCSC
(https://genome.ucsc.edu) trackhub integrations and an integrated
target lookup on the SGD website, and it comes with a set of frequently
asked questions (FAQs) with detailed explanations of all of the plots
and visualizations.
Target locations. ChExMix called binding events using a stringent
statistical test of highly localized tags that was optimized to minimize false
positives (ref. 41). As a consequence, ChExMix did not call bound locations
where tag distributions were diffuse and marginally above background
(for example, chromatin remodellers). To potentially capture events
with marginal significance, we divided each sector into five ‘subsectors’
and determined for each dataset whether there was enrichment over
the negative control (masterNoTag_20180928) across each subsector.
We defined the subsectors as follows: first, promoter region (−350 bp
to −75 bp relative to the TSS); second, TSS region (−75 bp to +150 bp
relative to the TSS); third, gene body 5′-end (+150 bp to +450 bp relative
to the TSS); fourth, gene body 3′-end (−400 bp to −100 bp relative to
the TES); and fifth, TES region (−100 bp to +100 bp relative to the TES).
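The five subsectors translate directly into strand-aware genomic intervals. The sketch below is illustrative only: the subsector names, the `subsector_intervals` function and the convention of returning inclusive (start, end) pairs with start less than or equal to end are assumptions.

```python
# (name, anchor, offset_from, offset_to); offsets in bp, positive = downstream.
SUBSECTORS = [
    ("promoter", "TSS", -350, -75),
    ("tss_region", "TSS", -75, 150),
    ("gene_body_5p", "TSS", 150, 450),
    ("gene_body_3p", "TES", -400, -100),
    ("tes_region", "TES", -100, 100),
]


def subsector_intervals(tss, tes, strand):
    """Return {subsector name: (start, end)} in genomic coordinates."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Downstream offsets add on the plus strand and subtract on the minus.
    sign = 1 if strand == "+" else -1
    intervals = {}
    for name, anchor, lo, hi in SUBSECTORS:
        ref = tss if anchor == "TSS" else tes
        a, b = ref + sign * lo, ref + sign * hi
        intervals[name] = (min(a, b), max(a, b))
    return intervals
```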
The ratio of tag counts (test/control) in a subsector (or the selected
region) was calculated after the test and the negative control samples
were normalized using the NCIS method (ref. 54). The following steps were
taken to calculate the significance of tag enrichment in a subsector.
First, test/control tag ratios for subsectors were calculated, then
converted to a log2 scale. Second, a Gaussian model, which represents the
background ratio of tag counts, was fit to the distribution of tag ratios.
Third, a significance value was calculated with respect to the Gaussian
model. Fourth, P values were adjusted with the Benjamini–Hochberg
correction (ref. 42; P = 0.05). The subsector analysis of each dataset is
presented as a separate tab at yeastepigenome.org. These subsectors were
not used for any other analyses herein.
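The four steps can be sketched with the Python standard library. Several details here are assumptions rather than the published procedure: the counts are taken to be already NCIS-normalized, a pseudocount avoids division by zero, the Gaussian is fitted simply as the mean and standard deviation of all log2 ratios, and the P value is one-sided (enrichment only). Function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def enrichment_pvalues(test_counts, control_counts, pseudocount=1.0):
    """One-sided P values for log2(test/control) against a Gaussian fit."""
    # Step 1: test/control ratios on a log2 scale.
    ratios = [math.log2((t + pseudocount) / (c + pseudocount))
              for t, c in zip(test_counts, control_counts)]
    # Step 2: Gaussian background model (needs >= 2 non-identical ratios).
    background = NormalDist(mean(ratios), stdev(ratios))
    # Step 3: upper-tail probability of each observed ratio.
    return [1.0 - background.cdf(r) for r in ratios]


def benjamini_hochberg(pvalues):
    """Step 4: Benjamini-Hochberg adjusted P values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A subsector would then be reported as enriched when its adjusted value falls below 0.05.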
Motif discovery. The de novo motif discovery presented at
yeastepigenome.org was achieved using the MEME suite (ref. 55) as follows:
a ChExMix peak .bed file was intersected with a curated BED file
(Merged_sectors_for_MEME_924.bed) consisting of all gene sectors (this
reference dataset is available in 02_References_and_Features_Files at
https://github.com/CEGRcode/2021-Rossi_Nature), with overlapping regions
merged into a single region. The intersected output BED file was sorted
on the basis of the score reported by ChExMix for each peak. After
sorting, the top 200 peak locations were bidirectionally expanded
to 60 bp and the underlying DNA sequence was extracted in FASTA
format. These sequences were used as the input for MEME (ref. 55). Default
parameters were used, with the following exceptions: the minimum
and maximum motif widths (mememinw and mememaxw) were set
as 6 and 18, respectively.
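The peak-selection step that precedes MEME can be sketched as follows. The sketch assumes that "expanded to 60 bp" means a 60-bp window centred on the peak position (30 bp on each side) and that output coordinates are BED-style (0-based, half-open); `top_peak_windows` and the tuple layout are hypothetical.

```python
def top_peak_windows(peaks, top_n=200, width=60):
    """Select the highest-scoring peaks and expand each to a fixed window.

    peaks: iterable of (chrom, position, score), position being 0-based.
    Returns a list of (chrom, start, end) tuples.
    """
    # Sort by ChExMix score, highest first, and keep the top N.
    ranked = sorted(peaks, key=lambda peak: peak[2], reverse=True)[:top_n]
    half = width // 2
    # Expand bidirectionally; clip at the chromosome start.
    return [(chrom, max(0, pos - half), pos + half)
            for chrom, pos, _score in ranked]
```

The resulting intervals would then be used to extract FASTA sequences as MEME input.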
Data visualization. To generate heatmaps, the ‘TagPileUpFrequency’
tool was used with no tag shifts, single-base-pair bins, and tags set to
equal with combined strands. The tool takes as input a BED file
containing regions that have at least one overlapping ChExMix peak
and the target experiment BAM file. The tool outputs a matrix containing
tag frequencies, with each row representing the region of interest
and each column a single-base-pair bin. This output file was fed into a
heatmap script that uses Java TreeView’s algorithm and matplotlib to
generate the required heatmap. BED files were presorted on the basis
of the criteria indicated in each online graphical image before running
TagPileUpFrequency to generate desired heatmaps. All heatmaps
were set to the same contrast threshold, which is calculated from the
tag pileup frequency matrix of BoundGenes by determining a 95th
percentile cutoff from this frequency distribution.
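A shared contrast threshold of this kind can be derived from a pileup matrix as sketched below. The nearest-rank percentile definition and the inclusion of zero-valued bins are assumptions, since the text does not specify either; the function names are hypothetical.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (q in 0-100) of a non-empty sequence."""
    ordered = sorted(values)
    # Rank of the smallest value with at least q% of the data at or below it.
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def contrast_threshold(matrix, q=95.0):
    """Percentile cutoff over all bins of a tag pileup frequency matrix."""
    flat = [value for row in matrix for value in row]
    return percentile(flat, q)
```

Values above the cutoff would be drawn at full colour intensity, so applying one threshold to all heatmaps keeps them visually comparable.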
To generate composites, the ‘TagPileUpFrequency’ tool was used with
no tag shifts, single-base-pair bins and tags set to equal with combined
strands. One of the inputs to this tool is a BED file containing regions
that have at least one overlapping ChExMix peak; the other is a BAM
file. The tool was run on the target and masterNoTag_20180928 control
BAM files individually, to generate two data files that were fed into
a composite generation script. The script uses matplotlib, a Python
plotting library, to generate a combined composite plot.
Reporting summary
Further information on research design is available in the Nature
Research Reporting Summary linked to this paper.
Data availability
See Supplementary Data 4 for a list of where to find available data and
code online. In essence, all raw sequencing data and peak files from this
study are available at the NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/)
under accession number GSE147927. Processed data are available
at https://doi.org/10.26208/rykf-6050. Additional analyses and data
are at yeastepigenome.org. We warn that single-replicate data files
are not likely to have meaningful data and should not be used without
further replication. All underlying data used to generate composite
plots, coordinate files and script parameters for Figs. 2–5, Extended
Data Figs. 4, 5, 7, 8b and Supplementary Fig. 1 can be downloaded from
https://github.com/CEGRcode/2021-Rossi_Nature. Final composite
plot values can be found in Supplementary Data 5.
Code availability
Code is available at https://github.com/CEGRcode/scriptmanager.
36.	 Vinayachandran, V. et al. Widespread and precise reprogramming of yeast
protein-genome interactions in response to heat shock. Genome Res. 28, 357–366 (2018).
37.	 Wal, M. & Pugh, B. F. Genome-wide mapping of nucleosome positions in yeast using
high-resolution MNase ChIP-Seq. Methods Enzymol. 513, 233–250 (2012).
38.	 Shao, D., Kellogg, G. D., Lai, W. K. M., Mahony, S. & Pugh, B. F. in Practice and Experience in
Advanced Research Computing 285–292 (Association for Computing Machinery,
Portland, OR, 2020).
39.	 Picard Toolkit. http://broadinstitute.github.io/picard/ (2019).
40.	 Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–
2079 (2009).
41.	 Yamada, N., Lai, W. K. M., Farrell, N., Pugh, B. F. & Mahony, S. Characterizing protein–DNA
binding event subtypes in ChIP–exo data. Bioinformatics 35, 903–913 (2019).
42.	 Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful
approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995).
43.	 de Hoon, M. J., Imoto, S., Nolan, J. & Miyano, S. Open source clustering software.
Bioinformatics 20, 1453–1454 (2004).
44.	 Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat.
Biotechnol. 37, 38–44 (2019).
45.	 Xu, Z. et al. Bidirectional promoters generate pervasive transcription in yeast. Nature 457,
1033–1037 (2009).
46.	 Rhee, H. S. & Pugh, B. F. Genome-wide structure and organization of eukaryotic
pre-initiation complexes. Nature 483, 295–301 (2012).
47.	 van Dijk, E. L. et al. XUTs are a class of Xrn1-sensitive antisense regulatory non-coding
RNA in yeast. Nature 475, 114–117 (2011).
48.	 Albert, I., Wachi, S., Jiang, C. & Pugh, B. F. GeneTrack—a genomic data processing and
visualization framework. Bioinformatics 24, 1305–1306 (2008).
49.	 Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
50.	 Jiang, C. & Pugh, B. F. A compiled and systematic reference map of nucleosome positions
across the Saccharomyces cerevisiae genome. Genome Biol. 10, R109 (2009).
51.	 Yen, K., Vinayachandran, V., Batta, K., Koerber, R. T. & Pugh, B. F. Genome-wide
nucleosome specificity and directionality of chromatin remodelers. Cell 149, 1461–1473
(2012).
52.	 Badis, G. et al. A library of yeast transcription factor motifs reveals a widespread function
for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell 32, 878–887 (2008).
53.	 MacIsaac, K. D. et al. An improved map of conserved regulatory sites for Saccharomyces
cerevisiae. BMC Bioinformatics 7, 113 (2006).
54.	 Liang, K. & Keleş, S. Normalization of ChIP-seq data with control. BMC Bioinformatics 13,
199 (2012).
55.	 Bailey, T. L. & Elkan, C. Fitting a mixture model by expectation maximization to discover
motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
Acknowledgements This work was supported by National Institutes of Health (NIH) grants
ES013768, GM059055 and HG004160 to B.F.P.; National Science Foundation (NSF) ABI
INNOVATION grant 1564466 to S.M.; grants from the Pennsylvania State University Institute for
Computational and Data Sciences to B.F.P. and W.K.M.L.; and computation from Advanced
CyberInfrastructure (ROAR) at the Pennsylvania State University. We thank D. Shao for her role
as lead software engineer for the PEGR platform and for support through the Penn State
Institute for Computational and Data Sciences (ICDS) Research Innovations with Scientists and
Engineers (RISE) team. We thank O. Lang for operating EpitopeID.
Author contributions M.J.R. designed and conducted experiments; performed library
sequencing and data analysis; designed and tested the quality-control pipelines and web
page; trained and managed lab personnel to produce data; supervised the project and
co-wrote the manuscript. P.K.K. designed, developed and implemented the quality-control
pipeline, analysis pipeline and website; organized and maintained data files; and provided
bioinformatic support. W.K.M.L. performed high-throughput data processing and analysis, and
provided bioinformatic support and scientific discussion. N.Y. provided bioinformatic support
and developed the initial quality-control pipeline. N.B. and C.M. performed ChIP–exo and
MNase ChIP–seq experiments and provided scientific discussion. G.K. provided bioinformatic
support. K.B. and N.P.F. conducted ChIP–exo experiments and performed library sequencing.
T.R.B., J.D.M., A.V.B., K.S.M., D.J.R. and E.S.P. conducted ChIP–exo experiments. G.D.K. provided
high-performance infrastructure architecture and development, and edge-computing
infrastructure design and support. S.M. provided bioinformatic guidance and support. B.F.P.
conceptualized the project and conclusions, designed experiments, analysed the data, wrote
the main text of the manuscript and co-wrote the remaining parts.
Competing interests B.F.P. has a financial interest in Peconic, LLC, which offers the ChIP–exo
technology (US Patent 20100323361A1) implemented herein as a commercial service and
could potentially benefit from the outcomes of this research. The remaining authors declare
no competing interests.
Additional information
Supplementary information The online version contains supplementary material available at
https://doi.org/10.1038/s41586-021-03314-8.
Correspondence and requests for materials should be addressed to B.F.P.
Peer review information Nature thanks Vishwanath Iyer and the other, anonymous, reviewer(s)
for their contribution to the peer review of this work. Peer reviewer reports are available.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | ChIP–exo targets within meta-assemblages.
a, Simplified view of transcriptional regulation. A ssTF (TF; for example, Gal4)
binds to its cognate motif (a UAS) within promoters in competition with
chromatin/nucleosomes (red bar). The ssTF recruits (pink/green arrow)
cofactors (for example, SAGA and Mediator) that assist in the assembly of a PIC
(comprising TBP, TFIIB, and so on) and of Pol II at the transcript start site (TSS)
of genes. Pol II then traverses the gene to the transcription end site (TES).
b, Diagram showing the ChIP–exo assay. Protein targets are crosslinked to
DNA, which is then fragmented. Specific proteins are captured through an
engineered TAP tag that binds the common Fc region of any immobilized IgG.
Near-base-pair resolution is achieved using a strand-specific λ exonuclease
that digests each strand of DNA in the 5′–3′ direction up to the point of
crosslinking. c, Pie chart showing assayed targets separated by broad
GO-based classifications (inner ring), or by UMAP-based clustering of
genome-wide binding locations (outer ring). Listed are the common names of
ChIP–exo targets that generated significantly enriched locations (with
‘significance’ defined in the Methods section ‘ChExMix locations’), grouped by
their UMAP/K-means-derived meta-assemblage abbreviations (along with
membership count), which are further grouped by the simplified GO-related
categories. See also Supplementary Data 2 (2H).
Extended Data Fig. 2 | Data visualization and discovery in yeastepigenome.org.
Shown is an example web browser view at yeastepigenome.org of ChIP–exo
occupancy patterns for all targets (for example, Reb1) around predefined
genomic features. Rows are sorted by gene or promoter (NFR/NDR) length, or
by distance from the indicated reference feature (where x = 0). Promoter
classes include (from top to bottom) RP, STM, TFO, UNB and others. See
Supplementary Data 1 (1G, 1J, 1C) for the identification numbers and
coordinates of respective row features, and for the sort order of features that
are constant in all target display windows. The lower right box (when present)
provides strand-separated tag 5′ ends distributed around the target’s cognate
DNA motif, with the motif’s opposite strand (red) inverted in the composite
plot. Corresponding colour-coded nucleotide sequences are shown. All
images, underlying data values and datasets can be downloaded through
embedded ‘METADATA’ target-specific links at yeastepigenome.org. Each
dataset download includes a ReadMe file describing the contents of the
download. We warn that targets with only a single replicate did not pass our
significance threshold. See Supplementary Data 1 (1C) for sort orders that are
not provided in the download.
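The strand-separated composite described for the lower right box can be illustrated with a minimal sketch. It assumes tag 5′ ends have already been expressed as offsets from the motif position and labelled by strand relative to the motif, and it takes "inverted" to mean that opposite-strand counts are plotted as negative values. The function name `strand_composite` and the `(offset, strand)` input format are hypothetical and are not taken from yeastepigenome.org or ScriptManager.

```python
from collections import Counter


def strand_composite(tags, half_window=50):
    """Count tag 5' ends per offset, separately for each strand.

    tags: iterable of (offset, strand) pairs, where offset is the distance
    in bp from the motif reference point and strand is '+' (same strand as
    the motif) or '-' (opposite strand). This input format is an assumption
    made for illustration.
    Returns (positions, same_strand_counts, inverted_opposite_counts).
    """
    same = Counter()
    opposite = Counter()
    for offset, strand in tags:
        if abs(offset) > half_window:
            continue  # ignore tags outside the plotting window
        if strand == "+":
            same[offset] += 1
        elif strand == "-":
            opposite[offset] += 1
        else:
            raise ValueError(f"unexpected strand label: {strand!r}")

    positions = list(range(-half_window, half_window + 1))
    same_counts = [same[p] for p in positions]
    # Opposite-strand counts are negated so they plot below the x-axis.
    inverted_counts = [-opposite[p] for p in positions]
    return positions, same_counts, inverted_counts
```

Plotting `same_counts` and `inverted_counts` against `positions` gives a two-sided profile in which the exonuclease stop sites on each strand flank the motif.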
Extended Data Fig. 3 | UMAP granularity. UMAP projection from Fig. 1c, with zoomed-in inserts. Labels are 40 K-means-based abbreviations (Supplementary
Data 2 (1J)). For coordinate values for individual targets, see Supplementary Data 2 (1C, 1D).
Extended Data Fig. 4 | Confidence intervals for two example ChIP–exo
datasets. Left, plots showing the ChIP–exo patterns for Orc6 and Mcm5. Bold
lines represent means; dotted lines represent the 5–95% confidence interval
(CI). The CI was calculated for each base pair in the 1-kb window across all ACSs
(n = 253). Right, heat maps showing ACS occupancy by Orc6 and Mcm5. Blue
represents ChIP–exo data on the ACS motif strand; red represents data on the
opposite strand.
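A per-base-pair mean with a 5–95% band can be sketched as follows. The caption does not state how the interval was derived, so this example assumes one plausible reading: at each position, the 5th and 95th percentiles of occupancy are taken across all features (rows). The function names `percentile` and `composite_with_band`, and the row-per-feature matrix layout, are hypothetical.

```python
from statistics import mean


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("empty input")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def composite_with_band(matrix, low_q=5.0, high_q=95.0):
    """Per-position mean and percentile band across features.

    matrix: list of rows, one row per feature (for example, one per ACS),
    one column per base-pair position in the window. All rows must have
    the same length.
    Returns (means, lows, highs), each a list with one value per position.
    """
    n_cols = len(matrix[0])
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("all rows must cover the same window")

    means, lows, highs = [], [], []
    for column in zip(*matrix):  # iterate position by position
        vals = sorted(column)
        means.append(mean(vals))
        lows.append(percentile(vals, low_q))
        highs.append(percentile(vals, high_q))
    return means, lows, highs
```

With 253 rows and a 1-kb window, `means` would correspond to the bold line and `lows`/`highs` to the dotted lines under the stated assumption; a bootstrap interval on the mean would be an equally valid alternative reading and would give a narrower band.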
Extended Data Fig. 5 | Protein architecture at regions transcribed by Pol II.
Shown is gene-body occupancy (strands combined) of selected Pol II
elongation-associated targets (a and b have different datasets, as annotated).
In each panel, data were aligned and orientated by TSS (left) and TES (right).
Shown are the top 200 coding genes (middle) and the top 200 noncoding genes
(bottom) (based on Sua7 occupancy). See also Fig. 3c and note that the RP
panels are identical to Fig. 3c. The y-axis values represent arbitrary linear units
(a.u.), and are not comparable across different datasets, but are scaled
equivalently across each of the six subpanels for the same dataset (shown with
the same fill colour). Individual plotted values can be found in Supplementary
Data 5.
Extended Data Fig. 6 | Architecture at LTRs. Heat maps show the occupancy
of the TFIIIB (Bdp1 and TBP/Spt15) components of the Pol III PIC and the TFIIB
(Sua7) components of the Pol II PIC at the five Ty LTR classes, along with the
nucleotide composition (±100 bp from the LTR start; from yeastepigenome.org;
Gs, As, Ts and Cs are in yellow, red, green and blue, respectively). All rows are
linked and sorted by LTR class, then length.
Extended Data Fig. 7 | See next page for caption.
A high-resolution protein architecture of the budding yeast genome.pdf

  • 1. Nature | Vol 592 | 8 April 2021 | 309 Article Ahigh-resolutionproteinarchitectureofthe buddingyeastgenome ­­­­­­ Matthew J. Rossi1 , Prashant K. Kuntala1 , William K. M. Lai1,2 , Naomi Yamada1 , Nitika Badjatia1 , Chitvan Mittal1,2 , Guray Kuzu1 , Kylie Bocklund1 , Nina P. Farrell1 , Thomas R. Blanda1 , Joshua D. Mairose1 , Ann V. Basting1 , Katelyn S. Mistretta1 , David J. Rocco1 , Emily S. Perkinson1 , Gretta D. Kellogg1,2 , Shaun Mahony1 & B. Franklin Pugh1,2 ✉ Thegenome-widearchitectureofchromatin-associatedproteinsthatmaintains chromosomeintegrityandgeneregulationisnotwelldefined.Hereweusechromatin immunoprecipitation,exonucleasedigestionandDNAsequencing(ChIP–exo/seq)1,2 todefinethisarchitectureinSaccharomycescerevisiae.Weidentify21meta- assemblagesconsistingofroughly400differentproteinsthatarerelatedtoDNA replication,centromeres,subtelomeres,transposonsandtranscriptionbyRNA polymerase(Pol)I,IIandIII.Replicationproteinsengulfanucleosome,centromeres lackanucleosome,andrepressiveproteinsencompassthreenucleosomesat subtelomericX-elements.WefindthatmostpromotersassociatedwithPolIIevolved tolackaregulatoryregion,havingonlyacorepromoter.Theseconstitutive promoterscompriseashortnucleosome-freeregion(NFR)adjacenttoa+1 nucleosome,whichtogetherbindthetranscription-initiationfactorTFIIDtoforma preinitiationcomplex.Positionedinsulatorsprotectcorepromotersfromupstream events.Asmallfractionofpromotersevolvedanarchitectureforinducibility, wherebysequence-specifictranscriptionfactors(ssTFs)createanucleosome- depletedregion(NDR)thatisdistinctfromanNFR.Wedescribestructural interactionsamongssTFs,theircognatecofactorsandthegenome.These interactionsincludethenucleosomalandtranscriptionalregulatorsRPD3-L,SAGA, NuA4,Tup1,MediatorandSWI–SNF.Surprisingly,wedonotdetectinteractions betweenssTFsandTFIID,suggestingthatsuchinteractionsdonotstablyoccur.Our modelforgeneinductioninvolvesssTFs,cofactorsandgeneralfactorssuchasTBP 
andTFIIB,butnotTFIID.Bycontrast,constitutivetranscriptioninvolvesTFIIDbutnot ssTFsengagedwiththeir cofactors.Fromthis,wedefineahighlyintegratednetwork ofgeneregulationbyssTFs. Genomes regulate genes so as to achieve homeostasis—the mainte- nanceofcellularcomponentsinproperbalance.Theyalsoadapt,mak- ing adjustments in rapidly changing environments, so as to regain homeostasis3 . Achieving these tasks has necessitated the evolution ofconstitutiveandinduciblegenecontrol.Whetherornotthesecon- trols are fundamentally different at the molecular level is unknown. A classical view posits a single basic regulatory paradigm for genes (Extended Data Fig. 1a)4 : environmental signals toggle ‘on’ ssTFs that recruitcofactorsandassembleapreinitiationcomplex(PIC)consisting ofPolIIandgeneraltranscriptionfactors(GTFs)suchasTBP,TFIIDand TFIIB at core promoter transcription start sites (TSSs)5 . However, the extenttowhichconstitutivegeneexpressioninvolvesssTFsisunclear, as ssTF-binding sites and their cofactors remain unidentified at most promoters. ssTFs, cofactors, chromatin and PICs play into any dis- tinction between inducible and constitutive mechanisms, but their interrelationships remain enigmatic. Genome-wideproteinmeta-assemblages Here we used ChIP–exo (Extended Data Fig. 1b)1,2 , an ultra-high- resolution version of ChIP–seq, to map genome-wide binding. We selectedtargetproteinsonthebasisofGeneOntology(GO)annotations related to chromosomal function (Extended Data Fig. 1c and Supple- mentaryData 1 (1BY);characters in parenthesesreferto theworksheet numberandcolumnletter).Intotal,wecollected1,229datasetson791 targets, of which 400 targets had reproducibly significant data (Sup- plementary Data 2 (1A)). The interaction pattern of all 1,229 datasets aroundindividualandbroadclassesofgenomicfeatures(Fig. 1a)canbe visualizedanddownloadedatyeastepigenome.org(anexampleisgiven inExtendedDataFig. 
2).WealsodevelopedandprovideScriptManager, a platform for customized analysis of these data (see Methods). Binarizedcolocationcountsamongtargetswerehierarchicallyclus- tered (Fig. 1b). The three largest clusters (yellow) correspond to three https://doi.org/10.1038/s41586-021-03314-8 Received: 8 May 2020 Accepted: 29 January 2021 Published online: 10 March 2021 Check for updates 1 Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA. 2 Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA. ✉e-mail: fp265@cornell.edu
  • 2. 310 | Nature | Vol 592 | 8 April 2021 Article majoraspectsofgeneexpression:first,promoterregulation;second,PIC assembly;andthird,transcriptionelongation.Thus,thevastmajorityof chromatin-associatedproteinsarededicatedtogeneregulation.Weused uniform manifold approximation and projection (UMAP) to represent eachdatasetasasinglepointinatwo-dimensionalprojection(Fig. 1cand ExtendedDataFig. 3).Pointsincloseproximityreflectapopulation-based compositecolocalizationoftargets(‘meta-assemblages’).Weperformed K-meansclusteringontheprojectionandderived21meta-assemblages thatcorrespondlargelytoknowninteractingbiochemicalcomplexes,or relatedgeneontologies(Fig. 1c,outerpie,andSupplementaryData 2 (1F, 1H, 2G–2I)). This probably represents a comprehensive predominant protein architecture of the yeast genome (‘epigenome’) in rich media (seeSupplementaryData 2 (1–8)foradeeperanalysis). Overall,theorganizationdefinedbyUMAPrepresentsaremarkable degreeofconcordanceandmutualvalidationofbiochemicallypurified andfunctionallyannotatedcomplexeswiththeirarchitecturalorgani- zationacrossagenome,particularlyfromanunsupervisedapproach. Forexample,thepromotercofactorsMediator,SWI–SNF,SAGA,NuA4 andtheircognatessTFseachformedtightmeta-assemblagesthatwere located near each other but far from gene-body elongation factors (Fig. 1c).Proteinsofreplicationorigins,subtelomeresandcentromeres also formed distinct tight meta-assemblages that were far from each otherandfromgenemeta-assemblages.Thisprovidedstrongvalidation oftheChIP–exo/seqapproachandepitopetagging.Notably,wecannow linkmostssTFswiththeircognatecofactorsandpromoterarchitecture. Proteinarchitectureatgenomicfeatures DNA replication initiates at 253 autonomously replicating sequence consensussequence(ACS)elementsthatareconstitutivelyboundby origin recognition complexes (ORCs)6 . The ‘ORC’ meta-assemblage contained six measured targets (Fig. 2a and Extended Data Fig. 
4), which gave highly structured ChIP–exo patterns based on ORCs and the DNA helicase MCM, spread over roughly 300 base pairs. ORCs at nucleosome-freeACSsengulfedaneighbouringnucleosome.Thebind- ing of Mcm5 from ORCs was offset by 50–100 bp, consistent with a recently published model based on cryo-electron microscopy7 . SubtelomericX-elementsrepresentaheterochromaticenvironmentthat isrepressedbysilentinformationregulators(SIRs),functionallysupporting telomeres8 .Indeed,wefoundthatSIRproteinsformedastructurallyrobust meta-assemblageonasinglenucleosome,centredonroughly300-bp X-coreelements(XCEs),alongwithORC/MCMsandinsulatorssTFsat twoflankingnucleosomes(Fig. 2b).KU(Yku70)andRIF(Rif1)complexes, alongwithssTFsFkh1,Abf1andReb1,werepresentatthevastmajorityof mappableX-elements.However,aSko1-mediatedTup1repressioncomplex waspresentatonlyhalf,perhapsreflectingvariablerepressioncapabilities ofsubtelomericregions.Thus,XCEsappeartocreateawellstructuredtriple nucleosomeensemblecomprisingmajorrepressorproteins. Thecentromericmeta-assemblage (‘CEN’)contained12targetsat16 centromeres(Fig. 2c),whichareresponsibleforproperchromosomal segregationduringcelldivision.Theyincludedsite-specificallybound Cbf1 at the centromere centre (CDE I) and kinetochore components offset by roughly 100 bp towards the AT-rich CDE III elements9 . These factors generated strong and well positioned crosslinks covering roughly 170 bp of DNA, suggesting that they are positionally fixed to CDEs. Condensin and cohesin play a part in chromosomal conden- sation and segregation. 
They were absent from the centromere and insteadoverlappedthesurroundingnucleosomes,suggestingthatthey interactwithnucleosomes.Incontrastwithlower-resolutionmaps10,11 , histones were not detected at centromeres, despite robust detection of histone-like Cse4 and kinetochore components there, and robust detection of histones (H2A, H2A.Z, H2B, H3 and H4) in the immediate flankingregions12 .Thus,yeastcentromeresappeartolackthehistone components of a nucleosome in vivo. The resident kinetochore com- plexprotectsanucleosome-sizedregionofDNAfromnucleases,which was a basis for a nucleosome originally being called there13 . Nonethe- less, Cse4-containing nucleosomes have been defined biochemically and structurally in vitro10,14 , and so the question remains open. The Pol I complex produces ribosomal RNA (rRNA) from a single highly repeated gene. It contained TBP anchored near the rRNA TSS (Fig. 3a).Italsohadmajorcrosslinkinginteractionswiththewellposi- tionedPol-I-specificupstreamactivatingfactor(UAF,Uaf30)complex, whichcoveredroughly70 bpbetween−155 bpand−60 bpfromtheTSS. UAFalsohadreciprocalcrosslinkswithTBPatthecorepromoter.Thus, thePolIinitiationcomplexhasafixedbipartiteengagementthatcovers around 200 bp of rRNA promoter DNA, with an intervening 100 bp or so.ThebroadextensionofPolIdownstreamintotherRNAgenebody withlessoccupancyatpromotersindicatesthatPolIdissociatesrapidly from its PIC into an elongating state. Pol III of the ‘POL3’ meta-assemblage transcribes 272 highly similar genes encoding transfer RNAs (tRNAs). It contained 18 targets that couldbeseparatedintoTFIIIB/CandPolIIImeta-assemblages(Fig. 3b). Theirorganizationmatchedlocationsmodelledfromatomicstructures oftheTFIIIB/PolIIIpromotercomplex15 ,butwiththeTBPcomponent of TFIIIB crosslinking approximately 30 bp upstream of the TSS. The ChIP–exo pattern further demonstrated that TFIIIC and Pol III make crosslinksnotonlyattheinternalAandBboxes,butalsoatcoincident locations roughly 40 bp upstream of TBP. 
Owing to DNA bending by a c b Elongation and chromatin regulation Promoter regulation PIC Pol II Pol III Replication Colocalization of 371 targets High Low All features in this study Transcribed (7,741) Non-transcribed (295) Replication (253) X-element (25) Centromere (16) Coding (6,121) Non-coding (1,346) CUTs (447) SUTs (365) XUTs (440) NCR (94) Pol I (2) Pol II (7,467) Pol III (272) RPG (137) STM (984) TFO (1,783) UNB (2,474) LTR (357) tRNA-proximal (135) No PIC (251) ••••••••• ••• • •• PIC occupancy TFIID-dominated SAGA-dominated Other (3,076) Not analysed (11,112) –15 –10 –5 0 5 10 15 –15 –10 –5 0 5 10 15 20 25 Histones Tup1 MET SIR NuA4 SAGA RPD3-L SBF SWI/SNF Mediator GTFs TFIID SWR ORC Splicing THO Nrd1 CPSF Pol II Spt5 RSC CEN TFIIIB/C Pol III ssTFs PAF Set1 ssTFs ISO ssTFs UMAP Axis 2 Axis 1 Histones Tup1 MET SIR NuA4 SAGA RPD3-L SBF SWI/SNF F F F F F F F F F F F F F F F F F F F F F F F Mediator GTFs TFIID SWR ORC S THO T T T T T T T T T T T T T T T T T T Nrd1 CPSF Pol II RSC CEN TFIIIB/C ssTFs Set1 ssTFs ISO ssTFs UMAP Fig.1|Genome-widemeta-assemblages.a,Classesofgenomicfeatures,with Nmembershipsanalysed(SupplementaryData 1 (1D)).PolIIclassesarefrom thisstudy(see Methods),alongwithrelativePICoccupancylevels(green dots). CUTs, crypticunstabletranscripts;SUTs, stableunannotated transcripts;XUT, Xrn1-sensistiveunstabletranscripts;NCR,noncodingRNA. b,Hierarchicalclusteringshowingthegenome-widecolocalizationof371 targets(SupplementaryData 3).c,UMAPprojectionshowingthecolocations of371targets(colouredonthebasisofK-means;SupplementaryData 2  (1C,1D)).AU,arbitraryunits.
  • 3. Nature | Vol 592 | 8 April 2021 | 311 TBP,thisregionisincloseproximitytoTFIIIB/CandPolIIIwithingene bodies. Equivalent positions of crosslinking points were observed acrossallTFIIIB/C/PolIIIsubunits.Thissuggeststhatasinglepredomi- nantstructureenvelopesentirePolIIIgenesandapproximately70 bp upstream, as it makes a short (roughly 80 bp) transcript. There are around 7,500 distinct Pol II transcription units (defined byaTSS/PIC),ofwhichapproximately80%codeforproteins.Targets that are associated with transcription elongation generally matched Pol II occupancy across gene bodies, but unlike Pol II (Rpb3) were not presentatpromoters(Fig. 3candExtendedDataFig. 5).Instead,occu- pancy within genes increased in the 5′ region and decreased in the 3′ region, with many having distinct ‘entry/exit’ points, consistent with other studies16 . Whether these are true cotranscriptional entry/exit pointsoraresimplycrosslinkableretentionsitesisnotclear.Termina- tionfactorssuchasPcf11werefoundprimarilyatsitesoftermination, alongwithnearbycohesin.Therewaslittleevidenceofthebindingof a Subtelomeric X-elements (XCE) b c Centromeres (CEN) Histone H4 Kinetechore Mcm16 C b f1 b f b f f b f b f b b b f b f f f b f b f b b f1 f1 f1 f1 b b b f f b f b f b f b f b f b b f f f b b Cse4 Nkp2 Nucleosome dyads CEN Smc3 (Cohesin) Cse4 Mcm16 Nkp2 Cbf1 0 –500 500 Distance from CEN start (bp) Opposite strand Same strand Occupancy (AU) 0 –500 500 Distance from ACS start (bp) ACS Nucleosome dyads DNA replication origins (ACS) Mcm5 Orc6 Orc6 Mcm5 N = 253 –500 Reb1 ORC O O O O O O O O O O O O O Fkh1 Tup1 Cyc8 Sko1 X X X X X X Abf1 Sir2,3,4, Yku70 , , C C C C Rif1,2 N = 25 0 500 Distance from XCE start (bp) Nucleosome dyads XCE Fig.2|Architectureatnontranscribedfeatures.a–c,Averageddistribution ofstrand-separated5′endsofChIP–exosequencingtags(exonucleasestop sites;seeExtendedDataFig. 
1b), showing representative targets around strand-oriented annotated features. The diagrams at the top of each panel are cartoon representations of DNA, nucleosomes and protein factors that bind to DNA replication origins (a), subtelomeric X-elements (b) or centromeres (c). The relevant start sequences (coloured As, Ts, Cs and Gs) are also shown in a, c. Underneath are composite data showing the distribution of the protein factors. Same-strand data are oriented with 5′ to 3′ to be read from left to right. Opposite-strand data are inverted (right to left is 5′ to 3′). The y-axes show linear arbitrary units (AU), which are not comparable in magnitude across different datasets. Nucleosome dyads were derived from MNase-digested chromatin that was assayed by H3/H2B ChIP–seq (strands averaged).

[Figure 3 image. Panel a, RNA polymerase I: Pol I (Rpa135), TBP (Spt15) and UAF (Uaf30) around the UCE and rRNA gene; x-axis, distance from Pol I TSS (bp, −500 to 500). Panel b, RNA polymerase III: TFIIIC-τA (Tfc4), TFIIIC-τB (Tfc6), TFIIIB, TBP (Spt15) and Pol III (Rpo31) at tRNA genes with A and B boxes; x-axis, distance from Pol III TSS (bp). Panel c, RNA polymerase II: Rpb3, Sua7, Set1, Set2, Set3, Elf1, Spt5, Spt6, Spt16, Spn1, Paf1, Cbc2, Smd1, Nrd1, Pcf11, Pol II Ser5P and Pol II Ser2P; x-axes, distance from Pol II TSS and from Pol II TES (bp, ±500). Panel d, transposon Ty3 (σ) LTRs next to tRNA genes: Ste12, Kar4, Dig1 and TFIIIC-τB (Tfc6); x-axis, distance from Ty3 start (bp). y-axes, occupancy (AU), same strand and opposite strand.]

Fig. 3 | Architecture at transcribed features. a–d, Experiments were carried out as in Fig. 2, but for transcribed features. In a, UCE is an upstream control element at Pol I promoters. In b, A and B are box elements at Pol III promoters. In c, the protein architecture for RP genes is shown (not strand separated); Ser2P and Ser5P are phosphorylated serines 2 and 5 of heptad repeats; grey arrows show nucleosome dyads.
elongation/termination-associated factors being restricted to specific sets of genes, except that Nrd1 of the early termination pathway was enriched at noncoding transcription (ncRNA) units (Extended Data Fig. 5a, lower left). In addition, RNA splicing factors (such as Smd1) were largely limited to the 3′ half of intronic genes encoding ribosomal proteins (RPs; Extended Data Fig. 5b, upper right). The data are consistent with one predominant elongation entourage at most Pol II genes that changes in composition at fixed distances from the TSS or transcription end site (TES) (rather than at a percentage of gene length).

Consistent with some other reports17,18, although not all19–21, we found no evidence for Mediator being stably associated with the Pol II core initiation or elongation entourage, despite its detection in upstream promoter regulatory regions (for example, Med2 in Extended Data Fig. 5b). Equivocal binding in gene bodies may be related to approximately 100 genes that produced relatively high and variable background in ChIP assays (see Methods).

The long terminal repeats (LTRs) of certain classes of Ty transposons are transcribed by Pol II as part of retroviral-like transposition22. However, most lacked a PIC, except a subset of full-length Ty1,2 (δ) (Extended Data Fig. 6). At Ty3 (σ), the Pol II pheromone factors Ste12, Dig1 and Kar4 were assembled and had nearly identical points of crosslinking (Fig. 3d). However, instead of Pol II, we detected the Pol III machinery associated with adjacent divergent tRNA genes. This suggests that Pol II ssTFs may work with Pol III at some tRNA genes to integrate mating and Ty3 transposition22.

Inducible versus constitutive promoters

In classifying Pol II promoters, we opted against an unsupervised approach, as it treats binding events equivalently, without considering that certain targets have a more central role in defining specific regulatory architectures. Four fundamentally distinct architectural themes emerged (see Methods, Fig.
4a and Supplementary Data 1 (1D)): first, an RP theme, as seen for 137 RP promoters with unique architectures (examined separately23); second, an STM theme, as for 984 promoters that had properties associated with inducibility, and characteristically bound by ssTFs and major cofactor meta-assemblages SAGA, TUP and/or Mediator/SWI–SNF; third, a TFO theme, from 1,783 promoters with a ssTF organization that lacked STM cofactors (but typically had the insulator ssTFs Abf1 or Reb1); and fourth, a UNB theme, as seen with 2,474 promoters that were unbound by anything except a PIC.

Notably, as detailed in the Supplementary Information, the consensus architecture at TFO/UNB promoters indicates that two-thirds of all promoters evolved to lack regulation by ssTFs and their cofactors under any condition (not just in rich media). This is an architecture suitable for constitutively low gene expression. RP and STM represent the architecture of inducible promoters that have upstream activator sequences (UASs). The roughly 1,300 ncRNA promoters were similarly classified (Supplementary Data 1 (E)), indicating that they are governed by the same regulatory mechanisms.

Assembly of Pol II PICs occurs in the context of chromatin, where the TSS resides on the inside edge of a downstream +1 nucleosome (Fig. 4b). Most promoters have a constitutive NFR. The seemingly interchangeable term ‘NDR’, which applies to nucleosome depletion mediated by ssTFs, is problematic. As ssTFs are absent from UNB promoters, they should lack ssTF-regulated nucleosome depletion and an NDR. We therefore considered whether NFRs and NDRs are distinct.
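The four themes described above can be read as a small decision rule. The following is only a minimal sketch of that reading, not the classification procedure used in the study (the actual criteria are in the Methods and Supplementary Information). The `Promoter` record, its field names and the order in which the rules are tested are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Cofactor meta-assemblages that the text associates with the STM theme.
STM_COFACTORS = frozenset({"SAGA", "TUP", "Mediator/SWI-SNF"})


@dataclass(frozen=True)
class Promoter:
    """Hypothetical record of what is bound at one Pol II promoter."""
    name: str
    is_rp_gene: bool = False
    ssTFs: frozenset = frozenset()
    cofactors: frozenset = frozenset()


def classify(promoter: Promoter) -> str:
    """Assign one of the four themes (RP, STM, TFO, UNB).

    Assumed rule order: RP genes first, then any STM cofactor,
    then any ssTF without STM cofactors, otherwise unbound.
    """
    if promoter.is_rp_gene:
        return "RP"
    if promoter.cofactors & STM_COFACTORS:
        return "STM"
    if promoter.ssTFs:
        return "TFO"
    return "UNB"
```

For example, under these assumptions a promoter bound only by the insulator ssTF Reb1 would be labelled TFO, and one bound by nothing except a PIC would be labelled UNB.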
NFRs at TFO/UNB promoters were short (less than 150 bp) and bisected by a pair of oppositely stranded, nucleosome-disfavouring

[Figure 4 image. Panel a, heat map of 371 targets (columns) across Pol II promoters (rows), grouped into the RP, STM, TFO and UNB themes, with PIC (Sua7), Abf1, Reb1 and H2A.Z marked. Panel b, schematic and composite occupancy for STM (N = 984), TFO (N = 1,783) and UNB (N = 2,474) promoters, showing ssTFs and cofactors, insulator ssTFs, PIC TFIIB (Sua7), H2A.Z and nucleosome dyads; x-axis, distance from +1 nucleosome dyad (bp, −700 to 300); y-axis, occupancy (AU). Panel c, nucleosome dyads in vivo and after in vitro reconstitution, with poly(T:A), + RSC and + INO80, marking the NDR and NFR. Panel d, insulation at divergent genes: correlation of divergent nascent transcription (−0.1 to 0.4) for STM, UNB, TFO, RP and Reb1-bound promoters in parent, no AA, Rap1 AA and Reb1 AA strains (N values shown: 111, 237, 78, 388). Panel e, insulation at tandem genes: Reb1, PIC (Sua7) and Pcf11 occupancy at UNB (1,097) and TFO (292) promoters; x-axis, distance from +1 nucleosome dyad (bp).]

Fig. 4 | Classification of inducible, insulated and constitutive Pol II promoters. a, Individual promoters (rows) can be grouped into four architectural themes (coloured boxes) and sorted by PIC occupancy level.
Targets are listed at the top of columns, with arrows denoting Abf1, Reb1 and H2A.Z. Black lines denote target binding (Supplementary Data 2 (3)). b, Top, diagram, and bottom, example composite data for the STM, TFO and UNB classes. ‘ssTFs and cofactors’ represents a combined set of target locations determined by ChExMix for those targets labelled as such in Supplementary Data 2 (1K), including ssTFs, SAGA, TUP and Mediator. c, Composite data show that STM promoters have NDRs, whereas TFO and UNB promoters have NFRs. In vitro nucleosomes assembled with purified genomic DNA and histones (black filled areas) had ATP plus either purified RSC (yellow) or INO80 (purple) added (data from ref. 24). Poly(T:A) regions are sense-strand tracts (larger than 5 bp) of As (red) or Ts (green). d, Insulator ssTFs uncouple divergent transcription. Data on nascent transcription (CRAC data26) for control strains or strains depleted of Rap1 or Reb1 by the anchor-away (AA) technique were collected for N divergent gene pairs sharing the same promoter region, then correlated between the gene pairs. To the right are diagrams of divergent gene pairs, with the differential size of each green arrow pair reflecting the extent of insulation. e, The termination factor Pcf11 accumulates at insulator ssTFs. Shown is the architecture at promoters adjacent to an upstream termination region (tandem genes) and having (TFO) or lacking (UNB) an insulator ssTF.
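The correlation described for panel d can be illustrated with a short sketch. It assumes a Pearson correlation computed across gene pairs, with one nascent-transcription value per gene; the caption does not state the correlation measure or any transformation of the data, so both are assumptions, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def divergent_pair_correlation(pairs):
    """Correlate transcription of the two genes across divergent pairs.

    `pairs` is a list of (gene_a_signal, gene_b_signal) tuples, one per
    promoter region shared by a divergent gene pair.
    """
    gene_a = [a for a, _ in pairs]
    gene_b = [b for _, b in pairs]
    return pearson(gene_a, gene_b)
```

In this reading, a promoter class with effective insulation would give a value near zero, while a class in which the two divergent genes rise and fall together would give a value closer to one.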
poly(dA:dT) tracts (Fig. 4c, red/green). NFRs have been biochemically reconstituted on genomic DNA with purified histones and chromatin remodellers24. When applied to our promoter classes, we found that histones alone partially reconstituted NFRs in vitro at TFO/UNB promoters, but less effectively at STM promoters (Fig. 4c, compare black-filled dips with in vivo plots, and Extended Data Fig. 7a). TFO/UNB NFRs were widened by the RSC remodeller (Fig. 4c, compare the yellow-filled wider dip with the black-filled areas) and had their −1/+1 nucleosomes positioned by the chromatin-remodelling ATPase INO80 (purple fill)24. STM promoter nucleosomes, by contrast, had an intrinsic capacity to form nucleosomes and were less responsive to RSC and INO80 (Fig. 4c, vertical arrow around −400). They bound to ssTFs and cofactors in vivo (Fig. 4b, magenta), and were nucleosome-depleted at the −1/−2 nucleosome positions. These same regions have been interpreted to have MNase-sensitive ‘fragile’ nucleosomes in vivo (Supplementary Data 1 (BX); 69% were ‘fragile’ at STMs versus 19% at UNBs). However, our data indicate that sensitivity to MNase might reflect the binding of ssTFs/cofactors rather than unstable nucleosomes25. Thus, inducible promoters have NDRs, while constitutive promoters have NFRs.

In the compact yeast genome, promoters and terminators often share the same NFRs or NDRs at adjacent genes, with the potential to mutually influence their expression unless insulated26. In support of this, PIC occupancy at divergent promoter pairs was less correlated at TFO promoters, which have insulator ssTFs, compared with UNB promoters (Extended Data Fig. 7b). The same was observed for divergent nascent transcription (Fig. 4d). RP/STM divergent promoters also showed low nascent transcription correlation. Anchor-away removal of Rap1, which binds RP/STM promoters, resulted in a higher correlation (Fig. 4d, red). This was not observed with removal of Reb1, which mainly binds TFO promoters. Removal of Reb1, but not Rap1, resulted in higher correlations at TFO and Reb1-bound promoters (Fig.
4d, cyan). As a negative control, removal of Rap1 had little effect at Reb1-bound promoters. We suggest that insulator ssTFs such as Rap1 and Reb1 uncouple divergent transcription at promoters to which they bind. Similarly, where a gene terminator is shared with a promoter (tandem genes), the termination factor Pcf11 overlapped with the adjacent PIC, unless an insulator ssTF intervened (Fig. 4e and Extended Data Fig. 7c). This finding supports prior conclusions on insulators that were based on nascent transcription26.

Taken together, these results suggest that the assembly of PICs is mechanistically tied to PIC assembly at adjacent upstream divergent genes, and to transcription termination at tandem genes, unless these events are insulated. In such architectural arrangements, some insulator ssTFs may not act as direct effectors of transcription by recruiting cofactors, but instead insulate and direct −1/+1 nucleosome positioning24. Others may recruit cofactors in a condition-specific way.

ssTF–cofactor interactions and circuits

A comprehensive set of 78 ssTFs were detectably bound to promoters in rich media (Supplementary Data 2 (1K)). A search of the JASPAR database of interactions between ssTFs and sequence motifs independently confirmed proper motif specificity for 90% of the ssTFs (Supplementary Data 2 (1M)). Some ssTFs had robust ChIP–exo patterning around their cognate motif (Extended Data Fig. 8a; for example, Cup9 and Cin5), which reflects their site-specific structural interactions with DNA on a genomic scale. Remarkably, most ssTFs had relatively diffuse ChIP–exo patterning flanking their motif (Extended Data Fig. 8a; for example, Nrg1, Bas1 and Yrr1). As exemplified by Yrr1 in Fig. 5a (magenta versus cyan areas), the diffuse patterning of ssTFs was particularly pronounced at sites with multiple STM cofactors present (for example, SAGA, TUP, Mediator, SWI–SNF and RPD3-L), and less diffuse at other sites that bind the same ssTFs but lack STM cofactors.
STM cofactors may impart a distinct local environment that results in more dispersed crosslinking. The same diffuse patterning occurred with STM cofactors which were anchored at ssTF sites (Fig. 5a and Extended Data Fig. 8b). As they tend to co-occupy the same set of promoters (Extended Data Fig. 9a, Supplementary Data 2 (1K)), ssTFs might coexist with multiple positive/negative cofactors of chromatin accessibility and Pol II recruitment. This diffuse patterning is consistent with the notion of condensates that are anchored by ssTFs27.

In contrast to STM cofactors, we detected essentially no ChIP–exo patterning of TFIID, TBPs or any GTFs at a consolidated set of ssTF sites, despite identifying these GTFs in the periphery where TSSs reside (Fig. 5b and Extended Data Fig. 9b). Thus, although using the same paradigm for detecting ssTF–cofactor interactions, our results in yeast do not support the long-standing model that ssTFs stably engage TFIID at promoters. PIC assembly is driven by TFIID at nearly all genes28, although at inducible genes it is augmented through SAGA independently of TFIID28–30. Although the gene specificity of SAGA has been enigmatic and controversial31, the ChIP–exo assay detects SAGA at only a subset of genes. The discrepancy may reside in the low specificity of other assays32.

We addressed the specificity of SAGA further. As a direct readout of TFIID-independent PIC assembly, we expected high levels of GTFs relative to TFIID where SAGA is bound. However, we found that most SAGA-bound promoters (RP/STM/‘SAGA-bound’) lacked high ratios of GTFs to TFIID, although a smaller fraction did have high ratios (equivalent modes and rightward tail in Fig. 5c and Extended Data Fig. 9c). Thus, SAGA binding is not always concomitant with TFIID-independent PIC assembly, and may reflect a poised state. Instead, promoters having multiple STM cofactors displayed high GTF/TFIID ratios (‘STM-bound’ and ‘RSTM-bound’ in Fig. 5c).
Thus, maximal TFIID-independent PIC assembly is achieved under conditions in which there is maximal engagement of a wide variety of negative and positive ssTFs and cofactors with NDRs, including but not limited to SAGA.

Promoters bound by ssTFs included both cognate (motif-based) and noncognate interactions (Extended Data Fig. 10). In assessing cognate interactions, we found that most ssTFs bound to the promoters of

[Figure 5 image. Panel a, occupancy (AU, same strand and opposite strand) around the Yrr1 motif at STM-bound and not STM-bound Yrr1 sites: TUP (Tup1), SAGA (Sgf73), Mediator (Med2) and ssTF (Yrr1); x-axis, distance from Yrr1 motif (bp, −500 to 500). Panel b, occupancy (AU) around a consolidated ssTF motif, N = 52: Mediator (Med2), SAGA (Sgf73), SAGA and TFIID (Taf12), TFIID (Taf2), TBP (Spt15), TFIIB (Sua7) and bkd; x-axis, distance from consolidated ssTF motif (bp, −500 to 500). Panel c, frequency (0.0 to 0.3) versus PIC/TFIID (GTF/Taf2) log2 ratio (AU, −2.5 to 5.5) for RP, UNB, TFO, ‘SAGA-bound’, ‘STM-bound’ and ‘RSTM-bound’ promoters (N values shown: 109, 305, 52).]

Fig. 5 | ssTFs stably interact with STM cofactors but not GTFs. a, Architecture at Yrr1 motifs in two classes of Yrr1-bound promoters: ‘STM-bound’ (labels on left) and ‘not STM-bound’ (cyan and black labels on right) (Methods). The arrow points to where cofactor crosslinking permeates Yrr1 crosslinking. b, Representative architecture of STM cofactors or PIC components at a consolidated set of ssTF-binding motifs at RSTM promoters (strand averaged; see Methods and Supplementary Data 1 (1AI)), and oriented by TSS. Taf12 is in SAGA and TFIID; bkd, background that was generated from a strain lacking a TAP tag. c, Frequency distribution of promoters having the indicated PIC/TFIID ratios (average of six GTFs; three-bin moving average), separated by promoter class (RP, STM, TFO or UNB) or promoter sets based on cofactor enrichment.
‘SAGA-bound’ excludes RP promoters, which are highly enriched with SAGA and shown separately. The ‘STM-bound’ promoter set required all of the following to be present: SAGA, Mediator/SWI/SNF and TUP; ‘RSTM-bound’ also required the presence of the RPD3-L complex. The x-axis is in arbitrary units.
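The quantity plotted in panel c (a log2 PIC/TFIID ratio, binned into a frequency distribution and smoothed with a three-bin moving average) can be sketched as follows. This is an illustrative reading of the caption only. Averaging the GTF occupancies before taking the ratio, the bin edges, and shortening the smoothing window at the two ends of the distribution are assumptions, and all function names are hypothetical.

```python
import math


def pic_tfiid_log2_ratio(gtf_occupancies, taf2_occupancy):
    """log2 of mean GTF occupancy over Taf2 (TFIID) occupancy at one promoter."""
    mean_gtf = sum(gtf_occupancies) / len(gtf_occupancies)
    return math.log2(mean_gtf / taf2_occupancy)


def frequency_histogram(values, lo, hi, width):
    """Fraction of values falling in each bin of [lo, hi); others are ignored."""
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) // width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def moving_average3(values):
    """Centred three-bin moving average; end bins use the two bins available."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A promoter whose six GTF occupancies average four times its Taf2 occupancy would have a ratio of 2 on this scale, placing it in the rightward tail that the text associates with TFIID-independent PIC assembly.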
around 4–30 genes; roughly 20% of ssTFs bound 50–100 genes each; and 8, which were mostly insulator-like (Abf1, Reb1, Cin5, Mcm1, Tbf1, Ume6, Fkh1 and Rap1), bound more than 100 genes each.

ssTFs also bound to the promoters of genes encoding other ssTFs (Extended Data Fig. 10), from which archetypical regulatory circuit motifs have been described33. About half of all these ssTF-encoding genes lacked bound ssTFs (42 of 78 were UNBs), and thus are expected to be constitutive and at the start of their regulatory circuit. Of note, about half (43 of 78) of the ssTFs existed within a single highly integrated circuit, suggesting that the regulation of ssTFs is highly interconnected. Eleven ssTFs bound to multiple ssTF-encoding genes (multi-output archetype), suggesting that they have the potential to diversify their control through other such factors. Most ssTFs (47 of 78) bound only one other ssTF gene (single output), thereby propagating the circuit. There were long regulatory series with as many as seven ssTFs in series that bifurcated and/or looped (Extended Data Fig. 11a). About one-third of the ssTFs bound to their own promoter (in a simple loop), indicating that direct feedback control of these factors is common (an autoregulation archetype). Nine promoters of ssTF-encoding genes had multiple ssTFs bound (multi-input archetype; Extended Data Fig. 11b). In most cases, each bound ssTF was a member of a different meta-assemblage (for example, RPD, SAGA, TUP and MED). Thus, numerous regulatory mechanisms/meta-assemblages may converge at promoters through distinct ssTFs. One-quarter (21 of 78 ssTFs) bound to no other ssTF gene and thus are likely to be at the end of their circuit.

Conclusions

Consistent with published studies, we have found that the vast majority of Pol II promoters share the same basic constitutive architecture. Local DNA sequence and chromatin remodellers create a constitutive NFR that is flanked by stable and well positioned nucleosomes.
This is recognized by TFIID and is configured for constitutively low gene expression. ssTFs and cofactors are generally not involved, except that some ssTFs (such as Abf1 and Reb1) organize nucleosomes and insulate against nearby genomic events.

ssTFs and cofactors that directly regulate PIC assembly define roughly 20% of all genes, with an architecture that supports inducibility. Our data support a dynamic ‘futile cycle’ of nucleosome acetylation (by SAGA and NuA4) and deacetylation (by RPD3-L), coupled to nucleosome eviction (mediated by SWI/SNF) and stabilization (by Tup1 and Cyc8), which produces an NDR. In this inducible environment, the assembly of a PIC is augmented beyond what TFIID delivers. The stage is then set for enhanced recruitment of Pol II via ensembles of ssTFs and the Mediator complex34. Much of this induced transcription may exist in hubs in which numerous induced promoters coalesce, perhaps for the purposes of efficiently recycling the transcription machinery34. Once transcription has cleared the promoter, most genes appear to encounter the same Pol II ensemble, whose architecture changes at fixed distances from either the TSS or the TES.

This comprehensive high-resolution view of genomic chromatin architecture ties into our understanding of the post-initiation global regulatory control of constitutive genes35, and raises questions as to how environmental signalling directs inducibility through the control of ssTFs and cofactors. A clear view of epigenomic architecture should provide a better context for understanding how it integrates with other layers of gene regulation that occur during RNA processing, transport and translation. Moreover, as most of the key proteins examined here are evolutionarily conserved, their architectural themes are likely to exist in other eukaryotes.
Online content

Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-03314-8.

1. Rossi, M. J., Lai, W. K. M. & Pugh, B. F. Simplified ChIP-exo assays. Nat. Commun. 9, 2842 (2018). 2. Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011). 3. Hahn, S. & Young, E. T. Transcriptional regulation in Saccharomyces cerevisiae: transcription factor regulation and function, mechanisms of initiation, and roles of activators and coactivators. Genetics 189, 705–736 (2011). 4. Levine, M., Cattoglio, C. & Tjian, R. Looping back to leap forward: transcription enters a new era. Cell 157, 13–25 (2014). 5. Cramer, P. Organization and regulation of gene transcription. Nature 573, 45–54 (2019). 6. Eaton, M. L., Galani, K., Kang, S., Bell, S. P. & MacAlpine, D. M. Conserved nucleosome positioning defines replication origins. Genes Dev. 24, 748–753 (2010). 7. Li, N. et al. Structure of the origin recognition complex bound to DNA replication origin. Nature 559, 217–222 (2018). 8. Wellinger, R. J. & Zakian, V. A. Everything you ever wanted to know about Saccharomyces cerevisiae telomeres: beginning to end. Genetics 191, 1073–1105 (2012). 9. Biggins, S. The composition, functions, and regulation of the budding yeast kinetochore. Genetics 194, 817–846 (2013). 10. Camahort, R. et al. Cse4 is part of an octameric nucleosome in budding yeast. Mol. Cell 35, 794–805 (2009). 11. Henikoff, S. et al. The budding yeast centromere DNA element II wraps a stable Cse4 hemisome in either orientation in vivo. eLife 3, e01861 (2014). 12. Rhee, H. S., Bataille, A. R., Zhang, L. & Pugh, B. F. Subnucleosomal structures and nucleosome asymmetry across a genome.
Cell 159, 1377–1388 (2014). 13. Furuyama, S. & Biggins, S. Centromere identity is specified by a single centromeric nucleosome in budding yeast. Proc. Natl Acad. Sci. USA 104, 14706–14711 (2007). 14. Yan, K. et al. Structure of the inner kinetochore CCAN complex assembled onto a centromeric nucleosome. Nature 574, 278–282 (2019). 15. Han, Y., Yan, C., Fishbain, S., Ivanov, I. & He, Y. Structural visualization of RNA polymerase III transcription machineries. Cell Discov. 4, 40 (2018). 16. Mayer, A. et al. Uniform transitions of the general RNA polymerase II transcription complex. Nat. Struct. Mol. Biol. 17, 1272–1278 (2010). 17. Petrenko, N., Jin, Y., Wong, K. H. & Struhl, K. Evidence that Mediator is essential for Pol II transcription, but is not a required component of the preinitiation complex in vivo. eLife 6, e28447 (2017). 18. Jeronimo, C. et al. Tail and kinase modules differently regulate core Mediator recruitment and function in vivo. Mol. Cell 64, 455–466 (2016). 19. Andrau, J. C. et al. Genome-wide location of the coactivator mediator: binding without activation and transient Cdk8 interaction on DNA. Mol. Cell 22, 179–192 (2006). 20. Paul, E., Zhu, Z. I., Landsman, D. & Morse, R. H. Genome-wide association of mediator and RNA polymerase II in wild-type and mediator mutant yeast. Mol. Cell. Biol. 35, 331–342 (2015). 21. Zhu, X. et al. Genome-wide occupancy profile of mediator and the Srb8-11 module reveals interactions with coding regions. Mol. Cell 22, 169–178 (2006). 22. Krastanova, O., Hadzhitodorov, M. & Pesheva, M. Ty elements of the yeast Saccharomyces cerevisiae. Biotechnol. Biotechnol. Equip. 19, 19–26 (2005). 23. Reja, R., Vinayachandran, V., Ghosh, S. & Pugh, B. F. Molecular mechanisms of ribosomal protein gene coregulation. Genes Dev. 29, 1942–1954 (2015). 24. Krietenstein, N. et al. Genomic nucleosome organization reconstituted with pure proteins. Cell 167, 709–721 (2016). 25. Chereji, R. V., Ocampo, J. & Clark, D. J. 
MNase-sensitive complexes in yeast: nucleosomes and non-histone barriers. Mol. Cell 65, 565–577 (2017). 26. Candelli, T. et al. High-resolution transcription maps reveal the widespread impact of roadblock termination in yeast. EMBO J. 37, e97490 (2018). 27. Brzovic, P. S. et al. The acidic transcription activator Gcn4 binds the mediator subunit Gal11/Med15 using a simple protein interface forming a fuzzy complex. Mol. Cell 44, 942– 953 (2011). 28. Huisinga, K. L. & Pugh, B. F. A genome-wide housekeeping role for TFIID and a highly regulated stress-related role for SAGA in Saccharomyces cerevisiae. Mol. Cell 13, 573–585 (2004). 29. Dudley, A. M., Rougeulle, C. & Winston, F. The Spt components of SAGA facilitate TBP binding to a promoter at a post-activator-binding step in vivo. Genes Dev. 13, 2940–2945 (1999). 30. Moqtaderi, Z., Bai, Y., Poon, D., Weil, P. A. & Struhl, K. TBP-associated factors are not generally required for transcriptional activation in yeast. Nature 383, 188–191 (1996). 31. Baptista, T. et al. SAGA is a general cofactor for RNA polymerase II transcription. Mol. Cell 68, 130–143 (2017). 32. Mittal, C., Rossi, M. J. & Pugh, B. F. High similarity among ChEC-seq datasets. Preprint at https://www.biorxiv.org/content/10.1101/2021.02.04.429774v1 (2021). 33. Harbison, C. T. et al. Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104 (2004). 34. Boija, A. et al. Transcription factors activate genes through the phase-separation capacity of their activation domains. Cell 175, 1842–1855 (2018). 35. Badjatia, N. et al. Acute stress drives global repression through two independent RNA polymerase II stalling events in Saccharomyces. Cell Rep. 34, 108640 (2021). Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. © The Author(s), under exclusive licence to Springer Nature Limited 2021
Methods

No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Strains and antibodies

The vast majority of data for this study were collected from tandem affinity purification (TAP)-tagged S. cerevisiae strains (originally purchased from Dharmacon; now available from Horizon Inspired Cell Solutions, Cambridge, UK). The background strain for this collection was BY4741 (a derivative of S288-C; MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0). Negative control ChIPs and ChIPs with specific antibodies were performed with BY4741. If the TAP-tagged strain for a particular target was unavailable, we instead used a haemagglutinin (HA)-tagged strain (originally purchased from Dharmacon; now available from Horizon Inspired Cell Solutions). The background strain for the HA-tagged collection was diploid, derived from BY4741 and designated Y800 (MATa leu2-D98 cry1R/MATα leu2-D98 CRY1 ade2-101 HIS3/ade2-101 his3-D200 ura3-52 caniR/ura3-52 CAN1 lys2-801/lys2-801 CYH2/cyh2R trp1-1/TRP1 Cir0 carrying pGAL-cre (amp, ori, CEN, LEU2)).

Rabbit IgG (Sigma, catalogue number I5006, various lot numbers) conjugated to Dynabeads was used to immunoprecipitate chromatin from TAP-tagged strains. Santa Cruz Biotechnology sc-7392 antibody was used to immunoprecipitate chromatin from HA-tagged strains. Millipore antibodies 04-1570-I, 04-1571-I or 04-1572-I were used to immunoprecipitate Pol II having its carboxy-terminal domain phosphorylated at positions serine 7, 2 or 5, respectively, of the heptad repeats. Millipore antibody 07-352 was used to immunoprecipitate histone H3 with acetylated lysine 9 (H3K9ac). Cell Signaling antibody 5546S was used to immunoprecipitate histone H2B with ubiquitinated lysine 123 (H2BK123ub). Cse4 antibody from C. Wu (Johns Hopkins Univ., Baltimore, MD) was used to immunoprecipitate Cse4.
Heat shock factor 1 (Hsf1) antibody from D. Gross (Louisiana State Univ., Baton Rouge, LA) was used to immunoprecipitate Hsf1. ChIP–seq experiments using micrococcal nuclease (MNase) to identify nucleosomes were performed for the following histones and histone modifications: H3 (detected using Abcam antibody ab1971), H3K27ac (ab4729), H3K36me3 (ab9050), H3K4me3 (ab8580), H3K79me3 (ab2621), H3K12ac (ab46983) and H2B (Active Motif 39237).
Cell growth and ChIP–exo
S. cerevisiae strains were grown in 67 ml of yeast peptone dextrose (YPD) media to an optical density at 600 nm (OD600) of 0.8 at 25 °C. Cells were crosslinked with formaldehyde at a final concentration of 1% for 15 min at 25 °C, and quenched with a final concentration of 125 mM glycine for 5 min. Cells were collected by centrifugation, and washed in 1 ml of ST buffer (10 mM Tris-HCl, pH 7.5, 100 mM NaCl) at 4 °C. The cells were pelleted again, the supernatant was removed, and the pellet was flash frozen.
As STM classification criteria included promoters that became bound by SAGA upon acute heat shock (as described36), we carried out equivalent heat-shock experiments but using the workflow of this study. We used these new data to assign heat-shock-induced binding locations of SAGA (which correlated highly with binding locations in ref. 36). For these heat-shock samples, yeast was grown in 67 ml of YPD to an OD600 of 0.8 at 25 °C; an equal volume of YPD medium at 55 °C was added to raise the temperature of the culture to 37 °C and incubated at 37 °C for 6 min. Then, cells were crosslinked with formaldehyde at a final concentration of 1% for 15 min at room temperature by adding a 50 ml solution of ice-cold 3.7% formaldehyde in water. Note that protein–DNA crosslinks occur rapidly. Crosslinking was quenched with a final concentration of 125 mM glycine for 5 min. Cells were collected by centrifugation, and washed in 1 ml of ST buffer at 4 °C. The cells were pelleted again, the supernatant was removed, and the pellet was flash frozen.
Chromatin preparations are based on modifications of a prior protocol1. Frozen cell pellets were resuspended and lysed in 1 ml of FA lysis buffer (50 mM Hepes-KOH, pH 7.5, 150 mM NaCl, 2 mM EDTA, 1% Triton, 0.1% sodium deoxycholate and complete protease inhibitor (CPI)) and a 500 μl volume of 0.5 mm zirconia/silica beads by bead beating in a Mini-Beadbeater-96 machine (Biospec) for three cycles each of three minutes on/seven minutes off (samples were kept in a −20 °C freezer during the off cycle). The lysates were transferred to a new tube and microcentrifuged at maximum speed for 3 min at 4 °C to pellet the chromatin. The supernatants were discarded; the pellets were resuspended in 600 μl of FA lysis buffer and transferred to 15 ml polystyrene conical tubes containing 300 μl of 0.1 mm zirconia/silica beads. The samples were then sonicated in a Bioruptor Pico (Diagenode) for 8 cycles (15 s on/30 s off) to obtain DNA fragments of 100–500 bp in size. Each ChIP–exo assay processed the equivalent of 33 ml of cell culture (roughly 8 × 10^8 cells). The remaining half of the processed chromatin was flash frozen and stored at −80 °C in case a technical replicate was desired.
A culture equivalent of 33 ml (roughly 630 million cells) of yeast was fragmented to produce solubilized chromatin (roughly 190 μl). This was incubated overnight (roughly 16 h) at 4 °C with the appropriate antibody. A 10 μl bed volume of conjugated IgG–Dynabeads (0.83 mg ml−1 IgG and 5 mg ml−1 Dynabeads) or 3 μg of specific antibodies with a 10 μl slurry-equivalent of Protein A Mag Sepharose (GE Healthcare) was used in each reaction.
ChIP–exo 5.0 was performed as described1. Essentially, ChIP libraries were partially constructed on the immunoprecipitated resin, and then λ exonuclease was used to trim nucleotides in the 5′ to 3′ direction until stopped by a protein–DNA crosslink. The DNA was then eluted and library construction completed.
In a typical experiment with TAP-tagged yeast strains, 48 ChIP–exo experiments were performed concurrently. Each set included 46 unique targets, a Reb1–TAP sample as a positive control, and a BY4741 sample (from a parental strain lacking the TAP tag) as a negative control. Following 18 cycles of polymerase chain reaction (PCR), all 48 samples were pooled equally by volume. Library concentration was quantified by quantitative PCR (qPCR). Equivalent workflows occurred with other strains.
Using paired-end Illumina sequencing and cellular conditions identical to those used to produce ChIP–exo data, we generated a genome-wide nucleosome map (MNase histone H3 and H2B ChIP–seq) with improved accuracy over our prior maps. MNase ChIP–seq was performed as described37. Briefly, formaldehyde-crosslinked chromatin was digested with MNase to achieve roughly 80% of mononucleosomes. After H3 or H2B ChIP and library construction, libraries were size selected by agarose gel electrophoresis, and sequenced.
Sequencing and mapping
High-throughput DNA sequencing was performed with an Illumina NextSeq 500 or 550 in paired-end mode, producing a 40 bp Read_1 and a 36 bp Read_2. Additional previously published ChIP–exo datasets23,36 for Hsf1, Msn2, Spt15, Spt16, Ifh1 and Fhl1 were included in data processing and analysis for our study. Data were managed, quality controlled, and processed through a custom automated workflow control called PEGR (Platform for Epi-Genomic Research)38. Sequence reads were aligned to the yeast (sacCer3) genome using bwa-mem (version 0.7.17). Aligned reads were filtered using Picard (version 2.7.1)39 and samtools (version 0.1.18)40 to remove PCR duplicates (that is, where the 5′ coordinates-strand of Read_1 and Read_2 were identical to another read pair) and non-uniquely mapping reads. For ChIP–exo, the resulting mapped 5′ end of Read_1 (the exonuclease stop site) is defined as a tag. For MNase, the resulting mapped midpoint of Read_1 and Read_2 is defined as a tag.
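The two tag definitions above can be illustrated with a minimal sketch. It assumes 0-based, half-open alignment coordinates (as in BED/BAM conventions); the function names are hypothetical and are not taken from the study's pipeline.

```python
def chip_exo_tag(read1_start, read1_end, is_reverse):
    """Return the 5' end of Read_1, taken here as the exonuclease stop site.

    Coordinates are assumed 0-based and half-open: [read1_start, read1_end).
    On the reverse strand the 5' end is the last aligned base.
    """
    return read1_end - 1 if is_reverse else read1_start


def mnase_tag(fragment_start, fragment_end):
    """Return the midpoint of the fragment spanned by Read_1 and Read_2.

    The fragment is assumed half-open: [fragment_start, fragment_end).
    Integer division rounds an even-length fragment down by half a base.
    """
    return (fragment_start + fragment_end - 1) // 2
```

For a 147-bp fragment covering positions 100 to 246, `mnase_tag(100, 247)` gives the central base, which would be taken as the inferred nucleosome dyad.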
Data quality, statistics and reproducibility
We tested many targets that were not expected to bind directly to DNA, and thus could not assume that every target would produce a
  • 8. positive ChIP signal. We empirically determined that a minimum of 200,000 deduplicated tags were required to assess the quality of an individual dataset. If a dataset received fewer than 200,000 tags, then we required the tag duplication level, defined as (number of reads discarded by Picard)/(number of input reads), of the sample to be less than 70% before we sequenced it more deeply. For example, if a dataset had 100,000 mappable deduplicated tags (unique Read_1 and Read_2 combination), but a total of 1 million mappable tags before filtering, then the duplication level was 90% and it was assumed that the library was insufficiently complex to warrant additional sequencing. If a library was insufficiently complex, we performed a technical replicate with the remainder of the chromatin preparation. Following this procedure, we produced a sufficiently complex library for more than 95% of targets tested from a single yeast culture. In practice, pooling equivalent proportions of 48 barcoded libraries (in terms of reaction volumes) provided similar sequencing depth across all samples. All analysed datasets were confirmed with independent biological replicates that passed our quality-control metrics. A dataset was considered successful if significant locations (binomial, 1.5-fold, P < 0.01) were identified by ChExMix (see below) and these locations were not in regions that produce highly variable data. N values are reported for the number of target datasets (hierarchical clustering and UMAP) or the number of genomic features (composite plots and heatmaps) analysed.
Raw FASTQ reads for each sample were aligned against the known TAP or HA FASTA sequence and nearby genomic sequence to confirm the presence and location of the epitope in each strain. See 03_EpitopeID at https://github.com/CEGRcode/2021-Rossi_Nature.
Mapping statistics for each dataset are available at yeastepigenome.org, along with mapped data downloads.
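The library-complexity rule described above (200,000 deduplicated tags; duplication level below 70%) can be written out as a small worked example. This is only a sketch of the stated arithmetic: the function names and the returned labels are hypothetical, and the way the three outcomes are ordered is an assumption.

```python
def duplication_level(total_mappable, deduplicated):
    """Fraction of mappable reads discarded as duplicates."""
    discarded = total_mappable - deduplicated
    return discarded / total_mappable


def next_step(total_mappable, deduplicated, min_tags=200_000, max_dup=0.70):
    """Suggest a follow-up for one library (labels are illustrative only)."""
    if deduplicated >= min_tags:
        return "assess"               # enough tags to judge dataset quality
    if duplication_level(total_mappable, deduplicated) < max_dup:
        return "sequence_deeper"      # still complex enough to sequence more
    return "technical_replicate"      # library judged insufficiently complex


# The example from the text: 100,000 deduplicated tags out of 1 million.
print(duplication_level(1_000_000, 100_000))
print(next_step(1_000_000, 100_000))
```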
Analyses shown at yeastepigenome.org can be reproduced or further custom analysed using ScriptManager (https://github.com/CEGRcode/scriptmanager), which provides a simple user-friendly interface. It includes straightforward instructions for installation and for data analysis. Data values from the paper’s composite plots can be found in 01_Composite_Files at https://github.com/CEGRcode/2021-Rossi_Nature.
ChExMix locations
ChExMix41 version 0.31 was run with the following non-default parameters: --noread2 --scalewin 1000 --minmodelupdateevents 50 --fixedalpha 0 --mememinw 8 --mememaxw 21 --minmodelupdaterefs 25 --lenientplus. We also used the --excludebed option to exclude from analysis a custom set of hypervariable regions (ChExMix_Peak_Filter_List_190612.bed), including the rDNA locus, tRNA genes and telomere regions (this list is available in 02_References_and_Features_Files at https://github.com/CEGRcode/2021-Rossi_Nature). By default, ChExMix requires the tag count at binding events to achieve at least 1.5-fold enrichment and a minimum Benjamini–Hochberg42 corrected P value of 0.01 (binomial), compared with the scaled ‘masterNoTag_20180928’ negative control count. All experiments for a given protein target were analysed by ChExMix individually. The resulting peak calls for each individual replicate experiment can be found at yeastepigenome.org or the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/). In addition, the --lenientplus option enables a multireplicate reproducibility assessment mode in ChExMix. Using this feature, replicate experiments passing quality control were analysed simultaneously, and the resulting joint peak calls were used to classify Pol II features (see the section on ‘Pol II promoter classes’ below).
Locations are defined as ChExMix peaks if their tag counts pass the thresholds in the combined meta-experiment (essentially merging tag counts across replicates), or in one or more individual replicate experiments. However, locations are reported only if the normalization of ChIP–seq (NCIS)-scaled tag counts did not vary significantly across replicates (binomial, 1.5-fold, P < 0.01). This latter condition had the effect of screening out locations that were not reproducibly enriched across replicated experiments. Locations resulting from a combined analysis of two independent replicates can be found in 04_ChExMix_Peaks at https://github.com/CEGRcode/2021-Rossi_Nature (and at https://doi.org/10.26208/rykf-6050 for individual replicates).
The negative control for ChExMix peak calling, termed ‘masterNoTag_20180928’, was created by merging 15 individual BY4741 (parent strain containing no epitope tag) ChIP–exo experiments into a single BAM file. These negative controls were generated over an 18-month period during the main phase of data collection. The file ‘masterNoTag_20180928.bam’ comprises the following Sample IDs: 11851, 11946, 12094, 12880, 13484, 13822, 14202, 14408, 14637, 14825, 15256, 15818, 16073, 17814 and 18504, and is available at https://doi.org/10.26208/rykf-6050.
Meta-assemblages
Meta-assemblages are based on cell populations. Thus, their member targets tend to bind the same genomic locations, although not necessarily at the same time or above a preset algorithmic threshold. Owing to parameter constraints placed on clustering, significant (P < 0.01) but rare (for example, HIR) and/or highly isolated (for example, Vid22/Tbf1) binding events tended to cluster near each other in UMAP, and so were placed in a single miscellaneous meta-assemblage (ISO) without further analysis.
Using bedtools intersect (bedtools version 2.27.1), all ChExMix peaks (regardless of whether they were associated with the Pol II sector, defined above) for each of 384 validated input targets were intersected in a 100-bp window around themselves. This produced a symmetrical matrix of counts representing the frequency of peak overlap between all samples. 2D hierarchical clustering43 was then performed, using average linkage and uncentred correlation as the metric.
The interaction matrix was further filtered to remove 13 targets with fewer than five total ChExMix peaks (for example, Pol I targets having only two binding locations that are annotated in the reference yeast genome, despite the rDNA locus being highly repetitive). This produced a symmetrical matrix of 371 samples (Fig. 1b and Supplementary Data 3). The matrix was then used as the input into the UMAP algorithm (version 0.3.7)44 using the following parameters: umap.UMAP(n_neighbors = 5, min_dist = 0.0, n_components = 2, metric = 'correlation', random_state = RS,).fit_transform(X). K-means clustering was performed on the resulting 2D projection at a variety of K values (5, 10, 20, 25, 30, 35, 40, 100, 145). No new biologically distinct clusters appeared beyond K = 40.
Reference features and intervals
Coordinates for 253 replication origins (ACS sequences, for ‘autonomously replicating sequence (ARS) consensus sequences’) were obtained from ref. 6. Note that ACS_6_32973 has a duplicate entry on the yeastepigenome.org website, resulting in 254 features. Coordinates for X-core elements (XCEs), centromeres (CENs), RNA polymerase I (Pol I) TSS, Pol III TSS, NCR (SGD-defined noncoding RNA annotated as ncRNA_gene, snoRNA_gene, and snRNA_gene) and Ty transposon LTRs were obtained from the Saccharomyces Genome Database (SGD; https://www.yeastgenome.org) on 3 March 2017 (available as SGD_features_170331.tab in 02_References_and_Features_Files at https://github.com/CEGRcode/2021-Rossi_Nature). TSS and TES coordinates for Pol II were obtained from ref. 45.
They were matched to each SGD coding feature through their systematic Gene ID. These coordinates were based on microarrays. For TSS, the most 5′-enriched sense-strand coordinate in each promoter is reported. When no transcript was reported for an SGD feature, the TSS and TES were imputed from the SGD coordinates by moving 70 bp upstream of the start ATG (SGD start) for TSSs and 70 bp downstream of the stop codon (SGD end) for TESs. This imputation was based on the empirical observation that the median distance from the TSS defined in ref. 45 and the start codon was 70 bp. ‘Dubious ORFs’ were initially considered and then excluded from further analysis
  • 9. because we and others46 found no validating evidence. Noncoding RNAs (ncRNAs) were from SGD annotations; cryptic unstable transcripts (CUTs) and stable unannotated transcripts (SUTs) were from ref. 45; and Xrn1-sensitive unstable transcripts (XUTs) were from ref. 47. Reference datasets are available in 02_References_and_Features_Files at https://github.com/CEGRcode/2021-Rossi_Nature: SGD features (SGD_features_170331.tab), ORF TSS (Xu_2009_ORF-Ts_V64.gff3), CUT (Xu_2009_CUTs_V64.gff3), SUT (Xu_2009_SUTs_V64.gff3), and XUT (van_Dijk_2011_XUTs_V64.gff3).
Nucleosome maps at Pol II promoter regions
MNase H3 and H2B ChIP–seq paired-end reads were bioinformatically filtered to fragment sizes of 100–160 bp, and then nucleosome dyads (peaks) were called from the mapped midpoint location of Read_1 and Read_2 5′ ends using GeneTrack (v1) (parameters: s40e80F1)48. Peaks were required to overlap within a 75-bp window in at least 4 of 6 datasets (3 H2B and 3 H3 MNase ChIP–seq; Sample IDs 10951, 10952, 10967; 10947, 10948 and 10966) to call a consensus nucleosome (N = 6). The average location of overlapping peaks defined the dyad coordinate of a consensus nucleosome.
The +1 nucleosome was defined as the nucleosome dyad peak that was closest to a TSS in a window −60 bp to +140 bp. If no nucleosome was found, then an additional search was performed −80 bp to −61 bp relative to the TSS. If none was found, then the region was viewed in Integrative Genomics Viewer version 2.5.2 (IGV; http://software.broadinstitute.org/software/igv/)49, and manually assigned. If no nucleosomes could visually be assigned to a TSS in IGV, then a +1 nucleosome dyad coordinate was imputed as the SGD ATG start coordinate (which is the consensus location of +1 nucleosomes). This placed the TSS at the genome-wide canonical location relative to the imputed +1 dyad.
We previously defined consensus −1 nucleosome positions of all genes transcribed by Pol II, regardless of whether a nucleosome had low occupancy or was even detectable50.
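The algorithmic part of the +1 nucleosome assignment above (a primary window of −60 to +140 bp, then a fallback window of −80 to −61 bp) can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function name is hypothetical, offsets are taken as strand-aware distances from the TSS, and the manual IGV step and the ATG imputation are represented only by returning `None`.

```python
def plus_one_dyad(tss, strand, dyads):
    """Pick the +1 nucleosome dyad for one TSS, or None if no dyad qualifies.

    tss    : TSS coordinate
    strand : "+" or "-"
    dyads  : iterable of consensus dyad coordinates on the same chromosome
    """
    def offset(dyad):
        # Positive offsets are downstream of the TSS on the gene's strand.
        return dyad - tss if strand == "+" else tss - dyad

    # Primary window first, then the fallback window described in the text.
    for low, high in ((-60, 140), (-80, -61)):
        hits = [d for d in dyads if low <= offset(d) <= high]
        if hits:
            return min(hits, key=lambda d: abs(offset(d)))
    return None  # would be inspected manually, or imputed, in the workflow


print(plus_one_dyad(1000, "+", [900, 1050, 1130]))
```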
However, here our intent was to define the region encompassing NFRs and NDRs, and so we chose to ignore nucleosome positions that were highly depleted of nucleosomes. Our goal was to manually determine the location of the most robust algorithmic nucleosome position (upstream stable nucleosome, USN) that was located closest to a TSS and in a window −500 bp to −60 bp from the TSS, as long as that nucleosome was not already called a +1 nucleosome. If one of the following criteria was met, then the nucleosome landscape was visualized in IGV, and the USN and/or +1 nucleosomes were manually (re)assigned (N = 753): 1) either the USN or +1 was not present in the original algorithmically defined set; 2) the USN-to-(+1) dyad-to-dyad distance was calculated to be smaller than 187 bp (the size of a nucleosome (147 bp) and two linkers (2 × 20 bp)); 3) a ssTF peak was, first, located less than 600 bp upstream of the TSS, and second, upstream (more 5′ to the nearest TSS) of a nucleosome call having an occupancy score that was in the bottom 5% of all nucleosomes (that is, an algorithmically called nucleosome that was in fact highly depleted in the vicinity of a ssTF). If no nucleosomes could visually be assigned, the USN nucleosome coordinate was imputed as 750 bp upstream of the +1 nucleosome dyad (99th percentile of calculated NDR/NFR lengths). The NDR/NFR length at these features was reported as ‘9999’ in Supplementary Data 1 (1S) (N = 297). As the promoter regions defined in this study include arbitrary limits and do not consider limits defined by insulation, there will be some inaccuracies in relation to actual biological promoter boundaries. This is expected to result in some promoter misclassifications.
In total, 59,002 nucleosomes were called across the S. cerevisiae genome. Nucleosome occupancy and fuzziness scores were calculated as described51. All nucleosome calls with their median occupancy and fuzziness scores are available as Nucleosome_calls_and_stats.
xlsx in 02_References_and_Features_Files at https://github.com/CEGRcode/2021-Rossi_Nature.
ChExMix locations at filtered Pol II genes
The initial list of all compiled features totalled 11,112 (Supplementary Data 1). Numerous quality-control metrics were calculated for each Pol II transcribed feature to assess their validity and mappability. We used two GTFs (Sua7 (Sample ID = 11743) and Ssl2 (11747)) and a negative control (masterNoTag_20180928.bam), with total tags set to be equal across all three in order to assess the enrichment around each candidate coding and noncoding Pol II TSS (N = 9,844; feature class level 1: 01–12, 14, 24 and 25 in Supplementary Data 1 (1D)), as described below.
A region of the genome was defined for each transcribed feature that included the transcribed sequence (TSS to TES) and the surrounding regulatory region. The upstream (promoter) regulatory region was defined as the inclusive interval between the dyad coordinate of the USN (see above) and the TSS. When no USN was called for a feature, then the upstream boundary was defined as 750 bp upstream (5′) of the TSS. Note that the upstream boundary does not consider boundaries defined by insulators, as they have not yet been fully defined. This may result in unwarranted attachment of ssTF/cofactor locations to some promoters. The downstream regulatory region was defined as the inclusive interval from TES to 100 bp downstream (3′). This boundary was based on the consensus position of the termination machinery relative to the TES. The genomic region from the USN dyad to 100 bp downstream of TES was defined as a ‘Pol II sector’.
ChExMix peaks for all datasets here were intersected with each Pol II sector using Bedtools. A protein was defined to be located within a feature if at least one ChExMix peak overlapped with any portion of the sector. If a ChExMix peak intersected two overlapping sectors (that is, the peak exists in the promoter region of two genes in a head-to-head orientation), then that protein was located in both sectors.
Consequently, the number of ChExMix peaks and the number of bound features (or sectors) is not equal.
Pol II sectors were excluded as ‘hypervariable’ if any of the following conditions were met: 1) the TSS was in the highest 1% of masterNoTag_20180928 tag counts (negative control) in a 1,000-bp window centred over the TSS; 2) the TSS was in the highest 5% of masterNoTag_20180928 tag counts in a 200-bp window centred over the TSS and the occupancy ratios of both Sua7/NoTag and Ssl2/NoTag were less than 2 (based on total tag normalization). The rationale for these criteria was that if the signal in the negative control was too high, and the signal-to-noise ratios of robust GTFs such as Sua7 and Ssl2 were not well above the high background, then we did not have confidence in locations called at these sites. The sector was retained if it overlapped with a peak call from any dataset in this study. We assumed that the peak indicated enough dynamic range to have useable data in this region. We excluded N = 75 Pol II sectors by this metric (‘08_Hyper-variable’ in Supplementary Data 1 (1D)).
Pol II sectors were excluded for having ‘poor mappability’ if any of the following conditions were met: 1) the TSS was in the lowest 1% of masterNoTag_20180928 tag counts in a 1,000-bp window centred over the TSS; 2) the TSS was in the lowest 5% of masterNoTag_20180928 tag counts in a 200-bp window centred over the TSS and the occupancy ratios of both Sua7/NoTag and Ssl2/NoTag were less than 2 (based on total tag normalization). Visual inspection of heatmaps confirmed that these segments of the genome were not uniquely mappable, and thus had low intrinsic tag counts. We excluded N = 116 Pol II sectors by this metric (‘24_Hyper-variable_noncoding’ in Supplementary Data 1 (1D)).
Pol II sectors were excluded as ‘Quiescent-NoPIC’ if the occupancy ratios of both Sua7/NoTag and Ssl2/NoTag were less than 1. The sector was retained if it overlapped with a peak call from any dataset in this study.
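The three exclusion rules above can be condensed into one decision function. The sketch below is illustrative only: it assumes that the negative-control tag count at each TSS has already been converted to a percentile rank (0 to 100) among all TSSs, that "highest 1%" means a rank of at least 99, and that the rules are applied in the order shown. The function name and return labels are hypothetical.

```python
def classify_sector(pct_1kb, pct_200bp, sua7_ratio, ssl2_ratio, overlaps_peak):
    """Return an exclusion label for one Pol II sector, or "retained".

    pct_1kb, pct_200bp : percentile rank of negative-control tag counts in
                         the 1,000-bp and 200-bp windows centred on the TSS
    sua7_ratio         : Sua7/NoTag occupancy ratio
    ssl2_ratio         : Ssl2/NoTag occupancy ratio
    overlaps_peak      : True if any dataset has a peak call in the sector
    """
    weak_gtf = sua7_ratio < 2 and ssl2_ratio < 2

    high_background = pct_1kb >= 99 or (pct_200bp >= 95 and weak_gtf)
    if high_background and not overlaps_peak:
        return "hypervariable"

    if pct_1kb <= 1 or (pct_200bp <= 5 and weak_gtf):
        return "poor_mappability"

    if sua7_ratio < 1 and ssl2_ratio < 1 and not overlaps_peak:
        return "quiescent_no_pic"

    return "retained"


print(classify_sector(99.5, 50, 5.0, 5.0, overlaps_peak=False))
```

Note that the text states a peak-overlap rescue for the hypervariable and quiescent rules but not for poor mappability, which is why `overlaps_peak` is not consulted in the second rule.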
The rationale here was that if there were no peaks in the sector vicinity and no enrichment of GTFs, then this feature was relatively quiescent. Thus, it was uninformative to analyse it further. We do not exclude the possibility that these features had low subthreshold
  • 10. activity. We excluded N = 251 Pol II sectors by this metric (‘05_NoPIC’ in Supplementary Data 1 (1D)).
Pol II sectors were excluded as ‘tRNA proximal’ if peaks from Tfc3 (11835), a component of the RNA polymerase III transcription initiation factor complex, overlapped with the region between the +1 nucleosome dyad and the USN dyad of the sector. tRNA genes produced high levels of background owing to strong crosslinking of the Pol III machinery, which digestion by λ exonuclease then focuses into high background peaks. Although this background is present in all samples, it is most problematic or evident where the target foreground signal is close to background. We excluded N = 135 Pol II sectors by this metric (‘06_tRNAprox’ in Supplementary Data 1 (1D)).
Pol II sectors were excluded as ‘ChExMix extreme’ if they overlapped with an unusually high number of peaks. These features contained many gene-body peaks for targets that, across the rest of the genome, were bound primarily in promoter regions. Further analysis revealed that the density of tags across the gene body in the masterNoTag_20180928 negative control was abnormally high or low, relative to the rest of the genome, thereby creating statistical anomalies of bound locations. Consequently, ChExMix produced many false-positive peak calls in unrelated datasets at these extreme regions where the background model appears to break down. The peak calls at these extreme features are still included in the ChExMix peak files but should not be considered valid locations unless validated by orthogonal methods. The number of Pol II sectors given this label was empirically capped at N = 25 (‘07_ChExMix_extreme’ in Supplementary Data 1 (1D)). The value of this filter is that it decreased the number of potentially artefactual locations occurring in noncanonical places, particularly for ssTFs that bind to few genes. However, we do not exclude the possibility of noncanonical extreme, yet still biological, behaviour occurring at these genes. For example, large condensates might behave in this way.
Our analysis of the ncRNA features reported in refs.
45,47 found that many of these calls were not supported by evidence of GTF binding (Sua7) in the TSS vicinity, suggesting that many were false positives. Noncoding Pol II sectors were excluded if no Sua7 peak was found within 80 bp of the TSS. We excluded N = 2,161 ncRNA Pol II sectors by this metric (‘25_excluded_ncRNA’ in Supplementary Data 1 (1D)).
Pol II promoter classes
Our unsupervised approach to chromatin organization genome-wide produced meta-assemblages that reflect predominant architectural themes. Meta-assemblages are computed ensembles of many genome-wide locations averaged across millions of cells, and thus do not necessarily correspond to biochemically stable complexes. There are cases in which a meta-assemblage such as ORC would appear to have a corresponding biochemical ensemble at replication origins. This makes meta-assemblages and real ensembles seemingly the same. However, as expected, there was no single promoter architecture that emerged from our unsupervised approach. Instead, meta-assemblages reflected predominant architectural themes that ranged along a compositional spectrum from relatively heterogeneous (ssTFs/MED/SAGA/TUP) to relatively homogeneous (PIC). Meta-assemblages could be merged or subdivided to achieve levels of granularity, but also levels of uncertainty. They permeated promoters to varying extents.
The variation in the types of meta-assemblages within and across promoter classes gives them their unique regulatory properties, but also makes promoter classification fluid. Classification depends on input criteria that reflect subjective concepts. For example, prior work created SAGA-dominated and TFIID-dominated gene groups on the basis of functional criteria (relative sensitivity to SAGA and TFIID mutants)28. This helped to produce a genome-wide concept of inducible versus constitutive genes, but could not address other concepts such as insulation, or the fact that some themes may not be manifested through SAGA and TFIID, or that there may be more granularity in each of those classes.
We attempt here to provide more granularity, but recognize that simplifying overarching concepts are best served with fewer groups. To this end, we created promoter classes that arose in part from our unsupervised learning approach. However, we also injected additional a priori knowledge. This knowledge considers the functionality of each factor that contributes to distinctive regulatory archetypes.
The 137 RP promoters (defined by SGD) encode subunits of the ribosome. They comprise the largest known set of genes that are thought to be coregulated under all conditions. This may be due to the fact that they are predominantly regulated by the ssTF Rap1. They are highly expressed and well studied by ChIP-exo as a group23, and so form a distinct gene set.
SAGA, Mediator and Tup1 (‘STM’) are major cofactor complexes that, along with other ssTFs and cofactors (listed in Supplementary Data 2 (1K)), co-occur at highly expressed genes and formed major UMAP clusters. We therefore defined a set of non-RP STM promoters (using the Bedtools intersect) if the region between the +1 nucleosome and USN dyads had at least one SAGA, Mediator or TUP ChExMix call (Supplementary Data 2 (10A)) in YPD at 25 °C or a SAGA call upon acute heat shock36 (6 min at 37 °C) (N = 984 in the STM group; see Supplementary Data 1 (1E)). Most STM promoter regions (N = 854, or 87%) also bound at least one of 78 ssTFs site-specifically (Supplementary Data 2 (10C)). The majority of these ssTF peaks overlapped positionally with STM cofactor peaks. Applicable to Fig. 5b, we labelled each ssTF-bound motif as a ‘consolidated ssTF motif’ if it overlapped with a STM peak. This consolidated motif set was considered the organizing centre of that promoter. When a ssTF motif was absent, the ssTF peak call was used instead. When numerous ssTFs were bound to the same promoter, the ssTF closest to the STM peak was used (Supplementary Data 1 (1Y–1AI)).
Of the remaining promoters (non-RP, non-STM), a subset had ssTF ChExMix peaks (whether site-specifically bound or not) or other cofactor ChExMix peaks in the region between the +1 nucleosome and the USN. This list of ssTFs and cofactors did not include the core transcription machinery (initiation, elongation or termination), which nevertheless were present. We therefore defined these as ‘TFO’ (N = 1,783). About one-quarter of TFO promoters had a bound ssTF that was more associated with STM promoters, and thus presumably capable of recruiting cofactors (Supplementary Data 2 (8)). These TFO promoters may have been algorithmically misclassified, perhaps being expressed under other environmental conditions. Those non-RP, non-STM and non-TFO promoters that remained constituted 2,474 promoters whose promoter regions lacked evidence of a binding event beyond a PIC or a nucleosome, and thus formed the largest of all groups, the ‘unbound’ (‘UNB’). These classifications are indicated in Fig. 1a, along with their relationship to the TFIIDdom and SAGAdom gene classes. Relative PIC occupancy (green-dot count) is based on average TFIIB (Sua7) occupancy (Supplementary Data 1 (1AJ)) but confirmed with nascent and steady-state transcription.
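The four promoter classes are assigned by successive exclusion: RP first, then STM among non-RP promoters, then TFO among the remainder, with UNB as what is left. A minimal sketch of that precedence, assuming three precomputed boolean flags per promoter (the flag and function names are hypothetical):

```python
def promoter_class(is_rp, has_stm_call, has_tf_or_cofactor_peak):
    """Assign one promoter to RP, STM, TFO or UNB by precedence.

    is_rp                   : promoter of a ribosomal protein gene (per SGD)
    has_stm_call            : SAGA, Mediator or TUP call between the USN and
                              +1 dyads (YPD), or a SAGA call on heat shock
    has_tf_or_cofactor_peak : any other ssTF or cofactor peak in that region
    """
    if is_rp:
        return "RP"
    if has_stm_call:
        return "STM"
    if has_tf_or_cofactor_peak:
        return "TFO"
    return "UNB"


# Class sizes as reported in the text.
REPORTED_COUNTS = {"RP": 137, "STM": 984, "TFO": 1783, "UNB": 2474}
print(sum(REPORTED_COUNTS.values()))
```

Because the tests are ordered, an RP promoter that also carries an STM call is still labelled RP, which matches the "non-RP STM" wording above.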
The ‘RSTM-bound’ classification required a promoter to have both of the following labels: STM-bound and RPD-bound. The RPD-bound classification required a promoter to have a ChExMix call (‘1’) for two or more of the following targets: Rpd3, Rxt1/Cti6, Rxt2, Rxt3, Nrm1 and Ume6.
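The stringent labels reduce to a "two or more of a panel" rule followed by two conjunctions. The sketch below encodes the target panels exactly as listed in the text; the dictionary keys, the function name, and the treatment of Rxt1/Cti6 as a single target are assumptions made for illustration.

```python
# Target panels as listed in the text; a label needs calls for >= 2 members.
PANELS = {
    "SAGA": {"Spt7", "Ada2", "Sgf11", "Sgf73"},
    "TUP": {"Tup1", "Cyc8", "Sok2", "Cin5"},
    "MED_SWISNF": {"Swi1", "Med2", "Snf6", "Swi3"},
    "RPD": {"Rpd3", "Rxt1/Cti6", "Rxt2", "Rxt3", "Nrm1", "Ume6"},
}


def stringent_labels(bound_targets):
    """Return the set of stringent labels for one promoter.

    bound_targets : names of targets with a ChExMix call ('1') at the promoter
    """
    bound = set(bound_targets)
    labels = {name for name, panel in PANELS.items() if len(bound & panel) >= 2}
    # STM-bound requires all three cofactor labels.
    if {"SAGA", "TUP", "MED_SWISNF"} <= labels:
        labels.add("STM")
    # RSTM-bound requires STM-bound plus RPD-bound.
    if {"STM", "RPD"} <= labels:
        labels.add("RSTM")
    return labels


print(sorted(stringent_labels(["Spt7", "Ada2", "Tup1"])))
```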
  • 11. Heatmaps and composite plots
Analysis was performed using the GUI ScriptManager version 012, which is available for download at https://github.com/CEGRcode/scriptmanager. ScriptManager provides a simple user-friendly interface for ChIP–exo analysis, and includes simple installation instructions. Heatmaps and composite plots were generated using TagPileup script. For ChIP–exo data, the following settings were used: Read_1 5′ end; separate strands, 0 bp tag shift, 1 bp bin size, sliding window (moving average) 11. For MNase ChIP–seq data the following settings were used: (paired-end) read midpoint; combined strands, 0 bp tag shift, 1 bp bin size, sliding window 21. All data are oriented by TSS or reference point strand.
For graphical display of composite plots, output data consisted of frequency counts of Read_1 5′ ends for ChIP–exo or Read_1/Read_2 midpoint for MNase H3/H2B ChIP–seq dyads (BAM files) that were at x-axis base-pair distances from sets of genomic reference points (BED files). Underlying patterns and datapoints are available at yeastepigenome.org and as Excel_Composite_Data_Processed.xlsx in 01_Composite_Files at https://github.com/CEGRcode/2021-Rossi_Nature. An additional moving average of 20 bp (30 bp for Pol II elongation and Yrr1 composites) was performed for the purpose of improving visual clarity. Without this, the high-bp resolution of ChIP–exo resulted in peaks that were quite narrow in the 1-kb visualization window, such that their fill patterns were less visually obvious. For gene-body targets (Fig. 3c and Extended Data Fig. 5), smoothed strand-separated data were shifted 50 bp in the 3′ direction before combining strands. The rationale for this is that when we examined each strand separately, we noticed that patterns on the transcribed strand showed some mirroring on the nontranscribed strand. But this pattern was shifted in the 3′ direction relative to transcribed strand (that is, more downstream of the TSS).
We surmise that this ‘double-vision’ effect was caused by efficient crosslinking such that the 5′–3′ λ exonuclease is generally stopped at the back end of the Pol II entourage on the transcribed strand and stopped at the front end of the entourage on the nontranscribed strand. Shifting data on both strands by 50 bp in their respective 3′ directions partially corrected this double vision and reflects the middle of the complex. In the absence of a strand-specific 3′ shift for gene-body targets, patterns near the TSS reflect the back end of the Pol II entourage, and patterns near the TES represent its front end. The data in Fig. 5b and Extended Data Fig. 9b were not strand-shifted before removing strand information.
In composite plots, the y-axis is labelled ‘Occupancy (a.u.)’ (arbitrary units), reflecting y-axis scaling that was adjusted to highlight the patterning of the data. Within a single figure (including any Extended Data figure counterparts), occupancy levels can be compared across multiple panels only for the same dataset. Occupancy levels of different datasets in the same or different panels cannot be compared directly. Only the peak positions are comparable. For Fig. 2, the MEME motif obtained and shown for Orc6 starts at position 2 of the ACS. For Cbf1, the MEME motif starts at position 1 of CEN. Schematics reflect subjective interpretations of peak locations, are nonlinear with respect to the diagrammed DNA linearity, and do not reflect protein molecular weights.
Nascent RNA (CRAC) analysis
This analysis relates to Fig. 4d. CRAC datasets were downloaded from GEO using accession code GSE97913. Raw sequencing data were trimmed of adapters and aligned to the sacCer3 genome using the recommended parameters in ref. 26. The 5′ ends of reads (corresponding to the 3′ ends of sequenced nascent RNA) were counted in a window from the TSS45 to 300 bp downstream (more 3′ on the ‘sense’ strand). Only those reads that mapped to the sense strand relative to the gene body were retained.
Datasets were normalized such that the total tag counts were equal. However, as all analysis was internal to each dataset, this had no effect on final output.

This analysis relates to Extended Data Fig. 7b. TFIIB (Sua7) occupancy data (Read_1 5′ end) were counted in a 100-bp window centred on each promoter TSS. The list of all coding genes was filtered to be only head-to-head such that each gene possessed a promoter region overlapping/adjacent to another gene’s promoter (Supplementary Data 1 (1AZ–1BG)). Promoter regions were then separated into three groups: RP + STM, TFO and UNB. A separate Reb1-bound group was also created. A Pearson correlation was calculated for CRAC signals for one promoter side compared with the other side, within each dataset.

Classification of ssTFs
We used GO classifications and the JASPAR motif database to identify candidate ssTFs. Here we define a ssTF as a target that has at least four ChExMix peaks in the total set of promoter regions, and an enriched motif that is not more enriched with another ssTF. As of October 2019, the JASPAR database reported 175 nonredundant ssTF motifs for S. cerevisiae, which are based on experimental assays including in vitro protein-binding microarrays with purified protein (ref. 52). Of those, 70 corresponded to ssTFs for which we confirmed site specificity in vivo by ChIP–exo. Another two (Mot3 and Rgt1) were confirmed after this study was completed. As ChIP–exo can define site specificity within a few base pairs, this represents a remarkable degree of concordance between in vivo and in vitro binding. Because of co-occurrence of motifs in the genome, additional nearby motifs were also enriched for these ssTFs. If multiple targets had a match with essentially the same JASPAR motif, then we used GO descriptions and the literature to identify those that were most likely to be direct binders (ssTFs). The rest were labelled as cofactors. For example, Nrg1 and Nrg2 bind the same motif, although JASPAR assigns this motif to Nrg1. We labelled both as ssTFs.
Another equivalent example involves Met4, Met31 and Met32. Yox1 and Mcm1 have distinct motifs reported in JASPAR, and the two proteins interact biochemically. However, ChIP–exo reported the Mcm1 motif for both, with Mcm1 being much stronger. We therefore classified Yox1 as a cofactor in YPD at 25 °C, instead of a ssTF. Eight targets had GO annotations indicative of a ssTF and yielded robust motifs by ChIP–exo with a robust ChIP–exo pattern, but five of them had no motif in JASPAR (Nrg2, Hms2, Hmo1, War1 and Pip2), and three had a different motif in JASPAR (Tea1, Rds2 and Sum1). These eight were also labelled as ssTFs. This resulted in 78 ssTFs that ChIP–exo/ChExMix detected as bound to a motif in YPD at 25 °C. The remaining candidate targets that had JASPAR motifs were not labelled as ssTFs for the following reasons. First, one (Yox1) appeared site specific but was classified as a cofactor. Second, one is a GTF (TBP/Spt15). Third, 21 produced ChExMix binding locations but were deemed to be cofactors in YPD at 25 °C (that is, they had bound locations, but were not bound site-specifically). Their site specificity could be condition specific. Fourth, 37 were not tested or not epitope-tagged (possibly because of lethality or technical difficulty in tagging). The remaining targets did not pass our detection thresholds. See Supplementary Data 2 (1, 9, 11) for the complete list of candidate factors, JASPAR/cis-bp motifs, and matches to ssTF-bound location in ref. 53.

Circuitry involving ssTFs
This analysis relates to Extended Data Fig. 10. We analysed the set of 78 genes encoding ssTFs (defined in YPD) along with the ssTFs that bound their promoter regions site specifically (Supplementary Data 2 (1K)). A circuit-like diagram was then constructed by connecting ssTFs to the ssTF-encoding genes to which they bound. The total number of genes (ssTF and all others) to which a ssTF bound was recorded, separated into site-specifically bound versus those for which binding was recorded but a cognate motif was not detected.
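The circuit construction described above can be sketched as a small directed graph. This is only an illustration under stated assumptions, not the code used in the study: each binding record is taken to be a (ssTF, target gene, site-specific flag) tuple that appears once per pair, a ssTF and the gene encoding it share one name, and the function name build_circuit and all sample names are hypothetical placeholders.

```python
from collections import defaultdict


def build_circuit(bindings, sstf_genes):
    """Return (edges, counts) for a ssTF circuit.

    bindings   -- iterable of (sstf, target_gene, site_specific) tuples,
                  assumed to list each (sstf, target_gene) pair once
    sstf_genes -- set of genes that encode ssTFs
    """
    edges = defaultdict(set)
    counts = defaultdict(lambda: {"site_specific": 0, "no_motif": 0})
    for sstf, gene, site_specific in bindings:
        # Tally every bound gene, split by whether a cognate motif was found.
        key = "site_specific" if site_specific else "no_motif"
        counts[sstf][key] += 1
        # Draw an edge only for site-specific binding at a ssTF-encoding gene.
        if site_specific and gene in sstf_genes:
            edges[sstf].add(gene)
    return dict(edges), dict(counts)


# Placeholder records; these names are not data from the study.
records = [
    ("TF_A", "TF_B", True),
    ("TF_A", "GENE_X", True),
    ("TF_A", "GENE_Y", False),
    ("TF_B", "TF_B", True),
    ("TF_B", "TF_A", False),
]
edges, counts = build_circuit(records, sstf_genes={"TF_A", "TF_B"})
```

Storing edges as sets keeps the graph free of duplicate connections, and a self-edge (TF_B to TF_B here) represents a ssTF bound at the promoter of its own gene.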
The yeastepigenome.org website
Design. The backend of yeastepigenome.org is composed of two internal modules: a Node.js REST application and a MongoDB database (version 4.2.8).
  • 12. MongoDB stores sample-specific meta information and URLs in a JSON/BSON structure. The frontend of yeastepigenome.org is composed of a React application, bootstrapped using the create-react-app tool. A target page is subdivided into sections containing heatmaps, composite plots and other analyses and visualizations. The frontend retrieves sample information by making an application program interface (API) request to the backend application. The frontend is designed to support a cart system for downloading target datasets; it has UCSC (https://genome.ucsc.edu) trackhub integrations and an integrated target lookup on the SGD website, and it comes with a set of frequently asked questions (FAQs) with detailed explanations of all of the plots and visualizations.

Target locations. ChExMix called binding events using a stringent statistical test of highly localized tags that was optimized to minimize false positives (ref. 41). As a consequence, ChExMix did not call bound locations where tag distributions were diffuse and marginally above background (for example, chromatin remodellers). To potentially capture events with marginal significance, we divided each sector into five ‘subsectors’ and determined for each dataset whether there was enrichment over the negative control (masterNoTag_20180928) across each subsector. We defined the subsectors as follows: first, promoter region (−350 bp to −75 bp relative to the TSS); second, TSS region (−75 bp to +150 bp relative to the TSS); third, gene body 5′-end (+150 bp to +450 bp relative to the TSS); fourth, gene body 3′-end (−400 bp to −100 bp relative to the TES); and fifth, TES region (−100 bp to +100 bp relative to the TES).

The ratio of tag counts (test/control) in a subsector (or the selected region) was calculated after the test and the negative control samples were normalized using the NCIS method (ref. 54). The following steps were taken to calculate the significance of tag enrichment in a subsector.
First, test/control tag ratios for subsectors were calculated, then converted to a log2 scale. Second, a Gaussian model, which represents the background ratio of tag counts, was fit to the distribution of tag ratios. Third, a significance value was calculated with respect to the Gaussian model. Fourth, P values were adjusted with the Benjamini–Hochberg correction (ref. 42; P = 0.05). The subsector analysis of each dataset is presented as a separate tab at yeastepigenome.org. These subsectors were not used for any other analyses herein.

Motif discovery. The de novo motif discovery presented at yeastepigenome.org was achieved using the MEME suite (ref. 55) as follows: a ChExMix peak .bed file was intersected with a curated BED file (Merged_sectors_for_MEME_924.bed) consisting of all gene sectors (this reference dataset is available in 02_References_and_Features_Files at https://github.com/CEGRcode/2021-Rossi_Nature), with overlapping regions merged into a single region. The intersected output BED file was sorted on the basis of the score reported by ChExMix for each peak. After sorting, the top 200 peak locations were bidirectionally expanded to 60 bp and the underlying DNA sequence was extracted in FASTA format. These sequences were used as the input for MEME (ref. 55). Default parameters were used, with the following exceptions: the minimum and maximum motif widths (mememinw and mememaxw) were set as 6 and 18, respectively.

Data visualization. To generate heatmaps, the ‘TagPileUpFrequency’ tool was used with no tagshifts, single-base-pair bins, and tags set to equal with combined strands.
The tool takes as input a BED file containing regions that have at least one overlapping ChExMix peak and the target experiment BAM file. The tool outputs a matrix containing tag frequencies, with each row representing a region of interest and each column a single-base-pair bin. This output file was fed into a heatmap script that uses Java TreeView’s algorithm and matplotlib to generate the required heatmap. BED files were presorted on the basis of the criteria indicated in each online graphical image before running TagPileUpFrequency to generate the desired heatmaps. All heatmaps were set to the same contrast threshold, which was calculated from the tag pileup frequency matrix of BoundGenes by determining a 95th percentile cutoff from this frequency distribution.

To generate composites, the ‘TagPileUpFrequency’ tool was used with no tagshifts, single-base-pair bins and tags set to equal with combined strands. One of the inputs to this tool is a BED file containing regions that have at least one overlapping ChExMix peak; the other is a BAM file. The tool was run on the target and masterNoTag_20180928 control BAM files individually, to generate two data files that were fed into a composite generation script. The script uses matplotlib, a Python plotting library, to generate a combined composite plot.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
See Supplementary Data 4 for a list of where to find available data and code online. In essence, all raw sequencing data and peak files from this study are available at the NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE147927. Processed data are available at https://doi.org/10.26208/rykf-6050. Additional analyses and data are at yeastepigenome.org. We warn that single-replicate data files are not likely to have meaningful data and should not be used without further replication. All underlying data used to generate composite plots, coordinate files and script parameters for Figs. 2–5, Extended Data Figs.
4, 5, 7, 8b and Supplementary Fig. 1 can be downloaded from https://github.com/CEGRcode/2021-Rossi_Nature. Final composite plot values can be found in Supplementary Data 5.

Code availability
Code is available at https://github.com/CEGRcode/scriptmanager.

36. Vinayachandran, V. et al. Widespread and precise reprogramming of yeast protein-genome interactions in response to heat shock. Genome Res. 28, 357–366 (2018). 37. Wal, M. & Pugh, B. F. Genome-wide mapping of nucleosome positions in yeast using high-resolution MNase ChIP-Seq. Methods Enzymol. 513, 233–250 (2012). 38. Shao, D., Kellogg, G. D., Lai, W. K. M., Mahony, S. & Pugh, B. F. in Practice and Experience in Advanced Research Computing 285–292 (Association for Computing Machinery, Portland, OR, 2020). 39. Picard Toolkit. http://broadinstitute.github.io/picard/ (2019). 40. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). 41. Yamada, N., Lai, W. K. M., Farrell, N., Pugh, B. F. & Mahony, S. Characterizing protein–DNA binding event subtypes in ChIP–exo data. Bioinformatics 35, 903–913 (2019). 42. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995). 43. de Hoon, M. J., Imoto, S., Nolan, J. & Miyano, S. Open source clustering software. Bioinformatics 20, 1453–1454 (2004). 44. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2019). 45. Xu, Z. et al. Bidirectional promoters generate pervasive transcription in yeast. Nature 457, 1033–1037 (2009). 46. Rhee, H. S. & Pugh, B. F. Genome-wide structure and organization of eukaryotic pre-initiation complexes. Nature 483, 295–301 (2012). 47. van Dijk, E. L. et al. XUTs are a class of Xrn1-sensitive antisense regulatory non-coding RNA in yeast. Nature 475, 114–117 (2011). 48. Albert, I., Wachi, S., Jiang, C. & Pugh, B. F. 
GeneTrack—a genomic data processing and visualization framework. Bioinformatics 24, 1305–1306 (2008). 49. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011). 50. Jiang, C. & Pugh, B. F. A compiled and systematic reference map of nucleosome positions across the Saccharomyces cerevisiae genome. Genome Biol. 10, R109 (2009). 51. Yen, K., Vinayachandran, V., Batta, K., Koerber, R. T. & Pugh, B. F. Genome-wide nucleosome specificity and directionality of chromatin remodelers. Cell 149, 1461–1473 (2012). 52. Badis, G. et al. A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell 32, 878–887 (2008). 53. MacIsaac, K. D. et al. An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics 7, 113 (2006). 54. Liang, K. & Keleş, S. Normalization of ChIP-seq data with control. BMC Bioinformatics 13, 199 (2012). 55. Bailey, T. L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994). Acknowledgements This work was supported by National Institutes of Health (NIH) grants ES013768, GM059055 and HG004160 to B.F.P.; National Science Foundation (NSF) ABI INNOVATION grant 1564466 to S.M.; grants from the Pennsylvania State University Institute for
  • 13. Computational and Data Sciences to B.F.P. and W.K.M.L.; and computation from Advanced CyberInfrastructure (ROAR) at the Pennsylvania State University. We thank D. Shao for her role as lead software engineer for the PEGR platform and for support through the Penn State Institute for Computational and Data Sciences (ICDS) Research Innovations with Scientists and Engineers (RISE) team. We thank O. Lang for operating EpitopeID.

Author contributions M.J.R. designed and conducted experiments; performed library sequencing and data analysis; designed and tested the quality-control pipelines and web page; trained and managed lab personnel to produce data; supervised the project and co-wrote the manuscript. P.K.K. designed, developed and implemented the quality-control pipeline, analysis pipeline and website; organized and maintained data files; and provided bioinformatic support. W.K.M.L. performed high-throughput data processing and analysis, and provided bioinformatic support and scientific discussion. N.Y. provided bioinformatic support and developed the initial quality-control pipeline. N.B. and C.M. performed ChIP–exo and MNase ChIP–seq experiments and provided scientific discussion. G.K. provided bioinformatic support. K.B. and N.P.F. conducted ChIP–exo experiments and performed library sequencing. T.R.B., J.D.M., A.V.B., K.S.M., D.J.R. and E.S.P. conducted ChIP–exo experiments. G.D.K. provided high-performance infrastructure architecture and development, and edge-computing infrastructure design and support. S.M. provided bioinformatic guidance and support. B.F.P. conceptualized the project and conclusions, designed experiments, analysed the data, wrote the main text of the manuscript and co-wrote the remaining parts.

Competing interests B.F.P. has a financial interest in Peconic, LLC, which offers the ChIP–exo technology (US Patent 20100323361A1) implemented herein as a commercial service and could potentially benefit from the outcomes of this research. 
The remaining authors declare no competing interests. Additional information Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41586-021-03314-8. Correspondence and requests for materials should be addressed to B.F.P. Peer review information Nature thanks Vishwanath Iyer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Reprints and permissions information is available at http://www.nature.com/reprints.
  • 14. Extended Data Fig. 1 | ChIP–exo targets within meta-assemblages. a, Simplified view of transcriptional regulation. A ssTF (TF; for example, Gal4) binds to its cognate motif (a UAS) within promoters in competition with chromatin/nucleosomes (red bar). The ssTF recruits (pink/green arrow) cofactors (for example, SAGA and Mediator) that assist in the assembly of a PIC (comprising TBP, TFIIB, and so on) and of Pol II at the transcript start site (TSS) of genes. Pol II then traverses the gene to the transcription end site (TES). b, Diagram showing the ChIP–exo assay. Protein targets are crosslinked to DNA, which is then fragmented. Specific proteins are captured through an engineered TAP tag that binds the common Fc region of any immobilized IgG. Near-base-pair resolution is achieved using a strand-specific λ exonuclease that digests each strand of DNA in the 5′–3′ direction up to the point of crosslinking. c, Pie chart showing assayed targets separated by broad GO-based classifications (inner ring), or by UMAP-based clustering of genome-wide binding locations (outer ring). Listed are the common names of ChIP–exo targets that generated significantly enriched locations (with ‘significance’ defined in the Methods section ‘ChExMix locations’), grouped by their UMAP/K-means-derived meta-assemblage abbreviations (along with membership count), which are further grouped by the simplified GO-related categories. See also Supplementary Data 2 (2H).
  • 15. Extended Data Fig. 2 | Data visualization and discovery in yeastepigenome.org. Shown is an example web browser view at yeastepigenome.org of ChIP–exo occupancy patterns for all targets (for example, Reb1) around predefined genomic features. Rows are sorted by gene or promoter (NFR/NDR) length, or by distance from the indicated reference feature (where x = 0). Promoter classes include (from top to bottom) RP, STM, TFO, UNB and others. See Supplementary Data 1 (1G, 1J, 1C) for the identification numbers and coordinates of respective row features, and for the sort order of features that are constant in all target display windows. The lower right box (when present) provides strand-separated tag 5′ ends distributed around the target’s cognate DNA motif, with the motif’s opposite strand (red) inverted in the composite plot. Corresponding colour-coded nucleotide sequences are shown. All images, underlying data values and datasets can be downloaded through embedded ‘METADATA’ target-specific links at yeastepigenome.org. Each dataset download includes a ReadMe file describing the contents of the download. We warn that targets with only a single replicate did not pass our significance threshold. See Supplementary Data 1 (1C) for sort orders that are not provided in the download.