The UCSC genome browser: A Neuroscience focused overview

Vicky Perreau, The Florey Bioinformatics Core
Tuesday 17th March 2015
vperreau@unimelb.edu.au

Overview

•  Browser
   – Training
   – Configuration
   – Manipulation
   – Navigation
•  Locating and loading ENCODE data
•  Data types

UCSC genome browser

•  Purpose
   – Lots of data
   – Customisable
   – Detailed info pages
   – Access images (VisiGene)
   – Access sequence information (FASTA)
   – Do sequence alignments
      •  BLAT
      •  Virtual PCR

UCSC genome browser

•  Structure
   – Built upon tables of data
   – Each table must have genomic coordinates
      •  E.g. list of known genes
   – Browser visualizes the data
   – Endless customizable searches
      •  Correlating one type of data with another
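
Because every table carries genomic coordinates, correlating one type of data with another comes down to finding rows whose coordinates overlap. The following is a minimal sketch of that idea, not the browser's own implementation; the `Interval` type, the function names and all coordinates are made up for illustration, and intervals are assumed to be 0-based with an exclusive end.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention for this sketch)
    end: int    # 0-based, exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def correlate(table_a, table_b):
    """Return (name_a, name_b) pairs for every overlapping pair of rows."""
    return [(a.name, b.name) for a in table_a for b in table_b if overlaps(a, b)]


# Made-up rows, for illustration only (not real annotation coordinates).
genes = [
    Interval("chr1", 100, 500, "geneA"),
    Interval("chr1", 900, 1200, "geneB"),
    Interval("chr2", 100, 500, "geneC"),
]
snps = [
    Interval("chr1", 250, 251, "snp1"),
    Interval("chr1", 700, 701, "snp2"),
    Interval("chr2", 499, 500, "snp3"),
]

pairs = correlate(genes, snps)
print(pairs)
```

The nested loop is fine for a handful of rows; for genome-scale tables a sorted or indexed approach would be the usual choice.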

Free OpenHelix tutorials: a great introduction

Default view for tracks in human hg19

MBP

String search or location
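
A location is normally written as chromosome, start and end in the form `chrom:start-end`, often with commas in the numbers. The sketch below shows one way such a string could be split into its parts before use in a script; the function name, the regular expression and the example coordinates are assumptions made for illustration.

```python
import re

# chrom, then ':' and two numbers (commas allowed) separated by '-'.
POSITION_RE = re.compile(r"^(?P<chrom>[\w.]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_position(text):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    match = POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return match["chrom"], start, end


# Made-up coordinates for illustration only.
parsed = parse_position("chr1:1,000-2,500")
print(parsed)
```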

Organisation of genomic data (customizable)

•  Chromosome band
•  Gap locations
•  Known genes
•  Predicted genes
•  Phenotype and disease
•  Enhancer/promoter data
•  Microarray expression data
•  Evolutionary conservation
•  SNPs and structural variation
•  Repeated regions

Types of Data

Reference sequence

Annotation tracks

Gene/protein information

Comparison with other species

SNPs

NGS data: raw data to bigWig files

filename.fastq = raw sequence data, sequence and quality scores only.

filename.bam = aligned sequence data, sequence data preserved.

filename.bedgraph = position data only for reads, no sequence data preserved.

filename.bigwig = histogram of coverage for genomic position only, reads and sequence data not preserved. Small file size allowing for ease of use in genome browsers and overlay of multiple bigWig files.
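
To make the loss of information along this chain concrete, the sketch below reduces a few aligned reads to a coverage histogram and prints it as four-column, bedGraph-style lines (chromosome, start, end, value). It is an illustration under stated assumptions rather than a real pipeline: the reads and the chromosome name are made up, positions are 0-based with an exclusive end, and actual conversions are normally done with dedicated tools rather than hand-written code.

```python
from collections import Counter


def coverage_runs(reads, chrom):
    """Collapse per-base read depth into (chrom, start, end, depth) runs.

    `reads` is a list of (start, end) pairs, 0-based and end-exclusive.
    The read sequences themselves are never needed, only their positions.
    """
    depth = Counter()
    for start, end in reads:
        for pos in range(start, end):
            depth[pos] += 1

    runs = []
    for pos in sorted(depth):
        if runs and runs[-1][2] == pos and runs[-1][3] == depth[pos]:
            # Same depth as the adjacent previous base: extend the current run.
            runs[-1] = (chrom, runs[-1][1], pos + 1, depth[pos])
        else:
            runs.append((chrom, pos, pos + 1, depth[pos]))
    return runs


# Three made-up aligned reads on a hypothetical chromosome.
reads = [(0, 5), (3, 8), (10, 12)]
runs = coverage_runs(reads, "chrDemo")
bedgraph_lines = ["\t".join(map(str, run)) for run in runs]
print("\n".join(bedgraph_lines))
```

Only the depth at each position survives, which is why coverage files stay small and several of them can be overlaid in a browser.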

NGS data: coverage plots for RNAseq data

Sebastian Schubert et al. Blood 2014;124:493-502
General features of an mRNA transcript as visualized by RNA-seq.

Types of Data

NGS data coverage plot (histogram) is continuous.

SNP positions are discrete

Gene models: Line height denotes exon, intron or UTR

Arrows show direction of transcription

Whole page overview

Expression (such as microarray)
Variation and Repeats
(including SNPs, copy number variation)
Groups of data (Tracks)
Mapping and Sequencing Tracks
Genes and Gene Prediction Tracks
(including sno/miRNA data)
Phenotype and Disease Tracks
Regulation (including TFBS)
mRNA and EST Tracks
Comparative Genomics
• As a group
• Individual species

Originally selected gene is in black

Drag like Google Maps

Shift/mouse (right click) to select region to zoom or highlight region

Range covered in view

Data from the gene detail page and links out to other resources

informative
description
other resource links
microarray data
mRNA secondary structure
links to sequences
protein domains/structure
orthologs in other species
Gene Ontology™ descriptions
mRNA descriptions
pathways
genetic association studies
comparative toxicology
gene model

Select a track of interest

Link out to Allen Brain Atlas

Drag to reorder

Available CNS expression data in hg19

BDNF expression by RNAseq

ENCODE project

•  In 2003 the National Human Genome Research Institute embarked upon:
   •  The ENCyclopedia Of DNA Elements (ENCODE)
   •  Aim to delineate all of the functional elements in the human genome. More recent data includes a lot of mouse data.
•  Goal:
   •  To provide the scientific community with high quality, comprehensive annotations of candidate functional elements in the human genome.
•  Functional elements?
   •  “Discrete region of the genome that encodes a defined product (e.g. protein) or a reproducible biochemical signature, such as transcription or specific chromatin structure”
•  Developed detailed experiment guidelines.
   •  A great resource if you are considering designing your own NGS experiment (https://www.encodeproject.org/about/experiment-guidelines/)

ENCODE: data use policy

•  Early phase:
   •  Moratorium on public presentation or publication of data until 9 months after release.
•  Now:
   •  All data produced will be available for unrestricted use immediately upon release to public databases, eliminating the nine-month moratorium previously used by ENCODE.
   •  External data users may freely download, analyze and publish results based on any ENCODE data without restrictions as soon as they are released.
   •  Must include appropriate citation.

https://www.encodeproject.org/about/data-use-policy

ENCODE: accessing data

•  2003-2007: Pilot phase examining 1% of the genome
•  2007: expanded to study entire genome
•  2012: 30 high profile articles published
•  2014: >150 experiments using brain or spinal cord released
•  UCSC was the original Data Coordination Center for ENCODE and data prior to 2013 is fully integrated.
•  ENCODE results from 2013 and later are available from the ENCODE Project Portal.

http://genome.cse.ucsc.edu/encode/

Link to ENCODE portal

Lots of CNS data made public in 2014

View expression data in UCSC with a few mouse clicks…
Filter datasets on desired criteria.
BigWig files are easy to view in UCSC.

Select an experiment…

Click “Visualise data” button

Enter gene name
Note: Not all experiments have a “visualise data” button.

For some experiments you can download the bigWig file and upload it into UCSC as a custom track. Data from some experiments may require some additional formatting for viewing in a genome browser.
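
In practice a bigWig custom track is usually described by a single track line that points at a web-accessible copy of the file, which is then pasted into the custom track form. The sketch below builds such a line; the helper name and all the values are hypothetical placeholders, and the attributes shown (`type`, `name`, `description`, `bigDataUrl`) are only the basic ones.

```python
def bigwig_track_line(name, description, big_data_url):
    """Build a one-line custom-track definition pointing at a hosted bigWig."""
    return (
        f'track type=bigWig name="{name}" '
        f'description="{description}" bigDataUrl={big_data_url}'
    )


# Hypothetical values: replace with your own track name and file location.
line = bigwig_track_line(
    name="Demo RNAseq",
    description="Example coverage track",
    big_data_url="https://example.org/data/demo.bigWig",
)
print(line)
```

The file itself has to sit somewhere the browser can reach over the web, so a bigWig downloaded to a local disk would first need to be placed on such a server.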

Transcription from minus strand
Custom tracks automatically loaded at top of the browser

Transcription from plus strand

Older ENCODE tracks are preloaded in UCSC browser

•  Look for the NHGRI logo
•  Select Human (GRCh37/hg19) Assembly

GENCODE gene models: from the ENCODE data

GENCODE: annotate all evidence based gene features with high accuracy

Gene model tracks from different resources may vary.

View transcription data

MBP expression in 7 cell lines

Select region and add vertical highlight

Transcriptome data

•  Other tracks in the “expression” block of tracks supply data on
   – Poly A status
   – Subcellular localisation
   – Proteogenomics (mapping peptide locations)
   – Start and end points of RNA molecules in cells
   – Exon array and RNAseq data both available
•  Choose them all, but one at a time to start with. It’s a lot of data!

Drill down to multiple layers

•  Tracks with similar data collected together:
   – Super tracks
•  View metadata
•  Many customizable options
   – Custom filtering thresholds
      •  Level of detection
      •  Dependent on project and technology
   – Cell lines on or off
   – Replicates on or off
   – Viewing options

Expression levels from Sestan Brain data

Custom tracks (neuroscience)

Human

Mouse

GWAS of bipolar disorder showing SNPs

Monoallelic expression in mouse CNS cell lines

Li SM, Valo Z, Wang J, Gao H, Bowers CW, et al. (2012) Transcriptome-Wide Survey of Mouse CNS-Derived Cells Reveals Monoallelic Expression within Novel Gene Families. PLoS ONE 7(2): e31751. doi:10.1371/journal.pone.0031751

http://127.0.0.1:8081/plosone/article?id=info:doi/10.1371/journal.pone.0031751

Glutamate Receptor, Ionotropic, AMPA 3

Use configure to increase the width of the track name column to view complete cell line names

Monoallelic expression preserved after differentiation into neurons and astrocytes

Li SM, Valo Z, Wang J, Gao H, Bowers CW, et al. (2012) Transcriptome-Wide Survey of Mouse CNS-Derived Cells Reveals Monoallelic Expression within Novel Gene Families. PLoS ONE 7(2): e31751. doi:10.1371/journal.pone.0031751

http://127.0.0.1:8081/plosone/article?id=info:doi/10.1371/journal.pone.0031751

Brain RNAseq

http://web.stanford.edu/group/barres_lab/brain_rnaseq.html

Cell type specific splice variants of APP

Additional RNAseq expression data available from BrainSpan

Type gene of interest into search bar.

Click here to get RNAseq expression data.
Find genes with similar expression profiles across region and/or developmental age.
First select gene

RNAseq data view: sorted by tissue region

Exon location (grey box)
White arrow denotes sample
Change sort order from region to age
Download

RNAseq data view: sorted by age

Change sort order from region to age
Increasing age 8 pcw to 40 years

Other genome browsers

•  Ensembl
   •  http://asia.ensembl.org/index.html
•  WashU browser
   •  http://epigenomegateway.wustl.edu/browser/
•  IGV
   •  http://www.broadinstitute.org/igv/

Viewing BDNF in human brain RNAseq data in Ensembl

Viewing BDNF in human brain RNAseq data in UCSC

Peak expression does not correspond with the genomic location of a coding exon for BDNF, but rather to a region of the processed non-coding antisense transcript, transcribed off the opposite strand.

Inhibition of BDNF antisense transcript increased BDNF protein

BDNF antisense transcript level reduced

BDNF protein levels increased

Acknowledgements

If you use a database in your research please acknowledge it.

•  Most websites have a page where they specify how to acknowledge them, usually by most recent publication.
•  Citation or acknowledgement is their main means of applying for continued funding.

If they can't get funding one of three things will happen:

•  They are no longer free.
•  They are no longer maintained.
•  They no longer exist!

Caution:

•  Check the update/news page of an unfamiliar website. Some are still accessible but not maintained. Informatics resources go out of date quickly in this field. Look for a recent NAR publication.
•  Be sure of your gene/protein ID. Synonyms can cause havoc when searching the literature and databases (esp. PPI databases). If necessary check the DNA/AA sequence.
