This document summarizes three data visualizations created by a group of students to analyze the Divvy bike sharing dataset from Chicago. It describes the dataset used, which includes over 3 million records of bike trips and station information. The group created visualizations to show bike usage patterns by weekday/weekend and time of day, a network map of stations in downtown Chicago, and a circular network visualization of bike trips. The visualizations help answer questions about when and where riders are going, busy stations, and other usage patterns.
Divvy Bike Visualizations
Divvy Bike Challenge Visualizations
CSC 465 - DATA VISUALIZATION: FINAL PROJECT
GROUP #1: HASSAN AL ALAIWI, RICARDO LOURENÇO, AND MATT SIEDLECKI
March 2015
Contents
Abstract
Description
Scope
Dataset
Dataset Variables
Final Visualizations
    Usage by Weekday/Weekend and Time of Day
    Network Map of Chicago Loop
    Circular Network Visualization
Discussion
    Usage by Weekday/Weekend and Time of Day
    Network Map of Chicago Loop
    Circular Network Visualization and Related Analysis
Summary of Team Member Contributions
Abstract
This final project report presents our final data visualizations of the Divvy bikes dataset. It details each visualization technique used to display information about the dataset and the possible implications of the correlations found. To keep the report coherent and concrete, a few segments have been taken from the project milestones that were previously submitted as part of the project progress. Higher-resolution files of the final visualizations are enclosed.
Description
Every year, Divvy launches a data challenge, releasing its dataset so that participants can scrutinize and visualize the data under different categories. This year, Divvy celebrates its first full-year dataset (2014), with over 3.2 million rows of data, which makes the challenge even more demanding and more enticing for data scientists and other participants.
Scope
We are tasked with fulfilling our CSC 465 project's objective of visualizing a dataset using the best visualization techniques discussed throughout the course. Since both mapping and geographical data are available in this dataset, we have completed multiple graphs that visualize the dataset statistically and geographically. The software used for this purpose was RStudio, Tableau, JMP, and ArcGIS, which together provided a sufficient platform for our objective. These visualizations answer the following questions (footnote 1) as clearly and accurately as we could achieve:
➢ When and where are riders going?
➢ What are the most and least busy stations?
➢ What interesting usage patterns emerge?
➢ How can the riders' demographics be presented?
Dataset
The full-year dataset is broken down by quarter, with a total of 2.4 million records. Because of its size, we mainly worked with a simple random sample of 100,000 records, which is a manageable size and still sufficient for our educational purposes.
Source: Chicago Divvy Bikes website (annual data challenge)
Website: http://www.divvybikes.com/datachallenge
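The report does not say how the 100,000-record sample was drawn. As one possible illustration, the sketch below uses reservoir sampling, which draws a simple random sample in a single pass without holding the full table in memory. The function name `reservoir_sample`, the seed, and the toy data are assumptions made for this example, not part of the original workflow.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Draw k rows uniformly at random, without replacement, in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)      # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)      # randint is inclusive on both ends
            if j < k:
                reservoir[j] = row     # keep the new row with probability k / (i + 1)
    return reservoir


# Toy stand-in for the trip table: one integer per trip record.
trip_ids = range(10_000)
sample = reservoir_sample(trip_ids, k=100, seed=465)
print(len(sample))
```

For the real data, `rows` could be any iterator over the quarterly CSV files, with `k=100_000`.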
1 Some of these inquiries are part of the 2015 Divvy Data Challenge.
2 The description of the dataset variables was provided by the Divvy Data Challenge.
Dataset Variables
The dataset consists of two tables (sub-datasets), both of which are used in the project (footnote 2).
➢ Trips dataset (the main dataset): This is the main project dataset. It records every trip, that is, every transaction in which a bike is rented from a station. The following 12 variables are captured in every data row:
trip_id: ID attached to each trip taken – (Type: Category – serial key)
starttime: day and time trip started, in CST – (Type: Date&Time)
stoptime: day and time trip ended, in CST – (Type: Date&Time)
bikeid: ID attached to each bike – (Type: Category)
tripduration: time of trip in seconds – (Type: Numeric)
from_station_name: name of station where trip originated – (Type: Category)
to_station_name: name of station where trip terminated – (Type: Category)
from_station_id: ID of station where trip originated – (Type: Category)
to_station_id: ID of station where trip terminated – (Type: Category)
usertype: "Customer" is a rider who purchased a 24-Hour Pass; "Subscriber" is a rider who purchased an Annual Membership – (Type: Category)
gender: gender of rider – (Type: Binary)
birthyear: birth year of rider – (Type: Numeric)
➢ Stations dataset (table relationship dataset): This dataset holds the location details of the Divvy stations, which are used in the project to map the start and end locations of each bike trip in the main table. The 5 variables are:
name: station name – (Type: Category)
latitude: station latitude – (Type: GPS location)
longitude: station longitude – (Type: GPS location)
dpcapacity: number of total docks at each station as of 12/31/2014 – (Type: Numeric)
online date: date the station went live in the system – (Type: Date&Time)
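To make the trip schema above concrete, the following sketch parses trip rows into typed records. It is illustrative only: the timestamp format in `TIME_FORMAT`, the column order, the treatment of blank `gender` and `birthyear` values, and the two sample rows are all assumptions, not facts taken from the Divvy files.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

TIME_FORMAT = "%m/%d/%Y %H:%M"  # assumed layout, e.g. "7/1/2014 8:15"


@dataclass
class Trip:
    trip_id: str
    starttime: datetime
    stoptime: datetime
    bikeid: str
    tripduration: int          # seconds
    from_station_id: str
    from_station_name: str
    to_station_id: str
    to_station_name: str
    usertype: str              # "Customer" or "Subscriber"
    gender: str                # assumed to be possibly blank
    birthyear: Optional[int]   # assumed to be possibly blank


def parse_trip(row):
    """Convert one CSV row (a dict of strings) into a Trip record."""
    birthyear = row["birthyear"].strip()
    return Trip(
        trip_id=row["trip_id"],
        starttime=datetime.strptime(row["starttime"], TIME_FORMAT),
        stoptime=datetime.strptime(row["stoptime"], TIME_FORMAT),
        bikeid=row["bikeid"],
        tripduration=int(row["tripduration"]),
        from_station_id=row["from_station_id"],
        from_station_name=row["from_station_name"],
        to_station_id=row["to_station_id"],
        to_station_name=row["to_station_name"],
        usertype=row["usertype"],
        gender=row["gender"],
        birthyear=int(birthyear) if birthyear else None,
    )


# Two made-up rows, for illustration only.
SAMPLE_CSV = """trip_id,starttime,stoptime,bikeid,tripduration,from_station_id,from_station_name,to_station_id,to_station_name,usertype,gender,birthyear
1001,7/1/2014 8:15,7/1/2014 8:25,42,600,5,Station A,9,Station B,Subscriber,Male,1985
1002,7/5/2014 13:05,7/5/2014 13:45,77,2400,9,Station B,5,Station A,Customer,,
"""

trips = [parse_trip(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(trips[0].starttime, trips[0].tripduration)
```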
Final Visualizations
Visualization #1: Divvy Bike Usage by Weekday/Weekend and Time of Day
Visualization #2: Network Map of Chicago Loop
Divvy Bike Usage by Time of Day and Day of Week: Discussion
Overview
For this visualization, small multiple maps were combined with histograms to display information about Divvy bike usage by time of day and day of week. Reading horizontally, the maps show data in 4-hour blocks starting at midnight. The histograms at the top and bottom show total system usage. The segments drawn on each map illustrate the popular routes, selected using a combination of two thresholds: a maximum number of segments per map and a minimum usage needed to be considered for the map. The design decisions behind which bike trip segments were included are discussed further in the Design Considerations section.
The viewer is able to discern a number of pieces of information from this graphic, including:
• Usage is much higher during the week than on the weekend.
• Weekday usage has a bimodal distribution, with peaks during the morning and evening commuting times.
• More commuters use the bikes in the evening than in the morning.
• Weekend usage is much lower than weekday usage.
• Weekend usage has a unimodal distribution centered in the early afternoon.
• In general, off-peak hours have riders scattered throughout the city, especially near train stations, while usage is more concentrated during the day.
• Weekend usage is heavily concentrated along the lakeshore, Lincoln Park, Navy Pier, and some smaller tourist locations such as the Hyde Park Museum Campus.
• The lakeshore path is more prominent in evening commute hours than in morning commute hours, possibly due to higher system usage during that time.
Design Considerations
Number of Small Multiple Maps
The final visualization splits the entire day into 4-hour blocks and shows 6 maps for weekday rides and 6 for weekend rides. We chose to use an equal number of hours in each map to make it clear to the viewer. That gave us the option of 2-, 4-, 6-, or 12-hour blocks. Twelve-hour blocks were not seriously considered because they would not show many interesting patterns in the data. On the other hand, 2-hour blocks would have doubled the number of maps in the final visualization, and we concluded that would be too many. Ultimately, 4-hour blocks were convenient because they clearly differentiated the afternoon (noon to 4 PM) from the commuting hours after 4 PM.
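The binning described above can be sketched in a few lines. This is an illustrative reconstruction, not the code used for the visualization: the name `block_label` and the sample timestamps are made up, and the only inputs taken from the report are the 4-hour block width and the weekday/weekend split.

```python
from collections import Counter
from datetime import datetime

BLOCK_HOURS = 4  # block width used in the final visualization


def block_label(start):
    """Return (day type, first hour of the 4-hour block) for a trip start."""
    day_type = "weekend" if start.weekday() >= 5 else "weekday"  # Sat=5, Sun=6
    block_start = (start.hour // BLOCK_HOURS) * BLOCK_HOURS
    return day_type, block_start


# Made-up start times, for illustration only.
starts = [
    datetime(2014, 7, 1, 8, 15),    # Tuesday morning
    datetime(2014, 7, 1, 17, 40),   # Tuesday evening
    datetime(2014, 7, 5, 13, 5),    # Saturday afternoon
    datetime(2014, 7, 5, 14, 30),   # Saturday afternoon
]
usage = Counter(block_label(s) for s in starts)
for (day_type, block_start), n in sorted(usage.items()):
    print(day_type, f"{block_start:02d}:00-{block_start + BLOCK_HOURS:02d}:00", n)
```

With two day types and six blocks per day, this yields the 12 panels of the final layout.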
Trip Segments
It became clear early on that a key design decision was how to display the trip segments. Showing all segments produced a jumbled mess that yielded minimal useful information. We tried including all segments while drawing the lines very thin on low-traffic routes and thicker on higher-traffic routes, but the result was still very cluttered. Additionally, scaling the line thickness was problematic because the data was highly skewed and we did not want a small number of stations to dominate the visualization. Arguably that skew is an aspect of the data that could be better highlighted in this visualization; however, in the final visualization all lines are the same size (relatively thin, but viewable), which adds clarity by making it easier to see the trips that are included.
We experimented with both a threshold on the number of trips per 4-hour block and a ranking of the top n trips for that block. The tradeoff was that an absolute trip cutoff leaves the off-hours with literally no data (unless you overwhelm the peak charts), while showing the top n trips makes it appear at a glance that traffic is equal at all times of day when it is actually highly skewed toward certain times. Ultimately, a two-pronged approach was employed. First, we compromised and chose up to 75 trips for each map, but only included them if they averaged at least 5 trips per hour. This allowed us to show data for non-peak times without making it appear that traffic was comparable at all hours of the day. Second, the histograms (explained in greater detail in a subsequent section) add context about which times of day experience the most traffic.
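The two-pronged selection rule can be expressed as a small filter-then-rank step. The sketch below is a minimal illustration under stated assumptions: the report does not specify how "trips per hour" was averaged, so `hours_observed` is a hypothetical parameter for the number of hours the counts cover, and the function name and toy counts are invented for the example.

```python
def select_segments(route_counts, hours_observed,
                    max_segments=75, min_trips_per_hour=5.0):
    """Keep the busiest routes for one map, subject to a minimum hourly rate.

    route_counts: dict mapping (from_station, to_station) -> trip count
    hours_observed: hours covered by the counts (assumed averaging basis)
    """
    eligible = [
        (route, n) for route, n in route_counts.items()
        if n / hours_observed >= min_trips_per_hour
    ]
    eligible.sort(key=lambda item: item[1], reverse=True)  # busiest first
    return eligible[:max_segments]


# Toy counts for a single 4-hour block, for illustration only.
counts = {("A", "B"): 40, ("B", "C"): 20, ("C", "D"): 8}
print(select_segments(counts, hours_observed=4))
```

A quiet block may return fewer than 75 segments (or none), which is the behavior that keeps off-peak maps from looking as busy as peak ones.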
Background Tile Image
We experimented with a number of different backgrounds. There was a compromise to be made between showing additional context in a more detailed background layer and allocating more of the available pixels to data about the Divvy bikes. The first iteration used a very plain background that showed nothing beyond the lake and station locations. After getting feedback, subsequent iterations included a street map. To select a background, a number of options from the OpenStreetMap package in R were tried. The options that were considered are displayed below with the Divvy bike locations plotted on them. We attempted to strike a balance between showing enough context to let the viewer place an individual point on the map and not letting the background dominate.
Plot Area
There are a total of 300 Divvy bike stations in the city, but they are not distributed evenly across it. In particular, most of the station locations, and an overwhelming number of the most popular ones, are found in the Loop and the north side neighborhoods. We considered several approaches for cropping the map.
1. City limits
• One approach would be to include the entire city in the map. This would highlight the discrepancies that exist where certain parts of the city have no Divvy stations. The lack of Divvy stations in certain areas of the city was an interesting aspect of the data: the City sponsors Divvy, but stations are not available in all neighborhoods. However, this approach was not used in the final visualization, because we concluded that cropping the map around the entire city left a lot of "blank" space that could have been used more effectively by zooming in on existing locations.
2. Divvy station locations
• This was my chosen approach. It was a compromise between using the space to show the Divvy data and still showing all Divvy stations that exist for this analysis.
3. Zooming in only on stations above a popularity/usage threshold, or focusing on specific neighborhood(s) such as the Loop
• While a more focused analysis of a neighborhood could have been interesting, it could also have masked interesting patterns. For example, the greater weekend activity around the Hyde Park Museum Campus and further north along the lakefront trail would be lost if the map zoomed in.
Direction
It would have been very interesting to show the direction of the trips on the maps; however, the approaches that were explored did not improve the visualization. We experimented with using color to show direction, but ran into a couple of challenges. The first challenge was deciding which directions to group together under one color. I initially used two colors, adjusting the color of each line depending on whether the user was heading north or south. That added some information, but I concluded that it was confusing because in many cases (e.g., commuting time) the more relevant direction is whether or not you are headed toward the Loop. So in the next iteration I used color to indicate whether you were headed to or from the Loop, based on a cutoff of Madison Street. This was okay, but I felt that it was not effective for the popular bike stations in the West Loop near the train stations, and that it did not add much value for some of the weekend locations where people are less likely to be commuting. Although direction was explored, it was not a dimension included in the final visualization.
Histograms
The initial iteration of this visualization included only the small multiple maps, without the histograms that now appear at the top and bottom. The maps illustrate a number of interesting aspects of the data; however, they do not explicitly answer simple questions you might have if you were curious about Divvy bike usage by day of week and time of day. Specifically, since this visualization attempts to illustrate usage patterns by weekday/weekend and time of day, we wanted to make it very easy for the viewer to discern a few key facts about the data:
• Usage is much higher during the week than on the weekend.
• Weekday usage has a bimodal distribution, with peaks during the morning and evening commuting times.
• More commuters use the bikes in the evening than in the morning.
• Weekend usage is much lower than weekday usage.
• Weekend usage has a unimodal distribution centered in the early afternoon.
Our approach to illustrating those key aspects of the data was to build histograms and include them in the chart. Some thought went into where to place the histograms. It was a tradeoff between making the histograms show as much information as possible and avoiding confusion or distraction from the maps. Ultimately, I erred on the side of clarity and simplicity by putting the histograms above and below their respective maps. The histograms do not provide exact values (there are no labels), but they do illustrate the main themes outlined above. Adding labels and putting the two histograms on the same axis would likely have improved the histograms as standalone visuals, but as part of the larger visual, maintaining clarity was the driving design principle. We attempted to align the histograms with the small multiples so that each four-hour time period in the small multiples lines up with the corresponding portion of the histogram. This adds consistency across the individual components of the visual and helps the histograms aid in the interpretation of the maps.
Network Analysis: Discussion
Overview
For this analysis, the idea was to represent the overall flow between Divvy stations using the whole dataset. There are many possible representations of the Divvy dataset, but because the data is georeferenced, a map lets you see how this bike sharing system relates to the city and its infrastructure in a compact and accurate view.
Data and Systems Used
For this visualization we used the whole 2014 Divvy Challenge dataset, after some data transformation. First, all of the routes were summarized by grouping them by their origin and destination stations and IDs. This generated a calculated variable holding the record count for each route. After this, each record was georeferenced by merging the origin and destination fields (related to stations) with their respective geographical coordinates. This procedure was done in SPSS Modeler, because of the data size, with CSV files as input and output.
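The preprocessing above was carried out in SPSS Modeler. As a stand-in for readers without that tool, the following sketch shows the same two steps (count trips per origin/destination pair, then attach station coordinates) in plain Python. The function name, field names in the output, and the station coordinates are hypothetical and chosen only for illustration.

```python
from collections import Counter


def summarize_routes(trips, station_coords):
    """Count trips per (origin, destination) pair and attach coordinates.

    trips: iterable of dicts with from_station_id and to_station_id
    station_coords: dict mapping station ID -> (latitude, longitude)
    """
    counts = Counter((t["from_station_id"], t["to_station_id"]) for t in trips)
    routes = []
    for (origin, dest), n in counts.most_common():  # busiest routes first
        o_lat, o_lon = station_coords[origin]
        d_lat, d_lon = station_coords[dest]
        routes.append({
            "from_station_id": origin, "to_station_id": dest, "trips": n,
            "from_lat": o_lat, "from_lon": o_lon,
            "to_lat": d_lat, "to_lon": d_lon,
        })
    return routes


# Made-up trips and coordinates, for illustration only.
toy_trips = [
    {"from_station_id": "5", "to_station_id": "9"},
    {"from_station_id": "5", "to_station_id": "9"},
    {"from_station_id": "9", "to_station_id": "5"},
]
toy_coords = {"5": (41.88, -87.63), "9": (41.89, -87.62)}
routes = summarize_routes(toy_trips, toy_coords)
print(routes[0])
```

Each output row carries both endpoints, which is the shape a GIS tool needs to draw one line per route.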
Once the preprocessing was done, the dataset was opened in ArcGIS 10.2.2. The dataset containing each Divvy station was then loaded separately, and we also loaded a georeferenced CTA stations dataset obtained from the City of Chicago Data Portal. We hypothesized a possible relationship between CTA and Divvy, because people commuting may use both forms of transportation, which is why we also explored this supplemental data source.
Methodology
Displaying movement data on maps is tricky because, in addition to the two-dimensional data that maps usually display, there are additional dimensions including movement and time. Time-lapse cartography is a direct option: you can use a sequence of overlaid maps of the same region to try to identify differences across space. When you are dealing with small changes that do not affect the whole map, it is better to use a flow map or a network map.
A flow map is designed to represent a relation from one (or a few) source(s) to many destinations. Its usage comes from early representations of flows between countries in the colonial period of history.
Figure 1 - Example of a flow map. Charles Minard, "Carte figurative et approximative des quantités de vin français exportés par mer en 1864", lith. (835 x 547), 1865. Copy from http://en.wikipedia.org/wiki/Flow_map.
A network map, on the other hand, aims to show relations between many features and many others on a map. A popular use is for airline routes, with connections between local airports and major hubs:
Figure 2 - American Airlines OneWorld Map (http://www.aa.com/content/images/production/generic/onworld-map.jpg)
Another popular example of a network map was recently presented by Facebook, which displayed the connections between groups of users:
Figure 3 - Facebook User Connections (obtained from Facebook.com)
In this case, rather than just displaying the connections, the lines were overlaid on one another with transparency, giving an accumulated view of these relations that lets the viewer see clearly where the traffic comes from and goes to, as well as its intensity.
In our case, the goal was to display all routes in the whole dataset, using differential scaling and color grading to make the most used routes stand out and even give an estimate of the usage level, without sampling the data, so that every route is shown. This was a considerable effort, given that almost 2,400,000 trips are described in the dataset, which complicated georeferencing, loading, and transforming the data into lines, as well as finding a proper representation in terms of color, transparency, and, most importantly, scaling. Multiscale representation on maps is an old challenge, because such maps are normally interactive. With a GIS it is different, because each view can be rendered separately.
Representing the CTA dataset was also a challenge. The idea was to see its influence on the Divvy system. The design therefore draws a buffer at a fixed distance around each CTA station, to show where a possible zone of influence between CTA and Divvy stations lies, which would suggest commuting between the two systems. Two maps were created for this: one with a full view of the Divvy stations, and another at a more detailed level showing the specifics of this interaction in the Loop.
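The buffers themselves were drawn in ArcGIS. As a rough stand-in, the sketch below tests which Divvy stations fall within a given radius of a CTA station using the haversine great-circle distance. The half-mile default matches the buffer size the report cites in its Final Considerations; the Earth radius constant, function names, and all coordinates are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def stations_in_buffer(cta_station, divvy_stations, radius_miles=0.5):
    """Return names of Divvy stations within radius_miles of one CTA station."""
    cta_lat, cta_lon = cta_station
    return [
        name for name, (lat, lon) in divvy_stations.items()
        if haversine_miles(cta_lat, cta_lon, lat, lon) <= radius_miles
    ]


# Made-up coordinates, for illustration only.
cta = (41.880, -87.630)
divvy = {
    "near": (41.885, -87.630),   # roughly a third of a mile north
    "far": (41.900, -87.630),    # well over a mile north
}
print(stations_in_buffer(cta, divvy))
```

A GIS buffer also produces a drawable polygon, which this distance test does not; the sketch only answers the membership question.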
The maps presented show the flow among Divvy stations (red dots), with connections drawn in just two data categories: thinner green lines for routes with record counts between 1 and 1,000 (log10 values of 0 to 3), and thicker red lines for routes with record counts between 1,000 and 10,000 (log10 values of 3 to 4). The categories were defined on a logarithmic scale because route counts differ by orders of magnitude, which makes it possible to represent different levels properly on the same graphic.
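The two-category styling can be written as a small classification function. This is a sketch under one stated assumption: the report does not say which category a route with exactly 1,000 trips belongs to, so the boundary is assigned to the red class here. The function name and return format are invented for the example.

```python
import math


def route_style(trip_count):
    """Return (color, line weight, log10 magnitude) for a route's trip count."""
    if trip_count < 1:
        raise ValueError("a route needs at least one trip")
    magnitude = math.log10(trip_count)
    if trip_count < 10 ** 3:               # log10 in [0, 3): lower-traffic routes
        return "green", "thin", magnitude
    return "red", "thick", magnitude       # log10 of 3 and above (assumed boundary)


for n in (1, 250, 999, 1000, 8500):
    color, weight, magnitude = route_style(n)
    print(n, color, weight, round(magnitude, 2))
```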
Final Considerations
Looking at these maps, some inferences can be made. However, it is important to remember that this analysis does not establish causation in these relationships.
On the first map, with a broader view of the Divvy stations in Chicago, it is possible to see a high concentration of routes along the lakeshore that thins out inland, clearly shown by the predominance of red flows in the east grading to green. On this map it is also possible to see that for the southern and western stations, as well as many northern ones, no flow lines are drawn. Flow exists at those stations, but it is not represented on this map because the origin and destination station were often the same place.
Considering the half-mile buffers centered on each CTA station, it is possible to see that, apart from the Loop, where much of the traffic and many stations overlap, there is a relation between green flows and CTA stations, with a more radial distribution of lines heading inland and a few heading to the shore. Perhaps these are commuters traveling from their homes, workplaces, or leisure places to the CTA and then taking Divvy bikes for the remainder of their trip. Anonymization also limits our ability to merge these datasets and perform a deeper analysis of commuting patterns.
The second map focuses on the Loop. The main concentration of flows is around the Grand Central Station (Metra), within the Loop, Merchandise Mart, and the Magnificent Mile. Most of these flows can be inferred to be people going to and from work, because these three areas are closely tied to the main commuter train stations. A second major trend appears between the Adams/Wabash CTA station and Navy Pier, Millennium Park, the Chicago Yacht Club, and, further south, the Adler and Field Museums. Given these characteristics, it is also possible to suggest that these high-intensity routes are related more to tourism than to work, judging by the use and occupation of the space. The high traffic near the lakeshore stations reinforces this, as those are places where many people go for leisure activities at the beach and parks.
One lesson we took away from this part of the project is that map visualizations and their interpretation can be improved by using a full GIS system rather than just a map plot. A GIS adds interactivity as well as tools designed for spatial analysis, allowing the end user to explore the initial dataset and also to integrate it with others, amplifying the spatial analysis.
Circular Network Visualization: Discussion and Related Analysis
Visualization techniques:
Below are the top visualizations that were created as part of the same analysis that led to the circular network map visualization:
1. Divvy bikes rush hours
➢ Description: These visualizations highlight the utilization time of Divvy bikes. The heatmap plots the hours of the day on the x-axis and the days of the week on the y-axis. The count of rented bikes is represented by the cells of the heatmap matrix. The colors of the heatmap vary from green through yellow to red in response to the bike count levels, which I believe clearly draws the eye to the peak hours plotted in the red/orange colors. Through my analysis and examination of the data, I realized the importance of separating the user types (subscribers and customers) into separate plots for almost all my visualizations. On the left side, there is a plot of the clustering of the days of the week, which determines the order of the levels on the y-axis.
➢ Data Analysis:
Subscribers' heatmap: The highest demand for bikes on weekdays falls in the rush hours (7:00 – 8:00 and 16:00 – 18:00). There is also a smaller, more spread-out, but noteworthy demand between 10:00 and 15:00 on weekends. Subscribers also tend to rent and return bikes relatively late on Friday and Saturday nights, shown by the lighter green color. We can also see that subscribers seem to leave work a little early (or on time) on Fridays and therefore return home a little early, probably for weekend plans. On a similar note, we can use the scale to approximate the number of rentals per hour. It seems that more people use the bikes to return home than to go to work. People probably want to avoid arriving at work sweaty and tired, or arriving late, and therefore prefer to arrive refreshed and on time.
Customers' heatmap: Non-subscribers (casual customers) have an inverse demand pattern. Their highest demand is on weekends between 10:00 and 19:00. On weekdays, most demand occurs between 11:00 and 18:00. It is also worth mentioning, based on the cluster analysis, that more customers use Divvy bikes on the first and last working days of the week (Monday and Friday, respectively). I can only assume that tourists visiting the city tend to take one day off work (Monday or Friday) to have a longer weekend, which would explain the busier traffic on the first and last weekdays.
➢ History of revisions: This final heatmap evolved over time. I started with the simple heatmap function that was covered in class. Then I researched other heatmap options available in R and discovered the newly created heatmap3 package, launched in June 2014. Further revisions were made to the map, including colors, data clustering, scale, and axes. One very tricky part was reformatting the data into matrix form, which is necessary for heatmaps. Regrouping the data rows by their corresponding hours and days took a lot of time and research. In the final graph, the two user types were separated to enable more in-depth analysis.
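The matrix reshaping mentioned above (days of the week as rows, hours of the day as columns) can be sketched as follows. The original work was done in R; this Python version is only an illustration of the regrouping step, and the function name and sample timestamps are made up.

```python
from datetime import datetime


def usage_matrix(start_times):
    """Build a 7 x 24 matrix of trip counts: rows Monday..Sunday, columns hours 0..23."""
    matrix = [[0] * 24 for _ in range(7)]
    for ts in start_times:
        matrix[ts.weekday()][ts.hour] += 1   # weekday(): Monday=0 ... Sunday=6
    return matrix


# Made-up start times, for illustration only.
starts = [
    datetime(2014, 7, 7, 8, 5),     # Monday, 8 AM hour
    datetime(2014, 7, 7, 8, 50),    # Monday, 8 AM hour
    datetime(2014, 7, 12, 13, 20),  # Saturday, 1 PM hour
]
matrix = usage_matrix(starts)
print(matrix[0][8], matrix[5][13])
```

Building one matrix per user type (subscribers and customers) gives the two inputs needed for the separate heatmaps.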
[Figures: Version 1, Version 2]
2. Divvy bikes traffic flow among Chicago districts
➢ Description: These two visualizations highlight the traffic flow of Divvy bikes between Chicago districts for both subscribers and customers. Each shows an inner arc and three outer arcs. The inner arc represents the Chicago districts (six districts are present in this database), each with a different color. Each district has two sets of arrows/lines: the set with the same color as the district represents outgoing traffic (Divvy bikes) starting from that location, whereas the set with a different color represents incoming traffic arriving at that location. The overall magnitude of a district's traffic is represented by the scale on the inner arc, whereas the magnitude of each arrow/route is represented by the thickness of the arrow. Each of the outer arcs represents a percentage of traffic flow. The first arc (the outermost) shows the percentage weight of the overall (incoming and outgoing) traffic in that particular district. The second outer arc shows the percentage of incoming traffic. The third arc shows the percentage of outgoing traffic. These arcs are mainly used for comparison purposes.
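The percentages shown on the outer arcs can be derived from a district-to-district flow table. The sketch below is one plausible reading, not the diagram software's actual computation: it assumes each percentage is taken relative to all trips in the table, and that the overall share counts each trip once at its origin and once at its destination. The function name, district flows, and numbers are invented for illustration.

```python
from collections import defaultdict


def district_shares(flows):
    """Compute outgoing, incoming, and overall traffic shares per district.

    flows: dict mapping (origin_district, destination_district) -> trip count
    """
    outgoing = defaultdict(int)
    incoming = defaultdict(int)
    for (origin, dest), n in flows.items():
        outgoing[origin] += n
        incoming[dest] += n
    total = sum(flows.values())
    shares = {}
    for district in sorted(set(outgoing) | set(incoming)):
        out_n, in_n = outgoing[district], incoming[district]
        shares[district] = {
            "outgoing_pct": 100 * out_n / total,
            "incoming_pct": 100 * in_n / total,
            # each trip has two ends, so the overall shares sum to 100
            "overall_pct": 100 * (out_n + in_n) / (2 * total),
        }
    return shares


# Made-up flows between two districts, for illustration only.
toy_flows = {
    ("North Side", "North Side"): 50,
    ("North Side", "Loop"): 30,
    ("Loop", "North Side"): 20,
}
shares = district_shares(toy_flows)
print(shares["North Side"])
```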
➢ Data Analysis:
Subscribers' network diagram: From the traffic scale, it appears that the North Side is the busiest location, with the largest traffic, whereas the South Side is the least busy area. Many users commute to stations within the North Side area or to the North Loop. Chicago's North Side is considered to be the most densely populated residential area (footnote 3), which explains the heavy Divvy traffic. Interestingly, the Loop and West Loop areas have almost the same magnitude of traffic flow, although I was expecting busier traffic in the city center. In addition, traffic in the West Loop and South Loop looks almost symmetrical. When comparing the two outer arcs (incoming and outgoing), we can see that they are identical: they have the same magnitude and even the same color order, which means that traffic flow in these two areas is very consistent. We can also note minimal traffic between far-apart areas, such as the southern and northern areas.
Customers' network diagram: As in the subscribers' diagram, the North Side is still the busiest location for casual customers. What is interesting is that the two diagrams show exactly the same trend in the North Side: major traffic occurs within the North Side and to the North Loop. Another interesting observation is that the West Loop has shrunk significantly, yet it is still very symmetrical. Unlike subscribers, casual customers are less interested in using Union and Ogilvie train stations, which account for heavy traffic flow to and from the suburbs. In both diagrams, trips from/to the North Side and North Loop account for between 50% and 70% of the overall traffic at Divvy stations.
➢ History of revisions: At the beginning, I was not sure of the best way to visualize the traffic from/to Divvy stations in a meaningful way. I started with a simple heatmap to display the network simply and effectively. It worked, but it was not very evocative and the conclusions did not stand out. Then I came across the new software and tried to map all 250 stations into one network diagram. The graph was not expressive and looked like spaghetti: the station names overlapped and the thickness of the lines offered no insight. An important suggestion arose during the final presentation: to group the stations together. So I made an attempt to group stations by Chicago's 76 neighborhoods. The diagram became much better but required some additional grouping. The final visualization groups the neighborhoods further by their geographical locations (Chicago districts; footnote 4). The trickiest (and most time-consuming) part was grouping stations by their locations. Being an international student, it was a very fun exercise to get to know the different neighborhoods of Chicago. I used Wikipedia and the Chicago Portal to go through the locations precisely and build the final version.
[Figures: Version 1, Version 2, Version 3, Version 4]
3 Source: http://en.wikipedia.org/wiki/Community_areas_in_Chicago#North_side
4 Source: http://en.wikipedia.org/wiki/Community_areas_in_Chicago
3. Divvy bike rentals over the seasons of the year
Ø Description:
I wanted to plot a time series visualization; the idea came to me suddenly while I was doing some research, and it seemed unique. The x-axis represents the timeline of 2014. There are two y-axes: the left axis represents total bike rentals, whereas the right axis represents the mean temperature (°F). The two lines are distinguished by different colors.
Ø Data Analysis:
Here we are trying to see the correlation between temperature and the number of rented bikes over time. We know that it is difficult to use bikes during rainy or snowy seasons, which are in general associated with temperature. During the months of December through March, when the temperature is around 20 °F, we have the least rental activity. However, when the temperature starts warming up, rentals start to pick up until they reach the peak season, June through September, when the temperature is around 70 °F. We can see that the two curves are almost identical in shape, with the exception of a few outlier days when rentals dropped or rose dramatically against the trend. These outliers could be explored further by adding a new dimension to the dataset, possibly data on Chicago events.
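The visual impression that the two curves move together, apart from a few outlier days, could also be checked numerically. The sketch below is one possible approach and was not part of the original analysis: it computes a Pearson correlation between daily mean temperature and daily rentals, then flags days whose rentals sit far from a straight-line fit on temperature. The function names and the threshold of two standard deviations are assumptions.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def outlier_days(days, temps, rentals, z_threshold=2.0):
    """Return the days whose rentals deviate strongly from a linear fit.

    A least-squares line rentals ~ temperature is fitted, and a day is
    flagged when its residual exceeds `z_threshold` standard deviations
    of all residuals (an assumed, adjustable cut-off).
    """
    mt, mr = mean(temps), mean(rentals)
    slope = sum((t - mt) * (r - mr) for t, r in zip(temps, rentals)) / sum(
        (t - mt) ** 2 for t in temps
    )
    intercept = mr - slope * mt
    residuals = [r - (intercept + slope * t) for t, r in zip(temps, rentals)]
    spread = pstdev(residuals)
    if spread == 0:
        return []
    return [d for d, res in zip(days, residuals) if abs(res) / spread > z_threshold]
```

The flagged days would be the natural candidates to look up in an events dataset, as suggested above.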
Ø History of revisions:
The most challenging part here was the need for a continuous timeline in order to have an accurate time series. Since I had been working on sample data, I had to go back and work with the original, complete dataset (2.4MM data rows) for this visualization. In each version, I plotted the time data against different statistical variables in order to explore different aspects of the data.
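A continuous timeline means every calendar day appears in the series, including days with no recorded trips. A minimal sketch of that step, assuming each trip has already been reduced to the date on which it started (the function name `daily_counts` is hypothetical):

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(trip_dates, start, end):
    """Count trips per day over [start, end], filling missing days with 0.

    `trip_dates` is an iterable of datetime.date values, one per trip.
    Returns a list of (day, count) pairs in calendar order.
    """
    counts = Counter(trip_dates)
    n_days = (end - start).days + 1
    series = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        series.append((day, counts.get(day, 0)))
    return series
```

Working from a sample of the data can leave gaps on low-traffic days, which is one reason the full dataset was needed; filling gaps with zero is only appropriate when the input covers all trips.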
[Figures: Version 1, Version 2, Version 3]
4. Divvy bike station capacities and concentrations
Ø Description:
The two visualizations associate geographical variables with numerical variables in order to be more meaningful and informative. In the first graph, the number of docks (bike spaces) is plotted at each station's location. A color scale of dock counts is displayed to quickly identify larger stations. The second graph plots Divvy users according to their station locations. The frequency of rentals is shown using a darker color scale.
Ø Data Analysis:
The first graph shows the geographical distribution of the Divvy bike stations around the city of Chicago. The colors indicate the dock capacity of each station. The red stations are those with a large dock capacity, namely those around Navy Pier, Millennium Park and Union/Ogilvie Station. I’ve also added a contour (level 7) to give the map a smoother look. Locations with a larger number of docks indicate that more rental transactions occur there, which leads to two possibilities: either stations run out of bikes (high demand as a starting point) or stations run out of space (high returns as a final destination). In either case, an increase in dock space was deemed necessary.
The second graph breaks the geographical map down by user type. This graph enables us to better explore commuters’ final destinations. It appears that most “customer” bikers ride north to the Lincoln Park Zoo and all the way south to the Museum Campus, passing by the Magnificent Mile and Navy Pier. The “subscriber” bikers, however, cluster around the busy Loop area, mainly around the Union/Ogilvie stations. This graph helps us identify locations dense with certain Divvy users, and this information could therefore be used for marketing or customer advocacy purposes.
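The per-station, per-user-type frequencies behind the second map could be tallied along the following lines. This is a sketch rather than the procedure actually used: the function names are hypothetical, and the user-type labels "Customer" and "Subscriber" are assumed values for the user-type field.

```python
from collections import Counter, defaultdict


def rentals_by_station_and_type(trips, allowed_types=("Customer", "Subscriber")):
    """Count rentals per station, split by user type.

    `trips` is an iterable of (station_name, user_type) pairs.
    User types outside `allowed_types` are ignored.
    """
    counts = defaultdict(Counter)
    for station, user_type in trips:
        if user_type in allowed_types:
            counts[station][user_type] += 1
    return counts


def dominant_type(counts, station):
    """Return the most frequent user type at `station`, or None if no data."""
    station_counts = counts.get(station)
    if not station_counts:
        return None
    return station_counts.most_common(1)[0][0]
```

Joining these counts to each station's latitude and longitude gives the values that a darker or lighter color on the map would represent.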
Ø History of revisions:
I started plotting this data in Tableau, which nicely plotted the data points (longitude and latitude) on the map. However, I could not move much further from there. Therefore, I used another software package that worked better with GIS data in order to build a more detailed map using OpenStreetMap. The second version included an unexpected user type (“dependent”), which appeared to be a special case. A recommendation arose during the final presentation to exclude this category, as it was supposed to have been cleaned by the Divvy technical team.
[Figures: Version 1, Version 2]
Summary of Team Member Contributions
Throughout the project, the team collaborated on many pieces of the project, including:
• Sharing output of data pre-processing steps
• Suggesting software that could be relevant for various tasks
• Providing feedback on visuals
• Reviewing final analysis
Each of the three team members was the primary author of one of the final visualizations (and of the main section discussing it).
Matt Siedlecki: made the traffic by weekday/weekend and time of day patterns using small multiples
Ricardo B. Lourenço: created the network map focused on the Loop, utilizing log-scaling to summarize the full dataset
Hassan A Al Alaiwi: prepared the circular network visual, as well as several related pieces of analysis