Getting Started with Big Data
- 1. A Big Data Primer
Stacia Misner
E-mail: smisner@datainspirations.com
Twitter: @StaciaMisner
Blog: blog.datainspirations.com
- 2. Session Overview
• What’s the Fuss?
• What’s in the Big Data Stack?
• Where Do I Start?
Copyright © 2013 by Data Inspirations Inc. All rights reserved.
- 3. What’s the Fuss?
• Some Background…
• Classic Data Analysis versus Big Data
• Why Now?
• Why Bother?
- 4. Some Background…
Google Trends: “Big Data”
- 5. Has Big Data Jumped the Shark?
Volume
Velocity
Variety
Variability
- 6. Is Big Data the Next Frontier?
- 7. Classic Data Analysis …Uses Just a Subset
Data Warehouse & BI Solutions
ETL
- 8. Classic Data Analysis …Requires Structure
Data Warehouse & BI Solutions
ETL
- 10. Big Data versus Traditional BI
http://blogs.forrester.com/brian_hopkins/11-08-29-big_data_brewer_and_a_couple_of_webinars
- 11. Why Now? The Times… They Are A’Changin’
Cost of Storage Decreasing
• 1970: 1 TB, $1,000,000
• 2013: 1 TB, < $100
Direct attached storage, not Enterprise SAN!
- 12. The Times… They Are A’Changin’
Data Volumes Increasing
• All Books: 15 TB
• Daily Tweets: 15 TB
- 13. The Times… They Are A’Changin’
Processing Power Increasing
3 Billion Base Pairs to Analyze
• Then… 10 Years, Completed in 2003
• Now… 1 Week, At 1/10th the Cost
- 14. Why Now?
Powerful, Scalable, Cheap, Elastic
- 15. Why Bother?
• Make more data available faster
• Deliver access to more detailed, accurate information to adjust just-in-time
• Segment customers at more granular level for personalization of products and services
• Perform more sophisticated analytics
• Improve products
http://wiki.apache.org/hadoop/PoweredBy
Case Study
Customer, Product, Promotion Data -> Personalized Promotions
• Before Big Data: 8 weeks
• After Big Data: 1 week and dropping
- 16. What’s In the Big Data Stack?
• Key Differences
• Hadoop Ecosystem
• Hadoop and Analysis Services
- 17. Key Differences
• Basically Available, Soft-state, Eventually consistent
• Scale Out As Needed With Commodity Hardware
• Impose Schema On Read
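“Impose Schema On Read” can be illustrated with a minimal sketch: raw records are stored untouched, and structure is applied only when a question is asked. The sample records, field names, and the `read_with_schema` helper are hypothetical, chosen for illustration; they are not from the deck or from any Hadoop API.

```python
import json

# Raw events stored exactly as they arrived; nothing is validated or
# reshaped at write time. (Hypothetical sample data.)
RAW_LINES = [
    '{"user": "a", "amount": "19.99", "channel": "web"}',
    '{"user": "b", "amount": "5", "coupon": "SPRING"}',
    'not json at all',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: keep only the requested fields and
    cast each one, skipping records that do not fit."""
    for line in lines:
        try:
            record = json.loads(line)
            yield {name: cast(record[name]) for name, cast in schema.items()}
        except (ValueError, KeyError, TypeError):
            continue  # record does not match this schema; ignore it

# Two different questions impose two different schemas on the same raw data.
sales = list(read_with_schema(RAW_LINES, {"user": str, "amount": float}))
coupons = list(read_with_schema(RAW_LINES, {"user": str, "coupon": str}))
```

The same stored lines answer both questions; a schema-on-write warehouse would have had to decide up front whether `coupon` was a column.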
- 18. Hadoop Ecosystem
Note: This is only a subset of the ecosystem!
MapReduce
HDFS
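As a rough illustration of the MapReduce programming model named above, the following single-process sketch counts words in three phases: map, shuffle, and reduce. It only imitates the model in plain Python and does not use the Hadoop API; in a real cluster the input splits would be blocks stored in HDFS and the map and reduce tasks would run on different nodes. All function names here are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run the three phases in sequence over a list of input splits."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data primer", "big data stack"])
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be spread across many commodity machines without coordination beyond the shuffle.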
- 19. Problem to Solve
• Elasticity
  o Ability to analyze structured, unstructured data
  o DW imposes structure for questions we know we want answered
  o Need ability to incorporate other types of data on demand
• Scale
  o Low cost commodity hardware
  o Distributed workload
- 20. Hadoop & Analysis Services – High Latency
- 21. Hadoop & Analysis Services – Medium Latency
Linked Server
HiveODBC driver
- 22. Hadoop & Analysis Services – Medium Latency
Analysis Management Objects (AMO) to push data into SSAS
- 23. Hadoop & Analysis Services – Low Latency
Options:
• Impala (Cloudera)
• Spark and Shark (UC Berkeley)
• Stinger (Hortonworks)
- 24. Where Do I Start?
• Big Data Lifecycle
• Approaches
- 25. Big Data Lifecycle
Stages: Discovery, Data Preparation, Model Planning, Model Building, Result Communication, Production
Discovery:
• Look at internal/external processes – What is a challenge? Where could overwhelming advantage be useful?
• Formulate hypothesis
- 26. Big Data Business Models
- 27. Big Data Lifecycle
Data Preparation:
• Explore the data in a sandbox
• Condition the data
- 28. Big Data Lifecycle
Model Planning:
• Decide on methods and models
• Examine data for key variables
- 29. Big Data Lifecycle
Model Building:
• Create data sets for testing, training, and production
• Set up hardware environment
- 30. Big Data Lifecycle
Result Communication:
• Validate (or not) hypothesis
• Share findings
- 31. Big Data Lifecycle
Production:
• Pilot project
• Operationalize
- 32. Approaches – Store and Analyze
• Integrate and consolidate
  o Better data quality
  o Access to history
  o Higher storage requirements and latency impact
• Choose hardware
  o Massively Parallel Processing (PDW)
  o Tabular – data compression
  o RDBMS – column-store
  o NoSQL – multiple variable data sources
• Analyze data at rest
- 33. Approaches – Analyze and Store
• Filter and aggregate data before adding to DW
  o Reduce action time (receipt of raw data to decision point) to attain greater business agility
  o Lower storage and administrative overhead
• Analyze data in motion (complex event processing)
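The “filter and aggregate before adding to DW” idea can be sketched as a small windowed roll-up. This is an illustration under stated assumptions, not a complex event processing engine: the event tuple layout, the `filter_and_aggregate` name, and the threshold are hypothetical, and the events are a finite list, whereas a real engine would emit each window as it closes.

```python
from collections import defaultdict

def filter_and_aggregate(events, window_seconds=60, min_amount=0.0):
    """Consume (timestamp, product, amount) events and return one total per
    (window start, product), dropping events below min_amount."""
    totals = defaultdict(float)
    for timestamp, product, amount in events:
        if amount < min_amount:
            continue  # filter: discard noise before it reaches storage
        # Align the timestamp to the start of its fixed-size window.
        window_start = timestamp - (timestamp % window_seconds)
        totals[(window_start, product)] += amount
    return dict(totals)

# Hypothetical raw events: (seconds since start, product, amount).
events = [
    (0, "widget", 10.0),
    (15, "widget", 5.0),
    (30, "gadget", 0.5),   # below threshold, filtered out
    (70, "widget", 20.0),  # falls into the next window
]
summary = filter_and_aggregate(events, window_seconds=60, min_amount=1.0)
```

Only the two summary rows would be loaded into the warehouse instead of the four raw events, which is where the lower storage and administrative overhead comes from.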
- 34. Overwhelmed? Prototype First!
• Define a small project – focus on one product, for example
• Capture data for the subset of focus for limited duration (one month)
• Take action on analytics and measure resulting change
http://www.microsoft.com/bigdata
- 35. Session Review
• What’s the Fuss?
• What’s in the Big Data Stack?
• Where Do I Start?
- 36. Resources
• Big data has jumped the shark (9/11/2011)
  o www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/
• Big data: The next frontier for innovation, competition, and productivity (aka The McKinsey report)
  o http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation
• What a Big Data Model Looks Like
  o http://blogs.hbr.org/cs/2012/12/what_a_big-data_business_model.html
- 37. Resources
• Architectures for Running SSAS on Data in Hadoop Hive
  o http://thinknook.com/architectures-for-running-sql-server-analysis-service-ssas-on-data-in-hadoop-hive-2013-02-25/