This document provides 4 pieces of advice for big data initiatives:
1. Don't get bogged down in technology; focus on business goals.
2. Find a data quality and enrichment provider to help with complex data challenges.
3. Be realistic about what can be achieved given investment and data constraints.
4. Scale is expensive, so sample data when possible to reduce costs and complexity.
4. Complex Environment
[Word cloud: Data Science, Big Data, Predictive Analysis, Machine Learning, Marketing Analytics, Sales Analytics, Columnar Databases, Data Cubes, Hadoop, Hive, Spark, Pig, Impala, ETL, Web Analytics, Churn, Segmentation, Clustering, Drill, Propensity, Uplift, Business Intelligence, Chief Intelligence Officer, Data Warehouse, Information Valuation, Entity Linkage, De-duplication, Immutable Store, Mesos, Supervised, Unsupervised, Non-parametric]
5. Big Data
Gartner believes big data is neither a technology
nor a distinct and uniquely measured market of
products. We believe it is a phenomenon brought
about by rapid data growth, complex new data
types and parallel advancements in technology,
all combining to enable people to analyze
information in new ways to produce more useful
insights about the world around them.
Brownelles November 14, 2012
7. What is changing?
[Figure: axes labeled Audience (Experts, Intermediate, Beginners), Data Sources (A Few, Tens, Hundreds, Many), and Algorithms (Experimental, Value Focused)]
8. Complexity and Direction of Predictive Big Data
A few lessons that may increase your likelihood of success.
9. 1st (of 4) Advice: Don’t get bogged down in technology.
[Figure: technologies plotted by Data Access (Query Expressiveness) against Scale: HDFS, HBase, ParAccel, Redshift, Cassandra, Couchbase, Cascading, Riak, MySQL, Vertica, InfoBright, VectorWise, Spark, CitusData, WibiData, Phoenix, MSSQL, MSAS, Mahout, Map/Reduce, R, MatLab, SciPy, Snow, Hive, Impala, Drill, Pig]
10. 2nd (of 4) Advice: Find a data quality and enrichment (DQE) provider
Complex: entity linkage, fuzzy matching, external data, de-duplication
Repetitive & Scale: continuous, lots of data
Common: necessary but not unique
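To make the fuzzy matching and de-duplication items concrete, here is a minimal sketch using only the Python standard library. The customer names, the `normalize` and `deduplicate` helpers, and the 0.85 threshold are all assumptions chosen for illustration; a real data quality provider would use far more robust matching.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(records, threshold=0.85):
    """Greedy de-duplication: keep a record only if it is not
    similar (ratio >= threshold) to any record already kept."""
    kept = []
    for record in records:
        if all(similarity(record, k) < threshold for k in kept):
            kept.append(record)
    return kept


# Hypothetical customer names, made up for this example.
customers = ["Acme Corp", "ACME  Corp", "Acme Corp.", "Globex Inc"]
unique_customers = deduplicate(customers)
print(unique_customers)
```

Each record is compared against every record already kept, so the work grows roughly quadratically with the number of records. Even this toy version hints at why de-duplication becomes a scale problem on large data sets.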
11. 3rd (of 4) Advice: Be Realistic
[Figure: solutions plotted from Narrow solution to Customized and from Low Investment to High Investment; *size indicates return]
12. 4th (of 4) Advice: Scale is expensive, sample when you can.
http://www.agilone.com/email-marketing/what-you-shouldnt-need-to-know-about-big-data-and-machine-learning/
Data \ Relation | Simple | Complex | Noisy | Biased
Big Data | Overkill | ✓ | ✓ | N/A
Large sample | Overkill | ✓ | ✓ | ≈✓
Small sample | ✓ | ✗ | ✗ | ✗
Task | Data set for learning | Data set for scoring
Propensity to buy | Sample | Complete
Customer clustering | Sample | Complete
Customer segmentation | Sample | Complete
U2P Recommendation | Sample | Complete
P2P Recommendations | Complete | Complete
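The "learn on a sample, score the complete data set" pattern in the table above can be sketched as follows. This is an illustration under stated assumptions, not the presenter's method: the synthetic customers, the segment names and rates, the sample size of 2,000, and the per-segment purchase-rate "model" are all hypothetical stand-ins for a real propensity model.

```python
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical purchase rates per segment, used only to generate fake data.
SEGMENT_RATES = {"new": 0.1, "returning": 0.4, "loyal": 0.7}

# Synthetic "complete" data set of (segment, bought) pairs.
customers = []
for _ in range(100_000):
    segment = random.choice(list(SEGMENT_RATES))
    bought = random.random() < SEGMENT_RATES[segment]
    customers.append((segment, bought))


def learn_propensity(rows):
    """Estimate the purchase rate per segment from (segment, bought) rows."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [purchases, total]
    for segment, bought in rows:
        counts[segment][0] += int(bought)
        counts[segment][1] += 1
    return {s: purchases / total for s, (purchases, total) in counts.items()}


# Learning: fit on a small random sample (2% of the data here).
sample = random.sample(customers, 2_000)
model = learn_propensity(sample)

# Scoring: apply the learned rates to the complete data set.
scores = [model[segment] for segment, _ in customers]

# For comparison only: the same estimate computed on all the data.
full_model = learn_propensity(customers)
for segment in sorted(model):
    print(segment, round(model[segment], 3), round(full_model[segment], 3))
```

For a simple relationship like this one, the rates learned from the sample land close to those learned from all the data, while the learning step touches only a fraction of the rows. Scoring still runs over every customer, which matches the Sample / Complete split in the table.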
13. Bonus Advice: Orchestration is a ….
[Figure: orchestration modes (Batch, Real-time, Dead-line-time, Speed-of-thought, Eventual) with L, M, and S revenue impact; *size indicates number of customers immediately impacted]