Talk at IEEE Big Data/Cloud conference in Santa Clara, June 28th, 2013.


Sharing why it is hard to succeed with Big Data/predictive projects, particularly when productionizing them, and what you can do to reduce risk while taking steps in the right direction.

Published in: Technology, Education

  1. 1. 4 Pieces of Advice for Your Big Data Initiative. Jari Koister. Talk at IEEE Big Data/Cloud Conference, June 28th, 2013
  2. 2. Complexity and Direction of Predictive Big Data. A few learnings that may increase your likelihood of success.
  3. 3. Infochimps about challenges…. Brownelles November 14, 2012
  4. 4. Complex Environment: Data Science, Big Data, Predictive Analysis, Machine Learning, Marketing Analytics, Sales Analytics, Columnar Databases, Data Cubes, Hadoop, Hive, Spark, Pig, Impala, ETL, Web Analytics, Churn, Segmentation, Clustering, Drill, Propensity, Uplift, Business Intelligence, Chief Intelligence Officer, Data Warehouse, Information Valuation, Entity Linkage, De-duplication, Immutable Store, Mesos, Supervised, Un-supervised, Non-parametric
  5. 5. Big Data: Gartner believes big data is neither a technology nor a distinct and uniquely measured market of products. We believe it is a phenomenon brought about by rapid data growth, complex new data types and parallel advancements in technology, all combining to enable people to analyze information in new ways to produce more useful insights about the world around them.
  6. 6. Hype, Maturity, Potential… Gartner Hype Cycle for Big Data, 2012
  7. 7. What is changing? Figure labels: Audience (Experts, Intermediate, Beginners); Data Sources (A Few, Tens, Hundreds, Many); Algorithms (Experimental, Value Focused)
  8. 8. Complexity and Direction of Predictive Big Data. A few learnings that may increase your likelihood of success.
  9. 9. 1st (of 4) Advice: Don’t get bogged down in technology. Figure axes: Data Access (Query Expressiveness) vs. Scale. Technologies plotted: HDFS, HBase, ParAccel, Redshift, Cassandra, CouchBase, Cascading, Riak, MySQL, Vertica, InfoBright, VectorWise, Spark, CitusData, WibiData, Phoenix, MSSQL, MSAS, Mahout, Map/Reduce, R, MatLab, SciPy, Snow, Hive, Impala, Drill, Pig
  10. 10. 2nd (of 4) Advice: Find a DQE provider. Complex: entity linkage, fuzzy matching, external data, de-duplication. Repetitive & Scale: continuous, lots of data. Common: necessary but not unique.
  11. 11. 3rd (of 4) Advice: Be Realistic. Figure axes: Narrow solution to Customized; Low Investment to High Investment. *Size indicates return.
  12. 12. 4th (of 4) Advice: Scale is expensive, sample when you can. Sample size by relation (Simple / Complex / Noisy / Biased): Big Data: Overkill / ✓ / ✓ / N/A; Large: Overkill / ✓ / ✓ / ≈✓; Small: ✓ / ✗ / ✗ / ✗. Data set for each task (Learning / Scoring): Propensity to buy: Sample / Complete; Customer clustering: Sample / Complete; Customer segmentation: Sample / Complete; U2P Recommendation: Sample / Complete; P2P Recommendations: Complete / Complete.
  13. 13. Bonus Advice: Orchestration is a …. Figure labels: Batch, Real-time, Dead-line-time, Speed-of-thought, Eventual; L Revenue impact, M Revenue impact, S Revenue impact. *Size indicates # of customers immediately impacted.
  14. 14. Thank you for listening
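
Slide 12 suggests learning on a sample and scoring on the complete data set for tasks such as propensity to buy. The following is a minimal Python sketch of that split, not the speaker's method. The synthetic customers, the segment names "A" and "B", the purchase rates, and the helper names (make_customers, learn_on_sample, score_all) are all assumptions made up for illustration, and the "model" is only a per-segment purchase rate.

```python
import random


def make_customers(n, seed=1):
    """Build hypothetical customers; segments and rates are invented."""
    rng = random.Random(seed)
    rates = {"A": 0.1, "B": 0.5}  # assumed true purchase rates
    customers = []
    for _ in range(n):
        seg = rng.choice("AB")
        bought = int(rng.random() < rates[seg])
        customers.append({"segment": seg, "bought": bought})
    return customers


def learn_on_sample(customers, sample_size, seed=0):
    """Estimate the purchase rate per segment from a uniform random sample."""
    rng = random.Random(seed)
    sample = rng.sample(customers, min(sample_size, len(customers)))
    counts, buys = {}, {}
    for c in sample:
        seg = c["segment"]
        counts[seg] = counts.get(seg, 0) + 1
        buys[seg] = buys.get(seg, 0) + c["bought"]
    return {seg: buys[seg] / counts[seg] for seg in counts}


def score_all(customers, model, default=0.0):
    """Score every customer (the complete data set) with the learned rates."""
    return [model.get(c["segment"], default) for c in customers]


customers = make_customers(100_000)
model = learn_on_sample(customers, sample_size=5_000)  # learning: sample
scores = score_all(customers, model)                   # scoring: complete
```

Learning touches only 5,000 of the 100,000 records here, while scoring still covers every customer. This matches the "Sample / Complete" rows on the slide only under the assumption that the sample is uniform and the relation is simple.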