PART TIME SPARK USER
Rajiv Shah
www.rajivshah.com
Chicago Spark Users Meetup
Nov 5, 2015
ROADMAP
• Status of spark
• My take
• Examples
status of spark
Strata+Hadoop mentions of spark
Cloudera Blog Post on Sparkling Water
http://blog.cloudera.com/blog/2015/10/how-to-build-a-machine-learning-app-using-sparkling-water-and-apache-spark
my personal take
Insufficient Algorithms
http://projects.rajivshah.com/shiny/outlier/
surfing for algorithms
ML - MLLIB
http://spark.apache.org/docs/latest/mllib-guide.html
Language Schizophrenia
Scala, Python, R
Lack of Documentation
Difficult to tune
Not for small or big data
USING SPARK
Spark makes the impossible possible
Spark is hard
COOL THINGS ABOUT SPARK
• Scales up
• Streaming
• Enterprise worthy
• It looks like it will play nice
SUGGESTIONS
• Get data engineers who will work with your data scientists
• If you can’t take advantage of Spark’s strengths, don’t use it
EXAMPLES
• Spark Streaming - streaming k-means clustering
• Anomaly Detection using H2O
• Recommenders
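
To make the streaming k-means example concrete, here is a minimal plain-Python sketch of the mini-batch update usually described for streaming k-means: each incoming batch pulls the nearest center toward the batch's points, and a decay factor discounts what was learned from older batches. This is not Spark's API and not the code from the talk. The function names (squared_distance, nearest, update_centers) and the decay parameter are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers: List[List[float]], p: Point) -> int:
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: squared_distance(centers[i], p))


def update_centers(
    centers: List[List[float]],
    weights: List[float],
    batch: Sequence[Point],
    decay: float = 1.0,
) -> Tuple[List[List[float]], List[float]]:
    """Apply one streaming k-means step for a single mini-batch.

    weights[i] is the (discounted) number of points seen so far by center i.
    decay=1.0 weighs all history equally; decay=0.0 forgets it entirely.
    Hypothetical helper for illustration, not a library function.
    """
    k = len(centers)
    dim = len(centers[0])

    # Assign every point in the batch to its nearest current center.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in batch:
        i = nearest(centers, p)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += p[d]

    # Blend each old center (discounted by decay) with its new points.
    new_centers: List[List[float]] = []
    new_weights: List[float] = []
    for i in range(k):
        old = weights[i] * decay
        total = old + counts[i]
        if total == 0:
            # No history and no new points: leave the center where it is.
            new_centers.append(list(centers[i]))
        else:
            new_centers.append(
                [(centers[i][d] * old + sums[i][d]) / total for d in range(dim)]
            )
        new_weights.append(total)
    return new_centers, new_weights
```

Calling update_centers once per arriving batch, and feeding the returned centers and weights into the next call, mimics how a streaming job would keep the model current. A decay below 1.0 lets the centers follow drift in the data at the cost of noisier estimates.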

Using Spark Part Time

Editor's Notes

  • #2 Perspective on Spark, not a typical technical talk. Everyone should leave with a better appreciation of how Spark fits into data science at this point in time.
  • #5 IBM’s announcement: hiring of Spark users; donating SystemML; thousands of its own researchers on Spark; training its leaders in using Spark ML; educating more than one million data scientists and data engineers on Spark. 630,000 PMP
  • #6 http://conferences.oreilly.com/strata/big-data-conference-ca-2015/public/schedule/full/public
  • #7 http://blog.cloudera.com/blog/2015/10/how-to-build-a-machine-learning-app-using-sparkling-water-and-apache-spark http://h2o.ai/blog/2015/04/deep-learning-public-safety/ http://www.kdnuggets.com/2015/04/deep-learning-fight-crime.html Every cliche: Spark, H2O, deep learning, visualizations (especially geographic). We are missing D3/JavaScript. We even have weather (IBM’s announcement: weather in every model). The worst kind of data science: analysis for the sake of analysis. Also, look at the difference between the code in the post and the actual code.