Small cars are
dangerous!
Willem Hendriks
Data Scientist IBM
willem.hendriks@nl.ibm.com
https://github.com/willemhendriks
Nice to be in
Groningen again!
“More data usually beats
better algorithms”
Anand Rajaraman (when teaching at Stanford)
http://anand.typepad.com/datawocky/2008/03/more-data-usual.html
What I learned in Groningen...
What I am doing now...
Parallel Computing is not easy....
Google Trends of “Apache Spark”
Apache Spark™ is a fast
and general engine for
large-scale data
processing.
Why Spark?
(4) Nice library!
Is it really that easy &
quick?
Best deal if you want a Mercedes ML
Best place to have dinner in Brussels
(and have a walk afterwards)
Let's combine police report datasets &
Marktplaats advertisements...
(not big data, just a toy example of Spark)
Do thieves like certain
neighborhoods with certain
items?
Download advertisement data with script
Find postal code of each neighborhood
Combine in Apache Spark
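
The three steps above amount to a join on postal code. The snippet below is a minimal plain-Python stand-in, not the code from the talk: the record layouts, the sample values, and the function name `join_by_postal_code` are all assumptions made for illustration. In Spark the same idea would typically be expressed as a join of two datasets keyed on postal code.

```python
from collections import defaultdict

# Hypothetical sample records; the real datasets are not shown in the slides.
advertisements = [
    {"postal_code": "9711", "category": "scale models"},
    {"postal_code": "9711", "category": "bicycles"},
    {"postal_code": "9712", "category": "scale models"},
]
police_reports = [
    {"postal_code": "9711", "incident": "burglary"},
    {"postal_code": "9713", "incident": "burglary"},
]


def join_by_postal_code(ads, reports):
    """Inner-join ads and reports on postal code.

    Returns a dict mapping (postal_code, category, incident) to the
    number of matching (advertisement, report) pairs.
    """
    # Index advertisements by postal code so each report is matched in one lookup.
    ads_by_code = defaultdict(list)
    for ad in ads:
        ads_by_code[ad["postal_code"]].append(ad)

    counts = defaultdict(int)
    for report in reports:
        # .get avoids creating empty entries for postal codes without ads.
        for ad in ads_by_code.get(report["postal_code"], []):
            key = (report["postal_code"], ad["category"], report["incident"])
            counts[key] += 1
    return dict(counts)


joined = join_by_postal_code(advertisements, police_reports)
```

Only postal codes that appear in both datasets survive the join, which is the behaviour of an inner join; a different join type would be needed to keep neighborhoods without advertisements or without reports.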
Scale models are an indication for burglary!
Check marktplaats.nl if more than 70
advertisements are in a radius of 600
meters!!!!
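
The radius check on this slide can be sketched as a distance count. The snippet below is an illustrative assumption, not the talk's implementation: it uses the haversine great-circle distance, takes the 600 meter radius and the threshold of 70 from the slide as defaults, and the function names and coordinate format (latitude, longitude in degrees) are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_ads_within(center, ad_locations, radius_m=600):
    """Count advertisement locations within radius_m meters of center."""
    lat, lon = center
    return sum(
        1
        for ad_lat, ad_lon in ad_locations
        if haversine_m(lat, lon, ad_lat, ad_lon) <= radius_m
    )


def exceeds_threshold(center, ad_locations, radius_m=600, threshold=70):
    """True if strictly more than `threshold` ads lie within the radius."""
    return count_ads_within(center, ad_locations, radius_m) > threshold
```

This compares every location against the center one by one, which is fine for a toy dataset; for larger data a spatial index or a pre-filter on postal code would likely be needed.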
Maybe marktplaats.nl advertisements can predict.....
House-pricing trends? Crime? Education level?
They have something!!!
If you were asked to build a model, on the Netherlands, what
tool would you use?
*dataset too small to make this statement
Try yourself! (GB's limited)
● Mix with various Services, e.g. Hadoop/NoSQL
● Free Trial & Paid (with Serious Power)
● Made for the App Developer
● Run Spark Online
● (Various) Notebook, to use for Python, Scala, & R
● Free, perfect to start & learn! (examples)
● Made for the Data Scientist
Try yourself! (GB's limited)
IBM Will: “Educate one million data scientists
and data engineers on Apache Spark through
extensive partnerships with AMPLab,
DataCamp, MetiStream, Galvanize and Big
Data University MOOC”
Join us, & start today at the BIG DATA
University! https://bigdatauniversity.com/
Spark Hackathon Coming soon in NL!
IBM Wants YOU to learn
Spark!
Questions about...
Start with Spark?
IBM & Spark?
Marktplaats.nl?
Code will be on GitHub
(after cleaning)
Thank you!
Willem Hendriks
06 2240 8900
Data Scientist IBM
willem.hendriks@nl.ibm.com
https://github.com/willemhendriks

Big data groningen
