Big data groningen

Small cars are dangerous!
Willem Hendriks
Data Scientist IBM
willem.hendriks@nl.ibm.com
https://github.com/willemhendriks

Nice to be in
Groningen again!

“More data usually beats better algorithms”
Anand Rajaraman (when teaching at Stanford)
http://anand.typepad.com/datawocky/2008/03/more-data-usual.html
What I learned in Groningen...
What I am doing now...

Parallel Computing is not easy....

Google Trends of “Apache Spark”
Apache Spark™ is a fast
and general engine for
large-scale data
processing.

Is it really that easy & quick?
Best deal if you want a Mercedes ML
Best place to have dinner in Brussels (and have a walk afterwards)

Let's combine police reports datasets &
marktplaats advertisements...
(not big data, just a toy example of Spark)
Do thieves like certain
neighborhoods with certain
items?

Download advertisement data with script
Find postal code of each neighborhood
Combine in Apache Spark
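
The combine step above is essentially a join on postal code. A minimal sketch in plain Python, assuming hypothetical field names (postal_code, category, reports) and made-up sample records; the actual schema and Spark code from the talk are not shown in the slides. In Spark the same idea would be expressed as a join on the postal-code key.

```python
from collections import defaultdict

# Hypothetical sample records (invented for illustration only).
ads = [
    {"postal_code": "9711", "category": "scale models"},
    {"postal_code": "9711", "category": "bicycles"},
    {"postal_code": "9712", "category": "scale models"},
]
burglaries = [
    {"postal_code": "9711", "reports": 12},
    {"postal_code": "9713", "reports": 3},
]


def count_ads_by_postal_code(ads):
    """Count advertisements per postal code and category."""
    counts = defaultdict(lambda: defaultdict(int))
    for ad in ads:
        counts[ad["postal_code"]][ad["category"]] += 1
    return counts


def join_on_postal_code(ads, burglaries):
    """Inner join: keep only postal codes present in both datasets."""
    ad_counts = count_ads_by_postal_code(ads)
    joined = []
    for row in burglaries:
        code = row["postal_code"]
        if code in ad_counts:
            for category, n_ads in sorted(ad_counts[code].items()):
                joined.append((code, category, n_ads, row["reports"]))
    return joined


combined = join_on_postal_code(ads, burglaries)
for record in combined:
    print(record)
```

Each output tuple pairs a postal code and advertisement category with the number of ads and the number of police reports, which is the shape needed to look for categories that co-occur with burglaries.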

Scale models are an indication of burglary!
Check marktplaats.nl to see whether more than 70
advertisements are within a radius of 600
meters!
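
The radius check can be sketched as follows. This is an illustration only: it assumes advertisement locations are available as (latitude, longitude) pairs and uses the haversine great-circle distance, which may differ from how the talk measured distance. The function names are hypothetical; the 600 meter radius and the threshold of 70 come from the slide.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def ads_within_radius(center, ad_locations, radius_m=600):
    """Count advertisement locations within radius_m meters of center."""
    lat, lon = center
    return sum(
        1
        for ad_lat, ad_lon in ad_locations
        if haversine_m(lat, lon, ad_lat, ad_lon) <= radius_m
    )


def exceeds_threshold(center, ad_locations, radius_m=600, threshold=70):
    """True if strictly more than `threshold` ads lie within the radius."""
    return ads_within_radius(center, ad_locations, radius_m) > threshold
```

For example, exceeds_threshold((53.2194, 6.5665), locations) would flag a point in Groningen only if more than 70 of the given locations fall within 600 meters of it.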
Maybe marktplaats.nl advertisements can predict...
House-pricing trends? Crime? Education level?
They have something!
If you were asked to build a model, on the Netherlands, what
tool would you use?
*dataset too small to make this statement

Try yourself! (GB's limited)
● Mix with various Services, e.g. Hadoop/NoSQL
● Free Trial & Paid (with Serious Power)
● Made for the App Developer

● Run Spark Online
● (Various) Notebook, to use for Python, Scala, & R
● Free, perfect to start & learn! (examples)
● Made for the Data Scientist
Try yourself! (GB's limited)

IBM will: “Educate one million data scientists
and data engineers on Apache Spark through
extensive partnerships with AMPLab,
DataCamp, MetiStream, Galvanize and Big
Data University MOOC”
Join us, & start today at the Big Data
University! https://bigdatauniversity.com/
Spark Hackathon coming soon in NL!
IBM wants YOU to learn
Spark!

Questions about...
Start with Spark?
IBM & Spark?
Marktplaats.nl?
Code will be on GitHub
(after cleaning)
Thank you!
Willem Hendriks
06 2240 8900
Data Scientist IBM
willem.hendriks@nl.ibm.com
https://github.com/willemhendriks
