Big Insights with Big Data
Actionable Leads
Daniel Marcous
Google, Waze, Data Wizard
dmarcous@google.com
Data is Everything.
Apps (and companies) win or lose based on how they use it.
Organize the world’s
information and make it
universally accessible
and useful.
Google’s Mission
Google excels at collecting, storing, and extracting value from
big quantities of data
Google is a (Big) Data Company
Google in just 1 minute:
● 1000 new devices
● 3M searches
● 100 hours
● 1B activated devices
● 100M GB
[Figure labels: Search, Content]
10+ Years of Tackling Big Data Problems
[Timeline figure; axis ticks: 2002, 2004, 2005, 2006, 2008, 2010, 2012, 2014, 2015]
Google Papers: GFS, MapReduce, BigTable, Dremel, FlumeJava, Millwheel, PubSub
Open Source: Apache Beam, Tensorflow
Google Cloud Products: BigQuery, Pub/Sub, Dataflow, Bigtable
“Google is living a few years in the
future and sending the rest of us
messages”
Doug Cutting, Hadoop Co-Creator
Capture, Process, Store, Analyze
Data on your terms
Machine Learning
“Machine Learning is concerned with computer
programs that automatically improve their
performance through experience.”
Herbert Simon
Turing Award 1975
Nobel Prize in Economics 1978
Machine Learning
“A breakthrough in machine learning would be worth
ten Microsofts” (Bill Gates, Chairman, Microsoft)
“Machine learning is the next Internet”
(Tony Tether, Director, DARPA)
“Machine learning is the hot new thing”
(John Hennessy, President, Stanford)
Google is no stranger to ML
So what do you actually do?
Gain Actionable Insights!
Trending Locations - Hilton TLV
Trending Locations / Day of Week Breakdown
Opening Hours Inference
Optimising - Ad clicks / Time from drive start
Time to Content (US) - Day of week / Category
Irregular Events / Anomaly Detection
Major events that cause out-of-the-ordinary traffic, road blocks, etc., affecting large
numbers of users.
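
As a minimal sketch of the idea above, assuming anomalies are flagged with a plain z-score over report counts (the deck does not say which method Waze uses; `find_anomalies`, the threshold, and the sample counts are all hypothetical):

```python
from statistics import mean, stdev

def find_anomalies(counts, threshold=3.0):
    """Return indices whose count lies more than `threshold`
    sample standard deviations away from the mean."""
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]

# Hypothetical hourly report counts for one road segment (not real Waze data).
hourly_reports = [12, 15, 11, 14, 13, 12, 16, 14, 13, 95, 12, 15]
anomalies = find_anomalies(hourly_reports, threshold=2.5)
print(anomalies)
```

A single extreme value inflates the standard deviation it is measured against, so a production system would more likely use a robust baseline (median, seasonal model) than this simple version.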
Dangerous Places - Clustering
Find the most dangerous areas / streets, using custom-developed clustering algorithms
● Alert authorities / users
● Compare & share with 3rd parties (NYPD)
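
The custom clustering algorithms are not described in the deck. As a stand-in, the sketch below uses simple grid binning: incident coordinates are bucketed into square cells, and cells with enough incidents are reported as hot spots. The function name, cell size, threshold, and coordinates are all assumptions for illustration.

```python
import math
from collections import Counter

def dangerous_cells(incidents, cell_size=0.01, min_incidents=3):
    """Bin (lat, lon) points into square grid cells of `cell_size` degrees
    and return the cells holding at least `min_incidents` points."""
    counts = Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in incidents
    )
    return {cell: n for cell, n in counts.items() if n >= min_incidents}

# Hypothetical incident coordinates (not real data).
incidents = [
    (32.0853, 34.7818),
    (32.0857, 34.7812),
    (32.0851, 34.7815),
    (32.1093, 34.8555),
]
hot = dangerous_cells(incidents)
print(hot)
```

Grid binning is cheap and easy to reason about, but it splits clusters that straddle a cell boundary; density-based methods avoid that at the cost of more tuning.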
Server Distribution Optimisation
Calculate the optimal distribution of routing servers according to geographical load.
● Better experience - faster response time
● Saves money - no need for redundant elastic scaling of servers
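
One simple way to picture load-based distribution is proportional allocation: give each region a share of a fixed server budget in proportion to its load. The sketch below uses the largest-remainder method for rounding. It is an illustrative assumption, not the optimisation described in the deck, and the region names and load figures are hypothetical.

```python
def allocate_servers(load_by_region, total_servers):
    """Split `total_servers` across regions in proportion to load,
    using largest-remainder rounding so the total is preserved."""
    total_load = sum(load_by_region.values())
    quotas = {r: load * total_servers / total_load
              for r, load in load_by_region.items()}
    allocation = {r: int(q) for r, q in quotas.items()}  # floor of each quota
    leftover = total_servers - sum(allocation.values())
    # Hand the remaining servers to the regions with the largest fractional part.
    by_remainder = sorted(quotas, key=lambda r: quotas[r] - allocation[r],
                          reverse=True)
    for region in by_remainder[:leftover]:
        allocation[region] += 1
    return allocation

# Hypothetical requests-per-second by region.
load = {"us": 520, "eu": 310, "il": 170}
allocation = allocate_servers(load, total_servers=10)
print(allocation)
```

A real placement would also weigh latency between users and servers, not only request volume.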
Text Mining - Topic Analysis
Topic 1 - ETA   Topic 2 - Unusual   Topic 3 - Share info   Topic 4 - Reports   Topic 5 - Jams   Topic 6 - Voice
wazers          usual               road                   social              still            morgan
eta             traffic             driving                drivers             will             ang
con             stay                info                   reporting           update           freeman
zona            today               using                  helped              drive            kanan
usando          times               area                   nearby              delay            voice
real            clear               realtime               traffic             add              meter
tiempo          slower              sharing                jam                 jammed           kan
carretera       accident            soci                   drive               near             masuk
Text Mining - New Version Impressions
● Text analysis - stemming / stopword detection etc.
● Topic modeling
● Sentiment analysis
Waze V4 update:
● Good - “redesign”, “smarter”, “cleaner”, “improved”
● Bad - “stuck”
Overall very positive score!
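
The pipeline above can be sketched as a tiny lexicon-based scorer: lowercase and tokenize the text, drop stopwords, then count positive minus negative words. The word lists reuse the terms from this slide; the stopword list, function names, and sample reviews are hypothetical, and stemming and topic modeling are left out for brevity.

```python
import re

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"the", "is", "a", "and", "it", "but", "i", "in", "got", "new"}
# Sentiment lexicon built from the words quoted on the slide.
POSITIVE = {"redesign", "smarter", "cleaner", "improved"}
NEGATIVE = {"stuck"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def sentiment_score(text):
    """Positive word count minus negative word count."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

# Made-up review snippets for illustration.
reviews = [
    "The redesign is cleaner and smarter",
    "I got stuck in the new menu",
    "Improved but stuck",
]
scores = [sentiment_score(r) for r in reviews]
print(scores)
```

Summing the scores over all reviews of a release gives a rough overall polarity, which is the kind of figure a "very positive score" summary could be based on.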
Text Mining - Store Sentiments
Text Mining - Sentiment by Time & Place
Daniel Marcous
dmarcous@google.com
dmarcous@gmail.com

Big Data - Big Insights - Waze @Google
