Mark Watkins Big Data Presentation


Published in: Technology, Education

  1. BIG DATA AT TELENAV
     USING DATA TO IMPROVE YOUR LIFE
     Mark Watkins, general manager, entertainment content, @viking2917
     2/21/2012 © 2012 Telenav, Proprietary and Confidential
  2. A PIONEER IN LOCATION SERVICES
     • Public company: $200M+ revenue, 11 years in business
     • Leader in Personalized Mobile Navigation: 30MM+ subscribers
     • Leader in Drive To Mobile Advertising: 750K local advertisers
     • Leader in Mobile Distribution Platforms: 900+ devices
     • Growing Global Carrier Audience Reach: 14 carriers in 29 countries
     OUR GPS NAVIGATION PARTNERS
  3. KEY PROBLEMS WE ARE WORKING ON
     • Traffic & Mapping
     • Local Search for businesses, events, points of interest
     • Lifestyle content & recommendation engine
     • Combination of “traditional” big data processing, machine learning and proprietary algorithms
     • People are drowning in information – use “big data” signals to condense to something manageable
  4. TRAFFIC & MAPS
     • Traffic-aware routing engine
       – Navigation is core competency
       – 1.3B routes/trips since 2007
     • Routes generate traffic/motion data
       – “probe data” from app (billions/month)
       – Anonymized & summarized to power routing
       – Persisted in aggregate form for historical traffic metrics
     • Used to augment Open Street Map
       – Turn restrictions, stop signs, road geometry
       – Deduced from probe patterns
     • Technology set
       – Hadoop + Hive
  5. AUTOMATED DEVELOPMENT OF RICH LOCAL CONTENT (YOU MAY KNOW THIS AS GOBY)
     • Categorized to taxonomy (“blues”, “hiking trails”)
     • All entities geotagged
     • Venues automatically recognized; events mapped to venues
     OTHER FEATURES WORTH NOTING
     • automatic entity/place creation
     • aggregated ratings & reviews
     • proprietary result ranking formula
     • domain-specific metadata extraction
     • sorting by metadata (e.g. price, rating)
  6. AUTOMATED DEVELOPMENT OF RICH LOCAL DATA
     • Data space is large, but not immense
       – Tens or hundreds of millions (or smaller), not billions
     • But very complex
       – Thousands of data sources
       – Attribute space is 10,000 wide
       – E.g. how many holes in the golf course; how long is the hiking trail?
     • Generates a large, sparse matrix
       – Ambiguous, conflicting data
       – Unstructured or semi-structured data
       – Need to recognize entities & merge/dedup
  7. SOME LEARNINGS
     • Lots of data sources / signals generate “goodness”
       – Ranking, confidence, importance, comprehensiveness
     • “Interesting” ≠ “Most Popular”
     [Chart: frequency of occurrence; labels: Museum of Bad Art, The Middle East Nightclub, Fred’s dry cleaners, Museum of Science]
  8. COMPOSITE, STRUCTURED LOCAL DATA
  9. PERSONALIZED RECOMMENDATIONS
  10. RECOMMENDATIONS – WORK IN PROGRESS
      • Key signals
        – Personalized “interest graph”
        – “Drive to” data (where are people driving to?)
        – Entity-level “page rank”
        – Web/mobile clickstream data
      • Integrated with social media
        – Facebook actions influencing recommendations
      • Key technology enablers
        – Large amounts of user-generated data
        – Proprietary algorithms; machine learning / SVM
  11. TELENAV.COM – SCOUT
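
The slide on automated development of rich local data says that thousands of sources produce sparse, ambiguous, conflicting records that have to be recognized as the same entity and merged. The sketch below illustrates one simple way such a merge/dedup step could look. It is not Telenav's method: the match key (normalized name plus coordinates rounded to three decimals, roughly 100 m), the record layout, the function names `match_key` and `merge_records`, and the sample records are all assumptions invented for illustration.

```python
from collections import defaultdict


def match_key(record):
    """Hypothetical match key: alphanumeric lowercase name + rounded coordinates."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return (name, round(record["lat"], 3), round(record["lon"], 3))


def merge_records(records):
    """Group records by match key and merge their sparse attribute dicts.

    Conflicting values are kept side by side in a list instead of being
    overwritten, because the sources may disagree.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)

    merged = []
    for recs in groups.values():
        values = defaultdict(set)
        for rec in recs:
            for attr, val in rec["attrs"].items():
                values[attr].add(val)
        attrs = {
            # One agreed value stays a scalar; conflicts become a list.
            attr: next(iter(vals)) if len(vals) == 1 else sorted(vals, key=str)
            for attr, vals in values.items()
        }
        merged.append({
            "name": recs[0]["name"],
            "sources": sorted(r["source"] for r in recs),
            "attrs": attrs,
        })
    return merged


# Invented sample data: two sources describe the same golf course differently.
records = [
    {"source": "a", "name": "Pine Hill Golf Course",
     "lat": 42.3601, "lon": -71.0589, "attrs": {"holes": 18}},
    {"source": "b", "name": "pine hill golf course",
     "lat": 42.3603, "lon": -71.0591, "attrs": {"holes": 9, "price": "$$"}},
    {"source": "c", "name": "Ridge Trail",
     "lat": 42.4100, "lon": -71.1200, "attrs": {"length_miles": 3.5}},
]
merged = merge_records(records)
```

Each record carries only the attributes its source knows, which is the sparse-matrix shape the slide describes. Exact key matching is the weakest part of this sketch: two nearby points can round to different cells, and name variants beyond case and punctuation will not match, so a real system would need fuzzier entity recognition.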