Creating Added Value with Big Data

This talk tells the story of the data science team at Massive Media, the company behind Netlog.com and Twoo.com. After gaining invaluable first-hand experience with big data as a member of the information retrieval team at the music discovery website Last.fm, I joined Massive Media to conceive, build and lead a brand new team around big data and data science. In doing so, I developed a clear perspective on how to introduce big data within a company and create added value from it, which is precisely what I would like to share in this talk.

Creating Added Value with Big Data

  1. CREATING ADDED VALUE WITH BIG DATA by KLAAS BOSTEELS @klbostee
  2. MY CAREER PATH SO FAR
     2007: Began working with big data as PhD student
     2009: Embarked on a data science career at Last.fm
       • Data company at heart; one of the earliest Hadoop adopters worldwide; inventors of Ketama; organised first “NoSQL” meetup in SF.
     2011: Joined Massive Media as Lead Data Scientist
       • Huge audience and tremendous potential, but data science newcomer at the time.
  3. Second big product of Massive Media, after Netlog
     2011: Initial launch of Twoo.com
     2012: Biggest dating site worldwide on comScore
     2013: Massive Media acquired by InterActiveCorp
  4. IT’S A BIG FAMILY
     IAC’s main personals brands: [logos]
     Some other well-known IAC brands: [logos]
  5. STEP 1: FOLLOW THE MONEY (photo by Chris Isherwood)
  6. BOOTSTRAP BY SAVING OR GAINING MONEY
     You need to get some capital to get started
     Saving money tends to be easier in practice
     Real-world example:
       • Analyzing CDN logs unveiled abuse
       • Stopping the abuse greatly reduced the bills
  7. STEP 2: EMBRACE HADOOP (photo by Doug Kukurudza)
  8. HADOOP
     Not the holy grail, but deserves a central role
     It has a vibrant community and is proven to be:
       • ECONOMICAL: runs on commodity hardware
       • SCALABLE: smart distributed processing
       • MAINTAINABLE: very robust and fault-tolerant
       • FLEXIBLE: predefined schemas not required
  9. STEP 3: BUILD DASHBOARDS (photo by Dawn Hopkins)
  10. STATS PIPELINE BASED ON HADOOP
      [Diagram: Log collector, HDFS, MapReduce, HBase, Dashboards; flows labeled "in batches" or "continuous"]
  11. STATS PIPELINE BASED ON HADOOP
      [Same diagram, adding: Realtime processing]
      Cf. “lambda architecture”, coined by @nathanmarz
  12. STATS PIPELINE BASED ON HADOOP
      [Same diagram, adding: Ad-hoc results]
      Cf. “lambda architecture”, coined by @nathanmarz
  13. CUSTOM-TAILORED WEB INTERFACE
        • Annotation & exporting functionality
        • Supports A/B testing and cohort analysis
        • Various other nifty extras
  14. STEP 4: ASSEMBLE A TEAM (photo by Jean-François Schmitz)
  15. THE SECRET IS IN THE MIX
      Hadoop’s tricks also apply to data science teams
        • Avoid specialisation to allow easy distribution and scaling
        • Exploit data locality by hiring people with a wide skill set
      Great Data Scientists have the right mix of skills
        • Hackers with solid technical background
        • Analytical mind that knows statistics and machine learning
        • Clever and creative in everything they do
  16. CHEAPER TECH MAKES PEOPLE MORE EXPENSIVE
      Graph by Trifacta. Source: John C. McCallum, Wikipedia and Federal Reserve Bank of St Louis. Inflation adjusted to 2011 dollars.
  17. STEP 5: EXPLORE & INNOVATE (photo by NASAr)
  18. SOME TIPS AND TRICKS
      Dare to fail and/or start from estimates
      Introduce data exploration/innovation days
        • Basically 20% time devoted to playing with data
        • Incorporate collaborative brainstorming
        • Goal is to find promising new projects to work on
      Communicate findings to the rest of the company
        • Fun and silliness are allowed
        • Prototype early and often
  19. PRODUCT INSIGHTS & EXTENSIONS
      E.g. recommendations and activity patterns analysis
  20. CUTE OBSERVATIONS FOR PR
      http://www.twoo.com/blog/2012/04/twoos-great-global-vocabulary-experiment
  21. FIVE SIMPLE STEPS IS ALL IT TAKES
      1 FOLLOW THE MONEY
      2 EMBRACE HADOOP
      3 BUILD DASHBOARDS
      4 ASSEMBLE A TEAM
      5 EXPLORE & INNOVATE
  22. FIVE SIMPLE STEPS IS ALL IT TAKES
      1 FOLLOW THE MONEY
      2 EMBRACE HADOOP
      3 BUILD DASHBOARDS
      4 ASSEMBLE A TEAM
      5 EXPLORE & INNOVATE
      Thanks! Questions?
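
The "follow the money" slide mentions that analyzing CDN logs unveiled abuse. The talk does not say what the logs looked like or how the abuse was spotted, so the following is only a minimal sketch under assumptions: a hypothetical space-separated log format (`client_ip path bytes_sent`) and a simple heuristic that flags any client responsible for more than a chosen share of all bytes served.

```python
from collections import Counter


def bytes_per_client(log_lines):
    """Sum bytes served per client.

    Assumes a hypothetical log format: 'client_ip path bytes_sent'.
    """
    totals = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines
        client_ip, _path, bytes_sent = parts
        try:
            totals[client_ip] += int(bytes_sent)
        except ValueError:
            continue  # skip lines with a non-numeric byte count
    return totals


def heavy_clients(totals, max_share=0.5):
    """Return clients whose share of total traffic exceeds max_share."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return sorted(ip for ip, b in totals.items() if b / grand_total > max_share)


# Made-up sample data, for illustration only.
sample_log = [
    "10.0.0.1 /img/a.jpg 500",
    "10.0.0.2 /img/b.jpg 700",
    "10.0.0.9 /video/x.mp4 9000",
    "10.0.0.9 /video/y.mp4 8000",
    "garbage line",
]
totals = bytes_per_client(sample_log)
suspects = heavy_clients(totals, max_share=0.5)
print(suspects)
```

On real logs the same aggregation would typically run as a MapReduce job (map emits client and bytes, reduce sums per client); the threshold is a judgment call and would need tuning against normal traffic.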

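The stats pipeline slides refer to the "lambda architecture": a batch layer periodically recomputes results from the full log history, a realtime layer covers what has arrived since, and a query combines both. The sketch below illustrates only that merging idea. Plain Python objects stand in for HDFS, MapReduce and HBase, and every name in it (`batch_view`, `RealtimeView`, `query`, the event fields) is hypothetical rather than taken from the talk.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer: recompute counts from scratch for events before the cutoff."""
    return Counter(e["metric"] for e in events if e["ts"] < cutoff)


class RealtimeView:
    """Realtime layer: incrementally count events at or after the cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.counts = Counter()

    def ingest(self, event):
        if event["ts"] >= self.cutoff:
            self.counts[event["metric"]] += 1


def query(metric, batch, realtime):
    """Serving side: a dashboard value is the batch count plus the realtime count."""
    return batch[metric] + realtime.counts[metric]


# Made-up events; 'ts' is an abstract timestamp.
events = [
    {"ts": 1, "metric": "signup"},
    {"ts": 2, "metric": "login"},
    {"ts": 3, "metric": "signup"},
    {"ts": 5, "metric": "signup"},
    {"ts": 6, "metric": "login"},
]
cutoff = 4  # the last batch run covered everything before ts=4

batch = batch_view(events, cutoff)
realtime = RealtimeView(cutoff)
for event in events:
    realtime.ingest(event)

print(query("signup", batch, realtime), query("login", batch, realtime))
```

Because the batch view is rebuilt from the raw logs on every run, any mistake in the realtime counts is overwritten once the next batch catches up, which is the main appeal of this design.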
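
The web interface slide says the dashboards support A/B testing, without saying how results were evaluated. One common way to compare two conversion rates is a two-proportion z-test; the sketch below shows that standard test as an assumption, not as the method used at Massive Media. The function name and the sample numbers are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for the difference in conversion rate B - A."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one user")
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: variant A converts 100 of 1000 users, variant B 150 of 1000.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone; the test assumes independent users and reasonably large groups.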