Creating Added Value with Big Data

This talk tells the story of the data science team at Massive Media, the company behind Netlog.com and Twoo.com. After gaining invaluable first-hand experience with big data as a member of the information retrieval team at the music discovery website Last.fm, I joined Massive Media to conceive, build and lead a brand new team dedicated to big data and data science. In doing so, I developed a clear perspective on how to introduce big data within a company and create added value from it, which is exactly what I would like to share in this talk.

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Creating Added Value with Big Data: Presentation Transcript

    • CREATING ADDED VALUE WITH BIG DATA by KLAAS BOSTEELS (@klbostee)
    • MY CAREER PATH SO FAR
      2007: Began working with big data as a PhD student
      2009: Embarked on a data science career at Last.fm
        (data company at heart; one of the earliest Hadoop adopters worldwide; inventors of Ketama; organised the first “NoSQL” meetup in SF)
      2011: Joined Massive Media as Lead Data Scientist
        (huge audience and tremendous potential, but a data science newcomer at the time)
    • Second big product of Massive Media, after Netlog
      2011: Initial launch of Twoo.com
      2012: Biggest dating site worldwide on comScore
      2013: Massive Media acquired by InterActiveCorp (IAC)
    • IT’S A BIG FAMILY
      IAC’s main personals brands: [logos]
      Some other well-known IAC brands: [logos]
    • STEP 1: FOLLOW THE MONEY (photo by Chris Isherwood)
    • BOOTSTRAP BY SAVING OR GAINING MONEY
      You need some capital to get started
      Saving money tends to be easier in practice
      Real-world example:
        • Analyzing CDN logs revealed abuse
        • Stopping the abuse greatly reduced the bills
    • STEP 2: EMBRACE HADOOP (photo by Doug Kukurudza)
    • HADOOP
      Not the holy grail, but deserves a central role
      It has a vibrant community and is proven to be:
        • ECONOMICAL: runs on commodity hardware
        • SCALABLE: smart distributed processing
        • MAINTAINABLE: very robust and fault-tolerant
        • FLEXIBLE: predefined schemas not required
    • STEP 3: BUILD DASHBOARDS (photo by Dawn Hopkins)
    • STATS PIPELINE BASED ON HADOOP
      [Diagram: log collector, HDFS, MapReduce (in batches), HBase and dashboards, extended with continuous realtime processing and ad-hoc results]
      Cf. the “lambda architecture”, coined by @nathanmarz
    • CUSTOM-TAILORED WEB INTERFACE
      Annotation and exporting functionality
      Supports A/B testing and cohort analysis
      Various other nifty extras
    • STEP 4: ASSEMBLE A TEAM (photo by Jean-François Schmitz)
    • THE SECRET IS IN THE MIX
      Hadoop’s tricks also apply to data science teams:
        • Avoid specialisation to allow easy distribution and scaling
        • Exploit data locality by hiring people with a wide skill set
      Great data scientists have the right mix of skills:
        • Hackers with a solid technical background
        • Analytical minds that know statistics and machine learning
        • Clever and creative in everything they do
    • CHEAPER TECH MAKES PEOPLE MORE EXPENSIVE
      [Graph by Trifacta. Source: John C. McCallum, Wikipedia and Federal Reserve Bank of St Louis. Inflation adjusted to 2011 dollars.]
    • STEP 5: EXPLORE & INNOVATE (photo by NASA)
    • SOME TIPS AND TRICKS
      Dare to fail and/or start from estimates
      Introduce data exploration/innovation days:
        • Essentially “20% time” devoted to playing with data
        • Incorporate collaborative brainstorming
        • The goal is to find promising new projects to work on
      Communicate findings to the rest of the company:
        • Fun and silliness are allowed
        • Prototype early and often
    • PRODUCT INSIGHTS & EXTENSIONS
      E.g. recommendations and activity pattern analysis
    • CUTE OBSERVATIONS FOR PR
      http://www.twoo.com/blog/2012/04/twoos-great-global-vocabulary-experiment
    • FIVE SIMPLE STEPS IS ALL IT TAKES
      1. FOLLOW THE MONEY
      2. EMBRACE HADOOP
      3. BUILD DASHBOARDS
      4. ASSEMBLE A TEAM
      5. EXPLORE & INNOVATE
      Thanks! Questions?
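The “Bootstrap by saving or gaining money” slide credits CDN log analysis with exposing abuse, but does not say what kind. Purely as an assumption, the sketch below looks for one common pattern, external sites hotlinking assets, by summing bytes served per referrer and flagging outside referrers that account for a large share of the bandwidth. The five-field log format, the host names and the 10% threshold are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def bandwidth_by_referrer(lines: Iterable[str]) -> Counter:
    """Sum bytes served per referrer host.

    Assumes a hypothetical whitespace-separated log format:
    timestamp client_ip referrer_host path bytes_sent
    """
    totals: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 5 or not fields[4].isdigit():
            continue  # skip malformed lines
        totals[fields[2]] += int(fields[4])
    return totals


def suspicious_referrers(totals: Counter, own_hosts: Set[str], min_share: float = 0.1) -> List[str]:
    """Return external referrers responsible for at least min_share of all bytes, largest first."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [
        host
        for host, sent in totals.most_common()
        if host not in own_hosts and sent / grand_total >= min_share
    ]


if __name__ == "__main__":
    sample = [
        "1370000000 10.0.0.1 www.our-site.example /img/a.jpg 2000",
        "1370000001 10.0.0.2 hotlinker.example /img/a.jpg 50000",
        "1370000002 10.0.0.3 hotlinker.example /img/b.jpg 48000",
        "1370000003 10.0.0.4 www.our-site.example /img/c.jpg 3000",
    ]
    totals = bandwidth_by_referrer(sample)
    print(suspicious_referrers(totals, own_hosts={"www.our-site.example"}))  # ['hotlinker.example']
```

At real CDN volumes the same per-referrer aggregation would naturally run as a batch job rather than in a single process.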
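The “Stats pipeline based on Hadoop” slides compute dashboard statistics with MapReduce in batches. A minimal sketch of that map, shuffle/sort, reduce pattern, simulated in a single process, counts events per day and event type. The log format (`<date> <event_type> ...`) and the `day:event` key are assumptions for illustration; with Hadoop Streaming, the mapper and reducer would instead be separate scripts reading lines from stdin and writing tab-separated key/value lines to stdout.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (key, 1) per event, keyed by day and event type.

    Assumes a hypothetical log format where each line starts with
    '<date> <event_type>'; anything after that is ignored.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip malformed lines
        yield f"{fields[0]}:{fields[1]}", 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce step: sum the counts for each key.

    Like a Hadoop reducer, this relies on its input being sorted by key,
    so that all values for one key arrive consecutively.
    """
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_locally(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    log = [
        "2013-05-01 login user1",
        "2013-05-01 login user2",
        "2013-05-01 message user1",
        "2013-05-02 login user3",
    ]
    print(run_locally(log))
```

The sort between the two steps stands in for Hadoop’s shuffle phase, which is what lets each reducer see all values for a key together.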
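The later versions of the stats pipeline slide add continuous realtime processing next to the batch MapReduce jobs, in the spirit of the lambda architecture. The core idea is that a query merges a precomputed batch view with a realtime view covering only the events the last batch run has not yet seen. The toy class below is a hypothetical sketch of that merge; it keeps raw recent events in memory, whereas a real serving layer (HBase, in the slide’s diagram) would store aggregated values.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LambdaCounter:
    """Toy serving layer combining a batch view with a realtime view.

    batch_view holds counts precomputed by a batch job for all events with
    timestamp < batch_cutoff; recent holds the raw events the realtime
    layer has ingested since then.
    """
    batch_view: Counter = field(default_factory=Counter)
    batch_cutoff: int = 0
    recent: List[Tuple[int, str]] = field(default_factory=list)

    def record(self, timestamp: int, key: str) -> None:
        """Realtime layer: ingest one event as it arrives."""
        if timestamp >= self.batch_cutoff:  # older events are already in the batch view
            self.recent.append((timestamp, key))

    def load_batch(self, counts: Dict[str, int], cutoff: int) -> None:
        """Swap in a recomputed batch view and drop realtime events it now covers."""
        self.batch_view = Counter(counts)
        self.batch_cutoff = cutoff
        self.recent = [(t, k) for t, k in self.recent if t >= cutoff]

    def query(self, key: str) -> int:
        """Merge both views: batch count plus events seen since the cutoff."""
        return self.batch_view[key] + sum(1 for _, k in self.recent if k == key)
```

Usage: after `load_batch({"signup": 100}, cutoff=1000)` and two realtime `record` calls for `"signup"` at timestamps 1005 and 1010, `query("signup")` returns 102; a later batch run that covers the first of those events hands it over from the realtime view to the batch view without double counting.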
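The “Custom-tailored web interface” slide mentions support for A/B testing without saying which statistics it reports. As an assumption, here is one standard way to compare the conversion rates of two variants, a two-proportion z-test using the normal approximation (reasonable for large user counts). The function name and its parameters are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Compare the conversion rates of variants A and B.

    Returns (z, p_value), where p_value is two-sided under the null
    hypothesis that both variants share the same conversion rate.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one user")
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # everyone converted, or nobody did: no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, 100 conversions out of 1000 users for A against 130 out of 1000 for B gives z of roughly 2.1 and a two-sided p-value of roughly 0.035.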