So, What Does a Data Scientist do?

A Data Scientist in the Music Industry

Dr Jameel Syed
March 2012
http://jasyed.com/datascience/
  
Overview

–  Musicmetric CTO
–  InforSense founding member
    •  PhD in Workflows for Life Sciences Analysis
–  Co-organiser, Big Data London meetup
  
Some questions...
  
Music has moved online

•  The world has changed
    –  Do you buy vinyl/tapes/CDs of music?
    –  Do you buy music downloads?
    –  Do you download illegal content from BitTorrent?
    –  Do you listen to music on YouTube?
    –  Do you “like” bands on Facebook?
    –  Do you subscribe to Spotify?
    –  Do you listen on the radio to the weekly charts on a Sunday afternoon?
•  What’s happening online?
  
How popular am I?

Who are my fans?

Where are my fans?

What is the press saying?

Who is popular?
  
A Data Scientist in the Music Industry

•  Raw Data -> Derived Data -> Insight
    –  Who is popular right now/in the immediate future?
    –  What was the effect of appearing at a festival?
    –  Which artists are (becoming) popular with listeners with certain demographics (in a region)?
•  Data processing, machine learning & statistical methods
    –  Sentiment analysis
    –  Named Entity Recognition
    –  Ranking
    –  Segmentation
•  One-offs
    –  Infographics and microsites for events
    –  Brand alignment via demographics
    –  Music Hack Days
•  Product
    –  Daily charts
    –  Sentiment scoring of web-crawled reviews
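
The "Raw Data -> Derived Data -> Insight" flow and the "Daily charts" product above can be sketched in a few lines. This is only an illustrative sketch, not the method used at Musicmetric: the event tuples, artist names, and the function names `derive_daily_counts`, `daily_chart`, and `change_between` are all assumptions made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical raw data: one (date, artist, source) tuple per online event,
# e.g. a play, a "like", or a download. All values are invented.
RAW_EVENTS = [
    ("2012-03-01", "Artist A", "youtube"),
    ("2012-03-01", "Artist A", "facebook"),
    ("2012-03-01", "Artist B", "youtube"),
    ("2012-03-02", "Artist B", "youtube"),
    ("2012-03-02", "Artist B", "bittorrent"),
    ("2012-03-02", "Artist B", "facebook"),
    ("2012-03-02", "Artist A", "youtube"),
]


def derive_daily_counts(events):
    """Raw -> derived: count events per artist for each day."""
    counts = defaultdict(Counter)
    for day, artist, _source in events:
        counts[day][artist] += 1
    return counts


def daily_chart(counts, day):
    """Derived -> insight: rank artists by activity on a single day."""
    # Highest count first; ties are broken alphabetically for a stable order.
    return sorted(counts[day].items(), key=lambda item: (-item[1], item[0]))


def change_between(counts, earlier, later):
    """Derived -> insight: who gained or lost activity between two days."""
    artists = set(counts[earlier]) | set(counts[later])
    # A Counter returns 0 for artists with no events on a given day.
    return {a: counts[later][a] - counts[earlier][a] for a in artists}


counts = derive_daily_counts(RAW_EVENTS)
chart = daily_chart(counts, "2012-03-02")
movers = change_between(counts, "2012-03-01", "2012-03-02")
```

Here `chart` answers "who is popular right now?" for one day, and `movers` is a crude stand-in for "who is becoming popular?". A real chart would presumably weight sources differently and smooth over more than two days.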
  
What is a Data Scientist?
  
Have we been here before?

•  Statistician
•  Data Analyst
•  Quantitative analyst
•  Bioinformatician
•  Data Miner
•  Business Intelligence consultant
•  Computational physicist
  
A Life Sciences digression...
  
What’s new?

•  Data provides the opportunity
    –  Old: Collect and store data presupposing how it will be used
    –  New: Collect raw data & explore which derivations are interesting; integrating data from multiple online sources.
    –  Big Data technology to cope with data volume
•  Programming is essential
    –  APIs
    –  Heterogeneous environment(s)
•  Method of presentation
    –  Infographics
    –  Interactive (web) applications
    –  (Raw data)
  
Data Scientist

•  “Jack of all trades”
    –  “Hacker” mentality: learn new technology and approaches for a project on short notice
    –  Creative self-starters
    –  Work alongside other experts (data, domain, software engineering)
  
A Data Scientist is good at knitting?

•  Not building from scratch, knitting together pre-existing parts
•  Data
    –  Databases (relational/NoSQL)
    –  Files
    –  APIs
•  Algorithms
    –  Open source libraries
    –  Off the shelf tools
•  Compute
    –  Linux
    –  AWS?
•  Languages
    –  Many, especially “scripting” languages
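
As a rough illustration of "knitting together pre-existing parts" in a scripting language, the sketch below merges per-artist figures from a file, a relational database, and an API response using only Python's standard library. Everything specific is an assumption for the example: the CSV content, the JSON string standing in for an HTTP API reply, the `plays` table, and the function names `load_database` and `knit`.

```python
import csv
import io
import json
import sqlite3

# Stand-ins for three kinds of source; all names and numbers are invented.
CSV_FILE = io.StringIO("artist,downloads\nArtist A,120\nArtist B,80\n")
API_RESPONSE = '{"Artist A": {"likes": 300}, "Artist B": {"likes": 450}}'


def load_database():
    """Build a small in-memory relational database of play counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plays (artist TEXT, plays INTEGER)")
    conn.executemany(
        "INSERT INTO plays VALUES (?, ?)",
        [("Artist A", 1000), ("Artist B", 700)],
    )
    return conn


def knit(csv_file, api_response, conn):
    """Merge the three sources into one record per artist."""
    merged = {}
    # File: one CSV row per artist.
    for row in csv.DictReader(csv_file):
        merged.setdefault(row["artist"], {})["downloads"] = int(row["downloads"])
    # API: a JSON object keyed by artist name.
    for artist, fields in json.loads(api_response).items():
        merged.setdefault(artist, {})["likes"] = fields["likes"]
    # Database: a plain SQL query.
    for artist, plays in conn.execute("SELECT artist, plays FROM plays"):
        merged.setdefault(artist, {})["plays"] = plays
    return merged


profiles = knit(CSV_FILE, API_RESPONSE, load_database())
```

None of the parts is built from scratch: `csv`, `json`, and `sqlite3` do the work, and the script only supplies the glue that joins them on the artist name.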
  
