Big	  Data	  Analy,cs:	  	  Applica,ons	  &	  Opportuni,es	  in	  On-­‐line	                         Predic,ve	  Modeling	...
Outline	  •    Big	  Data	  all	  around	  us	  •    IntroducIon	  to	  Data	  Mining	  and	  PredicIve	  AnalyIcs	  •    ...
What	  Ma3ers	  in	  the	  Age	  of	  Analy/cs?	  	  	           1. Being	  Able	  to	  exploit	  all	  the	  data	  that	...
What	  OrganizaIons	  Are	  Struggling	  With	  •  Data	  Strategy	  -­‐	  	  how	  much	  data?	  why	  data?	  how	  doe...
Why	  Big	  Data?	  A	  new	  term,	  with	  associated	  “Data	  Scien:st”	  posi:ons:	  •  Big	  Data:	  is	  a	  mix	  ...
What	  Makes	  Data	  “Big	  Data”?	  •  Big	  Data	  is	  Characterized	  by	  the	  3-­‐V’s:	      –  Volume:	  larger	 ...
The	  DisIncIon	  between	  “Data”	  and	  “Big	                   Data”	  is	  fast	  disappearing	  	  •  Most	  real	  ...
Text	  Data:	  The	  Big	  Driver	  	  •  While	  we	  speak	  of	  “big	  data”	  and	  the	  “Variety”	  in	  3-­‐V’s	  ...
To	  Hadoop	  or	  not	  to	  Hadoop?	  when	  to	  use	  techniques	  requiring	  Map-­‐Reduce	  and	  grid	  compu,ng?	 ...
10	  
11	  
Hadoop	  Use	  Cases	  by	  Data	  Type	             ERP	  Financial	  Data	      Supply	  Chain	  Data	                  ...
Big	  Data	  Applica:ons	  and	  Uses	                                                                    100%	  Capture	 ...
Analysis	  &	  Programming	  Soqware	                                       PIG                                           ...
15	  
From	  Basic	  Dashboards	  to	  Advanced	  Analy:cs	  •  Data	  ReducIon	  to	  get	      –  Advanced	  views	  oriented	...
What	  is	  Data	  Mining?	  Finding	  interes,ng	  structure	  in	  data	  •  Structure:	  refers	  to	  staIsIcal	  pa]e...
Data	  Mining	  and	  Databases	  Many	  interesIng	  analysis	  queries	  are	  difficult	  to	  state	       precisely	  	...
Many	  Business	  Uses	  Analy:c	  technique	                       Uses	  in	  business	  Marke:ng	  and	  sales	        ...
So	  this	  Internet	  thing	  is	  going	  to	                     be	  big!	           Big	  opportunity,	  Big	  Data,	...
Stats	  about	  on-­‐line	  usage	     •  How	  many	  people	  are	  on-­‐line	  today?	  	          –  2.1	  Billion	  (...
How	  Are	  Users	  Distributed	  Geographically	  *Sources: Feb.2012 - from Go-Gulf.com compiled from Comscoredatamine.co...
How Do People Spend Their On-line Time?•  On-line Shopping?                5%•  Searches?                      21%•  Email...
Most	  Popular	  AcIviIes	  On-­‐Line?	  *Sources: Feb.2012 - from Go-Gulf.com compiled from Comscoredatamine.com, Nielsen...
Top	  10	  Sites	  Visited?	  •  Google:	  	  153.4M	  visitors/month	  	      –  each	  spending	  1h	  47mins	  •  Faceb...
InteresIng	  Events	  •  Google:	  	  How	  many	  queries	  per	  day?	         –  More	  than	  1	  Billion	  •  Twi$er:...
InteresIng	  Events	  •  Country	  with	  Highest	  online	  friends?	         –  Brazil	         –  481	  friends	  per	 ...
So	  Internet	  is	  a	  big	  place	  with	                                 lots	  happening?	  	  Do	  we	  understand	 ...
Case	  Studies	     Yahoo!	  Big	  Data	                                 29
Yahoo!	  –	  One	  of	  Largest	                         DesInaIons	  on	  the	  Web	                                     ...
Yahoo!	  Big	  Data	  –	  A	  league	  of	  its	  own…	                                                                   ...
Behavioral	  TargeIng	  (BT)	                                                       SearchContent                       BT...
Yahoo!	  User	  DNA	      Registration          Campaign             Behavior      Unknown                         Clicked...
How	  it	  works	  |	  Network	  +	  Interests	  +	  Modelling	   Varying ProductRelevant Users   Identify Users to the Mo...
Recency	  Ma]ers,	  So	  Does	  Intensity	  Active now…                     …and with feeling                       35	  
DifferenIaIon	  |	  Category	  specific	  modelling	  Example 1: Category Automotive                                        ...
DifferenIaIon	  |	  Category	  specific	  modelling	  Example 1: Category Automotive                                        ...
Automobile	  Purchase	  Intender	  Example	  •  A	  test	  ad-­‐campaign	  with	  a	  major	  Euro	  automobile	  manufact...
Mortgage	  Intender	  Example	                                                 Results from a client campaign on Yahoo!Exa...
Experience summary at Yahoo!•  Dealing with the largest data source in the world (25   Terabyte per day)•  BT business was...
Social	  Network	  Social	  Graph	  Analysis	  (no	  Ime)	    Social	  Network	  MarkeIng	  Understanding	  Context	  for	...
Case Study: TWITTER Social Marketing?        Viacom’s VH1 Twitter campaign on ANVIL (the movie)                           ...
The	  Display	  Ads	  Challenge	  Today	                        What	  Ad	  would	  you	                           place	 ...
The	  Display	  Ads	  Challenge	  Today	                                 Damaging	  to	  Brand?	                        44...
The	  Display	  Ads	  Challenge	  Today	                         What	  Ad	                        would	  you	           ...
The	  Display	  Ads	  Challenge	  Today	                                 Irrelevant	  and	                                ...
NetSeer:	  Intent	  for	  Display	  •  Currently	  Processing	  4	  Billion	  Impressions	  per	  Day	                    ...
Problem:	  Hard	  to	  Understand	  User	                  Intent	    Contextual	  Ad	  served	  by	  Google	             ...
Case	  Studies	    nPario	  –	  Data	  Management	  Plavorm	  ChoozOn	  –	  Big	  Data	  over	  Offers	  Universe	         ...
Example	  of	  a	  Big	  Data	  DMP	  ScalenPario builds an infinitely scalable data management platform (DMP) thatallows ...
Powerful	  Technology.	  	     540m	  Users,	  8+	  Petabytes,	  	             &	  16	  Patents	  	                       ...
DMP	  Data Sources First-Party                               Self-Service Data Intake & ETL                      Deep Insi...
nPario	  Case	  Study	  Marketing to100+ million GamersChallenge: Provide cross-platform campaigninsights for advertisers ...
Big	  Data	  in	  AcIon:	  EA	  Legend	                        54	  
Specialized	  Search	  through	  Big	  Data	   AnalyIcs	  over	  the	  Offers	  Universe	                         55	      ...
Chaos	  for	  Consumers	  	                                Deep                            Discount                       ...
Chaos	  for	  Marketers	                                Deep                             Discou                           ...
What	  Consumers	  &	  Marketers	  Want	          Consumers        •  Value from brands they love        •  Tame the deal ...
Keys	  to	  Being	  THE	  Consumer	  Network	          Comprehensive       Personalization         Intelligent        Deal...
Solu:on	  for	  Consumers	   Daily Deals   Flash Sale            Deep                  Affiliate Deals &      Loyalty    S...
Carol’s	  Consumer	  Network	  “Chozen”	  Brands	                             Interests	  (Intent)	                       ...
What	  Carol	  Gets	  Carol’s	  Consumer	  Network	                     The	  Universe	  of	  Deals	                      ...
Big	  Picture	  on	  Big	  Data	  AnalyIcs	                     Key	  points	                           63	  
Don’t	  Forget	  The	  Basics	  •  Metrics	  and	  Scorecards	  are	  the	  first	  steps	  to	     awareness	  •  Plays	  ...
Focus	  On	  The	  Right	  Measure	            Referral Site Metrics 2.5                                           •  Tota...
Focus	  On	  The	  Right	  Measure	           Referral Site Metrics2.5                                               •  To...
Focus	  On	  The	  Right	  Measure	           Referral Site Metrics2.5                                               •  To...
SomeImes,	  	  Simple	  is	  Very	  Powerful!	    Retaining	  New	  Yahoo!	  Mail	  Registrants	  
IntegraIng	  Mail	  and	  News	  •  Data	  showed	  that	  users	  oqen	  check	  their	  mail	  and	     news	  in	  the	...
“In	  the	  news”	  Module	  on	  Mail	  Welcome	  Page	  •  Increased	  retenIon	  on	  Mail	  for	  light	  users	  by	 ...
Benefits	  of	  Advanced	  AnalyIcs	  •  Advanced	  AnalyIcs	  brings	  out	  the	  real	  value	  of	  data	  	  •  The	  ...
Thank	  You!	  	  	  	  &	  	  	  QuesIons?	                                                     	                        ...
Upcoming SlideShare
Loading in...5
×

Big Data Analytics: Applications and Opportunities in On-line Predictive Modeling by Usama Fayyad

3,881

Published on

Talk by Usama Fayyad at BigMine12 at KDD12.

Virtually all organizations are having to deal with Big Data in many contexts: marketing, operations, monitoring, performance, and even financial management. Big Data is characterized not just by its size, but by its Velocity and its Variety for which keeping up with the data flux, let alone its analysis, is challenging at best and impossible in many cases. In this talk I will cover some of the basics in terms of infrastructure and design considerations for effective an efficient BigData. In many organizations, the lack of consideration of effective infrastructure and data management leads to unnecessarily expensive systems for which the benefits are insufficient to justify the costs. We will refer to example frameworks and clarify the kinds of operations where Map-Reduce (Hadoop and and its derivatives) are appropriate and the situations where other infrastructure is needed to perform segmentation, prediction, analysis, and reporting appropriately – these being the fundamental operations in predictive analytics. We will thenpay specific attention to on-line data and the unique challenges and opportunities represented there. We cover examples of Predictive Analytics over Big Data with case studies in eCommerce Marketing, on-line publishing and recommendation systems, and advertising targeting: Special focus will be placed on the analysis of on-line data with applications in Search, Search Marketing, and targeting of advertising. We conclude with some technical challenges as well as the solutions that can be used to these challenges in social network data.

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
3,881
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
221
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Big Data Analytics: Applications and Opportunities in On-line Predictive Modeling by Usama Fayyad

  1. 1. Big  Data  Analy,cs:    Applica,ons  &  Opportuni,es  in  On-­‐line   Predic,ve  Modeling   Usama  Fayyad,  Ph.D.     Chairman  &  CTO   ChoozOn  Corpora/on   Twi$er:  @usamaf     August  12,  2012   BigMine:  BigData  Mining  Workshop   KDD-­‐2012  –  Beijing,  China   1  
  2. 2. Outline  •  Big  Data  all  around  us  •  IntroducIon  to  Data  Mining  and  PredicIve  AnalyIcs  •  On-­‐line  data  and  facts  •  Case  studies  from  mulIple  verIcals:   1.  Yahoo!  Big  Data   2.  Social  Network  Data   3.  Case  Study  from  nPario  ApplicaIons   4.  ChoozOn  Big  Data  from  offers  •  High-­‐level  view:  don’t  forget  the  basics  •  Summary  and  conclusions   2  
  3. 3. What  Ma3ers  in  the  Age  of  Analy/cs?       1. Being  Able  to  exploit  all  the  data  that  is  available     •  not  just  what  youve  got  available     •  what  you  can  acquire  and  use  to  enhance  your  acIons     2.   ProliferaIng  analyIcs  throughout  your   organizaIon   •  make  every  part  of  your  business  smarter     3.   Driving  significant  business  value     •  embedding  analyIcs  into  every  area  of  your  business   can  help  you  drive  top  line  revenues  and/or  bo]om   line  cost  efficiencies     3  
  4. 4. What  OrganizaIons  Are  Struggling  With  •  Data  Strategy  -­‐    how  much  data?  why  data?  how  does  it   impact  my  business?  •  PrioriIzaIon  conducted  based  on  business  need,  not  IT   –  Business  jusIficaIons  for  Big  Data   –  DemonstraIng  value  of  data  in  impacIng  the  business   –  Looking  at  specialized  stores  to  reduce  TCO   –  File  systems  for  grid  compuIng  (Hadoop)  •  We  do  need  to  stay  on  top  of  our  basic  business  ops   –  billing,  monitoring,  inventory  management,  etc...   –  Most  can  be  handled  by  stream  processing  and  tradiIonal  BI  •  But,  a  new  generaIon  of  requirements  are  becoming  a   priority  for  data-­‐driven  business   –  PredicIve  analyIcs,  advanced  forecasIng,  automated  detecIon  of   events  of  interest   4  
  5. 5. Why  Big  Data?  A  new  term,  with  associated  “Data  Scien:st”  posi:ons:  •  Big  Data:  is  a  mix  of  structured,  semi-­‐structured,  and   unstructured  data:   –  Typically  breaks  barriers  for  tradiIonal  RDB  storage   –  Typically  breaks  limits  of  indexing  by  “rows”   –  Typically  requires  intensive  pre-­‐processing  before  each  query   to  extract  “some  structure”  –  usually  using  Map-­‐Reduce  type   operaIons  •  Above  leads  to  “messy”  situaIons  with  no  standard   recipes  or  architecture:  hence  the  need  for  “data   scienIsts”     –  conduct  “Data  ExpediIons”     –  Discovery  and  learning  on  the  spot   5  
  6. 6. What  Makes  Data  “Big  Data”?  •  Big  Data  is  Characterized  by  the  3-­‐V’s:   –  Volume:  larger  than  “normal”  –  challenging  to  load/process   •  Expensive  to  do  ETL   •  Expensive  to  figure  out  how  to  index  and  retrieve   •  MulIple  dimensions  that  are  “key”   –  Velocity:  Rate  of  arrival  poses  real-­‐/me  constraints  on  what   are  typically  “batch  ETL”  opera/ons   •  If  you  fall  behind  catching  up  is  extremely  expensive  (replicate  very   expensive  systems)   •  Must  keep  up  with  rate  and  service  queries  on-­‐the-­‐fly   –  Variety:  Mix  of  data  types  and  varying  degrees  of  structure   •  Non-­‐standard  schema   •  Lots  of  BLOB’s  and  CLOB’s   •  DB  queries  don’t  know  what  to  do  with  semi-­‐structured  and   unstructured  data.   6  
  7. 7. The  DisIncIon  between  “Data”  and  “Big   Data”  is  fast  disappearing    •  Most  real  data  sets  nowadays  come  with  a  serious  mix  of   semi-­‐structured  and  unstructured  components:   –  Images   –  Video   –  Text  descripIons  and  news,  blogs,  etc…   –  User  and  customer  commentary   –  ReacIons  on  social  media:  e.g.  Twi]er  is  a  mix  of  data  anyway  •  Using  standard  transforms,  enIty  extracIon,  and  new   generaIon  tools  to  transform  unstructured  raw  data  into   semi-­‐structured  analyzable  data    •  Hadoop  vs.  Not  Hadoop  -­‐    when  to  use  what  kind  of   techniques  requiring  Map-­‐Reduce  and  grid  compuIng   7  
  8. 8. Text  Data:  The  Big  Driver    •  While  we  speak  of  “big  data”  and  the  “Variety”  in  3-­‐V’s  •  Reality:  biggest  driver  of  growth  of  Big  Data  has  been  text   data  •  In  fact  Map-­‐Reduce  became  popularized  by  Google  to  address   the  problem  of  processing  large  amounts  of  text  data:     –  Indexing  a  full  copy  of  the  web   –  Frequent  re-­‐indexing   –  Many  operaIons  with  each  being  a  simple  operaIon  but  done  at  large   scale  •  Most  work  on  analysis  of  “images”  and  “video”  data  has  really   been  reduced  to  analysis  of  surrounding  text     8  
  9. 9. To  Hadoop  or  not  to  Hadoop?  when  to  use  techniques  requiring  Map-­‐Reduce  and  grid  compu,ng?  •  Typically  organizaIons  try  to  use  Map-­‐Reduce  for  everything  to   do  with  Big  Data   –  This  is  actually  very  inefficient  and  oqen  irraIonal   –  Certain  operaIons  require  specialized  storage   •  UpdaIng  segment  memberships  over  large  numbers  of  users   •  Defining  new  segments  on  user  or  usage  data  •  Map-­‐Reduce  is  useful  when  a  very  simple  operaIon  is  to  be   applied  on  a  large  body  of  unstructured  data   –  Typically  this  is  during  enIty  and  a]ribute  extracIon   –  SIll  need  Big  Data  analysis  post  Hadoop  •  Map-­‐Reduce  is  not  efficient  or  effecIve  for  tasks  involving  deeper   staIsIcal  modeling   –   good  for  gathering  counts  and  simple  (sufficient)  staIsIcs   •  E.g.  how  many  Imes  a  keyword  occurs,  quick  aggregaIon  of  simple  facts  in  unstructured   data,  esImates  of  variances,  density,  etc…   –  Mostly  pre-­‐processing  for  Data  Mining   9  
  10. 10. 10  
  11. 11. 11  
  12. 12. Hadoop  Use  Cases  by  Data  Type   ERP  Financial  Data   Supply  Chain  Data   Sensor  Data   Financial  Trading  Data   1%   2%   2%   4%   CRM  Data   4%   Science  Data   7%   Content  and  Preference  Data   24%   AdverIsing  Data   10%   IT  Log  Data   19%   Social  Data   11%   Text  and  Language  Data   16%   12  
  13. 13. Big  Data  Applica:ons  and  Uses   100%  Capture   IT  Log  &  Security     Forensics  &  AnalyIcs   Find  New  Signal   Predict  Events   Product  Planning   Automated  Device  Data   Failure  Analysis   AnalyIcs   ProacIve  Fixes   RecommendaIon   AdverIsing   SegmentaIon   AnalyIcs   Social  Media   Hadoop  +    MPP  +  EDW   Big  Data   Cost  ReducIon   Ad  Hoc  Insight   Warehouse  AnalyIcs   PredicIve  AnalyIcs   13  
  14. 14. Analysis  &  Programming  Soqware   PIG HIPI 14  
  15. 15. 15  
  16. 16. From  Basic  Dashboards  to  Advanced  Analy:cs  •  Data  ReducIon  to  get   –  Advanced  views  oriented  by  customer  or  product   –  SegmentaIon   –  Pa]ern  analysis  and  summaries  •  PredicIve  AnalyIcs   –  Data  Mining   –  StaIsIcal  analysis   –  OpImizaIon  of  processes  and  spend   The  same  analyIcs  technique  apply  across   many  industries:  fraud  detecIon  is  fraud   detecIon,  is  fraud  detecIon   16  
  17. 17. What  is  Data  Mining?  Finding  interes,ng  structure  in  data  •  Structure:  refers  to  staIsIcal  pa]erns,  predicIve  models,  hidden   relaIonships  •  Interes/ng:  Accurate  predicIons,  associated  with  new  revenue   potenIal,  associated  with  cost  savings,  enables  opImizaIon      •  Examples  of  tasks  addressed  by  Data  Mining   –  PredicIve  Modeling  (classificaIon,  regression)   –  SegmentaIon  (Data  Clustering  )   –  Affinity  (SummarizaIon)     •  relaIons  between  fields,  associaIons,  visualizaIon   17  
  18. 18. Data  Mining  and  Databases  Many  interesIng  analysis  queries  are  difficult  to  state   precisely    •  Examples:   –  which  records  represent  fraudulent  transac:ons?   –  which  households  are  likely  to  prefer  a  Ford  over  a  Toyota?   –  Who  is  a  good  credit  risk  in  my  customer  DB?   –  Why  are  these  automobiles  in  need  of  unusual  repairs?    •  Yet  database  contains  the  informaIon     –  good/bad  customer,  profitability   –  did/didn’t  respond  to  mailout/survey/campaign/...   –  automobile  repair  and  warranty  records   18  
  19. 19. Many  Business  Uses  Analy:c  technique   Uses  in  business  Marke:ng  and  sales    IdenIfy  potenIal  customers;  establish  the   effecIveness  of  a  campaign    Understanding  customer  behavior   model  churn,  affiniIes,  propensiIes,  …  Web  analy:cs  &  metrics   model  user  preferences  from  data,  collaboraIve   filtering,  targeIng,  etc.  Fraud  detec:on   IdenIfy  fraudulent  transacIons  Credit  scoring   Establish  credit  worthiness  of  a  customer  requesIng  a   loan  Manufacturing  process  analysis   IdenIfy  the  causes  of  manufacturing  problems  PorWolio  trading   opImize  a  porvolio  of  financial  instruments  by   maximizing  returns  &  minimizing  risks  Healthcare  Applica:on   fraud  detecIon,  cost  opImizaIon,  detecIon  of  events   like  epidemics,  etc...  Insurance   fraudulent  claim  detecIon,  risk  assessment  Security  and  Surveillance   intrusion  detecIon,  sensor  data  analysis,  remote   sensing,  object/person  detecIon,  link  analysis,  etc...   19  
  20. 20. So  this  Internet  thing  is  going  to   be  big!   Big  opportunity,  Big  Data,  Big   Challenges!   20  
  21. 21. Stats  about  on-­‐line  usage   •  How  many  people  are  on-­‐line  today?     –  2.1  Billion  (per  Comscore  esImates)   –  30%  of  world  PopulaIon   •  How  much  Ime  is  spent  on-­‐line  per  month  by   the  whole  world?   –  4M  person-­‐years  per  month   •  How  many  hours  per  month  per  Internet  User?   –  16  hours  (global  average)   –  32  hours  (U.S.  Average)  *Sources: Feb.2012 - from Go-Gulf.com compiled from Comscoredatamine.com, Nielsen.com,thisDigitalLife.com, PewInternet.org 21  
  22. 22. How  Are  Users  Distributed  Geographically  *Sources: Feb.2012 - from Go-Gulf.com compiled from Comscoredatamine.com, Nielsen.com,thisDigitalLife.com, PewInternet.org 22  
  23. 23. How Do People Spend Their On-line Time?•  On-line Shopping? 5%•  Searches? 21%•  Email/Communication? 19%•  Reading Content? 20%•  Social Networking? 22%•  Multimedia Sites? 13% 23  
  24. 24. Most  Popular  AcIviIes  On-­‐Line?  *Sources: Feb.2012 - from Go-Gulf.com compiled from Comscoredatamine.com, Nielsen.com,thisDigitalLife.com, PewInternet.org 24  
  25. 25. Top  10  Sites  Visited?  •  Google:    153.4M  visitors/month     –  each  spending  1h  47mins  •  Facebook:  137.6M  visitors  per  month     –  each  spending  7h  50mins   25  
  26. 26. InteresIng  Events  •  Google:    How  many  queries  per  day?   –  More  than  1  Billion  •  Twi$er:  How  many  Tweets/day?   –  More  than  250M  •  Facebook:  Updates  per  day?   –  More  than  800M  •  YouTube:  Views/day   –  4  Billion  views   –  60  hours  of  video  uploaded  every  minute!  •  Social  Networks:  users  who  have  used  sites  for  spying   on  their  partners?   –  56%     *Sources: Feb.2012 - from Go-Gulf.com compiled from Comscoredatamine.com, Nielsen.com, thisDigitalLife.com, PewInternet.org 26  
  27. 27. InteresIng  Events  •  Country  with  Highest  online  friends?   –  Brazil   –  481  friends  per  user   –  Japan  has  least  at  29  •  Country  with  maximum  Ime  spent  shopping  on-­‐line??   –  China:  5  hours/week     *Sources: Feb.2012 - from Go-Gulf.com compiled from Comscoredatamine.com, Nielsen.com, thisDigitalLife.com, PewInternet.org 27  
  28. 28. So  Internet  is  a  big  place  with   lots  happening?    Do  we  understand  what  each  individual  is  trying  to   achieve?  •  What  is  user  intent?  •  Cri/cal  in  mone/za/on,  adver/sing,  etc…  Do  we  understand  what  a  community’s  sen:ment  is?  •    What  is  the  emoIon?  •    Is  it  negaIve  or  posiIve?  •    What  is  the  health  of  my  brand  online?  Do  we  understand  context  and  content?  •    What  are  appropriate  ads?  •    Is  it  Ok  to  associate  my  brand  with  this  content?  •  Is  content  sad?,  happy?,  serious?,  informaIve?   28  
  29. 29. Case  Studies   Yahoo!  Big  Data   29
  30. 30. Yahoo!  –  One  of  Largest   DesInaIons  on  the  Web   More people visited Yahoo! in the past month than:" "80%  of  the  U.S.  Internet  popula:on  uses  Yahoo!     •  Use coupons"–  Over  600  million  users  per  month  globally!   •  Vote"   •  Recycle"•  Global  network  of  content,  commerce,  media,  search  and  access   •  Exercise regularly" products   •  Have children•  100+  properIes  including  mail,  TV,  news,  shopping,  finance,   living at home" autos,  travel,  games,  movies,  health,  etc.   •  Wear sunscreen regularly"•  25+  terabytes  of  data  collected  each  day   •  RepresenIng  1000’s  of  cataloged  consumer  behaviors   Data is used to develop content, consumer, category and campaign insights for our key content partners and large advertisers 30   Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.
  31. 31. Yahoo!  Big  Data  –  A  league  of  its  own…   Terrabytes of Warehoused Data Millions of Events Processed Per Day 14,000 5,000 1,000 500 2,000 25 49 94 10050 120 225 Amazon Walmart warehouse Telecom Warehouse AT&T Y! Panama Y! LiveStor Korea Y! MainSABRE VISA NYSE YSM Y! Global GRAND CHALLENGE PROBLEMS OF DATA PROCESSING TRAVEL, CREDIT CARD PROCESSING, STOCK EXCHANGE, RETAIL, INTERNET Y! Data Challenge Exceeds others by 2 orders of magnitude 31  
  32. 32. Behavioral  TargeIng  (BT)   SearchContent BT Search Clicks Ad Clicks Targe:ng  your  ads  to   consumers  whose  recent   behaviors  online  indicate   that  your  product  category  is   32   relevant  to  them  
  33. 33. Yahoo!  User  DNA   Registration Campaign Behavior Unknown Clicked on Checks Yahoo! Lawyer Mail daily via Lives in SF Sony Plasma TV Searched on: “Hillary Clinton” PC & Phone Searched on: SS ad “Italian restaurant Palo Alto” Searched on Male, age 32 from London Spends 10 hour/week Purchased Da Has 25 IM Buddies,last week On the internet Vinci Code from Moderates 3 Y! Amazon Groups, and hosts a 360 page viewed by 10k people•  On a per consumer basis: maintain a behavioral/interests profile and profitability (user value and LTV) metrics 33  
  34. 34. How  it  works  |  Network  +  Interests  +  Modelling   Varying ProductRelevant Users Identify Users to the Models Rewarding Good Behaviour Match Most Purchase Cycles Analyze predictive patterns for purchase cycles in over 100 product categories In each category, build models to describe behaviour most likely to lead to an ad response (i.e. click). Score each user for fit with every category…daily. Target ads to users who get highest ‘relevance’ scores in the targeting categories 34  
  35. 35. Recency  Ma]ers,  So  Does  Intensity  Active now… …and with feeling 35  
  36. 36. DifferenIaIon  |  Category  specific  modelling  Example 1: Category Automotive Example 2: Category Travel/Last Minute Intense Click Zone intensity score intensity score Intense Click Zone time time Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click Different models allow us to weight and determine intensity and recency 36  
  37. 37. DifferenIaIon  |  Category  specific  modelling  Example 1: Category Automotive user is in the Intense Click Zone intensity score Intense Click Zone with no further activity, decay takes effect timeAlt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click Different models allow us to weight and determine intensity and recency 37  
  38. 38. Automobile  Purchase  Intender  Example  •  A  test  ad-­‐campaign  with  a  major  Euro  automobile  manufacturer   –  Designed  a  test  that  served  the  same  ad  creaIve  to  test  and  control  groups   on  Yahoo   –  Success  metric:  performing  specific  acIons  on  Jaguar  website  •  Test  results:  900%  conversion  liq  vs.  control  group   –  Purchase  Intenders  were  9  Imes  more  likely  to  configure  a  vehicle,  request   a  price  quote  or  locate  a  dealer  than  consumers  in  the  control  group   –  ~3x  higher  click  through  rates  vs.  control  group   38  
  39. 39. Mortgage  Intender  Example   Results from a client campaign on Yahoo!Example: Mortgages Network +626% Conv LiftWe found: +122%1,900,000 people looking CTR Liftfor mortgage loans. Example search terms qualified for this target: Mortgages Home Loans Refinancing Ditech Example Yahoo! Pages visited: Financing section in Real Estate Mortgage Loans area in Finance Real Estate section in Yellow Pages Source: Campaign Click thru Rate lift is determined by Yahoo! Internal research. Conversion is the number of qualified leads from clicks over number of impressions served. Audience size represents the audienceDate: March 2006 39   within this behavioral interest category that has the highest propensity to engage with a brand or product and to click on an offer.
  40. 40. Experience summary at Yahoo!•  Dealing with the largest data source in the world (25 Terabyte per day)•  BT business was grown from $20M to about $500M in 3 years of investment!•  Building the largest database systems: –  World’s largest Oracle data warehouse –  World’s largest single DB –  Over 300 data mart data –  Analytics with thousands of KPI’s –  Over 5000 users of reports –  Largest targeting system in the world•  Big demands for grid computing (Hadoop) 40  
  41. 41. Social  Network  Social  Graph  Analysis  (no  Ime)   Social  Network  MarkeIng  Understanding  Context  for  Ads   41  
  42. 42. Case Study: TWITTER Social Marketing? Viacom’s VH1 Twitter campaign on ANVIL (the movie) (week of May 14th , 2009 – see AdAge article) Marketing ANVIL movie Very niche audience, how do you reach them? Diffusion Give 20 free DVD’s to major related artists/groups, ask them To notify Twitter groups – reached over 2M people Social Identity: power of word of mouth... What is the Cost to VH-1? Compare with traditional approach: TV commercials to promote a documentary film? 42 42  
  43. 43. The  Display  Ads  Challenge  Today   What  Ad  would  you   place  here?   43  
  44. 44. The  Display  Ads  Challenge  Today   Damaging  to  Brand?   44  
  45. 45. The  Display  Ads  Challenge  Today   What  Ad   would  you   place  here?   45  
  46. 46. The  Display  Ads  Challenge  Today   Irrelevant  and   Damaging  to  Brand   Completely  Irrelevant   46  
  47. 47. NetSeer:  Intent  for  Display  •  Currently  Processing  4  Billion  Impressions  per  Day   47  
  48. 48. Problem:  Hard  to  Understand  User   Intent   Contextual  Ad  served  by  Google   What  NetSeer  Sees:     48  
  49. 49. Case  Studies   nPario  –  Data  Management  Plavorm  ChoozOn  –  Big  Data  over  Offers  Universe   49  
  50. 50. Example  of  a  Big  Data  DMP  ScalenPario builds an infinitely scalable data management platform (DMP) thatallows advertisers and marketers to manage, understand, and monetize theirdata. Their technology has been proven at companies such as Yahoo and EA.ApplicationsnPario applications include segmentation of audiences for increasing the valueof advertising, reporting/analytics for examining performance, attribution toshow which advertising works, and experimentation to test ideas.AccessnPario emphasizes putting access to data in the hands of marketing,advertising and other business users 50  
  51. 51. Powerful  Technology.     540m  Users,  8+  Petabytes,     &  16  Patents      nPario has the only commercially In Production at Yahooavailable Big Data management nPario technology manages the world’stechnology built  for  one  of  the  “Big  Five”.   largest data system. Used for Yahoo’s Marketing and Advertising business. Used across Yahoo’s platforms and 120 onlineSignificant Investment properties. Used by hundreds of analystsnPario’s technology is the result of more than$50m investment in development and 16issued patents. 51  
  52. 52. DMP  Data Sources First-Party Self-Service Data Intake & ETL Deep Insights & Big Data Analytics Offline Data Processing, Insights CRM Data Tag Mgmt Enrichment and Analytics Normalization Events Modeling & Discovery Multiple Channels Search eMail Self-Service Applications 1   User DNA Segment Target Mobile Video IntegraIon  &  APIs   Display Site Third-Party Real-Time Data Availability Attribution Experimentation Data Management Platform 52   52  
  53. 53. nPario  Case  Study  Marketing to100+ million GamersChallenge: Provide cross-platform campaigninsights for advertisers and enable audiencediscovery across channels.Result: Unified view of gamers “EA increased its worldwide audience reach by 30% this year […]. Combining that major jump inacross multiple cross platform data reach with the launch of EA Legend puts us insources. perfect position to compete” Dave Madden, Senior Vice President of GlobalPogo (online), sponsored content, Console Media Solutionsgame interaction and ad interaction (Xbox, Electronic Arts.PS3), Mobile, Playfish (facebook), externalsources (Collective Media, Comscore,Omniture, Dynamic Logic) 53  
  54. 54. Big  Data  in  AcIon:  EA  Legend   54  
  55. 55. Specialized  Search  through  Big  Data   AnalyIcs  over  the  Offers  Universe   55   55  
  56. 56. Chaos  for  Consumers     Deep Discount Sites Coupon Flash Sites Sale Sites Loyalty Programs Daily Deals Sites Consumer Online Loyalty Program Social s Search Network Engines s 56  
  57. 57. Chaos  for  Marketers   Deep Discou nt Sites Flash Coupon Sale Sites Sites Loyalty Program s Daily Deals Sites Online Loyalty Progra Social ms Search Networ Engines ks 57  
  58. 58. What  Consumers  &  Marketers  Want   Consumers •  Value from brands they love •  Tame the deal chaos Marketers •  Reach targeted consumers •  Build loyalty •  Create brand evangelists 58   58  
  59. 59. Keys  to  Being  THE  Consumer  Network   Comprehensive Personalization Intelligent Deal Coverage Matching Permission- Loyalty Multi- based Solution Channel Targeting Reach Solution10/18/2010   ChoozOn  ConfidenIal   59   59  
  60. 60. Solu:on  for  Consumers   Daily Deals Flash Sale Deep Affiliate Deals & Loyalty Sites Sites Discounters Coupon Sites Programs Choozer  Interests  &  Preferences   Machine- Intelligent   Social based Matching   (Pals,Clubs) Web App Mobile App Email Digital Media 60   60  
  61. 61. Carol’s  Consumer  Network  “Chozen”  Brands   Interests  (Intent)   Shopping  Pals   Deal  Clubs   61  
  62. 62. What  Carol  Gets  Carol’s  Consumer  Network   The  Universe  of  Deals   Affiliate  Deals   Loyalty  Programs   1,500+  brands   Daily  Deals     Flash  Sales   100,000+  offers   Deep  Discounters   PLUS her Inbox™ Intelligent Matching A  Personal  Shopper  for  Deals   62  
  63. 63. Big  Picture  on  Big  Data  AnalyIcs   Key  points   63  
  64. 64. Don’t  Forget  The  Basics  •  Metrics  and  Scorecards  are  the  first  steps  to   awareness  •  Plays  a  huge  role  in  deploying  predicIve  models  and   monitoring  and  proving  their  effecIveness  •  Oqen  scorecards  require   –  Going  through  huge  amounts  of  data  to  produce  the  required  metrics   –  Ability  to  get  to  the  metrics  in  low  latency   –  Ability  to  modify  metrics  and  update  quickly   –  IntegraIon  with  data  warehouse   64  
  65. 65. Focus  On  The  Right  Measure   Referral Site Metrics 2.5 •  Total  traffic  not  a  good   Response performance  measure   2.0 Conversion •  High-­‐traffic  referral  sites   1.5 oqen  produce  poorer   1.0 Margin quality  click  throughs   0.5 •  Ads  best  response  not   0.0 most  effecIve     Site 1 Site 2 •  Target  the  message   65  
  66. 66. Focus  On  The  Right  Measure   Referral Site Metrics2.5 •  Total  traffic  not  a  good   Response performance  measure  2.0 •  High-­‐traffic  referral  sites  1.5 Conversion oqen  produce  poorer  1.0 Margin quality  click  throughs  0.5 •  Ads  best  response  not  0.0 most  effecIve     Site 1 Site 2 •  Target  the  message   66  
  67. 67. Focus  On  The  Right  Measure   Referral Site Metrics2.5 •  Total  traffic  not  a  good   Response performance  measure  2.0 •  High-­‐traffic  referral  sites  1.5 Conversion oqen  produce  poorer  1.0 Margin quality  click  throughs  0.5 •  Ads  best  response  not  0.0 most  effecIve     Site 1 Site 2 •  Target  the  message   67  
  68. 68. SomeImes,    Simple  is  Very  Powerful!   Retaining  New  Yahoo!  Mail  Registrants  
  69. 69. IntegraIng  Mail  and  News  •  Data  showed  that  users  oqen  check  their  mail  and   news  in  the  same  session   –  But  no  easy  way  to  navigate  to  Y!  News  from  Y!  Mail  •  Mail  users  who  also  visit  Y!  News  are  3X  more  acIve   on  Yahoo   –  Higher  retenIon,  repeat  visits  and  Ime-­‐spent  on   Yahoo   69  
  70. 70. “In  the  news”  Module  on  Mail  Welcome  Page  •  Increased  retenIon  on  Mail  for  light  users  by  40%!   –  Est.  Incremental  revenue  of  $16m  a  year  on  Y!  Mail  alone   70  
  71. 71. Benefits  of  Advanced  AnalyIcs  •  Advanced  AnalyIcs  brings  out  the  real  value  of  data    •  The  business  begins  to  understand  the  true  value  and  role  of   data  in  moving  the  big  needles  •  Focus  on  useful  requirements  from  data,  rather  than  “data   acrobaIcs”  •  Value  creaIon  from  data  leads  to  proper  investment  scoping   –  Many  are  realizing  predicIve  analyIcs  and  data  mining    are  much   more  useful  than  reporIng   –  IntegraIon  of  analyIcs  story  with  data  storage  very  criIcal  •  Big  data  makes  analyIcs  even  more  essenIal  and  more  useful   –  Avoiding  the  challenges  of  separaIng  analyIcs  from  big  data  are   increasingly  important   71  
  72. 72. Thank  You!        &      QuesIons?     Twi]er:  @usamaf     Usama@choozOn.net   www.ChoozOn.com  
  1. Gostou de algum slide específico?

    Recortar slides é uma maneira fácil de colecionar informações para acessar mais tarde.

×