Published on

ARF Webcast Presentation, September 12, 2011

- 1. Early Lessons Learned in Applying Big Data To TV Advertising<br />ARF September 12, 2011<br />Jack Smith, Chief Product Officer, Simulmedia<br />
- 2. About Us<br />Who We Are<br />We are a New York-based start-up. We are venture-backed by Avalon Ventures, Union Square Ventures and Time-Warner.<br />Where We Have Been<br />Our 35-person team has veterans of:<br />What We Believe<br />Television is still the most powerful advertising medium in the world. While addressability will come, we’re not waiting for it. We’ve taken a few strategies we learned from the Internet and are applying them to linear TV advertising today.<br />Through partnerships with major data providers, we have assembled the world’s largest set of actionable television data.<br />How We Do It<br />How We Make Money<br />We sell television advertising. With inventory in over 106 million US households, we can cost-effectively extend reach into high-value target audiences across virtually any advertiser category. We use big data and science to do this.<br />
- 3. Why Did We Leave The Web?<br />Television remains the dominant consumer medium<br />(a) Nielsen US TV Viewing Audience. Traditional Live-Only TV based on average monthly viewing during 1Q2011. Internet and Online Video based on average monthly consumption during July 2011. Video on Demand based on consumption during May 2011.<br />
- 4. TV Spend Is Increasing<br />Source: MAGNAGLOBAL<br />
- 5. Audience Is Fragmenting<br />Source: Nielsen via TVbythenumbers.com<br />
- 6. Campaign Reach Is Declining<br />Impossible for measurement and planning tools to keep pace <br />Source: Simulmedia analysis of data from SQAD, Nielsen and TVB<br />
- 7. Big Data<br />
- 8. Big Data Is Driving Growth<br />“We are on the cusp of a tremendous wave of innovation, productivity and growth, as well as new modes of competition and value-capture – all driven by Big Data.”<br />- McKinsey Global Institute, May 2011<br />“For CMOs, Big Data is a very big deal.”<br />- Alfredo Gangotena, CMO, Mastercard, July 2011<br />
- 9. Size Is Relative<br />1 byte x 1000 = 1 kilobyte<br />…x 1000 = 1 megabyte<br />…x 1000 = 1 gigabyte<br />…x 1000 = 1 terabyte<br />…x 1000 = 1 petabyte<br />…x 1000 = 1 exabyte <br />
- 10. Size Is Relative<br />Telegram = 100 bytes<br />Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm<br />
- 11. Size Is Relative<br />Page of an Encyclopedia = 100 kilobytes<br />Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm<br />
- 12. Size Is Relative<br />Pickup truck bed full of paper = 1 gigabyte <br />Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm<br />
- 13. Size Is Relative<br />Entire print collection of the Library of Congress = 10 terabytes<br />Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm<br />
- 14. Size Is Relative<br />All hard drives produced in 1995 = 20 petabytes <br />Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm<br />
- 15. Size Is Relative<br />All printed material = 200 petabytes <br />Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm<br />
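
The factor-of-1000 ladder makes the comparisons on these slides easy to check. A minimal Python sketch, assuming decimal (not binary) prefixes as the slides state; the helper names `to_bytes` and `ratio` are illustrative and not from the presentation:

```python
# Decimal prefixes, each step a factor of 1000, as listed on the "Size Is Relative" slide.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte", "exabyte"]


def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes (hypothetical helper)."""
    return value * 1000 ** UNITS.index(unit)


def ratio(value_a, unit_a, value_b, unit_b):
    """How many times larger quantity A is than quantity B."""
    return to_bytes(value_a, unit_a) / to_bytes(value_b, unit_b)


# All printed material (200 petabytes) vs. the Library of Congress
# print collection (10 terabytes), both figures taken from the slides.
print(ratio(200, "petabyte", 10, "terabyte"))  # 20000.0
```

On these figures, all printed material is roughly 20,000 times the size of the Library of Congress print collection.
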
- 16. But Big Data Is More Than Size<br />What happened?<br />Why did it happen?<br />BIG DATA<br />What’s going to happen next?<br />Time:<br />Past<br />Future<br />Focus:<br />Reporting<br />Prediction<br />Supports:<br />Human decisions<br />Machine decisions<br />Structured<br />Aggregated<br />Unstructured<br />Unaggregated<br />Data:<br />Dashboards<br />Excel<br />Discovery<br />Visualization<br />Statistics & Physics<br />Human Skills:<br />
- 17. Accelerating The Push To Big Data<br />Hadoop, cloud computing, Facebook, Yahoo, quants, Bittorrent, machine learning, Stanford, large hadron collider, Wal-Mart, text processing, Amazon S3 & EC2, open source intelligence, NoSQL, social media, Google, commodity hardware, Hive, fraud detection, trading desks, MapReduce, natural language processing <br />
- 18. What Can It Mean For TV Advertising?<br />Big data drove the rise of web & search advertising<br /><ul><li>Accumulation of high volume of direct measurement of media consumption
- 19. Better predictions about consumer interests
- 20. Real time return path
- 21. Automation
- 22. Interim step for addressability
- 23. More diligence around consumer privacy
- 24. Media buyers and sellers rethinking their approach to audience packaging, campaign planning, technology, data assembly and people</li></li></ul><li>Post Modern Architecture<br />Have we reached the limits of classic data storage architecture?<br />Data Warehouses<br /><ul><li>Yahoo!: 700 tb [1]
- 25. Australian Bureau of Statistics: 250 tb [1]
- 26. AT&T: 250 tb [1]
- 27. Nielsen: 45 tb [1]
- 28. Adidas: 13 tb [1]
- 29. Wal-Mart: 1 pb [2]</li></ul>Data Lakes<br /><ul><li>Facebook: 30 pb [3] (7x compression)
- 30. Yahoo: 22 pb [4]
- 31. Google: ???</li></ul>[1] Oracle F1Q10 Earnings Call September 16, 2009 Transcript<br />[2] Stair, Principles of Information Systems, 2009, p 181<br />[3] Dhruba Borthakur, Facebook, December 2010, http://www.facebook.com/note.php?note_id=468211193919<br />[4] Simulmedia estimate<br />
- 32. Our Idea of Big Data<br />Bringing the data set together in a single platform<br />Our (comparatively modest) data set:<br /><ul><li>200 tb (approx. 7x compression)
- 33. 113,858,592 daily events
- 34. Approximately 402,301 weekly ads
- 35. Double capacity every 6 months</li></ul>…And we don’t load every data point across all data sets, yet<br />
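
Doubling every six months compounds quickly. As a rough worked example, assuming the doubling applies to the 200 tb figure and continues at a steady rate (both are assumptions; the slide does not say), capacity after m months would be 200 × 2^(m/6). The function name below is hypothetical:

```python
def projected_capacity_tb(start_tb, months, doubling_months=6.0):
    """Capacity after `months`, assuming steady doubling every `doubling_months`."""
    return start_tb * 2 ** (months / doubling_months)


# Starting from the 200 tb quoted on the slide.
for months in (6, 12, 24):
    print(months, projected_capacity_tb(200, months))
# 6 400.0
# 12 800.0
# 24 3200.0
```

Under those assumptions the data set would pass 3 pb within two years, which is the kind of growth that motivates planning for commodity hardware up front.
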
- 36. Rethinking Media Data Architecture<br />Applying big data to television required us to rethink what our technical architecture should be<br />Commodity Hardware<br /><ul><li>No clouds allowed (ISO compliance)
- 37. Expect hardware failure
- 38. Learn from those who have done it
- 39. Participate in the Open Source community</li></ul>Open Source Software<br />Write Your Own Software<br /><ul><li>ELT (Extract, Load, Transform)
- 40. Meddle
- 41. Machine learning</li></ul>Science<br /><ul><li>Advanced statistical techniques
- 42. Experimentation</li></li></ul><li>Some Wrinkles In The Matrix<br />No standards for set top boxes<br />Channel mapping<br />Time synchronization<br />On/off rules<br />….<br />Consult the sages<br />Build the team<br />
- 43. The People We Needed<br />A different approach required different skill sets<br /><ul><li>New core skills for everyone in the company
- 44. Pattern recognition
- 45. Visualization
- 46. Technology
- 47. Experimentation
- 48. Where do you find hard to find tech skills?
- 49. You don’t find them. You make them.
- 50. A dedicated Science team
- 51. Non traditional researchers (Brain imaging, bioinformatics, economic modeling, genetics)
- 52. People who watch a lot of television</li></li></ul><li>10 Lessons We’ve Learned<br />
- 53. Some Things To Know, First<br /><ul><li>Live viewing unless otherwise noted
- 54. Time shifting lessons is a whole other presentation
- 55. Time shifting + live viewing lessons is a whole other other presentation
- 56. Video on demand is a whole other other other presentation
- 57. We name names and provide numbers where clients and data partners permit
- 58. Client confidentiality is important to us
- 59. None of this work would’ve been possible without the help of our clients and partners</li></ul>This box will contain important information about the graphs on each page.<br />Read me…<br />
- 60. 60% of TV Viewers Watch 90% of TV<br />
- 61. Where The Other 40% Are<br />Networks with relatively fewer lighter viewer impressions<br />Networks with relatively more lighter viewer impressions<br />Vertical: Ratio of heavy viewer to light viewer impressions.<br />Horizontal: Lower rated to higher rated networks.<br />Call outs: Ratio is the number of heavier viewer impressions you would deliver to reach a lighter viewer on a given network.<br />Sources: Nielsen & Simulmedia’s a7<br />
- 62. Where The Other 40% Are<br />To capture light viewers, media planning and measurement tools must quickly apply new methods to emerging data sets<br />
- 63. Quality Control Is A Full Time Job<br />
- 64. When Data Goes Missing<br />Automation of error checking/quality control is essential<br />Reuse the data to solve other problems<br />Occasionally observe missing data<br />Three choices:<br /><ul><li>Pick up the phone
- 65. Estimate missing fields
- 66. Work around the missing data</li></ul>Time series of SYFY network. 10645 observations from 2010.02.28 at 7:00pm Eastern to 2010.10.14 at 12:30pm Eastern<br />Source: Simulmedia’s a7<br />
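
Automated checking for missing observations can be as simple as scanning a time series for holes. The sketch below is one possible approach, not Simulmedia's actual quality-control code; the half-hour cadence and the sample timestamps are assumptions made purely for illustration:

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, step):
    """Return (first_missing, last_missing, missing_count) for each hole
    in a sorted list of timestamps that should be `step` apart."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        missing = (curr - prev) // step - 1
        if missing > 0:
            gaps.append((prev + step, curr - step, missing))
    return gaps


# Hypothetical half-hour series with two consecutive observations dropped.
step = timedelta(minutes=30)
start = datetime(2010, 2, 28, 19, 0)
series = [start + i * step for i in range(6) if i not in (2, 3)]

for first, last, count in find_gaps(series, step):
    print(first, last, count)
```

Each reported gap can then be routed to one of the three choices on the slide: call the data provider, estimate the missing fields, or work around them.
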
- 67. More Data Really Is Better<br />
- 68. Disambiguation: The Madonna Problem<br />OR<br />Pop Icon?<br />Religious icon?<br />
- 69. The Revolution of Simple Methods<br />More data beats better algorithms.<br />The best performing algorithm underperforms the worst algorithm when given an order of magnitude more data. <br />Simple algorithms at very large scale can help better predict audience movement.<br />Peter Norvig | Internet Scale Data Analysis | June 21, 2010<br />Original graph sourced from: Banko & Brill, 2001. Mitigating the paucity-of-data problem: exploring the effect of training corpus size on classifier performance for natural language processing<br />
- 70. Packaging Reach<br />Very large data sets better predict TV audience movements<br />Peter Norvig | Internet Scale Data Analysis | June 21, 2010<br />
- 71. The Cost Of More Data<br />More data drives better results but there are costs<br />
- 72. The Data Isn’t Biased Just Because It Comes From A Set Top Box<br />
- 73. Applying Simple Methods At Scale<br />High correlation of a7 measures and Nielsen estimates.<br />Either bias is insignificant or Nielsen data and our data share the same bias.<br />Multiple methods yield similar results<br />Regression analysis of Nielsen Household Cume Rating against Simulmedia’s a7 cume rating. 20 Primetime Network shows with HAWAII FIVE-0. Fall 2010.<br />Sources: Nielsen & Simulmedia’s a7<br />
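
For a simple regression of one rating on another, the r² value is the square of the Pearson correlation between the two series. A minimal sketch of that calculation follows; the ratings in the example are invented placeholders, not Nielsen or a7 data, and `r_squared` is a hypothetical helper name:

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x
    (equal to the squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Placeholder cume ratings for five shows (illustrative values only).
panel_rating = [2.1, 3.4, 5.0, 6.2, 8.9]
set_top_rating = [1.9, 3.6, 4.7, 6.5, 8.4]
print(round(r_squared(panel_rating, set_top_rating), 3))
```

A high r² only shows that the two measures move together; as the slide notes, it cannot distinguish between negligible bias and a bias shared by both sources.
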
- 74. And Then We Kept Going<br />We measured program Tune-In, Spot Tune-In, Campaign Reach, Campaign Rating using multiple slices of our data set using two different sample sets and time frames<br />Two samples<br />Sample 1: Fall 2010: 20 Primetime broadcast series launches + promos<br />Sample 2: Jan 2011: 15 Primetime cable series premieres + promos (Plus one multi-season/year primetime broadcast premiere + promos)<br /><ul><li>Hand selected programs
- 75. Mix of genres
- 76. Mix of new vs. returning shows</li></ul>How we sliced it<br /><ul><li>Entire a7 data set
- 77. Cross correlated individual data sets contained in a7 aggregate data set
- 78. Aggregate cross geographies (DMA to DMA)</li></ul>Observations<br /><ul><li>Sample 1 average r² > 0.85
- 79. Sample 2 average r² > 0.93</li></li></ul><li>Addressability Is Here<br />
- 80. Closing The Loop On Program Promotion<br />Spring 2010 broadcast premiere promotion. Horizontal: Left to right moves back in time. 0 is the premiere time. Vertical: Conversion rate is measured in percent. Size of the bubble represents total conversions for a given spot.<br />Sources: Simulmedia’s a7<br />
- 81. Closing The Loop On Program Promotion<br />Spring 2010 broadcast premiere promotion. Horizontal: Left to right moves back in time. 0 is the premiere time. Vertical: Conversion rate is measured in percent. Size of the bubble represents total conversions for a given spot.<br />Sources: Simulmedia’s a7<br />
- 82. Closing The Loop<br />Long held beliefs and rules of thumb in planning may or may not be supported by data<br />TV marketers now have more options for show promotion<br />
- 83. Nielsen’s Ratings Are Good (Surprisingly Good)<br />
- 84. Time Series: Broadcast: CBS<br />Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987)) <br />60 networks. High correlation between Nielsen large sample measurement and a7 measures<br />Sources: Nielsen & Simulmedia’s a7<br />
- 85. Time Series: Broadcast: Fox<br />Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987)) <br />Sources: Nielsen & Simulmedia’s a7<br />
- 86. Time Series: Broadcast: ABC<br />Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987)) <br />Sources: Nielsen & Simulmedia’s a7<br />
- 87. Time Series: Cable: Investigation Discovery<br />Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987)) <br />Sources: Nielsen & Simulmedia’s a7<br />
- 88. Time Series: Cable: Golf<br />Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987)) <br />Sources: Nielsen & Simulmedia’s a7<br />
- 89. Time Series: Cable: Bravo<br />Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987)) <br />Sources: Nielsen & Simulmedia’s a7<br />
- 90. Time Series: Cable: ESPN2<br />Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987)) <br />Sources: Nielsen & Simulmedia’s a7<br />
- 91. Time Series: Cable: Speed<br />Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987)) <br />Sources: Nielsen & Simulmedia’s a7<br />
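
Plotting z scores puts two measurements with different units and scales (panel estimates versus set top box counts) on a common axis. A minimal sketch of the normalization, assuming the population standard deviation is used (the slides do not specify) and using invented hourly values; it does not attempt the multiple imputation step:

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a series to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("series is constant; z scores are undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical hourly audience figures on two very different scales.
panel_estimate = [2, 4, 4, 4, 5, 5, 7, 9]
set_top_count = [20000, 40000, 40000, 40000, 50000, 50000, 70000, 90000]

print(z_scores(panel_estimate))
print(z_scores(set_top_count))
```

In this toy example the second series is an exact multiple of the first, so the two standardized series match up to floating-point rounding even though the raw values differ by four orders of magnitude.
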
- 92. …but…<br />
- 93. When You Look Closer<br />Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987)) <br />Sources: Nielsen & Simulmedia’s a7<br />
- 94. High Frequency Time Series: ABC Family<br />Volatility in dayparts, low rated networks, demographics….<br />Unrated networks “don’t exist.” Did NOT look at local.<br />a7<br />Nielsen<br />Sample graph from High Frequency (Second and Minute level) Time Series Analysis of 45 networks on January 19th, 2011.<br />Simulmedia a7 Sample (Second by Second to Minute)<br />Nielsen Sample (Minute by Minute)<br />Sources: Nielsen & Simulmedia’s a7<br />
- 95. Women Are More Different Than Men<br />
- 96. Gender Driven Geographic Variation<br />Viewing by zip code among women across markets is more varied than among men in the same zip codes<br />Men 18-54<br />Women 18-54<br />Fraction of view time for ages 18-54 as a fraction of view time for all TV viewers: week 2 vs. the same fraction for week 1 (last two weeks in January). Three markets: Philadelphia (blue), Atlanta (red) and Chicago (green). Each point represents a zip code in one of these markets.<br />Source: Simulmedia’s a7<br />
- 97. Gender Driven Geographic Variation<br />Planning tactics for female-targeted campaigns should differ from those for male-targeted campaigns<br />PS…Also a good case for geo based creative versioning<br />
- 98. Privacy Matters<br />
- 99. Privacy By Design<br /><ul><li>All marketing data companies need to care
- 100. Make consumer privacy protection part of the business from the beginning
- 101. Anonymous, aggregated data only
- 102. No personal data or data that can be related to particular individuals or devices
- 103. Broad marketing segmentations, not profiling
- 104. No sensitive data</li></ul>Don’t be creepy<br />
- 105. Mass Reach Is Indiscriminate<br />
- 106. Fragmentation Effects On Frequency<br />Each segment was above 70% reach but the frequency distribution was nearly identical<br />Percent of audience reached for major animated motion picture campaign 2011. Two weeks prior to release. Each stacked bar is a different audience segment. Each color within the stacked bar represents the frequency of ad view for each segment.<br />Source: Nielsen & Simulmedia’s a7<br />
- 107. Fragmentation Effects On Frequency<br />Fragmentation is affecting all high reach campaigns.<br />Percent of audience reached for insurance advertisers September to October 2010. Approximately 8000 ads. Each stacked bar is a different audience segment. Each color within the stacked bar represents the frequency of ad view for each segment.<br />Source: Nielsen & Simulmedia’s a7<br />
- 108. Fragmentation Effects On Frequency<br />The TV advertising market can’t continue to support this<br />
- 109. 40% Of The Audience Is Getting 85% Of The Impressions<br />
- 110. Fragmentation Rears Its Head Again<br />Campaign impressions increasingly concentrated against heavy viewers.<br />Total US Television Audience, by quintile (average frequency per quintile / % of total impressions per quintile): 0.0 / 0.0%; 1.4 / 3.6%; 4.3 / 10.8%; 9.1 / 23.0%; 24.8 / 62.6%<br />Percent of audience reached for a different major animated motion picture campaign 2011. Two weeks prior to release. The stacked bar represents quintiles. Blue labels are average frequency per respective quintile. Red labels are % of total campaign impressions by respective quintile.<br />Source: Nielsen & Simulmedia’s a7<br />
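
The quintile figures on this slide can be checked with a few lines of arithmetic. In the sketch below, the pairing of frequencies with impression shares and their ordering from lightest to heaviest quintile are my reading of the chart labels (an assumption), and `share_from_frequency` is a hypothetical helper:

```python
# Figures read from the slide, ordered from lightest to heaviest viewing quintile.
# The ordering is an assumption based on how the labels pair up in the chart.
avg_frequency = [0.0, 1.4, 4.3, 9.1, 24.8]
impression_share = [0.0, 3.6, 10.8, 23.0, 62.6]  # percent of total impressions


def share_from_frequency(frequencies):
    """Percent of impressions per quintile, given equal-sized quintiles."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]


# Impressions landing on the two heaviest quintiles (40% of the audience).
top_two = sum(impression_share[-2:])
print(round(top_two, 1))  # 85.6

# Shares implied by the average frequencies alone.
derived = share_from_frequency(avg_frequency)
print([round(d, 1) for d in derived])
```

Because quintiles are equal in size, each quintile's share of impressions is proportional to its average frequency; the derived shares agree with the reported ones to within about 0.1 percentage point, and the top two quintiles together account for the "85% of the impressions" in the section title.
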
- 111. Fragmentation Effects on Frequency<br />Advertisers won’t continue to support this<br />
- 112. What Happens Next?<br />
- 113. Choices<br /><ul><li>If fragmentation is causing declining campaign reach and frequency imbalances, marketers must make choices.
- 114. Reduce reach
- 115. Do nothing
- 116. Use other channels
- 117. Stabilize or improve reach
- 118. Re-aggregate audiences using big data</li></ul>What do you think?<br />
- 119. Jack Smith<br />jack@simulmedia.com<br />@simulmedia<br />@jkellonsmith<br />
- 120. About Our Science Team<br /><ul><li>Krishna Balasubramanian, Chief Scientist
- 121. Previously: Chief Scientist, Tacoda. Chief Scientist, Real Media.
- 122. Doctoral Candidate, Physics. (Condensed Matter Physics) The Ohio State University
- 123. MS, Computer & Information Systems. The Ohio State University
- 124. MSc, Physics. Indian Institute of Technology, Kanpur
- 125. Yuliya Torosjan, Scientist
- 126. Previously: Clinical Research (Brain Imaging), Mount Sinai College of Medicine
- 127. MA, Statistics. Columbia University
- 128. BSE, Computer Science & Engineering. University of Pennsylvania
- 129. BA, Psychology. University of Pennsylvania
- 130. Mario Morales, Scientist
- 131. Previously: Lecturer, Bioinformatics, New York University. Senior Consultant, Weiser LLP.
- 132. MS, Statistics. Hunter College
- 133. MS, Bioinformatics. New York University
- 134. Dr. Sidd Mukherjee, Scientist
- 135. Previously: Visiting Scholar (Atomic Scattering experiments), The Ohio State University
- 136. Post doctoral research, Heat capacity of Helium-4. Pennsylvania State University
- 137. PhD, Physics. (Thesis: Measurements of Diffuse and Specular Scattering of 4He Atoms from 4He Films), Ohio State University
- 138. MS, Computer & Information Systems. The Ohio State University
- 139. BSc, Physics & Mathematics. University of Bombay</li>
