How Nielsen Utilized Databricks for Large-Scale Research and Development with Matt VanLandeghem

  1. Matt VanLandeghem, Nielsen How Nielsen Utilized Databricks for Large-scale Research and Development #EUent4
  2. About Nielsen • Founded 1923 • Buy & Watch – Buy: Market Research – Watch: Audience Measurement • Not just TV! • Also Radio and Digital, including PC, Mobile, Connected Devices, Digital Audio, Digital TV • Digital Ad spending now meeting/exceeding TV Ad spending
  3. What is Nielsen Digital Ad Ratings (DAR)? • Measurement of computer, mobile, and over-the-top device audience – Comparable to TV ratings – Who is behind the screen? • Advertising campaigns – Primary focus is age/gender demographic breaks – On-Target Delivery (%) is a key metric • Global product – 25 countries
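To make the On-Target Delivery (%) metric on slide 3 concrete, here is a minimal sketch. It assumes the metric is the share of a campaign's impressions delivered to the intended age/gender breaks; the function name, the demographic labels, and the counts are hypothetical and do not come from the slides.

```python
def on_target_pct(impressions_by_demo, target_demos):
    """Return the percentage of impressions that landed in the target demos."""
    total = sum(impressions_by_demo.values())
    if total == 0:
        return 0.0  # no impressions delivered, so nothing is on target
    on_target = sum(
        count for demo, count in impressions_by_demo.items() if demo in target_demos
    )
    return 100.0 * on_target / total


# Hypothetical campaign aimed at adults 18-34 (illustrative numbers only).
campaign = {"F18-34": 400, "M18-34": 350, "F35-54": 150, "M35-54": 100}
target = {"F18-34", "M18-34"}
print(on_target_pct(campaign, target))  # 75.0
```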
  4. How does DAR work? [Diagram; labels: Ad Impression “Big Data” (mobile, computer, over-the-top) • Third-party Demographics • Nielsen Bias-Correction Adjustment (focus of today’s presentation) • Report to Client: overnight daily reporting of unique audience, ad impressions, On-Target %]
  5. Nielsen Adjustments • “Big Data” is not perfect – Needs bias correction – Where the value of Nielsen’s high-quality panels really shines – Nielsen’s panels provide a “truth set” that can be used to develop models that adjust big data • 3 sources of bias – Misrepresentation – Misattribution – Non-coverage • Nielsen adjustments are an active area of Research and Development
  6. Nielsen Adjustments • Metered home PC behavior – Representative sample of U.S. homes – “Medium” data • Production impression data – Big data • What is the best way to create Nielsen adjustments AND test them in a production environment? • Foundation for Nielsen’s Databricks Use Case
  7. Nielsen Business Case • Recently created new DAR adjustment methodologies – Small-scale testing showed the new methodologies are an enhancement over current methodologies • Business requirement: test new methodologies on a large number of campaigns – Need to understand client impact – Large-scale testing could identify corner or edge cases where new methodologies could break down and cause a data quality issue – Small-scale testing: ~20 campaigns – Large-scale testing: ~4000 campaigns
  8. Databricks • Cluster management • Provide a friendly interface to Spark for our Data Scientists – Multiple programming languages – Create adjustment factors • Uses an algorithm not available in SQL – Link to production databases – Apply adjustment factors to production-level data – Analyze data with new adjustment factors applied
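The workflow on slide 8 (create adjustment factors from panel data, then apply them to production-level data) can be sketched in plain Python. The slides do not describe Nielsen's algorithm, so the simple ratio of panel share to big-data share used here is only an illustrative stand-in, and every name and number below is hypothetical.

```python
# Hypothetical audience shares per demographic break (not Nielsen data).
panel_share = {"F18-34": 0.30, "M18-34": 0.30, "F35-54": 0.20, "M35-54": 0.20}
big_data_share = {"F18-34": 0.40, "M18-34": 0.35, "F35-54": 0.15, "M35-54": 0.10}


def adjustment_factors(truth, observed):
    """Ratio of panel ("truth set") share to observed big-data share per demo.

    Assumes every demo in `truth` also appears in `observed` with a nonzero share.
    """
    return {demo: truth[demo] / observed[demo] for demo in truth}


def apply_factors(impressions, factors):
    """Scale each demo's raw impression count by its adjustment factor."""
    return {demo: count * factors[demo] for demo, count in impressions.items()}


# Hypothetical production impressions for one campaign.
raw = {"F18-34": 400, "M18-34": 350, "F35-54": 150, "M35-54": 100}
factors = adjustment_factors(panel_share, big_data_share)
adjusted = apply_factors(raw, factors)
for demo in raw:
    print(demo, raw[demo], round(adjusted[demo], 1))
```

In a Spark setting the same step would typically be a join between a small table of factors and the large impression table, keyed on the demographic break.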
  9. Nielsen Business Case [Diagram; labels: Aggregated panel data • Netezza • Cloud: combine small and large data; run all analyses in one place using PySpark/Spark SQL • Data Lake • Oracle • Aggregated production data]
  12. Nielsen Business Case • Performance gains: – What would have taken 36 hours with standalone Python took only 1.5 hours in Spark/Databricks – Edge cases identified • Advantages of one methodology over another also identified – Short turnaround for any revisions to the methodology
  13. Nielsen Business Case • Other benefits – Reduced time from idea to deployment – Enhanced support/investigation once deployed • Client data inquiries and issues addressed more quickly – Collaboration • Application Development teams • International data science teams • These new methodologies are being tested in other products – Enhanced skillsets of data scientists
  14. Summary • At the end of the day, the Databricks/Spark technology allowed us to solve this business use case • The reduced R&D timeline plus extensive testing will allow enhanced methodologies to be available to our clients sooner
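The performance figures on slide 12 work out as follows. The 36-hour and 1.5-hour figures come from the slide; the per-campaign breakdown assumes both runs covered the roughly 4000 campaigns of the large-scale test, which the slides do not state explicitly.

```python
standalone_hours = 36.0  # standalone Python, from slide 12
spark_hours = 1.5        # Spark/Databricks, from slide 12
campaigns = 4000         # assumption: the large-scale test size from slide 7

speedup = standalone_hours / spark_hours
secs_per_campaign_standalone = standalone_hours * 3600 / campaigns
secs_per_campaign_spark = spark_hours * 3600 / campaigns

print(f"speedup: {speedup:.0f}x")
print(f"standalone: {secs_per_campaign_standalone:.2f} s per campaign")
print(f"spark: {secs_per_campaign_spark:.2f} s per campaign")
```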
  15. Copyright © 2017 The Nielsen Company. Confidential and proprietary. Special thanks: Mala Sivarajan, Anil Singh