Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)


Published on

More data is being created than ever before and the rate at which it is being created is not slowing. This means that new techniques are being developed and used to tackle big data challenges. At Nielsen, big data means opportunities for new products like digital measurement, but also exposed a skills gap for dealing with big data. This session will cover Nielsen’s successful implementation of Spark and Databricks which has allowed Nielsen to scale its products and its Data Scientists’ skillsets.

Published in: Data & Analytics
  • Be the first to comment

Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)

  1. 1. Copyright©2017TheNielsenCompany.Confidentialandproprietary. Scaling Your Skillset with Your Data Through the Modernization of Data Science Technology Jarrett Garcia October 25, 2017
  2. 2. Copyright©2017TheNielsenCompany.Confidentialand proprietary. 2 WHAT WE MEASURE What Consumers WATCH What Consumers BUY “We are a measurement company. Data is the core of our business. Everything we do is to help our clients and our company realize the full value of that measurement” 6.7 BILLION STORE TRANSACTIONS EACH MONTH 250K PANEL HOUSEHOLDS 1.7 MILLION STORE VISITS EACH MONTH 1.6 TRILLION IMPRESSIONS PER YR 1.7 BILLION VIEWING RECORDS EACH MONTH 7 MILLION WEB EVENTS DAILY FROM MOBILE DEVICES
  3. 3. 3 Copyright©2017TheNielsenCompany.Confidentialandproprietary. CHALLENGE ● Business - sample based data collection to digital scale ● People - Variety of skill sets requires future proof tooling ● Technology - Over dependence on proprietary software ($$$) ● Infrastructure - old, expensive and not scalable
  4. 4. 4 Copyright©2017TheNielsenCompany.Confidentialandproprietary. STRATEGY Modernize technology stack Support speed and scale of R&D Break group silos and increase collaboration Attract Data Science talent Goal: Create a modern Data Science organization that uses best practices and tools Python, SQL, Databricks, AWS , Cluster computing & data access SDLC collaboration software Modern methods & tools
  5. 5. 5 Copyright©2017TheNielsenCompany.Confidentialandproprietary. MODERN TECHNOLOGY STACK Data Lake S3 Storage Bitbucket Databricks Unified Analytics Platform Spark SQL
  6. 6. 6 Copyright©2017TheNielsenCompany.Confidentialandproprietary. DATABRICKS SCALING DATA SCIENCE ✓ Data Science can utilize Python as a SAS replacement. Databricks’ notebooks provide a Python coding platform ✓ Data Science can work within the Cloud ecosystem. Databricks provides the users access to cloud scale and elasticity ✓ Data Science can access data and perform analytics quickly Databricks democratizes data access
  7. 7. 7 Copyright©2017TheNielsenCompany.Confidentialandproprietary. SUCCESS STORIES ● Databricks has helped Nielsen Data Science reduce its SAS footprint by 40% for an annual savings of just under two million dollars. ● Reduced the runtime of the Viewer Assignment model from 36 days to 4 hours. ● The Databricks accelerated a Machine Learning model by removing manual classification process and improve the AUC score from .6 to .8. ● Increased Databricks users from 9 to 156 users while partnering with Databricks to reduce DBU usage by 45%. The Area Under the Curve (AUC) is a common evaluation metric for binary classification problems. If the classifier is very good, the area under the curve will be close to 1.
  8. 8. Copyright © 2017 The Nielsen Company. Confidential and proprietary.
  9. 9. Copyright © 2017 The Nielsen Company. Confidential and proprietary.
  10. 10. 10 Copyright©2017TheNielsenCompany.Confidentialandproprietary. Move to Cloud Computing Requirements: ● Flexible compute options ● Separate compute and storage ● Parallelization options ● Improved deployment speed
  11. 11. 11 Copyright©2017TheNielsenCompany.Confidentialandproprietary. CHALLENGE ● Legacy servers will not support newer versions of various languages, packages and tools ● Older and outdated processor and memory capabilities ● Limited space capabilities and costly expansions ● Long computation times for tools and algorithms ● Over dependence on proprietary software ($$$)