
Data Minimalism & Data Value by Michaela Regneri

Data is often called "the new oil" or "the new gold". In the context of AI systems, we often treat data more like "the new bacon": bigger data is better data, and we overfeed AI systems with data as if it were a cheap, infinitely available resource.

We want to fight data's bacon-like image by promoting the concept of data minimalism for AI as a strategy to enhance both the quality and the sustainability of AI systems. In order to survive as data minimalists, we compute the (monetary) value of single data points, and then try to keep only the valuable ones.
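
A minimal sketch of how the value of a single data point can be computed by leaving it out and measuring what changes. Everything concrete here is an assumption made for illustration: the toy co-occurrence recommender, the hit-rate metric (standing in for a monetary KPI), the fruit-basket sessions, and all function names. It is not the system described in the talk.

```python
from collections import Counter, defaultdict
from itertools import permutations


def train(sessions):
    """Count how often two items occur in the same session (toy stand-in model)."""
    cooc = defaultdict(Counter)
    for session in sessions:
        for a, b in permutations(set(session), 2):
            cooc[a][b] += 1
    return cooc


def recommend(model, item, k=1):
    """Return up to k items most often co-clicked with `item` (ties: alphabetical)."""
    ranked = sorted(model[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


def quality(model, test_sessions, k=1):
    """Hit rate: share of test sessions whose later clicks contain a recommendation."""
    hits = 0
    for first, *rest in test_sessions:
        if set(recommend(model, first, k)) & set(rest):
            hits += 1
    return hits / len(test_sessions)


def leave_one_out_values(train_sessions, test_sessions, k=1):
    """Value of a session = quality with it minus quality without it."""
    reference = quality(train(train_sessions), test_sessions, k)
    values = []
    for i in range(len(train_sessions)):
        reduced = train_sessions[:i] + train_sessions[i + 1:]
        values.append(reference - quality(train(reduced), test_sessions, k))
    return values


def label(value, eps=1e-9):
    """Sort a data point into one of three buckets by the sign of its value."""
    if value > eps:
        return "valuable"
    if value < -eps:
        return "toxic"
    return "no effect"


# Hypothetical sessions: the two "toaster" sessions mimic promotion-driven clicks.
TRAIN_SESSIONS = [
    ["cherry", "grape"],
    ["apple", "pear"],
    ["apple", "toaster"],
    ["apple", "toaster"],
    ["pear", "grape"],
]
TEST_SESSIONS = [["apple", "pear"], ["cherry", "grape"]]

if __name__ == "__main__":
    for session, value in zip(TRAIN_SESSIONS,
                              leave_one_out_values(TRAIN_SESSIONS, TEST_SESSIONS)):
        print(session, value, label(value))
```

In this toy data, removing either "toaster" session improves the hit rate, so both receive a negative value, while the "cherry, grape" session is the only one whose removal hurts. Retraining once per data point is expensive, so a sketch like this only scales to a sample of points.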

Implementing this concept is as challenging and as interesting as it sounds. As a corporate-scale example, we show how much data is actually wasted in an e-commerce recommender system, and how we also found toxic data while applying our data-minimalism strategies.

This topic was presented at a joint event of Munich Datageeks and Women in Big Data Munich.

https://munich-datageeks.de/
https://www.womeninbigdata.org/

For more content like this, visit IT Knowledge Bank website:
https://www.itknowledgebank.com/

Video
You can watch the recording of the presentation on our YouTube channel:
https://youtu.be/zNAXnWUaqaU

About Michaela Regneri

Michaela Regneri works as a Senior Expert for Artificial Intelligence & Cognitive Computing at OTTO (Hamburg). She is fascinated by AI, especially by its visual, linguistic and cognitive implications for human-computer interaction.

After her PhD in Computational Linguistics, she joined Der SPIEGEL as an R&D engineer, working on search and text mining for the newsroom. In 2016, she started to work at OTTO as a product manager for Business Intelligence Analytics, developing applications with and around data science.

In her current role, she continues to drive & challenge different areas of AI for e-commerce, with a particular interest in AI innovation processes and corporate digital responsibility.

Published in: Data & Analytics
Data Minimalism & Data Value by Michaela Regneri

  1. 1. Data Minimalism & Data Value (Why Data shouldn't be the new Bacon), Michaela Regneri, Munich Datageeks, January 2020. Title slide: Data Minimalism & Data Value (Or: Why Data should not be the new Bacon). Michaela Regneri, Munich Data Geeks / Women in Big Data, 22 January 2020. Joint work with: Julia Georgi, Jurij Kost, Niklas Pietsch, Sabine Stamm. Special thanks to Malte Hoffmann & Timo Schulz.
  2. 2. Data Minimalism
  3. 3. Data Minimalism & AI
  4. 4. Data Costs beyond Money: Energy & Emissions. The internet needs more energy than a metropolis (25 power plants). Data traffic causes more carbon emissions than air traffic.
  5. 5. Data Costs beyond Money: Safety (and trust, and more money). Data carries enormous value. Data value does not necessarily depend on its mass! [Chart: US data breaches, 2005 to 2018; number of data breaches and number of stolen records (millions). Source: Statista]
  6. 6. How much Data do we need?
  7. 7. How much data do we need? Does every click count? [Chart: standard learning curve, performance (gains!) over data volume (costs!); its shape depends on algorithm & task.] Assumption 1: most data value happens in automation. Assumption 2: you can measure performance (usage-based value).
  8. 8. How much data do we need? (At OTTO, in real life.) Example case: a recommender system (in multiple versions). In our case: "Customers who clicked this item…"
  9. 9. How much data do we need? (At OTTO, in real life.) Experiment: How does data volume affect KPIs? (Machine learning system.) [Diagram: user sessions are fed into Word2Vec, which produces "You might also like" recommendations; the system is trained on 10% of the data, then 20%, then 30%, and so on.] Increase data volume … evaluate KPI change.
  10. 10. How much data do we need? (At OTTO, in real life.) Experiment: How does data volume affect KPIs? (Machine learning system.) [Chart: KPIs (normalized from 0 to 1) over amount of data (relative to max.: 0.70%, 3.33%, 10%, 20%, … 100%); series: computing time, revenue, conversion rate, % of customers who bought a recommendation; annotation: ~ 9 TB.]
  11. 11. Which data do we need? (How much is a click?)
  12. 12. Which data do we need? Finding clicks that matter. [Diagram: a customer produces data for an AI algorithm (craving knowledge). "Data celery": a predictable page impression (e.g. Daily Deal), carrying redundant or irrelevant information. "Data deluxe burger": a click on a new search result, carrying new & relevant information.]
  13. 13. Which data do we need? Finding clicks that matter. [Same diagram, rearranged: the customer's click on a new search result is passed to the AI algorithm (craving knowledge).]
  14. 14. Data Value: Sensitivity Analysis. Computing the value of individual data points (by leaving them out). [Diagram: a reference system trained on all user sessions produces "You might also like" recommendations; 500 test systems are trained with one data point omitted in each.] Difference in recommendation quality?
  15. 15. Data value: Results (lab-scale experiment, real revenue data). More than 62% of test data points with positive value; about 11% with negative value ("toxic data"); 26% of the data points with (virtually) no effect.
  16. 16. Data Value ↔ Informational Value. (1) Determine your system's business impact. (2) Sensitivity analysis: what does a single data point change? (3) Relate output changes to KPI changes (more informed does not always imply better performance).
  17. 17. Toxic Data: Typical Online Marketing Example
  18. 18. Toxic Data: New Edition of Tech vs. Sales. Experiment: "Deal of the Day" and recommendation quality. Generates lots of clicks / engagement. Generates lots of "unnatural" user sessions, too…
  19. 19. Toxic Data: New Edition of Tech vs. Sales. Experiment: "Deal of the Day" and recommendation quality. Product as Deal of the Day (averaged over one month's deals). [Chart: 30 days ahead of "deal day" vs. 30 days after "deal day".] Click rate: -28%. Conversion rate: -8%.
  20. 20. Data Minimalism. Sustainability: economical, ecological and social necessity for future-proof systems. Quality: enabling optimization by filtering toxic data. Is as complex as the decision system using the data (but still feasible; needs explainable AI). …so much fun research to do. ☺
  21. 21. Looking forward to chatting about… data value & explainable AI; AI, ethics & digital literacy; applied research & innovation processes. Contact: michaela.regneri@otto.de. Papers: "Analyzing Hypersensitive AI: Instability in Corporate-Scale Machine Learning", Explainable AI (XAI) 2018; "Computing the Value of Data: Towards Applied Data Minimalism", Green Data Mining 2019.
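
The data-volume experiment on slides 9 and 10 (train on a growing share of the data, then evaluate the KPI change) can be sketched roughly as follows. This is an illustration under stated assumptions, not the OTTO setup: a toy co-occurrence model replaces Word2Vec, a hit rate replaces revenue and conversion rate, and the session data and all names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train(sessions):
    """Toy stand-in model: count co-occurring items per session."""
    cooc = defaultdict(Counter)
    for session in sessions:
        items = set(session)
        for a in items:
            for b in items:
                if a != b:
                    cooc[a][b] += 1
    return cooc


def hit_rate(model, test_sessions):
    """Share of test sessions where the top recommendation for the first click
    appears among the later clicks (stand-in for a business KPI)."""
    hits = 0
    for first, *rest in test_sessions:
        candidates = model.get(first)
        if candidates:
            # Highest count wins; ties are broken alphabetically.
            best = min(candidates, key=lambda item: (-candidates[item], item))
            hits += best in rest
    return hits / len(test_sessions)


def learning_curve(train_sessions, test_sessions, fractions, seed=0):
    """Train on nested, growing subsets of the data and record the KPI for each."""
    shuffled = list(train_sessions)
    random.Random(seed).shuffle(shuffled)
    curve = []
    for fraction in fractions:
        n = max(1, round(fraction * len(shuffled)))
        curve.append((fraction, hit_rate(train(shuffled[:n]), test_sessions)))
    return curve


# Hypothetical, highly redundant data: 100 sessions, only two distinct patterns.
TRAIN_SESSIONS = [["apple", "pear"], ["cherry", "grape"]] * 50
TEST_SESSIONS = [["apple", "pear"], ["cherry", "grape"]]
FRACTIONS = [0.01, 0.1, 0.5, 1.0]

if __name__ == "__main__":
    for fraction, kpi in learning_curve(TRAIN_SESSIONS, TEST_SESSIONS, FRACTIONS):
        print(f"{fraction:>5.0%} of data -> hit rate {kpi:.2f}")
```

Because the subsets are nested, each step only adds data, so a flat stretch of the curve marks data that costs storage and computing time without improving the KPI.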
