
The Curse of the Data Lake Monster

Artificial intelligence and machine learning are currently all the rage. Every organisation is trying to jump on this bandwagon and cash in on their data reserves. At ThoughtWorks, we’d agree that this tech has huge potential — but as with all things, realising value depends on understanding how best to use it.

Published in: Software

  1. The Curse of the Data Lake Monster ● Kiran Prakash and Lucy Chambers
  2. We have a problem @lucyfedia
  3. So what is a data lake? ● Democratisation of Data ● Centralized and Monolithic ● Domain Agnostic ● Structured and unstructured data @kiran_p
  4. https://martinfowler.com/articles/data-monolith-to-mesh.html @kiran_p
  5. Why do Data Lakes Fail? @kiran_p
  6. Build it and they will come! ● Seen primarily as an infrastructure problem ● Pinning down use cases & value stream is hard ● Analysis paralysis & overengineering @kiran_p
  7. Centralised and Monolithic ● https://martinfowler.com/articles/data-monolith-to-mesh.html @kiran_p
  8. Functional Decomposition ● https://martinfowler.com/articles/data-monolith-to-mesh.html @kiran_p
  9. Axis of change ● https://martinfowler.com/articles/data-monolith-to-mesh.html @kiran_p
  10. Data Swamps @kiran_p
  11. The Data Mesh Paradigm ● Product Thinking: Focus on initiatives which align with business outcomes. Structure teams around business capabilities. ● Platform Thinking: Self-service platform for storage, catalogue, computation, access rights, pipelines, etc. ● Domain Driven Design: Autonomous teams with a clear bounded context, building and running products independently. @kiran_p
  12. Product Thinking For Data Projects @lucyfedia
  13. Project vs Product ● START: Project mode: solution (often) defined at outset. Product mode: problem identified at outset; solution developed iteratively and tested. ● STOP: Project mode: team moves on when solution delivered. Product mode: team moves on when problem verifiably fixed. ● FOCUS: Project mode: features delivered in a given time & budget. Product mode: progress made on key business goals (measured by metrics). ● HAS FIXED SCOPE? Project mode: usually. Product mode: almost never. @lucyfedia
  14. Product teams have two jobs and two customers ● Deliver business capabilities (external user) ● Expose their domain’s data for others to consume (often an internal user) @lucyfedia
  15. A data product is: ● Discoverable ● Addressable ● Trustworthy ● Self-describing ● Interoperable ● Secure
  16. Data Swamps @lucyfedia
  17. “If a tree falls in a wood, and no-one is around to hear it, does it make a sound?” - Some philosopher @lucyfedia
  18. “If someone puts data into a data lake, and no-one can find it, is it even there?” - Me @lucyfedia
  19. Data Mesh Architecture
  20. Domain Driven Design ● Self Service Platforms @kiran_p
  21. Distributed Pipelines ● https://martinfowler.com/articles/data-monolith-to-mesh.html @kiran_p
  22. Self service platforms for: ● Storage ● Data pipeline ● Discovery & Catalogue ● Access control ● Archiving ● Encryption @kiran_p
  23. Data Mesh ● https://martinfowler.com/articles/data-monolith-to-mesh.html @kiran_p
  24. Example: a fictional insurance company
  25. The Use-Cases ● Reduce fraud by 5% per year ● Identify fraudulent claims ● Reduce vehicle damage claims by 2% per year ● Increase conversion rate by 2% ● Predict weather patterns ● Upselling insurance products @lucyfedia
  26. [Diagram] Lake Shore Marts: Fraud Detection (Customer, Claims). Data Lake (for Raw Data): Customer Health, Customer Vehicle, Customer House, Claims Health, Claims Vehicle, Claims House. @lucyfedia
  27. [Diagram] Lake Shore Marts: Fraud Detection (Customer, Claims), Upselling (Customer, Products). Data Lake (for Raw Data): Customer Health, Customer Vehicle, Customer House, Claims Health, Claims Vehicle, Claims House, Products. @lucyfedia
  28. [Diagram] Lake Shore Marts: Fraud Detection (Customer, Claims), Upselling (Customer, Products), Alert (Customer, Weather). Data Lake (for Raw Data): Customer Health, Customer Vehicle, Customer House, Claims Health, Claims Vehicle, Claims House, Products, Weather. @lucyfedia
  29. Key Takeaways ● Not a technology problem: Becoming data-driven is usually an organisational problem. ● Domain data is a product: Work with cross-functional product teams and real use cases to deliver business value. ● Distributed Data Mesh: Built by autonomous cross-functional teams using data platforms instead of a centralized data lake. @kiran_p & @lucyfedia
  30. Thank you ● Kiran Prakash @kiran_p ● Lucy Chambers @lucyfedia ● How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh: martinfowler.com/articles/data-monolith-to-mesh.html ● The Curse of the Data Lake Monster: thoughtworks.com/insights/blog/curse-data-lake-monster
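
The data product properties listed on slide 15 (discoverable, addressable, trustworthy, self-describing, interoperable, secure) can be made concrete with a small sketch. This is not from the deck: the `DataProduct` and `Catalogue` classes, the `dp://` address scheme, the shared type vocabulary, and the role names are all hypothetical, chosen only to show one way each property could map onto a field or a check.

```python
from dataclasses import dataclass

# Assumed shared type vocabulary. Agreeing on one is a simple stand-in
# for "interoperable"; the names are illustrative only.
SHARED_TYPES = {"string", "integer", "decimal", "date"}


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes for its data."""
    name: str            # human-readable name, used for discovery
    domain: str          # owning business domain
    address: str         # addressable: one stable, unique location
    schema: dict         # self-describing: column name -> shared type
    owner: str           # trustworthy: a team accountable for quality
    allowed_roles: set   # secure: roles permitted to read


class Catalogue:
    """Hypothetical in-memory catalogue that makes products discoverable."""

    def __init__(self):
        self._by_address = {}

    def register(self, product):
        # Addressable: refuse two products at the same address.
        if product.address in self._by_address:
            raise ValueError(f"address already in use: {product.address}")
        # Self-describing and interoperable: require a schema that only
        # uses types from the shared vocabulary.
        unknown = set(product.schema.values()) - SHARED_TYPES
        if not product.schema or unknown:
            raise ValueError(f"missing schema or unknown types: {unknown}")
        self._by_address[product.address] = product

    def search(self, keyword):
        # Discoverable: find products by name or domain.
        keyword = keyword.lower()
        return [p for p in self._by_address.values()
                if keyword in p.name.lower() or keyword in p.domain.lower()]

    def schema_for(self, address, role):
        # Secure: only allowed roles may read the schema.
        product = self._by_address[address]
        if role not in product.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {address}")
        return product.schema


catalogue = Catalogue()
catalogue.register(DataProduct(
    name="Vehicle claims",
    domain="claims",
    address="dp://claims/vehicle",  # hypothetical address scheme
    schema={"claim_id": "string", "amount": "decimal", "filed_on": "date"},
    owner="claims-team",
    allowed_roles={"fraud-detection"},
))
found = [p.address for p in catalogue.search("vehicle")]
```

In the fictional insurance example, a fraud detection team could search the catalogue for claims data and read its schema without asking the claims team where the data lives, which is the opposite of the "is it even there?" situation on slide 18.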
