
Data Lake, beyond the Data Warehouse

Data Science Thailand Meetup#4
Shifting to the 3rd gen platform with Data Lake

  1. Data Lake, beyond the Warehouse (Cheow Lan Lake, Thailand)
     Komes Chandavimol (โกเมษ จันทวิมล), February 3, 2016
     Data Science Thailand Meetup#4: Shifting to the 3rd gen platform with Data Lake
  2. Sources: http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427, https://www.domo.com/learn/data-never-sleeps-3-0
  3. The Growth of Data. Sources: http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427, https://www.domo.com/learn/data-never-sleeps-3-0
  4. Sources: http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427, https://www.domo.com/learn/data-never-sleeps-3-0
  5. Can these tools support Big Data?
     • Spreadsheet?
     • Database?
     • Data Mart?
     • Data Warehouse?
     Source: Forrester Research’s James Kobielus
  6. The Emergence of Big Data Tools. Sources: http://blogs.forrester.com/category/hadoop, http://solutions.forrester.com/Global/FileLib/webinars/Big_Data_-_Gold_Rush_or_Illusion.pdf
  7. Hadoop. Source: http://opensource.com/life/14/8/intro-apache-hadoop-big-data
  8. Analytics 3.0
     • Data Mining Tools
     • Data Discovery and Visualization Tools
     Tableau.com, RapidMiner.com
  9. How can this be applied to the current environment? Source: http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
  10. Traditional Data Warehouse. Source: http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
  11. New Data Management Architecture. Source: http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
  12. New Data Management Architecture. Source: http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
  13. Data Lake. Source: https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
  14. Data Lake: a single place to store every type of data in its native format, with no fixed limits on account size or file size, high throughput to increase analytic performance, and native integration with the Hadoop ecosystem.
     References: James Serra's Blog; Data Lake Development with Big Data, Pradeep Pasupuleti (2015); https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
  15. Data Lake Processes. Source: www.emc.com
  16. Data Lake and Data Warehouse. Source: Hadoop Distributed Compared, BlazeClan Technology (2015)
  17. Data Lake and Data Warehouse. Source: Hadoop Distributed Compared, BlazeClan Technology (2015)
  18. Data Lakes. Source: http://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html
  19. Data Lake
     • Types of Data: Raw Data, Derived Data, Aggregated Data
     • Types of Environment: Discovery Environment, Production Environment
     Source: The Definition of Data Lake, John O’Brien (2015)
  20. How does the Data Lake work?
     • Traditional Enterprise Data Warehouse
     Source: http://www.clearpeaks.com/blog/category/tableau
  21. New Data Management Architecture. Source: http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
  22. Source: http://www.kdnuggets.com/2014/05/big-data-landscape-v30-analyzed.html
  23. Data Lake Maturity. Source: The Definition of Data Lake, John O’Brien (2015)
  24. 4 Maturity Stages of Data Lake
     • Stage 1 – Pilot Project (Understand the Technology)
     • Stage 2 – Productionize Hadoop and its capabilities
     • Stage 3 – Proactively Consolidate Data for (Big) Data Analytics
     • Stage 4 – Platform the Data Lake as a Core Competency
     Sources: The Definition of Data Lake, John O’Brien (2015); Putting the Data Lake to Work, Teradata, Hortonworks (2015)
  25. Stage 1 – Pilot Project
     • Handling data at scale
     • Involves getting the plumbing in place and learning to acquire and transform data at scale.
     • The analytics may be quite simple, but much is learned about making Hadoop work the way you desire.
     Sources: The Definition of Data Lake, John O’Brien (2015); Putting the Data Lake to Work, Teradata, Hortonworks (2015)
  26. Stage 2 – Productionize Hadoop and its capabilities
     • Involves improving the ability to transform and analyze data.
     • Find the tools most appropriate to the team's skill set.
     • Acquire more data and build applications.
     Sources: The Definition of Data Lake, John O’Brien (2015); Putting the Data Lake to Work, Teradata, Hortonworks (2015)
  27. Stage 3 – Proactively Consolidate Data for (Big) Data Analytics
     • Involves getting data and analytics into the hands of as many people as possible.
     • It is in this stage that the data lake and the enterprise data warehouse start to work in unison, each playing its role.
     • Some organizations that started with a data lake eventually added an enterprise data warehouse to operationalize their data.
     Sources: The Definition of Data Lake, John O’Brien (2015); Putting the Data Lake to Work, Teradata, Hortonworks (2015)
  28. Big Data Analytics. Source: http://dataofthings.blogspot.com/2014/04/the-bbbt-sessions-hortonworks-big-data.html
  29. Data Lake and Big Data Analytics. Source: http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/
  30. Stage 4 – Platform the Data Lake as a Core Competency
     • Enterprise capabilities are added to the data lake.
     • Few companies have reached this level of maturity, but many will as the use of big data grows.
     • Requires data governance, compliance, security, and auditing (incorporated into the company's data strategy).
     Source: The Technology of the Business Data Lake, Capgemini (2013)
  31. Business Data Lake. Source: The Technology of the Business Data Lake, Capgemini (2014)
  32. Source: https://shefsite.files.wordpress.com/2014/04/where.jpg
  33. (no text)
  34. Source: http://image.slidesharecdn.com/mapr-db-in-hadoop-nosql-overview-150929062856-lva1-app6892/95/maprdb-the-first-inhadoop-document-database-12-638.jpg?cb=1443536326
  35. Source: http://www.predictiveanalyticstoday.com/waterline-data-self-service-for-the-hadoop-data-lake/
  36. The Data Lake Unifies Data Discovery, Data Science, and BI 3.0
     (Word cloud: Big Data, Self-Serve Business, Data Science, Machine Learning, Visual Analytics, Business Discovery, Deep Learning, Hadoop, Feature Engineering, Spark, Business Intelligence 3.0, YARN, Predictive Analytics, Hive, Data Lake, Data Visualization, Graph Analytics)
  37. 20+ posts related to “Data Lake”: search for “Data Science Thailand” “Data Lake”
  38. (no text)
  39. Traditional Enterprise Data Warehouse. Source: http://www.clearpeaks.com/blog/category/tableau
  40. Questions?
  41. (no text)
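The slides define the data lake as a single place that stores every type of data in its native format (slide 14) and divide its contents into raw, derived, and aggregated data (slide 19), but leave the mechanics to diagrams. The following is a minimal Python sketch of that layering under stated assumptions, not the presenter's design: local directories stand in for HDFS or object storage, a `catalog.jsonl` file stands in for a metadata catalog, and a toy click-stream CSV is the input. The zone names and the functions `land_raw`, `derive_clicks`, and `aggregate_page_views` are hypothetical.

```python
# Minimal sketch of data lake zones (raw -> derived -> aggregated).
# ASSUMPTIONS (hypothetical, not from the slides): local directories stand in
# for HDFS/object storage, catalog.jsonl stands in for a metadata catalog, and
# the input is a toy click-stream CSV.
import csv
import hashlib
import io
import json
import tempfile
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def land_raw(lake: Path, source: str, name: str, payload: bytes) -> Path:
    """Raw zone: store the file byte-for-byte in its native format."""
    target = lake / "raw" / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)  # no parsing or schema is applied on ingest
    entry = {
        "path": target.relative_to(lake).as_posix(),
        "source": source,
        "bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "landed_at": datetime.now(timezone.utc).isoformat(),
    }
    # Append one metadata record per landed file to the hypothetical catalog.
    with open(lake / "catalog.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return target


def derive_clicks(raw_file: Path, lake: Path) -> Path:
    """Derived zone: parse the raw CSV and keep only well-formed rows."""
    reader = csv.DictReader(io.StringIO(raw_file.read_text(encoding="utf-8")))
    clean = [row for row in reader if row.get("user") and row.get("page")]
    target = lake / "derived" / "clicks.jsonl"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("".join(json.dumps(r) + "\n" for r in clean), encoding="utf-8")
    return target


def aggregate_page_views(derived_file: Path, lake: Path) -> dict:
    """Aggregated zone: count views per page from the derived records."""
    lines = derived_file.read_text(encoding="utf-8").splitlines()
    counts = dict(Counter(json.loads(line)["page"] for line in lines))
    target = lake / "aggregated" / "page_views.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(counts, sort_keys=True), encoding="utf-8")
    return counts


if __name__ == "__main__":
    sample = b"user,page\nu1,/home\nu2,/home\n,/broken\nu1,/pricing\n"
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        raw = land_raw(lake, "web", "clicks.csv", sample)
        derived = derive_clicks(raw, lake)
        print(aggregate_page_views(derived, lake))  # {'/home': 2, '/pricing': 1}
```

Run as a script, it prints `{'/home': 2, '/pricing': 1}`. The malformed row (empty `user`) stays untouched in the raw zone but is filtered out of the derived and aggregated zones; keeping the raw copy unchanged is what allows a later job to re-derive the data under different rules.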
