Using hadoop for big data

Hadoop for (Young) Data Scientist
Big Data - Hadoop Spark Workshops

Published in: Data & Analytics

  1. Hadoop for (Young) Data Scientist. Komes Chandavimol and Team, Data Science Lab, Thailand. komes@datascienceth.com
  2. Agenda • Big Data, Analytics and Data Science • Hadoop + Spark Workshops • Sharing Experience: Hadoop (Real) Use Cases • Hadoop + Spark Trends
  3. Big Data, Analytics and Data Science
  4. Big Data http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427 https://www.domo.com/learn/data-never-sleeps-3-0
  5. Internet of Things http://topmanagement.com.mx/innovacion-social-y-empresarial-objetivo-de-hitachi/
  6. The Growth of Data http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427 https://www.domo.com/learn/data-never-sleeps-3-0
  7. What is Big Data? http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427 https://www.domo.com/learn/data-never-sleeps-3-0
  8. The Big Data Tools http://blogs.forrester.com/category/hadoop http://solutions.forrester.com/Global/FileLib/webinars/Big_Data_-_Gold_Rush_or_Illusion.pdf
  9. http://thebigdatablog.weebly.com/blog/the-hadoop-ecosystem-overview
  10. Traditional Data Management Architecture http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
  11. New Data Management Architecture http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
  12. http://www.kdnuggets.com/2014/05/big-data-landscape-v30-analyzed.html
  13. Data Lake https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
  14. How does the Data Lake work? Traditional Enterprise Data Warehouse http://www.clearpeaks.com/blog/category/tableau
  15. What do you consume from the Data Lake? https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
  16. Volume? Variety? Velocity? https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
  17. Value https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
  18. Big Data + Analytics = Values https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
  19. Big Data Analytics http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/
  20. Big Data Analytics http://dataofthings.blogspot.com/2014/04/the-bbbt-sessions-hortonworks-big-data.html
  21. Big Data Analytics http://www.gartner.com/it-glossary/predictive-analytics
  22. How to do Big Data Analytics? https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
  23. What is Data Science? http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram Data Science Experience Sharing, Big Data Challenge #2, Bangkok, Thailand
  24. The Rise of the Data Scientist http://flowingdata.com/2009/06/04/rise-of-the-data-scientist/ 2009 https://hbr.org/
  25. The Rise of the Data Scientist http://hrb.org http://www.anlytcs.com/2014/01/data-science-venn-diagram-v20.html 2014
  26. The Data Science http://www.anlytcs.com/2014/01/data-science-venn-diagram-v20.html 2014
  27. The Solution, Data Science Team
  28. Data Science Team. Doing Data Science by O'Neil et al. (2013)
  29. Doing Data Science by O'Neil et al. (2013)
  30. Data Science Team. Doing Data Science by O'Neil et al. (2013); Analyzing the Analyzers, Harris (2013)
  31. Data Science Team: Data Scientist & Data Engineer http://www.kdnuggets.com/2015/11/different-data-science-roles-industry.html
  32. Data Science Team: Data Scientist & Data Engineer http://www.kdnuggets.com/2015/11/different-data-science-roles-industry.html https://www.facebook.com/DataScienceTh/posts/931828353527079:0
  33. Data Science Professionals http://www.kdnuggets.com/2015/11/different-data-science-roles-industry.html
  34. How to build a DS Team? Data Science for Dummies, Pierson (2015) ∗ Build in-house team • Train existing employees • Train existing employees and hire experts • Hire experts ∗ Outsource requirements to private DS consultants • Outsource for comprehensive DS strategy development • Outsource for DS solutions to specific problems ∗ Leverage cloud-based platform solutions
  35. Machine Learning “Improving performance in some task with experience.” Tom Mitchell (1998) “The field of study that gives computers the ability to learn without being explicitly programmed.” Arthur Samuel (1959) Machine Learning deals with systems that can learn from data. Wikipedia, Data Visualization for Dummies (2014), Data Points: Visualization That Means Something (2013)
  36.
  37. Machine Learning Discovery • Class Discovery • Correlation Discovery • Novelty (Surprise) Discovery • Association (or Link) Discovery. KirkBorne-workshop-ODSC2016.pdf
  38. The XYZ of Data Science. Smart X: • Smart Cities • Smart Highways • Smart Supply Chain. Precision Y: • Precision Medicine • Precision Farming • Precision Pricing. Personalized Z: • Personalized Health • Personalized Learning • Personalized Shopping Experience. Intelligence at the edge of the network… at the point of data collection. KirkBorne-Workshop-ODSC2016.pdf
  39. DataInquest – Predictive Analytics and Data Science Bootcamp
  40. Data Science is a Team Sport http://www.ibmbigdatahub.com/blog/why-data-science-team-sport
  41. How to Start?
  42. Hadoop + Spark Workshops
  43. Workshop #1 Installing HDFS and YARN
  44. Workshop #2 WordCount
  45. Workshop #3 WordCount (Streaming)
  46. Workshop #4 WordCount (Frequency Sort)
  47. Workshop #5 Setup Cloudera QuickStart
  48. Workshop #6 Exploring HBase data in Hue
  49. Workshop #7 Design a schema for quick Twitter relationship lookup
  50. Workshop #8 Design a schema for IoT log (Smart Meter)
  51. Workshop #9 Create an HBase table for Smart Meter data
  52. Workshop #10 Bank Customer Snapshot
  53. Workshop #10.1 - 10.1 Create Hive Tables; 10.2 Create External Hive Tables; 10.3 Create External Hive Tables; 10.4 Partition
  54. Workshop #11 Sqoop
  55. Workshop spk1 WordCount; spk2 WordCount; spk3 WordCount
  56. Workshop spk4 SparkSQL + ML
  57. Sharing Experience:
  58. Top Performers Use Analytics 5 Times More Than Lower Performers. Source: Analytics: The New Path to Value, a joint MIT Sloan Management Review and IBM Institute for Business Value study. Copyright © Massachusetts Institute of Technology 2010.
  59. Revenue - Cost = Profit
  60. Monitoring and Maintenance. Data sources: IoT sensors in factory. Data products: predictive maintenance models. http://www.electrex.it/en/news/600-automated-energy-management-system-a-enms-for-cement-production-plants.ht
  61. Customer Engagement + Location. Data sources: mobile app, loyalty program, GIS. Data products: buying behavior analysis, coupon-response model, location visualization. http://www.fastcompany.com/3020859/most-creative-people/how-chinas-one-child-policy-forced-starbucks-to-rethink-its-beijing-sto
  62. Fuel Saving. Data sources: telematics (sensor), GPS. Data products: prescriptive analytics – route optimization, predictive maintenance (parts/malfunction). http://www.cnet.com/news/ups-turns-data-analysis-into-big-savings/
  63. Fraud Detection. Data sources: historical pattern of transaction data. Data products: predictive models – fraud/non-fraud. https://bluefishway.com/2013/09/13/panic-oh-no-not-again/
  64. HR Analytics – Google Hiring. Data sources: historical hiring attributes. Data products: predictive model – recruiting high performers. Behavioral Test, Situational Test, GPA, Brain Teaser, Good School
  65. Average ROI of Analytics/Data Science
  66. Hadoop + Spark Trends
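
The transcript only names the WordCount workshops (#2 to #4) and does not include their code. As a rough illustration of the idea, the sketch below mimics the map, shuffle/sort, and reduce steps in plain Python, followed by a frequency sort. It is not the workshop's actual solution and does not call Hadoop: the function names (`mapper`, `reducer`, `word_count`, `frequency_sort`), the tokenizing regex, and the sample lines are all assumptions made for this example.

```python
import re
from itertools import groupby


def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        # Assumed tokenization: lowercase runs of letters, digits, apostrophes.
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield word, 1


def reducer(sorted_pairs):
    """Reduce step: sum the counts for each word.

    The input must already be sorted by word, which is what the
    shuffle/sort phase provides between map and reduce.
    """
    for word, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, a local stand-in for shuffle/sort, then reduce."""
    shuffled = sorted(mapper(lines))  # local substitute for the shuffle phase
    return dict(reducer(shuffled))


def frequency_sort(counts):
    """Order words by descending count, breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample input, not taken from the workshop data.
sample = [
    "big data needs big tools",
    "hadoop and spark are big data tools",
]
counts = word_count(sample)
ranked = frequency_sort(counts)
```

Because the mapper and reducer only consume and produce key/value pairs, the same two functions could be adapted to read stdin and write tab-separated lines, which is the contract a streaming job expects; that adaptation is left out here to keep the sketch self-contained.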
