Concepts, use cases and principles to build big data systems (1)

1) Introduction to the key Big Data concepts
1.1 The Origins of Big Data
1.2 What Is Big Data?
1.3 Why Is Big Data So Important?
1.4 How Is Big Data Used in Practice?

2) Introduction to the key principles of Big Data Systems
2.1 How to design a Data Pipeline in 6 steps
2.2 Using the Lambda Architecture for big data processing

3) Practical case study: Chatbot with Video Recommendation Engine

4) FAQ for students

Published in: Data & Analytics

  1. Concepts, use cases and principles to build big data systems (http://www.bigdatavietnam.org, https://www.facebook.com/bigdatavn). Compiled by Nguyễn Tấn Triều
  2. Key Contents: 1. Introduction to the key Big Data concepts ○ The Origins of Big Data ○ What Is Big Data? ○ Why Is Big Data So Important? ○ How Is Big Data Used in Practice? 2. Introduction to the key principles of Big Data Systems ○ How to design a Data Pipeline in 6 steps ○ Using the Lambda Architecture for big data processing 3. Practical case study ○ Chatbot with Video Recommendation Engine 4. FAQ for students
  3. Introduction to the key Big Data concepts ○ The Origins of Big Data ○ What Is Big Data? ○ Why Is Big Data So Important? ○ How Is Big Data Used in Practice?
  4. The Origins of Big Data
  5. https://www.kdnuggets.com/2017/02/origins-big-data.html
  6. What Is Big Data?
  7. What Is Big Data?
  8. What Is Big Data?
  9. Why Is Big Data So Important?
  10. Why Is Big Data So Important?
  11. How Is Big Data Used in Practice? (Source: https://internetofthingsagenda.techtarget.com/definition/Internet-of-Things-IoT)
  12. How Is Big Data Used in Practice?
  13. Why Is Big Data So Important?
  14. How Is Big Data Used in Practice? Device Analytics: which device is used most?
  15. How Is Big Data Used in Practice? Time-series Analytics: the peak hours of the system
  16. How Is Big Data Used in Practice? Geolocation Heatmap Analytics
  17. Introduction to the key principles of Big Data Systems ○ How to design a Data Pipeline in 6 steps ○ Using the Lambda Architecture for big data processing
  18. How to design Data Pipeline Systems: Collecting → Storing → Processing → Analyzing → Learning → Visualizing. Data engineering process, 3 tasks: 1. Collecting (a. Concepts, b. Technology); 2. Storing (a. Big Data Storage Concepts, b. Big Data Storage Technology); 3. Processing (a. Big Data Processing Concepts, b. Big Data Processing Technology). Data Science/Machine Learning process, 3 tasks: 4) Analyzing → 5) Learning → 6) Visualizing
  19. Big Data Analytics Lifecycle (Collecting → Storing → Processing → Analyzing → Learning → Visualizing), split into Data Engineer tasks and Data Analyst tasks
  20. (Collecting) → Storing → Processing → Analyzing → Learning → Reacting
  21. Collecting
  22. Collecting tools: batch collecting with Apache Sqoop (from a DBMS into Apache Hadoop); real-time collecting with a log collector on Apache Kafka
  23. Collecting → (Storing) → Processing → Analyzing → Learning → Reacting
  24. Storing concepts ● Clusters ● Scale-Up vs Scale-Out ● File Systems and Distributed File Systems ● NoSQL ● Sharding ● Replication ● Sharding and Replication ● CAP Theorem
  25. Clusters
  26. Scale-Up vs Scale-Out
  27. Database in Big Data
  28. NoSQL
  29. NoSQL
  30. Sharding
  31. Replication (Master-Slave)
  32. Replication (Peer-to-Peer)
  33. CAP Theorem
  34. Collecting → Storing → (Processing) → Analyzing → Learning → Reacting
  35. Processing concepts ● Parallel Data Processing ● Distributed Data Processing ● Hadoop ● Processing Workloads ● Cluster ● Processing in Batch Mode ● Processing in Real-time Mode
  36. Parallel Data Processing
  37. Distributed Data Processing
  38. Hadoop: a versatile framework that provides both processing and storage capabilities
  39. Batch processing (offline processing)
  40. Transactional processing
  41. Cluster
  42. Map and Reduce Tasks
  43. Processing in Real-time Mode
  44. When a standard relational database (Oracle, MySQL, ...) is not good enough: the “analytic system” MySQL database of a startup that tracks all actions in mobile games (iOS, Android, ...)
  45. 3 common problems in Big Data Systems: 1. Size: the volume of the datasets is a critical factor. 2. Complexity: the structure, behaviour and permutations of the datasets are a critical factor. 3. Technologies: the tools and techniques used to process a sizable or complex dataset are a critical factor.
  46. Key ideas of the Lambda Architecture in Big Data Systems
  47. Practical case study: Chatbot with Video Recommendation Engine
  48. Problem ● A company wants to develop a chatbot for news recommendation ● They want to classify data into 26 standard categories to support user-friendly queries ● The engineering team has developed a data pipeline for the system
  49. Solution Diagram: “Big Data is here” (author: @tantrieuf31)
  50. Problem: Topic Classification for News
  51. Solution Diagram
  52. FAQ for students: How to learn Big Data? ● Job opportunities ● Reference resources
  53. How to learn Big Data? 1. Have lots of passion for and curiosity about data 2. Knowledge of data structures, statistics and basic maths 3. Love solving complex problems with a data-driven mindset 4. Database knowledge: when to use NoSQL vs RDBMS 5. Knowledge of distributed computing 6. Linux / open-source tools 7. Programming languages: Python / Java / SQL / JavaScript 8. English skills
  54. The Big Data job market is really hot: https://www.class-central.com/subject/big-data
  55. Some good books for self-learning ● http://sachvui.com/ebook/du-lieu-lon-big-data.281.html ● https://drive.google.com/open?id=0B3dHGVpTXDOhQXJCR01PVkpQMGM ● https://drive.google.com/file/d/1rPvfio6EkaUvGtgfQoq9p9Fa2ljOMIn1/view?usp=sharing ● https://drive.google.com/open?id=0B3dHGVpTXDOhVTBKX09NUnlLcm8
  56. Free MOOCs: https://www.class-central.com/subject/big-data
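
Slide 46 names the key ideas of the Lambda Architecture only as a title. Below is a minimal sketch of its usual three layers, assuming a simple page-view count as the workload. The batch layer recomputes a view from the immutable, append-only master dataset. The speed layer keeps incremental counts for events that arrived after the last batch run. A serving-layer query merges the two views. The names `batch_view`, `SpeedLayer` and `query` are hypothetical and do not come from the slides.

```python
from collections import Counter
from dataclasses import dataclass, field


def batch_view(master_dataset: list[str]) -> Counter:
    """Batch layer: recompute the whole view from the immutable master dataset."""
    return Counter(master_dataset)


@dataclass
class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by the batch view."""
    realtime_view: Counter = field(default_factory=Counter)

    def on_event(self, page: str) -> None:
        self.realtime_view[page] += 1

    def reset(self) -> None:
        # Called once a fresh batch view has absorbed these events.
        self.realtime_view.clear()


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the precomputed batch view with the real-time view."""
    return batch[page] + speed.realtime_view[page]


if __name__ == "__main__":
    master = ["home", "video", "home"]      # events already stored (hypothetical data)
    batch = batch_view(master)
    speed = SpeedLayer()
    for page in ["video", "video"]:         # new events arriving in real time
        master.append(page)                 # master dataset is append-only
        speed.on_event(page)
    print(query("video", batch, speed))     # 1 from batch + 2 from speed = 3
    # Next batch run: recompute over the full master dataset, then drop the speed view.
    batch = batch_view(master)
    speed.reset()
    print(query("video", batch, speed))     # still 3, now served entirely by the batch view
```

In this sketch, the batch view stays simple and can always be recomputed from raw data. The speed layer only has to cover the gap since the last batch run, which is the core trade-off the Lambda Architecture makes.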

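Slide 42 titles the Map and Reduce tasks that Hadoop runs. Below is a minimal single-process sketch of the same three phases, using word counting as an assumed example workload. The map phase emits (key, value) pairs for each input split. The shuffle groups values by key. The reduce phase aggregates each group. This sketch only imitates the data flow; it is not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    """Map task: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle/sort: group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce task: aggregate all values that share one key."""
    return key, sum(values)


def map_reduce(splits: list[str]) -> dict[str, int]:
    # On a cluster each split would be mapped on its own node in parallel;
    # here the phases simply run one after another in a single process.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data big systems", "data pipeline data"]
    print(map_reduce(splits))  # {'big': 2, 'data': 3, 'pipeline': 1, 'systems': 1}
```

Each map call depends only on its own split, and each reduce call depends only on one key's group. As a result, both phases can be spread across a cluster. The shuffle is the step where data moves between nodes.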