Big Data in Action

Big Data technologies and applications in Vietnam

Published in: Technology, Business

  1. Big Data in Action
     Ngon Pham, Lana Engineer
  2. Introduction
     ● Introduction
     ● Problem
     ● Approach
     ● Demo
     ● Big Data in Vietnam
  3. Introduction
     ● Internet-enabled devices
       ○ Tons of data generated every second
     ● Hardware has become much cheaper
       ○ We can now store and process much more data
  4. Problem
     ● How long does it take to process 10 TB, and how much does it cost?
       ○ Assume
         ■ Amazon EC2
         ■ HDD reads at 50 MB/s
         ■ Computation time is less than I/O time
  5. Problem
     ● 1 machine, 1 core, 1 HDD
       ○ Time: 55.56 hours
       ○ Amazon cost: $0.12/hour x 55.56 hours = $6.67
     ● 10 machines, 40 cores, 40 HDDs
       ○ Time: 1.39 hours
       ○ Amazon cost: $0.48/hour x 10 machines x 1.39 hours = $6.67
     ⇒ The same cost but 40x faster
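
The arithmetic behind the two scenarios can be reproduced with a short script. This is a minimal sketch that assumes decimal units (1 TB = 1,000,000 MB), a perfectly even split of the data across disks, and 4 disks per machine in the cluster case (40 HDDs over 10 machines); the function name `scan_estimate` is made up for illustration.

```python
# Back-of-the-envelope estimate for an I/O-bound scan of a dataset.
# Inputs are taken from the slides; decimal units are an assumption.

DATA_MB = 10 * 1_000_000      # 10 TB expressed in MB
READ_MB_PER_S = 50            # sequential read speed of one HDD


def scan_estimate(machines, disks_per_machine, price_per_machine_hour):
    """Return (hours, dollars) for a scan spread evenly over all disks."""
    total_disks = machines * disks_per_machine
    hours = DATA_MB / (READ_MB_PER_S * total_disks) / 3600
    cost = hours * machines * price_per_machine_hour
    return hours, cost


single = scan_estimate(machines=1, disks_per_machine=1, price_per_machine_hour=0.12)
cluster = scan_estimate(machines=10, disks_per_machine=4, price_per_machine_hour=0.48)

print(f"1 machine:   {single[0]:.2f} h, ${single[1]:.2f}")
print(f"10 machines: {cluster[0]:.2f} h, ${cluster[1]:.2f}")
print(f"speed-up:    {single[0] / cluster[0]:.0f}x")
```

The cost stays the same because the total number of disk-hours is unchanged; only the wall-clock time shrinks.
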
  6. Question
     ● How to divide data and processing between machines?
     ● How to make each process read data stored on its own machine instead of fetching it from another?
     ● How to replicate data and restart a process after a failure?
     ● Lots of task management questions...
  7. Approach
     ● Hadoop
     ● MongoDB
     ● Spark
  8. Hadoop Approach
     ● Storage
       ○ HDFS
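
To make the storage idea concrete, here is a toy model of how a file might be split into blocks and replicated across nodes. It is a sketch only: the round-robin placement is a simplification chosen for illustration (real HDFS uses a rack-aware policy), the node names and the helper names `place_blocks` and `surviving_blocks` are hypothetical, and the 128 MB block size and replication factor of 3 are the usual defaults in recent Hadoop releases.

```python
# Toy model of HDFS-style storage: fixed-size blocks, each stored on several nodes.
# Round-robin placement is an assumption; it is NOT the real HDFS policy.

BLOCK_SIZE_MB = 128   # usual default block size in recent Hadoop releases
REPLICATION = 3       # usual default replication factor


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return a list of (block_index, [nodes holding a replica])."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    placement = []
    for block in range(num_blocks):
        replicas = [nodes[(block + r) % len(nodes)] for r in range(replication)]
        placement.append((block, replicas))
    return placement


def surviving_blocks(layout, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [b for b, replicas in layout if any(n != failed_node for n in replicas)]


nodes = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical node names
layout = place_blocks(file_size_mb=500, nodes=nodes)
for block, replicas in layout:
    print(block, replicas)
print("readable after node-a fails:", surviving_blocks(layout, "node-a"))
```

Because every block lives on several machines, a computation can usually be scheduled on a node that already holds the block it needs, and a single node failure does not lose data.
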
  9. Hadoop Approach
     ● Computation
       ○ MapReduce
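
The MapReduce model can be illustrated with a word count run in a single process. This sketch uses plain Python rather than the Hadoop API, and the input lines are made up; in a real cluster the map and reduce calls would run on different machines and the shuffle step would move data across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data in action", "big data in Vietnam"]   # made-up input
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Each map call only needs its own line and each reduce call only needs the values for its own key, which is what lets the framework spread the work across machines.
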
  10. MongoDB Approach
      ● Storage
        ○ Document
  11. MongoDB Approach
      ● Computation
        ○ SQL-like queries
        ○ Aggregation
        ○ MapReduce
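
As an illustration of the aggregation style, the sketch below writes a two-stage pipeline in MongoDB's pipeline syntax and evaluates it against an in-memory list. The documents, the field names `source` and `lang`, and the helper `run_locally` are all hypothetical; `run_locally` emulates only these two stages so the example runs without a server. With PyMongo, the same `pipeline` would be passed to `collection.aggregate(pipeline)`.

```python
from collections import Counter

# Pipeline in MongoDB's aggregation syntax: keep Vietnamese documents,
# then count how many came from each source.
pipeline = [
    {"$match": {"lang": "vi"}},
    {"$group": {"_id": "$source", "count": {"$sum": 1}}},
]


def run_locally(docs, pipeline):
    """Emulate ONLY an equality $match followed by a $group with {"$sum": 1}."""
    match = pipeline[0]["$match"]
    group = pipeline[1]["$group"]
    key_field = group["_id"].lstrip("$")                # "$source" -> "source"
    out_field = next(k for k in group if k != "_id")    # name of the counter field
    selected = [d for d in docs if all(d.get(k) == v for k, v in match.items())]
    counts = Counter(d[key_field] for d in selected)
    # Sorted for a stable result; MongoDB itself does not guarantee $group order.
    return [{"_id": key, out_field: n} for key, n in sorted(counts.items())]


docs = [   # made-up documents collected from different sources
    {"source": "news", "lang": "vi"},
    {"source": "forum", "lang": "vi"},
    {"source": "news", "lang": "en"},
    {"source": "news", "lang": "vi"},
]
result = run_locally(docs, pipeline)
print(result)
```
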
  12. Spark Approach
      ● Storage
        ○ Resilient distributed dataset (RDD)
        ○ Persistence backed by HDFS / HBase...
  13. Spark Approach
      ● Computation
        ○ Mixed
        ○ In-memory computing
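
The two ideas on these slides, a dataset that can rebuild itself and computation that stays in memory, can be sketched with a toy class. This is not the Spark API: `ToyRDD` and `drop_cache` are hypothetical names, everything runs in one process, and only the method names `map`, `filter`, `cache`, and `collect` mirror real RDD operations.

```python
class ToyRDD:
    """Toy stand-in for an RDD: it stores HOW to compute its data (lineage)
    and can optionally keep the computed result in memory."""

    def __init__(self, compute):
        self._compute = compute          # zero-argument function that rebuilds the data
        self._cached = None
        self._keep_in_memory = False

    @classmethod
    def from_list(cls, items):
        return cls(lambda: list(items))

    def map(self, fn):
        # Lazy: nothing runs until collect() is called.
        return ToyRDD(lambda: [fn(x) for x in self.collect()])

    def filter(self, pred):
        return ToyRDD(lambda: [x for x in self.collect() if pred(x)])

    def cache(self):
        self._keep_in_memory = True
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached
        data = self._compute()
        if self._keep_in_memory:
            self._cached = data
        return data

    def drop_cache(self):
        """Simulate losing the in-memory copy; the lineage can rebuild it."""
        self._cached = None


calls = []                               # records every time square() actually runs


def square(x):
    calls.append(x)
    return x * x


squares = ToyRDD.from_list(range(5)).map(square).cache()
evens = squares.filter(lambda x: x % 2 == 0)

print(evens.collect())     # triggers the first real computation of squares
print(squares.collect())   # served from memory, square() is not called again
squares.drop_cache()
print(squares.collect())   # rebuilt from lineage after the cached copy is lost
print(len(calls))
```

Reusing the cached `squares` for several downstream computations is the in-memory advantage; rebuilding it from the recorded steps after a loss is the "resilient" part.
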
  14. Demo
      ● Hadoop
        ○ Run a script to create an Amazon cluster
        ○ Play with Hadoop / HDFS / Spark
        ○ Process Wikipedia data
      ● MongoDB
        ○ Collect data from different sources and analyze it
  15. Big Data in Vietnam
  16. Big Data in Vietnam
      ● Why is MongoDB popular?
        ○ Many PHP developers prefer it
        ○ Simple to set up and use
        ○ Similar to MySQL
  17. Big Data in Vietnam
      ● Hadoop is used by a few big local online companies and international startups
        ○ Analyze tons of data
        ○ Create new competitive advantages
      ⇒ But there is a big shortage of skilled engineers
  18. Q&A