Big Data in Action
Ngon Pham, Lana Engineer

Big Data technologies and applications in Vietnam

Published in: Technology, Business
  • @Khoa: You are right. This is just a simple calculation to show viewers the power of Big Data. In reality, many more factors need to go into the calculation, and I can't explain them all in one slide. Please come to the event, and we can discuss it further :)
  • Thank you for sharing this valuable information!

Big Data in Action

  1. Big Data in Action
     Ngon Pham, Lana Engineer
  2. Introduction
     ● Introduction
     ● Problem
     ● Approach
     ● Demo
     ● Big Data in Vietnam
  3. Introduction
     ● Internet-enabled devices
       ○ Tons of data generated every second
     ● Hardware has become much cheaper
       ○ We can now store and process much more data
  4. Problem
     ● How do we process 10 TB of data: how long does it take, and how much does it cost?
       ○ Assume
         ■ Amazon EC2
         ■ HDD reads at 50 MB/s
         ■ Computation time is less than I/O time
  5. Problem
     ● 1 machine, 1 core, 1 HDD
       ○ Time: 55.56 hours
       ○ Amazon cost: $0.12 x 55.56 = $6.67
     ● 10 machines, 40 cores, 40 HDDs
       ○ Time: 1.39 hours
       ○ Amazon cost: $0.48 x 10 x 1.39 = $6.67
     ⇒ The same cost, but 40x faster
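The arithmetic on slide 5 can be reproduced with a short Python sketch. It assumes decimal units (10 TB = 10,000,000 MB), which is what matches the 55.56-hour figure, and it reads the slide's numbers as $0.12 per hour for the 1-core machine and $0.48 per hour for each of the ten 4-disk machines. The function names are illustrative only.

```python
# Assumptions from the slides: 10 TB of data, each HDD reads at 50 MB/s,
# and computation is hidden behind I/O, so scan time is the total time.
DATA_MB = 10 * 1000 * 1000   # 10 TB in decimal megabytes
READ_MB_PER_S = 50           # per-disk sequential read speed


def scan_hours(num_disks):
    """Hours needed to read all the data once with num_disks reading in parallel."""
    return DATA_MB / (READ_MB_PER_S * num_disks) / 3600


def cluster_cost(machines, disks_per_machine, price_per_machine_hour):
    """Return (hours, total cost in dollars) for one full scan of the data."""
    hours = scan_hours(machines * disks_per_machine)
    return hours, hours * machines * price_per_machine_hour


single = cluster_cost(machines=1, disks_per_machine=1, price_per_machine_hour=0.12)
cluster = cluster_cost(machines=10, disks_per_machine=4, price_per_machine_hour=0.48)

print(f"1 machine:   {single[0]:.2f} h, ${single[1]:.2f}")
print(f"10 machines: {cluster[0]:.2f} h, ${cluster[1]:.2f}")
print(f"speed-up:    {single[0] / cluster[0]:.0f}x")
```

The cost stays the same because the hourly price grows by the same factor of 40 that the running time shrinks by. As the author notes in the comments, a real estimate would include more factors than this.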
  6. Question
     ● How to divide data and processes between machines?
     ● How to make each process read data directly from its own machine instead of from another?
     ● How to replicate data and restore a process after a failure?
     ● Lots of task management questions...
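The questions on slide 6 can be made concrete with a toy model in Python. This is only a sketch under simple assumptions: blocks are placed round-robin, `replicas` is no larger than the number of machines, and the machine names are made up. It is not how HDFS or any real scheduler actually places data.

```python
def place_blocks(num_blocks, machines, replicas=2):
    """Store each block on `replicas` distinct machines, chosen round-robin."""
    return {
        block: [machines[(block + r) % len(machines)] for r in range(replicas)]
        for block in range(num_blocks)
    }


def schedule(placement, alive):
    """Run each block's task on the first live machine that holds the block locally."""
    plan = {}
    for block, holders in placement.items():
        local = [m for m in holders if m in alive]
        if not local:
            raise RuntimeError(f"block {block} lost: all replicas are down")
        plan[block] = local[0]
    return plan


machines = ["m0", "m1", "m2", "m3"]          # hypothetical machine names
placement = place_blocks(8, machines, replicas=2)

plan = schedule(placement, alive=set(machines))
plan_after_failure = schedule(placement, alive=set(machines) - {"m1"})

print(plan)
print(plan_after_failure)
```

Because every block has a second copy, the tasks that ran on `m1` move to another machine that already stores the same block, so no data has to be copied before the work resumes.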
  7. Approach
     ● Hadoop
     ● MongoDB
     ● Spark
  8. Hadoop Approach
     ● Storage
       ○ HDFS
  9. Hadoop Approach
     ● Computation
       ○ MapReduce
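To illustrate the MapReduce model named on slide 9, here is a minimal single-process word count in Python. It only mimics the three phases (map, shuffle, reduce) and does not use Hadoop itself. The function names and sample input are chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data in action", "big data in Vietnam"])
print(counts)
```

In a real cluster the map calls run on the machines that hold each block of input, and the shuffle moves the grouped pairs across the network to the reducers.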
  10. MongoDB Approach
      ● Storage
        ○ Document
  11. MongoDB Approach
      ● Computation
        ○ SQL-like queries
        ○ Aggregation
        ○ MapReduce
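As a rough idea of what the aggregation option on slide 11 looks like, the sketch below writes a pipeline in MongoDB's aggregation syntax (`$match`, `$group`, `$sum`) and evaluates it with a tiny local stand-in so that it runs without a database. The `run_pipeline` helper, the field names, and the sample documents are all hypothetical. With a real deployment the same `pipeline` list would be passed to pymongo's `collection.aggregate(pipeline)`.

```python
from collections import defaultdict

# Pipeline in MongoDB aggregation syntax; field names are made up for the example.
pipeline = [
    {"$match": {"country": "VN"}},
    {"$group": {"_id": "$source", "total": {"$sum": "$views"}}},
]


def run_pipeline(docs, stages):
    """Toy evaluator: supports only equality $match and $group with one $sum."""
    for stage in stages:
        if "$match" in stage:
            cond = stage["$match"]
            docs = [d for d in docs if all(d.get(k) == v for k, v in cond.items())]
        elif "$group" in stage:
            spec = stage["$group"]
            key_field = spec["_id"].lstrip("$")
            out_field, acc = next((k, v) for k, v in spec.items() if k != "_id")
            sum_field = acc["$sum"].lstrip("$")
            totals = defaultdict(int)
            for d in docs:
                totals[d[key_field]] += d[sum_field]
            docs = [{"_id": k, out_field: v} for k, v in totals.items()]
    return docs


docs = [
    {"source": "web", "country": "VN", "views": 10},
    {"source": "mobile", "country": "VN", "views": 5},
    {"source": "web", "country": "VN", "views": 7},
    {"source": "web", "country": "US", "views": 100},
]
result = sorted(run_pipeline(docs, pipeline), key=lambda d: d["_id"])
print(result)
```

The pipeline first keeps only the matching documents and then sums one field per group, which is the same shape as a `WHERE` plus `GROUP BY` query in SQL.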
  12. Spark Approach
      ● Storage
        ○ Resilient distributed dataset (RDD)
        ○ Persistence backed by HDFS / HBase...
  13. Spark Approach
      ● Computation
        ○ Mixed
        ○ In-memory computing
  14. Demo
      ● Hadoop
        ○ Run a script to create an Amazon cluster
        ○ Play with Hadoop / HDFS / Spark
        ○ Process Wikipedia data
      ● MongoDB
        ○ Collect data from different sources and analyze it
  15. Big Data in Vietnam
  16. Big Data in Vietnam
      ● Why is MongoDB popular?
        ○ Many PHP developers prefer it
        ○ Simple to set up and use
        ○ Similar to MySQL
  17. Big Data in Vietnam
      ● Hadoop is used by a few big local online companies & international startups
        ○ Analyze tons of data
        ○ Create new competitive advantages
      ⇒ But there is a big shortage of skilled engineers
  18. Q&A
