Big Data in Action
 

Big Data technologies and applications in Vietnam

Usage Rights

© All Rights Reserved

  • @ngonpham Thanks for kindly answering my question :)
  • @Khoa: You are right. This is just a simple calculation to show the viewer the power of Big Data. In reality, many more things need to be added to the calculation, which I can't explain in one slide. Please come to the event, and then we can discuss more :)
  • Nice info. However, I wonder how you can compare the Amazon cost of the two solutions like that. In the latter solution, if the nodes are on-demand, you have to account for data transfer (in/out > 10TB), disk I/O, and so on, so processing time and cost both increase a lot. With permanent nodes, of course, you have to pay 40x more.
  • Thank you for sharing this valuable information!

    Big Data in Action: Presentation Transcript

    • Big Data in Action Ngon Pham, Lana Engineer
    • Introduction ● Introduction ● Problem ● Approach ● Demo ● Big Data in Vietnam
    • Introduction ● Internet-enabled devices ○ Tons of data generated every second ● Hardware becomes much cheaper ○ We can now store and process much more data
    • Problem ● How long does it take to process 10TB, and how much does it cost? ○ Assume ■ Amazon EC2 ■ HDD reads at 50MB/s ■ Computation time is less than I/O time
    • Problem ● 1 machine, 1 core, 1 HDD ○ Time: 55.56 hours ○ Amazon Cost: $0.12 x 55.56 = $6.67 ● 10 machines, 40 cores, 40 HDD ○ Time: 1.39 hours ○ Amazon Cost: $0.48 x 10 x 1.39 = $6.67 ⇒ The same cost but 40x faster
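The arithmetic on this slide can be reproduced with a short script. This is a minimal sketch under the slide's own assumptions: decimal units (1 TB = 1,000,000 MB), an I/O-bound job, data split evenly across disks, and the hourly prices of $0.12 and $0.48 taken from the slide rather than from any current price list. The function names are hypothetical, and data transfer and other charges are ignored, as they are on the slide.

```python
# Assumptions from the slide: 10 TB of data, each HDD reads at 50 MB/s,
# and the job is I/O-bound, so run time equals the time to scan the data once.
DATA_MB = 10 * 1_000_000      # 10 TB in decimal megabytes
HDD_MB_PER_S = 50             # sequential read speed of one disk


def scan_hours(num_disks):
    """Hours to read all data once when it is split evenly across the disks."""
    return DATA_MB / (HDD_MB_PER_S * num_disks) / 3600


def cluster_cost(machines, disks_per_machine, price_per_machine_hour):
    """Return (hours, dollars) for a cluster that reads from every disk in parallel."""
    hours = scan_hours(machines * disks_per_machine)
    return hours, hours * machines * price_per_machine_hour


# 1 machine with 1 HDD at $0.12/hour (price from the slide)
single = cluster_cost(1, 1, 0.12)
# 10 machines with 4 HDDs each (40 HDDs total) at $0.48/hour (price from the slide)
cluster = cluster_cost(10, 4, 0.48)

print(f"single machine: {single[0]:.2f} h, ${single[1]:.2f}")
print(f"10 machines:    {cluster[0]:.2f} h, ${cluster[1]:.2f}")
print(f"speedup:        {single[0] / cluster[0]:.0f}x")
```

The cost comes out the same because the larger machine costs 4x as much per hour but carries 4x as many disks, so the price per disk-hour is unchanged.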
    • Question ● How to divide data and processing between machines? ● How to make each process read data stored on its own machine rather than on another machine? ● How to replicate data and restart a process after a failure? ● Lots of task management questions...
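To make these questions concrete, here is a toy sketch of block placement with replication. It is an illustration only: the round-robin policy, the node names, and the functions `place_blocks` and `pick_reader` are all hypothetical, and this is not how any particular system (HDFS included) actually places replicas.

```python
def place_blocks(num_blocks, nodes, replicas=2):
    """Assign each block to `replicas` distinct nodes using round-robin placement."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replicas)]
    return placement


def pick_reader(placement, block, alive):
    """Return the first live node holding the block, so the read stays local to it."""
    for node in placement[block]:
        if node in alive:
            return node
    return None  # every replica of this block is lost


nodes = ["node0", "node1", "node2", "node3"]   # hypothetical cluster
placement = place_blocks(8, nodes, replicas=2)

# Simulate one machine failing: every block should still have a live replica.
alive = set(nodes) - {"node1"}
readers = {block: pick_reader(placement, block, alive) for block in placement}
print(readers)
```

Scheduling each task on a node returned by `pick_reader` keeps reads local, and holding two copies of every block lets the job continue after a single node failure.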
    • Approach ● Hadoop ● MongoDB ● Spark
    • Hadoop Approach ● Storage ○ HDFS
    • Hadoop Approach ● Computation ○ MapReduce
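The MapReduce model can be sketched in plain Python with a word count. This is a single-process illustration of the map, shuffle, and reduce steps, not Hadoop's API; the function names and the sample input are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data in action", "big data in Vietnam"]   # sample input
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real cluster the map calls run in parallel on the machines that hold each input split, and the shuffle moves the grouped pairs over the network to the reducers.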
    • MongoDB Approach ● Storage ○ Document
    • MongoDB Approach ● Computation ○ SQL-like queries ○ Aggregation ○ MapReduce
    • Spark Approach ● Storage ○ Resilient distributed dataset (RDD) ○ Persistence backed by HDFS / HBase...
    • Spark Approach ● Computation ○ Mixed ○ In-memory computing
    • Demo ● Hadoop ○ Run script to create Amazon cluster ○ Play with Hadoop / HDFS / Spark ○ Process Wikipedia data ● MongoDB ○ Collect data from different sources and analyze
    • Big Data in Vietnam
    • Big Data in Vietnam ● Why is MongoDB popular? ○ Many PHP developers prefer it ○ Simple to set up and use ○ Similar to MySQL
    • Big Data in Vietnam ● Hadoop is used by a few big local online companies & international startups ○ Analyze tons of data ○ Create new competitive advantage ⇒ But there is a big shortage of skilled engineers
    • Q&A