Big data ecosystem

Published in: Technology, Business
Transcript of "Big data ecosystem"

  1. Mariusz Gil: BIG data ecosystem
  2. / ABOUT ME /
  3. This talk is about BIG DATA
  4. What is... BIG DATA?
  5. VOLUME: large amounts of data
  6. VELOCITY: needs to be analyzed quickly
  7. VARIETY: different types of structured and unstructured data
  8. Big Data is data that is too large, complex and dynamic for any conventional data tools to capture, store, manage and analyze.
  9. 30 billion pieces of content were added in the past month
  10. more than 2 billion videos were watched yesterday
  11. more than 58 million messages were sent yesterday
  12. / MAIN QUESTIONS /
  13. WHY?
  14. 49% IMPROVED RISK MANAGEMENT; 32% INCREASED SALES FIGURES; 36% / 40% IMPROVED MANAGEMENT CONTROL / IT ANALYSIS; 43% MARKET-ORIENTED PRODUCT DEVELOPMENT; 27% FINANCES AND ECONOMICS
  15. 690-node Hadoop cluster for predictions and analytics
  16. HOW?
  17. HDFS: HADOOP DISTRIBUTED FILE SYSTEM; YARN / MapReduce v2: DISTRIBUTED PROCESSING FRAMEWORK; HBASE: COLUMNAR STORAGE; HIVE: SQL DATA WAREHOUSE ENGINE; AVRO: DATA SERIALIZATION; MAHOUT: SCALABLE MACHINE LEARNING; PIG: SCRIPTING FOR LARGE DATA SETS; OOZIE: WORKFLOW ORCHESTRATION; AMBARI: PROVISIONING, MANAGING AND MONITORING CLUSTERS; SQOOP: DATA EXCHANGE; ZOOKEEPER: DISTRIBUTED COORDINATION SERVICE; FLUME: LOG COLLECTOR; WHIRR: RUNNING CLOUD SERVICES
  18. We can choose from multiple VENDORS like Cloudera, Hortonworks or Amazon
  19. Even from...
  20. Can we get results FASTER?
  21. Cloudera Impala, Storm, Apache Drill
  22. thanks
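
Slide 17 names YARN / MapReduce v2 as the distributed processing framework without showing the programming model. As a rough illustration only, the sketch below imitates the map, shuffle and reduce steps in plain, single-process Python on a word-count problem. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example and do not come from the talk.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real cluster framework does this across machines; here it is
    just an in-memory dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    print(word_count(["big data", "big data ecosystem"]))
```

On a real cluster the map and reduce functions would run in parallel on many nodes, with input splits typically read from HDFS; the in-memory dictionary above only stands in for that grouping step.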