Big data ecosystem

Published in: Technology, Business

Transcript

  • 1. Mariusz Gil BIG data ecosystem
  • 2. / ABOUT ME /
  • 3. This talk is about BIG DATA
  • 4. What is... BIG DATA?
  • 5. VOLUME: large amounts of data
  • 6. VELOCITY: needs to be analyzed quickly
  • 7. VARIETY: different types of structured and unstructured data
  • 8. Big Data is data that is too large, complex and dynamic for any conventional data tools to capture, store, manage and analyze.
  • 9. 30 billion pieces of content were added in the past month
  • 10. more than 2 billion videos were watched yesterday
  • 11. more than 58 million messages were sent yesterday
  • 12. / MAIN QUESTIONS /
  • 13. WHY?
  • 14. 49 % IMPROVED RISK MANAGEMENT; 32 % INCREASED SALES FIGURES; 36 40 % IMPROVED MANAGEMENT CONTROL % IT ANALYSIS; 43 % MARKET-ORIENTED PRODUCT DEVELOPMENT; 27 % FINANCES AND ECONOMICS
  • 15. 690 nodes Hadoop cluster for predictions and analytics
  • 16. HOW?
  • 17. HDFS: HADOOP DISTRIBUTED FILE SYSTEM; YARN / MapReduce v2: DISTRIBUTED PROCESSING FRAMEWORK; HBASE: COLUMNAR STORAGE; HIVE: SQL DATA WAREHOUSE ENGINE; AVRO: DATA SERIALIZATION; MAHOUT: SCALABLE MACHINE LEARNING; PIG: SCRIPTING FOR LARGE DATA SETS; OOZIE: WORKFLOWS ORCHESTRATION; AMBARI: PROVISIONING, MANAGING AND MONITORING CLUSTERS; SQOOP: DATA EXCHANGE; ZOOKEEPER: DISTRIBUTED COORDINATION SERVICE; FLUME: LOG COLLECTOR; WHIRR: RUNNING CLOUD SERVICES
  • 18. We can choose from multiple VENDORS like Cloudera, Hortonworks or Amazon
  • 19. Even from...
  • 20. Can we get results FASTER?
  • 21. Cloudera Impala, Storm, Apache Drill
  • 22. thanks
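
The slides name MapReduce as the distributed processing framework but do not show what a job looks like. As a rough illustration only, the sketch below imitates the map, shuffle and reduce phases in plain single-process Python with a word count. It does not use Hadoop or any of its APIs, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence (hypothetical helper, not a Hadoop API)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data ecosystem", "big data tools"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions would run in parallel on many nodes and the shuffle would move data across the network; here everything runs in one process so the data flow is easy to follow.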
