Big data today and tomorrow

  1. Big data today and tomorrow. Mariusz Gil
  2. About me
  3. This talk is about BIG DATA
  4. What is BIG DATA?
  5. VOLUME: large amounts of data
  6. VELOCITY: data needs to be analyzed quickly
  7. VARIETY: different types of structured and unstructured data
  8. Big Data is data that is too large, complex, and dynamic for any conventional data tools to capture, store, manage, and analyze.
  9. 30 billion pieces of content were added in the past month
  10. More than 2 billion videos were watched yesterday
  11. More than 58 million messages were sent yesterday
  12. Why?
  13. A 690-node Hadoop cluster for predictions and analytics
  14. How?
  15. The Hadoop ecosystem:
     • HBase: columnar storage
     • Hive: SQL data warehouse engine
     • Avro: data serialization
     • Mahout: scalable machine learning
     • Oozie: workflow orchestration
     • ZooKeeper: distributed coordination service
     • Flume: log collector
     • HDFS: Hadoop Distributed File System
     • YARN / MapReduce v2: distributed processing framework
     • Ambari: provisioning, managing, and monitoring clusters
     • Whirr: running cloud services
  16. Evolve
  17. The future is not only HADOOP!
  18. The future is low latency and REALTIME
  19. Apache Drill, Storm
  20. Data is the next BIG THING
  21. Thanks. mariusz@mariuszgil.com
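
Slide 15 names MapReduce as the distributed processing framework in the Hadoop stack. As a rough illustration of the programming model only, here is a minimal single-process sketch in plain Python. It does not use Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example, not something from the talk.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data today", "big data tomorrow"]))
```

On a real cluster the map and reduce functions would run in parallel on many nodes and the shuffle would move data over the network; this sketch only shows how the three phases fit together.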
