Big Data Adoption Status

Presentation about Big Data Adoption Status
by Nuno Barreto - Partner & Big Data Lead @Xpand IT

Published in: Technology

  1. Big Data Adoption Status – Nuno Barreto, Partner | Big Data Lead
  2. AGENDA • STATUS CHECK • TECHNOLOGY HIGHLIGHTS • COOL STUFF WE’RE DOING • LOOKING AHEAD
  3. STATUS CHECK
  4. HADOOP – THE ULTIMATE DATA TOOLKIT
  5. HADOOP – THE ECOSYSTEM
  6. FROM BATCH TO NEAR-REAL-TIME • FROM ANALYTICS TO OPERATIONAL
  7. ACTIVE INDUSTRIES • RETAIL • UTILITIES • TELCO • MOBILITY • FINANCIAL SERVICES • E-BUSINESS
  8. CLUSTER SIZES & TOPOLOGIES • 5 TO TENS OF NODES • A COUPLE HUNDRED GiB TO A DOZEN TiB OF RAM • A COUPLE TiB TO A HUNDRED TiB OF RAW SPACE • ON-PREM AND CLOUD
  9. TECHNOLOGY HIGHLIGHTS
  10. HBASE AND HDFS/PARQUET • HDFS/PARQUET IS GREAT FOR LARGE SCANS, BUT… • HBASE IS GREAT FOR INDEXED READS/WRITES, BUT…
  11. COMPLEX ARCHITECTURES source: http://blog.cloudera.com/blog/2015/09/kudu-new-apache-hadoop-storage-for-fast-analytics-on-fast-data/
  12. THE “WORST” OF BOTH WORLDS • KUDU IS SLIGHTLY WORSE THAN HBASE FOR INDEXED OPS • KUDU SHOULD BE NO MORE THAN 2x WORSE THAN PARQUET FOR LARGE SCANS
  13. KUDU BENCHMARKS – INGEST RATE (HIGHER IS BETTER) source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/
  14. KUDU BENCHMARKS – RANDOM LOOKUP (LOWER IS BETTER) source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/
  15. KUDU BENCHMARKS – SCAN RATE (HIGHER IS BETTER) source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/
  16. KUDU BENCHMARKS – SUMMARY source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/
  17. COOL STUFF WE’RE DOING
  18. NEAR-REAL-TIME DATA LAKE
  19. BY
  20. ZWOOX – INGESTION FRAMEWORK • LOW LATENCY, HIGH THROUGHPUT, HIGHLY AVAILABLE • NEAR-REAL-TIME FOR KUDU & BATCH FOR HIVE/IMPALA • BATCH & STREAMING REPLICATIONS • AUTOMATIC CONSOLIDATION INTO HDFS-BASED TABLES • MULTIPLE TABLES WITH SPECIFIC PARTITIONING SCHEMES • IN-LINE PROCESSING • AUTOMATIC AUDIT DATA
  21. MESSAGE BUS • JMS SEMANTICS ARE LIMITED • JMS SCALING IS HARD • JMS PERFORMANCE IS POOR COMPARED TO KAFKA
  22. LOOKING AHEAD (TWO TRENDS)
  23. IOT – VALUE IS IN THE CORE • DEVICE-CENTRIC VS GATEWAY-CENTRIC VS CENTRALIZED PLATFORM-CENTRIC
  24. DATA SCIENCE
  25. CLOUDERA WORKBENCH
  26. THANK YOU
