Cloudera Customer Success Story

"Cloudera Customer Success Story" by Nuno Barreto, Associate Partner & Big Data Lead at Xpand IT, presented at the event Cloudera & Big Data Ecosystem

Published in: Technology

  1. 1. Customer Success Story Cloudera & Xpand IT Nuno Barreto Associate Partner & Big Data Lead nuno.barreto@xpand-it.com Proprietary & Confidential www.xpand-it.com
  2. 2. THE PROBLEM How is process Y progressing? Who are the main cluster users/departments? Which engines does each department use? Do I need to plan on an upgrade? How much is process X costing me? Are there available time slots?
  3. 3. THE SOLUTION TELEMETRY ETL FLOW CONTROL DATA PREPARATION
  4. 4. ARCHITECTURE (diagram labels) CORE, AGENT(s), QUEUE, REAL-TIME, ONLINE DB, ANALYTICS REPO, ETL, start/stop jobs, PDI extension, log, flow control, ANALYTICS, ANALYTICS DB, status check, metadata access, data access, analytics data, analytical queries, operational data
  5. 5. THIS INVOLVES A NUMBER OF CONCEPTS NEAR REAL-TIME CLOUDERA INTEGRATION LAMBDA ARCHITECTURE STREAMING
  6. 6. NEAR REAL-TIME AND STREAMING
  7. 7. REAL-TIME & STREAMING (architecture diagram from slide 4, repeated)
  8. 8. REMOTE AGENTS FINE GRAINED CONTROL ETL TOOL SPECIFIC REAL TIME LOGGING ASYNC EXECUTION
  9. 9. PDI EXTENSION POINTS CAPTURE LOG START/END CAPTURE CONNECTION TYPE CAPTURE STEP LINEAGE DETAIL
  10. 10. GATHERING EXECUTION DATA USE KAFKA AS A LOG SINK FAULT TOLERANT REAL TIME CONSISTENT
  11. 11. COLLECT LOG DATA IN (AS) REALTIME (AS POSSIBLE) SPARK AS KAFKA COLLECTOR REAL TIME LOG PARSING ETL TOOL ADAPTABLE DATA DUMPS IN IMPALA AND HBASE GENERATES NOTIFICATIONS
  12. 12. LAMBDA ARCHITECTURE
  13. 13. LAMBDA ARCHITECTURE (architecture diagram from slide 4, repeated)
  14. 14. DISCLAIMER What you are about to see is a work in progress, so be gentle in case… • the demo doesn’t work • features don’t work as described • the connection goes down
  15. 15. DEMO REAL-TIME AND STREAMING
  16. 16. CLOUDERA INTEGRATION
  17. 17. HOW TO MANAGE ALL THESE COMPONENTS LOTS OF MOVING PARTS OPERATIONS LOADS OF CONFIG FILES
  18. 18. THE ANSWER
  19. 19. EXTENSIBLE ARCHITECTURE SEAMLESS INTEGRATION MONITORING CONFIGURATION MANAGEMENT DEPENDENCIES MANAGEMENT LOG CHECK SETUP AND ADMIN
  20. 20. DEMO CLOUDERA INTEGRATION
  21. 21. SUMMARY NOT EVERYTHING WE DO IS THIS COMPLEX HADOOP STACK CHOICE MATTERS RE-USABLE DESIGN PATTERNS
  22. 22. QUESTIONS?
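
Slides 9 to 11 describe capturing log start/end events from the ETL tool, sending them to Kafka as a log sink, and parsing them in near real time. The slides do not show the message format or the collector code, so the following is only a minimal sketch under stated assumptions: the JSON fields `job`, `event` and `ts` are hypothetical, and a plain Python iterable stands in for the Kafka topic so the example runs without a cluster. It is not the presenters' implementation, which uses Spark as the collector.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class JobRun:
    """One completed ETL job execution, built from a start/end event pair."""
    job: str
    started: datetime
    ended: datetime

    @property
    def seconds(self) -> float:
        return (self.ended - self.started).total_seconds()


def parse_event(raw: str) -> Optional[dict]:
    """Parse one log message (hypothetical JSON format); None if malformed."""
    try:
        record = json.loads(raw)
        return {
            "job": str(record["job"]),
            "event": record["event"],
            "ts": datetime.fromisoformat(record["ts"]),
        }
    except (ValueError, KeyError, TypeError):
        return None


def completed_runs(messages: Iterable[str]) -> Iterator[JobRun]:
    """Pair start and end events per job as messages arrive.

    `messages` stands in for a stream of log messages; any iterable
    of strings works, so results are produced incrementally.
    """
    open_jobs: Dict[str, datetime] = {}
    for raw in messages:
        event = parse_event(raw)
        if event is None:
            continue  # skip lines that are not valid log events
        if event["event"] == "start":
            open_jobs[event["job"]] = event["ts"]
        elif event["event"] == "end" and event["job"] in open_jobs:
            yield JobRun(event["job"], open_jobs.pop(event["job"]), event["ts"])


# Illustrative input only; job name and timestamps are made up.
sample = [
    '{"job": "load_sales", "event": "start", "ts": "2017-01-01T10:00:00"}',
    "not json",
    '{"job": "load_sales", "event": "end", "ts": "2017-01-01T10:00:42"}',
]
runs = list(completed_runs(sample))
```

Because `completed_runs` is a generator, each finished run is available as soon as its end event is read, which is the property a streaming collector needs before writing results to a store or raising a notification.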
