
Filling the Data Lake


Published in: Technology

  1. 1. Filling the Data Lake. June 29, 2016. Chuck Yarbrough, Sr. Director, Solutions Marketing and Management (@cyarbrough). Mark Burnette, Enterprise Sales Engineer (@MarkCBurnette).
  2. 2. © 2015, Pentaho. All rights reserved. pentaho.com. Worldwide +1 (866) 660-7555. Emerging Big Data Use Cases. Improve operational effectiveness: machines/sensors (predict failures, network attacks); financial risk management (reduce fraud, increase security); reduce data warehouse cost. Improve customer experience: build a 360° view to fully understand and serve the customer; drive personalized and adjusted interaction; use automated recommendations logic. Drive incremental revenue: predict customer behavior across all channels; understand and monetize customer behavior; begin to monetize data as a service.
  3. 3. Spectrum of Big Data Use Cases. [Chart: Business Impact versus Use Case Complexity. Labels: Entry, Transform, Advanced, Optimize.] Use cases shown: Data Warehouse Optimization, Streamlined Data Refinery, Big Data Exploration, Customer 360 Degree View, Harnessing Machine & Sensor Data, Next Generation Applications, Internal Big Data as a Service, On-Demand Big Data Blending, Big Data Predictive Analytics, Monetize My Data, Big Data Onboarding, Filling the Data Lake.
  4. 4. What Does Pentaho Do?
  5. 5. Managing and Automating the Pipeline. Diagram labels: Data Lake, Data Engineering, Data Preparation, Analytics, Data Pipeline. Management labels: Administration, Security, Lifecycle Management, Data Provenance, Dynamic Data Pipeline, Monitoring, Automation.
  6. 6. The Data Swamp
  7. 7. The Data Lake
  8. 8. Does Hadoop Have to be Hard? Things that can help ease the pain: empower team members to integrate and process Hadoop data; establish a modern data onboarding process that is flexible and scalable; deliver governed analytic insights for large production user bases.
  9. 9. Proper Care and Feeding of the Data Lake
  10. 10. Data Onboarding Challenges
  11. 11. More Data, More Problems. Even with good integration tools, major data onboarding projects can be painful. User challenges: repetitive manual design; very time-consuming; difficult to maintain. Business challenges: takes too long; business deadlines at risk; opportunity cost.
  12. 12. More Data, More Problems. Have you ever had the pleasure of… Migrating hundreds of sources between systems? Enabling business users to onboard a variety of data themselves? Ingesting hundreds of changing data sources into Hadoop? How do we effectively scale data pipelines to accommodate exploding data sources, volumes, and complexity?
  13. 13. More Data, More Problems. Modern data onboarding is more than just “dumping data”. It includes: managing a changing array of data sources; establishing repeatable processes at scale; maintaining control and governance.
  14. 14. Data Onboarding: Filling the Data Lake. Diagram labels: Disparate Data Sources (CSV, RDBMS); Ingest Procedures (Integration Processes, Transformations); Hadoop (AVRO).
  15. 15. Data Onboarding at Scale. Diagram labels: Disparate Data Sources (multiple CSV and RDBMS sources); Ingest Procedures (Integration Processes, Transformations); Hadoop (AVRO).
  16. 16. Filling the Data Lake: A Modern Data Onboarding Blueprint. Streamline data ingest from a wide variety of source data; reduce dependence on hard-coded data movement procedures; simplify regular data movement at scale into the data lake.
  17. 17. Template-based Approach
  18. 18. Dynamic ELT Ingest Templates. Pass metadata in at run time to generate jobs on the fly (metadata injection). Diagram labels: Disparate Data Sources (multiple CSV and RDBMS sources); Dynamic Integration Processes; Dynamic Transformations; Hadoop.
  19. 19. Templated workflows. Templates: RDBMS -> AVRO Template; CSV -> AVRO Template; CSV -> HDFS Template. Diagram labels: Disparate Data Sources (multiple CSV and RDBMS sources); Dynamic Integration Processes; Dynamic Transformations; Hadoop.
  20. 20. Variety – different metadata, one template. Diagram labels: Disparate Data Sources; Dynamic Integration Processes; Dynamic Transformations; CSV -> AVRO Template; Hadoop.
  21. 21. Key Takeaway. Managing ELT and ELT procedures; Managing Metadata; Metadata Injection.
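
The template idea in slides 18 to 21 (one generic workflow, with per-source metadata passed in at run time) can be illustrated with a minimal Python sketch. This is not Pentaho's metadata injection API; `run_csv_template`, the metadata layout, and the sample data are all hypothetical names chosen for illustration.

```python
import csv
import io
from datetime import datetime

def cast(raw, field):
    """Convert one raw string according to the field's metadata (hypothetical layout)."""
    if field["type"] == "integer":
        return int(raw)
    if field["type"] == "number":
        return float(raw)
    if field["type"] == "date":
        return datetime.strptime(raw, field["mask"]).date()
    return raw  # "string" and anything unrecognized stay as text

def run_csv_template(metadata, text):
    """One generic template: parse delimited text into typed records, driven only by metadata."""
    reader = csv.reader(io.StringIO(text), delimiter=metadata["delimiter"])
    next(reader)  # skip the header row; field names come from the metadata
    return [
        {field["name"]: cast(raw, field) for field, raw in zip(metadata["fields"], row)}
        for row in reader
    ]

# Two different sources, described only by metadata (illustrative values).
orders_meta = {
    "delimiter": ",",
    "fields": [
        {"name": "order_id", "type": "integer"},
        {"name": "amount", "type": "number"},
        {"name": "placed", "type": "date", "mask": "%Y-%m-%d"},
    ],
}
sensors_meta = {
    "delimiter": ";",
    "fields": [
        {"name": "sensor", "type": "string"},
        {"name": "reading", "type": "number"},
    ],
}

orders = run_csv_template(orders_meta, "order_id,amount,placed\n1,9.50,2016-06-29\n")
sensors = run_csv_template(sensors_meta, "sensor;reading\npump-7;0.82\n")
```

Onboarding a new source in this sketch means writing a new metadata entry, not a new procedure, which is the trade the slides describe as managing metadata instead of managing procedures.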
  22. 22. Metadata Acquisition
  23. 23. RDBMS Ingestion: Automated Metadata Extraction. Extract table and store in AVRO. Metadata: database connection details; table(s); field names (if available); data types; string length; mask for numbers and dates; …
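
As a rough illustration of automated metadata extraction from a relational source, the sketch below reads field names and declared types from a database catalog. It assumes SQLite (via Python's standard `sqlite3` module) purely so the example is self-contained; the deck does not name a specific database, and `extract_table_metadata` and the `customers` table are hypothetical.

```python
import sqlite3

def extract_table_metadata(conn, table):
    """Return field names and declared types for one table (SQLite-specific sketch)."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # Identifiers cannot be bound as SQL parameters, so quote the table name instead.
    quoted = '"' + table.replace('"', '""') + '"'
    columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    return {
        "table": table,
        "fields": [{"name": name, "type": decl_type} for _, name, decl_type, *_ in columns],
    }

# Illustrative in-memory database standing in for a real source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(40), signup DATE)")
metadata = extract_table_metadata(conn, "customers")
```

Other databases expose the same information through their own catalogs (for example `information_schema.columns`), so only the query would change.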
  24. 24. CSV Ingestion: Automated Metadata Extraction. Option 1: Ingest RAW files into HDFS (no parsing). Metadata: path to CSVs. Option 2: Parse and store in AVRO. Metadata: path to CSVs; delimiter; field names (if available); data types; string length; mask for numbers and dates; …
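
For Option 2 on slide 24, the metadata items (delimiter, field names, data types, string length) can be inferred from the file itself. The following is a minimal sketch under simplifying assumptions: a header row is present, the delimiter is one of a small candidate set, and only integer, number, and string types are distinguished. `extract_csv_metadata` and the sample data are hypothetical, and date masks are left out.

```python
import csv
import io

CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]

def infer_type(values):
    """Pick the narrowest of integer, number, string that fits every sample value."""
    for type_name, caster in (("integer", int), ("number", float)):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue
    return "string"

def extract_csv_metadata(text):
    """Infer delimiter, field names, types, and max string length from CSV text."""
    header = text.splitlines()[0]
    # Naive guess: the candidate that occurs most often in the header line.
    delimiter = max(CANDIDATE_DELIMITERS, key=header.count)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    names, data = rows[0], rows[1:]
    fields = []
    for index, name in enumerate(names):
        values = [row[index] for row in data]
        fields.append({
            "name": name,
            "type": infer_type(values),
            "length": max((len(value) for value in values), default=0),
        })
    return {"delimiter": delimiter, "fields": fields}

sample = "sensor;reading;count\npump-7;0.82;3\nvalve-12;1.5;10\n"
metadata = extract_csv_metadata(sample)
```

In practice the inference would run on a sample of rows, and the result would be reviewed before being used to drive an ingest template.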
  25. 25. Demonstration
  26. 26. Key Takeaway: Automated Metadata Extraction. ELT development: DAYS. Provisioning: MINUTES.
  27. 27. Summary
  28. 28. Key Takeaways: Data Onboarding Blueprint. Template-based data integration: manage metadata vs. ELT procedures. Automated metadata extraction: provide minimum required configuration. Reduce risk: maintain an organized, standardized, and clean data lake.
  29. 29. What Next? Learn more about Big Data Onboarding at Pentaho.com. Download Pentaho Platform at Pentaho.com.
  30. 30. Q&A
  31. 31. Thank You
