
TechEvent DWH Modernization

  1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MANNHEIM MÜNCHEN STUTTGART WIEN ZÜRICH. DWH Modernization: Do I need a data lake? If yes, why? Jan Ott, @jan_ott_ch, https://janottblog.com
  2. Jan Ott: working at Trivadis for 20 years; Principal Consultant BI; speaker at conferences; consultant, trainer and software architect for BI (DWH & Big Data); more than 20 years of software development experience. Contact: jan.ott@trivadis.com. TechEvent September 2018, 26.09.2018
  3. Agenda: 1. Initial situation at the customer 2. DWH - Big Data Architecture 3. Licenses & Knowledge 4. Summary - Do I need a data lake?
  4. Initial Situation
  5. Current and desired status
     Current:
     • 1 x load per week: full load
     • 1 x load per day: CRM delta load
     • Loading window is getting too small
     • ...
     Desired:
     • 1 x per day: delta load
     • Streaming of some data
     • Platform for the analytics team
     • Methods to add publicly available data
     • ...
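The difference between the weekly full load and the desired daily delta load on slide 5 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the row layout, the `updated_at` change timestamp and the function names are hypothetical and do not come from the customer system described in the talk.

```python
from datetime import datetime


def full_load(source_rows):
    """Full load: rebuild the target as a complete copy of the source."""
    return {row["id"]: dict(row) for row in source_rows}


def delta_load(target, source_rows, last_loaded_at):
    """Delta load: apply only rows changed after the previous load.

    Returns the number of applied rows and the new watermark.
    Deleted source rows are not detected by this simple approach.
    """
    changed = [r for r in source_rows if r["updated_at"] > last_loaded_at]
    for row in changed:
        target[row["id"]] = dict(row)  # insert new rows, overwrite changed ones
    new_watermark = max((r["updated_at"] for r in changed), default=last_loaded_at)
    return len(changed), new_watermark


# Hypothetical CRM extract with a change timestamp per row.
source = [
    {"id": 1, "name": "Alice", "updated_at": datetime(2018, 9, 24)},
    {"id": 2, "name": "Bob", "updated_at": datetime(2018, 9, 25)},
]
target = full_load(source)
watermark = datetime(2018, 9, 25)

# The next day one row changes and one row is added in the source.
source[0] = {"id": 1, "name": "Alice B.", "updated_at": datetime(2018, 9, 26)}
source.append({"id": 3, "name": "Carol", "updated_at": datetime(2018, 9, 26)})

applied, watermark = delta_load(target, source, watermark)
```

Only the two rows touched since the last watermark are processed, which is why a delta load can fit into a loading window that a full load no longer does.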
  6. Data Warehouse Architecture (diagram): Sources, ETL, Data Warehouse (Staging Area, Cleansing Area, Core, Data Marts, Meta Data), BI Platform
  7. The „Big“ Shift in Analytical Data Management
     Traditional BI/DWH Requirements:
     • Stable and Consolidated Data
     • DWH as Single Point of Truth
     • Business Driven Analytical Schema
     • Assured Data Quality and Data History Preservation
     • Governed and Secure Data to meet Compliance
     Emerging Analytical Requirements (Velocity, Volume, Variety):
     • Agile to support New Business Demands
     • Support of Self-Service features
     • Right-time (near real-time, not batch)
     • Scales to support More Data, New Sources and Broader Use Cases
     • Simplified in terms of modelling, quality and development
     It's much more an enrichment than a substitution of the requirements!
  8. Data Lake = DWH + Possibilities - Complexity? (diagram sources: Internet, CRM, Event, ERP, …)
  9. DWH - Big Data Architecture
  10. Reference Architecture Analytical Platform (diagram labels: Automation, Meta Data, Generator, Template, Generate Artefact, Data Lineage, Generate Tracing info)
  11. Reference Architecture Analytical Platform (same diagram with numbered callouts 0 to 5)
  12. How to do Big Data?
  13. Big Data Ecosystem – many choices …
  14. Reference Architecture Analytical Platform (same diagram with numbered callouts 0 to 5 and a CONNECT label). * DB is a logical standby
  15. Key Success Factors for a Big Data Project
     1. Support from Business Sponsor
     2. Start with Outcome Answer First
     3. Involve Real Users and Create Effective Use Cases
     4. Define Quick-Win and Phasing
     5. Sufficient Data Source
     6. Choose the Open Technology Platform
     7. Identify SLA for Service Operation
     8. Project Review
  16. Big Data is still “work in progress”
     • Choosing the right architecture is key for any (big data) project
     • Big Data is still quite a young field, so there are no standard architectures that have been in use for years
     • In the past few years, a few architectures have evolved and have been discussed online
     • Know the use cases before choosing your architecture
     • Having one or a few reference architectures can help in choosing the right components
  17. StreamSets Data Collector
     • Founded by ex-Cloudera and Informatica employees
     • Continuous open source, intent-driven, big data ingest
     • Visible, record-oriented approach fixes combinatorial explosion
     • Batch or stream processing: Standalone, Spark cluster, MapReduce cluster
     • IDE for pipeline development by ‘civilians’
     • Relatively new: first public release September 2015
     • So far, the vast majority of commits are from StreamSets staff
  18. Apache Avro
     • Row-based data serialization system
     • Uses JSON-based schemas
     • Uses RPC calls to send data
     • Schemas are sent during data exchange
     • Integrated with many languages
     • Fast binary data format, or encode with JSON

     {
       "namespace": "trimazon.schema.customer",
       "type": "record",
       "name": "customer",
       "fields": [
         {"name": "firstName", "type": "string"},
         {"name": "lastName", "type": "string"},
         {"name": "age", "type": "int"},
         {"name": "email", "type": "string"}
       ]
     }
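To make the role of the JSON schema on slide 18 concrete, here is a minimal sketch that parses the schema and checks whether a record matches its field names and types. It uses only the Python standard library and is not the Avro library itself: the `conforms` helper, the type mapping and the sample records are hypothetical, and a real project would rely on an Avro implementation for validation and binary encoding.

```python
import json

# The customer schema from the slide, unchanged.
SCHEMA_JSON = """
{
  "namespace": "trimazon.schema.customer",
  "type": "record",
  "name": "customer",
  "fields": [
    {"name": "firstName", "type": "string"},
    {"name": "lastName", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": "string"}
  ]
}
"""

# Assumption: only the two primitive types used by this schema are mapped.
# The 32-bit range of an Avro "int" is not checked here.
PYTHON_TYPES = {"string": str, "int": int}


def conforms(record, schema):
    """Return True if the record has exactly the schema's fields with matching types."""
    fields = schema["fields"]
    if set(record) != {field["name"] for field in fields}:
        return False
    for field in fields:
        value = record[field["name"]]
        expected = PYTHON_TYPES[field["type"]]
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
    return True


schema = json.loads(SCHEMA_JSON)

# Hypothetical sample records.
good = {"firstName": "Ada", "lastName": "Lovelace", "age": 36, "email": "ada@example.com"}
bad = {"firstName": "Ada", "lastName": "Lovelace", "age": "36", "email": "ada@example.com"}

good_ok = conforms(good, schema)
bad_ok = conforms(bad, schema)
```

Because the schema travels with the data, a consumer can run this kind of check (or let an Avro library do it) without any out-of-band agreement on the record layout.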
  19. Next Generation Data Warehousing
  20. DWH Challenges & Key Issues – Data Warehouse Automation
     • Drive development performance, ensure standardization: automation of development tasks & generator-based standardization
     • Close the gap in Requirements-Development-Governance: closed-loop design & development process in one application
     • Manage the change (Lifecycle Management): extensive version management for documentation and impact analysis
     • Agility (agile data warehousing): automation enables short release cycles and sandboxing approaches
     • Achieve flexibility (support for individual architecture options): a configurable generator is able to support real-world DWH architectures
  21. Drive development performance, ensure standardization
     • Automation of development tasks: huge amount of recurring and monotonous development tasks
     • Standardization: standards / best practices
     • Reduced testing effort: substantial time and cost savings
     (diagram labels: Model, Meta-definition, Generator, Data Base Objects, Mappings, Data Flow, Source, Staging, Cleansing, DWH-Core, Data Mart)
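The generator idea on slide 21 (metadata plus a template yields standardized artefacts) can be sketched in a few lines. This is only an illustration under stated assumptions: the metadata layout, the `stg`/`cls` layer prefixes and the `generate_ddl` function are hypothetical and say nothing about how biGenius or any other generator actually works.

```python
from string import Template

# Hypothetical meta-definition of one source table.
METADATA = {
    "table": "customer",
    "columns": [
        ("first_name", "VARCHAR(100)"),
        ("last_name", "VARCHAR(100)"),
        ("age", "INTEGER"),
    ],
}

# One template per artefact type; the standard lives in the template,
# not in hand-written code.
DDL_TEMPLATE = Template("CREATE TABLE ${layer}_${table} (\n${columns}\n);")


def generate_ddl(metadata, layer):
    """Render the CREATE TABLE statement for one DWH layer from metadata."""
    columns = ",\n".join(
        f"    {name} {sql_type}" for name, sql_type in metadata["columns"]
    )
    return DDL_TEMPLATE.substitute(
        layer=layer, table=metadata["table"], columns=columns
    )


# Generate the same table for a staging and a cleansing layer.
artefacts = {layer: generate_ddl(METADATA, layer) for layer in ("stg", "cls")}
```

Changing the template once changes every generated object, which is where the standardization and the reduced testing effort named on the slide come from.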
  22. License & Knowledge
  23. License / Distributors
     • Cloudera: Hadoop, HBase, Hive, Impala, YARN, ...
     • Databricks: Spark, Spark R, Spark SQL, ...
     • Confluent: Kafka, Kafka Connect, Kafka Streams, Kafka Schema, Kafka KSQL, ...
     • StreamSets
     • Trivadis: BiGenius
  24. Knowledge: Myth - One does it all. IT is no longer required? The Data Lab / Data Scientist Solves It All
  25. Summary – Do I need a Data Lake?
  26. Summary
     Pro:
     • Streaming
     • Platform for data analysis
     • Flexibility: different data formats, add new data quickly
     • Basis to build on
     • Ready for the future
     • More data available: more years, higher granularity
     Contra:
     • Cost
     • Complexity
     • New knowledge required
  27. Questions & Answers. Jan Ott, jan.ott@trivadis.com
  28. Session Feedback – now
     • Please use the Trivadis Events mobile app to give feedback on each session
     • Use "My schedule" if you have registered for a session
     • Otherwise use "Agenda" and the search function
     • If the mobile app does not work (or if you have a Windows smartphone), use your smartphone browser
       – URL: http://trivadis.quickmobileplatform.eu/
       – User name: <your_loginname> (such as "svv")
       – Password: sent by e-mail...
