Refactoring your EDW with Mobile Analytics Products

Presented by Zhi Zhu (China Construction Bank) and Luke Han (Kyligence) at Strata New York 2018

Published in: Software

  1. 1. Refactoring your EDW With Mobile Analytics Products Zhi Zhu @CCB FinTech Luke Han @Kyligence
  2. 2. Strata New York 2018 EDW core data 1PB Incremental data 4TB/DAY On-line data storage 5PB >600M customers >2,000M accounts Big data – How big is big? CCB - 2nd biggest bank in China. About China Construction Bank (CCB)
  3. 3. Tactical Decision Makers General Business Users Strategic Decision Makers Operational Decision Makers Headquarters Source Systems ALS CLPM CCMIS EDW Teradata 5450 (6 nodes), 18T ERPF CCBS SMIS Material DSS Database OCRM … Cube CMIS CMIS CCDA …1104 Operational Data Storage ODS Historical data Branches Source Systems 100+ reports 100+ Users 1st Generation EDW (2004)
  4. 4. Dining Room Readily Accessible to End Users (and BI Developers) Safe, Hospitable Environment Data Assets “Ready for Primetime” Dimensionally Structured Kitchen Off Limits to End Users Data Professionals Only Please Dangerous / Inhospitable Environment Data Assets “Not Ready for Primetime” Structured Variably For Data Processing Dimensional Semantic Layer Dimensional Tier [Physical or Virtual (CIF or Data Vault)] (Virtual or Physical) Un/Semi-Structured Data Movement Un/Semi-Structured Source Data Persistent Un/Semi-Structured Staging Area Unstructured -> Structured Data Discovery Processing Structured Data Movement Structured Source Data Persistent Structured Data Repository Insight Generation / Data Mining Big Data Blueprint (2012)
  5. 5. Tactical Decision Makers General Business Users Strategic Decision Makers Operational Decision Makers Presentation Layer Headquarters Source System ALS CLPM CCMIS EDW Teradata 6650 (10+10 nodes), 600T Big Data Analytics Platform Hadoop Legacy Data Marts OCRM… 1000+ Cube CMIS Historical Data SOR MPP DB ERPF CCBS CCDA…1104 Operational Data Storage Branch ODSB Performance Marketing EDW Teradata 2750 (32 nodes), 750T Branches Source Systems 25,000+ reports 2,000+ Data Mining Themes SQL translation between different databases is a big lesson. 100,000+ Users User Experience Challenges: Data latency High-performance EDW Challenges: System I/O Maintenance and data lineage Big Data Transformation (2016)
  9. 9. 100,000+ users 1,200+ million records PB-level data storage Millisecond-level response Metrics can be published by sub-organizations and subscribed to by end users by touch Intelligent Eyes (1st version, Sept 2016) Mobile product brought an opportunity
  10. 10. Benefits 1. TCO • Teradata no longer increased • Cost of unit storage ↓ 66% • Delivery cycle time ↓ from 6 months to 1 month 2. Performance • Mobile users ↑ from 0 to 100,000+ • Active PC users ↓ 90% • Page views (PV) up to 1,000,000 daily • Real-time applications emerged • Data latency ↓ from 48 hours to 7 hours • Millisecond-level response 3. User Experience • Access data anywhere and anytime • 25,000+ reports ↓ to 5,000 and 800 mobile data metrics • Eliminating data silo problems
  11. 11. How to re-engineer a legacy EDW into a Data Lake • Discover what users value by collecting their usage records. • Enable end users to join the data game. • Build a data conformance bus on Hive. • Rebuild the analytics layer with Apache Kylin. • Test-driven development.
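The first step on this slide, collecting usage records to learn which reports users actually value, can be sketched roughly as follows. This is a minimal illustration, not the method CCB used: the `(user_id, report_id)` record layout, the report names, and the `min_hits` threshold are all assumptions made up for the example.

```python
from collections import Counter


def rank_reports_by_usage(access_log):
    """Return (report_id, hit_count) pairs, most used first.

    access_log is an iterable of (user_id, report_id) pairs; this layout
    is a hypothetical stand-in for real usage records.
    """
    hits = Counter(report_id for _user, report_id in access_log)
    return hits.most_common()


def split_keep_retire(all_reports, access_log, min_hits=2):
    """Split reports into migration candidates and retirement candidates."""
    hits = Counter(report_id for _user, report_id in access_log)
    # Reports never seen in the log have a count of 0 in a Counter.
    keep = sorted(r for r in all_reports if hits[r] >= min_hits)
    retire = sorted(r for r in all_reports if hits[r] < min_hits)
    return keep, retire


# Toy usage records, invented for illustration only.
log = [
    ("u1", "deposits_daily"),
    ("u2", "deposits_daily"),
    ("u3", "loans_monthly"),
    ("u1", "deposits_daily"),
]
reports = ["deposits_daily", "loans_monthly", "legacy_report"]

ranking = rank_reports_by_usage(log)
keep, retire = split_keep_retire(reports, log)
print(ranking)
print("keep:", keep, "retire:", retire)
```

A ranking of this kind is one plausible input for deciding which legacy reports to rebuild as mobile metrics and which to drop; the real threshold would depend on business judgment, not only hit counts.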
  12. 12. AUDIO & VIDEO IMAGES TEXT WEB & SOCIAL MACHINE LOGS CRM SCM ERP L2 Cache Oracle Database DATA MARTS TEST/ DEV ANALYTICAL ARCHIVE CAPTURE | STORE | REFINE MDX RESTFUL SERVICE DATA LAB INDEPENDENT DATA MART DUAL SYSTEMS TD 66XX TD 2700 L2 Cache HBase GP L1 Cache Redis ETL EDW has evolved to Data Ecosystem
  15. 15. Apache Kylin
  16. 16. About Apache Kylin • Leading Open Source OLAP for Big Data • Open-sourced by eBay in 2014 • Graduated to Apache Top-Level Project in 2015 • 1000+ adoptions worldwide • 2015 InfoWorld Bossie Awards • 2016 InfoWorld Bossie Awards
  17. 17. Presentation Visualization Data Lake Data Source o Too many options o Low performance o Long learning curve o Compatibility issue o Technology vs Data OLAP: The Missing Part of Big Data Hive Impala Spark SQL Drill MapReduce …Spark
  18. 18. Presentation Visualization Data Lake Data Source o SQL Acceleration for Big Data o Semantic Layer o Speed up Analytics o ANSI SQL Interface o High Performance and High Concurrency Apache Kylin: Bring OLAP back to Big Data OLAP Data Mart Hive Impala Spark SQL Drill MapReduce …Spark
  19. 19. Kyligence = Kylin + Intelligence
  20. 20. About Us Kyligence = Kylin + Intelligence - Kyligence was formed by the team that created Apache Kylin, the leading open source OLAP for Big Data. Kyligence provides an intelligent data warehouse built for data cognitive analytics at web scale. - Funded by leading VCs: Redpoint Ventures, Cisco, CBC Capital, Shunwei Capital, and Eight Roads Ventures (Fidelity International Arm) - CRN Top 10 Big Data Startups 2018 © Kyligence Inc. 2018, Confidential.
  21. 21. Featured Customers Trusted by Fortune 500 Lenovo #226 of Fortune 500 OPPO #4 Smart Phone Vendor Global Lufax #1 Fintech in China CPIC #252 of Fortune 500 SAIC #41 of Fortune 500 #47 of Fortune 500 Huawei #83 of Fortune 500 Huatai Securities Top Securities in China Top 3 Telecom in China McDonald’s #436 Fortune 500 China UnionPay #3 Payment Network Data from Fortune Global 500 year 2017: http://fortune.com/global500/list/ #33 of Fortune 500
  22. 22. Partners Global Ecosystem Microsoft Azure Partner Amazon Web Services Technology Partner Tableau Technology Partner Cloudera Silver Partner MapR Converge Partner Hortonworks Community Partner Huawei Solution Partner
  23. 23. Evolution of Data Warehousing Data Mart Orders Payments Contacts Products Customers Data Warehouse Contacts Orders Payments Products Data Warehouse Data Lake Contacts Orders Payments Products Data Warehouse Contacts Orders Payments Products Next Generation Cloud Contacts Orders Payments Products Data Warehouse Products Contacts Orders Payments ?
  24. 24. Traditional Data Warehousing: Enormous Manual Effort and Repeated Work
  25. 25. The Future of Data Analytics: Intelligence and Automation. Human Intelligence vs. Artificial Intelligence
  26. 26. Historical Real time Fusion of Historical & Real-time Data Fusion of Local and Cloud On-premises Cloud EDW Data Lake Fusion of Traditional DW & Big Data Fusional DW Architecture Kyligence Enterprise Product Screenshot
  27. 27. Intelligent DW Architecture
  28. 28. Augmented Analytics Platform SQL Query Log Analytic Behavior Data Schema Data Profile ML-based Discovery of Analytic Pattern Proprietary Data Modeling Automation Self-directed Storage Layer Optimization Intelligent Query Push-down & Routing BI Real-time Analysis Data-as-a-Service Local Deployment Cloud Platform Container Data Services
  29. 29. Kyligence Position in Big Data Ecosystem Fill the gap between business and technology Kyligence Enterprise powered by Apache Kylin BI Visualization OLAP Data Mart Data Lake Source Data HDFS YARN MapReduce Spark Kafka …Spark SQL • Fusional • Unified EDW & Data Lake • Unified Real-time and Historical • Unified On-Prem and Cloud • Intelligent • Machine Learning-augmented modeling • High Performance • Sub-second query speed on massive datasets • High Concurrency • Web-scale OLAP query
  30. 30. Evolution of Data Warehousing Data Mart Orders Payments Contacts Products Customers Data Warehouse Contacts Orders Payments Products Data Warehouse Data Lake Contacts Orders Payments Products Data Warehouse Contacts Orders Payments Products Fusional & Intelligent DW Cloud Contacts Orders Payments Products Data Warehouse Products Contacts Orders Payments
  31. 31. Kyligence Cloud Transforming Big Data Analytics to Cloud Kyligence Cloud ANSI SQL Dashboard OLAP Hadoop Customer Cloud Account client cloud Kyligence Enterprise Platform streaming Cluster Deploy Account Management Diagnosis & Optimization Queries & Reporting cloud storage tables, logs, files RDBMS (metadata) ANSI SQL Cloud Data Warehouse Cluster Management
  32. 32. Kyligence Cloud Transforming Big Data Analytics to Cloud One-click provisioning Auto Scaling High Performance Seamless Integration Intelligent Ops Deploy globally in 30 minutes Scale cluster automatically for different workloads Powered by Kyligence Analytics Platform Connect to cloud data sources Enterprise ODBC driver for BI Online diagnosis and continuous optimization Speed Up OLAP analysis and mission-critical queries to interactive speed
  33. 33. Solutions
  34. 34. SQL Acceleration for Big Data
  35. 35. SQL Acceleration for Big Data Kyligence Enterprise Powered by Apache Kylin ANSI SQL Kyligence Storage Hadoop Platform T-SQL Oracle SQL PostgreSQL Ingestion SQL Pushdown Impala Query Analytics
  36. 36. SQL Acceleration for Big Data < 1s DB line_orders buyer_accounts seller_accounts product_items … √ √ √ SQL SQL
  37. 37. SQL Acceleration for Big Data Intelligent Cubing Kyligence Enterprise ANSI SQL Pushdown For Ad-Hoc Aggregation & Index query Solution • Speed up SQL on Hadoop automatically • Supports Hive, Impala, Spark SQL, with more coming • High performance and high concurrency OLAP Benefits • Unified analytics platform for aggregation and ad-hoc query • Self-service enables analysts to work without IT SQL on Hadoop
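The split between cube access for aggregations and pushdown for ad-hoc queries can be illustrated with a toy model. The sketch below precomputes SUM totals for every combination of a chosen set of dimensions, then answers a GROUP BY from a matching precomputed cuboid when one exists and otherwise scans the raw rows. It is an assumption-laden illustration of the general idea only; the function names, the sample data, and the routing rule are hypothetical and do not reflect the actual Kylin or Kyligence query planner.

```python
from collections import defaultdict
from itertools import combinations


def build_cuboids(rows, dimensions, measure):
    """Precompute SUM(measure) for every combination of the dimensions."""
    cuboids = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row[measure]
            cuboids[frozenset(dims)] = (dims, dict(totals))
    return cuboids


def query_sum(cuboids, rows, group_by, measure):
    """Answer from a cuboid if one matches, else fall back to a raw scan."""
    hit = cuboids.get(frozenset(group_by))
    if hit is not None:
        dims, totals = hit
        # Reorder each key so it follows the requested group_by order.
        order = [dims.index(d) for d in group_by]
        result = {tuple(k[i] for i in order): v for k, v in totals.items()}
        return "cube", result
    # Hypothetical "pushdown": scan the source rows directly.
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in group_by)] += row[measure]
    return "pushdown", dict(totals)


# Toy fact rows, invented for illustration only.
rows = [
    {"region": "east", "product": "card", "channel": "mobile", "amount": 10.0},
    {"region": "east", "product": "loan", "channel": "web", "amount": 5.0},
    {"region": "west", "product": "card", "channel": "mobile", "amount": 7.0},
]
cuboids = build_cuboids(rows, ["region", "product"], "amount")

source_a, by_region = query_sum(cuboids, rows, ["region"], "amount")
source_b, by_channel = query_sum(cuboids, rows, ["channel"], "amount")
print(source_a, by_region)
print(source_b, by_channel)
```

The cube lookup costs one dictionary access regardless of how many source rows exist, which is the intuition behind sub-second responses; the fallback keeps queries on uncubed dimensions answerable at the cost of a full scan.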
  38. 38. Powering Excel for Big Data
  39. 39. Powering Excel for Big Data Extend big data analytics to every analyst's desktop Analyze Your Big Data LIVE with Excel MDX/ANSI SQL Interface Self-service Big Data from On-Prem to Cloud
  40. 40. LIVE No data import is needed Slice and dice your big data Your Excel can fully leverage Kyligence Cube capability
  42. 42. Anywhere Desktop Website Mobile Kyligence currently supports Pin your Excel report to Power BI mobile
  43. 43. Migrating EDW to Data Lake
  44. 44. Kyligence Acceleration Solution for Greenplum Kyligence Enterprise Build Cube SQL SQL Pushdown ~ minutes Cube Access ~ sub-second • Change data source connection • Intelligently build cubes from Greenplum • Accelerate mission-critical analytics • Push down flexible queries to Greenplum for data exploration
  45. 45. Kyligence Acceleration Solution for Greenplum 100x faster SQL Pushdown to Greenplum: latency in minutes, min duration > 20s After acceleration: sub-second latency, max duration < 1s Seamlessly migrated Query performance ~100x Report rendering ~14x Same Tableau reports 100x faster!
  46. 46. Streaming OLAP for near real time
  47. 47. Streaming OLAP Consume Streaming Data via Kafka MDX/ANSI SQL Interface Batch & Streaming together Data Source HDFS (Recent data) Kyligence Enterprise Pushdown Cube Access Build Cube Loading Processing Kafka Topic Monitor Prediction Alerts … BI MOLAP … Cube (Full history data) Near Real-time (On recent data) Historical (On full history data)
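The "Batch & Streaming together" idea on this slide, where historical answers come from the cube and near real-time answers come from recent data, can be sketched as a simple merge. The slides do not describe how Kyligence combines the two, so everything here is an assumption: the `cutoff` timestamp marking the last cube build, the event fields `ts`, `key`, and `value`, and the function name are hypothetical.

```python
from collections import defaultdict


def merge_historical_and_recent(cube_totals, recent_events, cutoff):
    """Combine precomputed totals with events newer than the cube build.

    cube_totals: totals already aggregated up to and including `cutoff`.
    recent_events: dicts with hypothetical fields ts, key, value.
    """
    merged = defaultdict(float, cube_totals)
    for event in recent_events:
        # Events at or before the cutoff are already inside the cube,
        # so adding them again would double count.
        if event["ts"] > cutoff:
            merged[event["key"]] += event["value"]
    return dict(merged)


# Toy data, invented for illustration only.
cube_totals = {"east": 100.0, "west": 40.0}
recent_events = [
    {"ts": 990, "key": "east", "value": 5.0},    # already in the cube
    {"ts": 1010, "key": "east", "value": 2.5},   # newer than the cube
    {"ts": 1020, "key": "north", "value": 1.0},  # key unseen by the cube
]

combined = merge_historical_and_recent(cube_totals, recent_events, cutoff=1000)
print(combined)
```

The cutoff check is the part that matters in this sketch: without a clear boundary between what the cube has absorbed and what is still only in the stream, the combined total would either miss or double count events.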
  48. 48. Q & A luke.han@kyligence.io
