Building Enterprise OLAP on Hadoop for FSI


Building Enterprise OLAP on Hadoop for the Financial Services Industry, followed by a use case from CPIC (a Fortune 500 insurance company) on replacing legacy IBM Cognos OLAP with the Kyligence platform.

Published in: Software

  1. Building Enterprise OLAP on Hadoop for Financial Services Industry
     Luke Han | @lukehq
     Co-founder & CEO of Kyligence
     Creator & VP of Apache Kylin
     Microsoft Regional Director & MVP
  2. About Kyligence
     • Formed by the creators of Apache Kylin in 2016
     • Offers Enterprise and Cloud versions of Apache Kylin
     • Funding from Redpoint, Cisco, CBC and Shunwei
     • Member of Microsoft Accelerator Shanghai 2017
     • Dual HQ in Silicon Valley and Shanghai, China
     Kyligence booth: #855
  3. Transition to Big Data… What about your traditional data warehouse? What about your existing OLAP/BI applications?
  4. Data Warehouse/OLAP in Financial Services Industry
     • The industry that relies most heavily on DW/OLAP applications
     • Thousands of applications built on top of the EDW
     • Experienced analysts with decade-long expertise…in data…but not in technologies
  5. Enterprise Data Warehouse Architecture
     Diagram labels: Presentation, Visualization, OLAP, Data Mart, Enterprise Data Warehouse, Data Source
     • Optimized for mission-critical analytics
     • Well modeled
     • Industry best practices
     • Thriving ecosystem
     • Trained experts everywhere
  6. But you are asked to…
     • Migrate or rebuild existing OLAP/BI apps on Big Data
     • Deliver better performance…just because you have Big Data now
     • Train yourself to learn MR/Spark/ML…and AI
  7. OLAP: The Missing Part of Big Data
     Diagram labels: Presentation, Visualization, Data Lake, Data Source; engines: Hive, Impala, Spark SQL, Drill, MapReduce, …Spark
     • Too many options
     • Low performance
     • Long learning curve
     • Compatibility issues
     • Technology vs Data
  8. Apache Kylin: Bring OLAP back to Big Data
     Diagram labels: Presentation, Visualization, OLAP, Data Mart, Data Lake, Data Source; engines: Hive, Impala, Spark SQL, Drill, MapReduce, …Spark
     • MOLAP on Hadoop
     • Simplified data modeling
     • Optimized for aggregation queries
     • ANSI SQL
     • Native on Hadoop
     • On-prem and in the cloud
  9. Kylin vs Hive: Star-Schema Benchmark
     Apache Kylin vs. Apache Hive, response time in seconds by data volume (scale factor), lower is better:
     • Scale factor 2: Apache Kylin 0.17, Apache Hive 142.42
     • Scale factor 10: Apache Kylin 0.17, Apache Hive 161.66
     • Scale factor 20: Apache Kylin 0.18, Apache Hive 189.17
     * Based on 4 nodes, 16-core CPU, 96 GB memory per node
  10. Global Users: 500+ use cases in production globally
      • FSI: ABC, CCB, CMB, CPIC, Citic Bank, China Unionpay, HUATAI Securities, GUOTAI Securities, Lufax
      • Telecom: China Mobile, China Telecom, China Unicom, AT&T
      • Internet: eBay, Yahoo! Japan, Baidu, Meituan, NetEase, Expedia, 360, Toutiao
      • Manufacturing: SAIC, HUAWEI, Lenovo, OPPO, XIAOMI, VIVO
      • Others: MachineZone, Glispa, Inovex, Adobe, iFLYTEK
      Data collected from public information and the Kylin community
  11. Enterprise OLAP on Hadoop
  12. Kyligence: Enterprise OLAP on Hadoop
      • Kyligence Robot: Online Optimize & Tuning Services
      • Kyligence Analytics Platform (KAP)
      • Kyligence Solutions
      • Apache Kylin: Open Source OLAP on Hadoop
      • KyStorage: Columnar Storage
      • KyStudio: Model Designer
      • KyManager: Administrator Tool
      • KyAnalyzer: Agile BI
      • Security: Cell Level ACL
      • On-Demand Deployment: On-Premises, Hybrid, In the Cloud
  13. Kyligence: Enterprise OLAP on Hadoop
      Diagram labels: Hive, Spark SQL, Impala; Kyligence Analytics Platform (KAP); Intelligent Cubing by KAP
      • Data Exploration/Discovery: Query Pushdown, minutes latency
      • Mission Critical Analytics: Cube Access, sub-second latency
  14. Support Data Exploration and Discovery
  15. KAP: TPC-DS
      [Chart: per-query results for TPC-DS queries 1 to 99]
      • Hive: 33 queries are unsupported or time out
      • KAP: all 99 queries supported
      • Queries routed between SQL on Hadoop and Apache Kylin
  16. Speed Up Mission Critical Analytics
  17. TPC-H Benchmark: KAP vs SparkSQL 2.1 (lower is better)
      [Chart: results for queries Q1 to Q22; series: SparkSQL 2.1, KAP 2.4]
  18. Kyligence Studio: Data Modeling Designer
      • Drag & Drop
      • Smart Data Modeling
      • Intelligent Optimization
  19. Integrate with Business Intelligence tools
  20. Seamless Integration with BI tools
      • KyAnalyzer
      • Tableau
      • Power BI/Excel
      • IBM Cognos
      • MicroStrategy
      • Superset
      • Zeppelin
      • Saiku
      • …
  21. Enhanced Security and Management
      Cell Level ACL/SSO/LDAP/Kerberos…
  22. Use Case: CPIC
  23. CPIC: China Pacific Insurance (Group) Co., LTD
      • Global Fortune 500 insurance company
      • Top-2 insurance company in China
      • $40+ billion revenue
      • 8+ million customers
      • 97,000+ employees
  24. Challenges
      • Legacy IBM Cognos + DB2 solution can’t support Big Data scenarios
      • Long wait times (minutes to hours for reports)
      • Low concurrency (100,000+ employees!)
      • High cost
  25. Journey of Kyligence Analytics Platform
      • 2016.12 ~ 2017.01: KAP POC, Performance Testing (Query Latency, Concurrency)
      • 2017.01 ~ 2017.03: KAP POC, Compatibility (Cognos Connection, Cognos Syntax)
      • 2017.03 ~ 2017.05: Development (Fixed Reports, Flexible Reports)
      • 2017.05 ~ 2017.06: Go live (all dataset aggregation and testing, Fixed Reports released)
      • No changes on Hadoop side
      • No additional engineers required
      • Most of the work done by analysts
  26. KAP + Cognos: Deployment
      • Reporting & Dashboard: Dynamic Report (JDBC), Fixed Report (ODBC)
      • OLAP & Data Mart: KAP Query Server
      • Big Data Platform
  27. Benefits after Adopting Kyligence
      • One-stop BI platform that generates complex reports
      • Over 90% of queries return within 3 seconds (including high-dimensional queries)
      • Seamless integration with IBM Cognos, no change at the front end
      • 2 KAP cubes replaced 2000+ IBM Cognos cubes
      • Cost reduced significantly by adopting open source technology
  28. Customer Quote
      “Kyligence enables us to find valuable insights faster from every insurance policy within seconds. Kyligence’s platform allows us to achieve more with less. Our lean management system has improved significantly” -- Minchen Wu, Deputy GM of IT, CPIC
  29. China Construction Bank (CCB): 2nd Largest Bank in the World
      Fusion Big Data Platform
      • Open: connects to Teradata/Greenplum and IBM Cognos/Saiku…
      • Flexible: self-service for end users
      • Efficient: speeds up the PC and mobile analytics experience
      “Apache Kylin is the last piece of the puzzle for serving data assets management between the legacy DW and the new Big Data.” -- Zhi Zhu, Vice Senior Manager of Tech Dept, CCB
  30. Enterprise OLAP on Hadoop
      Speed Up Mission Critical Analytics
      Booth #855
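
Slides 8 and 13 describe two query paths: precomputed cube access (MOLAP) and pushdown to a SQL-on-Hadoop engine. The following is a minimal Python sketch of that idea under stated assumptions, not Kylin's or KAP's actual implementation. The fact rows, the dimension names, and the functions `aggregate`, `build_cube`, and `answer` are all hypothetical. A plain scan over the raw rows stands in for the pushdown engine.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows (illustrative data, not from the deck).
ROWS = [
    {"region": "East", "product": "Life", "year": 2016, "premium": 100.0},
    {"region": "East", "product": "Auto", "year": 2016, "premium": 50.0},
    {"region": "West", "product": "Life", "year": 2016, "premium": 70.0},
    {"region": "East", "product": "Life", "year": 2017, "premium": 120.0},
]


def aggregate(rows, dims, measure):
    """Compute SUM(measure) GROUP BY dims by scanning every row."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def build_cube(rows, cube_dims, measure):
    """Precompute one aggregate table (cuboid) per subset of cube_dims."""
    cube = {}
    for k in range(len(cube_dims) + 1):
        for dims in combinations(cube_dims, k):
            cube[dims] = aggregate(rows, dims, measure)
    return cube


def answer(cube, rows, cube_dims, group_by, measure):
    """Serve from the cube when it covers group_by, else fall back to a scan."""
    if set(group_by) <= set(cube_dims):
        # Cuboid keys follow the order of cube_dims, not of group_by.
        dims = tuple(d for d in cube_dims if d in group_by)
        return "cube", cube[dims]
    # Stand-in for query pushdown: aggregate the raw rows at query time.
    return "pushdown", aggregate(rows, tuple(group_by), measure)


# The cube covers only two of the three dimensions, so "year" is not precomputed.
CUBE_DIMS = ("region", "product")
CUBE = build_cube(ROWS, CUBE_DIMS, "premium")

if __name__ == "__main__":
    print(answer(CUBE, ROWS, CUBE_DIMS, ["region"], "premium"))
    print(answer(CUBE, ROWS, CUBE_DIMS, ["year"], "premium"))
```

In this sketch a query grouped by `region` is a dictionary lookup in a precomputed cuboid, while a query grouped by `year` has to scan the raw rows. The number of cuboids doubles with each added dimension (4 cuboids for 2 dimensions here), which is the storage cost that precomputation trades for lookup speed.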