Building Your BI System - HadoopCon Taiwan 2015

Building the Business Intelligence framework with Hive, Teradata, and Tableau.

Published in: Data & Analytics

  1. 1. BUILD YOUR BI SYSTEM: PRACTICE IN THE DATA LAKE ECOSYSTEM Bryan@Vpon Data
  2. 2. ABOUT ME • Experience: Vpon Data Engineer; TWM, Keywear, Nielsen • Bryan’s notes for data analysis http://bryannotes.blogspot.tw • Spark.TW • LinkedIn https://tw.linkedin.com/pub/bryan-yang/7b/763/a79
  3. 3. AGENDA • User Story • Data Lake • Framework of BI
  4. 4. DEAL WITH BIG DATA
  5. 5. SMALL RETAILER
  6. 6. MORE COMPLEX AND BIG… http://www.slideteam.net/technology-powerpoint-templates/mobile-phones.html
  7. 7. 3 KINDS OF PROBLEMS https://kavyamuthanna.wordpress.com/category/big-data/
  8. 8. BIG DATA BIG PROBLEM http://www.mn.uio.no/ifi/studier/masteroppgaver/nd/masteroppgave_cloud_bigdata_hpc.html
  9. 9. BIG DATA BIG COST • The cost of data storage: What data do we keep? For how long? • The cost of data management: Are the machines and infrastructure easy to maintain? What about the data flow (ETL)? • The time cost of data processing: How long can the users wait? How accessible is the data? • Human costs you cannot see
  10. 10. A REAL CASE
  11. 11. SO MANY ADHOC QUERIES SALES MARKETING FINANCE BUSINESS
  12. 12. EVEN A SIMPLE QUERY Q: HI, PLEASE TELL ME HOW MANY USERS WE HAVE HAD SINCE THE BEGINNING? A: SELECT COUNT(1) FROM LOG
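The answer on slide 12 counts log rows, which is not necessarily the number of users. A minimal sketch of the difference, assuming a hypothetical `log` table with a `user_id` column (the slide does not show the schema) and using in-memory SQLite as a stand-in for the real engine:

```python
import sqlite3

def count_rows_and_users(events):
    """Return (row_count, distinct_user_count) for (user_id, action) events."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the slide only names the table LOG.
    conn.execute("CREATE TABLE log (user_id TEXT, action TEXT)")
    conn.executemany("INSERT INTO log VALUES (?, ?)", events)
    # The query from the slide: one count per log row.
    rows = conn.execute("SELECT COUNT(1) FROM log").fetchone()[0]
    # What the question asks for: one count per distinct user.
    users = conn.execute("SELECT COUNT(DISTINCT user_id) FROM log").fetchone()[0]
    conn.close()
    return rows, users

events = [("u1", "view"), ("u1", "click"), ("u2", "view")]
print(count_rows_and_users(events))  # (3, 2)
```

Either form scans the whole log on every request, which is the cost the slide is pointing at.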
  13. 13. ttps://myreelpov.wordpress.com/2012/12/23/which-story-do-you-prefer-life-of-pi/life-of-pi-2 Your Life Boss Family and Lover Customers Data Ocean
  14. 14. Overview: Business intelligence (BI) is the set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes. (Wikipedia)
  15. 15. DIFFERENT FEATURES (Price / Performance / Accessibility) • Hadoop: Low / Medium / Low • SQL Server: Low-Medium / Depends on / Medium • Data Warehouse: High / High / Medium • BI System: High / Depends on / High
  16. 16. http://www.datalytyx.com/big-data-data-lakes/
  17. 17. WHY DATA LAKE tp://thesologuide.com/332/the-seesaw-of-success-when-taking-a-rest-is-bes
  18. 18. HIVE • Created at Facebook • Data warehouse in the Hadoop ecosystem • HiveQL (SQL-like interface) • Metastore (stores the schema of the data; schema on read) • UDFs (user-defined functions)
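The "schema on read" bullet on slide 18 can be illustrated with a small sketch: the raw text is stored untouched, and column names and types are applied only when the data is read. This is plain Python, not Hive code; the `SCHEMA` columns and the sample rows are hypothetical, and turning an unparseable field into `None` is an assumption made for the illustration.

```python
import csv
import io

# Hypothetical schema, kept apart from the data the way a metastore keeps it.
SCHEMA = [("user_id", str), ("ts", int), ("amount", float)]

def read_with_schema(raw_text, schema=SCHEMA):
    """Apply column names and types to raw tab-separated text at read time."""
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    rows = []
    for fields in reader:
        row = {}
        for (name, cast), value in zip(schema, fields):
            try:
                row[name] = cast(value)
            except ValueError:
                row[name] = None  # value does not fit the declared type
        rows.append(row)
    return rows

raw = "u1\t1443657600\t9.5\nu2\tnot-a-number\t3.0\n"
for row in read_with_schema(raw):
    print(row)
```

Because the schema lives outside the data, changing a column type means editing `SCHEMA` and reading again; the stored text is not rewritten.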
  19. 19. http://www.stratapps.net/intro-hive.php
  20. 20. http://hortonworks.com/blog/hive-cheat-sheet-for-sql-users/
  21. 21. ONE MORE THING
  22. 22. TERADATA • Massively Parallel Processing (MPP) • Each processor handles a different thread of the program, and each processor has its own disk • Teradata SQL is fully certified for SQL-92
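The massively parallel processing idea on slide 22 can be sketched as hash-distributing rows across workers, aggregating each slice independently, and merging the partial results. This is an illustration only, not Teradata's implementation: Python threads stand in for separate processors, and the function names and sample data are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_workers):
    """Hash-distribute rows by key so each worker owns a disjoint slice."""
    slices = [[] for _ in range(n_workers)]
    for key, value in rows:
        slices[zlib.crc32(key.encode()) % n_workers].append((key, value))
    return slices

def local_sum(rows):
    """Aggregate one slice using only that worker's own rows."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return totals

def parallel_sum(rows, n_workers=4):
    """Run the local aggregation per slice, then merge the partial results."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(local_sum, partition(rows, n_workers)):
            merged.update(partial)
    return dict(merged)

sales = [("tw", 3), ("jp", 5), ("tw", 4), ("us", 1)]
print(parallel_sum(sales))
```

Hashing on the key sends every row for a given key to the same worker, so the merge step only has to combine totals for different keys.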
  23. 23. http://www.slideshare.net/alam7/module-02-teradata-basics
  24. 24. https://www.safaribooksonline.com/library/view/teradata-architecture-for
  25. 25. TABLEAU • Visualization tool • Connects to many kinds of databases • VizQL • Tableau Server
  26. 26. http://www.clearpeaks.com/blog/tableau/tableau-8-2-new-features
  27. 27. https://www.youtube.com/watch?v=fYpy04vmG_o
  28. 28. m/services/business-intelligence-services/tableau-consulting/table
  29. 29. JENKINS • Manages ETL processes • Free, with many plugins • Monitors job status and dependencies • Integrates with Git and SVN • Email alerts
  30. 30. User Interface • Back to home page • Management menu • Jobs being built • Job list • Build information • Build status • Next scheduled build and its time • ip:port • Job Name List
  31. 31. Build Steps • Call a Python script • Call the remote shell • Call a local shell script
  32. 32. Build Graph (diagram of dependent jobs, each node labeled Job Name)
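The build graph on slide 32 expresses which jobs must finish before others start. A minimal sketch of how such dependencies resolve into a run order, using Python's standard `graphlib` module (Python 3.9 or later) rather than any Jenkins API; the job names are hypothetical, since the slide labels every node "Job Name":

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the set of jobs it depends on.
JOBS = {
    "load_raw_logs": set(),
    "build_daily_stats": {"load_raw_logs"},
    "export_to_teradata": {"build_daily_stats"},
    "refresh_tableau_extract": {"export_to_teradata"},
}

def build_order(jobs):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(jobs).static_order())

print(build_order(JOBS))
```

`TopologicalSorter` raises `graphlib.CycleError` if two jobs depend on each other, which is a useful check before scheduling anything.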
  33. 33. LET’S PUT IT ALL TOGETHER
  34. 34. Architecture diagram • Components: Hadoop Cluster 1, Hadoop Cluster 2, Teradata, Tableau Server, User • Flows: Data Transfer, Request, ETL, Data Slicing • Live Query: Too Slow
  35. 35. Architecture diagram • Components: Hadoop Cluster 1, Hadoop Cluster 2, Teradata, Tableau Server, User • Flows: Data Transfer, Request, ETL, Data Slicing • Extract Data: Insufficient Space
  36. 36. Architecture diagram • Components: Hadoop Cluster 1, Hadoop Cluster 2, Teradata, Tableau Server, User • Flows: Data Transfer, Request, ETL, Data Slicing • Extract Data Every Day • Table View • Statistical Tables
  37. 37. USER EXPERIENCE TUNING (bar chart comparing HIVE, TERADATA, and BI on an axis from 0 to 150) 120X Faster
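The "Extract Data Every Day" and "Statistical Tables" labels on slide 36 suggest a daily job that pre-aggregates the raw log into a small summary table for the BI layer to read. A minimal sketch under that assumption, with in-memory SQLite standing in for the real stores; the table names, columns, and sample rows are hypothetical:

```python
import sqlite3

def build_daily_stats(conn):
    """Rebuild the small summary table from the raw log (the daily ETL step)."""
    conn.execute("DROP TABLE IF EXISTS daily_stats")
    conn.execute(
        "CREATE TABLE daily_stats AS "
        "SELECT day, COUNT(*) AS events, COUNT(DISTINCT user_id) AS users "
        "FROM log GROUP BY day"
    )

conn = sqlite3.connect(":memory:")
# Hypothetical raw log: one row per event.
conn.execute("CREATE TABLE log (day TEXT, user_id TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [("2015-09-18", "u1"), ("2015-09-18", "u2"), ("2015-09-19", "u1")],
)
build_daily_stats(conn)
stats = conn.execute(
    "SELECT day, events, users FROM daily_stats ORDER BY day"
).fetchall()
print(stats)  # [('2015-09-18', 2, 2), ('2015-09-19', 1, 1)]
```

Dashboards then query `daily_stats`, which holds one row per day, instead of scanning the raw log on every request.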
  38. 38. HOW TO CHOOSE THE COMPONENTS IN YOUR BI FRAMEWORK? • The cost of data storage • The cost of data management • The time cost of data processing
  39. 39. CONSIDERATIONS AND SUGGESTIONS • Time is money • Trade HDD space or money for time • Understand the components and their relationships • Balance needs against costs • A good framework helps the business grow
  40. 40. COST CURVE (axes: Business Growth, Cost of Business Growth)
  41. 41. IN THE FUTURE • Hardware: more nodes, more memory, graphics cards … • Software: Spark, Tez, Tachyon, algorithms … • Cloud: EC2, BigQuery, Bluemix, SAP …
  42. 42. THANK YOU FOR LISTENING Special thanks to Vpon Hood, Meiyen, Gil, and the OPS Team
  43. 43. Q & A
