BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers - Keyrus

9th BI congress of the BICC-Thomas More: March 24, 2016

The volume of data collected through weblogs keeps growing. Using a practical case, Lisa Truyers explains how Keyrus tackled it.

Published in: Data & Analytics

  1. BIG DATA ANALYTICS, BUSINESS INTELLIGENCE, INFORMATION MANAGEMENT, PERFORMANCE MANAGEMENT
  2. DIVING INTO WEBLOG DATA WITH SAS ON HADOOP. Lisa Truyers, Data Scientist Consultant at Keyrus. March 24, 2016. © Copyright 2015 – Keyrus
  3. WHO HAS EVER TRIED TO OPEN A 1 GB FILE ON A COMPUTER? (Project summary)
  4. AGENDA
     • What is Hadoop?
     • Project summary
     • Components of the Hadoop-SAS framework
     • Setup to load data
     • Benchmarks
     • Lessons learned
  5. PROS (What is Hadoop?)
     • Open-source software framework
     • Storage and large-scale data processing
     • Easy and economic scaling
     • Both structured and unstructured data
     • Low-cost commodity hardware
     • Starts multiple copies of the same task for the same block of data
     • "51% of companies think about integrating Hadoop in their company by 2016" (Philip Russom, TDWI Best Practices Report: Integrating Hadoop into Business)
  6. CONS (What is Hadoop?)
     • Management and high-availability capabilities are just starting to emerge
     • Data security is fragmented
     • MapReduce is very batch-oriented
     • No easy-to-use, full-feature tools for data integration, data cleansing, governance and metadata
     • Lack of skilled professionals
     • Manage the data and use analytics to quickly identify previously unknown insights: access the different tools of SAS
  7. WHAT ARE COMPANIES DOING WITH HADOOP? (What is Hadoop?)
     The percentages mentioned here cover the whole world, not only Europe.
     What?                                                         Percentage
     Data warehouse extensions                                     46 %
     Data exploration and discovery                                46 %
     Data staging for data warehousing and data integration        39 %
     Data lake                                                     39 %
     Queryable archive for non-traditional data                    36 %
     Computational platform and sandbox for advanced analytics     33 %
  8. WHY IS HADOOP (NOT) IMPORTANT? (What is Hadoop?)
     • “Cost savings. Linear scalability. Evaluate ‘the hype’ practically. Complement BI.” (BI architect, telecom, Europe)
     • “Reduces cost of data. New ability to query big data sets. Supply chain improvements. Predictive analytics.” (Vice president, food and beverage, Asia)
     • “Our existing infrastructure cannot handle the tenfold increase in data volumes.” (Data strategy manager, hospitality, US)
     • “It’s important to realize the potential of big data and to explore new business opportunities.” (Data specialist, consulting, Asia)
  10. INTRODUCTION (Project summary)
      1. Discover web traffic data
         • The sheer volume of data makes it impossible to analyse at the moment
         • Prove the added value of a combined Hadoop-SAS environment
      2. Lead generation
         • More business-oriented: scoring a neural network model currently takes one hour every day
         • Goal: reduce this time
  12. HADOOP COMPONENTS (Components of the Hadoop-SAS framework)
      • Hadoop: HBASE, PIG, HIVE & HCATALOG, MAP REDUCE, HDFS, AMBARI, OOZIE, FLUME, SQOOP, NFS, WebHDFS, YARN
      • SAS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS® Enterprise Guide®
  18. SAS COMPONENTS (Components of the Hadoop-SAS framework)
      • Hadoop: HBASE, PIG, HIVE & HCATALOG, MAP REDUCE, HDFS, AMBARI, OOZIE, FLUME, SQOOP, NFS, WebHDFS, YARN
      • SAS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® Enterprise Guide®
  21. FULL PROCESS (Setup to load data)
      • Day-files: (A) partitioned, non-parsed; (C) partitioned, parsed
      • Hour-files: (B) partitioned, non-parsed; (D) partitioned, parsed
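
The difference between day-files and hour-files comes down to the partition key derived from each log timestamp. The following is a minimal sketch of that idea, not the project's actual code: the function names and the key formats are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def partition_key(ts: datetime, granularity: str) -> str:
    """Return the partition a log record falls into (hypothetical key format)."""
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "hour":
        return ts.strftime("%Y-%m-%d-%H")
    raise ValueError(f"unknown granularity: {granularity!r}")


def partition(timestamps: List[datetime], granularity: str) -> Dict[str, List[datetime]]:
    """Group record timestamps by day or by hour."""
    parts: Dict[str, List[datetime]] = defaultdict(list)
    for ts in timestamps:
        parts[partition_key(ts, granularity)].append(ts)
    return dict(parts)


# Three made-up records: two on the same day in different hours, one a day later.
records = [
    datetime(2016, 3, 1, 10, 5),
    datetime(2016, 3, 1, 11, 40),
    datetime(2016, 3, 2, 10, 15),
]
by_day = partition(records, "day")
by_hour = partition(records, "hour")
```

Hour-level partitioning yields more and smaller partitions than day-level partitioning for the same records, which is the trade-off the four variants A to D explore.
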
  22. Setup to load data
  23. PROCESS C (Setup to load data)
      • Delete HIVE table
      • Transfer to Hadoop
      • Parse data
      • Merge
      • Loop
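
The slide only names the steps of process C, so the sketch below is one possible reading of it: for each day-file, drop the staging table, load the raw lines, parse them, and merge the result into a target table. Everything here is hypothetical, including the in-memory dictionary that stands in for Hive, the table names `staging` and `weblog_parsed`, and the three-field log layout.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for Hive: table name -> list of rows.
HiveStore = Dict[str, List[dict]]


def delete_table(hive: HiveStore, name: str) -> None:
    """Step 1: drop the staging table if it exists."""
    hive.pop(name, None)


def transfer(hive: HiveStore, name: str, raw_lines: List[str]) -> None:
    """Step 2: copy the raw, non-parsed lines into the staging table."""
    hive[name] = [{"raw": line} for line in raw_lines]


def parse(hive: HiveStore, name: str) -> List[dict]:
    """Step 3: split each raw line into fields.

    Assumed layout (not from the slides): '<ISO timestamp> <url> <status>'.
    """
    parsed = []
    for row in hive[name]:
        timestamp, url, status = row["raw"].split()
        parsed.append({"day": timestamp[:10], "url": url, "status": int(status)})
    return parsed


def merge(hive: HiveStore, target: str, rows: List[dict]) -> None:
    """Step 4: append the parsed rows to the target table."""
    hive.setdefault(target, []).extend(rows)


def run_process_c(day_files: Dict[str, List[str]]) -> HiveStore:
    """Step 5: loop over the day-files, repeating steps 1 to 4 for each."""
    hive: HiveStore = {}
    for _day, lines in sorted(day_files.items()):
        delete_table(hive, "staging")
        transfer(hive, "staging", lines)
        merge(hive, "weblog_parsed", parse(hive, "staging"))
    return hive


# Made-up input: two day-files with three log lines in total.
files = {
    "2016-03-01": ["2016-03-01T10:00:00 /home 200"],
    "2016-03-02": ["2016-03-02T11:00:00 /a 404", "2016-03-02T12:00:00 /b 200"],
}
hive = run_process_c(files)
```

Dropping the staging table at the start of every iteration keeps each pass independent, so a failed day can be rerun without leaving half-loaded rows behind.
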
  24. AGENDA
      • Project summary
      • What is Hadoop?
      • Components of the Hadoop-SAS framework
      • SAS-tools used in this project
      • Setup to load data
      • Benchmarks
      • Lessons learned
  25. HADOOP COMPARED TO SERVER (Benchmarks)
      Test                          Server                Hadoop
      Query test on one day         35 seconds            35 seconds
      Parsing one day of data       15 minutes            15 minutes
      Parsing one week of data      4 hours 30 minutes    53 minutes
      Note: more time is needed for extra benchmarks.
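
The weekly figures are easier to compare as a ratio. The short calculation below uses only the numbers on the benchmark slide; the "linear estimate" (seven single-day runs back to back) is an added reference point, not something that was measured.

```python
def to_minutes(hours: int = 0, minutes: int = 0) -> int:
    """Convert a duration given in hours and minutes to whole minutes."""
    return hours * 60 + minutes


# Figures taken from the benchmark slide.
parse_one_day = to_minutes(minutes=15)           # identical on server and Hadoop
server_week = to_minutes(hours=4, minutes=30)    # 270 minutes
hadoop_week = to_minutes(minutes=53)             # 53 minutes

# Reference point (an assumption, not a measurement): seven days parsed
# one after another at the single-day rate.
linear_week = 7 * parse_one_day                  # 105 minutes

speedup = server_week / hadoop_week              # roughly 5.1
print(f"Hadoop parses one week {speedup:.1f}x faster than the server")
print(f"Server relative to the linear estimate: {server_week / linear_week:.2f}")
print(f"Hadoop relative to the linear estimate: {hadoop_week / linear_week:.2f}")
```

On these numbers the two environments are tied for a single day and only diverge at the weekly volume, which fits the lesson that data must be large enough on Hadoop to see a difference.
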
  27. Lessons learned
      Teamwork is key
      • Set up the Hadoop cluster with Hadoop experts
      • Install SAS with experts from the company
      SAS ON HADOOP
      • In SAS, take your time to set the correct variable length
      • Choose the strength of the cluster rationally
      • Create benchmarks on both environments (server vs. Hadoop) early on, so that a fair comparison can be made and the right decision taken
      • Data must be large enough on Hadoop to see a difference
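
One way to read the advice on variable length is that character columns should be sized from the data rather than left at a generous default. The sketch below, which is an illustration and not part of the presentation, scans parsed rows and reports the longest value seen per field; the field names and sample rows are made up.

```python
from typing import Dict, Iterable


def max_field_lengths(rows: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Return the longest observed string length for every field."""
    lengths: Dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            lengths[name] = max(lengths.get(name, 0), len(value))
    return lengths


# Made-up sample of parsed weblog rows.
sample = [
    {"url": "/home", "agent": "Mozilla"},
    {"url": "/products/1", "agent": "curl"},
]
lengths = max_field_lengths(sample)
```

The resulting numbers could then guide the lengths declared for the corresponding character variables, ideally with some headroom for values not present in the sample.
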
  28. THANK YOU FOR YOUR ATTENTION. To contact us: www.keyrus.com, contact@keyrus.com
